papers
- 2025 01 23 2025-01-23 - RAG Pipelines 2215
- 2024 11 17 2024-11-17 - Mixture-of-Transformers A Sparse and Scalable Architecture for Multi-Modal Foundation Models 176
- 2024 11 03 2024-11-03 - ReMoE FULLY DIFFERENTIABLE MIXTURE-OF-EXPERTS WITH RELU ROUTING 0
- 2024 11 03 2024-11-03 - GATED DELTA NETWORKS IMPROVING MAMBA2 WITH DELTA RULE 158
- 2024 11 03 2024-11-03 - On the Efficiency of Convolutional Neural Networks 1206
- 2024 11 03 2024-11-03 - TokenFormer - RETHINKING TRANSFORMER SCALING WITH TOKENIZED MODEL PARAMETERS 398
- 2024 10 10 2024-10-10 - Pixtral 12B 62
- 2024 10 04 2024-10-04 - Movie Gen A Cast of Media Foundation Models 278
- 2023 12 17 2023-12-17 - Stable and low-precision training for large-scale vision-language models 1790
- 2023 12 09 2023-12-09 - SILC Improving Vision Language Pretraining with Self-Distillation 656
- 2023 12 09 2023-12-09 - Text as Image Learning Transferable Adapter for Multi-Label Classification 249
- 2023 12 09 2023-12-04 - Rejuvenating image-GPT as Strong Visual Representation Learners 901
- 2023 12 09 2023-12-05 - Mamba Linear-Time Sequence Modeling with Selective State Spaces 328
- 2023 12 09 2023-04-14 - Combined Scaling for Zero-shot Transfer Learning 775
- 2023 12 09 2023-12-04 - MobileCLIP - Fast Image-Text Models through Multi-Modal Reinforced Training 1422