papers

2025 01 23 2025-01-23 RAG Pipelines 2215
2024 11 17 2024-11-17 - Mixture-of-Transformers A Sparse and Scalable Architecture for Multi-Modal Foundation Models 176
2024 11 03 2024-11-03 - ReMoE FULLY DIFFERENTIABLE MIXTURE-OF-EXPERTS WITH RELU ROUTING 0
2024 11 03 2024-11-03 - GATED DELTA NETWORKS IMPROVING MAMBA2 WITH DELTA RULE 158
2024 11 03 2024-11-03 - On the Efficiency of Convolutional Neural Networks 1206
2024 11 03 2024-11-03 - TokenFormer - RETHINKING TRANSFORMER SCALING WITH TOKENIZED MODEL PARAMETERS 398
2024 10 10 2024-10-10 - Pixtral 12B 62
2024 10 04 2024-10-04 - Movie Gen A Cast of Media Foundation Models 278
2023 12 17 2023-12-17 - Stable and low-precision training for large-scale vision-language models 1790
2023 12 09 2023-12-09 - SILC Improving Vision Language Pretraining with Self-Distillation 656
2023 12 09 2023-12-09 - Text as Image Learning Transferable Adapter for Multi-Label Classification 249
2023 12 09 2023-12-04 - Rejuvenating image-GPT as Strong Visual Representation Learners 901
2023 12 09 2023-12-05 - Mamba Linear-Time Sequence Modeling with Selective State Spaces 328
2023 12 09 2023-04-14 - Combined Scaling for Zero-shot Transfer Learning 775
2023 12 09 2023-12-04 - MobileCLIP - Fast Image-Text Models through Multi-Modal Reinforced Training 1422