2024-11-17 - Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

- modality-specific parameters / architectures
- global attention across modalities (every token attends over the full mixed-modal sequence)

mlpapers
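The two notes above (modality-specific parameters, global attention across modalities) can be made concrete with a small sketch. This is not the paper's implementation. It is a dependency-free toy that assumes the per-modality parameters cover the attention projections and the feed-forward network, and it leaves out layer normalization, masking, and multiple heads. The class name `MoTLayerSketch` and all weight names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoTLayerSketch:
    """Hypothetical single-head layer: per-modality weights, shared attention."""

    def __init__(self, modalities, d_model, d_ff, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Assumption: each modality owns its own projections and FFN weights.
        self.params = {
            m: {
                "Wq": init_matrix(rng, d_model, d_model),
                "Wk": init_matrix(rng, d_model, d_model),
                "Wv": init_matrix(rng, d_model, d_model),
                "Wo": init_matrix(rng, d_model, d_model),
                "W1": init_matrix(rng, d_ff, d_model),
                "W2": init_matrix(rng, d_model, d_ff),
            }
            for m in modalities
        }

    def forward(self, tokens, modality_ids):
        # Step 1: project each token with the weights of its own modality.
        q, k, v = [], [], []
        for x, m in zip(tokens, modality_ids):
            p = self.params[m]
            q.append(matvec(p["Wq"], x))
            k.append(matvec(p["Wk"], x))
            v.append(matvec(p["Wv"], x))

        scale = 1.0 / math.sqrt(self.d_model)
        out = []
        for i, (x, m) in enumerate(zip(tokens, modality_ids)):
            p = self.params[m]
            # Step 2: global attention over all tokens, regardless of modality.
            weights = softmax([scale * dot(q[i], kj) for kj in k])
            ctx = [sum(w * vj[d] for w, vj in zip(weights, v))
                   for d in range(self.d_model)]
            h = [xi + ai for xi, ai in zip(x, matvec(p["Wo"], ctx))]
            # Step 3: modality-specific feed-forward network (ReLU) with residual.
            hidden = [max(0.0, z) for z in matvec(p["W1"], h)]
            out.append([hi + fi for hi, fi in zip(h, matvec(p["W2"], hidden))])
        return out


# Usage: a three-token sequence that interleaves two modalities.
layer = MoTLayerSketch(["text", "image"], d_model=4, d_ff=8)
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
modality_ids = ["text", "image", "text"]
out = layer.forward(tokens, modality_ids)
```

In this sketch the only place where modalities interact is the attention step: a text token's output depends on image tokens through their keys and values, but never on the image feed-forward weights. That routing by modality is what the "sparse" in the title refers to here, since each token only touches one modality's parameter set.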