Models
- [ ]
Papers
- [2410.23262] EMMA: End-to-End Multimodal Model for Autonomous Driving
- [2411.03313] Classification Done Right for Vision-Language Pre-Training
- [2406.06484] Parallelizing Linear Transformers with the Delta Rule over Sequence Length (sequential recurrence sketched after this list)
- [2411.02959] HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
- [2411.04965] BitNet a4.8: 4-bit Activations for 1-bit LLMs
- [2411.04996] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
- [2411.04905] OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
- [2410.17897] Value Residual Learning For Alleviating Attention Concentration In Transformers
- [2410.21228] LoRA vs Full Fine-tuning: An Illusion of Equivalence
- [2407.10964] No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
- [2405.17604] LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
- [2411.02853] ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate (update step sketched after this list)
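Quick note on the delta-rule paper (2406.06484): the loop below is only my sketch of the sequential fast-weight recurrence that the paper shows how to compute in parallel over sequence length; the parallel (chunkwise) algorithm itself is not implemented here, and the function and variable names are mine, not the paper's.

```python
import numpy as np

def delta_rule_recurrence(q, k, v, beta):
    """Naive sequential delta-rule recurrence (reference loop, not the paper's
    parallel algorithm).

    q, k, v: (T, d) arrays of queries, keys, values; beta: (T,) write strengths.
    The fast-weight state S is a (d, d) matrix updated as
        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,
    i.e. the value stored for key k_t is corrected toward v_t.
    """
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(v)
    for t in range(T):
        pred = S @ k[t]                                 # what the memory currently returns for k_t
        S = S + beta[t] * np.outer(v[t] - pred, k[t])   # delta-rule correction
        out[t] = S @ q[t]                               # read out with the query
    return out
```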
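Quick note on ADOPT (2411.02853): a rough sketch of the modified Adam step as I understand it; the key change is that the current gradient is normalized by the previous second-moment estimate (so it never appears in its own normalizer) and momentum is applied after normalization. Hyperparameter defaults below are assumptions, not necessarily the paper's.

```python
import numpy as np

def adopt_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    """One ADOPT-style parameter update (rough sketch, not the official code).

    Differences from Adam, as I read the paper: g is divided by the *previous*
    second-moment estimate v before entering the momentum term, and v is
    refreshed only afterwards with g**2.
    """
    if t == 0:
        # First step only initializes the second moment.
        return theta, m, g * g
    m = beta1 * m + (1.0 - beta1) * g / np.maximum(np.sqrt(v), eps)
    theta = theta - lr * m
    v = beta2 * v + (1.0 - beta2) * g * g
    return theta, m, v
```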
Code
Articles
- [ ]
Videos
- Stanford Graph Learning Workshop 2024 - YouTube
- https://www.youtube.com/watch?v=0Yi3yUjB-3M&list=PPSV
Other
- [ ]