Papers
- [2412.05270] APOLLO: SGD-like Memory, AdamW-level Performance
- [2406.06484] Parallelizing Linear Transformers with the Delta Rule over Sequence Length
- [2412.05271] Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
- [2412.05265] Reinforcement Learning: An Overview
- [2412.05117] Transformers Can Navigate Mazes With Multi-Step Prediction
- [2412.04862] EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
- [2412.04616] Assessing and Learning Alignment of Unimodal Vision and Language Models
- [2412.04786] Slicing Vision Transformer for Flexible Inference
- [2412.04429] Grounding Descriptions in Images informs Zero-Shot Visual Recognition
- [2412.06329] Normalizing Flows are Capable Generative Models
- [2412.06264] Flow Matching Guide and Code
- [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space
- [2412.05796] Language-Guided Image Tokenization for Generation
- [2411.18814] Unifying Generative and Dense Retrieval for Sequential Recommendation
- [2412.06674] EMOv2: Pushing 5M Vision Model Frontier
- [2412.06774] Visual Lexicon: Rich Image Features in Language Space
- [2412.06464] Gated Delta Networks: Improving Mamba2 with Delta Rule (delta-rule update sketched after this list)
- [2412.06590] Bridging the Divide: Reconsidering Softmax and Linear Attention
- From Slow Bidirectional to Fast Causal Video Generators
- [2412.04626] BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
- [2410.06718] MatMamba: A Matryoshka State Space Model
- [2412.08905] Phi-4 Technical Report
- Byte Latent Transformer: Patches Scale Better Than Tokens | Research - AI at Meta
- Large Concept Models: Language Modeling in a Sentence Representation Space | Research - AI at Meta
- Meta CLIP 1.2 | Research - AI at Meta
- Memory Layers at Scale | Research - AI at Meta
- [2409.15254] Archon: An Architecture Search Framework for Inference-Time Techniques
- [2412.09607] Spectral Image Tokenizer
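
Two of the paper entries above ([2406.06484] and [2412.06464]) revolve around the delta rule for linear attention. As a reference point, here is a minimal sequential sketch in PyTorch, assuming the standard gated formulation S_t = α_t S_{t−1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ; the function name and shapes are illustrative, not the papers' reference implementation.

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive sequential (gated) delta-rule recurrence for linear attention.

    Sketch of S_t = alpha_t * S_{t-1} @ (I - beta_t k_t k_t^T) + beta_t v_t k_t^T,
    read out as o_t = S_t @ q_t. Shapes: q, k: (T, d_k); v: (T, d_v);
    alpha, beta: (T,). alpha_t = 1 recovers the ungated delta rule.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)   # fast-weight state
    outputs = []
    for t in range(T):
        k_t, v_t, q_t = k[t], v[t], q[t]
        pred = S @ k_t          # value currently stored under key k_t
        # delta-rule update: move the stored value toward v_t (after decay)
        S = alpha[t] * S + beta[t] * torch.outer(v_t - alpha[t] * pred, k_t)
        outputs.append(S @ q_t)
    return torch.stack(outputs)
```

Setting alpha ≡ 1 recovers the plain delta rule of [2406.06484]; the contribution of both papers is parallelizing this recurrence over sequence length, which the explicit loop above deliberately does not do.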
Code
- GitHub - BeSpontaneous/Scala-pytorch: Scala (NeurIPS 2024)
- GitHub - facebookresearch/flow_matching: Flow Matching (companion code to [2412.06264]; objective sketched after this list)
- GitHub - NX-AI/flashrnn: FlashRNN - Fast RNN Kernels with I/O Awareness
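
The facebookresearch/flow_matching repo pairs with [2412.06264] above. Below is a minimal sketch of the standard conditional flow-matching objective with the linear (optimal-transport) probability path, written from the general formulation rather than the repo's actual API; `model` is a hypothetical velocity network.

```python
import torch

def cfm_loss(model, x1):
    """Conditional flow matching with the linear path x_t = (1 - t) x0 + t x1.

    The regression target for the velocity field is x1 - x0. `model(x_t, t)`
    is any network predicting a velocity; a stand-in, not the repo's API.
    """
    x0 = torch.randn_like(x1)                             # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # per-sample t in [0, 1]
    x_t = (1 - t) * x0 + t * x1                           # point on the path
    v_target = x1 - x0                                    # conditional velocity
    return ((model(x_t, t) - v_target) ** 2).mean()
```

Sampling then integrates dx/dt = model(x, t) from t = 0 to 1, e.g. with a simple Euler loop.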
Articles
- Ways to use torch.compile: ezyang’s blog (baseline `torch.compile` call sketched after this list)
- SIGIR-AP 2024 Tutorial: Retrieval-Enhanced Machine Learning: Synthesis and Opportunities (slides: REML-tutorial-slides.pdf)
- Muon: An optimizer for hidden layers in neural networks | Keller Jordan blog
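
The torch.compile post above builds on the baseline invocation, a one-line wrapper around an ordinary function or module (a minimal sketch; `torch.compile` is the real API, the function and shapes here are illustrative):

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# Compile once; later calls with compatible shapes reuse the compiled graph.
compiled_f = torch.compile(f)

x = torch.randn(1024)
print(torch.allclose(compiled_f(x), f(x), atol=1e-6))  # same numerics, fused kernels
```

The first call triggers tracing and codegen, so any benchmarking should warm up before timing.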
Videos
- Lecture 39: Torchtitan - YouTube
- [Building Machine Learning Systems for a Trillion Trillion Floating Point Operations - YouTube](https://www.youtube.com/watch?v=139UPjoq7Kw)
- LTI Special Seminar by Yi Wu - YouTube
- Efficient LLM Inference with SGLang, Lianmin Zheng, xAI - YouTube