2025-01-20
Models
Papers
- [2501.09891] Evolving Deeper LLM Thinking
- DeepSeek R1
- [2501.10318] HiMix: Reducing Computational Complexity in Large Vision-Language Models
- [2501.04765] TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
- [2501.12326] UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- [2501.12370] Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
- [2501.09747] FAST: Efficient Action Tokenization for Vision-Language-Action Models
- [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Code
Articles
- Implementing Character.AI’s Memory Optimizations in nanoGPT | njkumarr
- Literature Review on Sampling Techniques for Language Models | njkumarr
- EvaByte: Efficient Byte-level Language Models at Scale
Videos
- RobotLearningIntroPart2 - YouTube
- FlashInfer - YouTube
- Mosaic GPU - YouTube
- ML Scalability & Performance Reading Group Session 5: Paged Attention - YouTube
Other
Tweets



