Models
Papers
- [2412.11834] Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
- [2412.19442] A Survey on Large Language Model Acceleration based on KV Cache Management
- [2412.19255] Multi-matrix Factorization Attention
- [2412.19437] DeepSeek-V3 Technical Report
- [2412.21139] Training Software Engineering Agents and Verifiers with SWE-Gym
- [2412.21079] Edicho: Consistent Image Editing in the Wild
- [2412.20993] Efficiently Serving LLM Reasoning Programs with Certaindex
- [2402.14547] OmniPred: Language Models as Universal Regressors
- [2412.21037] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
- [2412.21187] Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- [2409.19606] Hyper-Connections
- [2412.16112] CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
- [2412.14058] Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
- [2412.21059] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
- [2501.00656] 2 OLMo 2 Furious
- [2501.00663] Titans: Learning to Memorize at Test Time
- [2501.00958] 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
- [2501.00658] Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
- [2501.01005] FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Code
- GitHub - riverstone496/awesome-second-order-optimization
- multi head latent attention (MLA) · GitHub
- GitHub - declare-lab/TangoFlux: TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
Articles
- Ways to use torch.export : ezyang’s blog
- The State of Generative Models | wh
- Process Reinforcement through Implicit Rewards
- [Transformers Laid Out | Pramod’s Blog](https://goyalpramod.github.io/blogs/Transformers_laid_out/)
- Llama 3.2 Vision — A Deep Dive - Graphcore Research Blog
Videos
- Teaching AI to See: A Technical Deep-Dive on Vision Language Models with Will Hardman of Veratai - YouTube
- SGLang Developer Sync 20241228 - YouTube
Other