2025-07-14
Models
- Kimi K2: Open Agentic Intelligence
- T5Gemma: A new collection of encoder-decoder Gemma models - Google Developers Blog
- SmolLM3: smol, multilingual, long-context reasoner
- Voxtral | Mistral AI
- Audio Flamingo 3 - NVIDIA ADLR
- Dream-Coder 7B | HKU NLP Group
Papers
- [2506.03487] ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking
- [2507.06457] A Systematic Analysis of Hybrid Linear Attention
- Kimi-VL/Kimi-VL.pdf at main · MoonshotAI/Kimi-VL · GitHub
- [2507.09404] Scaling Laws for Optimal Data Mixtures
- [2507.10524] Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
- [2502.00382] Masked Generative Nested Transformers with Decode Time Scaling
- [2507.07101] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
- [2507.11851] Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
Code
Articles
- QK-Clip: Taking Muon One Step Further on the Road to Scale-Up - Scientific Spaces
- On the Tradeoffs of SSMs and Transformers | Goomba Lab
- DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-Size Canvas | HKU NLP Group
Videos
- Zed Inferred: Diffusion Language Models - YouTube
- Simple Diffusion Language Models - YouTube
- Arthur Douillard - Distributed Training in Machine Learning - YouTube
- JSALT 2025 - Plenary Talk - L.Barrault - Large Concept Model: beyond token-based LLMs - YouTube
- Timothée Darcet - Scaling Self Supervised Learning for Vision An Introduction to DINOv2 - YouTube
- Stanford CS25: V5 I Transformers in Diffusion Models for Image Generation and Beyond - YouTube
Other