2025-07-14

July 16, 2025 (updated December 14, 2025)

Models

Papers

[2506.03487] ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking
[2507.06457] A Systematic Analysis of Hybrid Linear Attention
Kimi-VL Technical Report (Kimi-VL.pdf, MoonshotAI/Kimi-VL on GitHub)
[2507.09404] Scaling Laws for Optimal Data Mixtures
[2507.10524] Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
[2502.00382] Masked Generative Nested Transformers with Decode Time Scaling
[2507.07101] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
[2507.11851] Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

Code

EPFL-VILAB/fm-vision-evals (GitHub)

Articles

Videos

Other

Tweets