2025-10-06
Updated December 14, 2025 · 11 min read
Models
Papers
Code
Articles
Videos
Other
- [1908.08962] Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
- [2405.19504] MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
- [2407.14679v2] Compact Language Models via Pruning and Knowledge Distillation
- [2408.11796] LLM Pruning and Distillation in Practice: The Minitron Approach
- [2502.13129] Is Noise Conditioning Necessary for Denoising Generative Models?
- [2508.10751] Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
- [2510.01179] TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
- [2510.04871v1] Less is More: Recursive Reasoning with Tiny Networks
- LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
- Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
- Fast-dLLM v2: Efficient Block-Diffusion LLM
- CoDA: Coding LM via Diffusion Adaptation
- Optimal Scaling Needs Optimal Norm
- Self-Speculative Masked Diffusions
- Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
- JS Python stack comparison
- Vision language models review
- RL methods review
- Refactor library for LLM agent workflows
- Mobile app LLM integration strategy - Claude
- claude.ai
- Parallel key generation for schemas - Claude
- cruft
- mlx-lm - Outlines
- ChromeDevTools/chrome-devtools-mcp: Chrome DevTools for coding agents
- guidance-ai/llguidance: Super-fast Structured Outputs
- HomebrewML/HeavyBall: Efficient optimizers
- huggingface/lerobot: 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
- inclusionAI/Ming-UniAudio
- m-a-n-i-f-e-s-t/retention: Language modeling with infinite context size
- microsoft/playwright-mcp: Playwright MCP server
- SkyRL/skyrl-train/examples/megatron at main · NovaSky-AI/SkyRL
- NVlabs/DiffusionNFT: DiffusionNFT: Online Diffusion Reinforcement with Forward Process
- NVlabs/Fast-dLLM: Official implementation of “Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding”
- Fast-dLLM/v2 at main · NVlabs/Fast-dLLM
- openai/chatkit-js
- Posttraining Library · Issue #1771 · pytorch/torchtitan
- Qwen3-VL/cookbooks at main · QwenLM/Qwen3-VL
- sail-sg/oat: 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
- SalesforceAIResearch/CoDA: Official repo for CoDA model.
- ServiceNow/PipelineRL: A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
- shangshang-wang/Tora: Tora: Torchtune-LoRA for RL
- Tencent-Hunyuan/HunyuanVision
- TheAgentArk/Toucan: Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
- trycua/cua: Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
- recipes/OpenAI/GPT-OSS.md at main · vllm-project/recipes
- VsonicV/es-fine-tuning-paper: This repo contains the source code for the paper “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning”
- zichongli5/NorMuon: Official Implementation for NorMuon paper
- Ziems/arbor: A framework for optimizing DSPy programs with RL
- LLGuidance: Making Structured Outputs Go Brrr
- BERT Hash Nano Models - a NeuML Collection
- ColBERT - a NeuML Collection
- HuggingFaceFW/fineweb · Datasets at Hugging Face
- m-a-p/FineFineWeb · Datasets at Hugging Face
- Imitation Learning on Real-World Robots
- Efficient-Large-Model/Fast_dLLM_v2_7B · Hugging Face
- lightx2v/Qwen-Image-Lightning at main
- LiquidAI/LFM2-8B-A1B · Hugging Face
- manifestai/powercoder-3b · Hugging Face
- microsoft/UserLM-8b · Hugging Face
- NeuML/bert-hash-femto · Hugging Face
- nvidia/gpt-oss-120b-Eagle3-v2 · Hugging Face
- openbmb/VoxCPM-0.5B · Hugging Face
- Paper page - Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
- Paper page - Qwen-Image Technical Report
- Paper page - DINOv3
- Paper page - NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
- Paper page - InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
- Paper page - A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
- Paper page - OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
- Paper page - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
- Paper page - Visual Representation Alignment for Multimodal Large Language Models
- Paper page - Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
- Paper page - A Survey of Reinforcement Learning for Large Reasoning Models
- Paper page - VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
- Paper page - SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
- Paper page - MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
- Paper page - Qwen3-Omni Technical Report
- Paper page - MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
- Paper page - Seedream 4.0: Toward Next-generation Multimodal Image Generation
- Paper page - LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
- Paper page - SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
- Paper page - Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
- Paper page - ModernVBERT: Towards Smaller Visual Document Retrievers
- Paper page - TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
- Paper page - Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
- Paper page - F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data
- Paper page - CoDA: Coding LM via Diffusion Adaptation
- Paper page - Optimal Scaling Needs Optimal Norm
- Paper page - Multi-Agent Tool-Integrated Policy Optimization
- Paper page - Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
- Paper page - SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
- Paper page - NorMuon: Making Muon more efficient and scalable
- Paper page - Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
- Paper page - Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
- Paper page - Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
- Paper page - RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
- Paper page - OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
- Paper page - Native Hybrid Attention for Efficient Sequence Modeling
- Paper page - Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods
- Paper page - MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline
- Salesforce/CoDA-v0-Instruct · Hugging Face
- ServiceNow-AI/Apriel-1.5-15b-Thinker · Hugging Face
- The Ultimate Fast Weights — Blog Post
- BS - Chat
- Fast-dLLM v2
- Introducing Liquid Nanos — frontier‑grade performance on everyday devices | Liquid AI
- LFM2-8B-A1B: An Efficient On-device Mixture-of-Experts | Liquid AI
- VLAs that Train Fast, Run Fast, and Generalize Better
- Real-Time Action Chunking with Large Models
- John Nguyen on X: “Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow https://t.co/aTInWr6gtK” / X
- x.com/_clashluke/status/1975264948621304006
- x.com/AdinaYakup/status/1974816646645928086
- Ant Ling on X: “🔥 Ming-UniAudio: The 「Nano Banana」moment for speech is here! A single model for universal understanding, generation & free-form editing. First Unified Continuous Tokenizer 「MingTok-Audio」and Unified Und & Gen Speech LLM built on it. First Universal Free-form Speech Editing https://t.co/eTmwVEEI16” / X
- x.com/AravSrinivas/status/1975421740554879421
- alphaXiv on X: “Following an intense weekend of GRPO discussion, Bytedance put out a paper on why GRPO is NOT optimal! Similar to the classic knapsack problem, they suggest that each exploration has a “value” and “cost”, which needs to be adaptively distributed. This yields 20-40% more signal https://t.co/Gyfr0K9i8Y” / X
- Benjamin Warner on X: “It’s great to see our efficient architecture improvements powering a new ecosystem of encoder models.” / X
- x.com/BerenMillidge/status/1975695642560549170
- x.com/changhiskhan/status/1975251509299511523
- Charles 🎉 Frye on X: “New post in the GPU 𝕻𝖊𝖗𝖋𝖔𝖗𝖒𝖆𝖓𝖈𝖊 Glossary on memory coalescing — a hardware feature that CUDA programmers need to mind to get anywhere near full memory bandwidth utilization. The article includes a quick µ-benchmark, reproducible with Godbolt. What a tool! https://t.co/PVI26NTt7S” / X
- Chelsea Finn on X: “RL fine-tuning often prematurely collapses policy entropy. We consider a general framework, called set RL, i.e. RL over a set of trajectories from a policy. We use it to incentivize diverse solutions & optimize for inference time performance. Paper: https://t.co/6q4DaPsZJt” / X
- x.com/CShorten30/status/1975569368709804044
- x.com/ekzhang1/status/1975421055671148711
- x.com/elliotarledge/status/1975336518425460932
- x.com/fredsala/status/1975250150311535036
- 𝚐𝔪𝟾𝚡𝚡𝟾 on X: “ColBERT MUVERA series Small retrievers with the same reasoning depth, using 50–80 dim late-interaction representations that preserve token-level structure even under extreme compression. The retrieval-side mirror of the NanoBERTs, with all three models under 1M parameters (Femto https://t.co/4lZkigUM0V” / X
- Hadi Pouransari on X: “Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵 https://t.co/2goPzvN988” / X
- x.com/hu_yifei/status/1975253439035940950
- DailyPapers on X: “Unlock robust and stable reasoning in LLMs A novel variational framework treats thinking traces as latent variables, optimized via variational inference. This unifies RL-style methods, leading to stable & consistent reasoning improvements across diverse tasks on Qwen 2.5 & Qwen https://t.co/ongkgovqpI” / X
- DailyPapers on X: “NVIDIA just unveiled SANA-Video for incredibly efficient AI video generation! Generates high-res, minute-long videos 16x faster than SoTA models. Powered by Linear DiT and a constant-memory KV cache for unmatched speed and quality. Training on 64 H100s took only 12 days. https://t.co/9sdZbjblZK” / X
- DailyPapers on X: “LLaVA-OneVision-1.5 by lmms-lab is here! Achieves state-of-the-art multimodal performance with drastically reduced training costs. This fully open framework sets a new standard for efficient, democratized LMM development, outperforming larger models across many benchmarks. https://t.co/VFVrqQ3TBw” / X
- DailyPapers on X: “Meta just released SSDD (Single-Step Diffusion Decoder) on Hugging Face It’s a novel image tokenizer with a diffusion decoder that achieves higher reconstruction quality and faster sampling than traditional VAEs. https://t.co/nRUgmCaA2P” / X
- George Grigorev on X: “ok silly me, in my code i had wrong call arguments for Triton Muon v = newton_schulz_triton(grad.bfloat16(), 5) is actually v = newton_schulz_triton(grad.bfloat16(), eps=5), not steps=5; steps are hardcoded inside the kernel and has different ns coefficients (in fact, in Dion https://t.co/qEeGuNalPI” / X
- x.com/iamgrigorev/status/1975154768436711813
- Jackson Atkins on X: “My brain broke when I read this paper. A tiny 7 Million parameter model just beat DeepSeek-R1, Gemini 2.5 pro, and o3-mini at reasoning on both ARC-AGI 1 and ARC-AGI 2. It’s called Tiny Recursive Model (TRM) from Samsung. How can a model 10,000x smaller be smarter? Here’s how https://t.co/MD2ZWYI1AQ” / X
- jianlin.su on X: “Why does linear attention need Short Conv? https://t.co/luUybG3RXj” / X
- x.com/johnschulman2/status/1975231718979522632
- Ken Klippenstein (NSPM-7 Compliant) (@kenklippenstein) / X
- x.com/lancedb/status/1975569300829180311
- x.com/lateinteraction/status/1975931104663384143
- Milad Aghajohari on X: “Introducing linear scaling of reasoning: 𝐓𝐡𝐞 𝐌𝐚𝐫𝐤𝐨𝐯𝐢𝐚𝐧 𝐓𝐡𝐢𝐧𝐤𝐞𝐫 Reformulate RL so thinking scales 𝐎(𝐧) 𝐜𝐨𝐦𝐩𝐮𝐭𝐞, not O(n^2), with O(1) 𝐦𝐞𝐦𝐨𝐫𝐲, architecture-agnostic. Train R1-1.5B into a markovian thinker with 96K thought budget, ~2X accuracy 🧵 https://t.co/72pV3tcER9” / X
- x.com/MassimoBardetti/status/1974396031157838033
- Michel on X: “Finished my first tutorial on improving the quality of smaller LLMs for creative tasks using a teacher + GEPA. I’ll publish it on Monday. What style of signature do you prefer? I usually go with classes, but for this tutorial I decided to use inline https://t.co/vJonlJ5qDo” / X
- NeuML on X: “✨ We’re proud to release the ColBERT Nano series of models. All 3 of these models come in at less than 1 million parameters (250K, 450K, 950K)! Late interaction models perform shockingly well with small models. Collection: https://t.co/gSVLMUrWcf Model: https://t.co/wUDXXDFRv7 https://t.co/6TR7PCEIxn” / X
- x.com/QianzhongChen/status/1974227449996267974
- x.com/rohanpaul_ai
- x.com/rohanpaul_ai/status/1974978910665228480
- x.com/rohanpaul_ai/status/1975050381379215464
- x.com/SergioPaniego/status/1975105467694424440
- SkyPilot on X: “Torchtitan is a great platform for large-scale training from @PyTorch & @AIatMeta! How to run it on your own AI infra other than Slurm, e.g., k8s or clouds? A tutorial for scaling torchtitan is now available in SkyPilot docs + torchtitan README: https://t.co/SvYqTGeRDe https://t.co/cZ7UH6vYax” / X
- sway on X: “Near REPA performance without any external model alignment: Paper name: No alignment needed for generation: Learning linearly separable representations in diffusion models https://t.co/e2cP2hbiqq” / X
- x.com/TencentHunyuan/status/1974522542858911935
- Hunyuan on X: “We are excited to introduce Hunyuan-Vision-1.5-Thinking, our latest and most advanced vision-language model. Hunyuan-Vision-1.5-Thinking is ranked No. 3 in @arena, and the model is now available on Tencent Cloud. The model and technical report will be released in late October. https://t.co/lATzRSOOzN” / X
- TestingCatalog News 🗞 on X: “BREAKING 🚨: Anthropic is preparing Claude Code to be released on the mobile app! Users will be able to connect Claude app to GitHub and run their coding prompts on the go. Claude Codex 👀 https://t.co/sMAi14qljw” / X
- x.com/TheZachMueller/status/1974430880807673990
- Zach Mueller on X: “DataLoader Dispatching When constrained by a variety of reasons to where you can’t include multiple copies (or mmaps) of datasets in memory, be it too many concurrent streams, low resource availability, or a slow CPU, dispatching is here to help. Dispatching works by keeping https://t.co/Zs7MAecF7n” / X
- x.com/TheZachMueller/status/1974926219704439149
- x.com/vikhyatk/status/1974517237831942531
- x.com/webalorn/status/1975555815294791719
- Ross Wightman on X: “Last week, I decided to work on something delightfully boring, model initialization. The size of vision encoders is large enough for this to matter now. I threaded nearly 500 nn.Modules in timm with device/dtype factory kwargs and fixed a few small issues with ‘meta’ device https://t.co/lU4mDriBYI” / X
- William J.B. Mattingly on X: “Dots.OCR is my favorite model VLM for OCR at the moment. It can even handle some HTR. For the past few months, it’s been quite difficult to use unless you were prepared to deal with a lot of headaches. That changed a couple weeks ago with vLLM’s update and now this is even” / X
- Xinyuan Wang on X: “Big update for OpenCUA! OpenCUA-72B-preview now ranks #1 on the OSWorld-Verified leaderboard (https://t.co/xqWZLkuCc0). It is a pure GUI action, end-to-end computer-use foundation model (Website: https://t.co/nSQQTZT8Fc). Huge thanks to the effort of OpenCUA team and the great https://t.co/cAfuqbid4o” / X
- Yulu Gan on X: “Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for https://t.co/bwkWxDrItB” / X
- Ming-Unitok-Audio