2025-11-10
Models
- [ ]
Papers
- [2511.08923] TiDAR: Think in Diffusion, Talk in Autoregression
- [2511.09554] RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
- [2511.10643] Black-Box On-Policy Distillation of Large Language Models
- [2511.07384] Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Code
- GitHub - marin-community/marin: Open-source framework for the research and development of foundation models.
- GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.
Articles
- the-art-of-debugging/pytorch at master · stas00/the-art-of-debugging · GitHub
- Newton-Schulz - docs.modula.systems
- Beyond Standard LLMs - by Sebastian Raschka, PhD
- fleuret.org/public/slides-talk-free-transformer.pdf
- Beyond Quantization: Bringing Sparse Inference to PyTorch – PyTorch
- Accelerating Large-Scale Mixture-of-Experts Training in PyTorch | NVIDIA Technical Blog
- SGLang Diffusion: Accelerating Video and Image Generation | LMSYS Org
- Notion
- Text-to-image Architectural Experiments
Videos
- [M2L 2025] 5.2 Diffusion models - Sander Dieleman - YouTube
- Ray Summit 2025 Keynote: Building Cursor Composer with Sasha Rush - YouTube
- Kimi K2 and Our Contributions to Open Source - Yuxin Wu, Moonshot AI - YouTube
- Helion: A High-level DSL for Kernel Authoring - Jason Ansel, Meta - YouTube
- Self-Speculative Masked Diffusions | Andrew Campbell - YouTube
- Hybrid Models as First-Class Citizens in vLLM – PyTorch
- Making GPUs Actually Fast: A Deep Dive into Training Performance - YouTube
Other
- nvidia/Nemotron-ClimbLab · Datasets at Hugging Face
- nvidia/Nemotron-ClimbMix · Datasets at Hugging Face