Papers
- [2412.05270] APOLLO: SGD-like Memory, AdamW-level Performance
- [2406.06484] Parallelizing Linear Transformers with the Delta Rule over Sequence Length
- [2412.05271] Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
- [2412.05265] Reinforcement Learning: An Overview
- [2412.05117] Transformers Can Navigate Mazes With Multi-Step Prediction
- [2412.04862] EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
- [2412.04616] Assessing and Learning Alignment of Unimodal Vision and Language Models
- [2412.04786] Slicing Vision Transformer for Flexible Inference
- [2412.04429] Grounding Descriptions in Images informs Zero-Shot Visual Recognition
- [2412.06329] Normalizing Flows are Capable Generative Models
- [2412.06264] Flow Matching Guide and Code
- [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space
- [2412.05796] Language-Guided Image Tokenization for Generation
- [2411.18814] Unifying Generative and Dense Retrieval for Sequential Recommendation
- [2412.06674] EMOv2: Pushing 5M Vision Model Frontier
- [2412.06774] Visual Lexicon: Rich Image Features in Language Space
- [2412.06464] Gated Delta Networks: Improving Mamba2 with Delta Rule (delta-rule update sketched after this list)
- [2412.06590] Bridging the Divide: Reconsidering Softmax and Linear Attention
- From Slow Bidirectional to Fast Causal Video Generators
- [2412.04626] BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
- [2410.06718] MatMamba: A Matryoshka State Space Model
- [2412.08905] Phi-4 Technical Report
- Byte Latent Transformer: Patches Scale Better Than Tokens | Research - AI at Meta
- Large Concept Models: Language Modeling in a Sentence Representation Space | Research - AI at Meta
- Meta CLIP 1.2 | Research - AI at Meta
- Memory Layers at Scale | Research - AI at Meta
- [2409.15254] Archon: An Architecture Search Framework for Inference-Time Techniques
- [2412.09607] Spectral Image Tokenizer
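
Two of the paper entries above ([2406.06484] and [2412.06464]) revolve around the delta rule for linear attention. As a reference point, here is a minimal sequential sketch in PyTorch, assuming the standard gated formulation S_t = α_t S_{t−1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ; the function name and shapes are illustrative, not the papers' reference implementation.

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive sequential (gated) delta-rule recurrence for linear attention.

    Sketch of S_t = alpha_t * S_{t-1} @ (I - beta_t k_t k_t^T) + beta_t v_t k_t^T,
    read out as o_t = S_t @ q_t. Shapes: q, k: (T, d_k); v: (T, d_v);
    alpha, beta: (T,). alpha_t = 1 recovers the ungated delta rule.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)   # fast-weight state
    outputs = []
    for t in range(T):
        k_t, v_t, q_t = k[t], v[t], q[t]
        pred = S @ k_t          # value currently stored under key k_t
        # delta-rule update: move the stored value toward v_t (after decay)
        S = alpha[t] * S + beta[t] * torch.outer(v_t - alpha[t] * pred, k_t)
        outputs.append(S @ q_t)
    return torch.stack(outputs)
```

Setting alpha ≡ 1 recovers the plain delta rule of [2406.06484]; the contribution of both papers is parallelizing this recurrence over sequence length, which the explicit loop above deliberately does not do.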
Code
- GitHub - BeSpontaneous/Scala-pytorch: Scala (NeurIPS 2024)
- GitHub - facebookresearch/flow_matching: Flow Matching (companion code to [2412.06264]; objective sketched after this list)
- GitHub - NX-AI/flashrnn: FlashRNN - Fast RNN Kernels with I/O Awareness
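
The facebookresearch/flow_matching repo pairs with [2412.06264] above. Below is a minimal sketch of the standard conditional flow-matching objective with the linear (optimal-transport) probability path, written from the general formulation rather than the repo's actual API; `model` is a hypothetical velocity network.

```python
import torch

def cfm_loss(model, x1):
    """Conditional flow matching with the linear path x_t = (1 - t) x0 + t x1.

    The regression target for the velocity field is x1 - x0. `model(x_t, t)`
    is any network predicting a velocity; a stand-in, not the repo's API.
    """
    x0 = torch.randn_like(x1)                             # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # per-sample t in [0, 1]
    x_t = (1 - t) * x0 + t * x1                           # point on the path
    v_target = x1 - x0                                    # conditional velocity
    return ((model(x_t, t) - v_target) ** 2).mean()
```

Sampling then integrates dx/dt = model(x, t) from t = 0 to 1, e.g. with a simple Euler loop.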
Articles
- Ways to use torch.compile: ezyang’s blog (baseline `torch.compile` call sketched after this list)
- SIGIR-AP 2024 Tutorial: Retrieval-Enhanced Machine Learning: Synthesis and Opportunities (slides: REML-tutorial-slides.pdf)
- Muon: An optimizer for hidden layers in neural networks | Keller Jordan blog
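
The torch.compile post above builds on the baseline invocation, a one-line wrapper around an ordinary function or module (a minimal sketch; `torch.compile` is the real API, the function and shapes here are illustrative):

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# Compile once; later calls with compatible shapes reuse the compiled graph.
compiled_f = torch.compile(f)

x = torch.randn(1024)
print(torch.allclose(compiled_f(x), f(x), atol=1e-6))  # same numerics, fused kernels
```

The first call triggers tracing and codegen, so any benchmarking should warm up before timing.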
Videos
- Lecture 39: Torchtitan - YouTube
- [Building Machine Learning Systems for a Trillion Trillion Floating Point Operations - YouTube](https://www.youtube.com/watch?v=139UPjoq7Kw)
- LTI Special Seminar by Yi Wu - YouTube
- Efficient LLM Inference with SGLang, Lianmin Zheng, xAI - YouTube