Models
- SAM 2.1 with training code
- Liquid Foundation Models: Our First Series of Generative AI Models
- GitHub - THUDM/CogView3: text-to-image generation: CogView3-Plus and CogView3 (ECCV 2024)
- vidore/colqwen2-v0.1 · Hugging Face
- Announcing FLUX1.1 [pro] and the BFL API - Black Forest Labs
Papers
- [2409.18869v1] Emu3: Next-Token Prediction is All You Need
- [2409.17692] MIO: A Foundation Model on Multimodal Tokens
- [2405.03882] Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
- [2409.16280] MonoFormer: One Transformer for Both Diffusion and Autoregression
- [2409.20370] The Perfect Blend: Redefining RLHF with Mixture of Judges
- [2408.05088] UNIC: Universal Classification Models via Multi-teacher Distillation
- [2407.09111] Inference Optimization of Foundation Models on AI Accelerators
- [2402.10376] Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
- [2410.01806v1] Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
- [2410.01201] Were RNNs All We Needed? (a minimal minGRU sketch follows this list)
- [2410.01679] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Movie Gen: A Cast of Media Foundation Models
- [2410.02746] Contrastive Localized Language-Image Pre-Training
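
The minimal-GRU idea from "Were RNNs All We Needed?" drops the hidden-state dependence from the gate and candidate, so the recurrence becomes a linear scan over time. A minimal sequential sketch for intuition (module and parameter names are mine, not the paper's code; the paper also gives a parallel-scan and log-space formulation):

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Minimal GRU (arXiv:2410.01201): gate z_t and candidate h~_t depend only
    on x_t, so h_t = (1 - z_t) * h_{t-1} + z_t * h~_t is a linear recurrence
    that also admits a parallel scan. Sequential form shown for clarity."""

    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        self.to_z = nn.Linear(dim_in, dim_hidden)        # update gate, no h_{t-1} term
        self.to_h_tilde = nn.Linear(dim_in, dim_hidden)  # candidate, no h_{t-1} term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim_in)
        batch, seq_len, _ = x.shape
        h = torch.zeros(batch, self.to_z.out_features, device=x.device)
        outs = []
        for t in range(seq_len):
            z = torch.sigmoid(self.to_z(x[:, t]))  # (batch, dim_hidden)
            h_tilde = self.to_h_tilde(x[:, t])     # paper applies an activation g(.); omitted here
            h = (1 - z) * h + z * h_tilde
            outs.append(h)
        return torch.stack(outs, dim=1)            # (batch, seq_len, dim_hidden)

# quick smoke test
model = MinGRU(dim_in=8, dim_hidden=16)
print(model(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 16])
```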
Code
- GitHub - THUDM/SwissArmyTransformer: SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
- GitHub - evanatyourservice/kron_torch: An implementation of PSGD Kron second-order optimizer for PyTorch (usage sketch below)
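
kron_torch advertises a drop-in PyTorch optimizer. A minimal training-loop sketch, assuming the package exposes a `Kron` class with the standard `Optimizer` constructor/step interface (check the repo README for the exact class name and arguments):

```python
import torch
import torch.nn as nn
from kron_torch import Kron  # assumed import path; see the repo README

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
# PSGD Kron preconditions gradients with Kronecker-factored matrices,
# so it typically wants a smaller learning rate than Adam.
optimizer = Kron(model.parameters(), lr=3e-4)

x, y = torch.randn(128, 32), torch.randn(128, 1)
for step in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```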
Articles
- Recreating PyTorch from Scratch (with GPU Support and Automatic Differentiation) | by Lucas de Lima Nogueira | Towards Data Science
- Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo | NVIDIA Technical Blog
- Cheng Luo - MINI-SEQUENCE TRANSFORMER (MST)
- Distributed Training of Deep Learning Models: Part 1
- How to train a model on 10k H100 GPUs?
- Self-contained example of how pipeline parallelism works (AFAB and 1F1B) in 200 LOC · GitHub (see the schedule sketch after this list)
- Deploy SkyPilot on existing machines — SkyPilot documentation
- Transformers Inference Optimization Toolset | AstraBlog
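
The pipeline-parallel gist above contrasts AFAB (all-forward-all-backward) with 1F1B; the difference is purely in how each stage orders micro-batch forward and backward steps. A toy, dependency-free sketch that just prints the two per-stage instruction streams (my own illustration, not the gist's code):

```python
# AFAB runs all forwards, then all backwards, so peak in-flight activations
# grow with the micro-batch count. 1F1B warms up, then alternates one forward
# with one backward, capping in-flight activations at ~ the number of stages.

def afab_schedule(num_micro: int) -> list[str]:
    # every stage does the same thing: all forwards, then all backwards
    return [f"F{m}" for m in range(num_micro)] + [f"B{m}" for m in range(num_micro)]

def one_f_one_b_schedule(stage: int, num_stages: int, num_micro: int) -> list[str]:
    warmup = min(num_stages - stage - 1, num_micro)  # later stages warm up less
    ops = [f"F{m}" for m in range(warmup)]
    f, b = warmup, 0
    while b < num_micro:  # steady state, then cooldown once forwards run out
        if f < num_micro:
            ops.append(f"F{f}")
            f += 1
        ops.append(f"B{b}")
        b += 1
    return ops

NUM_STAGES, NUM_MICRO = 4, 8
print("AFAB (all stages):", " ".join(afab_schedule(NUM_MICRO)))
for s in range(NUM_STAGES):
    print(f"1F1B (stage {s}):  ", " ".join(one_f_one_b_schedule(s, NUM_STAGES, NUM_MICRO)))
```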
Videos
Other
- ECCV
- ILR2024 (instance-level recognition)
- litellm/model_prices_and_context_window.json at main · BerriAI/litellm · GitHub (LLM specs; see the reader sketch below)
- LLM Pricing - a Hugging Face Space by philschmid
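
The litellm JSON above is a machine-readable catalog of context windows and per-token prices. A small reader sketch; the field names (`max_input_tokens`, `input_cost_per_token`, `output_cost_per_token`) and model keys match the file's current schema as I understand it, but treat them as assumptions and fall back gracefully:

```python
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/BerriAI/litellm/main/"
       "model_prices_and_context_window.json")

with urllib.request.urlopen(URL) as resp:
    specs = json.load(resp)  # top-level dict: model name -> spec dict

for name in ("gpt-4o", "claude-3-5-sonnet-20240620"):  # example keys; may drift
    entry = specs.get(name, {})
    # .get() hedges against schema drift in the upstream file
    print(name,
          "ctx:", entry.get("max_input_tokens"),
          "in $/tok:", entry.get("input_cost_per_token"),
          "out $/tok:", entry.get("output_cost_per_token"))
```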