Models
- SAM 2.1 with training code
- Liquid Foundation Models: Our First Series of Generative AI Models
- GitHub - THUDM/CogView3: text-to-image generation: CogView3-Plus and CogView3 (ECCV 2024)
- vidore/colqwen2-v0.1 · Hugging Face
- Announcing FLUX1.1 [pro] and the BFL API - Black Forest Labs
Papers
- [2409.18869v1] Emu3: Next-Token Prediction is All You Need
- [2409.17692] MIO: A Foundation Model on Multimodal Tokens
- [2405.03882] Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
- [2409.16280] MonoFormer: One Transformer for Both Diffusion and Autoregression
- [2409.20370] The Perfect Blend: Redefining RLHF with Mixture of Judges
- [2408.05088] UNIC: Universal Classification Models via Multi-teacher Distillation
- [2407.09111] Inference Optimization of Foundation Models on AI Accelerators
- [2402.10376] Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
- [2410.01806v1] Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
- [2410.01201] Were RNNs All We Needed? (a minimal minGRU sketch follows this list)
- [2410.01679] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Movie Gen: A Cast of Media Foundation Models
- [2410.02746] Contrastive Localized Language-Image Pre-Training
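
The minimal-GRU idea from "Were RNNs All We Needed?" drops the hidden-state dependence from the gate and candidate, so the recurrence becomes a linear scan over time. A minimal sequential sketch for intuition (module and parameter names are mine, not the paper's code; the paper also gives a parallel-scan and log-space formulation):

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Minimal GRU (arXiv:2410.01201): gate z_t and candidate h~_t depend only
    on x_t, so h_t = (1 - z_t) * h_{t-1} + z_t * h~_t is a linear recurrence
    that also admits a parallel scan. Sequential form shown for clarity."""

    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        self.to_z = nn.Linear(dim_in, dim_hidden)        # update gate, no h_{t-1} term
        self.to_h_tilde = nn.Linear(dim_in, dim_hidden)  # candidate, no h_{t-1} term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim_in)
        batch, seq_len, _ = x.shape
        h = torch.zeros(batch, self.to_z.out_features, device=x.device)
        outs = []
        for t in range(seq_len):
            z = torch.sigmoid(self.to_z(x[:, t]))  # (batch, dim_hidden)
            h_tilde = self.to_h_tilde(x[:, t])     # paper applies an activation g(.); omitted here
            h = (1 - z) * h + z * h_tilde
            outs.append(h)
        return torch.stack(outs, dim=1)            # (batch, seq_len, dim_hidden)

# quick smoke test
model = MinGRU(dim_in=8, dim_hidden=16)
print(model(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 16])
```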
Code
- GitHub - THUDM/SwissArmyTransformer: SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
- GitHub - evanatyourservice/kron_torch: An implementation of PSGD Kron second-order optimizer for PyTorch (usage sketch below)
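
kron_torch advertises a drop-in PyTorch optimizer. A minimal training-loop sketch, assuming the package exposes a `Kron` class with the standard `Optimizer` constructor/step interface (check the repo README for the exact class name and arguments):

```python
import torch
import torch.nn as nn
from kron_torch import Kron  # assumed import path; see the repo README

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
# PSGD Kron preconditions gradients with Kronecker-factored matrices,
# so it typically wants a smaller learning rate than Adam.
optimizer = Kron(model.parameters(), lr=3e-4)

x, y = torch.randn(128, 32), torch.randn(128, 1)
for step in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```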
Articles
- Recreating PyTorch from Scratch (with GPU Support and Automatic Differentiation) | by Lucas de Lima Nogueira | Towards Data Science
- Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo | NVIDIA Technical Blog
- Cheng Luo - MINI-SEQUENCE TRANSFORMER (MST)
- Distributed Training of Deep Learning Models: Part 1
- How to train a model on 10k H100 GPUs?
- Self-contained example of how pipeline parallelism works (AFAB and 1F1B) in 200 LOC · GitHub (see the schedule sketch after this list)
- Deploy SkyPilot on existing machines — SkyPilot documentation
- Transformers Inference Optimization Toolset | AstraBlog
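
The pipeline-parallel gist above contrasts AFAB (all-forward-all-backward) with 1F1B; the difference is purely in how each stage orders micro-batch forward and backward steps. A toy, dependency-free sketch that just prints the two per-stage instruction streams (my own illustration, not the gist's code):

```python
# AFAB runs all forwards, then all backwards, so peak in-flight activations
# grow with the micro-batch count. 1F1B warms up, then alternates one forward
# with one backward, capping in-flight activations at ~ the number of stages.

def afab_schedule(num_micro: int) -> list[str]:
    # every stage does the same thing: all forwards, then all backwards
    return [f"F{m}" for m in range(num_micro)] + [f"B{m}" for m in range(num_micro)]

def one_f_one_b_schedule(stage: int, num_stages: int, num_micro: int) -> list[str]:
    warmup = min(num_stages - stage - 1, num_micro)  # later stages warm up less
    ops = [f"F{m}" for m in range(warmup)]
    f, b = warmup, 0
    while b < num_micro:  # steady state, then cooldown once forwards run out
        if f < num_micro:
            ops.append(f"F{f}")
            f += 1
        ops.append(f"B{b}")
        b += 1
    return ops

NUM_STAGES, NUM_MICRO = 4, 8
print("AFAB (all stages):", " ".join(afab_schedule(NUM_MICRO)))
for s in range(NUM_STAGES):
    print(f"1F1B (stage {s}):  ", " ".join(one_f_one_b_schedule(s, NUM_STAGES, NUM_MICRO)))
```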
Videos
Other
- ECCV
- ILR2024 (instance-level recognition)
- litellm/model_prices_and_context_window.json at main · BerriAI/litellm · GitHub (LLM specs; see the reader sketch below)
- LLM Pricing - a Hugging Face Space by philschmid
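
The litellm JSON above is a machine-readable catalog of context windows and per-token prices. A small reader sketch; the field names (`max_input_tokens`, `input_cost_per_token`, `output_cost_per_token`) and model keys match the file's current schema as I understand it, but treat them as assumptions and fall back gracefully:

```python
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/BerriAI/litellm/main/"
       "model_prices_and_context_window.json")

with urllib.request.urlopen(URL) as resp:
    specs = json.load(resp)  # top-level dict: model name -> spec dict

for name in ("gpt-4o", "claude-3-5-sonnet-20240620"):  # example keys; may drift
    entry = specs.get(name, {})
    # .get() hedges against schema drift in the upstream file
    print(name,
          "ctx:", entry.get("max_input_tokens"),
          "in $/tok:", entry.get("input_cost_per_token"),
          "out $/tok:", entry.get("output_cost_per_token"))
```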