Models
- [2409.11402] NVLM: Open Frontier-Class Multimodal LLMs #vlm
- Qwen2.5: A Party of Foundation Models! | Qwen
- GitHub - ictnlp/LLaMA-Omni: LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level. #speech
- GitHub - kyutai-labs/moshi #speech
- GitHub - microsoft/GRIN-MoE
Papers
- EUREKA: Evaluating and Understanding Large Foundation Models - Microsoft Research
- [2409.11321] SOAP: Improving and Stabilizing Shampoo using Adam #optimizers
- [2409.10173] jina-embeddings-v3: Multilingual Embeddings With Task LoRA #text-embeddings
- [2409.12191] Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
- [2409.11564] Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Code
- GitHub - kyleliang919/Online-Subspace-Descent: This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon #optimizers #compression
- GitHub - NVIDIA/Megatron-Energon: Megatron’s multi-modal data loader #multimodal #dataloader #pytorch
- GitHub - TorchDR/TorchDR: TorchDR - PyTorch Dimensionality Reduction #pytorch #umap #tsne
- GitHub - modelscope/ms-swift: Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, …) #vision-language #vlm #tuning
- Release v0.3.0 Release Note · linkedin/Liger-Kernel · GitHub
- GitHub - voideditor/void: open-source Cursor alternative
- GitHub - pytorch-labs/LeanRL: LeanRL is a fork of CleanRL where selected PyTorch scripts are optimized for performance using torch.compile and CUDA graphs (minimal sketch below) #pytorch #rl
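A minimal sketch of the optimization LeanRL applies to CleanRL-style scripts, with an assumed toy policy and loss rather than LeanRL's actual code: `torch.compile(mode="reduce-overhead")` captures the compiled region into CUDA graphs, cutting per-step Python and kernel-launch overhead.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2)).cuda()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# mode="reduce-overhead" replays a captured CUDA graph on later calls,
# so per-step launch overhead largely disappears after warmup.
@torch.compile(mode="reduce-overhead")
def compute_loss(obs: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    values = policy(obs).sum(-1)
    return ((values - returns) ** 2).mean()  # toy value-regression loss

obs = torch.randn(1024, 8, device="cuda")
returns = torch.randn(1024, device="cuda")
for _ in range(10):  # early iterations compile and warm up the graph
    loss = compute_loss(obs, returns)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```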
Articles
- Improved Data Loading with Threads | NVIDIA Technical Blog (a threaded-loader sketch follows this list)
  - uses Python's noGIL (free-threaded) mode to evaluate a thread-based torch DataLoader
  - CUDA context switching is a larger cost with process-based workers
  - no difference observed with Pillow
- rerankers: A Lightweight Python Library to Unify Ranking Methods – Answer.AI
- [Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch - distributed / torchtitan - PyTorch Forums #pytorch #distributed
- Polars — GPU acceleration with Polars and NVIDIA RAPIDS #tabular
- How to make LLMs go fast
- Fine-tuning LLMs to 1.58bit: extreme quantization made easy (ternary quantization sketch after this list)
- static.sched.com/hosted_files/pytorch2024/8f/Pytorch Conference - Making LLM training faster.pdf
- Inference-Friendly Models with MixAttention | Databricks Blog
- Optimizing AI Inference at Character.AI
- https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
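To ground the NVIDIA threaded data-loading item above: a minimal thread-based loader sketch, assuming a free-threaded (noGIL) CPython build where decode threads run in parallel and avoid the per-process CUDA context cost of multiprocessing DataLoader workers. This is not the blog's benchmark code; `load_sample` is a hypothetical stand-in for the real decode work.

```python
from concurrent.futures import ThreadPoolExecutor

import torch

def load_sample(index: int) -> torch.Tensor:
    # Stand-in for real per-sample work (e.g. JPEG decode + augmentation).
    return torch.randn(3, 224, 224)

class ThreadedLoader:
    """Batches samples using a thread pool instead of worker processes."""

    def __init__(self, dataset_len: int, batch_size: int = 32, num_threads: int = 8):
        self.dataset_len = dataset_len
        self.batch_size = batch_size
        self.pool = ThreadPoolExecutor(max_workers=num_threads)

    def __iter__(self):
        for start in range(0, self.dataset_len, self.batch_size):
            stop = min(start + self.batch_size, self.dataset_len)
            # On a noGIL build these map calls truly run in parallel.
            samples = list(self.pool.map(load_sample, range(start, stop)))
            yield torch.stack(samples)

for batch in ThreadedLoader(dataset_len=256):
    pass  # training step would go here
```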
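For the 1.58-bit article: 1.58 ≈ log2(3), i.e. each weight carries one of three values {-1, 0, +1} plus a shared floating-point scale. A hedged sketch of BitNet-b1.58-style absmean ternary quantization, as an illustration rather than the article's fine-tuning recipe:

```python
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-5):
    # Per-tensor absmean scale, then round each weight to {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = quantize_ternary(w)
w_hat = w_q * scale  # dequantized weights used in the matmul
print((w - w_hat).abs().mean())  # quantization error
```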
Videos
- CS 194/294-196 (LLM Agents) - Lecture 1 - YouTube
- TWIML episode with Simon Willison on LLMs for code - YouTube
- Noam Brown (OpenAI) on test-time compute and planning - YouTube
- Tabular Learning: skrub and Foundation Models with Gaël Varoquaux, PhD - YouTube #tabular
Other
- Illuminate - paper-to-podcast tool from Google