2024-09-16
Models
- [2409.11402] NVLM: Open Frontier-Class Multimodal LLMs #vlm
- Qwen2.5: A Party of Foundation Models! | Qwen
- GitHub - ictnlp/LLaMA-Omni: LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level. #speech
- GitHub - kyutai-labs/moshi #speech
- GitHub - microsoft/GRIN-MoE
Papers
- EUREKA: Evaluating and Understanding Large Foundation Models - Microsoft Research
- [2409.11321] SOAP: Improving and Stabilizing Shampoo using Adam #optimizers
- [2409.10173] jina-embeddings-v3: Multilingual Embeddings With Task LoRA #text-embeddings
- [2409.12191] Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
- [2409.11564] Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Code
- GitHub - kyleliang919/Online-Subspace-Descent: This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon #optimizers #compression
- GitHub - NVIDIA/Megatron-Energon: Megatron’s multi-modal data loader #multimodal #dataloader #pytorch
- GitHub - TorchDR/TorchDR: TorchDR - PyTorch Dimensionality Reduction #pytorch #umap #tsne
- GitHub - modelscope/ms-swift: Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, …) #vision-language #vlm #tuning
- Release v0.3.0 Release Note · linkedin/Liger-Kernel · GitHub
- GitHub - voideditor/void: open-source Cursor alternative
- GitHub - pytorch-labs/LeanRL: LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using torch.compile and CUDA graphs (see the sketch below). #pytorch #rl
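A minimal sketch of the compile-plus-CUDA-graphs pattern LeanRL's description refers to; this is not LeanRL's code, and the tiny policy network is invented for illustration. torch.compile's `reduce-overhead` mode captures CUDA graphs, which mostly pays off for small, launch-bound networks like RL policies.

```python
# Sketch (not LeanRL's code): torch.compile with mode="reduce-overhead"
# uses CUDA graphs to cut kernel-launch overhead on small networks.
import torch
import torch.nn as nn

# Hypothetical tiny policy network, purely for illustration.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2)).cuda()

# reduce-overhead enables CUDA-graph capture of the compiled region.
policy_step = torch.compile(policy, mode="reduce-overhead")

obs = torch.randn(256, 8, device="cuda")
for _ in range(3):  # the first few calls warm up and capture the graph
    actions = policy_step(obs)
torch.cuda.synchronize()
```

CUDA graphs need static shapes and stable memory addresses, so the pattern works best when batch sizes stay fixed across steps.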
Articles
- Improved Data Loading with Threads | NVIDIA Technical Blog (see the sketch after this list)
    - uses Python's free-threaded (no-GIL) mode to evaluate a thread-based torch DataLoader
    - CUDA context switching is a bigger issue with process-based workers
    - no difference observed with Pillow
- rerankers: A Lightweight Python Library to Unify Ranking Methods – Answer.AI
- [Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch - distributed / torchtitan - PyTorch Forums #pytorch #distributed
- Polars — GPU acceleration with Polars and NVIDIA RAPIDS (see the sketch after this list) #tabular
- How to make LLMs go fast
- Fine-tuning LLMs to 1.58bit: extreme quantization made easy
- Making LLM training faster - PyTorch Conference 2024 slides: static.sched.com/hosted_files/pytorch2024/8f/Pytorch Conference - Making LLM training faster.pdf
- Inference-Friendly Models with MixAttention | Databricks Blog
- Optimizing AI Inference at Character.AI
- Building and operating a pretty big storage system called S3 | All Things Distributed - https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
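Re the NVIDIA data-loading article above: under free-threaded (no-GIL) Python, worker threads can stand in for DataLoader's worker processes while sharing a single CUDA context. The loader below is a hypothetical minimal thread-based prefetcher sketched from that idea, not code from the post.

```python
# Hypothetical minimal thread-based loader (not the blog's code).
# On free-threaded (no-GIL) CPython, threads decode batches in
# parallel while sharing one CUDA context, unlike process workers.
from concurrent.futures import ThreadPoolExecutor

import torch

class ThreadedLoader:
    def __init__(self, dataset, batch_size=32, num_threads=8):
        self.dataset = dataset          # any map-style dataset of tensors
        self.batch_size = batch_size
        self.pool = ThreadPoolExecutor(max_workers=num_threads)

    def _load_batch(self, indices):
        # Each task fetches/decodes its samples and collates them.
        return torch.stack([self.dataset[i] for i in indices])

    def __iter__(self):
        n = len(self.dataset)
        batches = [range(i, min(i + self.batch_size, n))
                   for i in range(0, n, self.batch_size)]
        # map() spreads batch loads across the pool, yielding in order.
        yield from self.pool.map(self._load_batch, batches)
```

Usage is the familiar loop, `for batch in ThreadedLoader(my_dataset): ...`, assuming samples are equal-shape tensors.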
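And for the Polars GPU post: the query below is a sketch assuming a GPU-enabled Polars build (the RAPIDS/cuDF-backed engine); the file and column names are made up.

```python
# Sketch: run a lazy Polars query on the GPU engine
# (requires the GPU build, e.g. `pip install polars[gpu]`).
import polars as pl

# "transactions.parquet" and its columns are hypothetical.
result = (
    pl.scan_parquet("transactions.parquet")
    .filter(pl.col("amount") > 100)
    .group_by("customer_id")
    .agg(pl.col("amount").sum())
    .collect(engine="gpu")  # unsupported ops fall back to the CPU engine
)
```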
Videos
- CS 194/294-196 (LLM Agents) - Lecture 1 - YouTube
- TWIML AI Podcast episode with Simon Willison on using LLMs for code - YouTube
- Noam Brown (OpenAI) on test-time compute and planning - YouTube
- Tabular Learning: skrub and Foundation Models with Gaël Varoquaux, PhD - YouTube #tabular
Other
- Illuminate - paper-to-podcast tool from Google