2024-09-23
Models
- Llama 3.2: Revolutionizing edge AI and vision with open, customizable models #vlm
- molmo.allenai.org/blog: Molmo multimodal models #vlm
- stepfun-ai/GOT-OCR2_0 · Hugging Face #ocr
- GitHub - ByungKwanLee/Phantom: Official PyTorch implementation of Phantom, which enlarges the latent hidden dimension to build stronger vision-language models #vlm
Papers
- [2409.13523] EMMeTT: Efficient Multimodal Machine Translation Training
- [2409.14683] Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
- [2409.15278] PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
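The token-pooling paper above shrinks multi-vector (ColBERT-style) indexes by pooling groups of document token embeddings into fewer vectors before storage. A minimal NumPy sketch of the idea; note the paper groups tokens by embedding similarity (hierarchical clustering), while pooling consecutive tokens here is a simpler stand-in, and the pool factor and shapes are illustrative:

```python
import numpy as np

def pool_tokens(token_embs: np.ndarray, pool_factor: int = 2) -> np.ndarray:
    """Reduce a document's (T, d) token-embedding matrix to ~T/pool_factor
    vectors by mean-pooling groups of pool_factor consecutive tokens.

    NOTE: the paper clusters tokens by similarity before pooling;
    consecutive-token grouping is an illustrative simplification.
    """
    T, _ = token_embs.shape
    pooled = [token_embs[i:i + pool_factor].mean(axis=0)
              for i in range(0, T, pool_factor)]
    out = np.stack(pooled)
    # Re-normalize so late-interaction (MaxSim) scoring still sees unit vectors.
    return out / np.linalg.norm(out, axis=1, keepdims=True)

def maxsim(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score: for each query token, take the
    max similarity against any (pooled) document vector, then sum."""
    return float((query_embs @ doc_embs.T).max(axis=1).sum())

rng = np.random.default_rng(0)
doc = rng.normal(size=(32, 8))
doc /= np.linalg.norm(doc, axis=1, keepdims=True)
pooled = pool_tokens(doc, pool_factor=2)
print(pooled.shape)  # (16, 8): half the stored vectors per document
```

Scoring then proceeds as usual with `maxsim(query_embs, pooled)`; the paper's point is that this pooling costs little retrieval quality while cutting the index footprint roughly by the pool factor.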
Code
- RWKV-LM/RWKV-v7 at main · BlinkDL/RWKV-LM · GitHub
- GitHub - willccbb/mlx_parallm: Fast parallel LLM inference for MLX #mlx #inference
Articles
- The Practitioner’s Guide to the Maximal Update Parameterization | EleutherAI Blog #scaling
- Understanding how LLM inference works with llama.cpp
- Techniques for KV Cache Optimization in Large Language Models #kvcache #llm #inference
- The basic idea behind FlashAttention #flash-attention #softmax
- FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention | PyTorch
- Tune Llama3 405B on AMD MI300x (our journey) - Felafax Blog #amd #jax
- Exploring Parallel Strategies with Jax | AstraBlog #distributed #jax
- Power of Diffusion Models | AstraBlog #diffusion
- GenAI Handbook
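The two FlashAttention articles above both revolve around the same core trick: a numerically stable *online* softmax that processes scores block by block, keeping only a running max, a running denominator, and a running output. A minimal NumPy sketch of that recurrence (block size and shapes are illustrative, not FlashAttention's actual tiling):

```python
import numpy as np

def online_softmax_weighted_sum(scores: np.ndarray,
                                values: np.ndarray,
                                block: int = 4) -> np.ndarray:
    """Compute softmax(scores) @ values in streaming blocks.

    Only three running quantities are kept: the max m, the softmax
    denominator l, and the unnormalized output acc. Whenever a block
    raises the max, previous partial results are rescaled by
    exp(m_old - m_new) -- the recurrence at the heart of FlashAttention.
    """
    m = -np.inf
    l = 0.0
    acc = np.zeros(values.shape[1])
    for i in range(0, len(scores), block):
        s = scores[i:i + block]
        v = values[i:i + block]
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)   # rescale old partials to new max
        p = np.exp(s - m_new)            # stable exponentials for this block
        l = l * correction + p.sum()
        acc = acc * correction + p @ v
        m = m_new
    return acc / l

rng = np.random.default_rng(1)
s = rng.normal(size=10)
V = rng.normal(size=(10, 3))
w = np.exp(s - s.max())
ref = (w / w.sum()) @ V                  # one-shot reference softmax
out = online_softmax_weighted_sum(s, V)
print(np.allclose(out, ref))  # True
```

In the real kernel this runs per query row over tiles of keys/values in SRAM, which is what removes the O(n²) attention matrix from HBM.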
Videos
- Boris Hanin | Scaling Limits of Neural Networks - YouTube
- MLBBQ: Flash Attention by Mike Doan - YouTube #flash-attention