michal.i/o

2024-09-02

Jan 21, 2025 · 2 min read

transformers
moe
code-model
AIResearch
AIEfficiency
triton
python
uv
docker
pytorch

Papers

[2404.16710] LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding #transformers
[2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[2409.02060] OLMoE: Open Mixture-of-Experts Language Models

Models

OLMoE - 1B Mixture of Experts #moe
Meet Yi-Coder: A Small but Mighty LLM for Code - 01.AI Blog #code-model
Reflection 70B
- Matt Shumer on X: “The technique that drives Reflection 70B is simple, but very powerful. Current LLMs have a tendency to hallucinate, and can’t recognize when they do so. Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer. https://t.co/pW78iXSwwb” / X
- mattshumer/Reflection-70B · Hugging Face
Salesforce xLAM
- Salesforce AI Research on X: “Introducing the full xLAM family, our groundbreaking suite of Large Action Models! 🚀 From the ‘Tiny Giant’ to industrial powerhouses, xLAM is revolutionizing AI efficiency! #AIResearch #AIEfficiency 🤗 Hugging Face Collection: https://t.co/FTnNVMIXCV 🤩 Research Blog https://t.co/yLcCj1isGx” / X

Videos

Lecture 28: Liger Kernel - Efficient Triton Kernels for LLM Training - YouTube #triton
- int64 addressing is slower than int32, but large tensors still need offsets cast to int64
Cohere For AI - Community Talks: Mostafa Elhoushi & Akshat Shrivastava - YouTube
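
A quick way to see why the int64 note under the Liger Kernel lecture matters: once a row offset (row index times row stride, in elements) passes 2^31 - 1, a signed 32-bit index wraps to a negative number. The sketch below is plain-Python arithmetic, not Triton or Liger Kernel code; the helper names and the 32768 x 128256 shape are made up for illustration.

```python
# Illustrative only: plain-Python arithmetic, not Triton or Liger Kernel code.
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return value as a signed 32-bit integer would store it (two's complement wraparound)."""
    return (value + 2**31) % 2**32 - 2**31


def needs_int64(n_rows: int, row_stride: int) -> bool:
    """True if the start offset of the last row (in elements) does not fit in int32."""
    return (n_rows - 1) * row_stride > INT32_MAX


# Hypothetical shape: 32768 rows of logits with a 128256-entry vocabulary.
n_rows, vocab_size = 32_768, 128_256
last_row_offset = (n_rows - 1) * vocab_size

print(needs_int64(n_rows, vocab_size))               # does the offset overflow int32?
print(last_row_offset, wrap_int32(last_row_offset))  # true offset vs. wrapped int32 value
```

A check like `needs_int64` could in principle be used to stay on the faster int32 path for small tensors and switch to int64 only when the shape requires it; whether that trade-off is worth it depends on the kernel.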

Dev

Production-ready Python Docker Containers with uv #python #uv #docker
CUDA-Free Inference for LLMs | PyTorch #pytorch
SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision | LMSYS Org
Advanced Python: Achieving High Performance with Code Generation | by Yonatan Zunger | Medium

Random

Ilya Sutskever’s SSI Inc raises $1B | Hacker News
Dylan Freedman on X: “The new Qwen2-VL-7B Instruct model gets 100% accuracy extracting text from this handwritten document. This is the first open weights model (Apache 2.0) that I’ve seen OCR this accurately. (Thank you @fdaudens for the tip!) https://t.co/AB9r3bKDF0 https://t.co/nAEY7cp1w8” / X
