Papers
LLM distillation
- [2408.11796] LLM Pruning and Distillation in Practice: The Minitron Approach (distillation-loss sketch after this list)
- to SSMs: see "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" under Videos
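The Minitron paper pairs structured pruning with logit distillation to shrink a teacher LLM. As a minimal sketch of the distillation half only (generic Hinton-style KD with a temperature, not the paper's exact recipe; shapes and the temperature value are illustrative):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Generic forward-KL logit distillation, KL(teacher || student)."""
    # Soften both distributions with the same temperature.
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects student log-probs as input; log_target=True means the
    # teacher is also passed as log-probs. T^2 restores gradient scale.
    kl = F.kl_div(s_logprobs, t_logprobs, log_target=True,
                  reduction="batchmean")
    return kl * temperature ** 2

# Toy usage: 4 token positions over a 32k vocabulary.
student = torch.randn(4, 32000, requires_grad=True)
teacher = torch.randn(4, 32000)
distillation_loss(student, teacher).backward()
```

The pruning half of the Minitron approach (which the title also names) is orthogonal to this loss and is omitted here.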
Mixture-of-Experts
- [2408.15664] Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts (routing sketch below)
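As the title says, [2408.15664] balances expert load without an auxiliary loss term: my reading is that each expert carries a bias added to its routing score only for top-k selection, nudged up when the expert is underloaded and down when it is overloaded. A minimal numpy sketch under that reading (the function name, the sign-based update, and the rate `u` are my illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def route_with_bias(scores: np.ndarray, bias: np.ndarray, k: int,
                    u: float = 1e-3):
    """One routing step of auxiliary-loss-free load balancing (a sketch).

    scores: (tokens, experts) affinities from the gating network.
    bias:   (experts,) per-expert bias, updated in place; it influences
            expert *selection* only, never the mixing weights.
    """
    n_tokens, n_experts = scores.shape
    # Select top-k experts per token using the biased scores.
    biased = scores + bias
    topk = np.argsort(-biased, axis=-1)[:, :k]
    # Mixing weights come from the *unbiased* scores (softmax over top-k).
    picked = np.take_along_axis(scores, topk, axis=-1)
    weights = np.exp(picked) / np.exp(picked).sum(axis=-1, keepdims=True)
    # Measure realized load and nudge biases toward balance:
    # raise underloaded experts, lower overloaded ones.
    load = np.bincount(topk.ravel(), minlength=n_experts)
    bias += u * np.sign(load.mean() - load)
    return topk, weights

# Toy usage: 8 tokens, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
bias = np.zeros(4)
topk, weights = route_with_bias(rng.normal(size=(8, 4)), bias, k=2)
print(topk, bias)
```

Because the bias receives no gradient, balancing never competes with the language-modeling objective, which is the point of dropping the auxiliary loss.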
Multimodal Models
- [2408.15998] Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
- LLaVA-OneVision: Easy Visual Task Transfer
- Qwen2-VL: To See the World More Clearly | Qwen
- GitHub - QwenLM/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
- Binyuan Hui on X: "Three Secrets of Success for Qwen2-VL" thread, highlighting Naive Dynamic Resolution: unlike its predecessor, Qwen2-VL handles arbitrary image resolutions by mapping them into a dynamic number of visual tokens (see the sketch after this list)
- [2408.16500] CogVLM2: Visual Language Models for Image and Video Understanding
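The "Naive Dynamic Resolution" point in the Qwen2-VL thread above means the visual-token count scales with the input resolution instead of being fixed. A back-of-the-envelope sketch of one way such a mapping can work (the 28-pixel cell and 1024-token budget are illustrative assumptions, not Qwen2-VL's actual constants):

```python
import math

def dynamic_token_grid(height: int, width: int,
                       grid: int = 28, max_tokens: int = 1024):
    """Map an arbitrary image resolution to a token grid (illustrative).

    Each grid x grid pixel cell becomes one visual token. If the raw grid
    exceeds the token budget, both sides are scaled down together so the
    aspect ratio is roughly preserved. Constants are assumptions, not
    Qwen2-VL's real values.
    """
    rows = math.ceil(height / grid)
    cols = math.ceil(width / grid)
    if rows * cols > max_tokens:
        scale = math.sqrt(max_tokens / (rows * cols))
        rows = max(1, math.floor(rows * scale))
        cols = max(1, math.floor(cols * scale))
    return rows, cols, rows * cols

# Toy usage: square, widescreen, and extreme-aspect inputs.
for h, w in [(224, 224), (1080, 1920), (512, 2048)]:
    print((h, w), "->", dynamic_token_grid(h, w))
```

The upshot is that a small icon costs a handful of tokens while a 1080p frame costs hundreds, rather than every image being squashed to one fixed size.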
Code
Datasets
Articles
- Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
- LLM Evaluation doesn’t need to be complicated
Videos
- Neural and Non-Neural AI, Reasoning, Transformers, and LSTMs - YouTube
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models - YouTube
- Stanford CS229 I Machine Learning I Building Large Language Models (LLMs) - YouTube
- Arvind Narayanan: AI Scaling Myths, The Core Bottlenecks in AI Today & The Future of Models | E1195 - YouTube
- Anthropic CEO Dario Amodei on AI’s Moat, Risk, and SB 1047 - YouTube
Tweets