michal.i/o

2024-10-21

Jan 21, 2025 · 2 min read

Models

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

Papers

[2410.14072v1] Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
[2410.11190] Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
[2410.15458] Allegro: Open the Black Box of Commercial-Level Video Generation Model
[2406.15786] What Matters in Transformers? Not All Attention is Needed
[2410.15732v1] ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
- initialize the MoE from a pretrained ViT (DINOv2) by replicating its FFN weights into multiple experts
- route at the image level: the router looks only at the CLS token, so every token of an image goes to the same experts
- load-balancing loss to keep expert usage balanced
- shared experts that are always active to capture “common knowledge”
- top-1 expert routing
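- $y = E_{s}(x) + \sum_{i \in T} g_{i}(x_{[CLS]}) \cdot E_{i}(x)$
  - $E_{s}$ = shared expert
  - $T$ = set of routed experts selected for the image (a single expert under top-1 routing)
  - $g_{i}$ = router gate weight for expert $i$, computed from the CLS token $x_{[CLS]}$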
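The ViMoE notes above combine four pieces: experts initialized by copying the pretrained FFN, image-level top-1 routing on the CLS token, an always-on shared expert, and a load-balancing loss. The following is a minimal pure-Python sketch of one such layer, not the paper's code. It assumes a bias-free ReLU FFN in place of the ViT MLP, takes the gate $g_t$ to be the router's softmax probability for the chosen expert, and uses a Switch-Transformer-style auxiliary loss because the notes do not say which loss is used. `FFN`, `upcycle`, `vimoe_layer`, and `load_balancing_loss` are hypothetical names.

```python
# Minimal sketch of a ViMoE-style layer; every name and design detail here is an assumption.
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class FFN:
    """Two-layer ReLU MLP standing in for a ViT FFN (assumption: no biases, ReLU instead of GELU)."""

    def __init__(self, w1: Matrix, w2: Matrix) -> None:
        self.w1, self.w2 = w1, w2

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.w1, x)]
        return matvec(self.w2, hidden)


def upcycle(pretrained: FFN, num_experts: int) -> List[FFN]:
    """Initialize each expert as an independent copy of the pretrained FFN weights."""
    return [
        FFN([row[:] for row in pretrained.w1], [row[:] for row in pretrained.w2])
        for _ in range(num_experts)
    ]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vimoe_layer(
    tokens: List[Vector], shared: FFN, experts: List[FFN], router_w: Matrix, cls_index: int = 0
) -> Tuple[List[Vector], Vector, int]:
    """y = E_s(x) + g_t(x_CLS) * E_t(x), with a single expert t picked per image (top-1)."""
    probs = softmax(matvec(router_w, tokens[cls_index]))  # router sees only the CLS token
    top = max(range(len(probs)), key=lambda i: probs[i])  # top-1 routing
    gate = probs[top]  # assumption: gate = softmax probability of the chosen expert
    outputs = []
    for x in tokens:  # every token of the image goes to the same routed expert
        shared_out, expert_out = shared(x), experts[top](x)
        outputs.append([s + gate * e for s, e in zip(shared_out, expert_out)])
    return outputs, probs, top


def load_balancing_loss(all_probs: List[Vector], all_top: List[int], num_experts: int) -> float:
    """Switch-style auxiliary loss N * sum_i f_i * P_i over a batch of images (assumed form)."""
    n = len(all_probs)
    frac = [sum(1 for t in all_top if t == i) / n for i in range(num_experts)]  # f_i: share of images routed to i
    mean_prob = [sum(p[i] for p in all_probs) / n for i in range(num_experts)]  # P_i: mean router prob for i
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))


if __name__ == "__main__":
    random.seed(0)
    dim, hidden, num_experts = 4, 8, 3

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    pretrained = FFN(rand_matrix(hidden, dim), rand_matrix(dim, hidden))
    experts = upcycle(pretrained, num_experts)  # identical at init; they diverge only through training
    shared = upcycle(pretrained, 1)[0]  # assumption: shared expert also starts from the pretrained FFN
    router_w = rand_matrix(num_experts, dim)

    batch = [[[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)] for _ in range(2)]
    results = [vimoe_layer(img, shared, experts, router_w) for img in batch]
    print("expert chosen per image:", [top for _, _, top in results])
    print("aux loss:", load_balancing_loss([p for _, p, _ in results], [t for _, _, t in results], num_experts))
```

Routing on the CLS token means the router runs once per image instead of once per token, and the auxiliary loss is then computed over images rather than tokens. Under this loss form, perfectly balanced routing with uniform router probabilities gives a value of 1.0, and collapse onto a single expert raises it toward the number of experts.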
[2410.16261v1] Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
https://openreview.net/pdf?id=vI95kcLAoU
[2410.16512v1] TIPS: Text-Image Pretraining with Spatial Awareness
[2410.17243v1] Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[2410.17251v1] Altogether: Image Captioning via Re-aligning Alt-text
[2410.18967] Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Code

[ ]

Articles

Reaching 1B context length with RAG
How Speculative Decoding Boosts vLLM Performance by up to 2.8x | vLLM Blog
Simplifying, stabilizing, and scaling continuous-time consistency models | OpenAI
Building Vectorize, a distributed vector database, on Cloudflare’s Developer Platform

Videos

[ ]

Other

[ ]

Tweets

Created with Quartz v4.4.0 © 2025