Vision Language Models
- [2412.01818] [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster (see the sketch after this list)
- [2411.03312] Inference Optimal VLMs Need Only One Visual Token but Larger Models
- [2412.06263] iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
- [2412.11475] OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference
- [2412.13180] Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
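
For orientation, the common thread in these papers is cutting the number of visual tokens a VLM feeds to its language model. Below is a minimal sketch of training-free token pruning driven by [CLS] attention, in the spirit of [2412.01818]; it is not the paper's code, and the names (`prune_visual_tokens`, `keep_ratio`) are illustrative assumptions. It assumes a ViT-style encoder whose last-layer attention map is available with the [CLS] token at index 0.

```python
# Hypothetical sketch of [CLS]-attention visual token pruning; names and
# shapes are assumptions, not any paper's official API.
import torch

def prune_visual_tokens(
    tokens: torch.Tensor,   # (B, 1 + N, D): [CLS] followed by N patch tokens
    attn: torch.Tensor,     # (B, H, 1 + N, 1 + N): last-layer attention weights
    keep_ratio: float = 0.25,
) -> torch.Tensor:
    """Keep only the patch tokens the [CLS] token attends to most."""
    # Attention from [CLS] (query index 0) to each patch token, averaged over heads.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)           # (B, N)
    num_keep = max(1, int(cls_attn.size(1) * keep_ratio))
    topk = cls_attn.topk(num_keep, dim=1).indices      # (B, num_keep)
    # Sort the kept indices so surviving tokens stay in spatial order.
    topk, _ = topk.sort(dim=1)
    batch_idx = torch.arange(tokens.size(0)).unsqueeze(1)
    kept_patches = tokens[:, 1:][batch_idx, topk]      # (B, num_keep, D)
    # Re-attach the [CLS] token in front of the surviving patch tokens.
    return torch.cat([tokens[:, :1], kept_patches], dim=1)

# Example: prune 576 patch tokens down to 144 before the LLM sees them.
B, N, D, H = 2, 576, 1024, 16
tokens = torch.randn(B, 1 + N, D)
attn = torch.softmax(torch.randn(B, H, 1 + N, 1 + N), dim=-1)
pruned = prune_visual_tokens(tokens, attn, keep_ratio=0.25)
print(pruned.shape)  # torch.Size([2, 145, 1024])
```

Averaging attention over heads and sorting the kept indices (to preserve positional order) are two common implementation choices for this family of methods; individual papers differ in how scores are computed and at which layer pruning happens.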