Papers
- [2411.18363] ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
- [2411.18207v1] From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
- [2411.18674v1] Active Data Curation Effectively Distills Large-Scale Multimodal Models
- [2411.19865] Reverse Thinking Makes LLMs Stronger Reasoners
- [2411.18933v1] Efficient Track Anything
- [2411.16085] Cautious Optimizers: Improving Training with One Line of Code
- [2411.16828] CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions
- [2412.00714] Scaling New Frontiers: Insights into Large Recommendation Models
- [2411.16205] MH-MoE: Multi-Head Mixture-of-Experts
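The "one line" referenced in the Cautious Optimizers entry (2411.16085) is a sign-agreement mask on the optimizer update. A minimal numpy sketch of that idea, assuming a generic update vector from any base optimizer; the function name, rescaling detail, and hyperparameters here are illustrative, not the paper's exact implementation:

```python
import numpy as np

def cautious_step(param, grad, update, lr=0.01, eps=1e-8):
    # Hypothetical sketch of the masking idea from arXiv:2411.16085:
    # zero out update components whose sign disagrees with the current
    # gradient, then rescale so the surviving components keep the
    # overall update scale.
    mask = (update * grad > 0).astype(update.dtype)
    mask = mask * (mask.size / (mask.sum() + eps))  # rescale surviving entries
    return param - lr * update * mask
```

Applied on top of Adam or any other optimizer, the mask leaves aligned components untouched (up to rescaling) and drops components that point against the gradient.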
Code
- GitHub - IDEA-Research/ChatRex: Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
- GitHub - 343gltysprk/ovow: Code for From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
- GitHub - Tencent/HunyuanVideo: HunyuanVideo: A Systematic Framework For Large Video Generation Model Training