Models
- Welcome to the Falcon 3 Family of Open Models!
- GitHub - foundation-model-stack/bamba: Train, tune, and infer Bamba model
- GitHub - AnswerDotAI/ModernBERT: Bringing BERT into modernity via both architecture changes and scaling
Papers
- [2412.07752] FlashRNN: Optimizing Traditional RNNs on Modern Hardware
- [2412.10360] Apollo: An Exploration of Video Understanding in Large Multimodal Models
- [2412.10117] CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
- [2410.18779] A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
- [2412.10302] DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
- [2412.09607] Spectral Image Tokenizer
- [2412.12095] Causal Diffusion Transformers for Generative Modeling
- [2412.10437] SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
- [2410.02899] FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs
- [2412.01951] Self-Improvement in Language Models: The Sharpening Mechanism
- [2412.13061] VidTok: A Versatile and Open-Source Video Tokenizer
- [[2412.14164] MetaMorph: Multimodal Understanding and Generation via Instruction Tuning](https://arxiv.org/abs/2412.14164)
- [2412.12432] Three Things to Know about Deep Metric Learning
- [2412.13303] FastVLM: Efficient Vision Encoding for Vision Language Models
- [2412.15115] Qwen2.5 Technical Report
- [2412.15213] Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
- [2412.13663] Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
- [2412.14475] MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Code
- GitHub - hao-ai-lab/FastVideo: FastVideo is an open-source framework for accelerating large video diffusion models.
- GitHub - Genesis-Embodied-AI/Genesis: A generative world for general-purpose robotics & embodied AI learning.
- GitHub - huggingface/picotron: Minimalistic 4D-parallelism distributed training framework for educational purposes
Articles
- [ ]
Videos
- [ ]
Other
- [ ]