2024-11-25

November 26, 2024 · updated December 14, 2025

Models

Qwen/QwQ-32B-Preview · Hugging Face
- QwQ: Reflect Deeply on the Boundaries of the Unknown | Qwen

Papers

[2411.17465] ShowUI: One Vision-Language-Action Model for GUI Visual Agent
[2411.17116] Star Attention: Efficient LLM Inference over Long Sequences
- GitHub - NVIDIA/Star-Attention: Efficient LLM Inference over Long Sequences
[2411.17685] Attamba: Attending To Multi-Token States
[2411.15242] The Zamba2 Suite: Technical Report
[2410.19055] Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms
[2402.15898] Transductive Active Learning: Theory and Applications

Code

GitHub - ClashLuke/HeavyBall: Efficient optimizers

Articles

Modular: Understanding SIMD: Infinite Complexity of Trivial Problems

Videos

Other

Pretraining Large Language Models

Tweets