RoPE Expansion
- Gradient Blog: Scaling Rotational Embeddings for Long-Context Language Models
- RoPE theta scaling + fine-tuning on longer-context data
- Extending the RoPE | EleutherAI Blog
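
Rough sketch of what theta scaling changes, assuming the standard RoPE frequency formula theta ** (-2i / head_dim) and an interleaved pairing of dimensions. The function names, head_dim = 128, and both theta values are illustrative choices, not taken from the linked posts. A larger theta slows the rotation of the low-frequency pairs, so their wavelengths cover more token positions. The fine-tuning step is not shown.

```python
import math

def rope_inv_freq(head_dim, theta=10000.0):
    """Per-pair rotation frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim, theta):
    """Wavelength, in tokens, of the slowest-rotating pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]

def apply_rope(x, pos, theta=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * inv_freq[i].

    Assumption: interleaved pairing; some implementations pair the
    first half of the vector with the second half instead.
    """
    out = []
    for i, freq in enumerate(rope_inv_freq(len(x), theta)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out += [a * c - b * s, a * s + b * c]
    return out

# Illustrative values only: a common default base vs. a scaled-up base.
base_wavelength = longest_wavelength(128, 10_000.0)
scaled_wavelength = longest_wavelength(128, 1_000_000.0)
print(f"longest wavelength, theta=1e4: {base_wavelength:.0f} tokens")
print(f"longest wavelength, theta=1e6: {scaled_wavelength:.0f} tokens")
```

Because each pair is only rotated, the query-key dot product depends on the relative offset between positions, which is why changing theta changes how far apart two tokens can be before the slowest pair wraps around.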
Sliding Window Attention
Ring Attention
Tree Attention
StreamingLLM
- [2309.17453] Efficient Streaming Language Models with Attention Sinks
- StreamingLLM - Efficient Streaming Language Models with Attention Sinks Explained - YouTube
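
Minimal sketch of the cache-retention idea behind attention sinks: keep the first few tokens (the sinks) plus a rolling window of the most recent tokens, and evict everything in between. The function name and the default sizes are hypothetical, chosen only for illustration; this shows which cache positions survive, not a full attention implementation.

```python
def streaming_kv_keep(cache_len, n_sink=4, window=8):
    """Return the cache indices to keep: the first n_sink tokens
    plus the most recent `window` tokens."""
    if cache_len <= n_sink + window:
        # Nothing to evict yet.
        return list(range(cache_len))
    recent = range(cache_len - window, cache_len)
    return list(range(n_sink)) + list(recent)

# Hypothetical sizes: 20 cached tokens, 4 sinks, window of 8.
kept = streaming_kv_keep(20, n_sink=4, window=8)
print(kept)
```

The cache size stays bounded at n_sink + window entries no matter how long the stream runs.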