Long Context Transformers

September 25, 2024 (updated October 30, 2024)

RoPE Expansion

Gradient Blog: Scaling Rotational Embeddings for Long-Context Language Models
- RoPE theta (base frequency) scaling, followed by fine-tuning on longer-context data
Extending the RoPE | EleutherAI Blog
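
As a rough illustration of the theta-scaling note above, here is a minimal sketch of how the RoPE base (theta) sets the rotation frequency of each dimension pair, and why raising it stretches the longest wavelength so that far-apart positions stay distinguishable. The head dimension, both theta values, and all function names are illustrative assumptions, not taken from the linked posts; fine-tuning on longer sequences would still be needed after changing theta.

```python
import math

def rope_inv_freq(head_dim, theta=10000.0):
    """Rotation frequency of each dimension pair: theta^(-2i/d), i = 0 .. d/2 - 1."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim, theta):
    """Period, in tokens, of the slowest-rotating pair (2*pi / smallest frequency)."""
    return 2.0 * math.pi / rope_inv_freq(head_dim, theta)[-1]

def rotate_pair(x0, x1, position, freq):
    """Apply the RoPE rotation for one dimension pair at a given token position."""
    angle = position * freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c

# Assumed values for illustration only.
HEAD_DIM = 128
BASE_THETA = 10_000.0       # commonly used default base
SCALED_THETA = 1_000_000.0  # hypothetical enlarged base for longer context

if __name__ == "__main__":
    for theta in (BASE_THETA, SCALED_THETA):
        wavelength = longest_wavelength(HEAD_DIM, theta)
        print(f"theta={theta:>12,.0f}  longest wavelength ~ {wavelength:,.0f} tokens")
```

A larger theta leaves the fastest pair unchanged (its frequency is always 1) while slowing every other pair, so the slowest pairs complete fewer rotations over a long sequence.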

YaRN

GitHub - jquesnelle/yarn: YaRN: Efficient Context Window Extension of Large Language Models
[2309.00071] YaRN: Efficient Context Window Extension of Large Language Models

Sliding Window Attention

Ring Attention

RING Attention explained: 1 Mio Context Length - YouTube

Tree Attention

[2408.04093] Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

StreamingLLM

GitHub - princeton-nlp/ProLong: Homepage for ProLong (Princeton long-context language models) and paper “How to Train Long-Context Language Models (Effectively)”

Transformer-XL

Longformer

Linformer

Reformer

Blockwise Attention

Adaptive Attention Span

Infini-Attention
