Multihead Self Attention

Individual Weights and Concat

Single Wqkv

Einsum

Torch Scaled Dot Product Attention (SDPA)

FlashAttention

FlexAttention

Masking

PrefixLM

Grouped Query Attention

Sliding Window

KV Cache