transformers

  • replaces the FFN and the linear Q/K/V projections with an attention operation whose keys and values are learned parameter matrices
  • allows the learned K and V matrices to be scaled up (extra rows appended) without retraining from scratch
  • uses a non-parametric layer norm (no learned scale or bias), so that the only trainable parameters are the learned token/parameter matrices
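A minimal NumPy sketch of the idea in the bullets above: the input tokens act as queries against keys and values that are trainable matrices rather than projections of the input, and the matrices can be grown by appending rows. All names here are illustrative, and a plain softmax stands in for whatever normalization the actual architecture uses.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_param_attention(x, K_p, V_p):
    """Attention against learned parameter tokens: queries come from the
    input x, while K_p and V_p are trainable matrices, not projections."""
    d = x.shape[-1]
    scores = x @ K_p.T / np.sqrt(d)   # (n_tokens, n_params)
    return softmax(scores) @ V_p      # (n_tokens, d)

rng = np.random.default_rng(0)
d, m = 16, 32                        # model dim, number of parameter tokens
x = rng.normal(size=(4, d))          # 4 input tokens
K_p = rng.normal(size=(m, d))        # learned keys
V_p = rng.normal(size=(m, d))        # learned values

y = token_param_attention(x, K_p, V_p)

# Scaling up: append fresh parameter rows without touching the old ones.
# Zero-init values keep the new slots inert until they are trained.
K_big = np.vstack([K_p, rng.normal(size=(8, d)) * 1e-3])
V_big = np.vstack([V_p, np.zeros((8, d))])
y_big = token_param_attention(x, K_big, V_big)
```

Because the old rows of `K_p`/`V_p` are untouched, the grown model starts close to the original one and only the new slots need training.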