Tokenization
Common Tokenizers
BPE - Byte Pair Encoding
WordPiece
SentencePiece
TokenMonster
Over-Tokenization

Length-MAX
Getting Rid of Tokenization
Byte Latent Transformer
- Byte Latent Transformer: Patches Scale Better Than Tokens | Research - AI at Meta
- [2401.13660] MambaByte: Token-free Selective State Space Model