Tokenization

Vocab Size

Larger Vocab:

  • + more characters per token
  • + shorter input sequences
  • + fewer tokens to process
  • - larger embedding tables (more parameters and memory)
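
The trade-off above can be sketched with some rough arithmetic. This is a minimal illustration, not a measurement: the helper names, the model width d_model, and the characters-per-token figures are all hypothetical values chosen for the example, and the real compression rate depends on the tokenizer and the text.

```python
import math


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameter count of one vocab_size x d_model embedding table."""
    return vocab_size * d_model


def tokens_needed(num_chars: int, chars_per_token: float) -> int:
    """Approximate number of tokens needed to cover num_chars characters."""
    return math.ceil(num_chars / chars_per_token)


# Hypothetical settings: (vocab size, assumed average characters per token).
# The assumption is only that a larger vocab packs more characters per token.
settings = [(32_000, 3.0), (128_000, 4.0)]
d_model = 4096          # assumed model width
document_chars = 1_000  # assumed input length in characters

for vocab_size, chars_per_token in settings:
    params = embedding_params(vocab_size, d_model)
    seq_len = tokens_needed(document_chars, chars_per_token)
    print(f"vocab={vocab_size:>7}  embedding params={params:>12,}  tokens={seq_len}")
```

Under these assumed numbers, the larger vocab shortens the sequence but multiplies the size of the embedding table, which is the cost listed in the last bullet.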

Memory

Compute

Residual Stream