Tokenization

Vocab Size

Larger Vocab:
+ more characters per token
+ shorter input sequences
+ fewer tokens to process
- larger embedding tables

Memory

Compute

Residual Stream
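A rough sketch of the vocab-size tradeoff noted above: a larger vocab tends to pack more characters into each token (shorter sequences), while the embedding table grows linearly with vocab size. The function names, the `tied` flag, and all numbers below (vocab sizes, `d_model`, characters per token) are illustrative assumptions, not measurements from any particular tokenizer or model.

```python
import math


def embedding_params(vocab_size: int, d_model: int, tied: bool = True) -> int:
    """Parameter count of the token embedding table.

    If the output projection is not tied to the input embedding,
    a second table of the same shape is counted as well.
    """
    tables = 1 if tied else 2
    return tables * vocab_size * d_model


def sequence_length(num_chars: int, chars_per_token: float) -> int:
    """Approximate number of tokens needed for a text of num_chars characters."""
    return math.ceil(num_chars / chars_per_token)


if __name__ == "__main__":
    d_model = 4096      # assumed hidden size
    num_chars = 9000    # assumed document length in characters

    # (vocab_size, chars_per_token) pairs are made-up values for illustration.
    for vocab_size, chars_per_token in [(32_000, 3.0), (128_000, 4.5)]:
        params = embedding_params(vocab_size, d_model)
        tokens = sequence_length(num_chars, chars_per_token)
        print(f"vocab={vocab_size:>7}  embedding params={params:>12,}  tokens={tokens}")
```

Under these assumed numbers, the larger vocab quadruples the embedding table while shortening the sequence by about a third; the real ratio depends on the tokenizer and the data.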