Transformer Properties

October 9, 2024, updated October 13, 2024
Tags: ml

Tokenization

Vocab Size

Larger vocab:
+ more characters per token
+ shorter input sequences
+ fewer tokens to process
- larger embedding tables

Memory

Compute

Residual Stream
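
To make the vocab-size trade-off noted under Tokenization concrete, here is a rough back-of-the-envelope sketch. The helper names (`embedding_params`, `sequence_length`) and every number in it (model width, text length, characters per token) are hypothetical values chosen for illustration, not measurements from any real tokenizer.

```python
import math


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameter count of one embedding table of shape (vocab_size, d_model)."""
    return vocab_size * d_model


def sequence_length(num_chars: int, chars_per_token: float) -> int:
    """Approximate number of tokens needed to cover num_chars characters."""
    return math.ceil(num_chars / chars_per_token)


# Assumed settings for illustration only.
D_MODEL = 1024      # hypothetical model width
NUM_CHARS = 12_000  # hypothetical input length in characters

# (vocab size, assumed average characters per token)
configs = [
    (32_000, 3.0),
    (128_000, 4.0),
]

for vocab_size, chars_per_token in configs:
    params = embedding_params(vocab_size, D_MODEL)
    tokens = sequence_length(NUM_CHARS, chars_per_token)
    print(f"vocab={vocab_size:>7,}  embedding params={params:>12,}  tokens={tokens:,}")
```

Under these assumed numbers, the larger vocabulary shortens the sequence while the embedding table grows in proportion to the vocab size. If the output projection is not tied to the input embedding, it adds a second table of the same shape.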