Latent Transformers with small vocabularies
Use a small vocabulary (e.g. 1024 tokens, or even character level) together with a 1D causal convolutional VAE to shrink the embedding table and compress the sequence length before the transformer; rough sketch below
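A minimal sketch of the idea, assuming PyTorch; the sizes (vocab 1024, embed/latent dims, 4x downsampling via two stride-2 causal convs) are illustrative placeholders, not settled choices. The transformer would then model the shorter latent sequence `z` rather than the raw tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D conv that only sees current and past positions (left padding only)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride)

    def forward(self, x):                        # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))         # pad on the left to stay causal
        return self.conv(x)

class CausalConvVAE(nn.Module):
    """Compress a small-vocab token sequence into a 4x shorter latent sequence."""
    def __init__(self, vocab_size=1024, embed_dim=64, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # small embedding table
        self.enc1 = CausalConv1d(embed_dim, 128, kernel_size=4, stride=2)
        self.enc2 = CausalConv1d(128, 2 * latent_dim, kernel_size=4, stride=2)
        # Decoder upsamples back to the original length for reconstruction.
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 128, kernel_size=4, stride=2, padding=1),
            nn.GELU(),
            nn.ConvTranspose1d(128, vocab_size, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, tokens):                   # tokens: (batch, time)
        h = self.embed(tokens).transpose(1, 2)   # -> (batch, embed_dim, time)
        h = F.gelu(self.enc1(h))
        mu, logvar = self.enc2(h).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        logits = self.dec(z)                     # (batch, vocab_size, time)
        recon = F.cross_entropy(logits, tokens)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl, z                     # ELBO-style loss and latent sequence
```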
Could the small token set be aligned to an existing large tokenizer's vocabulary with CTC, letting the loss marginalize over alignments instead of requiring a fixed segmentation? (cf. "Sequence Modeling with CTC")
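A sketch of one possible direction, assuming PyTorch and its built-in `nn.CTCLoss`: per-character (or per-latent-step) hidden states are projected to logits over a large tokenizer's vocabulary plus a blank symbol, and CTC handles the length mismatch, since a character sequence is always at least as long as its tokenization under a larger-vocab tokenizer. The class name, vocab size, and hidden size are placeholders.

```python
import torch
import torch.nn as nn

class CharToLargeVocabHead(nn.Module):
    """Map per-character hidden states to logits over a large vocab + CTC blank."""
    def __init__(self, hidden_dim=512, large_vocab_size=50257):  # placeholder sizes
        super().__init__()
        self.proj = nn.Linear(hidden_dim, large_vocab_size + 1)  # index 0 = blank
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, char_hidden, targets, char_lengths, target_lengths):
        # char_hidden: (batch, char_len, hidden_dim) from the character-level model
        # targets:     (batch, target_len) ids from the existing large tokenizer,
        #              shifted by +1 so that id 0 stays reserved for the blank
        log_probs = self.proj(char_hidden).log_softmax(dim=-1)   # (B, T, V+1)
        log_probs = log_probs.transpose(0, 1)                    # CTC expects (T, B, V+1)
        return self.ctc(log_probs, targets, char_lengths, target_lengths)

# Usage with dummy data: 40 characters aligned to 10 large-tokenizer tokens.
head = CharToLargeVocabHead()
hidden = torch.randn(2, 40, 512)
targets = torch.randint(1, 50258, (2, 10))       # ids already shifted past the blank
loss = head(hidden, targets, torch.tensor([40, 40]), torch.tensor([10, 10]))
```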