1. pretraining
    1. filtered for quality
    2. include instruction tuning data
    3. synthetic data
    4. weighted sampling from different sources / categories
2. long context training
3. annealing with high quality data
4. supervised finetuning
5. RLHF / DPO
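
The "weighted sampling from different sources / categories" step can be illustrated with a small sketch. The source names and mixture weights below are hypothetical placeholders, not values from these notes, and a real pipeline would sample documents or shards rather than bare labels.

```python
import random
from collections import Counter

# Hypothetical sources and mixture weights, chosen only for illustration.
SOURCE_WEIGHTS = {"web": 0.6, "code": 0.25, "books": 0.15}


def sample_sources(weights, n, seed=0):
    """Draw n source names with probability proportional to their weights."""
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


draws = sample_sources(SOURCE_WEIGHTS, 10_000)
counts = Counter(draws)

# Empirical fractions should land close to the configured weights.
for name in SOURCE_WEIGHTS:
    print(name, counts[name] / len(draws))
```

The weights need not sum to 1, since `random.choices` normalizes them. Changing the mixture is then a matter of editing one dictionary.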

Optimizations

Quantized Optimizers

Fused Ops

Compile

FlexAttention

Block causal mask to pack samples
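
A minimal sketch of what a block causal mask means for packed samples, in plain Python so it runs without a GPU stack. The sample lengths are hypothetical. The predicate takes `(b, h, q_idx, kv_idx)` to mirror the shape of a FlexAttention `mask_mod`, but that correspondence is an assumption here and nothing in this block calls PyTorch.

```python
def document_ids(lengths):
    """Map each token position in the packed sequence to its sample index."""
    ids = []
    for doc, n in enumerate(lengths):
        ids.extend([doc] * n)
    return ids


def make_block_causal(doc_ids):
    """Build a predicate: a query may attend to a key only if both tokens
    belong to the same sample and the key is not in the future."""
    def mask_mod(b, h, q_idx, kv_idx):
        # b and h (batch, head) are unused; kept only to mirror the mask_mod shape.
        return doc_ids[q_idx] == doc_ids[kv_idx] and kv_idx <= q_idx
    return mask_mod


lengths = [3, 2]  # hypothetical: two samples packed into one 5-token sequence
ids = document_ids(lengths)
mask_mod = make_block_causal(ids)

# Dense mask for inspection; rows are queries, columns are keys.
mask = [[mask_mod(0, 0, q, k) for k in range(len(ids))] for q in range(len(ids))]
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each sample gets its own lower-triangular block, so tokens never attend across sample boundaries. With PyTorch tensors the same predicate would presumably use `&` instead of `and`, with `doc_ids` as a tensor, before being passed to `create_block_mask` from `torch.nn.attention.flex_attention`.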

Distributed

Pretraining

Finetuning

Post Training / Alignment

Alignment and Post Training