Data Sampling and Curation

Augmentation

Batch

Mixup

Augmentation Schedules

FixRes

Initialization

μP

Maximal Update Parametrization (μP)

Weight Averaging

SWA

Model Soup

Learning Rates

Learning Rate Range Test

Learning Rate Schedules

Warmup-Stable-Decay (WSD)

“Rewarming” for Continued Pretraining

Optimization

Fine Tuning and Distillation

Test Time

Multi Crop