TL;DR: CLIP Scaling
Reproducible scaling laws for contrastive language-image learning
- Previous scaling-law research uses:
- private data
- unimodal tasks (language modeling or vision)
- This paper uses:
- CLIP contrastive image-language pretraining
- the public LAION dataset (up to LAION-2B); see the loading sketch below
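
For context, a minimal sketch of loading one of the resulting pretrained checkpoints with the open_clip library (assumes `pip install open_clip_torch`; the model name, pretrained tag, and image path below are illustrative placeholders, see `open_clip.list_pretrained()` for valid combinations):

```python
import torch
import open_clip
from PIL import Image

# Create a CLIP model and load LAION-2B pretrained weights via OpenCLIP.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    # Normalize so dot products are cosine similarities, then softmax over the texts.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
print(probs)
```
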
- Training setup (sketched in code below):
- global batch size 86K-88K, on up to 1520 A100 GPUs, using PyTorch DDP
- AdamW with β1 = 0.9, β2 = 0.98, weight decay = 0.2
- InfoNCE loss
- bfloat16 mixed precision
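
A single-process PyTorch sketch of these ingredients: the symmetric InfoNCE loss over in-batch image-text pairs, AdamW with the betas and weight decay listed above, and a bfloat16 autocast step. The learning rate, the toy `Linear` stand-in for the two CLIP towers, the random batch, and the fixed logit scale are placeholders (not values from the paper); DDP wrapping is indicated only in a comment because it needs an initialized process group.

```python
import torch
import torch.nn.functional as F


def clip_infonce_loss(image_features, text_features, logit_scale):
    """Symmetric InfoNCE loss over in-batch image-text pairs (CLIP-style)."""
    # L2-normalize so dot products are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # Pairwise similarity logits, scaled by the temperature.
    logits_per_image = logit_scale * image_features @ text_features.T
    logits_per_text = logits_per_image.T
    # The i-th image in the batch matches the i-th text.
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return 0.5 * (F.cross_entropy(logits_per_image, labels)
                  + F.cross_entropy(logits_per_text, labels))


# Stand-in for the image/text encoders, just to make the snippet runnable.
model = torch.nn.Linear(512, 512)

# AdamW hyperparameters from the notes above; the learning rate is a placeholder.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.98), weight_decay=0.2
)

# Multi-GPU runs would wrap the model in DDP, e.g.:
#   model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# after torch.distributed.init_process_group(...).

images = torch.randn(8, 512)       # toy batch; real runs shard 86K-88K samples across GPUs
texts = torch.randn(8, 512)
logit_scale = torch.tensor(100.0)  # fixed stand-in for CLIP's learnable exp(log-temperature)

# bfloat16 mixed precision; on GPU, use device_type="cuda".
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = clip_infonce_loss(model(images), model(texts), logit_scale)
loss.backward()
optimizer.step()
```
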
TLDR: it scales
- GitHub: [LAION-AI/scaling-laws-openclip](https://github.com/LAION-AI/scaling-laws-openclip) (Reproducible scaling laws for contrastive language-image learning)
- GitHub: [LAION-AI/CLIP_benchmark](https://github.com/LAION-AI/CLIP_benchmark) (CLIP-like model evaluation)