TIL;DR: CLIP Scaling

Tags: clip, openclip, tildr

Reproducible scaling laws for contrastive language-image learning

  1. All previous scaling-law research uses:
    1. private data
    2. unimodal tasks (language modeling or pure vision)
  2. This paper uses:
    1. CLIP contrastive image-language pretraining
    2. the public LAION dataset

Batch size 86-88K, on 1520 A100 GPUs, using PyTorch DDP
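
A minimal sketch of that distributed setup, assuming a `torchrun` launch; the `nn.Linear` here is just a stand-in for a real CLIP model:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(512, 512).to(local_rank)  # stand-in for a CLIP model
model = DDP(model, device_ids=[local_rank])

# An ~86K global batch over 1520 GPUs works out to roughly 57 samples
# per GPU; DDP averages gradients across processes on backward().
```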

AdamW optimizer: β1 = 0.9, β2 = 0.98, weight decay = 0.2
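
In PyTorch that configuration looks like the sketch below; the learning rate is an assumption, since these notes don't record the paper's schedule:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for an actual CLIP model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,            # assumed peak LR; not recorded in these notes
    betas=(0.9, 0.98),  # β1 and β2 from above
    weight_decay=0.2,
)
```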

InfoNCE loss
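
A minimal PyTorch sketch of the symmetric InfoNCE objective CLIP trains with. OpenCLIP's real implementation additionally learns the temperature as a logit scale and, at these batch sizes, gathers embeddings across all GPUs before computing the loss:

```python
import torch
import torch.nn.functional as F

def clip_infonce_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits; matching pairs sit on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: image -> text and text -> image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```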

bfloat16 mixed precision
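
A sketch of what a bfloat16 training step looks like, reusing `clip_infonce_loss` from above; the encoders and data are dummies, and a CUDA device is assumed. Unlike float16, bfloat16 needs no gradient scaler:

```python
import torch
import torch.nn as nn

image_encoder = nn.Linear(768, 512).cuda()  # dummy image tower
text_encoder = nn.Linear(512, 512).cuda()   # dummy text tower
images = torch.randn(8, 768, device="cuda")
texts = torch.randn(8, 512, device="cuda")

# Autocast runs matmuls in bfloat16 while keeping reductions in float32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = clip_infonce_loss(image_encoder(images), text_encoder(texts))
loss.backward()
```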

TLDR: it scales. Zero-shot classification, retrieval, linear-probing, and fine-tuning performance all follow power laws in model size, data size, and training compute.

GitHub - LAION-AI/scaling-laws-openclip: Reproducible scaling laws for contrastive language-image learning

GitHub - LAION-AI/CLIP_benchmark: CLIP-like model evaluation