PaliGemma 2
TLDR
- Take a pretrained SigLIP vision encoder and a Gemma 2 LLM, add a linear projection from SigLIP output tokens into Gemma's input embedding space
- Train the whole model end to end at 224x224 resolution (1 billion examples)
- Tune at higher resolutions (50 million examples at 448x448, then 10 million at 896x896)
  - Sample more heavily from tasks that need higher resolution, such as OCR
- Fine-tune on downstream tasks
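
A rough sketch of the recipe above, in plain Python, may make the numbers easier to hold in mind. It is an illustration only, not the actual training code: the 14-pixel patch size is an assumption not stated in these notes, and the names `Stage`, `SCHEDULE`, `num_image_tokens`, and `project` are hypothetical helpers made up for this example.

```python
from dataclasses import dataclass

# Assumption: the SigLIP encoder splits images into 14x14 pixel patches.
# This value is not given in the notes above.
PATCH_SIZE = 14


@dataclass(frozen=True)
class Stage:
    """One step of the training recipe (labels are illustrative)."""
    label: str
    resolution: int  # side of the square input image, in pixels
    examples: int    # number of training examples seen in this step


# Resolutions and example counts taken from the bullets above.
SCHEDULE = [
    Stage("end-to-end training", 224, 1_000_000_000),
    Stage("higher-resolution tuning", 448, 50_000_000),
    Stage("higher-resolution tuning", 896, 10_000_000),
]


def num_image_tokens(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens for a square image with the given side."""
    side = resolution // patch_size
    return side * side


def project(tokens, weight, bias):
    """Linear projection from vision width (d_vis) to LLM width (d_llm).

    tokens: list of d_vis-length lists
    weight: d_vis x d_llm matrix as nested lists
    bias:   d_llm-length list
    """
    columns = list(zip(*weight))  # d_llm columns, each of length d_vis
    return [
        [sum(x * w for x, w in zip(tok, col)) + b
         for col, b in zip(columns, bias)]
        for tok in tokens
    ]


if __name__ == "__main__":
    for stage in SCHEDULE:
        tokens = num_image_tokens(stage.resolution)
        print(f"{stage.label}: {stage.resolution}px, "
              f"{tokens} image tokens, {stage.examples:,} examples")
```

Under the patch-size assumption, the three resolutions correspond to 256, 1024, and 4096 image tokens per image, which is one way to see why the higher-resolution steps use far fewer examples: each example costs many more tokens.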