PaliGemma

December 5, 2024

PaliGemma 2

Take a pretrained SigLIP vision encoder and a pretrained Gemma LLM, and add a linear projection that maps SigLIP output tokens into Gemma's input embedding space
Train the whole model end to end at 224x224 resolution (1 billion examples)
Tune at larger resolutions (50 million examples at 448x448, then 10 million at 896x896)
1. Favor tasks that need higher resolution, such as OCR, in the training mixture
Tune for downstream tasks
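
The recipe above can be made a little more concrete with a small sketch. It is only an illustration under stated assumptions, not the actual PaliGemma code: the patch size of 14 pixels, the toy widths `VISION_DIM` and `LLM_DIM`, and the helper names `num_image_tokens`, `make_projection`, and `project` are all hypothetical choices made for this example. It shows two things: how the number of image tokens grows with resolution, and what "a linear projection from SigLIP tokens to Gemma" amounts to (one matrix multiply plus a bias per token).

```python
import random

PATCH_SIZE = 14  # assumption: the vision encoder cuts images into 14x14-pixel patches


def num_image_tokens(resolution, patch_size=PATCH_SIZE):
    """Number of patch tokens produced for a square image of the given side length."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    side = resolution // patch_size
    return side * side


def make_projection(vision_dim, llm_dim, seed=0):
    """Randomly initialised weight matrix (vision_dim x llm_dim) and a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(llm_dim)] for _ in range(vision_dim)]
    bias = [0.0] * llm_dim
    return weight, bias


def project(tokens, weight, bias):
    """Apply y = x @ W + b to each vision token independently."""
    return [
        [sum(x * w_row[j] for x, w_row in zip(token, weight)) + bias[j]
         for j in range(len(bias))]
        for token in tokens
    ]


# Toy widths (hypothetical; real encoders and LLMs are far wider).
VISION_DIM, LLM_DIM = 8, 16
weight, bias = make_projection(VISION_DIM, LLM_DIM)

# Token count at each training resolution mentioned in the notes.
for resolution in (224, 448, 896):
    print(resolution, num_image_tokens(resolution))

# Stand-in for the vision encoder output at 224x224: one vector per patch.
vision_tokens = [[1.0] * VISION_DIM for _ in range(num_image_tokens(224))]
llm_inputs = project(vision_tokens, weight, bias)
print(len(llm_inputs), len(llm_inputs[0]))
```

Under the 14-pixel patch assumption, doubling the side length quadruples the number of image tokens the LLM has to attend over, which is one plausible reason the higher-resolution stages use far fewer examples than the 224x224 stage.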