Vision-Language Pre-Training
https://github.com/salesforce/lavis
https://github.com/uta-smile/TCL
https://github.com/YehLi/xmodaler
https://github.com/sangminwoo/awesome-vision-and-language
Masked Vision and Language Modeling for Multi-modal Representation Learning (2022-08-03)
GitHub - guilk/VLC: Research code for "Training Vision-Language Transformers from Captions Alone"
GitHub - RERV/UniAdapter
![[Screen Shot 2023-04-19 at 1.13.36 PM.png]]
Contrastive
CLIP
GitHub - baaivision/EVA: EVA Series: Vision Foundation Model Fanatics from BAAI
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
[2311.17049] MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
[2307.16634] CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
[2309.05551] OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data
Vision to Language
- [2306.07915v3] Image Captioners Are Scalable Vision Learners Too
- [2311.03079] CogVLM: Visual Expert for Pretrained Language Models