Vision-Language Pre-Training

GitHub - microsoft/BridgeTower: Open source code for AAAI 2023 Paper “BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning”

GitHub - microsoft/react

https://github.com/salesforce/lavis

https://github.com/uta-smile/TCL

https://github.com/YehLi/xmodaler

https://github.com/sangminwoo/awesome-vision-and-language

Masked Vision and Language Modeling for Multi-modal Representation Learning (2022-08-03)

GitHub - facebookresearch/CiT: Code for the paper titled “CiT: Curation in Training for Effective Vision-Language Data”.

GitHub - guilk/VLC: Research code for “Training Vision-Language Transformers from Captions Alone”

GitHub - OliverRensu/TinyMIM

GitHub - RERV/UniAdapter

Contrastive

CLIP
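As a quick reference for the contrastive objective CLIP popularized, here is a minimal sketch of a symmetric InfoNCE loss over a batch of paired image/text embeddings. This is an illustrative NumPy sketch, not the official CLIP implementation; the function name, shapes, and the temperature value are assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over N matched image/text pairs.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched pair.
    Illustrative sketch only; temperature=0.07 is a common but assumed default.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (N, N) similarity logits; matched pairs lie on the diagonal.
    logits = img @ txt.T / temperature
    labels = np.arange(len(img))

    def xent(l):
        # Cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings (e.g. identical orthonormal rows) the loss approaches zero; mismatched batches score higher, which is what drives the encoders to align the two modalities.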

Vision-Language Models

[Stanford CS25: V4 I From Large Language Models to Large Multimodal Models - YouTube](https://www.youtube.com/watch?v=cYfKQ6YG9Qo)

CapPa

Datasets