Early-Fusion Multimodal Encoder Models

All-to-all pretraining
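
A minimal sketch of one way to read this, assuming PyTorch: paired examples are embedded per modality by the shared encoder, and an InfoNCE-style term is summed over every ordered modality pair. The function name, the contrastive choice of objective, and the temperature are all assumptions, not a fixed design.

```python
import itertools
import torch
import torch.nn.functional as F

def all_to_all_contrastive_loss(pooled: dict[str, torch.Tensor], temperature: float = 0.07):
    """Sum an InfoNCE term over every ordered pair of modalities.

    `pooled` maps modality name -> (batch, dim) pooled embeddings coming out of
    the shared early-fusion encoder (hypothetical upstream code).
    """
    losses = []
    for a, b in itertools.permutations(pooled.keys(), 2):
        za = F.normalize(pooled[a], dim=-1)
        zb = F.normalize(pooled[b], dim=-1)
        logits = za @ zb.t() / temperature               # (batch, batch) similarities
        targets = torch.arange(za.size(0), device=za.device)
        losses.append(F.cross_entropy(logits, targets))  # matched pairs sit on the diagonal
    return torch.stack(losses).mean()

# usage sketch: pooled embeddings of the same batch of paired examples
# loss = all_to_all_contrastive_loss({"text": t, "image": i, "audio": a})
```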

MoE with default-active experts for each modality
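
One reading of "default-active experts" is that each modality owns an expert that always fires for its own tokens, with top-k shared experts routed on top. A dense (non-dispatched) PyTorch sketch under that assumption; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDefaultMoE(nn.Module):
    """MoE block where every modality owns one always-active default expert,
    and a router adds top-k shared experts on top (dense, illustrative sketch)."""

    def __init__(self, dim: int, n_modalities: int, n_shared: int, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        make_ffn = lambda: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.default_experts = nn.ModuleList([make_ffn() for _ in range(n_modalities)])
        self.shared_experts = nn.ModuleList([make_ffn() for _ in range(n_shared)])
        self.router = nn.Linear(dim, n_shared)

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); modality_ids: (batch, seq) integer modality id per token
        out = torch.zeros_like(x)
        for m, expert in enumerate(self.default_experts):
            mask = (modality_ids == m).unsqueeze(-1)       # default expert always fires
            out = out + mask * expert(x)                   # for its own modality's tokens
        weights = F.softmax(self.router(x), dim=-1)        # (batch, seq, n_shared)
        topw, topi = weights.topk(self.top_k, dim=-1)
        for e, expert in enumerate(self.shared_experts):
            for k in range(self.top_k):
                mask = (topi[..., k] == e).unsqueeze(-1)   # tokens routed to shared expert e
                out = out + mask * topw[..., k:k + 1] * expert(x)
        return out
```

Every expert is evaluated on every token here for clarity; an actual MoE layer would dispatch tokens to experts instead.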

Block-level router with a choice of operators (conv, attention, etc.)
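
A sketch of a block-level operator router, assuming PyTorch and soft per-sequence routing over three candidate operators (depthwise conv, self-attention, MLP); a hard top-1 choice would be a small variation. Everything here is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedBlock(nn.Module):
    """One encoder block that mixes candidate operators (conv / attention / MLP)
    with weights produced by a learned router (soft routing sketch)."""

    def __init__(self, dim: int, n_heads: int = 8, kernel_size: int = 7):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.router = nn.Linear(dim, 3)   # one logit per candidate operator

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        gates = F.softmax(self.router(h.mean(dim=1)), dim=-1)    # (batch, 3), per-sequence routing
        conv_out = self.conv(h.transpose(1, 2)).transpose(1, 2)  # (batch, seq, dim)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        mlp_out = self.mlp(h)
        mixed = (gates[:, 0, None, None] * conv_out
                 + gates[:, 1, None, None] * attn_out
                 + gates[:, 2, None, None] * mlp_out)
        return x + mixed                                         # residual connection
```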

Large register bank
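
Registers in the sense of learned, input-independent tokens prepended to the sequence and discarded at the output; "large" just means many of them. A minimal PyTorch sketch with a placeholder count.

```python
import torch
import torch.nn as nn

class RegisterBank(nn.Module):
    """Prepend a large bank of learned register tokens to the fused sequence,
    then strip them before producing outputs (sketch; sizes are illustrative)."""

    def __init__(self, dim: int, n_registers: int = 1024):
        super().__init__()
        self.registers = nn.Parameter(torch.randn(n_registers, dim) * 0.02)

    def prepend(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) -> (batch, n_registers + seq, dim)
        regs = self.registers.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([regs, tokens], dim=1)

    def strip(self, tokens: torch.Tensor) -> torch.Tensor:
        # drop the register positions after the encoder has run
        return tokens[:, self.registers.size(0):]

# usage sketch (encoder is whatever early-fusion stack sits in the middle):
# x = bank.prepend(fused_tokens); x = encoder(x); outputs = bank.strip(x)
```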

Conv encoders over characters and pixels (a separate encoder per modality)
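
A sketch of the two separate conv front-ends, assuming PyTorch: a 1D conv stack over character/byte embeddings and a 2D conv stem over pixels, both mapping into the same token width so their outputs can be concatenated into one fused sequence. Vocabulary size, strides, and widths are placeholders.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """1D conv encoder over raw characters/bytes -> (batch, seq', dim) tokens."""
    def __init__(self, vocab_size: int = 256, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.convs = nn.Sequential(                      # each stride-2 conv halves the length
            nn.Conv1d(dim, dim, kernel_size=5, stride=2, padding=2), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=5, stride=2, padding=2), nn.GELU(),
        )

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(char_ids).transpose(1, 2)         # (batch, dim, chars)
        return self.convs(x).transpose(1, 2)             # (batch, ~chars // 4, dim)

class PixelConvEncoder(nn.Module):
    """2D conv stem over pixels -> (batch, n_patches, dim) tokens."""
    def __init__(self, dim: int = 512, patch: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.stem(images)                            # (batch, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)              # (batch, n_patches, dim)

# fused early-fusion input: one sequence holding both modalities' tokens
# tokens = torch.cat([CharConvEncoder()(chars), PixelConvEncoder()(imgs)], dim=1)
```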

Distill from an ensemble of SOTA teacher models
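
A sketch of multi-teacher feature distillation, assuming PyTorch, precomputed teacher embeddings, and a per-teacher projection head; the cosine loss and the teacher names in the usage comment are placeholders, not a claim about which models to distill from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherDistillHead(nn.Module):
    """Project student features into each teacher's embedding space and
    regress them against frozen teacher features (illustrative sketch)."""

    def __init__(self, student_dim: int, teacher_dims: dict[str, int]):
        super().__init__()
        self.proj = nn.ModuleDict({name: nn.Linear(student_dim, d)
                                   for name, d in teacher_dims.items()})

    def forward(self, student_feats: torch.Tensor,
                teacher_feats: dict[str, torch.Tensor]) -> torch.Tensor:
        # student_feats: (batch, student_dim) pooled student embedding
        # teacher_feats[name]: (batch, teacher_dims[name]) precomputed, no grad needed
        losses = []
        for name, target in teacher_feats.items():
            pred = self.proj[name](student_feats)
            losses.append(1 - F.cosine_similarity(pred, target.detach(), dim=-1).mean())
        return torch.stack(losses).mean()

# usage sketch; the teacher names and dims are hypothetical placeholders
# head = MultiTeacherDistillHead(512, {"teacher_a": 768, "teacher_b": 1024, "teacher_c": 1280})
```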