Universal adapters or a shared embedding space for popular foundation models

Goals

Train adapters that align the embedding spaces of several popular foundation models
Potentially define one universal embedding space, with per-model encoders and decoders that project into and out of it
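
One way to picture the adapter goal is a per-model linear map fitted on paired embeddings. The sketch below is only an illustration under stated assumptions: the adapter is linear, paired examples exist (the same inputs embedded by a model and in the shared space), and training is plain stochastic gradient descent on squared error. The names `project`, `fit_linear_adapter`, `TRUE_MAP`, and the toy vectors are all hypothetical, not part of any real model or library.

```python
import random


def project(W, x):
    """Apply a linear adapter W (one row per shared-space dim) to embedding x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_adapter(sources, targets, lr=0.05, epochs=500, seed=0):
    """Fit W by SGD so that project(W, x) approximates y for each pair (x, y)."""
    rng = random.Random(seed)
    d_in, d_out = len(sources[0]), len(targets[0])
    # Small random initialisation; shape is (shared dims) x (native dims).
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    for _ in range(epochs):
        for x, y in zip(sources, targets):
            pred = project(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    # Gradient of 0.5 * err**2 with respect to W[i][j].
                    W[i][j] -= lr * err * x[j]
    return W


# Made-up paired data: a 3-dim "model A" space and a 2-dim shared space.
# TRUE_MAP only generates consistent toy targets; in practice the targets
# might come from an anchor model chosen to define the shared space.
TRUE_MAP = [[1.0, 0.5, 0.0], [0.0, -1.0, 2.0]]
model_a_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
shared_targets = [project(TRUE_MAP, x) for x in model_a_embeddings]

adapter_a = fit_linear_adapter(model_a_embeddings, shared_targets)
max_error = max(
    abs(p - t)
    for x, y in zip(model_a_embeddings, shared_targets)
    for p, t in zip(project(adapter_a, x), y)
)
print(f"max alignment error: {max_error:.2e}")
```

A linear map is the simplest possible choice here. Real foundation models may need nonlinear adapters, and a decoder back into each native space would be fitted the same way with sources and targets swapped.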

This would allow task-specific model routing and simplify multimodal model serving, since downstream components could consume a single shared representation
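
The routing idea can be sketched as a small registry that picks a model per task and always returns a vector in the shared space. Everything here is an assumption made for illustration: `AdaptedModel`, `Router`, the task names, the toy encoders, and the hand-written adapter matrices are hypothetical stand-ins, not real models or a real serving API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class AdaptedModel:
    name: str
    encode: Callable[[str], Vector]  # stand-in for a model's native encoder
    adapter: List[List[float]]       # linear map: native space -> shared space

    def embed_shared(self, text: str) -> Vector:
        native = self.encode(text)
        return [sum(w * v for w, v in zip(row, native)) for row in self.adapter]


@dataclass
class Router:
    default: str
    models: Dict[str, AdaptedModel] = field(default_factory=dict)
    routes: Dict[str, str] = field(default_factory=dict)

    def register(self, model: AdaptedModel, tasks: List[str]) -> None:
        self.models[model.name] = model
        for task in tasks:
            self.routes[task] = model.name

    def embed(self, task: str, text: str) -> Tuple[str, Vector]:
        # Unknown tasks fall back to the default model.
        name = self.routes.get(task, self.default)
        return name, self.models[name].embed_shared(text)


# Toy encoders with different native dimensions (2 and 3).
model_a = AdaptedModel(
    name="model_a",
    encode=lambda t: [float(len(t)), 1.0],
    adapter=[[0.1, 0.0], [0.0, 1.0]],
)
model_b = AdaptedModel(
    name="model_b",
    encode=lambda t: [float(len(t)), float(t.count(" ")), 1.0],
    adapter=[[0.1, 0.0, 0.0], [0.0, 0.5, 0.5]],
)

router = Router(default="model_a")
router.register(model_a, tasks=["search"])
router.register(model_b, tasks=["captioning"])

for task in ["search", "captioning", "unknown"]:
    chosen, vec = router.embed(task, "hello world")
    print(task, "->", chosen, "shared dims:", len(vec))
```

Because both models return vectors of the same dimension, whatever consumes the output does not need to know which model served the request. That is the serving simplification the goal describes.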