pose multimodal learning as a listwise ranking problem where all modalities are ranked together
potentially use late interaction colbert style for fine grained alignment
hard negative mining + synthetic data
Nov 20, 20241 min read
pose multimodal learning as a listwise ranking problem where all modalities are ranked together
potentially use late interaction colbert style for fine grained alignment
hard negative mining + synthetic data