pose multimodal learning as a listwise ranking problem where all modalities are ranked together
potentially use ColBERT-style late interaction for fine-grained alignment
hard negative mining + synthetic data
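
A minimal sketch of how the three ideas above could fit together, not a definitive implementation. It assumes every modality has already been encoded into token-level embeddings in one shared space, so a query can be scored against text, image, and audio candidates in the same list. The function names (`maxsim_score`, `listwise_loss`, `hardest_negatives`) and the toy vectors are hypothetical, and softmax cross-entropy is only one common choice of listwise loss. Synthetic data generation is not covered.

```python
import math


def maxsim_score(query_tokens, cand_tokens):
    """ColBERT-style late interaction (MaxSim).

    For each query token embedding, take the largest dot product against
    any candidate token embedding, then sum those maxima over query tokens.
    """
    return sum(
        max(sum(q * c for q, c in zip(q_vec, c_vec)) for c_vec in cand_tokens)
        for q_vec in query_tokens
    )


def listwise_loss(scores, positive_index):
    """Softmax cross-entropy over one candidate list.

    The whole list is normalized jointly, so the positive competes with
    every other candidate regardless of modality.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


def hardest_negatives(scores, positive_index, k):
    """Indices of the k highest-scoring candidates that are not the positive."""
    negatives = [i for i in range(len(scores)) if i != positive_index]
    return sorted(negatives, key=lambda i: scores[i], reverse=True)[:k]


# Toy example: one query, three candidates from different modalities.
# Each item is a list of 2-d token (or patch, or frame) embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("image", [[1.0, 0.0], [0.0, 1.0]]),  # assumed to be the positive
    ("text", [[1.0, 0.0], [0.5, 0.5]]),
    ("audio", [[0.0, 0.0], [0.5, 0.0]]),
]

scores = [maxsim_score(query, tokens) for _, tokens in candidates]
loss = listwise_loss(scores, positive_index=0)
hard = hardest_negatives(scores, positive_index=0, k=1)

print(scores)                 # one score per candidate, all modalities in one list
print(round(loss, 4))         # listwise loss for this query
print(candidates[hard[0]][0])  # modality of the hardest negative
```

In a training loop, the indices returned by `hardest_negatives` would be used to pick which negatives to keep in the next candidate list, so the listwise loss is computed against the candidates the model currently confuses with the positive.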
Jan 21, 2025