The Sentence Transformers library now supports training and finetuning multimodal embedding and reranker models. Through a single API, developers can align text and images in a shared vector space, which simplifies building visual search systems and lets practitioners deploy custom multimodal retrieval pipelines with less boilerplate code.
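To make the idea of a shared vector space concrete, here is a minimal sketch of text-to-image retrieval by cosine similarity. It does not call the Sentence Transformers API: the three-dimensional embeddings, the file names, and the helper `rank_images` are hypothetical stand-ins chosen for illustration. In a real pipeline the vectors would come from a multimodal model that encodes both the text query and the images.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (image_id, score) pairs sorted from most to least similar.

    Hypothetical helper: it assumes the query and the images were embedded
    into the same vector space, so their vectors are directly comparable.
    """
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made toy vectors (assumption): real embeddings have hundreds of
# dimensions and are produced by a trained multimodal model.
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a text query such as "a photo of a cat".
text_query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(text_query_embedding, image_embeddings)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

Because both modalities live in one space, a text query can be compared against image vectors directly, with no separate cross-modal matching step. Training or finetuning aims to move matching text and image pairs closer together under this kind of similarity measure.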