The latest Sentence Transformers update enables training and finetuning for multimodal embedding and reranker models. This framework streamlines how developers align text and image representations within a single vector space. It simplifies the pipeline for building visual search and retrieval systems. Practitioners can now implement complex multimodal retrieval with significantly less boilerplate code.