The latest Sentence Transformers update adds native support for training multimodal embedding and reranker models. Users can fine-tune models that align text and images in a shared vector space through a unified API, which reduces the boilerplate code required for complex retrieval tasks. As a result, practitioners can build and deploy multimodal retrieval-augmented generation (RAG) systems with less custom code.
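
To make "aligning text and images within a shared vector space" concrete, here is a minimal pure-Python sketch of one common way such alignment is trained: an in-batch contrastive objective, where each caption should score highest against its own image and the other images in the batch serve as negatives. This is an illustration under stated assumptions, not the library's API. The function names (`cosine`, `in_batch_contrastive_loss`, `retrieve`), the `scale` value, and the toy 3-dimensional vectors are all hypothetical stand-ins for what a real text and image encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(text_embs, image_embs, scale=20.0):
    """Mean cross-entropy where text i should match image i.

    Every other image in the batch acts as a negative for text i.
    `scale` is an illustrative temperature-like multiplier (assumption).
    """
    losses = []
    for i, text in enumerate(text_embs):
        logits = [scale * cosine(text, image) for image in image_embs]
        # Numerically stable log-sum-exp over the batch of images.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def retrieve(query_emb, image_embs):
    """Index of the image whose embedding is closest to the text query."""
    return max(range(len(image_embs)),
               key=lambda j: cosine(query_emb, image_embs[j]))


# Toy embeddings (hypothetical): two captions and two images.
text_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]
aligned_images = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.2]]     # image i matches text i
misaligned_images = [[0.1, 0.8, 0.2], [0.9, 0.2, 0.0]]  # pairs swapped

aligned_loss = in_batch_contrastive_loss(text_embs, aligned_images)
misaligned_loss = in_batch_contrastive_loss(text_embs, misaligned_images)

print(f"aligned loss:    {aligned_loss:.4f}")
print(f"misaligned loss: {misaligned_loss:.4f}")
print("best image for caption 0:", retrieve(text_embs[0], aligned_images))
```

Fine-tuning pushes the loss down by moving matching text and image embeddings closer together. Once they share a space, a text query can retrieve images with a plain nearest-neighbor search, which is the retrieval step a multimodal RAG pipeline relies on.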