The latest Sentence Transformers update adds support for multimodal embedding and reranker models. The embedding models map images and text into a shared vector space, so a text query can be compared directly with image embeddings (and vice versa) for cross-modal retrieval. Developers can therefore build cross-modal search through a unified API, which simplifies the pipeline for retrieval-augmented generation (RAG) systems that need both visual and textual understanding.
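The two-stage pattern this enables (retrieve candidates by similarity in the shared space, then rescore a shortlist with a reranker) can be sketched in plain Python. This is a minimal, library-free illustration under stated assumptions: the 3-dimensional vectors are toy stand-ins for what a multimodal embedding model would output for images and text, and `toy_pair_scorer` is a hypothetical placeholder for a reranker model, not part of any Sentence Transformers API. In the library itself, stage 1 would use the embeddings returned by a model's `encode` method and stage 2 a reranker's pairwise scores; which classes and input types the new multimodal models accept should be checked against the release documentation.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Ranked = List[Tuple[str, float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Vector, corpus: Dict[str, Vector], top_k: int) -> Ranked:
    """Stage 1: score every item (image or text) in the shared space, keep the top_k."""
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rerank(query: str, shortlist: Ranked, score_pair: Callable[[str, str], float]) -> Ranked:
    """Stage 2: re-score only the shortlisted items with a pairwise scorer."""
    rescored = [(item_id, score_pair(query, item_id)) for item_id, _ in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def _words(text: str) -> Set[str]:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_pair_scorer(query: str, item_id: str) -> float:
    """HYPOTHETICAL stand-in for a reranker model: counts words shared with the item id."""
    return float(len(_words(query) & _words(item_id)))


# Toy 3-d vectors standing in for model output: images and captions share one space.
corpus = {
    "img:dog_on_beach.jpg": [0.9, 0.1, 0.0],
    "img:city_at_night.jpg": [0.0, 0.2, 0.9],
    "txt:A dog playing fetch in the sand": [0.7, 0.4, 0.1],
    "txt:Skyscrapers lit up after dark": [0.1, 0.1, 0.95],
}
query_text = "dog playing fetch on sand"
query_vec = [0.85, 0.2, 0.05]  # assumed embedding of query_text (made up for the example)

shortlist = retrieve(query_vec, corpus, top_k=2)
final = rerank(query_text, shortlist, toy_pair_scorer)
```

Here the similarity search shortlists the beach photo first and the matching caption second, and the toy reranker then promotes the caption because it shares more words with the query. Running the reranker only on the shortlist reflects the usual design choice: pairwise scoring is typically more expensive than comparing precomputed embeddings, so it is applied to a handful of candidates rather than the whole corpus.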