The latest Sentence Transformers update introduces multimodal embedding and reranker models. These tools allow developers to map images and text into a shared vector space for improved retrieval. This integration simplifies the pipeline for building visual search engines. Practitioners can now deploy complex multimodal RAG systems using a single, unified library.