Direct Preference Optimization now applies to vision and audio models, moving past simple text chat. Researchers use this method to align model outputs with human preferences without the complexity of reward models. This shift simplifies the training pipeline. Practitioners can now refine multimodal systems using the same preference-based logic used for LLMs.