Nvidia released Nemotron 3 Nano Omni, an open multimodal model processing text, images, video, and audio. The training set leverages data from Qwen, GPT-OSS, Kimi, and DeepSeek OCR. This transparency reveals the reliance on external datasets for modern multimodal training. Developers can now analyze these specific data sources to optimize their own small-scale models.