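The MixAtlas framework optimizes multimodal data mixtures through systematic domain decomposition and small proxy models. Rather than hand-tuning the proportions of data formats and task types, it decomposes the training corpus into domains and reweights them in a principled way, with small proxy models standing in for the full model during the search. The resulting mixtures improve sample efficiency during midtraining. Because candidate mixtures are scored on small proxies rather than full-scale runs, practitioners can refine multimodal datasets with significantly less compute than trial-and-error tuning requires.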
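The exact search procedure MixAtlas uses is not described here, so the following is only a minimal sketch of the general idea: decompose data into domains, then score candidate mixtures with a cheap proxy instead of full-scale training. It assumes one domain per (data format, task type) pair and swaps in plain random search over the probability simplex as the reweighting strategy. The domain names and the helpers `decompose_domains`, `search_mixture`, and `synthetic_proxy_loss` are hypothetical, and the synthetic loss is a stand-in for training and evaluating a small proxy model.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Mixture = Dict[str, float]


def decompose_domains(formats: List[str], tasks: List[str]) -> List[str]:
    """Hypothetical decomposition: one domain per (data format, task type) pair."""
    return [f"{fmt}/{task}" for fmt, task in itertools.product(formats, tasks)]


def sample_mixture(domains: List[str], rng: random.Random, alpha: float = 1.0) -> Mixture:
    """Draw weights from a symmetric Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in domains]
    total = sum(draws)
    return {d: x / total for d, x in zip(domains, draws)}


def search_mixture(
    domains: List[str],
    proxy_loss: Callable[[Mixture], float],
    n_candidates: int = 64,
    seed: int = 0,
) -> Tuple[Mixture, float]:
    """Score random candidate mixtures with a cheap proxy and keep the best one."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    rng = random.Random(seed)
    best_mix = sample_mixture(domains, rng)
    best_loss = proxy_loss(best_mix)  # stands in for a small proxy-model run
    for _ in range(n_candidates - 1):
        mix = sample_mixture(domains, rng)
        loss = proxy_loss(mix)
        if loss < best_loss:
            best_mix, best_loss = mix, loss
    return best_mix, best_loss


# Hypothetical stand-in for a proxy-model run: a synthetic loss that is lowest
# at an arbitrary "ideal" mixture. Real usage would train a small model instead.
DOMAINS = decompose_domains(["image-text", "interleaved"], ["captioning", "vqa"])
_TARGET = dict(zip(DOMAINS, [0.4, 0.3, 0.2, 0.1]))


def synthetic_proxy_loss(mix: Mixture) -> float:
    return sum((mix[d] - _TARGET[d]) ** 2 for d in DOMAINS)


if __name__ == "__main__":
    best, loss = search_mixture(DOMAINS, synthetic_proxy_loss, n_candidates=256)
    for domain, weight in sorted(best.items(), key=lambda kv: -kv[1]):
        print(f"{domain:24s} {weight:.3f}")
    print(f"proxy loss: {loss:.4f}")
```

In a real setting each proxy evaluation is a short training run, so the candidate budget dominates cost. A more sample-efficient search, for example fitting a regression from mixture weights to proxy loss and optimizing the fitted model, could replace the random sampling loop without changing the overall structure.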