The MixAtlas framework optimizes multimodal LLM midtraining by using small proxy models to determine effective data mixtures before committing full-scale compute. It replaces manual tuning with systematic domain decomposition, improving sample efficiency and addressing the open question of how to balance data from different formats. Practitioners can thus refine training recipes with far less compute overhead.
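The core idea described above — decomposing the corpus into domains and using a cheap proxy to score candidate mixtures — can be sketched as a simple search loop. This is a minimal illustration, not the MixAtlas implementation: the domain names, the grid step, and especially `proxy_loss` (a stand-in for actually training and evaluating a small proxy model on each mixture) are all hypothetical.

```python
import itertools

# Hypothetical domain decomposition of the midtraining corpus.
DOMAINS = ["text", "image_text", "code"]

def proxy_loss(weights):
    # Stand-in for the expensive step: in a real pipeline this would
    # train a small proxy model on data sampled with these weights and
    # return its validation loss. Here a hypothetical optimum is assumed.
    target = {"text": 0.5, "image_text": 0.3, "code": 0.2}
    return sum((weights[d] - target[d]) ** 2 for d in DOMAINS)

def candidate_mixtures(step=0.1):
    # Enumerate mixture weight vectors over the domains that sum to 1.
    ticks = [round(i * step, 10) for i in range(int(1 / step) + 1)]
    for combo in itertools.product(ticks, repeat=len(DOMAINS)):
        if abs(sum(combo) - 1.0) < 1e-9:
            yield dict(zip(DOMAINS, combo))

def best_mixture():
    # Pick the mixture whose proxy achieves the lowest loss; this
    # mixture is then used for the full-scale midtraining run.
    return min(candidate_mixtures(), key=proxy_loss)
```

The expensive-to-cheap substitution is the point: each `proxy_loss` call is small enough to run many times, so the mixture search that would be infeasible at full model scale becomes a routine sweep.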