Training on power-law-distributed data outperforms uniformly distributed data on multi-step arithmetic and state-tracking tasks. In a recent arXiv preprint, researchers found that this skew lets models learn compositional skills with significantly less data, contradicting the common intuition that data curation should aim for a uniform distribution. Practitioners should reconsider aggressive data balancing for reasoning tasks.
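To make the contrast concrete, here is a minimal sketch of what power-law versus uniform sampling over a task pool looks like. The task pool, the exponent `alpha`, and the Zipf-style weighting `p(k) ∝ k^(-alpha)` are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of compositional tasks, indexed by difficulty 1..10
# (e.g. number of reasoning steps). Purely illustrative.
tasks = np.arange(1, 11)

# Power-law (Zipf-style) sampling weights: p(k) proportional to k^(-alpha).
alpha = 1.5
weights = tasks.astype(float) ** -alpha
weights /= weights.sum()

# Draw training batches under each distribution.
power_law_batch = rng.choice(tasks, size=10_000, p=weights)
uniform_batch = rng.choice(tasks, size=10_000)

# Count how often each task appears in each batch.
power_law_counts = np.bincount(power_law_batch, minlength=11)[1:]
uniform_counts = np.bincount(uniform_batch, minlength=11)[1:]

# The power-law batch concentrates mass on a few head tasks, while the
# uniform batch spreads it evenly; the paper's claim is that the skewed
# regime yields better compositional generalization per training example.
print("power-law:", power_law_counts)
print("uniform:  ", uniform_counts)
```

The skew is the whole point: under the power-law batch, the most frequent task appears many times more often than the rarest one, whereas the uniform batch gives every task roughly equal count.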