Training on power-law-distributed data can outperform uniform sampling for compositional tasks such as multi-step arithmetic. Researchers found that models acquire compositional reasoning with substantially less data when the underlying knowledge follows this skewed distribution, in which a few items appear very frequently while a long tail appears rarely. The result is counterintuitive because datasets are commonly curated toward uniformity on the assumption that balanced coverage aids generalization. Practitioners may therefore want to revisit aggressive data-balancing strategies when optimizing for sample efficiency on compositional tasks.
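To make the contrast concrete, here is a minimal sketch of what sampling training items under a power-law versus a uniform distribution looks like. The Zipf-style exponent `alpha` and the helper names are illustrative choices, not part of the cited work; setting `alpha = 0` recovers uniform sampling.

```python
import numpy as np

def power_law_weights(n_items: int, alpha: float = 1.0) -> np.ndarray:
    """Zipf-like weights: item at rank r gets probability proportional to r**(-alpha)."""
    ranks = np.arange(1, n_items + 1)
    weights = ranks.astype(float) ** (-alpha)
    return weights / weights.sum()

def sample_training_set(n_items: int, n_samples: int, alpha: float, seed: int = 0) -> np.ndarray:
    """Draw item indices under a power-law (alpha > 0) or uniform (alpha = 0) distribution."""
    rng = np.random.default_rng(seed)
    probs = power_law_weights(n_items, alpha)
    return rng.choice(n_items, size=n_samples, p=probs)

# Under alpha = 1, a handful of "head" items dominate the sample while the
# long tail is still covered; under alpha = 0, every item is equally likely.
power_law_draws = sample_training_set(1000, 100_000, alpha=1.0)
uniform_draws = sample_training_set(1000, 100_000, alpha=0.0)
```

The practical difference is that the power-law curriculum repeatedly exposes the model to a small core of high-frequency items while still touching the tail, rather than spreading exposure evenly across all items.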