When training data exceeds a model's memorization capacity, factual accuracy degrades. Researchers at Apple found that pruning redundant information from the training set improves a model's ability to memorize specific facts. This information-theoretic approach reduces hallucinations by shaping the data distribution to fit the model's capacity. Practitioners can refine datasets to maximize parameter efficiency without increasing model size.
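One simple form of redundancy pruning is exact-duplicate removal after light text normalization. The sketch below is illustrative only: the normalization rules and the hashing scheme are assumptions for the example, not the method the Apple researchers used.

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.lower()).strip()

def prune_duplicates(records: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized record; drop repeats."""
    seen = set()
    kept = []
    for rec in records:
        digest = hashlib.sha256(normalize(rec).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(rec)
    return kept

corpus = [
    "Paris is the capital of France.",
    "paris is  the capital of France.",   # trivial variant of the first record
    "The Eiffel Tower is in Paris.",
]
pruned = prune_duplicates(corpus)
# The variant is dropped; two distinct facts remain for the model to memorize.
```

In practice, near-duplicate detection (e.g., MinHash over shingles) catches paraphrased redundancy that exact hashing misses, at higher computational cost.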