Apple researchers found that an LLM's factual accuracy degrades once the volume of facts in its training data exceeds the model's storage capacity. By formalizing fact memorization in information-theoretic terms, they argue that past this limit, less is more: facts compete for a fixed budget of bits, so each one is stored less reliably. Pruning redundant data keeps the total information load within that budget, which means practitioners can improve knowledge retention by shrinking the dataset rather than by increasing the parameter count.
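A toy sketch can make the capacity argument concrete. Everything below is an illustrative assumption rather than the paper's actual model: the fixed bits-per-parameter budget, the fixed information cost per fact, and the rule that recall is capped by the ratio of capacity to demand are all simplifications chosen for this example, and the constants and function names are hypothetical.

```python
"""Toy model of the capacity argument (illustrative assumptions, not the paper's method):
  - the model retains roughly CAPACITY_BITS_PER_PARAM bits of facts per parameter;
  - each distinct fact costs a fixed BITS_PER_FACT bits to store;
  - once demand exceeds capacity, recall is capped at capacity / demand.
"""

CAPACITY_BITS_PER_PARAM = 2.0  # assumed constant for this sketch
BITS_PER_FACT = 32.0           # assumed average information content of one fact


def expected_recall(num_params: int, num_facts: int) -> float:
    """Fraction of facts the toy model can retain under its bit budget."""
    capacity_bits = CAPACITY_BITS_PER_PARAM * num_params
    demand_bits = BITS_PER_FACT * num_facts
    return min(1.0, capacity_bits / demand_bits)


def prune_duplicates(facts: list[str]) -> list[str]:
    """Dataset pruning in its crudest form: drop exact duplicates so
    redundant copies stop competing for the same capacity budget."""
    return list(dict.fromkeys(facts))  # preserves first-seen order


if __name__ == "__main__":
    params = 10_000_000
    # 500k redundant copies of one fact plus 1M distinct facts.
    raw = ["Paris is the capital of France"] * 500_000 + [
        f"fact #{i}" for i in range(1_000_000)
    ]
    deduped = prune_duplicates(raw)
    print(f"raw:    {len(raw):>9,} facts -> recall {expected_recall(params, len(raw)):.2f}")
    print(f"pruned: {len(deduped):>9,} facts -> recall {expected_recall(params, len(deduped)):.2f}")
```

Under these assumptions, dropping the 500,000 redundant copies raises the toy recall from 0.42 to 0.62 without touching the parameter count, which is the shape of the trade-off the paper describes: the same capacity stretched over fewer, non-redundant facts stores each one more reliably.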