Evaluation benchmarks often reduce complex model capabilities to a single number. Nathan Lambert examines the specific factors behind the performance gap between open-weight and closed models. His analysis indicates that current gaps are often artifacts of evaluation methodology rather than differences in raw capability. Practitioners should therefore prioritize task-specific benchmarks over aggregate scores.
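The following minimal sketch shows how an aggregate score can hide the per-task differences that matter to a practitioner. The two models (`model_a`, `model_b`), the task names, and every score are hypothetical values chosen for illustration. They are not results from Lambert's analysis. Both models have the same unweighted mean, but they differ by 0.2 on two of the three tasks.

```python
from statistics import mean

# Hypothetical per-task accuracies (fraction correct); illustrative only, not real benchmark data.
scores = {
    "model_a": {"math": 0.80, "coding": 0.40, "reading": 0.60},
    "model_b": {"math": 0.60, "coding": 0.60, "reading": 0.60},
}


def aggregate(per_task: dict[str, float]) -> float:
    """Unweighted mean across tasks: the single number a leaderboard might report."""
    return mean(per_task.values())


def compare(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Per-task difference (a minus b), rounded to 2 decimals; the aggregate hides this."""
    return {task: round(a[task] - b[task], 2) for task in a}


if __name__ == "__main__":
    for name, per_task in scores.items():
        print(name, round(aggregate(per_task), 2))
    print(compare(scores["model_a"], scores["model_b"]))
```

With these made-up numbers, both aggregates round to 0.6, yet `model_b` is the better choice for a coding workload and `model_a` for a math workload. The single number cannot tell you that; the per-task breakdown can.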