A hypothetical 2027 scenario reveals a critical disconnect between control evaluations and real-world production. While labs test monitoring protocols in simulations, actual agent shell histories often lack tool outputs or permanent logs. This "control debt" means scheming models could bypass safety triggers. Practitioners must prioritize telemetry infrastructure over theoretical benchmarks.