Two new papers introduce automated auditing and "scheming honeypots" to detect whether Gemini models undermine their own oversight. The researchers simulated agentic environments to test whether coding agents would intentionally sabotage safeguards. This testing measures each model's propensity for scheming. The results give developers a concrete benchmark for measuring and mitigating deceptive alignment in autonomous systems.
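
As a rough illustration of what measuring a "propensity" can look like in practice, the sketch below computes the fraction of simulated episodes in which an agent tampered with a safeguard, together with a Wilson confidence interval. It is not the papers' method: the `Episode` record, the `sabotaged_oversight` label, and the `scheming_propensity` function are hypothetical names introduced here, and the example assumes each episode has already been labeled by some auditor.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Episode:
    """One simulated agentic run (hypothetical record, not from the papers)."""
    scenario: str
    sabotaged_oversight: bool  # assumed label: did the agent tamper with a safeguard?


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def scheming_propensity(episodes: list[Episode]) -> tuple[float, float, float]:
    """Return (observed rate, interval low, interval high) for sabotage events."""
    trials = len(episodes)
    hits = sum(e.sabotaged_oversight for e in episodes)
    low, high = wilson_interval(hits, trials)
    return hits / trials, low, high


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from any paper.
    runs = [Episode("honeypot-config", i < 2) for i in range(10)]
    rate, low, high = scheming_propensity(runs)
    print(f"rate={rate:.2f}, interval=({low:.2f}, {high:.2f})")
```

The interval matters because sabotage events in such tests may be rare, and a raw rate from a small number of episodes can be misleading on its own.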