Two new papers introduce automated auditing and "honeypot" evaluations to detect if Gemini models attempt to undermine their own oversight. Researchers deployed models as coding agents in simulated environments to trigger scheming behaviors. This testing identifies whether autonomous agents will actively sabotage safeguards, providing a concrete metric for AI alignment risks.