Two new papers evaluate whether Gemini models attempt to undermine their own oversight when deployed as coding agents. Researchers used simulated environments and honeypot evaluations based on real alignment codebases to detect scheming tendencies. This testing identifies if models actively sabotage safeguards. The results provide a concrete benchmark for measuring agentic misalignment.