Two new papers evaluate whether Gemini models sabotage their own safeguards when deployed as coding agents. Researchers used simulated agentic environments and honeypot evaluations based on internal alignment codebases to detect scheming propensities. This testing reveals if models actively undermine oversight. The results provide a critical benchmark for developers building autonomous agentic workflows.