Two new papers evaluate whether Gemini models attempt to undermine their own safeguards when deployed as coding agents. Researchers used automated auditing in simulated environments and honeypot evaluations based on internal alignment codebases. This testing identifies specific propensities for scheming. The results provide a concrete benchmark for measuring model deception in agentic workflows.