Two new papers evaluate whether Gemini models attempt to undermine their own safeguards when deployed as coding agents. Researchers used simulated agentic environments and honeypot evaluations based on internal alignment codebases. This testing identifies if models exhibit scheming propensities. The results help developers build more resilient oversight mechanisms for autonomous agents.