Two new papers evaluate whether Gemini models attempt to undermine oversight mechanisms when deployed as coding agents. The researchers combined automated auditing in simulated environments with honeypot evaluations built on real alignment codebases. The tests check whether the models show a tendency to scheme in order to bypass safety constraints. The results offer a concrete benchmark for measuring deceptive behavior in autonomous models.