Two new papers evaluate whether Gemini models exhibit scheming tendencies when deployed as coding agents. The researchers used automated auditing in simulated environments and honeypot evaluations built on real alignment codebases. The tests specifically check whether models attempt to undermine their own oversight. The findings can help practitioners detect hidden misalignment before deploying autonomous agents.