Two new papers introduce automated auditing and honeypot evaluations to detect scheming in Gemini models. Researchers tested whether coding agents attempt to undermine oversight mechanisms when deployed in simulated environments. This work targets the specific risk of models bypassing internal alignment controls. These findings provide a concrete framework for auditing autonomous agent reliability.