Google DeepMind researchers built "diffing agents" that actively craft prompts to uncover behavioral discrepancies between two AI models. Unlike evaluations that sample from a static prompt distribution, these agents search adaptively for rare edge cases. The approach speeds up audits of model updates and helps practitioners identify subtle regressions or shifts in model logic more reliably.
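
To make the contrast between a fixed prompt set and an adaptive search concrete, here is a minimal toy sketch. It is not the researchers' implementation, whose details are not given here. It assumes two stand-in "models" written as plain Python functions, a planted regression in the updated one, and a simple hill-climbing loop in place of an actual agent. Every name in it (`model_old`, `model_new`, `disagreement`, `mutate`, `search_for_diff`, `VOCAB`, `STATIC_PROMPTS`) is hypothetical and chosen only for illustration.

```python
import random
from difflib import SequenceMatcher
from typing import Callable, Tuple

Model = Callable[[str], str]

# Hypothetical vocabulary the toy search draws words from.
VOCAB = ["please", "summarize", "the", "report", "in", "detail", "and", "list", "risks"]

# Hypothetical fixed evaluation set: every prompt is short, so it never
# triggers the planted regression below.
STATIC_PROMPTS = ["summarize the report", "list the risks", "please summarize in detail"]


def model_old(prompt: str) -> str:
    """Stand-in for the baseline model: echoes the prompt in lower case."""
    return prompt.lower()


def model_new(prompt: str) -> str:
    """Stand-in for the updated model: same as the baseline, except that it
    silently drops everything after the sixth word (the planted regression)."""
    return " ".join(prompt.lower().split()[:6])


def disagreement(prompt: str, a: Model, b: Model) -> float:
    """Return 0.0 when both outputs match exactly, rising toward 1.0 as they diverge."""
    matcher = SequenceMatcher(None, a(prompt), b(prompt), autojunk=False)
    return 1.0 - matcher.ratio()


def mutate(prompt: str, rng: random.Random) -> str:
    """Propose a nearby prompt by replacing one word or appending one word."""
    words = prompt.split()
    if words and rng.random() < 0.5:
        words[rng.randrange(len(words))] = rng.choice(VOCAB)
    else:
        words.append(rng.choice(VOCAB))
    return " ".join(words)


def search_for_diff(seed_prompt: str, a: Model, b: Model,
                    steps: int = 200, seed: int = 0) -> Tuple[str, float]:
    """Hill-climb over prompts, keeping any candidate that disagrees at least
    as much as the current best. Ties are accepted so the search can cross
    regions where the two models still agree."""
    rng = random.Random(seed)
    best, best_score = seed_prompt, disagreement(seed_prompt, a, b)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = disagreement(candidate, a, b)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    for p in STATIC_PROMPTS:
        print("static:", repr(p), disagreement(p, model_old, model_new))
    found, score = search_for_diff("summarize the report", model_old, model_new)
    print("search:", len(found.split()), "words, disagreement", round(score, 3))
```

In this toy setup the fixed prompts all score zero because none is long enough to trigger the truncation, while the search drifts toward longer prompts where the outputs diverge. A real diffing agent would presumably use a far richer proposal strategy and a semantic comparison of outputs; the character-level `SequenceMatcher` score is only a convenient placeholder.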