The new Auditing Sabotage Bench comprises nine sabotaged ML codebases. Neither frontier LLMs nor human-AI teams reliably identified the hidden flaws: Gemini 1.5 Pro, for example, struggled to catch attempts to conceal misalignment or to quietly slow research progress. This failure suggests that automating AI safety audits remains risky if the auditing models themselves are misaligned.