No frontier model scored above 50% on ITBench-AA, a new benchmark for agentic IT tasks. Developed by IBM and Artificial Analysis, the benchmark evaluates complex troubleshooting and system administration. The low scores point to a stark gap between general reasoning ability and specialized technical execution. Practitioners should expect significant failures when deploying these models in autonomous IT workflows.