No frontier model scored above 50% on ITBench-AA, a new benchmark for agentic IT tasks. Developed by IBM and Artificial Analysis, the benchmark evaluates complex workflows such as cloud configuration and troubleshooting. The results point to a significant gap between the general reasoning ability of LLMs and the precision that autonomous enterprise infrastructure management requires.