No frontier model scored above 50% on ITBench-AA, a new benchmark for agentic enterprise IT tasks. Developed by IBM and Artificial Analysis, the benchmark evaluates complex, multi-step workflows such as system administration and troubleshooting. The low scores point to a gap between general reasoning ability and the execution of specialized IT tasks. Practitioners should expect agents to fail often when they manage infrastructure autonomously.