No frontier model scored above 50% on the new ITBench-AA benchmark. Developed by IBM and Artificial Analysis, the benchmark evaluates how well AI agents handle complex IT tasks. The results indicate that current LLMs struggle with multi-step reasoning and tool use in enterprise environments. This gap points to a critical barrier to deploying autonomous agents for technical infrastructure.