No frontier model scored above 50% on ITBench-AA, a new benchmark for agentic IT tasks. Developed by IBM and Artificial Analysis, the benchmark evaluates complex workflows such as cloud configuration and troubleshooting. Current LLMs struggle with the precise tool use and multi-step reasoning these tasks require. The gap points to a significant hurdle for autonomous enterprise deployment.