No leading AI model scored above 50% on ITBench-AA, a new benchmark for agentic enterprise IT tasks. Developed by IBM and Artificial Analysis, the benchmark shows a steep drop in performance when models move from simple chat to complex system administration. The gap suggests that current agents still struggle with reliability on real-world technical work.