A new research paper introduces controllable 2D grid environments for measuring how language model (LM) agents balance exploration and exploitation. The framework uses a directed acyclic graph (DAG) to track decision-making accuracy from an agent's observable actions, without access to its internal policy. This lets developers isolate whether an agent fails because of poor discovery (an exploration error) or flawed execution (an exploitation error).
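
To make the discovery-versus-execution distinction concrete, here is a minimal Python sketch. It is an illustration only, not the paper's actual metric or code: the function names (`ancestors`, `diagnose`), the task names (`key`, `door`, `goal`), and the rule for attributing a failure are all assumptions. The sketch assumes the task is a DAG in which each node lists its prerequisites, and that an episode log records which nodes the agent observed on the grid and which it completed. Only that log is inspected, never the agent's policy.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ancestors(dag, goal):
    """Return the goal plus every node it transitively depends on."""
    seen, stack = set(), [goal]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen


def diagnose(dag, goal, observed, completed):
    """Classify an episode using only logged behavior (hypothetical rule).

    dag:       maps each node to the set of its prerequisite nodes
    observed:  nodes whose grid cells the agent has seen
    completed: nodes the agent has finished
    """
    if goal in completed:
        return ("success", None)
    needed = ancestors(dag, goal)
    # Walk prerequisites-first; the first unfinished node on the path to
    # the goal is where the agent got stuck.
    for node in TopologicalSorter(dag).static_order():
        if node in needed and node not in completed:
            # Seen but not finished: execution (exploitation) failure.
            # Never seen: discovery (exploration) failure.
            kind = "execution" if node in observed else "discovery"
            return (kind, node)


# Hypothetical task: pick up the key, open the door, reach the goal.
dag = {"key": set(), "door": {"key"}, "goal": {"door"}}

never_found_door = diagnose(dag, "goal", observed={"key"}, completed={"key"})
saw_door_but_stalled = diagnose(
    dag, "goal", observed={"key", "door"}, completed={"key"}
)
print(never_found_door, saw_door_but_stalled)
```

In this sketch, the first episode is attributed to discovery because the agent never saw the door, and the second to execution because it saw the door but did not open it. A real evaluation would likely score decisions step by step rather than once per episode.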