A new research paper introduces controllable 2D grid environments for measuring how language model (LM) agents balance exploration and exploitation. The framework uses unknown task directed acyclic graphs (DAGs) to isolate specific decision-making failures, so researchers can quantify errors without access to an agent's internal policy. The result is a concrete benchmark for diagnosing and improving autonomous navigation.
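
To make the idea of a task DAG more concrete, here is a minimal Python sketch. It is not the paper's implementation or its metric. The task names, the `TASK_DAG` structure, and the `count_premature_attempts` measure are all hypothetical, chosen only to illustrate how an evaluator who knows the dependency graph could score an agent purely from its observed sequence of attempts, with no access to the agent's policy. The sketch also assumes the DAG is hidden from the agent but known to the evaluator.

```python
from typing import Dict, List, Set

# Hypothetical task DAG: each task maps to the set of tasks that must be
# completed before it. Assumption: the evaluator sees this graph, the agent does not.
TASK_DAG: Dict[str, Set[str]] = {
    "get_key": set(),
    "open_door": {"get_key"},
    "get_gem": {"open_door"},
}


def available_tasks(dag: Dict[str, Set[str]], done: Set[str]) -> Set[str]:
    """Return tasks not yet done whose prerequisites are all satisfied."""
    return {task for task, prereqs in dag.items()
            if task not in done and prereqs <= done}


def count_premature_attempts(dag: Dict[str, Set[str]], attempts: List[str]) -> int:
    """Count attempts at tasks whose prerequisites were not yet complete.

    This is an illustrative, behavior-only measure: it reads the sequence of
    attempted tasks and never inspects how the agent chose them.
    """
    done: Set[str] = set()
    premature = 0
    for task in attempts:
        if task in done:
            continue  # repeating a finished task is ignored in this toy measure
        if dag[task] <= done:
            done.add(task)  # prerequisites met, so the attempt succeeds
        else:
            premature += 1  # attempted before its prerequisites were done
    return premature


if __name__ == "__main__":
    trajectory = ["open_door", "get_key", "open_door", "get_gem"]
    print(available_tasks(TASK_DAG, set()))
    print(count_premature_attempts(TASK_DAG, trajectory))
```

In this toy trajectory the agent tries `open_door` before `get_key`, which the evaluator counts as one premature attempt. A real benchmark would likely separate several error types and tie them to grid positions, which this sketch does not attempt.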