A new research paper introduces controllable 2D grid maps and task DAGs (directed acyclic graphs) to quantify how agents balance discovering new information against using what they already know, the exploration-exploitation trade-off. The framework measures errors on both sides of that balance from observed actions alone, without access to the agent's internal policy. This gives developers of LM agents a concrete way to benchmark decision-making efficiency in embodied AI scenarios.