A new framework from Apple Machine Learning Research moves tool-calling evaluation into the active execution loop. Instead of assessing a trajectory after the run has finished, a specialized reviewer agent identifies errors during inference. Its immediate feedback lets the agent course-correct in real time, which reduces reliance on iterative prompt-tuning and costly retraining.
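
To make the idea of in-loop review concrete, here is a minimal Python sketch. It is not the framework's actual implementation: `ToolCall`, `TOOL_SCHEMAS`, `review_call`, `run_with_reviewer`, and the toy agent functions are all hypothetical names, and the reviewer is a simple rule check standing in for a reviewer agent. It shows only the control flow: a proposed tool call is reviewed before it runs, and any feedback goes back to the agent so it can revise.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical tool registry: tool name -> required argument names.
TOOL_SCHEMAS = {"get_weather": {"city"}}


def review_call(call: ToolCall) -> Optional[str]:
    """Toy reviewer: return feedback if the call looks wrong, else None."""
    if call.name not in TOOL_SCHEMAS:
        return f"unknown tool '{call.name}'"
    missing = TOOL_SCHEMAS[call.name] - set(call.args)
    if missing:
        return f"missing arguments: {sorted(missing)}"
    return None


def run_with_reviewer(
    propose: Callable[[Optional[str]], ToolCall],
    execute: Callable[[ToolCall], str],
    max_revisions: int = 2,
) -> str:
    """Review each proposed call before execution; feed errors back to the agent."""
    feedback: Optional[str] = None
    for _ in range(max_revisions + 1):
        call = propose(feedback)      # agent sees the reviewer's last feedback
        feedback = review_call(call)  # review happens inside the loop
        if feedback is None:
            return execute(call)      # only approved calls are executed
    raise RuntimeError(f"reviewer rejected every attempt: {feedback}")


# Toy agent (assumption): forgets the argument first, then fixes it on feedback.
def toy_propose(feedback: Optional[str]) -> ToolCall:
    if feedback is None:
        return ToolCall("get_weather", {})
    return ToolCall("get_weather", {"city": "Paris"})


def toy_execute(call: ToolCall) -> str:
    return f"{call.name}({call.args['city']}) executed"


result = run_with_reviewer(toy_propose, toy_execute)
print(result)
```

In this sketch the malformed first call never reaches `toy_execute`; the agent revises it after one round of feedback. A post-hoc evaluation would only have flagged the same error after the run had ended.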