Apple Machine Learning Research has introduced a reviewer agent that evaluates tool-calling trajectories during inference and corrects errors as they occur. Rather than assessing a trajectory after the fact, the approach moves evaluation into the execution loop, which sidesteps slow cycles of prompt tuning and retraining. Practitioners can now deploy agents that correct their own tool selection and parameter values without a full model update.
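The reviewer-in-the-loop idea can be illustrated with a minimal sketch. This is not Apple's method. The reviewer here is a hypothetical rule-based schema check, and `TOOL_SCHEMAS`, `ToolCall`, `review`, `run_with_review`, and `scripted_proposer` are all invented names for illustration. In the work described above, the reviewer is itself an agent. The sketch shows the structural difference from post-hoc evaluation: each proposed tool call is checked before it executes, and any problems go back to the proposer for a revised attempt.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical tool registry: tool name -> required parameter names and their types.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "get_weather": {"city": str},
    "convert_currency": {"amount": float, "from_code": str, "to_code": str},
}


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def review(call: ToolCall) -> list[str]:
    """Hypothetical reviewer: list the problems in a proposed call (empty means approved)."""
    schema = TOOL_SCHEMAS.get(call.name)
    if schema is None:
        return [f"unknown tool '{call.name}'"]  # wrong tool selection
    problems = []
    for param, expected in schema.items():
        if param not in call.args:
            problems.append(f"missing parameter '{param}'")
        elif not isinstance(call.args[param], expected):
            problems.append(f"parameter '{param}' should be {expected.__name__}")
    for param in call.args:
        if param not in schema:
            problems.append(f"unexpected parameter '{param}'")
    return problems


def run_with_review(
    propose: Callable[[list[str]], ToolCall], max_rounds: int = 3
) -> Optional[ToolCall]:
    """Review each proposed call before execution, feeding problems back to the proposer."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        call = propose(feedback)
        feedback = review(call)
        if not feedback:
            return call  # approved: this call would now be executed
    return None  # no acceptable call within max_rounds


def scripted_proposer(feedback: list[str]) -> ToolCall:
    """Stand-in for the acting model: the first attempt passes a string, the revision a float."""
    if not feedback:
        return ToolCall("convert_currency", {"amount": "100", "from_code": "USD", "to_code": "EUR"})
    return ToolCall("convert_currency", {"amount": 100.0, "from_code": "USD", "to_code": "EUR"})


if __name__ == "__main__":
    print(run_with_review(scripted_proposer))
```

Here the first proposal is rejected because `amount` is a string, and the second passes review before anything executes. A model-based reviewer would replace the body of `review`, while the surrounding loop stays the same.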