PORTool uses a rewarded tree to assign step-level rewards to tool-use agents trained with outcome-only supervision. This addresses the credit-assignment problem, in which a single final-outcome reward does not reveal which specific tool call led to a successful result. Apple researchers developed the method to refine how LLMs interleave reasoning with external tool calls. Because each step receives its own learning signal, training on complex, multi-step tasks becomes more efficient.
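
To make the idea of a rewarded tree concrete, the following is a minimal sketch and not PORTool's actual formulation. It assumes that rollouts sharing a common prefix are merged into a tree, that only leaves carry an outcome reward, that a step's value is the mean outcome over the leaves below it, and that a step's reward signal is its value minus the average value of its siblings. The `Node` class, the `backup` and `assign_advantages` functions, and the step names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One step (for example, a tool call) in a rollout tree. Hypothetical structure."""
    name: str
    children: List["Node"] = field(default_factory=list)
    outcome: Optional[float] = None  # outcome reward, set only on leaves
    value: float = 0.0               # filled in by backup()
    advantage: float = 0.0           # filled in by assign_advantages()


def backup(node: Node) -> Tuple[float, int]:
    """Set node.value to the mean outcome over all leaves below this node.

    Returns (sum of leaf outcomes, number of leaves) so parents can aggregate.
    Assumption: a leaf with no recorded outcome counts as reward 0.
    """
    if not node.children:
        total, count = float(node.outcome or 0.0), 1
    else:
        total, count = 0.0, 0
        for child in node.children:
            child_total, child_count = backup(child)
            total += child_total
            count += child_count
    node.value = total / count
    return total, count


def assign_advantages(node: Node) -> None:
    """Give each child a step-level signal: its value minus the sibling mean."""
    if not node.children:
        return
    baseline = sum(c.value for c in node.children) / len(node.children)
    for child in node.children:
        child.advantage = child.value - baseline
        assign_advantages(child)


# Toy tree: two alternative first tool calls, each followed by two final answers.
search = Node("search", [Node("answer_a", outcome=1.0), Node("answer_b", outcome=1.0)])
calculator = Node("calculator", [Node("answer_c", outcome=0.0), Node("answer_d", outcome=1.0)])
root = Node("question", [search, calculator])

backup(root)
assign_advantages(root)

for step in (search, calculator):
    print(step.name, step.value, step.advantage)
```

In this toy tree only the final answers are scored, yet the `search` step ends up with a positive signal and the `calculator` step with a negative one, because the branches below `search` succeed more often. That is the sense in which a tree of rollouts can turn outcome-only supervision into per-step credit.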