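The PORTool algorithm assigns step-level rewards to LLM agents using outcome-level supervision. It targets credit-assignment ambiguity in multi-tool reasoning: when a trajectory interleaves several tool calls but receives a single reward at the end, it is unclear which calls drove success. Apple researchers used a rewarded tree to turn those outcome rewards into step-level signals and used them to optimize the policy. For practitioners, this offers a way to train agents on complex tasks that interleave reasoning and tool calls, with credit assigned to individual tool calls rather than to whole trajectories.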
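The paragraph above does not spell out PORTool's exact reward or advantage formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that rollouts sharing a prefix are stored as a tree, that each complete trajectory receives one outcome reward, that a step's reward is the mean outcome of all trajectories passing through it, and that each step is scored against its sibling alternatives at the same fork. The names `Node`, `leaf_rewards`, `step_value`, and `fork_advantages` are hypothetical. The mean-of-leaves value and the sibling-mean baseline are illustrative choices, not PORTool's published method.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """One step (e.g. a tool call) in a tree of rollouts that share prefixes.

    Hypothetical structure for illustration; not PORTool's actual data model.
    """
    label: str
    children: list[Node] = field(default_factory=list)
    outcome: float | None = None  # outcome reward, set only on leaf (final) steps


def leaf_rewards(node: Node) -> list[float]:
    """Collect the outcome rewards of every complete trajectory under `node`."""
    if not node.children:
        if node.outcome is None:
            raise ValueError(f"leaf {node.label!r} has no outcome reward")
        return [node.outcome]
    rewards: list[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def step_value(node: Node) -> float:
    """Assumed step-level reward: mean outcome of trajectories passing through `node`."""
    return mean(leaf_rewards(node))


def fork_advantages(root: Node) -> dict[str, float]:
    """Score each step relative to the alternatives sampled at the same fork.

    Advantage = step value minus the mean value of all sibling steps
    (including itself). A step with no siblings equals its own baseline,
    so it receives 0.0: there is nothing to compare it against.
    """
    advantages: dict[str, float] = {}

    def visit(parent: Node) -> None:
        if not parent.children:
            return
        values = [step_value(c) for c in parent.children]
        baseline = mean(values)
        for child, value in zip(parent.children, values):
            advantages[child.label] = value - baseline
            visit(child)

    visit(root)
    return advantages


# Toy tree: two alternative first tool calls, each with two continuations.
tree = Node("prompt", children=[
    Node("search(A)", children=[Node("calc(x)", outcome=1.0),
                                Node("calc(y)", outcome=0.0)]),
    Node("search(B)", children=[Node("answer(b1)", outcome=0.0),
                                Node("answer(b2)", outcome=0.0)]),
])
adv = fork_advantages(tree)
print(adv)
# {'search(A)': 0.25, 'calc(x)': 0.5, 'calc(y)': -0.5,
#  'search(B)': -0.25, 'answer(b1)': 0.0, 'answer(b2)': 0.0}
```

In this toy tree, all four trajectories receive only an end-of-episode reward, yet `search(A)` earns a positive advantage because one of its continuations succeeded, while `search(B)` is penalized because none did. Within the `search(A)` branch, the fork isolates `calc(x)` as the step that made the difference. This per-step contrast is the kind of signal a single trajectory-level reward cannot provide on its own.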