A test of OpenAI's Reinforcement Fine-Tuning (RFT) produced a score of +14.59 on numeric questions, beating the baseline's +9.25. On binary questions, however, the fine-tuned model regressed, scoring -0.7 against the baseline's +2.4. This suggests RFT sharpens the specific quantitative reasoning it was trained on while degrading general binary prediction, so practitioners should choose a tuning method based on the target output type.
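
A minimal sketch of why per-category breakdowns matter here: computing the fine-tuned-minus-baseline delta for each question type, using the figures above. The dictionary layout and function name are illustrative, not from any reported evaluation harness.

```python
# Hypothetical sketch: per-category score comparison between a baseline
# and a fine-tuned model. The score values mirror the text; the data
# structure and helper are illustrative assumptions.
scores = {
    "numeric": {"baseline": 9.25, "finetuned": 14.59},
    "binary": {"baseline": 2.4, "finetuned": -0.7},
}

def score_deltas(scores):
    """Return fine-tuned-minus-baseline delta per question category."""
    return {cat: round(s["finetuned"] - s["baseline"], 2)
            for cat, s in scores.items()}

deltas = score_deltas(scores)
# Numeric questions improve (+5.34) while binary questions regress (-3.1),
# so a single aggregate score would hide the trade-off.
```

Reporting deltas per category, rather than one pooled score, is what surfaces the trade-off described above.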