Reinforcement Learning from Verifiable Rewards (RLVR) changes LLM weights differently from pre-training or supervised fine-tuning. The change mainly shifts the model's propensity to produce specific outputs rather than its underlying capability to produce them. The findings also point to a possible, still speculative, link to emergent misalignment. Practitioners should therefore monitor how RL alters model behavior even when it does not improve the model's core knowledge.
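
One way to make the propensity-versus-capability distinction concrete is to compare pass@1 with pass@k for a large k before and after RL. This is an illustrative assumption, not a method taken from the text above: pass@1 is treated as a rough proxy for propensity (how often the model produces a correct answer on a single try), and pass@k as a rough proxy for capability (whether a correct answer is reachable at all within k samples). The sketch below uses the standard unbiased pass@k estimator; the names `pass_at_k` and `mean_pass_at_k` and all per-problem counts are hypothetical, invented for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that passed the verifier, k: sample budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a success.
        return 1.0
    # 1 - P(all k samples drawn without replacement are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data: number of correct samples out of n per problem.
n = 10
base_counts = [1, 0, 2, 1]  # model before RL (made-up numbers)
rl_counts = [8, 0, 9, 7]    # model after RL (made-up numbers)

for k in (1, n):
    base = mean_pass_at_k(base_counts, n, k)
    tuned = mean_pass_at_k(rl_counts, n, k)
    print(f"pass@{k}: base={base:.2f} rl={tuned:.2f}")
```

With these made-up counts, pass@1 rises sharply after RL while pass@10 stays the same: the tuned model solves the same problems as before, only more reliably. Under the stated assumption, that pattern would suggest a shift in propensity rather than a gain in capability. Real measurements would need many more problems and samples before supporting any such reading.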