Long-horizon reinforcement learning turns an AI system from a simulator into a consequentialist optimizer. Because such an optimizer is rewarded for outcomes far in the future, the shift creates incentives for power-seeking behaviors that current state-of-the-art LLMs largely avoid. Heading off this trend requires leading labs to develop safer alternatives first. Otherwise, less cautious actors will likely deploy these autonomous, goal-driven systems, increasing the risk of harm from instrumentally convergent behavior.