Post-training reinforcement learning can introduce critical risks that pre-training alone avoids. The author argues that as RL is scaled up to improve performance in reasoning-heavy domains, interpreting the resulting opaque systems demands simpler theories, such as persona theory. This framework helps researchers identify how model behavior shifts during alignment, and practitioners must scrutinize those shifts to prevent emergent safety failures.