Replaying past user conversations through candidate models lets labs preview model behavior before public release. This method, Deployment Simulation, complements traditional red-teaming by providing a realistic signal of potential failures, including risks that static evaluations miss. Because the replayed inputs are actual historical data, practitioners can quantify behavioral shifts between model versions.
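One way to picture the "quantify behavioral shifts" step is the minimal sketch below. It is an illustration under stated assumptions, not the method's actual implementation: the `Model` and `Flagger` callables, the function names `flag_rate` and `behavior_shift`, and the choice of a simple difference in flag rates as the shift metric are all hypothetical and do not come from the source.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types, assumed for illustration only.
Conversation = List[str]               # prior user turns, oldest first
Model = Callable[[Conversation], str]  # returns the model's reply to the conversation
Flagger = Callable[[str], bool]        # True if a reply shows the behavior being tracked


def flag_rate(model: Model,
              conversations: Sequence[Conversation],
              is_flagged: Flagger) -> float:
    """Replay each conversation through `model` and return the fraction of flagged replies."""
    if not conversations:
        raise ValueError("need at least one conversation to replay")
    flagged = sum(1 for conv in conversations if is_flagged(model(conv)))
    return flagged / len(conversations)


def behavior_shift(baseline: Model,
                   candidate: Model,
                   conversations: Sequence[Conversation],
                   is_flagged: Flagger) -> Dict[str, float]:
    """Compare flag rates of two model versions on the same replayed conversations.

    The shift is the candidate rate minus the baseline rate (an assumed metric).
    """
    base = flag_rate(baseline, conversations, is_flagged)
    cand = flag_rate(candidate, conversations, is_flagged)
    return {"baseline_rate": base, "candidate_rate": cand, "shift": cand - base}
```

In use, `baseline` would wrap the currently deployed model and `candidate` the unreleased one, both replayed over the same historical conversations. A positive `shift` would suggest the tracked behavior became more frequent in the candidate. A real pipeline would also need sampling, privacy filtering, and uncertainty estimates, none of which are sketched here.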