Post-training uses supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align base models. Finbarr Timbers details the iterative loops required to turn a pre-trained model into a helpful assistant. This technical deep-dive explains why data quality outweighs quantity in the final stages. Practitioners can use it to tune their alignment pipelines for better reasoning performance.