Early Nvidia architectures transitioned from fixed-function pipelines to programmable shaders, enabling the general-purpose computing used today. This shift allowed researchers to repurpose graphics hardware for massive parallel matrix multiplication. Modern LLM training relies entirely on this legacy. Practitioners should study these origins to optimize current CUDA kernels and hardware utilization.