Early Nvidia architectures shifted from fixed-function graphics pipelines to programmable shaders, enabling general-purpose computing on GPUs. This transition let researchers repurpose graphics hardware for massively parallel matrix multiplication. Although the technical shift happened years ago, those architectural foundations still govern the efficiency of modern LLM training, and practitioners who understand their constraints are better placed to optimize today's inference kernels.
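To make the parallelism concrete, here is a minimal sketch of the kind of matrix-multiply kernel that this programmable hardware made possible: every output element of C is independent, so each can be assigned to its own GPU thread. The kernel name, tile sizes, and layout here are illustrative, not taken from any particular library.

```cuda
#include <cuda_runtime.h>

// Naive dense matmul: C = A * B for square N x N matrices in row-major
// layout. One thread computes one element of C; the grid launches one
// thread per output element, which is what makes the workload map so
// naturally onto thousands of shader cores.
__global__ void matmul_naive(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k) {
            acc += A[row * N + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

// Illustrative launch: a 16x16 thread block, with enough blocks to
// cover the whole output matrix.
void launch_matmul(const float* dA, const float* dB, float* dC, int N) {
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(dA, dB, dC, N);
}
```

A kernel this naive is bound by global-memory bandwidth, since each thread re-reads whole rows and columns; production kernels tile the matrices through shared memory to reuse data, which is exactly the class of constraint the paragraph above refers to.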