Inference efficiency on benchmarks varies by 100x depending on the scaffold, meaning the software environment and contextual documents provided to a model. Researchers Hans Gundlach and colleagues found that scaffolds influence price-performance more than the underlying models themselves. Because the effects differ across tasks, developers must optimize the scaffold specifically for each model to avoid performance degradation.
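
One way to make the comparison concrete is to compute cost per solved task for each model and scaffold pairing, then ask how much that figure spreads when only the scaffold changes versus when only the model changes. The sketch below is a minimal illustration under assumptions, not the researchers' method: `RunResult`, `cost_per_solved_task`, `efficiency_spread`, and every number in `PLACEHOLDER_RUNS` are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class RunResult:
    """One benchmark run (hypothetical record layout, for illustration only)."""
    model: str
    scaffold: str
    total_cost_usd: float
    tasks_solved: int


def cost_per_solved_task(run: RunResult) -> float:
    """Dollars spent per solved task; infinite if nothing was solved."""
    if run.tasks_solved == 0:
        return math.inf
    return run.total_cost_usd / run.tasks_solved


def efficiency_spread(runs, vary):
    """Ratio of worst to best cost per solved task when one factor varies.

    `vary` is "scaffold" or "model". The other factor is held fixed, and the
    result maps each fixed value to max(cost) / min(cost) across the varied one.
    """
    if vary not in ("model", "scaffold"):
        raise ValueError("vary must be 'model' or 'scaffold'")
    fixed = "model" if vary == "scaffold" else "scaffold"

    groups = defaultdict(list)
    for run in runs:
        cost = cost_per_solved_task(run)
        # Skip runs with no solved tasks or zero cost so the ratio stays defined.
        if math.isfinite(cost) and cost > 0:
            groups[getattr(run, fixed)].append(cost)

    # A spread needs at least two comparable runs in the group.
    return {
        name: max(costs) / min(costs)
        for name, costs in groups.items()
        if len(costs) >= 2
    }


# Made-up placeholder numbers, NOT results from the study.
PLACEHOLDER_RUNS = [
    RunResult("model_a", "scaffold_1", total_cost_usd=10.0, tasks_solved=40),
    RunResult("model_a", "scaffold_2", total_cost_usd=100.0, tasks_solved=10),
    RunResult("model_b", "scaffold_1", total_cost_usd=20.0, tasks_solved=40),
    RunResult("model_b", "scaffold_2", total_cost_usd=40.0, tasks_solved=20),
]

if __name__ == "__main__":
    print("scaffold varies:", efficiency_spread(PLACEHOLDER_RUNS, vary="scaffold"))
    print("model varies:", efficiency_spread(PLACEHOLDER_RUNS, vary="model"))
```

If the per-model spreads from varying the scaffold are consistently larger than the per-scaffold spreads from varying the model, that would match the pattern described above. Running the same calculation separately for each task would show whether the best scaffold for a given model changes from task to task.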