ThermoQA, a new 293-problem benchmark, tests LLMs on engineering thermodynamics. Claude Opus leads the leaderboard with 94.1% accuracy. Results show a sharp performance drop when models move from simple property lookups to complex cycle analysis. This gap indicates that, for most frontier models, recalling memorized property data is not the same as reasoning about the underlying physics.
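
To make the lookup-versus-cycle comparison concrete, here is a minimal sketch of how a per-category accuracy gap could be computed from graded answers. The category names, the `accuracy_by_category` helper, and the records are all hypothetical and made up for illustration; they are not ThermoQA data or its actual scoring code.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction correct} for (category, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / totals[category] for category in totals}


# Toy, invented grading records (NOT benchmark results).
toy_records = [
    ("property_lookup", True),
    ("property_lookup", True),
    ("property_lookup", True),
    ("property_lookup", False),
    ("cycle_analysis", True),
    ("cycle_analysis", False),
    ("cycle_analysis", False),
    ("cycle_analysis", False),
]

scores = accuracy_by_category(toy_records)
# The "gap" is simply the difference in accuracy between the two categories.
gap = scores["property_lookup"] - scores["cycle_analysis"]
print(scores, gap)
```

A single headline accuracy can hide this kind of gap, which is why a breakdown by problem type is more informative than the overall score alone.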