An OpenAI large language model outperformed physicians on clinical reasoning tasks built from real emergency room records. The study, published in Science, suggests that LLMs can work through complex diagnostic steps better than human practitioners in specific scenarios. That result calls into question the need for rigid, rule-based clinical decision systems. Practitioners should closely monitor how these models handle high-stakes triage.