The GPT 5.6 Preview system card uses a Multi Select Virology Troubleshooting benchmark to measure biological risk. The evaluation tracks how model performance scales with the number of tokens spent on a task. Practitioners should weigh this efficiency metric above raw accuracy, because it can reveal dangerous capabilities that emerge at lower resource cost.
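
To make the idea of performance relative to tokens spent concrete, here is a minimal sketch of one way such an efficiency metric could be computed. It does not reproduce the system card's actual method. Everything in it is an assumption made for illustration: the `Attempt` record, the exact-match scoring rule for multi-select answers, the helper names `accuracy_within_budget` and `min_budget_reaching`, and the toy data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Attempt:
    """One model attempt at a multi-select question (hypothetical record)."""
    tokens_used: int
    selected: frozenset  # options the model chose
    correct: frozenset   # options in the answer key


def is_correct(attempt: Attempt) -> bool:
    # Assumed scoring rule: the selected set must match the key exactly.
    return attempt.selected == attempt.correct


def accuracy_within_budget(attempts: Sequence[Attempt], budget: int) -> float:
    """Fraction of attempts answered correctly using at most `budget` tokens.

    Attempts that exceed the budget count as failures.
    """
    if not attempts:
        return 0.0
    solved = sum(1 for a in attempts if a.tokens_used <= budget and is_correct(a))
    return solved / len(attempts)


def min_budget_reaching(
    attempts: Sequence[Attempt], budgets: Iterable[int], threshold: float
) -> Optional[int]:
    """Smallest candidate budget whose accuracy meets `threshold`, else None."""
    for budget in sorted(budgets):
        if accuracy_within_budget(attempts, budget) >= threshold:
            return budget
    return None


# Toy data, invented purely to exercise the functions above.
toy_attempts = [
    Attempt(100, frozenset({"A", "C"}), frozenset({"A", "C"})),
    Attempt(500, frozenset({"B"}), frozenset({"B"})),
    Attempt(200, frozenset({"A"}), frozenset({"A", "D"})),  # partial, scored wrong
    Attempt(1000, frozenset({"C", "D"}), frozenset({"C", "D"})),
]

curve = {b: accuracy_within_budget(toy_attempts, b) for b in (100, 500, 1000)}
cheapest = min_budget_reaching(toy_attempts, (100, 500, 1000), threshold=0.5)
```

Under these assumptions, `curve` gives accuracy at each token budget and `cheapest` gives the lowest budget at which accuracy reaches the chosen threshold. Comparing that budget across model versions is one way to notice a capability becoming available at lower cost even when peak accuracy is unchanged.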