The new LABBench2 benchmark shifts evaluation from rote knowledge to real-world scientific work. It builds on the original LAB-Bench and measures how AI systems handle autonomous hypothesis generation and lab tasks. With it, researchers can quantify an agent's ability to perform meaningful biology experiments, which gives the field a concrete metric for progress in autonomous discovery.