The LABBench2 framework builds on earlier benchmarks to measure how well AI systems perform actual scientific work. Rather than testing rote knowledge, it emphasizes real-world research capabilities such as autonomous hypothesis generation. Its designers built it to track the progress of AI-driven labs, giving developers of agentic systems for biological discovery a concrete metric for evaluating their work.