A wave of flagship open-weight releases has arrived, including Gemma 4, DeepSeek V4, and Kimi K2.6. These models raise open-weight performance across multiple benchmarks. Nathan Lambert evaluates these updates alongside the CAISI V4 assessment. Practitioners can now deploy more capable, transparent models for complex reasoning tasks without relying on proprietary APIs.