A flood of new releases includes Gemma 4, DeepSeek V4, and Kimi K2.6, which together push the boundaries of open-weights performance across several model families. Nathan Lambert evaluates these updates using the CAISI V4 assessment framework. Practitioners can now benchmark these diverse architectures against a standardized set of reasoning-heavy tasks to determine which model best fits their workload.