Gemma 4 12B, a 12-billion-parameter model from Google DeepMind, replaces traditional encoders with a unified multimodal framework. The model processes text and images within a single stream, which improves efficiency for developers deploying on edge devices and gives them a more streamlined path for multimodal integration.
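
To make the "single stream" idea concrete, here is a minimal toy sketch in plain Python. It assumes that a unified stream means image patches and text tokens are mapped into the same embedding space and placed in one sequence. This is not Gemma's actual implementation: the names `patchify`, `project`, and `build_stream`, the embedding width, the patch size, and the random weights are all hypothetical and chosen only for illustration.

```python
import random

EMBED_DIM = 8   # hypothetical embedding width, for illustration only
PATCH = 2       # hypothetical patch side length in pixels


def patchify(image, patch=PATCH):
    """Split a 2D grid of pixel values into flattened patch x patch blocks.

    Assumes the image height and width are multiples of `patch`.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def make_matrix(n_rows, n_cols, rng):
    """Build an n_rows x n_cols matrix of random stand-in weights."""
    return [[rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]


def project(vec, weights):
    """Linear projection: vec (length n) times weights (n x d) gives length d."""
    width = len(weights[0])
    return [sum(v * row[k] for v, row in zip(vec, weights)) for k in range(width)]


def build_stream(token_ids, image, vocab_table, patch_weights):
    """Return one sequence holding image-patch and text-token embeddings."""
    image_part = [("image", project(p, patch_weights)) for p in patchify(image)]
    text_part = [("text", vocab_table[t]) for t in token_ids]
    return image_part + text_part


rng = random.Random(0)
vocab_table = make_matrix(16, EMBED_DIM, rng)             # toy 16-entry vocabulary
patch_weights = make_matrix(PATCH * PATCH, EMBED_DIM, rng)

image = [
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
]
stream = build_stream([3, 7, 1], image, vocab_table, patch_weights)
print(len(stream), [kind for kind, _ in stream])
```

In this sketch every entry of `stream` has the same width regardless of modality, so a single downstream model could consume the whole sequence without a separate image pathway. Whether the real model orders, projects, or positions its inputs this way is not stated in the source.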