The Gemma 4 12B model processes text, images, and audio natively and runs on hardware with as little as 16 GB of RAM. In benchmarks, it comes close to the performance of the 26B version. Google DeepMind released the model under the Apache 2.0 license, so developers can deploy high-performance multimodal capabilities locally without relying on expensive cloud infrastructure.
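
To put the 16 GB figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different numeric precisions. It assumes a nominal 12 billion parameters (read off the model name) and counts weights only. The announcement does not say which precision or quantization the 16 GB claim refers to, so the bit widths below are illustrative assumptions, and activations, caches, and runtime overhead are ignored.

```python
def weight_footprint_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: nominal parameter count taken from the "12B" in the model name.
PARAMS_12B = 12e9

# Assumption: illustrative precisions; the source does not state which one applies.
footprints = {bits: weight_footprint_gib(PARAMS_12B, bits) for bits in (16, 8, 4)}

for bits, gib in footprints.items():
    verdict = "under" if gib < 16 else "over"
    print(f"{bits:>2}-bit weights: {gib:5.1f} GiB ({verdict} 16 GiB, weights only)")
```

Under these assumptions, 16-bit weights alone would exceed 16 GiB, while 8-bit and 4-bit weights would leave headroom, which suggests that some form of reduced precision is involved when running on a 16 GB machine.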