IBM used a curated dataset of 10 trillion tokens to train Granite 4.1. The models employ a mixture-of-experts architecture to balance efficiency with performance: only a small subset of expert subnetworks is activated per token, which reduces inference costs while preserving reasoning capability. Developers can now deploy these open-weight models for enterprise tasks that demand strict data provenance and high reliability.
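The efficiency claim behind mixture-of-experts comes from sparse routing: a gating network scores all experts, but only the top-k actually run for a given token. The sketch below is a hypothetical minimal illustration of that routing idea in NumPy, not IBM's implementation; the function and parameter names (`moe_forward`, `gate_w`, `top_k`) are invented for this example, and production MoE layers add batching, load-balancing losses, and capacity limits.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route input x through the top-k experts, weighted by softmax gate scores.

    Hypothetical sketch of sparse MoE routing: only top_k of the
    len(experts) expert functions are evaluated, which is where the
    inference-cost savings come from.
    """
    logits = x @ gate_w                        # one gating score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over the selected experts only
    # Combine the chosen experts' outputs, weighted by the gate.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Toy usage: 4 experts, each a random linear map on 8-dim vectors.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts)
```

With `top_k=2`, only two of the four expert matrices are multiplied per input, so compute scales with k rather than with the total expert count.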