Decoupled DiLoCo enables distributed AI training over networks with high latency and unstable connections. Google DeepMind researchers decoupled the periodic weight-averaging step from the local gradient updates, so a slow node delays only its own contribution rather than bottlenecking the entire cluster. Practitioners can now train large models across geographically dispersed data centers without sacrificing convergence speed or stability.
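The core idea, workers taking many local gradient steps and only occasionally synchronizing through an outer averaging step, can be illustrated with a toy sketch. This is not DeepMind's implementation: the function names, the plain delta-averaging outer step (the DiLoCo papers use an outer optimizer with momentum), and the one-parameter quadratic loss are all illustrative assumptions.

```python
import random

def local_sgd_steps(param, data, lr=0.1, steps=5):
    """Inner loop: a worker takes several local gradient steps on a toy
    loss (param * x - target)^2 with no communication at all."""
    p = param
    for _ in range(steps):
        x, target = random.choice(data)
        grad = 2 * (p * x - target) * x
        p -= lr * grad
    return p

def outer_average(global_param, worker_params, outer_lr=1.0):
    """Outer loop: average the workers' parameter deltas ("pseudo-
    gradients") and apply them to the global parameter. Decoupling means
    this step can run on whatever worker results are available, instead
    of blocking on the slowest node (the toy driver below still waits
    for all workers, for simplicity)."""
    deltas = [global_param - wp for wp in worker_params]
    avg_delta = sum(deltas) / len(deltas)
    return global_param - outer_lr * avg_delta

# Toy driver: fit param so that param * x ≈ 3 * x, i.e. optimum is 3.
random.seed(0)
data = [(x, 3 * x) for x in (1.0, 2.0)]
global_param = 0.0
for _round in range(10):
    # each of 4 workers starts from the current global parameter
    worker_params = [local_sgd_steps(global_param, data) for _ in range(4)]
    global_param = outer_average(global_param, worker_params)
```

With only 10 synchronization rounds, `global_param` converges to the optimum near 3.0, even though the workers communicate an order of magnitude less often than per-step synchronous training would.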