Decoupled DiLoCo enables distributed model training over networks with high latency and unstable connections. Google DeepMind researchers decoupled the synchronization of model weights from the per-step gradient updates, so slow nodes no longer bottleneck the entire cluster. Practitioners can now train large models across geographically dispersed data centers without a significant loss in convergence speed.
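The core idea can be illustrated with a toy sketch. The code below is a hypothetical simplification, not DeepMind's implementation: each worker runs many local gradient steps on its own data shard, and weights are synchronized only at round boundaries by averaging "pseudo-gradients" (the difference between the synced weights and each worker's locally updated weights). The loss function, shard values, and plain averaging in the outer step are illustrative assumptions.

```python
# Hypothetical sketch of a DiLoCo-style inner/outer loop: gradient
# updates happen locally every step, while weight synchronization
# happens only once per communication round.

def local_train(w, shard_mean, lr=0.1, inner_steps=20):
    """Inner loop: gradient steps on the toy quadratic loss
    0.5 * (w - shard_mean)^2, with no communication at all."""
    for _ in range(inner_steps):
        w -= lr * (w - shard_mean)
    return w

def diloco_round(global_w, shard_means):
    """One communication round: every worker starts from the last
    synced weights, trains locally, and reports a pseudo-gradient
    (global_w - local_w); the server averages them.  (Plain averaging
    is a simplification; DiLoCo's outer optimizer uses Nesterov
    momentum on the averaged pseudo-gradient.)"""
    pseudo = [global_w - local_train(global_w, m) for m in shard_means]
    return global_w - sum(pseudo) / len(pseudo)

shard_means = [2.0, 2.1, 1.9]  # toy data shards, one per worker
w = 0.0
for _ in range(30):            # 30 rounds => only 30 sync points total
    w = diloco_round(w, shard_means)
print(round(w, 2))  # → 2.0, the minimizer of the averaged loss
```

Because each round needs just one exchange of weights instead of one per gradient step, communication drops by the inner-step factor (20x here), which is what makes training tolerant of high-latency links.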