Decoupled DiLoCo removes the need for constant synchronization between distributed model replicas. Developed by researchers at Google DeepMind, the method targets training over unstable, high-latency networks: each replica takes many optimization steps on its own data, letting its weights drift from the others, and the replicas are realigned only at infrequent synchronization points. This lets practitioners train very large models across geographically dispersed clusters without exchanging gradients on every step, which is the usual communication bottleneck.
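To make the inner/outer structure concrete, here is a minimal, self-contained sketch of a DiLoCo-style loop on a toy least-squares problem. It is not the authors' implementation: the names (`INNER_STEPS`, `OUTER_ROUNDS`, `local_grad`, and so on) are illustrative, plain SGD stands in for the inner optimizer (the original DiLoCo recipe uses AdamW), and a heavy-ball momentum step stands in for the outer Nesterov update. The point is only the pattern: long uncommunicated local runs, then one averaging-and-update round.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective shared by all replicas; each replica
# trains on its own data shard (a stand-in for a real model + dataset).
DIM, REPLICAS, SHARD_SIZE = 10, 4, 64
A = [rng.normal(size=(SHARD_SIZE, DIM)) for _ in range(REPLICAS)]
b = [a @ rng.normal(size=DIM) + 0.1 * rng.normal(size=SHARD_SIZE) for a in A]

def local_grad(theta, k):
    """Gradient of replica k's local loss, mean((A_k @ theta - b_k)**2)."""
    return 2.0 * A[k].T @ (A[k] @ theta - b[k]) / SHARD_SIZE

# Illustrative hyperparameters, not values from the paper.
INNER_STEPS = 50        # local steps taken between synchronizations
OUTER_ROUNDS = 20       # number of infrequent alignment rounds
INNER_LR, OUTER_LR, MOMENTUM = 0.01, 0.7, 0.9

theta = np.zeros(DIM)        # globally shared parameters
velocity = np.zeros(DIM)     # outer momentum buffer

for _ in range(OUTER_ROUNDS):
    deltas = []
    for k in range(REPLICAS):
        # Each replica copies the shared weights and trains locally;
        # no communication happens during these inner steps, so the
        # replicas are free to drift apart.
        local = theta.copy()
        for _ in range(INNER_STEPS):
            local -= INNER_LR * local_grad(local, k)
        # Pseudo-gradient: how far this replica drifted from theta.
        deltas.append(theta - local)
    # One synchronization round: average the drifts across replicas,
    # apply an outer momentum step, then (implicitly) broadcast theta.
    outer_grad = np.mean(deltas, axis=0)
    velocity = MOMENTUM * velocity + outer_grad
    theta -= OUTER_LR * velocity

loss = np.mean([np.mean((A[k] @ theta - b[k]) ** 2) for k in range(REPLICAS)])
print(f"final mean loss across replicas: {loss:.4f}")
```

Note how communication cost scales with `OUTER_ROUNDS` rather than with the total number of gradient steps: only the averaged pseudo-gradients cross the network, so raising `INNER_STEPS` buys tolerance for slow or flaky links at the price of more local drift between alignments.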