A new survey of 100 papers identifies a shift toward World Action Models, architectures that simulate how the environment will change before the robot executes a movement. These models learn from unlabeled everyday videos, bypassing the need for expensive robot action labels. Because an agent can predict the physical outcome of an action in advance, it can reject poor choices before acting, which reduces trial-and-error failures in complex environments.
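
To make the simulate-before-acting idea concrete, here is a minimal Python sketch. It is not taken from the survey: `toy_world_model`, `choose_action`, and `run_episode` are hypothetical names, and a hand-written one-dimensional transition function stands in for a model that would in practice be learned from video.

```python
from typing import Callable, List, Sequence

State = float
Action = float
WorldModel = Callable[[State, Action], State]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned model: predict the next state."""
    return state + action


def choose_action(
    state: State,
    goal: State,
    candidates: Sequence[Action],
    model: WorldModel,
) -> Action:
    """Simulate every candidate action and keep the one whose predicted
    outcome lands closest to the goal. Nothing is executed here."""
    return min(candidates, key=lambda a: abs(model(state, a) - goal))


def run_episode(
    start: State,
    goal: State,
    candidates: Sequence[Action],
    model: WorldModel,
    max_steps: int = 20,
) -> List[State]:
    """Repeat: imagine the outcomes, pick the best action, then act."""
    state = start
    trajectory = [state]
    for _ in range(max_steps):
        if abs(state - goal) < 1e-9:
            break
        action = choose_action(state, goal, candidates, model)
        # Assumption for this toy: the real environment behaves exactly
        # like the model, so executing the action reuses the same function.
        state = model(state, action)
        trajectory.append(state)
    return trajectory
```

Calling `run_episode(0.0, 3.0, [-1.0, 0.0, 1.0], toy_world_model)` returns `[0.0, 1.0, 2.0, 3.0]`: at each step the agent compares the predicted result of moving left, staying put, or moving right, and only then commits to a move. A real system would replace the toy function with a learned predictor and would have to cope with prediction error, which this sketch ignores.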