Apple developed a method for modeling scene dynamics with long-term motion embeddings learned from large-scale tracker trajectories. Conditioned on text prompts or spatial "pokes," the model generates realistic motion directly, without the overhead of full video synthesis, and can efficiently sample multiple plausible futures for the same scene. As a result, practitioners can generate complex kinematics orders of magnitude faster than full video-synthesis pipelines.
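The idea of sampling several plausible futures from a poke-conditioned motion model can be sketched as below. This is a toy illustration under stated assumptions: the class name, method signature, and the linear latent-to-velocity dynamics are all hypothetical stand-ins, not Apple's actual architecture.

```python
import numpy as np

class MotionPredictor:
    """Toy stand-in for a poke-conditioned motion model (illustrative only)."""

    def __init__(self, embed_dim=16, seed=0):
        self.rng = np.random.default_rng(seed)
        # Stand-in for learned motion-embedding parameters.
        self.W = self.rng.normal(scale=0.1, size=(embed_dim, 2))

    def predict_futures(self, points, poke, n_futures=3, horizon=8):
        """Sample several plausible trajectory futures for `points`.

        points: (N, 2) initial 2-D point positions
        poke:   (2,) user drag vector conditioning the motion
        returns: (n_futures, horizon, N, 2) candidate trajectories
        """
        futures = []
        for _ in range(n_futures):
            z = self.rng.normal(size=self.W.shape[0])  # latent "future" code
            drift = poke + z @ self.W                  # per-future velocity
            steps = np.arange(1, horizon + 1)[:, None, None]
            futures.append(points[None] + steps * drift[None, None])
        return np.stack(futures)

model = MotionPredictor()
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
futs = model.predict_futures(pts, poke=np.array([0.5, 0.0]))
print(futs.shape)  # (3, 8, 2, 2)
```

Because only point trajectories are produced, sampling many such futures is far cheaper than synthesizing full video frames, which is the efficiency argument the paragraph makes.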