Apple developed a method that models scene dynamics by operating on long-term motion embeddings derived from tracker models. Conditioned on a text prompt or a spatial poke, it generates realistic motion without the cost of synthesizing full video. This makes it efficient to explore multiple possible future trajectories and to generate complex kinematics with significantly less compute than full video synthesis requires.
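
To make the idea concrete, here is a toy sketch of sampling several candidate futures from a compact motion embedding and of why trajectories are cheaper to produce than pixels. It is not Apple's model: `sample_futures`, `value_counts`, the random linear "decoder", and all sizes are hypothetical stand-ins chosen purely for illustration.

```python
import numpy as np


def sample_futures(poke_xy, poke_dir, num_futures=4, num_steps=16,
                   embed_dim=8, seed=0):
    """Toy stand-in: sample several 2D trajectories for one poked point.

    Assumption: a fixed random linear map plays the role of a learned
    decoder from motion embeddings to per-step displacements.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical decoder: embedding -> (num_steps, 2) displacements.
    decoder = rng.standard_normal((embed_dim, num_steps, 2)) * 0.1
    # One latent embedding per candidate future.
    z = rng.standard_normal((num_futures, embed_dim))
    # Decode displacements and add a constant drift along the poke direction.
    steps = np.einsum("kd,dtc->ktc", z, decoder) + np.asarray(poke_dir, dtype=float)
    # Integrate displacements into positions, starting at the poked location.
    return np.asarray(poke_xy, dtype=float) + np.cumsum(steps, axis=1)


def value_counts(num_frames, height, width, num_points):
    """Count output values for an RGB video versus 2D point trajectories."""
    video_values = num_frames * height * width * 3
    track_values = num_frames * num_points * 2
    return video_values, track_values


futures = sample_futures(poke_xy=(10.0, 20.0), poke_dir=(1.0, 0.0))
print(futures.shape)  # (num_futures, num_steps, 2)
print(value_counts(num_frames=16, height=256, width=256, num_points=1024))
```

With these illustrative sizes, a 16-frame 256x256 RGB clip has 3,145,728 output values, while 1,024 tracked points over the same 16 frames have 32,768, a 96x reduction in the quantity being generated. Real savings depend on the actual architecture, which this sketch does not attempt to reproduce.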