Apple developed a method that models scene dynamics by operating on long-term motion embeddings derived from tracker trajectories. Conditioned on a text prompt or a spatial poke, it generates realistic motion without the overhead of full video synthesis. This makes it a more efficient alternative to video generation for predicting complex trajectories in visual intelligence tasks.