Apple researchers developed a method for modeling scene dynamics using long-term motion embeddings derived from point-tracker trajectories. Instead of synthesizing full video frames, the method generates realistic motion directly, conditioned on text prompts or spatial "pokes," which reduces computational overhead by orders of magnitude. Practitioners can thus simulate complex trajectories without rendering entire frames.
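As a rough illustration of the idea, the sketch below encodes per-point tracker trajectories into fixed-size motion embeddings. The encoder here (frame-to-frame displacements followed by a random linear projection) is a placeholder assumption, not the actual architecture from the Apple work; the shapes and the use of displacements rather than absolute positions are the only substantive choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_trajectories(tracks, dim=64):
    """Illustrative sketch: map point trajectories of shape (T, N, 2)
    to motion embeddings of shape (N, dim).

    The real encoder in the Apple method is not specified here; a
    random linear projection stands in for a learned one.
    """
    T, N, _ = tracks.shape
    # Use frame-to-frame displacements rather than absolute positions,
    # so the embedding captures motion instead of location.
    disp = np.diff(tracks, axis=0)              # (T-1, N, 2)
    disp = disp.transpose(1, 0, 2).reshape(N, -1)  # (N, (T-1)*2)
    # Hypothetical learned projection, replaced by a scaled random map.
    W = rng.standard_normal((disp.shape[1], dim)) / np.sqrt(disp.shape[1])
    return disp @ W

# Eight points tracked over sixteen frames, as a random walk.
tracks = rng.standard_normal((16, 8, 2)).cumsum(axis=0)
emb = embed_trajectories(tracks)
print(emb.shape)  # (8, 64)
```

A downstream generator could then condition on these compact per-point embeddings instead of on rendered frames, which is where the computational savings come from.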