Apple researchers developed a method to model scene dynamics using long-term motion embeddings derived from point-tracker trajectories. Rather than synthesizing full video frames, the approach generates realistic motion directly, conditioned on text prompts or spatial "pokes," and the authors report it runs orders of magnitude more efficiently than standard video models. Practitioners can thus simulate complex future trajectories without prohibitive compute costs.
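To make the idea concrete, here is a minimal sketch of what trajectory-level motion generation can look like: encode each tracked point's past positions into a motion embedding, mix those embeddings with a conditioning signal (a pooled text embedding or a poke vector), and decode future coordinates instead of pixels. This is an illustrative toy model under assumed shapes and module choices, not the paper's published architecture; all names (`TrajectoryMotionModel`, dimensions, layer counts) are hypothetical.

```python
import torch
import torch.nn as nn


class TrajectoryMotionModel(nn.Module):
    """Toy sketch: predict future (x, y) positions for tracked points,
    conditioned on a prompt embedding, instead of synthesizing video.
    Architecture and sizes are illustrative assumptions."""

    def __init__(self, past_steps=16, future_steps=48,
                 embed_dim=128, cond_dim=128):
        super().__init__()
        # Encode each point's past trajectory (past_steps * 2 coords)
        # into a per-point motion embedding.
        self.track_encoder = nn.Sequential(
            nn.Linear(past_steps * 2, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # A small transformer mixes information across points and the
        # conditioning token (text embedding or spatial-poke vector).
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=2)
        self.cond_proj = nn.Linear(cond_dim, embed_dim)
        # Decode each point embedding into its future trajectory.
        self.decoder = nn.Linear(embed_dim, future_steps * 2)
        self.future_steps = future_steps

    def forward(self, past_tracks, condition):
        # past_tracks: (B, N, past_steps, 2); condition: (B, cond_dim)
        B, N = past_tracks.shape[:2]
        tokens = self.track_encoder(past_tracks.flatten(2))  # (B, N, D)
        cond = self.cond_proj(condition).unsqueeze(1)        # (B, 1, D)
        mixed = self.mixer(torch.cat([cond, tokens], dim=1))
        future = self.decoder(mixed[:, 1:])                  # drop cond token
        return future.view(B, N, self.future_steps, 2)


if __name__ == "__main__":
    model = TrajectoryMotionModel()
    past = torch.randn(2, 256, 16, 2)   # two clips, 256 tracked points each
    cond = torch.randn(2, 128)          # e.g. a pooled text embedding
    future = model(past, cond)
    print(future.shape)                 # torch.Size([2, 256, 48, 2])
```

The efficiency argument falls out of the output space: decoding a few hundred coordinate sequences is vastly cheaper than decoding dense pixel frames, which is why trajectory-level prediction can avoid the cost of full video synthesis.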