Apple researchers created a motion embedding system that generates long-term trajectories using data from tracker models. Rather than synthesizing full video frames, an expensive process, the system predicts scene dynamics directly from text prompts or spatial pokes. This makes it orders of magnitude faster than traditional video models and streamlines how agents simulate realistic physical movement.
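To see why skipping pixel synthesis saves so much compute, consider a rough back-of-the-envelope comparison. The sketch below uses hypothetical numbers (frame count, resolution, and number of tracked points are assumptions, not figures from the research): a video model must emit every pixel of every frame, while a trajectory model only emits an (x, y) position per tracked point per frame.

```python
# Illustrative comparison (hypothetical numbers, not from the paper):
# predicting per-point trajectories vs. synthesizing full video frames.

frames = 120           # e.g. a 4-second clip at 30 fps (assumed)
height, width = 256, 256
channels = 3
tracked_points = 1024  # assumed number of tracked scene points

# Full video synthesis must generate every pixel value of every frame.
video_outputs = frames * height * width * channels

# Trajectory prediction emits only an (x, y) pair per point per frame.
trajectory_outputs = tracked_points * frames * 2

print(f"video outputs:      {video_outputs:,}")
print(f"trajectory outputs: {trajectory_outputs:,}")
print(f"ratio: {video_outputs // trajectory_outputs}x fewer values")
```

Under these assumed settings the trajectory representation emits roughly two orders of magnitude fewer values, which is consistent with the speedup the researchers describe.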