Apple researchers created a motion embedding system that generates long-term trajectories without synthesizing full video. Trained on large-scale tracker data, the model produces realistic motion conditioned on text prompts or spatial pokes, and it runs orders of magnitude faster than conventional video models, streamlining motion generation for visual intelligence tasks.
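
The core idea is that a compact conditioning embedding can be decoded directly into trajectories, so no pixels ever need to be rendered. The sketch below is a minimal, hypothetical illustration of that interface, not the authors' actual architecture: the class name `TrajectoryDecoder`, the shapes, and the MLP decoder are all assumptions chosen to make the concept concrete.

```python
# Hypothetical sketch: decode a motion embedding straight into
# long-term 2D point trajectories, skipping video synthesis entirely.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    """Maps a conditioning embedding (e.g. pooled from a text prompt
    or a spatial 'poke' location) to T future (x, y) positions per track."""

    def __init__(self, cond_dim: int = 256, num_tracks: int = 64,
                 horizon: int = 120, hidden: int = 512):
        super().__init__()
        self.num_tracks = num_tracks
        self.horizon = horizon
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            # One (x, y) motion offset per track per timestep.
            nn.Linear(hidden, num_tracks * horizon * 2),
        )

    def forward(self, cond: torch.Tensor, start_xy: torch.Tensor) -> torch.Tensor:
        # cond: (B, cond_dim) conditioning embedding.
        # start_xy: (B, num_tracks, 2) initial track positions.
        offsets = self.mlp(cond).view(-1, self.num_tracks, self.horizon, 2)
        # Trajectory = start position + cumulative predicted motion.
        return start_xy.unsqueeze(2) + offsets.cumsum(dim=2)

decoder = TrajectoryDecoder()
cond = torch.randn(1, 256)    # stand-in for a text/poke embedding
start = torch.rand(1, 64, 2)  # normalized initial track positions
tracks = decoder(cond, start) # (1, 64, 120, 2)
print(tracks.shape)
```

One forward pass here yields an entire 120-step trajectory set, which hints at why such an approach can be orders of magnitude cheaper than synthesizing 120 video frames and then running a tracker over them.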