A new motion embedding system from Apple is trained on large-scale trajectory data extracted by tracker models. By predicting scene dynamics directly as trajectories, the approach bypasses expensive full-video synthesis. Users can generate long, realistic motions from text prompts or spatial "pokes," giving practitioners a faster route to precise kinematic generation without rendering entire frames.
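To make the output representation concrete, here is a minimal, purely illustrative sketch of the core idea: motion expressed as per-point trajectories rather than rendered frames, conditioned on a spatial poke. The function name, the Gaussian falloff, and all parameters are assumptions for illustration only, not Apple's actual model.

```python
import math

def poke_trajectories(points, poke_index, poke_vec, steps=8, sigma=0.5):
    """Toy sketch: turn one spatial 'poke' into per-point 2D trajectories.

    Instead of synthesizing video frames, each scene point gets a
    trajectory. Here the poke's displacement is diffused to nearby
    points with a Gaussian falloff on initial distance (a hypothetical
    stand-in for a learned motion prior).
    """
    px, py = points[poke_index]
    # Weight each point by its initial proximity to the poked point.
    weights = [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
               for x, y in points]
    trajs = []
    for t in range(1, steps + 1):
        frac = t / steps  # linear interpolation of the poke over time
        trajs.append([(x + frac * w * poke_vec[0], y + frac * w * poke_vec[1])
                      for (x, y), w in zip(points, weights)])
    return trajs  # list of `steps` frames, each a list of (x, y) points
```

Calling `poke_trajectories([(0, 0), (0.5, 0), (3, 0)], 0, (1, 0))` drags the first point a full unit rightward, nudges its close neighbor, and leaves the distant point nearly fixed; the output is a dense trajectory per point, with no pixels rendered at any step.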