StereoFoley, a framework from Apple researchers, generates spatially accurate stereo sound at 48 kHz from video input. While previous models relied on mono audio, StereoFoley uses a new dataset of professionally mixed audio to align each sound with the position of the object that produces it. The researchers report state-of-the-art synchronization. The framework lets developers automate high-fidelity spatial soundscapes for video content.
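
To make "aligning sound with object positions" concrete, here is a minimal sketch of what position-dependent stereo placement means at 48 kHz. It is not StereoFoley's method: it applies a standard constant-power pan law to a mono test tone, and the names `pan_gains`, `pan_mono`, `sine`, and the normalized position `x_norm` are hypothetical, introduced only for illustration.

```python
import math

SAMPLE_RATE = 48_000  # Hz, the output rate named in the text


def pan_gains(x_norm: float) -> tuple[float, float]:
    """Return (left, right) constant-power gains for a horizontal position.

    x_norm is an assumed normalized position: 0.0 is the left edge of the
    frame, 1.0 is the right edge. Values outside that range are clamped.
    """
    x = min(max(x_norm, 0.0), 1.0)
    angle = x * math.pi / 2  # map 0..1 onto 0..90 degrees
    return math.cos(angle), math.sin(angle)


def pan_mono(samples: list[float], x_norm: float) -> list[tuple[float, float]]:
    """Place a mono signal in the stereo field as (left, right) sample pairs."""
    left, right = pan_gains(x_norm)
    return [(s * left, s * right) for s in samples]


def sine(freq_hz: float, seconds: float) -> list[float]:
    """Generate a mono sine test tone at SAMPLE_RATE."""
    n = round(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# A 10 ms, 440 Hz tone for an object three quarters of the way to the right.
tone = sine(440.0, 0.01)
stereo = pan_mono(tone, 0.75)
```

With the constant-power law, the squared left and right gains always sum to one, so perceived loudness stays roughly constant as the object moves across the frame. A learned model would have to infer placement from the video itself rather than from a supplied coordinate.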