StereoFoley generates spatially accurate stereo sound at 48 kHz directly from video input. Whereas previous models relied on mono audio, this framework aligns sounds with the specific objects in a scene. To address the scarcity of professionally mixed spatial audio, Apple developed a new dataset. The result lets practitioners automate high-fidelity foley for immersive video content.
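
As a rough illustration of what aligning a sound with an object means in a stereo signal, the sketch below pans a mono source between the left and right channels using constant-power panning driven by the object's horizontal position in the frame. This is not StereoFoley's method, which generates stereo directly from video. The function names, the normalized position input, and the 440 Hz test tone are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 48_000  # Hz, the output rate named in the text


def pan_gains(x_norm: float) -> tuple[float, float]:
    """Constant-power (left, right) gains for a horizontal position.

    x_norm is a hypothetical normalized object position:
    0.0 = left edge of the frame, 1.0 = right edge. Values outside
    that range are clamped.
    """
    x = min(max(x_norm, 0.0), 1.0)
    theta = x * math.pi / 2  # map [0, 1] onto a quarter circle
    return math.cos(theta), math.sin(theta)


def pan_mono_to_stereo(mono, x_positions):
    """Pan mono samples using one object position per sample."""
    if len(mono) != len(x_positions):
        raise ValueError("mono and x_positions must have the same length")
    left, right = [], []
    for sample, x in zip(mono, x_positions):
        gain_l, gain_r = pan_gains(x)
        left.append(sample * gain_l)
        right.append(sample * gain_r)
    return left, right


# Usage: a 10 ms, 440 Hz tone whose source moves from left to right.
n = SAMPLE_RATE // 100
mono = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
xs = [i / (n - 1) for i in range(n)]
left, right = pan_mono_to_stereo(mono, xs)
```

Constant-power panning keeps the squared gains summing to one, so perceived loudness stays roughly steady as the source moves across the frame. A learned model would have to infer positions like `xs` from the video itself rather than receive them as input.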