On Tuesday, OpenAI announced Sora 2, its second-generation video-synthesis AI model that can now generate videos in various styles with synchronized dialogue and sound effects, which is a first for the company. OpenAI also launched a new iOS social app that allows users to insert themselves into AI-generated videos through what OpenAI calls “cameos.”
OpenAI showcased the new model in an AI-generated video that features a photorealistic version of OpenAI CEO Sam Altman talking to the camera in a slightly unnatural-sounding voice amid fantastical backdrops, like a competitive ride-on duck race and a glowing mushroom garden.
Regarding that voice, the new model can create what OpenAI calls “sophisticated background soundscapes, speech, and sound effects with a high degree of realism.” In May, Google’s Veo 3 became the first video-synthesis model from a major AI lab to generate synchronized audio as well as video. Just a few days ago, Alibaba released Wan 2.5, an open-weights video model that can generate audio as well. Now OpenAI has joined the audio party with Sora 2.
OpenAI demonstrates Sora 2’s capabilities in a launch video.
The model also features notable visual consistency improvements over OpenAI’s previous video model, and it can follow more complex instructions across multiple shots while maintaining coherency between them. The new model represents what OpenAI describes as its “GPT-3.5 moment for video,” comparing it to the ChatGPT breakthrough during the evolution of its text-generation models over time.
Sora 2 appears to demonstrate improved physical accuracy over the original Sora model from February 2024, with OpenAI claiming the model can now simulate complex physical movements like Olympic gymnastics routines and triple axels while maintaining realistic physics. Last year, shortly after the launch of Sora 1 Turbo, we saw several notable failures of similar video-generation tasks that OpenAI claims to have addressed with the new model.
“Prior video models are overoptimistic: they will morph objects and deform reality to successfully execute upon a text prompt,” OpenAI wrote in its announcement. “For example, if a basketball player misses a shot, the ball may spontaneously teleport to the hoop. In Sora 2, if a basketball player misses a shot, it will rebound off the backboard.”

