On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. From 2022 to 2024, we saw early steps in AI video generation, but each video was silent and usually very short. Now you can hear voices, dialogue, and sound effects in eight-second high-definition video clips.
Shortly after the launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?
First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video made with an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later, in February 2024.
Here's what the original viral video looked like:
One thing people forget is that at the time, the model behind the Smith example wasn't the best AI video generator available: a video synthesis model called Gen-2 from Runway had already achieved better results (though it was not yet publicly available). But the ModelScope result was funny and strange enough to stick in people's memories as an early, poor example of video synthesis, useful for comparisons as AI models progressed.
AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week, running the Smith test with Veo 3 and posting the results on X. But as you'll notice when you watch below, the soundtrack has a curious quality: The fake Smith appears to be crunching on the spaghetti.
On X, Javi Lopez ran "Will Smith eating spaghetti" in Google's Veo 3 AI video generator and got this result.
It's a glitch in Veo 3's experimental ability to apply sound effects to video, likely because the training data used to create Google's AI models featured many examples of chewing mouths paired with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various kinds of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you'll see unusual generation results, such as jabberwockies.