Sora’s AI video revolution is still a ways off

The first version of OpenAI’s Sora can generate video of just about anything you throw at it — superheroes, cityscapes, animated puppies. It’s an impressive first step for the AI video generator. But the actual results are far from satisfactory, with many videos so heavily plagued with oddities and inconsistencies that it’s hard to imagine anyone finding much use for them.

Sora was released on Monday after almost a year of teasers heralding its capabilities. There are a few hurdles before you get to the video generation features, though. For one, account creation was closed within hours of launching due to the overwhelming demand. Those who did manage to sign up will find that its features also require a subscription to unlock: a $20 monthly “Plus” membership will let you generate videos at 480p or 720p, capped at either five or 10 seconds in length depending on the resolution. To unlock everything, including 1080p quality and 20-second-long videos, you need to cough up $200 a month for the “Pro” Sora subscription.

My results from testing the Plus tier have been underwhelming. Simple prompts with limited descriptions seem to work best — “a cat playing with a ball of yarn,” for example, generates a very realistic-looking cat bouncing excitedly around the floor. But Sora gave the cat a second tail for a few moments, and the yarn itself was jittery and looked like badly inserted CGI.

These visual issues were more frequent and glaring for complex prompts that provided detailed scene descriptions. It’s difficult to get human motion to be remotely natural: hands flailed everywhere when I asked it to show me someone applying makeup, and videos of people eating salad and sausage rolls were nightmarishly reminiscent of the viral AI clips of Will Smith inhaling spaghetti.

Sora includes an interesting Storyboard feature that’s supposed to help with laying out prompt instructions for longer videos. It resembles a video editing timeline, allowing users to explain what they want Sora to generate every two seconds rather than inserting one massive description for the entire video. It’s easy enough to use, but the results were even poorer. The more detail I added, the more distortions and weirdness appeared.

Some things did impress me, though. Video generation was faster than expected, generally under 30 seconds for even 10-second-long clips. Patterns on fur and textiles also remained consistent, even throughout fast-paced movement, and the lighting, shadow, and mirror effects generated by Sora do a fantastic job of simulating the real thing. Sunlight coming through a window would provide a flash of glare and beautifully shine through all the materials you’d expect. Even at low resolutions, most objects have high levels of detail and don’t scramble into a pixelated mess.

For all its faults, Sora did a better job than Runway AI, which is considered to be one of the better AI video generators for simulating photorealism. When identical prompts were entered into both platforms, Sora’s results looked more realistic and contained far fewer visual distortions. The quality of Sora’s outputs is also on par with the demos I saw in October of Adobe’s Firefly Video Model at Adobe Max, though OpenAI obviously lacks the perk of promising that generated outputs are commercially safe. Adobe achieved this by only training its AI models on licensed or public-domain content, an ethos that OpenAI hasn’t followed.

[The above video was generated in Runway AI with the same prompt I gave Sora.]

Nothing that Sora generated from scratch was actually usable, though. It’s definitely not ready for entertainment or commercial work that needs narrative coherence, and you’d really have to reach to even use this as a replacement for a quick flash of stock footage. Perhaps getting high-quality videos that don’t include any obvious AI weirdness is possible with enough time, experience, and editing skills, but if that’s the case, then it doesn’t feel like Sora is substantially “democratizing” content creation just yet.

There are also several guardrails in place that aim to prevent copyright infringement or anything nasty from being generated, with varying levels of success. Sora outright blocks attempts to generate political figures like Donald Trump and Kamala Harris, warning the user that such prompts may violate OpenAI’s terms of service. Celebrity names like Taylor Swift and Lewis Hamilton aren’t blocked, but Sora responds to them by inserting a random person who bears no resemblance to the celebrity. It’s pretty good at avoiding recognizable characters and brand icons, too, even with descriptions that try to force results like “a blue bipedal cartoon hedgehog wearing red shoes.”

Things get shakier when it comes to the scenes you’re requesting. Some violent prompts like “a truck driving into frightened protestors” were blocked, but Sora still generated a clip of an explosion at the Empire State Building, even if the results were laughably cartoonish. It also produced videos of toddlers modeling swimsuits on a runway and pointing guns at their smiling parents.

Sora includes a feature that allows you to upload your own reference images. A pop-up message forces users to tick a bunch of boxes before it can be used, promising that you own the rights to those images and won’t upload anything containing minors, violence, or explicit themes, or else risk your account being suspended or banned “without refund.” But the biggest deterrent preventing the feature from being abused is financial — only users with Pro-tier subscriptions can upload images with people in them. If this is the feature used to create the more impressive Sora demos we’ve seen, that’s a significant limitation.

It’s early days and there are some obvious issues to iron out, but nothing I’ve seen so far makes me think that Sora is going to revolutionize video production overnight. The features to create high-quality outputs are locked behind a subscription that’s about as pricey as traditional filming and video creation tools, making it inaccessible for many. It’s hard to imagine an entire movie being produced using this technology in its current state that would actually be pleasant to watch.

Quality issues haven’t stopped people from trying to profit from the convenience AI video tools provide, though: YouTube is already saturated with nonsensical AI-generated slop targeted toward young children. Sora is more than capable of churning out similar content right now, and it’ll only cost you $20 a month to do so.
