Farewell Photoshop? Google’s new AI lets you edit images by asking.

Multimodal output opens up new prospects

Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity across multiple images. It’s far from perfect, but character consistency is a new capability in AI assistants. We tried it out and it was pretty wild, especially when it generated a view of a photo we provided from another angle.

Text rendering represents another potential strength of the model. Google claims that internal benchmarks show Gemini 2.0 Flash performs better than “leading competitive models” when generating images containing text, making it potentially suitable for creating content with integrated text. From our experience, the results weren’t that exciting, but they were legible.

An example of in-image text rendering generated with Gemini 2.0 Flash. Credit: Google / Ars Technica

Despite Gemini 2.0 Flash’s shortcomings so far, the emergence of true multimodal image output feels like a notable moment in AI history because of what it suggests if the technology continues to improve. If you imagine a future, say 10 years from now, where a sufficiently complex AI model could generate any type of media in real time (text, images, audio, video, 3D graphics, 3D-printed physical objects, and interactive experiences), you basically have a holodeck, but without the matter replication.

Coming back to reality, it’s still “early days” for multimodal image output, and Google acknowledges that. Recall that Gemini 2.0 Flash is meant to be a smaller AI model that is faster and cheaper to run, so it hasn’t absorbed the entire breadth of the Internet. All that information takes up a lot of space in terms of parameter count, and more parameters means more compute. Instead, Google trained Gemini 2.0 Flash by feeding it a curated dataset that also likely included targeted synthetic data. As a result, the model does not “know” everything visual about the world, and Google itself says the training data is “broad and general, not absolute or complete.”

That’s just a fancy way of saying that the image output quality isn’t perfect yet. But there is plenty of room for improvement in the future to incorporate more visual “knowledge” as training techniques advance and compute drops in cost. If the process becomes anything like what we’ve seen with diffusion-based AI image generators like Stable Diffusion, Midjourney, and Flux, multimodal image output quality may improve rapidly over a short period of time. Get ready for a completely fluid media reality.
