Google unveils Lumiere, a truly stunning video-generating AI


Rémi Bouvet

January 26, 2024 at 3:02 p.m.

Lumiere, Google Research © Google Research

This AI model abandons the classic cascaded design for a more advanced approach, with quite promising results.

While image generators using artificial intelligence are now legion and often very successful, similar tools for video remain fewer and far less convincing. A team of researchers, several of whom work on behalf of Google Research, intends to remedy this with Lumiere, a new kind of video generation AI model.

A model that takes a different approach

Creating a video with AI is more complex than producing a static image, for several reasons. The main one is the coherence of movement: it is difficult to make a walking gait look natural, for example. Jerky motion and poorly handled interactions with the surroundings are also common problems.

To overcome this problem, rather than assembling a succession of individual images into a more or less satisfactory whole, Lumiere shapes the entire video in a single process, handling the placement of objects and their movement simultaneously.

The authors specify: “The Space-Time U-Net architecture generates the entire temporal duration of the video at once, through a single pass through the model. This contrasts with existing video models that synthesize distant keyframes followed by temporal super-resolution, an approach that inherently makes global temporal coherence difficult.”
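To make the contrast concrete, here is a toy sketch of one way sparse keyframes can lose motion. It is not Lumiere's or Imagen Video's code: the "video" is a hypothetical one-dimensional signal (a limb swinging twice per second), and plain linear interpolation stands in for the temporal super-resolution stage. All function names are invented for the illustration.

```python
import math

def true_signal(t):
    # Hypothetical 1-D "motion": a position oscillating twice per second.
    return math.sin(2 * math.pi * 2.0 * t)

def single_pass(num_frames, fps):
    # Every frame is produced directly, with no keyframe stage.
    return [true_signal(i / fps) for i in range(num_frames)]

def cascaded(num_frames, fps, key_stride):
    # Stage 1: sample sparse keyframes (always including the last frame).
    key_idx = list(range(0, num_frames, key_stride))
    if key_idx[-1] != num_frames - 1:
        key_idx.append(num_frames - 1)
    keys = {i: true_signal(i / fps) for i in key_idx}
    # Stage 2: fill the gaps by linear interpolation, a crude stand-in
    # for a temporal super-resolution model (assumption of this sketch).
    frames = []
    for i in range(num_frames):
        lo = max(k for k in key_idx if k <= i)
        hi = min(k for k in key_idx if k >= i)
        if lo == hi:
            frames.append(keys[lo])
        else:
            w = (i - lo) / (hi - lo)
            frames.append((1 - w) * keys[lo] + w * keys[hi])
    return frames

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

full = single_pass(80, 16)
casc = cascaded(80, 16, key_stride=8)
error = max_error(full, casc)
print(f"largest per-frame gap between the two versions: {error:.2f}")
```

With one keyframe every 8 frames, each keyframe lands at the same point of the oscillation, so the interpolated clip is almost flat while the directly sampled one swings through its full range. Real cascaded models are far more capable than linear interpolation; the sketch only shows why motion that falls between distant keyframes is hard to recover afterwards.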

Hila Chefer, one of the contributors, posted some demos on her X.com account.

The researchers compare the consistency offered by Lumiere with that of Imagen Video, another Google AI video tool, which is based on a more traditional cascaded design.

Comparison of ImagenVideo and Lumiere © Google Research

The result Lumiere achieves is also shown in the video below.


5-second clips

Lumiere can generate 80 frames at 16 frames per second, which corresponds to a 5-second sequence. That is a long way from a feature film (or even a short), but this duration is in line with most current solutions. The Stable Video Diffusion model, for example, produces sequences of 14 to 25 frames at frame rates of between 3 and 30 frames per second. Its resolution is 576 × 1024 pixels, compared to 1024 × 1024 for Lumiere. Other competing solutions include that of Pika Labs.
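The durations follow directly from dividing the frame count by the frame rate. A minimal sketch of that arithmetic, using only the figures quoted above (the function name is made up for the example, and the two Stable Video Diffusion values simply pair the extremes of the quoted ranges):

```python
def clip_duration_seconds(num_frames, fps):
    """Length of a clip in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps

# Lumiere: 80 frames at 16 frames per second.
lumiere = clip_duration_seconds(80, 16)

# Stable Video Diffusion: extremes of the quoted ranges
# (14 to 25 frames, 3 to 30 frames per second).
svd_shortest = clip_duration_seconds(14, 30)
svd_longest = clip_duration_seconds(25, 3)

print(f"Lumiere: {lumiere:.1f} s")
print(f"Stable Video Diffusion: {svd_shortest:.2f} s to {svd_longest:.2f} s")
```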

Lumiere can generate videos from different kinds of input, starting with text-to-video. As with a traditional image generator, the prompt is a simple written description, such as “a dog wearing sunglasses that drives a car”, to take one of the examples from the demonstration video.

Lumiere also supports image-to-video prompts, which generate a video from an image. It is also possible to request stylized videos based on a reference image. Finally, in addition to generating videos, the model can edit existing ones, animating or filling in certain areas, which is less common.

It is not possible to try Lumiere at the moment; it remains a research project. There is only one certainty: in the relatively near future, video generators using artificial intelligence will become as easy to access as image generators.

STUNet model © Google Research

Comparison of TSR (temporal super-resolution) and STUNet models © Google Research

To finish, the diagrams above give details on the TSR and STUNet models. Feel free to consult the source if you want to explore the subject further.

Source: Google Research