Alibaba unveils an AI that makes any portrait sing realistically

Generative AI models now touch every domain: text with ChatGPT, video with Sora, images with Midjourney. The technology is even tackling music generation, as shown by Adobe's recently launched Project Music GenAI Control. Meanwhile, Alibaba is unveiling a stunning new tool in China: EMO.

EMO can make every portrait sing

In a research paper dated February 27, 2024, Alibaba's research arm unveils its AI model, called EMO, which can transform photos into video clips. Put simply, you give it a portrait and, through "advanced audio and video synthesis", the person in the photo starts singing.

The proof is in the videos: the model has been used to make Joaquin Phoenix sing as the Joker from the feature film of the same name, along with Leonardo DiCaprio, Audrey Hepburn and even the Mona Lisa.

What makes EMO impressive is that it doesn't just move lips: facial expressions, blinking and lip syncing are all very realistic. Alibaba is pleased with its technology and test results. For the company that owns AliExpress, the videos are "convincing".

An AI trained on an audio-video dataset

To train EMO, Alibaba's research arm explains that it relied on an audio-video dataset comprising 250 hours of content and more than 150 million images. By linking audio data to facial movement information, the AI can generate realistic facial expressions.

As Stability AI did when launching Stable Diffusion 3, Alibaba says it is aware of the ethical problems EMO could raise. This AI does not escape legitimate fears around disinformation and the misuse of third parties' likenesses, for example in an electoral context. The company has also committed to developing methods for detecting fake videos such as those generated by its tool.

