3 seconds of recording is enough for this Microsoft AI to copy your voice


Artificial intelligence is causing a stir at Microsoft: the firm has developed a tool called “Vall-E” that can create a replica of a voice from a three-second recording. Beyond simply reproducing a voice, this AI can also reproduce emotions.

Source: Turag Photography via Unsplash

At the start of 2023, the trend is undeniably towards artificial intelligence and automatic generation tools. Microsoft, for its part, has created its own DALL-E 2 and would like to integrate ChatGPT into Bing to compete with Google. The company would also like to invest 10 billion dollars in OpenAI in order to integrate AI tools into the Office suite. This busy start to the year is not over yet: with Vall-E, Microsoft can reproduce a human voice from just three seconds of recording.

Vall-E: Microsoft’s artificial intelligence that can reproduce a voice

A few days ago, Microsoft published a scientific paper presenting “a language modeling approach for text-to-speech synthesis”. It describes a text-to-speech tool that does not just turn text into a robotic voice created from scratch, but into a voice modeled on a real human voice. The developers say they trained their model on 60,000 hours of English speech. According to them, this is “hundreds of times more than existing systems”.

Diagram of how Vall-E works // Source: Microsoft

With these capabilities, Vall-E “can be used to synthesize high-quality personalized speech with only a 3-second recording of an unknown speaker as an acoustic prompt”. Words can therefore be spoken by a voice whose owner never actually said them. In addition, the tool can “preserve the emotion of the speaker and the acoustic environment of the acoustic prompt in the synthesis”.

Obviously, the more samples there are, the more accurate the recreated voice. The recordings generated and published by Microsoft are not all convincing, but they were produced from just three seconds of recording. With more samples, one can imagine that the AI would perform better.

What can this voice-replicating speech synthesis be used for?

In the presentation of Vall-E, some possible uses were detailed: “VALL-E directly enables various speech synthesis applications, such as TTS (text-to-speech), voice editing, and content creation, in combination with other generative AI models like GPT-3”.

However, Vall-E could be used for less honest purposes. For several years now, deepfake technology has been becoming more widely accessible: it consists of modifying videos or images to attach a person’s face to a body that does not belong to them, with the aim of deceiving. Although Vall-E is not available at the moment, Microsoft has not put anything in place to prevent these problems.

The developers suggest that “speech editing models should be accompanied by relevant components, including a protocol to ensure that the speaker agrees to the editing and a system to detect edited speech”.

An explanatory diagram about Dall-E // Source: OpenAI

The tool exists and the demonstrations are encouraging, but Microsoft’s biggest challenge is not technical: it is ethical. Public figures, some of whom are already victims of deepfakes, would naturally be the most affected. Moreover, one can imagine Vall-E being used alongside a video deepfake tool to create scandalous fake videos.

Vall-E could also very well be used to impersonate someone on the phone. And just as automatic image generation AIs do for artists, Microsoft’s tool could endanger the jobs of many people: voice-over professionals, dubbing professionals, and so on.

Everyone is in the race for generative AI

At the same time, other automatic generation tools are under development. A few weeks ago, OpenAI, the company behind ChatGPT, presented Point-E, a tool for generating 3D models. Microsoft is far from being the only tech giant in the game: Meta is able to create videos from text, and Google is working hard to develop AI-based tools.

Result for “An astronaut riding a horse in a photorealistic style” // Source: OpenAI

Apple has gone even further, since the company is selling a series of audiobooks with an artificial, AI-generated narrator. In the video game High On Life, a character was even voiced by an AI.
