YouTube warns OpenAI not to steal its videos to train Sora


YouTube boss Neal Mohan does not know whether OpenAI uses videos hosted on the platform to train Sora, its AI that generates video clips. But if it does, that would violate YouTube's rules.

It's a remark that will undoubtedly be quoted and commented on in the future, especially when the conversation returns to the protection of intellectual property and the training of artificial intelligence. It is also a statement that risks blowing up in Google's face during the next controversy on the subject.

It all started with an exchange on April 4, 2024 between Bloomberg and Neal Mohan, the current boss of YouTube. During the discussion, Mohan was asked to comment on a particular scenario: what would his position be if it turned out that a third-party company was using YouTube videos to train an AI system?

This would be a clear violation of the platform's rules, he said. When creators upload their work to the platform, they have certain expectations. One of these expectations is that the terms of service are respected, and those terms do not allow things like transcripts or video clips to be downloaded.

This remark does not come out of nowhere. It follows an interview that Mira Murati, OpenAI's chief technology officer, gave to the Wall Street Journal in mid-March. At the time, the American company, well known for its ChatGPT chatbot, was at the heart of the news with Sora, its generative video AI project.

The interview inevitably touched on how the model behind Sora was trained. Mira Murati's answers on this subject, however, turned out to be vague and uncertain. Officially, OpenAI's chief technology officer was not sure which sources had been used to train Sora.

YouTube says it doesn’t know if OpenAI uses “its” videos to train Sora

Neal Mohan said he had no specific indication as to whether YouTube is used in OpenAI's strategy for Sora. From a strictly technical point of view, and setting aside YouTube's rules of use, the idea is not far-fetched: YouTube is one of the biggest video platforms on the internet, if not the biggest.

Neal Mohan's remarks nevertheless drew mocking comments online, with some pointing out the discrepancy between his position and the behavior of Google, YouTube's parent company, when it comes to training its own AI. This tweet, published on April 4, reflects that view:

"Google to publishers – we may use your content to train our search engines and AI; Google to OpenAI – you can't use YouTube to train your AI."

Sora, the AI that transforms text into video // Source: OpenAI
An example of a video generated by Sora, converting text into a clip. // Source: OpenAI

Access to protected content, a challenge for generative AI

This criticism of Google comes in a context where the Mountain View firm has itself been accused of exploiting content protected by intellectual property to train its artificial intelligence tools, such as Gemini. The accusation notably featured in a sanction imposed in France by the Competition Authority.

Google, however, is working to sign agreements with publishers so that it has a legal framework within which to use this data. Beyond that, the group has also approached other major sources of information, notably Reddit (which hosts numerous posts by internet users) and Stack Overflow (for computer code).

Ultimately, one question remains: if OpenAI theoretically does not have the right to use YouTube to train Sora, does Google apply the same rule to Gemini? Neal Mohan said yes, while specifying that this is done in accordance with YouTube's rules or through agreements signed individually with certain creators.

