GPT-4: OpenAI reportedly used more than a million hours of YouTube videos to train its AI


Samir Rahmoune

April 8, 2024 at 3:13 p.m.


The OpenAI logo displayed next to the face of the firm's boss, Sam Altman © Meir Chaimowitz / Shutterstock


OpenAI is always on the lookout for new data to train its language models, and Sam Altman’s firm appears to have found a major source on YouTube, drawing heavily on the platform!

If artificial intelligence systems like ChatGPT seem so exceptional to us, it is because for several years they have ingested enormous quantities of data, which now allows them to generate a remarkable volume of content, often of high quality. The problem is that the amount of data available for training is finite, so companies in the sector must get creative to find new sources. This appears to be what OpenAI did by turning to YouTube!

OpenAI turned to YouTube

The New York Times has been in open conflict with OpenAI for many months, so if the famous American newspaper can find potentially embarrassing information about the firm headed by Sam Altman, it will not hesitate to publish it. That is what it did in recent days, revealing that OpenAI reportedly collected more than 1 million hours of YouTube videos to develop its GPT-4 language model.

To do this, the Californian company reportedly used its Whisper tool, which can transcribe the audio of videos into text, to obtain the content in written form so that GPT-4 could ingest it. According to the other major American newspaper, the Wall Street Journal, the giants working on AI are currently running short of quality data to improve their systems.

YouTube reportedly contributed a great deal

According to Google, companies may not train on YouTube data

The New York Times reports that by 2021, OpenAI had exhausted the quality data available to train its AI. Discussions had reportedly already begun at that time about turning to alternative sources such as videos, audiobooks, and podcasts. That is ultimately what happened when the company turned to YouTube.

Contacted by The Verge, Google, the parent company of YouTube, said it had seen “unconfirmed reports” of OpenAI activity on its platform. Spokesman Matt Bryant also pointed out that “our robots.txt files and terms of service prohibit scraping or unauthorized downloading of content from YouTube.” Could a new legal front soon open for OpenAI?

Source: Engadget

Samir Rahmoune

Tech journalist, specializing in the impact of high technologies on international relations. I am passionate about all the new developments in the field (Blockchain, AI, quantum...), energy issues, and astronomy. Often one foot in Asia, and always ready to put on the gloves.



