OpenAI raises the issue itself: it is impossible to build generative AI without disregarding copyright


It would be “impossible to train the best AI models without using copyrighted material,” OpenAI admits in a document submitted to a UK inquiry into LLMs, a case closely followed by The Guardian.

The company justifies this theft by pointing out that copyright today covers “virtually all forms of human expression, including blog articles, photographs, forum posts, snippets of software code and government documents.”

It adds that “using training data from public domain books and drawings created over a century ago could make for an interesting experiment, but would not result in AI systems meeting today’s needs.”

“The lawsuit filed by the New York Times is without merit”

In a new blog post titled “OpenAI and journalism,” the company says that it supports journalism, that it works “in partnership with news organizations” and, above all, “that the lawsuit filed by the New York Times is without merit.”

The New York Times filed a complaint against OpenAI on December 27. The newspaper alleges, with supporting evidence, that generative AI models were trained on its articles without regard for copyright. The complaint contains examples in which ChatGPT provided users with “near-verbatim excerpts” of paywalled articles.

“We explained to the New York Times that, like any single source, its content did not contribute significantly to the training of our existing models,” OpenAI argues, stressing that the two organizations had been in negotiations before the complaint was filed.

Regurgitation

Above all, OpenAI maintains that the New York Times texts “regurgitated” by ChatGPT “appear to come from articles several years old which are present on many websites.”

Finally, OpenAI accuses the NYT of intentionally manipulating its prompts, in particular by including long extracts from articles, in order to push ChatGPT into regurgitating the content at issue.

OpenAI also says that it lets publishers block its web crawlers from accessing their websites.

Lawsuits against OpenAI over its respect for intellectual property are multiplying. Recently, two authors filed a complaint on the same grounds.
