Meta admits to having trained its AI with… pirated books!


Camille Coirault

January 15, 2024 at 5:25 p.m.



Llama 2, an outlaw bookworm? © Vasilyev Alexandr / Shutterstock

Meta finds itself at the heart of a controversy. The company admits that it trained its AI models using a set of pirated books. Enough to rekindle the debate over copyright and artificial intelligence.

Tech giants often play a dangerous game when it comes to copyright and AI. Meta is no exception and is currently responding to legal action brought by several authors: yes, the company used part of the Books3 database (which includes many pirated books) to train its Llama models. A rather scandalous revelation, given how vigilantly copyright holders work to ensure their rights are respected.

Books3: a controversial tool at the service of AI

Books3 is a database created in 2020 by Shawn Presser, an AI researcher. It brought together nearly 37 GB of pirated books (around 200,000 works) from the Bibliotik site and was hosted by a collective called The Eye. The idea was to foster innovation in the field of AI.

Meta and others, like OpenAI, have therefore happily drawn from this database to refine their generative AI models. A legally borderline use, which inevitably attracted the attention of publishers and authors.


Meta’s confession to a California federal court. The company used extracts from the Books3 database to train its Llama AI model © Screenshot / Meta

Reaction from rights holders and legal implications

A fairly diverse set of rights holders has therefore fought back against Meta, OpenAI and other companies developing AI models outside the legal framework. They include individual authors, record labels, visual artists and even the New York Times.

The majority of these lawsuits include a component related to piracy and accuse these companies of using protected content without offering adequate compensation. Under pressure from a Danish anti-piracy collective, Rights Alliance, The Eye removed Books3 in the summer of 2023.

Meta’s defense

In a lawsuit filed by Sarah Silverman (comedian, singer and writer), Richard Kadrey (writer) and other rights holders, Meta confessed: it used parts of Books3 to train its two AI models, Llama 1 and Llama 2. However, it denied the other allegations made against it. In its defense, the company invoked fair use, a legal defense that could well tip the scales in its favor.

As contradictory as it may seem, the legal doctrine of fair use allows copyrighted material to be used without permission from the copyright holders. However, it applies only under certain specific circumstances. Meta maintains the following line of defense: it acknowledges having used Books3, but entirely disputes the need to obtain consent or offer compensation for having used these copyrighted works. A position that is frankly questionable, given the significant increase in its revenue over 2023, made possible by the enthusiasm around artificial intelligence.

This legal case pitting AI against copyright is far from the last, at least as long as firm regulations have not defined new ethical and legal standards for the artificial intelligence industry. This set of lawsuits, which could reach the Supreme Court, could have positive repercussions on that front. Well, that is if we look at the problem from an optimistic point of view.

Source: TorrentFreak


