These authors are suing OpenAI and Meta for copyright infringement


Sarah Silverman speaking in New York on May 5, 2022.

US authors Sarah Silverman, Richard Kadrey and Christopher Golden have announced that they are suing Meta and OpenAI in a pair of copyright infringement lawsuits.

They claim that they never consented to their copyrighted books being used as training material for the large language models (LLMs) behind OpenAI’s ChatGPT and Meta’s LLaMA.

An LLM is a type of artificial intelligence algorithm trained on massive amounts of text from books and the internet to learn language patterns, grammar and context until it can generate human-like text and hold chat interactions with users.

Models trained on pirate sites

According to the complaints filed, the models “remix the copyrighted works of thousands of book authors — and many more — without consent, compensation, or credit.”

Copyright infringement has been one of the many concerns of AI critics since ChatGPT became widely available in November 2022, sparking the generative AI boom and raising questions about how AI will affect creativity and copyright.

The lawsuits claim that the LLMs were trained on illegally acquired materials, such as those found on so-called “shadow library” websites. The complaint against OpenAI states:

“OpenAI’s Books2 dataset can be estimated to contain about 294,000 titles. The only ‘internet-based book corpuses’ that have ever offered so much material are notorious ‘shadow library’ websites such as Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub and Bibliotik. The books collected by these sites are also available in bulk via torrent systems.”

The complaint against Meta makes similar statements. It refers to the sources from which the book training data was collected and divides them into two. The first is Project Gutenberg, an online archive of out-of-copyright books in the public domain. The second is the “Books3 section of ThePile”, a dataset available on the popular AI project hosting site Hugging Face, which appears to be drawn from the aforementioned Bibliotik collection.

The plaintiffs are represented by the same attorneys as authors Mona Awad and Paul Tremblay, who filed a copyright infringement lawsuit against OpenAI in June.


Source: “ZDNet.com”


