Comedian Sarah Silverman was among the first to sue the makers of generative AI tools like ChatGPT for essentially stealing her book, but the problem is far more extensive.
Authors like Stephen King and Margaret Atwood also found their work in ChatGPT’s training set.
Thousands of the world’s most popular authors have found their work fed into the training data for generative AI tools like ChatGPT from OpenAI and LLaMA from Meta, Facebook’s parent company.
Among the affected authors are popular contemporary writers like Stephen King, Margaret Atwood, and Jon Krakauer; in all, more than 170,000 pirated books were fed into the algorithms.
“Books3 was used to train Meta’s LLaMA, one of a number of large language models – the best-known of which is OpenAI’s ChatGPT – that can generate content based on patterns identified in sample texts. The dataset was also used to train Bloomberg’s BloombergGPT, EleutherAI’s GPT-J and it is ‘likely’ it has been used in other AI models.
The titles contained in Books3 are roughly one-third fiction and two-thirds nonfiction, and the majority were published within the last two decades. Along with Smith, King, Cusk and Ferrante’s writing, copyrighted works in the dataset include 33 books by Margaret Atwood, at least nine by Haruki Murakami, nine by bell hooks, seven by Jonathan Franzen, five by Jennifer Egan and five by David Grann.
Books by George Saunders, Junot Díaz, Michael Pollan, Rebecca Solnit and Jon Krakauer also feature, as well as 102 pulp novels by Scientology founder L Ron Hubbard and 90 books by pastor John MacArthur.
The titles span large and small publishers including more than 30,000 published by Penguin Random House, 14,000 by HarperCollins, 7,000 by Macmillan, 1,800 by Oxford University Press and 600 by Verso.”
So far, no publisher has announced plans to sue a generative AI company over this news, but we will update this story if that changes. As with Silverman’s lawsuit, OpenAI has not yet commented.