Generative AI Challenges Copyright, Bends It to the Breaking Point

The purpose of copyright law, as stated in the U.S. Constitution, is to promote the progress of science and the useful arts. Copyright protection promotes a competitive market for creative works and a diverse, rich culture for the public. As technology advances, the scope of copyright has been continually debated and revised. With the emergence of generative artificial intelligence (AI), the question has been raised again, this time with regard to AI's use of copyrighted material for training and whether the public benefits from that use.

Generative AI refers to systems that can produce novel, original content: images, art, and music, as well as text, poetry, and books. Text-generating AI typically relies on large language models trained on massive amounts of text data to mimic and create content. Several lawsuits have recently been brought against AI companies, claiming that they used copyrighted books to train their models and thereby violated copyright law.

The lawsuits claim that this use constitutes "systematic theft on a mass scale," with hundreds of millions of dollars at stake. AI companies respond that their models learn from the texts and produce original, transformative work, and are therefore not in violation of copyright. Meta argued in a court filing that using texts to train its AI model "is transformative by nature and quintessential fair use." The fundamental question these lawsuits raise is whether AI companies' use of copyrighted material provides a net public benefit, which is the goal of copyright law.

On one side, tech companies argue that AI products will make knowledge more accessible. On the other, plaintiffs argue that AI will reduce the incentive to share knowledge in the first place, pointing to the way AI outputs are often presented without citation or attribution, harming the health of research communities and the incentives for cultivating and sharing expertise. They also dispute the claim that AI training is transformative fair use, distinguishing it from Authors Guild v. Google. In that case, the judge ruled that Google's scanning of millions of books was fair use because the resulting product served primarily as a research tool, placed strict limits on how much copyrighted text it revealed, and had a purpose different from that of the books used to build it.

The argument that AI training is fair use because it transforms the original works does not clearly apply to generative AI products such as ChatGPT, which often serve a purpose similar to that of the books and artworks they are trained on and, in some cases, can replace the purchase of a book or the commission of an illustration. In a motion to dismiss The New York Times' lawsuit against OpenAI, the company insisted that ChatGPT is a tool for "efficiency" and "not in any way a substitute" for a subscription to the paper. That argument may not hold up in court, however: in some cases, AI models have been shown to reproduce their training text verbatim, which could be considered plagiarism. Although these cases have yet to be decided, they raise a fascinating and important debate about the role of copyright law in the age of AI, and whether it is still fit for purpose.