As a person involved in copyright on a daily basis, I’ve observed a number of events and requests for comment over the last few years on the issue of whether artificial intelligence (AI) systems can be “authors” in the copyright sense (or inventors of patents). I can see the appeal of this question, as it is fundamentally interesting and futuristic. I have often felt, however, that these issues were a bit of a misdirection, with at least part of the tech community treating the copyright community like dogs distracted by squirrels. After all, while we are pondering the weighty issue of future ownership, we are not focusing on the fundamental issue of wholesale copying of works to train AI in a wide variety of situations.
This, of course, could be an accident based on true intellectual curiosity, but I do not believe it. Regardless, as of this writing there are now five cases that may provide some clarity on this less frequently discussed but foundational issue of the unauthorized use of copyrighted materials as training data for AI (I use “AI” here as a shorthand which also includes text and data mining and machine learning). Each of these cases is unique, fact-dependent, and likely, if fully litigated on the merits, to shed light on different aspects of copyright law. Below are my thoughts on what is interesting about these cases. Please note that this is in no way meant to be a comprehensive analysis of the lawsuits.
Case 1- Doe 1 v. GitHub Inc., N.D. Cal., No. 3:22-cv-06823 – Whither transformative?
As I mentioned in my January 5, 2023 post on this case, plaintiff’s attorneys filed this class action under the theory that using openly licensed code without retaining the license and credit language is a violation of the Digital Millennium Copyright Act (DMCA), among other things, but did not allege the obvious claim of copyright infringement per se. I speculated that this was an attempt to avoid a messy fair use dispute.
As I also mentioned, Microsoft’s lawyers seem to think that fair use excuses copying for AI purposes everywhere, so I would expect Microsoft to try that defense here, given its lack of other arguments. One core concept in the AI-relevant cases, whether they find for fair use (Google Books) or against it (Fox v. TVEyes), is the Defendants’ reliance on claims of “transformative use.” “Transformative use” is not mentioned in Section 107 of the Copyright Act but has been read into the first of the four fair use factors. It is somehow different from the right to make transformative derivative works (where the word “transformed” is used in Section 101) such as film adaptations of books, which clearly require copyright owner consent. If you are confused by the difference between transformation that excuses infringement and transformation that is the exclusive right of the creator, welcome to my world.
As a lawyer by training, I am interested in the art of lawyering. If Microsoft defends on the basis of fair use, assuming it is even relevant to the DMCA claim, it will of course want to assert that the use is “transformative.” Unlike scanning books to perform semantic analysis on the evolution of language (found transformative in Google Books), the functional code alleged to be copied by Microsoft, et al., is being used as code. I look forward to the creativity that will be on display.
Case 2- Andersen, et al. v. Stability AI Ltd., et al. – Real market harm
Filed January 13, this class action pits illustrators against generative AI companies who, according to the Complaint, used images without permission as training data and allowed people to create works in their “style” without compensation. The Complaint is well drafted, and while I wouldn’t have filed it in the Northern District of California, given that court’s reputation as being unsympathetic to copyright holders, it is worth reading for its detailed but clear explanation of the technology, seemingly sincere accusations of “betrayal” by Defendant DeviantArt, and the addition of a Lanham Act claim.
It is clear from the Complaint that the Plaintiffs are expecting a fair use defense based almost entirely on the issue of transformativeness, and as a result break down the technology in a manner which shows why output from Defendants, in their telling, merely creates “unauthorized derivative works.” Given that the infringements alleged are commercial (Factor 1), the infringed works are highly creative (Factor 2), the works were copied in their entirety (Factor 3), and the infringing output seems to compete in the market with the originals (Factor 4), Defendants need to somehow hit transformative out of the park. Even then, they may still lose as happened to the Defendants in the TVEyes case, where a finding of transformative use did not overcome liability.
I will mention, however, that Cases 1 and 2 are both class action lawsuits, and class actions are strange beasts with complicated rules which often yield unusual results. This raises the possibility that the courts might not get to the substantive copyright issues at play.
Case 3- Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. Ross Intelligence, Inc. – (Some) answers coming soon
This case, which involves the alleged surreptitious copying of the entire Westlaw database (after having been denied a license) in order to create an allegedly competing product, is already significant in that the Complaint survived a motion to dismiss. In other words, alleging infringement by making copies for training purposes, even where the competing product does not itself display the copyrighted content, states an actionable claim.
As the case approaches its three-year birthday, we are now in the summary judgement phase. The Defendant argues (1) that the breach of contract claim (essentially downloading in violation of the terms and conditions) is preempted by copyright law, and (2) that the copying was fair use. I don’t see the court buying the preemption argument, so we may get an on-point fair use ruling. Summary judgement can only be granted if there is no dispute as to material facts and therefore no fact finder could legally rule against the moving party. In other words, summary judgement is only granted if the law is settled and the parties aren’t arguing over the facts at issue but, instead, dispute how the law should apply to the facts both parties agree are true. Given the bad faith alleged against the Defendant, the wholesale copying, and the competition between the products, summary judgement seems unlikely. Regardless of who prevails on summary judgement, a court decision on the issue will have a ripple effect on the other US cases discussed here.
Cases 4 and 5- Getty Images v Stability AI – Clean facts in two jurisdictions
Getty Images has filed two parallel cases as of this writing: one in the US and one in the UK. I know little about the UK case other than what is in this press release. That does not, however, diminish my excitement. While US law on training data and AI may be complex (e.g., trying to square Google Books with TVEyes; trying to square the definition of transformative under Section 107 with transformative under 101), UK law is clear. The UK was in the vanguard of creating a non-commercial research exception for TDM, and, as I wrote in the Scholarly Kitchen last July, the UK Intellectual Property Office recently mooted an expansion to commercial use. This proposed expansion of the exception was recently rejected by the UK government. In other words, in the UK, there is a copyright exception for non-commercial research, everything else requires a license, and there is little if any ambiguity.
While UK law arguably offers more certainty, the US offers statutory damages. In the US case, Getty alleges millions of works were copied by Stability AI, and it specifically cites 7,216 works for which it has copyright registrations. Minimum statutory damages are $750 per work infringed; maximum damages are $150,000 per work if found to be willful. Thus, as long as infringement is found, minimum damages are $5,412,000 and maximum are $1,082,400,000, plus possibly an award of attorneys’ fees.
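For readers who want to check the arithmetic, the range above can be reproduced in a few lines. This is only a sketch of the multiplication using the figures quoted in the preceding paragraph; the function and variable names are my own, and it assumes every one of the 7,216 registered works is found infringed and ignores attorneys' fees and any judicial discretion between the two extremes.

```python
# Figures quoted in the text above; names are illustrative only.
REGISTERED_WORKS = 7_216          # works Getty cites with copyright registrations
MIN_PER_WORK = 750                # minimum statutory damages per work, in dollars
MAX_WILLFUL_PER_WORK = 150_000    # maximum per work if infringement is willful


def statutory_damages_range(works: int, low: int, high: int) -> tuple[int, int]:
    """Return (minimum, maximum) total damages, assuming every work is infringed."""
    return works * low, works * high


minimum, maximum = statutory_damages_range(
    REGISTERED_WORKS, MIN_PER_WORK, MAX_WILLFUL_PER_WORK
)
print(f"Minimum: ${minimum:,}")
print(f"Maximum: ${maximum:,}")
```

The spread between the two totals is driven entirely by the per-work figure, which is why a finding of willfulness matters so much in a case with thousands of registered works.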
Almost as interesting is the trademark/Lanham Act claim. In the Complaint, Getty is including images which show AI-generated distortions of Getty’s trademarks and watermarks on images created by the Defendant’s system, presumably trained using Getty works without consent from Getty. This will be hard to defend.
If Case 2 were brought in a jurisdiction that recognized more traditional moral rights, that would provide another basis for a claim. Will a lawsuit in an EU jurisdiction be next?
What does this mean for the future of AI?
These cases are not about the future of AI itself, and even if all of the Defendants are found liable, AI innovation will not cease. While training AI usually involves large data sets, significant AI innovation occurs today by virtue of tech companies (and others) using large datasets licensed by entities such as Getty, STM publishers, and news outlets, among others.
These cases are not against AI. Rather, they will determine whether those who create works have a voice in the use of those works by commercial entities, some of whom compete with the original creators. As such, innovation through AI is not at risk, but these cases may have a long-term impact upon the rules governing reuse of copyrighted, valuable and reliable inputs and the incentives of ongoing creation.
Discussion
4 Thoughts on "Some Thoughts on Five Pending AI Litigations — Avoiding Squirrels and Other AI Distractions"
A bit of sleight of hand here on one point: “the infringing output seems to compete in the market with the originals (Factor 4)” I notice that you didn’t say “the copy seems” because you have to admit the copy itself is not competing at all. Most people, especially in academe where plagiarism and copyright are often confused, don’t really understand what it means that you cannot copyright an idea, just a specific expression of that idea. Creating works in the “style” of someone else that are not actual copies of the work done by that person is much more akin to using an uncopyrightable idea than it is doing any actual infringing copying. I think the courts have been pretty clear that ingesting copyrighted works without permission into some kind of database/software for the purpose of outputting something that is not even close to a direct copy of what went in is indeed Fair Use. The coders have to make darn sure that their algorithms prevent the program from inadvertently outputting something that is close to identical to an original, which I think I heard happened with one of the art cases, but the “style” argument should go nowhere under copyright law (perhaps under trademark law).
Melissa, you write: “I think the courts have been pretty clear that ingesting copyrighted works without permission into some kind of database/software for the purpose of outputting something that is not even close to a direct copy of what went in is indeed Fair Use.” That appears to be a kind of ChatGPT hallucination. What courts have held that to be a fair use? Are you thinking of Google Books? If so, I think you are mistaken. In Google Books, the reproduction was done for the purpose of facilitating research. Mind you—I’m no fan of the Google Books decision and think that it was poorly judged, but it’s certainly not determinative here in a case where images are reproduced at scale and “transformed” into new images. The output may be transformative, but that doesn’t excuse the mass reproduction that is likely to have a transformative economic effect on the interests of creators whose works were used without their consent to generate downstream production.
Perhaps you will find this interesting:
“Coming back to the premise of this piece, there are many complicated cases in which competing equities place the doctrine of fair use front & center, and in which parties may reasonably disagree (or agree) about where & how to draw appropriate boundaries in pursuit of achieving the Constitutional purpose of copyright. To my mind, the unauthorized use of copyright works to train AI is not such a case. And by this, I don’t mean that there are no legal arguments that may be made to justify such use. I think there are, although I think they are weak. But I do think that there are no moral arguments to be made to justify this unprecedented appropriation. That new transformative works may be made from a database developed on the back of conscripted labor may have some resonance in law. However, it fundamentally fails the humanity test.”
https://medium.com/@nturkewitz_56674/the-fair-use-tango-a-dangerous-dance-with-re-generative-ai-models-f045b4d4196e
The Google Books case, yes but there is also a case mentioned in the post, Fox News v. TVEYES, Inc., 43 F. Supp. 3d 379 (S.D. N.Y. 2014). While the overall finding was against Fair Use, the discussion in it about transformative aspects (and what TVEyes failed to do in that regard) suggests to me that what the AIs are doing would pass that hurdle as long as they don’t accidentally reproduce the original work. That is just a 2nd district decision, though, not SCOTUS. I’m not a lawyer though, so we’ll have to let the lawyers decide.
Thanks for the reply. TV Eyes is, in my view (pun intended) very unlikely to be cited by lawyers arguing that mass ingestion of copyright works for the purpose of training AI falls under fair use. More likely to be cited by plaintiffs. In any event, very far removed from the proposition that courts have held that training AI with copyright works in the absence of consent is a fair use.