The Scholarly Kitchen

What’s Hot and Cooking In Scholarly Publishing

Protecting Commercial AI Rights is Harder than You Think — EU Edition

  • By Roy Kaufman
  • Feb 1, 2024
  • Artificial Intelligence
  • Controversial Topics
  • Copyright
  • Policy
  • Research
  • Technology
  • Tools

In the quaint days of 2019, when the EU issued its Digital Single Market Copyright Directive (DSM), much attention was focused on issues such as news publishers’ rights and the obligations of platforms to take down infringing materials. Outside of STM publishing, it seemed, not many people engaged in discussions around the scope of the text and data mining (TDM) exceptions contained in Articles 3 (non-commercial research) and 4 (commercial research).

Generative AI changed this dynamic. After all, text and data mining is the technological approach by which generative AI systems are trained. As noted in the current draft of the EU’s AI Act, “[t]ext and data mining techniques may be used extensively in this [training] context for the retrieval and analysis of such content, which may be protected by copyright and related rights.” The current draft of the AI Act explicitly requires compliance with the DSM to access the EU market, regardless of the country in which the copyright-relevant acts of training occur.

There are, however, many open questions about the DSM, especially about the rights reservation language in Article 4 for commercial TDM, and they are likely to confound rights holders and AI companies alike.


DSM Articles 3 and 4 Revisited

Article 3 of the DSM, which is similar in scope to the exception that was (and still is) in place in the United Kingdom, then an EU member state, allows research organizations to carry out non-commercial TDM on lawfully acquired content. As research organizations are typically either publishers’ customers or users of content available under open access licenses, STM publishers were generally supportive of this exception.

Article 4, which created a commercial exception subject to rights reservation by the copyright owner, seemed more problematic, given that copyright is otherwise an “opt in” regime in which use generally requires the owner’s permission. However, at the time, and based on conversations I had with EU officials, the law seemed to draw an implicit distinction between professional content, placed on websites owned and controlled by publishers, and non-professional content such as Reddit comments and Facebook posts. My understanding is that the EU saw little harm in expecting the owners of the former to reserve their rights when desired, while those posting the latter were unlikely to care.

Recent lawsuits have increased my concern about this issue, especially now that text and data mining is being used as part of large-scale commercial AI.

Challenges of Rights Reservation

The rights reservation language of Article 4 provides:

  • The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online. (italics added)

In explanatory text, the DSM states:

  • In the case of content that has been made publicly available online, it should only be considered appropriate to reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service. Other uses should not be affected by the reservation of rights for the purposes of text and data mining. In other cases, it can be appropriate to reserve the rights by other means, such as contractual agreements or a unilateral declaration. Rightholders should be able to apply measures to ensure that their reservations in this regard are respected. (italics added)

This language leaves many questions unanswered. What does “machine readable” mean in this context? After all, the TDM exception exists to allow very smart machines to “read” and process information, so isn’t anything on a website “machine readable”? What level of granularity is required under DSM Article 4? Is a copyright notice sufficient? What about the words “all rights reserved”? Would it be enough to include “CC BY-NC” in metadata fields? Or does the notice need to state “commercial rights are expressly reserved under Article 4 of the DSM”? The ambiguity is troubling.
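To make the distinction concrete, here is a minimal Python sketch of how a crawler might separate a structured reservation flag from free-text notices. It assumes the `tdm-reservation` HTML meta tag proposed in the W3C community group's TDM Reservation Protocol (TDMRep) draft; that convention is not named in the DSM, and nothing here suggests a court would treat it as sufficient under Article 4. The function `reservation_signals` and the sample pages are hypothetical. The point is only that a structured flag can be tested with an equality check, whereas deciding what “All rights reserved” or “CC BY-NC” actually reserves remains the legal question raised above.

```python
from html.parser import HTMLParser
from typing import List, Optional


class ReservationScanner(HTMLParser):
    """Records a TDMRep-style <meta name="tdm-reservation"> value and the page text."""

    def __init__(self) -> None:
        super().__init__()
        self.tdm_reservation: Optional[str] = None
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; value is None for bare attributes.
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "tdm-reservation":
                self.tdm_reservation = (attr_map.get("content") or "").strip()

    def handle_data(self, data):
        self.text_parts.append(data)


def reservation_signals(page_html: str) -> dict:
    """Report which kinds of reservation signal a page carries (illustrative only)."""
    scanner = ReservationScanner()
    scanner.feed(page_html)
    scanner.close()  # flush any buffered text
    text = " ".join(scanner.text_parts).lower()
    return {
        # Structured flag: testable with a simple equality check.
        "structured_flag": scanner.tdm_reservation == "1",
        # Free-text notices: easy to detect, but their legal scope is the open question.
        "all_rights_reserved": "all rights reserved" in text,
        "cc_by_nc": "cc by-nc" in text,
    }


flagged = ('<head><meta name="tdm-reservation" content="1"></head>'
           "<body><p>All rights reserved.</p></body>")
plain = "<p>© 2024 Example Press. All rights reserved.</p>"
print(reservation_signals(flagged))
# {'structured_flag': True, 'all_rights_reserved': True, 'cc_by_nc': False}
print(reservation_signals(plain))
# {'structured_flag': False, 'all_rights_reserved': True, 'cc_by_nc': False}
```

Both sample pages carry the words “All rights reserved”, yet only one carries a flag a machine can act on without interpretation.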

Where is the Content?

Even the foregoing unanswered questions assume the content is in the control of the rights owner. There are many situations in which this is not true.

First, there is pirated content. It has been well documented that some AI companies have trained systems on unlawfully obtained datasets. Would an EU-based court hold that the absence of rights reservation language on pirated content means that the rights have been waived? That is highly unlikely, so let’s move to the next category.

Second, content may be lawfully posted online over the objections of the copyright owner. For example, in the recent case Am. Soc’y for Testing and Materials v. Public.Resource.Org, Inc., 82 F.4th 1262 (D.C. Cir. 2023), the U.S. Court of Appeals for the District of Columbia Circuit ruled that the non-commercial posting of standards incorporated by reference into law is fair use. It is safe to assume that an entity posting standards over the objections of their copyright owners will not take steps to reserve those owners’ commercial AI rights in the EU. Would an EU-based court hold that the failure to reserve rights on a “non-commercial” website, where the content is posted over the copyright owner’s objections, constitutes a waiver? Doubtful, but murky.

Let’s take this further. What about preprint servers? Today, many journal publishers allow authors to post preprints of their manuscripts on preprint servers, even though copyright is often subsequently transferred to the publisher. Does the preprint server need to expressly reserve TDM rights, or is it enough that they are reserved on the version of record? How would an AI company know that the preprint and the version of record are the same work? Similar questions arise for other aggregation sites such as PubMed Central and institutional repositories.
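The preprint problem can be sketched with Python's standard `urllib.robotparser`, assuming, purely for illustration, that a rights reservation were expressed through robots.txt directives aimed at a particular crawler. The hosts, paths, and the `ExampleAIBot` user agent are hypothetical, and whether robots.txt counts as an Article 4 reservation at all is itself unsettled. Because each host publishes its own file, a compliant crawler gets different answers for what may be the same manuscript, depending on where it finds it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files for two hosts carrying the same manuscript.
PUBLISHER_ROBOTS = """\
User-agent: ExampleAIBot
Disallow: /
"""

PREPRINT_ROBOTS = """\
User-agent: *
Allow: /
"""


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in hand (no network access)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# Hypothetical URLs for the version of record and the preprint.
HOSTS = {
    "https://publisher.example/article/123": parser_for(PUBLISHER_ROBOTS),
    "https://preprints.example/ms/456": parser_for(PREPRINT_ROBOTS),
}


def crawl_decisions(user_agent: str) -> dict:
    """Return, per URL, whether that host's robots.txt lets user_agent fetch it."""
    return {url: rp.can_fetch(user_agent, url) for url, rp in HOSTS.items()}


print(crawl_decisions("ExampleAIBot"))
# {'https://publisher.example/article/123': False, 'https://preprints.example/ms/456': True}
```

Nothing in either file tells the crawler that the two URLs hold the same work; making that connection is exactly the difficulty described above.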

Will this Change?

Legislative changes, like lawsuits, are often a lagging indicator of the times. In 2019, EU legislators seemed focused on the commercial and non-commercial research aspects of TDM. They were likely not worried that well-funded commercial entities would develop AI systems through mass infringement while ignoring Article 4 rights reservations, nor did they seem focused on how copyright-compliant AI companies would identify reservations for content posted on multiple sites.

In an ideal world the EU would revisit Article 4, but that is unlikely to happen. Until such time, rights owners should reserve AI rights as explicitly and as granularly as possible, using both machine- and human-readable language, and should require licensees who republish their content online to do the same. And with the AI Act removing any ambiguity about whether DSM compliance is required, AI companies seeking to train on copyrighted content would do best to license content directly from rightsholders. Relying on the absence of rights reservation language is risky unless the AI developer is absolutely certain that it is using an official version.
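One way to pair machine- and human-readable language is to generate both from the same set of rules, so the two cannot drift apart. The sketch below assumes the `/.well-known/tdmrep.json` file format from the W3C TDMRep community group draft, with `location`, `tdm-reservation`, and `tdm-policy` field names taken from that draft; the paths, the policy URL, the `Reservation` class, and the notice wording are all hypothetical and are not a tested legal formulation. Check the current TDMRep report before relying on the field names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    location: str                     # path pattern on the site, e.g. "/journals/*"
    reserved: bool                    # True if TDM rights are reserved for that path
    policy_url: Optional[str] = None  # hypothetical page describing licensing terms


def tdmrep_file(rules: List[Reservation]) -> str:
    """Serialize rules into a TDMRep-style tdmrep.json body (machine-readable)."""
    entries = []
    for rule in rules:
        entry = {"location": rule.location, "tdm-reservation": 1 if rule.reserved else 0}
        if rule.policy_url:
            entry["tdm-policy"] = rule.policy_url
        entries.append(entry)
    return json.dumps(entries, indent=2)


def human_notice(rule: Reservation) -> str:
    """Render the same rule as a plain-language notice (illustrative wording only)."""
    if not rule.reserved:
        return f"Content under {rule.location}: no TDM reservation is made."
    notice = (f"Content under {rule.location}: text and data mining rights, "
              "including for AI training, are expressly reserved pursuant to "
              "Article 4(3) of Directive (EU) 2019/790.")
    if rule.policy_url:
        notice += f" Licensing terms: {rule.policy_url}"
    return notice


rules = [
    Reservation("/journals/*", True, "https://publisher.example/tdm-licensing"),
    Reservation("/news/*", False),
]
print(tdmrep_file(rules))
for rule in rules:
    print(human_notice(rule))
```

Keeping one list of rules as the single source means a change in reservation scope updates the file and the notice together, and the same generator could be handed to licensees who republish the content.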

Roy Kaufman

Roy Kaufman is Managing Director of both Business Development and Government Relations for the Copyright Clearance Center (CCC). Prior to CCC, Kaufman served as Legal Director, John Wiley and Sons, Inc. He is a member of, among other things, the Bar of the State of New York, the Authors Guild, and the editorial board of UKSG Insights. Kaufman also advises the US Government on international trade matters through membership in the International Trade Advisory Committee (ITAC) 13 – Intellectual Property. He additionally serves on the Library of Congress’s Copyright Public Modernization Committee and on the Board of the United States Intellectual Property Alliance (USIPA).


Discussion

1 Thought on "Protecting Commercial AI Rights is Harder than You Think — EU Edition"

Thank you for this great article. Very useful. It reminded me of: “For some years, the Internet Archive did not crawl sites with robots.txt, but in April 2017, it announced that it would no longer honour directives in the robots.txt files.[21] “Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes”.[22] This was in response to entire domains being tagged with robots.txt when the content became obsolete.[22]” Source: https://en.wikipedia.org/wiki/Robots.txt

  • By Emanuel Raymond
  • Feb 1, 2024, 7:36 AM


Official Blog of:

Society for Scholarly Publishing (SSP)


The mission of the Society for Scholarly Publishing (SSP) is to advance scholarly publishing and communication, and the professional development of its members through education, collaboration, and networking. SSP established The Scholarly Kitchen blog in February 2008 to keep SSP members and interested parties aware of new developments in publishing.

The Scholarly Kitchen is a moderated and independent blog. Opinions on The Scholarly Kitchen are those of the authors. They are not necessarily those held by the Society for Scholarly Publishing nor by their respective employers.

ISSN 2690-8085