In this post I attempt to reveal inherent conflicts in our drive to be as open as possible, authors’ need to understand their rights, and a library’s mandate to provide their patrons with the enhanced discovery that comes with AI’s large language models (LLMs).
In my view, authors should be enabled to make decisions about how their content is used. In our current world of Creative Commons licensing, content can be reused, but not without attribution. What does this mean for publishers and libraries under immense pressure to provide LLM services, but perhaps not fully grasping the ramifications of their actions? Are there ethical considerations at play that may help guide our approach?
Many of us are not shy about expressing our opinions on AI. Some of these opinions are born out of fear – which is often a response to feeling a lack of control. AI seems to be a pressing inevitability, and we fear that we will be subsumed – our content, our rights, our jobs – all that we know. Of course, fear in response to the unknown is natural, and it is so much harder to grapple with potential positive outcomes from AI if we are collectively in a state of shock.
I work at the American Mathematical Society, a 30,000-strong membership society representing mathematicians, from student to researcher, from educator to professional. Mathematicians care deeply about their endeavor, and indeed are fueling much of the evolution of AI technology itself. They are also deeply invested in rigor – the rigor of a mathematical proof, the elegance of expression, and of course having their hard work attributed to them.
This is not just true of mathematicians, of course. Janet Salmons’ recent guest post here in The Scholarly Kitchen (Supply Chain of Writing Fools) eloquently made the point from the perspective of a writer and qualitative scholar. “…I found out I’m just a kink in generative AI’s content supply chain.” Her books have been “swallowed up” by LLMs without her consent. As Salmons puts it, “For scholarly writers, the issues go beyond compensation. For us, the irreparable loss is to the integrity of our work.”
So where lie the priorities? It really depends on where you sit. Salmons describes well the point of view of a researcher, and in her case, one who understands the nature and value of a Creative Commons license. This is not true for many authors, who happily embrace a CC BY license without understanding the deeper nature of copyright and licensing. As an academic institution, you care about your faculty, and you care that authors retain their copyright. Many publishers no longer require the transfer of copyright. Even when employing a Creative Commons license, the author still retains copyright over the work (although depending on the license, some rights are effectively softened). An institution also cares that they provide services to their faculty through their library, and this is where things get a little sticky. Many libraries are embracing AI as an extension of text and data mining services with a desire to use AI internally to enable discovery. Beyond that, they want to provide faculty at their institution with ever richer resources to enhance teaching and research.
When a publisher such as the AMS talks to a scholarly library about purchasing content (and here perhaps I will focus mainly on eBook content), we all agree that it is essential that we move to accessible formats – which is a complex endeavor in mathematics, but one which is rapidly evolving. We also collectively understand that we are selling our content for use by a library’s patrons in teaching and research. On top of this there is an implicit agreement that authors of this content are recognized as contributing to the richness of a library’s scholarship, and indeed they are rewarded for this with academic recognition and, of course, royalties.
At the same time, though, many libraries are talking with publishers about allowing AI licensing. And this is where there appears to be some real dissonance among the stakeholders. To some extent this has ballooned into suspicion of publishers’ motivations in holding back on such AI licensing clauses in content deals. But there are reasons for this: for example, here at the AMS, authors routinely retain copyright on their books, granting the AMS an exclusive license to publish their book. It is therefore hard for us as a publisher to then turn to a library customer and grant an AI license in which attribution is absent, or partially achieved by a non-legally binding method – although there have been efforts within the Creative Commons license to address this issue through Preference Signaling. It will of course be interesting to see how the many fair use/copyright litigations in play will be resolved.
Authors need to decide the fate of their work. As publishers, it is our collective responsibility to inform and educate authors on copyright, and the implications of licensing arrangements so they may be as informed as possible.
Thus the dissonance lies in the real conflict between the ideals of openness that many authors, publishers, and libraries embrace, and the benefits of providing AI services to users, libraries, and publishers alike, not to mention the outsized potential of material gain from corporates in the AI sector who appear loath to share those rewards with authors.
A perhaps tangential, but in my view relevant, concern for all stakeholders is the question of what company authors keep when their material ends up in an LLM. Do authors really want their ideas to appear alongside pirated content from outfits such as the Books3 Database, which most LLMs use? (A 2023 article on the Books3 Database in Wired magazine entitled The Battle Over Books3 Could Change AI Forever is worth looking at here.) Do librarians and authors really think such content should be mixed in with their legitimate content? And how does this interaction affect the integrity of the scholarly record, not to mention author, library, and publisher integrity?
There is no doubt about the potential value of library AI services for faculty and students. There is no doubt that copyright and licensing matter. There is no doubt that an author’s work must be published with attribution, and that their content must be published and sold with an understanding of to whom the rewards accrue, the integrity of the content, and the importance of acknowledging author contributions.
How do we solve for this dissonance?
Discussion
5 Thoughts on "A Dissonance of Ideals: Openness, Copyright, and AI"
Ah, the delicious irony! You stormed the gates, demanded Open Access revolution, and cheered as authors’ rights were turned into Creative Commons confetti. “Free the knowledge!” you cried, as publishers dutifully obliged—CC-BY shackles firmly in place.
But now, oh dear, here comes Big Tech, vacuuming up that very same “liberated” content to train their shiny new AIs, while you gasp, clutching your pearls at publishers.
Authors sold out? Libraries complicit? Scholarly integrity under siege? Don’t blame the publishers, darlings—we’re simply delivering the remixing you so fervently demanded. Next time, perhaps read the fine print before lighting the bonfire of the rights vanities.
Thank you, Robert, for your post. I want to put a little framing around “library AI services.” At my library (and probably most), we do not have great answers for the few users (conscientious enough) who ask us about using licensed e-resources with public or private versions of AI. We aim to secure the rights to use scholarly research with AI tools, similar to TDM, so that we can provide options and clear guidance. We are all in the same boat, trying to find legitimate ways for AI to work with scholarly content for discovery and research.
Thank you, Daniel, for accentuating the academic library perspective on AI use of copyrighted materials (licensed by the institution). We should explicitly demarcate the academic pursuit and use of AI as opposed to commercial use, balancing legitimate non-commercial research with copyright protections. The former allows for more transparency and attribution, much in line with European Union legislation in this area.
We have been developing a balanced policy, founded upon recommendations from respected leaders in copyright and scholarly communication, such as at Berkeley, to manage the needs and requirements of researchers while respecting publisher/vendor contracts. At this point, we are simply providing guidance and recommendations, which are also shaped by broader discussions of AI policies at the institutional level. Of course, we cannot guarantee all ethical use by our local constituencies, but we should make more than good faith efforts for the education and development of guardrails particularly as legislation and/or jurisprudence evolve.
From all I’m seeing, I think it’s likely that MOST educational content will be either generated by AI or processed/transformed by AI in some way in the future.
The metaphor of the internet is no longer a library. The internet is a STUDENT.
If you believe what you produce is important for the world, then you should be fighting to influence the future by having your output included in future AI training. This is how we fight bias in current AI.
Open means open.
(This is not just about the few bigger companies that get attention but for all the thousands and more LLMs that are out there).
Thank you for this contribution. Two things stick out to me:
First, if the forces behind LLMs are able to convince courts that harvesting copyrighted texts to feed these models fits under the label of “fair use,” then the whole conversation about licensing would be a moot point, at least within the US. Since fair use is a US legal doctrine and not applicable within Europe, for example, I am unsure what the implications would be for these same models within Europe. I can imagine a world where LLMs are only accessible on particular continents based on the materials they use to “feed” their model. But I am not a lawyer, so this is based on my limited understanding and experience as someone who was trained in the US, but currently works in Germany.
Second, I find it quite ironic that open access advocates who once rallied for “authors’ rights” and “author choice” are now running campaigns such as “Open Access means CC BY” (see here Projekt DEAL’s recent campaign: https://deal-konsortium.de/en/why-ccby). Of course there are reasons to alert authors to the fact that if they select CC BY NC, but sign over exclusive copyright to publishers, they will not actually retain the rights they think they are retaining. However, this narrow interpretation of “open access” concerns me. Advocates for open access often point to the three open access documents from the early aughts (Bethesda, Berlin, Budapest) to justify this definition of open access, but I become nervous when we treat these documents as almost biblical texts handed down to the open access community on Mount Sinai.