Seven Questions about CC Signals - The Scholarly Kitchen

As most Scholarly Kitchen readers will know, in 2002 the Creative Commons Organization (CCO) created a suite of licenses that copyright holders can apply to their works in order to make them available for free reuse by the public. Different licenses grant permission for everything from functionally unlimited reuse (CC BY) to a much narrower range of uses (CC BY-NC-ND or CC BY-NC-SA), and once applied the licenses are pretty much irrevocable.

CC BY creates a blanket permission for the work to be “translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor” by anyone, in any manner and for any purpose, as long as the creator of the original work is acknowledged. It’s generally considered the only license that makes a work fully open access (OA), since the other licenses impose restrictions that go against key principles outlined in the Budapest Open Access Initiative statement. For this reason, ensuring that scholarly products such as articles, books, and data sets will be made available on a CC BY basis (or the equivalent) has been a broadly shared goal within the OA movement for decades.

The Tyranny of Unintended Consequences

There are three certainties in life: death, taxes, and unintended consequences. One unintended consequence of 25 years of pervasive and international pressure to make scholarly publications and data sets freely available for functionally unlimited reuse is that millions of scholarly products are now being used enthusiastically by (among others) a growing horde of artificial intelligence (AI) agents in ways that make the copyright holders in those products uncomfortable. Those who have applied irrevocable CC BY licenses to their works, however, have no obvious legal recourse.

It’s worth pointing out, though, that although the availability of these scholarly products for exploitation by massive commercial AI companies may have been unintended on the part of the works’ creators, this outcome was not unforeseen. On the contrary, the potential for machine learning to take advantage of open content has been on our collective radar for decades, and was in fact explicitly celebrated by OA advocates from the beginning of the movement. As early as 2006, John Willinsky was arguing for the “automated indexing of the scholarly literature” and for “automated systems (for) citation rankings of article and journal” – both of which would necessarily be AI functions based on large-scale ingestion of text – and saw these as part and parcel of a shift to open access. As recently as five years ago SPARC was happily anticipating the use of “a modern machine learning, natural language processing approach” to analyze how federal data sets are used across “millions” of publications and thereby “demonstrate the value of data as a strategic asset.” SPARC’s website continues to champion the development of “powerful text and data mining tools that can analyze the entire research literature, uncovering trends and connections that no human reader could,” and advocates for open data policies that “(permit) any user to download, copy, analyze, re-process, pass to software, or use (the data) for any other purpose.” The benefits of such openness are both clear and considerable; some of the costs and downsides, however, have also been clear to many for some time, but are now apparently becoming clearer to those who have up until now been unreservedly advocating for it. 
(See, for example, the very different consideration of these implications in SPARC’s 2024 news item titled “Reclaiming Control: Privacy, Platforms, AI & Governance in the Public Interest,” in which they call on “institutions [to] negotiate more assertively to limit data collection and non-academic use of academic data.”)

Creative Commons Starts Working on a Solution

In June 2025, CCO took a step in the direction of helping authors try to rein in the massive reuse of their CC-licensed products by introducing CC Signals, a “new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI.” Framed as “a major step forward in building a more equitable, sustainable AI ecosystem rooted in shared benefits,” CC Signals is intended to give copyright holders a mechanism for signaling how they would (and would not) prefer that their CC-licensed content be reused.

Without getting into the legal technicalities, the fundamental limitation of this approach is obvious: once they’ve made their work CC BY, copyright holders’ preferences no longer carry any weight. The right to reuse the licensed content in functionally unlimited ways has been irrevocably granted to the general public, including those who operate AI agents and LLMs, and the license requires nothing of the reuser except acknowledgment of the licensor as original creator — and even then, only when a genuinely derivative work has been created. And once the license has been granted the licensor can’t change his mind and restrict the general public’s rights in the work, either subsequently or retroactively. Despite some of the language contained in the current version of the CC Signals framework (referring to it, for example, as “a set of criteria that AI developers must meet”), the signals ultimately constitute requests, not legally-binding criteria that anyone “must meet.”

As part of its ongoing effort to address this issue – and after soliciting public input and feedback on the initial CC Signals framework – last week the CCO announced an update to the program. Framing it as a suite of “several high-impact interventions… to restore trust, strengthen participation, and embed public interest values into the AI knowledge ecosystem,” this new effort will “define best practices for attribution in AI contexts,” propose and advocate for “the development and usage of carefully scoped AI opt-outs,” and seek to develop “a new tool designed to enable conditional access to openly shared collections and compilations.”

Seven Questions for the Creative Commons Organization

These efforts raise many, many questions, of course, as CCO has acknowledged; the organization is clearly working very hard to address the most important of these.

I reached out to CCO multiple times, asking if someone would be willing to respond to a few questions about the new CC Signals framework, and received no response. So since an interview isn’t possible, I’m going to pose the questions here and solicit input from our readers. How do you think these issues will or should be addressed?

1. Public interest use. The distinction between “public interest use” and other kinds of use is repeatedly invoked in the CCO’s discussion of issues related to AI-based use of open content. But nowhere is “public interest” defined. What distinguishes a “public interest use” from any other use of a CC-licensed work? In this context, does the phrase simply refer to not-for-profit use, or will the new CC Signals framework contemplate narrower criteria? (And of course, it bears pointing out that nowhere in CC license language is there any requirement that reuse of the licensed work be undertaken in the public interest.)
2. Honoring the agency of creators (1). CCO is advocating for “the development and usage of carefully scoped AI opt-outs that simultaneously sustain creator agency while protecting public interest uses.” But hasn’t the creator exercised agency, in the first instance, by choosing to grant the public an irrevocable license to make free and unrestricted use of the work? After that license has been granted, on what basis should the creator expect to exercise agency over others’ use of the work? (Perhaps CCO is planning to create a new version of the CC BY license that incorporates such opt-out language — but for the millions of articles, books, and copyrightable data sets already licensed under its existing terms, the horse has left the barn; there is no “opting out” for those works, and rights granted by existing CC licenses can’t be narrowed retroactively.)
3. Honoring the agency of creators (2). CCO notes that feedback it received “was direct and consistent in stating that preference signals without enforcement do not meaningfully shift power. Signals alone cannot create agency in a system that many people did not choose to participate in.” But again, there’s no need to “create agency” in this system – under most copyright regimes, the moment a work is created agency is fully and automatically vested in its creator, who then chooses for herself whether or not to license her work (unless, of course, she’s been compelled to do so by a third party). “Power” over the work “shifts” precisely when the copyright holder exercises her agency by relinquishing control over her work and handing it over to the general public — and CC licenses are explicitly designed to prevent any shift in power from the public back to the copyright holder.
4. Attribution. Given that CC BY licenses require acknowledgment of the original creator when the work is reused or derivatives are created, what would be a reasonable threshold for acknowledgment when an AI agent creates a document that contains duplication or derivatives of CC BY-licensed content? For example, if a bot formulates a query response that includes duplicated or derivative content from a 10,000-book corpus that includes one CC BY book, does the license require the resulting output to acknowledge the author of that book? On the other hand, what if the bot quoted or created derivatives from 10,000 books, 5,000 of which are CC BY, each with a different author? Would the resulting output need to provide attribution for all 5,000 authors?
5. “Responsible” reuse. CCO notes that “many want to continue sharing their collections while ensuring that AI developers use them responsibly by respecting attribution, ensuring transparency, and meeting other safeguards aligned with their public interest missions. We want to build tooling to enable this in standardized, legally enforceable ways.” Here the question is similar to that posed above regarding “public interest” use. Who will define “responsible,” and how will CC Signals account for differing views as to what constitutes “responsible” reuse of openly licensed content? Legal technicalities aside, is it even within the spirit of openness for a work’s creator to define “responsible use” in a way that is binding on others?
6. Applying licenses to non-copyrightable products. CCO’s guidelines on “Using CC-licensed Works for AI Training” indicate that “when training data is subject to the ShareAlike condition, model outputs and the model itself, if shared publicly, should be made available under the same CC license as the original works.” But how can AI developers or users apply any license at all to outputs created by AI agents, given that only the copyright holder can apply a CC license and works created by AI agents are not copyrightable (in the U.S., anyway)? When a work has no copyright holder, no one has authority to apply a license to it or otherwise restrict its reuse.
7. Community control of data. CCO’s update indicates that the organization’s “north star remains the same: sustain access to human knowledge. Today, that means more than enabling sharing. It means questioning long-held assumptions, and ensuring communities are in control of their own data.” What assumptions is CCO referring to? And in particular, in what way is open access to knowledge sustained when a group can exert control over “(its) data”? It may be the case that all things considered, certain groups (which ones?) should retain control over data related to them – but surely such control would represent a compromise in terms of open access to human knowledge, not a manifestation of openness.

To our readers: if you were the Creative Commons Organization, how might you answer these questions? And what other questions should we be asking?

Rick Anderson

Rick Anderson is University Librarian at Brigham Young University. He has worked previously as a bibliographer for YBP, Inc., as Head Acquisitions Librarian for the University of North Carolina, Greensboro, as Director of Resource Acquisition at the University of Nevada, Reno, and as Associate Dean for Collections & Scholarly Communication at the University of Utah.

Discussion

15 Thoughts on "Seven Questions about CC Signals"

“But he hasn’t got anything on!”

Please, please, please encourage CC zero (CC0) for data, not CC BY, and especially not the -NC/-ND versions. CC0 enables many important future uses: published data and data sets from individual works often need to be selected and integrated with other data for real value (think weather and climate data, but there are thousands of valuable examples; we have a noisy planet, economy, society, ecosystems, and health systems, and combining data is key to understanding these). Publications, figures, graphs, etc. are not data and are fine to be licensed differently. Same for software, where CC licenses don’t work well.

Yes, some groups may consider more restrictive licenses and access for their data, and rightly feel that this is justified from past, and current, abuses–but an important growing consideration is that they need to consider if this risks these data being “forgotten” in future compilations and reuses, including in training AI/ML. This risks introducing and/or increasing bias in AI/ML or other results and actually then potentially harming those that thought they were being protected. A growing tradeoff.

I don’t quite understand this plea – at least in the US. Data as statements of fact are not creative works and not copyrightable, so they cannot be licensed. As you say, Brooks, figures, etc. are different, but the data are effectively in the public domain and free already. One can put technical barriers in the way but not legal licenses. (The situation is different in Europe, where, as I understand it, sui generis rights may apply.)

In the US, data sets may be copyrighted under certain conditions. The canonical case law is Feist:

https://supreme.justia.com/cases/federal/us/499/340/

… and there’s a good overview document of the broader issue here:

https://datamanagement.hms.harvard.edu/share-publish/intellectual-property

It is important to highlight the fact that this is not a problem unique to content licensed with CC licenses. First, much of the content *not* under CC licenses is being licensed to LLMs by publishers. Here, authors have even less agency than they do when choosing what CC license to assign. Second, it is possible — and I welcome correction here — that this may all be a moot point, if use for machine learning falls under “fair use” — at least in the US. This is, according to my knowledge, not yet settled, but if it turns out to be the case, then CC licenses would make no difference one way or another.

More broadly, while I agree we should attend to possible unintended consequences of initiatives like those brought about by using CC licenses, we ought to be careful not to make them scapegoats. Broader copyright law was largely authored in a time before the existence of LLMs/machine learning as currently constituted. (Yes, machine learning has been around for a while, but LLMs/machine learning in their current moment are different in scale and impact.) Keeping this in mind, just as CC licenses are facing challenges due to these technological innovations, so too is copyright law writ large, as well as other more “traditional” licensing agreements that authors enter into with publishers.

Agreed — but I’m not really sure what “scapegoating” would mean in the context of this discussion. What we’re talking about here are questions about how CC licenses work, and the degree to which those who apply CC BY licenses to their work are left with any right or ability to control their subsequent reuse by AI agents. These issues are being raised, in the first instance, by the CCO itself. Scapegoating is a matter of attributing guilt, and questions of guilt really don’t enter into this.

Thank you for this thoughtful discussion! From my understanding, the Creative Commons licenses were designed not only to enable more permissive sharing than traditional copyright, but to uphold the moral rights of authors and their connection to their work. This makes attribution a core principle in using CC-licensed work, which generative AI tends not to do. In terms of “responsible” reuse, I feel personally affronted that these companies take Open Access works that are made by humans, incorrectly present them as their technology’s “intelligence”, and will eventually put them behind a paywall. To me, that is a much more harmful interpretation of the “spirit of openness” than allowing a creator to add restrictions on responsible use of their own work.

It’s important to note that CC licenses can’t alter or go beyond traditional copyright — they can only reduce limitations on users of the content:
https://scholarlykitchen.sspnet.org/2025/11/03/can-a-cc-license-constrain-fair-use-or-other-copyright-limitations-or-exemptions/

Also worth noting that attribution is not the same thing as citation. Attribution is required for a reuse of the work, not a use of the work. If I republish your paper, I need to attribute it to you. If I read your paper and use the ideas in it to answer a question or do a new experiment, I am not required by a CC BY license to attribute anything to you.

I agree that merely reading a work and then using the underlying ideas in a new way does not require attribution under CC BY — ideas aren’t subject to copyright, and therefore can’t be licensed. But according to the text (or “deed”) of the CC BY license, attribution is required when the work itself has been “translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under… Copyright” — so obviously this applies under many circumstances that go beyond mere republication.

I’ll also point out that while Lisa’s post is correct in asserting that CC licenses can’t create restrictions on fair use, it is nevertheless true that CC licenses can create obligations not provided for under the law. For example, attribution is not a legal requirement — in other words, plagiarism is not illegal — but plagiarizing a CC BY-licensed work would be a breach of the license’s restrictions.

But again, that “translation, alteration, arrangement, transformation, or otherwise modification” does not require attribution unless it is made public (and done so in a manner such that a court would declare that it would not qualify as fair use). You can do all of those things to the content for yourself (or your AI) to your heart’s content. That’s “use” of the content. Attribution only comes into play if you publish that “use”, making it a “reuse”. And that’s a different thing from transforming the content and using it in your tool to create something new which you then make public.

Yes, agreed — the license only applies to works that are based on the licensed content, and that themselves become “subject to copyright and similar right” (i.e., are published or at least fixed in a tangible medium). My point was that these works can take many forms that fall somewhere between straight republication and mere influence. Some of those derivative forms could easily manifest as AI-generated content — which, of course, can’t be copyrighted, which brings us to my question #6 above.

By Rick Anderson
May 27, 2026, 10:25 AM

“If I read your paper and use the ideas in it to answer a question or do a new experiment, I am not required by a CC BY license to attribute anything to you.”

This is an essential issue for all to understand and embrace. Thanks for pointing it out! I suppose that if your original contribution is anchored in a past experience it would be best if attribution was given, if the original source is actually recalled. Let’s be honest that over a lifetime of reading and research an idea casually encountered is most likely of origin unknown.

That’s the tradition of academia — standing on the shoulders of giants and all, and the practice is to cite your sources. This makes your own outputs more believable. But it is not a legal requirement, and I know we have all read a ton of papers that “should” have cited certain sources but didn’t. People need to understand the difference between a legal attribution requirement for republication and the academic practice of citation.

I know it is the topic of another article by you, Rick, but CC “licenses” are more revocable than CCO wants to admit. The placement of CC license text on an item is legally more of an offer to license than a license, and offers are revocable except to parties who have acted in reliance. Licenses also require consideration, and we can debate whether redistribution and “credit” truly can function in that way. None of this detracts from this great piece. As you note, whether or not the publishers knew it at the time, enabling mass commercial reuse by tech companies has always been a feature of the CC system’s design, not a bug.

Roy, it sounds like you’re not so much arguing that CC licenses are revocable, but rather that they may not be legitimate and binding licenses at all. Is that correct?

If so, do you know whether there’s any case law on them? It would be interesting to know whether they’ve ever been tested in court.
