Making AI Use of Scholarly Content Traceable, Measurable, and Trustworthy: A Meeting Report from Cambridge Scholarly AI Workshop

Editor’s Note: Today’s post is by Scholarly Kitchen Chef Todd Carpenter, Tasha Mellins-Cohen, and Monica Westin. Tasha is Executive Director at COUNTER and Founder of Mellins-Cohen Consulting. Monica is the Director of Open Policy Development at Cambridge University Press.

This is the second in a series of posts on AI systems, provenance tracking in generative artificial intelligence systems, and the implications for usage and assessment. The first post was published last week. The next piece in this series will cover forthcoming community work related to provenance tracking and usage.

As generative AI systems and agentic tools are adopted more broadly in scholarly research systems, we must retain a commitment to the core values of verifiability, traceability, and trustworthiness, and to connecting the elements of the scholarly record. The scholarly communications infrastructure needs to adapt and incorporate new standards that support provenance, attribution, and various types of assessment in these new tools.

Image: A famed “Flower of Kent” apple tree outside the entrance to Trinity College in Cambridge. This tree was grafted from the original apple tree at Newton’s childhood home. That tree was the inspiration for Newton’s theory of gravity when he saw an apple fall from its branches in the famous story.

The very notion of “Artificial Intelligence” began with a workshop hosted at Dartmouth College in 1956, when computer science researchers John McCarthy, Claude Shannon, Nathaniel Rochester, and Marvin Minsky co-organized the Dartmouth Summer Research Project on Artificial Intelligence. Since those early days, convenings to discuss technical challenges in computer science have been a regular occurrence as technologists seek to understand and improve the technology.

Last month, Cambridge University Press, COUNTER, and NISO co-hosted a workshop to frame the interrelated problems associated with attribution, provenance and usage metrics in research communications. Nearly three dozen experts joined in the discussion, each representing a different participant perspective, with libraries, AI tool developers, funders, societies, publishers, platforms, and other infrastructure service providers all represented.

A summary report of the meeting has been released, and initial work to advance the practical outcomes is beginning.

Laying the Groundwork in the Workshop

Building on a set of ongoing conversations about the need for standardization of AI, this meeting set out to narrowly define a constrained set of potential projects that might improve the current state of the art. Those conversations have been underway for some time, and this meeting sought to move beyond abstract concerns toward concrete infrastructure changes that might be implementable over the next year. Noting that there is tension between the pace of consensus development and the pace of technological change, the meeting sought to scope new work that might help publishers, libraries, researchers, and AI providers work from common assumptions.

Foundationally, the meeting began with the recognition that traditional usage metrics, including downloads, views, searches, and requests, do not fully capture how AI systems interact with scholarly content. AI tools may retrieve passages, retrieve metadata, generate summaries, or synthesize findings across numerous papers without a researcher ever clicking through to the original article or book chapter. Many of the sources that inform an output may not even be attributed or referenced, leaving the researcher no way to trace them back.

Conversation in the meeting uncovered various complexities in even simple-seeming approaches in these areas. For example, a research output may be involved both in an explicit AI interaction at the time of retrieval (inference time) and in an implicit AI interaction during the training of the underlying LLM (training time). Metrics for these two uses would need to be distinguished, though for the purposes of the workshop, we focused on inference metrics. Versioning, corrections, and retractions represent additional challenges to clean metrics even for a single stable output. One frequent metaphor during the day was that of “supply chain” problems.

While some aspects of the meeting were exploratory and sought to expose concerns and issues, the main goals were more concrete. The workshop had a practical focus, namely, to identify specific, narrowly focused, and implementable areas for future work related to provenance and usage. With these two related focal points, the group identified at the outset two separate potential workstreams, leaving aside questions of model development, citation accuracy and rationale for citation. Building upon the existing COUNTER AI best practice, the first workstream will include exploring how to extend the AI reporting structure to capture agentic usage, and third-party AI tool use of content – something that COUNTER’s AI working group is already tackling. The second workstream will focus on citation accuracy, provenance, and usage reporting as related but separate. This will include a pilot to test provenance models in research applications and AI retrieval systems.

Usage in AI Systems

It has been widely recognized that traditional usage metrics are insufficient for AI-driven usage, and COUNTER has published a preliminary best practice to start addressing the issue. The lack of visibility into AI usage is problematic both for institutional collection decisions and justifying subscription value, and for publishers seeking insights into the value of their content in an AI-first discovery environment.

COUNTER’s work so far has focused on meaningful use, defined as content that is specifically incorporated into the final AI-generated answer, rather than content merely retrieved or considered. Several open metrics questions remain, and answering them will likely require measuring a range of computational activities. First, as noted, there is a need for component-level tracking: measurement should extend beyond the COUNTER Item (article/chapter) to the component level. While mechanisms exist for component reporting and for rolling that usage up to an item or title level, further work is needed to incorporate this into work on AI systems. A broader examination of holistic assessment questions is also necessary to develop metrics that capture usage of all content assessed by an AI system, in addition to the existing meaningful use metrics.

Crucially, the resulting metrics must roll up to the item (article/chapter) level to maintain value for library collection decisions. A top priority for COUNTER is determining how to bring external AI providers and third-party tools into a shared reporting infrastructure.
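To make the roll-up idea concrete, here is a minimal Python sketch of how component-level AI usage events might be aggregated to the item level. The `UsageEvent` fields, the identifiers, and the two counters are our own illustrative assumptions; they are not COUNTER metric types or part of any published reporting structure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One AI interaction with a piece of content (hypothetical shape)."""
    item_id: str        # parent article or chapter identifier (illustrative)
    component_id: str   # section, figure, table, or chunk within the item
    incorporated: bool  # True if the content appeared in the final AI answer


def roll_up(events):
    """Aggregate component-level events into per-item counts.

    "assessed" counts every event for the item; "meaningful" counts only
    events whose content was incorporated into the generated answer.
    """
    assessed = Counter()
    meaningful = Counter()
    for event in events:
        assessed[event.item_id] += 1
        if event.incorporated:
            meaningful[event.item_id] += 1
    return {
        item: {"assessed": assessed[item], "meaningful": meaningful[item]}
        for item in assessed
    }


events = [
    UsageEvent("article-1", "fig-2", True),
    UsageEvent("article-1", "sec-3", False),
    UsageEvent("chapter-9", "table-1", False),
]
report = roll_up(events)
```

Keeping both counters side by side reflects the distinction drawn above between content that was merely retrieved or considered and content that was actually used in the answer.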

Provenance in AI Systems

The issues of provenance and attribution are complex, and they are particularly challenging across different types of AI systems, as outlined last week without digging too deeply into the technical weeds. One important distinction highlighted during the workshop was between attribution inside a foundation LLM and attribution in systems built around retrieval, APIs, agents, and publisher-controlled content delivery. Model-level attribution is technically challenging and less tractable in the near term, and is therefore less actionable. Realistically, too, the entire scholarly communications industry, including publishers, libraries, and research institutions, has very limited combined influence on the decisions surrounding foundation model development.

A more practical path, therefore, is to improve provenance at the point where content is retrieved, supplied, transformed, cited, or reused. A standards opportunity exists around developing a minimum provenance payload specifically for research applications and contexts. Such a model might have broader application but is critical for the research applications the scholarly publishing industry serves. This would include information such as persistent identifiers, source location, timestamps, content hashes, digital signatures, version and copyright information, component identifiers, and relationships between source objects and AI outputs. Such work might also create standard display approaches so that researchers using these systems can immediately recognize a reference in generated outputs, in the same way such references are recognizable in research literature. This approach would also need to capture and recognize questionable research outputs, such as retractions and other expressions of concern regarding research outputs, and either surface them to the user or possibly exclude them from outputs. The timeline of content would need to be incorporated to recognize that the state of scholarly knowledge is ever evolving and that what was accurate at the time of publication might not be recognized as the state of the art today.
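As a rough illustration of what such a payload could look like in practice, the Python sketch below bundles a few of the fields listed above and shows how a content hash lets a downstream system check that a passage has not been altered. The class, field names, status values, and the placeholder identifier and URL are all hypothetical; no such standard exists yet, and digital signatures and rights information are omitted for brevity.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenancePayload:
    """Hypothetical minimum provenance record for one retrieved passage."""
    persistent_id: str    # e.g. a DOI (placeholder value used below)
    source_url: str       # where the content was retrieved from
    version: str          # version label of the source object
    retrieved_at: str     # ISO 8601 timestamp in UTC
    content_sha256: str   # hash of the exact text that was supplied
    status: str = "current"  # assumed values: "current", "corrected", "retracted"


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_payload(text, persistent_id, source_url, version, status="current"):
    """Build a payload for a passage at the moment it is retrieved."""
    return ProvenancePayload(
        persistent_id=persistent_id,
        source_url=source_url,
        version=version,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=_sha256(text),
        status=status,
    )


def matches(payload, text):
    """Check that text is byte-for-byte what the payload describes."""
    return _sha256(text) == payload.content_sha256


def citable(payload):
    """One possible policy: never cite retracted content."""
    return payload.status != "retracted"


passage = "Example passage from a source article."
payload = make_payload(
    passage, "doi:10.0000/example", "https://example.org/article", "v1"
)
```

Whether retracted content should be excluded or shown with a warning is a policy choice; `citable` simply shows where such a rule could plug in.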

This framework would be as applicable to component parts as to complete works. AI systems do not always capture entire works, such as complete articles, books, or reports, in their outputs. A robust attribution model must also be able to capture component-level information and extend and link back to parent content objects. AI systems often operate on sections, figures, tables, images, datasets, snippets, or semantic chunks, and a robust system for provenance and attribution needs to include these linkages, as well as rights and citation information. Such component usage also needs to be captured for assessment purposes, which the current standards ecosystem doesn’t yet support.
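The linkage from a component back to its parent objects can be pictured as a simple chain of identifiers. The Python sketch below is one assumed way to model it; the `ContentObject` class, the `kind` labels, and the identifiers are illustrative only and do not reflect an existing schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentObject:
    """A work or a component of a work (hypothetical model)."""
    object_id: str
    kind: str                        # e.g. "article", "section", "figure"
    parent_id: Optional[str] = None  # None marks a complete, top-level work


def lineage(object_id, registry):
    """Walk from a component up to its top-level parent work.

    Returns the identifiers from the component to the root, in order.
    Raises ValueError if the parent links form a cycle.
    """
    chain = []
    seen = set()
    current = object_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = registry[current].parent_id
    return chain


registry = {
    "article-1": ContentObject("article-1", "article"),
    "sec-3": ContentObject("sec-3", "section", parent_id="article-1"),
    "fig-2": ContentObject("fig-2", "figure", parent_id="sec-3"),
}
chain = lineage("fig-2", registry)
```

With a chain like this, a citation for a reused figure can point at the figure itself while usage and credit still resolve to the article it belongs to.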

Practical Next Steps

Last week, in the first post of this series, we discussed how verifiable and trackable provenance representation in generative AI systems is core to trust and to retaining the system of credit that underpins the scholarly exchange system. Regardless of whether content distribution is open access or subscription-based, it is vital that we ensure content retains its source information while it is processed by and incorporated in results of generative AI tools. Earlier this year, the STM Association outlined its focus on the values of publishing and how they relate to generative AI tools. This was important work, but it needs to be backed up with guidance on how to implement that vision. The discussions during the Cambridge workshop showed that there is a robust and open conversation about how these values can work in practical terms.

There have been many robust legal arguments about how content is replicated in generative AI outputs. There have also been discussions about the removal of provenance and copyright information during content ingestion for AI systems. These legal issues are important and will have significant impacts on the business models of both publishers and generative AI system developers, and they have a place in discussions around how AI tool providers interact with scholarly publishers. More fundamentally, though, these concerns around provenance, usage, citation, attribution and trustworthiness are essential to what it means to disseminate research. Any system that seeks to serve the research communities needs to take these concerns seriously. The developers and companies that offer these systems should be expected to work to ensure these values are honored and implemented in their services. The meeting in Cambridge is a clear signal that many are interested and willing to engage. It pointed toward practical next steps that included potential pilot projects, consensus development on standards alignment, and ways to support collaboration among publishers, libraries, infrastructure providers and AI developers. The next step will be to turn that engagement into action.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles in a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.

Tasha Mellins-Cohen

Tasha Mellins-Cohen, Executive Director at COUNTER and Founder of Mellins-Cohen Consulting, joined the scholarly publishing industry in 2001. She has held roles within learned societies and commercial publishers across operations, technology, editorial and executive functions, while donating time to key industry initiatives and bodies such as UKSG, ALPSP and STM. In 2022 she took over the running of COUNTER, the usage metrics standard, alongside her consulting work with not-for-profit publishers.

Monica Westin

Monica is a librarian whose career has spanned roles at UK and US university libraries, Google Scholar, Google copyright policy, and the Internet Archive. She is currently the Director of Open Policy Development at Cambridge University Press, where she leads work on the Press’ policies related to open research, transparency, responsible research assessment, and other public policy positions. Monica volunteers her time as a trustee of Wikimedia UK, as a steering committee member of DORA, and as a director on the boards of directors at the standards organizations NISO and COUNTER Metrics.

Discussion

1 Thought on "Making AI Use of Scholarly Content Traceable, Measurable, and Trustworthy: A Meeting Report from Cambridge Scholarly AI Workshop"

Thank you for the insightful meeting report from the Cambridge Scholarly AI Workshop, co-hosted by Cambridge University Press, COUNTER, and NISO. As you rightly point out, retaining the core values of verifiability, traceability, and trustworthiness is critical as generative AI systems and agentic tools are integrated into scholarly research systems.
To move beyond abstract concerns and toward concrete infrastructure changes, the industry needs formats that AI can consume responsibly and sustainably. Adopting solutions like TopicLake Insights Engine Compute Ready Documents (CRD) directly aligns with the technical hurdles identified in your workshop, while also addressing critical computational efficiency needs:
1. Enabling Component-Level Tracking and Linkage: The workshop accurately highlighted that AI systems do not always capture entire works, but instead operate using “sections, figures, tables, images, datasets, snippets, or semantic chunks”. By transforming traditional scholarly formats into TopicLake Insights Engine Compute Ready Documents (CRD), content is natively structured and tagged into these precise semantic components. This allows for the robust attribution models you called for, ensuring that component-level information is captured and properly linked back to the parent content object.
2. Delivering a “Minimum Provenance Payload”: You noted the practical necessity of improving provenance at the point where content is retrieved or reused by developing a “minimum provenance payload specifically for research applications”. TopicLake Insights Engine Compute Ready Documents (CRD) are designed to carry this exact payload natively. By embedding vital metadata, such as persistent identifiers, source location, timestamps, content hashes, digital signatures, and version information, directly into the machine-readable chunks, these documents ensure AI systems have the data required to accurately cite sources and exclude questionable outputs like retractions.
3. Modernizing Usage Metrics for an AI-First Environment: Traditional metrics completely fail to capture when an AI system retrieves passages or synthesizes findings without a researcher clicking through to the original text. Engines managing TopicLake Insights Engine Compute Ready Documents (CRD) can monitor and distinguish usage at the explicit point of retrieval (“inference time”) versus implicit interactions during LLM training. This provides the data necessary to support COUNTER’s objective of tracking meaningful use at the component level while rolling it up to the item level for library collection decisions.
4. Sustainable Inference and Cost Observability: Beyond provenance, a complete AI scholarly infrastructure must address the immense computational and environmental costs of LLMs. By integrating TopicLake Insights Engine Compute Ready Documents (CRD) with technologies like the Fractional LLM Inference Token Efficiency (FLITE) engine, organizations gain an observability architecture that provides unprecedented transparency into computational costs. FLITE routes queries through a “zero-inference” pathway utilizing a matrix of thousands of pre-computed historical FAQs. By bypassing expensive Large Language Model (LLM) generation for known questions, this architecture significantly reduces cloud computing costs while simultaneously tracking real-time savings in energy, carbon, and water usage.
5. A Real-World Application: The Federalist Papers. To demonstrate these concrete capabilities, CRDs of the Federalist Papers have been created to work alongside the FLITE engine. To commemorate the 250th anniversary of the Declaration of Independence, the scholarly publishing industry has an opportunity to redefine how masterworks are researched. A side-by-side package is currently being distributed featuring the traditional web version of The Federalist Papers alongside a revolutionary application accessible at https://bit.ly/CRDFedPapers.
Ultimately, frameworks like TopicLake Insights Engine Compute Ready Documents (CRD) provide the technical scaffolding necessary to ensure content retains its source information, honors the system of scholarly credit, and operates within a sustainable, verifiable AI ecosystem.