Editor’s Note: Today’s guest post is by Ginny Herbert, Publisher for Researcher Experience and Operations at AIP Publishing.

While the adoption of open scholarship practices has grown significantly, enthusiasm still surpasses uptake. Specifically, when we consider trends in sharing research artifacts beyond a traditional paper, such as data and code, methods, registered reports, or null results, we see advocacy, product launches, and modest growth but no “new normal.”

Opening up the research lifecycle for increased collaboration and transparency has no intrinsic cost barrier, and it excites editors, librarians, funders, and publishers alike. In theory, open scholarship aligns well with the volume-based economies encompassed by both the “publish or perish” culture and the scholarly publishing industry’s recent focus on publication volume. So, what gives?

To answer our question, let’s draw on the Value Creation-Value Capture Framework from business strategy, which holds that for a firm to succeed, it must:

  1. Create value, or in other words, offer something that others find useful, interesting, or beneficial; and
  2. Capture value, or generate sufficient return to sustain the activities required for value creation.

Viability depends on doing both. If early-stage research artifacts function as the “product” and researchers as the “firms,” then the question of slow open scholarship uptake becomes analogous to a standard strategy problem: why aren’t more firms offering these products?


Value Isn’t Sufficiently Created

The first half of the Value Creation-Value Capture Framework suggests that if an offering isn’t garnering interest, then it may not be valuable for a sufficient share of consumers — or in other words, people don’t actually want the product you’re offering. In this context, the slow expansion of open scholarship practices could be interpreted as a rational judgment by researchers that the demand for data, code, methods, null results, and preregistrations among readers remains limited.

Empirical work lends some support to the demand-side hypothesis: within the context of data sharing, for example, earlier studies indicate that 85 to 88 percent of datasets accrue no citations, while a newer analysis notes that the vast majority of shared datasets have no recorded reuse and typically receive only a single citation when cited. There is a rebuttal to that argument, however: citation is a poor mechanism for quantifying usage of non-article outputs and thus underestimates their impact. Keep this in mind; it comes up again on the value capture side of the equation.

Usership versus Readership

There’s also a rebuttal that open scholarship outputs are “packaged” incorrectly, thus preventing full value realization. Above, I describe the consumers of non-article output as “readers” because, as publishers, “readership” is our dominant paradigm; however, in the context of non-article output, “usership” may be more apt.

Though the distinction between “reader” and “user” may seem merely semantic, it shapes our approach to information dissemination. Readers require developed content to consume: context, narrative, a compelling argument. Users, by contrast, require systems they can interact with; the information is supplied to the system, which mediates it on the user’s behalf, rather than delivered directly to the user. Put simply, “reader” emphasizes information consumption, while “user” emphasizes system interaction.

When we consider artifacts such as data and code, or the universe of registered reports and null results that exist around a set of research questions, usership more accurately reflects how these materials produce value than readership. Null results, for example, hold limited narrative appeal but, in aggregate, are essential for assessing novelty or prior exploratory work. However, evaluating this via traditional reading — working through individual papers or registrations — is inefficient. Researchers would benefit more from the ability to query the literature at scale.

Why, then, have most researchers remained readers rather than users? The short answer is a technological gap. Before LLMs, no system allowed meaningful interaction with the literature. Discovery tools stopped at search: once a query returned results, whether in Web of Science, Google, or a clinical trial registry, users had to read and synthesize each item themselves. In other words, system interaction largely ended the moment search ended.
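
To make the reader/user distinction concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the registry records, their schema, and both functions correspond to no real registry or tool. It simply contrasts search, which hands a reader a list to work through, with synthesis, which answers the user directly.

```python
# Hypothetical registry entries: research question studied and outcome reported.
registry = [
    {"question": "Does drug X lower blood pressure?", "outcome": "null"},
    {"question": "Does drug X lower blood pressure?", "outcome": "null"},
    {"question": "Does drug X lower blood pressure?", "outcome": "positive"},
    {"question": "Does drug Y improve sleep quality?", "outcome": "positive"},
]

def search(query: str) -> list[dict]:
    """Reader mode: return matching records; synthesis is left to the human."""
    return [r for r in registry if query.lower() in r["question"].lower()]

def synthesize(query: str) -> str:
    """User mode: the system aggregates the records into a direct answer."""
    hits = search(query)
    nulls = sum(1 for r in hits if r["outcome"] == "null")
    return (f"{len(hits)} prior studies match this question; "
            f"{nulls} reported null results. Novelty may be limited.")

print(search("drug X"))      # reader must still read and weigh each item
print(synthesize("drug X"))  # system interaction yields an aggregate answer
```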

A New Mode of Information Interaction

ChatGPT, however, has disrupted search for the first time since the rise of Google. Thanks to LLMs, people can now interact with complex information as users rather than merely readers: instead of finding and interpreting individual sources, they can ask a system to synthesize relevant knowledge. This marks a fundamental shift from reader-based knowledge consumption to system-mediated digestion.

In the context of scholarly communication, we’re already seeing evidence that LLMs are reshaping information consumption behavior: BMJ Chief Technology Officer Ian Mulvany recently shared data on the increase in LLM-driven traffic that the BMJ has observed, and Oxford University Press Product Strategy Director John Campbell suggested that Google’s AI-powered search summaries were responsible for a 19 percent decline in click-throughs to its academic reference services.

This could be a huge opportunity for open scholarship: if a key barrier to value realization for non-article outputs has been their limited usability in the contexts where they add the most value, LLMs can bridge the gap between dissemination mechanisms and use. Furthermore, ChatGPT’s widespread adoption means that baseline familiarity with LLMs already exists. The challenge, then, is not user adoption itself but expanding the breadth of use cases to include open scholarship, a much simpler problem than the behavioral change implicit in initial adoption.

With ChatGPT and tools like it, it becomes easy and natural to ask: “Has this research question been asked before, and what other ones like it have been?” Indeed, one of the most common research applications of LLMs to date has been assisting with literature reviews, which mirrors this process of inquiry.
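
As a hedged sketch of what that query might look like in practice: the records below are invented, and ask_llm() is a placeholder for whichever model API one might use. The point is only that non-article outputs, preregistrations, datasets, and registered reports become context a system can synthesize rather than items a reader must work through one by one.

```python
# Invented prior-work records; no real repository or schema is implied.
prior_work = [
    "Preregistration (2021): effect of mindfulness apps on exam anxiety; null result.",
    "Dataset (2022): 400-participant trial of mindfulness apps, anxiety scores.",
    "Registered report (2023): mindfulness apps and test performance; no effect found.",
]

question = "Has the effect of mindfulness apps on exam anxiety been studied before?"

# Assemble a retrieval-augmented prompt: the artifacts become context
# the model synthesizes into a direct answer.
prompt = (
    "Using only the records below, say whether this question has been "
    f"studied and summarize the findings.\n\nQuestion: {question}\n\n"
    + "\n".join(f"- {r}" for r in prior_work)
)

print(prompt)
# answer = ask_llm(prompt)  # hypothetical call; substitute any LLM provider
```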

So, if we conclude that researchers have been slow to share non-article outputs at scale because of a lack of audience, and LLMs can now bridge that gap by making these materials more useful, we should see a surge in open scholarship adoption, right?

Value Can’t Be Sufficiently Captured

Let’s return to the second half of the framework: value capture. Value creation alone is insufficient; actors must be able to retain enough value to justify continued participation. In the context of open scholarship, this requires that the act of sharing non-article outputs be recognized and rewarded within academic evaluation systems.

Within the academy, value capture operates largely through career advancement. While promotional structures differ by geography and institution, research outputs remain central and are often mediated by citation. In some cases, evaluation is explicitly citation-based: success depends on how many times a researcher’s work has been cited. In others, output is counted via publications, but the venues that “count” are often defined by indexability and journal standing, both of which are historically tied to citation performance as measures of impact. Thus even nominally non-citation-based models rely indirectly on citation.

Consequently, for researchers to capture value from sharing research artifacts, those artifacts must either accrue citations themselves or be published in venues that prioritize citation rate.

Citation as a Measure in Aggregated Contexts

Citation has never been an adequate mechanism for evaluating non-article outputs, and its shortcomings intensify in an LLM-mediated context. Citation assumes discrete units, linear reading, and stable provenance, assumptions incompatible with systems that synthesize across thousands of inputs without attribution.

LLMs do not have a strong track record with citation, and even if they did, the resulting lists would be unwieldy and of limited interpretive value. Thus, the forms of use enabled by LLMs exceed the measurement capacities of citation-based metrics.

One might argue that publishing non-article outputs in journals would solve this problem by introducing a validation step and allowing them to “count” within evaluation frameworks. But this approach runs into structural constraints. Journals curate for readers; they are built on the expectation that each published unit will be meaningful, interpretable, and ideally engaging on its own. Many non-article outputs derive their value not from readability but from utility in aggregate. A dataset or a collection of null results may be highly useful systemically, yet offer little narrative cohesion on its own.


Sound-science journals partially address this mismatch but encounter structural constraints of their own. Their high-volume, low-citation profiles make indexability and category placement difficult, and because journal viability is tied to citation, editorial decisions — even in methodologically focused models — tend to reinforce citation-oriented content strategies. For researchers, this means that opportunities to publish non-article outputs remain limited; at scale, the model is untenable because it depresses the very citation performance journals need to survive.

Taken together, this leaves us with a square-peg / round-hole problem. Journals are optimized for curation and discrete, consumable knowledge units, while the value of many non-traditional outputs in an LLM-enabled ecosystem comes from their function as data points within a larger system of synthesis. If we continue relying on bibliometrics designed for journals, we reproduce the same limitations that have long constrained non-article outputs. But if we push these outputs into journals, we encounter a different scalability issue: publishing them at scale undermines the very citation-based criteria that sustain journal viability.

In short, technological advances have expanded our ability to create value from open scholarship, but the incentive structures governing capture remain unchanged.

Where Do We Go From Here?

If LLMs have shifted the value of research outputs from what can be consumed by individual readers to what can be synthesized by systems, then our evaluation structures need to catch up. Current reform efforts — DORA, CoARA, narrative CVs, etc. — have broadened what counts as a research output, but they still assume that value is realized at the level of discrete, human-readable units. In an LLM-enabled ecosystem, much of the value instead comes from the accumulation of many small contributions: datasets, code, preregistrations, null results, and other artifacts that function less as standalone products and more as essential inputs to a shared knowledge system.

If that’s how value is created, then we have to rethink how value is captured. Researchers will continue to hesitate to share these non-article outputs at scale if the only contributions that “count” are those that attract citations or fit neatly into journal formats. For open scholarship to scale, we need incentive models that recognize the collective nature of knowledge creation — frameworks that reward researchers not only for singular, impressive outputs but also for the smaller, infrastructural contributions that make the whole system work. That means treating machine-actionable artifacts as first-class research outputs instead of good-citizenship practices, developing indicators of use and reuse that reflect system-level engagement, and designing evaluation processes that can acknowledge when a researcher’s work enables others, even if it doesn’t command traditional attention.
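
What might such an indicator of system-level engagement look like? Purely as a sketch, and assuming a synthesis system kept retrieval logs (the log format, event names, weights, and artifact IDs below are all invented; no such standard exists today), one could weight an artifact’s appearances in generated answers more heavily than raw retrievals:

```python
from collections import Counter

# Each entry: (artifact_id, event) as a hypothetical synthesis system
# might record it when answering researchers' queries.
log = [
    ("dataset:001", "retrieved"), ("dataset:001", "retrieved"),
    ("nullresult:007", "retrieved"), ("dataset:001", "cited_in_answer"),
    ("nullresult:007", "cited_in_answer"), ("code:042", "retrieved"),
]

# Illustrative weights: contributing to a synthesized answer counts more
# than mere retrieval. The values are arbitrary.
weights = {"retrieved": 1, "cited_in_answer": 3}

scores = Counter()
for artifact, event in log:
    scores[artifact] += weights[event]

for artifact, score in scores.most_common():
    print(artifact, score)  # e.g. dataset:001 scores 5 despite zero citations
```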

It is clear: The technology now exists to create unprecedented value from open scholarship practices. Whether that value is ever fully realized depends on whether we build evaluation systems that can see it.

Discussion


When I see proposals like this, I always have to put them in a “yes, and…” context. That is, the value created by these types of outputs is not a replacement for the value created by publishing one’s work; rather, it is an additional value that could/should be recognized. As always, there is a reason why the research paper has survived for more than 350 years, and that value will continue to exist even in an age where LLMs offer further (but different) value.

I worry though, that any assessment matrix that is based on the accumulation of small contributions will further exacerbate the quantity over quality problems that are increasingly plaguing research assessment. If there’s no way to curate the value of a given small piece of information in an LLM, then are we just going to end up in a counting game where bulk production of low value outputs is favored over important conceptual breakthroughs?

Hi Ginny! This is a really interesting and thought-provoking piece. What do you think would change around value capture for Open Data if one could (magically) detect all instances where datasets were re-used by published articles? There would be many more data citations, but (as you note) unless those get hitched to a target metric like Impact Factor, maybe very little would change?

It is great to see such a considered discussion of open science practices and motivations given this space on SK, thanks for showcasing this Ginny!
Something we spend a lot of time investigating at Taylor & Francis is the potential for open science to improve trust in, and strengthen the research integrity of, published research. Sharing foundational elements of the research process and ensuring connections are made between them plays a significant role in making open science a reality and increasing trust in published content. This can include additional publications through which researchers showcase their skills and expertise in software or data curation and gain credit for that work; if we want this to become common practice, we need researchers to be rewarded for it and incentive structures to encourage this behaviour.
Beyond this, publishers can add value, demonstrate commitment to open science and the rigour of the research process through activities such as enriching metadata, verifying author contributions, and linking related outputs including data, code, methods, peer reviews, and early versions of papers.
Open science depends on a multi-stakeholder approach; engaging researchers and institutions is of course a crucial aspect, and publishers have a role in shaping and demonstrating the benefits, not only in the quantity of outputs but, crucially, in their quality, so readers can trust what they are reading.

Hey Ginny! I really appreciated the value creation / value capture framing. It helps explain why, for most researchers, open artifacts still feel like extra work rather than part of the core job.

Building on your conclusion, and also on David Crotty’s concern in the comments, I keep wondering where the most credible locus of curation actually sits if we move toward recognising many small, machine actionable contributions.

In your view, who is best placed to play that role in practice: journals and publishers, shared infrastructure providers, or communities and scholarly societies in particular disciplines? Or is the answer less about a single locus of curation and more about aligning assessment practices with a mix of these signals? Thanks Ginny!

How could research institutes help plug the gaps? My thinking is that most research institutes end up holding the output from projects after the funding has run out, so perhaps with a pretty minor investment as part of the research infrastructure they could package that data and make it accessible for all (within any requirements of the funding and laws), thus realising any residual value from data that would otherwise simply cost money to store. Going one step further, new projects should include end-of-project rules that complete the data package(s) for dissemination, regardless of positive, negative, or null interpretations.
