The journal brand has proven to be the great intangible asset of the scholarly publisher. It signals trust and authority to authors and readers alike. So even as libraries came to license bundles rather than discrete titles and users came to discover and access content through platforms, publishers have worked hard to defend the journal brand and extend it, for example through cascades and author workflow integrations. The version of record, which publishers typically control exclusively, has been their vehicle for doing so. But everywhere you look, the version of record is declining in relative importance, as interest grows in preregistration, datasets, preprints, source code, and protocols, among other elements of the scholarly record. Looking ahead, we see real tensions emerging in how the scholarly record will be structured and who will have ownership and control over it.
What are the opportunities and challenges as publishers seek to extend the reach — and value — of their journal brands by supporting research materials beyond the version of record? Digging into the evolving context of preprints and research data offers valuable clues.
Preprints are inherently neither beneficial to the scholarly communication domain nor threatening to the established publishing networks. As Oya has analyzed, preprint services face a range of policy issues involved in building trust and durable business models in a contested information environment.
One strand of preprints, and the original motivation for this format, is in the informal sharing networks among research communities. arXiv is a well-known example of this model, starting with high-energy physics and then extending to a variety of adjacent fields in physics and mathematics. SSRN began with a similar approach focused on business and law. Services like these quickly became important components of the community infrastructure for scholarly communication in the fields that they covered. Initially, publishers entered into an uneasy truce with these preprint communities, recognizing that they could become a source of potential competition depending on how they developed.
More recently, as Oya and Roger analyzed in the spring, an alternative vision for preprints has emerged, one pursued by all of the major commercial publishers, among others. In this new model, publishers are promoting preprints but at the same time working to domesticate them, bringing them within their article submission workflows and linking preprints and versions of record in a way that will over time serve to deprecate the ability of the former to disrupt the latter. By repositioning preprints less as part of a global research community (for example, for high-energy physics) and more as directly linked with journal brands, publishers hope they will reinforce the existing value proposition. It remains to be seen how this vision will dovetail with, or perhaps over time impede, the mandate of community-based preprint services such as arXiv and bioRxiv to provide publisher-neutral platforms, decoupling the early sharing of research from the formal publishing stage in a way that enables authors to avoid having their findings associated exclusively with specific journals.
If anything, the landscape for research data is more complicated than that for preprints. It has come to include domain-specific structures, cross-institutional generalist structures, and increasingly substantial institutional investments. There are also some interesting new models developing for dataset discovery and capturing datasets within records associated with researcher identity.
The research data landscape is currently characterized by a vast array of domain-specific repositories. Many of these were developed from the ground up through the work of what Danielle has termed scholar-led data communities, which share a certain type of data, typically across disciplinary and institutional boundaries. Some data communities persist over decades while others may emerge and dissipate more quickly in response to specific research directions and specific societal needs, and we’ve profiled a number of both established and emerging data communities at Ithaka S+R. It is typically best practice for a researcher to use domain-specific repositories whenever possible in recognition of the importance of maintaining close relationships between the data and the scholars who use it.
While institutional data repositories have not emerged as a dominant approach for scholars to deposit their data, there is growing investment by individual universities to address enterprise needs for data security and compliant storage both for administrative and research data, along with an array of other institutional research data services. Institutional models are far less a factor for preprints. There is also a movement to provide data curation services cross-institutionally through the expertise and leadership of information scientists, such as through the work of the Data Curation Network in the U.S. and Portage in Canada.
Many publishers have a keen sense of the growing importance of sharing research data (2020 was, after all, the year of data) but have struggled to understand whether research data will become a meaningful part of their business. The research data landscape includes a number of well-known generalist repositories, some of which are owned by publishers. Partnering closely with the generalist repositories makes sense given the cross-institutional and cross-disciplinary nature of publisher infrastructure at scale, as well as the prospect of linkages into publisher workflows. Thus far, however, few seem to be pursuing models that would incorporate research data into publisher-specific services and workflows, as they have been doing with preprints, or other mechanisms for extracting value from them. Perhaps this is in recognition of the massive complexity of research datasets, which involves everything from privacy and other ethical factors to metadata description and standardization, and which makes them far more difficult to handle than the long-derided but ever-durable PDF.
An emerging strategic direction is for publishers to focus on ensuring data sharing compliance. DataSeerAI is a promising example of how a tool can be built out for publishers to offer better services in this space. Another approach is to have tighter control on data sharing policy, as evidenced by how some publishers are involved in advocating for specific repository selection criteria. COAR offers a critique of this move, arguing that it will enable publishers to set a bar for data repository compliance that will privilege criteria that only their own commercial, generalist-oriented offerings can meet.
In contrast, few publishers have built strong partnerships with data communities, and even fewer have identified models to enable, support, or even provide services to data communities. In these respects, among publishers, scholarly societies may have an advantage in being able to connect data communities and other relevant research records with their publications. The trend of some publishers favoring generalist repositories and seeking to more tightly control repository selection criteria also arguably will not help to foster the broader development of data communities and may even serve to impede it, given the need for researcher communities to take the lead in defining and delimiting how data sharing is useful for them. Brian Nosek also reminds us that too strongly intertwining data sharing with publisher interests runs the risk of exacerbating publication bias, in contrast to more expansive approaches to data sharing, which encourage researchers to share data regardless of the results.
The scholarly record is fracturing, as shown by these twin examples of preprints and research datasets. Publishers are pursuing an effort to integrate preprints into their workflows and value propositions, but whether they will succeed in doing so remains to be seen. They seem to be far less certain of how to similarly integrate research data, which does make sense given that datasets correspond less directly to the published article than does a preprint.
To truly engage with other research artifacts from a workflow perspective, publishers need not only to invest in bilateral connections with the version of record but also to develop a network with the researcher, the laboratory or other research team, and the research community more broadly. Only a few major publishers appear to have either the scope or the field-specific depth to take on such a project. Perhaps a white label service is needed.
For the publishing sector, this fracture seems to pose challenges. Those parties that are concerned about consolidation and profit margins in publishing might see in these challenges an opportunity. As a thought exercise, though perhaps an unrealistic one, we wonder what it would look like to make a large-scale capital investment in promoting the fracture. Might scholarly societies or others interested in stewarding research communities find a way to promote a refactored scholarly record?
7 Thoughts on "Can Publishers Maintain Control of the Scholarly Record?"
In response to the questions raised in this post — first, an open data mandate is expected in the early period of the Biden administration, meaning that scientists receiving US Federal funds will be required to make the data behind their publications publicly available. This will be the impetus to cause the movement you’re seeking, much as the Holdren Memo drove things like CHORUS in order to provide public access to research papers. As you’ve written about many times, the largest of publishers are looking to expand their businesses into other parts of the research workflow, and I would expect data management, curation, and preservation to be part of those efforts (see Mendeley Data as an example).
While I would love to see research societies take the lead here, by the time these policies are in place it’s unclear that there will be many independent publishing research societies left, as consolidation continues and more and more sign partnerships with larger publishers. Further, as we’ve seen for things like megajournals, the commercial publishers are good at what they do, and when something catches on and proves profitable, they seem able to regularly outcompete the non-profits, so I am less hopeful that this will be the case.
“So even as libraries came to license bundles rather than discrete titles and users came to discover and access content through platforms, publishers have worked hard to defend the journal brand and extend it”
I’m sure everyone else has thought of this before, but it didn’t hit me until I read the sentence above: Normally libraries don’t really think that much about a publisher’s brand as a publisher when selecting journals to subscribe to. (I don’t think I’ve ever heard a librarian or a researcher say “You know what this library needs? More Wiley journals.”) Historically, brand strength has always been most important at the level of the journal title itself: in other words, what I have regularly heard librarians and researchers say is “You know what this library needs? A subscription to Cell,” and in that circumstance it doesn’t much matter whether Cell is published by Elsevier or Wiley or OUP or any other reasonably reputable publisher.
However, when publishers sell their journals in bundles, they’re actually leveraging the value of their brand as a publisher: “You know what your library needs? All of our journals.” Now, the library may not be saying “We need all Elsevier journals,” but may merely be saying “We need a significant subset of Elsevier’s journals, so we might as well buy all of them at a dramatically lower unit price.” But the point remains that the unifying brand in the case of a journal package is that of the publisher itself, and that in itself represents a significant departure from the past. (Or from the distant past, anyway–the past before Big [and Medium] Deals.)
Hi Rick, I think this is absolutely right for library-channel sales and marketing, and of course it’s been challenged over the past year by those efforts to “unbundle” the big deal and return to title level selection. But there is also the reader side, where choices in the discovery and access process are shaped by journal title/brand, and especially on the author side, where APC-based models make journal brand most important. At this point, I’m not aware of any scholars who would say “I only would publish with Wiley journals” or “I make a special effort to read articles of interest when they appear in an Elsevier title.”
To be fair, publisher reputation seems to play a much bigger role in academic books than in journals.
I think it’s accurate that few/no say “I would only publish with xyz” … but I do hear “I wouldn’t publish with xyz” … which is a publisher brand effect? Just not a positive one! It seems scholarly societies with a larger journal portfolio do at least sometimes have a positive publisher-brand effect in that one does hear faculty say that they make an effort to publish with their society and/or read those journals.
Totally agree with you Lisa. Another variation I hear through my research is “I will not REVIEW for publisher X” as well as “I make sure to REVIEW with publisher X to ensure I can maintain access to their content now that my institution cancelled our subscription.”
Thank you for a thought-provoking post. The publishers and other groups working on the repository criteria project recently responded to the points raised by COAR in a letter, which we have posted on Zenodo here (https://doi.org/10.5281/zenodo.4458082).
As stated in the letter, we are committed to working with all stakeholders to ensure this work is valuable and will continue to update in response to feedback. Please do share this information with others who may be working in this area.
If you have questions and want to discuss further please do let us know.
Sarah (on behalf of the repository criteria working group – firstname.lastname@example.org)