Editor’s Note: Today’s post is by Rebecca Bryant, Charles Watkinson, and Rebecca Welzenbach. Rebecca Bryant is Senior Program Officer with OCLC. Charles Watkinson is Director of University of Michigan Press and Associate University Librarian for Publishing at the University of Michigan. Rebecca Welzenbach is Research Impact and Information Science Librarian at the University of Michigan.

Research Information Management (RIM) is an area of considerable growth for North American institutions today. Research universities like the University of California, the University of Michigan, and Virginia Tech use RIM systems in support of a variety of use cases. A recent OCLC Research report documents the RIM practices at five US case study institutions, and identifies six RIM use cases:

  • Faculty activity reporting (FAR)
  • Public portals for expertise discovery and showcasing research
  • Metadata reuse
  • Strategic reporting and decision support
  • Open access workflows
  • Compliance monitoring

RIM systems aggregate, curate, and utilize metadata about institutional research activities, and they are able to rapidly collect information about the institution’s research activity through publication metadata harvesting at scale.

For example, at the University of California, San Francisco, the UCSF Profiles system, using the open source Profiles RNS platform, regularly identifies, disambiguates, and imports publications from PubMed into individual researcher profiles. The Experts@Minnesota portal, using Elsevier’s Pure platform, works similarly, but instead draws primarily from the Scopus index to showcase institutional research. The University of California system, University of Michigan, and Virginia Tech all use Symplectic Elements to support metadata harvesting from multiple sources such as Web of Science, Crossref, and arXiv.

In all of these examples, harvesting yields accurate and mostly complete metadata for researchers in STEM disciplines. However, metadata harvesting from any of these sources provides disappointing results for humanities and some social science scholars. When we look at a researcher’s profile, we may notice that a book (or more than one) is missing. But it is not always clear what happened to it. Why didn’t information about the title make it from the publisher to the researcher’s profile, just as STEM journal article metadata does?

Metadata harvesting at scale is made possible by metadata residing on the network, and by the order brought to that metadata by standards and persistent identifiers (PIDs). The adoption of PIDs like DOIs for publications and datasets, ORCID iDs for persons, and ROR identifiers for organizations is essential for identifying works, the people who author them, and their institutional affiliations. These PIDs (first DOIs, then ORCID iDs, and now ROR identifiers) have become embedded within the scholarly communications workflow, captured at the point of publication and then fed throughout the ecosystem to support disambiguation, discovery, and sharing. For instance, today more than 100 publishers worldwide, and more than 2,000 journals, require ORCID iDs from corresponding authors. However, when you examine the list of (mostly scientific) publishers requiring ORCID iDs, the reasons for the gap in humanities content start to come into focus.
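One reason PIDs support disambiguation so well is that they can be checked by machine before they ever enter a record. As a minimal sketch in Python, an ORCID iD carries a final check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can be caught at the point of capture. The function names here are our own illustrative choices, not part of any ORCID library, and the first test iD is the one ORCID publishes for its fictitious example researcher.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A check like this only confirms that an iD is well formed. It says nothing about whether the iD belongs to the person named in the record, which still depends on the author supplying it.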

Problem: disappearing metadata

As monograph metadata moves through the scholarly publishing supply chain, information can be lost, dropped, or even augmented. This topic was discussed in a recent event hosted by the Crossref Books Group on “Fixing the Information Supply Chain for ebooks”, which demonstrated the problems through the example reshared here.

In 2021, the University of Michigan Press published a monograph entitled Coronavirus Politics, edited by four co-editors and with chapters by more than sixty authors. The volume was a timely and highly important contribution in the midst of the global pandemic, and an open access version was made available, in addition to the print copy for sale.

The University of Michigan (U-M) Press uses the Firebrand title management system to manage the multiple steps it must take to make the book and metadata about it available to potential readers, bookstores, and libraries. This includes creation of the complete metadata record for the book, collection of author information, assignment of BISG BISAC subject headings, assignment of a unique ISBN for each format to be published, registration of a Crossref DOI for the title, and submission of a Library of Congress cataloging in-process request. In addition, U-M Press proactively captures ORCID iDs from editors and authors, but does not assign DOIs for book chapters at this time. It also captures ROR identifiers for internal use. This metadata is then delivered to the press’s distributor, CDC (Chicago Distribution Center).

CDC then delivers this metadata via ONIX XML format to a range of vendors. These include channels for individual purchase, including Amazon; wholesalers and library vendors, like Ingram; and library-facing companies such as EBSCO and ProQuest. ProQuest supplies ebook aggregations as well as a suite of management tools, such as the Online Acquisitions and Selection Information System (OASIS), which many academic libraries, including the University of Michigan Libraries, use to select and purchase books.
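To make the ONIX step more concrete, here is a rough Python sketch of how a downstream system might read contributor roles and ORCID iDs out of an ONIX-style record. The XML fragment is hypothetical and heavily simplified (no namespace, placeholder names, and not the actual feed for any title), and the function name is ours. It assumes the ONIX code list values as we understand them: contributor role A01 for "by (author)", B01 for "edited by", and name identifier type 21 for ORCID. Check these against the current EDItEUR code lists before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ONIX-style fragment with placeholder names.
ONIX_FRAGMENT = """
<Product>
  <DescriptiveDetail>
    <Contributor>
      <SequenceNumber>1</SequenceNumber>
      <ContributorRole>B01</ContributorRole>
      <NameIdentifier>
        <NameIDType>21</NameIDType>
        <IDValue>0000-0002-1825-0097</IDValue>
      </NameIdentifier>
      <PersonName>Editor One</PersonName>
    </Contributor>
    <Contributor>
      <SequenceNumber>2</SequenceNumber>
      <ContributorRole>B01</ContributorRole>
      <PersonName>Editor Two</PersonName>
    </Contributor>
  </DescriptiveDetail>
</Product>
"""

# Assumed code list values (see lead-in); unknown codes are passed through as-is.
ROLE_LABELS = {"A01": "author", "B01": "editor"}
ORCID_ID_TYPE = "21"


def list_contributors(xml_text):
    """Return one dict per contributor with name, role label, and ORCID iD (or None)."""
    root = ET.fromstring(xml_text)
    contributors = []
    for node in root.iter("Contributor"):
        role_code = node.findtext("ContributorRole", default="")
        orcid = None
        for ident in node.findall("NameIdentifier"):
            if ident.findtext("NameIDType") == ORCID_ID_TYPE:
                orcid = ident.findtext("IDValue")
        contributors.append({
            "name": node.findtext("PersonName", default=""),
            "role": ROLE_LABELS.get(role_code, role_code),
            "orcid": orcid,
        })
    return contributors


if __name__ == "__main__":
    for person in list_contributors(ONIX_FRAGMENT):
        print(person)
```

A receiving system that keeps only the first contributor, or that ignores the role code, would turn a multi-editor volume into a single-author book, even though the feed itself carried the full information.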

So far, so good.

But as we move farther downstream, we can start to observe problems. For instance, the metadata record for the work in ProQuest’s OASIS platform now indicates a single author instead of multiple editors. Even more concerning, there is no record of the open access version! Without further investigation, the librarian wouldn’t know about the OA copy and might unwittingly spend $45 USD on the print copy. This can have a significant impact on the discoverability of and access to the open access version, as this information may then fail to flow into library catalogs.

Farther downstream, the problems are even greater. In examining whether and how the metadata for this monograph enters institutional RIM systems, we can observe at least three separate outcomes for three different editors or authors:

  • In the Michigan Research Experts portal at the University of Michigan, the volume appears in editor Scott Greer’s profile, but without an abstract. Furthermore, none of the four chapters in the volume authored by Greer are included.
  • Greer co-authored a chapter entitled “The European Union Confronts COVID-19: Another European Rescue of the Nation-state?” with Eleanor Brooks at the University of Edinburgh, but this contribution does not appear in her profile in the University of Edinburgh Research Explorer.
  • However, the chapter authored by Minakshi Raj at the University of Illinois at Urbana-Champaign does appear in the Illinois Experts RIM system, along with a link to the open access monograph. How did this happen? It’s a bit complex, but one of Illinois’s academic units, the College of Applied Health Sciences, recently decided to use the Experts system as the publication database of record for its faculty. A comprehensive CV review resulted, and the missing content was identified, verified on the publisher’s website, and keyed into the RIM system.

Diagnosing the Issues

There are at least two key issues at play:

  1. The metadata at the point of origin (the publisher’s title management system) is incomplete insofar as it doesn’t have all the relevant PIDs, including DOIs for book chapters. While the U-M Press captured ORCID iDs, many other monograph publishers may not. ROR identifiers are not yet widely used to help tie researchers to their affiliations.
  2. The pipeline from publisher to numerous other systems, and ultimately to readers, is riddled with gaps and breaks, where metadata is lost, garbled, and sometimes added to in unpredictable and nonstandard ways.

PIDs drop out of the pipeline at various stages in the process. This leads to the frustrating situation at Michigan where books published by the University of Michigan Press appear without the identifiers assigned by the Press in either the library’s catalog records or in the University’s RIM system.
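One way to locate where identifiers disappear is to compare snapshots of the same record at each stage of the pipeline. The Python sketch below assumes three hypothetical stages and placeholder identifier values. None of it is real metadata for any title, and the stage names and field names are our own illustrative choices.

```python
# Hypothetical snapshots of one title's record at three pipeline stages.
# All identifier values are placeholders, not real metadata.
STAGES = [
    ("publisher", {
        "isbn": "ISBN-PLACEHOLDER",
        "doi": "10.0000/example",
        "orcid": ["0000-0002-1825-0097"],
        "ror": ["ROR-PLACEHOLDER"],
    }),
    ("distributor_feed", {
        "isbn": "ISBN-PLACEHOLDER",
        "doi": "10.0000/example",
        "orcid": ["0000-0002-1825-0097"],
    }),
    ("library_catalog", {
        "isbn": "ISBN-PLACEHOLDER",
    }),
]


def dropped_pids(stages):
    """For each handoff, list identifier fields present upstream but empty or absent downstream."""
    report = []
    for (up_name, up), (down_name, down) in zip(stages, stages[1:]):
        lost = sorted(key for key, value in up.items() if value and not down.get(key))
        report.append((up_name, down_name, lost))
    return report


if __name__ == "__main__":
    for up_name, down_name, lost in dropped_pids(STAGES):
        print(f"{up_name} -> {down_name}: lost {lost or 'nothing'}")
```

Run against real exports from each system, a comparison like this would show which handoff loses which identifier, which is the information needed to decide whether the fix belongs with the publisher, the distributor, a vendor, or the receiving library system.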

Why are the PIDs getting dropped? There are many reasons, including the incompatibility of the ONIX and MARC standards, a lack of database functionality (sometimes as basic as having no field for a new identifier), and perhaps even a failure of system administrators to appreciate the value of the PIDs.

A Call to Action

Simply understanding and communicating the problem is an essential first step — and with that, we invite you as a reader to share this blog post with your networks. In addition, we offer the following recommendations:

  • Title management software providers serving humanities scholarly publishing are encouraged to adapt their systems to make the collection and processing of PIDs more convenient and complete — and analogous to those practices occurring in STEM publishing.
  • Publishers can make a significant difference by registering DOIs for their publications — including for each chapter in edited books. They can also play a powerful role in promoting ORCID iDs to their scholars, perhaps even requiring them, and adding ROR affiliations if possible.
  • Research libraries are usually already working to educate scholars about ORCID, but they may be able to increase their impact by partnering with the university press or library publishing initiative at their own institution. They can play an influential role in educating other local stakeholders about how institutional support and integration of ORCID iDs and other PIDs can reduce the burden on scholars. They can also ask RIM vendors to support metadata harvesting from more sources that include humanities and social science content.
  • Metadata aggregators should ensure that they are up to date on the metadata they should be collecting — and not fail to collect information for reasons of performance or failure to understand future uses.
  • Humanities and social science scholars should claim their ORCID iD and populate — and maintain — their ORCID records. This will save them time in the long run, as they can link it to the local RIM system, and, in time, use it for other applications throughout their careers.

Author acknowledgements: This piece was inspired by conversations earlier this year with Brian O’Leary (Book Industry Study Group) and Jennifer Kemp (Crossref). Thanks also to the several people who offered comments on a draft version of this essay, including Annette Dortmund (OCLC), Jeff Edmunds (Penn State University), Jennifer Kemp, and Mark Zulauf (University of Illinois).

Rebecca Bryant

Rebecca Bryant, PhD, is Senior Program Officer with OCLC. She previously served as Assistant Dean in the University of Illinois Graduate College and Director of Community at ORCID. Her research interests include research information management (RIM), persistent identifiers, and institutional scholarly communication practices.

Charles Watkinson

Charles Watkinson is Director of University of Michigan Press and Associate University Librarian for Publishing at the University of Michigan. He previously held a similar role at Purdue University. He is President Elect of the Association for University Presses and a member of the Board of Directors of the OAPEN Foundation, the infrastructure service for open access books.

Rebecca Welzenbach

Rebecca Welzenbach is Research Impact and Information Science Librarian at the University of Michigan. She has been a librarian at U-M since 2009, working on various aspects of digital humanities, scholarly publishing and communications, open access, and research impact. As research impact librarian, she empowers scholars to create the conditions under which they can establish a strong public identity, a coherent account of their contributions to the scholarly enterprise, and a persuasive body of evidence for the impact of their work within the academy, and for the public.