When I first saw Roger Schonfeld’s 2015 presentation of common “stumbling blocks” in the modern research workflow, I must say I was overwhelmed by the numerous potential gaps in the supply-chain teamwork required to populate the world’s online information channels upon release of a new scholarly work. Each time-sensitive hand-off from publisher to platform host to search engine to library was a possible weak link in the path between the author and the reader.
Metadata, of course, is the lingua franca of the publishing supply chain and we have been working together to refine these data exchanges over the last few decades. Information standards initiatives – from KBART to ResourceSync – aim to improve time efficiencies for professionals working to push content through the pipelines of content discovery and delivery. Democratizing efforts from Kudos to Redlink are harnessing the latest technologies and greasing the wheels of time in order to close the gap between the point of publication and the point of access.
Some links in the publishing supply chain are solid: when a publisher deposits a DOI with Crossref, for example, there is close coordination between the parties, and the vast majority of these deposits and subsequent links work brilliantly. Other connections, however, are less established and require engagement beyond the usual publishing industry routines. Regardless of whether metadata exchanges are strong or brittle, they all operate on different temporal realities that often put the various streams of information at odds.
From a publisher's perspective, we understand that distribution of citation metadata takes effect within a few days of DOI deposit – which often triggers downstream inclusion of those citations in a range of databases and information services. Many content providers then distribute additional metadata to friendly indexers of subject databases, discovery services, and authentication tools. Generally, we expect to send updated KBART and MARC records as needed, typically within a one-month window after the release of new content. While these deliveries may be made anywhere from one day to a few months after publication, the addition of this new content to partner databases is often a bit of a mystery to content providers and librarians alike. And if there are errors in these datasets or disconnects during handoff, it may be months before they are uncovered and remediated, if at all.
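As an illustration of how quickly DOI metadata becomes visible downstream, a publisher can poll the public Crossref REST API shortly after deposit to confirm that a record has registered. A minimal sketch follows – the DOI in the usage comment is purely hypothetical, and error handling is simplified:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works/"

def crossref_work_url(doi: str) -> str:
    """Build the Crossref REST API URL for a given DOI."""
    return CROSSREF_API + urllib.parse.quote(doi, safe="")

def check_deposit(doi: str):
    """Return the work's Crossref metadata if the DOI has registered, else None."""
    try:
        with urllib.request.urlopen(crossref_work_url(doi), timeout=10) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # not yet registered, or an unknown DOI
            return None
        raise

# Usage (hypothetical DOI):
# meta = check_deposit("10.1234/example.doi")
# print(meta["title"] if meta else "Not yet in the Crossref index")
```

A script like this, run on a schedule against each day's deposits, gives a publisher a rough answer to the "has it taken effect yet?" question without waiting for downstream services to surface the record.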
But before all the flurry of typical scholarly metadata interactions, new scholarly works are thrust into the murky, unknown timelines of the open web. Publishers know that once we post a new journal article or case study to our online platform, mainstream search engines and other crawlers can begin tickling our site to gather up any metadata in front of the paywall – however, there are no guarantees here. Depending on the SEO health of a site, Google can take up to a few months to update its index with your latest publications. Chances are, though, metadata progresses more quickly through commercial-web channels than you might expect.
One example of this can be seen when the mainstream web beats every other scholarly information channel to the punch – this happened to me just this week, when I encountered a new article via a Google Scholar alert, shared my find with the Twittersphere, and learned that the author hadn't yet received notification from the publisher that his paper was live online.
As a doctoral student, I've stumbled through a similar time warp, where Google Scholar spotted a new article relevant to my research – but my library's Summon installation had not yet been updated with the latest issue. So instead, I had to start from the top and browse from the database list to the journal, to the volume, to the issue, and finally to my authenticated copy of the article. And I know better than to follow that email alert from my phone, because mobile metadata and authentication are another tangled web of delayed updates and data exchanges across platforms.
Publishers might see these temporal stumbling blocks as a reminder that the wheels of time keep pushing against the traditional models of dissemination, demanding regular revisions to the workflows and technologies that feed the universe of scholarly knowledge. Wherever possible, publication metadata is best delivered via integrated and automated systems, leveraging basic web service standards and data indexing protocols.
And some organizations are doing just that – for example, Cambridge University Press is tinkering with semi-automated MARC records to close quality and time gaps in its metadata transmissions to libraries. Scope eKnowledge and PCG have worked together to highlight where more granular ebook metadata will improve user experiences in the discovery and access of monograph content. And OCLC has been championing automated metadata exchanges and cross-sector collaborations to improve efficiencies for some time.
As content usage is a commercial priority for publishers, so too should be the smooth transmission of metadata that optimizes the researcher's experience of engaging with our online content. This means prioritizing digital transformation and a more aggressive pursuit of leading-edge web services for scholarly platforms. It means indexers and publishers working more closely together to honestly acknowledge what is and isn't working, and to face the commercial risks collectively. It means publishers monitoring their metadata in semi-automated ways – in institutional discovery systems as well as mainstream search – to catch the stumbling blocks more quickly. And it means more strategic use of content APIs across the board, which holds promise for publishers to tackle the uneven distribution of metadata and avoid the publication time warp.
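Semi-automated monitoring of this kind need not be elaborate. As one illustrative sketch, a publisher could sanity-check outgoing KBART rows before delivery. The column names below are drawn from the KBART recommended practice, but the validation rules themselves are simplified assumptions for illustration, not part of the standard:

```python
import re

# A simplified subset of KBART (NISO RP-9-2014) column names.
KBART_FIELDS = ["publication_title", "print_identifier",
                "online_identifier", "title_url", "title_id"]

# ISSN pattern: four digits, hyphen, three digits, then a digit or X.
ISSN_RE = re.compile(r"^\d{4}-\d{3}[0-9Xx]$")

def validate_row(row: dict) -> list:
    """Return a list of problems found in one KBART row (illustrative rules)."""
    problems = []
    if not row.get("publication_title"):
        problems.append("missing publication_title")
    if not row.get("title_url", "").startswith("http"):
        problems.append("title_url is not a resolvable link")
    for field in ("print_identifier", "online_identifier"):
        value = row.get(field, "")
        if value and not ISSN_RE.match(value):
            problems.append(f"{field} is not a valid ISSN: {value!r}")
    return problems

# Hypothetical row for a fictitious journal:
row = {"publication_title": "Journal of Examples",
       "online_identifier": "1234-567X",
       "title_url": "https://example.org/joe"}
print(validate_row(row))  # an empty list means the row passed every check
```

Catching a malformed identifier or a dead title URL before the file leaves the publisher is far cheaper than discovering the error months later in a partner's discovery index.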