When people talk about the online world, they often focus on the most basic part of its infrastructure: the Internet itself. But as with our transportation system, online infrastructure is multi-layered and evolving. What started as cow paths became roads or tracks, and the two intermingled. Bicycles, horses, carts, and trolleys gave way to motorcars. These became semis, motorcycles, sports sedans, and hybrids.
Infrastructures are also in some ways indeterminate. Roads don't actually predict carbon-fiber bicycles, hybrid cars, or Vespas; they are adapted as time passes, and they require differentiation at important points (interstate highways, city streets, crosswalks).
I was thinking the other day that in all our diligent argumentation over open access (OA) and related matters, we may have neglected to notice the multiple layers that have emerged to form a new infrastructure for scholarly publishing. From the late 1990s until now, there have been significant changes in how scholarly publishing is accomplished. These sustaining technologies have been adopted without much argument or fanfare, and their significance may be underestimated.
The digital object identifier (DOI) — I recall the early days of the DOI, at least from a publisher's perspective. The potential was clear and intriguing, but how to get from where we were to a fully implemented DOI ecosystem was not. Thanks to CrossRef and other resolver services, we have had that kind of enabling ecosystem for years now. It's growing nicely, both in size and importance, and is starting to support some major innovations.
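To give a sense of what that resolver layer makes possible, here is a minimal sketch of looking up a DOI's metadata through CrossRef's public REST API. The endpoint is CrossRef's documented one, but the DOI in the example is a made-up placeholder and the helper function is mine:

```python
import json
import urllib.request

def fetch_crossref_metadata(doi: str) -> dict:
    """Fetch the metadata record for a DOI from CrossRef's public REST API."""
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # CrossRef wraps the work record in a "message" object.
    return payload["message"]

if __name__ == "__main__":
    # Hypothetical DOI for illustration; substitute any CrossRef-registered DOI.
    work = fetch_crossref_metadata("10.1234/example-doi")
    print(work.get("title"), work.get("container-title"))
```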
Online submission systems — One of the first major new infrastructures to be installed on a widespread basis, online submission systems almost invariably led to an increase in submissions of 20% or more at any journal adopting them; by making submission easier, they also increased the rate of churn across the industry.
XML — HTML and SGML co-existed uncomfortably in the early days of online scholarly journals, until their cooler sibling showed up (XML, like X-Games and GenX). Now, standards are evolving within the XML realm, but this fundamental piece of infrastructure seems cemented in place, supporting a vast range of publishing initiatives.
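To make that layer concrete, here is a minimal sketch of pulling basic metadata out of a JATS-flavored article fragment with Python's standard library. The element names follow the JATS convention, but the fragment itself is invented for illustration:

```python
import xml.etree.ElementTree as ET

# A tiny, invented JATS-style fragment; real article XML is far richer.
ARTICLE_XML = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example-doi</article-id>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""

root = ET.fromstring(ARTICLE_XML)
doi = root.findtext(".//article-id[@pub-id-type='doi']")
title = root.findtext(".//article-title")
authors = [
    f"{name.findtext('given-names')} {name.findtext('surname')}"
    for name in root.findall(".//contrib[@contrib-type='author']/name")
]
print(doi, title, authors)
```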
Video players — Borrowing from media companies, we've cycled through a few high-speed providers and their players, and will cycle through others as market consolidation continues in peripheral markets. Nonetheless, video players, streaming services, and the UI/UX aspects of video are now integral to many scholarly publishers' activities.
Production tracking systems — As volume and efficiency have become concerns, production tracking has evolved beyond informal systems, some kept on office walls or carried on sheets of paper from desk to desk. Now, database-driven systems, production add-ons to manuscript or layout systems, and bespoke systems allow for more outsourcing, better information management, and quicker turnaround times.
Other infrastructures enabled by those started in the aughts are now becoming mainstream in the teen years of the 21st century:
ORCID — This novel disambiguation infrastructure is now being knitted into other infrastructures, adding an important layer that should in turn enable further services built on ORCID iDs (see the short sketch after these items).
ISNI — As Todd Carpenter wrote yesterday, ISNI and I2 are increasingly useful standards from the books and institutional world that are tying in with complementary infrastructures in scholarly publishing, through initiatives like ISNI2ORCID.
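Because the value of an identifier layer depends on the identifiers being well-formed, here is a minimal sketch of one small piece of that plumbing: checking the check digit built into every ORCID iD. The scheme is the ISO 7064 MOD 11-2 checksum that ORCID documents publicly; the function name and sample usage are mine:

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD against its ISO 7064 MOD 11-2 checksum."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    remainder = total % 11
    check = (12 - remainder) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

# ORCID's documentation has used 0000-0002-1825-0097 as a sample iD.
print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
```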
Many other pieces of scholarly publishing infrastructure have been installed over the past decade, from plagiarism-checking software to semantic technologies to production pre-flight systems to automated reference checking and formatting to databases disambiguating institutions.
One particularly galling gap is the lack of a common e-commerce solution for scholarly publishers, something you'd think some enterprising firm would have created and promulgated, given the past decade of need and the generally stable, common requirements across scholarly publishers.
Other parts of the infrastructure remain maddeningly inadequate, including search. In my career, I have been able to achieve an excellent search infrastructure only once; every other search approach has suffered from compromises or inadequacies by comparison. Given the uniformity of the information we publish and the standards we aspire to, it seems a better common local search infrastructure would have emerged by now. Instead, our customers rely on external sites (PubMed, Google, and others) to find our content.
There is also a tendency with infrastructure builds to forget that we're working in a networked environment. PubMed Central is one example: an incomplete centralized repository rather than a complete index à la Google or CHORUS. Other ideas about text- and data-mining sometimes fall prey to a similar mindset; rather than creating networked solutions, some people want to create a centralized, redundant repository to support the activity. This kind of thinking is a relic of the pre-networked era, an attempt to consolidate influence, or both. In any case, creating networked capabilities benefits more people, works at greater scale, and is more efficient in the long run.
The great infrastructure build-out of the twenty-aughts has provided a rich array of possibilities that we're taking good advantage of in the teen years. But a few gaping holes remain. What are the next layers? And who will plug the holes?