Artist’s rendering of Babcock Ranch roadway and infrastructure. (Photo credit: Wikipedia)

When people talk about the online world, they often focus on the most basic part of the online infrastructure — the Internet itself. But as with our transportation system, online infrastructure is multi-layered and evolving. What started as cow paths became roads or tracks, and the two intermingled. Bicycles, horses, carts, and trolleys gave way to motorcars. These became semis, motorcycles, sports sedans, and hybrids.

Infrastructures are also in some ways indeterminate. Roads don’t actually predict carbon-fiber bicycles, hybrid cars, or Vespas; they are adapted as time passes and require differentiation at important points (interstate highways, city streets, crosswalks).

I was thinking the other day that in all our diligent argumentation over open access (OA) and related matters, we may have neglected to notice the multiple layers that have emerged to form a new infrastructure for scholarly publishing. From the late 1990s until now, there have been significant changes in how scholarly publishing is accomplished. These sustaining technologies have been adopted without much argument or fanfare, and their significance might be underestimated.

The digital object identifier (DOI) — I recall the early days of the DOI, at least from a publisher’s perspective. The potential was obvious and intriguing, but the path from where we were to a fully implemented DOI ecosystem was not. Thanks to CrossRef and other resolver services, we have had that kind of enabling ecosystem for years now. It’s growing nicely, both in size and importance, and is starting to support some major innovations.
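To make the plumbing concrete, here is a minimal sketch (mine, not CrossRef’s reference code) of what that ecosystem enables: the doi.org proxy resolves any registered DOI to its landing page, and CrossRef’s public REST API hands back the metadata behind it. The DOI in the example is a placeholder; substitute any registered DOI.

```python
import json
import urllib.request

def fetch_crossref_metadata(doi):
    """Fetch a DOI's registered metadata from CrossRef's public REST API.

    The doi.org proxy (https://doi.org/<doi>) redirects a reader to the
    publisher's landing page; api.crossref.org returns the registered
    metadata as JSON.
    """
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["message"]

# Placeholder DOI for illustration; substitute any registered DOI.
record = fetch_crossref_metadata("10.1234/example-doi")
print(record.get("title"), record.get("publisher"))
```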

Online submission systems — One of the first major new infrastructures to be installed on a widespread basis came in the form of online submission systems, which almost invariably led to an increase in submissions of 20% or more at any journal adopting them; by making submission easier, they increased the rate of churn across the industry.

XML — HTML and SGML co-existed uncomfortably in the early days of online scholarly journals, until their cooler sibling showed up (XML, like X-Games and GenX). Now, standards are evolving within the XML realm, but this fundamental piece of infrastructure seems cemented in place, supporting a vast range of publishing initiatives.

Video players — Borrowed from media companies, we’ve cycled through a few high-speed providers and their players, and will cycle through others as market consolidation continues in peripheral markets. Nonetheless, video players, streaming services, and UI/UX aspects of video are now integral to many scholarly publishers’ activities.

Production tracking systems — As volume and efficiency have become concerns, production tracking has evolved from informal systems, some on office walls or carried on sheets of paper from desk to desk. Now, database-driven systems, production add-ons to manuscript systems or layout systems, or bespoke systems allow for more outsourcing, better information management, and quicker turnaround times.

Other infrastructures enabled by those started in the aughts are now becoming mainstream in the teen years of the 21st century:

FundRef — Taking the notion of author IDs and resource IDs to funders, FundRef is already enabling another key infrastructure, CHORUS.

ORCID — This novel disambiguation infrastructure is now being knitted into other infrastructures, adding an important layer that should facilitate yet another layer of services utilizing ORCIDs.
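Part of what makes ORCID iDs robust enough to knit into other infrastructures is that the identifier is self-checking: the final character is an ISO 7064 MOD 11-2 check digit, so a mistyped iD can be rejected before it contaminates downstream systems. A minimal sketch of that check (the iD shown is the well-known demonstration iD for the fictitious researcher Josiah Carberry):

```python
def orcid_checksum_ok(orcid):
    """Validate the check digit of an ORCID iD (ISO 7064 MOD 11-2).

    Accepts the familiar 0000-0000-0000-0000 form; the final character
    may be 'X', which represents a check value of 10.
    """
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1].upper() == expected

# Josiah Carberry's demonstration iD, often used in ORCID examples:
print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
```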

ISNI — As Todd Carpenter wrote yesterday, ISNI and I2 are increasingly useful standards from the books and institutional world that are tying in with complementary infrastructures in scholarly publishing, through initiatives like ISNI2ORCID.

Many other pieces of scholarly publishing infrastructure have been installed over the past decade, from plagiarism-checking software to semantic technologies to production pre-flight systems to automated reference checking and formatting to databases disambiguating institutions.

One particularly galling gap in the infrastructure is the absence of a common e-commerce solution for scholarly publishers, something you’d think some enterprising firm would have created and promulgated, given the past decade of need and the generally stable and common requirements across scholarly publishers.

Other parts of the infrastructure remain maddeningly inadequate, including search. In my career, I have been able to achieve an excellent search infrastructure only once. Other search approaches suffer from comparative compromises or inadequacies. Given the uniformity of the information we publish and the standards we aspire to, it seems like a better common local search infrastructure would have emerged by now. Instead, our customers rely on external sites (PubMed, Google, others) to find our content.

There is also a tendency with infrastructure builds to forget that we’re working in a networked environment. PubMed Central is one such example, where an incomplete centralized repository exists rather than a complete index à la Google or CHORUS. Other ideas about text- and data-mining sometimes fall prey to a similar mindset — that is, rather than creating networked solutions, some people want to create a centralized and redundant repository to support the activity. This kind of thinking is either a relic of the pre-networked era or an attempt to consolidate influence, or both. In any case, creating networked capabilities benefits more people, as these work at greater scale and are more efficient in the long run.

The great infrastructure build-out of the twenty-aughts has provided a rich array of possibilities that we’re taking advantage of nicely in the teen years. But there are a few gaping holes remaining. What are the next layers? And who will plug the holes?

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

9 Thoughts on "Layers Upon Layers — Taking Advantage of the Great Infrastructure Build-Out of the Twenty Aughts"

Interesting insights, but what do you mean by a “common local search infrastructure”? All major publishers offer local search, in many cases with a variety of features such as “more like this.” Moreover, in many cases, perhaps most, readers do not want to be limited to one publisher, so they use global scholarly search engines like Google Scholar. Do you mean local search with a common look and feel, or local search that searches other publishers, or what? What do you see as missing? I ask because I do work in this area.

I mean a solution that works to provide site search, but doesn’t require reinvention of the wheel (and the invention of some poor wheels) by every vendor.

It sounds like you want a third party, plug and play search engine that works well with scholarly content. That makes sense because website design and search are very different critters.
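To illustrate the wheel in question, here is a toy sketch (purely illustrative, not any vendor’s code): the heart of any site search is an inverted index, and everything layered on top of it (stemming, ranking, fielded queries, relevance tuning) is where the reinvention, and the poor wheels, come in.

```python
from collections import defaultdict

class TinyIndex:
    """A naive inverted index: the part every platform ends up rebuilding.

    Real site search adds tokenization rules, stemming, ranking, and
    fielded queries on top of this core structure.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, query):
        # Return documents containing every query term (boolean AND).
        sets = [self.postings.get(t, set()) for t in query.lower().split()]
        hits = set.intersection(*sets) if sets else set()
        return [self.docs[d] for d in sorted(hits)]

idx = TinyIndex()
idx.add(1, "DOI resolution and reference linking")
idx.add(2, "ORCID author disambiguation")
print(idx.search("reference linking"))  # ['DOI resolution and reference linking']
```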

Interesting thoughts. Just three issues — the first a point of information, the other two maybe more interesting.

Online submission systems […] almost invariably led to an increase in submissions of 20% or more at any journal adopting them as they increased the ease of submission.

I’ve yet to hear any researcher praise online submission systems: even the best of them (including PeerJ’s) are more likely to be the subject of cursing. I certainly find it much easier to email a manuscript to an editor than to hack through an automated submission system. That’s not to say online submission systems lack value, of course — only that the value is for the editor and publisher rather than the author.

Given the uniformity of the information we publish and the standards we aspire to, it seems like a better common local search infrastructure would have emerged by now. Instead, our customers rely on external sites (PubMed, Google, others) to find our content.

That’s not a bug, it’s a feature. Elsewhere in the article, you rightly decry centralised solutions: in the same spirit, centralising search within the content provider’s platform limits the opportunities for innovation to the ideas that one company can come up with, and pretty much rules out the possibility of cross-platform integration. Instead, leaving search to external agencies — whoever wants to crawl and index the site — opens up many more possibilities.

At present Google is the most prominent of the companies that does this, but there’s no more reason to think they have any kind of stranglehold than there was to think that AltaVista had one in the late 1990s. Third-party searching is a market, with competition. That’s good for users.

Other ideas about text- and data-mining sometimes fall prey to a similar mindset — that is, rather than creating networked solutions, some people want to create a centralized and redundant repository to support the activity.

In the world of software engineering (where I do my paid work), the term “redundant” is a compliment, connoting robustness and reliability. That’s not to endorse a centralised solution, of course — you’re quite right that decentralised, networked solutions offer many advantages, including incubating innovation — but the outputs of these various initiatives do need to be harvested and archived as a backup and as an alternative point of access.

Doing this ensures persistence in the event of a publisher going bankrupt, veering off-mission, or simply changing its web presence in a backwards-incompatible way, which I need hardly say is hugely important. But beyond this, it also provides users with more choices regarding how to access publications. I love it that my sauropod-neck paper at PeerJ, as well as being available as PeerJ’s own HTML and as a formatted PDF, is also available to read in PubMed Central’s classic presentation, in PMC PubReader, and in eLife’s Lens reader. Since neither I nor the publisher can possibly anticipate how a given reader is likely to want to consume my article, why not offer a range of options?

So to your list of enabling infrastructure technologies, we can add the JATS XML format (formerly known as the NLM format), whose standard representation of research articles is what makes it possible to render a single article in all these ways (and more).
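To make that concrete, here is a toy sketch (a simplified fragment of my own devising, not a complete JATS document): because the article’s structure lives in standard markup, a few lines of code, or any of those readers, can pull the same pieces out of it.

```python
import xml.etree.ElementTree as ET

# A simplified JATS-style fragment; real JATS documents carry far more.
JATS_SNIPPET = """\
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Sauropod neck posture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Taylor</surname><given-names>M.</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body><p>Because one XML source captures the article structure,
  any renderer can restyle it without touching the content.</p></body>
</article>
"""

root = ET.fromstring(JATS_SNIPPET)
title = root.findtext(".//article-title")
authors = [
    f"{n.findtext('given-names')} {n.findtext('surname')}"
    for n in root.findall(".//contrib[@contrib-type='author']/name")
]
print(title, "by", ", ".join(authors))
```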

I’ve yet to hear any researcher praise online submission systems: even the best of them (including PeerJ’s) are more likely to be the subject of cursing.

For what it’s worth, Kent never claimed that authors like online submission systems. He only claimed that those systems increase submissions.

As a former author, I can vouch that online submission systems are light years better than the old days of having to put together multiple packets, print out each figure, collate them all, and FedEx them in to the editorial office (and hope they keep track of all the bits and pieces). And as a reviewer, it’s much better to grab everything online and send in your review that way, rather than by mail.

Good points about scholarly publishing infrastructure – though as Executive Director of CrossRef, I am biased. To add to what you’ve said, Kent, it’s interesting to note how successful things can be built upon and interconnected. “FundRef” is an extension of the CrossRef system of metadata collection and distribution that was originally developed for reference linking, and in a further extension we’ll be launching text- and data-mining functionality in the next month. You mention that the CrossRef infrastructure is enabling CHORUS; we are also working with SHARE to see if they can use the same infrastructure. An essential part of publishers submitting funding and other information to CrossRef is the online submission and production tracking systems, which have been, or are being, adapted to enable this. XML is crucial and underpins CrossRef’s entire system.

CrossRef was heavily involved in getting ORCID started (with both financial and technical support and “in kind” staff contributions) and we continue to be actively involved.

Finally, I’m happy to say that the business model of the publisher doesn’t matter for the infrastructure – all types of publishers with all types of business models (OA-only, subscription, hybrid, etc.) benefit from it.

However, CHORUS at this point exists only as a pilot project. Moreover, FundRef, upon which CHORUS crucially depends, is growing very slowly. As I understand it, only 30 publishers have signed on, and only 9 of those are depositing funder data, almost a year after FundRef was launched. See http://www.slideshare.net/CrossRef/fundref-chorus, slide 81.

Publishers must gear up in large numbers if FundRef and CHORUS are to succeed. Also, Kent seems to say that CHORUS will be indexing and searching full text, but last I knew that was not the case. So at this point CHORUS and FundRef are more of a gap than an infrastructure, a gap to be filled.

Great article, Kent.

One opportunity that lies before the scholarly publishing community is a common open access payment collection system. There are a couple of players in the field (CCC and OAK), but as far as I can tell, uptake by publishers has been slow.

Common OA payment systems can serve a broker’s role, much as subscription agents did back in the last century. One can imagine institutional support for OA payments handled through a handful of OA payment vendors rather than through scores of bespoke payment collection systems.
