Revisiting: Why Hasn't Scientific Publishing Been Disrupted Already?

Proteus Barometer. Image via Daderot.

(Editor’s Note: On November 8, The Scholarly Kitchen will host the latest in its series of “The Future Of…” webinars, this time focusing on Preprints. One of the questions I’m most interested in exploring is whether preprints are substitutive or complementary to journals. To answer that question, one must understand the functions that journals serve, and since I have yet to read a better explanation, I felt it worth revisiting Michael Clarke’s 2010 post on disruption.

More than six years later, the post holds up extremely well. In fact, some of the exciting new developments Clarke mentions, such as UniPHY and Nature Network, have come and gone while traditional journals soldier on. Long before the current ASAPBio-driven enthusiasm for preprints, Clarke pointed out that the dissemination and registration functions of journals were already somewhat obsolete, and that the other necessary functions were much more difficult to replace. Food for thought that I hope will fuel our upcoming discussion.

In the meantime, this post remains a seminal piece of writing for understanding journal publishing. It’s something I point new employees to as required reading. If you haven’t read it for a while, or if it’s new to you, the post is worth a read.)

Looking back on 2009, there was one particular note that seemed to sound repeatedly, resonating through the professional discourse at conferences and in posts throughout the blogosphere: the likelihood of disruptive change afoot in the scientific publishing industry.

Here in the digital pages of the Scholarly Kitchen, for example, we covered John Wilbanks’ presentation at SSP IN and Michael Nielsen’s talk at the 2009 STM Conference. They were both thoughtful presentations, and I agree with many of the points raised by both speakers. I think Wilbanks is right when he says that thinking of information in terms of specific containers (e.g. books, journals, etc.) presents an opening to organizations in adjacent spaces that are able to innovate without the constraints of existing formats. I also agree with Nielsen’s point that acquiring expertise in information technology (and especially semantic technology), as opposed to production technology, is of critical importance to scientific publishers, and that those publishers who do not acquire such expertise will fall increasingly behind those organizations that do.

It has occurred to me, however, that I would likely have agreed with arguments that scientific publishing was about to be disrupted a decade ago, or even earlier. That we are still speculating in 2010 about the possible disruption of scientific publishing (here we are talking of “disruption” in the sense described by Clay Christensen in his seminal book The Innovator’s Dilemma) is nothing short of remarkable.

Lest we forget (and this is an easy thing to do from the vantage of the second decade of the 21st century), the World Wide Web was not built for the dissemination of pornography, the sale of trade books, the illegal sharing of music files, dating, trading stocks, reading the news, telecommunications, or tracking down your high school girlfriend or boyfriend. As it turns out, the Web is particularly good for all these activities, but these were not its intended uses.

When Tim Berners-Lee created the Web in 1991, it was with the aim of better facilitating scientific communication and the dissemination of scientific research. Put another way, the Web was designed to disrupt scientific publishing. It was not designed to disrupt bookstores, telecommunications, matchmaking services, newspapers, pornography, stock trading, music distribution, or a great many other industries.

And yet it has.

It is breathtaking to look back over the events of the last 18 years since the birth of the Web. It has grown from an unformed infant, to a promising adolescent, to a sometimes-unruly teenager. In that time we have witnessed vast swaths of the global economy reconfigured as new industries emerged and old industries were upended. New modes of communication have transformed the workplace—and the home lives—of hundreds of millions of people. From the vantage of 1991, it would have been impossible to predict all that has happened in the last 18 years. No one would have believed that much could change that quickly.

And yet it has.

The one thing that one could have reasonably predicted in 1991, however, was that scientific communication, and the publishing industry that supports the dissemination of scientific research, would radically change over the next couple of decades.

And yet it has not.

To be sure, many things have changed. Nearly all scientific journals (and an increasing number of books) are now available online. Reference lists are interconnected via digital object identifiers (DOIs). Vast databases such as GenBank and SciFinder have aggregated and parsed millions of biological sequences and chemical structures. Published research is more accessible than ever via search tools such as Google Scholar, PubMed, and Scopus. New business models, such as open access and site licensing, have emerged. So have new types of communication vehicles, such as the preprint server arXiv, video journals such as JoVE and the Video Journal of Orthopaedics, and online networks such as Nature Network, Mendeley, and (most recently) UniPHY, to name just a few innovations. Scientific publishers have not ignored the Web. They have innovated. They have experimented. They have adapted. But it has been incremental change, not the disruptive change one would have predicted 18 years ago.

Looking back at the publishing landscape in 1991, it does not look dramatically different from today, at least in terms of the major publishers. The industry has been relatively stable. And one would be hard pressed to describe the mergers and acquisitions that have occurred as particularly numerous relative to other industries. Moreover, these mergers and acquisitions are more likely to be explained by the rise of private equity and the availability of cheap capital than by technological innovations related to publishing.

The question then becomes not whether scientific publishing will be disrupted, but why it hasn’t been disrupted already.

In examining the reason for this surprising industry stability, I think it is useful to start by looking closely at the functions that journals—still the primary vehicles for the formal communication of research—serve in the scientific community. Why were journals invented in the first place? What accounts for their remarkable longevity? What problems do they solve and how might those same problems be solved more effectively using new technologies?

Initially, journals were developed to solve two problems: Dissemination and registration.

Dissemination. Scientific journals were first and foremost the solution to the logistical problem of disseminating the descriptions and findings of scientific inquiry. Prior to 1665, when both the Journal des sçavans and the Philosophical Transactions were first published, scientists communicated largely by passing letters between each other. By 1665, however, there were too many scientists (or, more accurately, there were too many educated gentlemen with an interest, and in some cases even an expertise, in “natural philosophy”) for this method to be practical. The solution was to ask all such scientists to mail their letters to a single person (such as, in the case of the Philosophical Transactions, Henry Oldenburg) who would then typeset, print, and bind the letters into a new thing called a journal, mailing out copies to all the other (subscribing) scientists at once.

While the journal was a brilliant solution to the dissemination problems of the 17th century, I think it is safe to say that dissemination is no longer a problem that requires journals. The Internet and the World Wide Web allow anyone with access to the Web (including, increasingly, mobile access) to view any page designated for public display (we will leave aside the issue of paywalls in this discussion). If dissemination were the only function served by journals, journals would long since have vanished in favor of blogs, preprint servers (e.g. arXiv), or other document aggregation systems (e.g. Scribd).

Registration. Registration of discovery (that is to say, publicly claiming credit for a discovery) was, like dissemination, an early function of journal publishing. Ironically, the Philosophical Transactions was launched just in time to avert the most notorious scientific dispute in history, and failed to do so. The Calculus Wars were largely a result of Newton, who developed his calculus by 1666, failing to avail himself of Oldenburg’s new publication vehicle. By the time the wars ended in 1723, Newton and Leibniz had done more to promote the need for registration than any other individuals before or since. Oldenburg could not have scripted a better marketing campaign for his invention.

As enduring as journals have been as a mechanism for registration of discovery, they are no longer needed for this purpose. A preprint server that records the time and date of manuscript submission can provide a mechanism for registration that is just as effective as journal publication. Moreover, registering a DOI for each manuscript creates an additional record that can further validate the date of submission and discourage tampering.

While journals are no longer needed for the initial problems they set out to solve (dissemination and registration), journals have developed three additional functions over time. These later functions (validation, or peer review; filtration; and designation) are more difficult to replicate through other means.

Validation. Peer review, at least in the sense most journals practice it today, was not a common function of early scientific journals. While journal editors reviewed submitted works, the practice of sending manuscripts to experts outside of the journal’s editorial offices for review was not routine until the last half of the 20th century. Despite the relatively late provenance of peer review, it has become a core function of today’s journal publishing system; indeed, some would argue it is the system’s entire raison d’être.

Schemes have been proposed over the years for decoupling peer review from journal publishing, Harold Varmus’ “E-Biomed” being perhaps the most well-known example. There have additionally been several experiments in post-publication peer review—whereby review occurs after publication—though in such cases, journal publication is still attached to peer review, simply at a different point in the publication process. To date, no one has succeeded in developing a literature peer-review system independent of journal publication. One could imagine a simple online dissemination system, like ArXiv, coupled with peer review. And indeed one could make the case that this is precisely what PLOS One is, though PLOS considers PLOS One to be a journal. It is perhaps not an important distinction once one factors out printed issues, which I don’t think anyone would argue are central to the definition of a journal today.

Filtration. In 1665 it was fairly easy to keep up with one’s scientific reading: it required only two subscriptions. Over the last few centuries, however, the task has become somewhat more complicated. In 2009 the number of peer-reviewed scientific journals was likely over 10,000, with a total annual output exceeding 1 million papers (both Michael Mabe and Carol Tenopir have estimated the number of peer-reviewed scholarly journals at between 22,000 and 25,000, with STM titles being a subset of this total). Keeping up with papers in one’s discipline, never mind the whole of science, is a challenge. Journals provide important mechanisms for filtering this vast sea of information.

First, with the exception of a few multi-disciplinary publications like Nature, Science, and PNAS, the vast majority of journals specialize in a particular discipline (microbiology, neuroscience, pediatrics, etc.). New journals tend to develop when there is a branching of a discipline and enough research is being done to justify an even more specialized publication. In this way, journals tend to support a particular community of researchers and help them keep track of what is being published in their field or, of equal importance, in adjacent fields.

Second, the reputations of journals are used as an indicator of the importance to a field of the work published therein. Some specialties have dozens of journals, too many for anyone to possibly read. Over time, however, each field develops a hierarchy of titles. The impact factor is often used as a method for establishing this hierarchy, though other less quantitative criteria also come into play. This hierarchy allows a researcher to keep track of the journals in her subspecialty, the top few journals in her field, and a very few generalist publications, thereby reasonably keeping up with the research that is relevant to her work. Recommendations from colleagues, conferences, science news, and topic-specific searches using tools such as Google Scholar or PubMed might fill in the rest of a researcher’s reading list.

Still, filtration by journal leaves scientists with a great deal of reading. This has prompted a number of developments over the years, from informal journal clubs to review journals to publications like Journal Watch that summarize key articles from various specialties. Most recently, Faculty of 1000 has attempted to provide an online article rating service to help readers with the growing information overload. These are all welcome developments and provide scientists with additional filtration tools. However, they themselves also rely on the filtration provided by journals.

Journal clubs, Journal Watch, and Faculty of 1000 all rely on editors (formally or informally defined) to scan a discipline that is defined by a set of journals. Moreover, each tool tends to weight its selection towards the top of the journal hierarchy for a given discipline. None of these tools, therefore, replaces the filtration function of journals; they simply act as a finer screen. While recent semantic technologies may be able to provide increasingly sophisticated filtering capabilities, these technologies are largely predicated on journal publishers providing semantic context for the content they publish. In other words, as more sophisticated filtering systems are developed, they tend to augment, not disrupt, the existing journal publication system.

Designation. The last function served by scientific journals, and perhaps the hardest to replicate through other means, is that of designation. By this I mean that many academic institutions (and other research organizations) rely, to a not insignificant degree, on a scientist’s publication record in career advancement decisions. Moreover, a scientist’s publication record factors into award decisions by research funding organizations. Career advancement and funding prospects are directly related to the prestige of the journals in which a scientist publishes. As such a large portion of the edifice of scientific advancement is built upon publication records, an alternative would need to be developed and firmly installed before dismantling the current structure. At this point, there are no viable alternatives, or even credible experiments, in development.

There are some experiments that seek to challenge the primacy of the impact factor with the aim of shifting emphasis to article-centric (as opposed to journal-centric) metrics. Were such metrics to become widely accepted, journals would, over time, cease to carry as much weight in advancement and funding decisions. Weighting would shift to criteria associated with an article itself, independent of publication venue. Any such transition, however, would likely be measured not in years but in decades.

The original problems that journals set out to solve, dissemination and registration, can indeed be handled more efficiently with current technology. However, since the time of Oldenburg, journals have developed additional functions that support the scientific community: namely validation, filtration, and designation. It is these later functions that are not so easily replaced. And it is by looking closely at these functions that an explanation emerges for why scientific publishing has not yet been disrupted by new technology: these are not technology-driven functions.

Peer review is not going to be substantively disrupted by new technology (indeed, nearly every STM publisher employs an online submission and peer-review system already). Filtration may be improved by technology, but such improvements are likely to take the form of augmentative, not disruptive, developments. Designation is firmly rooted in the culture of science and is also not prone to technology-driven disruption. Article-level metrics would first have to become widely adopted, standardized, and accepted, before any such transition could be contemplated—and even then, given the amount of time that would be required to transition to a new system, any change would likely be incremental rather than disruptive.

Given these three deeply entrenched cultural functions, I do not think that scientific publishing will be disrupted anytime in the foreseeable future. That being said, I do think that new technologies are opening the door for entirely new products and services built on top of, and adjacent to, the existing scientific publishing system:

Semantic technologies are powering new professional applications (e.g. ChemSpider) that more efficiently deliver information to scientists. They are also beginning to power more effective search tools (such as Wolfram Alpha) meaning researchers will spend less time looking for the information they need.
Mobile technologies are making it possible to access information anywhere. Combined with GPS systems and cameras, Web-enabled mobile devices have the potential to transform our interaction with the world. As I have described recently in the Scholarly Kitchen, layering data on real-world objects is an enormous opportunity for scientists and the disseminators of scientific information. The merger of the Web and the physical world could very well turn out to be the next decade’s most significant contribution to scientific communication.
Open data standards being developed now will allow for greater interoperability between data sets, leading to new data-driven scientific tools and applications. Moreover, open data standards will lead to the ability to ask entirely new questions. As Tim Berners-Lee pointed out in his impassioned talk at TED last year, search engines with popularity-weighted algorithms (e.g. Google, Bing) are most helpful when one is asking a question that many other people have already asked. Interoperable, linked data will allow for the interrogation of scientific information in entirely new ways.

These new technologies, along with others not even yet imagined, will undoubtedly transform the landscape of scientific communication in the decade to come. But I think the core publishing system that undergirds so much of the culture of science will remain largely intact. That being said, these new technologies—and the products and services derived from them—may shift the locus of economic value in scientific publishing.

Scientific journals provide a relatively healthy revenue stream to a number of commercial and not-for-profit organizations. While some may question the prices charged by some publishers, Don King and Carol Tenopir have shown that the cost of journals is small relative to the cost, as measured in the time of researchers, of reading and otherwise searching for information (to say nothing of the time spent conducting research and writing papers). Which is to say that the value to an institution of workflow applications powered by semantic and mobile technologies and interoperable linked data sets may exceed that of scientific journals. If such applications can save researchers (and other professionals that require access to scientific information) significant amounts of time, their institutions will be willing to pay for that time savings and its concomitant increase in productivity.

New products and services that support scientists through the more effective delivery of information may compete for finite institutional funds. And if institutions allocate more funds over time to these new products and services, there may be profound consequences for scientific publishers. While this is unlikely to result in a market disruption, as scientific journals will remain necessary, it will nonetheless create downward pressure on journal (and book/ebook) pricing. This could, in turn, lead to a future where traditional knowledge products, while still necessary, provide much smaller revenue streams to their publishers. And potentially a future in which the communication products with the highest margins are not created by publishers but rather by new market entrants with expertise in emerging technologies.

The next decade is likely to bring more change to scientific publishing than the decade that just ended. However, it will likely continue to be incremental change that builds on the existing infrastructure rather than destroying it. It will be change that puts pressure on publishers to become even more innovative in the face of declining margins on traditional knowledge products. It will be change that requires new expertise and new approaches to both customers and business models. Despite these challenges, it will be change that improves science, improves publishing, and improves the world we live in.

Michael Clarke

Michael Clarke is the Managing Partner at Clarke & Esposito, a boutique consulting firm focused on strategic issues related to professional and academic publishing and information services.

Discussion

16 Thoughts on "Revisiting: Why Hasn’t Scientific Publishing Been Disrupted Already?"

Good read, and hello from Charlottesville. There are many disruptive technologies and philosophies hovering in the wings, especially around the ideas of open and reproducible science and research, open peer review, and a more democratized incentive structure that may further erode the “impact factor” model for influence and career advancement. The assumption in all of these arguments is that we NEED these journals to tell us what’s the best science. Wouldn’t it be something if the best science was based on results that could be validated openly, rather than what was subscribed to and labeled as such?

By Rusty Speidel
Oct 26, 2016, 10:25 AM

“Should” is always a tough thing to achieve, and usually we’re left coping with “does” instead. Everyone knows that the ideal situation should be for everyone to carefully read and understand everything written about a subject and to do a thorough analysis of the data and the conclusions, perhaps going so far as to try to reproduce the experiments.

But that’s not realistic, especially when one considers the very limited amount of time and the sheer quantity of material that we are all dealing with and the need for interpretation by those without direct subject area knowledge. Some systems are needed for dealing with these, and while journals are imperfect, I’ve yet to hear of any practically implementable alternative. We know that unless we carefully organize and continuously and actively drive the review process, it doesn’t happen (or at least it happens highly irregularly).

I think we should also question the notion that there is, or will ever be, some objective and completely neutral way of determining “what’s the best science”. This is a qualitative decision, an opinion, basically: what do you think is good? Even the simpler question, “is this paper methodologically sound?”, seems to fall into the realm of opinion as well. Maybe we should just accept that the letter of recommendation may be the best we can do when asking these sorts of questions.

By David Crotty
Oct 26, 2016, 11:18 AM

Hi Rusty – more transparency in terms of research data would generally speaking (with a few caveats around patient and other sensitive data) be a good thing, and of course there are many initiatives and policies working in this direction. This article does not assume journals are needed as some sort of first principle. Rather, it describes what journals do and explains why they do so. If another system comes along that does these things better than journals, journals might be displaced. The purpose of this article was to explain why displacing journals is actually very hard: they do a number of things (not just dissemination). There has been a constant barrage of blog posts, presentations, and commentaries over the last decade or so claiming that journals are about to be displaced (or “disrupted” to use Silicon Valley speak) by (insert latest tech fad here) because journal publishers are (insert your techno-determinist criticism of the day here). This hasn’t happened. This article explains why techno-deterministic predictions (such as those of Michael Nielsen) have not come to pass; writing it was less time-consuming than replying to every nitwit on the Internet.

In a world where all data was openly accessible (and all protocols well communicated, which is a separate thing), journals (or something that does the work of journals) would still be needed. The research would need to be made public (“published”) somehow along with a summary narrative (a.k.a. “article”). We would want to register this somehow so that researchers can claim credit; otherwise they will become more secretive and less open. You would also need a mechanism to find out about the research amid the noise and din of all the other research that you don’t care about. You might want to have some other people that are not you spend a little time making sure the research in question at least looks legit (“peer review”) before you spend your valuable time attempting to validate the data (whatever that process might entail). Also, I should note that many journals have open data requirements and they continue to exist, which suggests that openly accessible data is not a disruptive trend for the market.

By Michael Clarke
Oct 26, 2016, 11:24 AM

Yes … this was an excellent read that I managed to make time for. Now I’ll go back to worrying about what I’ve failed to read in the meantime. I’ll gently nudge my organization about linked open data being an area that academic librarians must explore.

By matthewm53
Oct 26, 2016, 10:53 AM

We can only hope that the incremental technological changes that are so well described in your article are paralleled by changes to the economics of academic publishing, in particular as regards transparency. Technology was supposed to make everything more accessible, the process quicker and the result cheaper; the latter has not been realised. For example, the price of academic textbooks in the USA has increased more than 1,000% since 1978! I still recall with great fondness, from my days employed by a learned society, the annual joyous letter which would arrive from our publisher, informing us that ‘journal price increases would be limited to only 15% this year’, when inflation was under 3%. Disruption in this respect cannot be incremental. Users cannot afford to wait. Hopefully content generators will not end up paying the resulting price.

By Daniel Berze
Oct 26, 2016, 12:07 PM

Looked at on a per-article basis, technology has delivered. The price of articles to libraries has dropped year-over-year during the last decade. The problem is that there are more articles, and so the overall price keeps going up and library budgets have not kept up with research output. From the libraries and publishers we talk with, 2–5% annual increases appear to be the new normal, despite the growth in articles (double-digit increases appear to be a thing of the past). Textbooks are a whole other topic!

By Michael Clarke
Oct 26, 2016, 12:21 PM

Where disruption could get really interesting is in the revenue protection/walled garden area. We just assume these functions will need to be paid for and that publishers will be the ones getting paid. We just assume that academic authority is hierarchical and based on a long history of publishing in respected journals. One could see a day when academics regularly self-publish, and the production/dissemination/validation loop is community-based. Open source platforms indexing millions of freely submitted research papers, peer-reviewed by the users (with some rating-based/topic-based authority attached), and published for all to see. It sounds utopian, but the massive subscription costs and per-article fees are tempting targets.

By Rusty Speidel
Oct 26, 2016, 1:20 PM

We do already see an enormous level of activity by the academic community itself in publishing, through university presses and research society journals. These are owned and run by the community itself, rather than outside commercial concerns.

And as noted above, leaving the review process to serendipity and hoping that things will get reviewed without an active driving editorial force behind the process has so far failed to provide anywhere near the level of review needed for the scientific literature. Take a look at the percentage of papers that receive any kind of comment at all on bioRxiv, let alone a thorough review, as an example. Or F1000 Research, where papers can sit waiting for years for a review to be posted. There needs to be a mechanism in place for making this process happen.

By David Crotty
Oct 26, 2016, 1:30 PM

It is amazing how some publishers can realise annual net profits of 39% on 2-5% price increases! Or the vast divergence in APC/BPC rates. Sorry, but something doesn’t add up.

By Daniel Berze
Oct 26, 2016, 2:28 PM

I think you’re confusing annual price increases with profit margins. If you’re already making a 39% margin, and your costs go up 2%, then increasing your prices by 2% keeps your margins the same.

By David Crotty
Oct 26, 2016, 4:30 PM

Michael’s article described changes that have taken place in the last few decades, not in one year! The compounded price increases that I have described, which may (or may not) have moderated in recent years, still result in massive price increases, much more than the inflation rate. These have created situations in which 39% profits are possible. The disruption which is necessary is to get these profits down and make the entire cost/profit/price formula transparent to the content generator and user.

By Daniel Berze
Oct 26, 2016, 5:56 PM

You were referring to Michael’s comment above, where he talked about current annual price increases, not historical ones over the past decade. 2-5% is described as “the new normal”, which implies a previous state that was different.

And let’s be clear, not every publisher makes profits in that range.

By David Crotty
Oct 26, 2016, 6:28 PM

Michael’s article profiles the disruptions that have taken place since 1991; that was my reference point. The annual profits that I have referred to are a product of long-term price abuse. Of course, not every publisher makes profits in that range, but many of the larger commercial ones do. My plea is for greater transparency in the economic model. Profits are necessary to support growth and investment, but not at the levels that many publishers charge.

By Daniel Berze
Oct 27, 2016, 4:29 AM

Publishers are businesses. It’s not in their best interest to explain their pricing models, or to accommodate lower pricing for the public good. The more valuable the content, the higher the price. Libraries are really feeling that pain. That is why I think disruption might come in the form of community peer review and publication using open source technologies, if that community can figure out how to keep peer review timely and high quality. Thoughts?

By Rusty Speidel
Oct 27, 2016, 10:52 AM

It seems to me the question concerns both what journals do and the notion of disruption. The former is addressed, but disruption takes money. The fact remains that it takes a lot of money to disrupt any established market, and even as large as the combined revenues of Elsevier, Wiley, T&F and Springer are, there really is not enough money to be made to justify the costs associated with disruption. Elsevier, that evil empire, only makes $25 billion per year, and that is chicken feed compared to GM’s $105 billion.

By harvey kane
Oct 26, 2016, 12:14 PM

Elsevier’s revenues haven’t quite reached $25bn a year. Last year they were £2.07bn, or $3.17bn at an exchange rate of $1.53 to sterling.

You can find details in the annual report on page 11 at: http://www.relx.com/investorcentre/reports%202007/Documents/2015/relxgroup_ar_2015.pdf

You might also note the comments of the negotiators of this week’s deal in the UK with Elsevier:

Paul Feldman, CEO of Jisc, said: ‘Jisc Collections’ analysis shows that over the course of our previous agreement, Elsevier research articles have been of excellent quality at a price per accessible article below the average for the agreements that Jisc Collections negotiates with other publishers. https://www.researchinformation.info/news/elsevier-unveils-jisc-collaboration

For transparency, I work for Elsevier’s parent company.