(Joe Esposito Notes: Michael Clarke’s piece on disruption in scientific publishing is to my mind the most incisive post yet to appear on the Scholarly Kitchen. The key point Michael makes is that for all the talk of disruption, scientific publishing in fact has not been disrupted. There is the appearance of disruption (e.g., most journals are now electronic), but the business continues to proceed pretty much as it has for a couple decades or more. Michael goes on to explain why the Great Disruption has not taken place, and locates the reasons in a series of network externalities. It is worth noting that advocates of open access, focusing purely on access and ignoring the externalities that Michael identifies, seem to be unaware that there is much more to scholarly communications than the ability to read a text.)
Looking back on 2009, there was one particular note that seemed to sound repeatedly, resonating through the professional discourse at conferences and in posts throughout the blogosphere: the likelihood of disruptive change afoot in the scientific publishing industry.
Here in the digital pages of the Scholarly Kitchen, for example, we covered John Wilbanks’ presentation at SSP IN and Michael Nielsen’s talk at the 2009 STM Conference. They were both thoughtful presentations and I agree with many of the points raised by both speakers. I think Wilbanks is right when he says that thinking of information in terms of specific containers (e.g. books, journals, etc.) presents an opening to organizations in adjacent spaces that are able to innovate without the constraints of existing formats. I also agree with Nielsen’s point that acquiring expertise in information technology (and especially semantic technology)—as opposed to production technology—is of critical importance to scientific publishers and that those publishers who do not acquire such expertise will fall increasingly behind those organizations that do.
It has occurred to me, however, that I would likely have agreed with arguments that scientific publishing was about to be disrupted a decade ago—or even earlier. That we are speculating on the possibility of the disruption (here we are talking of “disruption” in the sense described by Clay Christensen in his seminal book The Innovator’s Dilemma) of scientific publishing in 2010 is nothing short of remarkable.
Lest we forget (and this is an easy thing to do from the vantage of the second decade of the 21st century), the World Wide Web was not built for the dissemination of pornography, the sale of trade books, the illegal sharing of music files, dating, trading stocks, reading the news, telecommunications, or tracking down your high school girlfriend or boyfriend. As it turns out, the Web is particularly good for all these activities, but these were not its intended uses.
When Tim Berners-Lee created the Web in 1991, it was with the aim of better facilitating scientific communication and the dissemination of scientific research. Put another way, the Web was designed to disrupt scientific publishing. It was not designed to disrupt bookstores, telecommunications, matchmaking services, newspapers, pornography, stock trading, music distribution, or a great many other industries.
And yet it has.
It is breathtaking to look back over the 18 years since the birth of the Web. It has grown from an unformed infant, to a promising child, to a sometimes-unruly teenager. In that time we have witnessed vast swaths of the global economy reconfigured as new industries emerged and old industries were upended. New modes of communication have transformed the workplace—and the home lives—of hundreds of millions of people. From the vantage of 1991, it would have been impossible to predict all that has happened in the last 18 years. No one would have believed that much could change that quickly.
And yet it has.
The one thing that one could have reasonably predicted in 1991, however, was that scientific communication—and the publishing industry that supports the dissemination of scientific research—would radically change over the next couple decades.
And yet it has not.
To be sure, many things have changed. Nearly all scientific journals (and an increasing number of books) are now available online. Reference lists are interconnected via digital object identifiers (DOIs). Vast databases such as GenBank and SciFinder have aggregated and parsed millions of biological sequences and chemical structures. Published research is more accessible than ever via search tools such as Google Scholar, PubMed, and Scopus. New business models, such as open access and site licensing, have emerged. And new types of communication vehicles have emerged such as the preprint server arXiv, video journals such as JoVE and the Video Journal of Orthopaedics, and online networks such as Nature Network, Mendeley, and (most recently) UniPHY—to name just a few innovations. To be sure, scientific publishers have not ignored the Web. They have innovated. They have experimented. They have adapted. But it has been incremental change—not the disruptive change one would have predicted 18 years ago.
Looking back at the publishing landscape in 1991, it does not look dramatically different from today, at least in terms of the major publishers. The industry has been relatively stable. And one would be hard-pressed to characterize the number of mergers and acquisitions that have occurred as particularly numerous relative to other industries. Moreover, these mergers and acquisitions are more likely to be explained by the rise of private equity and the availability of cheap capital than by technological innovations related to publishing.
The question, then, is not whether scientific publishing will be disrupted, but rather why it hasn’t been disrupted already.
In examining the reason for this surprising industry stability, I think it is useful to start by looking closely at the functions that journals—still the primary vehicles for the formal communication of research—serve in the scientific community. Why were journals invented in the first place? What accounts for their remarkable longevity? What problems do they solve and how might those same problems be solved more effectively using new technologies?
Initially, journals were developed to solve two problems: dissemination and registration.
Dissemination. Scientific journals were first and foremost the solution to the logistical problem of disseminating the descriptions and findings of scientific inquiry. Prior to 1665, when both the Journal des sçavans and the Philosophical Transactions were first published, scientists communicated largely by passing letters between each other. By 1665, however, there were too many scientists (or, more accurately, there were too many educated gentlemen with an interest, and in some cases even an expertise, in “natural philosophy”) for this method to be practical. The solution was to ask all such scientists to mail their letters to a single person (such as, in the case of the Philosophical Transactions, Henry Oldenburg) who would then typeset, print, and bind the letters into a new thing called a journal, mailing out copies to all the other (subscribing) scientists at once.
While the journal was a brilliant solution to the dissemination problems of the 17th century, I think it is safe to say that dissemination is no longer a problem that requires journals. The Internet and the World Wide Web allow anyone with access (including, increasingly, mobile access) to the Web to view any page designated for public display (we will leave aside the issue of paywalls in this discussion). If dissemination were the only function served by journals, journals would long since have vanished in favor of blogs, preprint servers (e.g. arXiv), or other document aggregation systems (e.g. Scribd).
Registration. Registration of discovery—that is to say, publicly claiming credit for a discovery—was, like dissemination, an early function of journal publishing. Ironically, the Philosophical Transactions was launched just in time to avert the most notorious scientific dispute in history—and failed to do so. The Calculus Wars were largely a result of Newton, who had developed his calculus by 1666, failing to avail himself of Oldenburg’s new publication vehicle. By the time the wars ended in 1723, Newton and Leibniz had done more to promote the need for registration than any other individuals before or since. Oldenburg could not have scripted a better marketing campaign for his invention.
As enduring as journals have been as a mechanism for registration of discovery, they are no longer needed for this purpose. A preprint server that records the time and date of manuscript submission can provide a mechanism for registration that is just as effective as journal publication. Moreover, by registering a DOI for each manuscript, an additional record is created that can further validate the date of submission and discourage the possibility of tampering.
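The registration function described above rests on little more than a trusted timestamp tied to a tamper-evident record of the content. A minimal sketch of the idea in Python, with hypothetical function names (a real preprint server would persist records and mint DOIs through a registration agency rather than keep an in-memory list):

```python
import hashlib
from datetime import datetime, timezone

def register_manuscript(registry, manuscript_text, author):
    """Record a claim of priority for a manuscript.

    Storing a content hash alongside the timestamp makes the record
    tamper-evident: any later alteration to the text would no longer
    match the registered digest.
    """
    digest = hashlib.sha256(manuscript_text.encode("utf-8")).hexdigest()
    record = {
        "author": author,
        "sha256": digest,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    registry.append(record)
    return record

def verify_manuscript(registry, manuscript_text):
    """Check whether this exact text was previously registered."""
    digest = hashlib.sha256(manuscript_text.encode("utf-8")).hexdigest()
    return any(r["sha256"] == digest for r in registry)

registry = []
register_manuscript(registry, "A method of fluxions...", "I. Newton")
print(verify_manuscript(registry, "A method of fluxions..."))  # True
print(verify_manuscript(registry, "An altered manuscript"))    # False
```

The point of the sketch is only that priority claims are a data problem, not a typesetting problem: once a neutral party records who submitted what, and when, the registration function no longer requires a printed journal.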
While journals are no longer needed for the initial problems they set out to solve (dissemination and registration), there are three additional functions that journals serve that have developed over time. These later functions—comprising validation (or peer review), filtration, and designation—are more difficult to replicate through other means.
Validation. Peer review, at least in the sense most journals practice it today, was not a common function of early scientific journals. While journal editors reviewed submitted works, the practice of sending manuscripts to experts outside of the journal’s editorial offices for review was not routine until the latter half of the 20th century. Despite the relatively recent advent of peer review, it has become a core function of today’s journal publishing system—indeed some would argue its entire raison d’être.
Schemes have been proposed over the years for decoupling peer review from journal publishing, Harold Varmus’ “E-Biomed” being perhaps the best-known example. There have additionally been several experiments in post-publication peer review—whereby review occurs after publication—though in such cases, journal publication is still attached to peer review, simply at a different point in the publication process. To date, no one has succeeded in developing a literature peer-review system independent of journal publication. One could imagine a simple online dissemination system, like arXiv, coupled with peer review. And indeed one could make the case that this is precisely what PLoS One is, though PLoS considers PLoS One to be a journal. It is perhaps not an important distinction once one factors out printed issues, which I don’t think anyone would argue are central to the definition of a journal today.
Filtration. In 1665 it was fairly easy to keep up with one’s scientific reading—it required only two subscriptions. Over the last few centuries, however, the task has become somewhat more complicated. In 2009 the number of peer-reviewed scientific journals is likely over 10,000, with a total annual output exceeding 1 million papers (both Michael Mabe and Carol Tenopir have estimated the number of peer-reviewed scholarly journals at between 22,000 and 25,000, with STM titles being a subset of this total). Keeping up with papers in one’s discipline, never mind the whole of science, is a challenge. Journals provide important mechanisms for filtering this vast sea of information.
First, with the exception of a few multi-disciplinary publications like Nature, Science, and PNAS, the vast majority of journals specialize in a particular discipline (microbiology, neuroscience, pediatrics, etc.). New journals tend to develop when there is a branching of a discipline and enough research is being done to justify an even more specialized publication. In this way, journals tend to support a particular community of researchers and help them keep track of what is being published in their field or, of equal importance, in adjacent fields.
Second, the reputations of journals are used as an indicator of the importance to a field of the work published therein. Some specialties have dozens of journals—too many for anyone to possibly read. Over time, however, each field develops a hierarchy of titles. The impact factor is often used as a method for establishing this hierarchy, though other less quantitative criteria also come into play. This hierarchy allows a researcher to keep track of the journals in her subspecialty, the top few journals in her field, and a very few generalist publications, thereby reasonably keeping up with the research that is relevant to her work. Recommendations from colleagues, conferences, science news, and topic-specific searches using tools such as Google Scholar or PubMed, might fill in the rest of a researcher’s reading list.
Still, filtration via journals leaves scientists with a great deal of reading. This has prompted a number of developments over the years, from informal journal clubs to review journals to publications like Journal Watch that summarize key articles from various specialties. Most recently, Faculty of 1000 has attempted to provide an online article rating service to help readers with the growing information overload. These are all welcome developments and provide scientists with additional filtration tools. However, they themselves also rely on the filtration provided by journals.
Journal clubs, Journal Watch, and Faculty of 1000 all rely on editors (formally or informally defined) to scan a discipline that is defined by a set of journals. Moreover, each tool tends to weight its selection towards the top of the journal hierarchy for a given discipline. None of these tools therefore replace the filtration function of journals—they simply act as a finer screen. While there is the possibility that recent semantic technologies will be able to provide increasingly sophisticated filtering capabilities, these technologies are largely predicated on journal publishers providing semantic context to the content they publish. In other words, as more sophisticated filtering systems are developed—they tend to augment, not disrupt, the existing journal publication system.
Designation. The last function served by scientific journals, and perhaps the hardest to replicate through other means, is that of designation. By this I mean that many academic institutions (and other research organizations) rely, to a not insignificant degree, on a scientist’s publication record in career advancement decisions. Moreover, a scientist’s publication record factors into award decisions by research funding organizations. Career advancement and funding prospects are directly related to the prestige of the journals in which a scientist publishes. As such a large portion of the edifice of scientific advancement is built upon publication records, an alternative would need to be developed and firmly installed before dismantling the current structure. At this point, there are no viable alternatives—or even credible experiments—in development.
There are some experiments that seek to challenge the primacy of the impact factor with the aim of shifting emphasis to article-centric (as opposed to journal-centric) metrics. Were such metrics to become widely accepted, journals would, over time, cease to carry as much weight in advancement and funding decisions. Weighting would shift to criteria associated with an article itself, independent of publication venue. Any such transition, however, would likely be measured not in years but in decades.
The original problems that journals set out to solve—dissemination and registration—can indeed be handled more efficiently with current technology. However, journals have, since the time of Oldenburg, developed additional functions that support the scientific community—namely validation, filtration, and designation. It is these later functions that are not so easily replaced. And it is by closely looking at these functions that an explanation emerges for why scientific publishing has not yet been disrupted by new technology: these are not technology-driven functions.
Peer review is not going to be substantively disrupted by new technology (indeed, nearly every STM publisher employs an online submission and peer-review system already). Filtration may be improved by technology, but such improvements are likely to take the form of augmentative, not disruptive, developments. Designation is firmly rooted in the culture of science and is also not prone to technology-driven disruption. Article-level metrics would first have to become widely adopted, standardized, and accepted, before any such transition could be contemplated—and even then, given the amount of time that would be required to transition to a new system, any change would likely be incremental rather than disruptive.
Given these three deeply entrenched cultural functions, I do not think that scientific publishing will be disrupted anytime in the foreseeable future. That being said, I do think that new technologies are opening the door for entirely new products and services built on top of—and adjacent to—the existing scientific publishing system:
- Semantic technologies are powering new professional applications (e.g. ChemSpider) that more efficiently deliver information to scientists. They are also beginning to power more effective search tools (such as Wolfram Alpha) meaning researchers will spend less time looking for the information they need.
- Mobile technologies are enabling access to information anywhere. Combined with GPS systems and cameras, Web-enabled mobile devices have the potential to transform our interaction with the world. As I have described recently in the Scholarly Kitchen, layering data on real-world objects is an enormous opportunity for scientists and the disseminators of scientific information. The merger of the Web and the physical world could very well turn out to be the next decade’s most significant contribution to scientific communication.
- Open data standards being developed now will allow for greater interoperability between data sets, leading to new data-driven scientific tools and applications. Moreover, open data standards will lead to the ability to ask entirely new questions. As Tim Berners-Lee pointed out in his impassioned talk at TED last year, search engines with popularity-weighted algorithms (e.g. Google, Bing) are most helpful when one is asking a question that many other people have already asked. Interoperable, linked data will allow for the interrogation of scientific information in entirely new ways.
These new technologies, along with others not even yet imagined, will undoubtedly transform the landscape of scientific communication in the decade to come. But I think the core publishing system that undergirds so much of the culture of science will remain largely intact. That being said, these new technologies—and the products and services derived from them—may shift the locus of economic value in scientific publishing.
Scientific journals provide a relatively healthy revenue stream to a number of commercial and not-for-profit organizations. While some may question the prices charged by some publishers, Don King and Carol Tenopir have shown that the cost of journals is small relative to the cost, as measured in the time of researchers, of reading and otherwise searching for information (to say nothing of the time spent conducting research and writing papers). Which is to say that the value to an institution of workflow applications powered by semantic and mobile technologies and interoperable linked data sets may exceed that of scientific journals. If such applications can save researchers (and other professionals that require access to scientific information) significant amounts of time, their institutions will be willing to pay for that time savings and its concomitant increase in productivity.
New products and services that support scientists through the more effective delivery of information may compete for finite institutional funds. And if institutions designate more funds over time to these new products and services, there may be profound consequences for scientific publishers. While this is unlikely to amount to market disruption, since scientific journals will remain necessary, it will nonetheless put downward pressure on journal (and book/ebook) pricing. This could, in turn, lead to a future where traditional knowledge products, while still necessary, provide much smaller revenue streams to their publishers. And potentially a future in which the communication products with the highest margins are not created by publishers but rather by new market entrants with expertise in emerging technologies.
The next decade is likely to bring more change to scientific publishing than the decade that just ended. However, it will likely continue to be incremental change that builds on the existing infrastructure rather than destroying it. It will be change that puts pressure on publishers to become even more innovative in the face of declining margins on traditional knowledge products. It will be change that requires new expertise and new approaches to both customers and business models. Despite these challenges, it will be change that improves science, improves publishing, and improves the world we live in.