Those of us who labor in scholarly publishing can be forgiven for thinking that the world is a tiny place. The academic journal, the keystone of our industry, cumulatively brings in about $10 billion a year, not enough to get the CEOs of Uber or Pinterest out of bed in the morning; and the book, the much-despised book, is in retreat everywhere. While librarians continue to insist that there are huge publishers out there, corporations so big that they have a stranglehold on the academic community, if not the world overall, the actual figures for even the largest publishers are not much more than rounding errors for truly big concerns and industries. How is it that healthcare can be 17 percent of the U.S. economy while publishing about healthcare is worth pennies? Let’s play the game of Separated at Birth: We will pair Pearson with Apple, Reed Elsevier with Cisco, and John Wiley with Exxon Mobil. Now we can tell what big companies actually look like, and none of them send their kids to the same schools we do.
The scale of the Internet makes this point even more dramatically. An entertaining example can be found at the marvelous Web site Internet Live Stats. If you have never played around with this site, I strongly encourage you to do so. ILS has tapped into a number of Internet resources and presents some of the stunning numbers for Internet usage in real time. So, for example, I just looked at the figures for YouTube videos viewed today (yes, just for today). It is 6:30 P.M. EDT as I write this, and already YouTube has logged 6,029,761,331 views. How is this possible? With numbers like these, why do we spend so much time thinking about the impact factor of The Lancet? The scale of Internet usage is staggering. It is time to think about what it means for scholarly publishing to operate at Internet scale.
A word of caution, however: thinking about the ultimate backdrop for any activity can invite foolishness. We could have all of our electronics burned out by the EMP of a solar flare, but should we factor EMPs into our business plan? Or do we make the dystopian predictions of so many Hollywood films part of our forecast and insert a value for a breach of the San Andreas Fault, with Los Angeles sliding beneath the waves of the Pacific Ocean? I am particularly amused by asteroid paranoia: Should we stop what we are doing and think about how to alter our outlook as a civilization-destroying projectile makes its way to Earth (probably landing in Los Angeles because it's cheaper to film close to the studio)? Yes, there is a big picture, and it does not come from Hollywood, but it is not always relevant to what we do here and now in our little patch.
Superficially, it would appear that open access services are more closely attuned to Internet scale than their brethren in the traditional publishing world. OA materials are placed on the Web, where they can be viewed, shared, sorted, whatever, subject only to the CC licenses that accompany them, assuming anyone actually reads or complies with these licenses. It's often the case, however, that OA publishing is of the "post and forget" variety: put it on a Web server somewhere and then move on to the next task, which is likely to be unrelated. Such materials may get lucky and be "discovered" through search engines and social media, sometimes with a push from the publisher (PLOS has a sophisticated post-publication marketing program), but more often OA publishing is an exercise in production rather than in building a readership. OA publishers often simply let the Internet do the rest, and Internet scale sometimes does it. Here the properties of Internet scale are access from any part of the globe (infrastructure and law permitting) and Google and its ilk in the search and discovery space.
I say this is a superficial invocation of Internet scale because it touches only on access, which is a very small aspect of scholarly communications. Indeed, it is one of the more remarkable achievements of OA advocates that scholarly communications has been shrunk down to a discussion of access, leaving aside what are arguably the more important issues of authority, importance, and originality, not to mention understanding. A more fundamental problem with access, however, is that it speaks to human scale, not to Internet scale.
How many books can the most assiduous reader read in a lifetime? How many articles can a cancer researcher read and learn from? While so much research goes into the science of life extension, how many years would a researcher in this field need just to read the articles that came out this past year? The great irony of scholarly communications today is that two fundamental issues are at war: access on one hand, information overload on the other, and they pull in opposite directions. If you improve one (access), you worsen the other (information overload). So OA publishing can best be viewed as a benign, cultish activity lacking in long-term vision. It is the equivalent of solving the problem of climate change by switching off the light (one light) in the den. Traditional publishers should ask themselves if they do even this much.
To solve the information overload problem, which is a problem of enormous size even if the growth of accessible articles is limited to the paid-for collections in academic libraries, we have to find a new kind of reader, one that is not subject to the limited time of the human researcher. We know who that reader is: it is a machine. A well-designed machine, built for ingesting text and identifying meaningful patterns, is not subject to information overload. Indeed, the bigger the meal, the happier the bot. Here is where open access and Internet scale meet, in the world of text- and data-mining and the machines that conduct this task. (I first encountered this idea in a piece by Clifford Lynch.)
Publishers may want to begin thinking about this new readership. How will the nature of publications change when pattern-detecting machines develop into a partial or even full audience? How will we publish these new findings? It is easy and no doubt a great deal of fun to come up with jokey problems connected to this (How do you identify the PI of a pattern-recognition system? If a machine has a high citation count, is it eligible for tenure?), but there is a real and serious issue here. Scholarly communications sits upon a platform as solid and uncertain as the city of Los Angeles on the San Andreas Fault. As Wikipedia says,
The San Andreas Fault is a continental transform fault that extends roughly 1300 km (810 miles) through California. It forms the tectonic boundary between the Pacific Plate and the North American Plate, and its motion is right-lateral strike-slip (horizontal). The fault divides into three segments, each with different characteristics and a different degree of earthquake risk, the most significant being the southern segment, which passes within about 35 miles of Los Angeles.
Maybe Hollywood had it right after all.
P.S. I just went back to look at the YouTube figures again. We are up to 7,185,269,451. Actually, that's not accurate. While I was typing this, the numbers went up. In fact, the numbers increased faster than I could type. For me to get an accurate figure, I would have to make a forecast. Please think about the implications of this: To understand the status of the Internet right now, I have to make a prediction about the future.