There is no mention of Sci-Hub in Streaming, Sharing, Stealing, a book by Michael Smith and Rahul Telang, and it’s a shame, as the issues and strategies described therein are precisely the ones STM publishers are struggling with concerning pirate sites. A careful reading of this book will help scholarly publishers come to terms with two key points: how to account for the huge amount of piracy of academic materials (and how to deal with it) and, more importantly, how to explain the fact that many of Sci-Hub’s users already have authorized access to published materials at their home institutions but choose to download articles from Sci-Hub and other pirate sites anyway. Every business person in academic publishing should read this book, which is both insightful and entertaining — entertaining because all of its examples come from the entertainment business (the subtitle is Big Data and the Future of Entertainment).
The starting point for the book is Steve Jobs’ insight: you can’t stop piracy, so you have to learn to compete with it. Please, publishers, read that carefully: you cannot stop piracy. This does not mean that you should not outfit your lawyers (and especially your trade associations) to pursue offenders, but you must be mindful that many methods to prevent piracy involve inconveniencing legitimate users, who then are more likely to find their way to Sci-Hub. As a publisher, you are angry about this, right? Deal with it.
The question then is how to compete with piracy. The authors’ thesis is that careful data analysis can identify strategies to work around piracy. A telling example comes from trade publishing. When ebooks first became an important commodity (that is, when Amazon launched the Kindle in 2007), trade publishers faced a dilemma: Should they withhold the ebook version of a book until the paperback edition came out (typically one year after the initial hardcover publication) or should they publish digital editions at the same time that the hardcover edition was introduced? The economic consequences of this decision could be stark: simultaneous publication of hardcover and digital editions could cannibalize high-priced hardcover sales, but failure to create a digital edition would likely result in pirated digital copies appearing on various sites around the Web. The term of art for this strategy is “windowing”: releasing certain formats at one time (that is, during one window) and certain formats later. We see this with motion pictures, where movies are usually released first to theaters and only later on DVD and streaming services. Careful data analysis of trade books demonstrated that publishers were better off releasing ebooks at the same time as hardcovers, which is the norm today. Paperbacks, interestingly, are most profitable for the publisher when they are withheld for one year.
I repeat: this book should be read by all business people in academic publishing, but I do have four qualifying remarks.
First, the discussion of the Long Tail is incomplete. Ever since Chris Anderson published The Long Tail, media executives have been focused on how the Internet makes properties with small audiences potentially a source of profit. Smith and Telang restate this case, and they are right to do so. But they miss the other half of the equation, that the “short head” (the bestsellers and blockbusters) does better nowadays than ever before. STM publishers might contemplate the growth of submissions at the highest-ranked journals. In other words, the story of the Long Tail is really a story about the Excluded Middle: small-selling titles become viable (the Long Tail) and hits are bigger than ever, which leaves the titles in between squeezed. Harry Potter and Fifty Shades of Grey sold at levels that no one had ever seen before. The Internet democratizes production, but creates an aristocracy of consumption.
The second point is close to home: an analysis of Encyclopaedia Britannica, where I used to work. I am named in the text a number of times, but the facts, especially the chronology, are often wrong. The authors attribute to me actions that I did not take and authority I could only dream about. There is also a great deal of information about Microsoft, which is at best incomplete, as there is no mention of Microsoft’s stubborn refusal to see any merit in Britannica’s early Internet product, which was launched commercially in 1994. The money quote from Microsoft, which does not appear in the book: “The Internet is useless.” When you see this many factual errors in a document in an area where you have special knowledge, it is natural to wonder about the truth of the information in other areas.
Number three: The data analysis in this book is a highly sophisticated version of what financial analysts have been doing for years, though it does indeed work with much larger amounts of data than previously. It does not reach into the new world of data science and machine learning. No one is going to build an AI here. This means that the kind of work recommended by the authors is useful for optimizing an existing business, but less useful for coming up with new ones built on emergent properties of data. For example, what would it look like for someone to get access to all of Sci-Hub’s logs and then package and market that data to publishers? Would this not give Mendeley a run for its money?
Finally, we get to the matter of genre. What kind of book is this? It’s not a work of original research, as it mostly summarizes other research and the consulting activities of the authors. It bears the imprimatur of MIT Press, which perhaps suggests an aura of authority that the book does not fully deserve. It is hard to escape the conclusion that the book’s purpose is to further the consulting activity of the authors, which is fine with me. Indeed, if I were Elsevier or John Wiley, I would retain them immediately. Despite its provenance, though, this is not an academic monograph but an instance of content marketing: content created to further the aims of its creators. Nothing wrong with that, but readers should be aware that they are paying to read a long commercial.
Despite these reservations, I recommend the book for those charged with crafting a strategy out of the morass of copyright and piracy. It sure beats 20-character passwords and two-factor authentication.