At the Digital Book World (DBW) conference last week, I wandered around asking the question I always do: How much of publishing is generic–that is, with application to all publishing segments–and how much is specific to one segment? In the STM world there are many things that have no application elsewhere–for example, open access mandates from funding agencies–but it’s good to look for generic items. These can provide clues as to how certain things will develop, in effect allowing the publishers of one segment to do their R&D on the backs of publishers from other segments. An example of this kind from several years ago would be Google’s mass digitization of books without publishers’ permission, a prelude to the tech industry’s assault on copyright, which now includes such hard-to-believe policy statements as declaring that text-mining is not a copyright violation. If only we had seen the dark heart at the center of Google earlier! As for understanding in the early days where Amazon was headed, I doubt there is a book publisher on the planet who does not regret having supported Amazon wholeheartedly. Hmmm. When does Amazon decide to go after STM?
The generic item that was being talked about widely at DBW this year is the collection of end-user data. For the trade publishers that make up the bulk of DBW’s audience, end-user data is almost like a piece of moon rock: foreign, unexpected, unintelligible. This is because trade publishers historically have sold books indirectly through such channels as bookstores and wholesalers, and thus have had little or no knowledge of how their books are actually used. College publishers–almost all book publishers–have had the same problem. In the journals world the situation is a bit different. Professional societies have always had some idea of end-user activity because of the subscriptions sold to members and a few other interested parties, but in institutional markets, end-user data was always harder to come by, as librarians are pretty much united behind the idea that the collection of any data is a woeful violation of privacy. And maybe it is.
Enter digital media, and everything is different. At this point we all know about the revelations of Edward Snowden. We know that Google and Facebook are capturing our every keystroke. We know that in the world of the Internet, someone might feel lonely, but no one is ever alone. I am as creeped out about this as anybody who is not employed by the CIA or the NSA, but I do wonder whether some data collection may be benign, whether some of the surveillance economy may in fact be in my interest and in the interests of others, and whether it might indeed have a progressive component. Orwell gave us the fearsome Big Brother; Cory Doctorow gave us the chilling Little Brother; but now we may have the prospect of a Big Sister, a benign force that should not be tossed out as we attempt to flee from the depredations of government spying and commercial invasiveness.
By the way, in case you are not familiar with it, the 1984 of our time is The Circle by Dave Eggers (discussed here), which is at times as disturbing as Orwell’s original. Eggers’s target is not a Stalinesque totalitarian society but the groupthink of social media. This book should be taught in high schools.
A contrary view was eloquently expressed by Francine Prose in a recent blog post for The New York Review of Books. It’s clear that Prose is not anti-tech; she carries a Kindle and recognizes that technology can enhance lives. But she does feel uncomfortable that “someone” (is a machine someone?) is “watching” her as she reads Thackeray’s Vanity Fair; she resents having to share her private experience with Becky Sharp with whoever is reviewing her reading logs. Now, I know Becky Sharp, and if I had to spend any time with her, I would welcome some company, and all the better if these observers came packing. But I pretty much share Prose’s point of view. I don’t want anyone to know that I stopped reading a book halfway through, and I was both discomfited and amused when Prose asked if a reader would like “someone” to know that he or she had reread certain passages of Fifty Shades of Grey.
What really troubles Prose, though, is that all this new data–what we read, when we read, how far we read–is going to be used against writers. If the data shows that readers turned off at a certain point, would writers be urged to write differently, to find a way around that point? Here is where we part company. I would like very much to have feedback on how readers respond to things I write. Is this Big Brother or the kind of information that helps us to grow?
End-user information for publishers comes in two varieties: information about individuals (Joe stopped reading Thinking, Fast and Slow after he was 35% of the way through the book; Joe has purchased all of the books on this year’s Booker Prize shortlist) and aggregate data (50% of all readers stopped reading when they were 40% of the way through The Goldfinch; people who purchased The Second Machine Age also purchased Race Against the Machine and The Lights in the Tunnel–examples culled from Amazon). Publishers can use the aggregate data for planning, for forging marketing campaigns, and for wheedling authors to focus on some things instead of others. The application for trade authors, fiction writers in particular, is obvious, but it also has extensions into textbooks (no one can figure out the examples in that textbook) and journals publishing (readers are more likely to read an entire article if the abstract includes quantitative information–a made-up example). My anecdotal observation is that few people are terribly concerned about the collection of anonymized aggregate data. Publishers are thus likely to collect and mine this data intensively in the years ahead. Publishers that don’t do this, or that are too small to do this meaningfully, will be at a disadvantage. Big Sister will lead to more efficient product development, better curation of texts, more productive discovery services, and a lower cost basis for the enterprise.
Collection of data on individuals is another matter. Few of us like this; some of us oppose it very strongly. But even here there is a benign dimension. One mail-order company has figured out that I am tall and sends me catalogues with extra tall sizes. Do I feel violated by that or do I welcome the opportunity to shop from home instead of making my way to the nearest Rochester men’s store? A book publisher or online retailer notes my propensity for reading science fiction and sends me offers for more: Where is the Satan in this?
It is in educational and training materials, though, where the collection of user data may be most valuable. Adaptive learning materials adjust themselves to an individual user’s needs. Need a few more exercises in trigonometry? Well, here they are. Stumped by that chapter on organic chemistry? Let’s take it more slowly this time, with a series of examples to show you the way. Here again Big Sister is not studying my political leanings with the aim of throwing me in jail or snooping on some illicit reading pleasures with the intention of committing blackmail. No, here Big Sister is helping me learn more about a subject, helping me to get a better grade. And if we subscribe to the view, as I certainly do, that a better educated population makes for a better society, Big Sister is a positive contributor to civil society.
The most interesting illustration at DBW of the application of end-user data was offered by an executive of a college publishing company. His argument was that “name” authors would become less important in the future as each text comes to include a number of feedback loops. Thus an author would create a text; the text would be sold for classroom instruction; and then students would begin to use it. But over time the students’ experience with that text would be fed back to the publisher. Hmmm. It looks like we need more examples here, a more extensive description there. The text would continually be revised in the laboratory of the marketplace. Over time the contribution of the original author would become a smaller and smaller component of the text that the students use. Authors, in other words, would originate a text, but the feedback mechanisms would refine it.
So here is my prediction not for 2015 but for 2018: the publishers, regardless of segment, that will demonstrate the strongest proclivity for growth will be those that (a) gather end-user data, (b) feed the implications of that data back into their products and services, and (c) develop business practices that derive value from that data. I encourage everyone to think long and hard about why Elsevier purchased Mendeley, a company with almost no revenue and with little prayer of ever securing much revenue. We should expect to see more companies acquired on the basis of the end-user data they accumulate. How to value such entities will be part of the art of managing a publishing company and will provide handsome bonuses for the bankers that lurk among us.