It is one of the more peculiar aspects of scholarly publishing that although everyone expects that academic books will find a place in libraries, no one knows how many books actually get there. This doesn’t mean that every scholarly book can be found in every library; far from it. Nor does it mean that the books found in libraries are in great demand (the common estimate is that 40% of all books in academic libraries never circulate, but I would like to see more evidence of this). The problem is simply that when a book is published, it is sent into the marketplace where a host of intermediaries move it along until it gets to the ultimate user. Those intermediaries may or may not let publishers know where the books end up. I am reminded of Longfellow:
I shot an arrow into the air,
It fell to earth, I knew not where.
The reason for this odd state of affairs stems from the structure of the supply chain. Publishers of books for the most part sell books on an indirect (called channel) basis; their customers are not their consumers. Downstream, a reader wants to have access to books of all publishers and thus looks for books at points of aggregation: libraries, bookstores, online catalogues. Publishers, sitting upstream, stare at these aggregators and the wholesalers that service them and can only dream of knowing who actually reads the books they publish.
I have been wrestling with how to figure out the answer to this question (“What percentage of books end up in libraries?”) for some time now, and can now confidently report that I am no further along than I was three years ago when I wrote a report on university presses and tried to unravel their supply chain. You can get a piece of the picture, but you can’t get the whole thing. What is the black hole that emits no light, making it impossible to get the information you need? Amazon.
Before I get into Amazon’s role (grrr) in all this, let me back up and draw the context. Let’s restrict this discussion to academic books, because this is, after all, a blog for the Society for Scholarly Publishing; and let’s further restrict it to university presses because I have more data about them at hand.
A university press sells books in a number of ways. Print books find their way into some bricks-and-mortar bookstores and to wholesalers that service bookstores and libraries. A small number of university press books are sold directly off the publisher’s Web site, and some are still sold to this day from catalogues sent through the mail directly to scholars working in the field. Although publishers send materials to libraries, libraries rarely order books directly from publishers; most of their orders go to Baker & Taylor and Ingram. (B&T owns YBP and Blackwell; Ingram owns Coutts.) There is also a small percentage of university press books that are sold overseas. That figure varies by publisher and subject area, but let’s say the average is 10%.
I have left out two important categories: books sold for course adoptions, typically for upper-level undergraduate and graduate courses, and print and electronic books sold through online vendors. The biggest by far of these online vendors is, of course, Amazon. Most presses report that their sales through Amazon are 25% or more of their total volume; for some presses, the Amazon share is up near 50%. At this time the split between print and digital sales for academic titles is heavily skewed to print: print accounts for at least 90% of total revenue, not just at Amazon, and for most presses the ebook share is much smaller than 10%. There are other complicating factors in this analysis (how does one account for books sold as part of aggregations, e.g., NetLibrary?), but this summary works as a general overview.
Now, about those sales for course adoptions. Presses have a hard time tracking these. The typical way to do it is to flag orders of 5 or more units and proclaim that these constitute course adoptions. I have talked with any number of marketing directors, and the consensus is that adoption sales are perhaps 20-25% of the total. Yes, these are only estimates, and these estimates vary by press, subject area, and whether the titles are monographs or trade titles, or what passes as a trade title for a university press. But in a post about the difficulty of getting at solid sales figures, I hope I will be granted some leeway in estimating figures.
So the average, oversimplified university press distribution, expressed in dollars, looks something like this:
Libraries 25%
Course adoptions 25%
Export 10%
Amazon 25%
Other resellers 15%
How did we get that library number? That figure comes from disclosures by the principal wholesalers (that is, B&T and Ingram) as to where they ship books. If these wholesalers did not share this information, university presses would have no idea what their library sales are.
When you combine Amazon’s numbers with those for other resellers (that is, bookstores, including online venues), you get 40%, which presumably are sales to individuals, not including students. But since libraries often buy books from Amazon, the library figure could be higher. Amazon, however, will not disclose where its books end up, so the publishers are in the dark.
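To make the uncertainty concrete, here is a minimal sketch of the arithmetic. It uses the rough percentages from this post, treats the library figure as whatever remains once the other channels are counted, and bounds the true library share by assuming that either none or all of Amazon's sales end up in libraries. The upper bound is deliberately extreme; the function name and data layout are illustrative only.

```python
# Shares of university press revenue, in percent, as estimated in this post.
shares = {
    "course_adoptions": 25,
    "export": 10,
    "amazon": 25,
    "other_resellers": 15,
}
# Assumption: libraries account for the remainder of the distribution.
shares["libraries"] = 100 - sum(shares.values())


def library_share_range(shares):
    """Return (low, high) for the library share, in percent.

    low:  none of Amazon's sales go to libraries
    high: all of Amazon's sales go to libraries (an extreme, for bounding only)
    """
    low = shares["libraries"]
    high = shares["libraries"] + shares["amazon"]
    return low, high


low, high = library_share_range(shares)
print(f"Library share lies somewhere between {low}% and {high}%")
```

With these figures the range runs from 25% to 50%, which is another way of saying that without Amazon's data the library number could be off by a factor of two.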
When presses report that their sales to Amazon are creeping up to 40% or so, almost certainly that means that Amazon is providing books for course adoptions and libraries, perhaps even for export sales, as well as to individuals. If Amazon were willing to share aggregate numbers for where it sells books, publishers would have a better idea of their overall marketplace, which could help them in acquiring and marketing their titles. An exasperating aspect of this situation is that as Amazon’s strength grows and as more and more press sales go through Amazon, publishers know less and less about who actually reads their books. In an age of abundant information, the ability of a handful of tech companies to cordon off huge chunks of that data is simply mind-boggling. Perhaps open access advocates could get Amazon (and Google and Apple and Facebook) to open up about the information they now hoard.
Having just read Lisa Randall’s “Knocking on Heaven’s Door,” with its fascinating descriptions of how physicists deduce solutions where direct evidence is not available, I began to wonder if there might be an indirect way to determine Amazon’s sales to libraries. This is where Hawking Radiation comes in. Hawking Radiation refers to a means by which information escapes from a black hole. More importantly, it’s a great metaphor, up there with the uncertainty principle, dark matter, and black holes themselves. When you don’t understand the physics, you can always lapse into poetry.
One way to figure out how many books Amazon sells to libraries is to ask librarians. Unfortunately, this is a big task. You would have to interview a statistically meaningful sample of librarians and get access to collection records. You would then map those records against publishers’ lists and get a picture of which books ended up in what libraries. So you take all the books from Distinguished University Press and find out which of those titles appear in the test sample. Since the publisher of DUP knows the total volume of the press, the extrapolated figures from the sample provide the percentage of library sales.
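The extrapolation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample is representative, each held title counts as one copy sold, and the ISBNs, library counts, and unit totals are invented for the example rather than taken from any press.

```python
def estimate_library_share(sample_holdings, total_libraries, press_total_units):
    """Extrapolate a press's library sales share from a sample of libraries.

    sample_holdings:   one set of held ISBNs (this press's titles only)
                       per sampled library
    total_libraries:   size of the library population the sample represents
    press_total_units: total units the press sold across all channels
    Assumes one copy per held title and a representative sample.
    """
    if not sample_holdings:
        raise ValueError("need at least one sampled library")
    copies_in_sample = sum(len(held) for held in sample_holdings)
    avg_per_library = copies_in_sample / len(sample_holdings)
    estimated_library_units = avg_per_library * total_libraries
    return estimated_library_units / press_total_units


# Hypothetical data: four sampled libraries and the DUP titles each one holds.
sample = [
    {"isbn-a", "isbn-b"},
    {"isbn-a"},
    set(),
    {"isbn-a", "isbn-b", "isbn-c"},
]
share = estimate_library_share(sample, total_libraries=1000, press_total_units=6000)
print(f"Estimated library share: {share:.0%}")
```

The hard part is not the arithmetic but the inputs: a sample that really is representative, and collection records clean enough to match against a publisher's list.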
I don’t see anyone coming up with the funding for this project — which, by the way, would take an awful lot of cooperation by hundreds of libraries and every single university press. Is there a back door? Perhaps. Over the past couple years I have been contemplating WorldCat, which I have come to believe is the great sleeper database of the information age. Here is a place where you can find out which books are in library collections around the world. If physicists can data-mine the information collected from radio telescopes, and if Facebook can data-mine your “likes” and your “friends,” surely we can data-mine WorldCat and find out any number of things about the book business.
By using WorldCat, we might be able to skip one step outlined above, the one where we gather collection records from hundreds of libraries. With WorldCat we could literally look at the collections of all the world’s libraries. WorldCat can’t tell us anything about multiple copies of a book in a collection — a problem for trade publishers, but for university presses, the prospect of selling multiple copies of a book to any one institution is too much to wish for — but it can tell us which collection contains particular titles. We would then take that information and map it against publishers’ output. The result: sales figures for university presses to libraries. We have just disintermediated Amazon as an obstreperous information source.
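As a rough sketch of that mapping step, suppose the holdings data has already been obtained as simple (library, ISBN) pairs. That input format is an assumption for illustration, not a description of how WorldCat actually delivers data, and the records below are made up. Each distinct holding library counts as one copy, in line with the single-copy assumption above. Holdings also accumulate over many years, so the sales figures they are compared with would need to cover the same period.

```python
from collections import defaultdict


def library_units_by_title(holdings):
    """Count distinct holding libraries per ISBN.

    holdings: iterable of (library_id, isbn) pairs. Duplicate pairs are
    collapsed, since holdings data says nothing about multiple copies.
    """
    libraries = defaultdict(set)
    for library_id, isbn in holdings:
        libraries[isbn].add(library_id)
    return {isbn: len(ids) for isbn, ids in libraries.items()}


def library_share(holdings, units_sold):
    """Estimated share of a press's unit sales that went to libraries.

    units_sold: dict of ISBN -> total units the press sold of that title.
    Titles outside the press's list are ignored.
    """
    held = library_units_by_title(holdings)
    library_units = sum(held.get(isbn, 0) for isbn in units_sold)
    return library_units / sum(units_sold.values())


# Hypothetical records, not real catalog data.
holdings = [
    ("lib1", "isbn-a"), ("lib2", "isbn-a"), ("lib2", "isbn-a"),
    ("lib1", "isbn-b"), ("lib3", "isbn-z"),  # isbn-z belongs to another press
]
units_sold = {"isbn-a": 8, "isbn-b": 4}
print(library_share(holdings, units_sold))
```

Because the result is a ratio of units, converting it to dollars would take a further assumption about the prices libraries pay relative to other buyers.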
I began to think about this recently because of the PDA project I have been working on. If one of the questions about PDA is how it will impact university press sales, we need a baseline for those sales. At this time, we don’t have one. My own estimate is that university presses have total library sales of around $80 million a year. (A library vendor told me it could be a bit higher than that, but that my figure was in the ballpark.) Will PDA lower that figure by 10%? 50%? And what would that mean in dollars? If only Amazon could see beyond its own business interests and participate in the civic enterprise!
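For a sense of scale, the dollar question is simple arithmetic once a baseline is accepted. The sketch below takes the $80 million estimate at face value and applies the two hypothetical declines mentioned above; the baseline is the uncertain part, not the math.

```python
BASELINE_LIBRARY_SALES = 80_000_000  # the post's rough estimate, in dollars per year


def pda_impact(baseline, decline_pct):
    """Return (dollars lost, dollars remaining) for a whole-number percentage decline."""
    lost = baseline * decline_pct // 100
    return lost, baseline - lost


for pct in (10, 50):
    lost, remaining = pda_impact(BASELINE_LIBRARY_SALES, pct)
    print(f"{pct}% decline: ${lost:,} lost, ${remaining:,} remaining")
```

A 10% decline would mean $8 million a year across all university presses; a 50% decline would mean $40 million.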
A footnote to this discussion is that we probably could do a lot more with WorldCat data once we begin to think about it. How is HarperCollins doing with its library sales for children’s books on the West Coast? With so many publishers questioning the economic and promotional value of having libraries purchase ebooks, wouldn’t we like to know which books are in a particular geographical area and map those figures against local bookstore sales (while we still have local bookstores)? There is a business here: WorldCat as database publisher, mining its own information and selling reports to publishers about market trends. WorldCat, in other words, could become the A.C. Nielsen of institutional markets.
And, boy, we could use an A.C. Nielsen. The irony of the small corner of the information industry that is known as the book business is the paucity of information about books. The book business is being upended by digital technology, but getting information to navigate these rough seas is almost impossible. Surely we should have as much information about books as we have information in them.
This has practical consequences. With university presses under so much financial pressure nowadays, any number of solutions have been put forward as to how to “save them.” (My own contribution to this debate is here.) Most of these proposals involve getting university presses to work more closely with university libraries. The question is, If libraries are a small and declining part of the university press ecosystem, should the presses be looking elsewhere — specifically, to sales to individuals, which make up the single largest segment for sales? But we can’t answer that question because we don’t have the data.
Mr. Bezos, tear down this wall!