It is one of the more peculiar aspects of scholarly publishing that although everyone expects that academic books will find a place in libraries, no one knows how many books actually get there. This doesn’t mean that every scholarly book can be found in every library; far from it. Nor does it mean that the books found in libraries are in great demand (the common estimate is that 40% of all books in academic libraries never circulate, but I would like to see more evidence of this). The problem is simply that when a book is published, it is sent into the marketplace where a host of intermediaries move it along until it gets to the ultimate user. Those intermediaries may or may not let publishers know where the books end up. I am reminded of Longfellow:
I shot an arrow into the air,
It fell to earth, I knew not where.
The reason for this odd state of affairs stems from the structure of the supply chain. Publishers of books for the most part sell books on an indirect (called channel) basis; their customers are not their consumers. Downstream, a reader wants to have access to books of all publishers and thus looks for books at points of aggregation: libraries, bookstores, online catalogues. Publishers, sitting upstream, stare at these aggregators and the wholesalers that service them and can only dream of knowing who actually reads the books they publish.
I have been wrestling with how to figure out the answer to this question (“What percentage of books end up in libraries?”) for some time now, and can now confidently report that I am no further along than I was three years ago when I wrote a report on university presses and tried to unravel their supply chain. You can get a piece of the picture, but you can’t get the whole thing. What is the black hole that emits no light, making it impossible to get the information you need? Amazon.
Before I get into Amazon’s role (grrr) in all this, let me back up and draw the context. Let’s restrict this discussion to academic books, because this is, after all, a blog for the Society for Scholarly Publishing, and let’s further restrict it to university presses because I have more data about them at hand.
A university press sells books in a number of ways. Print books find their way into some bricks-and-mortar bookstores and to wholesalers that service bookstores and libraries. A small number of university press books are sold directly off the publisher’s Web site and some are still sold to this day from catalogues sent through the mail directly to scholars working in the field. Although publishers send materials to libraries, libraries rarely order books directly from publishers; most of their orders go to Baker & Taylor and Ingram. (B&T owns YBP and Blackwell, Ingram owns Coutts.) There is also a small percentage of university press books that are sold overseas. That figure varies by publisher and subject area, but let’s say the average is 10%.
I have left out two important categories: books sold for course adoptions, typically for upper-level undergraduate and graduate courses, and print and electronic books sold through online vendors. The biggest by far of these online vendors is, of course, Amazon. Most presses report that their sales through Amazon are 25% or more of their total volume; for some presses, the Amazon share is up near 50%. At this time the split between print and digital sales through Amazon is, for academic titles, heavily skewed to print. Print accounts for at least 90% of total revenue (across all channels, not just Amazon), and for most presses the ebook share is well below 10%. There are other complicating factors in this analysis (how does one account for books sold as part of aggregations, e.g., NetLibrary?), but this summary works as a general overview.
Now, about those sales for course adoptions. Presses have a hard time tracking these. The typical way to do it is to flag orders of 5 or more units and proclaim that these constitute course adoptions. I have talked with any number of marketing directors, and the consensus is that adoption sales are perhaps 20-25% of the total. Yes, these are only estimates, and these estimates vary by press, subject area, and whether the titles are monographs or trade titles (or what passes as a trade title for a university press). But in a post about the difficulty of getting at solid sales figures, I hope I will be granted some leeway in estimating figures.
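The flagging heuristic described above can be sketched in a few lines of Python. This is only an illustration: the order records, the `Order` fields, and the function name `estimate_adoption_share` are hypothetical, and only the threshold of 5 units comes from the text.

```python
from dataclasses import dataclass

ADOPTION_THRESHOLD = 5  # from the text: orders of 5 or more units are flagged


@dataclass
class Order:
    title: str
    units: int
    revenue: float  # dollars


def estimate_adoption_share(orders):
    """Return the fraction of revenue coming from orders flagged as adoptions."""
    total = sum(o.revenue for o in orders)
    if total == 0:
        return 0.0
    flagged = sum(o.revenue for o in orders if o.units >= ADOPTION_THRESHOLD)
    return flagged / total


# Made-up sample orders, for illustration only.
sample_orders = [
    Order("Monograph A", 1, 45.0),
    Order("Monograph A", 2, 90.0),
    Order("Trade title B", 6, 100.0),  # the only order at or above the threshold
    Order("Trade title B", 3, 75.0),
    Order("Monograph C", 1, 90.0),
]
adoption_share = estimate_adoption_share(sample_orders)
print(f"Estimated adoption share: {adoption_share:.0%}")
```

A threshold like this will presumably also catch some bulk orders that have nothing to do with courses and miss small seminars, which is one reason the resulting figure is only an estimate.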
So the average, oversimplified university press distribution, expressed in dollars, looks something like this:
Course adoptions 25%
Other resellers 15%
How did we get that library number? That figure comes from disclosures by the principal wholesalers–that is, B&T and Ingram–as to where they ship books. If these wholesalers did not share this information, university presses would have no idea what their library sales are.
When you combine Amazon’s numbers with those for other resellers (that is, bookstores, including online venues), you get 40%, which presumably are sales to individuals, not including students. But since libraries often buy books from Amazon, the library figure could be higher. Amazon, however, will not disclose where its books end up, so the publishers are in the dark.
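The shares quoted in this post can be checked with simple arithmetic. The sketch below assumes that the five categories discussed (libraries, course adoptions, Amazon, export, and other resellers) cover all sales and sum to 100%. The Amazon figure is inferred from the 40% combined total, and the library figure is only the remainder implied by that assumption, not a number reported by the wholesalers.

```python
# Shares stated in the post, in percent of dollar sales.
course_adoptions = 25
other_resellers = 15
export = 10                 # "let's say the average is 10%"
amazon_plus_resellers = 40  # combined figure quoted above

# Inferred, not stated: Amazon's share is the combined figure minus other resellers.
amazon = amazon_plus_resellers - other_resellers

# Assumption: the five categories account for 100% of sales.
libraries_implied = 100 - (course_adoptions + amazon + other_resellers + export)

print(f"Amazon (inferred): {amazon}%")
print(f"Libraries (implied remainder): {libraries_implied}%")
```

If some of Amazon's sales are in fact going to libraries, the true library share would be higher than this remainder and the share going to individuals correspondingly lower.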
When presses report that their sales to Amazon are creeping up to 40% or so, almost certainly that means that Amazon is providing books for course adoptions and libraries, perhaps even for export sales, as well as to individuals. If Amazon were willing to share aggregate numbers for where it sells books, publishers would have a better idea of their overall marketplace, which could help them in acquiring and marketing their titles. An exasperating aspect of this situation is that as Amazon’s strength grows and as more and more press sales go through Amazon, publishers know less and less about who actually reads their books. In an age of abundant information, the ability of a handful of tech companies to cordon off huge chunks of that data is simply mind-boggling. Perhaps open access advocates could get Amazon (and Google and Apple and Facebook) to open up about the information they now hoard.
Having just read Lisa Randall’s “Knocking on Heaven’s Door,” with its fascinating descriptions of how physicists deduce solutions where direct evidence is not available, I began to wonder if there might be an indirect way to determine Amazon’s sales to libraries. This is where Hawking Radiation comes in. Hawking Radiation refers to a means by which information escapes from a black hole. More importantly, it’s a great metaphor, up there with the uncertainty principle, dark matter, and black holes themselves. When you don’t understand the physics, you can always lapse into poetry.
One way to figure out how many books Amazon sells to libraries is to ask librarians. Unfortunately, this is a big task. You would have to interview a statistically meaningful sample of librarians and get access to collection records. You would then map those records against publishers’ lists and get a picture of which books ended up in what libraries. So you take all the books from Distinguished University Press and find out which of those titles appear in the test sample. Since the publisher of DUP knows the total volume of the press, the extrapolated figures from the sample provide the percentage of library sales.
I don’t see anyone coming up with the funding for this project, which, by the way, would take an awful lot of cooperation from hundreds of libraries and every single university press. Is there a back door? Perhaps. Over the past couple of years I have been contemplating WorldCat, which I have come to believe is the great sleeper database of the information age. Here is a place where you can find out which books are in library collections around the world. If physicists can data-mine the information collected from radio telescopes, and if Facebook can data-mine your “likes” and your “friends,” surely we can data-mine WorldCat and find out any number of things about the book business.
By using WorldCat, we might be able to skip one step outlined above, the one where we gather collection records from hundreds of libraries. With WorldCat we could literally look at the collections of all the world’s libraries. WorldCat can’t tell us anything about multiple copies of a book in a collection — a problem for trade publishers, but for university presses, the prospect of selling multiple copies of a book to any one institution is too much to wish for — but it can tell us which collection contains particular titles. We would then take that information and map it against publishers’ output. The result: sales figures for university presses to libraries. We have just disintermediated Amazon as an obstreperous information source.
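The mapping step could look something like the following sketch. It does not call any real WorldCat interface; the holdings data, the unit sales, the title and library identifiers, and the function name `library_share` are all made up for illustration. It assumes, as the paragraph above does, that each holding library bought exactly one copy.

```python
def library_share(holdings_by_title, units_sold_by_title):
    """Estimate the fraction of units sold that ended up in libraries.

    holdings_by_title: title id -> set of library ids holding that title
    units_sold_by_title: title id -> total units the publisher sold
    Assumes one copy per holding library.
    """
    library_units = 0
    total_units = 0
    for title, sold in units_sold_by_title.items():
        held = len(holdings_by_title.get(title, set()))
        library_units += min(held, sold)  # cap at units actually sold
        total_units += sold
    return library_units / total_units if total_units else 0.0


# Hypothetical holdings records and publisher sales figures.
holdings = {
    "title-1": {"lib-a", "lib-b", "lib-c"},
    "title-2": {"lib-a"},
}
units_sold = {"title-1": 10, "title-2": 6, "title-3": 4}

share = library_share(holdings, units_sold)
print(f"Estimated library share of units: {share:.0%}")
```

Note that this yields a share of units rather than dollars, so comparing it with a dollar-based distribution would also require price and discount data.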
I began to think about this recently because of the PDA project I have been working on. If one of the questions about PDA is how it will impact university press sales, we need a baseline for those sales. At this time, we don’t have one. My own estimate is that university presses have total library sales of around $80 million a year. (A library vendor told me it could be a bit higher than that, but that my figure was in the ballpark.) Will PDA lower that figure by 10%? 50%? And what would that mean in dollars? If only Amazon could see beyond its own business interests and participate in the civic enterprise!
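To put rough dollar figures on those two scenarios, here is a minimal sketch. The $80 million baseline is the estimate given above; the function name `dollars_lost` is made up for illustration, and the calculation says nothing about which scenario is more likely.

```python
BASELINE_LIBRARY_SALES = 80_000_000  # the post's estimate, in dollars per year


def dollars_lost(baseline, decline_pct):
    """Dollars lost if sales fall by decline_pct percent (integer arithmetic)."""
    return baseline * decline_pct // 100


for pct in (10, 50):
    lost = dollars_lost(BASELINE_LIBRARY_SALES, pct)
    remaining = BASELINE_LIBRARY_SALES - lost
    print(f"{pct}% decline: ${lost:,} lost, ${remaining:,} remaining")
```

On this baseline, a 10% decline means $8 million a year and a 50% decline means $40 million, which is why the accuracy of the baseline matters so much.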
A footnote to this discussion is that we probably could do a lot more with WorldCat data once we begin to think about it. How is HarperCollins doing with its library sales for children’s books on the West Coast? With so many publishers questioning the economic and promotional value of having libraries purchase ebooks, wouldn’t we like to know which books are in a particular geographical area and map those figures against local bookstore sales (while we still have local bookstores)? There is a business here: WorldCat as database publisher, mining its own information and selling reports to publishers about market trends. WorldCat, in other words, could become the A.C. Nielsen of institutional markets.
And, boy, we could use an A.C. Nielsen. The irony of the small corner of the information industry that is known as the book business is the paucity of information about books. The book business is being upended by digital technology, but getting information to navigate these rough seas is almost impossible. Surely we should have as much information about books as we have information in them.
This has practical consequences. With university presses under so much financial pressure nowadays, any number of solutions have been put forward as to how to “save them.” (My own contribution to this debate is here.) Most of these proposals involve getting university presses to work more closely with university libraries. The question is, If libraries are a small and declining part of the university press ecosystem, should the presses be looking elsewhere — specifically, to sales to individuals, which make up the single largest segment for sales? But we can’t answer that question because we don’t have the data.
Mr. Bezos, tear down this wall!
8 Thoughts on "Hawking Radiation: Figuring Out How Many Books Are Sold to Libraries"
Seems that university presses should set up their own direct sales efforts online and offer a library discount. They get more of the pie, and can collect data. While the digital age is good at eliminating some middle men, here it is building things up.
I personally buy many things from Amazon, but books are one of them.
I would think the number of libraries buying through Amazon is relatively small. Both YBP and Coutts-Ingram operate by offering libraries discounts. Wholesalers also offer approval plans (a way of supplying libraries with relevant books in areas they collect in) and the cataloguing and processing of the books they purchase from them. Amazon doesn’t offer approval plans, cataloguing, or, most importantly, discounts. Any library that buys substantial numbers of books from Amazon clearly has money to burn, a situation few libraries are in today.
Also, regarding the oft-repeated claim that most or a substantial number of books in library collections don’t circulate: your non-librarian readers would benefit from knowing that there are two types of academic libraries, undergraduate libraries and research libraries. Since undergraduate libraries support the teaching of undergraduates, circulation statistics are a valid measure of that collection’s effectiveness. A research collection, on the other hand, seeks to collect broadly and in depth in a particular subject area, not only for present, but also future use. Great research collections still attract scholars to work in particular universities through the depth of their collections in particular subject areas. While PDA may be appropriate for undergraduate libraries, it would clearly be inappropriate for research collections. Among other things, a research library that buys a 2012 imprint today pays the 2012 price; a library using a PDA plan that buys that same 2012 imprint in 2015 (when it is actually used) will be paying the 2015 price (that is, the extra cost of inflation over three years).
Last thing, when it comes to ebook sales to academic libraries, I think some background on the way that publishers are charging libraries would clear up some misconceptions. The overwhelming majority of academic publishers sell ebooks to academic libraries based on a “concurrent use” model, that is, on the number of users simultaneously reading the same ebook. In this way, the larger the university, the larger the amount the publishers can charge the library. If you talk to librarians with large ebook collections, though, you’ll discover that ebooks that ever attract more than one concurrent user are few and far between. This pricing model, then, has allowed the big academic publishers to reap profits via ebooks that are substantially higher than those they earned in the print world (where large research libraries would rarely purchase more than one copy of a print book).
My apologies for the length of this. I think this is an excellent blog, but the discussion around academic libraries is sometimes lacking in context.
Some years ago (maybe five?) the AAUP Marketing Committee conducted an online survey of academic librarians that asked, among other questions, how often they order books from Amazon. I forget the actual number, but it was larger than presses had expected and has undoubtedly grown even more since then. As I recall, purchasing from Amazon was often used to plug gaps left by the approval-plan system. But PDA systems offered by vendors may not have as many gaps, i.e., titles outside what is offered within these systems, so sales for Amazon may go down. Just a guess, of course.
At the press I once headed, we often did WorldCat searches for information about specific titles sold to libraries, but this source was not used in any systematic way by the AAUP or groups of presses as far as I’m aware. Joe’s suggestion to use it in this way sounds promising.
As for presses looking to cooperate more with libraries, the reasons for doing so are not related to sales but rather to staffing and other synergies. E.g., few smaller presses can afford to have any dedicated IT staff, but allying with a library can gain a press access to a great deal of assistance in this area, which is crucial for smaller presses trying to make the transition from print to electronic. On their side, libraries do not have much expertise in marketing and sales, and to the extent they have ambitions to generate revenue streams, allying with presses can gain them valuable assistance in doing so.
There’s a good reason that libraries buy from Amazon. It’s often faster and cheaper than ILL. One librarian recently told me that she can get delivery from Amazon in 24 hours.
For books not available in e (thus excluded from most PDA programs), or where e is price prohibitive, Amazon can guarantee that researchers have the books in hand far more quickly and less expensively than most publishers can.