Unless you happen to possess luck on a superhuman scale, bad data will lead to bad decisions. Alas, the situation is not symmetrical: good data may or may not lead to good decisions. Good data can be corrupted in context — by the misinterpreter, by the inattentive, by the intrusion of luck of the bleakest kind. The publishing business operates with data that no self-respecting industry would tolerate (can you imagine an executive at Exxon Mobil not knowing how many cars are on the road, how many miles they drive, and how much gasoline they consume?), and within publishing, book publishers have the worst of it, with no hard evidence about who actually purchases and uses their products, assuming they are purchased and used and not simply accessed on a pirate site somewhere or, in their print form, simply serving to dress up a furniture store.

In an attempt to improve the quality of data on the book industry, Ithaka S+R has just released a preliminary report on book acquisition patterns in academic libraries. Katherine Daniel of Ithaka has published a blog post summarizing the project here, and the report itself, prepared by Katherine, Roger Schonfeld, and me, can be found here. This project was made possible by the generous support of the Andrew W. Mellon Foundation. In this blog post I want to review some of the implications of the report, comment on its preliminary status, and explain how it came to be.

Zeppelin the Hindenburg on fire at the mooring mast of Lakehurst, NJ, 6 May 1937.

First, the background. About 10 years ago, a friend who consults in the public library sector told me that she believed that Amazon accounted for 10% of all public library book purchases. I was astonished. Amazon is a retailer, not a wholesaler: how could this be? My friend asked the client who had commissioned her study for permission to share the report with me, but they declined, stating that the report was proprietary. Since the obvious place to go for information about Amazon is Amazon, I contacted a high-ranking member of the trade book business to make an introduction for me with a counterpart at Amazon. I got to speak to the head of Amazon’s book operations, who told me that he couldn’t help me because Amazon does not know if libraries buy books from them. This remark was so blatantly dishonest that I resolved to try to find an answer somehow.

After a number of false starts, I put this project before Roger, whose team at Ithaka S+R came up with an intriguing (and, for me at least, entirely new) strategy. Since more and more libraries were moving to a new generation of ILS (integrated library system), it could be possible to get a data feed from the vendors of these systems. This was no small ambition. First the vendors had to be persuaded to participate (they were). Then the libraries had to grant permission to access their data (they did). Then Ithaka had to develop tools for ingesting the data and putting it into a useful form (mission accomplished). Finally, queries had to be put against that data — and that is part of the ongoing project. Indeed, one of the things we are now contemplating is what other kinds of questions can be put to this data set, questions that go far beyond the original one of how many books does Amazon sell to academic libraries. Note that this method of gathering data eliminates the guesswork. Roger’s group has been looking at the same data that libraries use to run their organizations.
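The last step, putting queries against the pooled records, can be pictured with a small sketch. Everything in it is an assumption for illustration: the field names, the sample records, and the function `vendor_share` are hypothetical and do not reflect Ithaka's actual schema or tooling. It only shows the kind of question (which vendors supply what share of orders?) that such a data set makes answerable.

```python
from collections import Counter

# Hypothetical order records, standing in for rows pulled from ILS
# acquisition data. Field names and values are invented for illustration.
orders = [
    {"library": "A", "vendor": "GOBI", "format": "print"},
    {"library": "A", "vendor": "Amazon", "format": "print"},
    {"library": "B", "vendor": "GOBI", "format": "print"},
    {"library": "B", "vendor": "Amazon", "format": "print"},
    {"library": "C", "vendor": "GOBI", "format": "ebook"},
    {"library": "C", "vendor": "Ingram", "format": "print"},
]


def vendor_share(records):
    """Rank vendors by number of orders and report each vendor's share."""
    counts = Counter(record["vendor"] for record in records)
    total = sum(counts.values())
    # most_common() sorts vendors from most orders to fewest.
    return [(vendor, n, n / total) for vendor, n in counts.most_common()]


ranking = vendor_share(orders)
for vendor, n, share in ranking:
    print(f"{vendor}: {n} orders ({share:.0%} of sample)")
```

In this toy sample Amazon comes out second only because the records were written that way; the point is that the ranking falls out of the order records themselves rather than from anyone's estimate.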

You can get all the details of the methodology from the report linked to above, but to summarize a couple of highlights: in our sample, Amazon is the second largest provider of books to academic libraries. So the notion that Amazon is not a wholesaler gets thrown out the window. Another surprising bit of information (highlighted to me by Rick Anderson) is how few books were acquired on approval plans. The fact is that we have had assumptions about the academic book market that probably are just not true.

Has bad data led to bad decisions? Yes. It is not uncommon to hear university press personnel moan that libraries have stopped buying scholarly monographs, partly because library budgets have been appropriated by the Big Deals from STM publishers. The “evidence” for this argument is two-fold. First, unit sales have dropped over the years, so that fall-off had to come from somewhere. Second, academic publishers can look at their sales figures to their principal wholesalers (YBP, renamed GOBI and now part of EBSCO; Baker & Taylor; Ingram; etc.), and those figures are showing an ongoing decline over many years.

So about those bad decisions based on partial or bad data: let’s cite three. If you believe that libraries have stopped buying books, you get tempted to put them all into a heavily discounted aggregation and sell on the basis of price. The problem here is that many libraries apparently took the discounted aggregations in place of the full-price books they were already purchasing. Action #2: demand-driven acquisition. Once again, if you think libraries are not buying your books, why not put your books into a DDA program? Two problems here: the library may have been buying your books, but you don’t know that; and DDA, even when it works for publishers, delays payments for months, even years. This means that DDA books must be priced at a huge premium. Has anyone noticed a sharp uptick in DDA book prices? I didn’t think so. And #3: open access monographs. Part of the reason (not the only reason) for the enthusiasm for OA books is that it is believed that libraries are not buying books, so OA is the only way to make them available. But libraries are buying books; it’s simply that those purchases are not showing up in the figures at EBSCO and B&T.

Had publishers known the figures for Amazon, they would have realized that a big part of the alleged fall-off in library sales was simply channel-switching, that is, books that were formerly sourced through YBP and others were now being sourced through Amazon. Since Amazon was classified as a retail account, its impact on library acquisitions was overlooked. This piece of bad data has cost academic book publishers millions of dollars.

As for the question of whether avaricious STM journal publishers are depriving academic book publishers of their livelihood, the answer goes beyond the scope of the Ithaka study and I will not comment on it here (but — pssst! — if you want to sell more books to libraries, publish better books and enhance their integration with a variety of library tools and systems).

We are calling this a preliminary report because more data is forthcoming this autumn, which will be loaded into the Ithaka systems and analyzed. Currently the analysis uses data from 54 institutions, all of which use the OCLC WorldShare ILS. By autumn another group of institutions will be providing data through the Ex Libris Alma ILS; we anticipate that the total number of participating libraries will be in the range of 150-200. This brings us to a very important question: Is the data we are working with now representative of the U.S. academic library community as a whole; and will it be representative when we add in more libraries this fall? I think the proper answer to these questions is no. To get a representative sample we need more libraries and they have to be distributed along a broader axis. The Ex Libris data will help — Ex Libris has significant market share among the ARL institutions — but we still won’t have a truly representative sample. So, at best, we can call the current data suggestive and directional, but it is not definitive.

I anticipate that the inclusion of the Ex Libris data will continue to show that Amazon is a major vendor to academic libraries, but its market share will drop (because Ex Libris’s customers are among the largest libraries and the larger the library, the less likely that Amazon will be a sizable vendor). Also, the Ex Libris libraries seem more likely to purchase books through approval plans. Another question we will be exploring is the ratio of print to ebooks, as the current figures for ebooks are surprisingly small. But we have to wait for the information to come in before making any judgments.

Beyond OCLC and Ex Libris, it will be a challenge to get a fully representative data set for academic libraries. To do that, among other things, Ithaka will have to get data from other ILS vendors, whose architectures may not lend themselves to the data extraction method that has been employed thus far. We will have to find a way around this technical obstacle.

With all these caveats in place, we will begin to analyze further the expanded data set this fall. We anticipate being able to make some comments about the subject areas for which libraries are most eagerly collecting titles and also about the university press sector. For example, what does the aggregate ILS data tell us about specific programs — at Michigan, at Princeton, at Duke, and elsewhere? And how can that data help these presses make sharper decisions, improve their business performance, and make their already valuable programs even more valuable to the research community?

In the meantime, let’s contemplate how much easier this all would be if Amazon were willing to answer one simple question.

Joseph Esposito


Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.


24 Thoughts on "Good Data, Bad Data, You Know I’ve Had My Share: Library Book Acquisition Patterns"

I thought the title was brilliant before I read the article OR saw the video of that dynamite little drummer! Thanks for giving my day a great beginning.

Our library heavily relies on Amazon for several reasons (particularly when compared with Gobi).
– Delivery is much faster.
– Books become available to order (metadata loaded in, stock available) much faster.
– Amazon offers a much larger selection of titles.
– Delivery and availability are much more consistent.
– Inventory data is consistent.

When Amazon is shown as a seller to libraries, does that include third-party sellers on Amazon?

Very interesting question–the data we collected from libraries lists these acquisitions as simply being through Amazon, but it’s entirely plausible that some books could have been purchased through third-party sellers.

I continue to be really intrigued by this study and by the data you’re pulling from it, Joe (and Katherine and Roger) — thanks so much.

A response to one point you made about ways that publishers can respond when DDA creates payment delays or depresses library sales:

“This means that DDA books must be priced at a huge premium. Has anyone noticed a sharp uptick in DDA book prices? I didn’t think so.”

There is actually another possible publisher response: to simply stop making their books available via DDA. We’ve been seeing that pattern in our library; fewer and fewer of the books being published are being made available for us to load into the DDA program. Since we have taken a very aggressive approach to replacing librarian-driven purchasing with demand-driven purchasing, this gives us pause: as the percentage of available books that can be purchased by demand-driven means shrinks, how is our collection (or to put it more accurately, how is the diversity of books offered to our patrons for their use) being affected? Our Head of Collection Management and I are currently gathering data that will hopefully help to answer that question and I hope to have a posting ready for the Kitchen before the end of the summer.

We’ve also seen DDA purchasing triggers get more sensitive over time, so the books get bought far more quickly.

My academic library purchases from Amazon for several reasons:
– Rush orders: we treat reserves requests as rushes and can get the titles quickly through Amazon, and we pay for Amazon Prime so delivery is very fast.
– Marketplace orders for replacement titles that may be out of print.
– A title may be in stock at Amazon when it’s not in stock at our book vendor.

But, we also buy books in a variety of other ways – we have several approval plans with Proquest/Coutts and Gobi. We have a DDA plan. We purchase front list titles of ebooks through a variety of publishers direct. We purchase titles through our consortium.

As the acquisitions librarian, I tell our patrons/faculty/librarians that we will try our hardest to get them the material they want – and we investigate all possible ways to do that.

Anecdotally, our acquisitions staff had to order from Amazon via personal accounts for a long time. It was only in the past few years that they were able to get some kind of satisfactory institutional account set up with Amazon, the full details of which I don’t know.

It’s possible Amazon really truly doesn’t know, if they never bothered configuring account creation in a way that would identify an account as belonging to a library instead of to an individual. Perhaps they overlooked the channel-switching possibility, too.

Your comment is superhumanly generous toward Amazon. I hope you are correct and that I owe Jeff Bezos an apology for accusing him of cynicism.

Ha! I’m suggesting a possibility of willful negligence instead of malice, that’s all. And trying to be diplomatic. I guess that worked too well.

It would still be weird, though – they sure collect all the *other* data.

I had the same thought. I purchased books for an academic library from 2010 to 2014. I often used Amazon for rush and replacement orders, usually around $1500 per month, but the account and credit card were in my own name. There was another person at my library ordering in the same way. I don’t think my library’s name was even in the mailing address. What I don’t remember is how we managed the vendor name in our ILS: was it listed as Amazon or just as a credit card order? It’s been too long since I last thought about that detail!

We have known for some time that libraries are buying books from Amazon, and it will be useful to have some firm data on this at last. Thanks for giving us this preview. But it’s sad to read yet another caricature of university press book publishers here. Of course libraries haven’t stopped buying books. But it’s a fact that the library market for monographs has shrunk. To say that “libraries are buying books; it’s simply that those purchases are not showing up in the figures at EBSCO and B&T” implies that we publishers are selling as many monographs to libraries as ever, and that the missing sales have all gone through Amazon (which we’re misrecognizing as sales to individuals). I don’t think you can mean that, Joe. Also, whose bad decisions are we talking about? Bad decision #3 on your list is enthusiasm for OA monographs. But UPs aren’t nearly so enthusiastic about OA as are libraries. Finally, about your parenthetical advice–“if you want to sell more books to libraries, publish better books and enhance their integration with a variety of library tools and systems”–can you share what you mean by this?

Alan, a senior member of your organization told me that (a) he did not know where the sales to Amazon ultimately end up and (b) he had no idea how many books show up in libraries. Now we are on the way to find out. Concerning OA books, the enthusiasm for this in the press community is loud, though not entirely thought out. I really don’t think there is a caricature here.

Joe, what struck me as a caricature is the remark that “it is not uncommon to hear university press personnel moan that libraries have stopped buying scholarly monographs” and the rest of that paragraph. But OK, maybe that was just for rhetorical effect. It will be excellent to have Ithaka’s study to clear up the uncertainty about the proportion of library sales going through Amazon. If it’s a very large proportion, it will show that library sales haven’t fallen quite as far as we thought. But the evidence that libraries have slowed (not stopped) their monograph acquisitions over the years can’t be dismissed for being incomplete.

I think you’re comparing apples with oranges. In my experience, libraries don’t see a print book via Amazon and an ebook via DDA as comparable things, or money coming from the same pot. For my library, Amazon is for the stuff we can’t get from our normal book supplier – the out of print stuff, stuff from very niche publishers, DVDs, that kind of thing. We’d only use them if the normal supplier can’t supply. Certainly not for typical academic books – we’re better off getting them from a normal library supplier, who will do shelf-ready (for print) and send us MARC records and all the sorts of things Amazon won’t. If we’ve had to resort to getting something from Amazon, we must want it pretty badly. Maybe it’s different in other HE institutions, but we don’t get big chunks of money and then think, “Hmmm, what shall we spend all this on? Amazon, or book supplier, or shall we just blow it all on DDA?”. That really isn’t how it works. If we aren’t buying your monographs, it’s not that we’re sneakily getting them through Amazon. We’re just not buying them.

I don’t think anyone is saying librarians are sneaky. The point is that book publishers operate with almost no market intelligence, which is denied them by the complexity of the distribution channels. This leads to bad decisions. As for what libraries actually buy from Amazon, a granular analysis of that kind would be an excellent follow-on study.

I’d like to see the results using Infographics that can summarize using data visualization techniques. I’ve been told that a lot of “Amazon” purchases are actually just coming from Ingram warehouses. That might explain the lower price average of Ingram. The GOBI integration capabilities might explain YBP’s market dominance.

It is true that some portion of Amazon shipments are fulfilled by Ingram and perhaps other wholesalers. Amazon is the vendor of record in these instances. The shipments arrive in an Amazon-branded box. This is true for shipments to residences, too. No one knows how widespread this is, as the information is closely held by the participants. This does not change the implications of our analysis. What we don’t know at this time is precisely what is inside those various Amazon shipments. Used books? Looseleaf binders? Snowshoes? Much more analysis is needed, and a bigger data set will help get to more robust conclusions.

On the ebook-side of Alma, are packages counted in your study as a single title or multiple titles? I agree that sharing this data is important (on an aggregate level).

This is an important point. The short answer is that we are hoping to capture per-title data, not vendor aggregation data, but the reported ratio of print to ebooks is so stark that we are planning to review this matter this fall for the final report. Stay tuned.

In Alma, ebook packages most likely show as a single title. When I ran the report for my institution for 2017, the results were in line with those presented here: it looks like we purchased about 6,500 books, the majority in print (4,500). What’s missing are the 9,000 ebooks that we purchased as publisher packages, or through EBA/DDA, and that lack individual order records.

I have discussed this with Katherine Daniel, and she has said they will be correcting for this in the final report. Considering the degree of skew, however, I do wish Ithaka S+R would either retract or amend the preliminary report.

Smart article and good insight, particularly this: “Since Amazon was classified as a retail account, its impact on library acquisitions was overlooked.” In all honesty, I think that statement works both ways. “Libraries buying from Amazon don’t care if Amazon is a retailer, as long as they can get the books in a timely manner and at a discount price equal to or less than library programs from publishers and distributors.”

On one level this isn’t really news: any academic or uni press publisher who didn’t realize that some or even a lot of their Amazon sales were going to libraries is prob a pretty poor publisher with little understanding of their market and customers. That said, there is still some useful info here. Thanks!

Just to clarify Joe’s point here (and I’m happy to be corrected in turn): “Since Amazon was classified as a retail account, its impact on library acquisitions was overlooked. This piece of bad data has cost academic book publishers millions of dollars.” Translation: an academic publisher provides an entity like B&T a book on a “short discount” (e.g., 25% off list price, so the publisher collects $30 of a $40 list price from B&T, even if B&T opts to discount the list price on its own). Amazon receives a “long discount,” which varies depending on what publishers have negotiated. In the scholarly book trade, per my own experience, that was 40% off list, but it could be more. Every library purchase through Amazon was bad for the book’s P&L. Long story short: publishers would rather sell via B&T than through Amazon. Note this is entirely separate from the question of the number of units sold on average per book. Those may or may not have declined over time, depending on the nature of the book. For example, some of the highest attrition rates in lifetime units have occurred not so much among academic monographs, which have often long sold in small numbers, but among high-page count reference titles, esp. multivolume encyclopedia sets. Of course, this last is the direct result of the quicker migration of reference content to online settings (both OA and commercial). And let’s not even get into the dynamics of the traditional textbook market (into which some academic publishers insert themselves with niche titles that manage a toehold). That, too, will be changing.
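The discount arithmetic in the comment above can be worked through in a few lines. This is only a sketch using the commenter's own example figures (a $40 list price, a 25% short discount, a 40% long discount); the function name `publisher_net_cents` is invented here, and real discount terms vary by publisher and contract. Prices are held in whole cents to keep the arithmetic exact.

```python
def publisher_net_cents(list_price_cents, discount_percent):
    """Amount the publisher collects per copy after the trade discount."""
    return list_price_cents * (100 - discount_percent) // 100


LIST_PRICE_CENTS = 4000  # $40 list price, from the example above
SHORT_DISCOUNT = 25      # percent off list given to a library wholesaler
LONG_DISCOUNT = 40       # percent off list given to Amazon (could be more)

short_net = publisher_net_cents(LIST_PRICE_CENTS, SHORT_DISCOUNT)
long_net = publisher_net_cents(LIST_PRICE_CENTS, LONG_DISCOUNT)
gap = short_net - long_net

print(f"Net per copy via wholesaler: ${short_net / 100:.2f}")
print(f"Net per copy via Amazon:     ${long_net / 100:.2f}")
print(f"Difference per copy:         ${gap / 100:.2f}")
```

On these example terms the publisher nets $30 through the wholesaler and $24 through Amazon, so each library copy that moves to Amazon costs the publisher $6, before any change in the number of units sold.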
