Extracting Book Data from Library Information Systems

Previously I mentioned on the Kitchen that I have been working with Roger Schonfeld and Katherine Daniel of Ithaka S+R on a project to determine how academic libraries acquire books and what market share the various vendors control. (If you follow the link to that Kitchen post, be sure to watch the video at the end, as it is of the essence.) I am not going to rehearse the nature of that project again beyond stating that we set out to answer two questions: What proportion of academic library sales of books does Amazon control, and have university press sales to libraries been declining?

To these questions we added several more, including investigations into pricing and the subject categories that the acquisitions fell into. I am pleased to report that the final report for that project is now available on the Ithaka S+R Web site. I recommend it to everyone interested in the academic book market. Naturally, having completed this study, we now see reasons to do a dozen more. That is always the case: data breeds a need for more data, and more data engenders a need for even more data, and so on ad infinitum. So rather than saying we have completed our project, it's probably more accurate to say we are taking a breather. The case for hiring data analysts to help operate a business gets stronger day by day.

Mister Magoo record cover — Image via Kevin Dooley.

The key thing about the final phase of this project is that we were able to get data from more libraries — a total of 154. We got this data through arrangements with two of the leading vendors of Integrated Library Systems (ILS), OCLC (WorldShare) and ProQuest (Alma). This meant that we were able to look at the very data that libraries use in managing their operations. It would have been great to have even more libraries — how about 500? 1,000? — so that we could claim that our sample was representative of the U.S. academic library market as a whole, but there were technical limitations in getting a larger sample. How to overcome those technical limitations is a matter for future projects. In the meantime we are being careful not to say that our sample is representative of U.S. academic libraries as a whole; the results are suggestive and directional, but not definitive.

We supplemented our data through conversations with vendor representatives, which allows us to make the following statements with some confidence:

Amazon controls about 10% of the print book market (and no discernible share of ebooks).
Amazon’s growth is coming at the expense of other vendors. Amazon thus must be acknowledged as a library “wholesaler.”
Our data provided little insight into ebook aggregations, which are cited in an ILS as a single entry and not title by title. While the vendor information helped to fill this out, we still cannot answer some questions, such as the extent to which aggregations are replacing title-level book acquisitions.
The unavailability of ebook aggregation data means that we cannot say with certainty that sales of books to academic libraries are declining, although that view is widely held. If there is a decline, it is partly offset by Amazon's growth; some portion of Amazon's increasing revenues must be counted by publishers as library sales. Ebook aggregations would add considerable volume to the title-by-title figures we uncovered.

Library expenditure on books is inevitably larger than publisher revenue. One area that we have not explored, but that may represent a significant part of the market, is the volume of interlibrary loan (ILL). How should we put a dollar value on these loans, given that libraries pay to participate in ILL? Thousands of books are borrowed this way, but ILL produces no revenue for publishers. Similarly, the purchase of used books is a cost to libraries but yields no revenue to publishers. Finally, only a portion of the prices libraries pay for aggregated book packages is passed along to publishers as revenue: the publishers' income is net of the costs of such middlemen as EBSCO, ProQuest, Amazon, Baker & Taylor, Ingram, etc. Thus the $700 million market figure cited here (plus whatever you add for ILL) represents library expenditures, not publishers' income. I would be interested to hear from members of the community about what dollar value to put on that net figure. Perhaps $400-$500 million?

It was pure coincidence, but while working on this project some of my colleagues and I became involved in an entirely separate project concerning ILSs. As we sat around a large conference table with the heads of libraries of all sizes, the complaints piled up, beginning with the librarians' frustration with electronic resource management (ERM), on which there was unanimity. Some of the complaints were about publishers, not the ILS. For example, librarians often find that the resources they license, which are listed in contracts, do not match what they actually get access to. It's no surprise that publishers sometimes fail to turn on access to some publications, but it was eye-opening that libraries often find they have access to materials they did not pay for (whose responsibility is it to correct that?). Many librarians, it seems, are forced to spend their time on low-value issues, essentially cleaning up the messes left for them by publishers, other vendors, and ILS providers alike.

The real issue these librarians have, however, is that they don't have the data to do their jobs. The ILS does not enable the kind of management reports that they rightly feel they should be able to get, and the frustration with ERM systems appears to be universal. If the data were better, if it were comprehensive, and if it were viewable through customizable dashboards, what would it tell librarians about their collections, their patrons' usage, and the value they get from particular publishers?

So here we have a bizarre market situation in which book publishers and librarians alike are starved for data, yet the intermediaries are not doing enough to remedy this. I say "book publishers" because journals are different: usage largely takes place on a publisher's server, enabling publishers to undertake data analytics directly. But book publishers get only hints and (sometimes) polite smiles from their distributors, and librarians struggle with the data they have, which is incredibly messy: many items are miscategorized, some are superfluous, and the data is not in a form that permits integration with other data across platforms.

So what good is hiring a data analyst, whether you are a publisher or a librarian, if you don't have the data in the first place? In the absence of more comprehensive data and standards for developing and using it, the industry-wide emphasis on data analysis operates in a vacuum. Of course, larger firms have more data and can do more with it, but smaller enterprises, including all but a handful of university presses, have to fly blind, as do the libraries they seek to serve, which struggle with their recalcitrant ILSs.

What we need is either a way to free the data from the control of the intermediaries or for the intermediaries to improve the data they provide in the first place. Note that this is not a call for open access; the data that would be made available is data about a publisher's own books and a library's own collections. Intermediaries play an important role in scholarly communications and add value to the process, but the academic book market can only shrink if its participants cannot get a window into their own operations.

Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.

Discussion

3 Thoughts on "Extracting Book Data from Library Information Systems"

You may also want to add to your future analysis the extent to which EBA programs that publishers offer will bring back some of the revenue they've been losing to intermediaries. Also very important is to look at whether shifting library expenditures into EBAs with the large publishers who can offer them is starting to harm smaller commercial and scholarly book publishers, as it ties up library budgets with a few big publishers, just as Big Deals did to the journal market decades ago. Project MUSE and JSTOR are the only aggregators I know of that offer EBAs that include many smaller scholarly book publishers (mostly university presses). Finally, on the issue of having metadata of sufficient quality to even do analysis with, I encourage you to get involved in the issue of ebook ISBNs. We have always had a problem with different ISBNs for different editions/printings of print books (hardcover vs. paperback), but now a library could have the same ebook on 5 different platforms and not realize it, because ISBN rules dictate that each one gets a different eISBN, so it's incredibly hard to discover the overlaps. We need an ISBN-L along the lines of the ISSN-L. Apparently a related project was started but abandoned. People with voices that are listened to (like you chefs) are needed to make this happen.

By Melissa Belvadi
Jan 29, 2019, 10:17 AM

Joe, is there any data on book collections sold as databases? As I was retiring, books were being put into databases, and libraries were buying those instead of individual books.

By harvey kane
Jan 29, 2019, 10:38 AM

As a librarian trained in Europe and the US, who has worked for ILS vendors and now works for a university press, I understand these challenges very well. In my observation, providing a well-functioning ILS is as complex as it was at the beginning of my career almost 30 years ago. An ILS interfaces with many different technical and administrative entities and components within a university, research organization, and community. While it has remained nearly impossible to provide the perfect system, I consider many ILSs far from recalcitrant.
You stated: "Our data provided little insight into ebook aggregations, which are cited in an ILS as a single entry and not title by title." Are you referring to "cited" as part of the acquisitions data? Title-by-title citation works through MARC records, and it's up to the library to determine cataloging practices for its ebooks.
The ERM component of the ILS comes with its own challenges, and while many provide reports and functionality that you refer to as dashboards, I agree that there is also no perfect ERM, largely due to the issues you described.
As for publishers not receiving revenue from ILL, I wonder if anyone has analyzed the cost savings to libraries of ILL compared with what acquiring these materials would cost (books and ebooks/chapters in this case, including administrative processing costs and space considerations for printed materials). One can hope that these savings are applied toward additional book acquisitions, thus benefiting publishers after all.

By Katja Moos
Jan 29, 2019, 1:25 PM

The Scholarly Kitchen