Information in a library is of two kinds — there is the content, the collection, all that stuff that resides in books and journals and special collections; and there is the information about that content, the metadata: information about where things are located, how they relate to other things, how often they circulate (but, rarely, for privacy reasons, about who actually accesses and reads the content). It’s that latter kind of information, the metadata, I am interested in, as it may provide value to certain organizations, value that libraries may seek to tap.
I have been thinking about library circulation for some time, but my interest grew when I began to study patron-driven acquisition (PDA) last year, as eliminating books that don’t circulate is one of the reasons librarians are interested in PDA in the first place. PDA programs implicitly ask two questions: Why would libraries want to acquire books that don’t circulate? And why would publishers want to publish books that nobody reads? There is a risk with questions like this: they can turn scholarly publishing into a popularity contest. After all, if the measure of a successful scholarly book were how many copies it sold and how often it circulated in a library, then the big trade publishers would become the model for all publishing, drowning out the specialized, intellectually serious work that is the business of a research university. But surely there is a middle ground between bestsellerdom and total obscurity. Information about how books circulate in libraries would help publishers evaluate their lists and guide future editorial acquisitions.
Publishers do, of course, have some limited information coming back to them from the marketplace. No publisher fails to study the Amazon rankings of a title, for example (some authors do nothing else), and there are services that provide a modicum of information about the sales of books in retail outlets. Scholarly publishers have a special problem, however, in that their titles are sold disproportionately to libraries, so the absence of circulation data affects them more seriously than it does trade houses.
You would expect to be able to go online and look all this stuff up, even if some of it resides behind a paywall. But you can’t: there is no place to go to get aggregate data on library circulation. So, for example, Stanford University Press published a book called “Monopolizing the Master: Henry James and the Politics of Modern Literary Scholarship,” by Michael Anesko, a book I am choosing at random (though it sounds interesting). Is it not reasonable to ask which libraries purchased a copy and how often it has circulated? You can get an answer to the first question by looking at WorldCat, but the second question is unanswerable at this time.
Individual libraries study their circulation records carefully. I have previously cited on the Kitchen a rigorous study done at Cornell, and I imagine many libraries have something of this kind in hand; librarians being librarians, one assumes that such studies get passed around informally. But there is a place for a full service, one that aggregates the circulation data, properly anonymized, of all library collections, and that can generate management reports for interested parties.
So let’s imagine a new library service called BloodFlow, which sets out to aggregate the circulation records of all the world’s libraries. The libraries themselves would have to be tagged by type (e.g., by their Carnegie classifications or some other taxonomy) so that one could distinguish between the major ARLs, liberal arts college libraries, the libraries of community colleges — and, of course, school, public, and corporate libraries. Circulation data from all these libraries would be uploaded to BloodFlow, which would aggregate the data in a form that allowed it to be packaged according to the needs of any particular user. For example, a librarian at the University of Michigan may contemplate whether to purchase a revised edition of a book first published by Rutgers University Press 10 years ago. What is the demand among research universities for this title? If the circulation in the aggregate is strong, Michigan may decide to purchase the book. Or a librarian at a public library may look at the circulation records for a Palgrave Macmillan title already in print; if the records show that virtually all of the book’s circulations were at the top ARLs, the librarian may pass on that title as a poor fit with a public library’s collection.
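The aggregation step just described can be sketched with a toy data model. Everything here is an assumption for illustration: the record layout, the type tags, and the function name are one guess at how a service like BloodFlow might anonymize records (library names dropped, only the type tag retained) and roll up checkouts by library type for a single title.

```python
from collections import defaultdict

# Hypothetical anonymized circulation records: (isbn, library_type, year, checkouts).
# Library identities are deliberately absent; only the taxonomy tag survives.
records = [
    ("978-0-8047-0000-0", "ARL", 2022, 14),
    ("978-0-8047-0000-0", "ARL", 2023, 9),
    ("978-0-8047-0000-0", "liberal_arts", 2023, 2),
    ("978-0-8047-0000-0", "public", 2023, 0),
]

def circulation_by_type(records, isbn):
    """Total checkouts per library type for one title, across all years."""
    totals = defaultdict(int)
    for rec_isbn, lib_type, _year, checkouts in records:
        if rec_isbn == isbn:
            totals[lib_type] += checkouts
    return dict(totals)

print(circulation_by_type(records, "978-0-8047-0000-0"))
# {'ARL': 23, 'liberal_arts': 2, 'public': 0}
```

A report like this is exactly what the Michigan librarian in the example would want: demand for the title broken out by segment, so that a strong ARL number can be read separately from a weak public-library number.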
Publishers would make different uses of this data. Should I bring a book back into print? Let’s check the circulation records. Or, we have a submission here on Byzantine studies; how can we assess the market opportunity? Publishers would also be interested in trends: Are books in Women’s Studies circulating more or less strongly over the past decade, and how do these circulation records compare with those of the collection as a whole? Or how about economics, or physics? Once you begin to study data like this, the number of new questions that arise can be mind-boggling. Mix a curious mind with a large data set and the tools to manipulate it and suddenly you find that you have given birth to a new Edison or Tesla.
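The trend question — is a field gaining or losing ground against the collection as a whole? — is a share calculation. The figures below are invented, and the row layout is an assumption, but the sketch shows why the comparison matters: absolute checkouts in a field can fall while its share of a shrinking collection rises.

```python
# Hypothetical yearly circulation totals: (year, subject, checkouts).
# The "all" rows stand in for whole-collection circulation that year.
rows = [
    (2014, "womens_studies", 120), (2014, "all", 10000),
    (2023, "womens_studies", 180), (2023, "all", 9000),
]

def subject_share(rows, subject, year):
    """A subject's circulation as a fraction of whole-collection circulation."""
    subj = sum(c for y, s, c in rows if y == year and s == subject)
    total = sum(c for y, s, c in rows if y == year and s == "all")
    return subj / total

print(subject_share(rows, "womens_studies", 2014))  # 0.012
print(subject_share(rows, "womens_studies", 2023))  # 0.02
```

In this made-up example the field's share grew from 1.2% to 2%, which is the kind of signal an acquisitions editor weighing a Byzantine-studies submission — or a Women's Studies list — would want.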
One way to get this service to work would be to set up a membership organization — the BloodFlow Partnership. Any library could join, with the following conditions: there is a membership fee, scaled by size and type of library, and the library must make all its circulation records available to the partnership. A member would then have unrestricted access to the data, including the report-generation feature. (An interesting question is whether information about the reports requested — the meta-metadata — would be part of the service as well.) Non-members would have to pay a fee, which would once again be scaled by type and size. Suppose that Colby College, for whatever reason, decides not to participate but subscribes to the service; the price for Colby, however, would be far less than that paid by Oxford University Press or Simon & Schuster. Thus the business model is a combination of membership and toll-access publishing. Ideally, the circulation records would be available in real time (How many copies of “Administrative Law: The Informal Process,” by Peter Woll and published by the University of California Press, are circulating right now?), but this may be hard to achieve technically. The more granular the data, the better, but even annual circulation figures from libraries without the technical means to publish an API to their circulation records would have some value.
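The two-sided pricing described above — members pay a scaled fee and contribute data; non-members (a non-participating library, or a publisher) pay a higher toll — can be reduced to a small fee function. The base rates, the size multiplier, and the 3x non-member markup are all invented numbers used purely to show the shape of the model.

```python
# Hypothetical base fees by participant type (all figures are made up).
BASE_FEES = {"ARL": 10000, "liberal_arts": 3000, "public": 2000, "publisher": 25000}

def annual_fee(participant_type, size_factor, is_member):
    """Scaled base rate for members; an assumed 3x toll for non-members."""
    fee = BASE_FEES[participant_type] * size_factor
    return fee if is_member else fee * 3

# A non-member liberal arts college (the Colby case) pays the toll rate,
# but still far less than a large publisher would.
print(annual_fee("liberal_arts", 1.0, False))  # 9000.0
print(annual_fee("publisher", 1.0, False))     # 75000.0
```

The design point is simply that membership is rewarded twice: once in price, and once in unrestricted access to the report-generation features.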
There is a corollary to this argument, and that is that with more and more libraries getting into the publishing business in some way, usually with various kinds of open access services, there is an unanswered, even unasked editorial question: What is the right kind of content for a library to publish? In my view, the best new publishing enterprises focus on new and growing content areas. A library that seeks to publish material in European history must contend with the program at OUP; a library interested in American history will have strong competition from Harvard University Press; and, most obviously, a library interested in STM journals will find such organizations as Elsevier, Springer, and Wiley Blackwell fiercely defending their turf. But aggregate library metadata is another matter. This information is proprietary to libraries; only they have access to it, only they can publish it. It’s a great competitive position to be in. The beautiful irony is that the paying customers for such services would in part be traditional publishers.