In a late-November issue of the New York Review of Books, Robert Darnton unburdened himself of “three jeremiads,” the third of which dealt specifically with what he considers troubling aspects of the Google Books settlement.  As I understand his primary concerns — and if I’ve misunderstood or misrepresented them, I hope someone will set me straight — they are these:

  1. Google’s purpose is not to provide access to books, but to make money for its shareholders.
  2. Libraries, by contrast, “exist to get books to readers . . . (which they do) for free.”
  3. This “incompatibility of purpose might be less troublesome if Google could offer libraries access to its digitized database of books on reasonable terms” – but, he implies, the terms embodied in the proposed settlement between Google and copyright holders are not reasonable.

Although Dr. Darnton is not very explicit about what he feels is unreasonable in the settlement’s terms, he does express serious concern about the following aspects of the proposed arrangement:

  1. Libraries “are not partners to the agreement, but many have provided, free of charge, the books that Google has digitized.”
  2. Libraries “are being asked to buy back access to those books along with those of their sister libraries.”
  3. The price of access “could escalate as disastrously as the price of journals” has done in recent decades (this is the subject of one of his other two jeremiads).

Dr. Darnton then poses two questions:

  1. “Do we want to settle copyright questions by private litigation?”
  2. “Do we want to commercialize access to knowledge?”

That second question is an important one with enormous implications, and deserves a separate response all its own.  More on that later.  The first one strikes me as too vague to be useful — if by “copyright questions” he means copyright challenges, then yes, private litigation is exactly the way they should be settled, since copyrights are almost always held privately (by authors or their assigns).  If by “copyright questions” he means issues larger than those posed by specific cases, then the only correct answer would be “it depends on what the question is.”

I’d like to focus briefly on just a few of Dr. Darnton’s first six points — the two sets of three points above.

First of all, the idea that libraries provide free services is both pernicious and inaccurate, and the perpetuation of that idea is ultimately harmful to libraries (not least because, when expressed publicly, it’s very often in the context of arguments for continued or additional funding). The reality is not that library services are free, but rather that the charges are hidden — in property or sales taxes, in student fees, in tuition — and the assessments are distributed broadly, such that library services feel free, more or less, to those who use them. There’s absolutely nothing wrong with funding libraries in this indirect way, of course, but there is something very wrong with treating the illusion of “freeness” as if it had a basis in reality.

More troubling is Dr. Darnton’s (in my view) deeply misleading characterization of both what participating libraries gave to Google in the course of the digitization project, and what will be sold by Google to those libraries and others who subscribe to the service proposed in the settlement (these are points 1 and 2 in the second list above).

It is simply not true that participant libraries “provided . . . Google” with books “free of charge.” This is a common formulation among opponents of the settlement, and it’s deeply misleading, creating as it does a mental image of libraries handing over their books and Google taking them away.  What the libraries provided was access to books in their collections; Google made copies and took the copies — not the books — away with them.  Not only did Google take nothing from the libraries, it actually left them with more than the libraries had prior to the project, in that each library was left with its own digital copies of the books in its collection, copies made by Google and given to the library at no charge.  The libraries lost nothing, spent nothing, and gained much.  One example: at the University of Michigan, the library estimates that it costs them roughly $60 to digitize a single volume.  This means that Google has given their library roughly $60M of value by digitizing (and thereby both preserving and making publicly accessible) their public-domain materials, and another $300M of value by providing digital preservation for their in-copyright books.  Yes, the project yielded tremendous value for Google, but allowing Google to make money seems like a relatively small price to pay for this kind of service.

Second, libraries that participated in the project are not “being asked to buy back access to [their] books.”  Since partner libraries are left with their own digital copies made from their collections, they already have access to the books they allowed Google to copy; in fact, they have been given tremendously enhanced access to those books, and at no charge.  What Google does expect them to pay for, if they wish it, is hosted access to additional digital copies of a much larger collection of books, which includes titles from their collection as well as from many others.

Am I saying that the settlement is perfect?  Of course not.  The possibility of skyrocketing prices is certainly troubling (just as it is for every other service that libraries buy), and it seems to me that should it be approved, a settlement as far-reaching as this one is liable to have consequences both unforeseen and unintended for copyright law in other corners of the scholarly publishing world.

But I think it’s important that we be clear on the facts.

One fact is that no participating library was left after the Google project with anything less than what it had beforehand.  Another is that each participant library was left with the extremely valuable results of a massive digitization project, which cost the library nothing.  And another is that no library will have to pay for access to the books the library allowed Google to digitize.

Clarity on these facts is, it seems to me, essential to any responsible and useful discussion of the worthiness of both the Google project and the proposed settlement.

Rick Anderson

Rick Anderson is University Librarian at Brigham Young University. He has worked previously as a bibliographer for YBP, Inc., as Head Acquisitions Librarian for the University of North Carolina, Greensboro, as Director of Resource Acquisition at the University of Nevada, Reno, and as Associate Dean for Collections & Scholarly Communication at the University of Utah.

Discussion

15 Thoughts on "Responding to One of Darnton's Three Jeremiads — the Google Books Settlement"

It is interesting to hear the library community complain about this “problem” that they jumped wholeheartedly into with such gumption. To hear some in the library community explain it, they were unwitting dupes of this “We do no evil” propaganda. When Google appeared at their doorstep, was there a thought given to why a corporate entity would want to scan their entire collection free of charge? Did anyone give the issue some thought and realize that the collections they held had value and there was “gold in them there books”? Didn’t anyone realize how successful JSTOR has been and why? Now to start complaining that the work is going to only provide Google with an unencumbered monopoly on the world’s greatest library ever compiled seems to me the worst form of Monday-morning quarterbacking. It is hardly as if the various participating libraries weren’t sitting at the table and taking these decisions about what would happen, when and how.

The real problem is that now every participating library has been left with nothing less than they had before, but it is actually WORTH less than before. Having a complete printed collection was only useful before the collections had been digitized and made accessible digitally. And the sum of the parts is much more valuable than the entirety. Why would anyone want to go to a library, even to one holding a collection as sizable as Harvard’s, now that the world’s best library is the Google Library?

What was lacking 10 years ago was a collective vision of how this could be done by the library community and the wherewithal to find the funding to do so. There were projects led by the library community to provide scanning services and digital copies. They were all deemed “too slow” and “too costly” compared to GBS. The problem was that few really thought through the unintended costs of handing over the entire collection to Google.

I sympathize with Darnton on this point. However, I see him as being exactly the wrong person to be complaining. He facilitated the creation of this beast – touted it and fostered its development. It is as if Darnton were playing the role of Victor Frankenstein: “[T]he beauty of the dream vanished, and breathless horror and disgust filled my heart.”

{This is, of course, stepping aside from the entire question of walking into a library, copying everything in it and walking out, which is in every way conceivable a flagrant violation of US Copyright law. Apparently, if you have enough money, lawbreaking is completely allowed. Ask Joel Tenenbaum, who was fined $22,500 per song he copied illegally, what the costs of copyright infringement are. At 1 million items scanned, that places the GBS at a ~$22.5 BILLION settlement. I’d say GBS is off by a factor of several zeros if the law were equitably enforced.}

Hi, Todd –

You’ve raised a lot of issues in your response, but I’m just going to pick one of them to respond to:

“The real problem is that now every participating library has been left with nothing less than they had before, but it is actually WORTH less than before.”

I might agree with you if I believed that the value of a library is measured by the number of visitors it attracts to its physical building, and by its competitive position in relation to other libraries. If, on the other hand, you measure the value of a library by the amount and ease of access it gives patrons to high-quality information, then a Google partner is a much, much more valuable library than a non-Google partner. By taking advantage of the free digitization service offered by Google, Harvard and Michigan and the other Google partners have (potentially) made their entire collections available to their patrons both remotely and around the clock — not to mention the access they can now offer to people who aren’t their patrons. That may reduce the uniqueness of those libraries’ offerings (by making them available to people who haven’t paid tuition and fees), but I’m not sure it can be said in any meaningful sense to have reduced those libraries’ value.

Rick

Rick,
The point about providing value to patrons is exactly the point. By providing the content to Google Books, yes the library is providing greater content to the entire community who can get access to GBS and all of the content (again setting copyright aside). Where this argument falls down is twofold: 1) The point of entry is Google not the library and forever moving forward the library isn’t providing access, Google is. Beyond providing the initial content, the library steps back and is no longer part of the equation (from the user’s perspective).

2) The value of the printed copy, once people have a digital facsimile that is “good enough” will mean that the value of holding that physical item is increasingly diminished, although no less costly. The costs for maintaining a print collection, even if spread across institutions for the increasingly rare request will become unrealistic supports for many, if not most institutions.

I have a troubling vision that the long-term outcome of the GBS will probably be a concentration of regional print-holding libraries (lucky for those that are one and trouble for those that are not), an increasing focus on special collections, and social gathering places. Most libraries will be little more than museums for special collections, POD outlets for GBS books, and coffee shops. Whether this is good or bad is open for debate, but it is a much different vision of what a library is and provides.

You are right about providing greater value to the community. However, having provided that value, now libraries are simply providing an expensive storehouse for dead trees and ink, while the use (and therefore value) is being hosted somewhere else.

{I realize this is a dark and edge case, but it seems more and more likely as GBS gains traction and success.}

Todd

Hi, Todd —

I don’t necessarily disagree with your predictions, but it seems to me that your concern about those eventualities is too library-centric — I hear you saying “GBS makes X and Y more likely, and X and Y would undermine the position of the library, and therefore GBS is problematic.” But what if X and Y undermine the position of the library by doing things for society that need doing, but that libraries don’t do very well — such as provide access to books? (And let’s be clear about that: libraries are very good at building collections, but we’re mediocre at best when it comes to providing access to them.)

For example, you predict that the success of GBS will mean that “the point of entry is Google, not the library.” True enough, and it’s also true that such an eventuality might threaten me personally as a librarian — but should we assume that it would be a bad thing for my stakeholders? My library currently provides access to a collection roughly 1/5 the size of the University of Michigan’s. If GBS gives my patrons access to the UMich collection, that may indeed undermine my importance as a CD officer for my library. But it seems to me that I’d be doing a great disservice to my patrons if I balked at GBS for that reason.

The same goes for your second prediction: “The value of the printed copy, once people have a digital facsimile that is ‘good enough,’ will mean that the value of holding that physical item is increasingly diminished, although no less costly.” That’s absolutely correct. But I think I’d argue that preserving the value of the printed copy is not a worthy goal — and that to think otherwise would be to confuse means with ends. My goal, as a librarian, isn’t to own a printed copy; it’s to provide access to the book. If a better access tool becomes available, then that fact absolutely _should_ lead me to reconsider the role of printed copies in my collections. I’m not suggesting that the answers will be simple, or that they’ll be the same in every library — only that we need to keep means and ends straight.

In other words, I don’t disagree with you that GBS poses a potentially existential threat to traditional librarianship. Where I think we may disagree is on the question of whether that threat is ultimately a good thing or a bad thing for the world of scholarship. If GBS is bad for libraries but good for scholarship (a debatable point, obviously!), then the answer is not to oppose GBS, but to rethink the library.

Best,
Rick

Oddly enough, I completely agree. Odd because I am not, nor have ever been, a librarian. I have little to lose (personally) if libraries become the Blockbuster stores of the 21st Century.

To be clear, I think there is also significant potential risk to the publishing community of handing the market to Google in this way as well.

As a user of content, I think there’s tremendous potential in having access to the digitized content from the world’s largest libraries and I’d probably be among the first to pony up for my “Google Library Card/Login”. Access to it would be extremely valuable and for which I’d pay (individually or organizationally) and probably pay a fair amount to get access to.

I still believe that opposition to GBS is warranted on other grounds — copyright defense and the belief that persistent, open and un-biased access would be better provided by a library (or more likely a consortium of libraries) than by a single corporation.

There is no inherent reason why Google could provide this service better or cheaper than the library community could. Creating just such a service would (or would have?) addressed the problem of providing access not just collections. The problem was a lack of vision and collaboration. There are more than enough resources to undertake a project as ambitious as GBS, were the resources strategically amassed and deployed to do so.

Todd

Re: “There is no inherent reason why Google could provide this service better or cheaper than the library community could. Creating just such a service would (or would have?) addressed the problem of providing access not just collections. The problem was a lack of vision and collaboration.”

With respect, Todd, you’re deeply mistaken about the resources available in the library community. While I agree that we often suffer from a lack of vision and poor coordination of effort, the fundamental roadblock to a digitization project on the scale of GBS is resources: such a project is utterly beyond the means of libraries, either individually or collectively. Just to stick with Michigan as an example: if their cost estimate of $60/book is accurate, then it would have cost them $360M to do what Google did in their library — and that’s just one library collection (though admittedly one of the bigger ones). Collaboration with other libraries might have eliminated some of those costs by reducing duplication, but the process of finding and eliminating duplication would have entailed its own significant costs. Even if Michigan could have found a way to reduce the cost of the project by 90% (which is a huge leap of faith), they still would have had a $36M project on their hands. By comparison, according to the most recent ARL stats, Michigan’s materials budget totals about $20M per year.

The bottom line is that the only way something like GBS was going to happen was if an entity like Google did it: an entity with incredibly deep pockets, a high tolerance for risk, and a business plan. And “deep pockets” is the trump card.

Best,
Rick

This is a very interesting discussion. I agree with many of the points that each of you is making.

I have felt for some time that the library’s focus is becoming more the organisation and dissemination of information. Our collections incorporate a number of formats including online access, and we select the titles that we add to our collections based on assessment against established collection criteria that are compiled with the needs of our specific users in mind. Our users can use Google products to be overwhelmed by the whole world of information that is good/bad, relevant/irrelevant, useful/useless, or they can come to our library – either in person or online – and use the material that we have selected specifically as being appropriate for their needs. I think libraries and librarians will be needed more in the years to come, not less.

Rick,

I take significant issue with your math. There are the resources in the community, especially if you consider that the digitization does not need to happen overnight. Some figures for comparison:

According to Internet Archive, they were scanning books at $0.10/page or roughly $30/book. They estimated Google was doing so at $10/book or less. http://www.opencontentalliance.org/2009/03/22/economics-of-book-digitization/

The combined ARL Materials budget is $1.363 Billion dollars across 124 institutions.

The Andrew W. Mellon Foundation annually appropriates roughly $100+ million (granted not all to libraries).

The IMLS has an annual appropriation of $213 million. Even if you take only the National Leadership in Libraries program (the most relevant to this type of project), they annually disburse roughly $12 million.

If the total project has scanned ~3 million books at the Archive’s high price of $30/book the total investment would be $90 million. Say you take 15 years, the total investment by the community would be a manageable $6 million per year – True a large amount for any single institution, but on a collective basis, absolutely manageable – actually less than some currently active grant-funded projects. Or, another way to look at this is that the investment would be 0.44% of the total ARL-wide materials budget. I expect that they’ve seen more than that in cuts over the past three years.

Todd
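
For readers who want to check the arithmetic in the estimate above, here is a minimal sketch in Python. Every input is a figure quoted in the comment (3 million books, $30/book, 15 years, a $1.363 billion combined ARL materials budget); the function and variable names are illustrative only, and the sketch says nothing about whether those inputs are themselves accurate.

```python
# Inputs are the figures quoted in the comment above; names are illustrative.
ARL_MATERIALS_BUDGET = 1_363_000_000  # combined ARL materials budget, in dollars


def spread_cost(books, cost_per_book, years):
    """Return (total cost, cost per year) for a scanning project."""
    total = books * cost_per_book
    return total, total / years


total, per_year = spread_cost(books=3_000_000, cost_per_book=30, years=15)
share = per_year / ARL_MATERIALS_BUDGET  # yearly cost as a fraction of the budget

print(f"Total project cost: ${total:,}")
print(f"Cost per year:      ${per_year:,.0f}")
print(f"Share of ARL materials budget per year: {share:.2%}")
```

Changing `books` or `cost_per_book` shows how sensitive the conclusion is to the size of the project being assumed.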

Todd,

You say that you take issue with my math, but none of what you’ve said above contradicts it — except for the implied critique of Michigan’s estimate that it costs them $60 to digitize a book.

So let’s say that Michigan overstates its costs and that the real cost of digitization for an ARL library is closer to IA’s $30/book. (If Google really was able to do it for $10 apiece, then that amounts to a very strong argument in favor of the Google project.)

As of 2008, the ARL libraries held a combined total of 565,377,383 volumes. At $30/volume, it would cost nearly $17 billion to digitize all of them. Now let’s assume that 70% of those volumes could be eliminated because of duplication (though, again, that process alone would entail its own significant costs). We’ve now reduced the cost of digitization to just over $5 billion. That’s 25% more than the annual operating budgets for all ARL libraries combined. This is what I mean when I say that the resources are not available in the library community. Even if you drag the project out over ten years, you’d have to account for acquisitions made during that same period — and you’d be effecting a massive budget cut for the library’s other operations during the project. By contrast, Google did all of its digitization for free, and handed over copies of its files when it was done. Yes, there are downsides to letting Google do it — but opportunity cost is only real where opportunity exists, and there’s simply no way that the ARL libraries could pony up an amount greater than their total annual budgets in order to digitize their collections.

(One specific numerical quibble: you point out that “if the total project has scanned ~3 million books at the Archive’s high price of $30/book the total investment would be $90 million.” But I’m not sure how you came to the conclusion that Google scanned only ~3 million titles for the total project. They scanned 6 million volumes at Michigan alone, and Michigan is only one of 21 Google partner libraries — and not the biggest one.)

Best,
Rick
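
The ARL-wide estimate in the comment above can be reproduced with a short Python sketch. The inputs (565,377,383 volumes, $30/volume, 70% duplication) are the ones stated in the comment; the function name is illustrative only, and the deduplication step is treated as free, which the comment itself notes is unrealistic.

```python
# Inputs are taken from the comment above; names are illustrative.
ARL_VOLUMES = 565_377_383   # combined ARL holdings as of 2008
COST_PER_VOLUME = 30        # dollars, the Internet Archive figure
DUPLICATE_PERCENT = 70      # share of volumes assumed to be duplicates


def digitization_cost(volumes, cost_per_volume, duplicate_percent):
    """Return (cost of scanning everything, cost after removing duplicates)."""
    full = volumes * cost_per_volume
    # Integer arithmetic keeps the result exact; deduplication is assumed free.
    deduped = full * (100 - duplicate_percent) // 100
    return full, deduped


full, deduped = digitization_cost(ARL_VOLUMES, COST_PER_VOLUME, DUPLICATE_PERCENT)

print(f"All volumes:         ${full:,}")
print(f"After deduplication: ${deduped:,}")
```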

Rick,
You’ve gone from replicating what Google is doing, digitizing 25 million copies, to something orders of magnitude more ambitious – 30% of all ARL holdings. Google’s work is 1/6th of that. (NB – Sorry about my math, which I chalk up to too quick a Google Search. Wikipedia is saying that the total scanned so far is 25 million copies.)

As to the cost of scanning again, if we take U.Michigan’s estimate of $60/book that would mean that Google has already invested $1.5 billion in Google Books. This would be nearly 20% of Google’s entire (very sizable) R&D budget (http://investor.google.com/financial/tables.html), which I’m certain is wildly disproportionate. With Google investing in everything from genomic research to wind farms and, well, developing new products, the total amount directed to GBS work has got to be a fraction of $1.5 billion.

The first car that was built was insanely expensive on a per-unit basis. It wasn’t until Henry Ford created efficiencies in production that the costs dropped precipitously. The same is probably happening with book scanning. There are estimates that Google is scanning at a rate of 1,000+ pages per hour. I expect the costs to do so are dropping into the single dollars per book range or less.

My point (agreeing with you), though, is that if providing access to content were the highest priority for libraries, then undertaking a project as ambitious as Google’s scanning over a 10 or 20 year timeframe would have been entirely manageable financially if the costs were spread over the entire library community. There’s an audaciousness in Google’s thinking that could be replicated elsewhere if people were willing to collaborate to accomplish things. “We choose to go to the moon…, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone.”
Sure it would have taken longer, it would have been a sacrifice, but it could have been done and libraries would have been better off for doing it themselves.

As to Sandy’s point – I applaud Hathi for taking what’s been given to them and retooling it into something more akin to libraries’ missions. As I described above, though, I don’t think the end result of either project will be a net win for libraries as we know them. A win for patrons, users and the few libraries that remain, certainly. However, the library community will look radically different in 20 years’ time for these innovations, and many institutions existing now won’t need to be in the future.

“You’ve gone from replicating what Google is doing, digitizing 25 million copies, to something magnitudes more ambitious – 30% of all ARL holdings. Google’s work is 1/6th of that.”

My understanding (incomplete, since they keep their numbers hysterically confidential) is that Google’s project is much more ambitious than the figure of 25 million that has appeared in some public reports. I mentioned evidence of this before, in that the project currently has 21 participant institutions, two of which are multi-library consortia, and Google reportedly digitized nearly 5 million titles at just one of those libraries (not 6 million as I said earlier). If that report is correct, then a total of 25M books would mean that Google will digitize, on average, only about a million titles from each of the other 20 participants. Bearing in mind that those other participants include Harvard, Oxford, Princeton, and the entire U California system (which counts as a single partner), that seems extremely unlikely.

But let’s say that in order to replicate Google’s project, libraries would only have to digitize 25M books, and let’s allow (as I did above) that the real cost is not $60/book, but only $30/book. That’s a total cost of $750M. Divide that among the ARL libraries and you’ve got just over $6M per library. Spread it over ten years and that’s $600k/year/library, which does start sounding feasible for a large research library that is willing drastically to change its programming. But that number greatly underestimates the real costs such a project would entail. Even if the direct costs of digitization amount to only $30/book, that doesn’t take into account the costs of coordinating such a project among more than a hundred libraries (assuming that only ARLs are involved; expand the project to others and you reduce the size of the project per collection while increasing the cost of coordination and administration). It also doesn’t take into account the enormous legal costs (and risks) that Google assumed by being willing to challenge the publishers’ copyright claims. You said in your original comment that Google’s project constitutes “in every way conceivable a flagrant violation of US Copyright law.” Setting aside the moral issue (if it’s wrong for Google to do it why would it be right for libraries to do it?), surely it would at the very least be extremely expensive for libraries to undertake such a program of systematic and flagrant violation on their own.

I realize this probably sounds like I’m just whining: “Come on, Todd, this would be hard! Don’t make us change what we do and reallocate our resources! Waaaah!” I’m sensitive to that, because I spend a lot of time preaching at my colleagues about how important it is that we change what we do and reallocate our resources. But at the same time, it’s very easy for someone outside the library community to say “If you guys would just have some vision and do some collaboration and reallocate some resources you could do the same thing Google’s doing.” Take it from someone who’s working in these particular trenches: the fiscal reality is much, much more complicated and difficult than that, and has become increasingly so in recent years.
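
The per-library figures in the comment above follow from a simple division, sketched here in Python. The 25M books, $30/book and ten-year span come from the comment; the count of 124 ARL institutions is an assumption borrowed from Todd's earlier comment, and the function name is illustrative only. Only direct scanning costs are covered, not the coordination or legal costs discussed above.

```python
# 124 ARL institutions is the count cited earlier in the thread (an assumption here).
ARL_LIBRARIES = 124


def per_library_cost(books, cost_per_book, libraries, years):
    """Return (total cost, cost per library, cost per library per year)."""
    total = books * cost_per_book
    per_library = total / libraries
    return total, per_library, per_library / years


total_cost, per_library, per_library_year = per_library_cost(
    books=25_000_000, cost_per_book=30, libraries=ARL_LIBRARIES, years=10
)

print(f"Total direct cost:    ${total_cost:,}")
print(f"Per library:          ${per_library:,.0f}")
print(f"Per library per year: ${per_library_year:,.0f}")
```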

It’s interesting to me that there is no mention here of the Hathi Trust, which built on the Google Library Project to add new value to it in various ways, especially through digitization of special collections that are unique to each participating library and the offer of new services, some of which are available only to members. This certainly extends the value of what Google started immeasurably, and unlike GBS, it is under the control of the member libraries. This is a good thing both for the world of scholarship at large, which benefits from open access to rich resources hitherto accessible only to people who visited the physical libraries, and also to the participating libraries, whose patrons gain extra layers of service not granted to non-members. At the same time, I have earlier raised in an e-mail exchange with Rick whether the availability of these special collections does not at least somewhat diminish the value these libraries can claim to have as destination sites for researchers, somewhat along the lines of the point Todd makes here. And, unlike books (unless they are really rare), these special collections do need to be maintained in print form because at least some of them have continuing value as physical artifacts.

Sandy, I couldn’t agree more about the value of Hathi Trust. It’s a fantastic project. And as you know, I also agree with you that by digitizing our rare and unique collections we run the risk of reducing the number of people visiting our libraries — and I think that’s exactly as it should be. I would much rather make it possible for a researcher in Berlin to read my library’s unique pioneer diaries online than ensure that the only way he could do so is to visit my library.
