The main reading room of Graz University Library (19th century) on 2 Sep 2003. Picture taken and uploaded by Dr. Marcus Gossler. (Photo credit: Wikipedia)

In my previous posting, I focused on what I believe to be dim prospects for the Encyclopedia Britannica as it transforms from a set of printed volumes into a networked online information portal. My skepticism stems from the fact that although the EB claims to offer “the breadth of the world’s knowledge,” its coverage of the world’s knowledge is actually severely limited; it offers scant information on many topics and none at all on many others. To me, the likelihood that it will find many paying customers — especially given the free availability of similar, easy-to-use, and far more comprehensive resources, such as Wikipedia — seems low.

Today, I’d like to suggest that the traditional research library faces a similar challenge. The library collection is simply a bigger version of the encyclopedia: a seemingly exhaustive but actually (in the great majority of cases) very limited information portal that invites increasingly-skeptical customers to “start your research here.”

It’s worth asking why a patron would (or should) want to start his or her research with the library collection. The answer will obviously depend on what kind of research is being done. If the patron is looking for a known item, then the question he or she is asking is “Can I get quick and easy access to Document X?” The library does a good job of answering that question for its patrons. Library catalogs are generally pretty effective tools for known-item searching, and academic libraries have gotten quite good at providing easy access to the items in their collections, many now going so far as to offer free on-campus delivery of books and personal electronic delivery of articles.

But known-item searching constitutes only a small part of the scholarly research process. A much larger and arguably more important part of that process is the one that involves the question “Is there any such thing as a document dealing with Topic X?” The traditional library is, and always has been, poorly positioned to answer that question. While the library catalog can tell you whether or not the library holds a book or article on Topic X, it should be obvious that this question is more or less beside the point to a researcher, whose world is not defined by the boundaries of the library. Finding relevant documents or citations among the library’s offerings doesn’t answer the question fully (are there others?), and establishing a lack of them in the library doesn’t answer the question at all.

There was a time — not very long ago — when the boundaries of one’s local library collection did more or less define one’s functional information world, and therefore the functional difference between these two questions (“Does my library own . . .” and “Is there such a thing as . . .”) was much smaller. During that period it made more sense to conflate those questions, and it made sense for libraries to encourage their patrons to “start your research here” — not that such encouragement was really necessary, as patrons had few other options. The fact that the library’s collection was severely limited in coverage, and that it was difficult to navigate, reachable only by travel, and open only part of the day posed little threat to the library’s position as an information and research portal, because it had no real competitors for that position.

Obviously, the information environment now teems with such competitors. But much more importantly, it’s now very difficult for any new entrant to the portal marketplace to get a foothold. Those who want quick information on a particular topic and might once have turned to a traditional encyclopedia now have Wikipedia — which is free, very easy to use, much more comprehensive in its coverage than any traditional encyclopedia, and reasonably authoritative. And those who want to figure out whether there is such a thing as a document on Topic X now have Google — which is free, very easy to use, and searches an astronomically huge (though not absolutely comprehensive) array of documents, many of which can be directly accessed in their entirety right from the search result, and others of which are discoverable as citations. Taken together, Google and Wikipedia arguably do an awful lot of what the library once did, and they do it more effectively, more conveniently, and for a much, much larger population than any individual library can serve. And they never close.

One obvious response to this argument is that part of the library’s value lies in its selectiveness. Like an encyclopedia, it not only includes but also excludes. This means that the library’s value proposition is not just that it includes everything you’ll need, but also that you won’t have to waste a lot of time slogging through what you don’t need. The problem with this value proposition is that it constitutes a solution to a problem that many of us actually experience quite rarely these days: that of being overwhelmed by information. We don’t actually have to slog through irrelevancies very often. Yes, results of a Google search might number in the millions. But its sorting algorithms have gotten so good at intuiting what the searcher is looking for that it’s rarely necessary to look beyond the first page of results to find exactly what one is after. Even if you need a wider range of sources, you will often find as much as you need within the first few screens of the search result; the fact that the entirety of that result tails off into hundreds of thousands of other hits doesn’t usually impinge on one’s searching experience.

The simple (and slightly terrifying, to a professional discriminator) fact is that selectivity offers less value in an environment of networked online access and full-text searchability than it did when information was housed in printed documents. One purpose of selectivity is to keep the size of the document manageable—but if you don’t have to carry the document on a bike or store it on a shelf and can search its entire text without recourse to an index, then document size becomes much less of an issue. Another goal of selectivity is to save the neophyte the effort of trying to figure out what’s essential and what isn’t. This is an honorable endeavor, but it assumes that people only need access to what we librarians consider essential (or even of high quality). In reality, researchers’ needs vary widely from person to person and from project to project, and they may well need access to materials that we would not consider to be core, reliable resources.

A more encouraging fact for libraries is that while both Wikipedia and Google offer unprecedented coverage and ease of access, neither of them offers a staff of dedicated helpers ready and waiting to help researchers shape their projects and locate relevant and high-quality sources. This is significant, and it represents one of the traditional library’s stronger value propositions (although as a service model, it suffers from serious structural limitations). As long as students and researchers believe that they need help, librarians will likely have an important role to play.

But in its fight to retain a strong position in the marketplace of researchers’ time and attention, I think the library’s most powerful weapon is the type of material we usually refer to as “special collections.” Patrons can get commercially-published books and articles from any number of sources, but if your library owns a truly unique document (like a daguerreotype portrait of a 19th-century actor, or the handwritten diary of a Mormon pioneer, or a typescript transcription of an oral history) then access to that document constitutes a genuinely unique value proposition. Historically, we in research libraries have tended to consign special collections to something of a ghetto: a benign and beloved one to be sure, but one that is somewhat outside the mainstream of everyday library services.

That has to change. Greg Silvis, of the University of Delaware library, put it very well when he argued recently that “the future of libraries will not be found in commodity (catalog) records for commodity books.” Serving as a broker for resources that exist in many different copies in multiple formats and that can be found easily through Amazon or iTunes and purchased at reasonable prices is not an area of growing opportunity for libraries. Where we offer real and unique value, value that separates us from the competition, is in those areas in which we have no competition.

Is there enough demand for such resources to keep us in business? One problem with focusing on these materials is that (unlike our general collections) they’re likely to be of serious interest to a relatively small subset of our local client base. One solution to this problem is, of course, digitization, by which we can make much (if not all) of the relevant content accessible to an audience of billions. So then the next question is: can we convince our sponsoring institutions that supporting the provision of that kind of value to billions of people who are mostly outside of our service remit is the best way to invest institutional funds? That’s probably a good topic for a future post.

Rick Anderson

Rick Anderson is Associate Dean for Collections and Scholarly Communication in the J. Willard Marriott Library at the University of Utah. He speaks and writes regularly on issues related to libraries, scholarly communication, and higher education, and has served as president of NASIG and of the Society for Scholarly Publishing.


25 Thoughts on "The Portal Problem, Part 2: The Plight of the Library Collection"

I absolutely agree that special collections have always been at the core of what value libraries have to offer, now so more than ever. But then one wonders at the logic of digitizing them and making them freely available to everyone, thus essentially turning them into just another Internet resource like Wikipedia, albeit a very focused one. Doesn’t that very act undermine the possessing library’s uniqueness as a destination point for scholars, unless there is some additional value to inspecting the collection’s holdings in their sheer physicality?

  • Sandy Thatcher
  • Apr 16, 2012, 10:04 AM

It depends on what you mean by “destination point.” If you mean that digitization undermines the library’s uniqueness as a _physical_ destination–a place to which researchers have to travel in order to have access to the resources–then yes, digitization absolutely undermines that goal. But in my view, such a goal is unworthy and would represent a profound disservice to the scholarly community. Getting people in the door is not the point; giving people access is the point. Making them come in the door in order to have access, when we have the ability to make access available otherwise, would be a perversion of the library’s purpose.

If by “destination point” you mean a virtual destination point, then no–digitizing a unique collection enhances, rather than undermines, the library’s value as a destination for scholars, because it makes the library more useful to them by providing better access to more unique content.

  • Rick Anderson
  • Apr 16, 2012, 12:26 PM

I was just trying to point out the paradox that by digitizing their special collections libraries are at one and the same time undermining what makes them special. Once digitized, and especially if available through a site not dependent on the library itself (say, HathiTrust), these resources become universally available, and there is no longer any need to retain the library–except, perhaps, as a museum for the physical artifacts. I certainly wasn’t questioning the value of making the collections available digitally, just the logic in terms of libraries’ self-preservation. They can digitize themselves out of existence!

  • Sandy Thatcher
  • Apr 16, 2012, 1:56 PM

Sandy, are you under the impression that once a collection is digitized, nothing else needs to be done to make it available and keep it that way? You might as well argue that by collecting books, the library runs the risk of collection-building itself out of existence (after all, once the books are bought, who needs the library?). In reality, the creation and maintenance of digital collections is a large, complicated, and ongoing task that involves not just significant infrastructural investments, but the investment (and/or redirection) of staff time as well.

Am I saying that these tasks necessarily have to be done by librarians, or that shifting the focus from commodity print collections to unique digital collections will guarantee a secure future for the traditionally-configured library? Absolutely not–I believe the challenges to the research library’s future are real and serious, as I’ve argued elsewhere. But the only aspect of library service that is undermined when we make resources remotely and easily available is that of the library as a physical place–and since every research library of which I’m aware is filled to overflowing with students using its study spaces and faculty members using its instruction spaces, and since the online availability of collections has no bearing on that kind of use, I think the traditional “library as place” role is probably the least threatened of any right now.

  • Rick Anderson
  • Apr 16, 2012, 2:24 PM

Yes, but when the digitizing is done by Google, which owns the digital files, and the maintenance is done by the HathiTrust, what is the remaining role for the donating library beyond preservation of the physical artifacts?

  • Sandy Thatcher
  • Apr 16, 2012, 3:51 PM

Well, for one thing, only a tiny fraction of research libraries have been Google partners. For another, even among that tiny fraction of libraries, only a tiny amount of what Google has digitized is special collections-type material–most of the scanned content comes from commodity books, not the kinds of unique materials that I’m suggesting represent the strongest value proposition for a research library. The Hathi Trust also mainly hosts formally-published, commodity books, not the kinds of materials I’m talking about.

Also, bear in mind that even among the very few libraries that are Google partners, it’s not as simple as Google “own(ing) the digital files.” Google takes away copies of the digital files but also leaves copies that are owned by the hosting library and not controlled by Google at all.

  • Rick Anderson
  • Apr 16, 2012, 7:20 PM

As I understand it, what the HathiTrust has done–and what makes it particularly valuable–is to build on top of the Google original digitization of books a new program of digitizing unique special collections. That is certainly the case for Penn State’s library, which contributed only materials from special collections. Since the participants in HathiTrust include some of the biggest academic libraries, this does account for a lot of special collection items.

  • Sandy Thatcher
  • Apr 17, 2012, 7:43 AM

A particular focus on special collections material is nowhere mentioned, as far as I can determine, in any of Hathi Trust’s objectives, mission, or goals documents. It may well be that Penn State is sending scans of unique, special-collections content to Hathi, but the fact remains that, as I said above, the vast majority of Hathi’s content consists of non-unique, in-copyright materials (what I’m calling “commodity books”). Obviously, none of this is to say that Hathi is uninterested in unique, special-collections-type content — only that Hathi’s collections consist mainly of commodity books, and Hathi does not seem poised to take over for individual libraries as a gateway for public access to digitized collections.

In the particular case of Penn State, it’s worth pointing out that although its library may well be using Hathi as a backup archive for at least some of its digitized special collections material, it is clearly not ceding the gateway and discoverability functions to Hathi — as PSU Library’s elegantly designed Digitized Collections portal makes clear.

  • Rick Anderson
  • Apr 17, 2012, 9:21 AM

A focus on special collections may not appear in any HathiTrust documents, but I can tell you with great certainty that this was a primary reason that the CIC universities got involved in setting it up because I heard the discussions via my boss, PSU’s head librarian. The idea was to build on the large corpus of what you call “commodity books” digitized by Google by adding materials unique to the collections of the CIC libraries.

  • Sandy Thatcher
  • Apr 18, 2012, 12:35 PM

So it sounds like your original question (what is left for libraries to do if they let Google do the digitizing and Hathi do the hosting?) was prompted by your understanding that the CIC libraries, at least, plan to use Hathi as a host for their rare and unique materials. Is that right?

If so, then I guess this is my follow-up question: is it your understanding that the CIC libraries plan to transfer all of their digital special collections to Hathi, delete locally-held files, and eliminate their online exhibits, finding instruments, and access portals? If so, then those libraries do indeed intend to use Hathi in a way that undermines their own role as access destinations. But if (as I strongly suspect) they in fact plan to maintain local copies of the files and maintain locally-designed online exhibits, finding aids, and access mechanisms, then in that case Hathi acts not as a replacement, but rather as a backup repository and a secondary access point for the digitized special collections of a few libraries. And for the great majority of research libraries the issue is moot, since they aren’t Hathi partners (at least for now).

  • Rick Anderson
  • Apr 18, 2012, 2:43 PM

Yes, that’s what I’m saying, except that I would disagree with you about calling the Hathi site a “secondary access point” because, for most of the people located anywhere else but the place where the physical library exists, Hathi will become the primary access point for these special collections.

  • Sandy Thatcher
  • Apr 18, 2012, 5:01 PM

Why do you think that, Sandy? I live in Utah, and let’s suppose that I want to explore Penn State’s digitized special collections. Why, given the public availability of Penn State’s excellent digital collections portal, would I choose instead to go to Hathi Trust and try to dig up Penn State’s digital files there? Nothing about my physical location makes it easier for me to access those documents via Hathi than via PSU.

  • Rick Anderson
  • Apr 18, 2012, 5:30 PM

For the same reason that people use Wikipedia or Google–discoverability, as you’ve pointed out. Only if you already know that Penn State has some special collection that you are interested in are you likely to go to its site directly first. If you are researching a topic, you might go to Google first, and Google is more likely, because of its algorithm, to lead you next to HathiTrust as the site for the special collection than to PSU.

  • Sandy Thatcher
  • Apr 18, 2012, 5:50 PM

Ah, okay. You’re assuming that Hathi is destined to achieve Google- or Wikipedia-level popularity as a first-stop portal for researchers. There is no bigger fan of Hathi Trust than me, but I don’t really think there’s much likelihood of that. I don’t even think it’s something that Hathi aspires to–again, look at their “Mission and Goals” statement; they’re focused on preservation and access, not discovery, and while those goals are not unrelated they are nevertheless very different and imply very different development trajectories. That’s why Google, Wikipedia, and Hathi have such different looks and feels. But who knows? I guess it could happen; we should revisit this conversation in five years and see.

  • Rick Anderson
  • Apr 18, 2012, 11:58 PM

In other words, the future of libraries is in publishing.

  • Dean
  • Apr 16, 2012, 1:08 PM

Yes, I guess — if you’re willing to accept a very broad definition of “publishing.” (I have no problem with such broadening, but I know some publishers who would probably disagree.)

  • Rick Anderson
  • Apr 16, 2012, 1:58 PM

I have no complaint with your special collections thesis, but I think you give Google and Wikipedia, etc. too much credit for meeting the needs of scholars. So much leading-edge publication is not crawlable, but kept locked up in subscription databases. Libraries still have useful roles as portals: to secure, and advocate for, access to those crucial subscription-constricted resources on behalf of their scholars/communities. I know that’s not the point of your article here, and I realize this is a separate soapbox, but it is a relevant part of the library-as-portal discussion.

  • Leahkim Gannett
  • Apr 18, 2012, 3:44 PM

Yes, my point here is more about the library’s role as a discovery portal. You’re right that there are many documents (notably research journal articles) that many patrons will only be able to access if the library buys them. But it’s decreasingly true that those patrons need to (or should) rely on the library’s catalog to discover those documents. Even locked-up articles are often easily discoverable on the open web, even if they aren’t freely available for full-text downloading. Once the patron discovers them, s/he is then faced with the first question I mentioned — “Can I get access to this document?” — which, as I said, is a question that libraries are very well positioned to answer for their patrons.

  • Rick Anderson
  • Apr 18, 2012, 5:26 PM

Interesting post, Rick, and I’m looking forward to your visit to the University of Maryland later this month. I read the Portal Problem Parts 1 and 2 and they raised a lot of little questions for me, but two big ones:

First, I’m not sure I follow what you mean by “portal.” In The Portal Problem, Part 1, it seemed like you were setting up an opposition between Encyclopedia Britannica as portal and Wikipedia as something else. In Part 2, it looks like we’re talking about the library (or the library catalog?) as portal vs. Google and Wikipedia as “not portals.” I would consider both Google and Wikipedia to be portals and in that case the library is one portal among many. What’s the distinction?

Second, I think you’re conflating the library collection with its catalog. Obviously, the catalog is limited as a discovery portal (as are all portals, but it does certain things better than Google), but it’s one portal within the larger portal of the “library collection.” Any good research library has additional portals in the form of bibliographies, chronologies, specialized indexes and so forth that get at information in their own unique ways. Thought of this way, I’m not sure the metaphor of “library as bigger encyclopedia” holds up. The library collection contains numerous sources, primary, secondary and tertiary, both general and agonizingly specific.

  • Steve Henry
  • Apr 19, 2012, 12:58 PM

Hi, Steve —

I’m looking forward to my visit too! To respond to your (very good) questions:

First, I’m not sure I follow what you mean by “portal.” In The Portal Problem, Part 1, it seemed like you were setting up an opposition between Encyclopedia Britannica as portal and Wikipedia as something else.

That wasn’t my intention. What I meant to do was set them up as opposed examples of a successful portal (Wikipedia) and an unsuccessful one (the EB). My argument is that Wikipedia beats the EB at the portal game because it’s more comprehensive, easier to use, reasonably authoritative, and free. My working definition of “portal” is basically any information service or resource that says to its audience “Start your research here.” Interestingly (though maybe not significantly), that’s a message that is sent more or less implicitly by the successful ones (Google and Wikipedia) and much more explicitly by the ones that I think are likely to be unsuccessful (the EB and the library interface).

In Part 2, it looks like we’re talking about the library (or the library catalog?) as portal vs. Google and Wikipedia as “not portals.” I would consider both Google and Wikipedia to be portals and in that case the library is one portal among many. What’s the distinction?

None — I agree. Google, Wikipedia, and the library interface are all portals according to my working definition.

Second, I think you’re conflating the library collection with its catalog.

That’s also not my intention, though I probably expressed myself clumsily. I consider the library’s website (to include the catalog interface) to be the portal–that’s the service to which we’re trying to attract our users with the invitation to “start your research here.” The collection lies beyond the portal. In the case of the library, the portal interface (catalog) and the content (collection) are more separate than they are in the cases of Wikipedia and the EB; in both of those cases there’s no distinction between the search result and the content–the result is content. Google and the library catalog both return search results that consist of metadata–though in the case of Google, the metadata very often links directly to full-text content, and in the traditional library catalog the result is metadata only. (Thankfully, that’s changing in libraries, though I’m not sure it’s changing fast enough.)

Any good research library has additional portals in the form of bibliographies, chronologies, specialized indexes and so forth that get at information in their own unique ways.

True, but access to those sources is mediated by the primary library portal. The fact that the collection includes secondary sub-portals doesn’t solve the problems that I believe are posed by the library’s weaknesses as a portal destination at the primary level.

Does that make sense? I worry a little bit that I’m actually being less clear rather than more…

  • Rick Anderson
  • Apr 19, 2012, 2:03 PM

Okay. Yeah, that helps a lot. Thanks for the thoughtful reply, Rick. And I think for a complex topic like this it’s okay for things to get less clear as ideas are explored!

I think the key for me understanding you was the definition of portal as something that says “start your research here” and I can work with that, although I tend to think of a portal as “an entry point into a body of records” (something like that) and that conception changes the game a bit. Another way to complicate things a bit would be to think of “research” not as a stable unified concept but as something a little more fluid–in other words one portal might say “start a certain *type* of research here.” If we stick with your definition the next step for me would be to figure out how important “starting research” is in the larger research process.

In the case of EB vs. Wikipedia, the comparison is pretty clear, because both resources serve very similar functions (“starting research” or “getting general background information”). And I totally agree with you that Wikipedia is more successful, for the reasons you mention. One concern I have is that EB does have some values that are perhaps secondary to the portal function as you have defined it: EB probably still has a slight edge in authoritativeness, EB serves a “canon-forming” function, and related to canons EB is useful for “tracking changes” across time (although Wikipedia beats the pants off the online EB in the track changes department). And if nothing else, as a researcher it’s valuable to have more than one general knowledge portal (where do they agree and disagree, how do they spin various concepts, what are the biases of the authors). Still, like you I don’t know if the marketplace can sustain both (especially given the fees EB is asking for) and I’m sure we don’t have the political will to help save EB as a “public good.”

When it comes to the library portal vs. Google/Wikipedia/open Web, for me the comparison gets a lot more difficult, mainly because Google and the library portal aren’t as similar in function as EB and Wikipedia. I agree with you that the library is weak as a “portal destination at the primary level” but I’m not sure how much value I place on being at the primary level. For many users, yes, Google is probably always the starting point (and the ending point, which might be a bigger problem). Most researchers I work with have already Googled their topic to death and then come to me because they’re stuck, or want more, or want the specialized tools the library offers, or want confirmation that something doesn’t exist (lots of that last one, actually). I’m not so sure this is a lot different than pre-Google days: research then and now often begins with conversations at conferences, meetings between student and adviser, browsing in book stores, following citations from other readings, etc. And when you start talking about more advanced researchers, I think there are in fact cases where the library is the primary portal and it just depends on the researcher, the topic at hand, or the current method of research. There’s a lot of research coming out finding heavy readers use both e-texts and paper and I would hypothesize the same goes for research portals: serious researchers Google a lot and also use the library portal a lot.

So I think I would add “systematic in-depth research” to special collections and people resources as part of the library’s value proposition. And maybe we need to think of the library portal saying to users “continue your research here” or “get deep in the weeds with your research here.” Not sure how to brand that! I don’t know. Are we setting ourselves up for failure by trying to compete with Google as a starting point for research? Can we clarify what KINDS of research the library portal succeeds with?

  • Steve Henry
  • Apr 20, 2012, 1:44 PM

You make some very good points here, Steve. To respond to just a couple of them:

When it comes to the library portal vs. Google/Wikipedia/open Web, for me the comparison gets a lot more difficult, mainly because Google and the library portal aren’t as similar in function as EB and Wikipedia.

Exactly, and their functional dissimilarity is my point. I think a real problem arises when we encourage students to use the library portal as if it were Google (which is what we’re doing whenever we say “start your research here”). The library portal is arguably a poor place to start your research, because it can’t answer very well the most fundamental question: “is there such a thing as a document on Topic X?”.

There’s a lot of research coming out finding heavy readers use both e-texts and paper and I would hypothesize the same goes for research portals: serious researchers Google a lot and also use the library portal a lot.

My impression is that this is exactly right. The problem, for academic libraries, is that serious research use isn’t enough to keep us in business. Our bread-and-butter constituency–in terms of sheer number–is not serious researchers, but undergraduates. So one of our big challenges is (and always has been) trying to configure our services and value propositions in ways that will serve a whole range of users equally well.

And maybe we need to think of the library portal saying to users “continue your research here” or “get deep in the weeds with your research here.”

I like that formulation a lot, and I agree that the branding/marketing challenge for it is significant. But it might be fun to try to meet it. How about a poster with a picture of a stupefied kid standing behind three chest-high piles of search result printouts and the caption “Had enough yet?”. (That would seem to go against my earlier argument that researchers aren’t as overwhelmed by search results as we like to tell ourselves they are, but in this case that’s beside the point–here the point is to quickly and engagingly communicate the idea that “we’re here to help you tighten and deepen your research focus” or whatever.)

  • Rick Anderson
  • Apr 20, 2012, 2:28 PM

