Rediscovering Discovery - How We Find Things, and Its Implications

I recently attended the Midwinter ALA conference in Dallas. The longest line was at Starbucks.

What brought me to Dallas was an invitation to participate in a panel sponsored by Sage on discovery. Sage had commissioned a white paper on the topic, which you can find here. It’s a good paper, which I recommend to one and all. The authors of the paper were on the panel as well and spoke authoritatively to the question of how libraries are implementing new means of discovery. I had not realized how complicated the situation was. What was clear to me, though, is that librarians and publishers alike have an interest in improving finding mechanisms: librarians because they want their collections to be used, publishers because such usage translates into a stronger brand, which can help in making the case to purchase the next product.

I should note that discovery is one of two words in scholarly communications (the other is curation) that leave me not entirely satisfied. This has to do with where you imagine yourself sitting in the value chain. My bias is on the publishing side, where capital is injected into the system. For a publisher, the better terms are marketing and editing. My definition of discovery is what happens after the marketing is successful. Discovery is about finding things, but marketing is about coaxing people into finding the things that you want them to. You discover things in many ways; for example, you might type some keywords into Google. Marketing is what put those words into your head in the first place. I Googled the Web to find out the schedule of the latest episode of Downton Abbey, but I went looking for that schedule because of the immense and brilliant media blitz PBS brought to that property. Discovery is indeed a librarian’s concern; it’s what you do after the product has already been purchased.

My beef with curation is of the same order. Editing is a creative act. Editors work with authors, identifying what is good, breakthrough, and promising, and assisting in the shaping of the work. Curation is, like discovery, an after-the-fact activity: once you have something in your collection, how do you care for it? There is nothing wrong, and everything right (I am trying to forestall all the nasty comments), about discovery and curation, but they represent a library-centric way of looking at things. We need libraries, we need bigger libraries, and we need more libraries. And, by the way, we also need more librarians. A publisher-centric way of viewing the world, in contrast, would lead us to declare that we need editorial selection, that we must reject many more works than we publish (hurrah!), that we will stick our necks out with a capital investment to back the works we selected, and that in order to earn that investment back and then some, we need to bring the works to everyone’s attention. All eyes turn to the trembling head of marketing.

There is an elephant in the room — or perhaps just a multi-volume encyclopedia — when we talk about discovery, and that is that more and more libraries are reporting that over half of all searches on their collections are coming from Web-scale search engines, primarily Google and Google Scholar. So the issue of discovery inside the library is also a matter of discovery outside the library. This is where things get complicated. Publishers, seeking the attentions of Google, drop a handkerchief in the form of metadata and sample text on the open Web, but if the purchased content resides in a scattered manner among the world’s libraries, how best to direct a user to the place where he or she has unfettered access? There are mechanisms for this, but cooperation among publishers and librarians is essential to ensure that users leave the table satisfied that rich content has been served.

All this has put new pressure on creating stronger discovery engines within the library context, not to compete with Google on the open Web, but to point users precisely at the content to which they have authorized access. It may be that Google will still be the predominant agent of discovery, but local search tools have the advantage of only showing users what is theirs to view. A representative from Summon, a service of Serials Solutions, participated on the ALA panel, but there was also mention of EDS (EBSCO) and Primo. These services and others are now competing to be the primary means by which users find things in library collections.

Search, whether in-library or out, is a winner-takes-all game. If one search engine gets ahead of the others in terms of market share, it can take the greater amount of information gained from that larger share to improve the search service. An improved search service is likely to win over new libraries and users, which will in turn improve market share further. This is known as the law of increasing returns. While it is unlikely that anyone will achieve a 100% market share, we should be prepared for the day when the bulk of library collections are mediated by the search service of a single commercial vendor, a vendor (like Serials Solutions/ProQuest and EBSCO) that may be tempted to steer search to properties under its own control. Local discovery tools, in other words, could evolve into marketing tools. Am I finding the best document to answer my question or am I finding the document in which the search engine’s management has an interest? Already some people are whispering about a lack of impartiality on the part of some discovery tools (I have no evidence of such bias myself), and one can be sure that all the discovery services are going to be closely scrutinized, especially by the publishers who feel that they are not getting their due.

Library discovery, in other words, is not simply a matter of means to satisfy users; it is also a competitive battlefield.

We should look for discovery services with a big market share to begin to think about the value of their aggregate data. This could become a new business in itself. When you provide search services for one institution, you collect usage data for that one institution. But when you provide such services for one thousand institutions or more, you can compare the research activities across institutions. Let’s take a leap here and imagine a service for information usage that is analogous to the stock market indexes assembled by Standard & Poor’s. My institution prides itself in competing favorably with that other institution with the grander name, but the new information index provides real data on what is used, how it is used, and, very importantly, how much it is used. This cross-institutional data (which is owned by commercial entities) thus becomes a proxy for the research activity of the university itself. So we may see a day when we routinely rank universities by how they score in the Summon or Primo Information Index, cousin to the S&P 500. Provosts may want to take a look at this and may instruct their libraries to subscribe to the new service. Thus collection management spawns new services, which put an incremental tax on the library’s budget.

Like Columbus, we discover nothing that has not been discovered before. But in our rediscovery, we add new layers of meaning to the process, out of which come new products and services.

A footnote to this discussion: the Elsevier ALA reception was in the unfortunate situation of competing with the PBS broadcast of Downton Abbey, which aired at 8:00 P.M. Central Time. As librarians left the Elsevier dessert hall (adorned with two live bulls — this was Dallas, after all) to get back to the TVs in their hotel rooms, we saw yet another instance of the consumer market’s intrusion into academic publishing.

Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.

Discussion

9 Thoughts on "Rediscovering Discovery — How We Find Things, and Its Implications"

Joseph: Can you expand on this?: “….over half of all searches on their collections are coming from Web-scale search engines, primarily Google and Google Scholar.”

Are these digital collections of, say, PDFs that users can then download anywhere in the world? GS often offers such, but there is no local access issue. The user is not led to the library. I think Google offers a local search capability but that does not sound like what you are talking about. If either of these engines led the user to stuff that was only physically available in the library I would think that was a useless hit, as the chances that the user is near the library, or wants to arrange for a transfer to their library, might be quite small.

By David Wojick
Jan 30, 2012, 7:35 AM

Technically, Google Scholar doesn’t provide collections of anything. What it does is lead the searcher to documents that are available online — some of which are full-text articles that are publicly available, some of which are simply metadata records (such as citations and abstracts). So Google provides both an access portal (I know, I know, the word “portal” is so 2001) and a “discovery” mechanism by which searchers can discover that a document exists and that it may be worth seeking out. That last function actually is a useful one. Even if the document is only available physically in the library, knowing that a relevant and potentially useful document exists in some format is certainly better than not knowing it exists.

Also: Google Scholar actually does have the ability to lead users to their libraries. One of the available user preferences in GS allows you to specify your library and thereby create a special display for items in your search results that are available to you by virtue of that affiliation.

By Rick Anderson
Jan 30, 2012, 4:57 PM

Joe,

You might want to look at WorldCat Local and other discovery options provided by OCLC; they aggregate library collections on a huge scale.

By P Martin
Jan 30, 2012, 12:52 PM

One of the problems with Google Scholar is that it only covers scholarly papers and won’t cover scholarly books – yes, won’t. We’ve recently been in touch with Google about this and they tell us they only want to have books in their books service. This is hardly user-centric thinking.

By Toby Green, OECD Publishing
Jan 31, 2012, 5:19 AM

The question of How Users Navigate to content is one that I and my colleague Simon Inger (one of the sources cited in the Sage White Paper) originally addressed in a report circa 2005. It is still available on his web site in its latest 2008 edition (with Tracy Gardner as co-author) here: http://www.sic.ox14.com/publications.htm

More importantly, while it is unsurprising that large numbers of librarians should be eschewing the delights of two live bulls at an Elsevier dessert hall in favour of Downton Abbey, I do think that the general anglophilia of American librarians for all things British and crusty (from which I have benefited considerably over the years) needs to be put in perspective with some understanding as to how we Brits see Downton Abbey.

The best insight into our view is available here:

http://youtu.be/r5dMlXentLw Part 1
and here
http://youtu.be/p3YYo_5rxFE Part 2

For a more in-depth analysis of the history of Anglo-American relations I thoroughly recommend Christopher Hitchens: Blood, Class and Empire: The Enduring Anglo-American Relationship

By Chris Beckett
Jan 31, 2012, 12:06 PM

The videos are hilarious. Thanks for the pointers. But don’t be too hard on us Yanks. We admire the Brits for bringing us Monty Python as well as Jane Austen, the Beatles as well as Lady Diana. And Henry James migrated there: What better recommendation?

By Joseph Esposito
Jan 31, 2012, 12:40 PM

Google Scholar is listed on many libraries’ websites as another choice among the indexes and databases. It can be listed right alongside the more traditional choices. If a library associates its collection (electronic journals) with GS via its link resolver, the user who is authenticated (current faculty, staff or student, for instance) uses Google Scholar to do their search. The authenticated searcher is then taken directly to the display of results, which include a text version of the link resolver, maybe saying “full text here.” Clicking on the full text icon brings the searcher directly to that library’s subscribed material, displaying the full text article. So, the step beyond the “preferences” is to partner with Google Scholar to bring the library’s holdings up front in the display, and with full text easily accessible via the link resolver, it is certainly a popular choice in libraries that use the program. Libraries can assess usage locally.

GS works extremely well and serves to push referrals to subscribed resources, thereby increasing their usage. For materials not subscribed, GS collects all the versions of an article together, pushing use of open access resources as well. Here’s the description of the program:

http://scholar.google.com/intl/en/scholar/libraries.html

I would think uptake of this program is by now fairly high, at least among research libraries. Google Scholar’s appearance on library websites’ lists of indexes and databases has been growing.

Libraries enabling the LibX toolbar also have their link resolver show up in other places on the web and can enable a Google Scholar search as well. Many libraries have the LibX toolbar installed, allowing highlighting of references within text, dragging and dropping into Google Scholar via the “magic button,” and then taking the searcher to full text if available. The toolbar is also popular with users and aids discovery, as well as pushing use of library resources.

http://www.libx.org/

By Laura Bowering Mullen
Jan 31, 2012, 12:39 PM

Great post, Joe, and I appreciate the link to the Sage white paper, as you know. I do have a bit of a disagreement with you about the intersection between “discovery” and “marketing” in the scholarly context, which is mainly what you’re talking about. In scholarly research, the “marketing blitz” is not the true driver of discovery. Your example of “Downton Abbey” holds more weight in the “trade” side of scholarly books, definitely: how do you get people to buy the Mark Twain autobiography, and where do they find it? In that case, absolutely, marketing is the driver. Another example where marketing is a driver, perhaps again the main driver, is making sure that experts on a topic are aware of new books in their area of expertise. Otherwise, though, researchers are going to look for scholarly works on a given topic when and only when they need to delve into the topic in question. If you are doing research on Latin American social movements, consulting databases such as Google Scholar, Ebsco Discovery Service, Serials Solutions’ Summon service, et al., it’s the research topic that’s the driver, and the publisher’s role here is making sure the metadata is robust enough to connect THEIR book to YOUR search at the top or near the top of the rankings, instead of on the 32nd page of search results (or even worse, their book is not in the database at all). In this case, in my opinion, the key for the publisher is supplying quality metadata to the right channel, ensuring that the book is indexed for full text search, AND tying the search result into the supply chain, such as patron driven acquisition or some other acquisition model.

By John Warren
Jan 31, 2012, 6:43 PM

The Scholarly Kitchen
