With the recent surge in library e-book sales, serials aggregators are racing to add e-books to their platforms. ProQuest’s recent acquisition of ebrary and JSTOR’s expansion into current journals and e-books signal a shift from standalone e-book and e-journal aggregator platforms to mixed content gateways, with e-books and e-journals living cheek by jowl in the same aggregation.
Meanwhile, researchers have become accustomed to the big search engines, and have shifted from reading to skimming. As the authors of an article in the January issue of Learned Publishing, “E-journals, researchers – and the new librarians,” summarize:
Gateway services are the new librarians. . . . Reading should not be associated with the consumption of a full-text article. In fact, almost 40% of researchers said they had not read in full the last important article they consulted. . . . ‘Power browsing’ is in fact the consumption method of choice.
These changes in behavior mean that gateway vendors have to develop more sophisticated tools for organizing and surfacing content. ProQuest, OCLC, EBSCO, and others have responded by creating new tools and systems. But is it enough?
Publishers often discuss distinctions between e-book and e-journal business and access models, but the truly complex differences in e-books and e-journals reside beneath the surface, in the metadata layer. Understanding and compensating for these differences is essential for interoperable content discovery and navigation when mixed e-book and e-journal content is delivered in large-scale databases, which is increasingly the norm.
Until the evolution of semantic technologies reduces our reliance on catalog and bibliographic records for information discovery and contextualization, nothing supports research discovery better than pristine, consistent, and granular metadata.
As discussed in a recent opinion post in The Atlantic Wire, consumers are recognizing the drawbacks of Google-style search. Google searching is especially inadequate for researchers given its:
- Susceptibility to search engine optimization gaming
- Reliance on linear ordering of result sets
- Lack of transparency about resources that are not included in the searched information and/or not prioritized by the search algorithm
- Inability to provide contextualized information — the “shape of the elephant” of what is being sought, based on piecemeal queries
Networked metadata layers offer new ways of navigating and linking content, ways that avoid these pitfalls. But they have their own challenges.
Lack of consistent record quality — When dealing with such large volumes of information, there’s little incentive for individual publishers to invest in manually overhauling their metadata. Most publishers don’t create their own records; they rely on clearinghouses — OCLC predominantly — to manage record creation for them. Inconsistencies and errors in records create enduring problems, and locating and repairing them can be a daunting task.
E-book records are arguably more problematic than their more consistent e-journal brethren. Few publishers have a detailed appreciation of the challenges involved in creating durable links to e-book subsections such as chapters or entries. Even at the book level, I have heard from ARL librarians that they refrain from populating OpenURL resolvers with e-book MARC records in order to avoid the broken links and version confusion that result from dirty data.
Need for more granularity — As researchers increasingly seek smaller, more specific content units, quality metadata assignment at the entry or chunk level becomes even more important. Ideally, metadata will support durable linking via a digital object identifier (DOI) and provide hierarchically structured subject information, of the kind contained in a well-formed e-book MARC record or e-journal bibliographic record.
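To make the idea concrete, here is a minimal sketch of what a chunk-level record combining a durable identifier with a hierarchical subject path might look like. The field names and values are hypothetical, not any real vendor's schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChunkRecord:
    """One chapter- or entry-level metadata record (illustrative only)."""
    doi: str                       # durable link to the chunk itself
    container_isbn: str            # the parent e-book
    title: str
    subject_path: List[str] = field(default_factory=list)  # broad -> narrow

    def broadest_subject(self) -> str:
        """Top of the subject hierarchy, useful for faceted browsing."""
        return self.subject_path[0] if self.subject_path else ""

# A hypothetical chapter-level record (DOI and ISBN are placeholders)
chapter = ChunkRecord(
    doi="10.9999/example.ch03",
    container_isbn="978-0-00-000000-0",
    title="Chapter 3: Survey Methods",
    subject_path=["Social Sciences", "Research Methods", "Surveys"],
)
```

The point of the hierarchy is that a gateway can aggregate chunks at any level of the path, not just at the whole-book level.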
Networked data nodes can effectively drive dynamic discovery and access for e-books, e-journals, and other content formats, including multimedia. Is there enough clean data at scale to support this? No.
Overhauling old records, investing in more granular record creation, and cross-matrixing MARC, DOI, and bibliographic records is a massive endeavor requiring significant investments of money and time. Done right, the process can significantly improve interoperability and navigation on discrete publisher platforms. But for now, this looks like a competitive advantage for some rather than a universal benefit.
Data layers must be populated with comprehensive, discipline-specific taxonomies and clean metadata. For purposes of discovery and navigation, data nodes should contain MARC and e-journal bibliographic record match points as well as durable linking locations. While some individual publishers may undertake this, as Springer has, it’s untenable in the mid-term for most non-STM publishers.
One anticipates that dynamic, automated processes for metadata creation will become the norm. If Narrative Science can generate text from data, surely the reverse is within our grasp.
We can look forward to semantically generating reliable, structured metadata on the fly from text and image content chunks. STM e-content companies are the most likely candidates to blaze trails with metadata creation “engines” that learn to read and interpret content chunks in order to produce descriptive data in real-time with increasing accuracy.
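As a deliberately crude illustration of the direction, the sketch below derives a tiny descriptive record from a raw text chunk using term frequency. A real metadata engine would use trained semantic models rather than word counts; everything here, including the output fields, is a toy assumption:

```python
import re
from collections import Counter

# Minimal stopword list for the sketch; a real system would use a full one
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def describe_chunk(text: str, top_n: int = 3) -> dict:
    """Produce a tiny descriptive record from a raw text chunk.

    Toy term-frequency stand-in for the semantic extraction the text
    anticipates; the returned fields are illustrative, not a standard.
    """
    words = re.findall(r"[a-z]+", text.lower())
    terms = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return {
        "keywords": [w for w, _ in terms.most_common(top_n)],
        "word_count": len(words),
    }
```

The gap between this sketch and "descriptive data in real-time with increasing accuracy" is precisely the learning problem the paragraph above describes.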
However, until the technology learns to learn more accurately, there will continue to be problems with the scale, interoperability, and consistency of metadata in large content gateways. Discovery of mixed-type research content will remain suboptimal and incomplete.
Semantic research gateways are the next wave, but they will work best when informed by the traditions of librarianship. There is critical value to be gleaned from time-tested practices for describing and organizing multidisciplinary subject matter. Metadata librarians have long wrangled with consistency issues. Reference and subject librarians — particularly those working with LibGuides — grasp context better than most.
The most promising solutions will make the most of innovative technologies while also mining specialized institutional knowledge.