Three years ago this week, my friend and guest blogger Julie Zhu of the IEEE posted an expertly written overview of the tangled web of metadata that flows through the pipelines of scholarly communications. In particular, Julie highlighted the gaps in our supply chains that undermine the accuracy and quality of library discovery services, catalogs, and other pathways to institutional usage of publisher platforms and databases.
We have seen a number of indicators that posts like Julie’s have inspired change in our industry when it comes to the scholarly metadata supply chain:
- Library discovery service vendors are holding more routine meetings with publishers to address metadata gaps, enhancements, and standards compliance.
- Nearly 40 publishers have staff positions focused on improving content discovery and publication metadata. As Julie notes, “we have now formed a Google Group for Publisher Discovery Contacts, where we can share information about content discovery and management.”
- Libraries are increasingly willing to do their part in configuring and iteratively improving the metadata ingested by discovery systems; in fact, nine libraries have now released statements aligned with NISO’s Open Discovery Initiative (ODI).
- And, ODI has released its second installment of recommended practices for librarians, publishers, and indexers.
Let this reflection serve as a reminder to us all: Healthy metadata pipelines rely on each player in the value chain doing their share and ensuring our shared users have access to highly accurate and timely scholarly information.
Building Pipes and Fixing Leaks: Demystifying and Decoding Scholarly Information Discovery & Interchange
Editor’s note: Julie Zhu is Manager of Discovery Service Relations for IEEE, “the world’s largest organization dedicated to advancing technology for humanity”. She works with internal teams, library service and search engine providers, libraries, and other parties to make the IEEE content more discoverable, linkable and accessible from external discovery channels. She has been active in NISO since 2010, currently serving in the Information Discovery & Interchange Topic Committee, ODI Standing Committee and KBART Automation Working Group.
Information discovery and access is often a source of frustration for scholarly information stakeholders. Feeling fed up with the many stumbling blocks in web-scale discovery service tools, some are chasing “new” discovery solutions or supercontinents of scholarly publishing, even though these new solutions may not improve search or retrieval. Believing that web-scale discovery has passed its heyday, some are focusing on authentication products that bypass paywalls. Still, some librarians are promoting library branding by enhancing their own library discovery pages and making content in institutional repositories and archives more discoverable. The landscape of scholarly information interchange is becoming more complex, more fragmented and more siloed.
Wherever researchers start their content discovery, no solution can promise a wonderland where all discovery, linking, authentication and access problems magically disappear. As scholarly communication professionals, we need to demystify and decode the channels of scholarly information discovery and interchange. It may help to imagine these discovery channels as a complex network of content pipelines, connecting various electronic systems, from author research and manuscript submission, to online publishing, syndication, indexing, linking, authentication and much more. Like any plumbing system, these data pipes can leak, break or be blocked in many possible ways.
For example, authors who format their names incorrectly upon submission (leaving them uncapitalized, say, or including math symbols) may find their names left out of search results in Google Scholar. Content pipelines inside a publishing organization may break if one group decides to change the format of one data element, a column in a database table, or a simple workflow without informing other groups. External content feed pipelines may break if an FTP transfer fails on either the sending or the receiving side without triggering a warning. Content pipelines may be blocked when a discovery service vendor does not index received content in a timely way, maps content incorrectly, fails to provide a link, or produces a broken OpenURL link. The largest blockages may come from libraries that inadvertently turn off the faucets, by not enabling the correct collections in a discovery index, not selecting the correct title list targets in a link resolver knowledge base, or not putting the discovery tool or databases under proxy for off-campus users.
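The author-name leak described above is the kind of defect that can be caught with a simple automated check at submission time. What follows is a minimal sketch in Python, not a description of any publisher's workflow or of Google Scholar's actual indexing rules. The function name `author_name_issues` and the two checks it performs are illustrative assumptions.

```python
import unicodedata


def author_name_issues(name: str) -> list[str]:
    """Return a list of formatting problems found in an author name.

    Hypothetical helper: the rules below are assumptions for illustration.
    """
    parts = name.split()
    if not parts:
        return ["empty name"]

    issues = []
    # Flag any name part that starts with a lowercase letter, e.g. "julie zhu".
    if any(p[0].isalpha() and p[0].islower() for p in parts):
        issues.append("name part not capitalized")
    # Flag characters in the Unicode "Symbol, math" category, e.g. "+" or "=".
    if any(unicodedata.category(ch) == "Sm" for ch in name):
        issues.append("contains math symbol")
    return issues


# Example: collect names that a person should review before publication.
submitted = ["Julie Zhu", "julie zhu", "J. Zhu+"]
flagged = {n: author_name_issues(n) for n in submitted if author_name_issues(n)}
```

A check like this is best used to flag names for human review and not to reject them outright, because legitimate names can include lowercase particles such as "van" or "de".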
Stakeholders in the scholarly information content supply chain need to design and build effective content pipelines to find and fix content leaks, breaks and blockages. NISO’s Open Discovery Initiative (ODI) is a committee of librarians, content providers and discovery service representatives dedicated to enhancing scholarly information discovery through greater collaboration across the community. ODI engages all parties in the discovery chain, for web-scale discovery services like EDS, Primo, Summon and WorldCat Discovery, to ensure transparency and freedom of choice through rich metadata inclusion, resource interoperability, statistical consistency, and link customization and optimization.
In 2018 ODI published The ODI Implementation Guide for Content Providers, to help content providers conform to the ODI Recommendations issued in 2014. As a checklist developed out of “Should publishers work with library discovery technologies and what can they do?”, the Implementation Guide is a roadmap to help content providers detect and fix discovery-related problems. Although the focus is on web-scale discovery tools, the methodologies can be used for other search channels as well. The following sections summarize recommendations from the Implementation Guide and provide examples of how content providers, discovery service providers and libraries are working together to build better content pipelines and fix leaks.
- Dedicate resources towards content discovery
ODI recommends content providers allocate resources to content discovery, based on the organization’s needs and resource availability. They can designate a discovery service lead to coordinate internal team projects and manage external relationships with vendors, libraries and other industry stakeholders. They can form a cross-functional discovery task force, including members from content and technical units that create content, manage databases, deliver syndicated content, and develop software, as well as from business units that interface with library customers, manage online platforms, and create analytic reports. Since not all the staff are customer-facing or understand the importance of discovery and the downstream complications, it is important to secure cross-divisional management buy-in, for projects like improving internal data quality and flows, complying with industry best practices and collaborating with vendors and libraries. Communication is a key component, internally and externally.
In 2013, IEEE assembled a three-person team to look for problems in discovery services, work with vendors and create discovery service and knowledge base configuration guides for libraries. After determining that distributing this work across multiple teams and people wasn’t effective or efficient, IEEE created a position of a dedicated Discovery Service Relations Manager in June 2014. The remit was to conduct deeper analyses of the problems, create roadmaps for overall improvements and coordinate internal and external resources for resolving issues. In 2015, with support from the IEEE management, a nine-person Discovery Service Working Group was formed, to communicate regularly on detected issues, ongoing projects and ways to collaborate with vendors and library customers.
More publishers, of various types, sizes and focuses, are devising ways to allocate resources to content discovery. Since 2015 at least eight publishers have created job titles including the word “Discovery.” The publishers are Gale, SAGE, Oxford University Press, Emerald, Springer Nature, Bloomsbury, SAE International and IOP Publishing. In dozens of other organizations, such as Wiley, Cambridge University Press, Taylor and Francis and SPIE, staff, mostly in the departments of Product Management, Marketing, Sales and Library Support, have volunteered to take on extra responsibilities on content discovery. Lacking internal expertise, some publishers hire consultants to carve out a content discovery strategy before creating a dedicated position.
- Assess content gaps and leaks
ODI advises that content providers conduct some testing for the discovery, linking and authentication of their content in the major discovery service tools and decide whether further systematic investigations are warranted. “Collaborating to Reduce Content Gaps in Discovery” demonstrates how IEEE assesses content gaps in the syndication feeds delivered to indexing partners, the indices of four major discovery service tools and twenty-four library configurations of the discovery tools. It also analyzes the causes of those gaps and suggests ways to close them.
Some other publishers have started systematically assessing content gaps and leaks. Oxford University Press is investigating content gaps for several key products and trying to learn more about how content types are categorized across discovery services. Emerald Group Publishing regularly audits major discovery indices to spot systematic issues affecting coverage and monitors web traffic from different library discovery services to detect service-wide issues affecting multiple customers.
It is understandable that each publisher needs to find its own ways to assess content gaps and leaks, because each may have its unique content types, metadata challenges, workflows and connections to discovery tools and library systems. It is also an ongoing process, as new content and collections continue to be added to publishing platforms and syndications. To refine content discovery and retrieval techniques, publishers may need to collaborate internally, as well as externally with discovery, link resolver and authentication tool vendors and mutual customers that deploy them.
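One simple way to picture a content gap assessment is as a comparison between the identifiers a publisher delivered and the identifiers a discovery index actually holds. The sketch below is an illustration under stated assumptions, not the method used by IEEE, Oxford University Press or Emerald. The function name `content_gaps`, the use of DOIs as the matching key, and the sample DOIs are all hypothetical.

```python
def content_gaps(delivered: set[str], indexed: set[str]) -> dict[str, set[str]]:
    """Compare DOIs delivered in a feed with DOIs found in a discovery index.

    Hypothetical helper: real audits may need to match on other keys
    (ISSN, ISBN, title) when DOIs are missing.
    """

    def normalize(ids: set[str]) -> set[str]:
        # DOIs are case-insensitive, so lowercase them and trim whitespace.
        return {i.strip().lower() for i in ids}

    d, x = normalize(delivered), normalize(indexed)
    return {
        "missing_from_index": d - x,   # delivered but not discoverable
        "unexpected_in_index": x - d,  # indexed but absent from the feed
    }


# Example with made-up DOIs: one delivered item never reached the index.
feed = {"10.5555/A1", "10.5555/a2"}
index_export = {" 10.5555/a1"}
gaps = content_gaps(feed, index_export)
```

Running the same comparison for each discovery service, and again for each library configuration, would show where along the pipeline an item drops out.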
- Improve workflows and conform to industry best practices
ODI encourages content providers to improve data quality and workflows and conform to industry recommended best practices, such as ODI, KBART II and ALI. Conforming to industry recommendations is no easy task. Content providers may find that they need to do remediation work on improving metadata quality, assembling needed metadata elements, increasing content inventory and even changing workflows. These projects can be complex, time-consuming and costly, and it may prove difficult to demonstrate an immediate return on investment. It is important to keep educating publishing staff about industry best practices, secure cross-divisional management buy-in and find ways to show value for work on discovery-related issues.
In June 2015, Credo, Gale, IEEE and SAGE were the first four publishers to declare conformance with NISO’s Open Discovery Initiative. Since then five more have completed ODI Conformance Statements. ODI encourages more content providers to declare ODI conformance. In “A brief history of the Open Discovery Initiative,” Rachel Kessler, Co-chair of ODI, assures content providers that the goal of ODI is transparency, not perfection. Declaring conformance means that the “organization is honest and forthcoming” and is making plans “to improve upon the areas where they are not yet perfect.” This ODI implementation guide for content providers also intends to help content providers, as well as other stakeholder parties, to overcome some of the hurdles.
- Collaborate with library service providers
Content providers may develop working relationships with each of the major discovery service, link resolver, authentication product and search engine vendors and set up regular meetings or other channels of communications. The collaborations can cover a wide range of topics, such as assessing content gaps, delivering content feeds, indexing new content, filling in missing content, improving results ranking, setting up link resolver targets, creating MARC Records, improving authentication, diagnosing usage reports, creating and updating configuration guides, resolving mutual customer issues and more.
Several discovery service providers have dedicated staff to working with publishers to ensure publisher content is best represented in their services. EBSCO has opted to create more EDS Partner Databases and, to increase global reach, has made the EDS Partner Database Questionnaire available in 11 languages. EBSCO Discovery Service – Publisher Guides lists links to EBSCO Discovery Service Quick Reference Guides and Full Text Finder Quick Reference Guides for over a dozen publishers. ProQuest’s Publisher Relations Engagement Managers are having regular conference calls with several dozen publishers, often scheduling several calls a day. The calls cover a wide range of topics. Abigail Wickes, Senior Library Discovery and Information Analyst of Oxford University Press, finds these relationships with discovery partners “have been incredibly valuable and have allowed OUP and discovery partners to be much more collaborative in troubleshooting and supporting linking and access.”
- Work with libraries
ODI proposes that content providers interact more with libraries and understand the librarian roles, functions, tools and workflows. Content providers can develop systematic ways to track, troubleshoot and respond to library inquiries about discovery, linking and access problems. They can also proactively detect, resolve and prevent discovery-related problems through creating and promoting system configuration guides, conducting discovery auditing and training support staff and librarians on discovery troubleshooting.
Since 2015, IEEE staff have conducted discovery workshops in the UK, Ireland, Germany and Austria, created technology profiles for hundreds of library accounts, audited discovery and link resolver configurations for subscribed IEEE Xplore content by over 1,000 libraries and reached out to librarians to help make corrections. Dominic Benson, Analytics & Discovery Officer at Brunel University London, attended an IEEE discovery workshop in 2015 and found it very useful in identifying missing IEEE holdings. He has used IEEE’s auditing approach to improve his library’s overall holdings in the knowledge base. He appreciates IEEE’s including proactive discovery auditing as part of the post-sale support service and considers the library-publisher relationship one of the key factors when considering renewals.
We need your help!
Building effective content pipelines and fixing leaks to ensure better scholarly information discovery and interchange is not easy. It takes patience, persistence, and perspiration.
- It asks organizations to develop a culture that appreciates data quality and flows, and to be willing to invest in resources and initiatives that benefit information interchange downstream and so improve researcher experiences;
- It calls for collaborations among stakeholders along the content supply chain, i.e., publishers, vendors and libraries, to design better architecture for electronic systems, tools, platforms and content pipelines; and
- It requires stakeholders to do better data plumbing, i.e., developing better routines to check for defects and fix leaks.
We invite more organizations and individuals to join the efforts to improve scholarly information discovery and interchange. We encourage you to participate in NISO’s standards development activities and utilize NISO resources such as those published on the NISO ODI website. We invite publisher discovery leads to join the Google Group for Content Discovery Managers, to network with colleagues facing similar challenges.
Julie Zhu would like to thank Abigail Wickes, Lola Estelle, Michael Roberts, Dominic Benson and Benjamin Johnson for responding to her questionnaires and sharing their insights. She appreciates helpful feedback from Marjorie Hlava, Scott Bernier, Prakash Bellur and Tiffany McKerahan. Special thanks go to Lettie Conrad for guiding and supporting this article throughout its multiple iterations.
3 Thoughts on "Revisiting — Building Pipes and Fixing Leaks: Demystifying and Decoding Scholarly Information Discovery & Interchange"
Thank you for your article. We have been discussing data statements and the need to have consistent placement and wording so that they can be found in research outputs. I wonder if NISO or some other cross-stakeholder group is actively pursuing this. A Research Data Alliance Interest Group is also active in this area: https://www.rd-alliance.org/groups/data-policy-standardisation-and-implementation-ig. It would be great to have better ways to encourage authors to include clear data statements consistently.
Hi Valerie, NISO is always interested in new ideas for standards so, if there’s strong community interest in this, you are very welcome to submit a work item request for consideration by the relevant Topic Committee. My colleague Nettie Lagace, who runs our standards program, can provide more information on this process, including a template you can use for pulling together the information we need. You can reach her via our office email nisohq[AT]niso.org. Thanks!
Now that I’m no longer at OUP (the discovery work is in the very capable hands of a former colleague!) and working at an academic library, I can’t say enough about the importance of content providers adequately resourcing content discovery work and collaborating with discovery services. It really requires a team to make meaningful improvements in data transfer, and many publishers only have one full time person devoted to this work. It’s also so important for content providers to collaborate directly with discovery services; we work with at least one publisher who refuses to get involved in troubleshooting with EZproxy, ProQuest, etc., and it puts the library in a very frustrating middleman position. Thanks for highlighting the importance of this work Julie and Lettie!