As we welcome autumn, we reflect on this summer’s virtual AUPresses Annual Meeting, where book publishing experts came together to discuss what it takes to ensure the discoverability of scholarly books, particularly for the mainstream web. We began by sharing the outcomes of our Crossref-sponsored study measuring the impacts of books metadata and how various stakeholders are responding to these insights. Although Google Scholar claims to not index DOIs and associated data points, we found that books with DOIs overall perform better, likely because they benefit from widely distributed structured, open metadata. However, DOIs alone are not enough to ensure optimum discoverability of books in Google Scholar, which relies on metadata from publisher websites, platform providers, and Google Play/Books.
In this study, we found that scholarly metadata standards have an indirect impact on open-web discoverability and users’ experiences with mainstream search. Specifically, we found that Google Scholar operates outside our industry’s established metadata supply chains, and therefore publishers must treat it with channel-specific strategies for constructing and distributing metadata. We found a structural disconnect between Google Scholar expectations and publisher practices, most clearly seen in Scholar’s preference for indexing books where each chapter has a unique author (akin to a journal model) and publishers’ lack of controlled book-type classifications.
With these findings in mind, we convened a panel to reflect on these outcomes and ponder how to best measure the return on metadata investments with the AUPresses community, a notable contributor to the world’s catalog of scholarly books. This post covers the highlights of our panel conversation, which included the following experts:
- Allison Belan, Director of Strategic Innovation and Services, Duke University Press
- Paul Crook, Manager, Business Systems, MIT Press
- Stephanie Dawson, CEO, ScienceOpen
- Sandra Shaw, University of Toronto Press (UTP)
Reactions to the study
Sandra: I was not surprised to learn that standardized metadata, like DOIs, affects discoverability, although I was a bit surprised that the impact was found to be somewhat limited for Google Scholar. DOI construction and assignment are key to UTP’s publishing workflows and processes, and we will continue to treat them as a key component of our discoverability strategy, albeit with greater impact in other search engines and information channels.
Paul: The MIT Press has certainly enjoyed many benefits from using DOIs for our books. I think your study demonstrates that publishers who take the trouble to assign DOIs also invest in other types of metadata that enable wider dissemination of their content. Assigning a DOI once to a publication allows for updating data points, like the URL, if a publisher changes platforms or makes changes to channel sales. In fact, we added a component to the URL to indicate the type of book (monograph, edited, licensing, etc.), which helps with indexing in Google Scholar.
Stephanie: The thing that I found really interesting was that open-access books did not perform significantly better than traditional books, but that may have something to do with the fact that OA titles are often not part of metadata supply chains used for booksellers, library catalogs, etc. This has improved over time, but getting OA books into these established workflows has been a long-term challenge. I think your study shows that the DOI can be a catalyst for bridging the gap between these workflows and channels of discoverability.
Allison: I thought it was an interesting study and appreciated some insight into your process for measuring metadata impacts. What I think is most interesting is what does not influence discoverability in Google Scholar, because that can help publishers prioritize their efforts and optimize data for specific channels. The finding I took away as an immediate to-do item was the fact that Google Scholar assigns so much weight to the publisher marketing site or book landing page, so we’re now planning to include the DOI on our website — that never occurred to us as a platform for greater visibility of the DOI and related metadata.
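To make Allison's to-do item concrete, here is a minimal sketch of how a landing page template might expose a book's DOI alongside its other bibliographic metadata. The tag names `citation_title`, `citation_author`, and `citation_publication_date` follow the Highwire Press-style convention described in Google Scholar's inclusion guidelines. The `citation_doi`, `citation_isbn`, and `citation_publisher` tags are in common use, but whether Scholar reads them is an assumption on our part. The function name, the record fields, and the sample values are all hypothetical.

```python
from html import escape


def landing_page_meta_tags(book: dict) -> str:
    """Render Highwire-style <meta> tags for a book landing page.

    `book` is a hypothetical record with the keys used below; it is not
    any publisher's real schema.
    """
    tags = [("citation_title", book["title"])]
    # One citation_author tag per contributor, in listed order.
    tags += [("citation_author", name) for name in book["authors"]]
    tags.append(("citation_publication_date", book["publication_date"]))
    tags.append(("citation_publisher", book["publisher"]))
    tags.append(("citation_isbn", book["isbn"]))
    # Assumption: exposing the DOI here helps visibility; not confirmed by Scholar.
    tags.append(("citation_doi", book["doi"]))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


if __name__ == "__main__":
    sample = {
        "title": "Example & Title",
        "authors": ["A. Author", "B. Editor"],
        "publication_date": "2021/09/01",
        "publisher": "Example University Press",
        "isbn": "9780000000002",
        "doi": "10.1234/example.0001",  # hypothetical DOI
    }
    print(landing_page_meta_tags(sample))
```

The output would be pasted into the `<head>` of the book's landing page. Escaping the values guards against titles that contain ampersands or quotation marks.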
Multi-channel metadata strategies
Paul: Metadata distribution can be a challenge, especially for many different platforms and indexers, and this was largely done by hand historically. We are now on the path to reduce manual errors and scale automated production wherever possible. So, the ideal is to gather and manage all our metadata in one place, which then becomes the single, canonical source from which we distribute data to all our vendors, partners, and other targets. We are getting closer to that vision, but not quite there yet.
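The "single, canonical source" idea Paul describes can be sketched roughly as follows: one record per book, and one small rendering function per channel that selects the fields that channel needs. This is an illustration under our own assumptions, not MIT Press's system. The `BookRecord` fields, the channel names, and the output shapes are hypothetical and deliberately much simpler than real ONIX, MARC, or Crossref schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class BookRecord:
    """Hypothetical canonical record: edited in one place, read by every channel."""
    title: str
    isbn: str
    doi: str
    contributors: Tuple[str, ...]
    book_type: str  # e.g. "monograph" or "edited"
    abstract: str = ""


def for_retail(rec: BookRecord) -> dict:
    # Retail partners mainly need product identifiers and contributor names.
    return {"isbn": rec.isbn, "title": rec.title, "contributors": list(rec.contributors)}


def for_library(rec: BookRecord) -> dict:
    # Library channels also want a controlled book-type classification.
    return {"isbn": rec.isbn, "title": rec.title, "book_type": rec.book_type}


def for_open_web(rec: BookRecord) -> dict:
    # The open web benefits from a resolvable DOI link and descriptive text.
    return {"title": rec.title, "doi_url": f"https://doi.org/{rec.doi}", "abstract": rec.abstract}


# Registry of channel renderers; adding a channel means adding one function.
CHANNELS: Dict[str, Callable[[BookRecord], dict]] = {
    "retail": for_retail,
    "library": for_library,
    "open_web": for_open_web,
}


def distribute(rec: BookRecord) -> Dict[str, dict]:
    """Build every channel's payload from the same canonical record."""
    return {name: render(rec) for name, render in CHANNELS.items()}
```

Because every payload is derived from the same record, a correction made once (say, to a contributor's name) reaches all channels the next time `distribute` runs, which is the error-reduction benefit Paul points to.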
Stephanie: At ScienceOpen, we both provide publishing services to content providers and serve as an aggregator of publishing collections, now with over 5 million article records. We use Crossref DOIs as the basis for our discovery platform, so we understand the value of that identifier in scaling a service that integrates both journals and books. For example, we have built a catalog for the Association of European University Presses, which uses DOIs as its foundation. So, DOIs are the carrot to encourage publishers to include their works in this catalog. We also developed BookMetaHub to help publishers manage their books metadata and bridge the disconnect between ONIX and Crossref data.
Sandra: For journals, we have good automatic feeds in place to aggregators, discovery services, Google Scholar, and many others. But with books, we have prioritized metadata for commercial channels and, secondarily, for Google Scholar, using ONIX. We maintain one set of data records and do not customize the metadata delivered to each channel, so the records are very robust and capture everything that each provider might need.
Allison: There are many different audiences to consider and, for Duke University Press, we are focused on six primary channels: direct-to-consumer (website, supported by a title management system), retail partners (ONIX feeds), library sales (GOBI system), library discoverability (XML and MARC records to vendors like EBSCO, OCLC, etc.), researcher channels (mainstream search as well as tools like Zotero or Mendeley, which generally feels like the wild west), and public audiences (via the open web). It’s all a lot of work, and each of these relies on good metadata hygiene practices as well as reliable structuring and dissemination workflows, tools, and partners.
Metadata supply-chain gaps
Paul: When we first began indexing books with Google Scholar, the goal was to match what Taylor & Francis had achieved, drawing on their implementation as the guide. Establishing the preview PDF was the biggest challenge to work around for paywalled monographs. The spec of what Scholar needs for indexing continues to evolve, and eventually we brought our platform vendor into the conversation, as they were working to achieve the same indexing for other publishers. So, it’s a matter of balancing what our vendors need and what indexers, like Google Scholar, require to capture our metadata.
Allison: We can really relate to that, Paul. Human communication is key to making it all work, and managing those relationships takes a lot of time. It means we’re taking regular one-on-one calls with folks at Ex Libris, ProQuest, etc. Most content providers must be on a first-name basis with Anurag Acharya (co-founder of Google Scholar) to achieve good indexing, in part because there is no published spec for us to refer to. For better or worse, journals set the standard for indexing and, when ebooks hit critical mass around 2015 or so, we found that books were forced to align with a journal indexing model, and that’s certainly true for Google Scholar. That’s been a hard leap for us to make; some of that doesn’t make sense for online books.
Stephanie: Bringing books and chapters into ScienceOpen was a challenge, and we began by capturing book citations within the journal content we host. We have found that book and journal metadata formats and supply chains are not well aligned. Aggregators or third-party hosts (like JSTOR, De Gruyter, etc.) often assign DOIs to the books they host, which means a single title may have multiple IDs. Crossref built a tool to enable co-access linking.
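The multiple-ID situation Stephanie describes is something a publisher could audit in its own data. The sketch below groups records by ISBN and flags any title that carries more than one DOI. It is only an illustration: the record shape and sample identifiers are hypothetical, the ISBN clean-up is a naive hyphen-and-space strip, and this is not how Crossref's co-access tool works internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "")


def titles_with_multiple_dois(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Return {isbn: [doi, ...]} for every ISBN linked to more than one DOI.

    Each record is a hypothetical dict with "isbn" and "doi" keys.
    """
    grouped = defaultdict(set)
    for rec in records:
        # DOI names are case-insensitive, so compare them in lower case.
        grouped[normalize_isbn(rec["isbn"])].add(rec["doi"].lower())
    return {isbn: sorted(dois) for isbn, dois in grouped.items() if len(dois) > 1}


if __name__ == "__main__":
    sample = [
        {"isbn": "978-0-00-000000-2", "doi": "10.1234/press.0001"},  # publisher's DOI
        {"isbn": "9780000000002", "doi": "10.5555/AGG.0001"},        # aggregator's DOI
        {"isbn": "9781111111113", "doi": "10.1234/press.0002"},
    ]
    print(titles_with_multiple_dois(sample))
```

A report like this could be a starting point for deciding which titles need their identifiers linked or reconciled across hosts.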
Measuring metadata impacts
Sandra: We format our metadata based upon guidance from Nielsen Book in the UK, which is based upon research they conducted into the connection between metadata and sales. So, by using their standards, UTP provides rich datasets, keywords, and SEO-compliant formats. We look at sales and usage rates mostly to assess the impacts of these activities.
Stephanie: As we build out our book collections, we’re looking at ways to capture metrics that speak to the impact of those titles within both book and journal literature, which is really critical for open-access publishing (for reporting back to funders and institutions). Connecting DOIs with ROR IDs is key to advancing our ability to assess these impacts. Book metadata that includes abstracts has proven to drive search and discovery, so that’s a key recommendation we share with publishers: the more words associated with your book record, the better the search experience and overall usage. The same is true for links: more cross-linking facilitates SEO and indexing as well as user serendipity.
Paul: We test the performance of metadata in Google Scholar by running searches ourselves and checking for links; our site is not yet considered the authoritative source, but we expect this will improve in time. We find hits for author sites and book reviews as well when we run test searches.
Allison: Publishers’ investments in metadata are kind of an act of faith. Our actions related to metadata are like shots in the dark; we hope something good will come, but we’re not entirely sure what that will look like. It’s very difficult to measure the value and impact of our metadata across the various channels we promote. We follow research like this study and compare notes with other publishers and do our best. Measuring metadata impact has to be well-defined for our metrics to be precise enough to be useful.
We learned a lot about books metadata through our study and from the expertise of the panel we convened for AUPresses. It is clear, and no surprise, that more work can be done to improve both the user experience of finding ebooks and publisher workflows by creating better channel-oriented metadata. Please join in the conversation and share your feedback in the comments below.