The Scholarly Kitchen

What’s Hot and Cooking In Scholarly Publishing

Let the Metadata Wars Begin

  • By Todd A Carpenter
  • Jun 22, 2022
  • 7 Comments
  • Discovery
  • Infrastructure
  • Libraries
  • Technology

Early movers in any market have an advantage over their competition. The Ohio College Library Center organization (which eventually grew to become OCLC) was no different.

In 1967, before most people even knew what a computer was or how it might positively affect our lives, its team based in Dublin, Ohio was making great progress in the electronic management of bibliographic information. In August of 1971, the cooperative helped the Alden Library at Ohio University launch the first online catalog of any library in the world. OCLC continued to be a pioneer in many of the moves to digitize and interconnect library catalog data, all the while improving its services and making libraries more efficient. It also began to generate significant surpluses, with which it continued to expand the services it could provide to the library community and serve a worldwide network. What was once a regional network of libraries has today become a massive institution serving a global community of more than 30,000 libraries, while also becoming one of the leading employers in the state of Ohio.

Joseph P. Kinneary United States Courthouse in Columbus, Ohio
Entrance to the US Courthouse in Columbus, OH, where OCLC is pursuing action against Clarivate (Photo by Ɱ, CC-BY-SA)

Every library needs a catalog of the items it holds and acquires. A good catalog is vital to managing a collection and supporting its circulation. This is no small effort and, if individually replicated in the thousands of libraries around the world, it would be a laborious and costly endeavor, particularly if each record were hand-curated by experts in library science. In a world of digital information, it made sense that efficiencies could be gained by having the network of libraries share both the work and their data collaboratively, thereby reducing unnecessary duplication and improving overall quality across the network. OCLC claims, reasonably, that it has invested millions of dollars in working with librarians and publishers to build and license as comprehensive a worldwide catalog as possible. It regularly enhances the quality of its records through a variety of approaches. This has benefited the vast majority of libraries and also OCLC’s bottom line.

However, in the past ten years the ecosystem of data has changed significantly. National libraries have released troves of bibliographic records and linked data. Some libraries, notably Harvard, Penn, and Yale, have released their catalog data publicly under either a CC0 (in the case of Harvard) or an ODC-BY license. Publishers are increasingly eager to share their bibliographic data as a way to drive sales and increase usage. Machine processing of structured data has become incredibly robust, so that grouping, describing, linking, and enriching materials are much simpler tasks for anyone with basic coding skills. Whether these moves match the quality expected of cataloging records is an open question, but one can expect that improvements will continue to erode the comparative value of the human work of cataloging.

Earlier this year, Clarivate quietly announced a new product, MetaDoor, which is described as an open platform for sharing cataloging records. Possibly building upon data gathered by a company earlier acquired by Innovative Interfaces (which eventually was folded into Clarivate), this new product is being positioned as an alternative, free structure for sharing catalog data in the community. In trying to recruit early adopters for the new service, Clarivate has caught the attention of OCLC, which views this new product as an obvious competitor to its flagship WorldCat service. Last week OCLC filed a lawsuit in Ohio courts against Clarivate and its operating units, challenging both the source of the data in MetaDoor and Clarivate’s efforts to recruit participants in this data-sharing ecosystem, which OCLC alleges put those participants in breach of their agreements with OCLC. The suit claims predatory market behavior and tortious interference in OCLC’s contracts with its member organizations. Clarivate has strenuously objected to the claims.

The argument over what libraries can and cannot do with the cataloging records that they create and then share has been an issue for well over 15 years now. Review of the WorldCat records policy also goes back to the mid-2000s. After OCLC announced changes to its policy in the late 2000s, some in the community rebelled against the new policy, while others supported the changes. In response to a community petition launched by Elaine Sanchez, this set in motion a new public group to review the data-sharing principles that surround the use of OCLC’s cataloging records. In 2009, OCLC launched a group that was tasked to “seek to understand today’s environment as it relates to the creation, use and transfer of data and articulate principles of shared data creation consonant with the values of the OCLC cooperative.” Shortly thereafter, a final report was issued and a new policy was released in 2010.

However, this didn’t end the conversations in the community; the question of the rights of OCLC members has been an ongoing issue that some are keen to press. The Program for Cooperative Cataloging issued a proposed policy statement on open cataloging records in January, outlining a middle ground for sharing of PCC-developed records. Another example: last year, ICOLC produced an internal report that, among other things, criticized OCLC for the costs and interoperability concerns of the records WorldCat aggregates, including limitations on what libraries and other vendors in this space can do with that data. OCLC responded privately, but based on the FAQ that accompanied the legal filing, one can surmise what its response was. 

OCLC’s position is that it is working in the best interests of all libraries and does a tremendous service through its aggregation, enhancements, dissemination, and distribution of bibliographic records. Furthermore, it takes the surpluses that this business generates and invests heavily in other library services, tools, and research projects. Many have argued that OCLC is a positive force for libraries and library technology. Others have been more critical, particularly commercial players in this space.

There have been a few entrants into this ecosystem seeking to crack OCLC’s control over the bibliographic data stream. For example, in 2010 SkyRiver, a small bibliographic data services company, launched a lawsuit alleging that OCLC was engaging in anti-competitive corporate behavior by exerting monopolistic control over library data exchange. Innovative Interfaces Inc (III) also joined the lawsuit, which was dropped when SkyRiver was acquired by III in 2013. The service continues to operate and, as part of the larger Clarivate organization, likely seeded the MetaDoor service, although Clarivate’s portfolio probably also contains significant amounts of data from other sources. SkyRiver’s catalog service business is a fraction of the size of OCLC’s WorldCat, with roughly 70 million records, compared to WorldCat’s 500+ million according to the complaint. What to do with this resource and how to position the service seems to have been lurking behind the scenes as III, ExLibris, ProQuest, and now Clarivate have moved through their various merger activities, most likely because strings of corporate mergers can be quite distracting. The lack of a comprehensive, unified catalog of library holdings data is a gap in Clarivate’s library services technology stack. Clarivate might also argue that less interoperability would be needed in the world of library services if all of the technology were handled by a single provider. But this sole-source provision of all services, while appealing at first glance, would also put the community troublingly at the whims of that one provider.

Obviously, if a new high-quality source of bibliographic data becomes available, it could have a negative impact on the sales of other data stores. Bibliographic data is a very substitutable good, and free is always a better price point than anything that is not free. Whether the free data is really as high-quality as curated data and whether it is fit for purpose are the important questions. Despite the occasional complaint, the OCLC data is of an extremely high quality. Could a similar level of quality be achieved by others, with the right resources (say 74% market share of the worldwide academic ILS market and $1.9 billion in revenues per year)? Almost certainly, yes. Rather than doing the costly enrichment and quality control itself, Clarivate is seeking to leverage the collective work of the library community in enriching its service. There is certainly reason to believe some in the library community might be motivated to contribute to an open repository of cataloging data. However, as Kaitlin Thaney at Invest in Open Infrastructure often notes, people should be wary of the interests controlling the platform on which open data/content is being shared. One should be reminded of the internet-age adage: “If you’re not paying for the product, you are the product.”

As to OCLC’s core claim, it would strike me as odd if Clarivate were not scrupulous about where it gathers data, since much of the core of this data is freely available as linked data, from publishers’ feeds, or from other resources. However, in an environment of machine crawling, haphazard sharing, and a world of many billions of records, errors are bound to happen. Clarivate certainly would have the burden of ensuring it is not inappropriately republishing licensed data, in much the same way it shouldn’t be republishing text content from other publishers in its products; that is, it should know where its data was gathered from and what rights are associated with it. Of course, if there is proof to the contrary, then OCLC most certainly has a case in the United States.

As to whether this is a tortious breach, I am not qualified to delve into the details, nor do I choose to speculate. Clearly this question will either be settled or head to the courts, because so much is riding on the outcome for OCLC. OCLC has proven adept at using Ohio law to its benefit, even going so far as to lobby for changes to that law. This is not at all a criticism; all large companies (as well as non-profits and educational institutions) use legal and lobbying systems to achieve their ends.

One question at issue centers, in an esoteric way, on the presence of an OCLC Control Number (OCN) in the catalog records that MetaDoor has shared (clause 97 on page 22 of the filing), which OCLC is taking as proof that the record did indeed originate within its records system. However, in 2013, in the early days of linked open data (which perhaps ended not long thereafter), Jim Michalko, then part of the team at OCLC Research (since retired), wrote about the value of sharing OCN data publicly and announced that OCLC had taken the decision to release the OCN data as if it is in the public domain. An Archive.org version of the OCLC WorldCat Record Use and Data Licensing terms page from 2014 states as much. It is unclear whether this remains OCLC’s policy; probably not, because the current page does not include any such language. However, to this day, OCLC’s definition of WorldShare Reports elements under the OCLC Number definition states that: “OCLC encourages the use of the OCLC Control Number in any appropriate library application, where it can be treated as if it is in the public domain.” The “as if” in that sentence is carrying a lot of weight here: who can really tell what it means when something is kinda-sorta-public-domain-but-not-really? Perhaps it is the sort of thing that could land one in court. “Squishy” legal language often surrounds open access licenses that aren’t strictly CC-BY, causing headaches for users who can’t decide what they can or can’t do with something they find on the web. It also highlights the problem that is created once you release information onto the web with the equivalent of an open license: even if you subsequently want to take it back, sometimes it is impossible to do so in reality. I don’t want to suggest that this unclear language is indicative of how Clarivate accessed this information, or that whoever shared it based their decision to do so on this, but it is relevant.

This is certainly not to say that Clarivate is acting in the world’s interest here. As the market-dominating powerhouse in library systems, especially in the academic market, Clarivate stands to gain a real benefit from breaking the control over cataloging records that OCLC has held for decades. Many of the services that one can envision libraries (or networks of libraries) requiring will need reliable information on holdings to function properly. One solution is interoperability, but this requires a willingness to share data equitably, in standard forms, and under reasonable terms, something that isn’t always in corporate interests to pursue. The near-monopoly OCLC has on this data inhibits some potential products and services that ExLibris (and others) sought to pursue prior to the Clarivate merger. Increasing the power of an already dominant player is probably not a good thing for libraries writ large, since a lack of competition in a marketplace tends not to yield better results for customers. SPARC made this point in its unsuccessful effort to influence the FTC’s exploration of whether to block the ProQuest/Clarivate merger last year.

If Clarivate is actively recruiting libraries to share the data that they possess in breach of their contracts with OCLC, the ultimate decision on whether this sharing is permissible, or whether it in fact breaches their contracts, is taken by the library providing the cataloging data, not Clarivate. If breaches are happening, then there are libraries that might also expect an action for breach of contract. However, suing your members/customers is never a wise business strategy, so OCLC is likely to tread lightly there. According to the filing, it seems (if proven) that some on Clarivate’s team are inartfully encouraging this breach or, more generously, encouraging the belief that this is not a breach. Based on the complaint, however, there is probably little hard proof that this is happening at scale. Perhaps more specifics will come to light in the discovery process or at trial.

Realistically, this battle is very similar to the discussions afoot in scholarly publishing around access and sharing of data related to discovery and use. The availability of open citations, the discussions about open identifiers for institutions, and the questions around open infrastructure all hinge on the ecosystem of data: who controls it, and what can be done with the data once it is aggregated. Looking to the future, there will be many battles raging about who controls what data and who can do things with it. Since everyone wants to be an “analytics company” these days, this could include usage data on open access materials, predictive analysis trained on open content, or (as in this case) leveraging the distribution of open catalog data to embed or extend your market-dominating position.

There is black gold in that data (so they say). And, when there’s gold on the table, sadly, there is usually a fight brewing about who gets to cash it in. This lawsuit likely will not be a single battle between two large players, but part of a larger war to control the data and metadata that are so valuable in our world.

 

DISCLOSURE: Both OCLC and Clarivate are members of NISO and Mary Sauer-Games (VP of Global Product Development, OCLC) currently serves as Chair of NISO’s Board of Directors. Neither organization nor their representatives were involved in the preparation of this post. 

Todd A Carpenter

@TAC_NISO

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He also serves in leadership roles at a variety of organizations, including the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), the Coalition for Seamless Access, and the Foundation of the Baltimore County Public Library.


Discussion

7 Thoughts on "Let the Metadata Wars Begin"

Since truth is not a trusted qualification on which people agree, the value of mass data is low. For example, many people do not believe that mathematics and physical reality are related. Theoretical physics still has no accepted foundation. You cannot process mass data and retrieve the foundation of reality out of that chaos.

  • By J.A.J. van Leunen
  • Jun 22, 2022, 8:34 AM

Now is a good time to clarify this. While OCLC remains a nonprofit, it is acting more and more like a vendor all the time, with many of its services now restricted to those who use its ILS. Should they continue to move in that direction, it’s hard to know if they may consider restricting access to bib records as well. Perhaps they should split off the part of their company that is clearly a vendor and return to serving all libraries equally, which would guarantee continued access to everyone.

  • By Jenna Webster
  • Jun 22, 2022, 1:10 PM

‘Bibliographic data is a very substitutable good’ – well that depends on whether you are talking about a minimal physical description record, or a much richer subject-cataloged record enhanced with various expansive elements like abstract, table of contents, etc. The fact that libraries are willing to pay OCLC for access to copy their records when there are already plenty of free sources to copy from using z39.50 or the newer SRU technologies tells you that librarians see added value in getting authoritative records (not to mention authority records, inside joke) from OCLC. Disclaimer: mine is one of the very few Canadian libraries who is NOT a member of OCLC and does use those free tools instead. We actually quit OCLC when that controversy this post mentions regarding their claiming intellectual property rights over the work that member libraries contributed happened.

  • By Melissa Belvadi
  • Jun 22, 2022, 3:29 PM

A “very substitutable good” is very much conditional on the following important caveat: the quality of that data. Bibliographic data can be of highly variable quality. A note from the NISO Ebook Metadata Recommended: Publishers, please don’t include ‘NY Times Bestseller: ________’ in your title metadata in an ONIX record. (but I digress)

  • By Todd A Carpenter
  • Jun 22, 2022, 4:30 PM

Thanks Todd, I always enjoy reading your posts. A few quick clarifications if I may.
Clarivate is not building a repository – once launched, the intention is that MetaDoor will be an online service that will simply forward records from one institution to the requesting institution. So, all records will continue to live and belong to the libraries *in* their library system the service will help libraries leverage the collective work to enrich *their* own catalog. None will belong nor live in MetaDoor.
Also, the historical development of SkyRiver is not being used to seed the MetaDoor platform.
I hope that is helpful.
Lisa Hulme – Clarivate

  • By Lisa Hulme
  • Jun 23, 2022, 3:54 PM

Should have called it Bibster!

  • By Richard Wynne
  • Jun 24, 2022, 10:18 AM

Todd,
Thank you for taking the time to take a very challenging topic and providing the history and your insights.

Yup, the fight is on over metadata and who has what rights.

  • By Darrell W Guntet
  • Jun 25, 2022, 8:28 PM


Official Blog of:

Society for Scholarly Publishing (SSP)

The mission of the Society for Scholarly Publishing (SSP) is to advance scholarly publishing and communication, and the professional development of its members through education, collaboration, and networking. SSP established The Scholarly Kitchen blog in February 2008 to keep SSP members and interested parties aware of new developments in publishing.

The Scholarly Kitchen is a moderated and independent blog. Opinions on The Scholarly Kitchen are those of the authors. They are not necessarily those held by the Society for Scholarly Publishing nor by their respective employers.

ISSN 2690-8085