Let the Metadata Wars Begin

Early movers in any market have an advantage over their competition. The Ohio College Library Center organization (which eventually grew to become OCLC) was no different.

In 1967, before most even knew what a computer was or how it might positively affect our lives, its team based in Dublin, Ohio was making great progress in the electronic management of bibliographic information. In August of 1971, the cooperative helped the Alden Library at Ohio University launch the first online catalog of any library in the world. OCLC continued to be a pioneer in many of the moves to digitize and interconnect library catalog data, all the while improving the services and making libraries more efficient. It also began to generate significant surpluses, with which it continued to expand the services it could provide to the library community and serve a worldwide network. What was once a regional network of libraries has today become a massive institution serving a global community of more than 30,000 libraries, while also becoming one of the leading employers in the state of Ohio.

Joseph P. Kinneary United States Courthouse in Columbus, Ohio — Entrance to the US Courthouse in Columbus, OH, where OCLC is pursuing action against Clarivate (Photo by Ɱ, CC-BY-SA)

Every library needs a catalog of the items it holds and acquires. A good catalog is vital to managing a collection and supporting its circulation. This is no small effort and, if individually replicated in the thousands of libraries around the world, it would be a laborious and costly endeavor, particularly if each record were hand-curated by experts in library science. In a world of digital information, it made sense that efficiencies could be gained by having the network of libraries share both the work and their data collaboratively, thereby reducing unnecessary duplication and improving overall quality across the network. OCLC claims, reasonably, that it has invested millions of dollars in working with librarians and publishers to build and license as comprehensive a worldwide catalog as possible. It regularly enhances the quality of its records through a variety of approaches. This has benefited the vast majority of libraries and also OCLC’s bottom line.

However, in the past ten years the ecosystem of data has changed significantly. National libraries have released troves of bibliographic records and linked data. Some libraries, notably Harvard, Penn, and Yale, have released their catalog data publicly under either a CC0 license (in the case of Harvard) or an ODC-BY license. Publishers are increasingly eager to share their bibliographic data as a way to drive sales and increase usage. Machine processing of structured data has become incredibly robust, so that grouping, description, linking of materials, and enrichment are much simpler tasks for most people with basic coding skills. Whether these moves match the quality expected of cataloging records is an open question, but one can expect that improvements will continue to erode the comparative value of the human work of cataloging.
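To give a sense of what "basic coding skills" can mean here, the following is a minimal Python sketch of grouping and enriching records from two sources. Everything in it is an assumption made for illustration: the record layout, the field names, the function names `normalize_isbn` and `group_and_enrich`, and the placeholder ISBNs are hypothetical and do not come from any real catalog or from either party's systems.

```python
from collections import defaultdict


def normalize_isbn(raw):
    """Drop hyphens and spaces and uppercase a trailing 'x' so variants compare equal."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def group_and_enrich(records):
    """Group records by normalized ISBN and merge their fields.

    For each field, the first non-empty value seen wins; later records
    only fill in fields that are still missing.
    """
    groups = defaultdict(dict)
    for rec in records:
        merged = groups[normalize_isbn(rec["isbn"])]
        for field, value in rec.items():
            if field == "isbn":
                continue
            if value and not merged.get(field):
                merged[field] = value
    return dict(groups)


# Hypothetical records from two sources; the ISBNs are placeholders, not real titles.
records = [
    {"isbn": "978-0-00-000000-2", "title": "An Example Title", "subjects": ""},
    {"isbn": "9780000000002", "title": "", "subjects": "Cataloging"},
    {"isbn": "0-00-000001-x", "title": "Another Example", "subjects": "Metadata"},
]

catalog = group_and_enrich(records)
for isbn, fields in catalog.items():
    print(isbn, fields)
```

The first two records collapse into a single entry that carries the title from one source and the subject from the other. A sketch like this says nothing about whether the merged record is correct, which is exactly the quality question raised above.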

Earlier this year, Clarivate quietly announced a new product, MetaDoor, which is described as an open platform for sharing cataloging records. Possibly building upon data gathered by a company earlier acquired by Innovative Interfaces (which eventually was folded into Clarivate), this new product is being positioned as an alternative, free structure to share catalog data in the community. In trying to recruit members to use the new service and be early adopters, Clarivate has caught the attention of OCLC, which views this new product as an obvious competitor to its flagship WorldCat service. Challenging both the source of the data in MetaDoor and Clarivate’s efforts to recruit participants in this data-sharing ecosystem (in breach, OCLC contends, of their agreements with OCLC), last week OCLC filed a lawsuit in Ohio courts against Clarivate and its operating units claiming predatory market behavior and tortious interference in OCLC’s contracts with its member organizations. Clarivate has strenuously objected to the claims.

The argument over what libraries can and cannot do with the cataloging records that they create and then share has been an issue for well over 15 years now. Review of the WorldCat records policy also goes back to the mid-2000s. After OCLC announced changes to its policy in the late 2000s, some in the community rebelled against the new policy, while others supported the changes. This set in motion a new public group to review the data-sharing principles that surround the use of OCLC’s cataloging records, in response to a community petition launched by Elaine Sanchez. In 2009, OCLC launched a group that was tasked to “seek to understand today’s environment as it relates to the creation, use and transfer of data and articulate principles of shared data creation consonant with the values of the OCLC cooperative.” Shortly thereafter, a final report was issued and a new policy was released in 2010.

However, this didn’t end the conversations in the community; the question of the rights of OCLC members has been an ongoing issue that some are keen to press. The Program for Cooperative Cataloging issued a proposed policy statement on open cataloging records in January, outlining a middle ground for sharing of PCC-developed records. Another example: last year, ICOLC produced an internal report that, among other things, criticized OCLC for the costs and interoperability concerns of the records WorldCat aggregates, including limitations on what libraries and other vendors in this space can do with that data. OCLC responded privately, but based on the FAQ that accompanied the legal filing, one can surmise what its response was.

OCLC’s position is that it is working in the best interests of all libraries and does a tremendous service through its aggregation, enhancements, dissemination, and distribution of bibliographic records. Furthermore, it takes the surpluses that this business generates and invests heavily in other library services, tools, and research projects. Many have argued that OCLC is a positive force for libraries and library technology. Others have been more critical, particularly commercial players in this space.

There have been a few entrants into this ecosystem seeking to crack OCLC’s control over the bibliographic data stream. For example, in 2010 SkyRiver, a small bibliographic data services company, launched a lawsuit alleging that OCLC was engaging in anti-competitive corporate behavior by exerting monopolistic control over library data exchange. Innovative Interfaces Inc (III) also joined the lawsuit, which was dropped when SkyRiver was acquired by III in 2013. The service continues to operate and, as part of the larger Clarivate organization, likely seeded the MetaDoor service, but Clarivate’s portfolio likely contains significant amounts of data from other sources. SkyRiver’s catalog service business is a fraction of the size of OCLC’s WorldCat, with roughly 70 million records, compared to WorldCat’s 500+ million according to the complaint. What to do with this resource and how to position the service seems to have been lurking behind the scenes as III, ExLibris, ProQuest, and now Clarivate have moved through their various merger activities, most likely because strings of corporate mergers can be quite distracting. The lack of a comprehensive, unified catalog of library holdings data is a gap in Clarivate’s library services technology stack. Clarivate might also argue that less interoperability would be needed in the world of library services if all of the technology was handled by a single provider. But this sole-source provision of all services, while appealing at first glance, would also put the community troublingly at the whims of that one provider.

Obviously, if a new high-quality source of bibliographic data becomes available, it could have a negative impact on the sales of other data stores. Bibliographic data is a very substitutable good, and free is always a better price point than anything more than not-free. Whether the free data is really as high-quality as curated data and whether it is fit for purpose are the important questions. Despite the occasional complaint, the OCLC data is of an extremely high quality. Could a similar level of data be achieved by others, with the right resources (say 74% market share of the worldwide academic ILS market and $1.9 billion in revenues per year)? Almost certainly, yes. Rather than doing the costly enrichment and quality control itself, Clarivate is seeking to leverage the collective work of the library community in enriching its service. There is certainly a reason to believe some in the library community might be motivated to contribute to an open repository of cataloging data. However, as Kaitlin Thaney at Invest in Open Infrastructure often notes, people should be wary of the interests controlling the platform on which open data/content is being shared. One should be reminded of the internet-age adage: “If you’re not paying for the product, you are the product.”

To the core claim of OCLC, it would strike me as odd if Clarivate weren’t scrupulous about where it gathers data from, since much of the core of this data is freely available as linked data, from publishers’ feeds, or from other resources. However, in an environment of machine crawling, haphazard sharing, and a world of many billions of records, errors are bound to happen. Clarivate certainly would have the burden of ensuring it is not inappropriately republishing licensed data, in much the same way it shouldn’t be republishing text content from other publishers in its products; that is, it needs to know where it gathered the data from and what rights are associated with it. Of course, if there is proof to the contrary, then OCLC most certainly has a case in the United States.

As to whether this is a tortious breach, this is certainly something I am neither qualified to delve into the details of, nor something I choose to speculate on. Clearly this question will either be settled or head to the courts, because so much is riding on the outcome for OCLC. OCLC has proven adept at using Ohio Law – even going so far as to lobby for changes to Ohio Law – to its benefit. This is not at all a criticism; all large companies (as well as non-profits and educational institutions) use legal and lobbying systems to achieve their ends.

One question at issue centers on, in an esoteric way, the presence of an OCLC Control Number (OCN) in the catalog records that MetaDoor has shared (clause 97 on page 22 of the filing), which OCLC is taking as proof that the record did indeed begin within its records system. However, in 2013, in the early days of linked open data (which perhaps ended not long thereafter), Jim Michalko, then part of the team at OCLC Research (since retired), wrote about the value of sharing OCN data publicly and announced that OCLC had taken the decision to release the OCN data as if it is in the public domain. An Archive.org version of the OCLC WorldCat Record Use and Data Licensing terms page from 2014 states as much. It is unclear whether this remains OCLC’s policy; probably not, because the current page does not include any such language. However, to this day, OCLC’s definition of WorldShare Reports elements under the OCLC Number definition states that: “OCLC encourages the use of the OCLC Control Number in any appropriate library application, where it can be treated as if it is in the public domain.” The “as if” in that sentence is carrying a lot of weight here: who can really tell what it means when something is kinda-sorta-public-domain-but-not-really? Perhaps it is the sort of thing that could land one in court. “Squishy” legal language often surrounds open access licenses that aren’t strictly CC-BY, causing headaches for users who can’t decide what they can or can’t do with something they find on the web. It also highlights the problem that is created once you release information onto the web with the equivalent of an open license: even if you subsequently want to take it back, sometimes it is impossible to do so in reality. I don’t want to suggest that this unclear language is indicative of how Clarivate accessed this information, or that whoever shared it based their decision to do so on this, but it is relevant.

This is certainly not to say that Clarivate is acting in the world’s interest here. As the market-dominating powerhouse in library systems, especially in the academic market, there is a real benefit to Clarivate of breaking the control over cataloging records that OCLC has held for decades. Many of the services that one can envision libraries — or networks of libraries — requiring will need reliable information on holdings to function properly. One solution is interoperability, but this requires a willingness to share data equitably, in standard forms, and under reasonable terms, something that isn’t always in corporate interests to pursue. The near-monopoly OCLC has on this data inhibits some potential products and services that ExLibris (and others) sought to pursue prior to the Clarivate merger. Increasing the power of an already dominant player probably is not a good thing for libraries writ large, since a lack of competition in a marketplace tends not to yield better results for customers. SPARC made this point in its unsuccessful effort to influence the FTC’s exploration of whether to block the ProQuest/Clarivate merger last year.

If Clarivate is actively recruiting libraries to share the data that they possess in breach of their contracts with OCLC, the ultimate decision on whether this sharing is permissible, or whether it in fact breaches their contracts, is taken by the library providing the cataloging data, not Clarivate. If breaches are happening, then there are libraries that might also expect an action for breach of contract. However, suing your members/customers is never a wise business strategy, so OCLC is likely to tread lightly there. According to the filing, it seems (if proven) that some on Clarivate’s team are inartfully encouraging this breach (or perhaps more generously, encouraging the belief that this is not a breach), but, based on the complaint, there is probably little hard proof this is happening at scale. Perhaps more specifics will come to light in the discovery process or at trial.

Realistically, this battle is very similar to the discussions afoot in scholarly publishing around access and sharing of data related to discovery and use. The availability of open citations, the discussions about open identifiers for institutions, and the questions around open infrastructure all hinge on the ecosystem of data: who controls it, and what can be done with the data once it is aggregated. Looking to the future, there will be many battles raging about who controls what data and who can do things with it. Since everyone wants to be an “analytics company” these days, this could include usage data on open access materials, predictive analysis trained on open content, or (as in this case) leveraging the distribution of open catalog data to embed or extend your market-dominating position.

There is black gold in that data (so they say). And, when there’s gold on the table, sadly, there is usually a fight brewing about who gets to cash it in. This lawsuit likely will not be a single battle between two large players, but part of a larger war to control the data and metadata that are so valuable in our world.

DISCLOSURE: Both OCLC and Clarivate are members of NISO and Mary Sauer-Games (VP of Global Product Development, OCLC) currently serves as Chair of NISO’s Board of Directors. Neither organization nor their representatives were involved in the preparation of this post.

Todd A Carpenter

@TAC_NISO

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles of a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.

Discussion

13 Thoughts on "Let the Metadata Wars Begin"

Since truth is not a trusted qualification on which people agree, the value of mass data is low. For example, many people do not believe that mathematics and physical reality are related. Theoretical physics still has no accepted foundation. You cannot process mass data and retrieve the foundation of reality out of that chaos.

By J.A.J. van Leunen
Jun 22, 2022, 8:34 AM

Now is a good time to clarify this. While OCLC remains a nonprofit, it is acting more and more like a vendor all the time, with many of its services now restricted to those who use its ILS. Should they continue to move in that direction, it’s hard to know if they may consider restricting access to bib records as well. Perhaps they should split off the part of their company that is clearly a vendor and return to serving all libraries equally, which would guarantee continued access to everyone.

By Jenna Webster
Jun 22, 2022, 1:10 PM

‘Bibliographic data is a very substitutable good’ – well that depends on whether you are talking about a minimal physical description record, or a much richer subject-cataloged record enhanced with various expansive elements like abstract, table of contents, etc. The fact that libraries are willing to pay OCLC for access to copy their records when there are already plenty of free sources to copy from using z39.50 or the newer SRU technologies tells you that librarians see added value in getting authoritative records (not to mention authority records, inside joke) from OCLC. Disclaimer: mine is one of the very few Canadian libraries who is NOT a member of OCLC and does use those free tools instead. We actually quit OCLC when that controversy this post mentions regarding their claiming intellectual property rights over the work that member libraries contributed happened.

By Melissa Belvadi
Jun 22, 2022, 3:29 PM

A “very substitutable good” is very much conditional on the following important caveat: the quality of that data. Bibliographic data can be of highly variable quality. A note from the NISO Ebook Metadata Recommended Practice: Publishers, please don’t include ‘NY Times Bestseller: ________’ in your title metadata in an ONIX record. (but I digress)

By Todd A Carpenter
Jun 22, 2022, 4:30 PM

Thanks Todd, I always enjoy reading your posts. A few quick clarifications if I may.
Clarivate is not building a repository. Once launched, the intention is that MetaDoor will be an online service that will simply forward records from one institution to the requesting institution. So, all records will continue to live and belong to the libraries *in* their library system; the service will help libraries leverage the collective work to enrich *their* own catalog. None will belong to, or live in, MetaDoor.
Also, the historical development of SkyRiver is not being used to seed the MetaDoor platform.
I hope that is helpful.
Lisa Hulme – Clarivate

By Lisa Hulme
Jun 23, 2022, 3:54 PM

Should have called it Bibster!

By Richard Wynne
Jun 24, 2022, 10:18 AM

“Enrich,” though a commonly used phrase about metadata, has no basis in U.S. copyright law. This is the “sweat of the brow” argument. Metadata is short phrases and pure facts and, though it is best practice to help libraries OUTSIDE the United States, within the United States the sharing should be a default as critical to libraries as understanding copyright is.
[1]: https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1060&context=scholcom#:~:text=Although%20metadata%20is%20arguably%20not,available%20under%20a%20CC0%20license.

By Scott Pope
Jun 28, 2022, 3:12 PM

Except if the metadata is “original” as I think the OCNs (OCLC’s own unique record ID#s) are. I’m pretty sure those at least would qualify for copyright protection. And there is enormous value in having those numbers included in one’s marc records, for doing cross-tabulation comparisons with other data sets and deduping work. I’m part of a group of librarians in Canada who are looking at how to create a schema for librarians to use in tracking their ebook perpetual access rights and one thing we’re looking really hard at is whether we can and should use OCNs as a unique identifier, because ISBNs are an unholy mess when dealing with ebooks across different platforms, but that’s a rant for another day. It’s exactly because of the extensive professional work that OCLC does to “curate” their records that makes their OCNs valuable.

By Melissa Belvadi
Jun 28, 2022, 3:56 PM

Assuming OCNs are copyrightable in the U.S., which I wouldn’t agree to, but I would mention that a number of member libraries claim ownership of the entire record including the OCNs besides OCLC, given that the numbers were not assigned in an artistic manner and are more like sequential lists. Perhaps, though, one could argue that the number of libraries that use that record is unique to the OCLC membership, and the sorting by number of libraries. I don’t feel like Clarivate is the one to take on this battle, but rather PCC libraries, and find value in libraries working together, but not the kind of value that OCLC has put on our profession.
[1]: https://www.casemine.com/judgement/us/5914c232add7b049347be0f5

By Scott Pope
Jun 28, 2022, 5:09 PM

You may be right about the OCNs and copyright. Just reviewing old US decisions, West (legal publisher) lost a copyright case about pagination, so if a court decided OCNs are just sequential like pages, they probably aren’t copyrightable. I hope they aren’t because that would make it easier for our perpetual access project to use them freely.

By Melissa Belvadi
Jun 28, 2022, 6:33 PM

Todd,
Thank you for taking the time to take a very challenging topic and providing the history and your insights.

Yup, the fight is on over metadata and who has what rights.

By Darrell W Guntet
Jun 25, 2022, 8:28 PM

As an update: On June 27, 2022, the Ohio judge overseeing this case granted OCLC’s request for a restraining order on Clarivate’s development of MetaDoor, with a hearing scheduled for December. Clarivate is barred from:
“contacting or communicating with OCLC WorldCat customers about: downloading, uploading, linking to, transferring and/or otherwise sharing WorldCat records or metadata for MetaDoor… ;
partnering or assisting Defendants with developing MetaDoor…;
requesting from OCLC WorldCat customers any OCLC WorldCat records and metadata or records and metadata derived from the same for the use of MetaDoor;
[and]
retaining, using, or making available to the public any OCLC WorldCat records and metadata or records and metadata derived from the use of MetaDoor which was obtained from OCLC WorldCat customers.”

To read the full ruling: https://t.co/tcjjnCrVol

By Todd A Carpenter
Jun 28, 2022, 3:38 PM

There are SO many things to discuss here Todd! I am appreciative of your thoughts and description. A few things I was noting in this case so far:

This lawsuit claims that Clarivate is interfering with OCLC’s business by both inducing (and conspiring to induce) libraries to break their agreements with OCLC by sharing WorldCat records and metadata as MetaDoor enters the market. Upon reading the complaint I notice a clear customer-based legal strategy emerging: Don’t sue the paying customer.

Who was doing the breaching of these agreements? I noticed that it was (potentially) the libraries that were actually in breach, no? But OCLC wouldn’t dare sue one of their customer libraries (yet?). Does each of the OCLC Framework Agreements that libraries have signed have that level of remedies available? It seems to me from the policies you list above, much of these agreements have a flavor of best community practice/policy without the coercive power of a contract? Just speculating.

So, if they are choosing to not go after these potential library breaches, then clearly this is where the choice of the tort of “tortious interference with a contract” comes in. I love a good common law tort – which is being used here – “an infringement of a right leading to monetary liability (or injunctions)” – along with conspiracy to commit those torts. This “tortious interference” with business is a very old law. The idea of interfering with a competitor’s business was some of the earliest business cases in common law. [It’s illegal to threaten another’s business to cause “mayhem and vex with suits”!]

As OCLC is in Ohio, they get a bonus tort – Ohio law recognizes claims for tortious interference with a contract AND tortious interference with a business relationship. Two very similar claims for the price of one.

To prove a claim of tortious interference with a contract under Ohio law, a plaintiff must prove these factors: 1. the existence of a contract (check); 2. the wrongdoer’s knowledge of the contract (check?); 3. the wrongdoer’s intentional procurement of the contract’s breach (hmmmm…); 4. the lack of justification (uh-oh); 5. resulting damages (maybe).

The elements for OCLC’s separate claim for tortious interference with business relationships are almost identical, the main distinction being that interference with a business relationship includes intentional interference with prospective contractual relations, not yet reduced to a contract.

Sometimes this law is used to stifle competition. And I suspect that’s what is happening here. And the courts have struggled for decades with the question of when competition crosses the line from “good old fashioned free market norms” into an illicit action that is tortious under the law. OCLC clearly thinks the line was crossed.

However, proving all these factors is no small task. If this gets to trial, OCLC is going to have to prove that Clarivate intentionally procured the breach of the contract. A claim for tortious interference with a contract requires the plaintiff to prove, as an element, an actual breach of contract.

And, even if Clarivate’s interference with OCLC’s contract causes damages to be suffered, that interference does not constitute a tort if the interference is “justified.” The “justification” factor requires proof that the defendant’s interference with another’s contract was “improper.” That’s another hurdle. Competition is not, generally, improper. There’s a whole set of factors to prove if something is “improper.”

(Now that I have moved all this together into a very long, but unfinished comment, I think I’ll turn this into a blog post later…. more soon. In the meanwhile – great article!)

By Kyle K. Courtney
Jun 29, 2022, 2:55 PM

The Scholarly Kitchen
