Early movers in any market have an advantage over their competition. The Ohio College Library Center organization (which eventually grew to become OCLC) was no different.
In 1967, before most even knew what a computer was or how it might positively affect our lives, its team based in Dublin, Ohio was making great progress in the electronic management of bibliographic information. In August of 1971, the cooperative helped the Alden Library at Ohio University launch the first online catalog of any library in the world. OCLC continued to be a pioneer in many of the moves to digitize and interconnect library catalog data, all the while improving the services and making libraries more efficient. It also began to generate significant surpluses, with which it continued to expand the services it could provide to the library community and serve a worldwide network. What was once a regional network of libraries has today become a massive institution serving a global community of more than 30,000 libraries, while also becoming one of the leading employers in the state of Ohio.
Every library needs a catalog of the items it holds and acquires. A good catalog is vital to managing a collection and supporting its circulation. This is no small effort and, if individually replicated in the thousands of libraries around the world, it would be a laborious and costly endeavor, particularly if each record were hand-curated by experts in library sciences. It absolutely made sense in the world of digital information that efficiencies could be gained by having the network of libraries share both the work and their data collaboratively, thereby reducing unnecessary duplication and improving overall quality across the network. OCLC claims, reasonably, that it has invested millions of dollars in working with librarians and publishers to build and license as comprehensive a worldwide catalog as possible. It regularly enhances the quality of its records through a variety of approaches. This has benefited the vast majority of libraries and also OCLC’s bottom line.
However, in the past ten years the ecosystem of data has changed significantly. National libraries have released troves of bibliographic records and linked data. Some libraries, notably Harvard, Penn, and Yale, have released their catalog data publicly under either a CC0 (in the case of Harvard) or an ODC-BY license. Publishers are increasingly eager to share their bibliographic data as a way to drive sales and increase usage. Machine processing of structured data has become incredibly robust, so that grouping, description, linking of materials, and enrichment are much simpler tasks for anyone with basic coding skills. Whether these moves match the quality expected of cataloging records is an open question, but one can expect that improvements will continue to erode the comparative value of the human work of cataloging.
Earlier this year, Clarivate quietly announced a new product, MetaDoor, which is described as an open platform for sharing cataloging records. Possibly building upon data gathered by a company earlier acquired by Innovative Interfaces (which eventually was folded into Clarivate), this new product is being positioned as an alternative, free structure to share catalog data in the community. In trying to recruit members to use the new service and be early adopters, Clarivate has caught the attention of OCLC, which views this new product as an obvious competitor to its flagship WorldCat service. Last week, OCLC filed a lawsuit in Ohio courts against Clarivate and its operating units, challenging both the source of the data in MetaDoor and Clarivate’s efforts to recruit participants in this data-sharing ecosystem, which OCLC says put those participants in breach of their agreements with OCLC. The suit claims predatory market behavior and tortious interference in OCLC’s contracts with its member organizations. Clarivate has strenuously objected to the claims.
The arguments around what libraries can and cannot do with the cataloging records that they create and then share have been an issue for well over 15 years now. Review of the WorldCat records policy also goes back to the mid-2000s. After OCLC announced changes to its policy in the late 2000s, some in the community rebelled against the new policy, while others supported the changes. This set in motion a new public group to review the data-sharing principles that surround the use of OCLC’s cataloging records, in response to a community petition launched by Elaine Sanchez. In 2009, OCLC launched a group that was tasked to “seek to understand today’s environment as it relates to the creation, use and transfer of data and articulate principles of shared data creation consonant with the values of the OCLC cooperative.” Shortly thereafter, a final report was issued and a new policy was released in 2010.
However, this didn’t end the conversations in the community; the question of the rights of OCLC members has been an ongoing issue that some are keen to press. The Program for Cooperative Cataloging issued a proposed policy statement on open cataloging records in January, outlining a middle ground for sharing of PCC-developed records. Another example: last year, ICOLC produced an internal report that, among other things, criticized OCLC for the costs and interoperability concerns of the records WorldCat aggregates, including limitations on what libraries and other vendors in this space can do with that data. OCLC responded privately, but based on the FAQ that accompanied the legal filing, one can surmise what its response was.
OCLC’s position is that it is working in the best interests of all libraries and does a tremendous service through its aggregation, enhancements, dissemination, and distribution of bibliographic records. Furthermore, it takes the surpluses that this business generates and invests heavily in other library services, tools, and research projects. Many have argued that OCLC is a positive force for libraries and library technology. Others have been more critical, particularly commercial players in this space.
There have been a few entrants into this ecosystem seeking to crack OCLC’s control over the bibliographic data stream. For example, in 2010 SkyRiver, a small bibliographic data services company, launched a lawsuit alleging that OCLC was engaging in anti-competitive corporate behavior by exerting monopolistic control over library data exchange. Innovative Interfaces Inc (III) also joined the lawsuit, which was dropped when SkyRiver was acquired by III in 2013. The service continues to operate and, as part of the larger Clarivate organization, likely seeded the MetaDoor service, but Clarivate’s portfolio likely contains significant amounts of data from other sources. SkyRiver’s catalog service business is a fraction of the size of OCLC’s WorldCat, with roughly 70 million records, compared to WorldCat’s 500+ million according to the complaint. What to do with this resource and how to position the service seems to have been lurking behind the scenes as III, ExLibris, ProQuest, and now Clarivate have moved through their various merger activities, most likely because strings of corporate mergers can be quite distracting. The lack of a comprehensive, unified catalog of library holdings data is a gap in Clarivate’s library services technology stack. Clarivate might also argue that less interoperability would be needed in the world of library services if all of the technology was handled by a single provider. But this sole-source provision of all services, while appealing at first glance, would also put the community troublingly at the whims of that one provider.
Obviously, if a new high-quality source of bibliographic data becomes available, it could have a negative impact on the sales of other data stores. Bibliographic data is a very substitutable good, and free is always a better price point than any price above free. Whether the free data is really as high-quality as curated data and whether it is fit for purpose are the important questions. Despite the occasional complaint, the OCLC data is of an extremely high quality. Could a similar level of data be achieved by others, with the right resources (say 74% market share of the worldwide academic ILS market and $1.9 billion in revenues per year)? Almost certainly, yes. Rather than doing the costly enrichment and quality control itself, Clarivate is seeking to leverage the collective work of the library community in enriching its service. There is certainly a reason to believe some in the library community might be motivated to contribute to an open repository of cataloging data. However, as Kaitlin Thaney at Invest in Open Infrastructure often notes, people should be wary of the interests controlling the platform on which open data/content is being shared. One should be reminded of the internet-age adage: “If you’re not paying for the product, you are the product.”
To the core claim of OCLC, it would strike me as odd if Clarivate were not scrupulous about where it gathers data, since much of the core of this data is freely available as linked data, from publishers’ feeds, or from other resources. However, in an environment of machine crawling, haphazard sharing, and a world of many billions of records, errors are bound to happen. Clarivate certainly would have the burden of ensuring it is not inappropriately republishing licensed data, in much the same way it shouldn’t be republishing text content from other publishers in its products; that is, it should know where it gathered the data from and what rights are associated with it. Of course, if there is proof to the contrary, then OCLC most certainly has a case in the United States.
As to whether this is a tortious breach, this is certainly something I am neither qualified to delve into the details of, nor something I choose to speculate on. Clearly this question will either be settled or head to the courts, because so much is riding on the outcome for OCLC. OCLC has proven adept at using Ohio law (even going so far as to lobby for changes to Ohio law) to its benefit. This is not at all a criticism; all large companies (as well as non-profits and educational institutions) use legal and lobbying systems to achieve their ends.
One question at issue centers on, in an esoteric way, the presence of an OCLC Control Number (OCN) in the catalog records that MetaDoor has shared (clause 97 on page 22 of the filing), which OCLC is taking as proof that the record did indeed originate within its records system. However, in 2013, in the early days of linked open data (which perhaps ended not long thereafter), Jim Michalko, then part of the team at OCLC Research (since retired), wrote about the value of sharing OCN data publicly and announced that OCLC had taken the decision to release the OCN data as if it is in the public domain. An Archive.org version of the OCLC WorldCat Record Use and Data Licensing terms page from 2014 states as much. It is unclear whether this remains OCLC’s policy; probably not, because the current page does not list any such language. However, to this day, OCLC’s definition of WorldShare Reports elements under the OCLC Number definition states that: “OCLC encourages the use of the OCLC Control Number in any appropriate library application, where it can be treated as if it is in the public domain.” The “as if” in that sentence is carrying a lot of weight here. Who can really tell what it means when something is kinda-sorta-public-domain-but-not-really? Perhaps it is the sort of thing that could land one in court. “Squishy” legal language often surrounds open access licenses that aren’t strictly CC-BY, causing headaches for users who can’t decide what they can or can’t do with something they find on the web. It also highlights the problem that is created once you release information onto the web with the equivalent of an open license: even if you subsequently want to take it back, sometimes it is impossible to do so in reality. I don’t want to suggest that this unclear language is indicative of how Clarivate accessed this information, or that whoever has shared it based their decision to do so on this, but it is relevant.
This is certainly not to say that Clarivate is acting in the world’s interest here. As the market-dominating powerhouse in library systems, especially in the academic market, Clarivate stands to benefit substantially from breaking the control over cataloging records that OCLC has held for decades. Many of the services that one can envision libraries (or networks of libraries) requiring will need reliable information on holdings to function properly. One solution is interoperability, but this requires a willingness to share data equitably, in standard forms, and under reasonable terms, something that isn’t always in corporate interests to pursue. The near-monopoly OCLC has on this data inhibits some potential products and services that ExLibris (and others) sought to pursue prior to the Clarivate merger. Increasing the power of an already dominant player probably is not a good thing for libraries writ large, since a lack of competition in a marketplace tends not to yield better results for customers. SPARC made this point in its unsuccessful effort to influence the FTC’s exploration of whether to block the ProQuest/Clarivate merger last year.
If Clarivate is actively recruiting libraries to share the data that they possess in breach of their contracts with OCLC, the ultimate decision on whether this sharing is permissible (or whether it in fact breaches their contracts) is taken by the library providing the cataloging data, not Clarivate. If breaches are happening, then there are libraries that might also expect an action for breach of contract. However, suing your members/customers is never a wise business strategy, so OCLC is likely to tread lightly there. According to the filing (if proven), it seems that some on Clarivate’s team are inartfully encouraging this breach or, perhaps more generously, encouraging the belief that this is not a breach; but, based on the complaint, there is probably little hard proof this is happening at scale. Perhaps more specifics will come to light in the discovery process or at trial.
Realistically, this battle is very similar to the discussions afoot in scholarly publishing around access and sharing of data related to discovery and use. The availability of open citations, the discussions about open identifiers for institutions, and the questions around open infrastructure all hinge on the ecosystem of data: who controls it, and what can be done with the data once it is aggregated. Looking to the future, there will be many battles raging about who controls what data and who can do things with it. Since everyone wants to be an “analytics company” these days, this could include usage data on open access materials, predictive analysis trained on open content, or (as in this case) leveraging the distribution of open catalog data to embed or extend your market-dominating position.
There is black gold in that data (so they say). And, when there’s gold on the table, sadly, there is usually a fight brewing about who gets to cash it in. This lawsuit likely will not be a single battle between two large players, but part of a larger war to control the data and metadata that are so valuable in our world.
DISCLOSURE: Both OCLC and Clarivate are members of NISO and Mary Sauer-Games (VP of Global Product Development, OCLC) currently serves as Chair of NISO’s Board of Directors. Neither organization nor their representatives were involved in the preparation of this post.