It seems that barely a month goes by these days without another acquisition in the scholarly communications and publishing space. Most of the attention has focused on major acquisitions by Elsevier and Clarivate, particularly Elsevier’s recent acquisition of Interfolio, the company behind the reporting tool Researchfish, and Clarivate’s purchase of ProQuest at the end of last year. And to be sure, their movement towards scholarly workflow tools and platforms is an extremely important development. The recent news that the Copyright Clearance Center will acquire Ringgold is an important reminder that many other firms, including not-for-profits, are actively pursuing growth strategies that contain elements other than organic growth. It is also another confirmation of the extreme strategic value of infrastructure, including in particular the persistent identifiers, lovingly known as PIDs, that is needed to advance scholarly communication in an increasingly open access environment. And it raises the question of whether infrastructure will be managed openly through community-governed organizations or the extent to which the sector can live with its privatization.
The Copyright Clearance Center
Traditionally, Copyright Clearance Center’s business model was straightforwardly captured in their name. They provide collective copyright licensing and agency services for copyright holders, perhaps best known in some circles for the “e-reserves” suit against Georgia State, but in ways that extend far beyond litigation. CCC has roots as a community organization, with a clear sense today of its own strategic direction. As an organizational form, CCC is somewhat unusual: it is considered a not-for-profit organization for state purposes while it is treated as a regular corporation by the federal government.
For the past decade, CCC has been gradually working to combine its licensing and agency expertise with a digital-first search and analytics business. This transition began with the launch of the RightFind suite of products, and their notable acquisition of Boston-based Pubget in 2012, a product that was designed to help researchers in academia and industry find appropriate legal copies of research works based on aggregated data from a range of metadata sources. Their most recent discovery service RightFind Navigate is a knowledge management platform that combines publicly available and subscribed content along with a company’s own internal resources, building it into a custom knowledge graph. It’s hard to say what CCC’s RightFind Navigate is directly competing against, as the platform covers a lot of ground and appears to be a combination of traditional knowledge management products like Document 360, intelligence solutions like Clarivate Cortellis, search solutions, and document delivery suites like Reprints Desk Article Galaxy.
CCC has also made a big move into the open access management space with a publisher-facing product, RightsLink for Scientific Communications (RLSC), aimed at making the notorious administrative challenges of transformative agreements easier to manage. As we’ll discuss in this post, adding value to RLSC is the most obvious immediate benefit that Ringgold can offer to CCC.
As one of us (Phill) and our colleague Alice Meadows have discussed previously here and here, PIDs are an increasingly important, if often misunderstood, component of scholarly infrastructure. As David Worlock points out, PIDs are critical enablers of discovery. They also have important use cases in rights management (a business CCC is deeply embedded in) and the supply chain. As the volume of content has exploded due to ever-increasing levels of research activity and the open research movement, enabled by advances in information technology, there has been a similar explosion in the diversity of the types of content that are disseminated, including research data, open methods, open citations, source code, and so on. There’s a growing need for search and discovery that goes beyond typing in a keyword and hoping for the best.
Ringgold’s primary business is in the provision of institutional identifiers and metadata so that publishers can effectively manage their customer lists. That need is very much alive in the world of transformative deals, where submitted manuscripts need to be associated with the institution, or institutions, that are paying for publication.
Ringgold’s management of identifiers is less about assigning numbers than about tying each identifier to its metadata and to the downstream use cases of that identifier. Ringgold has targeted a very specific use case of institutional IDs. Importantly, Ringgold invested heavily in connecting institutions with their related entities, a notoriously thorny challenge in research information management. They serve this community by curating the metadata associated with the institutional IDs they created, because these data change so frequently, particularly the related entity information. These data need to be maintained, and that is messy, expensive work.
CCC cares about PIDs
CCC’s acquisition of Ringgold is the latest step in their own transformation from a content-based business to a workflow and analytics information technology company. The press release that announced the acquisition gives some indication of CCC’s thinking in buying Ringgold:
“Globally unique PIDs are essential for creating connections between articles, researchers, institutions, and funders,” said Tracey Armstrong, President and CEO, CCC. “We look forward to collaborating with Ringgold and industry stakeholders to further invest in identifiers to power interoperability, and data-driven applications. In particular, we will collaborate with partners to infuse PIDs earlier in the research lifecycle, addressing market demand for consistent use of PIDs in the article workflow.”
“Earlier in the research lifecycle” is the key element of the whole deal. It is an important departure from the standard practice that has emerged up to this point.
In recent years, many companies, including CCC, Digital Science, Elsevier, Clarivate, a number of publishers, and a host of startups and initiatives, have invested in combining multiple sources of scholarly metadata and harmonizing them through manual curation, machine learning, or a combination of both. The aim is to build databases that expose the connections between objects in the scholarly ecosystem, such as articles, researchers, institutions, grants, funders, journals, publishers, and disciplines.
Unfortunately, however cleverly data sources are combined, no matter how well thought out the data model and how smart the algorithms are, there will always be gaps and errors as the computer is forced to make a best guess based on incomplete information. As any computer scientist will tell you: garbage in, garbage out. That is why the most savvy and successful research graph builders, including the ones mentioned by name above, are also working to increase adoption of PIDs and improve the workflows that associate metadata with entities as far upstream as possible.
Given the growing trend toward open access and the free availability of content, corporations in scholarly communications are recognizing that their businesses need to be based on more than simply collecting rights, distributing content, and licensing secondary uses of that content — the business that CCC is focused on. Making an investment of this sort in a PID organization gives CCC several interesting opportunities to integrate PIDs earlier in the knowledge and information supply chain.
So, why Ringgold? Why not just use ROR?
CCC clearly has an interest in persistent identifiers and the enhanced ability to build accurate knowledge graphs based on connections between entities, but why would CCC need to own a PID company rather than just making use of identifiers? This question is particularly pressing given that the new research organization PID on the block, ROR, as a community-led open identifier, might be seen as making Ringgold superfluous. Over email, Phill asked Babis Marmanis, CTO of CCC, what drew them to Ringgold. This was his reply:
“At CCC we consider data quality as a differentiator, and the key ingredient for solving many of the problems that one encounters in the scholarly communications landscape today….Ringgold is a recognized leader in persistent identifiers for organizations and institutions and they have created a rich collection of associated metadata for each organization in their database. We intend to make the Ringgold data even better, expand in areas that our customers require more information, optimize processes, and integrate with customers and partners to enable the solution of many information problems in the industry.”
Similarly, Laura Cox, Chief Financial and Operating Officer at Ringgold, wrote this, also over email:
“…We will be working together to fully integrate Ringgold and CCC and plan to invest in infrastructure while we simultaneously work to expand partnerships and explore opportunities to collaborate and experiment to address new uses cases where PIDs can be used earlier in the research lifecycle.”
So it’s not just about Ringgold being an identifier, or about the associated metadata. Ringgold have invested a huge amount of work into mapping the relationships between organizations and name variations (e.g., University of Oxford vs Oxford University). It’s not just a standard number for each entity, but a set of cross-references enabling a messy list of organizations to be readily resolved. They’ve built this because their primary business model revolves around enabling publishers to manage customer lists, given how messy institutional names and hierarchies are.
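To make the idea of a cross-reference table concrete, here is a minimal sketch in Python of how name variants might be resolved to a single identifier. It is an illustration only: the table, the `normalize` and `resolve` helpers, and the `ORG-…` identifiers are all hypothetical, not real Ringgold IDs or Ringgold's actual method.

```python
import re

# Hypothetical cross-reference table: normalized name variant -> organization ID.
# These IDs are invented for illustration; they are not real Ringgold IDs.
VARIANTS = {
    "university of oxford": "ORG-0001",
    "oxford university": "ORG-0001",
    "univ of oxford": "ORG-0001",
    "university of cambridge": "ORG-0002",
    "cambridge university": "ORG-0002",
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def resolve(names):
    """Map each raw name to an ID, or None when no known variant matches."""
    return {raw: VARIANTS.get(normalize(raw)) for raw in names}


messy = ["Oxford University", "University of Oxford", "Univ. of Oxford", "Oxford Brookes"]
resolved = resolve(messy)
```

In this sketch the three Oxford spellings collapse to one ID, while the unlisted "Oxford Brookes" stays unresolved. That unresolved remainder is where the curation effort described above comes in.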
Up to now, at least, ROR is simply an identifier. The ROR identifier connects an institutional name with a persistent URI (Uniform Resource Identifier) so that the name can be referenced in other systems. ROR currently only identifies institutions at the organizational level, not at a sub-organizational level, such as departments or research institutes. As a relatively new identifier, it doesn’t yet have the same level of metadata and cross-references that Ringgold brings to bear. For this reason, Ringgold alone has the potential to be an extremely valuable resource for CCC in a variety of ways. For example, for RightFind Navigate, the ability to more accurately associate people, publications, outputs, grants, patents, and other entities to institutions would likely improve the quality of knowledge graphs and by extension, search results. For RightsLink for Scientific Communications, anything that improves matching of publications to institutions with transformative deals would be a big win.
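As a rough illustration of why sub-organizational relationships matter for matching publications to transformative deals, the following Python sketch walks a hypothetical hierarchy from a department up to the first parent institution covered by an agreement. The `PARENT` and `AGREEMENTS` data, the IDs, and the `covering_institution` function are assumptions made for this example; they do not describe how RightsLink or Ringgold actually work.

```python
# Hypothetical parent links: child organization ID -> parent organization ID.
PARENT = {
    "INST-ENERGY": "DEPT-CHEM",  # a research institute inside a department
    "DEPT-CHEM": "UNI-A",
    "DEPT-HIST": "UNI-B",
}

# Hypothetical set of institutions that hold a transformative agreement.
AGREEMENTS = {"UNI-A"}


def covering_institution(org_id):
    """Walk up the hierarchy and return the first organization with an
    agreement, or None if neither the organization nor any ancestor has one."""
    seen = set()  # guards against accidental cycles in the parent data
    current = org_id
    while current is not None and current not in seen:
        if current in AGREEMENTS:
            return current
        seen.add(current)
        current = PARENT.get(current)
    return None
```

Under these assumptions, an author affiliated with `INST-ENERGY` would be matched to `UNI-A`, while one affiliated with `DEPT-HIST` would have no covering agreement. An identifier that stops at the organizational level could not make the first match without additional relationship data.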
Finally, we shouldn’t ignore the fact that Ringgold IDs have been around a lot longer than ROR and are more embedded in research infrastructure. Organizational affiliations in ORCID have been validated against Ringgold’s database since 2013, and Ringgold IDs have been used for many years by a variety of database providers, such as EBSCO. Ringgold has real community value, and commensurate sources of revenue.
Given that Ringgold offers unique value that is not on ROR’s current roadmap, but that ROR has drawn some attention away from Ringgold in recent years as a community initiative, bringing Ringgold into CCC may be beneficial not only for CCC, but also for the broader community that relies on Ringgold. If CCC continues to maintain Ringgold, not only for its own use, but for that of its community of publishers and information organizations, Ringgold’s sustainability will be better assured and CCC’s position in the market as controller of that infrastructure will be strengthened.
The question now is, assuming that CCC can successfully integrate Ringgold into its business, what will it do with a more powerful market position? Will it treat Ringgold as a community resource and perhaps even oversee a constructive merger of ROR and Ringgold? Or will it follow in the footsteps of other major information businesses, seeking to build supercontinents, and leverage control of the data for its own growth and strategic goals?
In some ways, CCC’s ability to completely wall off the institutional identifier itself is limited by Ringgold’s connection with the International Standard Name Identifier (ISNI) system, an ISO standard for identifying the names of entities. Ringgold moved early to adopt the ISNI system and is formally a Registration Agency for it, having assigned ISNIs to institutions since 2012. As such, the identifier assignment is governed by ISO rules. Any move to control the data in the way some have envisioned would be limited by this contractual relationship with ISO. CCC could exit that relationship, but it would then lose the connection to the international identifier community that has served Ringgold reasonably well. This doesn’t mean that the identifier or metadata is freely available, but that the data should be available under reasonable and non-discriminatory terms and only on a cost-recovery basis. Of course, there is much more robust information associated with the identifier than the basic, kernel metadata. Services built on that information, such as pattern matching or RightsLink, are ways that CCC can leverage these data to make a return on its investment.
The identifier community remains strangely fragmented, resulting in overlaps, inefficiencies, and strategic limitations. One might have thought that Ringgold putting itself up for sale could have resulted in Crossref or ORCID, for example, taking steps to begin the consolidation of identifier providers. We look forward to seeing whether CCC’s entry into this infrastructure space spurs greater community consolidation over the course of time, or whether it results in the privatization of infrastructure going forward.
Note: Todd Carpenter is Committee Manager for the ISO Technical Subcommittee (ISO TC 46/SC 9) that manages ISO’s content identification systems, such as ISBN, ISSN, DOI and ISNI. Additionally, in March 2013, NISO published the Institutional Identification Recommended Practice, which provided guidance on the use of ISNI for this purpose. This Recommended Practice was published nearly 6 years before the launch of ROR and has not been updated to reflect the changes in the marketplace in the intervening years.
7 Thoughts on "Is Infrastructure Consolidation the Next Step? CCC Acquires Ringgold"
It’s worth noting that ORCID currently supports several organization ID types and will soon be using ROR as its default identifier type https://info.orcid.org/add-research-institution-identifiers-with-ror. Behind the scenes in the ORCID DB, ROR will be used to unite the various IDs it has for each organization into a single record. This project has been underway for some time and is getting close to completion https://trello.com/c/JEkqoTb5/67-epic-integrate-ror-research-organization-registry-and-rationalize-organization-ids . Full disclosure: I’m the current ROR technical lead and former technical lead at ORCID.
Thank you for this extra information about ROR. I was aware that ROR is doing this work. I remember from my Digital Science days that cross-referencing other identifiers, including both ISNI and Ringgold was part of what GRID did, which provided a good chunk of seed data for ROR.
ROR is an extremely promising initiative. Speaking for myself, I’m hugely supportive of it as an example of community-based open infrastructure. Institutional identifiers are a challenging problem because there are not only spelling variants, but names change over time and the map of relationships is complex. When you get below the institutional level into departmental, school, institute level etc, things become even more complicated and fluid.
My hope is that CCC, with its traditional community-oriented mission, will continue to work with the open infrastructure and PID community, including ROR, in ways that will help ROR scale and deliver on its own mission.
My 25 years of service on the CCC board ended in 2019, so I am not in a position to make any comment about this specific acquisition, but my long experience with the CCC taught me that it is as much a mission-driven organization, in the way nonprofits are, as it is a typical profit-oriented business. So I am confident that a concern for community service will play a major role in whatever the CCC does going forward.
With respect to the question of whether infrastructure will be managed openly through community governed organizations or the extent to which the sector can live with its privatization, it might be helpful to remind readers that by design, ORCID is independent and cannot be sold. We are a not-for-profit 501(c)(3) organization registered in the United States. As such, we are subject to laws of the US Internal Revenue Service that specify that we cannot be purchased or otherwise managed by a commercial entity.
I think the article would have been a little more balanced if you had been able to include a quote or comment from ROR. Personally, I see a downside in the fragmentation of institutional IDs away from the open, community-driven ROR, especially for the open science and open source groups and community. I think it was a smart move for Digital Science to open up GRID to be the foundation of ROR, and there’s real potential for all stakeholders to support and get behind one unifying open organizational ID system.
Thanks for commenting. I could not agree more on the general point that open, community-driven infrastructure has a huge number of advantages. These include a likelihood of greater interoperability, efficiency, reduction of researcher burden, and, I believe, greater persistence. In fact, I co-founded a company that does a lot of work in this area.
In relation to this post, though, we wanted to focus on the motivations of CCC and Ringgold in entering into this partnership, and the possible implications. Although some level of comparison between the two identifiers had to be included, we didn’t want to turn this into a Ringgold vs ROR debate. As we mention in the post, we don’t think it’s quite that simple. Certainly my hope (I’m not speaking for the other authors) is that Ringgold and ROR work together closely in the future, with a view to making the Ringgold data more open for the community, but that’s a matter for CCC now.
I’m personally planning to write more on the topic of open scholarly infrastructure, including ROR.