It seems that barely a month goes by these days without another acquisition in the scholarly communications and publishing space. Most of the attention has focused on major acquisitions by Elsevier and Clarivate, particularly Elsevier’s recent acquisition of Interfolio, the company behind the reporting tool Researchfish, and Clarivate’s purchase of ProQuest at the end of last year. And to be sure, their movement towards scholarly workflow tools and platforms is an extremely important development. The recent news that the Copyright Clearance Center will acquire Ringgold is an important reminder that many other firms, including not-for-profits, are actively pursuing growth strategies that contain elements other than organic growth. It is also another confirmation of the extreme strategic value of infrastructure, including in particular the persistent identifiers, lovingly known as PIDs, that is needed to advance scholarly communication in an increasingly open access environment. And it raises the question of whether infrastructure will be managed openly through community-governed organizations or the extent to which the sector can live with its privatization.
The Copyright Clearance Center
Traditionally, Copyright Clearance Center’s business model was straightforwardly captured in their name. They provide collective copyright licensing and agency services for copyright holders. They are perhaps best known in some circles for the “e-reserves” suit against Georgia State, but their work extends far beyond litigation. CCC has roots as a community organization, with a clear sense today of its own strategic direction. As an organizational form, CCC is somewhat unusual in being considered a not-for-profit organization for state purposes while being treated as a regular corporation by the federal government.
For the past decade, CCC has been gradually working to combine its licensing and agency expertise with a digital-first search and analytics business. This transition began with the launch of the RightFind suite of products, and their notable acquisition of Boston-based Pubget in 2012, a product that was designed to help researchers in academia and industry find appropriate legal copies of research works based on aggregated data from a range of metadata sources. Their most recent discovery service RightFind Navigate is a knowledge management platform that combines publicly available and subscribed content along with a company’s own internal resources, building it into a custom knowledge graph. It’s hard to say what CCC’s RightFind Navigate is directly competing against, as the platform covers a lot of ground and appears to be a combination of traditional knowledge management products like Document 360, intelligence solutions like Clarivate Cortellis, search solutions, and document delivery suites like Reprints Desk Article Galaxy.
CCC has also made a big move into the open access management space with a publisher-facing product, RightsLink for Scientific Communications (RLSC), aimed at making the notorious administrative challenges of transformative agreements easier to manage. As we’ll discuss in this post, adding value to RLSC is the most obvious immediate benefit that Ringgold can offer to CCC.
As one of us (Phill) and our colleague Alice Meadows have discussed previously here and here, PIDs are an increasingly important, if often misunderstood, component of scholarly infrastructure. As David Worlock points out, PIDs are critical enablers of discovery. They also have important use cases in rights management (a business CCC is deeply embedded in) and the supply chain. As the volume of content has exploded due to ever-increasing levels of research activity and the open research movement (enabled by advances in information technology), there has been a similar explosion in the diversity of the types of content that are disseminated, including research data, open methods, open citations, source code, and so on. There’s a growing need for search and discovery that goes beyond typing in a keyword and hoping for the best.
Ringgold’s primary business is the provision of institutional identifiers and metadata so that publishers can effectively manage their customer lists. Institutional identification is also very much a live issue in the world of transformative deals, where submitted manuscripts need to be associated with the institution, or institutions, paying for publication.
Ringgold’s management of identifiers is less about assigning numbers than about tying each identifier to its metadata and to the downstream use cases of that identifier. Ringgold targeted a very specific use case for institutional IDs. Importantly, Ringgold invested heavily in connecting institutions with their related entities, a notoriously thorny challenge in research information management. They serve this community by curating the metadata associated with the institutional IDs they created, because these data change so frequently, particularly the related entity information. These data need to be maintained, and that is messy, expensive work.
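To illustrate why related-entity data matters for transformative deals, here is a minimal Python sketch of rolling an author's affiliation up an institutional hierarchy until it reaches an organization that holds an agreement. Everything in it is assumed for illustration: the identifiers (`ORG-0100` and so on), the `PARENT` table, and the `ancestors` and `covered_by` helpers are hypothetical and do not reflect Ringgold's actual data model or any CCC product.

```python
from typing import Dict, List, Optional, Set

# Hypothetical parent links: child identifier -> parent identifier.
# None marks a top-level institution. These IDs are made up.
PARENT: Dict[str, Optional[str]] = {
    "ORG-0100": None,        # a university
    "ORG-0110": "ORG-0100",  # its medical school
    "ORG-0111": "ORG-0110",  # a research institute inside the medical school
    "ORG-0200": None,        # an unrelated institution
}


def ancestors(org_id: str) -> List[str]:
    """Return the chain of parents, nearest first, guarding against cycles."""
    chain: List[str] = []
    seen = {org_id}
    current = PARENT.get(org_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = PARENT.get(current)
    return chain


def covered_by(org_id: str, agreement_holders: Set[str]) -> Optional[str]:
    """Find the nearest organization in the hierarchy that holds an agreement."""
    for candidate in [org_id, *ancestors(org_id)]:
        if candidate in agreement_holders:
            return candidate
    return None


# The university holds the agreement; the institute is covered through it.
holder = covered_by("ORG-0111", {"ORG-0100"})
```

The lookup itself is trivial. The expensive part, as described above, is keeping the parent links accurate as institutions merge, rename, and reorganize.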
CCC cares about PIDs
CCC’s acquisition of Ringgold is the latest step in their own transformation from a content-based business to a workflow and analytics information technology company. The press release that announced the acquisition gives some indication of CCC’s thinking in buying Ringgold:
“Globally unique PIDs are essential for creating connections between articles, researchers, institutions, and funders,” said Tracey Armstrong, President and CEO, CCC. “We look forward to collaborating with Ringgold and industry stakeholders to further invest in identifiers to power interoperability, and data-driven applications. In particular, we will collaborate with partners to infuse PIDs earlier in the research lifecycle, addressing market demand for consistent use of PIDs in the article workflow.”
“Earlier in the research lifecycle” is the key element of the whole deal. It is an important departure from the standard practice that has emerged up to this point.
In recent years, many companies (CCC included, along with Digital Science, Elsevier, Clarivate, a number of publishers, and a host of startups and initiatives) have invested in combining multiple sources of scholarly metadata and harmonizing them through manual curation, machine learning, or a combination of both. The aim is to build databases that expose the connections between objects in the scholarly ecosystem, such as articles, researchers, institutions, grants, funders, journals, publishers, and disciplines.
Unfortunately, however cleverly data sources are combined, no matter how well thought out the data model and how smart the algorithms are, there will always be gaps and errors as the computer is forced to make a best guess based on incomplete information. As any computer scientist will tell you: garbage in, garbage out. That is why the most savvy and successful research graph builders, including the ones mentioned by name above, are also working to increase adoption of PIDs and improve the workflows that associate metadata with entities as far upstream as possible.
Given the growing trend toward open access and the free availability of content, corporations in scholarly communications are recognizing that their businesses need to be based on more than simply collecting rights, distributing content, and licensing secondary uses of that content — the business that CCC is focused on. Making an investment of this sort in a PID organization gives CCC several interesting opportunities to integrate PIDs earlier in the knowledge and information supply chain.
So, why Ringgold? Why not just use ROR?
CCC clearly has an interest in persistent identifiers and the enhanced ability to build accurate knowledge graphs based on connections between entities, but why would CCC need to own a PID company rather than just making use of identifiers? This question is particularly pressing given that the new research organization PID on the block, ROR, as a community-led open identifier, might be seen as making Ringgold superfluous. Over email, Phill asked Babis Marmanis, CTO of CCC, what drew them to Ringgold. This was his reply:
“At CCC we consider data quality as a differentiator, and the key ingredient for solving many of the problems that one encounters in the scholarly communications landscape today….Ringgold is a recognized leader in persistent identifiers for organizations and institutions and they have created a rich collection of associated metadata for each organization in their database. We intend to make the Ringgold data even better, expand in areas that our customers require more information, optimize processes, and integrate with customers and partners to enable the solution of many information problems in the industry.”
Similarly, Laura Cox, Chief Financial and Operating Officer at Ringgold, wrote this, also over email:
“…We will be working together to fully integrate Ringgold and CCC and plan to invest in infrastructure while we simultaneously work to expand partnerships and explore opportunities to collaborate and experiment to address new uses cases where PIDs can be used earlier in the research lifecycle.”
So it’s not just about Ringgold being an identifier, or about the associated metadata. Ringgold have invested a huge amount of work into mapping the relationships between organizations and name variations (e.g., University of Oxford vs Oxford University). It’s not just a standard number for each entity, but a set of cross-references enabling a messy list of organizations to be readily resolved. They’ve built this because their primary business model revolves around enabling publishers to manage customer lists, given how messy institutional names and hierarchies are.
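To make the idea of cross-references concrete, here is a minimal Python sketch of how a table of name variants can resolve a messy list of affiliation strings to a single identifier. The identifiers (`ORG-0001`, `ORG-0002`), the `VARIANTS` table, and the `normalize` and `resolve` helpers are all hypothetical, invented for illustration; this is not Ringgold's data or matching method, which we have not seen.

```python
import re
from typing import Dict, Optional

# Hypothetical cross-reference table: normalized name variant -> identifier.
# A curated database would hold many more variants per organization.
VARIANTS: Dict[str, str] = {
    "university of oxford": "ORG-0001",
    "oxford university": "ORG-0001",
    "univ of oxford": "ORG-0001",
    "university of cambridge": "ORG-0002",
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def resolve(name: str) -> Optional[str]:
    """Return the identifier for a known variant, or None if unrecognized."""
    return VARIANTS.get(normalize(name))


messy_list = [
    "University of Oxford",
    "Oxford University",
    "Univ. of Oxford",
    "Oxford Brookes",  # a different institution, absent from the table
]
resolved = {name: resolve(name) for name in messy_list}
```

In this sketch the first three strings resolve to the same identifier, while the fourth returns `None` and is left unmatched. That is deliberate: an exact lookup against curated variants declines to guess, whereas a fuzzy match could wrongly merge two distinct institutions that share a place name.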
Up to now, at least, ROR is simply an identifier. The ROR identifier connects an institutional name with a persistent URI (Uniform Resource Identifier) so that the name can be referenced in other systems. ROR currently only identifies institutions at the organizational level, not at a sub-organizational level, such as departments or research institutes. As a relatively new identifier, it doesn’t yet have the same level of metadata and cross-references that Ringgold brings to bear. For this reason, Ringgold alone has the potential to be an extremely valuable resource for CCC in a variety of ways. For example, for RightFind Navigate, the ability to more accurately associate people, publications, outputs, grants, patents, and other entities to institutions would likely improve the quality of knowledge graphs and by extension, search results. For RightsLink for Scientific Communications, anything that improves matching of publications to institutions with transformative deals would be a big win.
Finally, we shouldn’t ignore the fact that Ringgold IDs have been around a lot longer than ROR and are more embedded in research infrastructure. Organizational affiliations in ORCID have been validated against Ringgold’s database since 2013, and Ringgold IDs have been used for many years by a variety of database providers, such as EBSCO. Ringgold has real community value, and commensurate sources of revenue.
Given that Ringgold offers unique value that is not on ROR’s current roadmap, but that ROR has drawn some attention away from Ringgold in recent years as a community initiative, bringing Ringgold into CCC may be beneficial not only for CCC, but also for the broader community that relies on Ringgold. If CCC continues to maintain Ringgold, not only for its own use, but for that of its community of publishers and information organizations, Ringgold’s sustainability will be better assured and CCC’s position in the market as controller of that infrastructure will be strengthened.
The question now is, assuming that CCC can successfully integrate Ringgold into its business, what will it do with a more powerful market position? Will it treat Ringgold as a community resource and perhaps even oversee a constructive merger of ROR and Ringgold? Or will it follow in the footsteps of other major information businesses, seeking to build supercontinents, and leverage control of the data for its own growth and strategic goals?
In some ways, CCC’s ability to completely wall off the institutional identifier itself is limited by Ringgold’s connection with the International Standard Name Identifier (ISNI) system, an ISO standard for identifying the names of entities. Ringgold moved early to adopt the ISNI system and is formally a Registration Agency for it, assigning ISNIs to institutions since 2012. As such, the identifier assignment is governed by ISO rules. Any move to control the data in the way some have envisioned would be limited by this contractual relationship with ISO. CCC could exit that relationship, but it would then lose the connection to the international identifier community that has served Ringgold reasonably well. This doesn’t mean that the identifier or metadata is freely available, but that the data should be available under reasonable and non-discriminatory terms and only on a cost-recovery basis. Of course, there is much more robust information associated with the identifier than the basic, kernel metadata. Services built on that richer information, such as pattern matching or RightsLink, are ways that CCC can leverage these data to make a return on its investment.
The identifier community remains strangely fragmented, resulting in overlaps, inefficiencies, and strategic limitations. One might have thought that Ringgold putting itself up for sale could have resulted in Crossref or ORCID, for example, taking steps to begin the consolidation of identifier providers. We look forward to seeing whether CCC’s entry into this infrastructure space spurs greater community consolidation over time, or whether it results in the privatization of infrastructure going forward.
Note: Todd Carpenter is Committee Manager for the ISO Technical Subcommittee (ISO TC 46/SC 9) that manages ISO’s content identification systems, such as ISBN, ISSN, DOI and ISNI. Additionally, in March 2013, NISO published the Institutional Identification Recommended Practice, which provided guidance on the use of ISNI for this purpose. This Recommended Practice was published nearly 6 years before the launch of ROR and has not been updated to reflect the changes in the marketplace in the intervening years.