As a former full-time PID person (until recently I was ORCID’s Director of Communications), I am convinced of the important role that persistent identifiers (PIDs) play in supporting a robust, trusted, and open research information infrastructure. We already have open PIDs for research people (ORCID iDs) and research outputs (DOIs), but what about research organizations? While organization identifiers do already exist (Ringgold identifiers, for example, have been widely adopted; Digital Science’s GRID is still relatively new), until recently there has been no truly open equivalent. But that’s changing, as you will learn in this interview with the team behind the newly launched Research Organization Registry—ROR.
What is ROR?
ROR stands for “Research Organization Registry”—a community-led project to develop an open, sustainable, usable, and unique identifier for every research organization in the world. When we talk about ROR we are alternately or sometimes simultaneously referring to a project, a layer of infrastructure, an identifier registry, an element of metadata, and/or a community of people.
The ROR registry launched in January 2019. It currently holds records for close to 100,000 organizations, all with unique ROR IDs and associated metadata. In addition to the registry itself, there are tools and interfaces for working with ROR data, such as a front-end search, an open API, a reconciler that works with OpenRefine to clean up messy affiliation data, and more.
How did ROR come about and who was involved?
ROR officially launched in 2019, although the effort to bring it to fruition goes back several years. ROR’s origins are in the OrgID initiative, in which 17 different organizations (representing publishers, libraries, platform providers, metadata services, and other stakeholders) worked together to define a vision for a community-led registry of organization identifiers. At that time, no organization identifier registry focused on the use case of affiliation disambiguation while also being truly open, community-supported, sustainable, and integrated with other foundational infrastructure.
The collaboration that ultimately led to ROR entailed requirements-gathering, community workshops, working groups, and proposals to develop a core set of specifications and recommendations for the registry, and to seek expressions of interest from organizations that wanted to be involved in implementing and running it. An analysis of existing organization identifier data was conducted; GRID was declared the best fit for the affiliation problem, and its curator, Digital Science, was happy to donate the data to the community under a CC0 license.
In the discussions and planning process that followed, it became clear that building and launching a pilot registry would be a practical place to start, with governance and other community layers ultimately built around it. In late 2018, an initial steering group consisting of California Digital Library, Crossref, DataCite, and Digital Science was tasked with implementing the pilot, seeding the registry with this donation of data from GRID. The pilot was called the Research Organization Registry and thus ROR was born!
ROR aims to fill a crucial gap in scholarly infrastructure by enriching the network of open persistent identifiers that help us to discover and track research outputs. The academic community wants and needs to be able to answer questions about the “who”, the “what”, and the “where” of research. Researchers are using ORCID iDs to address the “who”, and DOIs are assigned to articles, datasets, dissertations, and other outputs to define the “what”. However, before ROR, a fully open identifier for the “where” was missing.
Persistent identifiers are increasingly central to the global research landscape, facilitating access to research as well as the tracking of research use and impact. But without an open and community-governed identifier for the institutions affiliated with authors and outputs, identifiers for people and outputs only take us so far. We see ROR as a key missing piece of the puzzle.
How do ROR identifiers differ from other organization identifiers — and do we really need another one!?
The response to ROR’s launch as well as the collaborative effort that preceded it has demonstrated the existence of both a need and a desire among the scholarly community for open and trusted infrastructure to identify research organizations—the specific niche that ROR seeks to fill. While other organization identifiers preceded ROR, none of them specifically addressed community governance needs or the focused use case of affiliation.
ROR currently maps its IDs to other identifiers for the same organization, such as GRID, Wikidata, ISNI, and Crossref’s Funder ID. This kind of interoperability and ability to link and crosswalk identifiers is central to ROR’s aims.
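To make the idea of crosswalking concrete, here is a minimal sketch of how a consumer of registry data might resolve another identifier to a ROR ID. The record layout below is a simplified assumption rather than the actual ROR schema, the function names are hypothetical, and every identifier value other than the ROR ID quoted in this interview is a made-up placeholder.

```python
# Simplified, hypothetical registry records. Real ROR records carry more fields,
# and the exact layout of external identifiers should be checked against the ROR schema.
RECORDS = [
    {
        "id": "https://ror.org/0168r3w48",  # ROR ID quoted in this interview
        "external_ids": {  # placeholder values, not real identifiers
            "GRID": "grid.example.1",
            "ISNI": "0000 0000 0000 0001",
        },
    },
    {
        "id": "https://ror.org/0example00",  # made-up ROR ID for illustration
        "external_ids": {
            "GRID": "grid.example.2",
            "Wikidata": "Q0000000",
        },
    },
]


def build_crosswalk(records):
    """Index every (scheme, value) pair so that it resolves to a ROR ID."""
    index = {}
    for record in records:
        for scheme, value in record.get("external_ids", {}).items():
            # Lower-case the scheme so "GRID" and "grid" are treated alike.
            index[(scheme.lower(), value)] = record["id"]
    return index


def to_ror(index, scheme, value):
    """Return the ROR ID mapped to another identifier, or None if unmapped."""
    return index.get((scheme.lower(), value))
```

A system that already stores GRID or ISNI identifiers could build such an index once from a registry data dump and then translate its existing records to ROR IDs without re-matching affiliation names.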
So to answer the second part of this question, we do need another one! However, what is needed is not just any organization identifier, but one that meets the following criteria:
- Community-driven. ROR is uniquely focused on building accessible infrastructure by and for the scholarly community. No single organization should “own” ROR.
- Focused in scope. ROR is specifically focused on capturing and identifying the affiliations associated with research outputs. It is not meant to be a comprehensive registry of all legal entities in the world, nor is it focused on identifying departments or sub-units within an organization. ROR’s aim is to provide an open and usable registry of top-level research organizations.
- Open. ROR data is CC0 and will always be free and open for all. We believe that data on research outputs by institutions should not be locked in a commercial database or behind a paywall.
- Embedded in and interoperable with existing infrastructure. ROR IDs are already supported in DataCite metadata and will soon also be included in Crossref metadata. This means that repositories, publishers, and others registering metadata in DataCite and Crossref can collect ROR IDs for affiliations and include these in their deposits, making it easier to track and discover research outputs by specific affiliations. This interoperability, as well as widespread adoption in foundational scholarly infrastructures, is another unique feature of ROR.
Who do you hope will use ROR identifiers and why?
Librarians and academic administrators increasingly need access to data on their institutions’ publications and research outputs in order to support reporting requirements, funder and government mandates, institutional open access policies, and library collection development and licensing negotiations. We think that ROR can play a key role in serving this need. As a participant expressed during a recent webinar, “I needed ROR yesterday!”
There are several specific implementations of ROR that are worth mentioning to shed light on the broad-ranging applications of an open identifier for research organizations.
First, a unique identifier for affiliations is far more usable and effective than a free-text field. A free-text field allows users to write a given affiliation any number of ways (think University of California-San Diego vs. UCSD vs. UC San Diego, for example), but a ROR ID is unambiguous (https://ror.org/0168r3w48). So we hope that ROR IDs will be implemented in any system that collects affiliations, from repositories to manuscript tracking systems to funder platforms and more. An affiliation field in a form can perform a simple call to the ROR API so that when the user starts typing an affiliation, the API finds possible matches from ROR’s controlled list of institutions (read more about this type of implementation here). The user does not even have to know that a ROR ID is being captured. This is a small implementation that can have a huge payoff, without any additional work on the part of the researcher.
Second, we are excited that DataCite is supporting ROR IDs in its metadata and that Crossref will soon do the same. This means that repositories, publishers, and others can collect ROR IDs in their own systems and include them in the metadata that they deposit in DataCite and Crossref, so that this information becomes searchable in DataCite's and Crossref's systems.
Third, we believe that ROR data will be used by and useful for anyone who needs to track or collect institutional research outputs — research administrators, policy administrators, funders, librarians, institutional repositories, and others.
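The affiliation lookup described in the first example above can be sketched roughly as follows. This is an illustration under stated assumptions, not ROR's reference implementation: the endpoint URL, the `query` parameter, and the `items`, `name`, and `id` fields reflect one understanding of the ROR API and should be checked against its current documentation. The function names and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current ROR API documentation.
ROR_API = "https://api.ror.org/organizations"


def build_lookup_url(typed_text):
    """Build the search URL for whatever the user has typed so far."""
    return f"{ROR_API}?{urlencode({'query': typed_text})}"


def to_suggestions(payload, limit=5):
    """Turn an API response into (label, ror_id) choices for a form field."""
    suggestions = []
    for item in payload.get("items", [])[:limit]:
        suggestions.append(
            {"label": item.get("name", ""), "ror_id": item.get("id", "")}
        )
    return suggestions


def lookup(typed_text):
    """Call the API over the network and return suggestions for the form."""
    with urlopen(build_lookup_url(typed_text), timeout=10) as response:
        return to_suggestions(json.load(response))


# Illustrative payload in the assumed response shape (not real API output).
SAMPLE_PAYLOAD = {
    "items": [
        {"id": "https://ror.org/0168r3w48", "name": "UC San Diego"},
    ]
}
```

The form shows the label to the user and stores the `ror_id` value alongside it, which is why the researcher never needs to know that an identifier is being captured.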
You’ve already got some early adopters – who are they and how are they using (or planning to use) ROR identifiers?
We have some great examples of ROR implementations and are looking forward to seeing others coming to fruition soon.
When Dryad relaunched its data publishing platform a couple of months ago, it included an affiliation field for the first time so that datasets could be associated with an affiliation. Instead of letting researchers enter their affiliations as free text, Dryad implemented a ROR API lookup enabling the researcher to choose from the controlled list of institutions in the ROR database. This is completely invisible to the researcher, but it means that Dryad can now collect clean and consistent affiliation data for all its datasets.
Also, as mentioned above, DataCite has now updated its metadata schema to include support for ROR IDs. So when a repository like Dryad sends metadata to DataCite, it can include ROR IDs in this deposit. DataCite has also implemented ROR IDs in its DOI registration form, and now includes an affiliation facet (tied to ROR) in its front-end search, so it’s easy to look up datasets and other objects by the institution (the DataCite blog includes more details about how this works).
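As a rough illustration of what including a ROR ID in a deposit might look like, the sketch below builds a creator entry with an identified affiliation. The field names (`affiliationIdentifier`, `affiliationIdentifierScheme`, `schemeUri`) follow one reading of the DataCite metadata schema and are assumptions to verify against DataCite's documentation. The helper name and the example creator are hypothetical.

```python
ROR_PREFIX = "https://ror.org/"


def creator_with_ror(creator_name, affiliation_name, ror_id):
    """Build a creator entry whose affiliation carries a ROR ID.

    Field names are assumed from the DataCite schema; confirm before use.
    """
    if not ror_id.startswith(ROR_PREFIX):
        # A loose sanity check only; it does not validate the ID itself.
        raise ValueError(f"expected a full ROR URL, got {ror_id!r}")
    return {
        "name": creator_name,
        "affiliation": [
            {
                "name": affiliation_name,
                "affiliationIdentifier": ror_id,
                "affiliationIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


# Hypothetical creator; the ROR ID is the one quoted earlier in this interview.
EXAMPLE_CREATOR = creator_with_ror(
    "Researcher, Example", "UC San Diego", "https://ror.org/0168r3w48"
)
```

Keeping the human-readable affiliation name next to the identifier means downstream systems that do not yet understand ROR IDs can still display the affiliation.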
Other implementations at various stages of development include Crossref, Rescognito, Altum’s ProposalCentral, Cobaltmetrics, DataSalon, and the FREYA project. And there are others who expect to be able to make announcements about their use of ROR soon.
Can you share any lessons learned from their experiences so far?
We have seen a great deal of enthusiasm about ROR implementations, both from those implementing and from those benefiting from implementations. One lesson learned is just how wide-ranging the application of ROR IDs can be: so many systems can benefit from them! Another lesson is that the implementation can be quite simple and does not need to require a great deal of developer time, as the Dryad experience showed. The Dryad implementation has also underscored the importance of understanding ROR in the context of end-to-end workflows and how the metadata travels downstream, as well as how different systems using ROR IDs need to connect (e.g., Dryad and DataCite). Finally, we are learning that while the universe of affiliations is relatively small compared to the scope of, e.g., ORCID, there will always be affiliations that are not already included in ROR, and we need robust workflows to support long-term curation of the registry.
How are you going to ensure that ROR really is persistent — what’s your business model?
The social and cultural aspect of adopting new technologies is often far more crucial than simply having the technology available, so the level of engagement and action ROR is seeing is a good indicator of long-term persistence.
In terms of our sustainability plan, the organizations leading ROR have made a commitment to continue operating ROR with in-kind staff resources, but we recognize that additional dedicated resources are needed to support technical development and wider ROR adoption. We are currently running a fundraising campaign to fund the hiring of two FTEs for development and adoption and to cover the basic costs of running the registry, such as server hosting. This campaign will last through the end of 2021, and you can see the list of early ROR supporters on the website. We plan to launch a paid service tier in 2022 to cover operational costs, while keeping the registry data itself open and free, always.
What’s next for ROR? Where do you see the initiative in one, five, or ten years’ time?
We envision ROR being integrated into all layers of the scholarly communication landscape in the next five years, starting with implementations like the ones mentioned above and eventually becoming the “new normal” for how we all handle affiliations. Ten years from now, we should have proven that, with extensive community support, we can build and sustain this kind of infrastructure without the unnecessary overhead of forming a new organization or new membership model.
We’re also envisaging full global adoption as it will take more than the “usual suspects” to make ROR truly successful long-term. To start, we have just announced four additions to the ROR Steering Group, including the Academy of Science of South Africa and the Japan Science & Technology Agency, as well as the Association of Research Libraries and the Coalition for Networked Information. So, we’re looking forward to a bright and broad ROR future!
Thanks so much for the invitation to be interviewed; anyone with questions or implementation ideas can get in touch at email@example.com.