As a former full-time PID person (until recently I was ORCID’s Director of Communications), I am convinced of the important role that persistent identifiers (PIDs) play in supporting a robust, trusted, and open research information infrastructure. We already have open PIDs for research people (ORCID iDs) and research outputs (DOIs), but what about research organizations? While organization identifiers do already exist (Ringgold identifiers, for example, have been widely adopted; Digital Science’s GRID is still relatively new), until recently there has been no truly open equivalent. But that’s changing, as you will learn in this interview with the team behind the newly launched Research Organization Registry—ROR.
What is ROR?
ROR stands for “Research Organization Registry”—a community-led project to develop an open, sustainable, usable, and unique identifier for every research organization in the world. When we talk about ROR we are alternately or sometimes simultaneously referring to a project, a layer of infrastructure, an identifier registry, an element of metadata, and/or a community of people.
The ROR registry launched in January 2019. It currently holds records for close to 100,000 organizations, all with unique ROR IDs and associated metadata. In addition to the registry itself, there are tools and interfaces for working with ROR data, such as a front-end search, an open API, a reconciler that works with OpenRefine to clean up messy affiliation data, and more.
How did ROR come about and who was involved?
ROR officially launched in 2019, although the effort to bring it to fruition goes back several years. ROR’s origins are in the OrgID initiative, in which 17 different organizations (representing publishers, libraries, platform providers, metadata services, and other stakeholders) worked together to define a vision for a community-led registry of organization identifiers. At that time, there was no organization identifier registry that focused on the use case of affiliation disambiguation and that was truly open, community-supported, sustainable, and that integrated with other foundational infrastructure. The collaboration that ultimately led to ROR entailed requirements-gathering, community workshops, working groups, and proposals to develop a core set of specifications and recommendations for the registry, and to seek expressions of interest from organizations that wanted to be involved in implementing and running it. An analysis of existing organization identifier data was conducted; GRID was declared the best fit for the affiliation problem, and its curator, Digital Science, was happy to donate the data to the community under a CC0 license. In the discussions and planning process that followed, it became clear that building and launching a pilot registry would be a practical place to start, with governance and other community layers ultimately built around it. In late 2018, an initial steering group consisting of California Digital Library, Crossref, DataCite, and Digital Science was tasked with implementing the pilot, seeding the registry with this donation of data from GRID. The pilot was called the Research Organization Registry and thus ROR was born!
ROR aims to fill a crucial gap in scholarly infrastructure by enriching the network of open persistent identifiers that help us to discover and track research outputs. The academic community wants and needs to be able to answer questions about the “who”, the “what”, and the “where” of research. Researchers are using ORCID iDs to address the “who”, and DOIs are assigned to articles, datasets, dissertations, and other outputs to define the “what”. However, before ROR, a fully open identifier for the “where” was missing.
Persistent identifiers are increasingly central to the global research landscape, facilitating access to research as well as the tracking of research use and impact. But without an open and community-governed identifier for the institutions affiliated with authors and outputs, identifiers for people and outputs only take us so far. We see ROR as a key missing piece of the puzzle.
How do ROR identifiers differ from other organization identifiers — and do we really need another one!?
The response to ROR’s launch as well as the collaborative effort that preceded it has demonstrated the existence of both a need and a desire among the scholarly community for open and trusted infrastructure to identify research organizations—the specific niche that ROR seeks to fill. While other organization identifiers preceded ROR, none of them specifically addressed community governance needs or the focused use case of affiliation.
ROR currently maps its IDs to other identifiers for the same organization, such as GRID, Wikidata, ISNI, and Crossref’s Funder ID. This kind of interoperability and ability to link and crosswalk identifiers is central to ROR’s aims.
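To illustrate what such a crosswalk makes possible, here is a minimal Python sketch. The record layout and the function name `build_reverse_index` are assumptions made for illustration, not ROR's actual schema. The two ROR/GRID pairs are the UC Davis examples quoted in the discussion at the end of this post.

```python
# Hypothetical record layout: each ROR ID maps to other identifiers
# known for the same organization. This is not ROR's actual schema.
CROSSWALK = {
    "https://ror.org/05rrcem69": {
        "name": "University of California, Davis",
        "GRID": "grid.27860.3b",
    },
    "https://ror.org/05t6gpm70": {
        "name": "University of California Davis Medical Center",
        "GRID": "grid.413079.8",
    },
}


def build_reverse_index(crosswalk, scheme):
    """Map identifiers from one external scheme back to ROR IDs."""
    return {
        record[scheme]: ror_id
        for ror_id, record in crosswalk.items()
        if scheme in record
    }


grid_to_ror = build_reverse_index(CROSSWALK, "GRID")
print(grid_to_ror["grid.27860.3b"])  # https://ror.org/05rrcem69
```

A system that already stores GRID IDs could use an index like this to attach ROR IDs to its records without re-keying any affiliations.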
So to answer the second part of this question, we do need another one! However, what is needed is not just any organization identifier, but one that meets the following criteria:
- Community-driven. ROR is uniquely focused on building accessible infrastructure by and for the scholarly community. No single organization should “own” ROR.
- Focused in scope. ROR is specifically focused on capturing and identifying the affiliations associated with research outputs. It is not meant to be a comprehensive registry of all legal entities in the world, nor is it focused on identifying departments or sub-units within an organization. ROR’s aim is to provide an open and usable registry of top-level research organizations.
- Open. ROR data is CC0 and will always be free and open for all. We believe that data on research outputs by institutions should not be locked in a commercial database or behind a paywall.
- Embedded in and interoperable with existing infrastructure. ROR IDs are already supported in DataCite metadata and will soon also be included in Crossref metadata. This means that repositories, publishers, and others registering metadata in DataCite and Crossref can collect ROR IDs for affiliations and include these in their deposits, making it easier to track and discover research outputs by specific affiliations. This interoperability, as well as widespread adoption in foundational scholarly infrastructures, is another unique feature of ROR.
Who do you hope will use ROR identifiers and why?
Librarians and academic administrators increasingly need access to data on their institutions’ publications and research outputs in order to support reporting requirements, funder and government mandates, institutional open access policies, and library collection development and licensing negotiations. We think that ROR can play a key role in serving this need. As a participant expressed during a recent webinar, “I needed ROR yesterday!”
There are several specific implementations of ROR that are worth mentioning to shed light on the broad-ranging applications of an open identifier for research organizations.
First, a unique identifier for affiliations is far more usable and effective than a free-text field. A free-text field allows users to write a given affiliation any number of ways (think University of California-San Diego vs. UCSD vs. UC San Diego, for example), but a ROR ID is unambiguous (https://ror.org/0168r3w48). So we hope that ROR IDs will be implemented in any system that collects affiliations, from repositories to manuscript tracking systems to funder platforms and more. An affiliation field in a form can perform a simple call to the ROR API so that when the user starts typing an affiliation, the API finds possible matches from ROR’s controlled list of institutions (read more about this type of implementation here). The user does not even have to know that a ROR ID is being captured. This is a small implementation that can have a huge payoff, without any additional work on the part of the researcher.
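As a rough sketch of this kind of lookup, the Python below builds the request a form might send as the user types and turns a response into dropdown suggestions. The endpoint URL, the `query` parameter, and the `items`/`name`/`id` response keys are assumptions to verify against the ROR API documentation. The function names are hypothetical, and the sample response is hand-written so that the sketch runs without a network call.

```python
from urllib.parse import urlencode

# Assumption: endpoint and "query" parameter; check the ROR API documentation.
ROR_API_URL = "https://api.ror.org/organizations"


def build_lookup_url(typed_text):
    """Build the URL a form would request as the user types."""
    return f"{ROR_API_URL}?{urlencode({'query': typed_text})}"


def extract_suggestions(response, limit=5):
    """Turn a decoded JSON response into (name, ROR ID) pairs for a dropdown.

    Assumes the response holds an "items" list whose entries carry
    "name" and "id" keys.
    """
    items = response.get("items", [])[:limit]
    return [(item["name"], item["id"]) for item in items]


# Hand-written stand-in for a real API response (not fetched from ROR).
sample_response = {
    "items": [
        {
            "name": "University of California, San Diego",
            "id": "https://ror.org/0168r3w48",
        },
    ]
}

print(build_lookup_url("UC San Diego"))
print(extract_suggestions(sample_response))
```

The form shows the user only the name and stores the ROR ID alongside it, which is why the researcher never needs to see the identifier.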
Second, we are excited that DataCite is supporting ROR IDs in its metadata and that Crossref will soon do the same. This means that repositories, publishers, and others can collect ROR IDs in their own systems and include them in the metadata they deposit with DataCite and Crossref, where the information then becomes searchable.
Third, we believe that ROR data will be used by and useful for anyone who needs to track or collect institutional research outputs — research administrators, policy administrators, funders, librarians, institutional repositories, and others.
You’ve already got some early adopters – who are they and how are they using (or planning to use) ROR identifiers?
We have some great examples of ROR implementations and are looking forward to seeing others coming to fruition soon.
When Dryad relaunched its data publishing platform a couple of months ago, it included an affiliation field for the first time so that datasets could be associated with an affiliation. Instead of letting researchers enter their affiliations as free text, Dryad implemented a ROR API lookup enabling the researcher to choose from the controlled list of institutions in the ROR database. This is completely invisible to the researcher, but it means that Dryad can now collect clean and consistent affiliation data for all its datasets.
Also, as mentioned above, DataCite has now updated its metadata schema to include support for ROR IDs. So when a repository like Dryad sends metadata to DataCite, it can include ROR IDs in this deposit. DataCite has also implemented ROR IDs in its DOI registration form, and now includes an affiliation facet (tied to ROR) in its front-end search, so it’s easy to look up datasets and other objects by the institution (the DataCite blog includes more details about how this works).
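To show why an identifier-based facet is more reliable than one built on free-text names, here is a small Python sketch. The record layout, the field names, and the DOIs are placeholders invented for illustration. They are not DataCite's actual schema or real deposits.

```python
from collections import defaultdict

# Hypothetical deposit records; field names and DOIs are placeholders.
deposits = [
    {"doi": "10.1234/example-a",
     "affiliation": {"name": "UC Davis", "ror": "https://ror.org/05rrcem69"}},
    {"doi": "10.1234/example-b",
     "affiliation": {"name": "University of California, Davis",
                     "ror": "https://ror.org/05rrcem69"}},
    {"doi": "10.1234/example-c",
     "affiliation": {"name": "UC San Diego", "ror": "https://ror.org/0168r3w48"}},
]


def facet_by_ror(records):
    """Group DOIs by affiliation ROR ID, ignoring how the name was spelled."""
    facet = defaultdict(list)
    for record in records:
        facet[record["affiliation"]["ror"]].append(record["doi"])
    return dict(facet)


by_institution = facet_by_ror(deposits)
print(by_institution["https://ror.org/05rrcem69"])
```

The first two records spell the institution differently but land in the same group, because the grouping key is the ROR ID rather than the name.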
Other implementations at various stages of development include Crossref, Rescognito, Altum’s ProposalCentral, Cobaltmetrics, DataSalon, and the FREYA project. And there are others who expect to be able to make announcements about their use of ROR soon.
Can you share any lessons learned from their experiences so far?
We have seen a great deal of enthusiasm about ROR implementations, either by those implementing or by those benefiting from implementations. One lesson learned is about just how wide-ranging the application of ROR IDs can be — so many systems can benefit from them! Another lesson is that the implementation can be quite simple and does not need to require a great deal of developer time, as the Dryad experience showed. The Dryad implementation has also underscored the importance of understanding ROR in the context of end-to-end workflows and how the metadata travels downstream, as well as how different systems using ROR IDs need to connect (e.g., Dryad and DataCite). Finally, we are learning that while the universe of affiliations is relatively small compared to the scope of, e.g., ORCID, there will always be affiliations that are not already included in ROR, and we need robust workflows to support long-term curation of the registry.
How are you going to ensure that ROR really is persistent — what’s your business model?
The social and cultural aspect of adopting new technologies is often far more crucial than simply having the technology available, so the level of engagement and action ROR is seeing is a good indicator of long-term persistence.
In terms of our sustainability plan, the organizations leading ROR have made a commitment to continue operating ROR with in-kind staff resources, but we recognize that additional dedicated resources are needed to support technical development and wider ROR adoption. We are in the middle of a fundraising campaign right now to fund the hiring of two FTEs for development and adoption and to cover the basic costs of running the registry, such as server hosting. This campaign will last through the end of 2021, and you can see the list of early ROR supporters on the website. We plan to launch a paid service tier in 2022 to cover operational costs, while keeping the registry data itself open and free, always.
What’s next for ROR: where do you see the initiative in one/five/ten years’ time?
We envision ROR being integrated into all layers of the scholarly communication landscape in the next five years, starting with implementations like the ones mentioned above and eventually becoming the “new normal” for how we all handle affiliations. Ten years from now, we should have proven that, with extensive community support, we can build and sustain this kind of infrastructure without the unnecessary overhead of forming a new organization or new membership model.
We’re also envisaging full global adoption as it will take more than the “usual suspects” to make ROR truly successful long-term. To start, we have just announced four additions to the ROR Steering Group, including the Academy of Science of South Africa and the Japan Science & Technology Agency, as well as the Association of Research Libraries and the Coalition for Networked Information. So, we’re looking forward to a bright and broad ROR future!
Thanks so much for the invitation to be interviewed; anyone with questions or implementation ideas can get in touch at info@ror.org.
Discussion
15 Thoughts on "Are You Ready to ROR? An Inside Look at this New Organization Identifier Registry"
According to the ROR homepage (https://ror.org/facts/):
“ROR updates its records when GRID releases a new database update. Ultimately the two registries will diverge.”
Does ROR have any plans for when and how they will diverge?
Hi Gabor, I am the project lead for ROR on behalf of the project’s steering organizations. We are working on the “how” right now—defining and scoping the technical work and policies required to manage ROR data independently—and that will help us to determine the “when.” This is a top priority and we will definitely share more details as we continue to move forward (and as mentioned above, we will be bringing on additional development resources in the new year to support this effort). Stay tuned!
Maria, thank you for your answer. It will be exciting to see how ROR will evolve. Good luck with the project, and I hope that ROR will be at least as successful as DOI!
Thanks Alice, an excellent summary post about ROR. Are there plans to help highlight new adopters and use cases for ROR as they emerge? I look forward to keeping up with the news and developments.
Hi Adrian, speaking on behalf of the ROR team I can assure you that we will definitely be highlighting other adoption stories and use cases as they emerge! Twitter (@ResearchOrgs) and the ROR blog (https://ror.org/blog) are both good ways to stay on top of news and developments. We also have a mailing list (sign up in the footer of the ROR website, https://ror.org) and hold regular community calls to share updates and collect feedback (email info@ror.org to get involved).
“Focused in scope. ROR is specifically focused on capturing and identifying the affiliations associated with research outputs. It is not meant to be a comprehensive registry of all legal entities in the world, nor is it focused on identifying departments or sub-units within an organization. ROR’s aim is to provide an open and usable registry of top-level research organizations.”
However, there are departments and sub-units in ROR (and GRID). For example, there is the “University of California System” and the individual UC schools, as well as “University of California, Davis” (https://ror.org/05rrcem69) and “University of California Davis Medical Center” (https://ror.org/05t6gpm70). Not to mention “California Digital Library” (https://ror.org/03yrm5c26), a departmental / divisional unit beneath the “University of California Office of the President”, which does not have a ROR ID.
ROR could have great effect by providing organisation identifiers for sub-units, which would ensure that they could be easily identified as children of their parent. For example, the “University of California Davis Medical Center” might appear as https://ror.org/05rrcem69/11xx23 to denote its status as a sub-unit of “University of California, Davis” (https://ror.org/05rrcem69). This could make things easier for systems ingesting ROR identifiers as part of organisational affiliation data: one cannot tell, for example, that Ringgold identifier 8789 (UC Davis) and Ringgold identifier 117143 (UC Davis Department of Plant Sciences) are related without querying Ringgold directly.
GRID already registers such “parent/child” relationships to some extent:
See for example:
UC Davis: https://grid.ac/institutes/grid.27860.3b
UC System: https://grid.ac/institutes/grid.30389.31
Although as far as I can see, these relationships are not presented in ROR today.
@Gabor: GRID and Ringgold both register the parent/child relationships, yes. However they are not evident from looking at the identifier. For example:
UC Davis: https://grid.ac/institutes/grid.27860.3b
UC Davis Med Centre: https://grid.ac/institutes/grid.413079.8
Compare with an envisaged parent/child relationship demonstrated in the identifier itself:
UC Davis: https://ror.org/05rrcem69
UC Davis Med Centre: https://ror.org/05rrcem69/11xx23 (and not https://ror.org/05t6gpm70 )
Ok. I understand what you mean. Although I am not sure if this could work in practice. It is not uncommon that an entity has several parents:
For example: UC Davis/NIH NeuroMab Facility: https://grid.ac/institutes/grid.482686.6
has two parents:
National Institute of Neurological Disorders and Stroke: https://grid.ac/institutes/grid.416870.c
and University of California, Davis: https://grid.ac/institutes/grid.27860.3b
And these entities have parents of their own. I guess it would be pretty complicated (or maybe impossible) to create an identifier system where all this information is shown in the identifier itself. Of course it would be possible to have several different identifiers for the same entity, but that would undermine the whole idea of a unique identifier.
ROR development work is in progress to add parent/child and sibling relationships to records based on the GRID model. This relationship information will be part of the metadata but not part of the identifier itself. The parent/child relationship is meant to address use cases like large university systems with multiple campuses. Identifying departments and other campus sub-units (and mapping the relationships among them) is not ROR’s focus, but some ROR users and early adopters are leveraging the registry’s open infrastructure to pursue localized projects that connect department-level data to ROR IDs.
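Keeping relationships in the metadata rather than in the identifier also copes with the multiple-parent case raised earlier in this thread. Here is a minimal Python sketch of the idea. The `parents` field name and the record layout are assumptions, not the actual GRID or ROR schema. The GRID IDs are the ones quoted above, and parents beyond those named in this thread are left out.

```python
# Hypothetical records keyed by GRID ID. The "parents" field is an
# assumed name; parents not mentioned in this thread are omitted.
RECORDS = {
    "grid.482686.6": {
        "name": "UC Davis/NIH NeuroMab Facility",
        "parents": ["grid.416870.c", "grid.27860.3b"],
    },
    "grid.27860.3b": {
        "name": "University of California, Davis",
        "parents": ["grid.30389.31"],
    },
    "grid.416870.c": {
        "name": "National Institute of Neurological Disorders and Stroke",
        "parents": [],
    },
    "grid.30389.31": {
        "name": "University of California System",
        "parents": [],
    },
}


def ancestors(org_id, records):
    """Return a sorted list of every ancestor, following all parent links."""
    seen = set()
    stack = list(records[org_id]["parents"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Unknown IDs are treated as having no recorded parents.
            stack.extend(records.get(parent, {}).get("parents", []))
    return sorted(seen)


print(ancestors("grid.482686.6", RECORDS))
```

Each organization keeps a single flat identifier, and a facility with two parents is simply a record with two entries in its parent list.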
Awesome to learn, much appreciated, Maria! : )
ROR will be really useful for funding entities who also do research work, such as some US Federal agencies, especially if publishers can capture that information into Crossref. Currently in Crossref, one can only find out about funders (listed in acknowledgements/funding section), not author research affiliations, so one can’t easily capture all the journal articles funded by that sort of entity.
Agreed, Rob – thanks for bringing this up! We’re definitely eager to see adoption by publishers. Crossref’s next metadata schema update will include support for ROR, so publishers can capture ROR IDs for author affiliations and send them to Crossref.
Thanks Alice for the ROR info as it is valuable new metadata. On the JATS Standing Committee in version 1.2 (live early 2019) we added new elements to support research resource information (e.g. what laboratory was used for the research): , …so more than just research funding information. Looks like using ROR IDs for will be useful; and of course it could be used for as affiliation info.
To make this really valuable there needs to be consideration of a hierarchical relationship between ROR and community adopted ORCID identifiers, so an ORCID leads back to a particular institution.