In today’s Kitchen Essentials interview, we hear from Maria Gould, Director of the Research Organization Registry (ROR), which celebrates its fifth anniversary today! ROR is a global, community-led registry of open persistent identifiers (PIDs) for research organizations. It now includes IDs and metadata for more than 107,000 organizations (and counting), and is used in journal publishing systems, data repositories, funder and grant management platforms, open access workflows, and other research infrastructure components.
Please tell us a bit about yourself — your role at ROR, how you got there, and why you embarked on a career in research infrastructure?
I wouldn’t say that I “embarked” on a career in research infrastructure! I focused on humanities and social sciences in my undergraduate studies, and was about to enter a PhD program when I realized at the last minute that I actually saw myself going into librarianship. In order to cover my tuition costs while obtaining my MLIS, I ended up getting a job at PLOS, just when PLOS ONE was on the rise and the Open Access movement was all of a sudden changing the scientific publishing industry forever. Coming from a non-science background, I was intimidated, but also very excited to be part of what felt like an immense transformation in the way knowledge was being produced and shared, and I decided I wanted to grow my career at the intersection of publishing, librarianship, and research.
After many years at PLOS, I took my work to the library side, supporting scholarly communications at the UC Berkeley Library and then at the California Digital Library, where I began working on underlying scholarly communications infrastructure, focusing on persistent identifiers (PIDs). As part of that work, I assumed the role of ROR lead when it was just getting off the ground. Most recently (two weeks ago!), I’ve taken a new position as the Director of Product at DataCite, where I’ll be supporting global research infrastructure on an even larger scale (and where I’ll continue working on ROR, as DataCite is one of the initiative’s operating organizations).
Even though where I am now is very different than where I started, as I reflect on my trajectory I’m struck by how the themes that defined my upbringing resonate in my work today. Working in open research infrastructure is an opportunity to advance the values of knowledge and community and collaboration that have helped to shape who I am.
What do you like most and least about working in research infrastructure?
I like that the landscape is always changing and there is always more to learn and keep up with. I like that, when it works well, we sometimes don’t even have to know it’s there. And I like that it’s by nature a collaborative endeavor that brings people and communities together.
I suppose these are the same things that I also find challenging about working in research infrastructure (I wouldn’t say I dislike any of it): there’s a lot to learn and a lot to keep up with; the work of connecting pieces behind the scenes can be quite complex and hard to coordinate; and bringing people and communities together also entails working across different perspectives and needs, not to mention the logistical hurdles of scheduling meetings across multiple time zones!
Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in research infrastructure?
Infrastructure involves technical work, but it also involves people. In my own experience, my career trajectory has been shaped by strong relationships with trusted colleagues and collaborators. So I think one piece of advice is to surround yourself with smart people who will both support and push you.
What sort of infrastructure does ROR provide, and who are your users?
ROR’s core offering is a free and open dataset of globally unique persistent identifiers and associated metadata for research organizations, and a small suite of openly licensed and openly available tools for accessing, using, and building implementations with registry data. Users rely on ROR as a trusted and normalized source of information about research institutions, and they integrate ROR into all kinds of systems to collect, standardize, and display institutional data, and connect this data to other systems.
The registry includes IDs and comprehensive metadata records (including name variants, relationships and hierarchies, and crosswalks to other PIDs) for more than 107,000 organizations around the world. We actively maintain and add records on a rolling basis, publishing new data releases approximately once a month so integrators can pick up the latest changes.
ROR helps to solve two fundamental problems in scholarly communications. Institutional names are notoriously variable (think UCLA vs. University of California at Los Angeles vs. University of California, Los Angeles, just to give one example); and the tools and systems used to track and discover publications and other outputs are inherently diffuse and siloed. ROR provides a standard (and open) identifier for institutions that can be implemented in and exchanged across various platforms, functioning as a single source of truth about an organization and linking it to everything it’s connected to. This can enable more meaningful and reliable insights about research activities – from tracking all research outputs at an institution, to monitoring the results of a funded research project, to identifying which authors are eligible for which publishing agreements, and more.
Many kinds of users see the value of having clean and interoperable metadata to help connect the dots across the research landscape and relieve the burdens of entering the same information multiple times or cleaning up messy text strings. We are currently seeing a lot of usage in publishing systems and workflows, in repositories and research information management systems, in government research systems, and in large-scale bibliographic databases and indexes. We’ve also seen growing uses of ROR among researchers in fields like bibliometrics, scientometrics, and meta-research, who are employing ROR to normalize institutional data as part of their methodological and analytical approaches.
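To make the name-variant problem described in this answer concrete, here is a minimal sketch of how several spellings of one institution can resolve to a single identifier. It is an illustration only, not ROR's implementation: the record layout, the `normalize`, `build_index`, and `resolve` helpers, and the identifier value are all hypothetical placeholders.

```python
import re

# Placeholder identifier for illustration only; not a real ROR ID.
UCLA_ID = "https://ror.org/example-ucla"

# Hypothetical registry record: one identifier, many known name variants.
RECORDS = [
    {
        "id": UCLA_ID,
        "names": [
            "University of California, Los Angeles",
            "University of California at Los Angeles",
            "UCLA",
        ],
    },
]


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def build_index(records):
    """Map every normalized name variant to its organization identifier."""
    return {normalize(n): rec["id"] for rec in records for n in rec["names"]}


INDEX = build_index(RECORDS)


def resolve(name: str):
    """Return the identifier for a known name variant, or None if unknown."""
    return INDEX.get(normalize(name))
```

In this sketch, `resolve("UCLA")` and `resolve("university of california,  los angeles")` both return the same placeholder identifier, while an unlisted name returns `None`. The point is that downstream systems exchange the identifier rather than the free-text string.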
How is ROR sustained financially?
ROR is operated and supported by three organizations: Crossref, DataCite, and the California Digital Library. This shared resourcing model allows us to leverage each organization’s strengths and expertise, avoiding the need to establish an independent organization and/or develop a revenue stream and additional overhead through paid services or memberships (there is nothing inherently wrong with those models, but we feel we can achieve ROR’s goals and provide a useful solution to the community through a different approach). It also minimizes our dependence on unpredictable and time-limited grant funds, as per our alignment with the Principles of Open Scholarly Infrastructure (POSI).
That being said, we have been able to supplement ROR’s core resourcing with additional financial contributions from community supporters (mostly libraries, institutions, governments, and service providers). In November 2022, ROR was selected for the fourth funding cycle of SCOSS, the Global Sustainability Coalition for Open Science Services, which identified ROR as an investment option for library and government groups that want to allocate funds toward essential global research infrastructure.
During ROR’s earliest years, before we finalized our current resourcing model, we were able to kickstart our work with key grant funding from the Institute of Museum and Library Services (IMLS) and the National Science Foundation (NSF), which helped enormously in accelerating our progress and getting us to where we are today.
As one of the leaders of a research infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?
One thing that can frustrate me is when it feels like we’re not leveraging the full potential of infrastructure to make all of our lives easier. We have the ingredients, but they’re not always coming together. And I think this situation can sometimes encourage a greater focus on what’s hard or on what’s not working, rather than on what we can do, and what is working, and then building on that foundation.
Looking at your own organization, what are you most proud of — and what keeps you awake at night?
Working on ROR has been an exciting and meaningful experience. Seeing it grow in just a little over five years from a pipe dream that didn’t even have a name to something that is known and used around the world has been a unique opportunity to understand how to build something from the ground up, establish strategic partnerships, and develop solutions that are easy, useful, and beneficial for a broad spectrum of users. I think I’m most proud of the fact that even though ROR is actually a thing now, as opposed to just an idea, and even though we’ve grown and evolved over the years, the core principles that ROR started with haven’t changed. Just as on day one (and all the days leading up to that point), ROR is being developed with, for, and by the community, and it remains focused on solving a fundamental problem with an open solution that everyone can benefit from.
Honestly, there are many more scary and stressful things in the world today that tend to occupy the worried parts of my brain. It’s a privilege to work on something that is personally gratifying and that benefits other people’s work, to collaborate with an incredibly talented and capable team, to be able to go to work in the safety of my home and with reliable internet and electricity, and also to not have to think about work 24/7. I’m always going to be focused on making ROR even better, on supporting the ROR team, and on strengthening relationships with our community, but I see these as good and healthy problems, and not something to lose valuable sleep over 🙂
What impact has/does/will AI have on ROR’s work?
Machine learning has been around for years, long before ChatGPT launched, and data scientists who use machine learning techniques are beginning to use ROR to help train and enrich their models. We’ve recently published two case studies of machine-learning projects that depend on large, open datasets like ROR: one about OpenAlex, and the other about PaperMill Alarm by Clear Skies. OpenAlex uses machine learning techniques to build models that match variant text affiliations to ROR IDs, and the PaperMill Alarm uses machine learning techniques to help detect suspicious activity for scholarly publishers.
Overall, as scholarly publishers become more reliant on AI tools and AI-based workflows, we expect that they will likely need to rely on ROR to build models that enhance metadata, track trends, detect suspicious activity, and make predictions.
In our own workflows, we use AI to enrich and scale our curation process by parsing and standardizing the natural language in the various requests that come in to add and update records, so that they can be processed and reviewed more quickly by our human curators. We aren’t interested in letting AI make any decisions, but we do see it as useful in making some of our processes more efficient. We have also been working with the Crossref Labs team to develop and prototype strategies for improving existing tools for matching affiliation strings to ROR, which many of our users are interested in.
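As a rough illustration of what matching affiliation strings to registry records involves, the sketch below scores a free-text affiliation against candidate organization names and only suggests a match above a threshold. It uses plain string similarity from Python's standard library as a stand-in; it is not the approach used by ROR, Crossref Labs, or OpenAlex. The candidate names, placeholder identifiers, `suggest_match` helper, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical candidates; the IDs are placeholders, not real ROR IDs.
CANDIDATES = {
    "https://ror.org/example-a": "University of California, Los Angeles",
    "https://ror.org/example-b": "University of California, Berkeley",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_match(affiliation, candidates=CANDIDATES, threshold=0.8):
    """Return (id, score) for the best candidate, or None if below threshold."""
    best_id, best_score = None, 0.0
    for org_id, name in candidates.items():
        score = similarity(affiliation, name)
        if score > best_score:
            best_id, best_score = org_id, score
    if best_score < threshold:
        return None  # too uncertain: leave the decision to a human curator
    return best_id, best_score
```

Returning `None` below the threshold mirrors the point made above: the tool only proposes candidates, and a person makes the final decision.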
What changes do you think we’ll see in terms of the overall research infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at ROR?
I’m hesitant to predict the future, so I’ll respond to this question by focusing on what I hope to see in the coming years.
One thing I would love to happen is for conversations about ROR and other infrastructures to become unnecessary (or less necessary)! If we can get to a place where infrastructure decisions default to core principles around openness and interoperability and transparent governance and PIDs, then we can spend more time focusing on how to leverage shared infrastructure to solve problems at scale and achieve wide-ranging benefits.
Another thing I hope to see is more attention on the kinds of knowledge and expertise that are needed to implement, use, and evaluate infrastructure services. This means more training of librarians and institutional stakeholders to work with scholarly APIs and adopt best practices, more staffing at publishers and service providers to be able to work with openly available tools and resources, a greater focus on scholarly infrastructure in graduate and undergraduate programs, and more training in data science and machine learning overall. As part of this, I would hope to see continued attention to equity and collaboration, not just through training but also by identifying opportunities to learn from, build on, and reinforce existing initiatives in regions and communities that often get less attention in certain circles.
Finally, my dream is better metadata quality and interoperability across the board, by default. In five to ten years (ideally, sooner!), I hope it will be seen as abnormal for any system to continue to allow inputs of messy and inconsistent free-text strings and to limit or block how metadata can be shared, exchanged, and connected. High-quality and interoperable metadata makes all of our work easier, and there should be no excuse for bad practices when we have the tools and services available to avoid them!
In terms of what this all means for ROR, it means we will continue to support ROR’s core infrastructure as usage increases and as more systems depend on it. It means we will continue to provide training and support and high-quality documentation for those who are using the registry and building implementations around it. It also means that we will continue to focus on making sure registry data is comprehensive, up to date, and high quality. All of these areas require resourcing as well as engagement with the broader community. We will always need staffing to support and scale ROR’s technical infrastructure, to work with community adopters and supporters, and to manage our careful and active curation process. But as a small community-based initiative by design, our goal is specifically not to evolve ROR into a large independent organization, but rather to leverage strategic partnerships and shared resourcing to grow and maintain a free, open, and easy-to-use solution for everyone.