The Research Data Alliance (RDA) is a community-driven, non-profit initiative that was set up in 2013 by the European Commission, the US National Science Foundation (NSF) and National Institute of Standards and Technology (NIST), and the Australian Department of Innovation. From the start, RDA has been an independent, grassroots effort to build the social and technical infrastructure that supports the open sharing and re-use of data. The most recent RDA plenary was an entirely virtual event spread across two weeks at the beginning of November, which I attended via Whova, Zoom, and a little bit of Gather, from the comfort and COVID-safety of my home office.
The RDA Plenary is very different from other conferences I have attended. The grassroots community focus visibly underpins everything about both RDA and its twice-yearly Plenaries. Each session is organized by an RDA group: working groups, interest groups, communities of practice, and birds-of-a-feather groups (convened for a single plenary to gauge interest in a new topic). It can all seem a bit complicated from the outside, so the RDA website offers instruction pages and explainer videos.
The sheer scale of the research data ecosystem
Developing the research data ecosystem from the ground up is a substantial challenge, requiring action at many levels, from individual researchers and communities of practice to funder policies. At the subject level, RDA has domain-specific groups, which are often supported by learned societies; the Earth, Space, and Environmental Sciences Interest Group (IG), for example, has chairs from both the European Geosciences Union and the American Geophysical Union. Some of these groups have a narrow, highly specialized focus.
At a higher level, there are groups that discuss common standards for data management plans, a metadata standards catalogue, and even how to engage researchers with good research management practices. Just about every conceivable angle is covered, at least to some extent.
With so much to participate in, it’s only possible to write about a small part of the action, so I’ve picked a couple of things that I was particularly interested in.
All about persistence
The first session I attended was bright and early (at least in my time zone) on the first day of the Plenary. I was invited to attend the PID Interest Group (PID IG). The discussion was wide-ranging, with a series of prepared remarks from invited attendees and a vibrant open-floor discussion, both over audio and in the text chat.
One running theme was, as Tom Demeranville of ORCID put it, that ‘persistence is a social problem rather than a technical one’, which might sound counterintuitive at first. Organizations like OpenAIRE and even ORCID generally describe PIDs as pointers that will always link to a particular digital resource, even if its URL changes. PIDs also carry metadata that describes objects and links them to other PIDs, creating an emergent knowledge graph. While that’s all true, it’s not a full definition of persistence, because somebody has to maintain all of those links and metadata. Otherwise, a PID system will suffer the same link-rot problems as every other web resource.
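To make the point about maintenance concrete, here is a minimal Python sketch, assuming a toy in-memory registry. The class names, method names, and identifiers are hypothetical and do not reflect the API of ORCID, DataCite, Crossref, or any other real PID service. It shows that the identifier stays fixed while the URL behind it changes, and that the link only survives because someone updates the record when the resource moves.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """A hypothetical registry entry: the identifier, the URL it currently
    points to, and metadata links to other identifiers."""
    pid: str
    url: str
    related: list[str] = field(default_factory=list)


class PidRegistry:
    """A toy, in-memory resolver (not any real PID service's API)."""

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def register(self, record: PidRecord) -> None:
        self._records[record.pid] = record

    def resolve(self, pid: str) -> str:
        # The identifier never changes; only the URL it resolves to does.
        return self._records[pid].url

    def update_url(self, pid: str, new_url: str) -> None:
        # The "social" part: someone has to make this call when content moves,
        # otherwise resolve() keeps returning the old, possibly dead, link.
        self._records[pid].url = new_url

    def neighbours(self, pid: str) -> list[str]:
        # Linked identifiers are the edges of a (very small) knowledge graph.
        return list(self._records[pid].related)


if __name__ == "__main__":
    registry = PidRegistry()
    registry.register(PidRecord(
        pid="example-pid:dataset-1",
        url="https://old-repository.example/ds1",
        related=["example-pid:researcher-42"],
    ))
    print(registry.resolve("example-pid:dataset-1"))
    # The repository migrates; a maintainer updates the record.
    registry.update_url("example-pid:dataset-1", "https://new-repository.example/ds1")
    print(registry.resolve("example-pid:dataset-1"))
    print(registry.neighbours("example-pid:dataset-1"))
```

If the `update_url` step is skipped after a migration, `resolve` still returns the old address: the technology works exactly as designed, and the link rots anyway.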
Building on the idea of PIDs as an organizational challenge, Natasha Simons of the Australian Research Data Commons (ARDC) talked about the need for investment to create and maintain PIDs. The community is good at designing the technical and organizational structures needed to create PIDs, she argued, but more investment in communications and marketing is needed to make their value visible. Only then, with the help of good governance and a sustainability model, can PID services transition from small grant-supported projects to sustainable organizations.
In my own short presentation, I focused on the need to keep improving adoption. A major obstacle to adoption is misaligned incentives. Simply put, metadata entered into a system that integrates with a PID (whether an ORCID iD, a DOI from Crossref or DataCite, a ROR ID, or a RAiD) only has value if other stakeholders are also adding their own metadata and system integrations. In other words, adding metadata to the PID graph generally helps other stakeholders more than the person who entered it. For example, funders need researchers and institutions to report on the outputs funded by their grants, while publishers need to know which new research priorities are being supported so that they know what content to acquire or what products and services to develop.
The burning question that I put to the PID IG was, how do we align those incentives and make the value more obvious? Is the answer funder mandates or improved incentives? Participation reports? Central support for integrations from funders? Better targeting of products and services? Better communications and marketing? Or some combination of all of the above?
Metadata interoperability
Another theme that ran through my plenary experience was metadata interoperability. As Alice Meadows and I wrote in a previous post, metadata is important because:
Metadata enables connections to be made between published articles, researchers, datasets, computer programs, institutions, grants, funders, and more, eventually including things like shared facilities
Several sessions at RDA touched on this issue, including the Research Data Architectures in Research Institutions IG. James Wilson of University College London described how UCL has been developing a research data ecosystem from a collection of technologies, including its EPrints repository, Researchfish for grant reporting, and its CRIS, Symplectic Elements. Similarly, Kimi Keith of the University of Cape Town described efforts to build a research data management ecosystem around the university's electronic Research Administration system, integrated with its CRIS (Clarivate Converis) and figshare. The idea is to reduce researcher burden while better meeting various research management use cases, such as ensuring compliance with data management commitments.
Despite their very different research environments, both speakers agreed that the lack of interoperability between systems that hold metadata was a serious impediment to automating research management.
Conclusions
There were many other highlights of this year’s RDA plenary, far too many to go into here. Overall, from discussions of the most niche aspects of metadata or data schemas for a particular discipline to the highest-level discussions of strategic research data management, the need for better-accepted standards, best practices, workflows, and system interoperability is now clearer than ever.
If I have a concern at all, it’s that the good people at RDA can’t do all of this alone. Publishers, learned societies, institutions, libraries, and funders all have significant parts to play in building a better, more efficient, and more connected research infrastructure. While there are representatives of all of these stakeholders in RDA, more organizational and systemic work is needed. From the immediate benefits that publishers can glean from improved metadata about research and smoother processes, to systemic benefits for the research infrastructure that will accelerate science and save lives, it’s in all our interests to be a part of this transformation.