Continuing our series of Kitchen Essentials interviews with leaders of research infrastructure organizations, today we hear from Matt Buys and Helena Cousijn, the Executive Director and Director of Community Engagement, respectively, for DataCite. Founded in 2009 as a global registration agency for research data, DataCite has now registered over 58 million DOIs (as of February 2024) for an increasingly wide range of research outputs.
Please tell us a bit about yourselves — your roles at DataCite, how you got there, and why you each embarked on a career in research infrastructure?
Matt: In 2015, I transitioned from a role in commercial information services to become part of the open infrastructure community at ORCID. During my time there I held various roles, contributing to the growth of the community to over 1,200 organizations and 7 million researchers by 2019, when I moved to DataCite. As Executive Director here, I continue to collaborate with a dedicated team in the pursuit of making research outputs and resources easily discoverable and citable. This role aligns with my commitment to advancing research connectivity and knowledge dissemination globally.
Helena: Like many people working in scholarly comms and open infrastructure, I came from a research background. I no longer wanted to focus on just one genetic variant, but still wanted to be part of the scholarly community and work for an organization contributing to advancing research. I first worked for a funder, then for a publisher, and then this role at DataCite caught my eye. I joined DataCite in 2018 as Director of Community Engagement and now lead a team of 10 people who work closely with our members and the broader community to make research outputs available globally.
What do you like most and least about working in research infrastructure?
Matt and Helena: Research infrastructure is a great community to be part of, full of passionate people who share the same goals and all want to make research more open, accessible, and inclusive. One of the challenges we see is that, despite these shared goals, approaches are sometimes fragmented and can unintentionally work against each other. We feel we could achieve more by aligning, and we want to keep working towards that. It is important to avoid duplicating efforts and reinventing the wheel, because doing so further fragments the ecosystem… we all know the anecdote about having too many standards and then trying to create just one more standard to solve all problems.
Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in research infrastructure?
Matt: For those venturing into a career in research infrastructure, my advice centers on integrating the technical and social dimensions of the work. Cultivate a robust foundation in the technical aspects, staying current with tools and methodologies and adapting as technologies evolve. At the same time, recognize the importance of engaging with the research community and refining your interpersonal skills. Actively participate in collaborative efforts, embracing interdisciplinarity and diverse domains. Finally, adopt a mindset of continuous learning so that you stay abreast of evolving best practices and emerging technologies. Balancing technical proficiency with a nuanced understanding of the community's social dynamics will allow you to make a meaningful impact in the fast-moving field of open research infrastructure.
Helena: Maybe I am saying this because I used to do research myself, but I think you need a good understanding of the research lifecycle and the stakeholders involved. If you don’t really understand how research works, what steps are important, and who is involved at different stages, I think it is difficult to find the right solutions. For people working in this space, the availability of these concepts and solutions is a given, but a lot of researchers are not very aware of their open infrastructure options. To an extent, that is positive, because infrastructure that works well is seamless and stays in the background, but it also sometimes creates a gap between the work we’re doing and the work of researchers. Therefore, it is important to keep in mind who your solution will benefit and how it fits within their workflows.
What sort of infrastructure does DataCite provide, and who are your users?
Matt: DataCite is a not-for-profit membership organization that makes research more effective by connecting research outputs and resources. We support the creation and management of DOIs and metadata records, enhance research workflows with service integration, and enable the discovery and reuse of research outputs and resources. We are a global, fully remote organization with a team of 28 staff members across 12 countries, covering engagement, engineering, product and operations roles.
Helena: As our name indicates, DataCite’s original focus was on enabling citations for datasets but, with an ever-increasing array of outputs being created, we have since expanded to meet community needs and now register DOIs for a wide range of outputs, from data and preprints to images and samples. This enables all these outputs not just to be cited, but also to be more easily discovered and their impact tracked, evaluated, and recognized. DataCite DOIs and metadata are open and interoperable, meaning that both the identifier and its associated metadata (often including other persistent identifiers like ORCIDs, RORs, etc.) can be shared, accessed, and used by anyone.
How is DataCite sustained financially?
Matt and Helena: DataCite is sustained financially primarily through membership and service fees from our 1,200 organizations in over 52 countries. In addition, we explore strategic funding opportunities to support innovation initiatives, such as partnerships with research institutions, collaborations with governmental agencies, and engagement with philanthropic foundations. By going beyond traditional membership and service fee models with this additional funding, DataCite can undertake innovative projects, develop new services, and enhance existing infrastructure to better serve the research community. Two exciting examples of this in 2023 are our work on a Global Data Citation Corpus to aggregate data citation information across identifier systems, and the launch of the DataCite Global Access Program to engage with organizations in regions that are currently underrepresented in our community.
As the leaders of a research infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?
Matt and Helena: Our community has huge untapped potential, especially in harnessing the power of interoperability based on existing infrastructure. Using PIDs to create seamless connections between platforms will help unlock synergies and improve the overall efficiency of the research ecosystem, benefiting researchers, research organizations, and the community at large. However, issues remain, most notably the lack of attention and value placed on non-traditional outputs. While research infrastructure is intended to support a wide range of outputs, incentive mechanisms continue to focus primarily on traditional publishing. To make the most of the opportunities that research infrastructure offers, we must actively work to reform these incentives, including by fostering collaborations and partnerships that address new use cases. Examples such as the collaboration between DataCite and IGSN demonstrate the power of forging strong partnerships to overcome obstacles and stimulate innovation across communities.
Looking at your own organization, what are you most proud of — and what keeps you awake at night?
Matt: We are very happy with the community of communities approach that we are taking, because it brings different stakeholders together and enables us to continue addressing new use cases as we work with new communities. It allows domain and regional communities to come together as part of our global open scholarly infrastructure community, while maintaining the ability to focus on their specific community needs.
Helena: It is important to us that we don’t just provide technical services but that DataCite is also a community of committed organizations that all share the same overall vision and mission. We try to make it very clear that organizations don’t join DataCite to buy a service, but rather to join our community and help collectively address use cases through the development of relevant services and workflows.
Helena: One of the things I think about when I wake up at night (I wouldn’t say it necessarily keeps me awake) is how best to communicate the value of open infrastructure and the services we provide. Unfortunately, we still encounter the idea that open equals free. I think that we, the people working in this space, can see the value of and need for long-term sustainability very clearly, but we need to ensure that the whole community wants to make the long-term investments needed to support the infrastructure that helps to preserve, connect, and thereby advance research over time.
Matt: Mostly my toddler! But on a work-related note, what I think about a lot is how we continue to foster community (and team) collaboration across borders, with a shared passion for open scholarly infrastructure. We are all committed to the cause but there are lots of practical challenges such as time zones, fragmented technical infrastructure, etc.
What impact has AI had on DataCite’s work so far, and what impact do you expect it to have in the future?
Matt: We have started exploring the application of AI within some related technologies, in particular for detecting data citations or mentions in papers using the SciBERT model, in partnership with the Chan Zuckerberg Initiative. However, we take a pragmatic approach to AI integration, with an emphasis on maintaining the quality of curated metadata. While we recognize the potential of AI, we do not foresee it replacing existing workflows; instead, our goal is to explore how AI can support our efforts to enhance the quality of services and metadata. Incorporating AI into our processes could enhance efficiency and effectiveness while preserving the integrity of our established practices and our commitment to high-quality metadata. Any AI-related changes to our approach will need careful consideration and discussion across community stakeholders before moving ahead.
What changes do you think we’ll see in terms of the overall research infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at DataCite?
Matt and Helena: We foresee significant changes in the entire research infrastructure landscape during the next five to ten years. Community leaders are increasingly aligning around more collaborative and community-driven activities. The scale of, and interest in, registering metadata for a wide variety of resources and outputs is growing throughout the research lifecycle.
When it comes to hiring at DataCite, we anticipate a continuing need for people who can thrive in a fully remote and truly global setting. Our team already covers a wide spectrum of skill sets, and we expect this to increase further. All our positions, both now and in the future, require a strong understanding of the community and true passion for our mission.
Creativity will be essential for developer roles as we navigate evolving technology and contribute to community solutions. We will need to recruit people who can both adapt to these dynamic challenges and bring new insights to our team. As we look to the future of research infrastructure, we will continue to emphasize the need for team members who are not just skilled in their own area of expertise, but also genuinely devoted to DataCite’s collaborative and global mission.