Increasingly, scientific inquiry requires larger and larger technological investments to push the bounds of discovery. Research facilities such as satellites, particle colliders, specially equipped airplanes, research vessels, and supercomputer centers are all examples of large-scale investments that make new research discovery possible. Dozens or hundreds of researchers at any given time can share the use of these resources. The lifecycle of these facilities can extend into decades and cover the careers of more than one generation of researchers. Tracking these investments over this timescale to fully capture the resulting impact can be as difficult a task as getting the initial funding to build the infrastructure at the outset.
There are many reasons why one might want to track the use of scientific infrastructure. The most obvious interest in this data is among those who fund that infrastructure. Investments of millions, or even billions, of dollars are made, and those behind these decisions want to ensure that the money is being usefully deployed, understand how nationally funded facilities are being used globally, and confirm that the return justifies the investment. Researchers may want to track the developments taking place worldwide. When considering new projects and investments, examining the outputs of related infrastructure worldwide could provide direction for new discoveries. Gaps in capabilities might be more easily discerned and might support additional investments. Tracking the investments over time could also lead to sustained maintenance of the facility. Despite their impact, some facilities have fallen into disrepair and even collapsed. Perhaps if their impact had been more easily tracked and demonstrated, sustained maintenance resources could have extended the life of those facilities.
The ability to track the output related to these large-scale investments can, hopefully, signal their initial value and ongoing worth. Importantly, these investments must often be measured over years or even decades of scientific outputs. Previously, there was no easy way to connect those outputs directly back to the facilities that made the research possible. In 2017, a team led by Laurel Haak, formerly at ORCID, that included CHORUS, US Department of Energy (US DOE) Labs, and publishers conducted a pilot to capture the awarding of researcher access to US DOE Lab user facilities and produced a report on opportunities to identify and track facilities in research. That project advanced efforts leading to extensions of various metadata schemas to support facilities and grants, adjustments in the JATS specification, research resources in ORCID records, and other improvements to the publication infrastructure. Building upon the lessons learned by the ORCID project, a new team formed in 2022 has developed a pilot approach and workflow to capture the basic information so that it can be aggregated and analyzed.
The lead partners on the project are CSIRO and CHORUS. The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is Australia’s national science agency, focusing on impact-driven research in a broad range of domains. CSIRO funds numerous research facilities such as the Australian Centre for Disease Preparedness, the Australian Synchrotron, and the National Collaborative Research Infrastructure. CHORUS provides necessary metadata infrastructure and governance to minimize open access compliance burdens while increasing access to literature and data. CHORUS supports existing infrastructure while promoting collaboration and innovation, and provides a forum for dialogue between publishers, funders, service providers, and other stakeholders.
They have convened a working group consisting of seven major publishers, Australian research facilities, infrastructure support organizations, Crossref and ORCID, as well as NISO. The publishers participating in the project are: American Chemical Society; American Physical Society; Elsevier; Institute of Physics Publishing; Oxford University Press; Springer Nature; and Wiley. Two CSIRO research facilities are also participating in the project: the Australia Telescope National Facility (ATNF) and the Marine National Facility (MNF), which operates the research vessel RV Investigator, shown above.
This working group began meeting in early 2023 and worked to develop plans for a pilot to address this problem. After nearly a year of discussing potential models, workflows, and resourcing, the group launched a pilot this January. The pilot will run through the rest of the year and 2025, and it may expand to additional CSIRO research facilities.
The proposed workflow begins when a research team requests the use of a research facility. The team is instructed to include, in the acknowledgements of any resulting research paper, the funder identifier from the Open Funder Registry assigned to the research facility and the Crossref Grant DOI for their project. (Note: Crossref will be deprecating the Open Funder Registry and transitioning it to RORs over the course of this project, a change that the project will monitor and incorporate.) These identifiers will then be entered into the production process of the paper at manuscript submission. When the article is published, the publisher will include the name of the facility, its Funder ID, and the research project’s Crossref Grant DOI (if applicable) in the article metadata. These data are then sent to Crossref and other applicable indexing services. This will allow publications to be matched to the facility and the specific research project. CHORUS will monitor article output during the pilot. Although not specified directly, the use of RORs (for institutions) and ORCIDs (for people) will also support more robust tracking of the teams and organizations involved in these research activities.
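To make the matching step concrete, here is a minimal sketch in Python of how published article metadata might be grouped by facility and project once the identifiers are in place. It is an illustration only, not the pilot's actual tooling: the record layout, the field names (`funder_id`, `grant_doi`), and every identifier below are hypothetical placeholders, and real metadata would come from Crossref or another indexing service rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical identifiers for illustration only (not real registry entries).
FACILITY_FUNDER_ID = "10.13039/EXAMPLE-FACILITY"
PROJECT_GRANT_DOI = "10.99999/example-grant-001"

# Simplified article metadata records. The field names are assumptions
# made for this sketch, not a published schema.
articles = [
    {"doi": "10.99999/article-1",
     "funders": [{"funder_id": FACILITY_FUNDER_ID,
                  "grant_doi": PROJECT_GRANT_DOI}]},
    {"doi": "10.99999/article-2",
     "funders": [{"funder_id": "https://doi.org/" + FACILITY_FUNDER_ID,
                  "grant_doi": None}]},
    {"doi": "10.99999/article-3",
     "funders": [{"funder_id": "10.13039/EXAMPLE-OTHER",
                  "grant_doi": None}]},
]

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/")


def normalize(identifier):
    """Lowercase an identifier and strip any DOI resolver prefix."""
    if identifier is None:
        return None
    ident = identifier.strip().lower()
    for prefix in DOI_PREFIXES:
        if ident.startswith(prefix):
            return ident[len(prefix):]
    return ident


def match_outputs(records, facility_id):
    """Group article DOIs by project grant DOI for a single facility."""
    target = normalize(facility_id)
    by_project = defaultdict(list)
    for record in records:
        for funder in record.get("funders", []):
            if normalize(funder.get("funder_id")) == target:
                # Articles that cite the facility without a grant DOI
                # are still counted, under a catch-all key.
                key = normalize(funder.get("grant_doi")) or "(no grant DOI)"
                by_project[key].append(record["doi"])
    return dict(by_project)


result = match_outputs(articles, FACILITY_FUNDER_ID)
for project, dois in result.items():
    print(project, dois)
```

Normalizing identifiers before comparing them matters in a sketch like this because the same DOI can appear as a bare string or as a resolver URL; grouping by grant DOI mirrors the workflow's "if applicable" wording, so an article that names only the facility is still attributed to it.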
The participating publishers will deploy this workflow for approximately 50 journals where CSIRO output has been published recently. Journals were identified specifically for their relevance to research teams using ATNF and MNF during the past two years. Included are journals of the American Astronomical Society, American Geophysical Union, Royal Astronomical Society, and many other learned societies, which are supported by the participating partners. Ideally, once developed, this infrastructure and workflow will support tracking research facilities around the world by all types of research funders. As the pilot progresses, regular updates of its results will be provided by the team.
A lot has been written over the past couple of years about the value of the infrastructure of persistent identifiers (PIDs). A growing pool of research has shown that their adoption can improve discovery and reduce administrative burden on researchers, save them time, and thereby reduce overall costs. Last fall, Phill Jones and Alice Meadows further synthesized this point as they described the value of the investments in PIDs in a Scholarly Kitchen post. Work to improve this infrastructure in the US and around the world is ongoing. This pilot is a concrete example of the opportunities that are possible based on a robust PID infrastructure. Ideally, when completed it can be another illustration of how PIDs and their associated metadata infrastructure can be used to improve tracking of research outputs and ultimately improve research itself.
Discussion
2 Thoughts on "Tracking Research Facilities in Science: A CSIRO/CHORUS Pilot Sets Sail"
A very good and contextual description of the need and value of PID’s and other metadata.
Looking forward to reading more about this as the pilot progresses. It’s good to see buy-in from some of the more prominent names in publishing on the importance and value of consistent and persistent PIDs.