A Successful Start to a New Festival of Identifiers: PIDfest 2024

From June 11-13, 2024, 175 persistent identifier (PID) enthusiasts from around the world gathered in Prague for the first PIDfest, with a further 100 joining virtually. Billed as “a three-day international summit of talks, activities and workshops focusing on how persistent identifiers can deliver world-class research infrastructure,” it was also something of a successor to PIDapalooza, the self-described Open Festival of Persistent Identifiers, which ran more or less annually from 2016 to 2021. While the two events had much in common, they were run differently. PIDapalooza was organized and hosted by several of the major PID/infrastructure organizations (Crossref, DataCite, ORCID, ROR, and latterly NISO), while PIDfest, which grew out of the increasing number of national PID strategies being developed, was organized and hosted by the Australian Research Data Commons (ARDC) and the Czech National Library of Technology (NTK). Perhaps as a result, while there were many familiar faces in attendance at PIDfest, there were also lots of new people representing organizations that hadn’t previously participated.

The conference took place in NTK’s beautiful (and environmentally friendly) building, which many of us learned more about during a tour led by its Director, Martin Svoboda. The opening ceremony, which featured a didgeridoo player and Czech musicians, reflected the close collaboration (and a lot of hard work!) on the part of ARDC and NTK. Likewise, the two keynote panels reflected the expansion of the PID community. The first focused on Why PIDs Matter, with talks from PID users Simon Porter (Digital Science, UK), Joy Owango (Training Centre in Communication, Kenya), Zefan Zheng (PhD student, Germany), Caroline Finch (Edith Cowan University, Australia), and Rinke Hoekstra (Elsevier, The Netherlands). The second, on Delivering Value to the Research Sector, brought in the voice of the PID providers, featuring Jonathan Clark (DOI Foundation, Croatia), Helena Cousijn (DataCite, The Netherlands), Maria Gould (ROR, USA), Shawn Ross (RAiD, Australia), and Chris Shillum (ORCID, USA).

A wide variety of topics were explored during PIDfest, both in the main conference (including two sets of lively lightning talks and a poster session) and in a series of unconference discussions on the last day. The amazing wrap-up PIDfest quiz by Suze Kundu of Digital Science brought it all together in a truly interactive and innovative way: what other conference have you been to that ends with everyone in the audience shouting out “Bridgerton!”? Slides from the sessions are being made openly available in the NTK repository (work on this is still in progress but, where available, we’ve included links in this post). There’s too much to cover in a single blog post, so the three of us have pulled out a few themes that we found interesting: the ongoing development of national PID strategies; standardization versus customization of PID approaches in different communities; making access to PIDs more equitable; emerging PID issues; and, of course, the perennial question of why PIDs matter and how we can better communicate their value.

A significant part of the second day of PIDfest focused on the growing momentum behind national PID strategies. Nearly four hours were dedicated to the topic, with one session on countries in the Asia-Pacific region and a second on efforts in Europe and the Americas. Regional differences in how research is managed and funded came through clearly in the presentations. Countries with more centralized and coordinated funding have made more progress, while those with more diverse or distributed research systems have been slower to advance PID strategies. In Europe, Ireland, Germany, and Finland have made significant strides. Similarly, Australia, Japan, and India have developed robust national PID strategies that foster adoption and consistent use, and in some cases support the deployment of the infrastructure needed to implement those strategies in their national contexts. A proposed project to advance a US national PID strategy was also presented. Because of the heterogeneous nature of the US research ecosystem, it will likely need a less prescriptive approach than the others described.

One question that has increasingly come up in the PID community in recent years, as national and regional PID strategies emerge and mature, is whether national-level strategies are needed at all, given that research often focuses on global challenges like disease and climate, and is conducted in an increasingly international, collaborative environment. While that is true, the development of national strategies and regional efforts is beginning to draw attention to regional needs and requirements. At PIDfest, perhaps the clearest example of the continuing heterogeneity of the research landscape was the presentation by Joy Owango and Nabil Ksibi, both of the Africa PID Alliance, on the Digital Object Container Identifier (DOCID). This innovative identifier is being supported by the Kenyan Ministry of Tourism and Wildlife and State Department of Cultural Heritage, with interest from the granting council in Kenya, the government of South Africa, and a number of pan-African initiatives. DOCID is designed to be an identifier for African research, with data stored in Africa. What sets it apart from other container identifiers is that it’s aimed at a problem specific to the Global South: knowledge appropriation without attribution or compensation. DOCID aims to link research to its upstream information or data sources, which in many cases are Indigenous knowledge and cultural heritage sources. Currently, tracing such research back to its origins is challenging, with the result that credit for knowledge goes to those who ‘discover’ it rather than those who originated it. DOCID also links research to patents, which again are often hard to find, since patents from the Global South are underrepresented in traditional patent databases. The ambition goes beyond Africa: as Joy said in her talk, the hope is that DOCID will be adopted or used as a template in other regions where there is a need to protect Indigenous knowledge.

If you follow the PID landscape, you might wonder how DOCID relates to the other exciting development in container identifiers: RAiD. For those who haven’t been following events in this space, RAiD is the Research Activity Identifier, developed by ARDC, which also acts as its global Registration Authority. In the year and a half since the ISO standard was published, RAiD has grown significantly. It has been integrated into the European Open Science Cloud through FAIRCORE4EOSC, and discussions are underway with several other potential registration agencies, some of which have ideas for further integrations, potentially including the Africa PID Alliance’s DOCID.

During the RAiD session, in which Natasha Simons and Shawn Ross of ARDC spoke alongside Helena Cousijn (DataCite), there were interesting discussions about how RAiD will connect into the existing research infrastructure. One very exciting opportunity to reduce the research bureaucracy burden (which has been recognized as a huge problem internationally) is the integration of RAiD into current research information systems (CRIS), research information management systems (RIS/RIMS), and repositories. In fact, RAiD is already integrated into a system called Research Data Box (ReDBox) that is used at a number of universities in Australia.

Another topic of discussion was the partnership between RAiD and DataCite. Going forward, DataCite will use its global resolution infrastructure to underpin RAiD functionality, while also integrating RAiD metadata into DataCite’s implementation of the PID graph, which can be accessed on the web through DataCite Commons. This raised questions about whether RAiD is really just a DOI. It isn’t: it has different functionality from a DOI and is designed to meet different constraints and requirements, in part because of the dynamic nature of projects and research activities. It simply uses DataCite’s infrastructure to manage resolution.

Several speakers and sessions directly or indirectly flagged the current lack of equitable access to PIDs in some communities, and this was also the topic of various informal conversations we all had. DOCID, which as noted is a PID developed, owned, and managed in the Global South, is in part an effort to redress the balance. The affordability of PIDs is a major challenge for many organizations, especially but not only in the Global South. DataCite’s Global Access Fund and ORCID’s Global Participation Fund are two examples of formal initiatives to improve equity. They were the subject of an inspiring session featuring Gabriela Mejias (DataCite) and Lombe Tembo (ORCID), who were joined virtually from Nigeria by Owen Iyoha (Eko-Konnect Research and Education Initiative). The examples they shared (“Addressing Equity and Access to PID Infrastructure” and “ORCID’s Multi-faceted Approach to Increasing Global Adoption”) were a clear indicator of how much enthusiasm there is for PIDs in the Global South, and of how a small investment in those communities can bring enormous benefits, not to mention innovation. Other conversations about currently underserved communities and entities focused on PIDs for galleries, libraries, archives, and museums (GLAM), for practice researchers and non-traditional/creative arts outputs, for source code artifacts, for instruments, for “wild” content (grey literature, reports, etc.), for organisms, and more.

More broadly, many participants said they were facing similar challenges. Yet there is a tension between the push for standardized solutions and the interest in addressing unique needs through customization. Flexibility is important here, and it can take many forms: from variations in metadata models to the choice among several identifiers for the same type of object, there are many ways to identify a thing. For example, one of the features of the DOI system and its underlying Handle infrastructure is its flexibility in solving myriad issues for diverse stakeholders. The DOI system is the underlying technology in communities as diverse as scholarly books and articles (Crossref), datasets (DataCite), movies and audiovisual content (EIDR), and building materials (BSI.identify). The needs of different national contexts for researcher or institutional identification can vary in meaningful ways. Institutions have different reporting, accreditation, and assessment demands that require tracking research in slightly different ways. Funding bodies each have their own expectations for identifying research awards, whether a monetary award or an allocation of time on infrastructure. Meetings like PIDfest are critical opportunities for the worldwide community to gather and share their perspectives and use cases.

While there are persistent identifiers for much of the content shared in scholarly communications, by no means is every aspect of the research ecosystem uniquely identifiable and well described. Part of the PIDfest conference explored these outer reaches of PID infrastructure and new directions for this community.

Research infrastructure is one example of an emerging need for identification. Many research projects are conducted on large-scale research infrastructure, such as the Large Hadron Collider, supercomputer facilities, research ships, specially equipped airplanes, or satellites. Each of these represents a massive investment in the physical infrastructure of scientific research. At present there is no consistent way to acknowledge or track their use in the scholarly enterprise, particularly over the entirety of their functional lifespan. Persistent identifiers for research facilities could help, and several pilot projects are underway to explore assigning PIDs to these facilities. This work overlaps with a slightly broader conversation, in a different session, about PIDs for prizes and awards, which highlighted related use cases.

Similarly, the software used in scientific research has become a critical part of many research endeavors. At present, it too lacks widespread adoption of PIDs to identify it and connect it to the rest of the scholarly ecosystem. Software Heritage, a project launched in 2016 with a mission of collecting, preserving, and sharing all publicly available software in source code form, is working to address this. The approach Software Heritage is taking with its SoftWare Hash persistent IDentifiers (SWHIDs) uses a hashing algorithm to uniquely identify source code files, snippets, or entire repositories of code. As described during a session at the conference, the SWHID is already in production and has been used to identify billions of software artifacts.

This intrinsic approach to identification contrasts with the extrinsic identifiers most commonly used in the scholarly ecosystem. ISBNs, DOIs, RAiDs, and ISSNs are all extrinsic identifiers, meaning that they have no inherent connection (binding) to the referent object itself beyond the metadata and assertions about their relationship to it. An intrinsic identifier, by contrast, is typically derived by running the digital content itself through an algorithm, such as a hash function, producing a string that represents the original object. In an intrinsic identifier system, any change to the original object results in a different algorithmic output, and therefore a new identifier. Recently, several other intrinsic identifier systems have been created, such as the International Standard Content Code (ISCC).
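To make the intrinsic approach concrete, here is a minimal Python sketch of how an identifier for a single file can be computed from its bytes alone. It assumes the published SWHID rules for content objects, which reuse Git’s blob hashing (SHA-1 over a short header plus the raw bytes); identifiers for directories, revisions, and snapshots are computed differently and are not covered. The function name `swhid_for_content` is our own, hypothetical and for illustration only; this is not Software Heritage’s own tooling.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return a SWHID-style core identifier for a file's raw bytes.

    Hypothetical helper for illustration. For content objects, SWHIDs
    reuse Git's blob hash: SHA-1 over b"blob <length>\\0" followed by data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    original = b"print('hello')\n"
    edited = b"print('hello!')\n"  # a one-character change

    # The identifier depends only on the bytes, so anyone can recompute it
    # without consulting a central registry...
    assert swhid_for_content(original) == swhid_for_content(original)
    # ...and any edit to the content yields a different identifier.
    assert swhid_for_content(original) != swhid_for_content(edited)

    # An empty file produces the same hash as Git's empty blob.
    print(swhid_for_content(b""))
```

Running it prints `swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, matching the hash `git hash-object` reports for an empty file. An extrinsic identifier such as a DOI would not change automatically after the edit, because its link to the object exists only in registered metadata.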

Sometimes these intrinsic identifier systems use immutable public ledgers (blockchains) to store the metadata associated with the identifiers, further reducing the need for a centralized registration and data management system. However, it is important to recognize that assigning an intrinsic identifier and using a blockchain to manage its metadata are entirely separate issues. Intrinsic identifiers shouldn’t be dismissed as simply the fad of the 2020s come back in a new form, or as a rebranded blockchain system in new clothes. There are interesting and novel approaches to intrinsic identifiers, which the community is beginning to explore, and a lively unconference session on the topic on the last day of PIDfest started to tease apart some of these issues.

Whenever the PID community gets together, one topic we always come back to is how to tell people why PIDs matter. PIDfest was no exception, with the two keynote panels leading the way. The first, which focused on the user perspective, included Zefan Zheng’s explanation of why FAIR data (and, therefore, PIDs) are good for science and scientists; Simon Porter’s dream of a PID-enabled world that would make rankings independent of data sources; and Caroline Finch’s recognition of the value of PIDs for supporting research excellence, research translation, research management, and research growth. The PID providers, in turn, shared how they deliver that value: for example, Shawn Ross noted that implementing RAiD could save 207,000 person-hours a year globally, rising to 2.7 million in combination with other PIDs; Helena Cousijn demonstrated the benefits of a PID-connected ecosystem; and Chris Shillum stressed the importance of continued action across the whole ecosystem to achieve these benefits. Other sessions looked at the value of PIDs in terms of return on investment, and at the importance of storytelling in helping people understand that value. Plus, courtesy of Eric Olsen (Center for Open Science), we had the chance to become data detectives in a wonderful gamified, real-life example of how PIDs can be used to track down the information you need.

Feedback on PIDfest has been overwhelmingly positive, both anecdotally and in the feedback survey: of the 56 people who had responded as of July 5, 50 rated the conference Excellent or Very Good. If you were there (in person or virtually), please do share your own takeaways from the conference in the comments. There were also lots of helpful suggestions for the next PIDfest, which will likely take place in 2026; if your organization is involved with any aspect of national PID strategies and is potentially interested in hosting, please email contact@pidfest.org. In the meantime, we’re already looking forward to another opportunity for the PID community to get together at next year’s International Data Week in Brisbane, Australia, hosted by none other than ARDC!

Disclaimer: Our two organizations were involved in PIDfest; Mary Beth Barilla (NISO) was Chair of the Marketing and Communications Committee; Josh Brown and Alice Meadows (MoreBrains Cooperative) were Chairs of the Unconference and Programme Committees, respectively.

Alice Meadows

I am a Co-Founder of the MoreBrains Cooperative, a scholarly communications consultancy with a focus on open research and research infrastructure. I have many years’ experience of both scholarly publishing (including at Blackwell Publishing and Wiley) and research infrastructure (at ORCID and, most recently, NISO, where I was Director of Community Engagement). I’m actively involved in the information community, and served as SSP President in 2021-22. I was honored to receive the SSP Distinguished Service Award in 2018, the ALPSP Award for Contribution to Scholarly Publishing in 2016, and the ISMTE Recognition Award in 2013. I’m passionate about improving trust in scholarly communications, and about addressing inequities in our community (and beyond!). Note: The opinions expressed here are my own.

Phill Jones

Phill Jones is a co-founder of MoreBrains Consulting Cooperative. MoreBrains works in open science, research infrastructure and publishing. As part of the MoreBrains team, Phill supports a diverse range of clients from publishers and learned societies to institutions and funders, on a broad range of strategic and operational challenges. He's worked in a variety of senior and governance roles in editorial, outreach, scientometrics, product and technology at such places as JoVE, Digital Science, and Emerald. In a former life, he was a cross-disciplinary research scientist at the UK Atomic Energy Authority and Harvard Medical School.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He also serves in leadership roles at a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He previously served as Treasurer of SSP.

The Scholarly Kitchen
