The first PIDapalooza took place in Reykjavik, Iceland, just over four years ago, rather inauspiciously starting the day after the 2016 US elections. The primal scream that the 120 or so attendees shared that morning both helped us let some of our feelings out and also set the tone for that conference — or festival, as it’s always been billed — and the ones to come. PIDapalooza is, at heart, a party for people who care about persistent identifiers (PIDs) — an attempt to share PID information, ideas, and updates, while also having fun! As someone who’s been involved in organizing it since the beginning, I know I’m somewhat biased, but I think it’s fair to say that we’ve mostly succeeded. At subsequent PIDapaloozas in Girona, Spain (2018), Dublin, Ireland (2019), and Lisbon, Portugal (2020), we attracted more and more PID people, got some pretty rave reviews, learned a lot, and enjoyed ourselves in the process.
PIDapalooza 2021, our fifth, was always going to be special. But when we first started thinking about it around this time last year, little did we know that it would also have to be fully virtual. COVID-19 was barely on anyone’s radar at the time, and we certainly didn’t expect that it would still be impacting the conference world in 2021. But, as you know, when life hands you lemons, the only thing to do is to make lemonade. And, as you also know, there are some huge upsides to hosting a virtual event, especially in terms of being inclusive. As you may have spotted, all the in-person PIDapaloozas to date have been held in Europe, making it challenging for people from other parts of the world to attend. Not so this year: we had participants from 38 countries all over the globe. There are also significantly more costs involved in hosting an in-person event. So far, PIDapalooza has been proudly sponsor-free, funded entirely by registration fees (which are kept very low), supplemented by contributions from the host organizations (California Digital Library, Crossref, DataCite, ORCID and, as of this year, my own organization, NISO). As not-for-profits, we have a limit on how much we can spend so, for an in-person event, it’s unlikely that we’d be able to afford many more than the 200 or so attendees we had in Lisbon last year. This year, our costs were so low that we were able to make PIDapalooza entirely free to attend, which is another great way to be inclusive.
But, for all the upsides, we still had to solve the challenge of recreating the PIDapalooza magic online — making it both valuable AND fun for attendees. Call us crazy, but our solution was to make this year’s PIDapalooza a 24-hour virtual PID party, starting at 9.30am EST on January 27 and ending at 10.30am on January 28 (yes, it was actually 25 hours in the end, but who’s counting!?). We had an amazing (crazy, terrifying!) program, with 92 back-to-back half-hour sessions and over 130 speakers (some of whom participated in more than one session), with one track, two tracks, even three tracks at times — to enable sessions in Portuguese, Spanish, Chinese, Japanese, German, and French, as well as English.
It could have all gone horribly wrong in so many ways … but it (mostly) didn’t! And that was, in large part, because of the enthusiasm of all participants — attendees, speakers, moderators, and organizers alike. Well over 1,100 people registered for PIDapalooza 2021, and an impressive 900+ actually showed up for at least some of the sessions live, with 200+ watching one or more of the instant replay videos during the conference itself. We tried to make it easy for everyone to feel welcome, via both our hosting platform, Crowdcast, and the PIDapalooza Slack workspace.
But enough of how I thought it all went: what did others think? Luckily, a few other chefs were also there for some (or, in Todd’s case, all!) of the 24 hours, and have kindly agreed to share their thoughts. I especially appreciate Lisa’s feedback, as a first-timer, about how we need to ensure that the traditions we’ve created over the years in person don’t inadvertently feel exclusive or cliquey; we absolutely want and need everyone to feel welcome in our community.
So, with thanks to them all, here are views from Lisa, Todd, Judy, and Phill. And, if you enjoy this post, I’m happy to say that you can also “experience” PIDapalooza 2021 for yourself on our YouTube channel!
Lisa Janicke Hinchliffe
My motivation for attending PIDapalooza stemmed from my role as a newly seated member of the ORCID Board. Though the Board is not operationally responsible for ORCID identifiers, I felt I would be better able to contribute to the Board by developing a greater understanding of the broader landscape of persistent identifiers and the conversations taking place in this community. The conference absolutely delivered on that goal. Though, I will admit, I did not take it all in on a live 24-hour schedule!
There were too many sessions to recap so I’ll just focus on one talk that I found particularly insightful, and that was the opening plenary by Adriana Romero Olivares. She shared her own experiences with identity management and identifiers, both joyful and painful. (With respect to the latter, her name is not hyphenated in actuality but she’s had to bend to the realities of so many online systems that cannot accommodate a double surname without a hyphen.) She also shared information that she had gathered from other researchers through social media and various conversations. I found this particularly valuable as I believe that hearing directly from researchers is critical for developing user-centric services. Hearing these researchers’ perceptions of, and questions about, persistent identifiers illustrated how opaque the purposes of certain identifiers are to scholars. It also served as a reminder that perceptions may or may not reflect what long-timers in the field know. I’m thinking here particularly of the question that was asked re: why did Publons create ResearcherID when we already have ORCID? A perfectly reasonable question when an individual has first encountered ORCID and then ResearcherID; there is no reason that a scholar would know that ResearcherIDs (2008) existed before ORCIDs (2012). Our experiences create our realities!
I look forward to continuing to learn through engagement with the PIDapalooza community and hope to see it grow. As such, I’m also going to offer a bit of friendly critical feedback. To be honest, some of the traditions that seem most valued by those who have previously attended PIDapalooza were a bit puzzling and at times created a sense of there being an insiders’ clique of those who are part of the community vs. those who are not. I doubt this is intentional but nonetheless, as the community grows, I’d encourage leaders to consider how to create pathways of inclusiveness via these traditions.
Todd Carpenter
PIDapalooza has styled itself after a music festival – the namesake Lollapalooza that was founded in the 1990s – so it made sense that trying something different in 2021 might lead the group in creative directions. What could be better than building on the model of a 24-hour rave? The notion isn’t quite as crazy as it might seem, because one of the problems of conferences is that they are traditionally situated in a particular time zone, often wherever the organizers are based. Yet in our work-from-home virtual world, where no one is really traveling anywhere, the constraints of time zones are artificial. Particularly since the organizers of PIDapalooza are spread out across at least nine different time zones, it never made sense to link the event to a single time zone. Furthermore, the community of identifier adopters, identifier providers, and identifier users is a global one. Forcing the world to engage at a time that is convenient for a single time zone is inherently exclusionary. Breaking the time-zone barrier was the first big strategy win of the 2021 PIDapalooza team. Hopefully, others will follow suit.
Some of the key takeaways for me are that:
The role of identifiers is critical and growing. There are PIDs for people: ORCID, ISNI; PIDs for organizations: ROR, GRID; and PIDs for outputs: RGISN, DataCite DOIs, DOIs for articles, books, datasets; ISBNs for books, ISSNs for journals…and on and on and on. As the scholarly world expands, the problems of persistent and globally-unique identification will only grow. One example noted by Ed Pentz during his talk is that in China roughly 700 million people share one of the top 20 last names – more than double the population of the entire US. The problem of disambiguating these content creators is significant, even if they are only a small fraction of that population. During PIDapalooza there were talks for everyone involved in scholarly, even non-scholarly, communication. There were discussions of PIDs for data, for research projects, for preprint articles, for software, for physical samples, and for the film The Highlander. Even people who aren’t aware of what persistent identifiers are use them and advocate for them; in her opening keynote, Adriana Romero Olivares quoted a high-profile professor as stating, “We have no need for PIDs, because we use DOIs.” Certainly, the PID community has some educating to do.
There is so much more infrastructure work to do. While there are already identifiers for many things scholars use, and their use is growing, the work is by no means complete. Important issues, even some fundamental issues, remain. During my own session, I spoke about a project at ISO focused on defining core principles of identification (while also producing an origami Transformer!). There are nascent identifiers for the production of data, for software, for grants, and for data management plans. These projects need to push harder for greater adoption and more widespread use. As with every standard, the challenge is normally not in creating the identifier, but in getting sufficient numbers of people to use it in their daily work. Even ORCID, while incredibly successful, is not yet used ubiquitously, nor universally required by publishers or grant-funders (although adoption is growing).
Once all of our world is properly identified, we still need to link these networks together to derive real meaning. Some of this work is showing promise in practical science, as Katherine Kaiser, Adriana Romero Olivares, and Melissa Haendel all highlighted in their keynote talks. Publishers, funders, and researchers are beginning to see the value of PIDs and are supporting this through the application of identifiers in their workflows. This rollout will take time, but it is becoming apparent that the utilization of PIDs will allow for easier transactions, faster discovery, and, potentially, novel outcomes.
The PID Community is big and growing. Having been deeply involved in the identifier community for more than 15 years now because of my role in standards development, I would never have expected the number of people who care enough about PIDs to attend a conference about them to be more than 1,100. When PIDapalooza first got off the ground, I thought the meeting might draw a couple of dozen familiar faces who I regularly see around the standards-setting table. Of course, we all knew that people were aware of identifiers and what they did. I often used the ISBN to describe what I do, since everyone has seen the barcode on the back of a book and might have used it to search for a book online or in a library. What I did not expect, and have been pleasantly surprised by, was how much the size of the community and its enthusiasm for identifiers has grown in recent years. In part, I see this as being related to people’s increased use of digital technology to navigate, organize, and share information. This is impacting our expectations for what information exists in that digital environment and how we expect to interact with it. Some on the forefront of this move are recognizing that persistent identifiers are powerful tools in this digital ecosystem. As more realize this power, expectations and demands will grow on and among the PID community.
Finally, I would like to end on a note of gratitude to the colleagues and friends who worked together to make PIDapalooza happen. Like so many meetings in our community, it relies on the effort of a dedicated and passionate group of people who work tirelessly behind the scenes. These volunteers usually receive little glory or recognition for their work, but they make events like PIDapalooza (and SSP’s Annual Conference too!) possible. The program committee, the moderators, and the speakers all gave their best and it showed. Thank you!
Judy Luther
As a first-time attendee who has wanted to attend since PIDapalooza began, I was excited to participate in the first virtual meeting. In reviewing the program to contribute to this post I found myself listening to more of the sessions than I was able to attend the day of the meeting.
The audience attracted to PIDapalooza includes a significant number of scholars who are working in a variety of roles related to research within government institutes, university centers, and industry-wide organizations. There were also many librarians in metadata and scholarly communications as well as funders, publishers, tech organizations, and startups. Speakers addressed projects across disciplines from hard sciences to the humanities.
The virtual meeting was a good fit with the more casual culture, and seeing speakers in different time zones brought home the global nature of the gathering. This was also evident in the program, which listed simultaneous sessions in various languages such as Spanish, Portuguese, German, French, Chinese, and Japanese, scheduled for whichever part of the 24 hours fell in the daytime where those languages are spoken.
The combined use of Sched (for programs and attendees), Crowdcast (for content and chat), and Slack (for conversations) worked really well. The organizers deserve credit for the combination as it supported connections and provided immediate access to content and comments that I am still reviewing.
Day 1 began in the US with opening sessions followed by a discussion about how arXiv, the first preprint server, is planning a staged implementation of DOIs retrospectively for its entire collection of 1.8M articles. The next session moved to Australia and the development of RAiD, which serves as an envelope for elements of a project such as Data Management Plans, research publications, datasets, researchers, institutions, grants, etc. Then we shifted to Finland and the development of the research.fi portal, which addresses different questions by linking various components of research, such as datasets, publications, and funding awards, with the researcher as the central node in the graph.
Multiple sessions throughout the day brought home the scope and scale of identifiers in the landscape and how rapidly they are expanding. Topics ranged from core elements of PIDs, including the challenges of ‘persistence’, to the PID Graph, which enables linking of PIDs to create a networked landscape of research projects, their funders, the researchers, their datasets and publications, and organizations. The PID Graph will pay dividends by enabling analysis of the landscape in support of continued enhancement of the research process on a global scale.
Phill Jones
I count myself as one of the lucky few to have attended that first PIDapalooza in Iceland about four years ago. It was certainly a strange experience to wake up in a hotel overlooking a dark, volcanic landscape only to find out the TV was strangely broken and would only play a second or two of audio of the new US president’s acceptance speech before turning itself off. Had I been transported to a parallel universe? Was this purgatory, perhaps? No, it was just winter and we were very far north, living in strange times.
This year’s event was a very different experience, both from its former self and from other events: a virtual conference, set over 24-ish hours and not tied to a particular place or timezone. Like the primal scream that Alice mentioned, PIDapalooza’s response to these particular strange times was unconventional and unapologetic, and it worked.
There were a couple of themes that I found interesting this year. For me, the most notable thing is how the conversation has evolved over time. Four years ago, much of the conversation was about making the case for uniquely identifying entities.
The case for unique identifiers, however, goes far beyond just putting a label on everything. As Ed Pentz, Executive Director of Crossref, explained in his talk, the point of PIDs is that they form a critical part of an emerging open infrastructure that will link people, places, and things through their associated metadata. This infrastructure will, in turn, create efficiencies in information transfer, automate processes, and streamline reporting.
Cross-linking PIDs has enabled researchers to define and capture the underlying knowledge graph of research. One advanced effort to make the knowledge graph open and accessible is the so-called PID graph, which was an output of Project FREYA. There were several sessions on the PID graph, including Amir Aryani’s talk on creating collaboration networks around COVID-19, which is an excellent example of the potential positive impact of open infrastructure. There was also a session from Matt Buys and Sarala Wimalaratne of DataCite, where they discussed and asked for feedback on a variety of use cases for DataCite Commons, which is a web search interface for the PID graph. Uses included creating researcher profiles and mapping the connections to a research article.
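For readers who like to see ideas in code, here is a minimal sketch of what "cross-linking PIDs" means in practice. It is not how DataCite Commons or Project FREYA implement the PID graph; every identifier, relation label, and function name below is made up purely for illustration. The point is only that once entities have PIDs and the links between them are recorded, "mapping the connections to a research article" becomes a simple graph walk.

```python
from collections import defaultdict, deque

# Hypothetical identifiers and relation labels, invented for this sketch.
LINKS = [
    ("orcid:0000-0000-0000-0001", "authored", "doi:10.0000/article-1"),
    ("doi:10.0000/article-1", "cites", "doi:10.0000/dataset-1"),
    ("orcid:0000-0000-0000-0002", "created", "doi:10.0000/dataset-1"),
    ("ror:00example", "employs", "orcid:0000-0000-0000-0001"),
    ("grant:example-123", "funded", "doi:10.0000/article-1"),
]


def build_graph(links):
    """Build an undirected adjacency map: PID -> set of (relation, other PID)."""
    graph = defaultdict(set)
    for source, relation, target in links:
        graph[source].add((relation, target))
        graph[target].add((relation, source))
    return graph


def connected_pids(graph, start, max_hops):
    """Return {pid: hop_count} for PIDs within max_hops links of start (breadth-first)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        pid = queue.popleft()
        if seen[pid] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbour in graph.get(pid, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[pid] + 1
                queue.append(neighbour)
    del seen[start]  # the starting PID is not its own connection
    return seen


if __name__ == "__main__":
    graph = build_graph(LINKS)
    for pid, hops in sorted(connected_pids(graph, "doi:10.0000/article-1", 2).items()):
        print(hops, pid)
```

With these made-up links, asking for everything within one hop of the article returns its author, the dataset it cites, and the grant that funded it; widening to two hops also pulls in the dataset's creator and the author's institution.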
Also on the theme of the PID graph, Maria Praetzellis of the University of California Curation Center (UC3), along with a number of collaborators, gave an interesting talk about their data management planning (DMP) tool. Based on work from an NSF EAGER grant, UC3 have been working with DataCite to connect DMPs to the PID graph using a common standard developed by RDA. In a related area, Natasha Simons of the Australian Research Data Commons (ARDC) teamed up with my colleague Fiona Murphy to talk about the Research Activity Identifier (RAiD), which is a compound PID for research projects that could have wide-ranging applications — helping institutions understand the impact of their collaborations, justifications for shared instrument grants, or providing identifiers for exhibitions or other practice-based research collections.
This progression of the conversation goes hand-in-hand with the growth of the community and the emergence of accessible real-world use cases. In particular, there is a lot of excitement about the role of funders and national research organizations like Jisc, SURF, CSC, NWO, and ARDC, as mentioned above. Two sessions were particularly interesting from that perspective. Maria Cruz of NWO and Clifford Tatum of SURF/CWTS, Leiden University, described the motivation and approach to the holistic national PID strategy for the Netherlands, while my colleague Josh Brown teamed up with Chris Brown (no relation) to talk about the UK national PID roadmap being spearheaded by Jisc. Both strategies are based on the selection of priority PIDs (for people, organizations, outputs, grants, and projects in the case of the UK roadmap), the integration points needed for automated workflows, and mechanisms by which participation in the infrastructure can be incentivized.
Just like research itself and the graph that represents it, many of the initiatives and ideas underway are strongly interacting and overlapping. The community continues to make huge strides in making the knowledge graph of research visible. A key question we will continue to grapple with is how best to coordinate and integrate related initiatives so that they can add value to one another, while balancing national strategic needs with the greater goal of improving how we expand and preserve knowledge.
The last session that I’ll recommend may be the one that many readers of this blog will be most interested in. “Why should more publishers come to PIDapalooza: connecting PIDs and Open Infrastructure to publisher workflows” was a talk by Alessanda Auddino, a talented researcher working as an Erasmus intern at Hindawi, ably supported by Catriona MacCallum, also of Hindawi. As part of Auddino’s research project, she interviewed a range of people working at various scholarly publishing companies. Based on those discussions, she developed a series of use cases ranging from disambiguation of authors and auto-completion of article submission forms to automated deposition in repositories. She also looked at opportunities for better reporting to address use cases like finding new emerging fields, and identifying suspicious patterns of behavior such as too many retractions or corrections associated with an individual or an institution. If you’re a publisher reading this and wondering why you should care about all this PID business, I’d suggest that talk is the place to start.