PIDapalooza - Revenge of the Nerds - The Scholarly Kitchen

This post was co-authored by Alice Meadows and Phill Jones. PIDapalooza was co-hosted by Alice’s organization, ORCID, together with California Digital Library, Crossref, and DataCite.

From Alice:

PIDapalooza, the first ever festival for scholarly persistent identifiers, set out to make PIDs – and the nerds who create, develop, and use them – cool! And, to a large extent, I think it succeeded. From the offbeat location (Reykjavik in November, anyone?) to the music festival vibe and fast-paced program (three tracks of mostly half-hour sessions interspersed with plenaries), it certainly felt very different from other scholarly communications conferences I’ve attended.

But what about the content? What happens when you put 120 or so PID aficionados in a room together? What did we talk about? Which topics generated the most buzz and why?

The “official” PIDapalooza themes were PID Stories, Emerging PIDs, PIDagogy, PID Sci, I(Interoperability) Word, Persistence, and Org IDs. Closing plenary speaker, Carly Strasser of the Gordon and Betty Moore Foundation, had a slightly different take, summarizing them as: granularity, which PIDs do we really need, learning, outreach, interoperability, ideals versus reality, and responsibilities.

Some topics generated especially lively discussions. I have never seen as many hands go up at a conference as I did during the session on project IDs, led by Martin Fenner of DataCite and Tom Demeranville of ORCID EU! Likewise, participants in the organization ID project session expressed strong and sometimes divergent opinions on what is needed. And, during Herbert van de Sompel’s plenary, a technical discussion about handles and DOIs (definitely not my area of expertise!) turned into quite a heated debate. But, as many attendees noted, being able to have these sorts of discussions with colleagues from the wider PID community – in person, and in good humor – made for a great start to what will certainly be much longer-term conversations.

Controversial topics aside, the big overall theme for me was the importance of communications in its broadest sense. For all the talk about the technology behind PIDs, and the discussions of what PIDs are needed and which organizations should be involved in developing and maintaining them, the real challenge for us all is to get them used more widely, consistently, and appropriately. These are what Phill calls below the difficult “social” questions. And that means understanding – and effectively communicating – the value of PIDs to research organizations and researchers alike, in order to ensure their wide adoption and usage across the whole research community.

Simon Porter’s excellent plenary on Research Information Citizenship provided one vision of how to achieve this. He called on each scholarly communications sector to play its part in making the digital research infrastructure work better. For research institutions this means validating affiliations and, where applicable, publications; support for ORCID iDs for the new generation of researchers; and communicating their organization structure in a machine-readable format. For publishers, it’s all about metadata, including ORCID adoption – something which, as Simon pointed out, many organizations are already doing. As for funders’ responsibilities, Simon also pointed out that metadata for grants can become a catalyst for research processes. Service providers, meanwhile, should take responsibility for closing the gap between the act of research collaboration and reporting on it, as well as expanding the research information community. Critically, service providers should, where possible, collaborate to build shared infrastructure tools and services. One of Simon’s questions to attendees at the meeting was, should there be an open letter for service providers similar to the publishers’ ORCID open letter? Last, but very much not least, Simon urged us all to get our act together in terms of communicating with our researcher communities – and equally urged researchers to engage with us.

Continuing the communications theme, how do we get researchers to engage with us? What can and should we be telling them about PIDs? How can we best get our message across? Some of the ideas that came up during the sessions I attended included developing a PID curriculum and training resources (something that Crossref, DataCite, and ORCID will be working on during 2017); experiments to evaluate the most effective communications for researchers – and administrators (by Trisha Adamus and Mieneke van der Salm); and community-building opportunities, such as those being developed by the THOR project described in a presentation by their Training & Events Officer, Maaike Duine. Meaningful use of PIDs is also critical here (see also Phill’s comments about how many PIDs we really need!). This was addressed in several presentations I attended, including those by Cory Craig on the use of ORCID iDs in hyper-authored articles, and Richard Wynne on PID adoption, who noted that “implementations that solve real problems are the only way to drive PID adoption.”

PIDapalooza certainly gave me lots of food for thought and, from informal comments at the meeting, as well as the initial responses to our feedback survey, other attendees felt the same way. So here’s to PIDapalooza 2017!

From Phill:

Have you ever been to Reykjavik in November? I have. Do you know what it’s like? It’s dark. It’s really dark. It’s also cold and rainy. On November 9th, when I woke up before the sun had risen on the barren, volcanic landscape to discover the result of the US election, it seemed all the darker. Thank goodness, I thought, that I’m here to learn about the exciting world of information infrastructure.

Suffice to say, my mood as I followed the signs to the conference through the oddly sparse, understaffed and dark (Have I mentioned it was dark yet?) hotel could have been a lot better. Fortunately, things were about to get a whole lot more lively and interesting than I could have imagined.

I have to confess that I’m a bit of a neophyte to all of this persistent identifier business. I’ve recently changed roles at Digital Science and as a result am starting to get more involved in our relationships with institutions, funders and policy makers. I’m also having to take a deeper and more faceted look at the role of ORCIDs, DOIs, and institutional identifiers. I’ve always been aware that identifiers are more than just metadata but, until recently, hadn’t really understood how much untapped potential they still have and how much remains to be done to maximize the portability of research information on the web.

Although it’s tempting to write about all the things that I learned last week while developing a vitamin D deficiency, I will resist the urge to copy out my freshman notes. Instead, I’m going to make note of a couple of the conversations that I was a part of when I didn’t feel wildly out of my depth.

Q. What’s the alternative to using ORCID? A. Not using ORCID

As Alice mentions above, Trisha Adamus, a research data librarian at the University of Wisconsin, gave an update on her work to support a trial of institutional support for ORCID. After the talk, somebody asked an interesting question that caused at least some confusion: “What are the alternatives to supporting ORCID?”

What interests me about the question is that I think it’s based on an assumption that universities have always uniquely identified their researchers. After all, how else would they know what their staff are doing? It might surprise many people to learn that at Digital Science, we’ve encountered numerous institutions that, until a few years ago, couldn’t so much as generate a list of their own faculty, and many still can’t.

As I noted to Alice over lunch on the first day, you don’t need to use the ORCID identifier to be able to collate data on research staff; there are many commercial Current Research Information Systems (CRIS) on the market that will enable an institution to keep better track of this information. Where ORCID (and any other identifier for that matter) becomes invaluable is in making that information portable. As Clifford Tatum, Project manager at ACUMEN and researcher at Leiden University said in his plenary, if you have standards and protocols around identifiers, you don’t have to worry as much about system interoperability.
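To make the portability point a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the two "systems", the field names, and the identifier values (which are placeholders in the ORCID iD shape, not real iDs). It only shows the general idea that when two systems carry the same identifier, their records can be joined without either system knowing anything else about the other, even when the names alone would be ambiguous.

```python
# Illustrative only: the records, field names, and iDs are invented placeholders.
cris_records = [
    {"orcid": "0000-0000-0000-0001", "name": "J. Smith", "department": "Physics"},
    {"orcid": "0000-0000-0000-0002", "name": "J. Smith", "department": "History"},
]
publisher_records = [
    {"orcid": "0000-0000-0000-0002", "author": "Smith, Jane", "title": "Trade routes"},
    {"orcid": "0000-0000-0000-0001", "author": "Smith, John", "title": "Plasma waves"},
]


def join_on_identifier(staff, outputs, key="orcid"):
    """Attach each output to the staff record that shares its identifier."""
    by_id = {person[key]: {**person, "outputs": []} for person in staff}
    for item in outputs:
        person = by_id.get(item[key])
        if person is not None:  # ignore outputs with an unknown identifier
            person["outputs"].append(item["title"])
    return by_id


merged = join_on_identifier(cris_records, publisher_records)
for orcid, person in merged.items():
    print(orcid, person["department"], person["outputs"])
```

Note that both staff records share the display name "J. Smith"; the join still works because it never looks at the name, only at the shared identifier.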

If you left a couple of label makers lying around, there’d be chaos

There were a number of sessions about proposed new persistent identifiers over the course of the conference. An identifier for projects, one for equipment, another for scientific protocols. While a case can be made for all of them, I’d argue that just because we can technically apply an identifier to some entity doesn’t mean that we necessarily should, or at least that we should just yet.

Take, for example, the idea of a protocol identifier, which was presented by Tom Gillespie, a graduate student from University of San Diego. As a former researcher in the biomedical sciences myself, I completely understood where Tom was coming from. The problem of lack of availability of protocols is the bane of many a researcher’s existence, particularly at early career stages. Anything that can be done to encourage making them available is a good thing. Having said that, does giving protocols a PID really get to the reason why protocols aren’t shared? Fundamentally, lack of protocol sharing is down to a lack of incentives to do so and, in some cases, the negative incentive of protecting a competitive advantage.

I don’t mean to single out Tom’s idea specifically, because I felt it was an issue that came up a number of times. For example, how can we have a project identifier, if we haven’t defined what a ‘project’ actually is? Even more fundamentally, are we sure that it’s meaningful to characterize science as a series of projects, in addition to identifying both grants and outputs? Maybe it is, but I don’t think we’ve established that.

It seems to me that sometimes we’re not quite sure what the underlying problem is that we’re trying to solve, and fall back on figuring out what we can technically do rather than what is needed.

Based on many of the questions that were asked in the sessions, I don’t believe I was the only person to start thinking this way. Towards the end of the conference, especially, I felt that there was a tangible sense that while technology has its challenges, as Alice notes, the really difficult questions surrounding PIDs are social. How do we drive adoption of PIDs? What are the use cases and how do we bring them to light? If we do decide to create more PIDs, what will they be and how will they help?

PIDapalooza was a very timely conference. The field is still growing and evolving. For a first gathering of an embryonic community, PIDapalooza was an excellent start. I felt like many people turned up not quite knowing what to expect, but by the end, the community had begun to frame its questions and explore its parameters. I look forward with interest to next year, even if it is during the second darkest month of the year, in a country that touches the Arctic Circle.

All presentations from PIDapalooza are being made freely available on Figshare – enjoy!

Alice Meadows

I am a Co-Founder of the MoreBrains Cooperative, a scholarly communications consultancy with a focus on open research and research infrastructure. I have many years’ experience of both scholarly publishing (including at Blackwell Publishing and Wiley) and research infrastructure (at ORCID and, most recently, NISO, where I was Director of Community Engagement). I’m actively involved in the information community, and served as SSP President in 2021-22. I was honored to receive the SSP Distinguished Service Award in 2018, the ALPSP Award for Contribution to Scholarly Publishing in 2016, and the ISMTE Recognition Award in 2013. I’m passionate about improving trust in scholarly communications, and about addressing inequities in our community (and beyond!). Note: The opinions expressed here are my own.

Discussion

5 Thoughts on "PIDapalooza – Revenge of the Nerds"

Phill makes some good points about project identifiers concerning how we should be thinking about the nature of projects and how they relate to science before diving in and assigning them identifiers. These questions, and others like them, were at the forefront of our minds when Martin Fenner and I put the presentation together. I think it was the exploratory nature of the session, with only ten minutes of slides and the rest as discussion, that meant we had a sea of hands in the air when the time for discussion came.

For my part, I think it’s very important to narrow down and identify the use cases you’re trying to solve before proposing any kind of technical solution. I feel we’ve partly done that, at least from our own perspective. A project, at its most basic level, consists of agents and resources. That is to say, the people or organisations that produce things, and the things that they produce. For us, it’s about linking disparate elements in a convenient manner in a reference-able place, but for others? Well that requires further work to unpick.

I do think that projects are a specialised instance of collections – in that there are other collections of things that can be modeled in similar ways. For example, conferences, teams, expeditions, soil samples and departments are all collections in a sense. Perhaps if we can work out, and agree on, the conceptual design for collections, then the more specialised versions may follow.

PIDapalooza was a great venue for this type of discussion and I’m really looking forward to next year.

By tomdemeranville
Nov 21, 2016, 11:17 AM
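The model described in the comment above (a project as agents plus resources, and as a specialised kind of collection) could be sketched roughly as follows. This is only one possible reading, not a proposed schema: the class names, fields, and identifier strings are all hypothetical and do not correspond to any existing PID system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """A person or organisation that produces things (hypothetical model)."""
    identifier: str
    name: str


@dataclass(frozen=True)
class Resource:
    """Something that agents produce, such as a dataset or article."""
    identifier: str
    title: str


@dataclass
class Collection:
    """A generic, identifiable grouping of members."""
    identifier: str
    label: str
    members: list = field(default_factory=list)

    def add(self, member):
        self.members.append(member)


@dataclass
class Project(Collection):
    """A project treated as a specialised collection of agents and resources."""

    def agents(self):
        return [m for m in self.members if isinstance(m, Agent)]

    def resources(self):
        return [m for m in self.members if isinstance(m, Resource)]


# Placeholder identifiers, invented purely for this example.
project = Project(identifier="project:example-1", label="Example project")
project.add(Agent("agent:example-a", "Example Researcher"))
project.add(Resource("resource:example-x", "Example dataset"))

print([a.name for a in project.agents()])
print([r.title for r in project.resources()])
```

Under this reading, a conference or an expedition would be another subclass of `Collection`, which is the sense in which agreeing on the general design first might let the specialised versions follow.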

Hi Tom,

Thanks for your thoughtful comments.

I enjoyed your session at PIDapalooza. I have to confess that being new to the identifier business, I was a little out of my depth and am completely open to the idea that I may not have fully understood.

I suppose that what I don’t understand is what the primary need for a project identifier is. What makes ORCID so valuable is that it solves the fundamental problem of multiple people having the same name. DOI is important because it allows for the unique identification of articles on the web and isn’t rooted in the legacy model of journal title, volume and page number. ISSNs make the identification of journals themselves easier and avoid confusion around name variations. These are all concrete things. To me projects aren’t concrete, they’re arbitrary frameworks that researchers allocate an amount of science to in order to write it up for a grant, a studentship, or an article.

The projects as defined in the grants don’t end up mapping to the projects that are defined in articles. Which one is more valid? Well, neither; both are arbitrary, with the limits of the project chosen to fit the form in each case.

As I say, I may have this completely wrong and would be interested to hear your take on it. Do you feel I’m wrong to say that projects are arbitrary? Are they more concrete than I give them credit for?

By Phill Jones
Nov 21, 2016, 4:10 PM

Phill, I think there is a clear use case for a project identifier: link together all researchers, institutions, research outputs and additional funding associated with a grant. This is something that is very painful right now, as this information is collected in a number of different places, or not at all. Project identifiers will help solve this problem, which will not only make funders happy, but is important for everyone else involved.

By mfenner
Nov 23, 2016, 2:17 PM

Phill, I don’t really have anything to do with project IDs, but I do think that while people and journals tend to be single, “concrete” entities, articles might be a little more complex than that.

For example, an article that is assigned a DOI at the point of acceptance into a journal may have multiple published versions (e.g. one published online as an “Accepted Manuscript” immediately after acceptance, and then another “Version of Record” published after copy-editing and XML formatting the document). Both will have the same DOI. However, if a DOI was assigned to the preprint version before submission to or acceptance by a journal, the article will be assigned a different DOI if it is later published in a journal.

Similarly, a correction to an article will have its own DOI. There are now journals like Matters that encourage the publication of single observations, each with their own DOI, that can later be “linked” with a narrative to form something more like the common article format – in other words, what might be a single publication with a single DOI in most journals is, in this publishing model, split into multiple smaller items.

As Crossref explains it here (http://blog.crossref.org/2016/05/members-will-soon-be-able-to-assign-crossref-dois-to-preprints.html), the overall aim is to provide a single DOI for an “intellectually discrete” document, but this is not always as straightforward as it might first appear.

Projects are more arbitrary than articles, but it seems to me it may be the scale of the problem that’s different, rather than the type of problem altogether. Just as for DOIs, decisions would need to be made about where to draw distinctions between intellectually discrete items. Whether the greater arbitrariness of projects makes the endeavour not worth the effort definitely seems like a valid question though!
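The DOI behaviour described in this comment (accepted manuscript and version of record sharing one DOI, with the preprint and any correction each getting their own) can be pictured with a small Python sketch. The DOI strings are dummy placeholders invented for this example, and the stage labels are just informal names; nothing here reflects a real registry record or API.

```python
# Dummy placeholder DOIs, invented for illustration only.
PREPRINT_DOI = "10.5555/preprint.0001"
JOURNAL_DOI = "10.5555/journal.0001"
CORRECTION_DOI = "10.5555/journal.0001.c1"

# One piece of work as it moves through the stages the comment describes.
versions = [
    {"stage": "preprint", "doi": PREPRINT_DOI},
    {"stage": "accepted manuscript", "doi": JOURNAL_DOI},
    {"stage": "version of record", "doi": JOURNAL_DOI},
    {"stage": "correction", "doi": CORRECTION_DOI},
]


def group_by_doi(items):
    """Group stage names under the DOI they were published with."""
    grouped = {}
    for item in items:
        grouped.setdefault(item["doi"], []).append(item["stage"])
    return grouped


grouped = group_by_doi(versions)
for doi, stages in grouped.items():
    print(doi, stages)
```

Four stages collapse to three DOIs here, which is the sense in which even an article is not quite a single “concrete” entity.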