OpenID has been a buzzword for a few years now, and Dick Hardt’s presentation is still fun to watch. But OpenID focuses on users. In STM publishing, authors provide a slight twist to the notion of a 1:1 match between an online identity and a person.

And the question to me is: would creating one unique ID per author work for science?

There’s been a fair amount of work and thinking in STM publishing recently about identifiers for authors and/or researchers. Geoffrey Bilder of CrossRef was interviewed in Nature a few weeks ago about this very topic. CrossRef is in the midst of a pilot project, Thomson Reuters has developed ResearcherID, and many others have been looking into the issues, feasibility, and approach.

An author ID sounds like a reasonable addition to the vetting and publication process. It could help enable some things you might reasonably advocate — from tools and services looking across the literature for a contributor’s works, to more reliable manuscript tracking systems, to a rationalized disclosure system.

But there are some key challenges, and one possible showstopper.

  1. There are many named authors per paper in scientific publishing, which amplifies the dataset’s size significantly. If we say there are 1.2 million papers published per year, with 6 authors each (a figure drawn from medicine, which has fewer authors per paper on average), the dataset would contain up to 7.2 million entities in its first year. And while its growth would slow at some point, the height of the plateau is unknown. Who can afford to scale this infrastructure? How do you maintain it? It seems daunting, but not impossible.
  2. At what point do journals acquire an author’s unique identifier? If it’s upon submission, the work of the authors and the editorial office could increase significantly as author connections and questions proliferate. This could slow the peer-review process and increase expenses. If it’s after acceptance, the opportunities for leveraging the single identifier move away from aspects of editorial work and to post-publication opportunities.
  3. What about anonymous submissions? There are areas of the world, and even areas in certain academic institutions, where submitting scientific findings anonymously or pseudonymously is the safest way to get them out. If a government is uncomfortable being associated with an embarrassing disease outbreak, an indefensible social situation, or a game-changing discovery (resource discoveries, social trends, political viewpoints), would an author ID protect an author, or grant them asylum? The history of science is replete with persecution, and those days are not firmly behind us. There’s even a physics blogger today who feels it’s best to keep his/her identity secret. I think this one might be a showstopper.
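
The back-of-envelope estimate in item 1 above can be sketched in a few lines of Python. The paper and author counts are the ones used in this post; the `papers_per_author` parameter is a hypothetical knob I've added, not a measured figure, to show how the total shrinks if the same person signs several papers in a year.

```python
def author_records(papers_per_year, authors_per_paper, papers_per_author=1.0):
    """Estimate distinct author records needed for one year of papers.

    papers_per_author is a hypothetical parameter (not from the post):
    the average number of papers a single author signs in that year.
    The default of 1.0 treats every byline as a new person.
    """
    if papers_per_author <= 0:
        raise ValueError("papers_per_author must be positive")
    bylines = papers_per_year * authors_per_paper  # total author slots
    return bylines / papers_per_author


# Figures from the post: 1.2 million papers, 6 authors each.
upper_bound = author_records(1_200_000, 6)
print(f"{upper_bound:,.0f}")  # prints 7,200,000
```

With the default, the function reproduces the 7.2 million figure, which is best read as a ceiling. Passing an assumed `papers_per_author=2` would halve it, which is why the plateau is so hard to guess without real data on how often authors repeat.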

But fundamentally, is focusing on authors in this way — as database entries across the domain — right for scholarly publishing? Aren’t we about the results of trials, not the trialists themselves?

Scientific publication is already too much of a numbers game: citations, impact factor, downloads, h-index, and so on. In some cases, these measures lead to volume over quality, speed and priority over novelty. What if you are a lowly patent clerk in Switzerland? Would having an author ID and counts based on the number of times you go through the publication turnstiles tempt you to slice the next paper across a set of journals? What if you were a remote and unknown Australian physician? A run-of-the-mill GE consulting scientist? If you had a great finding, but the rewards system had yet another reward based on frequency of publication, would you dole it out differently?

Even the disclosure notion — in which each author could have a set of disclosures all journal editors could reliably assess — has the volume issues stated above as well as temporal/version issues the author ID doesn’t intrinsically address. Would science editors accidentally create a virtual cooperative tattle-tale state?

The best argument I’ve heard for adopting these is efficiency. But this is about convenience for publishers and, as stated above, it might actually backfire, increasing work and slowing down publication.

So for me, the question remains: would an author ID be good for science?
