OpenID has been a buzzword for a few years now, and Dick Hardt’s presentation is still fun to watch. But OpenID focuses on users. In STM publishing, authors provide a slight twist to the notion of a 1:1 match between an online identity and a person.

The question, to me, is this: would creating one unique ID per author work for science?

There’s been a fair amount of work and thinking in STM publishing recently about identifiers for authors and/or researchers. Geoffrey Bilder of CrossRef was interviewed in Nature a few weeks ago about this very topic. CrossRef is in the midst of a pilot project, Thomson-Reuters has developed Researcher ID, and many others have been looking into the issues, feasibility, and approach.

An author ID sounds like a reasonable addition to the vetting and publication process. It could help enable some things you might reasonably advocate — from tools and services looking across the literature for a contributor’s works, to more reliable manuscript tracking systems, to a rationalized disclosure system.

But there are some key challenges, and one possible showstopper.

  1. There are many named authors per paper in scientific publishing. This amplifies the dataset’s size significantly. If we say there are 1.2 million papers published per year, and 6 authors each (this link is for medicine, which has fewer authors per paper on average), the dataset would contain 7.2 million entities in its first year. And while its growth would slow at some point, the height of the plateau is unknown. Who can afford to scale this infrastructure? How do you maintain it? It seems daunting, but not impossible.
  2. At what point do journals acquire an author’s unique identifier? If it’s upon submission, the work of the authors and the editorial office could increase significantly as author connections and questions proliferate. This could slow the peer-review process and increase expenses. If it’s after acceptance, the opportunities for leveraging the single identifier move away from aspects of editorial work and to post-publication opportunities.
  3. What about anonymous submissions? There are areas of the world, and even areas within certain academic institutions, where submitting scientific findings anonymously or pseudonymously is the safest way to get them out. If a government is uncomfortable being associated with an embarrassing disease outbreak, an indefensible social situation, or a game-changing discovery (resource discoveries, social trends, political viewpoints), would an author ID protect an author or grant them asylum? The history of science is replete with persecution, and those days are not firmly behind us. There’s even a physics blogger today who feels it’s best to keep his/her identity secret. I think this one might be a showstopper.
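
The back-of-envelope estimate in point 1 can be sketched in a few lines of Python. This is only an illustration: the paper and author counts are the ones quoted above, while the function names and the "papers per author" figure are assumptions made up for the example, not measured values.

```python
# Rough sizing of an author ID dataset, using the figures quoted in point 1.
PAPERS_PER_YEAR = 1_200_000   # from the post
AUTHORS_PER_PAPER = 6         # from the post


def authorship_slots(papers: int, authors_per_paper: int) -> int:
    """Upper bound on first-year entries: every byline counted as a separate person."""
    return papers * authors_per_paper


def distinct_authors(slots: int, papers_per_author: float) -> float:
    """Distinct people if each author appears on `papers_per_author` papers a year.

    `papers_per_author` is a hypothetical input, not a figure from the post.
    """
    return slots / papers_per_author


slots = authorship_slots(PAPERS_PER_YEAR, AUTHORS_PER_PAPER)
print(slots)                          # 7200000
print(distinct_authors(slots, 2.0))   # 3600000.0
```

The 7.2 million figure is therefore a ceiling on the first year: any author who publishes more than once in a year lowers the number of distinct entries.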

But fundamentally, is focusing on authors in this way — as database entries across the domain — right for scholarly publishing? Aren’t we about the results of trials, not the trialists themselves?

Scientific publication is already too much of a numbers game: citations, impact factor, downloads, h-index, etc. In some cases, these measures lead to volume over quality, speed and priority over novelty. What if you are a lowly patent clerk in Switzerland? Would having an author ID and counts based on the number of times you go through the publication turnstiles tempt you to slice the next paper across a set of journals? What if you were a remote and unknown Australian physician? A run-of-the-mill GE consulting scientist? If you had a great finding, but the rewards system had yet another reward based on frequency of publication, would you dole it out differently?

Even the disclosure notion — in which each author could have a set of disclosures all journal editors could reliably assess — has the volume issues stated above as well as temporal/version issues the author ID doesn’t intrinsically address. Would science editors accidentally create a virtual cooperative tattle-tale state?

The best argument I’ve heard for adopting these is some view toward efficiency. But this is about convenience for publishers and, as stated above, it might actually backfire, increasing work and slowing down publication.

So for me, the question remains: would an author ID be good for science?

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

12 Thoughts on "The Author ID Dilemma"

would an author ID be good for science?

If authorship means some form of ownership of ideas, then an unambiguous tie between a text and its creator(s) means that credit and responsibility can be allocated effectively.

Credit is “good for science” because it provides an incentive for further contribution.

Responsibility is “good for science” because it allows for some form of accountability if a creator knowingly publishes fraudulent work.

To allow for the counter-examples when ambiguity is “helpful to science,” an Author ID system should allow for pseudonymous authorship. By choosing so, however, an author forfeits receiving any credit for his/her work.

It’s a trade-off. You can’t have your cake and eat it too.

Related article on author ambiguity in Asian names:

Qiu, J. 2008. Scientific publishing: Identity crisis. Nature 451: 766-767.
http://dx.doi.org/10.1038/451766a

I agree that credit and responsibility are vital to a trustworthy scholarly environment. So is honesty, something that isn’t always at work as we’ve seen today.

My concern is about the derivative activities that could be driven by a database approach to the very real and important issues Phil raises. The more systematic we make scientific publishing, it seems, the more of it there is, the thinner the slices are, and the more overwhelmed its practitioners become. Despite MORE information, we seem to find truth just as elusive as ever. Will an author ID generate more heat? Or shine more light?

Just wanted to make a few points:

1) Much as I would like to be able to say I was “interviewed in Nature,” I was, in fact, interviewed by Martin Fenner on his blog on Nature Network. This is not to deprecate Martin’s blog (which is clearly excellent), but there is an important difference between the two. BTW, I think that the mere fact that you made this confusion could be the subject of a whole new interesting thread ;-).

2) As to your point about the potential of author IDs to exacerbate “the numbers game,” there is certainly that possibility, but I think there is also the potential to actually extend the ways in which we are able to measure how researchers “contribute” to their respective disciplines. Do they review lots of papers? Do they produce valuable data sets? Do they organize lots of conferences? Do they create videos illustrating complex methodologies? Do they blog? This is why CrossRef eschews the term “Author ID” in favor of “Contributor ID”. We view the creation of such a system as potentially allowing us to get a more rounded picture of how researchers participate in scholarly discourse.

“Scientific publication is already too much of a numbers game — citations, impact factor, downloads, h-index, etc. In some cases, these measures lead to volume over quality, speed and priority over novelty.”

True, but on the other hand it doesn’t hurt to play the game when it helps establish the value of what you publish. From the 2008 Annual Report of the Massachusetts Medical Society:

“NEJM’s 2007 impact factor stood at 52.589, up 2.5 percent from last year’s 51.296. Impact factors are citation and article-based calculations that reflect the influence of a particular journal’s articles on subsequent academic work. The higher the impact factor, the more credibility a journal has in the research community and the more likely papers published in it will be noticed. Consequently, authors often use impact factors to determine where to submit important papers for publication.”

Disambiguating author names is important in ensuring the accuracy of any kind of value measuring system, impact factor being only one of those, as you note.

As Geoffrey said, this was an interview on my blog at Nature Network, not something by the journal Nature.

I think that an author ID will solve many problems, and I think we can solve the issues you mention. My biggest concern is privacy. Even if you don’t have to publish anonymously, an author ID will make it much easier to connect different pieces of information together. Geoffrey has some scary examples in a comment to his CrossRef blog post that you link to.

I’ve actually done quite a bit of research into this issue (for a client) — including an extremely helpful interview with the estimable Mr. Bilder 😉 — and have a couple more thoughts to offer. It became clear to me that _organizations_ have a very high interest in “person IDs.” Who exactly is this person? What else has she written? What conferences has she spoken at? What societies is she a member of? What working groups does she participate in? Etc. etc. In addition _authors_ have a very high interest in having their profiles, in these various ID systems, be complete, accurate, up to date (and they are the best at maintaining that profile validity). So I firmly believe we will be (actually, already are) in an environment of multiple, overlapping ID systems. I think that author will want to be sure her CrossRef Contributor ID is always complete and correct, but I think she will also want to be able to connect that ID with her ISI Researcher ID, her Scopus Author Identifier, her identity in the COS Scholar Universe, etc., not to mention the systems of organizations she belongs to. It’s kind of like standards–ain’t it great that there are so many of them?

In addition to the work that CrossRef is doing on this topic, there is similar work in this area at the ISO level on the International Standard Name Identifier, which is one of the activities of ISO technical committee 46, subcommittee 9 (TC 46/SC 9) on identification and description. NISO is the Secretariat for TC 46 / SC 9 and coordinates this work. The ISNI is a method for uniquely identifying the public identities of authors and contributors to media content such as books, music, movies, television programs, and serial-publication articles.

There is also related work by the Library of Congress, OCLC, British Library, BnF and DNB to create a Virtual International Authority File (http://VAIF.org)

The strength of the ISNI project, in particular, is the engagement of large-scale media companies, which have a strong business need for such an identifier: the payment of royalties. While science and STM publication could certainly take advantage of such a system, it is far more likely to be widely adopted and used when driven by large companies with a vested interest in adopting it.

(1) seems a little silly; that isn’t a very big database any more. It’s also a distributable problem, like IP or email addresses.

(2) actually doesn’t make sense to me. Editorial can ignore anything they wouldn’t find out now, and find it out faster if it’s something they need to know and can be systematized. Flaws in that are (3);

(3) can be solved by pseudonymity. We could nickname that ‘Student’ of Student’s t, for instance, or _Ex ungue leonem_. There’s also a long glorious history in the rest of civil society, and I’d be fine with science coming down on the side of pseudonymity rather than the ill-managed surveillance state.

And then what seems to be the main body of your complaint is that this will accelerate the `thin-slicing’ or `numbers’ game, but you don’t suggest a process by which that would happen. How will being accurately cited — microattributions for incremental datasets, and accurate attribution for publications before name-change or in another language, etc. — make a scholar’s automated `citation score’ less accurate? Or incorrectly more persuasive?

You claim the best argument you’ve heard is towards efficiency for publishers, but that’s not *any* of the arguments I’ve ever heard. People ill-served now are those with not-very-Anglo names or careers, or people who change their names mid-career, or even change their institution often; or scholars who work on things we all need which don’t thin-slice well.
