List position matters when it comes to readership and citations.  That’s the result of an analysis of articles deposited in the arXiv, conducted by Cornell professor and arXiv inventor Paul Ginsparg and graduate student Asif-ul Haque.

Their article, “Positional effects on citation and readership in arXiv” will appear in a forthcoming issue of the Journal of the American Society for Information Science & Technology.  A copy of the manuscript is available in — where else? — the arXiv.

Expanding and confirming an earlier study on the positional effects in the arXiv, Haque and Ginsparg focused this time on articles deposited between 2002 and 2004 in three subsections of the arXiv: Astrophysics (astro-ph), High Energy Physics-Theory (hep-th), and High Energy Physics-Phenomenology (hep-ph).

Submissions to the arXiv by time of day (in 10 min intervals)

They were interested primarily in explaining why articles listed at the top of daily publication announcements were cited more frequently than articles listed lower.  Since the daily queue begins with articles deposited after 4:00 pm each afternoon, one is not surprised to see a spike in the first 10 minutes as some submitters jostle for top-ranked slots in the following day’s announcements.

To determine why articles in top positions are cited and read more frequently, the researchers divided their dataset into two groups: articles submitted within the first 10 minutes after 4:00 pm (“Early”) were compared to articles appearing after 4:30 pm (“Not early”).
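As a rough illustration of this kind of split, here is a minimal Python sketch. It is not the authors' code: the records are invented, the function names are hypothetical, and it assumes that anything submitted before 4:00 pm counts as "Not early" because it lands late in a queue that opened at 4:00 pm the previous day.

```python
from datetime import time
from statistics import median

# Hypothetical records: (submission time, list position, citation count).
# These numbers are invented for illustration and are not from the study.
SUBMISSIONS = [
    (time(16, 2), 1, 40),
    (time(16, 7), 2, 25),
    (time(16, 45), 1, 18),
    (time(17, 30), 2, 12),
    (time(9, 15), 9, 7),
    (time(11, 0), 10, 5),
]


def classify(submitted):
    """Label a submission by time of day, using the post's two groups."""
    if time(16, 0) <= submitted < time(16, 10):
        return "early"
    # Assumption: times before 4:00 pm belong to the tail of the queue
    # that opened at 4:00 pm the day before, so they are "not early".
    if submitted >= time(16, 30) or submitted < time(16, 0):
        return "not early"
    return "excluded"  # 4:10 to 4:30 pm falls in neither group


def median_citations(records):
    """Median citation count for each (group, list position) pair."""
    groups = {}
    for submitted, position, citations in records:
        label = classify(submitted)
        if label == "excluded":
            continue
        groups.setdefault((label, position), []).append(citations)
    return {key: median(values) for key, values in groups.items()}


if __name__ == "__main__":
    for key, value in sorted(median_citations(SUBMISSIONS).items()):
        print(key, value)
```

Comparing the two groups at the same list position is what lets a position effect be separated from a timing effect.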

Median citations for each position for astro-ph announcements from the beginning of 2002 through the end of 2004. The red bars represent the self-promoted articles. The non-self-promoted articles in the top few positions, represented by the green bars, nonetheless receive more median citations than those lower down in announcements.
Median citations for each position. Red bars represent articles submitted shortly after 4pm ("Early"). "Not early" articles in the top few positions nonetheless receive more median citations than those occupying lower list positions.

Haque and Ginsparg report that “Early” articles found in the top position were cited more frequently and received more full-text downloads than articles found lower in the queue.

However, “Not early” articles found in the top position also received a similar, albeit much smaller, citation and full-text advantage compared to articles occupying lower list positions.  In other words, position really does matter.

What is noteworthy about these results is that the submission list is ephemeral, lasting only one day.  It is surprising that such an early event can show up years later as a citation advantage.  Haque and Ginsparg remark:

we’ve documented here that accidental forms of visibility can drive early readership, with consequent early citation potentially initiating a feedback loop to more readership and citation, ultimately leaving measurable and significant traces in the citation record.

In other words, small, initial differences can grow into large, significant effects years later.

If list order does indeed make a difference (a spike after 4:00 pm indicates that many physicists believe it does, as well), then is it reasonable to change the arXiv to make the publication system more fair? Haque and Ginsparg suggest briefly some possible solutions, such as:

  • Breaking larger groups into subgroups, thus reducing the length of daily publication lists
  • Randomizing the order of submissions in the daily announcements
  • Allowing readers to set up personalized announcement services based on keyword rather than date of submission
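The second suggestion, randomizing list order, could look something like the following Python sketch. This is only an illustration under stated assumptions: the function name and article identifiers are hypothetical, and seeding the shuffle with the announcement date is one possible design choice, not something the authors or the arXiv describe.

```python
import random


def randomized_announcement(article_ids, seed):
    """Return the day's articles in a shuffled order.

    The seed (for example, the announcement date as a string) makes the
    order reproducible, so every reader sees the same list that day.
    """
    rng = random.Random(seed)
    shuffled = list(article_ids)  # copy, so the submission order is kept
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Hypothetical identifiers, listed in order of submission.
    queue = ["paper-a", "paper-b", "paper-c", "paper-d", "paper-e"]
    print(randomized_announcement(queue, seed="2009-07-20"))
```

With an order like this, submitting at 4:00 pm would no longer secure a top slot, although some article would still occupy the top position each day.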

While all of these solutions would help make the arXiv less susceptible to positional effects, we should not forget that readers may rely on list order to help them decide what to read.  Self-selection, which leads to higher-quality articles being placed in top positions, is the driving force behind the citation advantage.  Remove the ability of authors with good papers to send a high-quality signal to potential readers, and you’ve removed a valuable feature of the arXiv.

Epilogue: Their manuscript was submitted to the arXiv at 3:28 pm (EST), 32 minutes before the end of the day.  But that is another story.

Phil Davis

Phil Davis is an independent researcher and publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. His research has focused on the dissemination of scientific information, rewards and incentives in academic publishing, and economic issues related to libraries, authors and publishers.



8 Thoughts on "Downloads, Citations, and Positional Effects in the arXiv"

How could self-selection lead to higher-quality articles being placed in top positions? Doesn’t everyone want the most possible attention for their article? You might as well argue that self-selection should lead people to send mass emails to strangers only about really important information.

Garnering attention has a caveat — those who attempt to gather too much create a collective negative image for themselves and are systematically ignored.

Every field has a few of these people: they respond to every email post on a listserv (copying dozens of other listservs for effect); they feel responsible for commenting on every blog post; they put their papers everywhere someone might wish to view them.

Academics have a name for these kinds of people. They call them “shameless self promoters” or SSP for short.

Self-promotion has negative consequences, especially when low-quality work is signaled as high-quality work. This, I imagine, is why the submission plot is not one giant spike.

Clever researchers (as noted in the previous article on this subject) always make sure to submit their articles at the exact time that will lead to higher placement on ArXiv’s list and hence higher citation and readership. It’s a great example of how easy it is to game so many of the new and newly proposed systems for academic recognition. I keep seeing proposals (and systems) for rating papers with 1-5 stars like Amazon reviews, and we all know how reliable those are, right? I’m not a big fan of the impact factor (particularly because of the secret formula Thomson/ISI uses to determine it, which they won’t reveal), but it seems at least a little harder to abuse.

So Philip, your theory is that somehow everyone tacitly acknowledges that only the best articles are supposed to be archived at 4 p.m., and anyone who archives lower-quality articles at that time is subsequently discredited? I’d like to see evidence for that.

I encourage you to read the article, especially the section where Haque and Ginsparg put readership and citation data together.

For instance, on page 19 of the manuscript (section 3.4) the authors write:

“Articles whose initial active period is much shorter than average (e.g., 3 days rather than 10) do tend to get somewhat fewer citations in the long run, as would be expected for lower quality articles, rapidly identified as such by discerning readers.”
