List position matters when it comes to readership and citations. That’s the result of an analysis of articles deposited in the arXiv by Paul Ginsparg, Cornell professor and inventor of the arXiv, and graduate student Asif-ul Haque.

Their article, “Positional effects on citation and readership in arXiv” will appear in a forthcoming issue of the Journal of the American Society for Information Science & Technology.  A copy of the manuscript is available in — where else? — the arXiv.

Expanding and confirming an earlier study on positional effects in the arXiv, Haque and Ginsparg focused this time on articles deposited between 2002 and 2004 in three subsections of the arXiv: Astrophysics (astro-ph), High Energy Physics-Theory (hep-th), and High Energy Physics-Phenomenology (hep-ph).

Submissions to the arXiv by time of day (in 10 min intervals)

They were primarily interested in explaining why articles listed at the top of the daily announcements were cited more frequently than articles listed lower. Because the daily queue begins with articles deposited after 4:00 pm each afternoon, it is no surprise to see a spike in the first 10 minutes, as some submitters jostle for the top slots in the following day’s announcements.

To determine what makes articles in top positions more likely to be cited and read, the researchers divided their dataset: articles submitted within the first 10 minutes after 4:00 pm (“Early”) were compared with articles submitted after 4:30 pm (“Not early”).
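The split leaves a gap: submissions between 4:10 and 4:30 pm belong to neither group. The sketch below makes that rule concrete. It is an illustration, not the authors’ code, and it assumes that timestamps are in US Eastern time, that the daily cutoff is 4:00 pm, and that “after 4:30 pm” means anything later in the same 24-hour announcement cycle (including the next morning). The function name `classify_submission` is hypothetical.

```python
from datetime import datetime, time
from typing import Optional

# Assumed boundaries (US Eastern); the 4:00 pm cutoff opens the next day's queue.
CUTOFF = time(16, 0)
EARLY_END = time(16, 10)        # "Early": first 10 minutes after the cutoff
NOT_EARLY_START = time(16, 30)  # "Not early": after 4:30 pm


def classify_submission(ts: datetime) -> Optional[str]:
    """Label a submission as 'Early', 'Not early', or None (excluded).

    Hypothetical helper: only the time of day matters here.
    """
    t = ts.time()
    if CUTOFF <= t < EARLY_END:
        return "Early"
    # A queue runs from 4:00 pm to 4:00 pm the next day, so anything at or after
    # 4:30 pm, or before the following cutoff, is treated as "Not early".
    if t >= NOT_EARLY_START or t < CUTOFF:
        return "Not early"
    # 4:10 pm up to 4:30 pm falls between the two groups and is left out.
    return None
```

For example, a paper submitted at 4:03 pm is “Early,” one submitted at 4:20 pm is excluded, and one submitted at 11:00 am the next day is “Not early.”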

Median citations for each list position in astro-ph announcements, from the beginning of 2002 through the end of 2004. Red bars represent articles submitted shortly after 4 pm ("Early"); green bars represent "Not early" articles. "Not early" articles in the top few positions nonetheless receive more median citations than those occupying lower list positions.

Haque and Ginsparg report that “Early” articles found in the top position were cited more frequently and received more full-text downloads than articles found lower in the queue.

However, “Not early” articles found in the top position also enjoyed a similar, albeit much smaller, advantage in citations and full-text downloads over articles occupying lower list positions. In other words, position really does matter.

What is noteworthy about these results is that the announcement list is ephemeral, lasting only one day. It is surprising that such a brief, early event can show up years later as a citation advantage. Haque and Ginsparg remark:

we’ve documented here that accidental forms of visibility can drive early readership, with consequent early citation potentially initiating a feedback loop to more readership and citation, ultimately leaving measurable and significant traces in the citation record.

In other words, small, initial differences can grow into large, significant effects years later.

If list order does indeed make a difference (the spike after 4:00 pm indicates that many physicists believe it does, as well), is it reasonable to change the arXiv to make the publication system fairer? Haque and Ginsparg briefly suggest some possible solutions, such as:

  • Breaking larger groups into subgroups, thus reducing the length of daily publication lists
  • Randomizing the order of submissions in the daily announcements
  • Allowing readers to set up personalized announcement services based on keyword rather than date of submission
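The second proposal, randomizing the order of the daily announcements, is simple to sketch. The version below is a hypothetical illustration, not anything the arXiv or the authors have specified: it seeds a shuffle with the announcement date, so every reader sees the same order on a given day, and submission time no longer determines who lands on top. The function name `randomized_listing` and the date-based seed are assumptions.

```python
import random
from datetime import date
from typing import List


def randomized_listing(submission_ids: List[str], announce_date: date) -> List[str]:
    """Return a shuffled copy of one day's announcement list.

    Hypothetical sketch: seeding with the ISO date string makes the order
    reproducible for that day, so all readers see the same listing.
    """
    rng = random.Random(announce_date.isoformat())
    shuffled = list(submission_ids)  # copy, so the submission-ordered list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

A fixed daily shuffle still gives one paper the top slot, only now by lottery. Seeding per reader instead (for instance, with a reader identifier) would spread the top-of-list visibility across many papers, at the cost of readers no longer sharing a common list.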

While all of these solutions would help make the arXiv less susceptible to positional effects, we should not forget that readers may benefit from list order when deciding what to read. Self-selection, which leads to higher-quality articles being placed in top positions, is the driving force behind the citation advantage. Remove the ability of authors with good papers to send a high-quality signal to potential readers, and you’ve removed a valuable feature of the arXiv.

Epilogue: Their manuscript was submitted to the arXiv at 3:28 pm (EST), 32 minutes before the 4:00 pm daily cutoff. But that is another story.