Do institutional mandates requiring authors to self-archive their papers lead to higher citation rates?  A new analysis argues that they do, yet closer inspection may give one pause.

The article, “Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research,” by Yassine Gargouri and others at the University of Quebec at Montreal, was deposited into the arXiv on January 3rd by self-archiving advocate Stevan Harnad, who is the last author of the paper.

Comparing 6,000 mandated self-archived papers deposited in four institutional repositories (Southampton University, CERN, Queensland University of Technology, and Minho University in Portugal) with 21,000 control articles selected by title word similarity, the researchers were interested in isolating and measuring the citation effect when authors willingly deposit articles of their own accord (self-selection) versus when their institution mandates deposit.

Discovering an independent citation advantage for mandated self-archived articles would suggest that open access (OA) is likely a real cause of increased citations; several articles in the past have argued that higher-quality articles are more likely to be self-archived and that the relationship between open access and increased citations is merely a spurious association (e.g. Kurtz (2005, 2007), Moed (2007), Davis (2007, 2008)).

Gargouri reports that institutionally-mandated OA papers received about a 15% citation advantage over self-selected OA papers, which seems somewhat counter-intuitive.  If better articles tend to be self-archived, the reasoning goes, we should expect papers deposited under institution-wide mandates to under-perform those where the authors select which articles to archive.  The authors deal with this inconvenient truth, rather unscientifically, through a quick statistical dismissal: that their finding “might be due to chance or sampling error.”  But even this explanation doesn’t hold water: their data set is large enough to detect even very small effects, and they report a statistically significant effect (p=0.048) in their appendix.  This fact seems to be conveniently ignored.

Similar inconsistencies are present throughout.  For instance, an independent effect due to institutional self-archiving mandates is present for articles that receive low-to-medium numbers of citations (Figure 4), but not for articles that fall in either the low- or the high-citation group.

The main weakness in this study stems from how the researchers deal with their data, forming ratios that compare the logarithm of citation performance of different kinds of articles (e.g. OA, mandated versus OA, self-selected).  Why is this so problematic?  Consider the following citation ratio scenarios:

  1. log 3/log 2 = 1.6
  2. log 30/log 20 = 1.14
  3. log 1/log 2 = 0
  4. log 2/log 1 = undefined (division by zero, since log 1 = 0)
  5. log 2/log 0 = undefined (log 0 does not exist)

In comparing scenario #1 with #2, both show the same raw citation ratio (3/2 = 30/20), and yet when you take the log of these numbers before dividing you get very different answers: a 60% citation differential for #1 but only a 14% differential for #2.  In scenario #3, a ratio that shows a 50% citation difference before log transformation becomes 0 after transformation.  Furthermore, any article in the denominator that receives one or zero citations makes the entire ratio unusable (e.g., #4 and #5).  Henk Moed criticized this approach (used by Harnad and Brody in an earlier paper) as being methodologically problematic since it can result in extremely high ratio values for very small citation differences.  You can see the effect of this ratio approach in Figure 2, where the most recent year of papers (2006) shows extraordinarily high impact.
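The five scenarios can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the paper's actual procedure: the function name `log_ratio` and the scenario list are hypothetical, and base-10 logs are an assumption (any base gives the same ratio, since the base cancels in the division).

```python
import math

def log_ratio(numerator_cites, denominator_cites):
    """Return log(numerator)/log(denominator), or None when it is undefined.

    Hypothetical helper for illustration; base-10 logs are assumed.
    """
    try:
        return math.log10(numerator_cites) / math.log10(denominator_cites)
    except (ValueError, ZeroDivisionError):
        # ValueError: the log of 0 does not exist.
        # ZeroDivisionError: log 1 = 0 ends up in the denominator.
        return None

# (numerator citations, denominator citations) for scenarios 1 through 5
scenarios = [(3, 2), (30, 20), (1, 2), (2, 1), (2, 0)]

for num, den in scenarios:
    ratio = log_ratio(num, den)
    shown = "undefined" if ratio is None else f"{ratio:.2f}"
    print(f"citations {num} vs {den}: ratio of logs = {shown}")
```

Scenarios 1 and 2 have an identical raw ratio of 1.5 yet produce different ratios of logs, and the last two scenarios cannot be computed at all.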

Given the passionate language used in this article, a reader may come to the conclusion that its authors are not being driven by the data, but are bent on selectively reporting and interpreting their results while ignoring those inconvenient truths that do not conform with their preconceived mission — that all institutions should establish mandatory self-archiving policies. They write:

“Overall, only about 15% of articles are being spontaneously self-archived today, self-selectively. To reach 100% OA globally, researchers’ institutions and funders need to mandate self-archiving, as they are now increasingly beginning to do. We hope that this demonstration that the OA Impact Advantage is real and causal will give further incentive and impetus for the adoption of OA mandates worldwide in order to allow research to achieve its full impact potential, no longer constrained by limits on accessibility.”

Someone with limited statistical background will find himself overwhelmed with complicated bar charts and may find it preferable to cite from the abstract.  Those with the ability to plow through the analysis may find the authors’ approach rather blunt and repetitive, when simpler and more elegant approaches are available.

Still, the researchers report what others have established before them:

  1. that characteristics of the article other than access status (e.g. type of article, number of co-authors, length of article, journal impact factor, field and country of authorship) are predictive of future citations,
  2. that citations are concentrated among a small group of papers, and
  3. that initial citation differences tend to be amplified over time.

What is conspicuously missing from the discussion is that the citation advantage attributed to free access (Figure 2, OA/not-OA) is much smaller — by a factor of 10 (20% versus 200+%) — than was previously stated, and echoed relentlessly and unconditionally, by these same researchers.  This much smaller effect size, if truly attributable to access, is more in line with other rigorous studies.

In sum, this paper tests an interesting hypothesis: whether mandatory self-archiving policies benefit authors in terms of citations.  Its unorthodox methodology, however, produces some inconsistent and counter-intuitive results that are not properly addressed in the narrative.

Given that one of its authors is bent on “ram[ming] open access down everybody’s throats,” I think you’ll hear a lot more on this article.
