First, a caveat. Phil Davis is a fellow blogger on this site. I like Phil and respect his work. But I reserve the right to disagree with him. I’m not in his back pocket by any means. So, even though this involves a paper by Phil, I’m telling you what I think, no holds barred.
OK, now that we’re clear on that, Davis and colleagues at Cornell University have just published in BMJ a very interesting and well-designed study addressing the question of whether open access drives citations.
This study has important advantages over prior studies. It was randomized by the researchers, so authors or publishers didn’t select which articles were made open access. It was done prospectively, so that the data were analyzed from point zero forward. These are crucial advantages in study design, in my opinion, and place this study head and shoulders above any prior study asserting a citation advantage. In fact, authors and editors often make studies they think are more significant free or push them online early. Retrospective, non-randomized studies fall prey to all sorts of problems because of these confounding effects.
The study covered articles published from January through April 2007. The American Physiological Society (APS) allowed the researchers to randomly assign OA status to up to 15% of articles, yielding a treatment group (n = 247) and a control group (n = 1372). Only research articles and reviews were included in the randomization. Davis et al also tested the effect of OA on downloads (full-text and PDF).
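The assignment step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' actual procedure: the article identifiers and the seed are invented; only the group sizes (247 treatment, 1372 control) come from the study.

```python
import random

# Hypothetical sketch of randomizing OA status across eligible articles.
# Article IDs are made up; group sizes match those reported in the study.
random.seed(42)  # fixed seed so the sketch is reproducible
articles = [f"article-{i:04d}" for i in range(1619)]   # 247 + 1372 eligible articles
treatment = set(random.sample(articles, 247))          # ~15% assigned open access
control = [a for a in articles if a not in treatment]  # remainder stay subscription-access
print(len(treatment), len(control))  # 247 1372
```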
There is a good deal of sophisticated statistical and methodological rigor to the Davis et al paper, but the bottom line is unavoidable:
- Downloads increased for OA articles
- Citations did not increase for OA articles
Of the OA articles, 59% were cited after 9-12 months compared to 62% of the subscription-access articles. The chance of an OA article being cited was 13% lower, but this difference was not statistically significant.
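As a rough back-of-envelope check on those proportions (the paper's own estimate comes from a regression model with covariates, so its 13% figure need not match this crude unadjusted calculation exactly):

```python
# Unadjusted odds ratio from the cited-at-9-to-12-months proportions in the text.
# This is a sanity check, not the paper's adjusted model.
p_oa, p_sub = 0.59, 0.62
odds_oa = p_oa / (1 - p_oa)
odds_sub = p_sub / (1 - p_sub)
odds_ratio = odds_oa / odds_sub
print(f"unadjusted odds ratio: {odds_ratio:.2f}")  # roughly 0.88, i.e. ~12% lower odds
```

The crude figure lands close to the reported 13% reduction, which is reassuring but, as the paper notes, the difference is not statistically significant either way.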
I spoke with Marty Frank from the APS about this paper. He also feels it reports a strong, well-done study. I asked him about the additional traffic to the sites and whether this made a difference to APS publishing endeavors. As he stated it, “There wasn’t really an appreciable increase in traffic.” Also, as to whether he could commercialize additional traffic, Marty said, “Overall, advertising [against a traffic increase this size] ain’t going to make a hill of beans.” So, while the study found an increase in traffic, the publisher in question didn’t think it mattered much.
The traffic finding drew my attention, and the authors elaborate on it a good deal in the paper. For instance, other factors (review vs. research article, an associated press release, and featuring an article on the front cover) increased downloads, as did an article's number of references and its length. So even the traffic finding has nuances that defy a generalizable cause-effect explanation.
There may be complaints that this study is just from one publisher, and from one scientific domain (physiology). Well, instead of complaining, I’d suggest people try to bring this superior study design to other domains to confirm or refute the findings.
You’ll probably have a pretty significant paper on your hands then.
6 Thoughts on "Open Access Doesn’t Drive Citations"
For a critique, see:
“Davis et al’s 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion”
(1) There was no self-selection control condition showing the OA citation advantage, hence no evidence that the null effect is a result of eliminating the self-selection citation advantage.
(2) Most studies reporting an OA citation advantage report failing to find the effect in the first year: the time-base is too short.
Just to clarify and correct: The PLoS study (Eysenbach, 2006) which you link to as “prior study asserting a citation advantage” was also a PROSPECTIVE study (a prospective cohort study), not a retrospective study, as the Davis paper incorrectly asserts, and as you imply in this blog. The important methodological difference between the Davis and the PLoS study was randomization. The PLoS study was an observational cohort study, statistically adjusted for known confounders, while Davis study is a RCT, which gets rid of known and unknown confounders. (It’s like following a cohort of smokers and nonsmokers and observing cancer prevalences, adjusting statistically for demographic differences, versus randomizing the smoking condition and following up the cohorts.)
It might also be a bit simplistic to refer to the PLoS study as a study “asserting a citation advantage”. If one reads the study in full rather than just skimming the abstract (which few people seem to do), one notices that statistically adjusting for all known confounders actually eliminated any citation advantage for self-archiving (green OA) – the citation advantage remained only for gold OA. Gold-OA articles have a citation advantage of 25-40% (already less than what previous quick-and-dirty studies have asserted – Harnad to this date disputes the presence of any bias or confounders in his studies, which talk about 200-700% citation advantages). This advantage holds if we adjust for known confounders.
No-one disputes that an RCT is a superior methodology (I am also doing one), however, the main critique of this particular RCT is the ridiculously short follow-up period and that several “control” variables which SHOULD have been predictive for a citation advantage (articles with press-release, articles on the cover page, self-archived articles) are not significant predictors for citations either.
I disagree with Harnad that there was “no self-selection control condition” in the Davis study. The self-selection control condition is “self-archiving”: self-archived articles – in line with Davis’ own argument – tend to be the “better” studies, which is why previous studies that simply compared citations of self-archived vs non-self-archived articles, without adjusting for anything, did see a citation advantage for self-archived articles. But the fact that “self-archiving” is NOT a predictor in the Davis study is a major paradox, which leaves only the conclusion that the study is not internally valid. Davis should have waited to publish until all the other variables that are expected predictors of citations became significant.
For a full critique and questions for the author see also
Author’s Response: Insufficient timeframe to detect citation effects
I recognize that there was a tradeoff in choosing to submit our paper for publication with only one year of post publication citation data. We decided it was worthwhile to report preliminary results, rather than wait for more citation data, because of the importance of the issue and because of the stark contrast between our results and those in prior studies. Based on the effect size reported in previous studies, and our statistical power, we should have seen a significant open access effect by the end of the first year.
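[The power claim above can be illustrated with a crude calculation. Assuming, hypothetically, that OA truly raised the probability of an article being cited by 25% (the low end of the advantage claimed in earlier studies, per the comment above), a simple two-proportion normal approximation — not the authors' actual model — suggests a 247-vs-1372 comparison had ample power:]

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed numbers: control citation-at-all rate from the post (62%), and a
# hypothetical 25% relative increase under OA. This is an illustration only.
n_ctrl, n_oa = 1372, 247
p_ctrl = 0.62
p_oa = min(p_ctrl * 1.25, 1.0)  # assumed alternative: 25% relative increase
p_pool = (n_ctrl * p_ctrl + n_oa * p_oa) / (n_ctrl + n_oa)
se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_oa))
se_alt = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_oa * (1 - p_oa) / n_oa)
power = norm_cdf((abs(p_oa - p_ctrl) - 1.96 * se_null) / se_alt)
print(f"approximate power: {power:.2f}")  # well above 0.9 under these assumptions
```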
To further assess concern of insufficient timeframe, we have gone back and reexamined the issue with additional months of citation data. Since our manuscript was submitted to BMJ (with citation data from 2 January, 2008), we have run several update analyses.
As of 3 August, 2008 (15 to 18 months after article publication) the effect of randomized open access on citations remains insignificant (Incident Rate Ratio = 1.07, 95% confidence interval 0.95 to 1.20, P=0.23). Open access and subscription-access articles both have an average of 3.8 citations.
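[For readers who want to see how the reported interval implies non-significance: assuming a standard Wald confidence interval on the log scale, the z statistic and p value can be reconstructed from the published numbers alone; small discrepancies from the reported P=0.23 are due to rounding in the published interval.]

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Reported figures: IRR = 1.07, 95% CI 0.95 to 1.20. For a Wald interval on the
# log scale, log(upper) - log(lower) = 2 * 1.96 * SE, which lets us recover SE.
irr, lo, hi = 1.07, 0.95, 1.20
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
z = math.log(irr) / se
p = 2 * (1 - norm_cdf(abs(z)))
print(f"z = {z:.2f}, p = {p:.2f}")  # p well above 0.05; the CI spans 1.0
```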
In sum, we still find no open access effect on citations. Nonetheless, we plan to gather more citation data for these two sets of articles, and reexamine this issue again, after allowing even more time to pass.
Professor Harnad comments that we should have implemented a self-selection control in our study. Although this is an excellent idea, it was not possible for us to do so because, at the time of our randomization, the publisher did not permit author-sponsored open access publishing in our experimental journals. Nonetheless, self-archiving, the type of open access Prof. Harnad often refers to, is accounted for in our regression model (see Tables 2 and 3).
We could identify only 20 instances of self-archiving, 11 cover stories, and 4 press-releases. Without doubt we lack the statistical power to report an effect with sufficient certainty and do not draw our conclusions based on these limited data. These variables were not the main variable being tested in our study (it was randomized open access).
To summarize, we believe that our research provides strong evidence that open access increases the dissemination of scientific articles, as indicated by our download results. However, we find no evidence of an open access citation effect, even after incorporating six additional months of citation data.
There are many societal benefits to making the scientific literature freely available beyond the research community; a citation advantage may not be one of them.
Author’s Rapid Response on BMJ.
On Eggs and Citations
Failing to observe a platypus laying eggs is not a demonstration that the platypus does not lay eggs. You have to actually observe the provenance, ab ovo, of the little newborn platypuses, if you want to demonstrate that they are not engendered by egg-laying.
Failing to observe a significant OA citation Advantage after a year (or a year and a half — or longer, as the case may be) with randomized OA does not demonstrate that the many studies that do observe a significant OA citation Advantage with NONrandomized OA are simply reporting self-selection artifacts (i.e., providing OA selectively for the more highly citable articles.)
You first have to replicate the OA citation Advantage with NONrandomized OA (on the same or comparable sample) and then demonstrate that randomized OA (on the same or comparable sample) eliminates the OA citation Advantage (on the same or comparable sample).
Otherwise, you are simply comparing apples and oranges (or eggs and expectations, as the case may be) in reporting the failure to observe a significant OA citation Advantage in a one-year (or 1.5-year) sample with randomized OA — along with the failure to observe a significant OA citation Advantage for nonrandomized OA in the same sample (because the nonrandomized OA subsample was too small):
The many reports of the nonrandomized OA Citation Advantage are based on samples that were sufficiently large, and based on a sufficiently long time-scale (almost never as short as a year) to detect a significant OA Citation Advantage.
A failure to observe a significant effect with small samples or short time-scales — whether randomized or nonrandomized — is simply that: a failure to observe a significant effect. Keep testing until the size and duration of your sample of randomized and nonrandomized OA is big enough to test your self-selection hypothesis (i.e., comparable with the other studies that have detected the effect).
Meanwhile, note that (as other studies have likewise reported), although a year is too short to observe a significant OA CITATION Advantage, it was long enough to observe a significant OA DOWNLOAD Advantage — and other studies have also reported that early download advantages correlate significantly with later significant citation advantages.
Just as mating more is likely to lead to more progeny for platypuses (by whatever route) than mating less, so accessing and downloading more is likely to lead to more citations than accessing and downloading less.
If the objective of OA is to produce citations, then I suppose that its measurement is important. But here in Hillsdale County, Michigan, where unemployment is 10% and rising, some 12% of the high school grads go to college and very few of those study any of the STEM subjects, where we have no R&D college or university, no community college (Jackson CC does have a branch here which teaches a few courses), I teach science to some of the “at risk” high school students, and for us OA has been a life saver. No county library had ever heard of Issues