Enormous citation effects attributed to online and open access are spurious — an artifact of the simple failure to control for differences in article quality — two information economists report.

The working paper by Mark McCabe and Chris Snyder, “Did Online Access to Journals Change the Economics Literature?”, was posted to SSRN on January 23rd.

[Image: “Dead End,” by Famartin]

McCabe is a professor at the School of Information at the University of Michigan, and is well-regarded in the library community for his work on publisher mergers and their effects on journal prices. Snyder is a professor in the Department of Economics at Dartmouth College.  Together, they are known for their application of two-sided market theory to open access journal publishing.

Analyzing a dataset of nearly 260,000 articles from 100 business and economics journals published between 1956 and 2005, they test whether online access to journal articles boosts citations.

Carefully analyzing their data and controlling for other “secular” explanations such as time and quality effects, they refute two claims made by University of Chicago sociologist James Evans: that online access concentrates citations on a smaller number of recent articles, and that it disproportionately benefits scholars in developing countries.

McCabe and Snyder systematically test, and refute, most of the “access → citations” claims made popular in the last decade, although they leave one standing: certain publisher and delivery platforms may have a small but detectable citation effect. Specifically, they note that being hosted on JSTOR may boost article citations by about 10%. In contrast, they report no effect from ScienceDirect. This may not be too surprising considering the scope of their dataset and the collection strength of JSTOR in economics.

Their rationale for conducting such a large and careful study of citation patterns is not based on any particular political agenda. McCabe and Snyder are more interested in making sense of the value of a citation in the academic arena. They write:

If a small change in the convenience of access can cause a quadrupling of citations, then the typical citation may be of marginal value, used to pad the reference section of citing articles rather than providing an essential foundation for subsequent research. According to this view, citations would be at best a devalued currency, subject to manipulation through the choice of publication outlet. On the other hand, the finding of little or no citation boost would resuscitate the view of citations as a valuable currency and as a useful indicator of an article’s contribution to knowledge.

Given that open access may not increase an article’s citation rate, an alternative justification is required to persuade academics to alter their publication behavior. Repeated surveys of scholars by Ithaka S+R reveal that open access ranks far below prestige, relevance, and Impact Factor among publication priorities, and that many authors report a strong aversion to paying for publication. Indeed, similar priorities were reaffirmed recently in the survey conducted by the Study of Open Access Publishing (SOAP), a study highly biased toward open access publishing.

If citations are a form of academic currency, there may simply not be enough payoff to move the publication market toward an equilibrium dominated by open access journals, McCabe and Snyder write:

The current lack of evidence that free online access performs better, implies that the citation benefits of open-access publishing have been exaggerated by its proponents. Even if publishing in an open-access journal were generally associated with a 10% boost in citations, it is not clear that authors in economics and business would be willing to pay several thousand dollars for this benefit, at least in lieu of subsidies. Author demand may not be sufficiently inelastic with respect to submission fees for two-sided-market models of the journal market to provide a clear-cut case for the equilibrium dominance of open access or for its social efficiency.

The weakness in this argument, however, is that it assumes that scholars are paying with their own money, and therefore are sensitive to the costs and benefits of author-side payments. Price sensitivity is diminished when others (foundations, libraries) are willing to foot the bill, and becomes a non-issue when publishing in an open access venue is mandated through policy.  When this happens, there is little incentive to force the system to reduce costs, compete on price, and become more efficient.

The main contribution of the McCabe and Snyder paper is their explanation for the scores of initial papers reporting huge access-citation effects: they are artifacts of improper statistical analysis. The fact that there are many such poor studies in the literature, or that their claims have achieved consensus among certain like-minded individuals, does not make for good science, nor does it help to inform good science policy.
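To see how the kind of statistical artifact at issue can arise, here is a minimal simulation (entirely synthetic data with made-up parameters, not McCabe and Snyder's dataset or code). Citations depend only on article quality, quality also drives online availability, and a regression that omits quality attributes the quality effect to access:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved article quality drives both citations and online availability
# (e.g., stronger journals were digitized first), so access is not random.
quality = rng.normal(size=n)
online = (quality + rng.normal(size=n) > 0).astype(float)

# True model: citations depend on quality only; the access effect is zero.
citations = 2.0 * quality + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y on X (first column = intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression omitting quality: a large, spurious "access effect".
naive = ols(np.column_stack([np.ones(n), online]), citations)[1]

# Controlling for quality: the access coefficient collapses toward zero.
controlled = ols(np.column_stack([np.ones(n), online, quality]), citations)[1]

print(f"naive access effect:      {naive:.2f}")
print(f"controlled access effect: {controlled:.2f}")
```

The naive estimate comes out large and positive even though the true access effect is zero by construction, which is the omitted-quality bias the paper describes.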

There are many benefits to the free transmission of scientific ideas. A citation advantage, however, is not one of them. If science is a self-correcting process, it may be time to admit that we made a mistake, ventured down the wrong trail, and hit a dead-end.  It’s time to retrace our steps and move on to more important questions.

Phil Davis

Phil Davis is an independent researcher and publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. His research has focused on the dissemination of scientific information, rewards and incentives in academic publishing, and economic issues related to libraries, authors and publishers.



9 Thoughts on "Online Access and Citations — A Spurious Relationship, Economists Say"

Phil, in two extracts from McCabe and Snyder you seem to have highlighted weaknesses in their case against the effect of open access on citations, rather than strengths. You have leapt on their claim of “lack of evidence” and “exaggeration” by proponents of the case for OA impact to suggest poor studies and poor science. You have made the anti-OA-impact case in your own work, and have argued the evidence extensively, of course, yet you are in danger of subverting that contribution, and of perpetrating poor science yourself, by suggesting the case can be closed with one additional study and a flip dismissal of the rest: http://opcit.eprints.org/oacitation-biblio.html

However, your other extract from McCabe and Snyder is more telling of your and their positions on this: “If a small change in the convenience of access can cause a quadrupling of citations, then the typical citation may be of marginal value”. Surely that is what this is all about. Is open access merely a small change, a convenience? Perhaps it’s an absolutely fundamental and transforming change. That is not proof either, but that has to be allowed for in the investigation. The science of this is measuring the effect of the impact, not the effect of the statements.

I don’t see this study as yet another report of negative results — something to stack in a pile next to a larger pile of studies reporting positive results.

Irrespective of their results, the McCabe and Snyder study is the most careful and detailed retrospective observational analysis to date. Because of its methodological strengths, it deserves more evidential weight than a simple uncontrolled comparison study.

If my summary of their paper is inadequate, please read their manuscript yourself. I think you’ll be impressed. You can call it “pro-OA” or “anti-OA”. I call it good science.

Thanks for the pointer to this paper. Having read it, their findings don’t appear to talk specifically about open access, while the conclusion of your summary does. Can you point me to where open access is treated independently from online access, or as a distinct subset of it, in their analysis?

While I may have missed it, given that their discussion of results — and specifically ‘Individual Channels’ in section 5.2 — says nothing about open access, and the title of their paper begins with “Did Online Access…” and not “Did Open Access…”, I don’t think I did.

You are correct that McCabe and Snyder focus entirely on online access. Their manuscript needs to be read in the context of two prior papers by James Evans [1, 2], especially [2], which reported significant online and open access effects using a similar methodology. By recreating a similar dataset (and Evans’s positive effects), McCabe and Snyder then show how the results are an artifact of uncontrolled variables. They write:

The first set of results show that the same huge effects of online access found in the previous literature can be generated if fixed effects capturing the quality level of journal volumes are omitted. Once appropriate fixed effects are included, however, the aggregate result cannot be distinguished from zero. Thus much of the estimated effect of online or open access from the previous literature can be attributed to bias due to omitted quality.

[1] Evans JA. Electronic Publication and the Narrowing of Science and Scholarship. Science 2008;321(5887):395-399. doi:10.1126/science.1150473.

[2] Evans JA, Reimer J. Open Access and Global Participation in Science. Science 2009;323(5917):1025-. doi:10.1126/science.1154562.

Thanks Phil. Viewing their findings specifically in the context of the Evans papers helps me understand it better.

That said, I still don’t see how their dataset and analysis can lead to such a strong assertion about open access as this (yours, which I’ll quote since it’s shorter than theirs): “There are many benefits to the free transmission of scientific ideas. A citation advantage, however, is not one of them.”

McCabe and Snyder compared the effects of expensive, location-based access (print) to more convenient but still expensive and affiliation-based access (online) on citation patterns. Extending their conclusion to citation pattern differences between expensive, restricted online access and free, unrestricted access (open) is a big leap…

…unless you confuse or equate online with open. Which brings us back to the first passage from their paper you quoted in your blog post, which begins “If a small change in the convenience of access can cause a quadrupling of citations…”.

Regardless of whether they’re talking about open vs. restricted access or online vs. print, I can’t see how either could be characterized as a small change. But since we’re talking about whether an open access citation advantage exists, in that context McCabe and Snyder are saying that the difference between open and restricted access is small.

Which may be true, at least from their perspective: McCabe did this research at Michigan, whose library spends millions of dollars each year to provide the appearance (illusion?) of free access to journal content to its researchers. We pay for full access to JSTOR, ScienceDirect, and dozens of other journal packages as well. I’m sure Chris Snyder, via Dartmouth, has similarly robust access to the literature.

However, many researchers do not. There is no appearance of free access to the many who either pay money for articles themselves, out of pocket or with their grants, pay in time via interlibrary loan, or do without.

So while I still don’t think this paper’s analysis says much, if anything, about open access citation, the quote from their paper you led with drives home your point about there being little price sensitivity among scholars.

Thanks for your thoughtful reply. There is a lot in the McCabe and Snyder article, and they may have overstated their findings by extending them to the OA-citation claim when their analysis concerns only the online-citation claim. But as online access and open access are related (both deal with the convenience of access to the scholarly literature), I feel that their conclusions are justified and are supported by other econometric and clinical studies.

After reading the original post regarding our paper, and the subsequent comments, I thought it would be appropriate to address the issue that is generating some heat here, namely whether our results can be extrapolated to the OA environment. But before I do so, let me thank everyone for their interest in the paper; Chris and I welcome further comments.

So here goes:

1. Selection bias and other empirical modeling errors are likely to have generated overinflated estimates of the benefits of online access (whether free or paid) on journal article citations in most if not all of the recent literature.

2. There are at least two “flavors” in this literature: (a) papers that use cross-section data, i.e., a single observation for each article (see, for example, Lawrence (2001), Harnad and Brody (2004), and Gargouri et al. (2010)), and (b) papers that use panel data, i.e., multiple observations over time for each article (e.g., Evans (2008), Evans and Reimer (2009)).

3. In our paper we reproduce the results for both of these approaches and then, using panel data and a robust econometric specification (that accounts for selection bias, important secular trends in the data, etc.), we show that these results vanish.

4. Yes, we “only” test online versus print, and not OA versus online for example, but the empirical flaws in the online versus print and the OA versus online literatures are fundamentally the same: the failure to properly account for selection bias. So, using the same technique in both cases should produce similar results.

5. At least in the case of economics and business titles, it is not even possible to properly test for an independent OA effect by specifically looking at OA journals in these fields since there are almost no titles that *switched* from print/online to OA (I can think of only one such title in our sample that actually permitted backfiles to be placed in an OA repository). Indeed, almost all of the OA titles in econ/business have always been OA and so no statistically meaningful before and after comparisons can be performed.

6. One alternative, in the case of cross-section type data, is to construct field experiments in which articles are randomly assigned OA status (e.g. Davis (2008) employs this approach and reports no OA benefit).

7. Another option is to examine articles before and after they were placed in OA repositories, so that the likely selection-bias effects, important secular trends, etc. can be accounted for (or, in economics jargon, “differenced out”). Evans and Reimer attempt to do this in their 2009 paper but only meet part of the econometric challenge.
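As a sketch of the “differencing out” idea in point 7 (again with purely synthetic data and invented parameters, not taken from any of the papers cited): suppose each article is observed once before and once after some articles are deposited in a repository. A cross-section comparison after deposit is biased when better articles self-select into deposit, but taking the before/after difference cancels each article's fixed quality effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

alpha = 2.0 * rng.normal(size=n)        # article fixed effect (quality)
# Selection: better articles are more likely to be deposited.
deposited = alpha + rng.normal(size=n) > 1.0

trend = 0.5                              # secular citation trend between periods
beta = 0.0                               # true OA effect is zero here

y_before = alpha + rng.normal(size=n)
y_after = alpha + trend + beta * deposited + rng.normal(size=n)

# Cross-section comparison after deposit: inflated by selection on quality.
cross_section = y_after[deposited].mean() - y_after[~deposited].mean()

# First difference: alpha cancels, isolating trend + beta.
diff = y_after - y_before
differenced = diff[deposited].mean() - diff[~deposited].mean()

print(f"cross-section estimate: {cross_section:.2f}")
print(f"differenced estimate:   {differenced:.2f}")
```

The cross-section comparison shows a large apparent deposit effect driven entirely by selection, while the differenced estimate lands near the true effect of zero.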
