A recent study, released as a preprint, offers a muddled bag of methodological choices and compromises, but it presents several surprising data points: voluntary publisher efforts may be providing broader access to the literature than Gold or Green open access (OA), and claims of an open access citation advantage show some confounding shifts.
“The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles” has not yet gone through peer review and, like all preprints, must be viewed with some skepticism. It suffers from a variety of issues that will hopefully be addressed through a rigorous peer review process, should the article be submitted for formal publication.
As written, the paper veers too far into advocacy. One of the key jobs of a journal editor is to insist that authors stick to the data, rather than voicing opinions and letting their political and social views, or even their theoretical suppositions, color the factual reporting of research results. Here the authors make suggestions to libraries about how they should allocate future budgets, material better suited to an article labeled as an “editorial.”
The preprint contains scant information on its statistical methods. It is unclear whether the authors controlled for article properties such as age, paper type, and journal type. Some of the biggest problems come from the authors’ characterizations of papers, in particular the definition of “open access.” A significant portion of the paper’s introduction discusses the many variations of, and confusions around, the term “open access.” The authors then create a novel definition of OA that does not follow that of the more commonly cited Budapest Open Access Initiative (BOAI), instead declaring everything that can be read for free, regardless of reuse terms, to be OA.
But not quite everything that can be read for free: papers found on academic social networks like Mendeley and ResearchGate are excluded from the study over concerns about ethics (many of the posted papers violate copyright) and reliability, as availability of papers on these networks may be fleeting. This seems an odd distinction, given that many of the papers the authors do include could be similarly characterized (do all repositories check every deposit for copyright compliance? Availability of articles made free on publisher sites for promotional purposes may be similarly fleeting). I suspect the real reason for leaving out scholarly collaboration networks is a more practical one: due to their proprietary and closed nature, the authors had no good means of indexing their content.
Further, although many articles fall into multiple categories, the authors assign each to a single category. Many journals, for example, provide free access to the articles they deposit in PubMed Central. In this study, such a paper counts only as the version in the journal; all other copies in repositories are ignored. A paper in a Gold OA journal may be deposited in hundreds of repositories, but those deposits are not counted in this analysis. This makes it difficult to attribute any effects to any particular factor, because you don’t know which version is responsible (or even whether multiple versions are playing a role). A better approach would have been to treat availability under each designation as an indicator variable, so that the effect of each designation on a paper’s performance could be measured independently.
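To illustrate the indicator-variable approach, here is a minimal sketch using simulated data. Everything here is invented for illustration (the per-route effect sizes, the noise level, and the use of log citations); it is not drawn from the study. The point is simply that when each availability route gets its own 0/1 column, a paper available through several routes contributes to the estimate for each of them, rather than being forced into one exclusive bucket:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated availability flags; note a paper can be Gold AND Green, etc.
gold = rng.integers(0, 2, n)
green = rng.integers(0, 2, n)
hybrid = rng.integers(0, 2, n)
bronze = rng.integers(0, 2, n)

# Simulated log-citation counts with assumed per-route effects plus noise.
true_effects = {"gold": -0.1, "green": 0.3, "hybrid": 0.2, "bronze": 0.1}
log_citations = (
    1.0
    + true_effects["gold"] * gold
    + true_effects["green"] * green
    + true_effects["hybrid"] * hybrid
    + true_effects["bronze"] * bronze
    + rng.normal(0, 0.2, n)
)

# Design matrix: intercept plus one indicator column per OA route.
X = np.column_stack([np.ones(n), gold, green, hybrid, bronze])
coef, *_ = np.linalg.lstsq(X, log_citations, rcond=None)

# coef[1:] estimates each route's independent contribution to citations,
# even for papers available through several routes simultaneously.
print(dict(zip(["intercept", "gold", "green", "hybrid", "bronze"], coef.round(2))))
```

With exclusive categories, by contrast, the coefficient on (say) “Green” mixes together whatever other routes those papers also happened to be available through, which is exactly the attribution problem described above.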
The authors invent a new flavor of OA, “Bronze,” which they describe as, “articles made free-to-read on the publisher website, without an explicit open license.” I’m not sure this helps do much other than add yet another confusing term to the pile. For years we have referred to freely available copies without getting into any particular license requirements as “public access”, and the US government, in their policy statement, uses this term to avoid such confusion.
With all those caveats in mind, there are two data points that caught my eye. We know that in recent years, many publishers have de-emphasized archive sales in favor of focusing on current subscriptions and Gold OA. Many mission-driven publishers have also increased their efforts to make more papers freely available, driving access as broadly as possible, hopefully without harming their subscription base. And so older papers are frequently made publicly accessible.
The extent of these efforts becomes evident in the data collected.
Bronze OA (that is, papers voluntarily made publicly accessible by publishers) outnumbers every other category of OA:
Bronze is the most common OA subtype in all the samples, which is particularly interesting given that few studies have highlighted its role.
Several publishers, the study points out, are notable in that more than 50% of the articles they publish are freely available. This includes the Nature Publishing group (inexplicably separated out from Springer Nature), IOP Publishing, the American Physical Society, and Oxford University Press (full disclosure, my employer).
This has largely gone unnoticed, or at least uncommented upon in policy and advocacy circles. Publishers are responding to research community and funder desires, and are voluntarily opening up access to enormous swathes of the literature. This is all being done at no cost to the research community, and perhaps deserves greater attention.
The other interesting point seen in the data concerns the alleged Open Access Citation Advantage (OACA). At this point, the argument is really moot. Given the existence of Sci-Hub, every single article published is freely available. There is no longer a corpus of articles that one can claim is strictly behind a paywall, which essentially abolishes the control group and removes much of the relevance from the question being asked.
That said, this study again presents unexpected results. Of the OA classifications used in this study, citation performance ranked (highest to lowest) as follows: Green, Hybrid, Bronze, Closed Access, and last Gold, which surprisingly shows a citation disadvantage. It would be easy to read far too much into these results, and new confounding factors muddy the picture even further. Here the refusal to count articles that fall into more than one category makes it impossible to attribute results to any particular factor. Many Hybrid, Bronze, and Gold articles are also Green articles, yet these are excluded from the Green measurement offered.
Green OA articles (those found deposited in repositories) presumably favor authors with funding and subsequent deposit requirements, as well as authors who have or intend to remain in research over the long term. Much deposit in repositories is driven by mandates, and those not building a long term career in research are less likely to respond to pressure to follow those mandates. Authors may also be more willing to deposit their stronger papers in public repositories than their lesser works. Whether these factors are playing any role in the papers’ citation performance cannot be ruled out.
Hybrid journals have been argued, by some measures, to offer higher quality publications than fully-OA journals, and this study suggests they are a more effective means of driving future research (at least as measured by citations), despite efforts to de-fund the use of hybrid journals or to penalize authors for choosing this path. Bronze performance is not surprising: many journals make highlighted articles freely available, opening up access to carefully selected papers for marketing purposes. Again, since both of these categories include Green OA papers also found in repositories, the causation behind their citation performance remains unknowable.
But what to make of Gold OA coming in last, even behind articles that remain accessible only to subscribers? Clearly the notion that one can simply buy citations by paying an OA fee can now be dismissed as untrue. The authors suggest that the Gold category is becoming watered down by low quality publications:
Interestingly, Gold articles are actually cited less, likely due to an increase in the number of smaller, less prestigious OA journals (Archambault et al., 2013), as well as continued growth of so-called ‘mega journals’ such as PLOS ONE (PLOS, n.d.).
So while I’m hesitant to put much weight behind the paper’s conclusions, a few of the data points noted warrant further investigation. Too much goalpost shifting has gone on in this paper, whether due to experimental convenience or a desire to reach certain conclusions, to rely solidly on what’s being offered here.
But if these data points hold up, particularly the state of voluntary efforts by publishers to broaden access to the literature, it may be time for a new narrative to emerge, one that talks about the collective efforts of all parties rather than pointing fingers at enemies or villains.