If you were able to attend the UK Parliament’s Business, Innovation and Skills Committee hearing on open access, or spent the better part of your morning watching the archived video, you’ll have noticed how economic and scientific studies were used to bolster one side’s position and to undermine another’s.
You’ll also have noticed that anyone in the hearing with a PhD was addressed as “doctor,” whether they were acting as scientist, activist, politician, or industry representative. Somehow, completing graduate school conferred lifetime authority on any person speaking on any topic.
Beyond the dramaturgy of UK Parliamentary theatre, I was intrigued by how the context of nuanced studies was stripped away to produce simple declarative findings, and how complexity in science was used to make categorical negative statements, like “there is absolutely no evidence that…”
For this post, I’m going to focus on a single study, PEER (Publishing and the Ecology of European Research), and how its findings on article downloads were retold in the first hearing.
Cameron Neylon, Director of Advocacy for PLoS, used the PEER study to argue that open repositories posed little risk to publishers (video time: 10:27:25). Indeed, he argued, archiving actually drove readership back to the publisher.
The PEER study was a large experiment in which commercial publishers deposited manuscripts of different ages into four institutional repositories. It was a randomized controlled trial that made half of the papers “visible” and the other half “hidden.” Hidden, in this case, meant unavailable for three months from one institutional repository; the reader was instead given a link to the same paper at another repository.
If you’ve read through many of the PEER reports, you’ll know that getting this study running was no small feat: the organizers ran into trouble both persuading authors to deposit their own material (the vast majority declined the invitation to self-archive) and getting materials into these systems. What went into the archive were raw author manuscripts in PDF form, embargoed for anywhere from six months to two years.
The results of the usage analysis showed that publishers did receive more article downloads from their journal websites when manuscripts were freely available from the institutional repositories. But when you look at the data tables, the story is not so clear.
The mean number of publisher downloads was 17.1 articles for the “visible” group and 15.3 for the “hidden” group, a difference of 1.8 article downloads over three months: not a big difference in practical terms, but a statistically significant one.
However, the mean number of manuscript downloads from the repositories was just 1.77 for the “visible” group and 1.86 for the “hidden” group, a difference of just one-tenth of a download over three months. I don’t really understand how the “hidden” group could register downloads at all, but if you follow the logic, this very small number of manuscript downloads corresponded to roughly ten times as many downloads at the publishers’ sites.
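The arithmetic here is easy to sanity-check. The following is just a back-of-envelope sketch using the group means quoted above, not the study’s own statistical analysis:

```python
# Group means reported in the PEER usage analysis (downloads over three months)
visible_publisher = 17.1   # publisher-site downloads, "visible" group
hidden_publisher = 15.3    # publisher-site downloads, "hidden" group
visible_repo = 1.77        # repository downloads, "visible" group
hidden_repo = 1.86         # repository downloads, "hidden" group

publisher_diff = visible_publisher - hidden_publisher   # the 1.8-download gap
repo_diff = hidden_repo - visible_repo                  # about 0.09 downloads

# Publisher traffic exceeds repository traffic by roughly an order of magnitude
ratio = visible_publisher / visible_repo

print(round(publisher_diff, 2))  # 1.8
print(round(repo_diff, 2))       # 0.09
print(round(ratio, 1))           # 9.7
```

In other words, the differences between groups are fractions of a download per article, while the publisher-to-repository traffic ratio is about ten to one.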
Now, the authors of the usage study, members of CIBER Research, suggest that the metadata deposited along with each manuscript may explain the increase in downloads at the publishers’ sites, though more research is needed to understand the process (it always is). The effect could also have been the result of indexing robots, which were not excluded from the download counts.
The complexity of the PEER study, and its surprising results, were obviously a source of concern to the authors of the analysis, who discuss its limitations in detail much later in the report. They write:
Action research in an environment as complex as the scholarly web is fraught with difficulties and caution must be applied to the findings of the study. We absolutely must not generalise from the findings here to green open access more generally since PEER has a number of characteristics that taken together make it unique. (p.20)
In other words, the study was so unusual that the authors find it impossible to draw inferences about any real-life form of archiving. The meaning of PEER floats adrift in a sea of ambiguity, unmoored from any plausible future of open access.
Is there nothing that can be learned from the PEER study, other than how to spend millions of dollars and years of publishers’ time? To me, what this study says is that access to raw final author manuscripts through institutional repositories, under embargo, poses little risk to publishers. Second, PEER tells us nothing about large-scale, systematic archiving of published, formatted articles in central subject repositories, the scenario examined in the recently published study of PubMed Central.
The reason this message is lost on readers is that the authors of the PEER study never made it in the first place, burying the context of their results deep in the report. The great weakness of the report isn’t the analysis of the data, but the authors’ failure to frame the research in a clear and coherent message. As a result, open access advocates have snatched this report out of the sea and framed it in their own terms. Or, as a high-ranking STM publisher confided to me last week, “they have taken our study and turned it against us.”
At heart, this argument over archiving has nothing to do with access and everything to do with counting. Publishers want to be recognized and rewarded for the value they add to the scholarly communication process, irrespective of their business model. Those working for PLoS want to see how PLoS articles are being used just as much as any subscription publisher does, and for this to happen, the numbers generated by repositories have to be conveyed back to publishers, authors, their funders, and their institutions.
This necessarily increases the administrative burden on repositories; but if a repository wishes to compete as a publisher, it has to start operating like one.