When journal articles are made freely available from PubMed Central (PMC), a government-run digital archive of biomedical articles, readership counts drop at journal websites, a new study reports. The decline in readership affects both the full-text and the final PDF versions of the article and appears to be growing over time.

The article, “Public Accessibility of Biomedical Articles from PubMed Central Reduces Journal Readership,” was published online in The FASEB Journal on April 3, 2013.

Jupiter and its four largest moons.

In this study, I was able to expand upon a previous analysis of physiology articles to include over 13,000 articles published in 14 society-run biomedical journals in nutrition, experimental biology, physiology, and radiology.

The study compared the performance (as measured by full-text and PDF downloads) of articles deposited in PMC and made freely available 12 months after publication with that of articles that remained accessible only from the journal site. All journals included in the study also provide free access to their articles 12 months after publication, so access status was not a factor in the study: I was comparing free journal access with free journal access plus free PMC access.

Controlling for differences in their performance within the first 12 months of publication, when all articles were accessible only by subscription, those articles made freely available from PMC after the 12-month embargo experienced a 21% reduction in full-text HTML downloads in their second year of publication. Estimates for the reduction in full-text article downloads were as high as 26% for some journals.

And while the journals only deposited the full-text (XML) into the archive and not the final PDFs, free accessibility from PMC also reduced PDF downloads from the journal websites by 14%, on average.
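
To make the size of the two average estimates above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the effect is read as a multiplicative change in downloads (the form it would take in a model of log-transformed downloads); the post does not spell out the model. The function names and the 200-download baseline are hypothetical.

```python
import math


def pct_change_to_log_coef(pct_change: float) -> float:
    """Convert a percentage change (e.g. -21.0) to a coefficient on the log scale."""
    return math.log(1.0 + pct_change / 100.0)


def apply_pct_change(expected_downloads: float, pct_change: float) -> float:
    """Scale an expected download count by a percentage change."""
    return expected_downloads * (1.0 + pct_change / 100.0)


# Average estimates reported in the post.
HTML_EFFECT_PCT = -21.0  # full-text HTML downloads
PDF_EFFECT_PCT = -14.0   # PDF downloads

# Hypothetical article that would otherwise draw 200 downloads of each type
# from the journal site in its second year (not a figure from the study).
hypothetical_baseline = 200.0

html_coef = pct_change_to_log_coef(HTML_EFFECT_PCT)
html_with_pmc = apply_pct_change(hypothetical_baseline, HTML_EFFECT_PCT)
pdf_with_pmc = apply_pct_change(hypothetical_baseline, PDF_EFFECT_PCT)

print(f"Log-scale coefficient for a 21% drop: {html_coef:.3f}")
print(f"HTML downloads with a free PMC copy: {html_with_pmc:.0f}")
print(f"PDF downloads with a free PMC copy: {pdf_with_pmc:.0f}")
```

Under these assumptions, the hypothetical article would lose roughly 42 HTML downloads and 28 PDF downloads at the journal site in its second year.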

Journal estimates (±95% C.I.) measuring the effect of article availability from PubMed Central on (a) full text (HTML) downloads and (b) PDF downloads from the journal website.

Clearly, PMC is having an effect on journal-side traffic even when the publisher attempts to give their content away for free. The study also suggests that PMC’s “printer-friendly” PDF rendering of a full-text article may provide a viable substitute for the publisher’s PDF for many readers of the scientific literature.

If you use PubMed (the index, not the archive), you’ll notice that its search results give preferential visibility to the PMC copy over a link to the journal website even if the latter is available free to readers. Keeping readers within the PubMed literature domain appears to be a design strategy of the PubMed universe.

PMC is now the largest repository of free biomedical literature, populated, in large part, by articles deposited by publishers on behalf of their authors. It is not surprising that as PMC gets larger, its gravitational effect over the journals (moons) that populate it is only getting stronger. The effect of PMC on articles deposited in 2012 is much greater than on articles deposited in 2009.

While PMC may be providing complementary access to readers traditionally underserved by scientific journals, the loss of article readership from the journal website may weaken the ability of the journal to build communities of interest around research papers, impede the communication of news and events to scientific society members and journal readers, and reduce the perceived value of the journal to institutional subscribers.

Some readers may argue that none of this matters for fully open access journals, but I think they are missing the point of the study, which is about the organizing powers of journals, editors, and learned societies, and their abilities to build communities of discourse around scientific findings. If this were just about finding the most efficient mechanism to reproduce and distribute information, we already invented that decades ago. It’s called the Internet.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/

Discussion

20 Thoughts on "PubMed Central Reduces Publisher Traffic, Study Shows"

An important point you don’t mention here is the impact this has on advertising revenue. Less traffic means fewer ads served and a smaller audience to see those ads and hence less revenue. That’s money coming in from commercial sources, outside of the research community, which can cover costs and relieve at least some of the financial burden placed on the researcher and institute, either via APC or subscription. It also brings much needed money into societies to fund programs on behalf of researchers. So for some journals, having a redundant free copy of articles in PMC is actually costing the research community.

I explored this and other financial implications earlier, after Phil had published a preliminary data set hinting at this effect. My estimate at the time was that even OA journals like those from PLoS are losing $70K worth of page impressions, assuming a consistent sell-through rate. Every online publisher depends on traffic for numerous commercialization approaches: site licenses, advertising, brand (which ultimately bears on APCs), etc.

This is a study with implications worthy of sober reflection.

I do not think PMC was created to be in competition with content creators. If that premise is correct, then, after the 12-month embargo, perhaps an article should be in the repository but not accessible except through an abstract and a pass-through to the creator of the content.

PubMed used to offer exactly this feature: it was called Publink and linked a searchable full-text dark archive at NLM back to the publisher’s version of an article. NLM have never explained why they abandoned this, but if as you say PMC was not intended to compete with publishers, then why did they do so…?

Other actions by PMC have shown they are competing, including the infamous search interface which puts the PubMed Central version of an article in the results list, but relegates the publisher’s version one click deeper in (against the abstract). That was an intentional design, and probably contributes to attrition from publishers’ sites while driving traffic to PMC.

Let’s suppose, for the sake of argument, that Harvey’s contention is correct — that PMC’s creators never intended it to compete with content creators. Does that actually matter? The marketplace is not impacted by intentions, but by actions. The consequences of those actions are what matter, whether intended or not.

In fact, if they never intended it, they may be more worried about unintended consequences, and therefore more responsive to data like these.

The articles are being read, then shared through online communities such as ResearchGate. The communities of interest are as strong as ever; only now they are not bounded by specific publications or publishers but can span all of them.

Google accomplished the same thing, while driving traffic to publishers, not taking it away. There is a fundamental trade-off here — siphoning off traffic, which online publishers need to survive. Mendeley, ResearchGate, PMC — it’s starting to look like we’re headed toward death by a thousand cuts.

I think that’s an important question in the interlinked, internet age: Is it necessary to have the paper hosted and read at the same location as the community discussion or bookmarking of that paper? Is there a functional difference between linking to a version in the journal and linking to a locally hosted version? Is it better for the research community to have the revenue driven by that traffic going to the private owners of a company like Mendeley, or staying in the research community and going to societies and university presses?

And what about libraries? If users at the libraries are going to the PubMed version, then that is reducing traffic to the purchased version and therefore will negatively impact usage stats for the content and increase library cancellations.

Interesting article, but there seems to be one statistic missing: the total accesses (journal web site plus PMC) for the articles with NIH funding. Does placing a copy in PMC increase overall access or just shift some from the journal site to PMC?

If a publisher has a real problem with this, they are under no obligation to upload the published version to PMC. They can leave the onus on the author to deposit the submitted version. I doubt there are any data but it would be interesting to see what if any impact access to the submitted version on PMC had on the journal’s traffic to the published version.

Thanks David. PubMed Central and HighWire count article downloads differently: HighWire follows COUNTER standards for counting and reporting article downloads and weeds out robot and systematic bulk downloads; PMC does not. So adding the downloads together would not help. Ideally, IP-level data would allow me to estimate the reader communities for each site and see whether they overlap and, if so, by how much. However, the figures and tables from my study clearly show that the two sites are competing for the same reader population. In the case where the publisher is providing free access to journal articles within the embargo period, PMC could put the deposited article in a dark archive and point the reader back to the journal. PMC would have the assurance that the article was archived and could use it for indexing purposes, but would send the reader to the journal that published the article.
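
The IP-level overlap estimate mentioned in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the returned field names, and the sample addresses (drawn from reserved documentation ranges) are all hypothetical, and no such data set is reported in the study. It also treats one IP address as one reader, which is a simplification.

```python
def reader_overlap(journal_ips, pmc_ips):
    """Summarize how much two sets of reader IP addresses overlap.

    Treats each distinct IP address as one reader, which is a simplifying
    assumption (shared proxies and dynamic addresses would blur this).
    """
    journal = set(journal_ips)
    pmc = set(pmc_ips)
    shared = journal & pmc
    union = journal | pmc
    return {
        "shared_readers": len(shared),
        # Jaccard index: shared readers as a fraction of all distinct readers.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of PMC readers who also visited the journal site.
        "pmc_readers_also_on_journal": len(shared) / len(pmc) if pmc else 0.0,
    }


# Hypothetical sample logs (addresses from reserved documentation ranges).
journal_log = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.2"]
pmc_log = ["192.0.2.2", "192.0.2.3", "198.51.100.7"]

overlap = reader_overlap(journal_log, pmc_log)
print(overlap)
```

A high overlap would suggest the two sites serve the same readers; a low overlap would suggest PMC reaches a largely separate audience.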

What was the most popular article in 2011 or 2012? How many downloads did that article receive? I have an impression, which could be wrongheaded, that the number of folks actually reading articles is rather small. If that is so, then each reader who does not go to the originator’s site to read an article has a great impact on the creator.

The median performance of articles (full-text HTML downloads) in the dataset in their first year of publication was 178 (interquartile range [IQR] 112 to 275; max 17,264). In their second year of publication, it was 247 (IQR 166 to 378; max 12,790). In general, these articles get some good use. Figure 1 in my article plots their performance by month. You’ll also note from the regression output tables (2 and 3) that PMC exerts a greater effect on highly performing articles, as measured by their baseline performance in year 1.

I hate to say it but actions do speak louder than words. That being the case in reference to PMC, then why isn’t the S&T publishing community speaking up? Is it because the audience is so small for the vast number of articles that it just isn’t worth the bother? Or is it because the audience is so small that the publishers don’t want to draw attention to that fact?

While I agree that the numbers indicate a problem for publishers, scientific organisations and their traditional revenue, it is not really clear whether this is a problem for scientific community building. The download numbers are a poor proxy for that, as they do not reflect community interactions or the influence of certain hubs.
How are the mentioned functions of a journal beyond content delivery affected? A simple approximation would be to measure how access to other parts of the publisher website, apart from downloading, was affected.
How have the perceived value of a journal or scientific organisation and the (alt)metrics changed?
Has the social graph of the researchers (collaboration/citation/co-authorship) changed?
Which channels do researchers actually use to stay abreast of the discussion in their field and to get information on community events?
