Are US Taxpayers the Primary Beneficiaries of the NIH Public Access Policy?

The US National Institutes of Health (NIH) began implementing the NIH Public Access Policy in 2008 as part of an effort to ensure that US taxpayers (“the public” as it relates to the US government) have access to final papers emanating from NIH-funded research. A long-standing justification for PubMed Central (PMC), and for public access policies in general, has been “taxpayers paid for the research, so they should have access to the results.” Despite the many controversial aspects of this statement, its basic tenet continues to shape governmental policy toward scientific publishing. The justification was used as recently as 2012 in the White House petition that helped move the White House Office of Science and Technology Policy (OSTP) to write its public access memorandum. The Obama Administration specifically stated in its response to this petition that:

. . . citizens deserve easy access to the results of research their tax dollars have paid for. . . . this research was funded by taxpayer dollars. Americans should have easy access to the results of research they help support.

By basing the justification for access on payment of taxes, there is an implicit quid pro quo — that is, if you paid US taxes, you have a right to research emanating from the taxes you paid. Conversely, if you did not . . .

I wrote an April Fool’s post about this logical construct last year, which in retrospect seems to confirm that within any joke there is an element of truth. Since then, some have pointed out that the RCUK OA policies create a one-way subsidy to the world on the backs of UK taxpayers. And documents recently acquired via my ongoing Freedom of Information Act (FOIA) requests show that taxpayer subsidy has been a sensitive issue within the NIH, and that it has not been settled.

The NIH correspondence begins in January 2012. The inciting email was not provided, but the emails written as the NIH, National Library of Medicine (NLM), and National Center for Biotechnology Information (NCBI) contemplated how to respond show enough to conclude it was not a friendly inquiry.

In an email dated January 18, 2012, Dennis Benson, an NIH employee, writes to David Lipman, Ed Sequeira, and Eric Sayers (all NIH employees), with a cc to Kent Smith, an omnipresent contractor. The email was sent at 12:24 a.m.

David,

I agree it’s best to avoid arcane defenses and simply admit the inescapable fact — the internet is international and usage is inevitably going to reflect the world’s internet population.

In terms of internet users, the U.S. accounts for 11.6%. So, the US is far more heavily represented (by a factor of 4) in PMC usage than its internet population would predict.

That said, the important point may be to show the incremental cost of supporting users from other countries is virtually nil. We could argue that PMC’s computing infrastructure has been designed to meet U.S. peak load. And PMC usage shows a strong non-uniform usage pattern. For the US there is a slow increase through the morning, peaking around noon to 3pm. Because of the time offset — six hours for Europe, ~12 hours for Asia — their loads (also showing a non-uniform distribution) occur when US loads are off-peak. Therefore, non-US usage is simply filling the gaps of US usage.

I would suggest trying to avoid an actual dollar figure as to what the non-US cost actually is. Once you admit to some number, you’ve made their argument, ie, taxpayer money is supporting a non-US audience. I think you’re on stronger ground with the filling in the gaps argument. If they as[k], well, what if the non-US traffic continues to grow faster than US (which it definitely is doing), the response could be that load balancing hardware could penalize non-US addresses and give preferential treatment to US users. Ugly, but possible. If forced to quantify the non-US cost, you could resort to percentages. Overall, we’ve given PMC a $4M pricetag. It could be said that the incremental cost for non-US users is less than 1% — the cost of a few servers and disk drives.

Dennis

Lipman responded at 7:04 a.m. the same day:

Totally agree — but it is worth knowing how many pmc servers we have and their cost.

Thanks.

It’s a fascinating starting point for the exchange that follows, as the response to an unseen inquiry reveals the assertion that must have been made — that a US taxpayer-funded initiative meant to benefit US taxpayers is instead benefiting non-US citizens — while also showing how much effort is made to create a palatable response.

There is also the math involved. As of late 2011 (the emails were written in January 2012, so 2011 statistics would be all they had to guide them), US traffic to PMC accounted for about 44% of the visits or views; which of these two traffic metrics they were using is not clear. We also don’t know the growth rate of rest-of-world (ROW) traffic, so we can’t project forward into 2013 with precision. But I can make an informed guess. Statistics presented in May 2013 by Mary Meeker show the number of Internet users growing at 10% in China and 26% in India, versus 3% in the US, and show that each of those countries has added millions of users while the US total has stayed essentially flat relative to the figures NIH used in the email above. Extrapolating from those figures, it seems safe to say that US traffic is now 40% or less of PMC traffic, and ROW traffic 60% or more. The numbers may actually be closer to 35/65 by mid-2013. Anyone from NIH, NLM, or NCBI is welcome to correct these speculative figures, in the spirit of transparency and accountability.
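To make the extrapolation concrete, here is a minimal Python sketch that compounds US and ROW traffic forward from the roughly 44% US share in late 2011. It assumes, as a simplification, that PMC traffic grows at the same rate as Internet users. The 3% US rate comes from the Meeker figures above; the ROW growth rates are hypothetical placeholders, since the actual ROW growth in PMC traffic is unknown. The last line checks the email’s “factor of 4” against the 11.6% US share of Internet users.

```python
def us_share(base_us_share: float, us_growth: float, row_growth: float, years: int) -> float:
    """Project the US share of traffic after `years` of compound growth in each segment."""
    us = base_us_share * (1 + us_growth) ** years
    row = (1 - base_us_share) * (1 + row_growth) ** years
    return us / (us + row)

BASE_US_SHARE = 0.44  # approximate US share of PMC traffic, late 2011
US_GROWTH = 0.03      # US Internet-user growth cited above
YEARS = 2             # late 2011 to late 2013

# Hypothetical ROW growth rates; the real figure for PMC traffic is not known.
for row_growth in (0.10, 0.15, 0.20):
    share = us_share(BASE_US_SHARE, US_GROWTH, row_growth, YEARS)
    print(f"ROW growth {row_growth:.0%}: projected US share = {share:.1%}")

# The email's "factor of 4": US share of PMC usage vs. US share of Internet users (11.6%).
print(f"US over-representation: {BASE_US_SHARE / 0.116:.1f}x")
```

With these hypothetical ROW rates of 10% to 20%, the projected US share comes out at roughly 37% to 41%, in line with the “40% or less” guess; faster ROW growth pushes it toward the 35/65 split.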

This phenomenon was also noted in a memo stamped December 16, 2011 (just before the email exchange described above). The memo is from Francis Collins, Director of the NIH, to Congressman Joseph R. Pitts, Chairman of the House Subcommittee on Health. It’s a nine-page memo covering various concerns raised about the NIH Public Access Policy and PMC. Each “question” is not actually a question, but a statement or an assertion. The points most salient to these issues come from so-called Question 3, which asserts, “The Public Access Policy promotes American competitiveness.” The arguments are as follows:

PMC makes papers funded by many sources freely available
It protects against piracy in the usual ways, so international availability shouldn’t be a problem for publishers
As the NIH budget becomes smaller relative to other countries’ research budgets, we need free information ourselves, so ultimately hosting free articles from other countries helps everyone, including the US

In essence, through PMC, US taxpayers are making available more than US government-funded research. Opponents could argue that this means US taxpayers are supporting a larger infrastructure than they probably should. Proponents could argue that this is probably worth it, because in exchange we get more free information supported by taxpayers or institutions in other countries.

So how much of the content of PMC is actually papers emanating from NIH-funded research? For that answer, we return to the same email string we left above. It’s still January 18, 2012, but it’s now 8:31 a.m., and Ed Sequeira is writing in response to an exhortation from David Lipman that “[w]e really do need a rough estimate of the # of non-us papers in pmc.”

The question here is: how do we define non-US papers?

The number of available (not embargoed) papers in PMC increased by 206,000 in 2010 and 226,000 in 2011 (but this is not necessarily by pub date).

In contrast, the NIH-funded papers that got into PMC — by pub date, and including embargoed articles — is 74,000 for 2010 and 69,000 for 2011, so far.

If we ignore [the] fact that we’re measuring somewhat different things, in the best case (2010) the NIH content in PMC is 40% (74/206).

We could look more strictly at everything in PMC published in 2010 (to eliminate most embargoes) broken down by publisher and try to classify publishers [as] US vs non-US. We’ll probably have to make the US/non-US split manually, but I’ll try to get the data. I suspect in this case we’ll have a much higher proportion of US content.

The discussion shifts then from NIH-funded to US-authored papers, as this seems to yield a more palatable statistic. By 3:35 p.m., Lipman is presenting a “buy one, get one free” argument:

so from this minimally we’re saying on the order of 1/3 are non-US — so on the order of 60,000 articles. So while the US taxpayer is paying for 74000 papers in PMC (the NIH fraction) they are getting from the rest of the world — 60,000 papers. And for free almost double the NIH papers.

That’s a good deal!

Sequeira responds:

Non-US is 30% of 175,000 = 52,500. Still a good deal, the way you’re looking at it.

So, by Sequeira’s best-case estimate, the NIH portion of PMC is about 40%. The US-based portion of PMC is that NIH 40% plus another 30% from non-NIH-funded, US-based authors. On the “taxpayer-funded research” question, however, which the US/non-US framing elides, taxpayer-funded research makes up only a minority share of PMC. Is that a good deal?
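The shares traded back and forth in these emails can be checked with simple arithmetic. The sketch below uses only figures quoted in the emails; the variable names are illustrative. Applying Lipman’s “1/3” to the same 175,000 base is an assumption, since the emails do not say which base he used, nor how the 175,000 figure relates to the 206,000 figure. Note that 74/206 works out to about 36%, which the email rounds up to 40% as a best case.

```python
# Figures from the January 18, 2012 emails (variable names are illustrative).
NEW_PMC_PAPERS_2010 = 206_000  # available (non-embargoed) papers added to PMC in 2010
NIH_PAPERS_2010 = 74_000       # NIH-funded papers in PMC for 2010, by pub date, incl. embargoed
PAPER_BASE = 175_000           # base to which Sequeira applies the 30% non-US figure
NON_US_FRACTION = 0.30         # Sequeira's non-US estimate
NIH_FRACTION_ROUNDED = 0.40    # the email's rounded "best case" NIH share

nih_share = NIH_PAPERS_2010 / NEW_PMC_PAPERS_2010
non_us_papers = round(NON_US_FRACTION * PAPER_BASE)  # Sequeira's figure
non_us_third = round(PAPER_BASE / 3)                  # Lipman's "1/3" (assumes the same base)
us_based_share = 1 - NON_US_FRACTION
non_nih_us_share = us_based_share - NIH_FRACTION_ROUNDED

print(f"NIH share, 74/206: {nih_share:.1%}")
print(f"Non-US papers at 30%: {non_us_papers:,}")
print(f"Non-US papers at 1/3: {non_us_third:,}")
print(f"US-based share: {us_based_share:.0%}, of which non-NIH-funded: {non_nih_us_share:.0%}")
```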

The economic benefits of getting access to 60,000 articles are unclear. They certainly are not general benefits in any sense of the term. As noted elsewhere, the reading levels necessary to glean useful information from scientific papers have been achieved by only a small percentage of the US population, and domain-specific knowledge and context awareness present other barriers. But even the clock presents problems. Given 365 days in a year, each US taxpayer would have to read between 144 and 164 papers every day to derive direct value from the papers made available in this way, that is, to get the “good deal.” The emphasis on quantity and bulk collections of articles continues a theme we see elsewhere in Gold OA publishing, where publishing more articles is how you make more money.
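The per-day reading load follows from dividing the two non-US estimates in the email exchange (Sequeira’s 52,500 and Lipman’s roughly 60,000 articles) by 365 days. This minimal sketch just makes that division explicit; the function name is hypothetical.

```python
DAYS_PER_YEAR = 365

def papers_per_day(papers: int, days: int = DAYS_PER_YEAR) -> float:
    """Papers one reader would need to read each day to get through `papers` in `days` days."""
    return papers / days

# The two non-US estimates from the email exchange.
for papers in (52_500, 60_000):
    print(f"{papers:,} papers / {DAYS_PER_YEAR} days = {papers_per_day(papers):.1f} per day")
```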

What this seems more likely to be feeding is the Matthew Effect, where free papers give scientific and educated elites more access at no cost while leaving those who cannot read at a college level or who have inadequate science training in the dark. Where those elites live — the US, China, India, Iran, Mexico, Brazil — is another question.

There is also a question about where those 60,000 articles are coming from. Are they coming from sources that US scientists would respect and find interesting? Or are they papers with findings that are uninteresting and outdated to scientists in one of the most advanced research societies on Earth?

There is discord between the messaging from Collins to Congress in 2011, in which these papers from the ROW are an unvarnished benefit, and the emails among Lipman, Sequeira, and Benson, in which there is concern about spending US taxpayers’ funds on non-US papers and providing free access to US papers to non-US citizens. This suggests that some questions of purpose, scope, and intent regarding the NIH Public Access Policy and PMC are not entirely settled.

Related emails dissect technology costs to arrive at the conclusion that the marginal cost of providing access to non-US users is anywhere from zero to less than $100,000 per year. There is a palpable nervousness to these emails, which gradually dissipates as ways are found to plausibly lower the estimates. It’s also worth noting that since these emails were written, the US seems to have exhausted its willingness to subsidize public access, as the OSTP memorandum provides no additional funding.

What is the risk of tying access to taxpayer status? As long as the dollars are relatively small and the consequent effects on the publishing industry non-obvious, the question may not require an answer. But if PMC is competing with publishers worldwide, begins costing US taxpayers more than a few million dollars a year, and becomes a source of international controversy for some unforeseen reason, that question may require a straight answer.

With NIH-funded papers providing the minority of the content on display at PMC, and with US taxpayers providing a decreasing share of the traffic, such an answer could prove difficult for public access advocates wielding taxpayer advantage arguments.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

10 Thoughts on "Are US Taxpayers the Primary Beneficiaries of the NIH Public Access Policy?"

As noted elsewhere (http://scholarlykitchen.sspnet.org/2013/07/16/the-price-of-posting-pubmed-central-spends-most-of-its-budget-handling-author-manuscripts/) the majority of PMC’s costs seem to come from converting manuscripts to XML. Is there any indication of the quantity of non-NIH-funded manuscripts that they’re providing this service for? That might get more at the real costs involved here.

By David Crotty
Sep 5, 2013, 8:24 AM

This is a good question. Unfortunately, I don’t have any information that points to a clear answer. Perhaps our friends who are paid by us taxpayers (the NLM/NIH employees) could provide an answer?

By Kent Anderson
Sep 5, 2013, 9:54 AM

It’s my understanding that the NIH will only convert NIH-funded manuscripts. When there are non-NIH-funded manuscripts in PMC, it is most likely because a journal has chosen to be archived in PMC, and thus all its content is there, NIH-funded or not. In those cases, the journal/publisher does the conversion to XML, not the NIH.

By Sandy DG
Sep 6, 2013, 3:23 PM

Very interesting, thanks. How do you think GenBank relates to this discussion and does the provision of equivalent resources (ENA, DDBJ) by other blocs affect that?

By Chris Taylor (@chrisftaylor)
Sep 5, 2013, 9:23 AM

If you want to go down this route, keep in mind that government funding agencies around the world could do the same, and we would all end up losing. They access our research, we access their research, and everyone is better off.

By David Solomon
Sep 5, 2013, 11:11 AM

Yes, it is important to keep in mind that others could go down this path. Some British taxpayers have already made this point about the RCUK mandates, which now look more out of step since the OSTP public access memorandum, which does not provide any more taxpayer support.

The scenario described in your last sentence can also exist without PMC. It would just have a different economic basis, one not so dependent on national tax systems.

By Kent Anderson
Sep 5, 2013, 11:17 AM

This statement stood out to me: “It protects against piracy in the usual ways, so international availability shouldn’t be a problem for publishers.”

In fact, PMC is a great source for plagiarists. I frequently find plagiarism in predatory journals, and the copied text often is lifted right out of PMC. So, I think that piracy is indeed a problem for (legitimate) publishers.

By Jeffrey Beall
Sep 5, 2013, 11:49 AM

The statements about this in the memorandum I received were more about piracy defined as mass downloading. I knew it was a bit of hand-waving when I saw it, as it usually is. A smart pirate downloads one copy onto another server, then goes wild. And “smart plagiarist” is an oxymoron.

By Kent Anderson
Sep 5, 2013, 12:03 PM

Pirate downloads are a hidden complication in all of this. Without the XML formats available, all we do is download 25 articles in search of a good study, which could still lack open access anyway. We also had a question from one editor about whether we, as publisher, could make a specific journal open access only to Indian scholars and paid for the rest of the world. But is that practical?

By Arun, Publisher India
Sep 9, 2013, 3:19 AM

The Scholarly Kitchen
