Editor’s Note: It’s been a busy month for US federal funding agencies, as many have announced their OSTP-required public access plans. While you can expect a post summarizing the plans in the near future, one thing that I think has surprised many is how opaque the process is from the outside. It’s been over two years since the initial White House memo and most of the agencies made few (and in many cases no) public statements about their plans until they were finally announced. An enormous amount of negotiation, planning and balancing of competing agendas went into each plan, but even those involved in what went on behind the scenes are not at liberty to disclose any of it.
Given the vagueness of most of the plans, it is assumed that the agencies have more in mind than what they’re saying, but what that “more” will turn out to be is unknown. It’s worth noting that we know very little about how the longest-standing public access mechanism, PubMed Central (PMC), works. Back in 2013, our former blogger Kent Anderson filed Freedom of Information Act (FOIA) requests with the National Library of Medicine to get a glimpse behind the curtain. Here we revisit one of the more revelatory posts Kent wrote about the results of those FOIA requests, one of the rare pieces of public information available on what PMC costs and where those millions of taxpayer dollars are spent.
Agencies that have chosen PMC as their initial route to public access will fall under the same system described below, particularly since, unlike the NIH, they’ll be relying primarily on author deposits of manuscripts rather than having them automatically deposited by publishers.
Original post by Kent Anderson, July 16, 2013:
PubMed Central (PMC) costs US taxpayers about $4.45 million per year to run, according to documents recently obtained through an ongoing Freedom of Information Act (FOIA) request.
Surprisingly, most of the money is spent converting author manuscripts into online publications.
Over the past decade, speculation has been the best anyone could attempt, owing to a consistent lack of responses to budget information requests made to PMC staff and leadership. These new FOIA-obtained communications represent the first time we’ve seen actual figures about PMC’s expenditures. Judging from emails and spreadsheets recently obtained, PMC may have been preparing to reveal its expenditure level, but might also have been looking to low-ball the figure by 10-12%.
Not surprisingly, the bulk of the PMC budget is devoted to outside contractors — this has long been believed to be the case. Of the $4.45 million budget, it appears PMC spends between $3.5 million and $4 million on outside contractors — these figures are a little hard to nail down.
As stated earlier, most of the money spent by PMC ($2.7 million of the entire $4.45 million budget) is spent converting author manuscripts into XML and providing QA for these. Put another way, the deposit of author manuscripts as a source of open access (OA) content costs US taxpayers an additional $2.7 million per year.
It is clear from the enormous effort and expense PMC puts into conversion and editing that author-deposited manuscripts are not adequate on their own.
These author manuscripts (53,818 deposited in 2012, based on parameter searching on the PMC site) accounted for less than 20% of the materials posted to PMC that year (272,409 articles found via search), yet consumed 60% of the expenses. And with a recent push for more compliance, this amount seems poised to double.
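The proportions above can be checked with a few lines of arithmetic. This is only a rough sketch using the figures quoted in this post; the variable names are illustrative, and the per-manuscript figure is simply the $2.7 million total divided by the 2012 deposit count, not a number taken from the FOIA documents.

```python
# Figures quoted in this post for 2012 (variable names are illustrative).
author_manuscripts = 53_818    # author manuscripts deposited in 2012
total_articles = 272_409       # all articles posted to PMC in 2012
manuscript_cost = 2_700_000    # tagging and QA incl. overhead, USD (approximate)
total_budget = 4_450_000       # overall PMC budget, USD (approximate)

# Share of PMC content that came in as author manuscripts.
share_of_content = author_manuscripts / total_articles

# Share of the budget spent converting and checking those manuscripts.
share_of_budget = manuscript_cost / total_budget

# Derived estimate: average spend per deposited manuscript.
cost_per_manuscript = manuscript_cost / author_manuscripts

print(f"Share of content: {share_of_content:.1%}")
print(f"Share of budget:  {share_of_budget:.1%}")
print(f"Cost per manuscript: ${cost_per_manuscript:.2f}")
```

On these inputs, author manuscripts come out just under 20% of the content and about 60% of the budget, or roughly $50 apiece.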
In an email dated February 16, 2012, between Ed Sequeira and Kent Smith, these expenses were being pored over, potentially as part of preparations to finally announce PMC’s expenses. In a document labeled “FY 2012 ANNUAL COSTS FOR PUBLIC ACCESS,” the expenses were broken out to some extent. Instead of trying to reproduce the table, I’ve scanned the sheet in, and you can view it here. Essentially, it shows that personnel costs (some Federal employees but mostly contractors) consume $1.2 million of the budget. Manuscript tagging and QA consumes $2.25 million, with about $500,000 additional expense coming from overheads. In total, manuscript tagging and QA is given a bottom line figure of just over $2.7 million of the $4.45 million budget.
Smith is Kent A. Smith, a former Deputy Director of the NLM, who in 2003 started KAS Enterprises, LLC, and then departed NLM and NIH after 35 years to run KAS Enterprises in 2004. “KAS” apparently comes from his initials. According to Sequeira, Smith works as a part-time consultant to the National Center for Biotechnology Information (NCBI).
In the email from February 16, 2012, Smith attached the document I’ve scanned to the following message:
The “John” above is John Mullican, a program analyst at NCBI.
Sequeira responded a couple of hours later, apparently after giving the calculator a little exercise (NIHMS is the NIH Manuscript Submission system; NIHPA is the NIH Public Access policy):
I confirmed via email that PTFS/DCL only deal with author manuscripts. As Sequeira wrote in his email reply to some questions I asked:
NCBI’s contracts with PTFS and DCL are for XML tagging and QA of the author manuscripts deposited in the NIHMS (NIH manuscript submission) system under the NIH public access policy. They’re not involved in QA of the XML deposited in PMC by journals with participation agreements.
It wasn’t supposed to be this way, as indicated by a budget spreadsheet from 2009. In that spreadsheet, the cost of article tagging and QA in 2009 was pegged at between $1.5 million and $2.6 million, in a low, middle, and high set of budget scenarios (it seems to have tended toward the high scenario). In the planning for 2010-2013, these costs were supposed to fall from $2.3 million in 2010 to $997,500 in 2013. However, as shown above, these cost control plans did not come to fruition.
In fact, PMC may be about to find its expenses exploding, if a recent Nature News article is correct. The NIH’s stricter enforcement of author deposit rules has apparently increased the number of author manuscripts on deposit from what Richard Van Noorden estimates to be 5,100 per month (these emails show that it’s more like 4,800 per month) to about 10,000 per month. At $47 per article for tagging and QA, that doubles the largest part of PMC’s budget, and will cause it to balloon from $2.7 million to $5.6 million. PTFS and DCL will be thrilled, but PMC’s budget will then be nearly all devoted to managing these manuscripts.
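The projection is simple multiplication, sketched below under one loud assumption: a flat $47 per manuscript, with no volume discounts and no fixed overhead. The function name and structure are mine, for illustration only; the monthly volumes and the per-article rate are the ones quoted above.

```python
COST_PER_MANUSCRIPT = 47  # USD for tagging and QA, as quoted in this post


def annual_cost(manuscripts_per_month, cost_each=COST_PER_MANUSCRIPT):
    """Yearly tagging-and-QA spend, assuming a flat per-manuscript rate."""
    return manuscripts_per_month * 12 * cost_each


current = annual_cost(4_800)     # volume suggested by the emails
projected = annual_cost(10_000)  # volume after stricter enforcement

print(f"Current:   ${current:,}")
print(f"Projected: ${projected:,}")
print(f"Increase:  {projected / current:.2f}x")
```

At 4,800 manuscripts a month this gives $2,707,200 a year, in line with the $2.7 million figure; at 10,000 a month it gives $5,640,000.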
This makes it clear that just posting an author’s manuscript in an open repository isn’t sufficient. Turning it into a useful resource costs money. In PMC’s case, it’s $47-50 per manuscript. We’ll have to see if the similar approach in the UK creates a similar expense problem. Will anyone tell us?
The rationale for publishing peer-reviewed author manuscripts has always been a little elusive. Now we know that doing so is also expensive.