Editor’s Note: It’s been a busy month for US federal funding agencies, as many have announced their OSTP-required public access plans. While you can expect a post summarizing the plans in the near future, one thing that I think has surprised many is how opaque the process is from the outside. It’s been over two years since the initial White House memo and most of the agencies made few (and in many cases no) public statements about their plans until they were finally announced. An enormous amount of negotiation, planning and balancing of competing agendas went into each plan, but even those involved in what went on behind the scenes are not at liberty to disclose any of it.

most of pie missing

Given the vagueness of most of the plans, it is assumed that the agencies have more in mind than what they’re saying, but what that “more” will turn out to be is unknown. It’s worth noting that we know very little about how the longest existing public access mechanism, PubMed Central (PMC) works. Back in 2013, our former blogger Kent Anderson filed Freedom of Information Act (FOIA) requests with the National Library of Medicine to get a glimpse behind the curtain. Here we revisit one of the more revelatory posts Kent wrote about the results of those FOIA requests, one of the rare pieces of public information available on what PMC costs and where those millions of taxpayer dollars are spent.

Agencies that have chosen PMC as their initial route to public access will fall under the same system described below, particularly since unlike the NIH, they’ll be relying primarily on author deposits of manuscripts rather than having them automatically deposited by publishers. Original post by Kent Anderson, July 16, 2013:

PubMed Central (PMC) costs US taxpayers about $4.45 million per year to run, according to documents recently obtained by an ongoing Freedom of Information Act (FOIA) request.

Surprisingly, most of the money is spent converting author manuscripts into online publications.

Over the past decade, speculation has been the best anyone could attempt, owing to a consistent lack of responses to budget information requests made to PMC staff and leadership. These new FOIA-obtained communications represent the first time we’ve seen actual figures about PMC’s expenditures. Judging from emails and spreadsheets recently obtained, PMC may have been preparing to reveal its expenditure level, but might also have been looking to low-ball the figure by 10-12%.

Not surprisingly, the bulk of the PMC budget is devoted to outside contractors — this has long been believed to be the case. Of the $4.45 million budget, it appears PMC spends between $3.5 million and $4 million on outside contractors — these figures are a little hard to nail down.

As stated earlier, most of the money spent by PMC ($2.7 million of the entire $4.45 million budget) is spent converting author manuscripts into XML and providing QA for these. Put another way, the deposit of author manuscripts as a source of open access (OA) content costs US taxpayers an additional $2.7 million per year.

It is clear from the enormous effort and expense PMC puts into conversion and editing that author-deposited manuscripts are not adequate on their own.

These author manuscripts (53,818 deposited in 2012, based on parameter searching on the PMC site) accounted for less than 20% of the materials posted to PMC that year (272,409 articles found via search), yet consumed 60% of the expenses. And with a recent push for more compliance, this amount seems poised to double.

In an email dated February 16, 2012, between Ed Sequeira and Kent Smith, these expenses were being pored over, potentially as part of preparations to finally announce PMC’s expenses. In a document labeled “FY 2012 ANNUAL COSTS FOR PUBLIC ACCESS,” the expenses were broken out to some extent. Instead of trying to reproduce the table, I’ve scanned the sheet in, and you can view it here. Essentially, it shows that personnel costs (some Federal employees but mostly contractors) consume $1.2 million of the budget. Manuscript tagging and QA consumes $2.25 mllion, with about $500,000 additional expense coming from overheads. In total, manuscript tagging and QA is given a bottom line figure of just over $2.7 million of the $4.45 million budget.

Smith is Kent A. Smith, a former Deputy Director of the NLM, who in 2003 started KAS Enterprises, LLC, and then departed NLM and NIH after 35 years to run KAS Enterprises in 2004. “KAS” apparently comes from his initials. According to Sequira, Smith works as a part-time consultant to the National Center for Biotechnology Information (NCBI).

In the email from February 16, 2012, Smith attached the document I’ve scanned to the following message:

Ed–Take a look at this and then give me a ring. The manuscript tagging number is what we need to talk about as there is a difference between using the 40-60 formula and the actual contract costs. Under the % basis I am using here $47 per article. John and I looked at this yesterday and based the number on a sampling of a few months billings. It consists on the average of about $34-35 per tagged article plus $10-11 for Q/A plus administrative fees of $2-3, where applicable.

Then there is the issue you brought up before about some articles that are in limbo (author hasn’t responded etc.) and then there are those before 2009 and I don’t know to what extent we have included those costs appropriately.

The “John” above is John Mullican, a program analyst at NCBI.

Sequeira responded a couple of hours later, apparently after giving the calculator a little exercise (NIHMS is the NIH Manuscript Submission system; NIHPA is the NIH Public Access policy):


If you take the actual PTFS/DCL contract cost (2,711,852) and divide it by your per article rate (47), you get an estimated 57,700 manuscripts handled by the contractors.

Number I got earlier from the NIHMS system: Annual average for 2010/2011 of actual articles [fully processed (i.e., ready for PMC) + partially processed (i.e., waiting for author action)] = 56,900.

The two numbers are roughly equal, which is good. At least we’re consistent. Split the difference and you get 57,300.

So actually processed (57,300) is 9,300 (20%) more than what we estimate will go into PMC based on publication date (48,000).

The 20% extra can be attributed to backfilling for previous years and to articles that are in limbo.

Some possibilities for handling the 20% overage:

1. Include it in the baseline, in which case the total NIHPA cost is $4.45M.

2. List it as a footnote, similar to what you’ve done, with the explanation from above of what causes the overage. In this case we would report to the world a total NIHPA cost of $4M based on the number of NIH articles collected by publication year, but still be able to reconcile these figures with our contract expenses.

3. Since your per article cost of $47 is towards the lower end of your estimate, we could also use an article cost of $50, in combination with option #2. Then we would have a total reported cost of $4.15M and a smaller overage figure.

I have a meeting to go to in a few minutes. I’ll call you when I’m done, if you’re around.


I confirmed via email that PTFS/DCL only deal with author manuscripts. As Sequeira wrote in his email reply to some questions I asked:

NCBI’s contracts with PTFS and DCL are for XML tagging and QA of the author manuscripts deposited in the NIHMS (NIH manuscript submission) system under the NIH public access policy. They’re not involved in QA of the XML deposited in PMC by journals with participation agreements.

It wasn’t supposed to be this way, as indicated by a budget spreadsheet from 2009. In that spreadsheet, the cost of article tagging and QA in 2009 was pegged to be between $1.5 million and $2.6 million, in a low, middle, and high set of budget scenarios (it seems to have tended toward the high scenario). Planing for the years of 2010-2013, these costs were supposed to fall from $2.3 million in 2010 to $997,500 in 2013. However, as shown above, these cost control plans did not come to fruition.

In fact, PMC may be about to find its expenses exploding, if a recent Nature News article is correct. The NIH’s stricter enforcement of author deposit rules has apparently increased the number of author manuscripts on deposit from what Richard Van Noorden estimates to be 5,100 per month (these emails show that it’s more like 4,800 per month) to about 10,000 per month. At $47 per article for tagging and QA, that doubles the largest part of PMC’s budget, and will cause it to balloon from $2.7 million to $5.6 million. PTFS and DCL will be thrilled, but PMC’s budget will then be nearly all devoted to managing these manuscripts.

This makes it clear that just posting an author’s manuscript in an open repository isn’t sufficient. Turning it into a useful resource costs money. In PMC’s case, it’s $47-50 per manuscript. We’ll have to see if the similar approach in the UK creates a similar expense problem. Will anyone tell us?

The rationale for publishing peer-reviewed author manuscripts has always been a little elusive. Now we know that doing so is also expensive.

Kent Anderson

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.


44 Thoughts on "Revisiting: The Price of Posting — PubMed Central Spends Most of Its Budget Handling Author Manuscripts"

The complexity of the PMC system is as much of an issue as the cost, especially for authors and publishers. The HHS health agencies are all going with PMC, which makes sense because they share a culture. But NASA and NIST are also trying the PMC route and that will be interesting to watch. For example, PMC approves journals, which is how Kent Anderson got involved in the first place as I recall. The journals that publish NASA and NIST funded research will mostly be strangers to PMC. Nor has PMC ever been a contractor to other agencies, to my knowledge. Many experiments lie ahead.

Thanks for revisiting this post. As an interesting aside, recently released minutes from the PubMed Central National Advisory Committee meeting in June 2014 show that only about 11% of articles deposited in PMC are author manuscripts. This means that 89% of what constitutes PMC comes from publisher deposits. (Minutes are here: http://www.ncbi.nlm.nih.gov/pmc/assets/mins-2014jun.pdf).

Most of the announced public access plans appear to be focused on collecting accepted manuscripts, because that is what the government claims to have a right to. But some agencies are just going to post the PDFs and thus avoid the XML costs.

This is an important point. NIH may be able to sustain on publisher provided author manuscripts given the bulk of content being published with the large commercial publishers. Other agencies with material published by a whole host of publishers should not count on the same level of service. We publish papers with funding predominantly from 14 different agencies–none of them NIH. I can’t imagine building out a system whereby we would provide feeds to a multitude of repositories.

By the way, I do see a significant business opportunity for journal submission systems. If they built out feeds to various locations and allowed authors to select where their accepted manuscript has to be sent with relevant metadata, this would be one way to standardize the process and make it easier on the researcher.

The cost to run PMC is but one cost in overall price of paying for this service. As PMC gets larger and hosts more material, it draws more and more readership away from journal websites, reducing their value. The study (see: http://bit.ly/1PjgVqX ) I conducted on 14 biomedical journals is now several years old and yet it revealed that even for early adopters of publisher-mediated deposits, the effect of reducing article downloads was substantial, and more importantly, growing over time. I would be interested in conducting a larger study (more journals) using different publication models over a longer period of time to see whether the effect persists, or differs between disciplines and publication models.

David you should read that blog post more carefully.It wasn’t the cost of hosting that put BLD out of business. It was the APC that BioMed Central charged for the whole range of publishing services they provide not just hosting. Marcus Banks even states he agrees with Andrew Odlyzko the costs of publishing an OA article should be about $400. Ubiquity Press provides everything a society needs to publish a journal including hosting, a managing editor, journal management system and formatting for about that price.

The Medical Library Association already pays the bulk of the cost for publishing JMLA. They publish a print version. I assume the subscription fees for the print version pays most of the cost of publishing the journal. All PubMed provides JMLA is hosting. Hell JMLA already has a journal web site. All they would have to do is upload the digital copies of the articles to their existing web site and put the links to those copies rather than PubMed’s but who cares? What difference does it make whether or not they host the copies of the articles on their web site along with the duplicates on PubMed Central?

I read the article quite carefully, and phrased my comment above quite carefully as well. You’ll note that I did not state that the journal went out of business because of the lack of hosting, merely that it lacked that advantage and went out of business.

But now that you’ve raised the point, if BLD had the option of getting free hosting from the NLM, they wouldn’t have had to rely on BioMed Central and wouldn’t have had to submit to their demands for a high APC, would they?

Well, if you want to split hairs. What point were you trying to make?

As for the services BMC provided they were far more than just hosting as you well know. You can purchase hosting from anyone of dozens of web hosting sites for less than $200 a year.

The point I was trying to make was as follows:

Also discussed here from the point of view of a competitor journal that didn’t have the same advantage and went out of business:

It seems pretty clear to me. The issue Kent raised was also discussed in a recent interview from the point of view of a similar journal that was not hosted for free by PMC and that was not sustainable. Then I provided a link to that discussion so others could read it and learn more about their situation.

And if you go back and read the post Kent linked to in the start of this thread (read “carefully” I might add if I were a bigger jerk), you’ll see that according to T. Scott Plutchak, the equivalent services for what was offered by the NLM, “were all in the $20K plus range annually.”

I do remember that but sure would like to know what services PMC provides that would cost them anywhere near 20K a year. JMLA published 21 items (articles and other publications) in the first issue this year so that’s about 84 items a year or $240 per item for setting up the table of contents and hosting an HTML and PDF version along with the citation in a couple of formats. PMC might do other things but that is all the Medical Library Association would need to do to host a version of the articles on their own web site.

If you were to go out on the market and secure a contract for hosting of a journal with sizeable archives and 80-100 articles per year, including support, maintenance, reporting, and so forth, just a basic contract, it would run $20-50K easily. If the MLA were not benefiting to a significant degree, then why do it? Clearly, free is a sufficient benefit and advantage for them to deflect criticism and persist doing it. And that isn’t fair.

Just a couple of quick notes to clarify some items in the other comments — I won’t address the fairness issue, since that’s a point where Kent and I will continue to disagree. 1) The costs of publishing the JMLA, at the time I was the editor (2000-2006) were covered by subscriptions and advertising in the print version. At the time I left, those revenue sources were steadily decreasing. The MLA Board has always insisted that they are committed to publishing the JMLA even if it no longer is self supporting. They may have reached that point, but I haven’t looked at the figures in nearly 10 years. 2) My comment about options being in the $20K ranged at the time the MLA was looking to go online referred to quotes we were getting from vendors to provide hosting/publishing services. By posting files directly to PMC rather than building a separate e-journal, MLA did without most of those services, so it’s not accurate to say that MLA got $20K worth of services. Through Allen Press, which has been the printer for MLA for many years, the files are sent to NLM which handles them in exactly the same way they handle the files from any other publisher that chooses to deposit their articles. MLA gets exactly the same services as anybody else — no more, no less. At the time we started that arrangement, NLM had no requirement that journals have an independent electronic site. They just required that the files be sent to them in the necessary format (which Allen Press was already doing on behalf of other publishers). Presumably any other journal that was within their scope could have done the same thing. Whenever Kent fumes about MLA using NLM as their publishing platform, I think about his long lists of “things publishers do.” NLM does practically none of these. I have, over the years, encouraged MLA to investigate establishing a true e-journal version of JMLA. I hope they do, not because I think it would be more fair, but because it would enable them to take advantage of the opportunities provided by being a “real” e-journal.

Here is an interesting crowdsourced version in progress:

But it is just a start and needs a lot more detail. There are several hundred pages of agency plans, with dozens of variables and no two plans are alike. Some of the big ones are not even funded yet, so we really do not know what the final policies will look like. A lot of what I predicted is coming true, alas: http://scholarlykitchen.sspnet.org/2013/02/25/confusions-in-the-ostp-oa-policy-memo-three-monsters-and-a-gorilla/.

Over a decade ago I attended a meeting run by JISC which was mainly attended by librarians. Indeed I think only one other publisher was present and he was hidden away at the back of the room. The star speaker was David Lipman and he explained that his aim at PMC was to produce enough added value to the article so that (as massaged) it would be a better version than the VOR on the publisher site with the result that researchers would go to PMC to check the “best” version rather than to the journal publisher. I wonder if this is still his aim or he has begun to learn part of what publishing involves.


Which explains a lot about why so many publishers view PMC as a “competitor” rather than as a “partner”.

As the editor of Biomedical Digital Libraries (BDL) and the person whom Richard Poynder recently interviewed, it never crossed my mind that JMLA’s arrangement with PMC was somehow a threat to BDL. And I don’t believe that, had BDL been hosted on PMC instead of BMC, we would have definitely survived. The publisher in either case would still be BMC, which raised its article processing charges (APCs) precipitously and suddenly.

David, I appreciate you linking to my interview with Richard Poynder. But framing it as “the point of view of a competitor journal that didn’t have the same advantage and went out of business” is misleading. Thanks to David Solomon for correcting the record.

It is at best a correlative statement, and taken as such rather than as causative it is accurate. I do think there’s a great deal beyond this minor point in that interview and recommend that all read it, some really interesting stuff on trying to serve the OA needs of a community that lacks the enormous funding of the STM world.

But it remains an interesting hypothetical question: if your journal had been relieved of $20-50K annually in expenses, would it have made a difference in sustainability? Why must the publisher have been BMC?

My guess is that many if not most people would interpret that particular phrasing as implying causation and not mere correlation. But onward.

I think that even if PMC hosting “saved” BMC 50K they would have raised APC amounts a great deal. Why not? OA was proving a successful business model, and of course by now BMC has long been part of Springer. So I don’t think that APCs are only about cost recovery, they are about profit.

The interview covers this in great detail, as well as this subsequent post which offered additional thoughts: http://mbanks.typepad.com/my_weblog/2015/04/the-way-forward-for-scholarly-publishing-in-the-biosciences-more-thoughts-following-the-poynder-inte.html

As to why it had to be BMC, it did not. We explored 2 other alternatives, as recounted in the interview as well.

Honestly that was not my intent–what I wanted to do was point readers to a further discussion of the issue, and perhaps dashing off a quick sentence and a link left things open to interpretation.

Perhaps the lesson learned then is about getting into bed with commercial, for-profit companies and how that limits the flexibility one has in serving one’s community. I wonder though, had you taken an independent route, if things would have turned out differently, and if so, I suspect that having free hosting would have been beneficial.

My take on the demise of BDL, which I think Marcus covers in his interview, is that librarianship is one of those fields in which few authors have the resources to pay APCs. In those disciplines I you need to have some kind of institutional or organizational support covering the costs. BDL didn’t have that. I think Marcus (and Charlie Greenberg before him) were courageous in trying to get it up and running, but they were run aground on those financial realities.

Thanks very much David, noted. I also encourage readers to check out the Poynder interview.

I can attest that the process for developing public access plans to research and data can be quite opaque to those on the inside of agencies as well. The recently released NOAA plan for public access of research results (PARR) includes an interesting remark under “Public Consultation” discussing their goal of using the CHORUS framework but that interaction with publishers and others has been limited because “NOAA has not yet been authorized to make this Plan public, and has therefore not been able to broadly solicit comments from external stakeholders.” Their timeline is also interesting, with their draft plan going to OSTP 6-months after the OSTP 2013 memo, but OSTP taking 7-months to even return comments to NOAA.

A bit ironic that the implementation of a policy encouraging greater transparency and public access to research and data is itself less than transparent. Further irony in this post describing Kent Anderson’s having to resort to FOIA requests to get open access to PubMed costs of open access.

The NOAA plan is an interesting read and hits many issues often dissected in this blog. It might itself be a good topic for a further SK analysis and post (hint hint). The complexity of the process within an organization with many different research programs and institutional customs is impressive, and this from an organization with a tradition of serving digital data and publicly accessible research.

The NOAA plan gets at Kent Anderson’s closing remark that “the rationale for publishing peer-reviewed author manuscripts has always been a little elusive.” The main (sole?) rationale for NOAA’s proposed institutional repository of the accepted, author- versions of research articles appears to be because NOAA authors may choose to publish research in journals that do not participate in CHORUS. Perhaps this belt-and-suspenders arrangement results from a circular problem: science agencies don’t know how comprehensive CHORUS will really be until it gets off the ground, and CHORUS can’t really get off the ground without science agency commitments to use it.

See the plan at http://docs.lib.noaa.gov/noaa_documents/NOAA_Research_Council/NOAA_PARR_Plan_v5.04.pdf

Thoughts from an environmental science agency foot soldier.

There is a basic reason for the agencies collecting author manuscripts instead of VORs. The government claims that it has a Federal use license to the manuscript but no right to the VOR. CHORUS has nothing to do with this basic claim. When I pressed a government lawyer he said that the Federal use right was an unwritten rule, nowhere spelled out in law.

The government claims that it has a Federal use license to the manuscript but no right to the VOR. CHORUS has nothing to do with this basic claim. When I pressed a government lawyer he said that the Federal use right was an unwritten rule, nowhere spelled out in law.

2 C.F.R. § 215.36(a):

“The recipient[*] may copyright any work that is subject to copyright and was developed, or for which ownership was purchased, under an award. The Federal awarding agency(ies) reserve a royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes, and to authorize others to do so.”

* See 2 C.F.R. § 215.2(cc).

Indeed Boris, but on its face this language only applies to articles (“work”) the writing of which is paid for under the award (“developed or purchased”). Most articles are neither, because they are written well after the research is completed and the grant contract is closed out. In Public Access the government is now claiming this license obtains merely because the article is reporting on research funded in part by the award. There is aparently no time limit on this claim. It is a huge extension of the “paid for” license claim, for which I can find no legal basis.

How does paying me to do some research create a right to everything I choose to write about it thereafter? Is “developed” the magic word? If I work for you do you own part of everything I ever write about it? Novelists would find that very interesting indeed.

The NOAA plan has an interesting confusion. NOAA says they are contracting with the CDC to use http://stacks.cdc.gov/ as the NOAA journal manuscript repository. But the CDC plan says they are contracting with PubMed Central for their repository. Meanwhile the role of CHORUS seems vague. See section 7.2.1 Establish Institutional Repository.

I think if you read any of the agency plans you’ll find all sorts of contradictions and vagueness.

My understanding is that each agency had to provide multiple routes to compliance. As you note, an agency that went with CHORUS had to also provide a route for articles published in journals that are not members of CHORUS. There was also an emphasis on having these systems evolve over time, and a suggested approach of having multiple routes to allow for each to be tested, with an eventual move to the most efficient.

Indeed, it looks like we are going to have multiple moving targets for at least the next several years. In the meantime authors and publishers are being treated like experimental subjects. Trial and error is not a good way to make Federal policy.

Referring back to the notion that “Federal use right was an unwritten rule, nowhere spelled out in law:” It is spelled out in law, but with considerable room for interpretation. As I understand it, it’s not that there is a federal use right or license, but that government works generally cannot be copyrighted in the first place and are in the public domain within the U.S.. The present federal statute on copyright (17 USC 105) includes language that copyright protection “is not available for any work of the United States Government” and “no copyright shall subsist … in any publication of the United States Government, or any reprint, in whole or in part, thereof.” A “work of the United States Government,” is defined in 17 USC 101 as a work prepared by an employee of the United States Government as part of that person’s official duties.
OK, technical publications of the US government are straightforward mostly, and I sometimes quip that I work in an agency that has been an open access science publisher since 1879. The “government works” definition has been interpreted by many similarly to what you (David Wojick) articulated, with the government authors’ content free from copyright in the US, but the version of record (VOR) belonging to the publisher. This is congruent to how publishers have loosened copyright assignment requirements in the last several years in allowing for posting of green open access self-archives by authors or institutions, but NOT for free posting of publishers’ VOR articles. But keep reading. It’s not clear that VORs by government authors do have copyright protection.
For years, I had interpreted government author responsibilities similarly to David Wojick’s argument. But after haranguing some colleagues about posting VORs for which they had not paid for open access privileges on their “.gov” study websites, ResearchGate, and such, I was directed to arguments why even VORs by government employees have no copyright protection in the US and may be freely posted on the web.
The gist of the argument why publishers have no US copyright protection for article VORs and thus may be freely posted by others is that the various “sweat of the brow” enhancements that publishers add to the peer-reviewed versions such as typographical or substantive corrections, layout, and indexing are insufficient intellectual contributions to be protected by copyright law. This follows a series of court rulings related to copyright disputes between legal publishers with many parallels to the open access debates of government supported scholarly works. US court opinions are by definition in the public domain, but the legal publishers digitize, copy edit, annotate, compile, cross references and index these, among other enhancements. Then these enhanced compilations are sold through subscription services to the legal community. More than a couple parallels to the scholarly publishing situation.
Apparently many jurisdictions treated the West Publishing Co. versions of US court opinions as the VORs, and competing companies (Matthew Bender & Co. and HyperLaw) wanted to use the West versions with certain redactions in their own compilation and indexing services, without payment to West. The court found that the original publisher’s (West’s) revisions to judicial opinions were merely trivial variations from the public domain works, and that the publisher’s versions were therefore not copyrightable as derivative works. In reaching this conclusion, the court found that the original publisher did not have a protectible interest in any of the portions of the opinions that their competitor intended to copy because the publishers alterations lack even minimal creativity. (Matthew Bender & Co. v. West Publishing Co, https://www.law.cornell.edu/copyright/cases/158_F3d_674.htm

The rationale from these rulings have been extended to why VORs of scholarly articles by government authors may be freely posted on the internet. See http://www.cendi.gov, “Frequently Asked Questions About Copyright (CENDI/2008-1).” A couple of key FAQs from that follow:
“ 2.4.2 Can the published version of a U.S. Government work that has been published in a non-government product be posted on a public Web site?
It depends. If the publisher has made original and creative contributions to the published work, the publisher may have some rights. Check with your General Counsel’s Office or agency policy. Alternatively, the original manuscript as submitted to the publisher could be posted. (See FAQ Sections 3.2.3 and 3.2.4.)”
“3.2.3 May the Government reproduce and disseminate U.S. Government works, such as journal articles or conference papers, which have been first published or disseminated by the private sector?
Assuming the article is written by a government employee as part of his or her official duties and the publisher does not add original, copyright protected content, then the government may reproduce and disseminate an exact copy of the published work either in paper or digital form. (Matthew Bender & Co. v. West Publishing Co.,73 158 F.3d 674 (2d Cir. 1998), cert. denied, 119 S. Ct. 2039 (1999)).”
I suspect these interpretations might raise more questions from SK chefs who have been thinking about this problem long before this particular befuddled author was thrown into it. The referenced case law was from the era of mailed, CD-ROM subscription services, and the material in question (US case law) was of primary value only within one country. So scholarly articles considered US-government works could be freely posted on a US-Wide-Web (or Canadian, UK, and other governments which may assert similar policies). But I’m familiar only with a World-Wide-Web, which seems to leave some questions on posting US copyright-exempt VORs for access from outside the US. And there’s the question, what is a government work? For copyright purposes, this is explored in detail on the CENDI.gov site, and its many links to law review articles and other sources. US Government-funded works by contractors or grantees are not, extracurricular commentaries on a blog written by a government researcher on a Sunday morning are not, and articles reporting US government funded work with inseparable contributions from government and non-government authors are not “government works.” (This last point and contradictory internal guidance is what pulled me into digging into this arcane topic to start with).
So the recent mandates over public access to federally funded research results iare much broader than “government works” for copyright purposes and seem to rely more on the force of persuasion and authorities other than US copyright law.

I think one needs to separate out the rules for government employees and for grant recipients. As far as I know, recipients of government funding are not considered employees of the federal government, nor is their work considered contracted work or “work for hire”.

Most journals grant distinct licenses to employees of the US government, putting the works in the public domain. This is not the case for an employee of a university whose work has been funded in part by the government.

That is right, David. Chris’s correct points about federal publications are not relevant to the public access issue. Moreover, I am talking about the many articles that are not paid for by the government grant, so the issue of work for hire is also irrelevant. The government is claiming a license for articles written long after the grant ends, which I think is the majority of articles, based solely on the fact that the articles describe the research funded by the grant. The government is getting a valuable license for which it has paid nothing.

What happens if I buy a microscope with an NIH grant and twenty years after the grant expires, I take a picture using that microscope? Do all experiments that use that microscope for the rest of time belong to the US government? What if someone from the lab next door uses the microscope? Can the government claim ownership of those experiments as well?

A very interesting case indeed, David, because in effect the funding does not end when the grant ends. Every time the instrument is used the work includes partial funding by the government. The published agency Public Access plans are all extremely vague on this issue of the threshold for requiring submission of an article or of data. It is a central question that they are avoiding. The agencies should be going through rule making procedures to clarify central issues like this, but so far only DOD has said they might do so.

Ok, fair. The PMC program costs money, and perhaps a bit more than the Chefs thought it actually would have cost. But what are the benefits of the work that PMC is undertaking? I would love to see a benefit analysis of all the xml that has become available from across many different publishers that can be contentmined. PMC is a treasure trove for bioinformaticians, big data researchers and all kind of modern geeky things researchers want to do and should do. For the benefit of science, and probably the next miracale drug. I beg the Chefs to investigate how often, how much PMC is used for these kind of purposes, and also if this kind of research based on contentmining of all the available ftxt PMC collection has already led to breaktrough insights, understandings etc. There must be evidence for this, most likeley in PMC itself 🙂

I think it’s an interesting idea and would resolve the current questions over whether text- and data-mining (TDM) is a major future path in research or a niche activity. Where I’ve spoken with publishers about this, they report very few requests to do TDM on their articles, so as far as I’ve heard, there’s not a lot of activity going on. Neither have I seen any major breakthroughs resulting from TDM efforts, though it’s early days and it may still take some time to turn the correlations recognized through TDM into actual proven causal relationships.

All that said, the question then must be asked whether there is value in investing these sorts of costs to make one limited pool of articles for TDM efforts. The ideal TDM study is going to be comprehensive–you want to look at all of the literature on disease X, not just the research funded by one group (nor just the research that was done after that platform came into existence). So to really do it right, you’re going to need to build systems that can reach across multiple platforms and deal with multiple XML schema. You’re going to need to deal with many different licensing terms as well.

I’d rather see investment going into tools that can span the literature rather than spending to create one limited pool of custom-built versions of a portion of the literature.

Well David, the last point you make is exactly the problem. There is not a single place where you can go to contentmine (which is text and data mine) all of the literature. Toll Access places barriers for these kind of techniques. It requires negotiations with each and every publishers to get access for content mining. That’s why PubMed and PMC combined are a first class resource for these techniques.
The techniques needs to prove themselves. True, But it is a growing area of research, and an important one if you’d ask me.

Is it realistic to think there could ever be one single place to perform TDM on the entirety of the scholarly literature? You have the last 100 years or so which are under copyright to hundreds if not thousands of entities, both personal and corporate. You have it collected on a multiplicity of systems in a multiplicity of formats. To me these issues make it unlikely that such a resource could ever exist in a comprehensive manner.

But the more important question is why such a resource should be necessary? Google seems to do a pretty good job of TDM of the majority of the internet without having everything collected on one single website. Are we better off working toward universal standards for formats and access methodologies, as well as standard licensing terms instead of trying to build one central resource (and one potential point of failure)? We live in a distributed information architecture, so shouldn’t forward looking research techniques build with that in mind?

That’s why PubMed and PMC combined are a first class resource for these techniques.

True, if your research involves only looking at a subset of the literature from a limited time period in a limited number of fields.

Comments are closed.