The White House Calls for Information on Public Access to Publications and Data

Office of Science and Technology Policy — Image by justgrimes via Flickr

If you’re reading this blog, you likely have an opinion about open access to journal articles and research results. The White House Office of Science and Technology Policy (OSTP) has put out two formal Requests for Information: one on the subject of “Public Access to Peer-Reviewed Scholarly Publications” and the other on “Public Access to Digital Data.”

While most of us enjoy the seemingly endless back-and-forth discussion online (or ranting and raving, as the case may be), this is a chance for all stakeholders to have a direct influence where it matters most. The White House is crafting requirements for recipients of federal research funding, and the information received here will be crucial to setting policy.

There are two separate issues here: public access to journal articles from federally funded research, and the tricky question of how to make the most of the raw data collected in those federally funded experiments.

Journal Articles
The OSTP previously called for a public consultation on the subject and is now extending that process further. Here they’re specifically looking for input from “the public, universities, nonprofit and for-profit publishers, libraries, and research scientists.” There are 8 specific questions asked:

1. Are there steps that agencies could take to grow existing and new markets related to the access and analysis of peer-reviewed publications that result from federally funded scientific research?
2. What specific steps can be taken to protect the intellectual property interests of publishers, scientists, Federal agencies, and other stakeholders involved with the publication and dissemination of peer-reviewed scholarly publications resulting from federally funded scientific research?
3. What are the pros and cons of centralized and decentralized approaches to managing public access to peer-reviewed scholarly publications that result from federally funded research in terms of interoperability, search, development of analytic tools, and other scientific and commercial opportunities?
4. Are there models or new ideas for public-private partnerships that take advantage of existing publisher archives and encourage innovation in accessibility and interoperability, while ensuring long-term stewardship of the results of federally funded research?
5. What steps can be taken by Federal agencies, publishers, and/or scholarly and professional societies to encourage interoperable search, discovery, and analysis capacity across disciplines and archives? What are the minimum core metadata for scholarly publications that must be made available to the public to allow such capabilities?
6. How can Federal agencies that fund science maximize the benefit of public access policies to U.S. taxpayers, and their investment in the peer-reviewed literature, while minimizing burden and costs for stakeholders, including awardee institutions, scientists, publishers, Federal agencies, and libraries?
7. Besides scholarly journal articles, should other types of peer-reviewed publications resulting from federally funded research, such as book chapters and conference proceedings, be covered by these public access policies?
8. What is the appropriate embargo period after publication before the public is granted free access to the full content of peer-reviewed scholarly publications resulting from federally funded research?

That these questions are being asked shows that the OSTP has a deep understanding of the issues involved and is looking for a nuanced solution that provides maximum benefit from research results without causing irreparable economic harm or destroying the functional system of filtering, verification, and distribution of those results.

This isn’t just a simple decision whether papers should be free. Questions 1 and 4 spell out an agenda that seeks to create new markets and new products from the scholarly literature.

Question 3 appears directed at the seemingly nonsensical approach of PubMed Central, which for unknown reasons requires all papers to be held in one physical repository. This seems an archaic absurdity in an interconnected age. Google seems to work just fine in indexing material spread throughout the world; why must PubMed Central have everything in one box? Publishers would be much more supportive of PubMed Central if the traffic for viewing free articles went to the journals themselves rather than PubMed Central.

That question, combined with question 5, points to an interest in creating a science-wide index, one that reaches beyond the biomedical range of PubMed. The utility of such an index is obvious in an increasingly interdisciplinary world.

Question 8 is the one that likely has the most immediate importance for scholarly publishers. The NIH requires that papers be made publicly available no later than 12 months after publication, but it’s unclear whether the same time scale that seems to work for medicine and the life sciences would be appropriate for the physical or social sciences. Do the usage and citation patterns of articles in all fields match those seen in medicine and biology?

Access to Data
Providing public access to the data collected in federally funded experiments is a great idea, but it’s an incredibly complex undertaking. There are costs in preparing the data for use by others, costs in storing the data, and costs in providing the data.

Beyond finding a way to pay for all those extra expenses, you have the near-infinite variety of types of data collected. Data archiving and availability will require a clear set of standards, but is it realistic to consider the development of such standards for such wildly variant material?

And there’s still a deeper underlying question — for some types of data, is it worth the bother? Many experiments are posed to ask a very specific question under very specific conditions — it’s unclear if the data generated could ever be re-used for a different purpose. As technology improves, we will reach (if we haven’t already reached) a point where it’s cheaper to recreate some data than it is to store it.

The request for information here asks 13 very thoughtful questions for moving the process forward. The NIH and NSF already require “data management plans” from funded researchers, but these requirements could use much further refinement.

Even ignoring the potential benefits of data sharing, researchers stand to benefit greatly if funding agencies are willing to put money toward the organization and archiving of data. Having your own laboratory’s data in a clear and permanent system will save countless hours of work as one student graduates and moves on, taking with them precious knowledge that often must be recreated from scratch.

For libraries and publishers, there’s great opportunity here. Librarians are experts in the organization and retrieval of information. There’s an open door here for the creation of an entirely new role for the institutional library as the data archive and the librarian as the archivist.

For publishers, our authors and readers have a crying need for systems to manage and store the vast hoards of data used to generate research articles. The company that masters the technologies involved and offers services and solutions in this field will prosper by filling a tremendously valuable niche. It’s not clear if publishers are the right people to fill this role, but our expertise in content management seems a good fit.

These requests for information are of great importance for the future of academia and scholarly publishing. If you’re a traditionalist who sees open access as the downfall of civilization, an advocate who thinks information must be free, or someone who falls somewhere in between, this is your chance to create the future you’re seeking. If you’re a researcher who doesn’t want to be burdened with the storage and tracking of minutiae, a completist who wants his every action recorded for posterity or someone somewhere in between, this is your chance to determine the data policy of the future.

For publishers and librarians, the shape of your industries and careers is on the line here. Send in your responses by January 2, 2012, or let others decide your fate.

David Crotty

@davidacrotty

David Crotty is a Senior Consultant at Clarke & Esposito, a boutique management consulting firm focused on strategic issues related to professional and academic publishing and information services. Previously, David was the Editorial Director, Journals Policy for Oxford University Press. He oversaw journal policy across OUP’s journals program, drove technological innovation, and served as an information officer. David acquired and managed a suite of research society-owned journals with OUP, and before that was the Executive Editor for Cold Spring Harbor Laboratory Press, where he created and edited new science books and journals, along with serving as a journal Editor-in-Chief. He has served on the Board of Directors for the STM Association, the Society for Scholarly Publishing and CHOR, Inc., as well as The AAP-PSP Executive Council. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing.

Discussion

49 Thoughts on "The White House Calls for Information on Public Access to Publications and Data"

David, I am surprised at your response to number 3. The obvious answer is preservation. Having all the content in a well-maintained XML archive with very strict and extensive requirements for embedded metadata ensures the knowledge base generated by all the federal funding going into research is going to remain easily accessible 10, 20, 50, even 100 years from now.

To me it is well worth the added expense, which I suspect is a small fraction of what the government puts into generating the research. I think John Willinsky estimated we spend $60,000 USD per research article, by dividing the total NIH budget by the number of articles published from that research in 2008. There are other options for preservation, like Portico, but having a single federally funded, well-maintained archive with a consistent format is the best way to go.

By David Solomon
Nov 14, 2011, 7:09 AM

Preservation and access are two separate issues. If the government wants to go to the expense to maintain a separate archive beyond that of well established archives such as Portico or LOCKSS, that’s fine (though it would be more efficient and likely just as effective to have a set of archival criteria and let the publishers pay for this themselves). The problem is that PubMed Central uses its archive to drive the question of access, and uses the papers (often early drafts rather than the final product) as the paper of record, sending readers there rather than to the final version on the publisher’s site. As a reader, I’d rather see the final edited version of the paper. As a publisher, if I’m offering free access to the paper after 12 months, I’d rather that I still get the traffic generated by my hard work on that paper (I can still benefit from selling ads and from libraries realizing their institutional usage).

The argument for sending the reader to the archived PubMed Central version is usually a suggestion that having everything in one place improves search and discovery, which is clearly not accurate given modern search technology.

By David Crotty
Nov 14, 2011, 7:39 AM

Theoretical issues aside, none of the federal research agencies has the money to replicate PubMed. I suggest that it is pointless to propose expensive solutions in the present budget context, which is pretty desperate. For example, the National Library of Medicine, which runs PubMed, has a budget of about $350 million. DOE’s equivalent, the Office of Scientific and Technical Information, gets maybe $9 million, yet DOE funds the majority of federal hard science. Even worse, NSF has no such facility at all. If I were OMB I would look at ways to scale back PubMed, not ways to clone it.

By David Wojick
Nov 14, 2011, 9:32 AM

One supposes that any such proposal would come with an accompanying funding mechanism. How much would it cost to run a scientific research discovery index per year, particularly if you’re outsourcing the archival nature, and merely linking to papers stored elsewhere?

By David Crotty
Nov 14, 2011, 10:48 AM

The last I heard, the NIH budget was in the range of $28 billion. I believe it is actually a little higher now. The NLM budget, at $350 million, is just over 1% of that. I expect the majority of the cost of running PubMed goes to indexing, not PubMed Central. The National Library of Medicine does more than run PubMed/PubMed Central. I don’t know if they still do, but they used to even have a grant program of their own.

Obviously the government needs to find ways of saving money, but in this case, weighing the value of a PubMed Central-like archive should probably be based on the percentage of the total money spent on research funding, not the dollar amount. It would be an added cost of running the system, like peer review or grants management.

By David Solomon
Dec 22, 2011, 9:28 AM

I am not as familiar with Portico, but LOCKSS just archives the PDF/HTML or whatever version is on the Web rather than an XML version including key metadata fields. From an individual reader’s point of view that doesn’t matter, but having all the published NIH-funded research reports with key metadata in one place in a consistent machine-readable format does add a lot of value from the perspective of efficient secondary research. Also, compared with the full expenditure for NIH-funded research, this doesn’t cost pennies on the dollar but mils on the dollar.

It is the accepted version, not an early draft, that must be submitted. PubMed Central stores the version that is deposited. The NIH requirement is the accepted version, but the publisher can deposit the final version if that is the version they want for the paper of record.

I am not sure what you mean by paper of record. PubMed lists the full reference, including the DOI. Someone can go to the publisher’s site or access the version in PubMed Central via a direct link. If they are at a university and their library has a subscription, they can also follow a direct link from PubMed to the publisher version.

As a reader you are welcome to spend your ~$30 to see the copyedited and typeset version on the publisher’s web site or, if you have access, go through your library. For those who don’t have access through a library and don’t want to spend around $30 to view a paper when, if they live in the US, about 95% of the cost of creating it came out of their tax dollars, the NIH mandate gives them another option to view a version of the paper. Those who don’t have access include researchers at large US-based universities. I work at one, and about 10% of the articles I try to access are not available through my library. That hinders my ability to do research and wastes time. PubMed Central avoids that problem, at least for NIH-funded research.

I have some sympathy for publishers being required to allow the material they publish to be freely available in PubMed Central, but not much. There is no mandate on the publisher; it’s on the researcher. If a publisher doesn’t want to accept the mandate, no one is forcing them to accept papers based on NIH research. They also have a year in which the published version is the only version available, and they can still sell much of the added value they provide, i.e., the copyediting and typesetting, after that point. Is there any evidence libraries are dropping subscriptions due to the mandate? If not, I have a hard time seeing what publishers are complaining about.

By David Solomon
Dec 22, 2011, 1:20 AM

I think publishers are complaining about a brute force abrogation of their copyrights. Journal articles are seldom, if ever, funded by the federal research contracts, so the government has no ownership interest. Thus these mandates are simply seizing someone’s property, just because it might be nice for someone else. The government does get final technical reports, which should satisfy the requirement for making research results publicly available.

By David Wojick
Dec 22, 2011, 7:56 AM

I think that is the crux of the disagreement. Scholarly publication has traditionally been funded by authors assigning copyright to publishers in return for getting their work published. I don’t agree with much of what Stevan Harnad has to say, but I do agree it’s a Faustian bargain that, with electronic publishing, is no longer necessary. It’s not that publishers do not provide a great deal of value and deserve to be paid for what they do as well as earn a good profit. They just don’t need to own the finished product to get paid for what they do. But that is a whole other discussion that has gone on way too long.

I don’t think publishers can complain about brute force anything or have a legitimate reason for feeling like they are being ripped off. They may own the copyright, but it is with the UPFRONT stipulation that they will allow at least the accepted version to be deposited in PubMed Central. Publishers are making a Faustian bargain. As an author I know what it feels like, and it sucks. We have been putting up with it for years to get our manuscripts published. I don’t mean to sound snarky about it, but it really is a parallel situation, and it really does bother me to sign over copyright in much the same way I expect the NIH requirement bothers you as a publisher.

That is just not true about final reports from grants. They serve a very different purpose and do not adequately serve the purpose of making research results from NIH-funded research freely available. I also do not agree that articles are seldom written using protected time funded by grants. Not only is at least part of the writing often done during the funding period of a grant, but dissemination is clearly an expectation of the granting agreement. The fact that article processing fees are a legitimate use of grant funding clearly highlights that.

By David Solomon
Dec 22, 2011, 10:31 AM

David Solomon: What is the basis for your claim that research reports “do not adequately serve the purpose of making research results from NIH funded research freely available”? I don’t know about NIH, but at DOE these reports are estimated to average 60 pages or so. They provide far more information than journal articles. Journal articles involve various sorts of filtering, which the government should have no interest in. I think research reports are ideal for reporting the results of federally funded research. That is their sole function.

By David Wojick
Dec 24, 2011, 7:15 AM

To David Wojick. I’ve been the PI on 3 HRSA-funded projects and Co-PI on about 5 or 6 others and several NIH grants. I can give you specific examples of articles based on grant-funded research that were written during the granting period, when most or all authors had grant-funded protected time. They are mentioned in the final report along with maybe a few sentences about the findings, but the report in no way gives an adequate description of the research methodology, the results, and a discussion of the results. What goes into at least the HRSA final reports is not very useful in terms of research coming out of the project. Most of it addresses how the money was spent and documents that you did what you said you would do.

Research is not the main objective of the HRSA projects I have been involved in. They are focused on developing and enhancing physician educational programs, but dissemination is one of the stated objectives, and dissemination via peer-reviewed publication is strongly encouraged.

I have less experience with NIH grants, but I believe the final reports also do not contain the same type of information or the depth that goes into a manuscript. Someone else also commented on this in an earlier blog post when this claim was brought up.

By David Solomon
Dec 24, 2011, 12:40 PM

David Solomon: Why do publishers accept this UPFRONT stipulation that the government will get and publish a copy? Only because it has been forced on the author-publisher system without compensation. Has this taking been litigated? Does the government have a right to everything I write that flows from my research? For how long? My lifetime? Suppose 10 years from now I write a book that uses some of my results, does the government have a right to seize and publish it? If not then how do they get a right to my journal articles?

I fail to see where this right comes from. They have my research report, as specified by our contract, and that should be all they get. They should not get an open ended right to everything I write thereafter, at their choosing.

By David Wojick
Dec 24, 2011, 7:29 AM

To David Solomon, it does not surprise me to learn that agencies that do not publish their research reports do not require good reports. DOE does both and last year they saw over 30 million report downloads. That is public access to federally funded research.

By David Wojick
Dec 24, 2011, 2:12 PM

I’m not questioning the value of an archive. What I’m questioning is the need to have it all archived in one physical location. If a publisher accepts that papers from federally funded research must be free after a given embargo period, they would be much happier if the traffic generated by those free articles continued to contribute to their business, rather than having to cede all traffic to PubMed Central, which seems to desire all that traffic as a means of justifying its own existence.

Can you instead create a set of archiving requirements (XML with metadata, as you suggest) and have PubMed Central link only to the free version on the publisher’s site? Google seems quite capable of indexing material at more than one location. Are they that much more advanced than scientific researchers? Is it possible to mine the literature if it’s located at different URLs?

A few problems with the rest of your post, though: not all research is funded by the NIH, and not all researchers live in the US. Should foreign researchers who don’t pay US taxes be given free access to the research? The argument that free access is required for anything tax funded is also a bit absurd. My taxes pay for upkeep on the NYC subway system, but I must pay a fee to access it. Why aren’t subway rides free for everyone?

By David Crotty
Dec 22, 2011, 9:49 AM

In reference to your last paragraph, in my view whether the researcher is in the US or somewhere else is irrelevant. The point is that their research is funded with public money with the goal of fulfilling the objectives of the grant, i.e., conducting research in an attempt to answer the research questions. It seems reasonable to make the results generally available except where there is a good reason not to do so, for example defense research. I don’t think there is some absolute need or moral requirement to make it available; it just makes sense and is good public policy.

As (I assume) a New York City resident, your taxes go to subsidizing the subway system. You or anyone else including me if I am visiting NYC pays a fee to ride on the subway but it is far less (thank you) than it would be if you weren’t subsidizing the subway with your taxes and probably beyond what many people could afford to pay. The shared cost model in my view is good public policy.

I live in the Lansing, Michigan area, and this year I paid $320 in property taxes (I just paid them, and it is a line item) for the Capitol Area Transit Bus System, which I never use. If I (or you) were to ride it, I or you would pay a fare, but far less than if it were not subsidized. Furthermore, it probably wouldn’t be around, or would have a far more limited route system. A lot of people do use it, and many of them probably couldn’t afford a car, so I don’t mind paying it.

I don’t think your example supports the argument that free access to publicly funded research is absurd.

By David Solomon
Dec 22, 2011, 12:26 PM

I’m not making an argument stating that free access to research results (no matter how funded) is absurd. I’m arguing that an argument based on the funding nature of the research doesn’t logically work. If one argues, “I’m a taxpayer, my taxes paid for the research, hence it should be free,” that doesn’t really work, particularly because under Bayh-Dole, researchers are encouraged to keep the results of their federally funded research locked up and private in order to spur personal financial gain. There are many things taxes pay for that are not made freely available to all. There is much research that is not paid for with public funds. There are many accessing research who do not pay taxes in the US. I think there are much better arguments for opening science results, but “everything that’s paid for by the government should be free” doesn’t fly.

As for the subway, you are essentially correct, which is why it is a good analogy. Our taxes pay for part of the process of building, maintaining and running the subway, but not all of it. Hence, I have to pay a fee every time I access it in order to cover the rest of those costs. Research grants pay for part of the process of creating a published result, but not all of it. Hence, further fees are charged for access. If you or I had to pay for the entire research process, likely it would cost much more than the current subscription/pay-per-view fee.

By David Crotty
Dec 22, 2011, 5:49 PM

Sorry if this ends up in the wrong place in the blog. In some cases I can’t figure out where to click so that the reply appears under the comment it is referencing.

The Bayh-Dole Act addresses inventions that are developed from federally funded research and the patents from those inventions. It does not address research reports.

http://edocket.access.gpo.gov/cfr_2002/julqtr/37cfr401.1.htm

They are very different types of intellectual property governed by very different sets of laws.

I believe the basic rationale for the act was government owned patents were not being commercially licensed and put to use. Part of the problem was they were tied up in all sorts of government regulations. The Act was designed to make it easier for them to be commercialized and put into use.

The Act does allow universities and even private businesses to own the patents from inventions their employees created using federal research funds, but with stipulations. These include sharing the royalties with the actual inventor and using any remaining income for education and research. It seems a pretty reasonable compromise for that type of intellectual property and makes good use of publicly funded research.

Research reports are quite different from inventions. I don’t think today it is necessary as it apparently was with patents to give research reports to a private entity to allow the knowledge to be effectively disseminated and used. Also, with the Bayh-Dole Act there are at least some basic stipulations to help ensure the money the patents generated is at least put towards education and further research and shared with the individual who actually invented it.

I don’t think there is a moral imperative to archive and disseminate research findings at no cost through PubMed Central; it is just possible to do and makes practical sense. In my view it’s a good use of public funds. In the same vein, there is no moral imperative to use the subscription publishing model to disseminate the research findings.

Bayh-Dole sets the rules for how inventions coming out of federally funded research are used. It was felt that from a practical standpoint it was in the public interest to allow the inventor’s organization to keep the patent as long as they met certain stipulations in the legislation.

The Consolidated Appropriations Act, 2008 sets the rules for how NIH grantees can distribute research results.

“The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.”

As with Bayh-Dole, Congress felt this was the best way to put this type of federally funded research result to use. Like Bayh-Dole, it’s a compromise of competing interests, and one that in my view works.

Subscription publishers may be complaining a lot, but they are still making money and getting copyright for federally funded research reports when they invest very little of the total cost of creating them. Those like me who want to access those reports freely and easily, in the form the investigator turned them over to the publisher, can do so through PubMed Central but may have to wait a year to do so. Neither of us is getting exactly what we want, but we are getting something reasonable.

In my view it doesn’t matter whether or not a person accessing PubMed Central is a taxpayer. It doesn’t cost anything to give them access to PubMed Central, so why not? I have access to OA research funded, for example, by the EU, so why shouldn’t researchers in Europe and elsewhere have access to our federally funded research? There is no moral imperative to do it; it just makes sense. Everybody wins and science operates more efficiently.

This is the last I am going to say on this topic. It was interesting discussing this with all of you; can we just agree to disagree? Merry Christmas!!

By David Solomon
Dec 23, 2011, 11:04 AM

I would contest the 2008 law, up to the Supreme Court if necessary. The government already gets a research report, which it owns and can make public. The DOE report archive had 30 million downloads last year. I see no valid claim for getting peer reviewed journal articles as well, especially as this interferes with long established trade practices.

By David Wojick
Dec 24, 2011, 7:44 AM

To David Wojick. Go for it. Let the courts decide.

By David Solomon
Dec 24, 2011, 11:33 AM

Hear, hear. PubMed Central is very, very good for text mining, and an obvious and essential interest for the bioinformatics community. It is an utter victory that publishers are up against a wall when it comes to NIH funding — and I was under the impression that they’d recognized this as a public good.

By Alex Garnett
Nov 14, 2011, 12:50 PM

The question remains, if publishers gave the bioinformatics community the same level of access for text mining, would it make a difference having the material distributed rather than stored in one box?

By David Crotty
Nov 14, 2011, 12:57 PM

Not really, if it were in XML, contained the same metadata, used the same or an equivalent Document Type Definition, was linked to PubMed/other indexes, and was freely available.

By David Solomon
Dec 22, 2011, 11:30 AM

I think that would go a long way toward generating good will and getting publisher buy-in to PubMed Central. I’ve never understood the insistence on taking away traffic to journal sites, other than that it creates a way for PubMed Central to tout its own numbers.

By David Crotty
Dec 22, 2011, 11:42 AM

With regard to access to research data, we need to acknowledge that secrecy is just as important as transparency in the functioning of science. While openly sharing research data does, in some circumstances, benefit other scientists, it can undermine an important incentive that drives scientists to publish their results in the first place.

See Openness and Secrecy in Science — A Careful Balance

By Phil Davis
Nov 14, 2011, 7:35 AM

Discussions about the publication of the results of publicly funded research always stir in me the question “Who is entitled to benefit from that research?” Phil Davis’s comment implies that scientists need an incentive to publish (shouldn’t that be an obligation?) and are, perhaps, entitled to secrecy of their publicly funded results. David Crotty’s comment suggests entitlement to web traffic (which is reasonable in return for providing a copy, but questionable if it’s in return for providing the only copy: is the publisher entitled to a monopoly on the research results?).

My concern is greater when it comes to commercial exploitation of publicly funded research. As a profit-making business, am I entitled to build upon publicly funded research and pay nothing towards its cost? Is private research that is no more than a patentable tweak on top of publicly funded research enough to justify 100% of the value going to the tweaker, and none to the funder of the real research?

By Dave Pullin
Nov 14, 2011, 8:19 AM

Phil Davis is talking about data, not publications. Data is IP under present law. Proposals that require new laws go into a special basket.

By David Wojick
Nov 14, 2011, 9:35 AM

Please clarify “data is IP under present law.” How so? I’m aware that after the Feist decision in 1991, the AAP attempted to persuade Congress to pass legislation protecting databases, but the effort did not succeed.

By Sandy Thatcher
Nov 14, 2011, 11:21 AM

I am no expert on this but “rights in data” are a major element of federal contract law and regulation, so who owns what depends on the agency and the contract, but somebody always owns the data. It is valuable property. This is also true of data developed specifically under federal R&D contracts. Many science agencies have data sharing policies, but they are often advisory, not regulatory. Here is a brief summary, although it appears old: http://en.wikipedia.org/wiki/Data_sharing.

Researchers typically don’t have to share if the data is still in use, as it were. There was a famous case about 10 years ago in which the EPA proposed some air quality regulations based in part on a Harvard epidemiological study. Industry asked for the data and Harvard refused, on the grounds that it was still collecting data and would be for a long time to come. Congress then passed a law requiring that all data developed for regulatory purposes be publicly available, but so far as I know that is the only mandatory legal requirement of this sort.

By David Wojick
Nov 14, 2011, 1:41 PM

It’s worth drawing a distinction about which data you want to make available. For years, UK funding agencies have requested that the data resulting from a grant be made publicly available, but there didn’t seem to be much enforcement. A much more natural approach is to expect that the data associated with a published paper be made available; after all, most of it is usually visible in the graphs or tables. It’s also quite easy to identify which data are needed (the data underlying the results), and there’s a clear timeline for when they should be made available (on publication or after an embargo). Neither of these is true for the data resulting from a grant.

Since government agencies are keen for research data to be made available, they ought to ensure that the appropriate repositories exist. However, they seem to be running in the opposite direction: NCBI’s GenBank is wavering on whether to take in the exabytes of Next Generation Sequencing data becoming available, and bodies like the NSF are unable to make the long-term funding commitments needed to pay for other data archiving initiatives (even when these initiatives cost about as much as a single grant and would last for years).

By Tim Vines
Nov 14, 2011, 3:08 PM

Tim, the US science agencies have been paralyzed for several years by the growing budget crisis and political stalemate. They have been operating under so-called continuing resolutions, which preclude new initiatives.

But there is also the core RFI issue: how do we decide which data to create repositories for? The projects you refer to may be cheap, but there may be hundreds of others as well, some quite expensive. What is desperately needed is a selection policy, and cheapest first is not a viable policy. Data RFI responses need to focus on this selection issue. (I did staff work for the group that will be looking at these responses.)

By David Wojick
Nov 14, 2011, 4:23 PM

There seems to be a fundamental misunderstanding about the nature of publicly-funded goods and services, that they must somehow be made freely available to all, and that no one should be allowed to profit.

Defense contractors seem to have no problem profiting from government-funded projects, nor are the results of those projects freely available to all who ask. Federal small business loans don’t forbid a fledgling company from making a profit, in fact it is encouraged. As a taxpayer, I’m not offered free access to the goods and services offered by those companies. My taxes pay for upkeep on NYC’s bridges, tunnels and subways, yet I must pay a toll in order to access any of these.

The Bayh-Dole Act specifically gives researchers intellectual property rights over the fruits of their federally funded research. Many universities rely heavily on their patent portfolios for operating funds. Without such a provision, we’d have no Google, as its algorithms were developed at Stanford with NSF funding.

Reading the questions in these RFIs, there is clearly an agenda of creating new businesses and new markets, of using federal funding to drive the economy.

By David Crotty
Nov 14, 2011, 10:46 AM

I’m actually working on a post about one of the ideas you’ve raised. “Open” is viewed as an unalloyed good, but openness can also lead to paralysis and has other downsides, some of which we’ve seen lead to sad results.

Imagine if Apple, or even Google, had to be “open” about its innovations. Scientists are just as guarded about their big projects. Science is remarkably open, but even then, a certain number of curtains are needed so people can do things without fear of having their ideas ripped off, exposed, or thwarted. Having some ability to operate outside of the public eye is crucial to getting things done. Part of what has paralyzed our government over the past few years is that the Tea Party people and their ilk can’t be trusted to work things out behind the scenes for the benefit of their constituents. Everything is “open” and exposed, so nobody can bargain safely.

To your other point, if you really want the government to compete with private enterprise on innovations, I’d only caution you to be careful what you ask for. Others have tried to make all innovation state-owned. You can think of those examples yourself.

By Kent Anderson
Nov 14, 2011, 10:49 AM

Wikileaks should be mentioned here, teaching us once again that diplomacy and negotiation are often incompatible with openness.

By David Crotty
Nov 14, 2011, 10:52 AM

Clearly, the Manhattan Project was not “open” science. How would such a policy apply to projects like that?

By Sandy Thatcher
Nov 14, 2011, 11:18 AM

A public access policy certainly would not apply to so-called “classified” or otherwise restricted military data. It would probably not apply to DOD at all, as they tend to frown on sharing of military use data, classified or not.

By David Wojick
Nov 14, 2011, 1:46 PM

One small correction: Comments are indeed due on the Publications RFI by January 2, 2012, but they are not due on the Data RFI until January 12.
The Publications RFI is at http://federalregister.gov/a/2011-28623
The Data RFI is at http://federalregister.gov/a/2011-28621

By David Wojick
Nov 14, 2011, 10:00 AM

I’ve been told the 12th deadline is a typo that will shortly be corrected to January 2.

By David Crotty
Nov 14, 2011, 10:37 AM

You are probably right. If so it creates a mess because some folks will not see the correction. This FR notice has already traveled far and wide. Let the confusion begin.

By David Wojick
Nov 14, 2011, 11:15 AM

I already have a draft proposal for the Publication RFI on the Kitchen table:

http://scholarlykitchen.sspnet.org/2011/09/21/taxpayer-oa-is-already-here-in-principle-in-reports/

“Taxpayer access to US federally funded research results need not involve publishers giving away their product. An alternative mechanism is available, one that is already partially implemented. It is called the research report.

Demands for free access to taxpayer funded research results are in full cry. The focus is on journal publishers and their product. What is puzzling is that this access already exists in the US, in principle if not always in practice, and it does not involve the publishers. By law every federally funded research project is required to provide a detailed final report. Some science funding agencies make these reports freely available via the Web, but others do not. Making them all available would solve the access problem, without involving journal publishers.”

By David Wojick
Nov 14, 2011, 11:23 AM

Making final reports freely available would be a good and cheap way to let interested third parties see what that tranche of funding was spent on, but they’re not an adequate replacement for peer-reviewed articles when one wants to see the actual /results/. Since these reports are a hefty bureaucratic burden on the researcher and provide no other benefit, they’re typically cobbled together in a big rush at the end of the project. In addition, there’s no external evaluation to rein in unjustified speculation or faulty results.

By Tim Vines
Nov 14, 2011, 2:24 PM

It would be easy enough to improve them. I find the DOE reports, including my own, to be quite good. See http://www.osti.gov/bridge/. If other agencies are accepting bad research reports they ought not to be.

As for speculation I have a different take. This is often where the best ideas lie. In either case I regard articles and reports as mere notices, long abstracts if you like. If I see something useful I contact the researchers.

In any case, I am not saying the rigor of journal articles is unnecessary for science, but why is it necessary to fulfill the public access mandate? Part of the confusion is that this mandate has never been spelled out. This too should be the subject of RFI responses. What is the government supposed to be doing, and at what cost? The cost-benefit question is central; otherwise everyone wants all they can get for nothing. My argument is that final reports, properly supervised, are good enough for public access. No new channels need to be created.

By David Wojick
Nov 14, 2011, 4:37 PM

As expected, the deadlines for both of these RFIs have been extended to January 12. No excuses, make your voice heard!
http://www.whitehouse.gov/blog/2011/12/21/extended-deadline-public-access-and-digital-data-rfis

By David Crotty
Dec 21, 2011, 9:31 PM

Good news indeed! I suspect this extension is due in part to the fact that one of the RFIs originally had a Jan 12 due date by mistake, but Jan 2 never made any sense, coming as it does at the end of the holidays.

Start the new year with a new vision for federal access policy.

By David Wojick
Dec 22, 2011, 7:03 AM

The Scholarly Kitchen

David Crotty