Last month, President Obama showed off his dad-joke skills while announcing the appointment of the first US Chief Data Scientist. The focus of much of the White House’s messaging around this appointment has been on making the government’s own data publicly available. In his ‘memo to the American people’, however, Dr. D.J. Patil talked about acting as a conduit between government, academia and industry. In some ways, this latest move can be seen as a continuation of a US government push toward open data that mirrors efforts in Europe and elsewhere.
For a long time, there has been an expectation that researchers share data upon request with other academics, but more recently the trend has been towards making data widely and publicly available. In February 2013, the White House Office of Science and Technology Policy released a memorandum on Expanding Public Access to the Results of Federally Funded Research. While the 2013 statement got a lot of people’s attention, funding agencies have been moving towards open data for over a decade. In 2003, the NIH announced a data sharing policy in which they stated:
Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. To facilitate data sharing, investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible. [Emphasis theirs]
A similar Proposal and Award Policies and Procedures Guide (PAPP) came out of the NSF in 2007. Both of the largest governmental scientific funders in the US require researchers to describe how they intend both to manage their data and to make it available to those who wish to build upon it. In other words, the US government fully intends that data sharing be a requirement for receiving federal funding.
Interest in data sharing isn’t just restricted to the sciences. In the humanities, the NEH also requires grant applicants to submit a similar data management plan. There is also an Office of Digital Humanities within the NEH that focuses on harnessing new technology. Its activities include educating researchers in data curation, the agency’s policies, and best practices for digital archiving.
While these policies and initiatives are clearly intended to show government support for data sharing, many in the data science world say that they don’t go far enough. There’s an argument to be made that merely requiring a plan in a grant doesn’t necessarily mean that data will be shared. After all, sharing data requires adding an extra step to the workflow of researchers who are already pressed for time. When the NIH began threatening that grants would not be renewed for those who failed to comply with their green OA policy, compliance jumped from 19% to 49%. Similarly, a truly effective data sharing policy may have to have consequences for noncompliance.
Internationally, the UK seems to be leading the field in terms of open data mandates. According to SHERPA/JULIET (which is jointly funded by JISC and RLUK), over a quarter of all UK-based funders now have data archiving policies in place, including the Wellcome Trust, MRC, BBSRC and, most aggressively, EPSRC. EPSRC’s policy is important because it’s the first to cross the line from a statement of policy to a mandate with teeth. The policy comes into full force in May 2015 and EPSRC promises to:
…investigate non-compliance; if it appears that proper sharing of research data is being obstructed EPSRC reserves the right to impose appropriate sanctions.
Given that the new EPSRC policy is based on RCUK guidelines, it seems likely that if the policy is successful, other research councils will follow suit.
That’s not to say that other governments are not being aggressive on this; everybody from the Canadian government to the Austrian Science Fund has policies in place. Many private funders are also getting involved; the Bill and Melinda Gates Foundation claims to have the world’s strongest policy on open access, which includes a requirement for open data. With the appointment of a Chief Data Scientist in the US, the new EPSRC policy, and the ever-quickening pace of mandates, it looks like we may be at a tipping point for open data.
Why has it taken until now for us to reach this point? One thing that has held the open data movement back in recent years is the concern that, while increased transparency is widely accepted to be good for science generally, sharing data might carry career risks for individual researchers. These concerns have been articulated very well in The Kitchen previously and include the lack of citability of data, fear of getting scooped, and the desire to get proper credit for work done.
For some time now, some researchers, like the ones in this article in Science magazine, have actively advocated for data sharing. These pioneers of open data claim that while there are risks, on balance, the benefits outweigh them. According to the Knowledge Exchange report Sowing the seed: Incentives and motivations for sharing research data, a researcher’s perspective, which is based on interviews with academics, many researchers see data sharing as an important strategy to make their research and research group more visible. More quantitatively, Piwowar et al. found a 69% increase in citations for microarray cancer clinical trial data when the data was made freely available. In order to explore these issues further, Digital Science recently organized the first in a series of open data spotlight events for researchers. Nicko Goncharoff wrote a summary of the event in Research Information. One theme of the meeting was the need for a shift in the way that we value research output to give greater credit to data. On the other hand, there was also a lot of talk about the benefit to science of sharing data, and the ways in which it can benefit researchers directly, often in unexpected ways, with other researchers applying techniques and ideas that the originating lab hadn’t imagined.
Last November, Alice Meadows wrote an excellent post based on Wiley’s data sharing survey of some 90,000 researchers in which she noted that a significant number are concerned that giving away their data might cause them to be scooped, lead to them not getting adequate recognition, or result in their work being undermined. On the other hand, in that same post, Alice noted that 53% of researchers globally now do share their data. We’ve been hearing for a while now about the theoretical risks and benefits of data sharing, but the proof of the pudding, as they say, is in the eating. With just over half of researchers sharing data, either because they find it to be beneficial or because their funders asked them to, it again looks like we’re at a tipping point.
But what about the minority of researchers who aren’t ready to share data? If we’re not simply going to ride roughshod over the understandable concerns that some researchers have, or end up with a significant minority of researchers who refuse to comply with data sharing mandates, we’re going to have to make sure that those concerns are addressed. Returning to the appointment of the White House Chief Data Scientist, part of Patil’s new job is to work with agencies to define best practices for data sharing. In his memo to the American people, he expressed a desire to…
position ourselves for the next wave of innovation and … for everyone to benefit holistically, and I want to emphasize that everyone benefit holistically
Perhaps part of the thinking here is that if the US is to avoid falling behind in the race towards open data, the government and the funding agencies must shape their mandates in such a way as to mitigate the concerns of individual researchers, maximize the benefits, and apply adequate consequences for noncompliance. It will be interesting to see what comes out of this during the next year or so, but one thing’s for sure: the growing need to support open data isn’t going away anytime soon.