If you pay any attention at all to scholarly publishing, you’re likely aware of the current uproar over PLOS’ recent announcement requiring all article authors to make their data publicly available. This is a bold, forward-looking policy from PLOS. It may, for many reasons, have come too early to be effective, but ultimately, that may not be the point.
Make no mistake, data availability is an important new frontier in scholarly research. Last year’s White House Office of Science and Technology Policy (OSTP) memo on public access to research results had two separate objectives: access to papers resulting from funded research and access to data resulting from funded research.
The OSTP is not just talking about the data used in published research papers. It’s talking about the entire dataset from the funded research. In the memo’s words, “data” is defined as:
…the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications…
Once this policy goes into effect, PLOS’ requirements would seem to be an afterthought for authors funded in this manner. The problem is that the OSTP policy seems nowhere near implementation, and an enormous number of questions remain about how it will work and whether it will work at all. For example, no one is sure where all of this data will be stored or how to pay for the effort and services required. Since patient data is subject to strict privacy requirements, it’s unclear how it will be handled. One must also wonder how any such policy can be monitored and enforced: if I don’t show you my data, how do you know it exists?
Journal publishers come into the picture because under US law (the Bayh-Dole Act), the intellectual property (IP) generated as the result of federal research funding belongs to the researcher and their institution. This creates a loophole for researchers required to comply with the OSTP memo. US funding agencies can request that researchers make their data publicly available, but it is unclear whether they can require researchers to do so without violating IP law. The OSTP memo specifically requires that any resulting procedures “recognize proprietary interests, business confidential information, and intellectual property rights.” Similarly, the intellectual property policies of groups like RCUK and Wellcome leave everything in the hands of the researchers and their institutions.
Journals, however, are not under these same IP restrictions. Consider data repositories and databases that are already great successes, GenBank for example. GenBank is the NCBI’s genetic sequence database, an annotated collection of all publicly available DNA sequences. It has been an enormous success, not because funding agencies require deposit of DNA sequence data, but because deposit is standard practice in the community, a practice enforced by journals that require it as a condition of publication. This success is something for which journal publishers do not receive appropriate credit (Kent, perhaps this is worth adding to your list of services journals provide).
Take a look at the instructions for authors of almost any genetics, genomics or bioinformatics journal, or, these days, even many general biology journals. Most, if not all, contain language like this, from the Nature journals:
For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper. Examples of appropriate public repositories are listed below.
Another example, from Genome Research:
Genome Research requires that data from a publication be easily available to the broader community in publicly held databases when available…
If funding agencies hope to get traction for data release policies, then these types of requirements represent the future. PLOS has roots in the genomics and computational biology communities, and the practices and attitudes of these fields are clearly ingrained in much of what PLOS does. These bioinformatic influences may have made it seem straightforward to extend the policies for sequence data across all of science, but as we’ve seen when the biomedical world has tried to impose its vision of open access on humanities researchers, one size does not fit all. Assuming that the way your field works is universal usually leads to flawed approaches and unhappy researchers.
Not all research is the same, and not all research data are created equal. There are clear cases where data are easily archived and reused, and that reuse has been successful in driving new experiments. But there are also data that are not quite so easy to handle, and that were generated to ask a very specific question under very specific circumstances. Those data may not be so reusable.
Similarly, some types of data lend themselves easily to standard practices and file formats; others do not. Without standards, every piece of data made available will be different, and the resulting chaos may make generating new data easier than sorting through archived data. Many other issues have been raised across the blogosphere: whether this puts researchers in low- and middle-income nations at risk, what peer reviewers are supposed to do with raw data (when most don’t even look at supplemental material), and whether it’s fair to ask researchers to give up data they intend to keep exploiting in further experiments, at the risk of being “scooped” by others.
Perhaps the biggest practical problem with PLOS’ policy is that it puts an additional burden of time and effort on already time-short, overburdened researchers. I think I say this in nearly every post I write for the Scholarly Kitchen, but it bears repeating: time is a researcher’s most precious commodity. Researchers will almost always follow the path of least resistance and avoid anything that takes them away from their research.
When depositing NIH-funded papers in PubMed Central was voluntary, only 3.8% of eligible papers were deposited, not because people didn’t want to improve access to their results, but because it wasn’t required and took time and effort away from experiments. Even now, with PubMed Central deposit mandatory, only 20% of what’s deposited comes from authors. The majority of papers come from journals depositing on behalf of authors (something else for which publishers get little credit; Kent, one more for your list). Without publishers automating the process on authors’ behalf, compliance would likely be vastly lower. Lightening the researcher’s burden in this manner has become a competitive advantage for the journals that offer this service.
But with its new policy, PLOS is doing just the opposite and putting its own journals at a disadvantage. If publishing in a PLOS journal requires you to spend weeks of additional work organizing your data into a reusable (or at least recognizable) form, adds the potential expense of hosting and serving that data, or requires time and effort to find a suitable repository and upload the data to it, then why not publish the same paper in a different journal and avoid those costs and time sinks?
Because data requirements are not uniform across journals, PLOS has put itself at a disadvantage in attracting authors: other journals offer an easier path. If strictly enforced, this new policy is likely to result in a drop in submissions to PLOS journals. While no other mega-journal has been able to shake PLOS ONE’s hold on the market, this policy may give competitors an opening to gain on PLOS ONE and even overtake it.
So why take that risk? Why create this policy now? Only those at PLOS know for sure, but from the outside, this can’t be seen as anything other than a not-for-profit publisher putting mission above business concerns. PLOS has never been a risk-averse organization, and this policy fits well with its ethos of championing access and openness as keys to scientific progress. Even if one suspects this policy is premature and too blunt an instrument, one still has to respect PLOS for remaining true to its stated goals.
The policy seems deliberately provocative, a strategy that has worked well in the past for driving change in scholarly publishing. Remember that a key moment in the modern open access movement was a controversial boycott threat: the Public Library of Science open letter, whose signatories pledged to stop publishing in journals that did not make their articles freely available. While that boycott never materialized, it did start the ball rolling and led, among other things, to the founding of PLOS. This policy may be meant as a similar opening salvo: not necessarily the final step in the process, but a way to serve notice that change is on the horizon, to drive the conversation and, eventually, progress.
As noted above, the culture of computational biology and bioinformatics remains a strong influence at PLOS, and most scientific communities tend to picture their own practices as the norm. It’s also worth remembering that PLOS is based in San Francisco, close to the heart of a culture that is building entire industries around gathering and analyzing data, preferably data created by others and made freely available for reuse and economic gain. Both of these strong cultural influences may be helping to drive the policy.
Time will tell whether PLOS has acted prematurely with this policy or is ahead of the curve (as has often been the case). This is a particularly bold risk for PLOS. In the past, its experiments have offered some new benefit to authors, broadening access to their papers or streamlining the peer review process for more rapid publication. Here, PLOS is putting a burden on authors for the potential benefit of others. It’s unclear whether researchers will respond as they have in the past now that they are the ones being asked to make sacrifices.
Regardless, PLOS’ willingness to take such bold risks and court controversy continues to make it a tremendously valuable part of the scholarly publishing landscape, and points to the crucial role played by university presses and not-for-profit publishers who can put mission ahead of margin. Even if this policy falls short, it is certainly bringing a lot of attention and thought to the questions that need to be answered if data availability is to become the norm. The policy itself may fail, but it is likely to open the door to better policies in the future.