While predicting the future is fraught with peril, there seems to be a fairly clear consensus about where the research community would like to see things go. Our efforts these days are focused on broadening access to the research literature and research results, and on improving their quality through better transparency and reproducibility. Academia is a notoriously conservative community, though, and has been slower to move in these directions than some would prefer. Rather than waiting decades for consensus, many research funders, governments, publishers and institutions have chosen instead to push things forward through policies that impose requirements on researchers.
All of these policies come with a cost, and one of the biggest costs comes in monitoring and enforcing compliance. A policy without teeth — without actual consequences for non-compliance — loses all effectiveness. We know that researchers are overburdened and short on time. Anything that they don’t have to do, they won’t do. When the NIH’s PubMed Central deposit policy was not actively enforced, compliance was poor. Now that it has been tied to grant renewal and receiving future grants, compliance has improved (although it is still not 100%, and is still heavily reliant upon the efforts of publishers on behalf of authors). MIT continues to struggle with public access and archiving policy compliance, as does Oregon State University.
Understanding the cost of compliance is vital to the effective design of the policy itself. The RCUK has a complex open access policy, with complicated reporting requirements. This complexity and the costs it generated were not predicted by policy makers; as a result, universities have been hit hard financially, with more money than expected being spent on paperwork and bureaucracy.
The NIH’s policy, in comparison, seems vastly simpler, at least in terms of the compliance burden. There are no required institutional reports to compile, and no spending records to track. Researchers are simply required to note the PMC ID on any paper they list as having resulted from NIH funding when they write up progress reports or apply for further funds. Even so, there are costs associated with NIH compliance. Aside from the costs of the PMC repository itself, a system that generates and keeps track of PMC ID numbers had to be built, and must be maintained to ensure that those numbers perpetually resolve to the right papers. Furthermore, when an applicant lists such a number, someone somewhere must actually check that it is an accurate number that corresponds with an actual paper. Quantity becomes a factor as well. The NIH is dealing with more than 50,000 applications per year and more than 60,000 ongoing grants. That’s a lot of numbers to check. Universities also spend varying amounts to track their researchers and make sure they are following the rules.
For journals, policies can range from the simple to the complex. The recently announced policies requiring ORCID iDs for article submissions are fairly inexpensive and straightforward, at least as far as compliance is concerned. More expense comes from the integration of the iD with the paper and the journal platform, but that’s a separate cost from tracking compliance. A required field asking for the author’s ORCID iD is added to the journal’s paper submission system. The researcher cannot proceed further with their submission until this field is filled, so no manual intervention is needed by the journal. Presumably the iD number is checked at some point before publication for accuracy, although this could be done at the copyediting stage for accepted articles, reducing the number of checks needed by dealing only with the smaller group of submissions that made it through peer review. Costs might be higher for a policy signatory journal like PLOS ONE should they choose to monitor compliance in this manner, given the large number of papers accepted and the absence of a copyediting stage. Adding a time-consuming extra step to the production workflow can be expensive, both in terms of paying for labor and in potential delays in publication.
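Part of the reason the basic accuracy check is cheap is that ORCID iDs carry a built-in check character, computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can be rejected automatically at submission. As a rough illustration (a sketch, not any journal's actual implementation), a submission system could screen the field like this:

```python
# Sketch: screening an ORCID iD at submission time.
# The final character of an ORCID iD is a checksum (ISO 7064 MOD 11-2),
# so malformed or mistyped iDs can be caught with no manual intervention.
import re

def is_plausible_orcid(orcid: str) -> bool:
    """Check the format and checksum of an ORCID iD string.

    This only confirms the iD is well formed; confirming it actually
    belongs to the submitting author still requires a lookup against
    the ORCID registry or an authenticated sign-in.
    """
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:          # first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - (total % 11)) % 11
    check = "X" if result == 10 else str(result)  # 10 is written as 'X'
    return digits[-1] == check

# ORCID's published sample iD passes; a one-digit typo fails.
print(is_plausible_orcid("0000-0002-1825-0097"))  # True
print(is_plausible_orcid("0000-0002-1825-0098"))  # False
```

A check like this catches typos for free; the remaining (and more expensive) compliance question is whether the iD corresponds to the right person, which is why most integrations rely on authenticated ORCID sign-in rather than a typed-in number.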
PLOS, however, seems to have taken something of a hands-off approach toward enforcing their data publication policy. The publisher made a big splash with the 2014 implementation of a policy requiring authors to make all data behind their articles publicly available. Later that year, Scholarly Kitchen alumnus Tim Vines and colleagues conducted several studies to see how well authors were complying, and found that many were not making the required data publicly available. PLOS’ response was that the responsibility for checking the availability of data had been passed along to the external peer reviewers, with seemingly little, if any, oversight happening at PLOS itself. Not surprisingly, a requirement without any enforcement was not taken seriously by authors.
In recent months, this lack of policy enforcement came to a head when researchers asked to see the data behind a controversial PLOS ONE paper on carcinogens in laboratory animal food. Despite repeated requests and promises that action was being taken, the data has still not been made publicly available and the paper remains in the journal with no indication of this apparent violation of a mandatory policy. (A similar issue has arisen with a different PLOS ONE paper, but one that was published before the data policy went into effect).
With no enforcement, is data archiving really a requirement or just a gentle suggestion? Without effective monitoring and enforcement, the policy becomes an empty promise. But how would a journal go about enforcing such a policy? One of the pioneers in data publication, GigaScience, requires authors to deposit their data in the journal’s own database, and assigns a peer reviewer to specifically review that data. No data, no paper.
But GigaScience has the advantage of being associated with a major genomics company, BGI, which provides subsidized server space for this repository. That may not be feasible for journals without a similar partner. Still, there are reputable repositories that could serve a curation and monitoring function here. Most journals won’t publish a DNA sequence without an accession number from GenBank or the like — proof that the sequence has been made publicly available in an established repository. Similar arrangements could be built into the article publication process for the entire data set via reputable public repositories.
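The accession-number model is attractive partly because it can be checked programmatically. As a hedged sketch of what such a production-workflow check might look like: a script could do a cheap local format screen on a cited accession, then confirm against NCBI's public E-utilities service that the record actually resolves. The regex below is a loose pattern covering common GenBank and RefSeq accession forms — an assumption for illustration, not NCBI's full accession grammar.

```python
# Sketch: checking that a GenBank accession cited in a manuscript
# plausibly exists. Format screening is local and free; the real test
# of availability is a query to NCBI's E-utilities service.
import re
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def looks_like_accession(acc: str) -> bool:
    """Cheap local screen before hitting the network.

    Loose pattern: GenBank-style (e.g. U12345, AF123456) or
    RefSeq-style (e.g. NM_000518.5) accessions, optional version.
    """
    pattern = r"[A-Z]{1,2}\d{5,6}(\.\d+)?|[A-Z]{2}_\d{6,9}(\.\d+)?"
    return re.fullmatch(pattern, acc) is not None

def lookup_url(acc: str, db: str = "nuccore") -> str:
    """Build the E-utilities URL whose response shows whether the
    record exists; production staff (or a script) would fetch this."""
    return f"{EUTILS}?{urlencode({'db': db, 'id': acc})}"

print(looks_like_accession("AF123456"))   # True
print(looks_like_accession("12345"))      # False
print(lookup_url("AF123456"))
```

The same two-step pattern — local sanity check, then a resolution query against the repository — could in principle be applied to DOIs minted by general-purpose data repositories, which is what makes building data deposit into the publication workflow plausible.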
This would only provide an indication that some data was available, and careful review would still be needed to determine whether it was complete and usable. Asking a peer reviewer to do this work may have variable results. GigaScience reports an enthusiastic response from their data reviewers who enjoyed the novelty of the requested task. Different fields may react differently though, and once the process becomes routine, finding a data reviewer may become even more difficult than finding a reviewer for the paper itself, an increasingly onerous task for journal editors.
And that doesn’t even touch on the notion of monitoring over time. Is it enough that the data are available upon publication? What happens in a year? Five years? Ten? Does an effective policy require permanent availability of the data, at least as long as the paper is made available?
All of which means more costs: costs to store the data and serve the data (funding for several key data resources is already in jeopardy), and costs to curate and monitor it. Costs are incurred by the journal when it spends the time to check each data set and confirm that it is available and accurate. The costs of customer service — time spent explaining the policy to authors and responding to complaints when data are found to be missing or inadequate — must also be factored in, as must the cost of monitoring the availability of those same data sets over time.
It has been proposed that journals are potentially the key vehicle for improving transparency and reproducibility of research results. This is not an insurmountable task, and indeed is a service that journals and publishers could provide. But it is not something that will just happen, and it’s not something that can effectively be done for free. PLOS’ hands-off policy has proven this. So who will pay for it? We know that library budgets are flat, if not declining. Institutions continuously complain about subscription prices for journals and author charges for open access articles.
If it’s worth doing, then it’s worth doing right. Does the research community value data publication enough to actually pay for it?