In the “chicken little” world of scholarly publishing, reeling from crisis to crisis is business as usual. We seem averse to the concept of steady, incremental improvement and must instead face constant impending doom. The serials crisis has become a permanent fixture in our culture, and we remain in the throes of the access crisis and the (at least perceived) peer review crisis. The data access crisis is just coming over the horizon, and it’s going to be a doozy. The reproducibility crisis and the negative results crisis are both coming into their prime, and represent an interesting conflict. While both chase similar goals–increased transparency, increased efficiency and trust in the literature–their proposed solutions seem at odds with one another.
The reproducibility crisis is based on reports suggesting that the majority of published experiments are not reproducible. One of the particular concerns is that researchers will waste time and effort trying to build on results that are not true:
There is, however, another group whose careers we should consider: graduate students and postdocs who may try to build on published work only to find that the original results don’t stand up. Publication of non-replicable findings leads to enormous waste in science and demoralization of the next generation. One reason why I take reproducibility initiatives seriously is because I’ve seen too many young people demoralized after finding that the exciting effect they want to investigate is actually an illusion.
The proposed solution to the crisis is that time, effort and funds be put toward repeating published studies, and that some sort of career credit be offered for doing so. It is unclear who will provide those funds, and whether using them for replication means they won’t be spent on new experiments, slowing progress. Further, if one assumes that the most talented researchers will be those pushing the envelope, then how much should we trust the skills of those whose careers are based on the unoriginal work of repeating experiments? More importantly, those who advocate for creating new publication outlets for replication and validation studies seem to ignore that science has both built right into the process.
The negative results crisis (also known as the “file drawer problem”) comes from the notion of publication bias, the idea that researchers never publish experiments that don’t work or that provide null results. Again, one of the concerns is that when one researcher hides this sort of work away, other researchers may waste their time and efforts:
He and others note that the bias against null studies can waste time and money when researchers devise new studies replicating strategies already found to be ineffective.
The proposed solution is the creation of a registry for all data generated by all experiments, as well as, “Creating high-status publication outlets for these [null] studies.” Problems arise with this solution as well–is writing up a failed experiment a valuable use of a researcher’s precious time? How willing are researchers to publicly display their failures? How much career credit should be granted for doing experiments that didn’t work?
While both sets of solutions are aimed at greater transparency and time savings, the contrast seems obvious. One says: don’t trust published positive results, so spend time and money to repeat them. The other says: trust unpublished negative results, and don’t waste your time repeating them.
There’s no easy solution here. Both approaches suffer from a similar problem–negative results, experiments that don’t work and failed replications are very difficult to interpret. Did the experiment fail to work because the theory behind it was truly incorrect, or was the methodology flawed? Or was the theory right, and the methodology sound, but the researcher just messed up and missed a decimal point or made up a vital solution incorrectly?
Similarly, if you are able to reproduce someone else’s results, that’s meaningful, but if you are unable to repeat their experiment, it’s hard to know what that means. Was their theory/methodology wrong or are you just not as good at the bench as they were? Cell biology offers myriad examples of the complexity of research specimens and techniques, and the almost absurd level of detail required for proper troubleshooting. Who should we trust?
I suspect that we may just have to accept the notion that research requires some level of redundancy. If we create a repository of failed experiments and no one will risk performing (or funding) a new attempt where others have failed in the past, then we may block important discoveries from happening. Suppose some researcher proposes that compound X will cure disease Y, but unknowingly uses a contaminated sample of X in the experiments and the cure fails. Do we prevent anyone else from testing that hypothesis and miss out on the potential cure? Given the state of research funding, how likely is it that funding agencies are going to offer grants for a project that has already been declared a failure? Is it better to let people keep plugging away at theories that seem sound, even if it does mean some redundancy and waste?
As for reproducibility, should it be a separate act required for confirmation, or is it simply a normal part of doing the next experiment? If you’re going to spend the next few years of your life building on someone else’s results, doesn’t it make sense to be certain things work the same way in your hands? I know that when I was a graduate student, I had to go back and resequence areas of published genes to find out whether the oddities I was seeing were my own errors or the original author’s. But overall, are we better off testing the validity of research results by performing new experiments to test the conclusions they offer (if X is true, then Y should happen)? That seems a way to combine progress with confirmation, rather than holding in place until everything is double checked.
There’s probably a spectrum of right answers for resolving both of these crises. Both proposed solutions here increase researcher workload, taking valuable time away from doing new experiments. The gains in accuracy must be carefully weighed against the losses in progress. Some results, particularly those which will directly impact health and care for humans, may require a higher level of redundancy than less vital research areas, and warrant the extra time and expense. Overall though, scholarly inquiry is fueled by skepticism. It’s perfectly reasonable to doubt anyone else’s results, even your own results. That may require doing the same thing a few times. The question is how to make best use of those redundancies.