We tend to think of research as either reproducible and thus valid, or irreproducible and thus questionable. This sort of binary thinking is problematic, because there’s a large body of research that’s entirely accurate but not easily reproduced. Do we need a new term for results that fall into this in-between zone?
At the recent STM Annual Meeting in Washington, Moshe Pritsker, founder and CEO of the Journal of Visualized Experiments (JoVE), gave a talk about the gaping hole in current efforts to drive scientific reproducibility. Enormous amounts of effort, money, and regulation have been put toward opening up the data behind published experiments. But very little attention seems to have been directed toward the protocols and methodologies used to collect those data.
While I’ve made this argument before, it bears repeating: if I want to reproduce your experiment (or check whether your conclusions are valid), then access to your data is only part of the puzzle. Your data may accurately support your claims, but if you performed your experiment in a biased or poorly conceived manner, I may not be able to see that from the data alone. I need to know how you gathered it. And if I want to reproduce your experiment, I need to know exactly how you did it.
While there is great promise in the reuse of data, there is likely just as much, if not more, progress to be gained from the public release of detailed experimental methodologies. Many experiments are designed to answer specific questions under specific conditions. The data generated may not be of much use beyond answering those questions, but the methods used to generate the data can be broadly adapted to ask new ones. When those methods go unpublished, this failure of knowledge transfer hinders scientific progress.
Journals can greatly improve the reproducibility of research by requiring methodological transparency. The print paradigm of journal publishing led us to poor practices in an attempt to save space and reduce the number of printed pages. When an article had to be cut down to meet an assigned page or word limit, the detailed methods section was usually the first thing to go. In a digital era where journals are doing away with page limits, why not add back in this vital information? For a journal that still exists in print, why not require detailed methodologies in the supplementary material? If you have a policy requiring public posting of the data behind the experiments, why not a similar policy for the methods? To their credit, Nature has expanded their methods sections and Cell Press has implemented STAR Methods, doing away with page limits to create methods sections that are actually useful.
But even with openly available methodologies, we still need to recognize that science is hard. Some research results stem from once-in-a-lifetime events, like a particular storm or celestial occurrence. A hurricane can’t be replicated.
Often, a bench technique takes years to perfect, and even then, some things can only be done by the most skilled practitioners. This can lead to scientific results that are entirely accurate yet very difficult to reproduce. An inability to replicate an experiment may tell us more about the technical skills of the replicator than about the validity of the original work. Maybe you can’t reproduce my experiment because you’re not very good at a particular complicated technique that I spent much of my career mastering. Does this mean that my work should be labeled “invalid”? Mina Bissell of the Lawrence Berkeley National Laboratory puts it succinctly:
People trying to repeat others’ research often do not have the time, funding or resources to gain the same expertise with the experimental protocol as the original authors, who were perhaps operating under a multi-year federal grant and aiming for a high-profile publication. If a researcher spends six months, say, trying to replicate such work and reports that it is irreproducible, that can deter other scientists from pursuing a promising line of research, jeopardize the original scientists’ chances of obtaining funding to continue it themselves, and potentially damage their reputations.
This is why claims like the oft-cited one from Amgen are so vexing. Any time this “study” is mentioned, it must be made clear that it was not an actual study, but a commentary published with no supporting data whatsoever. Amgen claimed to have tried to reproduce 53 landmark studies and to have made only 6 of them work. To this day, despite calls for further information, the company has made public data from only 3 of those 53 claimed experiments.
We have no idea how hard Amgen really tried to reproduce these papers. How many people worked on each one? What were their qualifications, and how much time did they spend troubleshooting each experiment and mastering every technique involved? Did they do the dogged detective work of researchers like those in Bissell’s lab, spending an entire year tracking down the one minuscule methodological variance upon which reproducibility rested? Are their claims of irreproducibility themselves reproducible in any way?
While we can reduce the number of experiments that fall into this “too hard for me to reproduce” category by making detailed research protocols available, we still need to tread lightly around results that genuinely require such expertise. It is always preferable to do a new experiment that tests the original experiment’s conclusions, rather than just trying to replicate it, and that may be the best way to judge whether an experiment is valid: do its conclusions stand up to further experimentation? The benefit of this approach to reproducibility is that it offers the potential to uncover new knowledge, rather than merely repeating something already known.
At the very least, we need a new term for these works that are essentially irreproducible but not invalid. Any suggestions for a neutral term, one that notes the difficulty of reproduction without denigrating the validity of the work in question, would be welcome.