We’ve quickly come to understand that reproducibility is a problem within the scientific literature. Companies and scientists are having trouble replicating results reported in journals. This problem has the potential to truly upend the scholarly publishing world, no matter what business model is being used. If trust in peer review, journals, and articles falls below a certain point, the emperor will have no clothes.
The mainstream media is picking up on this problem, as a recent story in the New York Times demonstrates. In a tale of woe from the infectious disease community, an editorial team came to the conclusion:
. . . that science had turned into a winner-take-all game with perverse incentives that lead scientists to cut corners and, in some cases, commit acts of misconduct.
Given the incentives being baked into the system — from payments for publication in high-impact journals, to h-index calculations being used as convenient substitutes for more thoughtful and qualitative evaluation techniques, to the reliance on soft money in academia — this situation is not surprising. The question is, How far is too far? Or, more pointedly, Is “too far” already in the rear-view mirror?
With retractions on the rise (see figure below, from the New York Times), and a perception that more is not better when it comes to published papers, some people are attempting to develop a diagnostic test, if not a potential remedy.
One group is taking the psychology literature to task, according to a recent Chronicle of Higher Education story:
A group of researchers have already begun what they’ve dubbed the Reproducibility Project, which aims to replicate every study from those three journals for that one year. The project is part of Open Science Framework, a group interested in scientific values, and its stated mission is to “estimate the reproducibility of a sample of studies from the scientific literature.” This is a more polite way of saying “We want to see how much of what gets published turns out to be bunk.”
The approach has two interesting aspects: whether the original study procedures can be reproduced and, if so, whether the claimed results can be achieved.
This isn’t the only initiative in the field of psychology. The other, PsychFileDrawer.org, has similar ambitions. While the Reproducibility Project is truly nascent, PsychFileDrawer.org has had contributors attempt to reproduce nine studies — so far, only 33% have been reproduced.
Replication isn’t something the current system gives scientists sufficient incentives to pursue: it’s time-consuming, and it generates papers unlikely to be published in high-profile journals. Even mega-journals are unlikely to receive these papers, for the first reason alone — the opportunity cost of conducting replications is simply too large for scientists to do the work.
Replication is also nearly impossible for some trials. The psychology literature lends itself to this kind of scrutiny because many psychology experiments involve a relatively small population performing tasks in a manageable timeframe. Longitudinal, invasive, or extremely technical studies aren’t as amenable to replication. Some studies — especially medical studies involving thousands of patients over decades — can never be replicated.
Scientific claims should be validated and replicated whenever possible. New incentives for such seemingly mundane but ultimately important work should be developed. We may need fewer new theories and more validation steps.
But science is not one thing — it consists of many domains, each different, each particular. Some claims will be harder to replicate, some easier. Some domains will be amenable to open source replication initiatives, while many won’t be. Statistics, careful review, reader-focused thinking, and clear editing matter even more in these fields, where mistakes can have lasting effects — because doing it over simply isn’t an option.