We’ve quickly come to understand that reproducibility is a problem within the scientific literature. Companies and scientists are having trouble replicating results reported in journals. This problem has the potential to truly upend the scholarly publishing world, no matter what business model is being used. If trust in peer-review, journals, and articles falls beyond a certain point, the emperor will have no clothes.
The mainstream media is picking up on this problem, as a recent story in the New York Times demonstrates. In a tale of woe from the infectious disease community, an editorial team came to the conclusion:
. . . that science had turned into a winner-take-all game with perverse incentives that lead scientists to cut corners and, in some cases, commit acts of misconduct.
Given the incentives being baked into the system — from payments for publication in high-impact journals, to h-index calculations being used as convenient substitutes for more thoughtful and qualitative evaluation techniques, to the reliance on soft money in academia — this situation is not surprising. The question is, How far is too far? Or, more pointedly, Is “too far” already in the rear-view mirror?
With retractions on the rise (see figure below, from the New York Times), and a perception that more is not better when it comes to published papers, some people are attempting to develop a diagnostic test, if not a potential remedy.
One group is taking the psychology literature to task, according to a recent Chronicle of Higher Education story:
A group of researchers have already begun what they’ve dubbed the Reproducibility Project, which aims to replicate every study from those three journals for that one year. The project is part of Open Science Framework, a group interested in scientific values, and its stated mission is to “estimate the reproducibility of a sample of studies from the scientific literature.” This is a more polite way of saying “We want to see how much of what gets published turns out to be bunk.”
The approach has two interesting aspects: whether the original study procedures can be reproduced and, if so, whether the claimed results can be achieved.
This isn’t the only initiative in the field of psychology. Another, PsychFileDrawer.org, has similar ambitions. While the Reproducibility Project is truly nascent, PsychFileDrawer.org has had contributors attempt to reproduce nine studies; so far, only 33% have been successfully reproduced.
Replication isn’t something the current system has sufficient incentives to encourage: it’s time-consuming, and it generates papers unlikely to be published in high-profile journals. Even mega-journals are unlikely to receive such papers, for the first reason alone. The opportunity cost of the time involved is simply too large for most scientists to do the work.
Replication is also nearly impossible for some trials. The psychology literature is amenable to this kind of effort because many psychology experiments involve a relatively small population performing tasks within a manageable timeframe. Longitudinal, invasive, or extremely technical studies aren’t as open to replication. Some studies, especially medical studies involving thousands of patients over decades, can never be replicated.
Scientific claims should be validated and replicated whenever possible, and new incentives for such seemingly mundane but ultimately important work should be developed. We may need fewer new theories and more validation steps.
But science is not one thing — it consists of many domains, each different, each particular. Some claims will be harder to replicate, some easier. Some domains will be amenable to open source replication initiatives, while many won’t be. Statistics, careful review, reader-focused thinking, and clear editing matter even more in these fields, where mistakes can have lasting effects — because doing it over simply isn’t an option.
Discussion
6 Thoughts on "Reproducibility — An Attempt to Test the Psychology Literature Underscores a Growing Fault Line"
And might not some experiments, like Stanley Milgram’s obedience experiment at Yale, become so well known that participants in future replications would be unlikely to be fooled into believing they were actually torturing the “victims”?
From http://www.lhup.edu/~DSIMANEK/cargocul.htm
“I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person–to do it under condition X to see if she could also get result A, and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control.

“She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens.”
Adapted from Richard Feynman’s Caltech commencement address, 1974