Access to data is the new frontier for scientific researchers. Governments, funding agencies and journals are increasingly calling for researchers to provide public access to their data. More access to more information is a good thing, and over time, the current controversies over these policies (embargo periods, patient confidentiality, archive costs and maintenance, etc.) will be worked out, and releasing at least some portion of one’s data, post-publication, will become just another part of the research process. While one goal of these policies is to increase verification and reproducibility of research, success on that front will vary widely depending on the data and the types of experiments performed.
Some types of data and some areas of research lend themselves readily to reuse and rapid verification. It’s no surprise that computational research, and areas like bioinformatics, are leading the charge for data access. If your experiment consists of running numerical data through an algorithm, then releasing your data and your code allows others to quickly verify that you’ve done what you’ve said you’ve done. But when it comes to other types of research, wet-bench experiments or observational work for example, reproduction is not quite so simple.
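To make that concrete, verification in the computational case can be as simple as re-running the released pipeline and checking the output. The sketch below is only an illustration of that workflow: the file name, the column name, the stand-in analysis function, the "published" value and the tolerance are all hypothetical, not drawn from any actual study.

```python
# Minimal sketch of computational verification: re-run released data
# through released code and compare against the published result.
# File name, analysis function, published value and tolerance are
# all hypothetical placeholders.

import csv
import statistics


def analyze(values):
    """Stand-in for the authors' released analysis code."""
    return statistics.mean(values)


def main():
    # Load the released data set (hypothetical file and column).
    with open("released_data.csv", newline="") as f:
        values = [float(row["measurement"]) for row in csv.DictReader(f)]

    result = analyze(values)
    published = 42.7  # value reported in the (hypothetical) paper

    # Verification succeeds if we match the published figure
    # within a small numerical tolerance.
    assert abs(result - published) < 1e-6, (
        f"Could not reproduce published value: got {result}, expected {published}"
    )
    print(f"Reproduced published result: {result}")


if __name__ == "__main__":
    main()
```

A few minutes and a single script, and the verification is done. Nothing remotely like this exists for an experiment whose "inputs" include living tissue, reagent lots and the hands of the person at the bench.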
You’re probably familiar with the attention-grabbing articles from Amgen and Bayer claiming that the majority of research results are irreproducible. The fact that drug development companies, often working at a fast pace and under economic pressure, could not reproduce the findings of complex experiments should perhaps not be surprising. Research, particularly cancer research, is incredibly complex, and one must contend with a near-infinite number of variables. Scientists spend years developing and mastering the difficult and esoteric techniques necessary for their experiments, and the slightest environmental or methodological variance can produce very different results.
In a recent article in Cell Reports, Lawrence Berkeley National Laboratory breast cancer researcher Mina Bissell and colleagues offer a cautionary tale about how difficult reproducibility can be. Bissell’s lab set up a collaboration with a group in Boston. The groups were working with breast tissues, and as part of their experiments, these tissues were broken down and the resulting cells sorted by type using a method called fluorescence-activated cell sorting (FACS). But the two labs, working with the same tissues and the same protocols, could not get similar data sets from their FACS techniques no matter what they tried. After a year of painstakingly breaking down every aspect of the process, it was discovered that one small mechanical detail of the technique, essentially how rapidly the tissues were stirred during breakdown into cells, was the culprit.
Bissell and her collaborators deserve an enormous amount of credit for their doggedness and attention to detail. Many researchers would have given up at some point, and I suspect that’s more than likely what happens when a company like Amgen or Bayer encounters a problem like this. In the Amgen article cited above, they tried to reproduce 53 different studies over a 10-year period. While the details are not given, it seems unlikely that Amgen devoted the time and resources of entire research groups to spend a year focused solely on the minutiae of each of those 53 experiments. But that’s what it may take to reproduce some results.
A former colleague told the story of his laboratory and several others that relied on a common tissue culture technique but couldn’t reproduce one another’s findings. Again, after nearly a year of delving into the details, it was discovered that the collagen they had all ordered from the same supplier varied enough from lot to lot to fundamentally change the results seen. I know of at least two laboratories that moved to new universities and couldn’t reproduce their previous results; both, after much perseverance, found that minute ionic differences in the local water supply altered their solutions. I’ve seen laboratories where the cells in dishes kept near the top of the incubator differed from those at the bottom because of vibration from an internal fan.
This is the life of the cell biologist, and working with cancer cells adds further complexity. A cancer cell researcher recently offered the following, suggesting that varying results are the norm, not an anomaly:
In the specific context of cancer cell biology, there is an additional factor that needs more thought and discussion as far as reproducibility of experimental results. Many investigators strongly favor the use of human cancer cell lines, but most of the common cancer cell lines have chromosomal abnormalities. For example, individual MCF-7 cells can each carry a wide range of chromosome duplications, resulting in different genetic content within each cell. So, even assuming that everyone is conducting their experiments perfectly and that everyone has “authentic” MCF-7 cells, they could still be analyzing cells with chromosome numbers anywhere from 44 to 87. Just as concerning as the variation in the mean chromosome number is the large range of observed values. Every experiment is therefore conducted on a hugely diverse population of cells, and selective pressure can bring different variants to the fore. Change the experiment, change the selective pressure, and you have different cells from the “same” cell line.
Layered on top of this genetic variation is phenotypic variation. Tumors are now recognized to consist of cancer cells in very different phenotypic or differentiation states, even from the same genetic background. So the cellular material for cancer cell line experiments is highly heterogeneous. An additional complication is that most common assays (e.g. scratch assay, transwell migration) don’t have a large dynamic range. Even “big” effects are typically around 2-fold up or down. Given that there are many cell behaviors that could impact these assays and the hugely variable cellular inputs, what is reasonable to expect in terms of reproducibility?
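To get a feel for the concern quoted above, here is a minimal simulation sketch. Every distribution and parameter in it is invented purely for illustration, not taken from any actual assay: it models a true 2-fold effect measured on a heterogeneous cell population with a noisy, low dynamic-range readout, where each replicate culture drifts toward a different subpopulation.

```python
# Toy simulation of a true 2-fold effect measured on a heterogeneous
# cell population with a noisy assay. All distributions and parameter
# values below are invented purely for illustration.

import random

random.seed(0)

TRUE_FOLD_CHANGE = 2.0  # the "big" effect, per the quote above
N_CELLS = 50            # cells assayed per experiment


def run_experiment():
    # Culture-to-culture drift: selective pressure brings a different
    # subpopulation to the fore in each replicate, shifting how strongly
    # the "same" cell line responds to treatment.
    drift = random.lognormvariate(0.0, 0.3)

    # Per-cell baselines vary widely with genetic state, loosely echoing
    # chromosome counts anywhere from 44 to 87.
    baseline = [random.uniform(44, 87) / 65 for _ in range(N_CELLS)]

    control = sum(b * random.gauss(1.0, 0.5) for b in baseline)
    treated = sum(b * TRUE_FOLD_CHANGE * drift * random.gauss(1.0, 0.5)
                  for b in baseline)
    return treated / control  # measured fold change


# The same underlying effect yields a visible spread of measured outcomes.
for i in range(10):
    print(f"replicate {i + 1}: measured fold change = {run_experiment():.2f}")
```

Even though the underlying effect is identical in every run, the measured fold changes scatter noticeably around 2, which is exactly the kind of run-to-run variation the researcher describes.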
And that’s just one sub-field of cell biology. I’m sure you can find similar comments about complexity and variability from any number of researchers in any number of fields.
The notion, then, that public access to research data is a magic bullet that can put an end to questions of experimental reproducibility is naïve at best. Given that the data access movement has been driven primarily by the computational research world, where reproducibility can be achieved by re-running the data, it’s an understandable assumption, but it also falls victim to a myopic worldview in which all researchers are assumed to work the way one’s own field works.
While broad data availability is worth pursuing, we should temper our expectations for the benefits it offers. It is likely to have greater value in some fields than in others, and for some experimental methods than others. We should also take great care when interpreting replication studies. A positive result, showing that findings can be reproduced, is genuinely informative. A negative result, however, does not immediately invalidate previous findings, because we cannot distinguish between an initial experiment that was wrong and a failure by the second group to master all of the complex details of variability.