Open Data and Trust in the Literature

David Spiegelhalter from the Royal Society makes a compelling case that we need both accurate statistical analysis of data and access to that data in order to know whether a research result can be trusted. In this short film, he visits with the University of Cambridge's Nicole Janz to talk about reproducibility. One particularly intriguing suggestion Janz makes is that students need better training in this area. A course that required students to replicate published results from available data would go a long way toward teaching them what they need to do to generate (and publish) trustworthy results of their own.

David Crotty

@davidacrotty

David Crotty is a Senior Consultant at Clarke & Esposito, a boutique management consulting firm focused on strategic issues related to professional and academic publishing and information services. Previously, David was the Editorial Director, Journals Policy for Oxford University Press. He oversaw journal policy across OUP's journals program, drove technological innovation, and served as an information officer. David acquired and managed a suite of research society-owned journals with OUP, and before that was the Executive Editor for Cold Spring Harbor Laboratory Press, where he created and edited new science books and journals, along with serving as a journal Editor-in-Chief. He has served on the Board of Directors for the STM Association, the Society for Scholarly Publishing and CHOR, Inc., as well as the AAP-PSP Executive Council. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing.

Discussion

3 Thoughts on "Open Data and Trust in the Literature"

Maybe the problem isn't a lack of "better training" but rather the ad hoc statistical methods themselves.

By Enrique Guerra-Pujol
Jan 15, 2016, 4:49 PM

This could be a chicken-and-egg argument, but if researchers were trained in better techniques, experimental design and statistical analysis, wouldn't they be likely to use better methods in their future work?

By David Crotty
Jan 15, 2016, 5:42 PM

The issue is more subtle, and the terminology is confusing. Spiegelhalter emphasizes reproducibility but probably means replicability. We tried to clarify these various terms in http://www.nature.com/nmeth/journal/v12/n8/nmeth.3489/metrics/googleplus

The bottom line is that research uses numbers to derive findings. To clarify the authors' claims (what they found or did not find), we suggest that they generalize their findings and map a "boundary of meaning" delineating what their findings do and do not imply.

Generalization is one of the eight information quality dimensions we describe in
Kenett, R.S. and Shmueli, G. (2014) On Information Quality, Journal of the Royal Statistical Society, Series A (with discussion), Vol. 177, No. 1, pp. 3-38.

Beyond replicating the analysis of a study's results as a technical exercise (using open data), one should replicate experiments in order to verify the claimed generalization of their findings, which we consider the essence of reproducibility. This contrasts with replicability, which attempts to reconstruct the specific experimental setup in order to obtain the same experimental outcomes. In fact, reproducibility requires changes, whereas replicability attempts to avoid them. A critical point in reproducing an experimental result is that irrelevant events are not necessarily replicated. A successful generalization that provides extensive reproducibility allows one to replicate the findings of a scientific concept rather than reproduce distinct experiments.