Editor’s Note: This guest post features an interview with Annette Flanagin, Vice President and Executive Managing Editor of JAMA and JAMA Network; and John Ioannidis, C.F. Rehnborg Chair in Disease Prevention, Professor of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics, co-Director of the Meta-Research Innovation Center at Stanford, and Director of the PhD program in Epidemiology and Clinical Research at Stanford University. Annette and John are Executive Director and Co-Director, respectively, of the International Congress on Peer Review and Scientific Publication.
The interviewer is John Sack, Founding Director of HighWire.
John, John, and Annette came together to talk about the Peer Review Congress (PRC), announced in 1986 by Drummond Rennie, then Deputy Editor of JAMA. The 9th International Peer Review Congress will be held in Chicago in 2021.
In this post, John Sack asks Annette and John, drawing on their longstanding participation in the Peer Review Congress, about issues that have arisen and changed over time and about those that now seem to hold the most promise in peer review.
How long have you been involved with the PRC?
Annette: Since 1988, when I joined the JAMA team. The first Peer Review Congress was announced by Drummond Rennie in 1986 and held in 1989. Eight quadrennial Congresses have been held to date.
John: I am, relatively speaking, a newcomer. About 10 years.
What “future thinking” on peer review have you seen that has become accepted practice?
John: Pre-registration of randomized trials and the use of reporting standards are two examples of ideas that have become widely accepted. Both started as theoretical concepts adopted by a few methodologists, and today they are the norm. This does not mean that all randomized trials are pre-registered or that all studies use reporting standards; far from it. But the expected norm is to follow these practices.
Annette: The most prominent experiments we have seen since the early Congresses have tested various forms of peer review. As Drummond and I wrote in our January 2018 summary editorial on the history of the Peer Review Congresses, it’s clear from multiple studies that no one form of review is preferred or necessarily more successful:
“While previous trials have compared the quality of various forms of peer review (double-blind, single-blind, and open), several 2017 studies evaluated new processes that offer authors, reviewers, and editors multiple options for choosing different forms of peer review. These studies evaluated various rates of uptake and views of usefulness of different types of peer review across a range of scientific disciplines, multiple journals, and postpublication media.”
This type of research continues. A report released in early 2019 about open review, for example, studied 18,000 reviews of 9,000 manuscripts for five Elsevier journals (2000-2017). Although relatively few reviewers (8%) agreed to reveal their identities, reviewers were no more or less likely to accept the invitation to review under open review, and there were no differences in their recommendations for publication. In addition, studies that tested ways to improve the quality of reporting with checklists and reporting guidelines have been a forward-looking mainstay of the Peer Review Congresses.
Have you seen innovation turn into practice faster lately? Since when?
Annette: In biomedicine, the requirement for registration before submission of papers reporting clinical trials took about 7 years to become established practice (following federal regulation and the ICMJE requirement), which is relatively short compared with requirements for data sharing, which have been more than 30 years in the making.
John: I think that the discussion about a “reproducibility crisis” over the last 7 years or so has accelerated the discussion and implementation of various ideas into practice. For example, many data sharing initiatives have thrived in a complementary fashion.
What “future thinking” on peer review seems to you to have turned out to be just a fad, or a phase?
John: It’s hard to pick one example; as in any field of research, most early observations don’t go very far.
Annette: Over the years, we have seen studies track technical improvements. For example, following the widespread adoption of manuscript submission and peer review software, we saw a flurry of observational studies of improvements in time to editorial decisions and of where rejected papers ended up being published.
How have the issues and emphases at the PRC changed over the years?
Annette: A number of common subject areas started with the first Peer Review Congress and continue today, including editorial and peer review processes; authorship and contributorship; conflicts of interest; research misconduct; ethical issues in research; bias in peer review, reporting, and publication; quality of the literature; quality of reporting; trial registration; data sharing; funding/grant review; research methods and statistics; postpublication review; publication citations and bibliometrics; and access to and dissemination of research.
More recent studies have addressed newer subjects such as the role of preprints, new post-publication metrics (e.g., social media), online commenting, and research reproducibility, although we have not seen enough quality research on reproducibility.
Other improvements have been seen in the scope and diversity of research into disciplines outside biomedicine (an intentional goal of the Congress organizers) and in the diversity of researchers presenting studies (e.g., more women and junior researchers).
John: There is broadening of the horizon to consider not only medicine, but also science at large; there are commonalities (but also interesting differences) across disciplines. For example, different scientific fields vary a lot (and change over time) in terms of whether they adopt data sharing, whether they encourage replication studies (and if so, at what stage), whether they depend on public or private funding (and how this may affect the design, conduct, and review of the studies), and how much teamwork is involved.
Are there some types of innovations that are intrinsically hard to study in a formal way, so we have to rely on “anecdata”?
John: Changes in peer review for funding decisions are more difficult to capture through experimental studies than peer review of articles submitted to journals. However, both smaller, more nimble funding agencies such as the Wellcome Trust and large funders such as the National Institutes of Health do make changes to their processes. It would be ideal to study the impact of these changes rigorously.
Annette: I am not sure whether these are hard to study or have just not yet been properly and systematically studied, but threats to scientific publication and to science itself, such as fake peer review and predatory journals, are current concerns. We have seen some sting-operation studies of fake papers, for example, but without controlled comparisons.
Another difficult area is reviewer fatigue, and innovative methods to address it need to be identified and tested. The recent Publons report, the Global State of Peer Review, 2013-2017, indicated that 10% of reviewers provide 50% of reviews and that the availability of quality peer reviewers is uneven worldwide.
And although innovative tools such as AI-assisted software have been available to help editors and reviewers (ranging from duplication/plagiarism detectors to programs that identify statistical errors and inappropriate statistical tests), these require human assessment for proper interpretation. However, some studies have used these tools to cast a large pall over groups of publications or disciplines without accounting for the biases inherent in the tools, confounders, and false positives. Quality studies that assess such innovations require resources and funding.
What do you see gaining steam right now? Looking at the next PRC, what will be newly accepted practice, and what will be sources of debate?
Annette: Three areas are gaining steam: reproducibility, open science, and collaboration. For example, Nature journals, GigaScience, Cell Press journals, and eLife have experimented with Code Ocean to facilitate the review of code. The Center for Open Science’s Reproducibility Project is also assessing the reproducibility of research findings in oncology, as it did previously in psychology. eLife has been doing interesting work with collaboration throughout the submission and publication life cycle.
John: There are many frontiers of meta-research that are active, and many of them converge. I’ll bet that major changes will happen regarding open data, code, and protocol sharing. More people realize that the typical research paper has become a mere advertisement that some research was done. We need to be able to really see the full research.
Preprints put dissemination ahead of review, and now F1000 Research and PeerJ have put publication ahead of review. What do you think of that? Can we study it, and are we studying how it works out?
Annette: It would be good to see studies of the quality of research reports through the dissemination life cycle: grant proposal, preprint, peer review before publication, and peer review after publication. For example, how do funders, preprint servers, journals, and open access platforms, such as F1000 Research and PeerJ, address bias, conflicts of interest, errors, and scientific misconduct? There has been recent research on so-called markers of quality, such as subsequent publication rates of preprints as well as their citability and coverage in social media.
John: I am fine with all of these options; we just need to measure how they work and where they may be deficient. I favor exploring different modes and faster dissemination with proper acknowledgment of caveats. There is no end to peer review; it continues well after publication. Yet many of the new forms of dissemination have had no peer review at all.
What about portability, transferring reviews, and publishing reviews?
Annette: This may be one that falls into the fad category, now that cascading is becoming standard for many journals and publishers and is supported by MECA (Manuscript Exchange Common Approach) and NISO. I hope to see multijournal studies that assess the value of cascading in terms of efficiencies for authors, reviewers, and journals. I would also like to see more research on collaborative review, as used by Copernicus Publications since 2001 and more recently by eLife. A number of journals, including the BMJ and the BioMed Central journals, have been publishing peer reviews. I would like to see research on the quality and usefulness of published peer reviews; that is, do they add value or just noise?
What are the biggest problems with peer review that need to be addressed?
Annette: Bias, reviewer fatigue, and quality control. These concerns have been at the forefront since the first Peer Review Congress 30 years ago, but they have multiplied in complexity in parallel with technologic advances, information overload, and disruptive innovation. For each of these problems, we need to identify and test solutions (e.g., guidelines, checklists, and software to support the quality of research reporting and peer review); processes (e.g., collaborative and transparent modes of peer review before and after funding and publication, and training and credit for peer reviewers); innovations, such as AI assistance, blockchain, and other technologies; and programs, such as peer review portability and research reproducibility, to improve the quality and efficiency of peer review and scientific publication.
John: Peer review probably improves about 30% of papers and makes about 5% worse; it is slow; it is difficult to find reviewers; and reviewing gets no credit. In my opinion, many of these problems are interrelated.