We throw around the term “peer review,” but like so many terms, it’s often used without fully understanding what it signifies. In the “taxonomy of confusions” system David Wojick recently published, our use of the term “peer review” would tick many of the boxes — poorly defined concepts, rules, and procedures; misleading text (use of the term); weak factual assumptions; weak ethical assumptions; ambiguous rules; vague procedures; and so on down the taxonomy, the pencil growing dull with use.
One peer-review system isn't equivalent to another.
Traditionally, the differences between systems have been matters of style, not substance: preferences about process, or smaller-scale operations built on fewer available resources. But today, there are differences of substance.
Within this changing framework, we’re potentially over-emphasizing one aspect of peer review while diminishing the centrality of another aspect that is actually more important and more reliable. In the midst of this, we may be neglecting the fact that pre-publication peer review is just a step in science, where peer review never ends. Science is the ultimate peer review environment, and journal peer review has a limited and particular place and role within it.
Lately, many peer-review initiatives have emphasized one of peer review's functions: validating a work. Peer review can do a passable job at this, but validation is complicated, flawed, and unreliable. Some of the newer publishing approaches (PLoS ONE, Scientific Reports, PeerJ, and mega-journals in general) base their value almost completely on the validation step, while new third-party peer review businesses (Rubriq, Peerage of Science) also depend heavily on peer review as a validation or quality-improvement step.
But there may be a more important aspect to peer review, one intricately involved with both validation and quality improvement, yet one the newer publishing businesses lack either entirely or in any meaningful form. This aspect involves determining where a work belongs and, by extension, how it ranks.
Pre-publication peer reviewers traditionally work for one journal, and are therefore asked to determine whether the paper in question is of sufficient quality and interest to be published by Journal A; that is, of sufficient quality and interest for the presumed audience of Journal A. This is a two-part question, involving both quality and relevance or, put more academically, designation and filtration. These functions are strongly linked, as both depend on judgments about the anticipated audience.
We pay a lot of attention when validation via peer review fails; such failures are often a source of controversies, retractions, or scandals. One such example was recently discussed in the comments section of the Kitchen: the infamous arsenic study published in Science. This paper, you probably recall, posited that a certain bacterium could live in an environment replete with arsenic and lacking phosphorus, apparently integrating arsenic into its DNA and proteins. If true, the claims would have had far-reaching implications for evolution and for where life might exist. However, after publication, the paper was quickly suspected of being flawed, and later experiments confirmed the skepticism.
But was this a failure of validation-based peer review? Or a success of relevance-based peer review?
Validation-based peer review has severe limitations. Peer reviewers spend hours with papers, not days or weeks or months. They don't recreate experiments, and they mostly take data analyses at face value. Reviewers vary in their interests, abilities, and statistical skills, and there are usually only two or three per paper. A well-written paper can hide a fatal flaw in study design or logic. Opinions about the validity of particular methodological approaches also vary, so acceptance can turn on subjective judgments of validity. Validation peer review is a tricky business. It improves papers, but it can't ferret out fraud, reliably catch every design or statistical issue, or be expected to replicate experiments.
In short, validation-based peer review is difficult to execute well. It's especially treacherous when used to the exclusion of its natural allies, designation and filtration.
Relevance-based peer review has fewer inherent limitations or landmines: the reviewers usually belong to the same audience as the journal they're reviewing for, and their own interest in a topic or in the novelty of the results is a good indication of relevance. Relevance-based peer review also has a clear upside: it puts the right papers in front of the right audience.
In the case of the Science arsenic paper, validation-based peer review probably worked as well as it can be expected to, while relevance-based peer review worked very well. That is, the audience best suited to evaluate the paper (thousands of scientists, not just the two or three who saw it pre-publication) was made aware of it thanks to the prominent venue Science provides, and could therefore tear it apart, try to replicate the experiments, test the data, and challenge the assumptions.
In essence, relevance-based peer review returns reports to the overall peer review environment of science, pitched at a level commensurate with their interest to the field, the plausibility of their findings, and the impact of their claims. This is incredibly important. It's what makes science work.
Our underappreciation of this flow through journals and back into science is becoming rather startling to me; we act as if publication were the end of the road for a hypothesis. As noted above, too many new publishing initiatives are predicated on the belief that peer review as a validation step is sufficient both to validate a paper as "true" (it cannot) and to create interest in the paper (again, it cannot).
The potential irrelevance problem is not large yet, but it may be growing. We’ve seen citations falling as a percentage of the literature for years now. A recent study of articles published between 2006 and 2008 found that relevance was not a terrible problem back then, as Phil Davis wrote in his coverage of the study:
The map of science, as measured by the flow of manuscripts, is an efficient and highly-structured network, a new study reports. Three-quarters of articles are published on their first submission attempt; the rest cascade predictably to journals with marginally lower impact factors. On average, articles that were rejected by another journal tend to perform better — in terms of citation impact — than articles published on their first submission attempt.
This study predates the emergence of mega-journals, which have since published thousands of papers into systems that even their proponents admit are not good at making the right audience aware of them.
The questions at the heart of this are complicated because “quality” and “relevance” are not completely unrelated concepts, which poses problems for third-party reviewer initiatives and mega-journals alike. Was the arsenic paper an organic chemistry paper? An origin of life paper? An evolutionary biology paper? What elements deserved to be emphasized? You need to know the presumed audience to select a reviewer who can validate the paper properly for “quality.” If an evolutionary biologist is reviewing a paper that is ultimately an organic chemistry paper, you could have a mismatch, and either a false-positive or false-negative outcome from the validation review.
This creates a dilemma for services like Rubriq, where:
Reviewers determine the level of novelty and interest . . .
How can you pick the right reviewers to make this judgment before you know what kind of paper it really is? Authors have input into this, of course, and generally know where they want to publish anyhow. But a Rubriq review may cloud this if the wrong reviewers are selected at the outset. And Rubriq's own claims muddy matters further, as Keith Collier notes in his recent interview here:
It is important to know that if you are reviewing for Rubriq, you are essentially reviewing for any journal.
When I was a journal editor, the core of my job was to make sure that each accepted article had been vetted not just by experts, but by the right experts. Rubriq turns this guided process into a stochastic one, assuming expertise exists based on keyword matching or willingness to review.
Post-publication peer review systems like F1000 Research are even more handicapped when seen through these lenses. The publication event that kicks off their post-publication review process places the paper in a mega-journal-like repository; without prior peer review, the paper is likely placed poorly. Any review after this event is less likely to provide robust validation, since who knows whether the correct domain expert is reviewing the paper? And because the process cannot shift the venue of publication, it lacks the designation and filtration aspects entirely. It is comparatively weak tea even on the validation front.
There is also a problem for the perception of science and scientific publishing: we may be leaning too hard on the concept of "peer review" by narrowing its functions down to validation. Some new initiatives are selling "peer review" in as narrow a way as possible: we looked at it, found it "methodologically sound" or "scientifically sound," and therefore deemed it deserving of publication. But for whom? Where is the element telling us what it is, where it belongs, and who should care? How can science, writ large, do the next round of peer review in these cases? How can the public know whether it was important or not? And how can science maintain its integrity if papers are effectively buried in mega-journals, yet can still be cited as if they had benefited from robust, multi-faceted peer review, replete with validation, designation, and filtration?
Overall, we need to watch which elements of peer review we're marketing. It seems to me that we are currently over-marketing the uncertain "validation" aspects of peer review while ignoring its more important and more reliable "relevance" aspect.
Journal-based peer review occurs within a larger environment of peer review called “science.” If we don’t move reports from one small set of peer reviewers into the strongest possible pool of scientists for major scientific review after publication, we’re not truly serving science. We are being cynical about peer review by treating publication as an end unto itself, and not as a means to a larger goal.