If a tree falls…

We throw around the term “peer review,” but like so many terms, it’s often used without a full understanding of what it signifies. In the “taxonomy of confusions” system David Wojick recently published, our use of the term “peer review” would tick many of the boxes — poorly defined concepts, rules, and procedures; misleading text (use of the term); weak factual assumptions; weak ethical assumptions; ambiguous rules; vague procedures; and so on down the taxonomy, the pencil growing dull with use.

One peer-review system isn’t equivalent to another.

These differences have traditionally been matters of style, not substance — preferences about process, or smaller-caliber systems built on fewer available resources. But today, there are differences of substance.

Within this changing framework, we’re potentially over-emphasizing one aspect of peer review while diminishing the centrality of another aspect that is actually more important and more reliable. In the midst of this, we may be neglecting the fact that pre-publication peer review is just a step in science, where peer review never ends. Science is the ultimate peer review environment, and journal peer review has a limited and particular place and role within it.

Lately, a lot of peer review initiatives have emphasized one of peer review’s functions — validating a work. Peer review can do a pretty decent job at this, but it is complicated, flawed, and unreliable. Some of the newer publishing approaches — PLoS ONE, Scientific Reports, PeerJ, and mega-journals in general — base their value almost completely on the validation step, while new third-party peer review businesses (Rubriq, Peerage of Science) are also heavily dependent on peer review as a validation or quality improvement step.

But there may be a more important aspect to peer review, one that is intricately involved with both validation and quality improvement. It is either entirely or largely missing from newer publishing businesses. This aspect involves determining where the work belongs and, by extension, how it ranks.

Pre-publication peer reviewers traditionally work for one journal, and are therefore asked to determine whether the paper in question is of sufficient quality and interest to be published by Journal A, meaning for the presumed audience of Journal A. This is a two-part question, involving both quality and relevance, or, put more academically, designation and filtration. These functions are strongly linked, as they both depend on judgments related to the anticipated audience.

We pay a lot of attention when validation via peer review fails — such failures are often a source of controversies, retractions, or scandals. One such example was recently discussed in the comments section of the Kitchen — the infamous arsenic study published in Science. This paper, you probably recall, posited that a certain bacterium could live in an environment replete with arsenic and lacking phosphorus, and seemed to integrate arsenic into its DNA and proteins. This paper would have had far-reaching implications about evolution and where life might exist. However, after publication, the paper was quickly thought to be flawed, and later experiments confirmed the skepticism.

But was this a failure of validation-based peer review? Or a success of relevance-based peer review?

Validation-based peer review has severe limitations. Peer reviewers spend hours with papers, not days or weeks or months. Experiments aren’t recreated by peer reviewers, and data analyses are mostly taken at face value. Reviewers can vary in their interests and abilities. There are usually only two or three per paper. Their skills at statistical analysis can vary, as well. A well-written paper can hide a fatal flaw in study design or logical progressions. Opinions about the validity of certain methodological approaches can vary, affecting acceptance based on subjective validity assumptions. Validation peer review is a tricky business. It improves papers, but can’t ferret out fraud, reliably catch all possible design or statistical issues, or be expected to replicate experiments.

In short, validation-based peer review is trickier to execute well. It’s especially treacherous if used to the exclusion of its natural allies — designation and filtration.

Relevance-based peer review has fewer inherent limitations or landmines — the reviewers are usually part of the audience of the journal they’re reviewing for, and their own interest in a topic or the novelty of the results is a good indication of relevance. Relevance-based peer review also has a clear upside — it puts the right papers in front of the right audience.

In the case of the Science arsenic paper, validation-based peer review probably worked as well as it can be expected to, while relevance-based peer review worked very well. That is, the audience best suited to evaluate the paper — thousands of scientists, not just the 2-3 who saw it pre-publication — were made aware of it thanks to the prominent venue Science provides, and therefore could tear it apart, try to replicate the experiments, test the data, and challenge the assumptions.

In essence, relevance-based peer review effectively returns reports to the overall peer review environment of science, pitched at a level commensurate with their interest to the field, plausibility of their findings, and impact of their claims. This is incredibly important. It’s what makes science work.

Our underappreciation of this important flow through journals and back into science is becoming rather startling to me; we increasingly act as if publication is the end of the road for a hypothesis. As noted above, too many new publishing initiatives are predicated on the belief that peer review as a validation step is sufficient — both to validate a paper as “true” (it does not), and to create interest in the paper (again, it does not).

The potential irrelevance problem is not large yet, but it may be growing. We’ve seen citations falling as a percentage of the literature for years now. A recent study of articles published between 2006 and 2008 found that relevance was not a terrible problem back then, as Phil Davis wrote in his coverage of the study:

The map of science, as measured by the flow of manuscripts, is an efficient and highly-structured network, a new study reports. Three-quarters of articles are published on their first submission attempt; the rest cascade predictably to journals with marginally lower impact factors. On average, articles that were rejected by another journal tend to perform better — in terms of citation impact — than articles published on their first submission attempt.

This study occurred before mega-journals emerged. These have since published thousands of papers into systems that even their proponents admit are not good at making the right audience aware of them.

The questions at the heart of this are complicated because “quality” and “relevance” are not completely unrelated concepts, which poses problems for third-party reviewer initiatives and mega-journals alike. Was the arsenic paper an organic chemistry paper? An origin of life paper? An evolutionary biology paper? What elements deserved to be emphasized? You need to know the presumed audience to select a reviewer who can validate the paper properly for “quality.” If an evolutionary biologist is reviewing a paper that is ultimately an organic chemistry paper, you could have a mismatch, and either a false-positive or false-negative outcome from the validation review.

This creates a dilemma for services like Rubriq, where:

Reviewers determine the level of novelty and interest . . .

How can you pick the right reviewers to make this judgment before you know what kind of paper it really is? Authors have input into this, of course, and generally know where they want to publish anyhow. But a Rubriq review may cloud this if the wrong reviewers are selected at the outset. And it gets a little mixed up in Rubriq’s own claims, as Keith Collier notes in his recent interview here:

It is important to know that if you are reviewing for Rubriq, you are essentially reviewing for any journal.

As David Crotty wrote in a comment on the same post:

As a former journal editor, the core of my job was to make sure that each accepted article had been vetted not just by experts, but by the right experts. Rubriq turns this guided process into a stochastic one, assuming expertise exists based on keyword matching or willingness to review.

Post-publication peer review systems like F1000 Research are even more handicapped when seen through these lenses. The publication event that kicks off their post-publication review process slots the paper in a mega-journal-like repository. Without adequate peer review, the publication event likely slots it poorly. Any peer review after this event is less likely to be robust validation — who knows if the correct domain expert is reviewing the paper? And the process cannot shift the venue of publication, so it lacks the designation and filtration aspects entirely. It is comparatively weak tea even on the validation front.

There is also a problem for the perception of science and scientific publishing — that is, we may be leaning too hard on the concept of “peer review” by narrowing its functions down to validation. Some new initiatives are selling “peer review” in as narrow a way as possible — we looked at it, found it “methodologically sound” or “scientifically sound,” and therefore deserving of publication. But for whom? Where is the element telling us what it is, where it belongs, who should care? How can science, writ large, do the next round of peer review in these cases? How can the public know whether it was important or not? How can science maintain its integrity if papers are effectively being buried in mega-journals, yet still can be cited as if they benefited from robust, multi-faceted peer review, replete with validation, designation, and filtration?

Overall, we need to watch what elements of peer review we’re marketing. It seems to me that we are currently over-marketing the “validation” aspects of peer review — which are uncertain — while ignoring the more important and more reliable “relevance” aspect of peer review.

Journal-based peer review occurs within a larger environment of peer review called “science.” If we don’t move reports from one small set of peer reviewers into the strongest possible pool of scientists for major scientific review after publication, we’re not truly serving science. We are being cynical about peer review by treating publication as an end unto itself, and not as a means to a larger goal.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

76 Thoughts on "Validation vs. Filtration and Designation — Are We Mismarketing the Core Strengths of Peer Review?"

In short, validation-based peer review is trickier to execute well. It’s especially treacherous if used to the exclusion of its natural allies — designation and filtration.

Here you seem to be begging the question. A lot of us don’t feel that designation and filtration are at all natural bedfellows of validation. In fact a lot of people think they have no place at all in 21st-century science. I know you disagree, but you’ve not made a case, just stated your favoured conclusion as though it’s axiomatic.

In the case of the Science arsenic paper, validation-based peer review probably worked as well as it can be expected to.

I am more than a little surprised that you would say this when scientific consensus is that the publication of that paper represented a failure of (validation-based) peer-review.

… while relevance-based peer review worked very well. That is, the audience best suited to evaluate the paper — thousands of scientists, not just the 2-3 who saw it pre-publication — were made aware of it thanks to the prominent venue Science provides, and therefore could tear it apart, try to replicate the experiments, test the data, and challenge the assumptions.

Are you arguing in favour of post-publication peer-review?

Or are you saying it’s good that this paper got into Science despite its flaws because it was important enough that it deserved scrutiny by many people? If so, you seem perilously close to saying that the glamour mags not only do but should select papers for publication on the basis of how sensational their claims are rather than how good the science is.

As studies have since shown that peer-review verdicts are two parts noise to one part signal, I now stand even more firmly by what I said last year: the best analogy for our current system of pre-publication peer-review is that it’s a hazing ritual. It doesn’t exist because of any intrinsic value it has, and it certainly isn’t there for the benefit of the recipient. It’s basically a way to draw a line between In and Out. Something for the inductee to endure as a way of proving he’s made of the Right Stuff.

So: the principal value of peer-review is that it provides an opportunity for authors to demonstrate that they are prepared to undergo peer-review.

You point to a story in “USA Today” to claim that there was a “scientific consensus” that this was a failure of validation peer-review? Thankfully, the story links to scans of the actual reviews, which are both thorough and complex. In fact, I believe the reviewers asked the key question (“What is the lowest concentration of phosphate that sustains similar growth as observed in the +As/-P medium?”), which elicited the answer:

We have no evidence for the answer to this question yet. While this is indeed an interesting experiment, which we intend to pursue for our next publication, in this report we wanted to establish this physiological phenomenon and present to the community these data.

Basically, from what I can tell, if the researchers had done this experiment suggested by the peer reviewer, they might have seen the problem with their paper. Peer reviewers are not in a position to force authors to redo experiments if the results are otherwise solid. But they did raise a key question other authors might have paused over. Ultimately, the failure lay with the scientists who did the experiments and got overly excited. It’s disingenuous for NASA, these scientists, or the media to blame the peer review process, actually. It did what it was designed to do, including putting the paper in a prominent place for further scientific validation (or not).

As this shows, these scientists, and I’ll bet the peer reviewers, also viewed filtration and designation as natural allies of validation peer review.

Your other points only underscore the point of this essay, which is that validation peer review is helpful (1/3 of comments are substantive, even in your worst case); being willing to undergo it does validate both the reputations and results to some degree; but that getting the information into the right hands and in front of the right eyes is crucial for effective (not “lip service modern” and therefore ineffective) post-publication review by scientists capable of evaluating and replicating the results.

You point to a story in “USA Today” to claim that there was a “scientific consensus” that this was a failure of validation peer-review?

No. I linked to a synthesis page which has a link to the USA Today article, to the reviews themselves, harshly critical comments from Rosie Redfield and Leonid Kruglyak, a blog post on the peer-review failure by a biochemist and a specific criticism from another biochemist. If you want more, Retraction Watch called it “an illustration of the abysmal failure of scientific peer reviewers … to do their jobs with competence and integrity.”

Again, you’re just proving my point, which is that we’re overmarketing validation peer review as the only purpose of peer review. When a paper is found lacking, why point to the peer reviewers? Why not point to the authors? And, why not check on how quickly and thoroughly the comeuppance happened? That would be evidence that it was published to the right audience.

You’re cherry-picking from the Retraction Watch article. The full quote from that “Monday morning quarterbacking” essay on Retraction Watch is:

The article itself is, however, only a small part of the story. As we’ll see in future essays, the case provides an illustration of the abysmal failure of scientific peer reviewers, scientific journals, government and academic institutions, the media and numerous individuals to do their jobs with competence and integrity.

So this essayist is saying that one paper with really interesting results that were disproven by further experiments is an indictment of pretty much everything, including the media you’re using to justify your point of view? I think that’s just histrionics.

Again, if we weren’t marketing peer review as something that we think can separate truth from fiction, find all flaws in studies, and validate papers with 100% certainty, we’d be doing a better job of matching expectations with reality. Science is peer review. Journal-based peer review has particular functions, one of the most important of which is matching a paper to a group that can address its claims. (Most papers make claims, even if they present them as “findings.”)

My concern currently comes from all those papers being published as if they’ve been peer reviewed and validated, but aren’t even finding an audience capable of or interested in challenging them, because they’re in mega-journals with very poor audience-matching capabilities. These papers are going nowhere fast, but their claims may never be challenged adequately. And that’s the looming scandal and shame, lurking right under our noses.

Mike, in your hazing analogy you are basically claiming that the present system of journal communication is fundamentally wrong. I find such industry-wide claims to be preposterous. One does not have to argue for the way the world is; rather, one has to argue for changing it. The bigger the proposed change the stronger the argument must be. So far as I know you have provided no substantive argument at all, merely argument by assertion. Sorting and ranking have obvious benefits. What great benefits would come from eliminating these practices, such that we should do that?

That you do not like the world is not an argument for changing it.

Your case against peer review seems to be based on difficulties in getting your own papers published. Is it possible you’re not being very objective?

It’s possible, of course — and if it were so, I’d be the last to know. But I don’t think that’s it. Pretty much all my submissions have eventually been published (though not always at the first place I sent them), so in the scheme of things I am one of the winners in this game.

As I noted in the third of the linked articles, “I’ve been wary of writing this post because of the fear that it reads like a catalogue of whining about how hard-done-by I am. That’s not the intention: I want to talk about widespread problems with peer-review, and I wanted to make them concrete by giving real examples. And of course the examples available to me are the reviews my own work has received. Let me note once again that I have made it through the peer-review gauntlet many times, and that I’m therefore criticising the system from within. I’m not a malcontent claiming there’s a conspiracy to keep my work out of journals. I am a small-time but real publishing author who’s sick of jumping through hoops.”

The thing is, I am hearing similar stories from a lot of my colleagues as well. As ex-BMJ editor Richard Smith wrote in a rather devastating article, “almost no scientists know anything about the evidence on peer review. It is a process that is central to science – deciding which grant proposals will be funded, which papers will be published, who will be promoted, and who will receive a Nobel prize. We might thus expect that scientists, people who are trained to believe nothing until presented with evidence, would want to know all the evidence available on this important process. Yet not only do scientists know little about the evidence on peer review but most continue to believe in peer review, thinking it essential for the progress of science. Ironically, a faith based rather than an evidence based process lies at the heart of science.”

When I was a grad student and postdoc, I also shared this view – that peer review is capricious and ineffective. It stemmed from hearing peer review horror stories over morning coffee, and the disappointment of having my own papers rejected. I’ve now been on the other side of the fence as a managing editor for five years, and watched how the system works when you can see all the different pieces (I’ve now handled almost 10,000 papers).

The biggest difference I’ve noticed is how selective academics are when they attack peer review: everyone picks their worst experience and uses that to characterise it. In the process they forget about all the useful referee comments they’ve received or the papers that have been evaluated fairly and well, which may well be the majority.

This is a huge problem, because without support from the research community the whole peer review enterprise could stagger and fall. I often feel that peer review is like public transit – it’s everyone’s favorite moan, but without it our cities would grind to a halt and our whole society would suffer.

I notice that you spend a lot of time criticising peer review in many different venues, but I haven’t yet seen any compelling evidence that it is as bad as you say. I want to see numbers, not verbal arguments, and as a fellow scientist I think you should demand the same of yourself too.

There are some (not very encouraging) numbers in Welch 2012. He ran controlled experiments on the peer-review verdicts in a prestigious finance conference and in eight prominent economics journals. He found there were two parts noise to one part signal in peer-review verdicts, and that reviewers’ intrinsic tendency to be nice or nasty was as important as the quality of an ms. in determining whether acceptance or rejection was recommended.
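
One way to unpack that ratio, as a back-of-the-envelope sketch of my own rather than anything taken from Welch’s paper: treat each verdict as a true-quality component plus reviewer-specific noise,

$$s_i = q + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = 2\,\operatorname{Var}(q) \;\Rightarrow\; \operatorname{corr}(s_1, s_2) = \frac{\operatorname{Var}(q)}{\operatorname{Var}(q) + \operatorname{Var}(\varepsilon_i)} = \frac{1}{3},$$

so under that toy decomposition the verdicts of two independent reviewers of the same paper would correlate at only about one third.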

I’m not saying peer-review is without value. As it happens, I did have a good experience just this last month: one reviewer (of two) gave a detailed, helpful review that genuinely improved my paper. My contention is that the cost is very high, and the value is less than we tend to blindly assume it is. I want to see numbers that prove this is a good way for us to invest our time, rather than getting on with making science. I’m skeptical, but more than happy to be convinced.

I’d be really interested to hear what evidence you’d find convincing, as then we’d be able to go about designing the relevant tests and collecting the data.

I don’t know. I’d be interested to know, too. If you’d like to pursue this, please drop me a line on — we might be able to work something out between us.

Rats, my email address was lost because I put it in angle-brackets! Let’s try again …

I don’t know. I’d be interested to know, too. If you’d like to pursue this, please drop me a line on [miketaylor.org.uk] — we might be able to work something out between us.

Mike, your “noise and signal” are yet another meaningless metaphor. Why can you not state your thesis scientifically? That reviewers differ is not an argument against review. Editors find reviews useful, so what is there to prove? You need to make a specific, non-metaphorical claim that can be tested. Moreover it must carry the policy weight you claim for it.

In 2009, Sense About Science surveyed over 4000 authors in ISI-ranked journals about peer review. At that time, 69% were satisfied or very satisfied with the peer review process and 91% believed that their most recent paper was improved as a result of being peer reviewed. Maybe things have changed since then but I’d be surprised if they’ve changed very dramatically.

About the survey demonstrating high satisfaction with current peer review systems… Not that conflicting interest always means conflicted outcomes, but it is perhaps pertinent to mention the impetus and funding of that survey:

“Sense About Science developed the Peer Review Survey 2009, in consultation with editors and publishers and administered with a grant from Elsevier; the survey included some questions from the Peer Review Survey 2007 (commissioned and funded by the Publishing Research Consortium) for comparison, and new questions about future improvements, public awareness and pressures on the system.”

What are these items evidence of, Mike? I don’t usually take the “read all this and get back to me” gambit. You need to actually say something. State your claims and how these links support them.

I see there is a reference to mistakes. Every human selection system makes mistakes. Juries, elections, Congressional votes, stocking a store, etc. That is no reason to eliminate them. Selection is part of rationality.

Your hazing analogy is also very strange. Hazing has been outlawed because it is abusive. Are you claiming that rejecting a paper is abusive? That would make you a true crank. Or are you confusing hazing with selection?

You are claiming that an established scholarly practice carried out by millions of people is somehow wrong. That is a very strong claim, so it requires very strong evidence. This is a basic law of reasoning, policy and science.

Mike: peer-review is more like a hazing ritual than a reliable filtering mechanism.
David: where is your evidence?
Mike: I wrote it up in these four blog entries.
David: I’m not going to read those.

Then I believe we are finished here.

If you refuse to explain your position then we are indeed done. Citations without argument are a dodge not a defense. Nor is an analogy an argument. So far you have said nothing coherent.

About those blog entries: Anecdotes do not equal evidence. And it’s lazy and borders on disrespectful to refer to a list of links without at least some context.

You seem to have a different take on peer review here:

http://www.guardian.co.uk/science/blog/2012/nov/13/science-enforced-humility

… in which you write, “During the publication of a recent paper that I and co-authors wrote, one of the peer reviewers – very unwelcome at the time – pointed out that the statistical section was flawed, and required us to rework it. With the benefit of hindsight, the extra work was worthwhile: it improved the final paper.”

Here is our review of around 100 peer-reviewed empirical studies (and some other articles) pointing towards journal rank, conservatively speaking, having no usefulness at all:
http://arxiv.org/abs/1301.3748
Unless there is new data, the evidence is rather clear: journal hierarchies don’t capture anything that one could quantify.

It looks like a weak paper, Bjorn. I was hoping it wouldn’t be, but you run all the cliche ideas at this problem, and apparently reached your conclusion before you conducted your research. It’s an overwrought paper, too — the writing is angry and aggressive. In any event, high-ranking journals have more retractions because they are more heavily scrutinized and, as you note, their articles are more uniformly read. To take this as an indictment is backwards — it actually means my assertion is correct, in that ranking helps the right scientists become aware of and read papers of interest. This leads to a higher rate of retraction and correction, but that’s science. But I’m not going to get into a debate with you about this. You have your paper. Others can read it. I wish them luck!

“apparently reached your conclusion before you conducted your research”
I’m a neurobiologist, not a bibliometrician. I didn’t have enough data to think anything different from everybody else until Marcus asked me to write something with him: top journals publish top research, perhaps with some less stellar stuff in there (see citations in the paper). Believe it or not, I was surprised the data couldn’t capture any of these intuitive notions about journal rank. If journal rank does anything, it’s not visible in the data.

“high-ranking journals have more retractions because they are more heavily scrutinized”
Ah, do you have any evidence for that? I’d like to cite it as we couldn’t find much in that direction. In fact, we found evidence that most retractions come from ‘lo-rank’ journals (Fig. 3):
http://www.pnas.org/content/109/42/17028.full
which means there is a lot of scrutinizing going on elsewhere, contrary to your claim. Clearly, one would assume that visibility plays a role, as we acknowledge, but it’s not a big enough factor to stand out in the data. Moreover, the non-retracted papers support the retraction data: overestimation of effect sizes with too low sample sizes in ‘hi-ranking’ journals (Fig. 1c). So we don’t even need the retraction data to make our claim. Unless, that is, you have evidence to counter both of these well-supported claims?

“You have your paper. Others can read it. I wish them luck!”
In fact, quite a number of people already have. For instance three anonymous reviewers at PLoS Biology. They agreed with you that it was weak, but for the opposite reasons: they thought there was nothing new in our manuscript: everybody knows journal rank does nothing! You can see their opinions for yourself at the bottom of our draft versions:
https://docs.google.com/document/d/1VF_jAcDyxdxqH9QHMJX9g4JH5L4R-9r6VSjc7Gwb8ig/edit
BTW, Nature and Science also rejected our manuscript with the response that what we had to say was already said and didn’t need to be repeated on their pages.
So Nature, Science and PLoS Biology agree that journal rank does little for science, if anything.

Thus, as the evidence stands for now, journal rank is, at best, useless. Or do you have evidence we’re missing?

Bjorn, as I pointed out in my comment below you seem to be confusing journal ranking with impact factor (IF) ranking. The reviewers are correct in that these issues with the IF metric are well known. But Kent’s point about the twin values of sorting and ranking refers to the real ranking of journals, or what I call the natural ranking, not to the IF ranking.

By ‘natural ranking’ do you mean the subjective ranking in our heads, or which ranking? We cite no less than four different studies that show that IF aligns very well with subjective notions of journal quality: 40–43. Do you know of data showing that other methods capture whatever ‘natural rank’ is better than IF? I’d like to review and potentially cite them!

Bjorn, given that we are talking about a human behavior system, what you call (disparagingly it seems) subjective ranking in our heads is actually an important objective fact. If IF captures it then your claim that IF measures nothing important is false. I also suspect these so-called subjective rankings are correct since relevance and importance are both with respect to the people involved. The science of science is a social science.

In fact your logic seems self-contradictory to me (another matrix entry). You first use studies that use IF to draw strong conclusions about ranking, then dismiss IF as not measuring anything. I do not think you can have it both ways.

David says: “I also suspect these so-called subjective rankings are correct.”

Could you please expand on what you mean by “correct”? I don’t understand what reality you’re saying the subjective rankings correspond to.

“what you call (disparagingly it seems) subjective ranking in our heads”
No, not disparagingly. I have/had this notion in my head as well.

“what you call […] subjective ranking in our heads is actually an important objective fact.”
Indeed! Which is precisely why we looked at the empirical evidence as to whether it is a figment of our imagination like astrology, dowsing or homeopathy, or whether one can actually find some quantifiable measure that can corroborate our subjective impression as something that is grounded in fact. The empirical data is roughly equivalent to that on astrology, dowsing or homeopathy: journal rank exists mainly in our heads and has little, if any, correlate one would be able to measure.

“I also suspect these so-called subjective rankings are correct”
This is the claim we tested in our paper and the available evidence that we could find contradicts it. Again, most people who believe in astrology, homeopathy or dowsing would strongly claim these practices to be ‘correct’. However, they have little to show in terms of actual data to support their claims. I’ve shown what we have found and there is constantly coming out more, see e.g. this paper (I have so far only read the abstract) that claims to show that statistical methodology is worse in top-ranking journals.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0056180
That’s the kind of evidence that backs up our claims in the paper. Do you have any similar evidence to back up your suspicion that “subjective rankings are correct”?

“In fact your logic seems self-contradictory to me (another matrix entry). You first use studies that use IF to draw strong conclusions about ranking, then dismiss IF as not measuring anything. I do not think you can have it both ways.”
The more I read from you, the more I get the suspicion you need to read more than the abstract of our paper, and probably also the references we cite (the ones with actual data in them). As in some of your previous claims, we do nothing of the sort that you claim we do.
We check a measure that captures subjective (or as you call it ‘natural’) journal rank to see whether it captures any other metric with potential implications for any aspect of ‘quality’, such as expert opinion, replicability, methodology, retractions, statistical accuracy, etc. All of these measures can be linked to some aspect of what one could conceive as the quality or reliability of the work. The data say that the correlations with IF are usually low, and where they are not low, we find contradictions in the direction of the slope of the correlations. Thus, a conservative interpretation of these data is that the IF captures subjective rank, but fails at grounding that subjectivity in anything objective (again, similar to astrology, homeopathy or dowsing, etc.). We never say the IF doesn’t measure anything. On the contrary, we say it captures subjective rank excellently. Two other metrics that the IF captures very well, for example, are the number of non-cited papers (smaller at higher IF) and the number of retractions (higher at higher IF). Read our paper and you will find all these references.
I have no idea where you get the impression that the IF doesn’t measure anything. If you’re less cautious in your analysis, you might even say that there is a tendency of the IF to measure unreliability of scientific publications. So clearly, “not measuring anything” can only come from someone who hasn’t read our paper and is not familiar with the literature. Or do you have some evidence that you have actually read more than just the abstract?

Bjorn, I share Kent’s concern that this is an advocacy paper which only cites studies to support its arguments. I also object to some of your concepts as misleading. For example you seem to call the fact that important journals publish important results “publication bias”. In fact your publication bias and decline effect mechanisms seem to be producing good and expected results, as Kent notes in part.

More deeply however you seem to be confusing journal ranking with impact factor ranking. The IF is an attempt to capture the natural ranking that occurs without it. It does not create that ranking, although it may affect it some. Human systems are like that when they know they are being analyzed. Thus while your criticisms of the IF are interesting they are somewhat beside the point and they certainly do not support your conclusions regarding the fact that journals have real ranks.

Unfortunately you use the term ranking ambiguously as between these two very different concepts — journal rank and IF rank. Journals, like universities, have rank. Journal ranking would have existed if IF had never been invented. This is why the bibliometrics community is so hot to develop better metrics, in order to capture the natural, real ranking. Thus in my coherence analysis matrix your paper seems to suffer from an ambiguous concept, namely ranking.

“I share Kent’s concern that this is an advocacy paper which only cites studies to support Its arguments.”
As I already replied, if you have evidence we’re missing, please bring it on and we’ll include it! The reviewers we’ve had until now (three anonymous plus Fanelli, Ioannidis and others) have either not indicated any missing studies, or we have included them in the version on arXiv.

“For example you seem to call the fact that important journals publish important results “publication bias”.”
No, we don’t. I’m not sure how the section where we introduce the term can be misunderstood the way you did, but we’re of course willing to rephrase it to reduce the chances of such misunderstandings: “surprising and novel effects are more likely to be published than studies showing no effect. This is the well-known phenomenon of publication bias [12,20–26].” So clearly, we don’t do what you say we do.
Publication bias per se has nothing to do with journal rank: it is one of the unintended consequences of journal rank that something that exists even without journals is getting worse due to journal rank. Journal rank doesn’t create publication bias, it only makes it worse. The same holds for the decline effect, of course.

“More deeply however you seem to be confusing journal ranking with impact factor ranking.”
Yours is a troubling concern, as it was mentioned in our previous rounds of review (see link above) and we thought we had re-worded our manuscript adequately. Apparently we haven’t and I thank you for making us aware that we’re still not making ourselves understood.
Any journal rank can only exist in our heads – there is no objective ranking until it has been quantified. This is essentially, what you’re saying? That the methods to establish objective journal rank can make that rank align better or worse with our subjective ranking? The task that we have done is to collect the evidence surrounding the one measure that captures subjective journal rank best: IF. We cite a multitude of studies that show very high correlations of IF with subjective journal rank. In other words: IF is currently the best measure for what exists as rank in people’s heads. So if this is what you mean by “the fact that journals have real ranks”, then IF captures it better than anything else we have seen in the literature – do you have any evidence that there is an alternative method that captures subjective rank better than IF?

“Journal ranking would have existed if IF had never been invented.”
I agree completely: which is precisely why we call the IF rank that is in the data ‘journal rank’ and not ‘IF rank’: not only is IF the best method out there to capture the ranking that existed before the IF was invented, but no matter how you capture the subjective hierarchy, the social incentives surrounding it are the same. Now, as we state in the article, there may be a method to rank journals that (in contrast to IF) does identify some journals that consistently publish articles that are more reliable, more cited and more reproducible etc. than others. We’re not aware of such a method, but there are some out there that might work. But because IF matches subjective rank so well, across a large range of fields, these methods would put journals at the top that are not at the top now, neither subjectively nor in terms of IF. Notwithstanding, given the solid psychological evidence surrounding competition and IF, we would expect any such new journal rank to have the same pernicious incentives as the one we have now. Thus, whichever way you turn it, the rank as it stands now is of little use. We say that in the article. Clearly, we did not do a good job of getting this across; thanks for emphasizing this.

So how does it come about that we all (including me!) have (or had) an intuitive ranking of journals in our heads long before any of us knew anything about the empirical data around this ranking? For this question, I can only offer you an answer by analogy, as I don’t know of any direct studies: people believed in astrology, dowsing or homeopathy. Then people started to apply the scientific method and all the effects that everybody took for granted disappeared. I suspect the same holds for journal rank: we all have the subjective impression that there is something to it, much like we might be inclined to think that the homeopathic globuli we once took actually relieved our ailments (done that as a pre-teen against hayfever), or the feeling that our horoscope fit us so well (deny that!). But when we apply the scientific method, all these subjective effects disappear. The data I have seen seem to indicate that journal rank is like homeopathy or astrology in this respect.
Of course, if you have evidence that the current data we cite is all wrong, I’d be delighted to reverse my current conclusions.

Well said, Kent. I have argued in past comments that this sorting process is the highest value of the journal system. Getting the right stuff to the right people is what the journal system as a whole does, and as you point out this is more than just communication. It is how verification and falsification come to occur in the scientific community, not in prepublication review. Communicating this fact to the policy community will not be easy, as it is a system property, not a journal property, and one that has been little studied.

Note too that because journals provide the focal point they also facilitate the criticism. Search engines help people find the right stuff but they do not facilitate criticism. Hence they are no substitute for journals, as some people argue.

Regarding my taxonomy, while I have applied it to diagnosing confusions in organizations this may be the first time it has been applied to an entire industry. You are right that the concept of peer review has become confused. This is a central feature of revolutions, where concepts seek a new equilibrium. The question is how will the science system sort the articles? The journal system does it now and does it well, so why should that change?

Rejection of technically sound manuscripts (or “filtration and designation”, in more pleasant terms) is extremely important. The value of one’s article upon publication, like it or not, is largely measured by the number of articles it displaces as inferior. It’s the rejection rate, not the impact factor, that makes publishing in Science and Nature so prestigious for scientists.

That important task absolutely should be done using peer review – but not BY peer reviewers! Filtration and designation should be done by editors, and editors only.

It should not be within a reviewer’s territory to say whether a manuscript is, or is not, “Science material” – though they often do that, without offering a solid scientific justification. Yes, peer review absolutely should evaluate the significance and impact of the work, but in relation to the field, and to science and society as a whole, not in relation to the perceived prestige of the journal! It is the editor’s task to judge that peer reviewer evaluation in relation to the scope and selectivity of the journal he or she is trusted to pilot.

Asking a reviewer “is this manuscript good enough for journal x” is lazy editorial practice, and encourages lazy reviewing. An editor should be capable of, and good at, making rejection decisions based on the justified scientific arguments presented by reviewers. If I owned a journal, and found out an editor was taking the easy way and outsourcing filtration and designation decisions to reviewers, that editor would very quickly be an ex-editor.

I have always seen the core strength of peer review as feedback and revision of the manuscripts. It serves other purposes but in my experience as an editor and author, what gets published is often much better than what gets submitted, largely due to the constructive feedback from the editor and reviewers. I’ve reviewed for both PLoS One and PeerJ (I actually acted as an academic editor for PeerJ), and the process isn’t all that different. Feedback is given to the authors and the manuscript is revised one or more times until it is accepted. There is a difference in criteria, but I don’t see this as a huge issue in a practical sense. PLoS One has worked well. It has obviously attracted huge numbers of submissions and, for what it is worth, the articles on average are cited quite frequently. Journals serve a variety of roles, and journals that focus their reviews narrowly on whether the research and write-up are acceptable science seem to have found a niche that both authors and readers find useful.

That the validation steps for newer journals and established journals seem similar to peer reviewers or authors isn’t surprising. This post accepts validation peer review for what it is. However, Martin Fenner of PLoS noted at a meeting in December that PLoS ONE does a poor job of getting papers to their proper audience — a hypothesis I’ve harbored, and one he confirmed. It’s good that papers are being improved by review, but they aren’t being reintroduced to scientific discourse effectively by mega-journals.

Martin Fenner said that researchers and practitioners who could/should make use of a PLoS One paper have difficulty locating it, or are failing to locate it? Or am I misinterpreting what you are saying?

His statement was that PLoS ONE was making it difficult for people to know about relevant papers. You need awareness first.

Ah, this “awareness” business might, just perhaps, be a generational thing. As a mere younger author (I am but 30), I don’t read any journal in particular; I plug keywords into a search engine (usually Google, PubMed, or Web of Science). Have done since I was writing “fake grants” for my college course assignments when I was taking genetics and psychobiology courses. (I later switched to physical chemistry since it was more fun, and also needed for neuroscience work in general.) Whereas my PhD boss (an Associate Professor in New York) usually just flipped through the JACS articles released that week; occasionally, I would notify him about interesting news I’d dredged up with pointed keyword searches. Although he did keep careful track of who was citing him.

Quite frankly, I overrode some of my postdoctoral collaborators and submitted to PLoS ONE instead of a subscription-access journal, since I am pretty darn sure more folks my age will see the article given the keywords I associated with my submission. (It was accepted with no need for revisions on the second try; I had to put in an analysis section. I was under pressure to submit the first version before I had performed a proper analysis of the human tumor data. Sigh; Germans… and MDs… I like talking to them, but sometimes you do get the feeling they just want to hit a button and get an answer…)

Maybe, maybe not. PLoS ONE’s impact factor is falling, and likely will be cut in half in a couple of years. And, because they never know who is reading a paper, you’ll never know the answer to your supposition. Your choices may also evolve as your career solidifies. But what’s best for scientific discourse? Leaving it up to Google?

Impact factor…? Really now. Most professors of statistics I know have issues with that particular metric. (As in, they laugh at me. I used to take it seriously, but nowadays most of the good, replicable work gets done in “low impact” journals, at least in the fields I am familiar with.) What matters is that a large audience has access to the article; then I can market it myself to specific parties that may be interested in following up on my work. Of course, my field (medical diagnostics) has serious repeatability issues; it may not be the same for other scholarly work.

I was an acquisitions editor and publisher for both books and journals. My first question to any author or editor was: who is your audience? It seems to me that the mega-journals are like the Marriott smorgasbord: something one wades through, rather than selecting from a menu put together by the chef. In the end one may be full, but in the former case one is more likely to experience indigestion.

In your post you quote David Crotty as saying “Rubriq turns this guided process into a stochastic one, assuming expertise exists based on keyword matching or willingness to review.”

I think it is important to point out that this was a huge assumption about how Rubriq works, and incorrect. As an employee of American Journal Experts, a sister company to Rubriq, I have been witness to their process, and reviewers are evaluated and chosen based on much more than just a list of keywords, by PhD-level scientists who have the same background as many of the professional editors making the same decisions at journals. In terms of “willingness to review,” that is an attribute that the prospective peer reviewer must have whether the request is coming from Rubriq or a journal.

I don’t see any confusion in Keith’s comments about the reviewers determining the level of novelty and interest and “reviewing for any journal”. The reviewers are able to point out the strengths and weaknesses of the technical execution and presentation of the work, as they would for any journal, and then speak to the novel aspects of the work and describe the breadth of readership that they believe would find the manuscript interesting or noteworthy. In this, they are simply replacing the common “Is this paper appropriate for our journal?” question with “Where do you think this paper should be published?” with the answers described in terms of readership rather than journal titles.

The biggest assumption here is that the system faces some huge challenge from an independent review service like Rubriq. Some journals may choose to accept an independent review in place of their own, but I personally believe that in the foreseeable future many journals will use independent reviews to supplement their own process and decision making. What scientist wouldn’t love to have more data when making a decision? We (scientists) have all received an editorial rejection from a journal that we really thought our manuscript was appropriate for, and I believe there have to be at least some editors concerned that they are declining to review some manuscripts that would have made great additions to their journal, but they simply have a limited amount of information and time to make that decision. Independent reviews can help guide authors to send their papers to the right journals with a level of honesty that a colleague might not be comfortable providing, and provide an editor with more information about the manuscript sitting in front of them before having to decide to ‘desk reject’ or move forward with their own process. Nothing in this process forces the journal editor to relinquish any decision-making power over what to publish or which opinions to trust.

Will a Rubriq review include suggestions for improving the paper? Will a Rubriq review suggest which journal may view the paper more favorably? Will the author of the paper get his/her money’s worth by going through the Rubriq system? Will Rubriq continue to use reviewers who are exceptionally critical of those papers submitted to them for review?

Lastly, what is the goal of Rubriq? Is it to make money or to enhance the paper and concomitantly science?

“Will a Rubriq review include suggestions for improving the paper?”
Yes. In addition to the scorecard, the reviewers leave comments that can be read by both the authors and any journal editors the authors allow to see the report.

“Will a Rubriq review suggest which journal may view the paper more favorably?”
The Rubriq reviewers suggest an audience, and Rubriq will then suggest journals likely to be receptive/appropriate for the manuscript based on the scores and subject. If the authors choose to use the comments to improve the manuscript, then they can decide to aim high on the list (or submit to wherever they choose), or they can see if any of the journals are interested in the manuscript as-is and wait for feedback from the journal editors before making any changes.

“Will the author of the paper get his/her money’s worth by going through the Rubriq system?”
Ultimately that will be up to the authors, obviously. However, to put this money in context, if you allow that an average postdoc’s salary + health insurance (in the US, at NIH minimum) is $50K per year, then a Rubriq review costs less than a week of a postdoc’s time. How many PIs would have a postdoc spend a week gathering data of the value that Rubriq provides, that could be submitted with the paper to help inform the journal editor’s decision to move forward on the manuscript? A lot. I think these reports could be like a cover letter on steroids.
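
To spell out the arithmetic behind that comparison (assuming the $50K annual figure above and a 52-week year):

$$\$50{,}000 \div 52 \approx \$960 \text{ per postdoc-week},$$

so any review priced below roughly $960 costs less than a week of that postdoc’s time.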

To put the money in another context, from my work as the Scientific Illustration Manager at AJE, I can tell you that many journals charge authors as much or more to publish one color figure in their paper.

“Will Rubriq continue to use reviewers who are exceptionally critical of those papers submitted to them for review?”
Yes, provided that the reviews are well-executed and professional (i.e., not abusive). Journal editors will be provided the identities of the reviewers and statistics about each reviewer’s review history. If a reviewer does a good job but is harsh in the scoring, then journal editors will be able to see that this reviewer tends to score X amount below his or her fellow peer reviewers, so they can put that reviewer’s scores in context. Given the variety of approaches to peer review and personalities in science, I suspect journal editors are already doing this informally within their reviewer pool.

The company would not benefit in the long term by providing “optimistic” reviews to journal editors. It’s that simple.

“Lastly, what is the goal of Rubriq? Is it to make money or to enhance the paper and concomitantly science?”
Both. It is a company, but you would have to admit that it is being very transparent about where its money goes, especially compared with many of the publishers. In addition, most of the people that work there are scientists who aren’t very far removed from the bench themselves, and are sympathetic to the pain points of academic publishing and invested in trying to improve the system.

To go back to my original comment, it did combine a bit of Rubriq and Peerage of Science. Rubriq seems (at least based on the information on your FAQ pages) to base things as much as possible on an algorithm (http://www.rubriq.com/how/faq#reviewer-faqs-2), whereas Peerage is much more about whoever happens to volunteer. But there does seem to be a stochastic process either way: your editors are looking through the best matches the algorithm provides as a first step, correct? Then looking elsewhere if it’s not quite good enough?

I would still like to know the number of editors working at Rubriq and what their qualifications are for finding the right reviewers. If I send a niche paper in my area of interest, how likely are they to know my field? If Rubriq catches on, will you need to hire hundreds of such editors to provide broad coverage?

And there are very specific requirements of different journals that vary quite a bit. As I wrote in my comment here (http://scholarlykitchen.sspnet.org/2013/02/05/an-interview-with-keith-collier-co-founder-of-rubriq/#comment-81044):

As one example, toxicology journals usually require a dose:response curve. But general biology or medical journals may not have that requirement. Am I reviewing the paper for the former or the latter? Or do I have to perform a detailed review for every single possibility out there?

An article written for a broad, general science journal like Nature is written in a different style with a differing level of detail than an article for a specific niche journal, where jargon may be more familiar to readers and the level of experimental depth provided may be greater given fewer restrictions on page count. Did they include methodological details? Does Nature want those? I’d have a hard time doing a review of a paper without context.

Editors of journals subscribing to Peerage of Science have full freedom to solicit reviews using the Referral tool, in addition to receiving reviews from those who take on a manuscript on their own initiative or after receiving a Referral from a colleague.

But the idea is that when people see reviewing also as a way to demonstrate their expertise publicly and gain academic recognition for it, they try to choose wisely, and then devote more time and thought to the task. Most importantly, that attempt to display expertise is publicly judged by others, and the PEQ-scores are displayed to the author and any editor tracking the process (and to the reviewers themselves, after the process).

It is most certainly not a stochastic process.

Quoting the Editor-in-Chief of Animal Biology (quote abridged; full text here):

Given that this was the first time that I was using the Peerage of Science service, I felt that I should ask for the identities of the reviewers, which were forwarded to me promptly.

I might not have come up with the same reviewers as the ones that performed the Peerage of Science reviews, but they were definitely appropriate and very competent.

Authors in Peerage of Science sometimes do state the intended journal in their comment to Peers. They are naturally free to do this (though in my personal opinion it risks prejudicing the peer review, at least unconsciously).

Reviewers are also free to state suitable journals, and often do. That positive statement is slated to become a standard feature of the service. Also, I expect editors will be keen to be alerted to the recommendations of the most skilled reviewers.

I think you are using the term “stochastic process” with regard to which scientists have already signed up to review for the company and which have not, but it is worth pointing out that the way manuscripts are matched to reviewers is far from random. It is true that the process begins with a keyword-matching algorithm, but from the page you referenced:

“There is no automation in our method of choosing reviewers for manuscripts. Once we have built a large network of reviewers, our algorithm will help us identify reviewers within our existing pool to consider for the review. The Rubriq team will still make a final decision regarding the expertise of the reviewers that the algorithm suggests and how well the reviewers match to the manuscript. If we have reviewers within our network who are qualified to review that particular paper, we will make the paper available to them to claim the review. If we do not have a good match in our network, we will personally invite new reviewers with the right qualifications and expertise to review the manuscript.”
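
To make that two-stage picture concrete, here is a rough sketch of algorithmic shortlisting followed by a human decision. This is hypothetical illustration code only, not our actual system, and every name and keyword in it is invented:

# Hypothetical sketch: rank reviewers in an existing pool by keyword overlap,
# then leave the final choice (or outside recruitment) to a human editor.
def shortlist_reviewers(manuscript_keywords, reviewer_pool, top_n=10):
    scored = []
    for reviewer in reviewer_pool:
        overlap = len(set(manuscript_keywords) & set(reviewer["keywords"]))
        if overlap > 0:
            scored.append((overlap, reviewer))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [reviewer for _, reviewer in scored[:top_n]]

pool = [
    {"name": "Reviewer A", "keywords": ["toxicology", "dose-response", "zebrafish"]},
    {"name": "Reviewer B", "keywords": ["protein crystallography", "structure validation"]},
]
candidates = shortlist_reviewers(["toxicology", "zebrafish", "heavy metals"], pool)
print([c["name"] for c in candidates])  # an editor vets this list before inviting anyone

The sorting is the easy part; as the FAQ says, judging whether the shortlisted people actually have the right expertise still falls to a person.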

I completely agree that different journals will have different expectations, but I think the underlying quality of the science and presentation are still common themes for every journal. The issue that you raise would be easily handled in Rubriq’s system by leaving a comment such as “This work would be greatly strengthened by a dose-response experiment of compound X and will likely be required for publication in a toxicology journal.” If you felt that the dose-response curve was important to the conclusions of the paper, then this would be reflected in a lower Quality of Science score. The toxicology journal editor will see the comment (assuming the author releases the reviews) and either see the dose-response curve the authors produced after the review or request that the authors complete the experiment.

If an author writes a manuscript in a long form, gets fantastic reviews from Rubriq, and decides to submit the paper to Nature, then they will have to adjust the style and length of the manuscript for that journal. For these types of journals, the manuscript will almost certainly, in my opinion, have to go through the journal’s own peer review system in the short form anyway. So what did the author get out of this? Three independent scientific opinions from a double-blind process that presumably tell the Nature editor “This is a really important, well-executed study” to help that editor decide if the manuscript should be one of the 70% that receives an editorial rejection or one of the 30% that gets reviewed. (Please note: I’m assuming for the sake of this discussion that the Rubriq reviews would have to be quite positive for our hypothetical authors to rewrite the paper for Nature.)

I think much of this comes back to the original point of this post – people see very different roles for the peer reviewer. From the comments you have made, I believe the most important question for you in filling out a review is “Is this manuscript appropriate for this journal?” and for others that is the least important question in the form. Rubriq is not in the business of deciding if or where a paper will be published, but rather of gathering the best information we can and putting it in the decision makers’ hands in a timely fashion – first the authors, and then the journal editors. If the journal editors feel that they need more opinions about the science or suitability of the paper for their journal, then they can seek that.

I was one of the researchers who carried out post-publication peer review and experimental refutation of the arsenic-DNA paper. If the peer reviewers chosen by Science had been less credulous of the authors’ claims, I would have saved a lot of time and money.

The primary role of pre-publication peer review must be to evaluate a paper’s claims, not to decide whether these claims are sufficiently interesting that the journal’s readers should evaluate them.

The two are not mutually exclusive. Unfortunately, some new initiatives make them effectively mutually exclusive.

It seems to me that something like Rubriq would not have discovered the errors in the Science article. In fact, I do not see how something like Rubriq contributes or adds to the normal review process. Of course, it does make some money for some folks, but does it do more?

I am not sure. One sends in a paper and pays a fee for another set of eyes to look at it. What happens if the Rubriq eyes say “forget about it”? The author has just spent some hundreds of dollars to learn what he could have learned by simply submitting the paper, where the lesson is free. Say the Rubriq eyes say to work on this and clean up that, and then send it to X or Y or Z. The paper is then rejected or accepted, but comments are made that Rubriq missed. Again we have a costly lesson.

Rubriq says that its finances are transparent. Are salaries given for management?

If one looks at the annual reports of most publishers, one can find out how the money is earned and spent. The same goes for societies and associations. If the company is held privately, one can still look at a comparable public company and get an idea of where the money goes.

Thus, I am not too sure about the purity of cause claimed by Rubriq, nor about any lack of purity of cause on the part of commercial or society/association publishers.

As I noted in my prior comment, Science’s peer reviewers found errors in the Science paper, and even pointed to an experiment that, if done, might have caused the authors to revise or retract their paper prior to publication. But the normal and accepted limitations of pre-publication peer review allowed the authors to forge ahead, on the basis that the questions were asked, the authors responded, and they put their reputations out there.

I continue to be astonished by people who want to skewer the peer review process for a failure, but allow the authors to get away without a bruise. It was the authors who made the mistake and who clung to their claim even in the face of a lot of questions during peer review. It is ultimately the authors’ fault. Science and its reviewers did what they should have, in my opinion. It’s the authors who blew it. Let’s not forget that.

And the authors can be just as obstinate in a Rubriq system as any other.

However, it should be said that your refutation of the Science paper has greatly boosted your reputation and, hopefully, your career. You are not someone I was familiar with before you took on this task, and you are now much more well known. Looking back, do you think this was a poor use of your time and money?

The real issue is not whether I benefited but whether the taxpayers who pay my salary and research expenses benefited. Given that the paper had been published and given such enormous publicity, the time and money needed for the refutation were well spent. But that doesn’t justify the errors by the authors and reviewers.

People pursue false hypotheses and do experiments that don’t pan out all the time. How much additional expense came into play here merely because this played out in public via Science, rather than if it were announced during a talk at a meeting? If they had published a paper that included a caveat suggesting additional experiments were needed to confirm things, wouldn’t the same expenses have occurred?

Given the importance of matching manuscripts with appropriate reviewers, the 2009 Sense About Science peer review survey of 4000 researchers threw up a worrying result (Figure 13). When asked the main reasons for declining to review in the preceding 12 months, 58% said it was because the paper was outside their area of expertise. This was the most frequently mentioned reason for declining to review, above being too busy (49%).

I have developed an algorithm that may solve this problem. It finds those published researchers whose work is closest to a given paper or proposal, by varying degrees of closeness. I may do a Kitchen article on this.
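
To give a feel for what such closeness-matching can look like, one common approach (offered only as a generic illustration, not as a description of my actual algorithm; the names and abstracts below are invented) is to rank researchers by the textual similarity between their publications and the manuscript, for example with TF-IDF and cosine similarity:

# Illustration only: rank candidate reviewers by textual similarity to a manuscript.
# The abstracts and names are invented; scikit-learn is assumed to be installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

manuscript = "arsenate substitution in bacterial DNA under phosphorus limitation"
candidates = {
    "Researcher A": "phosphate metabolism in extremophile bacteria from alkaline lakes",
    "Researcher B": "protein crystallography refinement and validation methods",
    "Researcher C": "arsenic resistance genes and bacterial growth in toxic environments",
}

vectors = TfidfVectorizer(stop_words="english").fit_transform(
    [manuscript] + list(candidates.values())
)
scores = cosine_similarity(vectors[0:1], vectors[1:]).ravel()

# Higher score = closer topical match; actual expertise still needs a human judgment.
for name, score in sorted(zip(candidates, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.2f}")

Varying the degree of closeness then becomes a matter of how far down the ranked list one is willing to go.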

It is precisely this kind of technology, properly deployed, that would make the last remaining function of journals – to sort according to vaguely defined readership – obsolete.

There is a reason I don’t look at the table of contents of Phys. Rev. Letters – the chances that an article relevant to me as a neurobiologist comes up there are low, compared with, e.g., J Neurosci., where I do check the eTOCs. However, I use tools like JANE or eTBLAST, etc. to search as much of the literature as I can, and indeed relevant research articles pop up semi-regularly in journals such as Phys. Rev. Letters and other usually low-relevance journals.

Thus, I would gladly pay for a service that would do this effectively for the entire literature, so I don’t have to go through endless lists of titles on the eTOCs any more! It is precisely this dysfunction of the literature, its inability to provide me with up-to-date, relevant papers, that is one of the most frustrating aspects of my work and one of the prime motivating factors that got me into publishing reform in the first place. I want to use the same technology that drives Amazon, eBay, Facebook, and Twitter for my scientific work. Is that too much to ask?

Not really, Bjorn, for it does not rank by importance, merely by closeness of topic. Ranking importance within a topic is what the journal system does.

This is what everybody thinks, but it is not supported by the evidence. I really sound like a broken record now: do you have any evidence to support your claim that journal rank really sorts by importance? The evidence I have seen is so weak that I’d be really interested in seeing some solid evidence I can rely on.

The evidence is that this is what people are doing, or trying to do. It is the point of rejection, and I see no reason why it should not work, unless importance is somehow beyond human judgement. Is that your claim?

If you are demanding that I prove it works, I cannot do that any more than I can prove that science benefits society and so is worth the money, a claim which others are questioning. The evidence that it works is just that science advances, and ranking is a central feature of the system.

“If you are demanding that I prove it works”
No, I’m not asking for proof; that would be absurd. All I’m asking for is evidence of this sort:
http://www.sciencedirect.com/science/article/pii/S0895435612001928
It is the first (and to my knowledge only) study that quantifies novelty in scientific studies, and it finds that in clinical studies this measure does indeed correlate with the IF! I just found this paper and we will cite it in our manuscript. However, the effect is so weak (a coefficient of determination of less than 0.1 – but significant!), even weaker than the correlation with citations, that the result is purely academic and of little practical value beyond some meta-analyses (correct me if I misinterpret the statistics).
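
As a quick gloss on that statistic (my paraphrase, not the paper’s wording): the coefficient of determination is the share of variance explained, R² = 1 − SSresidual/SStotal, so a value below 0.1 means the impact factor accounts for less than 10% of the variation in the novelty measure.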

Thus, this study supports your claim that higher-ranking journals publish more novel results – but the effect is so tiny that it is practically irrelevant, even though it is statistically significant. It’s this kind of data I’m asking for: show me that higher-ranking journals actually publish more of what I ought to read. So far, the evidence is so thin that it has no practical value.

Bjorn, I am a bit puzzled here. You seem to say that ranking does not correspond to citations. But the IF is based on citations so the correlation is there by definition. Subjective ranking correlates with the IF, so if the IF rankings are relatively stable, then subjective ranking is predicting citations correctly. Assuming citations roughly capture importance, then so does ranking. There is your data.

Are you possibly referring to long-term citation data as opposed to the relatively short, multi-year term that the IF uses? If so, then all this means is that ranking only predicts importance on the scale of years, not decades. This is not surprising, as no one knows what science will look like in a decade or more.
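
For reference, the standard two-year impact factor for year Y is, roughly:

IF(Y) = (citations received in year Y by items published in years Y−1 and Y−2) ÷ (number of citable items published in years Y−1 and Y−2)

so at the journal level it is, by construction, an average citation rate over that window.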

“You seem to say that ranking does not correspond to citations. But the IF is based on citations so the correlation is there by definition.”

You would certainly think so! But it turns out, no. See for example http://www.bmj.com/content/314/7079/497.1 (BMJ, 1997), which shows only a very, very weak correlation between journal impact factor and the citation rate of individual papers therein. For a more recent analysis, try http://arxiv.org/abs/1205.4328 (arXiv preprint, 2012), which shows how the always-weak correlation has changed through time, and how it is currently declining yet further.

The meta-point here is that our intuitions about impact factor, like so many of our intuitions, are unreliable. That’s one reason why I mistrust my own (and others’) intuition regarding journal rank, too. We need data.

How does this happen? High impact factors are hugely influenced by a tiny number of very, very highly cited papers. This is one of the problems with the “statistical illiteracy” of the impact factor: its use of a simple mean rather than a median, or a modified mean (outliers discarded or similar).
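
A toy example, with invented citation counts chosen purely to show the skew, makes the mean-versus-median point concrete:

from statistics import mean, median

# Invented citation counts for one hypothetical journal's papers in a two-year window.
citations = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 120, 450]  # two blockbuster outliers

print(f"mean (an IF-style average): {mean(citations):.1f}")  # 49.2
print(f"median (the typical paper): {median(citations)}")    # 2.5

Two outliers are enough to put the mean roughly twenty times above the median, which is why a journal-level average says so little about any individual paper.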

Mike, I am well aware of your point, but it is not related to my point. I am talking about the relation between journals and citations, not papers and citations. There is by definition at least one correlation between the IF and citations, because that is how the IF works.

“You seem to say that ranking does not correspond to citations.”
Again, had you actually read the pre-print, you would not have come to that conclusion. We first detail how the IF is ‘calculated’, citing many, many papers that describe all the statistical properties of the IF. We now even include a very recent editorial that looks at journal citations:

http://www.nature.com/nmat/journal/v12/n2/full/nmat3566.html

The problem is: nobody reads entire journals, cover to cover. The IF is not predictive of the content of the journals: the papers that make up the IF are so few that they are not representative, and until someone applies other methods (like the one I linked here and the others we cite) to see whether there is another way to make journal brands actually predictive of something tangible (i.e., representative of their content), journal rank is useless.

In more colloquial terms: who cares if Nature publishes two articles per issue that get highly cited, if only one of them a year is something I’d want to read? Having to go through their eTOCs for 52 issues only to get the one paper I want or need to read is a complete and utter waste of my time. Multiply this by the other 20 or so journals that come fairly close to my specialty and you see that going by journal is utterly inefficient.
Or a different perspective: why should I care that the article preceding the Nature article of the candidate I’m looking at got 1,000 citations, while the candidate’s got none? Why should I care that the journal in which this candidate’s other paper appeared has a shiny cover? Shiny covers don’t translate into anything tangible in the articles beneath them, much like the citations a journal accrues don’t seem to translate into much for its articles, as the evidence suggests.

In other words: journal-level metrics can only be useful for scientists if they represent their articles in some useful way. The data we cite show that journals do not represent their articles in any useful way, save, maybe, by content (which was not the topic of our manuscript).

In yet a different formulation: how many citations a journal gets in total doesn’t translate into anything meaningful for scientists. It doesn’t translate into better methodology in the articles, it translates much too weakly into citations of those articles, it doesn’t translate into more reliable articles, it translates much too weakly into information gain, and of course we have only unreliable intuition (see Mike’s comment) as to whether it translates at all into anything that’s impossible to quantify (expert opinion is not strongly, but significantly, captured by journal rank).

The above is, more or less, all covered in our paper (and the references), so I wonder a little why I have to reiterate it here.

To put it yet again differently: do you have any evidence that the covers or the citations of journals have any effect on the content of their articles? What I have seen in this respect is much weaker than one would expect, so I’d be very keen on more solid data.

Regarding the relative quality of papers in high-branded journals, there is a very interesting study in which the authors found that the crystallographic structures of proteins in papers from Nature/Science/etc. were much more likely to contain errors than the structures of proteins in papers from lower-rated journals. The DOI is 10.1107/S0907444907033847.

The point is: higher-branded journals DO NOT guarantee at all that their papers have better science: they simply guarantee that their papers are more “trendy”. I do not believe that “relevant” is the proper word, as different people ascribe relevance in different ways. I find it very disingenuous to argue that the relevance of the obviously flawed arsenic paper is “proven” by the criticism that it generated, as it only received such criticism by having been labeled as “relevant” by appearing in Science/Nature and having a dedicated press conference in the first place.

In short, arguing that journal editors at high-branded journals (who often are not practicing scientists) and a couple of reviewers can accurately judge the relevance of a paper seems to me to be a way to ensure that scientists will try to hype their findings or choose research topics by sexiness/trendiness just to get a shot at the Science/Nature/etc. limelight. A high brand may be good for the sociology of science, but it should instead be a mark of scientific soundness. And right now it is not.

Your strong claims do not follow from your weak evidence, if it is evidence at all. What you denigrate as trendy is what the community thinks is important. Note too that you are talking about quality or importance, not relevance. Relevance has to do with the journal’s scope.

“What you denigrate as trendy is what the community thinks is important.”

What editors think is important is often different from what the community thinks is relevant. The arsenic bacteria fiasco was such a case: a specialist journal wouldn’t have let such weak evidence support the grandiose claims the authors made in their conclusions. But I think you are agreeing that journals are important for the sociology of science, i.e., for establishing hierarchies, who is in or out, etc. I would much rather have their brands imply that the research they publish is solid.

I also argue that in a world where searching abstracts through the internet is instantaneous, researchers have no trouble finding reports relevant to their research interests even if those papers are not in their usual RSS journal feeds.

From the abstract: “The most striking result is the association between structure quality and the journal in which the structure was first published. The worst offenders are the apparently high-impact general science journals. The rush to publish high-impact work in the competitive atmosphere may have led to the proliferation of poor-quality structures.”

Please add a link to the paper. Reading only the abstract — and a selection from it, at that — isn’t sufficient for an informed response.

Thanks for the link. The data are rigorous, but the interpretation of the journal-level results is the key issue. The authors note that a few anomalous results could skew the data. They also note that “structures are refined until the crystallographer is satisfied with the final model and the researchers are able to draw scientific conclusions from the structure.” Could it be that the crystal structures in papers in the major journals yielded worthwhile information earlier? That could mean the structures underwent less refinement. Conversely, if a journal is publishing more incremental findings, might you expect its structures to be more final and, by the measures used in this paper, “superior”?
