There are some things that get better with age and experience. Reviewing manuscripts, unfortunately, is not one of them.
A long-term study of reviewer quality in a medical journal reports a small but significant decline in the performance of peer reviewers over time.
The paper, “Longitudinal Trends in the Performance of Scientific Peer Reviewers,” by Michael Callaham and Charles McCulloch, appeared earlier this year in the Annals of Emergency Medicine.
Callaham and McCulloch analyzed the quality scores of nearly 15,000 manuscript reviews performed by 1,500 reviewers between 1994 and 2008. The quality of each review was rated on a five-point scale by one of the journal’s editors. Callaham and McCulloch were primarily interested in the rate at which individual reviewers’ quality scores changed over time.
As a group, reviewer quality scores declined steadily by about 0.04 points (or 0.8%) per year, a small but significant change. Not all reviewers fared so poorly, however: 8% of reviewers improved their scores over time, while 92% of them got worse. Even the performance of the best reviewers showed general declines over time (a decrease of 0.03 points per year).
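The paper’s model is more elaborate than this, but a minimal sketch of the kind of per-reviewer trend analysis described above might look something like the following (the file name, column names, and model specification are illustrative assumptions on my part, not the authors’ actual code or data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per completed review, with the editor's 1-5
# quality rating, an anonymous reviewer id, and years since that reviewer's
# first review. File and column names are illustrative, not from the paper.
reviews = pd.read_csv("review_ratings.csv")  # reviewer_id, years_reviewing, quality

# Mixed-effects model with a random intercept and a random slope per reviewer:
# the fixed effect of years_reviewing estimates the average within-reviewer
# change in quality score per year, while the random slopes capture how much
# individual reviewers' trajectories differ from that average.
model = smf.mixedlm(
    "quality ~ years_reviewing",
    data=reviews,
    groups="reviewer_id",
    re_formula="~years_reviewing",
)
result = model.fit()
print(result.summary())
```

In a sketch like this, a fixed-effect coefficient on years_reviewing of roughly −0.04 would correspond to the average decline reported above, and the spread of the random slopes would indicate how much individual reviewers deviate from that average.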
The overall scores of the reviewer pool stayed constant over time despite individual deterioration, as newly recruited reviewers came in with higher scores than those who had been removed from the pool.
Callaham and McCulloch propose two explanations for the observed decline in reviewer performance:
- Decreased cognitive abilities of reviewers as they age, including impaired decision-making, unwillingness to comply with all the review requirements, and an inability to keep up to date with current knowledge and techniques.
- A loss of motivation for producing high-quality reviews as reviewers take on additional roles and responsibilities over time. They write:
Competition for the reviewer’s time (which is usually uncompensated) increases with seniority, as they develop (more enticing) opportunities for additional peer review, research, administrative, and leadership responsibilities and rewards.
Personally, I find their loss of motivation hypothesis more convincing than cognitive decline. Or perhaps, as I push into middle age, I wish this to be the case.
Discussion
I am skeptical of the cognitive decline hypothesis, but I forget why.
More seriously, I am skeptical that these results are real. The quality scale is vague and the reviewer population is unstable and complex. They describe the dynamics thus:
In brief, every half year the reviewers’ performance is reviewed for quality ratings, availability for review, reliability of review completion, and timeliness. Depending on these parameters, they can be moved up or down within 3 tiers. However, reviewers who are “demoted” are not removed and continue to be used for various reasons, including specialized knowledge. Their volume of reviews may wax or wane, or they may be inactive for periods. Some guest reviewers are invited specifically for a particular article, according to their expertise; they may do only that one, or they may become regular reviewers. The goal of the stratification system is to direct the largest volume of reviews to the reviewers most likely to accept the assignment, review it promptly, and produce a review useful to the editor. This goal has been achieved; 74% of reviewers are used each year, only 15% of reviews are late, and mean review time has decreased to 10 days.
There are a lot of variables here, other than duration of reviewer participation. The reviewer and editor populations are interacting dynamically with the scoring system. Pressure to decrease review time is particularly interesting. One also wonders about the dynamics of the editor population over this period.
Small changes in soft variables in dynamic populations should be regarded with caution. It sounds like they introduced a scoring and stratification reviewer management system for practical reasons, but they are now milking it for scientific results. I remain skeptical.
David, I agree that their system is complex, but I believe that they treat their data appropriately and don’t overstate their results. They are looking at the direction of change of individual reviewers, not the direction of the population itself, which, you’ll note, remains constant over time. They also control for the effect of editors in their model.
82% of reviewers experienced a decline in score over the study period. For this to happen by chance would be very unlikely. However, the magnitude of decline is very small, and may not have much practical significance for editors. The fact that poor reviewers are demoted and may be used less often also means the results may understate the true decline in the reviewer pool.
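As a rough illustration of why that proportion is hard to attribute to chance, here is a simple binomial check. The reviewer count of roughly 1,500 comes from the post above; the 50/50 null (each reviewer equally likely to drift up or down) is my own simplifying assumption, not a test the authors report:

```python
from scipy.stats import binom

# Rough illustration: if each of ~1,500 reviewers were equally likely to
# drift up or down over the study period (a 50/50 null), how surprising
# would it be to see 82% or more of them decline?
n = 1500                       # approximate reviewer count from the post above
k = int(0.82 * n)              # reviewers observed to decline
p = binom.sf(k - 1, n, 0.5)    # P(X >= k) under the 50/50 null
print(f"P(>= {k} of {n} decline by chance) = {p:.3e}")  # effectively zero
```

The resulting probability is so small it underflows to zero in floating point, which is the quantitative version of “very unlikely.”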
There are some elements of their analysis that make me skeptical (e.g., why a quadratic term would be used to control for secular effects), but I cannot dismiss this study outright because they set up their system for practical reasons and decided to study it later for scientific reasons. If this were “milking,” then much of what we know about science publishing would fall into the same category.
I have another sort of hypothesis to explain the purported trend. The reviewers lose interest in the enterprise, become jaded with their field, etc. This is not a decline in cognitive ability; it is quite possibly the opposite. Fields decline over time and researchers are nomadic. They keep reviewing but they lose heart about it.
News Flash! Busy people are busy! Film at 11…
Looking at the criteria used for judgement here, one must be careful about what is really being measured and what is meant by “quality”. It seems like timeliness and the degree of depth offered in the review are key parts of what’s being evaluated. And it seems obvious that the further one is into one’s career, the less time one will have: fewer assignments accepted, a higher likelihood of reviews being completed late, and less time spent writing things out in copious detail all follow from that.
Much of what’s measured here isn’t about “did the reviewer make the right decision” but instead seems to be “how helpful was the reviewer to the author”.
David, I would argue that usefulness to the author (improving the manuscript) and usefulness to the editor (manuscript decision) are valid ways of measuring review quality. Assuming that a manuscript has some intrinsic quality to it requires you to find an externally valid way to measure it. This becomes really problematic and reminds me of the discussion surrounding Tim Vines’s post, Is Peer Review a Coin Toss?
I agree that it’s a perfectly reasonable way for an editor to track and rate reviewers. But let’s be clear here, it’s not a measure of that reviewer’s ability or mental acuity, but rather their willingness to be useful to the editor/author. A brilliant senior researcher who can look at a paper briefly and accurately state “it’s crap, reject it” gets a bad score here for not spending lots of time telling the author how they can improve their paper. Ditto the reviewer who does a detailed, thorough job but turns it in a week late. Senior researchers are less cooperative? What a shock.
Did they measure for variables like the reviewers taking on family responsibilities? Might there be a correlation between decline in review quality over time and increase in the number of children one has? Having to interrupt writing a review to change a diaper might be more explanatory than cognitive decline! 🙂
Nice post Phil. I’ve seen this paper used to argue that peer review as a whole is doomed because nobody will be able to do a decent review in a few years; this is much the same as the human race being doomed because everyone keeps dying.
I wonder whether this decline actually represents a break point before and after getting tenure: before this point, other people’s publications in your area are a threat because they dilute the importance of your own work, but this matters much less once you’ve got a stable job. This might lead the first group to approach a review as ‘convince me that this should be published’, and the latter as ‘it’s publishable until I find enough flaws’.
Perhaps the Catholic Church has an explanation for this. It has always been common knowledge, for those in the know, that one should choose an older priest for confession. Why? Well, the younger priests tend to be more zealous and have less personal experience with falling short of grace. Therefore, older priests tend to be more forgiving and less likely to impose a draconian penance. Perhaps the same is true of older reviewers. Perhaps it isn’t loss of cognitive ability, but instead just an understanding that perfection (even in research) is hard to achieve.
Phil,
Thanks for bringing this interesting article to our attention. That group at the Annals of Emergency Medicine do some really good work on reviewing. I haven’t read this article, only your summary. I look forward to reading the full article.
They have some very good training material for reviewers that they make freely available on the Web.
http://www3.us.elsevierhealth.com/extractor/graphics/em-acep/
They also published an interesting article years ago where they assessed their reviewers’ ability to review a fictitious study. How they got this through the IRB amazes me, but it is a fascinating and ingenious study.
http://www.annemergmed.com/article/S0196-0644(98)70006-X/fulltext
Their findings (from the article you reviewed, not the earlier study) do jibe with my experience both as an editor and a reviewer. Ironically, I actually used to review for Annals of Emergency Medicine over twenty years ago, and it was the first journal for which I ever served as a reviewer. I was doing some work for the American Board of Emergency Medicine at the time, which I assume is why they asked me. It seemed like such an honor, and I was so intimidated that I spent hours carefully reviewing the manuscripts and agonized over the feedback I gave. I think more than anything else, that is the key. Now, when I get asked to review, it feels like a chore. I try to do a good job, but it just seems a distraction.
As an editor, I have subjectively found the same thing. Reviewers who are just out of school (or residency, for physicians) are generally excited about the opportunity and put a lot of effort into doing a good job. Also, while experienced reviewers are often very knowledgeable and experienced in the narrow area in which they work, reviewers who have just gotten out of school are often more up to date on the broad literature in the field.
There is something to be said for experience and the wisdom it brings. When you can get experienced reviewers who are willing to put the time into doing a thoughtful job, they are probably better reviewers. Overall, though, I am not surprised that they found less experienced reviewers were rated higher.
We covered this paper in Retraction Watch when it was posted online in November 2010. Here’s our take: http://retractionwatch.wordpress.com/2010/11/23/do-peer-reviewers-get-worse-with-experience/
You’re all overthinking this: we older folk just get _kinder_! As a journal editor and MSc/PhD supervisor, I know NOT to send something marginal to an eager young thing, because they’ll rip it to shreds to show how smart they are. Me, I give ’em the benefit of the doubt for spelling their own names correctly.
It is important to note that we are not necessarily dealing with the diminishing cognitive capacities of old folks like me here. “Experienced” just means that they have been reviewing for 14 years or less. Many may have started in their 20s and 30s, yet they still declined. Is there any actual age data in this study?