Post-publication review is spotty, unreliable, and may suffer from cronyism, several studies reveal.
Reporting last month in the open access journal Ideas in Ecology and Evolution, ecologist David Wardle analyzed over 1,500 articles published in seven top ecology journals during 2005 and compared their citation performance nearly five years later with their initial Faculty of 1000 (F1000) ratings.
Faculty of 1000 is a subscription service designed to identify and review important papers soon after they are published. F1000 reviewers assign one of three ratings to journal articles (“Recommended” = 3, “Must read” = 6, or “Exceptional” = 9) plus a brief comment. For papers that receive more than one rating, an F1000 Factor is calculated.
Wardle reports that fewer than 7% of articles in his study (103 of 1,530) received any rating, with only 23 receiving a “must read” or “exceptional” rank.
Moreover, F1000 ratings were poor predictors of the citation performance of individual articles. The top 12 cited articles in his study did not receive any rating at all, and many of the articles that received a high rating performed poorly. Wardle concludes:
If, as this analysis suggests, the F1000 process is unable to identify those publications that subsequently have the greatest impact while highlighting many that do not, it cannot be reliably used as a means of post-publication quality evaluation at the individual, departmental, or institutional levels.
Speculating on why an expert rating system claimed to involve the “leading researchers” in science operated so poorly, Wardle offered several explanations. First, coverage of the ecological literature in F1000 is spotty, with some subfields completely ignored. Second, he believes there is evidence of cronyism in the system where F1000 section heads appoint their collaborators, colleagues and recent PhD graduates, many of whom share similar views on controversial topics. The appointing of F1000 raters, Wardle adds, appears to suffer from geographical bias, with North American section heads appointing North American faculty who subsequently recommend North American articles. He writes:
The F1000 system has no obvious checks in place against potential cronyism, and whenever cronyism rather than merit is involved in any evaluation procedure, perverse outcomes are inevitable.
In a paper published last year in PLoS ONE, five members of the Wellcome Trust compared the citation performance of nearly 700 research articles that received Wellcome funding with their F1000 scores. Each paper was also rated by two different researchers at the Wellcome Trust for comparison.
As in Wardle's study, F1000 faculty rated just under 7% of the cohort of papers. While there was a moderate correlation (R=0.4) between Wellcome Trust ratings and F1000 ratings, many articles rated highly by one group were not rated highly by the other, and fewer than half of the most important papers identified by Wellcome raters received no score from F1000. Allen et al. write:
. . . papers that were highly rated by expert reviewers were not always the most highly cited, and vice versa. Additionally, what was highly rated by one set of expert reviewers may not be so by another set; only three of the six ‘landmark’ papers identified by our expert reviewers are currently recommended on the F1000 databases.
An alternative to Impact Factor?
When articles receive multiple F1000 reviews, their average rating does not differ substantially from their journal’s impact factor, a 2005 study concludes. An analysis of 2,500 neurobiology articles revealed a very strong correlation (R=0.93) between average F1000 rating and the journal’s impact factor. Moreover, the vast majority of reviews were found in just 11 journals.
In other words, F1000 ratings did not add any new information — if you are seeking good articles, you will find them published in good journals.
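For readers who want to see what a figure like R=0.93 measures, here is a minimal sketch of a Pearson correlation between average article ratings and journal impact factors. The studies discussed here do not describe their exact statistical procedure in this post, so the choice of Pearson's r is an assumption, the function name `pearson_r` is hypothetical, and the numbers are invented toy values, not data from any of the cited papers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the cited studies): one entry per journal.
mean_rating = [3.0, 3.5, 4.5, 6.0, 7.5]       # average F1000-style rating
impact_factor = [2.1, 4.0, 6.3, 11.8, 14.9]   # journal impact factor

r = pearson_r(mean_rating, impact_factor)
print(f"r = {r:.2f}")
```

A value near 1 means the two measures rise and fall together, which is why a very strong correlation suggests the ratings carry little information beyond what the journal's impact factor already conveys.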
While advocates of post-publication review may counter that this type of metric is still new and undergoing experimentation, F1000 Biology has been in operation since 2002, adding new faculty and sections ever since. In 2006, the company launched F1000 Medicine.
Earlier this year, Lars Juhl Jensen, a computational biologist and author of several PLoS articles, analyzed the post-publication statistics made public by PLoS and reported that user ratings correlated poorly with every other metric, especially citations, and wondered whether providing this feature was useful at all.
Unless post-publication review can offer something more expansive, reliable, and predictive, measuring the value of articles soon after publication may be more difficult and less helpful than initially conceived.
Wardle, D. A. 2010. Do ‘Faculty of 1000’ (F1000) ratings of ecological publications serve as reasonable predictors of their future impact? Ideas in Ecology and Evolution 3, http://dx.doi.org/10.4033/iee.2010.3.3.c
Allen, L., Jones, C., Dolby, K., Lynn, D., & Walport, M. 2009. Looking for Landmarks: The Role of Expert Review and Bibliometric Analysis in Evaluating Scientific Publication Outputs. PLoS ONE 4: e5910, http://dx.doi.org/10.1371%2Fjournal.pone.0005910
Editor, A. 2005. Revolutionizing peer review? Nature Neuroscience 8: 397, http://dx.doi.org/doi:10.1038/nn0405-397