It is easy to criticize journal metrics, especially when they do not provide the desired output. Authors and editors often espouse the metric that makes them look the best. And the loudest voices against journal metrics often come from those who are poorly ranked. When it comes to metrics, the ends often justify the means.
In the construction of journal metrics, there is no rule that indicators must be simple or treat their data fairly.
The Eigenfactor and SCImago Journal Rank (SJR), for example, are calculated using complex, computationally intensive processes that give higher weight to citations from highly cited journals. Similarly, the Source-Normalized Impact per Paper (SNIP) attempts to balance the great inequality in the literature by giving more influence to citations in smaller disciplines with less frequent citation rates. By contrast, the impact factor is much simpler and more egalitarian in its approach, giving equal weight to all indexed citations. Each of these approaches to creating a journal metric is valid (provided it reports what it says it is counting), yet each is based on different underlying values and assumptions.
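To make the difference concrete, here is a minimal sketch contrasting equal-weight citation counting with prestige-weighted counting over a toy citation matrix. It is not the published Eigenfactor, SJR, or impact factor algorithm; those include damping factors, citation windows, and article-count normalizations omitted here, and all the numbers are invented.

```python
import numpy as np

# Toy journal-to-journal citation matrix: C[i, j] = citations from journal i
# to journal j. Journal 0 stands in for a highly cited journal; 1 and 2 are
# smaller titles. All counts are invented for illustration.
C = np.array([
    [0, 3, 1],
    [8, 0, 2],
    [9, 1, 0],
], dtype=float)

# Impact-factor-style counting: every incoming citation counts the same.
equal_weight = C.sum(axis=0)

# Eigenfactor/SJR-style counting: a citation is worth more when it comes
# from a journal that is itself highly cited. Here that is the stationary
# vector of the row-normalized citation matrix, found by power iteration.
P = C / C.sum(axis=1, keepdims=True)      # each row: where a journal's citations go
prestige = np.full(len(C), 1.0 / len(C))  # start with uniform prestige
for _ in range(100):
    prestige = prestige @ P               # redistribute prestige along citations
    prestige /= prestige.sum()

print("Equal-weight citation counts:", equal_weight)  # [17. 4. 3.]
print("Prestige-weighted scores:    ", prestige)
```

Both tallies are defensible; they simply encode different judgments about whether the source of a citation should matter.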
In October 2011, F1000 announced a new metric for ranking the performance of journals based on the evaluations of individual articles. Because the metric poses a direct challenge to Thomson Reuters, many are interested to see just how F1000 journal scores stack up against the industry-standard impact factor, a topic featured in this month’s issue of Research Trends.
I am less concerned with the performance of individual journals than with understanding the thought that went into the F1000 Journal Factor, what it measures, and how it provides an alternative view of the journal landscape. When I wrote my initial review of the F1000 Journal Factor, I expressed three main concerns:
- That the F1000 Journal Factors were derived from a small dataset of article reviews contributed by a very small set of reviewers.
- That the F1000 gave disproportionate influence to articles that received just one review. This made the system highly sensitive to enthusiastic reviewers who rate a lot of articles in small journals.
- That the logarithmic transformation of these review scores obscured the true distance between journals in the rankings.
To underscore these points, I highlighted what can happen when the Editor-in-Chief of a small, specialist journal submits many reviews for his own journal. In this post, I will go beyond anecdotal evidence and explore the F1000 from a broader perspective. The following analysis includes nearly 800 journals that were given a provisional 2010 F1000 Journal Factor (FJF).
Predicting a Journal’s F1000 Score
The strongest predictor of a journal’s F1000 score is simply the number of articles evaluated by F1000 faculty reviewers, irrespective of their scores. The number of article evaluations explains more than 91% of the variation in FJFs (R² = 0.91; R = 0.96). In contrast, the impact factor of the journal explains only 32% of FJF variation (R² = 0.32; R = 0.57).
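For readers who want to check these correlations themselves, the calculation takes only a few lines. The sketch below assumes the journal data sit in a CSV with one row per journal; the filename and column names (evaluations, impact_factor, fjf) are hypothetical stand-ins for the actual dataset.

```python
import numpy as np
from scipy import stats

# Load one row per journal. The filename and the column names
# (evaluations, impact_factor, fjf) are hypothetical.
data = np.genfromtxt("fjf_2010.csv", delimiter=",", names=True)

for predictor in ("evaluations", "impact_factor"):
    r, p = stats.pearsonr(data[predictor], data["fjf"])
    print(f"{predictor:>13}: R = {r:.2f}, R^2 = {r * r:.2f} (p = {p:.2g})")
```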
The ranking of journals based on F1000 scores also reveals a strong bias against larger journals, as well as a bias against journals with only marginal disciplinary overlap with the biosciences. The following plot reveals these biases.
Larger journals, represented by bigger circles in the figure above, consistently rank lower than smaller journals receiving the same number of article evaluations. This pattern is most apparent in the lower-left quadrant of the graph, where journals received 10 or fewer article reviews. In addition, because the FJF is calculated from the proportion of submitted reviews to eligible articles, journals in the physical sciences consistently rank lower than biomedical journals.
To underscore this point, consider the Canadian Journal of Plastic Surgery and Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, both of which received just one review in 2010. The former published just 24 eligible articles while the latter published 2,311. As a result, Can J Plast Surg was given a FJF of 15.36 while Phys Rev E received a FJF of just 1, the lowest possible score in F1000. The disparity in FJF scores between these two journals is a function not of the quality of the articles they publish but of their size and subject discipline.
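The arithmetic driving this gap is easy to reproduce. F1000 has not published the exact FJF formula, but the proportion at its core, reviews per eligible article, already tells the story; the following is a back-of-the-envelope illustration, not F1000’s actual calculation.

```python
import math

# Both journals received exactly one review in 2010.
reviews = 1
cjps_articles = 24    # Can J Plast Surg: eligible articles published
pre_articles = 2311   # Phys Rev E: eligible articles published

cjps_share = reviews / cjps_articles  # ~0.0417: 1 article in 24 reviewed
pre_share = reviews / pre_articles    # ~0.0004: 1 article in 2,311 reviewed

print(f"Can J Plast Surg: {cjps_share:.5f}")
print(f"Phys Rev E:       {pre_share:.5f}")
print(f"The same single review counts {cjps_share / pre_share:.0f}x more "
      f"for the smaller journal")

# A logarithmic transformation, like the one used in the FJF's
# construction, compresses this gap but does not remove it:
print(f"Gap after log transform: {math.log(cjps_share / pre_share):.2f}")
```

One review is worth roughly 96 times more to the small journal than to the large one, before any quality judgment enters the picture.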
Other large physical science journals performed just as miserably under this calculation, while small biomedical journals performed exceptionally well.
At the other end of the graph, the bias against large journals and physical sciences journals seems to attenuate, with journals like Nature, Science, Cell, NEJM, and PNAS occupying the highest-ranked positions. Unfortunately, journals occupying this quadrant don’t tell us anything we don’t already know: if you want to read high-quality articles, you will find them in prestigious journals. In this sense, post-publication peer review doesn’t offer any new information that isn’t already provided through pre-publication peer review.
In defense of this systemic bias against the physical sciences, we should remember that F1000 focuses on biology and medicine, so we shouldn’t be surprised that a core journal in statistical, nonlinear, and soft matter physics (Phys Rev E) ranks so poorly. On the other hand, there are many core chemistry journals, like J Am Chem Soc (JACS), that rank very highly in F1000 simply because disciplinary boundaries in science are fuzzy and overlapping. The desire to reduce size, discipline, and individual article scores into a simple unidimensional metric obliterates the context of these variables, leaving users with a single number that makes little sense for most journals.
Journal metrics are supposed to simplify a vast and complex data environment. As with paintings constructed through the technique of pointillism, we should be able to stand back and see a meaningful picture formed from a field of dots, each of which carries very little meaning in isolation. The construction of the F1000 Journal Factor appears to do just the opposite, creating a confusing picture from a dataset of very meaningful dots.