Antique map of America
Image by Changhua Coast Conservation Action via Flickr

On Monday, the Faculty of 1000 announced a new service, the F1000 Journal Factor, that ranks the quality of journals based upon the ratings of individual articles submitted by volunteer faculty reviewers.

While still in beta, the new service is a clear shot across the bow of Thomson Reuters’ journal impact factor.

Unlike the impact factor, which is published in a report each summer covering the previous year, the F1000 Journal Factor is updated monthly. It is designed to “provide a continuously updating picture of journals ranked by excellence within biology and medicine,” according to the F1000 press release.

In comparison to the impact factor, which simply reports the average number of times an article has been cited over a two-year window, the F1000 Journal Factor is much more elaborate. It’s a calculation that combines individual article rankings given by faculty reviewers with the frequency of reviews in their system.

The Journal Factor requires a few paragraphs to describe, but if we’re to understand what it measures, it’s important to deconstruct the metric. In its simplest form:

Journal Factor = log10( (Sum of Article Factors × Normalization Factor) + 1 ) × 10

An Article Factor is calculated by taking the highest article rating (10=exceptional, 8=must read, 6=recommended) to which an incremental value is added for each additional rating (3, 2, or 1, respectively). So if an article received one exceptional rating and two recommended ratings, its score would be 10+1+1=12. All of these article factors are then summed up for each journal.
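As a rough sketch of the Article Factor arithmetic described above (the function name, the `INCREMENT` table, and the treatment of unrated articles as zero are my own assumptions, not F1000's published code):

```python
# Increment added for each rating beyond the highest one, as described above:
# exceptional (10) adds 3, must read (8) adds 2, recommended (6) adds 1.
INCREMENT = {10: 3, 8: 2, 6: 1}


def article_factor(ratings):
    """Return the Article Factor for one article's list of ratings (10, 8 or 6)."""
    if not ratings:
        return 0  # assumption: an unrated article contributes nothing
    ordered = sorted(ratings, reverse=True)
    highest, additional = ordered[0], ordered[1:]
    # The highest rating counts in full; every other rating adds its increment.
    return highest + sum(INCREMENT[r] for r in additional)


# One exceptional rating plus two recommended ratings: 10 + 1 + 1
print(article_factor([10, 6, 6]))  # 12
```

Note that a second exceptional rating adds only 3 points, while the first adds 10.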

The Normalization Factor is the proportion of articles in a journal that received at least one review. Like the calculation of the impact factor, it is designed to provide equal weighting to journals of different sizes. Because some journals will have received zero ratings over the period of evaluation, F1000 adds 1 to every journal’s normalized sum so that the logarithmic transformation (explained next) does not fail on a value of zero.

The logarithmic transformation (log10) takes a distribution that extends over several orders of magnitude and draws that long tail in. (As an analogy, earthquake magnitudes are on a log10 scale, meaning that a magnitude 9 quake registers 10 times the shaking amplitude of a magnitude 8 quake and 1,000 times that of a magnitude 6 quake.) All final scores are then multiplied by 10, as the company describes, “to make the FFj a readable number.”
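Putting the pieces together, here is a minimal sketch of the whole calculation as I read it from the description. The function name and its inputs (a list of Article Factors for the rated articles, and the number of articles the journal published) are assumptions made for illustration:

```python
import math


def journal_factor(article_factors, articles_published):
    """Journal Factor = log10(sum of Article Factors x Normalization Factor + 1) x 10."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    # Only articles with at least one rating have a non-zero Article Factor.
    rated = [f for f in article_factors if f > 0]
    # Normalization Factor: proportion of published articles that were rated.
    normalization = len(rated) / articles_published
    return 10 * math.log10(sum(rated) * normalization + 1)


# A journal with no rated articles scores exactly zero, thanks to the "+1".
print(journal_factor([], 100))      # 0.0
# Two rated articles (factors 10 and 8) out of four published:
# 18 x 0.5 + 1 = 10, and log10(10) x 10 = 10.
print(journal_factor([10, 8], 4))   # 10.0
```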

There are a few things that concern me about their calculation.

First, the calculation gives much greater influence to articles receiving just one rating (and particularly a high one) than those articles that receive multiple ratings. The result is a Journal Factor that is highly sensitive to enthusiastic reviewers who rate a lot of articles in small journals. This may explain why the Journal of Sex and Marital Therapy [1] ranks higher in F1000 rankings than Science, PNAS, or Cell.
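To see the size of this effect, consider a hypothetical journal that published 20 articles and received three exceptional ratings. The sketch below compares the case where all three ratings go to one article with the case where they go to three different articles. The journal, its size, and the function names are invented for illustration; the arithmetic follows the formula as described above.

```python
import math

INCREMENT = {10: 3, 8: 2, 6: 1}  # added for each rating after the highest


def article_factor(ratings):
    ordered = sorted(ratings, reverse=True)
    return ordered[0] + sum(INCREMENT[r] for r in ordered[1:])


def journal_factor(ratings_per_article, articles_published):
    # Each inner list holds the ratings one article received; skip unrated ones.
    factors = [article_factor(r) for r in ratings_per_article if r]
    normalization = len(factors) / articles_published
    return 10 * math.log10(sum(factors) * normalization + 1)


PUBLISHED = 20  # hypothetical small journal

# Three exceptional ratings on one article: factor 16, 1 of 20 articles rated.
clustered = journal_factor([[10, 10, 10]], PUBLISHED)
# One exceptional rating on each of three articles: sum 30, 3 of 20 rated.
spread = journal_factor([[10], [10], [10]], PUBLISHED)

print(round(clustered, 1), round(spread, 1))  # 2.6 7.4
```

The same three ratings yield nearly three times the Journal Factor when spread across single-rated articles, because they raise both the summed Article Factors and the Normalization Factor.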

Second, I’m a little uncomfortable with what the logarithmic transformation does with the interpretation of the journal ratings. Even after applying a journal size normalization factor, the distribution of journal scores is highly skewed, with a few journals receiving the vast majority of article evaluations and a very long tail of infrequently evaluated (or unevaluated) journals. The log transformation obscures that distance.
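A quick illustration of how much distance the transformation hides, using made-up normalized sums that differ by factors of ten (the function name is my own; only the log10, the +1, and the multiplication by 10 come from the formula):

```python
import math


def ffj_scale(normalized_sum):
    """Apply the final step of the formula: log10(x + 1) x 10."""
    return 10 * math.log10(normalized_sum + 1)


# Hypothetical normalized sums spanning three orders of magnitude.
for raw in (1, 10, 100, 1000):
    print(raw, round(ffj_scale(raw), 1))
# 1 3.0
# 10 10.4
# 100 20.0
# 1000 30.0
```

A journal whose underlying score is 100 times larger than another's ends up with a Journal Factor less than three times as large.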

Third, the F1000 Journal Factor attempts to rank journals based on a proportionally small dataset of reviews by a proportionally small group of reviewers. In the company’s words:

On average, 1500 new evaluations are published each month; this corresponds to approximately 2% of all published articles in the biological and medical sciences.

This probably explains why the company gives a lot of weight to journals with just one article review and finds it necessary to perform a logarithmic transformation to the data.

To be fair, the F1000 business model was not designed to provide comprehensive reviews, so a lack of completeness should not be considered a failure on their part. The purpose of F1000 is to guide readers, through expert evaluation, to a small sub-population of worthwhile literature. To derive a comprehensive journal ranking system built on such limited data, however, is to build a map where 98% of the territory remains unexplored. Where ancient cartographers would have filled in these empty sections of the map with pictures of savages and sea-monsters — so that the reader understands what remains uncharted — these Journal Factors assume that missing data are the same as zero. They are not the same.

From a business standpoint, I’m not surprised at this new development. F1000 is attempting to produce a new derivative product from an existing service, which is no different from Thomson Reuters creating derivative products, like the Eigenfactor or 5-year impact factor, from their own citation data.

From a rhetorical standpoint, however, F1000 is taking a 180-degree turn. From its inception, F1000 was publicly dismissive of the value of the journal. The purpose of F1000 was to direct readers to important articles irrespective of the imprimatur. This was — and still is — the principal argument behind post-publication peer review. What is curious is that the motto of F1000 is now gone from the company’s website, although you will still find it verbatim on Facebook, Wikipedia and some library websites.

“F1000 rates research articles on their own merits rather than according to the prestige of the journal in which they are published (the impact factor)”

Even if F1000 were able to adequately deal with issues of completeness and potential conflicts of interest, their Journal Factor might ultimately not tell us anything that we don’t already know from Thomson Reuters’ impact factor, which is that if you want to read good articles, you will find most of them in good journals.

In sum, the real value of F1000 is not what the aggregate data can tell us about individual journals, but in what experts can tell us about individual articles. As local guides, faculty reviewers know much more about the territory than does the cartographer.

[1] R Taylor Segraves was responsible for submitting 23 of the 34 article reviews for the Journal of Sex and Marital Therapy. He is also the editor-in-chief of that journal.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist.


18 Thoughts on "F1000 Journal Rankings — The Map Is Not the Territory"

It’s hard to know much about the entire journal landscape from a metric that only looks at such a limited set of articles.

The Sex and Marital example is pretty telling for how easily this metric could be gamed. To be meaningful, you’d have to build in a lot of safeguards, and not count reviews of articles in a journal from someone who is an editor of that journal, or on their editorial board, or perhaps even from anyone who has recently published in that journal as it would boost the ranking and indirectly aid their own paper.

The point about J Sex Marit. Ther. is well made, although Phil is wrong to say it “ranks higher in F1000 rankings than Science, PNAS, or Cell.” It only ranks higher than those journals when you drill down into the sections. And of course, we’d expect small journals to rank higher than the holy trinity in Faculties and Sections, because those journals publish important papers in those communities.

We were alerted to the JSMT problem on Monday, and actually took rapid action to address it. We agree it shouldn’t have happened, but we are revising our guidelines to reviewers to make sure it doesn’t happen again. I’ve written more here:

J Sex and Marital Therapy was ranked 25th out of the top 50 journals in Medicine, according to the spreadsheet and embargoed press release sent out by F1000. Science was ranked 30th, PNAS 33rd and Cell 44th.

David, I’m not sure if this is an explicit case of gaming. The editor-in-chief was doing what all good editors should do, which is to highlight and promote good articles. It only becomes an issue of gaming when the article data are aggregated into a journal ranking.

If we ask any faculty reviewer to recuse him/herself from rating an article where there is some connection to an author, there may not be much left to evaluate. Top scientists form a tight network of relationships. They serve on multiple journal boards, granting committees, promotion and review committees. They collaborate with peers in other institutions as well as with peers in their own. They often oversee graduate students and postdocs who will eventually move off to other institutions. In sum, it may not be possible to remove any possible conflict of interest when it comes to constructing a grand ranking of journals.

Phil, that editor didn’t know F1000 were going to rank journals, so there’s no way he could have been knowingly trying to game the system.

We are currently figuring out how to handle this issue. Our current best guess is to allow such evaluations, with an appropriate declaration of interest, yet exclude those individual evaluations from the rankings.

Keeping track of and dealing with such related cases of cronyism/non-impartiality is something we’re serious about. In the next couple of days there will be a feedback button so we can easily see what our users think… and I might have to take a few days off 😉

I’m not arguing that the editor was trying to game the system (David Crotty’s point) but suggesting that it may be very difficult to derive unbiased journal rankings when you strive to enlist expert faculty as your reviewers. Science is a small world, and top scientists occupy an even smaller world.

I was actually agreeing with you there about gaming (or not)–sorry it wasn’t clear.

And yes, you’re right. There will be bias, and what F1000 is doing is trying to deal with that bias–either by reducing it or bringing it under the spotlight.

My first reaction is “who has time to write the reviews?” Any faculty member that has time to read a fresh article and then write a thoughtful review has way too much time on their hands, or is making their graduate students do it, which in my group is one of the costs of existence for graduate students, i.e. participation in weekly journal club brown bags.

Second, for me it is not so much the journal where the paper is published, rather it is the search algorithm we use to return relevant articles. I am not so much out to make Journal X more relevant, I am out to get articles that are relevant to me wherever they are, no matter what language they’re in. So it matters not to me, at zeroth order, if the work is published in the Journal of XYZ or ZYX.

I understand that F1000 is trying to get to a per paper rating as opposed to a per journal rating. But here I see buds making a pact to extol the virtues of each other’s papers and throwing this whole thing out of whack.

Finally, I have to say that, as we all know, it may take years for truly transformational results to get published and then fully appreciated after publication. So I think maybe F1000 is chasing the wrong thing here.

I still can’t see what the value of this metric is. The F1000 evaluations are based on the passing fancies of under twenty people in a particular area, and they only cover a tiny proportion of the literature. How is this more accurate than citations, which are effectively binary evaluations by everyone?

Furthermore, neither the Impact Factor nor the FFj can escape the circularity of these ratings: are papers in higher IF journals cited more because the journal publishes the best papers, or because everyone reads those journals and assumes that the papers published there are worth citing? I would have thought that an organisation dedicated to separating the value of an article from the IF of its journal would have put more effort into getting around this problem.
