The recent proposal from a group of high-profile editors to downplay or eliminate mention of the dastardly impact factor and replace it with other metrics, mainly a distribution chart of citations, raises some interesting issues. Phil Davis touched on a few of these earlier this week.
For example, does their proposed approach actually defy the impact factor or feed it? Does it perpetuate the tendency among academics to conflate journal impact with paper citations?
The group made its proposal in a paper published on the bioRxiv preprint server.
In the paper, the authors urge:
. . . journal editors and publishers that advertise or display JIFs [Journal Impact Factors] to publish their own distributions using the above method, ideally alongside statements of support for the view that JIFs have little value in the assessment of individuals or individual pieces of work
The putative goal of the proposal is to sweep the impact factor aside and replace it with more fine-grained measurements, notably charts of citation distributions like those seen below (linked from the Nature coverage). These charts are very similar to those included in the preprint paper.
Edward Tufte could use charts like these as bad examples of data display and clarity, as they nicely illustrate that showing distributions can be misleading, too.
For instance, the charts cover two years of citations, but the images make no differentiation between Year 1 and Year 2. Also, the y-axes are not on a common scale: the Nature y-axis ends at 80, while the Science y-axis ends at 70.
It gets worse. Look closely, and you'll see the three graphs aren't even in the same units. The PLoS ONE graph is stated in thousands of papers, so it is actually 175 times as high as the other two charts. But it doesn't look like it. Once you grasp the scale, you can see that more than 12,000 PLoS ONE papers received zero citations, a noteworthy data point, but not one the chart really helps you see. In fact, the three charts look equivalent on a quick visual scan, despite some huge differences.
Rescaling the y-axis like this is a common way to make pretty pages, but it also manipulates perception. Sometimes this is purposeful, done to create either a false sense of urgency or a false sense of equivalence.
The x-axis fares a bit better: it's at least stable across the set, perhaps because we don't respond as much to left-right manipulation as we do to up-down manipulation. But problems remain. Does the PLoS ONE chart really need to go out that far? Moreover, what does "100+" mean? What's the maximum number? Clearly, there are a lot of papers in the 100+ set in two cases (roughly 50-75), yet they are all represented as a wave of data against one rock labeled "100+". And that "100+" area is a really interesting zone that drives a lot of action in the numbers. What prevents showing the entire distribution? What's the optimal extent of the x-axis? Three hundred? One thousand? What would erase the righthand spike?
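To see how much an open-ended terminal bin can hide, consider a small sketch. The long-tailed citation counts below are simulated placeholders, not real journal data; the point is only what happens when everything past 100 gets piled into one bar:

```python
# A toy illustration (invented data) of how a terminal "100+" bin
# collapses a long citation tail into a single spike.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical long-tailed citation counts for 10,000 papers
cites = rng.lognormal(mean=1.5, sigma=1.2, size=10_000).astype(int)

clipped = np.clip(cites, 0, 100)   # everything past 100 piles into one bar
counts, _ = np.histogram(clipped, bins=np.arange(0, 102))
print("papers in the '100+' bin:", counts[-1])
print("actual maximum citation count:", cites.max())
```

The printout makes the point: the single "100+" bar can stand in for values ranging from 100 to several hundred, and the chart gives no hint of that range.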
Given these and many other problems with the proposed graphs, it's clear that showing a distribution is not a simple matter. The charts in a post on this topic earlier this week suffered from a lack of labeled axes, making it hard to determine that the eLife y-axis was approximately three times as high as the PLoS Biology y-axis. You would have to carefully read the data presented alongside the charts to figure out that the two distributions are not equivalent. Given the power of visuals, the chance of creating a strong misperception is high. A revised version of the same charts (courtesy of Phil Davis) is presented below, showing how labeling axes and scaling data consistently can change perception, that is, make the charts more accurate.
Imagine now that a hundred or a thousand different editorial offices are left to their own graphical devices to generate these, and it's easy to envision a baffling array of charts with different and irreconcilable colors, labels, axes, and legends. In fact, in a comment earlier this week, an author on the aforementioned paper encouraged users to "re-plot the data in new and interesting ways." I would rather encourage people to plot the data in uniform and accurate ways, along the lines of the sketch below.
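What might a uniform presentation look like? Here is a minimal sketch, with invented journal names and simulated counts standing in for real citation data, of plotting several distributions on identical, labeled axes so their true scales stay comparable:

```python
# A minimal sketch of uniform citation-distribution charts: shared bins,
# shared y-axis, labeled axes. All data here are simulated placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
journals = {
    "Journal A": rng.poisson(30, 2_000),   # hypothetical selective title
    "Journal B": rng.poisson(25, 2_500),   # hypothetical selective title
    "Journal C": rng.poisson(3, 30_000),   # hypothetical megajournal
}

bins = np.arange(0, 102)                   # one x-axis for every panel
fig, axes = plt.subplots(1, 3, sharey=True, figsize=(12, 3))
for ax, (name, cites) in zip(axes, journals.items()):
    ax.hist(np.clip(cites, 0, 100), bins=bins)
    ax.set_title(name)
    ax.set_xlabel("Citations per paper (2 years)")
axes[0].set_ylabel("Number of papers")     # one shared, labeled y-axis
fig.tight_layout()
plt.show()
```

With a shared y-axis, the megajournal's bars tower over the others instead of being silently rescaled to fit, which is exactly the difference the side-by-side Nature charts obscure.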
The Nature charts also reproduce the impact factor itself (to one decimal place, a potential issue on its own), reinforcing this dreaded measure as the mean of the citations and as a familiar number that authors use because it remains a main currency of publication.
Finally, take the PLoS ONE example (setting aside the scale of the y-axis) and think about it. Most journal citation distributions will look like this, because of common skew and generally low impact factors across the industry compared to behemoths like Science and Nature, suggesting that the hopes for helpful distributions will be frustrated or short-lived. If you were to look at 100 of these distributions, more than 90% would likely look like the bottom one from the Nature set of examples, and how interesting is that?
Because these profiles tend to be rather predictable (a lot of papers below the impact factor, some above, and a couple of huge outliers that really drive the score for high-impact journals), their utility is limited.
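A quick toy calculation (invented figures, not drawn from any actual journal) shows why those outliers matter so much when the impact factor is just the mean:

```python
# Toy example of a skewed citation distribution: the mean (the impact
# factor) is dragged up by a few blockbusters; the median barely moves.
import statistics

citations = [0] * 40 + [1] * 25 + [2] * 15 + [3] * 10 + [5] * 7 \
            + [120, 250, 400]              # three blockbuster papers

mean = sum(citations) / len(citations)     # what a JIF-style number reports
median = statistics.median(citations)      # the typical paper's experience

print(f"mean = {mean:.3f}, median = {median}")  # mean = 8.900, median = 1.0
```

Here three papers out of 100 supply 770 of the 890 total citations; the "impact factor" of 8.9 describes almost none of the papers in the journal.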
More importantly, I believe the authors of the paper in question seriously misread human psychology.
Let’s play this out.
I’m an author. I have a great paper. I want to advance my career. I look at the impact factor. Looks good. I also look at the citation distribution. I see that many papers underachieve — that is, their count of citations is lower than the overall impact factor for the journal.
Is that going to dissuade me?
Hardly. Most authors, their toils coming to fruition, enter the mythical land of Lake Wobegon, which means their paper is now decidedly above average. Each author submitting to a high-end journal is likely to think that her or his paper will end up in the tower at the right of the distribution.
Showing a distribution is unlikely to shift the psychology of authors or tenure committees. In fact, a hard number, even one to three decimal places, is much less prone to distortion than a distribution, which can be squeezed, resized, and subtly given false equivalence with other distributions.
Getting the data for these distributions would add work and expense to a system already dealing with the workload and complexity of supporting many steps and processes. It would also mean paying for data from the commercial entities that have assembled and curated them, namely the recently transacted Thomson Reuters and Elsevier (Scopus). Small journals may not have the means to produce these charts, so this proposal could reinforce industry consolidation and commercial interests. These are not "bad" per se, but they likely represent unintended consequences.
Overall, this proposal isn't very strong. It seems to have key hidden downsides, is unlikely to have the intended effect, and could be expensive and difficult to implement.
But with this many big names on it, and because the topic is always a hot one, the paper will probably be cited a lot.