Retractions are a hot commodity right now. The ever-useful “Retraction Watch” provides, I think, the most interesting and incisive updates on retractions worth noting. Leave it to journalists to do it right (but, guys, you really should get a URL hosting service to hide your wordpress.com situation). In August alone, there was coverage in the Wall Street Journal, Wired, and numerous blogs. Most of the coverage focused on well-known cases of scientific fraud, and some insinuated a widespread problem.
Academics naturally want to study retractions. Unfortunately, doing so is full of conceptual pitfalls and blind alleys. A recent paper from Fang and Casadevall shows this in spades. Fang is the editor-in-chief of Infection and Immunity, and Casadevall is editor-in-chief of mBio. The paper seems inspired in part by a spate of retractions Infection and Immunity had to issue recently, an experience that clearly stung Fang. Looking out over the literature, the two sought to see whether a correlation existed between journal impact factor and retractions. Using some simple math (and simple is not always safe), they found one.
Yet, somehow I feel that finding a correlation between retractions and impact factor is akin to finding a correlation between syrup and waffles. And it invites the question, Which is there because of the other?
Studies have found that papers continue to be cited at a high rate after retraction. In high-impact journals, there is no reason to believe that these citations don’t contribute their fair share to the impact factor. After all, an infamous paper may be more readily cited because it’s top of mind for a busy author. Retraction notification continues to be an imperfect process, especially since citations are often lifted from abstracts, old review papers, and the like. Citation is not as strict a discipline as we’d like to think.
So, the idea of a retraction index is fraught from the start. But, to their credit, these authors pressed on.
To create an index, Fang and Casadevall took the number of retractions between 2001 and 2010 and the number of PubMed papers with abstracts for those same years, divided the former by the latter, and multiplied the result by 1,000. This generated their “retraction index.” Plotting this against impact factor, they were able to arrive at a significant p value and draw a straight northeasterly line from left to right. That said, even with their convenience sample, a smooth-fit line would have done loops trying to tie their data points together.
Now, like me, you may be wondering why the authors multiplied a value by 1,000. It’s because, even under the magnifying glass of alarmism, the number of retractions in the literature is small (about 0.02% of all papers). So Fang and Casadevall had to multiply their 10 years’ worth of results by 1,000 in order to get a whole number for their index. As Fang wrote me, “The reason for multiplying by 1000 was merely to have a scale of 0 to 4 instead of 0.000 to 0.004.” And remember, that’s amplifying 10 years of data.
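For concreteness, here is a minimal sketch of that arithmetic in Python, using invented counts rather than anything from the paper:

```python
# Minimal sketch of the retraction index arithmetic described above.
# The counts below are invented for illustration, not taken from the paper.
def retraction_index(retractions, papers_with_abstracts):
    """Retractions per 1,000 PubMed papers with abstracts over the decade."""
    return 1000 * retractions / papers_with_abstracts

# A journal with 20 retractions against 50,000 papers over 10 years:
# the raw ratio is 0.0004, and the x1,000 scaling turns that into 0.4.
print(retraction_index(20, 50_000))  # 0.4
```

The multiplier does no analytical work; it simply moves the decimal point so the numbers are readable.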
It’s important to remember how small the retraction problem is, and how powerful tools, like plagiarism-detection software and image-manipulation detection practices, are emerging to track down even these few frauds, hacks, and cranks.
Just because the documented problem, as represented in the available literature, is small doesn’t mean the overall problem is small. This is one of the infamous problems with retractions: we only know about what someone has chased down and exposed in journals that were strict enough to go all the way to retraction. There may not be many false positives, but there are surely many false negatives.
Fang and Casadevall chose to include in their convenience sample journals that are all pretty well-known, with only a few stragglers. This creates a potentially misleading sample, especially if you’re positing a new index. So, while Fang and Casadevall note a gross increase in retractions across the literature between 2001 and 2010, a large portion of these retractions (105, to be exact) were published in a small crystallography journal (impact factor, 0.41) between 2006 and 2010, according to Inside Higher Ed. This journal was not part of their sample, and including it would likely have mangled their statistical analysis.
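To make the straggler point concrete, here is a rough sketch, using entirely hypothetical numbers rather than Fang and Casadevall’s data, of how a single low-impact, retraction-heavy journal can wreck an otherwise tidy correlation between impact factor and retraction index:

```python
# Hypothetical illustration only: one low-impact, retraction-heavy journal
# can upend a correlation between impact factor and retraction index.
import numpy as np

# Made-up (impact factor, retraction index) pairs with a clean positive trend.
impact_factor = np.array([2.0, 5.0, 10.0, 15.0, 30.0, 50.0])
retraction_index = np.array([0.2, 0.5, 0.9, 1.4, 2.5, 4.0])
r_before = np.corrcoef(impact_factor, retraction_index)[0, 1]

# Add one outlier loosely resembling the crystallography journal mentioned
# above: very low impact factor, very high retraction index (value invented).
impact_factor_out = np.append(impact_factor, 0.41)
retraction_index_out = np.append(retraction_index, 8.0)
r_after = np.corrcoef(impact_factor_out, retraction_index_out)[0, 1]

print(f"correlation without the outlier: {r_before:.2f}")
print(f"correlation with the outlier:    {r_after:.2f}")
```

With these invented numbers, the correlation falls from roughly 0.99 to about 0.1, which is the sense in which one straggler could have mangled the analysis.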
A major problem with indices like the one proposed is that it’s difficult to know what they mean.
Because of the inherent ambiguity of retraction tracking, one storyline you could accept is this: high-impact journals are swinging for the fences more often than others, and the risk:reward equation catches up with them; the effects of powerful brands are hard to control, and just as these brands can amplify the effects of a good paper, they can mute the effects of corrections and, yes, even retractions; and citation isn’t preordained as a positive mental debt, since in many cases citations are cast into the literature as road signs of shame, so a retracted paper may still be cited as a good bad example or as the infamous case, the noteworthy red herring.
Another storyline you could construct is this: high-impact journals are more rigorous at all levels, including ferreting out errors, fraud, and plagiarism even after publication; this is a sign of their well-deserved high status, so a correlation with impact factor is actually the other side of their rigor and reputation; therefore, a high retraction index is a sign of health and vitality for a journal, and the concern could instead be about those journals with a low retraction index.
It’s always tempting to get on your high horse about retractions, but in most cases, they are due to simple error. Only a small percentage of the total are due to fraud or scientific misconduct.
So, we have a problem that has to accumulate over 10 years and then, after the total is divided by the number of articles in play, be amplified by a factor of 1,000 before we even get to whole numbers. Even then, only a small percentage of these retractions are because of fraud.
Fang and Casadevall write optimistically at the end of their paper:
. . . retractions have tremendous value. They signify that science corrects its mistakes.
Compare that with the more discouraging conclusion of another recent study, this one examining what happens to papers after published rebuttals:

For those convinced that science is self-correcting, and progresses in a forward direction over time, we offer only discouragement. We had anticipated that as time passed, citations of the original articles would become more negative, and these articles would be less cited than other articles published in the same journal and year. In fact, support for the original articles remained undiminished over time and perhaps even increased, and we found no evidence of a decline in citations for any of the original articles following publication of the rebuttals.
Rebuttals and retractions are very different animals, so the comparison is tangential at best. But it’s worth noting that there is another level of dispute in science, and it seems ineffective. And since retraction isn’t enough to halt citation in its tracks, is optimism really justified? Or do we need to do more?
Discussion
The concept of a retraction index based on correlation analysis is also problematic from a statistical standpoint: retraction data are not independent observations.

When an author is suspected of misconduct, a reputable journal may undertake an investigation to determine whether other articles the author has published in the past suffer from the same problem. Institutions do the same thing, at least those that take research seriously and have the means and will to organize a proper investigation.
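As a rough illustration of that non-independence point (a sketch with invented parameters, not data from any journal): if investigations retract papers in batches, journal-level retraction counts fluctuate far more than they would if each retraction were an independent event, which is what a simple correlation-and-p-value analysis quietly assumes.

```python
# Hypothetical simulation: clustered retractions (one investigation sweeping up
# several papers at once) yield much noisier journal-level counts than
# independent, one-at-a-time retractions with the same expected total.
import random

random.seed(0)

N_PAPERS = 50_000          # papers a hypothetical journal publishes in a decade
RETRACTION_RATE = 0.0004   # invented per-paper retraction probability
CLUSTER_SIZE = 5           # papers retracted per misconduct investigation

def independent_count():
    """Each paper is retracted independently of every other paper."""
    return sum(random.random() < RETRACTION_RATE for _ in range(N_PAPERS))

def clustered_count():
    """Same expected total, but retractions arrive in clusters of CLUSTER_SIZE."""
    n_clusters = N_PAPERS // CLUSTER_SIZE
    return CLUSTER_SIZE * sum(
        random.random() < RETRACTION_RATE for _ in range(n_clusters)
    )

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

independent_runs = [independent_count() for _ in range(200)]
clustered_runs = [clustered_count() for _ in range(200)]

print("variance, independent:", variance(independent_runs))
print("variance, clustered:  ", variance(clustered_runs))  # ~CLUSTER_SIZE times larger
```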
It is much more difficult, however, for journals, institutions, and governments that lack proper oversight and the means and will to investigate misconduct to undertake an investigation. In the past, I have brought cases of duplicate publication to editors who lacked the integrity to take action; these were not professional editors, I should note, but academics.

Publishers who attempt to do the right thing and retract an article may also be quick to reverse their decision when faced with a possible lawsuit; Emerald is a case in point.
Yet, as you rightly note, the simple correlation analysis implies a causal link between citation impact and retraction; in reality, the relationship is much more complex.
Framed differently, I’d be much more willing to call their metric a “Scientific Integrity Index” or SII.
I wonder how much of the continued citation of retracted papers is due to the persistent reliance on the PDF file as the standard format for research papers. Despite growing connectivity levels, there remains a strong cohort (if not the majority in some fields) who deal with the literature by downloading and storing the PDF, and even, gasp, printing it out on paper for reading. For these people, the paper is a static object, and they miss any new developments such as the linking of a retraction or a rebuttal/commentary.
Eventually the solution will be either an increase in those who access the literature online for reading purposes, or the development of new file formats that are dynamic enough to reach out to a server and self-update.
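One way to picture that self-updating behavior is sketched below. Everything in it, the service URL, the endpoint, and the response fields, is hypothetical rather than a real API; the point is only that a reader or reference manager could ask a retraction-status service about each stored DOI and flag anything that has changed since download.

```python
# Hypothetical sketch of a "self-updating" document check. The service URL and
# the response format are invented for illustration; no real API is implied.
import json
import urllib.request

STATUS_SERVICE = "https://retraction-status.example.org/api/works"  # hypothetical

def check_status(doi: str) -> str:
    """Ask the (hypothetical) service whether the article behind a DOI has been
    retracted or corrected; returns e.g. 'ok', 'corrected', or 'retracted'."""
    with urllib.request.urlopen(f"{STATUS_SERVICE}/{doi}") as response:
        record = json.load(response)
    return record.get("status", "unknown")

if __name__ == "__main__":
    # Placeholder DOI; with a real service, a reference manager could sweep its
    # whole library of stored PDFs and flag anything whose status has changed.
    try:
        print(check_status("10.1234/example.5678"))
    except OSError as err:  # the example endpoint above does not exist
        print("status service unreachable:", err)
```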