We like to believe that science is self-correcting, that scientific error — the result of sloppy methodology, miscalculation, or worse, intentional fraud — is eventually detected and expunged from the scientific record. We like to believe that retraction notices reach their intended audience, yet we know that retracted articles are often cited as if they were valid studies for years after retraction.

Why do retraction notices fail to reach readers? What is it about the scientific communication process that allows retracted articles to live a secret life, promulgating inaccurate — and sometimes harmful — information to scientists and the general public? And what, if anything, can be done about it?

Last year, with a grant from CrossRef, I investigated some of these questions in a paper published in the July 2012 issue of the Journal of the Medical Library Association, entitled “The Persistence of Error: A Study of Retracted Articles on the Internet and in Personal Libraries.”

In this study, we documented the location of retracted articles on public websites, outside the control of journal publishers. We also peered into the secret life of retracted articles living in personal libraries by studying records in Mendeley, a popular tool for managing references and sharing papers.

Of the 1,779 retracted articles identified in MEDLINE, published between 1973 and 2010, we were able to locate 321 publicly accessible copies; 95% of these were the publisher’s version and just 4% were final author manuscripts. The most frequent site for access to retracted articles was PubMed Central, which provided public access to 138 (43%) of them. Another 94 (29%) were found in educational domains (on personal, lab, and departmental websites), and just 10 (3%) were located in institutional repositories. Commercial websites hosted 24 (7%), where they were used to promote a particular health product (e.g. a dietary supplement) or medical intervention (e.g. surgery). Just 15 (5%) of these publicly accessible retracted papers included a retraction statement. Separately, 1,340 (75%) of the 1,779 retracted articles were found as records in personal Mendeley libraries, shared by 3.4 users on average.

While readers often benefit from the many informal channels of access to the scientific literature, these newfound sources may come with the cost of promulgating erroneous and sometimes fraudulent information.

Authors who upload copies of their papers to a public website have little incentive to replace them, sometimes years later, with a watermarked “RETRACTED” version. Ditto for articles downloaded and saved in a personal library. And while PubMed will display a retraction notice in the article record, a simple Google search will bypass these notices and take the reader directly to the full-text and PDF versions of the article on PubMed Central, both of which lack the retraction notice.

Earlier versions of retracted papers, living their secret lives as final author manuscripts, persist in institutional repositories years after they were retracted from the journal literature. This happens, in part, because most institutional repositories are not designed to remove, update, or append to what is deposited in them. They are also run by individuals who lack the editorial authority to make such changes. Repositories are run by librarians, not editors.

Stemming the propagation of bad science in a decentralized, multi-version document environment is not easy and it will likely require a number of different solutions targeted at different stages:

  • Discovery: alerting readers that an article has been retracted at the search and retrieval stage, through bibliographic and citation coupling of the article with the retraction notice.
  • Reading: providing status updates for articles with services such as CrossMark.
  • Writing: integrating status lookup functions into reference managers like Mendeley and EndNote.
  • Publishing: detecting retracted references in bibliographies during the manuscript review process.
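The “Writing” lookup above can be sketched as a small status check against Crossref’s public works API, which exposes editorial updates (including retraction notices) that point back at a given DOI. This is a hedged sketch, not a description of any shipping product: the `filter=updates:` parameter and the `update-to` field reflect Crossref’s current REST API, and the set of retraction-type strings is an assumption.

```python
"""Sketch: ask a Crossref-style REST API whether a DOI has a retraction notice.

Assumes the Crossref works endpoint, where editorial updates (retractions,
corrections) can be found by filtering for works that 'update' a given DOI.
Field names are illustrative and should be verified against the live API.
"""
import json
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def find_updates(doi):
    """Return (update_type, notice_doi) pairs for works declaring themselves
    updates to `doi` (e.g. retraction or correction notices)."""
    url = f"{CROSSREF_WORKS}?filter=updates:{doi}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    updates = []
    for item in data.get("message", {}).get("items", []):
        for upd in item.get("update-to", []):
            if upd.get("DOI", "").lower() == doi.lower():
                updates.append((upd.get("type", "unknown"), item.get("DOI")))
    return updates


def is_retracted(updates):
    """True if any update record is a retraction/withdrawal (assumed labels)."""
    return any(t in ("retraction", "retracted", "withdrawal") for t, _ in updates)
```

A reference manager could run such a check in the background and badge any library entry whose DOI resolves to a retraction notice.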

While none of the proposed solutions can independently halt the persistence of error in the literature, taken together they may substantially reduce its effects.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/


25 Thoughts on "The Secret Life of Retracted Articles"

An outstanding and invaluable study. By comparison, can anyone comment on whether the legal information system allows (if at all) for discovery of overturned court cases?

When I worked for Lexis years ago, there were competing systems for discovering the status of cases: AutoCite and Shepard’s. Woe to any attorney who relied on a precedent only to discover that it had been invalidated by a higher court (though usually the checking was left to paralegals). Lexis and Westlaw developed tools to allow one-click validation of a citation, but that was easier than in today’s STM environment because cases issue from a limited number of established jurisdictions. There were some interesting struggles around identifying a newly issued case that had not yet been published (mostly in print back then), so the online legal systems tried to create their own citation structures. There was no parallel to authors posting their work on various public websites.

Awesome piece; that is something I had actually never thought about, and it opened my mind.

“They are also run by individuals who lack the oversight that would allow such changes to take place. Repositories are run by librarians, not editors.” – well, don’t you think that librarians should start acting as editors, at least in the context of IRs?

They do. Phil is displaying his bias against institutional repositories here by implying that their managers (i.e., librarians) welcome all comers with little regard for quality control (of metadata) and no concern for responsible collection management. That is simply not true.

Nice work! This problem illustrates the fact that scientific communication is a complex diffusion process, in which where ideas go is not directly traceable. This has implications for metrics, but that is a different issue. PMC has a lot of money and NLM does a lot of research, many millions worth, so they should be able to fix their part. Have you talked to them? Beyond that, how hard was it to do the tracking down that you did? Would it be feasible to build something like CrossUnRef to notify people that they are posting or citing a retracted copy? This could be fun.

More generally, this is a special case of a common problem in engineering, called configuration management. When you have a project and the design is changed you have to try to see that every relevant project document is changed accordingly. The copier created this problem and the Internet is the world’s biggest copier, by far. Changes chasing documents around the system can be a major source of confusion and error. If it is a big project with many tiers of independent subcontractors, it can reach the scale of your science problem. I met this problem once in a multi-billion dollar weapons program. I call it chaos management. So the CM community might have some useful tools.

Are there ever cases, however, of authors later re-defending the data in retracted articles in even further articles, such that the issue of the retraction becomes blurred? I’ve never heard of this, but it would seem “an author scorned” has sufficient motivation….

I think there should be some kind of central database of retracted papers, keyed by DOI or PMID and connected to all the other literature databases (PubMed, libraries, reference managers, etc.). The system could resolve a paper by its DOI or PMID and warn the user that it has been retracted.
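PubMed already records retraction status in its metadata, so a minimal version of the central lookup described above can be prototyped against the NCBI E-utilities. A sketch only: the esummary JSON shape (the `pubtype` list and the “Retracted Publication” label) reflects the current service and should be verified before relying on it.

```python
"""Sketch: flag retracted papers by PMID using NCBI E-utilities.

PubMed marks retracted papers with the publication type
"Retracted Publication"; esummary (JSON mode) exposes this in a
'pubtype' list on each record. Field names assumed from esummary 2.0.
"""
import json
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def fetch_summary(pmid):
    """Fetch the esummary record for one PMID as a dict."""
    url = f"{ESUMMARY}?db=pubmed&id={pmid}&retmode=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["result"][str(pmid)]


def is_retracted(record):
    """True if the esummary record lists 'Retracted Publication'."""
    return "Retracted Publication" in record.get("pubtype", [])
```

A reference manager or library system could batch PMIDs through such a check and warn users before they open or cite a flagged record.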

I think it would be excellent for a retractions database to be developed. The issue was raised by Richard Van Noorden (at 4:28) in a live webchat on retractions hosted by Nature on 11 October 2011, which included a poll asking ‘Do you think a database of retractions would be useful?’ 96% voted yes (though I don’t know how many people voted).

Ivan Oransky, who was part of the webchat, commented “We’d love to see a retractions database, and if the number of questions we get about whether one exists is any indication, so would a lot of other people.”

Great piece, thanks Phil. I have one other suggestion for your list of solutions – education. Although maybe not exactly a solution as such, I do think it’s important to help readers understand how and why to check whether an article has been retracted…

I agree education of researchers is critical. Our journal is published by Wiley-Blackwell. The W-B website makes it very easy to see any errata, discussions, or replies. (We haven’t had a retraction, but I assume it too would be clear.)

Those who get copies of our articles through “free” channels, legal or otherwise, may be getting what they paid for. A researcher takes a serious risk with their reputation by not checking with the official source.

What about articles whose results are later contradicted although there is no retraction? Ioannidis showed this is common (http://pubmed.gov/16014596)

An example is the HOPE study that was published in the NEJM (http://pubmed.gov/10639539). This study was questioned after a later report detailed the timing of administration of the study medication and measurement of blood pressure (http://pubmed.gov/11751742). UpToDate gives a well-reasoned refutation of the conclusion of the HOPE study (Wikipedia still accepts the HOPE study: http://en.wikipedia.org/wiki/Vascular_surgery).

The current copies of the HOPE trial at the PubMed and NEJM websites give no indication of this controversy and mislead readers. The list of citing articles should help, but it is a list of 2,340 citations arranged only by date. The list is incomplete and does not include http://pubmed.gov/11751742.

Could a system be designed to link a study to current assessments of the study? As wikis improve their content, they may be able to fill this role.

Retraction is a part of the scientific communication network. Although it is qualitatively different, there is a certain operational similarity between retraction and the normal evolution of a scholarly concept. Both involve the overturning of an idea through further work. Granted, a retraction is a rather precipitous reversal!

In Web of Science, Thomson Reuters considers it critical that a retracted article be identifiable as such, but also that the retraction notice be fully incorporated as part of the life history of the article. When we receive a retraction notice, we take several steps to ensure that the retraction is noticed:

  • First, the retraction itself is indexed using the same title and authors as the original article; this ensures that any topic or author search that retrieves the article also retrieves the retraction.
  • Second, we add a parenthetical note, “Retraction of…”, to the title of the retraction notice, along with the bibliographic reference of the original (retracted) article.
  • Third, we create a cited reference in the retraction notice to the original (retracted) paper so that the two are linked in the product; this also means that a cited reference search for the original paper will retrieve the retraction notice as one of the citing articles.
  • Finally, we update the title of the original paper to note “Retracted by…” along with the bibliographic information of the retraction notice, so that any retrieval of the original article immediately identifies that the item has been formally retracted.

By the three most common search methods, if you get information about the article, you get information that it has been retracted. By retaining the original, and all other citing works, we also allow our users to view and thus evaluate the later articles reporting work that might have been based on the retracted item.

We see this as the most effective way to represent the relationship between the retraction and the original work, as well as the relationship between the original work and the literature that either depends on or refutes it.

Hi Phil. This is great work, must have been quite the mission. Is there any data on how often authors try to cite retracted papers? It would be good to know whether they’re available but ignored, or whether people are still actively using them in their research. Maybe production editors would catch these citations when they’re typesetting?

Reading your post also tapped into one of my biggest worries about switching to a world dominated by post-publication peer review: if researchers are still accessing papers that are (to the fullest extent possible) flagged as false and unusable, how can we expect that an equally devastating critique posted on a personal website will be seen by everyone that might cite that paper?

Thanks Tim,
The recent study by Jeff Furman and others (see Can Article Retractions Correct the Scientific Record?) provides a very good analysis of when articles are retracted and how much effect a retraction has on future citation of that article. Their article is exceptional in that they use a control group and also control for the variability in article-to-article citations.

Related to this point, one could imagine that citations to retracted articles could be flagged in the editorial process. Just as CrossCheck is employed by many publishers to check for plagiarism, the reference section could be searched against the CrossMark database.

An author may still have reason to cite a retracted article, but at least everyone would understand the context of the citation. John Budd’s most recent paper illustrates that the vast majority of citations to retracted papers either implicitly or explicitly treat the retracted article as valid. This suggests that there is a problem in the communication chain.

I’m not going to speculate on article retractions in a world dominated by post-publication peer review, only that I hope that those advocating for such a world understand this problem and are thinking of solutions.

In addition to the “discovery, reading, writing, publishing” checkpoints, a check might be added at the “indexing/abstracting/analyzing” stage. Those indexing, abstracting, and analysis services that process reference lists automatically or semi-automatically might be able to add an automated check for retracted papers without too much difficulty and send a notice to the editor of the journal: In your Vol x No y, the article by … “…” pp. …-… cites a retracted article as reference [nn]. Please see the retraction notice at…

This would be a bit like locking the barn door after the horse has bolted, but it might still be a useful final QC check that would help improve the system.
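The reference-list check described above is simple to prototype once a list of retracted identifiers is available. A hypothetical sketch: the function names, data shapes, and notice wording are illustrative, not part of any existing indexing service.

```python
"""Sketch: scan a manuscript's reference list against a set of known
retracted identifiers and draft notice lines for a journal editor.
All names and formats here are hypothetical."""


def flag_retracted_refs(references, retracted_dois):
    """references: list of (ref_index, doi) pairs from a bibliography.
    retracted_dois: set of lowercase DOIs known to be retracted.
    Returns one notice line per retracted reference found."""
    notices = []
    for idx, doi in references:
        if doi.lower() in retracted_dois:
            notices.append(
                f"Reference [{idx}] ({doi}) cites a retracted article; "
                f"please see the retraction notice."
            )
    return notices
```

An indexing service could run this over each processed issue and email the resulting notices to the journal’s editorial office.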

Perhaps, if civil penalties were instituted that would fine authors who fail to make sure that all versions of their articles on the open Web bear retraction notices within, say, 90 days of the retraction occurring, there might be more motivation to make sure this process was handled properly.

This would be administratively difficult, since authors have no control over most of these versions and no way to find all of them. Plus, who would impose these penalties? It would probably discourage retractions.

This analysis is timely and has broad implications. We have recently studied retracted articles in terms of their citation context and how they evolve over time. We found that many retracted articles are highly cited and from the most active research areas, i.e. they may pose a high risk to the rest of the literature, and higher-order citation chains originated from a retracted paper are not readily traceable with available tools. Our work will appear in the Journal of the American Society for Information Science and Technology. A preprint is available here:

I have long been fascinated by visual analytics and this analysis is no exception. However these methods still strike me as a technology in search of a use. The basic problem seems to be that citations do not convey very much information. My research suggests that relatively few citations reflect an actual dependency relation to the cited work. If C cites B and B cites A this is not strong evidence that C is using A’s results.

Thus I think your conclusions may be somewhat overstated. For example, I do not think we are ready for a cadre of regulators to use these methods to supervise the literature, and we may not want that in any case. Your results seem to suggest that citations of retracted articles change from positive to negative, perhaps showing the literature to be self correcting. I think the goal at this point should be to first find a use for these tools.

Thanks for sharing your views on this. As you can see in our study, many of the most highly cited retracted articles have clinical implications. For example, believing Wakefield et al. or not, parents had to decide whether to take their children for vaccines. Believing Nakao et al. or not, physicians had to choose whether to prescribe for their patients accordingly. In these kinds of situations, the point is no longer searching for strong evidence (which may not come in time anyway), but rather assessing the potential risk so that one can minimize irreversible consequences if a wrong turn were unknowingly taken. Your point about the strength of citation alone as evidence is a good one, which is why we took into account citing sentences in the full-text analysis to reveal exactly how retracted articles were cited (see Table 5 in our paper).
We wanted to raise awareness of the potential risk posed by retracted and ought-to-be-retracted articles. Visual analytic tools could help us to verify and assess that risk. We sense that the risk is probably underestimated, as shown, for example, in your comment 🙂

Thank you for responding. Given that we both study the diffusion of scientific ideas (I will not say knowledge in this retraction context) this could be a productive discussion. I would say that I do not doubt the risk but I question, indeed fear, the cure. Once a bad idea goes out there is really no way to call it back.

On the other hand I suppose the regulators might mount an information campaign of some sort, using visual analytics as a kind of radar or something. Is that what you have in mind? By a perverse coincidence I happen to have a background in regulatory design, so I view these issues as engineering problems. My first thought tends to be feasibility. What I fear is something that looks like censorship of the literature.
