Fighting Citation Pollution - The Challenge of Detecting Fraudulent Journals in Works Cited

Editor’s Note: Today’s post is by Lisa Janicke Hinchliffe and Michael Clarke.

As citations to articles in fraudulent journals increasingly appear in article manuscripts, vexing reviewers and editors alike, the scholarly communications community needs to develop an automated shared service to assess works cited efficiently and ensure that authors are not inadvertently polluting the scholarly record.

Caeretan Hydria, Digital image courtesy of the Getty’s Open Content Program.

CrossRef’s recent decision to rescind the membership of OMICS brings the issue of citation pollution into sharp relief. The decision comes in the wake of a $50 million fine levied against the publisher by the US Federal Trade Commission in a summary judgement earlier this year. Now CrossRef is freezing OMICS out of its ecosystem. While DOIs already deposited will remain active, OMICS will no longer be able to deposit DOIs via CrossRef.

CrossRef is not the only organization to grapple with this issue. The Scientist reported in May on growing concerns among researchers about papers from fraudulent publishers finding their way into PubMed via PubMedCentral. Once in PubMed, the papers appear just like any other paper and can easily be found and cited by researchers.

While the extent of the fraudulent and deceptive journal publishing practices in scholarly publishing is not fully known, it is perceived as a substantial and growing problem. There are, for example, over 10,000 journals on the Cabell’s blacklist. (Let’s pause to let that number sink in: over 10,000 journals.) While some of what is published in these 10,000-plus journals is undoubtedly methodologically sound scholarship (an inference based simply on the volume of papers we are talking about), other articles are at best questionable science and at worst outright fraud. Separating the methodologically sound from the fraudulent would be a Herculean challenge (analogies to vanquishing the Lernaean Hydra or cleaning the Augean stables seem apropos), so what are citing researchers, and the legitimate journals they publish in, to do?

Authors and editors who wish to avoid giving citations to fraudulent publications are in the position of having to track which journals engage in fraudulent practices. This is difficult due to the sheer number of such journals and the fact that many fraudulent journal titles are deliberately chosen to closely mirror those of legitimate publications. While manual checks by authors and copyeditors against whitelists and blacklists are possible, such approaches are time-consuming and costly. Further, copyediting practices vary widely among publishers and even among journals at the same publisher. While some journals closely review citations, others simply edit details to conform with the journal’s style format.

Anyone who seriously considers this challenge will see a clear need for a scalable, easily adopted, industry-wide approach to the problem of citations to articles in fraudulent journals appearing in author manuscripts.

We suggest that what could meet this need is a “meta journal look-up service” that could be accessed via API by the production systems and editing tools used by publishers. In reference to the labors of the ancient Greek hero, we propose calling such a system “HYDRA” for High-frequencY Fraud Detection Reference Application.

HYDRA could work as follows. A manuscript would be submitted to a publisher’s production or copy editing system. As part of that ingest process, the list of cited journals would be sent to HYDRA in the form of an API query. HYDRA would then return a list of the whitelists on which each cited journal appears. So, for each citation in a manuscript, HYDRA would return something like “Journal X is indexed by Web of Science, Journal Citation Reports, Scopus, DOAJ, MEDLINE” and so on. It could include subject lists as well, e.g., EconLit, PsycINFO, MLA, GeoRef, Inspec, and so forth. HYDRA could further allow publishers to maintain their own whitelists that would be incorporated into query results; this might include regional titles and niche publications that do not appear on other whitelists. Such a look-up process could also bring back which blacklists a cited journal appears on. By querying multiple lists, HYDRA would avoid over-reliance on a single authority and allow for a more nuanced assessment of a given journal title.
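To make the shape of such a query and response concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the list names ("Whitelist A" and so on), the journal titles, and the `lookup` function are invented placeholders, and the in-memory sets stand in for what the proposal describes as external lists reached by API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for external lists. The list names and journal
# titles are invented placeholders, not real data from any index.
WHITELISTS = {
    "Whitelist A": {"journal of example studies", "annals of placeholder science"},
    "Whitelist B": {"journal of example studies"},
}
BLACKLISTS = {
    "Blacklist C": {"international journal of everything"},
}


@dataclass
class LookupResult:
    """Which lists a cited journal appears on, and nothing more."""
    journal: str
    whitelists: list = field(default_factory=list)
    blacklists: list = field(default_factory=list)


def lookup(journal: str) -> LookupResult:
    """Report list membership for one cited journal title."""
    key = journal.strip().lower()
    return LookupResult(
        journal=journal,
        whitelists=sorted(n for n, titles in WHITELISTS.items() if key in titles),
        blacklists=sorted(n for n, titles in BLACKLISTS.items() if key in titles),
    )


def lookup_all(cited_journals):
    """Answer a whole works-cited list in one call."""
    return [lookup(j) for j in cited_journals]


if __name__ == "__main__":
    for result in lookup_all(["Journal of Example Studies", "Unknown Quarterly"]):
        print(result)
```

The exact-match comparison on a lowercased title is the simplest possible choice; a working service would need something more forgiving, since journal titles are cited in many variant forms.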

If a journal does not appear on any whitelists — or if it appears on any blacklists — a query to the author could be automatically generated (as a component of the author submission or proof review process) asking the author to justify the citation. Journals might adopt a simple editorial policy: If a reference is not included on certain whitelists (which might vary by journal and might include publisher-maintained regional lists), then authors must justify the citation to the satisfaction of the editor. For example, in writing about fraudulent publications, it may be necessary to cite them!
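The editorial policy described above reduces to a small rule, sketched here in Python. The function names, the tuple layout of `results`, and the wording of the query are all hypothetical; each journal would choose its own set of accepted whitelists.

```python
def needs_author_query(whitelists, blacklists, accepted_whitelists):
    """Return True when the author should be asked to justify a citation.

    A query is raised if the journal is on any blacklist, or if it is on
    none of the whitelists this particular journal has chosen to accept.
    """
    on_accepted = any(name in accepted_whitelists for name in whitelists)
    return bool(blacklists) or not on_accepted


def author_queries(results, accepted_whitelists):
    """Build query text for every citation that trips the policy.

    `results` is an iterable of (journal, whitelists, blacklists) tuples,
    an assumed format for this sketch.
    """
    queries = []
    for journal, whitelists, blacklists in results:
        if needs_author_query(whitelists, blacklists, accepted_whitelists):
            queries.append(f"Please justify the citation to '{journal}'.")
    return queries
```

Note that the rule only raises a question. Whether the citation stays, as it must when the manuscript is itself about fraudulent publications, remains the editor's decision.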

As HYDRA would be providing a simple look-up service, it could be embedded into any number of tools and applications in the scholarly workflow. This might include authoring tools and manuscript submission systems, for example. HYDRA might also have a simple web look-up that anyone could use. This might even be used by authors to validate that a journal they are considering submitting an article to is on well-known whitelists or to find out if it is on any blacklists.

This approach would not require too much in the way of new infrastructure or the creation of new lists. It would require, however, that the various whitelists allow HYDRA to make an API call, for free or through some sort of license model, and return a validation that a given journal is on a list (or that it is not). HYDRA would therefore not store any information from any whitelists — it would simply act as a kind of switchboard. It would be, in other words, a look-up, not a meta-index. And the look-up need not contain any additional information from the lists — only the fact that the journal appears on them (or does not). This enables any subscription-based whitelists/blacklists to preserve much of the value of their products while contributing data to HYDRA, which in a way serves as marketing for the fuller data and services of the subscription products.
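The switchboard idea can be sketched as follows, again in Python and again under stated assumptions: each list is modeled as a function that answers only yes or no for a journal title, `make_static_provider` is a hypothetical stand-in for what would really be a call to a list owner's API, and the list names are placeholders. The switchboard keeps no copy of any list; it forwards the question and collects the answers.

```python
from typing import Callable, Dict, Optional

# A provider answers one question only: is this journal on my list?
ListCheck = Callable[[str], bool]


def make_static_provider(titles) -> ListCheck:
    """Stand-in for a remote list; a real provider would be an API call."""
    normalized = {t.strip().lower() for t in titles}
    return lambda journal: journal.strip().lower() in normalized


def switchboard(journal: str, providers: Dict[str, ListCheck]) -> Dict[str, Optional[bool]]:
    """Ask every provider about one journal and return only yes/no answers.

    None means the provider could not be reached, which is deliberately
    kept distinct from False ("not on the list").
    """
    answers: Dict[str, Optional[bool]] = {}
    for name, check in providers.items():
        try:
            answers[name] = bool(check(journal))
        except Exception:
            answers[name] = None
    return answers
```

Separating "unknown" from "absent" matters for the editorial policy: an outage at one list owner should not by itself generate a query to the author.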

The development and industry-wide adoption of a service like HYDRA could go a long way toward keeping citations to articles in fraudulent journals from polluting the scholarly record. It would also go a long way toward educating authors, editors, and others about the problem. The simplicity of the service makes it easy to adopt both technologically and socially. The costs of developing and maintaining such a service should be minimal and could be supported via a modest fee for API access (website look-up, a more manual process provided for the individual author or very small publisher, would ideally be free).

This idea is ripe for a collaborative development approach, perhaps undertaken by an existing infrastructure organization. We offer this idea with the acknowledgment that it is not fully detailed (e.g., how to handle citations to sources other than articles, whether it should be extended to flag retractions, etc.). We hope that it will inspire conversation and, perhaps, action.

***

Note: We wish to acknowledge that the idea for HYDRA was born in response to a post from Margaret Winker on the OSI listserv that asked: “Authors cite research published in what may be predatory journals. Should a journal refuse to allow the citation(s)? And if so, what does that look like?” Though the full extent to which citations to articles in fraudulent journals are entering the scholarly record is not well documented, the OSI discussion revealed that this is a problem of great concern for journal editors and publishers that has eluded easy resolution through manual review processes.

Lisa Janicke Hinchliffe

@lisalibrarian

Lisa Janicke Hinchliffe is Professor/Coordinator for Research Professional Development in the University Library and affiliate faculty in the School of Information Sciences, European Union Center, and Center for Global Studies at the University of Illinois at Urbana-Champaign. lisahinchliffe.com

Michael Clarke

@mtclarke

Michael Clarke is the Managing Partner at Clarke & Esposito, a boutique consulting firm focused on strategic issues related to professional and academic publishing and information services.

Discussion

34 Thoughts on "Fighting Citation Pollution — The Challenge of Detecting Fraudulent Journals in Works Cited"

Is this an elevator pitch? 😉
If yes, it’s a winning pitch already. All you need now is a funder who is fine with offering this service for free (in the spirit of open access).

By Vogel G
Sep 25, 2019, 6:30 AM

Doing this post acceptance is not the right place to do it. If an author is heavily relying on “questionable” sources, the entire study comes into question. This is better dealt with at the revision stage, or even submission.

This is a new journal killer. Takes three years to get considered for Scopus. Web of Science can take decades for niche topic journals. If they aren’t OA or medical by nature, you aren’t in DOAJ or PubMed. So how do you implement something like this without killing citations to new journals or ultra-niche journals? If we start asking authors to justify using new sources from reputable publishers, they will see it as yet another hurdle and not bother citing it unless they absolutely have to. Not getting cited is the number one reason for not getting indexed.

Sadly this is another situation where we need a solution for the fraudsters and that solution unfairly penalizes good actors. Also, I think this tool already exists from Scite.ai.

By Angela Cochran
Sep 25, 2019, 8:37 AM

No publisher has to check any particular category of journals with this tool. Can be as broad or narrow as one likes. And it could be done at manuscript submission or post acceptance. Perhaps it needs to be done at both given how pieces can change through the reviewing process. Nonetheless, the main point to take away here is this work is currently being done manually and editors are looking for an automated tool to assist them in this work. It isn’t a question of whether this work gets done but how.

FYI Scite.ai said on Twitter today that they don’t have all the functionality for this. See: https://twitter.com/joshmnicholson/status/1176832869995634689?s=19

By Lisa Janicke Hinchliffe
Sep 25, 2019, 8:59 AM

I agree that whitelists are problematic here, as they are often incomplete — just yesterday it was noted that a huge number of Latin American journals are not listed in any of the major indices (https://twitter.com/OccupySTM/status/1176510857175547906). I do think, however, there’s value in a tool that would flag any citations to journals on a reputable blacklist. It doesn’t necessarily mean removal of that citation, but would allow an editor to give it a closer look or query an author. I had proposed this to Cabell’s many years ago (mentioned in a TSK comment from back in 2017 https://scholarlykitchen.sspnet.org/2017/07/25/cabells-new-predatory-journal-blacklist-review/#comment-70007). To me this would be an ideal business model for their blacklist — make the list freely available to all, and then sell the tool that makes it easy to apply.

By David Crotty
Sep 25, 2019, 9:15 AM

A publisher can add a ping to the database of all Latin American journals if it wants. We mention this in the piece.

“HYDRA could further allow publishers to maintain their own whitelists that would be incorporated into query results; this might include regional titles and niche publications that do not appear on other whitelists.”

Nothing in the design of this restricts the whitelist sources to WOS and Scopus.

By Lisa Janicke Hinchliffe
Sep 25, 2019, 9:30 AM

If editors are constantly having to discover and evaluate potentially enormous indices of journals and papers, then that removes a lot of efficiency from the system and leaves the editors back to doing things manually.

By David Crotty
Sep 25, 2019, 11:49 AM

That’s the beauty of a shared tool… just like libraries are able to enable access through a shared service like SFX to provide access to their patrons to those things that they subscribe to… HYDRA’s knowledgebase could alert editors to collections of journals that they themselves are not aware of and might accidentally discount due to lack of familiarity rather than a quality assessment.

By Lisa Hinchliffe
Sep 25, 2019, 12:47 PM

You mean like give the razor for free but charge for the blades! I think the problem one encounters is that the market is just not big enough to support the work behind building, marketing and administrating such a system. Perhaps I am wrong.

By harvey kane
Sep 25, 2019, 10:52 AM

The work is already being done (the blacklist exists). I think there’s more business in doing something like iThenticate, charging per paper run through the system, than in selling access to the list itself.

By David Crotty
Sep 25, 2019, 11:47 AM

I think your first point is really important and want to bring it out for further discussion. The underlying premise of this post is that reputable scholars can’t be trusted to only use and cite reputable scholarship. The fact that they used bad sources is surely of much greater concern to the future of legitimate scholarship than the citation “pollution” problem? Or is there a subtext here where everyone silently admits that scholars are filling their Works Cited lists with works that look good but weren’t really needed to include in their own article? Also given the point made by others that exactly because of the difficulty especially for younger scholars in detecting predatory journals when choosing a journal to submit to, we’re seeing an increase in legitimate scholarship being published in predatory journals, I have to question whether we should be second-guessing any given scholar’s decision to cite any given particular article, regardless of the reputation of the journal it was published in? We may well be moving into a time where the journal is no longer the appropriate unit to attach reputation to. Those for whom pre-prints are really important sources have already moved beyond that anyway, but I know that varies heavily by discipline.

By Melissa Belvadi
Sep 25, 2019, 11:47 AM

Um, isn’t the role of the editor and reviewers precisely to “second guess” … i.e., scrutinize the work to determine when it is ready for publication and industry in efforts until the quality bar is reached/surpassed? I don’t think anyone has to judge the ethics or morals of an author who is citing something to say “these don’t appear to be appropriate by our standards would you revise or justify?”

By Lisa Hinchliffe
Sep 25, 2019, 12:44 PM

Thanks for this post, Lisa and Michael.

You might be interested to know that I presented some preliminary research on citations to predatory journals at this year’s World Conference on Research Integrity. I need to publish my findings and haven’t done so yet–but one of the more disturbing data points was that of the seven indisputably predatory journals I examined (identified as such because they published nonsense articles that were used in “sting” operations), one of them had had 36% of its published articles cited in the mainstream scientific literature, and another had 25% of its articles cited. A third had only 6% of its articles cited–but it’s a relatively prodigious journal and its lower percentage actually represented the largest absolute number of citations.

By Rick Anderson
Sep 25, 2019, 9:50 AM

Very interesting indeed. Perhaps unsurprising then that editors are concerned that these things be checked for and that, having started to do so manually, they’d like some tools to help make the process at least more efficient if not also more effective.

By Lisa Janicke Hinchliffe
Sep 25, 2019, 10:07 AM

Hi Rick–in a similar vein, but slightly different viewpoint, we found that 7 predatory nursing journals generated 814 citations in 141 non-predatory nursing journals:

Oermann, M. H., Nicoll, L. H., Carter-Templeton, H., Woodward, A., Kidayi, P. L., Neal, L. B., … Amarasekara, S. (2019). Citations of Articles in Predatory Nursing Journals. Nursing Outlook. https://doi.org/10.1016/j.outlook.2019.05.001

By Leslie Nicoll
Sep 27, 2019, 1:58 PM

Can you give us an idea of those 814 citations, how many different actual articles that represented? And if it’s not too many to analyze, maybe you can look at them to see if they are actually good quality research, despite the quality of their parent journal? I tried reading your article but it wasn’t clear to me with all the article counts how many of those were total uses rather than distinct article counts.

By Melissa Belvadi
Sep 27, 2019, 2:05 PM

This is truly a “back of the envelope” calculation, but I came up with:
*814 citations, of which 667 were unique
There were 90 articles that were cited twice; 32 articles were cited three times; 17 articles cited four times; 5 articles cited five times; 6 articles cited two times and 1 article that was cited 8 times; total: 147.
We did not read the individual articles to assess their quality. However, in a prior study where we *did* read the articles, we reviewed 358 articles and of those, 96.3% were rated average or poor.
Oermann, M. H., Nicoll, L. H., Chinn, P. L., Ashton, K. S., Conklin, J. L., Edie, A. H., … Williams, B. L. (2018). Quality of articles published in predatory nursing journals. Nursing Outlook, 66(1), 4–10. https://doi.org/10.1016/j.outlook.2017.05.005

By Leslie Nicoll
Sep 27, 2019, 3:25 PM

Citations are to papers, not journals. While I agree that some journals (and publishers) have questionable peer review and publication practices (viz. OMICS), some papers published within reputable journals are fraudulent, and vice versa. Moving from a granular system of detecting and marking the fraudulent literature (through individual paper correction and retraction) to a basket model appears to be moving us in the wrong direction.

Moreover, authors are known to be poor citers of the literature, not even getting the journal title correct. According to Stephen Hubbard and Marie McVeigh of Clarivate Analytics, “the average [reference] entry for a covered journal contains 10 variants, but approximately 1% of preferred titles require more than 50 variants.” [http://dx.doi.org/10.1001/jama.2009.1301]. In practice, your automated HYDRA lookup will fail to match references to white or blacklists, generating automated messages for authors to justify many of their sources (at best), or failing to tag known fraudulent papers and journals.

I agree that the scholarly record is not entirely clean. However, we have safeguards that are designed to, at least, catch and deal with the largest polluters (viz. OMICS). Clarivate routinely suppresses titles from its Journal Citation Reports (JCR) that engage in citation self-dealing and cartels, and Scopus also has oversight on what they index. While these tools are not in themselves perfect, they do help keep the skies relatively clear, trustworthy, and breathable. The unintended consequences of implementing a system-wide detection model based on lists of questionable authority and accuracy make me worried that the solution you propose may be far worse than the problem.

By Phil Davis
Sep 25, 2019, 10:09 AM

The system tags and flags … what an editor does with that info is a different issue. If we trust editors to accept/reject manuscripts, surely we can trust them with some metadata on the journals being cited? Especially since editors are reporting already looking for this info … just inefficiently manually?

By Lisa Hinchliffe
Sep 25, 2019, 10:33 AM

Question: Do you accept personal communications, newspaper items, books or video clips as legitimate references in a peer reviewed journal article?

By Bernhard Mittermaier
Sep 25, 2019, 11:43 AM

Or datasets or software code for that matter.

By David Crotty
Sep 25, 2019, 11:46 AM

This whole idea strikes me as problematic because:
— it implies that authors either don’t read the papers they cite or can’t comprehend and evaluate the papers they read;
— querying authors re. suspicious citations is redundant considering that (1) the justification for citing an item is usually given in the manuscript itself, and (2) if the manuscript’s findings rely on its reference list, it’s not really methodologically sound itself (barring systematic reviews and such to a degree of course);
— it reinforces conflation of journal reputation and article quality;
— it opens up new possibilities for abuse and citation manipulation depending on how editors handle “unjustified” citations to predatory publications

By sasa marcan
Sep 25, 2019, 11:46 AM

it implies that authors either don’t read the papers they cite or can’t comprehend and evaluate the papers they read

By this logic, we shouldn’t subject papers to peer review and editorial intervention at all. After all, doesn’t doing so just imply that authors are dishonest or incompetent?

it reinforces conflation of journal reputation and article quality

Since predatory journals are specifically in the business of publishing anything submitted to them, regardless of whether it’s literal gibberish, it seems to me that the link between journal reputation and article quality is maybe not so unreasonable–at least where predatory journals are concerned. And those are the journals at issue here, not just journals with less-than-stellar reputations.

By Rick Anderson
Sep 25, 2019, 2:44 PM

https://arxiv.org/abs/cond-mat/0212043
“We report a method of estimating what percentage of people who cited a paper had actually read it. The method is based on a stochastic modeling of the citation process that explains empirical studies of misprint distributions in citations (which we show follows a Zipf law). Our estimate is only about 20% of citers read the original.”

By David Crotty
Sep 25, 2019, 3:04 PM

This one too might be of interest:
https://arxiv.org/pdf/physics/0504094.pdf

By David Crotty
Sep 25, 2019, 3:07 PM

This is a really nice study, but there are a couple of problems with it. Firstly, the analysis (for necessary methodological reasons) looks at “celebrated papers” – the main paper in the study has 4300 citations. The results may not extrapolate easily to more “normal” papers. For example, the “celebrated” paper may embody a well-known result in the field which the citing author didn’t need to review before conducting their own study, or it may be a standard reference for a well-used method. The citing author may well have read the paper, but not recently. It is reasonable to hypothesise that less well-known papers directly relevant to the study in hand might receive closer scrutiny.
Secondly, Simkin & Roychowdhury’s paper was published in 2002. When the citations in the study were made, accessing the cited papers almost certainly involved a lot more organisation and effort than it would nowadays.
It is therefore possible that the true figure applicable to citing authors today might be a bit higher than 20%.

By Dave Jago
Oct 23, 2019, 6:11 AM

We might begin with organizations already affiliated with what we consider as “white list” organizations, such as Publons, Unpaywall and librarians’ favorite, Ulrichs.

By Ruth A Pagell
Sep 25, 2019, 1:01 PM

A few bibliographic items:

1.
Regarding an important question in this context: how to define a predatory journal?
https://www.ncbi.nlm.nih.gov/pubmed/30135732.2
“What is a predatory journal? A scoping review.”

2.
Re. whether authors even read what they cite, one take is this:
“Read before you cite!”
M.V. Simkin, V.P. Roychowdhury (UCLA)
https://arxiv.org/abs/cond-mat/0212043

By Brian Simboli
Sep 25, 2019, 3:05 PM

Shifting from peer reviewed journals to preprints: perhaps a feature of the preprint server of the future would be a capability to identify citations to predatory journals (appropriately defined) within a given preprint. Caveat lector with respect to anything published anywhere, even peer-reviewed material, but at least this might help provide one more quality check for an arena (preprints) that is starting to burgeon.

By Brian Simboli
Sep 25, 2019, 3:17 PM

What good is a Do Not Cite List, if the blacklist on which it depends is only knowable to a tiny fraction of authors? I understand that Cabell’s can’t give away their product for free, but they need to find some model that makes the blacklist more broadly available. Some sort of freemium offering or, as David Crotty suggested, an iThenticate type scanning service to let authors or editors scan for dubious journals. The problem just seems to be getting worse with a blurring between legitimate and vanity journals. Rick Anderson mentioned a stinger journal with 36% of its papers cited in mainstream journals – that to me is a problem. I did some poking around on the OMICS website a while back and easily found articles from prestigious universities or government science groups: MIT, University of Saskatchewan, University of California campuses; Colorado State University, University of Denver (not colleagues of Beall, presumably) U.S. Geological Survey, U.S. Environmental Protection Agency, U.S. Center for Disease Control, …

Publishing one’s science in a vanity or predatory journal doesn’t mean that the science is less valid than in a preprint, but with a preprint, the reader knows what they’re getting. There was a criticism from Sasa Marcan of conflating article quality with journal quality. I think there’s at least a correlation between the two (admitting that no one has yet figured out a good way to measure either).

We still need a better term for these vanity, fraudulent, predatory, masquerading, mimicry, luring, seductive, or just generally dodgy open-access journals. I don’t think fraudulent is it. Not sure one term can fit all. Sometimes it truly is a predatory journal luring a naive researcher with a slick website and impressive title, and other times it’s akin to the vanity book presses, where the researcher has to well know they’re paying for an easy publication.

By Chris Mebane
Sep 25, 2019, 5:34 PM

Unfortunately, the article about Crossref dropping OMICS is behind a paywall, and it is not easy to find anything else about this interesting fact. There is one item from Crossref’s July 2019 Board meeting (https://www.crossref.org/board-and-governance/), which says: “10. To ratify the account termination, for cause, of OMICS Publishing Group (Member ID 2674); Ashdin Publishing (Member ID 2853); Scitechnol Biosoft Pvt. Ltd. (Member ID 9225); and Herbert Publications PVT LTD (Member ID 4912).” It is not really clear, though, what happened and why.

By Gabor Schubert
Sep 26, 2019, 5:23 AM

Here’s one way something along these lines could be implemented right now:

1. Run the manuscript/PDF through a citation extraction and parsing tool (plenty of them out there)
2. Import the extracted RIS/BibTeX file into Zotero
3. Zotero has now integrated with RetractionWatch and so will flag any citations to retracted papers

This could be automated with a script.

To flag any citations to ‘undesirable’ journals, you could also:

1. Resolve each citation to a DOI (online tools are available to do this that will handle most citation styles)
2. Create a list of the DOI prefixes of the publishers of the ‘undesirable’ journals
3. Match the citation DOIs against this list
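The three steps above can be sketched in a few lines of Python, assuming the citations have already been resolved to DOIs. The function names and the prefix `10.99999` are hypothetical placeholders; the list of prefixes to flag would have to be compiled separately.

```python
def doi_prefix(doi: str) -> str:
    """Return the registrant prefix of a DOI, e.g. '10.1016'.

    Accepts bare DOIs as well as common resolver URL and 'doi:' forms.
    """
    doi = doi.strip().lower()
    for scheme in (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    ):
        if doi.startswith(scheme):
            doi = doi[len(scheme):]
            break
    # Everything before the first slash is the prefix.
    return doi.split("/", 1)[0]


def flag_dois(dois, flagged_prefixes):
    """Return the DOIs whose prefix is on the flagged list, in input order."""
    return [d for d in dois if doi_prefix(d) in flagged_prefixes]


if __name__ == "__main__":
    cited = [
        "10.1016/j.outlook.2019.05.001",
        "https://doi.org/10.99999/abc.1",  # invented DOI for illustration
    ]
    print(flag_dois(cited, {"10.99999"}))
```

A prefix identifies the registrant rather than a single journal, so this is a coarse filter: it flags everything registered under that prefix, and it cannot see citations that have no DOI at all.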

Actually this might also be a good argument for not ejecting undesirables from CrossRef (you’d lose the ability to match their DOIs).

By Phil
Sep 26, 2019, 7:36 PM

By omia
Oct 21, 2019, 10:50 AM

From 2015: Should We Retire the Term “Predatory Publishing”?
https://scholarlykitchen.sspnet.org/2015/05/11/should-we-retire-the-term-predatory-publishing/

By David Crotty
Oct 21, 2019, 10:52 AM

Last week we launched our Fidelior platform, addressing the growing problem of the debilitating effects of pseudo-science and predatory publishing. Fidelior offers the academic world a tool to differentiate reputable journals from questionable ones.
Fidelior is an initiative of the Fidelior Research Team, partner of the Botlhale Village Innovation Ecosystem (www.botlhale.co.za), representing a group of concerned colleagues who seek to help mitigate the problem. We hope to establish and grow a scholarly ecosystem of stakeholders that share positive and negative experiences of scientific publishing in an effort to enhance trust in science.
We invite all of you to participate in testing the Fidelior platform by registering at http://test.fidelior.net/#home and giving us your assessment.

By Prof. Dr. Kris Willems (PhD)
Oct 22, 2019, 5:21 AM

The Scholarly Kitchen
