Editor’s Note: Today’s post is by Mike Rossner. Mike is a consultant on image manipulation in biomedical research through his company, Image Data Integrity, Inc. Mike’s career in publishing included time as the Managing Editor of The Journal of Cell Biology and then as the Executive Director of The Rockefeller University Press.

The STM Integrity Hub

Several posts (“The New STM Integrity Hub”, “Peer Review and Research Integrity: Five Reasons To Be Cheerful”, and “Research Integrity and Reproducibility are Two Aspects of the Same Underlying Issue”) in The Scholarly Kitchen last year described the STM Integrity Hub, a platform being developed by STM Solutions through which participating publishers will share access to their submitted manuscripts. The Hub will include software to detect each of the following: 1) the hallmarks of a manuscript produced by a paper mill; 2) simultaneous submission of a manuscript to multiple journals; 3) image manipulation/duplication. The software applications for the latter two are intended to work at scale, comparing the content of a submitted manuscript to thousands of other submitted manuscripts, and perhaps also to millions of published articles.
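
For readers curious how image comparison might work at that scale, below is a speculative sketch in Python. It is not the Hub's disclosed method and the file names are placeholders; it simply illustrates one common approach, in which each extracted figure panel is reduced to a compact perceptual hash so that candidate duplicates can be found by comparing hashes rather than pixels.

```python
# A speculative sketch (not the Hub's actual method) of scalable image
# comparison: reduce each figure panel to a small "average hash" so that
# millions of panels can be compared by Hamming distance instead of
# pixel-by-pixel. File names below are placeholders.
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    """Downsample to size x size grayscale and threshold at the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float64)
    return (pixels > pixels.mean()).flatten()

def hamming_distance(hash_a, hash_b):
    return int(np.count_nonzero(hash_a != hash_b))

# Panels whose hashes differ in only a few bits are candidate duplicates
# and would be routed to a human for visual confirmation.
h1 = average_hash("submission_123_fig2a.png")
h2 = average_hash("submission_456_fig1c.png")
print("candidate duplicate" if hamming_distance(h1, h2) <= 5 else "no match")
```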

Algorithms to detect image manipulation/duplication

My interest in and commitment to image data integrity spans more than two decades. In 2002, I initiated a policy at The Journal of Cell Biology to screen all images in all manuscripts accepted for publication for evidence of image manipulation/duplication. That work was described in a Scholarly Kitchen interview nearly a decade ago. At the time, all of the screening was done using visual inspection, aided by adjustments of brightness and contrast in Photoshop, which can reveal inconsistencies in background that are clues to manipulation, or consistencies that are clues to duplication.
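
To illustrate the principle, the sketch below (Python with Pillow and NumPy, placeholder file names; not JCB's actual Photoshop workflow) shows the kind of contrast exaggeration involved: stretching the darkest gray levels so that faint background differences become visible to the eye.

```python
# A minimal sketch of contrast exaggeration for visual screening: stretch the
# darkest portion of the intensity range so that faint background
# inconsistencies (e.g., around a spliced-in band) become easier to see.
# File names are placeholders; this illustrates the principle only.
import numpy as np
from PIL import Image

def exaggerate_background(path, low_pct=1, high_pct=20):
    """Stretch the darkest portion of the intensity range to full contrast."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
    return Image.fromarray(stretched.astype(np.uint8))

# Save a high-contrast rendering of a blot panel for side-by-side visual review.
exaggerate_background("figure1_blot.tif").save("figure1_blot_stretched.png")
```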

In my opinion, visual inspection remains the gold standard for screening images for manipulation/duplication within an individual article or for image comparisons across a few articles, especially when a processed image in a composed figure can be compared directly to the source data that were acquired in the lab. But that process does not scale to comparisons across the entirety of the biomedical literature.

In the past decade, numerous software applications have been developed for the automated detection of image manipulation/duplication. These applications present the possibility of screening images at a scale that is not practical with visual inspection, and their use has the potential to protect the published literature in ways that were not previously possible. Several of them are now commercially available.

A call for data transparency

A recent news article in Nature indicated that “a small group” of publishers is currently testing the effectiveness of various software offerings in this space on behalf of the STM Integrity Hub. It is important that the data used for those tests, along with the results of those tests, be made publicly available, at least for the software ultimately chosen for the Hub. Ideally, these data would include the images that were used for the tests and the output from the software, including the calculated true-positive and false-positive rates for different types of image data (e.g., photographs, blots, micrographs, scatter plots) and different types of manipulations (e.g., splices, use of the clone tool) or duplications (e.g., direct duplication, duplication with a change in aspect ratio, duplication with a change in orientation). Those rates could then be independently verified.
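
To make the verification step concrete, here is a hedged sketch of how released test records could be used to recompute those rates per category. The record fields ('category', 'ground_truth', 'flagged') are assumptions for illustration; no output format has actually been specified for the Hub.

```python
# Recompute true-positive and false-positive rates per image type (or per
# manipulation type) from a hypothetical release of test records.
from collections import defaultdict

def rates_by_category(records):
    """Return {category: (true_positive_rate, false_positive_rate)}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for r in records:
        c = counts[r["category"]]
        if r["ground_truth"]:            # image really was manipulated/duplicated
            c["tp" if r["flagged"] else "fn"] += 1
        else:                            # clean image
            c["fp" if r["flagged"] else "tn"] += 1
    return {cat: (c["tp"] / max(c["tp"] + c["fn"], 1),
                  c["fp"] / max(c["fp"] + c["tn"], 1))
            for cat, c in counts.items()}

# Example: rates for western blots vs. micrographs in a hypothetical test set.
print(rates_by_category([
    {"category": "blot", "ground_truth": True, "flagged": True},
    {"category": "blot", "ground_truth": False, "flagged": False},
    {"category": "micrograph", "ground_truth": True, "flagged": False},
]))
```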

It is, of course, not unheard of for entities with a vested interest in a product to test it themselves. Think of airplanes being tested by their manufacturers or clinical trials run by the company that produced the drug. However, how crucial to the public good does a product have to be for its validation data to be subject to public oversight, such as the FAA for airplanes or the FDA for drugs in the U.S.?

While the connection to public health and safety may be less direct for much of pre-clinical research, I would argue that the integrity of the published record is sufficiently important that the validation data for software designed to protect that record should at least be made public, and at most should be audited by a public entity, such as the Office of Research Integrity in the U.S. or the European Network of Research Integrity Offices in the U.K. and Europe.

The importance of data transparency

Anyone using, or considering using, the software selected for the Hub needs to know its capabilities and limitations, so they know to what extent its use is protecting the published record. Any publisher using the software also needs to disclose to its editors/reviewers/readers what the software can and cannot do, so that they can remain vigilant with respect to its limitations. For example, if the software is very good at detecting duplications in certain types of images but not so good at detecting them in other types, then editors/reviewers/readers will know to remain more vigilant about visual inspection of the latter types.

Anyone basing decisions about potential research misconduct on the output of software also needs to know its capabilities and limitations. I just consulted on a case where an author defended himself against an allegation of image duplication by using an algorithm, which did not detect the duplication. The duplication was strikingly evident upon visual inspection of the image, although the author tried to dismiss the visual inspection as subjective.

Visual inspection can take into account pixel variations, such as those introduced by image compression or different exposures, that might fool an algorithm’s statistical analysis. Although the editor of this particular journal did not take the author’s defense at face value, I am concerned that editors will become reliant on software to settle similar matters in the future without fully understanding what it can and cannot do.

Conclusion

I believe there is a meaningful place for algorithmic screening of image data by publishers before publication, in conjunction with visual confirmation of the results, but it is important that this community be transparent about the capabilities and limitations of any software it chooses to use.

Discussion

11 Thoughts on "Guest Post — Publishers Should Be Transparent About the Capabilities and Limitations of Software They Use to Detect Image Manipulation or Duplication"

Thank you for bringing common sense to this topic!

Successful hands-free checking of images for nefarious manipulation without generating false positives requires significant computing power, which makes automated solutions expensive and unrealistic at scale.

Given the expense of human checking, what can journals do if they want to add value in manuscript processing?

Consider the approach used at border crossings. It’s too expensive and slow to examine every bag, so travelers are asked to make a customs “declaration” that serves the purpose of alerting them to restrictions and increasing the stakes of cheating.

Journals could ask authors to specifically identify who prepared images and if they made any changes. Such declarations/assertions could be tied to their individual ORCID IDs for transparency and downstream accountability.

As indicated in your article, software-based image checking adds value in specific situations where the cost is justified. However, more practical approaches are needed if publishers want to improve peer review at scale and in a cost-effective manner.

Richard Wynne – Rescognito, Inc.

Hi Richard,

Thank you for your comment. In my opinion, publishers should not rely on a declaration by authors, but should carry out systematic screening themselves. One way to control costs is to limit image screening to just those manuscripts accepted for publication, rather than screening all submitted manuscripts. That might mean that peer reviewers’ time is wasted in cases where the acceptance of a manuscript has to be revoked because image manipulation is detected that affects the interpretation of the data. The incidence of such cases was consistently 1% over the dozen years that I was involved with systematic image screening of accepted manuscripts at JCB.

Mike, there is no one more knowledgeable and experienced on this topic than you. Nevertheless, I think you are confusing transparency with authority. You can be completely transparent about how a machine determines whether an image has been manipulated. Whether or not those changes are a violation of publication policy, and what course of action is taken as a result, requires a human being in whom the authority and accountability of the journal are vested. We often call these people “editors.” Computer algorithms may be put to work to detect image manipulation, but an algorithm has no authority or accountability to the journal, the publisher, or the community behind the journal. That part can only be vested in humans.

In your example above, the author argues that image manipulation didn’t take place because the algorithm didn’t detect it. This is like arguing that I didn’t steal from the corner store because the facial recognition software on the store camera was unable to make a positive match, even though the clerk at the desk saw me pocket a candy bar and watched me walk out without paying. While bored and underpaid, this clerk is vested with the authority and responsibility of running the store. Not the software.

Last, there is an argument to be made that complete transparency about how the algorithm works and what it can do will give fraudsters a plan for how to evade detection. A neighbor of mine put an old computer camera in the corner of a window by his front door. “It doesn’t even work,” I said when I saw the USB cable lying uselessly unplugged on his living room floor. “Yes,” he said, “but a robber doesn’t know that!”

Hi Phil,

Thanks for your comment. I may be using the wrong term, but I don’t really want to know HOW an algorithm works (I’m probably not capable of understanding it anyway); I want to know HOW WELL it works. In other words, how much human intervention is needed. For example, if an algorithm can detect 100% of the duplications that I detect visually for a particular type of image data, but only 25% of the duplications for another type, then I know to focus my human duplication detection efforts on the latter type.

Regarding the issue of deterrence, the rates of manipulation that we observed at JCB, both manipulation that did not affect the interpretation of the data and manipulation that did, were remarkably consistent over the years, even though I was very public about the fact that we were doing systematic screening. That knowledge did not seem to be a deterrent.

I can only support Mike’s point about the robustness of algorithms and its dependence on the actual situation in which they are used (that is, the kind of manipulation being tested for). That is why there needs to be rigorous testing of those algorithms and clear statements of their limitations (because all of them have limitations). Otherwise, journals will be flooded with senseless allegations without substance. Those might waste more time than they do good.
Besides that, we (still) need the human eye to corroborate a finding, as well as to judge it in the light of its context and the impact it may have. So, software can reduce some of the workload, but the judging, further investigation, etc. is still a human’s task. Therefore, I think we need effective algorithms for specific cases so that we are not shoveling the gravel from one hole into another and creating more work than before. Furthermore, I would also rather not rely on authors’ declarations.
The final question (which scares me a little) is: what will scientific reproducibility, proper use of public funding, and trust in science look like in the near future, when tools like ChatGPT linked to GANs create images in which neither an algorithm nor a human can detect manipulation or falsification, manuscripts are written in a way that no plagiarism software will ever detect, and statistical fakes are unidentifiable? Then fake science might become so prolific that there are no efficient countermeasures at all anymore. Therefore, the community has to think about other types of proof needed to validate the originality and factuality of science (although I currently do not have a clue what that might look like).

Mike Rossner’s “Guest Post” concerns the lack of transparent software to detect false images. I fully agree. And as the ability to fabricate images de novo develops (as it surely will), assessment of the image evidence will become even less accessible both to editors and to those in the research area in question. But this only means journals need to do more to minimize having to wrestle with these questions, especially when lawyers are involved.

Happily we are not fully there . . . at least not yet. The vast bulk of questioned images on PubPeer (still) involve either 1) duplicate use or 2) bit-mapped alterations: neither requires sophisticated software, nor any particular expertise in the questioned research per se. Also, evidence for or against an otherwise weak allegation can be strengthened by using a mix of colors to overcome the eye’s limited ability to detect differences in grey scale. (The [now dated yet still] useful “Advanced Forensic Tools” and “Read Me Files” on ORI’s website address these uncertainties.)
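
As a rough illustration of that color-mixing idea (not ORI’s tool; the file names are placeholders), two suspected-duplicate grayscale panels can be placed in separate color channels of one image, so that matching regions blend while any differences stand out:

```python
# Overlay two suspected-duplicate grayscale panels in the red and green
# channels of an RGB image: matching regions blend to yellow, differences
# appear red or green. File names are placeholders for illustration.
import numpy as np
from PIL import Image

def color_overlay(path_a, path_b):
    img_a = Image.open(path_a).convert("L")
    img_b = Image.open(path_b).convert("L").resize(img_a.size)  # align sizes
    a, b = np.asarray(img_a), np.asarray(img_b)
    rgb = np.zeros(a.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = a  # panel A in the red channel
    rgb[..., 1] = b  # panel B in the green channel
    return Image.fromarray(rgb)

color_overlay("panel_1C.png", "panel_3B.png").save("overlay_1C_vs_3B.png")
```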

Mike correctly observed that a pre-alert of journal screening does not always deter; and in fact ORI reviewed several JCI cases well after their pioneering efforts were widely known. At ORI, the common question in any case was uniformly “This is so obvious, just where were the coauthors?”

In my opinion, the “absent coauthor” problem would vanish if journals simply exercised what is already their prerogative, a step that would both cut their costs and instantly establish their quality. How many journals require a ‘pre-nup’ agreement, signed at submission by all coauthors, specifying 1) that if a credible allegation of image falsification, indicating any shortcut in proper technique, cannot be resolved by the immediate provision of the raw or primary (unreduced) data, 2) then that paper will be automatically retracted? Coauthors would be motivated to ask to see the primary data; the absent-coauthor issue would disappear; and institutions would soon get serious about meaningful data-retention policies.

This simple approach won’t prevent the dishonest researcher, and with AI some image ‘allegations’ will only get more difficult to assess. In the meantime, wouldn’t such steps immunize journal editors and researchers in the questioned field from the otherwise costly need to understand what is rapidly becoming an arcane and inaccessible software approach to image forensics?

John Krueger (ORI 1993-2013)

We at Proofig, a company that provides a service for checking duplications and manipulations in scientific images before publication, agree with the ideas presented in the article by Mike Rossner. We are also in full support of the STM Integrity Hub and its mission to detect image manipulation and duplication in submitted manuscripts.
It is also important to note that while the software can detect image manipulation and duplication, the users (editors/publishers/researchers) must understand the results and the limitations of any software in order to make proper judgments. We at Proofig are committed to providing education and training to our clients to ensure they are able to fully utilize our service and make informed decisions.
To have real statistics on software performance, we recommend building a large bank of articles with and without problems in all relevant categories, agreed upon by at least 2-3 independent experts. That is crucial for objectively examining the performance of the software.
We would like to add that we believe our service, Proofig, can assist in this endeavor. We provide a comprehensive and objective analysis of scientific images before publication, using state-of-the-art algorithms. You can find more information about our service at https://www.proofig.com/
In conclusion, we at Proofig strongly support the STM Integrity Hub and its mission to ensure the integrity of the published literature. We are committed to working with the STM group and publishers to achieve this goal. The Proofig Team.

Thanks for these comments. We agree that technology should not make decisions, but it will, and already does, play an important role in flagging potential (image) integrity issues, which are then for the editor or others to investigate further (e.g., through manual inspection, which we agree is still crucial). Nor do we propose that machine-based (image) screening should replace manual screening before or after publication. Such tools simply assist in integrity control, particularly before publication but even after conditional acceptance. Especially when screening across papers is required, as with detecting text or image reuse from one paper in another, which is hard to spot manually, such tools are of tremendous value.

It is important to remark that the STM Integrity Hub is an enabling infrastructure where different tools can be plugged in based on publishers’ demand. Tools can be developed internally or externally. Which tools publishers use will depend on the choice of the publishers themselves, and tool providers will be encouraged to make information about performance and limitations available. We agree with the need to provide a necessary level of transparency about capabilities, but we also recognize that we need to be careful to strike a balance such that this information does not inadvertently undermine integrity activities by allowing fraudsters to evade detection. Details on limitations or other information that would help bad actors to circumvent integrity screening should be minimized, while users must indeed be made aware of the performance limitations of any tool.

As a more general point, one of the aims of the Integrity Hub is to increase the awareness of tools that are available for publishers (independent of if or when they will be made available through the Hub). In December, we released an overview of four image screening tools and their capabilities prioritized by an STM working group, based on a self-assessment. For more information, see https://www.stm-assoc.org/stm-integrity-hub/

Hi Joris,

Thank you for your comments. I am pleased that we are all in agreement about the need for human involvement in the screening process. As I mentioned in my response to Phil Davis’ comment, transparency about the capabilities of software is important to inform where one needs to focus those human efforts. Any area in which an algorithm falls short would ideally be supplemented with visual screening.

I disagree with the argument that we need to be cautious with transparency for fear of tipping off potential perpetrators of image manipulation to weaknesses in the system. Broader transparency helps to inform editors/reviewers/readers about how they should read and interpret a given article based on any limitations in the screening process. I do not think that knowledge of these limitations would encourage potential perpetrators, just as knowledge of the existence of the image-screening program at JCB did not serve as a deterrent (see my response to Phil Davis’ comment).
