*AI was most definitely used in writing this article
Last week I received a frantic call from a Master’s student in Austria who was inconsolable. He had just submitted his thesis to his university for review, and it had been flagged as written by AI. The university had given him one more chance to revise and resubmit his work. If it passed the AI detection tool, then they would review the work and give him a final grade. If it failed the automated check, it would be automatically rejected and he would be dishonorably kicked out of his program, with two years of study going down the drain.
AI Detection Tools to Uphold Research Integrity?
The recent surge of AI technologies in the realm of writing has led to a proliferation of AI detectors in the academic world. These detectors promise to be the gatekeepers of academic integrity by combating plagiarism and AI-generated content. While the ambition is noble, their practical implementation has seen its fair share of critical shortcomings.
The fundamental assumption underlying AI detection tools seems to be that AI writing should be detectable the same way plagiarism is detected. However, there is a critical distinction: plagiarism detection looks for exact matches with existing works, an objective criterion that can be identified, measured, and replicated. AI writing, on the other hand, is original in its own right (even if drawn from unoriginal sources) and can’t be easily traced to a source.
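To make that distinction concrete, here is a minimal sketch in Python, illustrative only and not any vendor’s actual algorithm: plagiarism detection reduces to a reproducible matching problem against a known corpus, while AI detection can only threshold a statistical score, with no source to point to. Both function names and the scoring heuristic are my own assumptions for illustration.

```python
# Plagiarism detection: deterministic matching against a known corpus.
def plagiarism_overlap(doc: str, corpus: list[str], n: int = 8) -> bool:
    """Flag the document if any n-word sequence appears verbatim in the corpus."""
    words = doc.lower().split()
    shingles = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return any(s in source.lower() for source in corpus for s in shingles)

# AI detection has no corpus to match against. Detectors instead estimate how
# "predictable" the text is to a language model and apply a threshold -- a
# probabilistic guess, not a reproducible match. A toy stand-in:
def ai_likelihood_score(per_token_log_probs: list[float]) -> float:
    """Low average surprisal is taken as evidence of machine-generated text."""
    return -sum(per_token_log_probs) / len(per_token_log_probs)

doc = "We the People of the United States, in Order to form a more perfect Union"
print(plagiarism_overlap(doc, [doc]))  # True -- a verbatim, checkable match
```

The first check either finds a source or it doesn’t; the second only ever yields a score like “0.73 likely AI,” with nothing an accused author could inspect or rebut.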
My opposition to scholarly publishers relying on detection tools stems from both pragmatic and ideological reasons. Let’s start with some of the pragmatic issues.
Issues with False Positives
Large Language Models learn from human writing and are built to resemble human writing in their outputs. Already with the launch of ChatGPT, it was clear that generative AI could produce writing that successfully mimics that of humans. Quantifying the respective human and AI components of a specific document is challenging, and authors often mix their own words with those suggested by an AI tool.
The imperfections of AI detectors are becoming more evident as they often misidentify genuine human-generated content. Studies have shown error rates of 9% and higher, far too high to live with. One notable case was an AI tool flagging the US Constitution as AI-produced. This false positive not only highlights the glaring imperfection of these detectors but also underscores the pitfalls awaiting academic authors when these reports are treated as authoritative. In one humorous yet disturbing case of such confusion, a professor at Texas A&M failed his entire class after ChatGPT answered in the affirmative when asked whether it had written the papers the students handed in.
In a shockingly candid statement, Turnitin admitted in a recent video that their AI-detection software should be taken ‘with a grain of salt’. In addition, they say that instructors will need to be the ones to ‘make the final interpretation’ of what is created by generative AI. Isn’t that the exact reason why faculty members are turning to these tools in the first place?!
Universities are starting to understand the implications of these admissions and are taking action by advising their faculty not to use the tools. In a guidance report published by Vanderbilt University, they note that Turnitin, their plagiarism software supplier, originally claimed a 1% false positive rate for flagging AI-written works upon the launch of their AI-detection tool, but then raised that rate to 4% upon wider usage and testing.
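Even the lower of those figures adds up quickly at scale. A back-of-the-envelope calculation: the screening volume below is hypothetical, but the rates are Turnitin’s own figures as reported by Vanderbilt.

```python
# Hypothetical volume of genuinely human-written submissions screened per year.
submissions = 10_000

for fp_rate in (0.01, 0.04):  # 1% claimed at launch, 4% after wider testing
    falsely_flagged = submissions * fp_rate
    print(f"At a {fp_rate:.0%} false positive rate: "
          f"{falsely_flagged:,.0f} honest authors wrongly flagged")

# At a 1% false positive rate: 100 honest authors wrongly flagged
# At a 4% false positive rate: 400 honest authors wrongly flagged
```

Every one of those flags is a student or researcher facing an accusation they cannot easily disprove.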
Even if those numbers improve, it wouldn’t be difficult for ill-intentioned authors to run the AI output through paraphrasing software to remove traces of the original. OpenAI itself shut down a project attempting to detect its own outputs! Many universities have already changed course and are looking for alternative policies.
Collateral Damage of False Accusations
The fallibility of AI detectors has real-world consequences. Timnit Gebru, Founder & Executive Director at The Distributed AI Research Institute (DAIR), recently shared a distressing email she received in which a writer was unjustly accused of employing AI. Such incidents can cause undue emotional distress and potentially tarnish a researcher’s professional reputation. The ripple effects can result in mistrust, skepticism, and derailment of academic careers, not to mention prolonged legal battles.
Even worse, these detectors are more likely to flag work by speakers of English as an additional language (EAL) as AI-generated than work by their native English-speaking counterparts. The last thing any publisher should want is to risk further embedding biases and discrimination against EAL authors.
Why Are We Running to Ban AI-Assisted Writing Again?
Scholarly publishing should be cautious about embracing AI detection tools for reasons beyond research integrity.
While most publishers likely won’t want to publish research that was obviously rendered by ChatGPT, adopting policies where AI checkers are standard also makes an educational and values statement about how we view the use of generative AI in the expression of academic findings. Rather than rejecting AI tools in academic writing, what if we used them as educational tools and a means to level the playing field for EAL scholars?
Institutions like Yale University are pioneering efforts to use AI to augment the writing process. Wharton School couple Ethan and Lilach Mollick have put together an entire online practical AI course for the classroom, including how GPT can be integrated into assignments. These advancements highlight a potential path forward where AI aids rather than hinders academic writing.
Conclusion
While the motivation behind integrating AI detectors into academic review is well-intentioned, the challenges they introduce necessitate a different approach. The scholarly publishing industry must be vigilant, weighing the potential pitfalls against the promise and exploring ways to harmoniously blend AI into the academic literature.
Discussion
5 Thoughts on "Publishers, Don’t Use AI Detection Tools!"
Great article, thank you. Potentially relevant discussions about topics such as these with the likes of Mozilla, Google, Meta, HuggingFace, Stanford, Princeton, Carnegie Mellon and Harvard:
Workshop on Responsible and Open Foundation Models
https://sites.google.com/view/open-foundation-models
It’s like those supervisors who only look at the overlap percentage without ever actually checking the similarity report in Turnitin (e.g., 80% similarity, but it’s all from the author’s earlier draft of the same paper).
But I do wonder if there’s an established way of proving authorship in cases where the system flags a paper as AI-written. Are there any logs produced by Word (or other word processors) that could be used to verify that sort of thing?
It seems ridiculous to just give students two chances to get a pass from the AI detection system without any other way of challenging the decision.
Thank you, Saša. This is an excellent point. We have certification systems in products such as DocuSign. The challenges to authenticity that chatbots present are a huge opportunity for word processor entrepreneurs.
AI-written, AI-generated, AI-produced, and AI-assisted are four types of AI output mentioned above. Another type is AI-copy-edited.
Should copy-editing be detected? Should copy-editing be banned? Is it acceptable to have a human perform copy-editing but not a computer?
I use AI to copy-edit my writing. However, if I were applying for a copy-editing job or being evaluated on my English language skills, I would not use AI. If I were writing a STEM thesis, I see little logic in prohibiting me from AI-copy-editing it.
For technical folks interested in using my AI-copy-editor, it is available for free at https://gitlab.com/castedo/openai-utils. It is an incredibly cost-effective copy-editor. This comment has been AI-copy-edited using this AI-copy-editor. I’m pretty sure you’d rather read this AI-copy-edited version than my no-AI draft.