Editor’s Note: Today’s post is by Leticia Antunes Nogueira and Jan Ove Rein, both from the Norwegian University of Science and Technology (NTNU). Leticia is project leader for artificial intelligence at the university library. Jan Ove is Senior Research Librarian and acts as a subject specialist for medicine, providing teaching, counseling, and research support for staff and students at the Faculty of Medicine and Health Sciences.
In Part I of Chatbots: To Cite Or Not To Cite? our guest authors explored the context of citation and information sources and how Generative AI fits in. Today, they continue to develop their arguments toward a conclusion.
In Part I we explored publishers’ policies on generative AI, which in general offer clear recommendations against accepting chatbots as (co)authors, but which leave guidance on the question of citing chatbots as information sources somewhat vague. We also explored the reasons for citing other works and the observation that opinions differ on whether generative AI tools should be seen as information sources.
To cite or not to cite?
Amidst this dilemma and the uncertainty about which practices and rules apply, our position is that chatbots should not be accepted as sources, and therefore should not be cited as such. This is a different matter than explicitly labeling AI-generated content. A piece of text that is entirely generated by generative AI ought to be marked as such for the sake of trust and transparency. But the matter of authors citing chatbots as information sources refers to a different issue. This does not mean that authors’ use of generative AI should go unacknowledged; quite the contrary.
A piece of text that is entirely generated by generative AI ought to be marked as such for the sake of trust and transparency. But the matter of authors citing chatbots as information sources refers to a different issue.
Like declarations regarding conflicts of interest, we consider it useful to declare if and how authors have used generative AI in their work. There are many legitimate uses of generative AI, and disclosure can contribute to transparency and serve as a source of inspiration for how to use these innovative tools to the benefit of academic research, not to its detriment. This approach, however, is not without its challenges, given that the very definition of what constitutes generative AI can be blurry, as can how much awareness users have of it.
While we recognize the concern with the rightful attribution of ideas that are not one’s own, we maintain that the drawbacks of citing a chatbot are greater than the benefits of attribution. Two important reasons for this position, as we discuss in Part I, concern the fact that outputs from chatbots can neither be reproduced nor traced back the way traditional sources can. In addition to these common objections to accepting chatbots as sources, we offer a few more arguments.
Citing chatbots conflicts with author policies
First and foremost, if academia agrees that chatbots cannot be acknowledged as authors, citing them as sources undermines this position. Doing so even indirectly legitimizes chatbots and their developers as authors. For instance, if we write that “the crust of the earth is a vast museum” (Darwin C. On the origin of species. Minneapolis, MN: First Avenue Editions; 2018, p. 162.), it is clear to the reader who checks our references that Darwin is the author of this statement, which can be found by anyone who checks page 162 of the specific edition to which we point. Citing chatbots creates a precedent that undermines the responsibility of authors.
Citing chatbots creates a precedent that undermines the responsibility of authors.
If rather than Darwin the citation read “OpenAI,” we would implicitly be accepting that OpenAI is responsible for making that claim. It is true that scholars cite nonhuman entities regularly. We do it ourselves in Part I, when we cite publishers’ policies on generative AI. But citing a report by the United Nations, for example, and a piece of synthetic text from an AI developer is decidedly not the same. Organizations are composed of people, and thus bear responsibility in a way that chatbots simply cannot.
Citing chatbots pollutes the information environment
Insisting on citing chatbots as sources promotes a kind of pollution in information ecosystems. Not only is people’s trust in sources compromised, but if the very data used to train LLMs (i.e., texts from the internet, academic texts, and others) is itself generated by AI, the quality of the models erodes. This has been seen, for example, in appeals to keep Wikipedia, whose freely licensed content has been crucial for training LLMs, a “human-centered project”. Moreover, uncritical use of chatbots entails epistemic risks: chatbots do not ‘know’ the answer to any question; they predict a sequence of words that might answer the question. When we take up these outputs and not only ascribe them the value of information and knowledge but also legitimize the LLM as the source, we create a self-fulfilling prophecy that reinforces the model’s prediction as de facto reality. This would be troubling in any circumstance; but six years after “misinformation” was crowned the word of the year, and with terms like ‘echo chamber’, ‘filter bubble’, and ‘post-truth’ now common parlance, the issue of source integrity gains a new dimension.
Chatbots have not been designed to be truth machines
Chatbots have not been designed as tools for information purposes, though they can perform very well in tasks primarily concerned with communication. The uncertainty about the quality of their outputs is due to their purpose and structure, not their degree of technological maturity. LLMs are probabilistic by design, meaning that falsehoods are — as those in tech culture would say — a feature, not a bug. Hallucinations are easy to understand once we recognize that chatbots work by calculating the likelihood of upcoming linguistic strings, given preceding inputs, training data, and model parameters. Looking under the hood of chatbots and appreciating how they work, we become aware of their uses and limitations. They are based on patterns of language usage, not information.
Looking under the hood of chatbots and appreciating how they work, we become aware of their uses and limitations. They are based on patterns of language usage, not information.
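To make this point concrete, here is a deliberately simplified sketch: a toy bigram model built from a few sentences, not how any production chatbot is actually implemented, and the corpus and names are invented for illustration. It shows what “predicting a sequence of words” means in practice: the program continues a sentence with statistically likely words, without any notion of whether the result is true.

```python
# Toy illustration (a bigram model, NOT a real LLM) of the principle described above:
# a language model assigns probabilities to possible next words given the preceding
# words, and generates text by sampling from that distribution. Prediction, not lookup.
import random
from collections import Counter, defaultdict

corpus = (
    "the model predicts the next word "
    "the model does not know the answer "
    "the model predicts a likely continuation"
).split()

# Count which word follows which in the tiny corpus above.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = following[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation: plausible-sounding, but not grounded in any source.
word, output = "the", ["the"]
for _ in range(6):
    if word not in following:  # dead end in the toy corpus
        break
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```

Real LLMs replace the word counts with a neural network over billions of parameters, but the generative step is the same in kind: the output is a probable continuation of the prompt, not a retrieved fact with a provenance trail.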
Once chatbots get coupled with trustworthy databases to generate text based on selected and curated data, then it becomes possible to increase the trust placed in their outputs, not as sources, but as means to extract information from a large information pool. In this kind of use, chatbots do not serve as the source of information in and of themselves, but point to sources in their dataset. At the same time, this kind of tool gives rise to other challenges, such as whether the sources are genuine and relevant, as well as to what extent the text has been taken verbatim or paraphrased. This kind of application can already be seen in tools such as Microsoft Copilot in Bing (for general use) and Scopus AI, Elicit, and Scite (for academic use) to name a few, although these tools are also not free from criticism.
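As a rough illustration of that retrieval-based design, the sketch below uses invented document snippets and identifiers, and a crude keyword-overlap retriever standing in for the vector search and LLM summarization that real tools would use. The point it makes is structural: in such systems the traceable, citable unit is the retrieved document, not the model that rephrases it.

```python
# Minimal sketch of the retrieval-augmented pattern described above. The documents,
# their IDs, and the overlap scoring are all made up for illustration; real academic
# tools use vector search over curated databases plus an LLM to summarize the hits.
documents = [
    {"id": "doe_2021",   "text": "Citation practices signal responsibility and allow claims to be verified."},
    {"id": "smith_2023", "text": "Large language models generate fluent text without guaranteeing factual accuracy."},
]

def retrieve(query: str) -> dict:
    """Return the document whose text shares the most words with the query."""
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d["text"].lower().split())))

query = "can language models guarantee factual accuracy"
hit = retrieve(query)

# In a real system an LLM would now paraphrase hit["text"]; either way, the
# reference that belongs in the manuscript is the underlying document, not the model.
print(f"Answer grounded in source [{hit['id']}]: {hit['text']}")
```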
Creating rules that cannot be enforced sends the wrong signal
Prescribing that authors cite chatbots as sources does not mean they will do so, especially since the chances of identifying this are slim. Regular plagiarism-checking tools do not detect synthetic text, and AI detection tools are deeply unreliable. We cannot determine with an adequate degree of certainty whether a piece of text is genuine or synthetic based only on our impressions of the text. Hence, if citing chatbots as information sources became the accepted standard, we would have a situation in which social norms dictate a practice that can be ignored with little consequence. In our view, the fact that the APA (and, we remark, only the APA) has proposed a framework for citing chatbots should not mean that those who follow the APA style are expected or obligated to do so. APA is very clear that its proposal is based upon the citation structure for software rather than personal communication, and that the use of generative AI needs to be documented in other ways as well within a piece of work.
…if citing chatbots as information sources became the accepted standard, we would have a situation in which social norms dictate a practice that can be ignored with little consequence.
Another risk in prescribing that chatbots be cited is that this practice legitimizes the kind of low-competence use in which it becomes acceptable to take text verbatim from LLMs, so long as a citation is included. This would be unfortunate, since there are in fact more productive uses of these tools that support learning and research rather than undermine them. Academia would benefit more from investing in AI literacy and education.
There will be instances in which ideas go without attribution, and some authors do, and will continue to, use chatbots inappropriately. When this happens, it will be nearly impossible to identify and reprimand. Nonetheless, this is not a problem that can be solved by citing chatbots. Moreover, less-than-ideal scholarly practices did not first come about with the introduction of chatbots. We would be hard pressed to find anyone who would say it is fine, for example, to cite a paper you have not read. This is an unfortunate and problematic practice, and yet it is neither concretely prohibited nor possible to control. The way our current norms work in this regard entails a kind of peer pressure that discourages problematic behavior. In other words, if you cite without having read, chances are you will cite it wrongly, which could lead to negative exposure and your research being retracted.
We need a similar approach for chatbots: sociocultural norms that prescribe appropriate and inappropriate uses of tools, even though enforcing these norms as concrete rules is impractical.
We need a similar approach for chatbots: sociocultural norms that prescribe appropriate and inappropriate uses of tools, even though enforcing these norms as concrete rules is impractical. Sociocultural norms are fragile when they first emerge, but those that endure through time eventually become deeply embedded in institutional culture. This approach falls short if the ambition is to ensure no one ever acts in this manner; but it does work to establish a clear set of expectations concerning adequate and inadequate practices and behaviors.
Final thoughts
The challenges generative AI poses to academic integrity arise not from academic values becoming obsolete in the face of increasingly automated intellectual work, but from a lack of established norms for dealing with these developments. While higher education institutions have a crucial role to play in teaching academic writing and academic integrity, academic publishers and journals have the most power and influence to pave the way for the normalization of practices concerning the disclosure of generative AI use in academia. That is because a significant share of the incentive structure governing research, including scholars’ chances of obtaining funding, their opportunities for career development, and their reputation, is intricately intertwined with where they publish. As a result, more explicit guidelines from publishers and journals advising against citing chatbots as sources would be important for institutionalizing acceptable and unacceptable practices.
…more explicit guidelines from publishers and journals advising against citing chatbots as sources would be important for institutionalizing acceptable and unacceptable practices.
An alternative to be explored further is to develop a standard for marking sentences that are taken verbatim from chatbots, but in a manner that does not conflate them with citations of information sources. If this approach of attribution without sourcing became the standard, it might lead to other consequences for how we conceptualize chatbots and AI, but it would preserve the integrity of information ecosystems. Such a system would need to be shared across publishers for it to make sense, and it would require not only agreement that this is desirable, but some degree of coordination among them.
Universities and other higher education institutions have been grappling with how to provide guidance to staff and students about generative AI, as well as how these tools affect existing regulations relevant to cheating and plagiarism. When it comes to student assignments, it is essential to avoid establishing different standards for them than for scholars publishing in academic fora. Students who have not yet developed the craft of academic research and writing may not have the academic maturity to use AI tools in their assignments and exams without pre-established guardrails. The practices students learn today will affect scientific production in the near future, and the challenge needs to be addressed with clear expectations, the promotion of AI and information literacy, and institutional support.
When it comes to student assignments, it is essential to avoid establishing different standards for them than for scholars publishing in academic fora.
In conclusion, while it can be tempting either to leave the matter to individual scholars or to regulate the use of chatbots in academic writing through the same apparatus we use for actual information sources, different considerations are needed when it comes to generative AI. Not only would citing chatbots as information sources offer little in terms of promoting smart use of generative AI, it could also be damaging. We hope to see more scholarly institutions, from publishers to universities, taking a clearer position on treating chatbots not as sources of information that we cite, but as tools whose use we disclose.
Acknowledgements
We would like to thank Ann Michael, Avi Staiman and Tadeu Fernando Nogueira for their comments on earlier versions of this two-part series and Sindre Andre Pedersen for interesting discussions on this theme. We are also grateful to Inger Hesjevoll Schmidt-Melbye and Alexander Lyngsnes for their assistance with etymological interpretation (part I), and Katrine Aronsen for her advice on searching for articles that employ chatbots as information sources.
Discussion
5 Thoughts on "Guest Post — The Case For Not Citing Chatbots As Information Sources (Part II)"
Let me take your suggestions a step further. Instead of “more explicit guidelines from publishers and journals advising against citing chatbots as sources” we need “more explicit guidelines from publishers and journals advising against USING chatbots as sources” in scholarly writing. Why not use and cite peer-reviewed scholarly sources? Why not teach our students about what distinguishes scholarly writing from informal writing? Why not teach our students about research integrity?
As a reader I could care less what a chatbot has to say, based on garbage scraped from the Internet and mixed in with original and copyright-protected works taken without permission or compensation. I am reading a scholarly article because I want to know what the scholar has to say, based on their reading, research, reflection, and thoughtful analysis.
Thanks for putting it in clear words, Janet. I agree with you that chatbots are not for information sourcing. It is not a matter of citation alone. As a reader of any material, but especially academic literature, the integrity of the sources and the intentions of the writer are of crucial importance, and machines have neither.
Thanks for your comment, Janet! I absolutely agree with you on the importance of teaching students how to use good and reliable sources. I give lectures on “good citation practice” for both MA and PhD students where I address this, and interest in the topic has skyrocketed in the last year, after two Norwegian government ministers had to resign when it was revealed that their master’s theses contained a great deal of plagiarism.
Thoughtful pair of essays, well done. While I agree that chatbots/AI LLMs should not be cited, the use of these tools is quite the slippery slope. I have before me an article to review which disclosed the use of AI to polish the writing and to find relevant literature. My initial reaction was to reject: who knows whether the AI just polished the language the human authors wrote or was used for much more? It’s a review article, which is the type of article best written by bots. And did the authors actually read the suggested literature or just the AI-produced synopses? And how thorough or even-handed were these AI research tools?
But as I started to write my snarly review I thought: well, how thorough and even-handed are humans at combing the literature and deciding which relevant papers to cite or not? And what’s the ethical difference between having advanced AI models edit one’s prose versus the rudimentary grammar and spell checkers found in all word processors? It’s just that the former can do so much more than editing. And if the authors hadn’t declared their use of AI assistance, who would have even known? A slippery slope. Are we heading for a corpus of literature written by AI for AI?
Hi Chris. Great point. I can see the skepticism in reading the disclosure about the use of AI. The challenge is that we really cannot know the extent to which it has been used. Lack of disclosure does not mean lack of use, so we should not penalize disclosure. At the same time, there is a lot of reliance on trust, which is a vulnerability when it comes to creating knowledge. Like you, I also fear a dystopian scenario in which AI reads, writes, and assesses other AI. In order to avoid that, the whole system of research incentives and policies will need to change.