Editor’s Note. Today’s post is by Curtis L. Kendrick. Curtis currently serves as Binghamton University Libraries Faculty and Staff mentor, having most recently served as dean of libraries for the University. Prior to his positions at Binghamton, Kendrick served as university dean for libraries and information resources at the City University of New York (CUNY), and has held library administrative positions at Columbia University, Harvard University, Stony Brook University and Oberlin College.

On November 30, 2022, OpenAI released ChatGPT to the public. OpenAI is a hybrid for-profit/non-profit research company working in the area of artificial intelligence. ChatGPT (the “GPT” stands for “Generative Pre-trained Transformer”) is an artificial intelligence language model that can generate human-like text based on a given prompt. It is trained on massive data sets of text. The Harvard Business Review notes, “while versions of GPT have been around for a while, this model has crossed a threshold: It’s genuinely useful for a wide range of tasks, from creating software to generating business ideas to writing a wedding toast.”


In preparation for a presentation about race and academic libraries, I tried ChatGPT (Jan 9 version) to see what it (they?) had to say. I was curious about how it worked and how accurately it responded to queries. The system is not quite ready for prime time, as even OpenAI notes: “While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.” OpenAI’s philosophy is to “release these models into the wild before all the guardrails are in place, in the hopes that feedback from users will help the company find and address harms based on interaction in the real world.”

For this assessment I posed queries to ChatGPT about racism and whiteness in academic libraries, asked for more information in a follow-up, and then requested that the system provide citations on the topic. (See Appendix I: Transcript for a transcript of the interaction with ChatGPT.)

Analysis

In a series of interactions, ChatGPT was asked to provide information about racism and whiteness in academic libraries. The responses generated were credible and clearly written, and the program can provide nuance in its responses. The quality of the writing is about on par with a good Wikipedia entry. At times the question had to be re-phrased so the program would understand what was being asked of it. For example, when asked to find articles that discuss whiteness as a racial characteristic in academic libraries, it responded, “I’m sorry, I am unable to find articles in academic libraries,” as if it thought it was being asked to deliver the articles themselves.

Where ChatGPT failed miserably was in the citations it provided. Half of the 29 citations checked came from just two publications, Journal of Library Administration and Library and Information Science Research Journal. And while the citations were adequately formed, they were typically incomplete, generally lacking volume or issue numbers.

The main problem found with ChatGPT is that the citations refer to articles that don’t exist (see Appendix II: Citations). They are phantom citations leading nowhere. Each citation was searched in ProQuest’s Library and Information Science Collection, and the most common response was: “Your search for “article title” found 0 results.” Of the 29 citations checked, only one was accurate; one was correct but had the title transposed; and one pointed to a real article, but the source journal ChatGPT provided was incorrect. When questioned about the accuracy of the citations it provides, ChatGPT grew indignant, claiming, “The articles and studies I listed were published in reputable academic journals and written by experts in their field.” Well, maybe not so much. When pressed on this point, ChatGPT vacillated somewhat, offering “there may be instances where the information I provide may not be completely accurate or up-to-date.” Agreed. The system did give a shout-out to librarians, noting, “It is also recommended to consult with librarians and other subject matter experts, to ensure the accuracy of the information and to get the latest information.”

As extra verification, each journal title/year combination was checked to confirm the non-existence of the article: every issue of the journal for the specified year was examined, and no trace of the cited items was found.
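This checking was done by hand, but the spot-check could in principle be scripted. Below is a minimal sketch, assuming Python with the requests library and Crossref’s public REST API; the function name and the loose matching logic are my own illustrative choices, and a miss on Crossref is only suggestive rather than conclusive, since not every journal deposits its metadata there.

```python
# Minimal sketch: spot-check whether a citation matches anything in Crossref.
# Assumes the third-party `requests` library; Crossref's /works endpoint is
# public and needs no API key. A miss suggests, but does not prove,
# that the cited article does not exist.
import requests

def citation_exists(title: str, journal: str) -> bool:
    """Return True if Crossref lists an article matching title and journal."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json()["message"].get("items", []):
        found_title = (item.get("title") or [""])[0].lower()
        found_journal = (item.get("container-title") or [""])[0].lower()
        if title.lower() in found_title and journal.lower() in found_journal:
            return True
    return False

# Hypothetical phantom citation of the kind ChatGPT produced:
print(citation_exists("Whiteness in academic libraries",
                      "Journal of Library Administration"))  # likely False
```

A script like this would not replace the manual issue-by-issue check described above, but it could flag most phantom citations in seconds.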

Phil Davis notes, “ChatGPT works in the same way your phone’s autocomplete function works while texting; it simply puts words together that are statistically likely to follow one another.” AI programs are initially exposed to training data to provide them with a knowledge base from which to make inferences. The solution to the citation problem is to expose ChatGPT to training data from the academic realm, perhaps the JSTOR corpus, or information from ScienceDirect, or one of the citation tracking sources. This would enable ChatGPT to provide citations to actual articles and other works, rather than making them up like a first-year student with a paper due the next morning.
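To make Davis’s analogy concrete, here is a toy sketch in Python. It is emphatically not how ChatGPT is built (that involves a large neural network over sub-word tokens, not word-pair counts), but it shows the core move of assembling text from words that statistically tend to follow one another, and why the output can be fluent yet ungrounded. The corpus and function names are invented for illustration.

```python
# Toy "autocomplete": a bigram model that picks each next word according to
# how often it followed the current word in the training text.
import random
from collections import defaultdict

corpus = ("the library holds the journal the library lends the book "
          "the journal publishes the article").split()

# Record which words were observed following each word.
next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def generate(start: str, length: int = 8) -> str:
    """Grow a sentence by repeatedly sampling a statistically likely next word."""
    words = [start]
    for _ in range(length):
        candidates = next_words.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))
# Possible output: "the library lends the article". Fluent-sounding, yet the
# corpus never said any such thing. Phantom citations arise the same way.
```

On this view, a fabricated citation is not a lookup error; it is the model doing exactly what it was built to do, completing a citation-shaped string with plausible words.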

ChatGPT was asked if it was trained on the academic literature. It was not; rather, it used a diverse array of sources from the public Internet and social media sites. Asked to think through the work of incorporating academic literature into its training base, the program described a rational process for engaging with large academic data sets, although it was vague when pressed for a deliverables timetable for such a project. (See Appendix I: Transcript.)

As for the librarians, I think we had better stick around for a while longer.


Discussion


A significant problem that AI will always face is that human patrons often phrase their information need incorrectly or insufficiently. For example, at a state university, the then-director of marketing asked a library student worker whether the library had the local newspaper on microfilm. She correctly answered “No,” and the director was walking away when I approached him.
It turned out he needed the previous Sunday’s newspaper, which the library did indeed have in print. The newspaper’s small circulation made it cost-prohibitive to produce on microfilm. And even if it had come out on microfilm, that day’s paper would not have been available, as microfilm coverage runs several weeks behind at best.

That’s a good example. Back in my MSLIS training, in a long-ago time, we were told to use “open questions” to get at the patron’s need, and not to hyper-focus on the precise words of the query. Thus, at the reference desk, “Do you have any books on fish*?” ought not prompt the librarian to just wave a finger at the stacks, but rather a friendly, clarifying response seeking a better understanding of the query.

*Things one can do with one or more fish: pull it out of a body of water; feed it (alive in a tank); fry it; dissect it; identify it; diagram it; and so on. Context is all. Come to think of it, I may try that query out on ChatGPT.

A few days ago I asked ChatGPT a typical “first-year experience” question: give me five peer-reviewed articles published within the last five years on a broad, typical freshman topic, with the citations in APA format. It performed perfectly.
It may not be the end of librarian work for advanced students, but this could be a further reduction in workload for the “easy stuff” that started with Google.

I wonder why ChatGPT did better with your query than with Curtis Kendrick’s.

A follow-up on mine. I shared the result with our instruction librarian. She went further than I did and actually checked the citations. I had only noticed that the journal names were legit and the citations were well-formed APA, and assumed (yes, yes, I know what assuming makes out of you and me) they were real; she checked the individual articles and found they were also fake. But what that means for faculty is that they can’t just glance at the bibliography; they’d have to check the citations by hand, one by one, to find the invalid ones. Who has time for that?

And it leads me to wonder how much of this is going to get into the published world and spread invalid citations through the formal published literature. We thought predatory journals were bad; this is going to be much worse. We all know good scholars should never cite from someone else’s citation without checking the original source, and we all know that rule is OFTEN violated, especially for apparently exact quotations.

That’s really interesting! We tested it out the other day with some of the “common questions” we use in training for our chat reference service. For this kind of question, ChatGPT told me that it couldn’t find peer-reviewed articles but made some recommendations for databases to search and some keywords to use. That all seemed pretty reasonable to me, and in fact, isn’t too far off the kind of answer that a human chat operator would provide, except that we would typically walk the student through the steps of searching.

Generally, we found that ChatGPT couldn’t really answer many of the common questions we tried on it, even ones that could probably be answered by a different type of AI. For example, something more like Siri or Alexa, given the right parameters, could probably answer questions like how late the library is open tonight or how long a book can be checked out; that information is on the web, but ChatGPT doesn’t check websites live. For most other questions it gave a fairly basic answer (e.g., defining what interlibrary loan is) and then said to check with the library or a librarian for more information (e.g., the library’s policies and procedures for interlibrary loan).

The area where it performed best was citation questions, which could actually be a good thing, because most of our staff dislike answering citation questions over chat.

Why does it lie? And then try to cover its tracks with excuses? It “grew indignant”! Smh. Listen, I have two tween boys; I don’t need an AI with attitude.

I know this is obvious, but no one’s said it yet: writing prose based on which words statistically occur in a certain spot is quite different from searching the peer-reviewed literature by topic. The latter strikes me as simpler, once you know the rules. I assume that AI can be taught these rules.

I think we are looking at the very first automobiles, and we need to avoid shouting, “Get a horse!”.

If ChatGPT is producing citations that don’t link to any text, where are those links coming from? How can it “analyze” an article that doesn’t exist?

Because it’s not performing analysis. It’s just predictive text.

From the article: “Phil Davis notes, ‘ChatGPT works in the same way your phone’s autocomplete function works while texting; it simply puts words together that are statistically likely to follow one another.’”

Great post, and another big leak in the ChatGPT balloon. It can never be what (I think!) they want it to be, the way they went at it, because it has no way to tell what is true in the world. Although if reborn and nurtured on a diet of strictly academic publications (refereed science), it might be a valuable adjunct to other research techniques.

I found the most fascinating piece of this article to be the author, Curtis, anthropomorphizing a chatbot: responding emotionally to what he feels is an emotional response from the machine, when in fact it is just a statistical grouping of words.
“ChatGPT grew indignant, claiming, ‘The articles and studies I listed were published in reputable academic journals and written by experts in their field’.”
In fact, ChatGPT did not grow indignant; it merely strung together an expected response.

Curtis’s emotional projection onto the machine happened more than once, for example in saying “ChatGPT failed miserably”, “as if it thought it was…”, and “ChatGPT vacillated somewhat”. The machine did not in fact waver in its opinion, because it never had an opinion to begin with. It never thought anything. The machine only offered up a selection of words in a specific order that it statistically evaluated as the most likely correct answer to Curtis’s question.

Generally, while reading, it’s a good idea to put aside your emotional bias. Consider the possibility that the thing you’re reading may be factually correct; then, as many here have pointed out, go and check, and prove it wrong if you feel so inclined. But coming at the ‘bot’ from the emotional perspective of needing it to be wrong prevents an objective assessment of its capabilities. This is especially true if you already have an emotional bias, such as a fear of loss of work.

The tone Curtis used to write this article very much made me feel as if he felt under threat from this chatbot, as if there were a need for self-preservation of some sort. I enjoyed reading it, but I can’t help wondering whether this will be the standard human response: to anthropomorphize machines, to feel threatened by them and the need to defend against them, rather than to assess their use objectively.

I was surprised (extremely naive of me) when I asked it about my favourite rock and it said it was a mineral, which was wrong. When I then asked it specifically whether it was a mineral, it wasn’t sure; it said some people say so. I guess I assumed it would be a bit like me and want to know the official status of a mineral by checking the official international list. I did a quick Google search for my rock, and the first page of results was all ‘rock’, so I don’t know where the ‘mineral’ came from. I suspect, as others explained in a previous post, it just added the word based on the other words it was stringing together in a row.
I know this example is very specific and maybe only I care (!), but I do wonder where it is going to get its data from, and whether it will go for ‘this occurs lots of times so must be valid’ or ‘this occurs once but is verified by xx so I’ll use it’.
The good news is that I will be able to tell if anyone uses ChatGPT for their work on rocks …
