Editor’s Note: Today’s post is by Marjorie Hlava. Marjorie is Chief Scientist, Chairman, and founder of Access Innovations, Inc. 

I tried three different large language models (LLMs) to rewrite a potential article: Claude, ChatGPT, and Google Gemini. I started with seven paragraphs and decided to see if one of the GenAI systems could help me with punctuation and sentence structure. The results were WILDLY different, and the amount of time needed for each attempt varied. During the interactions I also asked each for reference citations and metrics to add to the article.

[Image: robot hand using a laptop and human hand writing on paper with a pen]

Claude

I asked Claude to rewrite and extend the article. It did. It substantially expanded the article, in outline form, organizing it into clear sections and adding depth to key concepts, providing, it said, a "more structured analysis of both traditional publishing and AI's potential impact." (In fact, Claude added things I do not believe and never would have added myself.) Claude said it had added fairly "detailed sections on current challenges and future implications" while "maintaining the original insights while providing more context and explanation." (NOPE, it changed the context entirely.) It included a discussion of both benefits and challenges, which it told me was balanced but I found biased, and it changed my concluding summary to a "forward-looking perspective on hybrid models." (Which I could not use.) It also suggested other kinds of adjustments it could make – I passed.

I then asked Claude about XML specifically, in a section on “Technical and Access Challenges”. Claude responded with a nicely expanded section on the future of XML in publishing, including pros and cons, and a suggested hybrid approach which generally followed my original text.

I asked about the role taxonomies could play in the process. Claude provided a lengthy discourse on how to use and build taxonomies, which read amazingly like some of the text from the three books I have written on the subject (I have not yet cross-checked for plagiarism). I asked for case studies on reduced XML, and it came back with four well-referenced examples.

I asked for some metrics to be provided. Claude offered interesting numbers, but citations would be necessary to use them, so I asked for references. It came back with a verifiable list of references, using authors I was aware of. But the references were not on the topics I had discussed in the article; they matched only the outline version it had given back to me.

I had the longest interaction with Claude (10 sequential interactions).

Conclusion: some useful stuff but not something I can publish.

Next, I tried ChatGPT 

I input the same original article, asking for grammatical cleanup and for references and metrics to be added.

ChatGPT rewrote the article in outline form. It did not change the thesis. Again, I asked for some metrics to be provided. ChatGPT offered interesting numbers, but citations would be necessary to use those, so I asked for references. It first embedded the references in the article (example: Jones, 2023), and when I asked for full references, it came back with a nicely formed list. They did not correspond to the references indicated in the text from the earlier interaction. Between the embedded references and the list of references, there were 22 citations. I did not recognize any of the authors, although I believe myself to be up to date on the literature and the people who write in this area. DOIs were provided, so I checked those first. None of them resolved! All came back listed as errors. Hummm. Maybe there were em dashes or some other formatting problem? Nope. I tried a variety of ways to get them to resolve. None worked. I hopped over to Google Scholar to find the authors there. There was nothing under the indicated authors' names in the topical area. I tried the titles as a "known title search" and the titles could not be found. Next, I tried the journal names listed. Those could not be found either! The entire reference list was totally fabricated! But well formatted! In addition, the Jones in the citation below is not the same year as the Jones in the embedded citation. I could not find a reference to the Digital Communication Review. I do not think it exists – good name though.

Just for interest, I asked ChatGPT to give a reference to an article by Hlava (written by me). It came back with something I did not write, in a journal that does exist, and of course the DOI does not check out.

  • Hlava, M. (2014). “The Importance of Metadata in Academic Publishing.” Journal of Scholarly Publishing, 45(4), 397-409. DOI: 10.3138/jsp.45.4.397
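For anyone who wants to repeat that DOI check in bulk rather than pasting them into a browser one at a time, the doi.org resolver makes it easy: a registered DOI redirects to the publisher's landing page, while an unregistered one returns a 404 "DOI Not Found" error. Here is a minimal Python sketch of that check (my own illustration, not something any of the LLMs produced); the example DOI is the one from the fabricated Hlava citation above.

```python
# Minimal sketch: check whether DOIs are registered by asking the doi.org resolver.
# A registered DOI redirects to the publisher's landing page; an unregistered DOI
# returns HTTP 404 ("DOI Not Found"). Note that a few landing pages refuse HEAD
# requests, so a non-404 error is not by itself proof that a DOI is fake.
import urllib.request
import urllib.error

dois = [
    "10.3138/jsp.45.4.397",  # the DOI ChatGPT attached to the "Hlava (2014)" citation
]

for doi in dois:
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{doi}: resolves (HTTP {response.status})")
    except urllib.error.HTTPError as error:
        print(f"{doi}: does not resolve (HTTP {error.code})")
    except urllib.error.URLError as error:
        print(f"{doi}: could not be checked ({error.reason})")
```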

ChatGPT did the least reframing, reformatting, and adding of extraneous material, but the citations were a joke.

Finally, I went to Google Gemini.

I asked the same general question – rewrite the article and add metrics and citations. It came back by putting my article in outline form but totally ignoring XML and taxonomies, which were my topics of interest. It was completely useless.

I asked again with a different introductory paragraph and then a differently formatted prompt. This time it did usefully recast the article.

The first time Google Gemini came back with general references but nothing specific:

UNESCO. (2021). Recommendation on Open Science. (International framework for open science)

COPE: Committee on Publication Ethics. (Various resources on research integrity and publication ethics). (Provides information on publication ethics and best practices)

The second response took out the earlier references and added disclaimers and this paragraph instead: [Insert relevant citations from scholarly articles, reports, and other credible sources on the topics of AI, scholarly publishing, peer review, and research integrity.] What a cop-out!

I’m going back to the old-fashioned way of writing – with my human brain and fingers on the keyboard – but with a few new ideas to consider from the GenAI LLMs.

Marjorie Hlava

Marjorie M.K. Hlava is Chief Scientist, Chairman, and founder of Access Innovations, Inc. She served on the NISO board for five years and on the NISO Content Board. She is past president of several organizations: NFAIS (2002-2003), the American Society for Information Science and Technology (ASIST, 1993), Documentation Abstracts (1990-1991), and the Hubbell Society Museum and Library (1994-1998).

Discussion

20 Thoughts on "Guest Post: Trying to Write a Paper with LLM Assistance"

Two very brief thoughts:

First, if we’re going to assess what specific GenAI tools can and cannot do, we really have to know the specific prompts that were used.

Second, none of the tools used here are designed or marketed specifically for research or scholarly writing. Perplexity.AI is, and I would be interested to see how it performed on a similar exercise.

Thank you for mentioning this. None of the tools listed are appropriate for writing a paper and I’m disappointed that this example exists on a reputable site like Scholarly Kitchen. I fear it’s going to perpetuate a myth that these resources can *actually* find references and provide data when it’s just not possible.
There are tools specifically designed for finding citations, so perhaps those tools should be used as an example.

I believe it is possible to find useful citations. Gemini did find and show valid citations. The other two did not.

When you're trying to use popular tools, not those that cost extra money, which is what the common man is going to do, we need to consider what kind of feedback they give.

One clarification: I did NOT ask it to write a paper. I had already drafted it. I asked for references and statistics to add to the paper. It was a simple request: "Please add statistics and references for this paper." Then I pasted in the seven paragraphs I had written. I will post the full set of notes tomorrow so you can see what the prompts were, although they can be inferred already.

Tools like Perplexity AI cost money. I was trying to do this with free, open sources rather than paying $20 a month, because that is what I think students and researchers who are not well funded would use.

I’ve tried it with Perplexity and gotten similar results. Of course, that makes sense since Perplexity uses the other models for its output.

LLMs are continuously evolving, incorporating new features to improve results and enhance user experience. However, they may not always align with the specific expectations of researchers in producing the desired outcomes. Ultimately, it is up to the RESEARCHER, being an intellectual by nature, to decide 'what is to be' and 'what is not to be' considered. Experimenting with selected LLMs to rewrite scholarly papers in this context is neither suitable nor appropriate.

Every piece of scholarly writing is customized for a specialized audience, i.e., it uses specific language, length, tables, and graphical images for effective communication. Many full-text scientific papers are restricted behind paywalls, while LLMs are primarily trained on abstracting and indexing databases and Open Access sources. However, they provide excellent suggestions for improving the text in a convincing way to enhance the final report.

The application of AI in scholarly writing still has a long way to go in meeting established academic standards, with accuracy remaining the top priority.

As a librarian at an academic institution primarily serving undergraduates, for me the point is that our students ARE using these tools to do homework and to generate papers. They may not be appropriate for these tasks, but they are being used with little preparation or knowledge of the likely outcomes. I found the results interesting and of value to my colleagues in our Center for Research & Writing (peer support for students) and for our faculty development and instructional design coordinator.

Thank you very much! That was my intent: to point out that you cannot use these tools or trust them for writing. They are great for indicative conversational search; however, they are not particularly current either.

I agree with Celia. Students, along with early and late career researchers, are being told these shiny new tools are the next great thing, so get on board or get left behind! Very few of them actually understand how the systems work, let alone when one is appropriate for a task. They are not interested in instruction – they are too busy memorizing for exams and such to spend the time to understand and master the use of AI. "I type something in and I get something back – what do you mean I need to analyze it for accuracy?"

I only want to use free tools. Perplexity.AI, for example, costs $20 minimum per month. I think the average student and a poor researcher will not be paying that; they will go to the popular choice first, which is what I did.

The author misunderstands how LLMs can be useful in writing articles. Using LLMs without retrieval-augmented generation (RAG) is just relying on the patterns learned in the training of the model. If you know the relevant literature and use RAG, the system fetches the relevant information from the papers you give it, adds it to the prompt, and generates a response based on the retrieved material. This addresses issues with hallucinations, cutoff dates, and citations. Tools like ChatGPT's Deep Research and The-Literature add search to the beginning of this process.
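For readers unfamiliar with the pattern, here is a rough sketch of the RAG loop in Python. It is an illustration only: the keyword-overlap scoring stands in for the embedding search real systems use, and generate() is a hypothetical placeholder for whatever LLM call you actually make.

```python
# Rough sketch of the RAG loop: retrieve the most relevant passages from the
# papers the user supplies, add them to the prompt, and have the model answer
# from that context. Illustration only -- real systems use embedding search,
# and generate() below is a hypothetical stand-in for an actual LLM call.
from collections import Counter


def relevance(query: str, passage: str) -> int:
    """Crude relevance score: number of words the query and passage share."""
    shared = Counter(query.lower().split()) & Counter(passage.lower().split())
    return sum(shared.values())


def retrieve(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(passages, key=lambda p: relevance(query, p), reverse=True)[:k]


def generate(prompt: str) -> str:
    """Hypothetical placeholder; swap in a call to whatever model you use."""
    return "[model output grounded in the supplied sources]"


def answer_with_rag(question: str, passages: list[str]) -> str:
    # Stuff the retrieved passages into the prompt so the model answers from them.
    context = "\n\n".join(retrieve(question, passages))
    prompt = (
        "Answer using ONLY the sources below, and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```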

My students know that if they want an essay out of an LLM, they should upload the PDFs of the papers on the reading list, the assessment guidance, and the marking criteria. If they are using notebookLM, they can also listen to a chatty podcast, or chat to the podcast in interactive mode, to give them ideas for prompts.

If you can't train your students to do better than LLMs plus RAG, then their graduate jobs aren't going to be around for long. Pretending that LLMs fed with the relevant literature and trained in an academic writing style (the way notebookLM is trained to generate output in a podcast style) can't do this work is to bury your head in the sand. Knowledge work is going to change, and we have to prepare ourselves and our students for what's coming. If Google trained an LLM on all the lectures that were captured over the pandemic, it could create an automatic lecture generator that, fed with relevant papers, could generate a lecture. It's so obvious someone will. Just as Uber drivers are really training the driving model that will replace them, all our open papers and lectures, not to mention all our feedback on Turnitin, are training the models that will replace us if we can't do more than just regurgitate knowledge. This article doesn't illuminate the work we've got ahead of us.

I never use AI to write text. For me, ChatGPT has become a substitute for a Google search and sometimes for using the databases provided by my academic library. I can give more detailed instructions about what I'm looking for and then use the preliminary results to define a second search if needed. If I'm seeking information in an area where I'm not an expert, these searches sometimes provide specialized terms that are worth a further look. I'm also fully aware that AI makes up phony citations. I check all results to verify that they exist and also often find useful information for my projects in unexpected places. The final advantage of ChatGPT is that it's not discipline-specific but draws upon resources from a multiplicity of sources, so that I don't risk choosing the wrong database from the long list that my academic library provides.

The paper I was working on (I am still working on it) covers a fast-evolving field, and every time I think I have found everything, something different in the technology pops up. I find the academic literature is not necessarily keeping up with the latest technical innovations, so other search methodologies also need to be employed. I hoped that this might work. I knew that the general journal literature would lag behind the news, and the databases covering it would be even further behind. Conference proceedings were likely to have a lot of more current information, as do patents. I was hoping that the GenAI systems/LLMs would give me something new and interesting, so when a bunch of new citations popped up to literature I had never heard of, with authors I did not know, I was pretty excited, until I went to try to read them and they were bogus! It did not work, and I further discovered how misleading the results could be. That was what I was trying to convey; apparently I didn't do it very well.

I was disappointed with this article. Using just the off-the-shelf tools, regardless of whether these are all that the “average student and a poor researcher” might access, is not helpful. They were not designed for the task — they’re general purpose, “Jack of all trades, master of none”.

The Scholarly Kitchen audience would be better served with case studies of what can be achieved with the latest state-of-the-art tools, like OpenAI Deep Research. It’s ridiculously expensive today, but won’t be tomorrow.

Should you add references recommended by software, if they were “good quality” in terms of relevance?

I'd argue you should read the paper and then decide if it's suitable to be cited, not just paste in references because they are relevant without knowing what's written in there – how does that help your readers?

Valid point. I would have read the referenced papers if the citations had been valid, because I wanted to flesh out the paper more, but of course the citations were bogus, so I found my own additional literature to read in the traditional fashion. I had asked for papers so that I could find things recently written on the fast-evolving topic I was writing about, which was the future of XML and whether CP/LD would overtake its usefulness.
