While the future utility of the word-guessing machine we call “Generative AI” remains a question mark, there are a few things we know about its current state: it’s not great at consistently creating quality outputs, but it is superb at producing outputs in quantity. This rapid production of vast amounts of content is beginning to overwhelm a variety of systems. Case in point: the NIH has been forced to put a limit on the number of grant applications any individual can submit in a calendar year, largely because of the flood of AI-generated slop it has had to process.

Another area impacted is online recipes, where the internet seems overwhelmed with websites offering thousands of recipes with accompanying AI food photos. Meta seems to have put some effort into developing “Inverse Cooking,” that is, using AI to create a recipe based on an image of a prepared dish (and has released code to allow others to do so). One can only ponder the recursive slop that will result from AI creating recipes from images generated by AI, and so on, and so on.

So are these AI recipes any good? While they seem to be getting better in terms of not being completely absurd (less glue pizza and fewer rock-based diets), they’re still not great. As the video below demonstrates, AI doesn’t quite seem to understand proportions and measurements all that well. Maybe keep this in mind when considering the use of AI to propose scientific experiments.

David Crotty

David Crotty is a Senior Consultant at Clarke & Esposito, a boutique management consulting firm focused on strategic issues related to professional and academic publishing and information services. Previously, David was the Editorial Director, Journals Policy for Oxford University Press. He oversaw journal policy across OUP’s journals program, drove technological innovation, and served as an information officer. David acquired and managed a suite of research society-owned journals with OUP, and before that was the Executive Editor for Cold Spring Harbor Laboratory Press, where he created and edited new science books and journals, along with serving as a journal Editor-in-Chief. He has served on the Board of Directors for the STM Association, the Society for Scholarly Publishing and CHOR, Inc., as well as The AAP-PSP Executive Council. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing.

Discussion

9 Thoughts on "The AI Slop Overload Does Not Taste Good"

‘While the future utility of the word-guessing machine we call “Generative AI” remains a question mark, there are a few things we know about its current state: it’s not great at consistently creating quality outputs’ – I have to disagree with two different parts of this. First, it’s not merely “word guessing” – it can do complex math and write complex software. It has been invaluable to me in writing Python code, scripts for Google Sheets, SQL queries, and a lot more.
Second, your definition of “quality outputs” may be an unfairly high standard. When I use it to help me with writing (mostly professional emails), I find it far superior to what the average graduating B.A. student from most good, if not top, universities can write. If our bar for quality is above that, we have to question what thousands of us are spending our productive lives engaged in.

That’s an amusing example but a bit contrived. And don’t assume undergrads even know the alphabet. A few years ago (pre-Covid) I was involved in hiring student assistants for our library. Part of their job would be to reshelve books in our Library of Congress shelving organization, where call numbers start with a letter. I thought it was entirely reasonable to ask all of the candidates, “What letter comes after Q in the alphabet?” You would be shocked how many native English-speaking students couldn’t answer that correctly. I even encouraged them to recite the alphabet in their heads until they found it, but they couldn’t do that either.

The point is that whether they can count the b’s in “blueberry” is actually irrelevant to whether any given LLM can consistently (which does not mean perfectly, every time) produce responses of high enough quality to have real usefulness.

Wow, sorry to hear that undergraduates are no longer able to recite the alphabet. Relying on AI to (apparently incorrectly) recite it for them isn’t going to get those books shelved properly.

I take the viewpoint that if you can’t be bothered to actually write the message to me, then I can’t be bothered to read it. If your thought process can be outsourced to a prediction machine (whether that is a word, math, or code prediction machine), then it is likely of little value. I do understand the use of LLMs for repetitive tasks or sorting large datasets. But writing is thinking, and if that writing requires no thought, then perhaps it shouldn’t exist (and I certainly don’t want to waste my time reading it). And I certainly don’t want to outsource creative endeavors like cooking.

Yesterday’s post by Sarah Kendzior offers a painful reminder of what we lose when we stop thinking for ourselves:
https://sarahkendzior.substack.com/p/soul-stripping

I’m not sure why you think that counting the number of letters in a word is contrived. Mark Liberman has posted often on this over at Language Log (the Penn Linguistics Department blog), as an indicator of the limitations of current LLMs. If they can do “complex math” (as you wrote in your original comment), surely they ought to be able to accurately count the number of times a letter appears in a common English word, especially since that’s a trivial problem to solve with ordinary computing. The problem is that they aren’t actually counting; they are stochastically producing text that makes it seem as if they are counting (and doing so badly). And if you can’t rely on LLMs to do a simple task that can be easily checked, why rely on them to do something complex that you can’t easily check?
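
To be clear about what “trivial with ordinary computing” means here, a deterministic one-liner (a Python sketch using the word from the example above, not anything an LLM does internally) gets the count right every time:

print("blueberry".count("b"))  # ordinary string counting; always prints 2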

I recommend Timothy Burke’s recent series of posts on what LLMs can and can’t do for historians for a good, thoughtful, nuanced examination of the issue. (They’re available at https://timothyburke.substack.com)

‘why rely on them to do something complex that you can’t easily check?’ The thing is, with my fairly complex Python code, I am relying on it to do something complex that I CAN easily check, by running the code and seeing if I get errors or if my output data is wrong. Ditto for shell scripts and other coding needs. I don’t claim to understand why it can’t count letters in a word.

But I can tell you that when I asked for a linux bash command to recursively find all files in my home directory with the filename “harvard_data.db” and sort them by size then list them with their full paths and size shown in human-readable format, it gave me this, which works absolutely perfectly by the way, and it was easy to check that:
find . -type f -iname "harvard_data.db" -printf "%s %p\0" | sort -zrn | xargs -0 ls -lh

Could I have come up with that on my own? Eventually, yes, by looking at the man pages for find, printf, and sort, and hopefully running into the xargs command, which I didn’t already know. But Gemini took seconds and saved me over 10GB of hard drive space when I needed it.

Just because some people don’t have a use case in their workflows where LLMs “consistently” create “quality outputs” doesn’t mean that no one does. Maybe fewer claims to universality and more “in my experience” would be appropriate.

I might argue that the pushback you’re seeing here is not because AI isn’t useful; it is tremendously useful in some capacities (I see language translation, for example, as a potentially huge breakthrough for broadening access to scholarly materials in our field). The skepticism is because it is being sold to us as a do-everything tool, with “PhD-level expertise” (https://www.pcmag.com/news/with-gpt-5-openai-promises-access-to-phd-level-ai-expertise). The constant failure to perform as advertised in many high-profile areas often eclipses the examples of where it is indeed helpful.

And for what it’s worth, there are tons of examples of the variable and often poor quality of AI outputs for coding — so many that I think it is perfectly reasonable to question the technology’s ability to consistently create “quality outputs”:
https://devclass.com/2025/02/20/ai-is-eroding-code-quality-states-new-in-depth-report/
https://arstechnica.com/ai/2025/07/study-finds-ai-tools-made-open-source-software-developers-19-percent-slower/
https://medium.com/@conneyk8/maintaining-code-quality-in-the-age-of-generative-ai-7-essential-strategies-b526532432e4

LLMs are to translation what McDonald’s is to fine dining. These systems are great at a couple of things: working in tandem with GitHub for rudimentary coding (i.e. “vibecoding”) and completing trivial administrative tasks. I’d argue that if an administrative task is rudimentary enough for an LLM to complete it and unimportant enough that it doesn’t matter if the LLM gets it terribly wrong, it probably comes within the remit of David Graeber’s idea of the “bulls— job.” It is not worth boiling megalitres of potable water, polluting the digital archive, and stealing work from writers and artists to do our “bulls— jobs.” In fact, I’d argue that it’s downright unethical.
