Each year during Peer Review Week, we invite the Chefs to reflect on the most important questions facing our community. This year’s theme, Rethinking Peer Review in the AI Era, asks us to move beyond speculation and to consider how artificial intelligence is already reshaping peer review.

The question we posed to the Chefs was simple but ambitious: What’s a bold experiment with AI in peer review you’d like to see tested?

Their responses range from bold thought experiments to critiques of the current system, and from challenges about equity to ideas for collaborative human–AI models. Together, they reveal just how wide the spectrum of possibilities has become.

[Image: Two speech bubbles, one reading "AI", the other blank, on top of computer circuitry]

Tim Vines

“I would love to ask an LLM to set up, from scratch, its own peer review process for a journal.”

This one is going to generate some friction, but I saw the word ‘bold’ in the question and ran with it. I would love to ask an LLM to set up, from scratch, its own peer review process for a journal, manuscript management system and all. The user would feed a submitted manuscript in at one end and, after some time, receive back an editorial decision from the LLM. The LLM would be able to select reviewers and send them review request emails, but could also call on other services, AI or otherwise, to provide assessments. I do not think this would work well with current technology, but in 12 to 18 months the processes and infrastructure surrounding agentic LLM systems may have matured enough for this to be a realistic project.
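For concreteness, here is a minimal sketch of the outer loop such an experiment might take. It is a hypothetical illustration only: the llm() callable stands in for any chat-completion API with tool access, and the prompts and the design-then-execute split are illustrative assumptions, not part of any existing system.

```python
# Hypothetical sketch: llm is assumed to be a callable wrapping a
# chat-completion API that can also invoke tools (reviewer search, email,
# integrity checks) behind the scenes. Nothing here names a real service.

def run_experiment(manuscript: str, llm) -> str:
    """Ask the model to design its own review process, then execute it."""
    # Step 1: no template is supplied; the model chooses its own blinding
    # model, reviewer count, and desk-rejection criteria from scratch.
    process = llm(
        "You are setting up peer review for a new journal, from scratch. "
        "Describe your process: blinding, number of reviewers, "
        "desk-rejection criteria, and any external assessments you would request."
    )
    # Step 2: the model applies its own process to the submission and
    # returns an editorial decision at the other end.
    decision = llm(
        "Following the process you designed:\n" + process
        + "\n\nHandle this submission and return an editorial decision:\n"
        + manuscript
    )
    return decision
```

Logging the process description across many submissions would also answer the consistency questions below.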

What sort of review system would it choose? Double blind? Single blind? How many reviewers would it contact? Would it be consistent in its peer review approach across articles? What other kinds of assessment would it request? How often would it decide to desk reject unsuitable articles? I have absolutely no idea, and that is why I would love to run the experiment.

Haseeb Md. Irfanullah

“Peer review needs to redefine its relevance to survive the AI era.”

I see two major discussion threads in the AI-in-peer-review discourse: 1) how AI can effectively support the peer review process, and 2) how to avoid misuse of AI in that process. Both threads are guided by the notion that peer review itself is fundamentally fine, and that AI therefore brings additional capabilities to improve the prevailing system.

However, we don’t talk often enough about the inherent weaknesses of the current peer review system. In a Scholarly Kitchen piece in April, I argued that peer review has “lost its human face”. That’s why we see that 1) reviewer fatigue is worsening; 2) peer review has become a mere compliance exercise, devoid of collaborative culture; 3) researchers are torn between their author-life and peer-reviewer-life; 4) peer review undermines authors’ personal research journeys; and 5) peer reviewers face no consequences from the dark side of publishing (e.g., when an article they reviewed is retracted).

Of course, there are many well-recognized best practices for things like finding peer reviewers, reducing peer-review time, improving the editor–author relationship, recognizing peer reviewers, compensating peer reviewers, and updating peer-review policies in a dynamic publishing ecosystem. But such commendable practices are not without flaws and limitations, as my recent piece in the Editor’s Cafe explained.

Below, I put forward three hypotheses that would redefine peer review as we know it.

Hypothesis 1: Journal articles already validate non-peer-reviewed research (grey literature, preprints) by citing it frequently. At the same time, peer review is widely undervalued as a scholarly contribution in researchers’ professional assessments. Therefore, peer review’s importance in journal publishing is overrated.

Evidence needed: Can we run a (proper, not AI-run) model to show what the world would look like in the absence of ‘peer review’?

Hypothesis 2: Two years ago, I proposed a ‘five-stage transition’ from a 100% human-dependent peer review system to a 100% GenAI-dependent one. By September 2026, we will see some reputable journals move to a 100% GenAI-based review system.

Evidence needed: Given the current pace of development, can anyone challenge this prediction?

Hypothesis 3: The current peer-review practice is exploitative, preying on reviewers’ ‘free labor’, thus maintaining ‘perpetual injustice’ in the scholarly ecosystem. Only a 100% GenAI-dependent review system will make the publishing ecosystem more just.

Evidence needed: Can anyone prove this assumption wrong?

I invite the readers of this post to present evidence in support of, or against, these hypotheses in the comments section.

Hong Zhou

Peer review is under strain from rising submissions, lengthy decision times, reviewer fatigue and shortages, increasing misconduct, and unclear AI use. At the same time, AI is advancing rapidly, transforming both research and publishing workflows. This moment presents an opportunity to ask a bold question: how much of peer review could be automated, responsibly and with safeguards, and what must remain uniquely human to protect rigor and trust?

The experiment I would like to see is a structured comparison of three different peer review models, ranging from today’s conventional process to a future agent-driven system with human oversight. The aim is not to replace humans but to reimagine workflows where humans and AI collaborate by combining the strengths of both to enable speed with quality in service of research integrity and scholarly trust.

  1. Current Human-Led Workflow (Baseline)

This model simulates today’s standard journal workflow: manual triage, human reviewer recruitment, and fully human reviews. It provides the benchmark to measure:

  • Accuracy and alignment: How often do AI-assisted recommendations agree with human-only decisions?
  • Efficiency: Time-to-decision and volume of manuscripts processed.
  • Workload: Hours required from editors and reviewers.

This baseline is essential to demonstrate the true added value of AI in subsequent models.

  2. Near Future Hybrid Workflow (Human + AI Collaboration)

In this model, AI is integrated as an assistant, not a replacement. AI tools handle early integrity checks (plagiarism, scope, image manipulation, reference quality) and help editors identify suitable reviewers from large pools. During review, AI supports experts by summarizing manuscripts, surfacing relevant literature, flagging potential issues, and even helping to polish reviewer reports.
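To make the triage step concrete, here is a minimal sketch of how these early checks could be organized as a pipeline whose flags are collated for a human editor. The check functions are hypothetical stubs, not real tools; in practice each would wrap a dedicated detector.

```python
# Hypothetical sketch of AI-assisted triage in the hybrid model. Each check
# would wrap a dedicated tool (plagiarism detector, image-forensics model,
# reference validator); here they are stubbed so the sketch stays runnable.

def check_plagiarism(ms: str) -> str | None:
    return None  # stub: return a note if overlap with prior work is found

def check_scope(ms: str) -> str | None:
    return None  # stub: return a note if the manuscript is out of scope

def check_references(ms: str) -> str | None:
    return None  # stub: return a note if references look fabricated

def ai_triage(manuscript: str) -> list[str]:
    """Run every automated check and collect flags for the human editor."""
    checks = [check_plagiarism, check_scope, check_references]
    flags = [note for check in checks if (note := check(manuscript)) is not None]
    # Flags inform, but never make, the desk decision: an editor reviews
    # them alongside the manuscript, preserving human judgment.
    return flags
```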

Humans, however, remain responsible for judgment: assessing novelty, methodological rigor, ethical concerns, and impact. They also continue to manage the “human-facing” aspects of the workflow, such as operational planning and tracking, reviewer invitations, and author communications.

This model is already emerging across the industry, but rigorous testing is needed to measure how much efficiency and quality improve, and what new risks may be introduced when AI is embedded into workflows.

  3. Future Agent-Driven Workflow (Autonomous AI Agents + Human Oversight)

This is where the boldness lies. In this theoretical future model, each reviewer is paired with a Reviewer Agent — an AI fine-tuned on their historic reports, publications, preferences, and style. An Editor Agent plans the work, manages communication, orchestrates integrity checks, coordinates reviewer agents, collates assessments, and even drafts initial decisions. Editor agents could also coordinate with Integrity agents, which automatically decide which detection tools to use, in which order, and plan further investigation if needed.
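As one way to picture this model, here is a minimal sketch of an Editor Agent orchestrating Reviewer Agents, assuming a generic complete(prompt) text-in, text-out LLM call. The class, prompts, and orchestration order are illustrative assumptions, not a description of any existing system.

```python
# Hypothetical sketch of the agent-driven model. complete(prompt) is assumed
# to be any text-in, text-out LLM call; the agents are thin prompt wrappers.

from dataclasses import dataclass

@dataclass
class ReviewerAgent:
    """Stands in for an AI tuned on one reviewer's reports and style."""
    name: str
    profile: str  # summary of expertise, past reports, and preferences

    def draft_review(self, manuscript: str, complete) -> str:
        return complete(
            f"Writing as reviewer {self.name} ({self.profile}), "
            f"draft a structured review of:\n{manuscript}"
        )

def editor_agent(manuscript: str, reviewers: list[ReviewerAgent], complete) -> str:
    """Plan checks, coordinate reviewer agents, and draft (not make) a decision."""
    integrity_notes = complete(
        f"List the integrity checks to run, in order, and why, for:\n{manuscript}"
    )
    drafts = [r.draft_review(manuscript, complete) for r in reviewers]
    # Each draft would go back to its human reviewer for validation; only
    # human-approved text moves forward under the oversight model described here.
    return complete(
        "Collate these reviews and integrity notes into a draft decision "
        "for a human editor to validate:\n"
        + integrity_notes + "\n\n" + "\n\n".join(drafts)
    )
```

The key design choice in the sketch is that agents only ever draft; the final call stays with the humans described next.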

Humans step into higher-level roles as gatekeepers and mentors: validating outputs, correcting errors, and making the final call. The drivers for this model are twofold:

  • Scaling: Could agent-based systems help journals process dramatically more submissions, without sacrificing quality or exhausting the human community?
  • Role clarity: What are the roles of AI and humans in future peer review, and what are their boundaries? Can we redefine humans not as manual processors of manuscripts but as judges, consultants, and ethical stewards, focusing their limited time on tasks that demand human insight and accountability?

The future of peer review will not be about choosing between humans and machines. It will be about developing robust peer review workflows that feature AI-human collaboration: AI handling scale and routine, humans ensuring rigor, ethics, and accountability. A bold experiment like this could provide the evidence we need to move beyond debate and into practice.

Alice Meadows

I have to confess to being a bit of an AI skeptic/Luddite. I mostly resist using it myself – professionally or personally – and I find it hard to share other people’s enthusiasm for it. My reasons range from the fact that I actually enjoy doing a lot of the things that AI could “help” me with (writing, editing, etc.), and that I’ve had some bad experiences with AI “hallucinations”, to wider existential issues, including environmental impact and job security. But, of course, my own personal views and experiences won’t change the fact that AI is here to stay – and is being embedded in more and more elements of the publication process.

So what I believe we need to rethink – across that whole process, not just for peer review – is how we can be clear, consistent, and transparent about the use of AI. Ideally this would happen at the industry level, where there has already been some work on this, such as the development of the ChatGPT, Generative Artificial Intelligence and Natural Large Language Models for Accountable Reporting and Use (CANGARU) Guidelines, and/or at the discipline level. But realistically, at least for now, the onus will likely continue to be on individual publishers and journals to define and implement their own rules. This is already happening in terms of guidelines for authors – examples include Cambridge University Press, Sage, and Wiley – but I’ve not (yet) seen much in the way of guidelines or requirements for how AI can or should be used (or not) by reviewers themselves. For example, Taylor & Francis simply include the following sentence on their AI policy page: “Generative AI may only be utilised to assist with improving review language, but peer reviewers will at all times remain responsible for ensuring the accuracy and integrity of their reviews.” That’s better than saying nothing – but not exactly clear or detailed.

A community effort to establish a set of agreed guidelines for reviewers, one that incorporates the needs of a range of disciplines, publishers, and geographies, is at the top of my personal wish list for AI and peer review.

Closing Thoughts

The Chefs’ perspectives show that bold experiments with AI in peer review can take many forms. Some imagine agentic systems running entire workflows; others question whether peer review as we know it is overrated; still others propose structured comparisons of human-led and AI-driven models. What unites these views is a willingness to test, measure, and learn, and to move from debate to evidence.

As Co-Chair of Peer Review Week 2025, I find this spirit of experimentation both necessary and inspiring. The future of peer review will not be decided by theory alone but by careful trials, transparent results, and a commitment to equity and accountability.

We now invite you to share your perspective in the comments. What bold experiments should be tested next, and what should success look like?

Maryam Sayab

Maryam Sayab is the Director of Communications at the Asian Council of Science Editors (ACSE) and Co-Chair of Peer Review Week. She also serves on the Editorial Committee of Katina, contributing to its Open Access Knowledge section. With a background rooted in research integrity and publication ethics, she actively works to advance regional conversations around responsible peer review, transparent editorial practices, and inclusive open science. Maryam is dedicated to building bridges between global publishing standards and the practical realities faced by researchers and editors, particularly across Asia and the Arab world. She also supports initiatives that strengthen community-driven collaboration, ethical scholarship, and the sustainable development of research ecosystems.

Tim Vines

Tim Vines is the Founder and Project Lead on DataSeer, an AI-based tool that helps authors, journals and other stakeholders with sharing research data. He's also a consultant with Origin Editorial, where he advises journals and publishers on peer review. Prior to that he founded Axios Review, an independent peer review company that helped authors find journals that wanted their paper. He was the Managing Editor for the journal Molecular Ecology for eight years, where he led their adoption of data sharing and numerous other initiatives. He has also published research papers on peer review, data sharing, and reproducibility (including one that was covered by Vanity Fair). He has a PhD in evolutionary ecology from the University of Edinburgh and now lives in Vancouver, Canada.

Haseeb Irfanullah

Haseeb Irfanullah is a biologist-turned-development facilitator who often introduces himself as a research enthusiast. Over the last 26 years, Haseeb has worked for different international development organizations, academic institutions, donors, and the Government of Bangladesh in different capacities. Currently, he is an independent consultant on environment, climate change, and research systems. He is also involved with the University of Liberal Arts Bangladesh as a visiting research fellow at its Center for Sustainable Development.

Hong Zhou

Dr. Hong Zhou is VP of Product Management at KnowledgeWorks Global Ltd., where he guides product vision and strategy, leads cross-functional teams, and drives innovation across publishing solutions for researchers, librarians, and publishers worldwide. Previously, he was Senior Director of AI Product & Innovation at Wiley, defining AI strategy and leading the roadmap. He helped shape Wiley’s AI ethics principles, advanced the Wiley Research Exchange and Atypon platforms, and led development of Wiley’s first AI-driven papermill detection tool, which won the 2025 Silver SSP EPIC Award for Excellence in Research Integrity Tools. He is a recognized industry leader in AI, product innovation, and workflow transformation. He also received an individual honorable mention for the 2024 APE Award for Innovation. He holds a PhD in 3D Modelling with AI and an MBA in Digital Transformation (Oxford University). He also serves as a COPE Advisor, Scholarly Kitchen Chef, Co-Chair of ALPSP’s AI Special Interest Group, and Distinguished Expert at China’s National Key Laboratory of Knowledge Mining for Medical Journals.

Alice Meadows

I am a Co-Founder of the MoreBrains Cooperative, a scholarly communications consultancy with a focus on open research and research infrastructure. I have many years’ experience of both scholarly publishing (including at Blackwell Publishing and Wiley) and research infrastructure (at ORCID and, most recently, NISO, where I was Director of Community Engagement). I’m actively involved in the information community and served as SSP President in 2021-22. I was honored to receive the SSP Distinguished Service Award in 2018, the ALPSP Award for Contribution to Scholarly Publishing in 2016, and the ISMTE Recognition Award in 2013. I’m passionate about improving trust in scholarly communications, and about addressing inequities in our community (and beyond!). Note: The opinions expressed here are my own.

Discussion

11 Thoughts on "Ask the Chefs: What’s a Bold Experiment with AI in Peer Review You’d Like to See Tested?"

A couple thoughts on these:

Hong: you state that “This is where the boldness lies. In this theoretical future model, each reviewer is paired with a Reviewer Agent — an AI fine-tuned on their historic reports, publications, preferences, and style.”

If the AI is writing the reports, then where does this training material come from? Perhaps this works in the short term using peer review reports from humans that have already manually done peer review, but once it becomes common practice, doesn’t everything just turn into recursive slop, with AI training itself on AI-generated content over and over again?

Haseeb: You state, “Only a 100% GenAI-dependent review system will make the publishing ecosystem more just…Can anyone prove this assumption wrong?”

Let’s start with the fact that the major AI systems in use are controlled by a small number of billionaires. We saw just this week that Elon Musk’s Grok AI gave out a significant amount of misinformation, and then later, when it offered theories that Musk didn’t personally like, Musk stated that he would be “fixing” Grok’s “cringe idiocy” (https://bsky.app/profile/chriso-wiki.bsky.social/post/3lysuyqda2c2j).

Is a publishing system where a small number of billionaires get to control what gets through peer review and what doesn’t more “just” than our current distributed system? Would we ever see, for example, an accurate paper assessing vaccine effectiveness if the AI’s Silicon Valley owner wanted to curry favor with a regime actively fighting against vaccines? Should the scientific literature be based on its ability to make money for wealthy tech overlords?

David, great points. From a US publishing perspective, I would also question AI’s capabilities when confronted with manuscripts by authors for whom English is not their primary language. Editors, reviewers, and publishing teams spend a great deal of time working with authors because proficiency in English should never be a barrier to publishing great science.

Thank you, David, for raising these critical points. You’re absolutely right that the sustainability of AI in peer review depends on more than just efficiency gains; it’s about what we feed these systems and who controls them. The risk of recursive “AI on AI” training is real, and so is the danger of consolidating influence in the hands of a few tech companies whose priorities may not align with scientific integrity.

For me, this highlights why any bold experiment in AI-driven peer review must build in safeguards for data provenance, transparency, and diversity of input, and why human oversight remains indispensable. Otherwise, as you note, we risk replacing one flawed system with another, potentially less accountable one.

The big question, then, is: how do we create frameworks that allow AI to assist without ceding control of scholarly judgment to corporate interests? That’s where I think the scholarly community must stay united and vocal.

David, great point — Reviewer Agents would indeed need high-quality human training data. In the near term, that comes from historic peer review reports, decision notes, and rubrics, and also depends on how much material becomes available through open review initiatives. Human oversight remains critical: AI drafts or structures reports, but reviewers validate and correct them, so only human-approved outputs would feed back into training.
You’re also right that the future of such systems isn’t just technical — it will depend on legal and policy updates around AI, which will shape what data can be used, how it’s governed, and where the boundaries are set. The boldness of the experiment is to test possibilities while keeping humans as the ultimate gatekeepers of judgment and accountability.

Why in the world have you illustrated this article with a feminized robot with boobs and a body that reinforces all the lousy stereotypes about the “ideal” woman’s body type? Among my many concerns about how we visualize, even fetishize, AI is how often we depict “smart” objects (robots?) as highly feminized and white-raced, youthful, slim (sexy even) in ways that just constantly reinforce our harmful cultural biases about how we value certain kinds of bodies (and don’t value many others). Why does the illustration chosen for this discussion look like a young, sexy, naked white woman? Please replace it.

Hi Jennifer, thanks for bringing this to our attention. We’ve swapped in a different image. The original was sent in by the group putting together PRW posts this week and was chosen from the stock photo service we use. We do our best to be sensitive to our readers and their viewpoints, but clearly this was a case where our collective empathy fell short, and we apologize for any offense caused. We can each always learn more over time, so thank you for challenging the image and giving us a chance to consider and grow.

Like Alice, I have no great interest in using AI for any of my own work. As a journal editor, I now receive up to four submissions a day, whereas previously it was one or two a week, and many of those are clearly based on an AI prompt and will be immediately rejected, since they don’t address the journal’s submission guidelines or lack any clear evidence of individual fieldwork, which is essential for our discipline. This is just lazy scholarship. In terms of reviewing, we have fallen back on a stable of reviewers whom we know personally, and we generally try to avoid AI-manufactured reviews simply because AI just doesn’t work for what we do. I have not been reading this blog lately, but I am sure questions of ethics and accuracy are raised here often in relation to the new tools at our disposal. Since AI is already challenging teaching and learning in SSH, our publications, and our purpose as academics, I have little time for it. Time to retire, I hear you say. Probably correct.

Although I’m no Luddite by any means, I very much share Ms. Meadows’ position: I don’t have any real use for AI in my life or work, and frankly I wish AI would stay out of the workplace, as my experience with it to date has been AI-generated garbage submitted as manuscripts. I guess it’s too late for that.

However, if we were going to play with the technology in a bold manner, what I would like to see is a paper written by one AI reviewed by another AI, with no human interaction or oversight. Would the latter be able to recognize the former for what it is?

That would make a great experiment, Paul (and yes to your additional comment, Simon).
