As Peer Review Week 2025 turns its lens to the future, one question looms large: How will artificial intelligence (AI) reshape the way we conduct peer review? I think we have reached a point where the question is no longer if but how. At the same time, peer review faces multiple pressures: rising submissions, paper mills, AI-generated manuscripts, and growing mistrust in the system. AI is at the root of some of these issues; it is also part of the solution. To explore these tensions, I spoke to two leading experts at the intersection of publishing and technology: Helen King and Christopher Leonard.

Helen is an Independent Consultant; she is recognized as an expert at the intersection of publishing and technology. She writes and curates the PubTech Radar newsletter, which is focused on trends, innovations, and tools in publication technology, and convenes the London-based AI in Publishing Collective meeting — catering to publishing professionals wanting to connect, share ideas, and stay informed on AI and tech in publishing.

Chris is Director of Product Solutions at Cactus Communications. In this role, he partners with publishers to optimize and future-proof peer review workflows. Scalene is a personal newsletter curated by Chris, exploring how artificial intelligence is reshaping peer review. In its debut issue, Chris laid out his key premise: AI might be capable of producing “better-than-human” peer reviews within five years — a possibility with deep implications for publishing.

Helen and Chris are both immersed in the rapidly evolving landscape of peer review and AI, yet they come from different vantage points — Helen as a strategist and community-builder, and Chris as a product leader and practitioner. Together, they offer a nuanced look at how peer review might adapt, fracture, or reinvent itself in the AI era.

[Logo: Peer Review Week, 15-19 September 2025]

From your perspective, what is the most essential function of peer review in 2025?

Chris: Weeding out AI-generated nonsense. I’m not against the use of AI to help edit papers, or as a reviewer assistant, but the scale of malpractice in generating manuscripts is reaching a critical point, and humans need to get better at arbitrating what is true and publishable, and what is not. More critical thinking is required of both reviewers and readers. Peer review was never about ‘is this faultless?’ but rather ‘is this interesting, and does it move the field forward?’ I think, regardless of what we thought peer review was for in the past, one of its most important functions now is to prevent the spread of misinformation, however we can.

Helen: I agree. Right now, ‘peer reviewed’ could mean anything from two exhausted academics rubber-stamping a paper, to a community of specialists collectively evaluating methods, to a fake reviewer working in a peer review ring. The peer review ‘brand’ has become muddied, carrying decades of baggage about bias, inconsistency, and AI contamination. Maybe peer review needs distinct brands that clearly communicate what type of scrutiny readers are actually getting? Perhaps we need new labels like ‘Traditionally Verified’ for deep-dive reviews by recognized specialists, ‘AI Screened’ for basic automated quality checks, and ‘Patient Informed’ for medical research that includes patient perspectives. Each brand would carry specific promises about what was actually scrutinized and by whom. Instead of forcing diverse validation methods under an increasingly vague umbrella term, strategic rebranding could restore trust by delivering clear quality indicators that actually tell researchers and readers what they’re getting.
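Helen’s labels lend themselves to structured metadata rather than a single ‘peer reviewed’ flag. A minimal sketch in Python of how such attestations might be recorded; all names and fields here are illustrative, not an existing standard:

```python
from dataclasses import dataclass, field
from enum import Enum

class ReviewLabel(Enum):
    """Hypothetical review 'brands', following Helen's examples."""
    TRADITIONALLY_VERIFIED = "traditionally-verified"  # deep-dive review by recognized specialists
    AI_SCREENED = "ai-screened"                        # basic automated quality checks only
    PATIENT_INFORMED = "patient-informed"              # includes patient perspectives

@dataclass
class ReviewAttestation:
    """Records what was actually scrutinized, and by whom."""
    label: ReviewLabel
    checks_performed: list[str]   # e.g. ["methods", "statistics", "image integrity"]
    reviewer_roles: list[str]     # e.g. ["recognized specialist", "patient advocate"]
    tools_used: list[str] = field(default_factory=list)  # any AI tools involved

# An article could carry several attestations, one per kind of scrutiny:
article_review_record = [
    ReviewAttestation(
        label=ReviewLabel.AI_SCREENED,
        checks_performed=["plagiarism", "image duplication", "reference authenticity"],
        reviewer_roles=["automated pipeline"],
        tools_used=["similarity checker", "image-forensics tool"],
    ),
    ReviewAttestation(
        label=ReviewLabel.TRADITIONALLY_VERIFIED,
        checks_performed=["methods", "novelty", "interpretation"],
        reviewer_roles=["two recognized specialists"],
    ),
]
```

The point is not this particular schema, but that each label would carry explicit, machine-readable promises about the scrutiny applied.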

How have you seen the expectations of peer review shift in recent years, and what pressures are shaping it now?

Helen: Back in the day, the biggest threats to peer review seemed to be the odd academic trying to prove a point about the system’s flaws, or authors recommending their mates as reviewers. Ten years ago, Google Scholar co-founder Anurag Acharya said he was ‘not really’ worried about gaming because everything was visible and ‘anyone in the world can call you on it.’ Deep-fake researchers or reviewers weren’t yet a thing, no one was talking about paper mills, and the publishing ecosystem relied on trust. Blockchain peer review solutions came and went, dismissed as an unnecessary decentralized answer to a trust problem that didn’t exist in a stable, centralized system. Times have changed; now we’re dealing with AI misuse, organized fraud through paper mills, bad-faith peer review, rising retractions, and the commercialization of research misconduct. Now that people are actively trying to break the system, it doesn’t seem quite as robust as it once did.

Imagine peer review being re-built for the AI era. What is the one change you would make that would fundamentally improve how it works?

Helen: I think there’s a deeper question here that reminds me of Richard Susskind’s work on how AI is transforming the legal profession. He argues that lawyers get trapped in ‘task-based thinking’ — defending current processes instead of focusing on what clients actually want. Rather than asking, ‘Can AI do peer review as well as humans?’, maybe we should ask, ‘What outcomes do we actually want from peer review?’

The research community doesn’t necessarily want ‘peer review’ — it wants quality assurance, fraud detection, constructive feedback, ethical oversight, and credible validation. These outcomes could be delivered through AI screening, field-specific human expertise, or entirely different approaches. I think there’s a risk that we become too focused on whether AI can replicate traditional peer review and forget to ask how AI can best meet our needs.

Chris: Absolutely. AI will open up new avenues of quality assurance that we may never have considered part of ‘peer review’ in the past. Even now, the fields of research integrity and peer review are merging into a continuum of multi-step ‘quality assurance’ processes, where we have the opportunity to add new checks and balances that weren’t available, or weren’t needed, previously. As with most technological revolutions, however, the most important thing is to adapt them to serve human needs so that they are adopted at scale. But in this case, we as human users also need to reconsider what peer review is and reinvent it appropriately.

If AI were fully integrated into peer review, what role should humans play that machines never could?

Helen: Future AI technologies will be able to handle most aspects of peer review better than humans, but each research community must choose whether it wants that future. Some fields have well-defined methodological standards (like statistical requirements, formatting protocols, or experimental design criteria) that AI can consistently apply. Others require contextual judgment that, for now, remains human territory. Some communities will prohibit AI use; others will embrace it. I don’t think there will be one story about AI adoption. It’ll be thousands of separate decisions about what matters most to different fields.

Chris: This is where the concept of AI-assisted professional peer reviewers will become important. Firstly, we need to recognize that peer review plays a key role in publishing, and it shouldn’t be left to volunteers who return reports when they want to, with no real governance over the content of those reviews. This is partly why we find ourselves in our current predicament. So, no more volunteers: let’s professionalize this key process and bring it in-house, with proper oversight by publishers or vendors.

Secondly, and part of the reason these need to be professional roles, these reviewers need to be trained on all available AI systems and use them appropriately for the manuscript under consideration. Various LLMs have different strengths and weaknesses; image duplication tools are essential these days, as are reference list authenticators. Reviewers need access to prompt libraries, and they need to know how to go deep on answers by iterating in a conversation with the LLM. Finally, they need to be able to write the final report in their own words and sign it with their name.

Thirdly, they are no longer true ‘peers’ (although they are subject matter experts), and they are no longer performing traditional ‘reviews’, so we’ll need to consider what to call them. For now, though, AI-assisted professional peer reviewers are the key way I see us making the most of human judgment and AI evaluations at scale for academic research.
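Chris’s workflow (prompt libraries, iterative follow-up questioning, and a human-written final report) is concrete enough to sketch. A minimal Python illustration, assuming a generic `ask_llm` helper wired to whatever model the reviewer uses; the function names and prompts are hypothetical, not any vendor’s API:

```python
# Sketch of the workflow Chris describes: draw checks from a prompt library,
# press the model on its first answers rather than accepting them, and leave
# the final report to the human. `ask_llm` and the prompts are assumptions.

PROMPT_LIBRARY = {
    "methods": "List any methodological weaknesses in this manuscript:\n\n{text}",
    "statistics": "Check the statistical analysis for errors or unsupported claims:\n\n{text}",
    "references": "Flag any references that look fabricated or unverifiable:\n\n{text}",
}

def ask_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your client of choice."""
    return f"[model response to: {prompt[:60]}...]"  # placeholder so the sketch runs

def assist_review(manuscript: str, follow_up_rounds: int = 2) -> dict[str, list[str]]:
    """Gather AI observations per check; the human arbitrates and writes the report."""
    notes: dict[str, list[str]] = {}
    for check, template in PROMPT_LIBRARY.items():
        answer = ask_llm(template.format(text=manuscript))
        thread = [answer]
        for _ in range(follow_up_rounds):
            # "Go deep on answers" by iterating in conversation with the model.
            answer = ask_llm(
                "Which of these points are weakly supported? "
                f"Re-examine them against the manuscript.\n\n"
                f"Points:\n{answer}\n\nManuscript:\n{manuscript}"
            )
            thread.append(answer)
        notes[check] = thread
    return notes
```

The output is raw material only: per Chris, the reviewer discards the irrelevant, adds what the model missed, and then writes and signs the report themselves.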

What’s one principle of peer review you’d fight to protect, no matter how advanced AI becomes?

Helen: The term itself! I think ‘peer review’ should mean evaluation by human peers: we should reserve the term for one human expert evaluating another’s work, using their lived experience, contextual knowledge, and professional judgment.

Chris: I think I agree. Human evaluation, whether in part or in whole, is too important right now to ignore or to conflate with AI evaluations. Helen’s earlier comment about labels like ‘Traditionally Verified’ or ‘AI Screened’ is important in this context. We can imagine a multitude of variations on these, too.

What’s one risk of AI in peer review that’s worth taking — and one that’s not?

Helen: The real risk is that individual research communities — whether it’s medieval history or particle physics — lose agency over how their peer review works because publishers control the infrastructure. AI systems aren’t like installing Microsoft Word once and using it for years. They need constant feeding, tweaking, bias monitoring, and updates as research practices evolve.

Chris: One risk worth taking is running AI evaluations alongside traditional review workflows to ascertain the quality of the output versus your current human workflows. We fetishize peer review to an extent, and many human reviews are poor, superficial, vague, or just plain wrong. Plus, they take too long. NEJM AI recently ran a great experiment where an editor reviewed a manuscript in 7 days, and concurrently, they generated two other reports from Gemini 2.5 and GPT-5. The editorial board then sat to evaluate all reports and give a decision (authors opted into this experiment).

What’s not worth doing is trying to one-shot peer review reports with people who don’t know the subject area or how LLMs work. Inevitably, the results will be poor, and you’ll come away with the sense that AI can’t help you. And you may close the book on AI for a while, when in truth it is a powerful aid to human evaluation, and is developing at a pace where you need to keep re-evaluating your opinions about it every few months.

How should disclosure and transparency evolve if AI tools become a standard part of the process? What’s enough to keep trust intact?

Helen: I like how Anthropic’s diligence framework for AI collaboration, which [I think!] aligns closely with COPE guidelines, approaches disclosure and transparency through Creation Diligence (being thoughtful about which AI systems we use), Transparency Diligence (being honest about AI’s role with everyone who needs to know), and Deployment Diligence (taking responsibility for verifying and vouching for outputs we share). I’m uncertain how meaningful these frameworks will be in a couple of years, but right now they’re a good approach for reviewers, authors, and publishers.

Beyond individual responsibility, we need transparency from AI tool builders themselves so that these don’t become black boxes. The community should ask for comprehensive ‘tool cards’ that reveal how these systems were constructed and who they represent. For example, research demonstrates that models trained on gender-imbalanced data significantly amplify existing disparities compared to human evaluators — a language model built on 80% male usage and writing patterns will behave fundamentally differently than one trained on balanced datasets.
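Helen’s ‘tool cards’ echo the model cards already used in machine learning. As a sketch of what one might disclose, here is a hypothetical card expressed as a plain Python dictionary; every field is illustrative rather than an existing standard:

```python
# Hypothetical "tool card" for an AI reviewing tool. The schema is invented
# for illustration; the point is that communities can see how the system was
# built, who its training data represents, and how it is kept current.
tool_card = {
    "name": "example-review-assistant",  # hypothetical tool
    "version": "1.4.0",
    "intended_use": ["initial screening", "statistics checks"],
    "out_of_scope": ["final accept/reject decisions"],
    "training_data": {
        "sources": ["described corpus of published articles"],
        "fields_covered": ["life sciences", "physics"],
        "known_imbalances": ["author gender skew in the source corpus"],
    },
    "evaluation": {
        "benchmarks": ["agreement with human reviewers on a held-out set"],
        "bias_audits": ["gender", "region", "career stage"],
    },
    "update_policy": "retrained quarterly; bias audits repeated on each release",
}
```

A card like this would let, say, a medical journal and a physics journal make different, informed decisions about the same tool.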

Who, according to you, should be setting the guardrails for AI in peer review — publishers, funders, regulators, researchers themselves?

Helen: I think what matters is not letting individual actors set guardrails in isolation. Peer review integrity affects everyone in the research ecosystem. The solution needs to involve all the people who actually have skin in the game.

Having said that, publishers face real commercial pressures. They see patterns at scale that individual researchers don’t: the volume challenges, the abuse, the operational realities that may make idealistic solutions unworkable. If publishers want to run efficiency tools to catch plagiarism faster, automate some aspects of decision-making, or manage massive submission volumes, why shouldn’t they, provided they comply with frameworks like the EU AI Act? If research communities are unhappy, they should vote with their feet.

Chris: I’m a little skeptical about AI guidelines since, in my opinion, they are overly conservative and not updated frequently enough. Add to that the fact that they are largely unenforceable and undetectable, and it makes me wonder whether this is something we just need to accept as part of life in the 21st century.

But guardrails are a different matter. Setting up AI tools so that they can’t do certain things is probably the best way we have of limiting ‘damage’ that we may do to ourselves, however inadvertently, with AI. The only people who can do that effectively are the people developing the AI. However, an advocacy group of publishers, researchers, and funders could be a useful addition to the landscape here.

Fast forward 10 years: What will surprise us most about how peer review has changed?

Helen: I’m torn between three visions of what’s coming, each pulling at different intuitions about how systems actually evolve:

  • Vision 1: Professional Evolution — Peer review doesn’t change fundamentally, just gets more systematic. AI handles initial screening for obvious flaws, but humans still make the important calls. The big shift is professionalizing reviewers — especially in science, technology, and medicine. This approach builds on existing systems without disrupting established hierarchies.
  • Vision 2: Industrial Automation — Large Western publishers turn peer review into another step on the content production line. AI doesn’t just screen — it evaluates methodology, checks statistical analysis, and suggests revisions. Humans become quality controllers rather than decision makers, intervening only when algorithms flag uncertainties. It’s ruthlessly efficient, dramatically faster, and treats manuscripts like any other manufactured product moving through standardized processes.
  • Vision 3: Community-Focused — Peer review splits into specialist communities that deploy AI in ways that suit their specific needs. Basic fraud detection and technical screening happen automatically, but different disciplines shape their own AI tools and review processes. Communities retain control over what gets evaluated and how, adapting technology to serve their research priorities rather than standardized efficiency metrics.

My experience points to Vision 1 — academic publishing usually chooses gradual evolution over revolution. Commercial logic drives toward Vision 2 — for most commercial publishers, this is ultimately a business optimization problem. But the heart wants Vision 3, because it restores peer review to its original purpose.

Chris: I think we’ll see all three of Helen’s visions. Number 2 for the vast majority of incremental research, with elements of number 1 (humans making the important calls) for outliers or odd outcomes. But I also see a future where human-driven community review is still the de facto way of evaluating research. The top 5% of journals will probably carry on with this since it works reasonably well now, and there is no shortage of reviewers for The Lancet or Nature. But also, society titles, which have a significant element of community support, will not need to make a big switch to quicker or AI-driven evaluation. Where this will be useful is at scale for evolutionary, rather than revolutionary, research — which probably accounts for 90% of all research output.

If you had one message for the next generation of reviewers about navigating the AI era, what would it be?

Helen: Approach this shift with curiosity and work out your stance on using AI in reviews while understanding your publisher’s AI policies. If the publisher allows it, test how AI tools can assist with literature searches, methodology checks, and evaluating your own review drafts. Experiment thoughtfully with these tools as you develop your reviewer skills rather than avoiding them entirely or accepting them uncritically. Find where AI genuinely helps versus where it undermines the critical thinking that makes peer review valuable.

Chris: Never let an AI tool write a report for you. Use it as input for your own critical thinking, and then construct a report in your own language. Disregard anything you consider irrelevant. Add in things you consider important that haven’t cropped up. Then write the damn thing yourself. I love the concept of curiosity here. If I were to deliver one message to authors, too, it would be to pre-review your work before submission. You have access to the same tools as reviewers and publishers: use them to iron out what AI would consider to be major deficiencies before you submit.


As this conversation ended, I was left with the strong feeling that the future we are discussing is not about choosing one path; it is about learning to walk several at once. The three futures Helen sketches feel less like competing scenarios and more like parallel realities that will unfold simultaneously. Efficiency may drive some choices; community values or commercial interests may drive others. And as Chris warned, setting up AI tools so that they can’t do certain things may be the best way to limit the damage we risk doing to ourselves, however inadvertently, with AI. But only those developing such tools can build in those guardrails. What’s important is how deliberately we make these choices.

Roohi Ghosh

Roohi Ghosh is the ambassador for researcher success at Cactus Communications (CACTUS). She is passionate about advocating for researchers and amplifying their voices on a global stage.

Discussion

2 Thoughts on "Peer Review in Transition: Helen King and Christopher Leonard on AI and the Future of Peer Review"

Thanks for an interesting read. I agree it’s important to think about what the research community really wants when it comes to using AI in peer review. One question to ask is how important the ‘peer’ is (review by a human expert in the field) and how important the ‘review’ is (quality assurance, fraud detection, and constructive feedback). People will have different views on that, but it’s a conversation worth having.

At IOP Publishing, we take a bottom-up approach to peer review innovation to make sure it’s shaped by our scientific community in a way that’s safe, secure and led by integrity. That said, publishers do need to have a say in how AI is used. We have a responsibility to guide its development and make sure it supports trust in research.

To try and understand what the physical sciences research community felt about AI in peer review, we conducted a survey of almost 350 researchers. We found that views were polarised and getting more so. We also found gender and career-stage differences in how people foresee the impact of AI on peer review.

The job of publishers right now is to ask the right questions and listen carefully to how the community responds.

“The research community doesn’t necessarily want ‘peer review’ — it wants quality assurance, fraud detection, constructive feedback, ethical oversight, and credible validation.” Sorry but that is peer review. I imagine an AI is of most help on fraud and validation, as the authors later come around to saying.
“publishers face real commercial pressures” – only the commercial ones, and particularly in STEM. Even there, the big publishers are raking in big profits. They should not be economizing on peer review! Our diamond OA journal doesn’t have any commercial pressures.
