Editor’s note: Today’s post is by Stuart Leitch, Chief Technology Officer at Silverchair, where he leads the strategic evolution and expansion of The Silverchair Platform and ScholarOne. This post is adapted from his presentation at Platform Strategies 2025.
When I spoke about AI in scholarly publishing two years ago, we were discussing the disruption that was coming. Today, we're no longer talking about what's coming; we're in direct contact with it. While it's still early days, we're starting to see important thresholds being crossed that will fundamentally change how we work.
Are We in an AI Bubble?
The media narrative around AI is all over the place. Critics point to the astronomical amounts of money being invested compared to the value being extracted. Some claim AI is hitting a ceiling, that it’s not getting any smarter, that it’s losing its personality. Yet, players like Bridgewater, the world’s largest hedge fund, suggest the market hasn’t yet priced in all the upside potential.
It’s genuinely hard to know whether we’re in a bubble — that’s ultimately a question for economists. But what I can show you is where we are in terms of capability, particularly in the domain of software development.

The Exponential Growth in AI Coding Capability
Software development offers a unique lens into AI’s progress because coding is complex to write but relatively easy to validate. This asymmetry allows us to run tests that measure how complex a task AI can handle. The results are striking: over the past five years, the level of complexity that an AI coding assistant can engage with has been doubling every seven months.
This tracks closely with our anecdotal experience on the AI team at Silverchair: every six months or so, the same tasks feel half as difficult for the AI, and it can go twice as deep into problems. We've progressed from method-level code completion with tools like GitHub Copilot three years ago to AI that can now dynamically explore large code bases and propose large, complex changes. It's not yet fully capable of handling everything we need in large enterprise code bases, but the exponential trend continues to hold. Remarkably, the rate started accelerating in late 2024, and the doubling is now occurring every four months.
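To put those doubling times in perspective, here is a back-of-the-envelope calculation (illustrative arithmetic only, not benchmark data) of what each rate implies over a single year:

```python
# Rough arithmetic: how much the manageable task complexity multiplies in a year
# for a given doubling time, using growth = 2^(12 / doubling_months).
def yearly_growth(doubling_months: float) -> float:
    return 2 ** (12 / doubling_months)

print(f"7-month doubling: ~{yearly_growth(7):.1f}x per year")  # ~3.3x
print(f"4-month doubling: ~{yearly_growth(4):.1f}x per year")  # ~8.0x
```

In other words, moving from a seven-month to a four-month doubling time takes annual growth from roughly threefold to roughly eightfold.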
This acceleration is particularly significant because software development benefits from specific advantages: it’s a constrained language with finite rules, there’s abundant training data, and outputs are relatively verifiable. These systems can engage in self-play and learn in unsupervised ways. While software will be one of the first dominoes to fall in terms of major industry disruption, I believe we’ll see these capabilities trickle down into other domains.
The adoption is real. In a recent meeting of a CTO group in Charlottesville, Virginia — seventeen technology leaders representing various companies — we polled whether organizations had over 50% of their developers actively using an AI coding tool daily. Every single leader said yes.
Understanding the Three Layers that Give AI its Current Capability
To understand where AI is headed, we need to look at three distinct components:
The Models Themselves: When ChatGPT first launched on GPT-3.5, it was remarkable simply because it could converse and almost pass a Turing test, though it hallucinated frequently. Since then, we've seen dramatic improvements. GPT-4-level intelligence is now lightning fast, with more than a 200-fold reduction in token cost for the same capability. OpenAI has released an open-weight model (GPT-OSS 20B) that outperforms the original GPT-4 while running on a laptop at impressive speeds.
Tools: Models have learned to use tools beyond just search. We are no longer dealing only with a model's internal knowledge. Through standards like the Model Context Protocol (MCP), essentially a USB-C-style standard for AI, you can connect any arbitrary tool that has an API. Whether it's Salesforce, Jira, databases, or communication platforms, these can all be plugged into AI systems to dynamically populate their context, learn just in time, and perform relatively arbitrary tasks.
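To make that concrete, here is a minimal sketch of what exposing an internal service as an MCP tool can look like, using the FastMCP helper from the official MCP Python SDK. The tool name and hard-coded data are hypothetical placeholders, not a real Silverchair API:

```python
# Minimal MCP server sketch (assumes the official `mcp` Python SDK is installed).
# The tool below is a hypothetical example; a real server would call your own API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("journal-metrics")

@mcp.tool()
def get_article_usage(doi: str) -> dict:
    """Return usage counts for an article so an agent can ground its answers in real data."""
    return {"doi": doi, "views_last_30_days": 1234, "citations": 7}  # stubbed values

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so MCP clients (e.g., Claude) can discover and call it
```

Once a client is pointed at a server like this, the model can decide for itself when to call the tool, pulling fresh data into its context instead of relying only on what it memorized during training.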
Scaffolding: This is where things get really interesting. The scaffolding is the glue that holds everything together, and it’s becoming increasingly sophisticated. We’re moving from simple single-loop chatbot experiences to multi-headed systems with significant sophistication under the covers. The future is multi-agentic, and this is where the real power lies.
Beyond Chatbots: The Power of Multi-Agent Orchestration
Think about a typical ChatGPT session. You start fresh every time — it’s like having a day-one intern who’s eager to please. When you give it access to tools, and it researches a topic, all that information progressively fills up its context window, which is like its working memory. The deeper you get into a conversation, the more of the model’s intelligence is consumed by attending to everything in its memory.
You’ve probably experienced this: the longer a chat session goes, the less sharp the responses become. In a manual workflow for complex research, you’d typically use an expand-contract methodology — let the AI gather lots of information, then distill it down to a succinct report, and bring only that report into a new session. When the context window isn’t filled up, models are substantially smarter.
The Multi-Agent Approach: A Live Example
During the presentation, I demonstrated what’s currently possible with multi-agent orchestration using Claude Code from Anthropic. While this particular tool has a command-line interface focused on developers, the underlying concepts apply broadly, and Anthropic is busy developing variants with more accessible interfaces for non-technical users. Even in its current form, it is highly useful for non-coders.
The demonstration involved a research task about how AI bots are disrupting traffic patterns on academic journals. Rather than giving this to a single AI instance, I structured it as a complex workflow with multiple specialized agents:
- Context Gathering Agent: One sub-agent researched the Platform Strategies conference to understand the audience, saving its findings to a document without cluttering the main agent’s memory.
- Theme Identification Agent: Another sub-agent did preliminary research to break down the problem space into key themes — essentially creating a work breakdown structure, just as you would with human workers.
- Parallel Research Agents: Separate sub-agents then researched each theme in parallel, with the target audience in mind.
- Validation Agents: Once research was complete, additional agents extracted key claims and conclusions from each theme.
- Critical Evaluation Agents: In parallel, separate agents critically evaluated each assumption or claim, producing confidence scores — essentially fact-checking the research.
- Feedback Loop: The system revised reports based on validation, and if themes didn’t hold up because claims weren’t sufficiently substantiated, it conducted more research and repeated the validation process.
- Synthesis Agents: Finally, agents merged all themes into a single research report, reduced redundancy, and pressure-tested everything against the original research prompt.
- Format Conversion: A final agent researched the presentation tool Gamma.app and reformatted the report for slides.
The prompt was dictated to the Claude interface in plain language, requiring no specialized coding or additional expertise. This workflow demonstrates something crucial: by breaking complex tasks into manageable chunks and assigning specialized agents to each piece, you can achieve far more sophisticated results than with a simple chatbot. Each agent has a limited context, stays focused on its specific task, and returns only the essential information — just like sending a researcher to the library and expecting a summary rather than a pile of books dumped on your desk.
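A heavily simplified sketch of that orchestration pattern appears below. The actual demo was expressed in plain language inside Claude Code, not Python; the prompts, model name, and `run_agent()` helper here are illustrative stand-ins meant only to show the shape of the workflow:

```python
# Sketch of the multi-agent workflow described above. Each run_agent() call stands in
# for "spawn a sub-agent with its own fresh context window and get back only its summary".
from concurrent.futures import ThreadPoolExecutor
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder model name

def run_agent(instructions: str, inputs: str = "") -> str:
    msg = client.messages.create(
        model=MODEL, max_tokens=2000,
        messages=[{"role": "user", "content": f"{instructions}\n\n{inputs}".strip()}],
    )
    return msg.content[0].text

def research_with_validation(question: str, max_rounds: int = 2) -> str:
    # Context gathering and theme identification agents.
    audience = run_agent("Summarize the Platform Strategies conference audience in one page.")
    themes = run_agent("Break this question into three to five one-line research themes:", question)

    # Parallel research agents, one per theme, with the audience in mind.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(
            lambda theme: run_agent("Research this theme for the audience described below.",
                                    f"{theme}\n\nAudience:\n{audience}"),
            [t for t in themes.splitlines() if t.strip()],
        ))

    # Validation, critical evaluation, and revision loop for each theme.
    reports = []
    for draft in drafts:
        for _ in range(max_rounds):
            claims = run_agent("Extract the key claims and conclusions from this draft.", draft)
            review = run_agent("Critically evaluate each claim and mark any as LOW CONFIDENCE.", claims)
            if "LOW CONFIDENCE" not in review:
                break
            draft = run_agent("Revise the draft, researching further where claims were weak.",
                              f"{draft}\n\nReview:\n{review}")
        reports.append(draft)

    # Synthesis agent: merge, deduplicate, and pressure-test against the original prompt.
    return run_agent("Merge these reports, remove redundancy, and check them against the question.",
                     f"Question: {question}\n\n" + "\n\n".join(reports))
```

The Python is beside the point; what matters is that each stage works in a clean context window and hands back only a distilled artifact, which is exactly what the plain-language prompt asked Claude Code to do.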
The Scaffolding Revolution and Coming Disruption
What’s revolutionary is that this level of orchestration no longer requires technical knowledge. I’ve been working with these concepts for two years, but we’ve now transferred these skills to non-technical people throughout Silverchair, and for most people, the ramp-up time is about a week or two.
The sophistication you can create with language alone is remarkable. If you’ve used ChatGPT’s deep research feature, you’ve experienced something similar — it produces more solid results than regular chat sessions precisely because it’s using multi-agent orchestration under the covers.
When we hear headlines about a coming white-collar disruption, it’s because of multi-agent orchestration. This technology is moving incredibly fast, and scholarly publishing — while appropriately conservative given its role as the bulwark for rigorous science — needs to understand what’s coming.
We’re already seeing disruption in multiple ways. Traffic is being disintermediated as users increasingly rely on AI agents. Papers are becoming cheaper to produce as people use LLMs to generate content. Tech companies have largely stopped hiring junior developers because the technology can handle many basic tasks. Salesforce recently repurposed (effectively laying off) 4,000 customer service representatives because AI can handle routine inquiries.
As the exponential trend continues, the scope of what can be automated continues to expand.
Managing Quality: The Slop Problem
One critical concern with AI-generated work is quality control — what some call “AI slop.” These tools allow you to work much faster, but if used naively, they’ll produce code or research that does not withstand scrutiny.
The solution lies in creating feedback loops where AI can measure itself, combined with strategic quality gates. Using language alone, you can establish multiple validation agents that check work against different criteria. At Silverchair, for example, our architecture team is documenting its quality concerns and encoding them into sub-agents that run during continuous integration and even on individual developers' machines.
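As an illustration of what such a quality gate can look like, here is a sketch in which several validation agents each audit a change against one criterion and block the build on a confident failure. The criteria, helper, and threshold are hypothetical, not our actual architecture checks:

```python
# Sketch of a multi-criteria validation gate for CI. review_with_agent() is a
# hypothetical stand-in for an LLM-backed sub-agent that audits a diff against one concern.
import sys

CRITERIA = [
    "Flag any breaking change to public APIs.",
    "Flag queries likely to degrade database performance.",
    "Flag handling of user data that violates privacy guidelines.",
]

def review_with_agent(criterion: str, diff: str) -> dict:
    """Hypothetical: returns {'verdict': 'pass'|'fail', 'confidence': 0.0-1.0}."""
    raise NotImplementedError("stand-in for a real validation sub-agent")

def quality_gate(diff: str, threshold: float = 0.8) -> bool:
    failures = []
    for criterion in CRITERIA:
        result = review_with_agent(criterion, diff)
        if result["verdict"] == "fail" and result["confidence"] >= threshold:
            failures.append((criterion, result["confidence"]))
    for criterion, confidence in failures:
        print(f"BLOCKED: {criterion} (confidence {confidence:.2f})")
    return not failures

if __name__ == "__main__":
    diff_text = sys.stdin.read()  # e.g., piped from `git diff` in a CI step
    sys.exit(0 if quality_gate(diff_text) else 1)
```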
The level of quality you achieve is directly proportional to the effort you invest in the original instructions, checks and balances, and ongoing steering. You still need human oversight, and manual code reviews remain essential, but you can dramatically reduce errors through thoughtful orchestration.
Where Do We Go From Here?
The capabilities I’ve discussed are available today, not in some distant future. When combined with tools like MCPs (Model Context Protocol) that connect to external systems, the kinds of intelligence-requiring problems that organizations routinely solve are increasingly automatable through multi-agent systems.
The premium right now is on understanding these concepts and learning to work effectively with LLMs. Organizations need people who are experimenting with these tools, learning the mechanics of orchestration, and figuring out where to inject human supervision into automated workflows.
This is technology that will disrupt our businesses in countless ways. We see that disruption along three key pathways: the way end-users will consume content, the way researchers will create content, and the way publishers will curate and monetize content. As is the nature of disruption, much of this is out of our control as publishers. But what we’re uniquely positioned to do is determine how to develop MCP systems or AI agents that can reinforce the value of what we do and the importance of research and researchers.
The future isn’t about AI replacing human intelligence — it’s about humans learning to orchestrate AI agents to handle the grunt work, validation, and parallel processing that expands what we can accomplish. For those willing to invest a week or two in learning these concepts, the productivity gains are immediate and substantial.
The exponential curve is holding, and in some domains, it’s accelerating. Whether we’re in a bubble or not, the direction of travel is clear. The time to develop these capabilities is now.