Gradualism versus Punctuated Equilibrium (Wikipedia)

If you spend any time online, you have likely participated in an A/B test without knowing it.

A/B testing allows a web-based service to randomly direct users to two or more variations of an interface (A or B) in order to measure an outcome such as clicks, purchases, or recommendations. An online store may be interested in whether you are more likely to purchase a related item if the products are displayed on the right side or at the bottom of the screen. A political campaign may be interested in whether slight differences in wording affect how many people donate or join a mailing list. A news organization, like BuzzFeed, may be interested in measuring clicks on different variations of a headline, and those who produce long-form journalism or podcasts, like NPR, may be interested in testing different introductions to the same story.
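The mechanics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular service implements it: the function names, the experiment label, and the visitor and click counts are all hypothetical. The first function buckets each user into a variation by hashing their ID, so a returning user always sees the same version; the second applies a standard two-proportion z-test to ask whether the difference in outcome rates is larger than chance alone would explain.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user so they always see the same variant.

    Hashing the experiment name together with the user ID spreads users
    roughly evenly across variants without storing any per-user state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 2,400 visitors per headline, 120 vs. 156 clicks.
z, p = two_proportion_z_test(120, 2400, 156, 2400)
print(f"variant for reader-42: {assign_variant('reader-42', 'headline-test')}")
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests that the variation, rather than chance, is driving the difference, although the threshold a publisher chooses and the outcome it measures are decisions the code cannot make for it.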

And while scholarly publishers were early adopters of web-based technology, interface development is not a high priority for many organizations. After considerable time, effort, and resources have been spent, a single online interface is launched and remains largely unchanged for years until some executive or subcommittee is charged to investigate something new. Rarely are multiple interfaces tested simultaneously, and the ultimate decision that will lock the publisher’s interface into another cycle of stasis is often based on the organization’s Highest Paid Person’s Opinion, a term in organizational behavior that has earned its own acronym, HiPPO.

This got me wondering whether there is something different about scholarly publishers that dissuades them from putting much emphasis on interface design or on the process that leads to incremental interface improvement.

None of the major journal platform providers who responded to my inquiry currently offer A/B testing, although some now work with third-party providers to supply A/B functionality. The American Medical Association, which hosts its journals with Silverchair, uses Optimizely, software that can redirect traffic to different versions of a website and then gather and summarize user statistics. Matt Herron, web developer and product manager for the AMA, told me that he has used A/B testing for online registration, branding, and testing various elements on journal articles. HighWire customers who use the open source content management system JCore (see Science Advances and The BMJ) can implement A/B testing, although publishers using HighWire’s native H20 interface cannot. Similarly, A/B testing is not offered to Atypon customers but should be available sometime in the future, according to Gordon Tibbitts, EVP of Corporate Development for Atypon.

A/B testing is not going to solve large interface problems, but it will allow publishers to engage in incremental testing and improvement so that website evolution becomes a gradual process rather than an episodic event.

There are costs to adopting an incremental change approach, and small publishers lacking sufficient staff may decide that old-school is good enough because it allows them to focus on generating high quality content rather than chasing web design trends. What worries me about this approach is that publishers do not have a monopoly on their readers’ experience. Scientists may be just as likely to retrieve a journal article from a large subject-based repository, a personal website, or, to a growing extent, a commercial product like ResearchGate or Academia.edu as they are from a journal website. These non-publisher conduits of literature do not invest in the creation of content, but rely on a free supply of papers uploaded by their users or scraped from other sites. And by serving only as content providers, rather than content producers, they can focus entirely on providing superior user experiences.

If your organization is entirely focused on creating content and doesn’t care where or how it is consumed, then interface should not be a concern for you. However, if there are compelling reasons to keep readers coming back to your site (community cohesion, branding, advertising, COUNTER downloads, among others), then journals must consider improving the user experience. It all begins by making a choice between A and B.


Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/


6 Thoughts on "Incremental Improvements Start With A/B Testing"

Good post on a subject we need to think about. A/B testing delivers the most benefits if you know what you’re optimizing and why. In e-commerce, marketing, and advertising, the goals are pretty clear — you want greater throughput, and can test for net benefit using testing routines and clear metrics. Adobe Test and Optimizely can provide good interface options for e-commerce and marketing (and even for headline writing and content presentation), and various advertising systems can provide data to help creative teams optimize their throughput.

However, with site interfaces, the benefits of A/B testing are more elusive, I think, especially as the percentage of visitors starting at the root domain decreases and sources like Google, Google Scholar, and PubMed drive the lion’s share of traffic. Which interface are you optimizing then? I’d suggest the article interface is more important than a home page these days, and SEO/SEM could provide more benefits than an interface test. If the goal is increased article usage and user engagement based on content discovery, then it makes more sense to invest in SEO/SEM than in A/B interface testing.

It’s also harder to measure lift around content, because content is so variable for scholarly and scientific publishers. It’s not like we can compare our Golden Globes ratings to our Oscar ratings to our Grammy ratings. We have one article on neurobiology, one on endorphins, one on prevention, and so forth, and they aren’t comparable. Gauging lift requires a longer term and more macro view, which slows A/B testing around content. This may be good for smaller publishers, because the expensive A/B testing approaches are built for speed. A simpler approach of running one design for six months and another for six months, adjusting for site traffic seasonality, could be just the ticket.
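The simpler sequential approach can be sketched as follows. This is a rough illustration, assuming that the same months of the prior year are a fair seasonal baseline; the function name and all monthly usage figures are hypothetical. Each design's usage is divided by the baseline for the same calendar months before the two designs are compared, so that a naturally busy season is not mistaken for a better interface.

```python
def seasonally_adjusted_lift(design_a, design_b, baseline_a, baseline_b):
    """Estimate the lift of design B over design A after removing seasonality.

    design_a, design_b: monthly usage counts while each design was live.
    baseline_a, baseline_b: usage in the same calendar months of a prior
    year, used here as an assumed seasonal baseline.
    """
    # Index each period against its own seasonal baseline.
    index_a = sum(design_a) / sum(baseline_a)
    index_b = sum(design_b) / sum(baseline_b)
    return index_b / index_a - 1.0


# Hypothetical monthly article downloads, in thousands.
baseline_first_half = [100, 100, 120, 120, 100, 100]   # prior year, Jan-Jun
baseline_second_half = [80, 80, 100, 100, 120, 120]    # prior year, Jul-Dec
design_a_usage = [110, 110, 132, 132, 110, 110]        # design A, Jan-Jun
design_b_usage = [96, 96, 120, 120, 144, 144]          # design B, Jul-Dec

raw_lift = sum(design_b_usage) / sum(design_a_usage) - 1.0
adjusted_lift = seasonally_adjusted_lift(
    design_a_usage, design_b_usage, baseline_first_half, baseline_second_half
)
print(f"raw lift: {raw_lift:.1%}, seasonally adjusted lift: {adjusted_lift:.1%}")
```

Unlike a randomized A/B test, this comparison cannot separate the effect of the design from anything else that changed between the two periods, which is the price of its simplicity.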

We then also get to the meta aspect of interface design, which involves a set of conceits and placements that inform the bulk of anyone’s browsing experiences. Do we really need to test that comments work best at the bottom of content? That the transaction button needs to look most like Amazon’s to work best? That a “trash” icon works for deleting things online? Beyond these established conceits, most platforms have standard interface designs that become de facto standards for most users. Understanding users’ visual vocabularies and expectations is also important for effective A/B testing.

Gaining clarity on the goals around A/B testing is the most important step, because only then can you calculate the cost:benefit (and risk) of investing in interface designs. But I think one key point in the distributed information economy is that SEO and SEM seem like better investments if content discovery and increased usage are the goals for publisher sites.

I agree that the journal (and article) sites are typically more important than the publisher’s home page. And here you have two different sets of users: authors and readers. Unless you have separate sites for each, you cannot optimize the combination. By the way, this morning I was on a journal site that did not display a link to its about page, although Google found it. It was a very clean looking design but clearly suboptimal (said the engineer).

Perhaps the most famous recent example of A/B testing is the case of Marissa Mayer at Google, who famously tested 41 different shades of blue on the site to see which would yield more clicks. What’s fascinating is that when you read about that test, it is either hailed as the smartest approach to web design ever, or condemned as a travesty of listening to engineers over designers and the reason why Google has had such poor design over the years. The opinion expressed invariably depends on whether the author is an engineer or a designer.

In reality, there’s a balance to be struck between the two.

While I cannot speak for all platforms, it has been my experience with the four major platforms my clients are using that the development teams are short-staffed, often overcommitted, and work hard just to deliver the promised enhancements according to planned schedules. All of the platforms have a healthy list of enhancement features they are working on. While A/B testing may not be one of the tools that developers use, it is important to understand that most platforms do release new enhancements on a regular basis. All of the publishers that I have worked with maintain prioritized wish lists, and there is always pressure placed on the platforms to deliver new enhancements. Talk with the technical teams working on the publisher’s side and interfacing with the platform staff, and they can produce a list of their top 10 enhancements. I know of no platform that has not been updated. While A/B testing might be of some value, I am not sure that the companies offering platforms in the STM industry can afford to support the level of testing that one might find in other industries.

Phil, you answered your own questions. Not enough time, money, or expertise. Everything online is 1,000 times more complicated than we think, takes double or triple the time to implement, and costs a fortune. Also, most of us are at the mercy of our platform vendors. It’s not pretty.
