The recent finding that PubMed Central’s (PMC) competing copies of journal articles can reduce HTML downloads by approximately 14% will prove important if it proves generalizable. I believe it likely will. The study was robust, the logic of the hypothesis is sound, and there’s no reason to believe otherwise. PMC is competitive by its very nature, and even designs its search interface to make itself more competitive.
PMC tries to compete with publishers.
In fact, the general level of traffic competition represented by PMC may prove to be even higher once other journals are studied — a quick look at PLoS data comparing on-site downloads to PMC downloads shows that PMC is taking 22% of their downloads — a surprising finding given that there is no price disparity driving such behavior. This may ultimately provide a hint of how undifferentiated traffic behaves, but that’s speculation for another day.
For the sake of this essay, I’ll stick to the 14% number, but keep in mind that at least one open access (OA) publisher appears to be losing more than 20% of its traffic to PMC.
While a decrease in traffic of 14% sounds competitive in the abstract, what does it mean practically to a publishing business, especially one that is increasingly dependent on digital behavior and engagement?
The extent of the effect depends on the lines of business a publisher has, but because traffic is the lifeblood of online business, the effects permeate any line of online business. Whether you’re a subscription or OA publisher, having traffic leeched from your site has deleterious effects. It’s not just a problem for one type of publisher or another. It’s a problem all publishers have in common.
Cost to ad sales — Online ads are sold on the basis of impressions, and impressions depend on traffic. Many of the more successful journals with online advertising generate multiple page views as users navigate to articles. This has changed a bit as Web 2.0 search technologies have allowed users to pinpoint articles from Google and other major search engines, going directly to content, but both modes of usage still occur. To put this into financial terms, let’s say an online publisher with a robust advertising business generates $500,000 per year in digital ads — targeted, run of site, or run of network. If traffic is suppressed by 14%, that can lead to a $70,000 per year loss in ad revenues, a number that goes higher if you factor in navigation habits that would yield multiple page views per use.
Cost to upsells and cross-sells — One of the goals many publishers have is to use their Web site to upsell customers to product enhancements (e.g., iPad apps, CME, or a personal subscription) or cross-sell them to new products (e.g., books, new journals, white papers, submissions to other APC-driven journals). These revenue streams tend to be smaller and more sporadic, but more than nothing. Let’s assume a publisher has about $100,000 per year in upsells and cross-sells, for the sake of proportionality. At a 14% decrement, that’s another $14,000 in lost business.
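To make the arithmetic behind these first two estimates explicit, here is a minimal sketch. It assumes, as the estimates above do, that revenue falls in direct proportion to traffic; the function and variable names are purely illustrative.

```python
def traffic_loss(annual_revenue: float, traffic_drop: float) -> float:
    """Revenue lost per year, assuming revenue scales linearly with traffic."""
    return annual_revenue * traffic_drop


TRAFFIC_DROP = 0.14  # the 14% reduction discussed in this post

# Hypothetical revenue figures taken from the examples above.
ad_loss = traffic_loss(500_000, TRAFFIC_DROP)
upsell_loss = traffic_loss(100_000, TRAFFIC_DROP)

print(f"Ad revenue lost per year: ${ad_loss:,.0f}")
print(f"Upsell and cross-sell revenue lost per year: ${upsell_loss:,.0f}")
```

The linear assumption is the weakest part of the sketch. If lost visits would each have produced several page views, the ad figure would be higher.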
Cost to institutional subscriptions (opportunity cost for price increases, compounded) — Institutional subscriptions are evaluated on usage, which is derived from traffic. Decrease the traffic flowing to a Web property, and you decrease its usage. Institutions are increasingly sensitive to usage as one of the ways they evaluate a publication’s value, and publishers are very aware of this. If a publisher sees usage going down or not increasing at historical rates, this can limit their willingness to increase prices. A 14% decrease in traffic can be more than enough, by a sizable amount, to offset any organic increase in traffic. If a publisher holds back from price increases because of the PMC effect, this opportunity cost, incurred in Year 1, compounds itself in later years: a 2% holdback lowers the price base, and with it the dollar value of every subsequent increase, indefinitely. A publisher with $1,000,000 in institutional subscriptions that plans 5% annual price increases but settles for 3% because of disappointing usage reports ends up about $285,000 a year behind by Year 10, and the cumulative shortfall over the decade is larger still.
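The compounding is easier to see when it is worked through year by year. The sketch below assumes the increase is applied once a year for ten years, starting from the $1,000,000 base; the function name and that timing are my assumptions for illustration. Under them, the $285,000 figure corresponds to the gap in annual revenue in Year 10, while the shortfall summed across all ten years comes to roughly $1.4 million.

```python
from typing import List


def revenue_path(base: float, annual_increase: float, years: int) -> List[float]:
    """Annual revenue for years 1..years with a compounding price increase."""
    return [base * (1 + annual_increase) ** year for year in range(1, years + 1)]


BASE = 1_000_000  # institutional subscription revenue before any increase
YEARS = 10

planned = revenue_path(BASE, 0.05, YEARS)  # the increase the publisher wanted
actual = revenue_path(BASE, 0.03, YEARS)   # the increase it settled for

yearly_gap = [p - a for p, a in zip(planned, actual)]
year_10_gap = yearly_gap[-1]      # annual revenue gap in the final year
cumulative_gap = sum(yearly_gap)  # total shortfall across all ten years

print(f"Year 10 annual gap: ${year_10_gap:,.0f}")
print(f"Cumulative 10-year gap: ${cumulative_gap:,.0f}")
```

Changing the two rates or the number of years shows how quickly a small holdback widens over time.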
Cost to brand — Most publishers invest a fair amount in their digital presence because it’s important to their brand. Brands are the most valuable assets any organization can own, and brand equity is increasingly recognized as a main source of value, both current and future. As the head of Quaker Oats said in the early 20th century:
If this business were split up, I would give you the land and bricks and mortar, and I would take the brand, and I would fare better than you.
When users are diverted from an immersive, branded experience to a sub-branded PMC experience, the cost to the core brand can be significant. Yet, it’s hard to measure. Because journals have such unique brands and content, competition doesn’t provide much of a measure of brand value. So let’s assume that the brand is as valuable as one year of revenues, a fairly modest assumption. In the business we’re contemplating, we have about $2 million in revenues so far (of course, in reality, this business would be too small, and I haven’t counted all the possible revenue sources, so this is very clearly an underestimation). At a decrease of brand power of 14%, offset slightly by sub-branding on PMC, let’s say that nets a 10% deficit to the brand. This comes out to about $200,000 in lost brand equity each year.
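For anyone who wants to check the brand figure, here is a rough sketch of the inputs. The three itemized revenue lines add up to $1.6 million, which is rounded up to about $2 million in the text; the 10% net deficit and the one-year-of-revenue valuation are the assumptions stated above, and the variable names are illustrative only.

```python
# Hypothetical revenue lines used in the examples above.
revenue_lines = {
    "digital ads": 500_000,
    "upsells and cross-sells": 100_000,
    "institutional subscriptions": 1_000_000,
}

itemized_total = sum(revenue_lines.values())  # what the examples add up to

BRAND_VALUE = 2_000_000   # assumed: brand worth about one year of revenue, rounded
NET_BRAND_DEFICIT = 0.10  # assumed: 14% drop, partly offset by sub-branding on PMC

brand_loss = BRAND_VALUE * NET_BRAND_DEFICIT

print(f"Itemized revenue: ${itemized_total:,}")
print(f"Estimated brand equity lost per year: ${brand_loss:,.0f}")
```

Using the itemized $1.6 million instead of the rounded figure would put the estimate at $160,000, so the result is sensitive to how the brand is valued.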
Cost to editorial (audience defection) — For editors, it’s important to know who you’re trying to reach and how well they’re responding to your editorial decisions and features. With 14% of the audience accessing content on a separate site, the ability to know your audience, or to attract them with new features, or features not included on the PMC version, diminishes greatly. This can affect all sorts of editorial initiatives. Want to know if a new feature is working well? It’s hard to know if you get an 8% response rate to your survey vs. a 9.2% response rate; it’s hard to know if traffic is suppressed by 14%, so that 14% of your audience never sees it; and it’s hard to know if that 14% of your audience happens to be the most Web-savvy part, and the new feature was designed to appeal to them.
Cost to the parent organization’s membership efforts — Many journals are owned by not-for-profit entities, which offer their journals as member benefits. In many cases, journals are deemed to be the most valuable benefit of membership for an organization. When members find that the same content is available at no cost on PMC, there’s a clear risk for one of two things to occur — either they drop their membership, or member dues don’t increase as they would have otherwise.
Cost to product development — With the ability to test new ideas, survey users, and gather information on usage suppressed by 14%, product development takes a hit. The chance of missing the mark increases, the chance of receiving less bang for the buck goes up, and the likelihood of new products succeeding goes down. It’s as if a tariff of 14% is being placed on new business initiatives.
This exercise wasn’t designed to generate specific numbers you can bank on. It was designed to show how traffic competition can be deleterious for online publishers — which we all are — regardless of business model. Lost traffic is lost brand equity, lost eyeballs, lost commercial opportunity, lost usage, and lost engagement — all the factors that make online businesses successful are damaged by the competition PMC creates.
13 Thoughts on "What PubMed Central's Drag on Publisher Traffic Could Mean Financially"
No publisher has to put their articles in PMC, yet thousands do. Authors must deposit an accepted version of their manuscript a year after publication when the research received NIH funding, but that is the only requirement to put material in PMC.
Apparently the publishers who voluntarily put their articles in PMC must view the benefits and costs of doing so differently than you do.
This post seems a little strange in comparison to your other post today on all the benefits eLife gets from PMC, which are the same benefits every other journal in PMC gets from PMC.
Actually, the NIH Public Access Policy requires that the final published version of a manuscript be deposited.
Most journals do this on behalf of their authors — see the list of these few thousand here: http://publicaccess.nih.gov/submit_process_journals.htm
Here is more about the policy and how it was revised in 2008 to make this happen: http://www.arl.org/sparc/advocacy/nih/copyright.shtml
By passing this law, legislators made publishers compete with themselves, with PubMed Central the source of competition. On top of that, PMC has manipulated their search interface to make themselves more competitive. This has all resulted in a drag on publishers, something current legislators may want to consider as an unintended consequence of the original law. This post outlines how central the competitive threat is.
It could have been written more clearly, but I believe it is the accepted version prior to copy editing and typesetting. Note “upon acceptance for publication”.
“The NIH Public Access Policy ensures that the public has access to the published results of NIH funded research. It requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to the digital archive PubMed Central upon acceptance for publication. To help advance science and improve human health, the Policy requires that these papers are accessible to the public on PubMed Central no later than 12 months after publication.”
The requirement is on the author, not the journal. True, many journals do archive the manuscripts for the author and submit the published version, but that’s their choice. The point I was making is that journals choose to do this, and thousands choose to archive all their manuscripts, NIH funded and otherwise. If it costs them as much as you state, why are they doing any more than they are required to do, which is nothing?
Journals comply to whatever degree they wish in order to not alienate authors. That’s the source of coercion. Now that data are emerging about how competitive this policy may be for publishers (they were uncertain before), there may be some reconsideration around this. That’s the point of this post — to show how corrosive to an online publishing business a competitive source of content can be.
Publishers want to help authors. They think they are helping authors by depositing manuscripts for them. However, they now need to consider whether this assistance is akin to slitting their own throats.
David’s argument seems like a diversion. The fact that publishers provide this service does not mean that they are not harmed by the PMC system.
Google’s ever-changing algorithms would seem to play a part, too. At one early point in Google’s development, depth of content mattered. Now, it’s the number and “power” of links to your site that help determine where a publisher’s content shows up in the Google Rankings which in turn affects traffic and ad revenue.
Just as PubMed drives business to journal sites, PubMedCentral archives also drive traffic to journal sites. The most likely place to find an article cited is in the same journal in which it was published. Readers interested in a topic are also naturally going to seek out current issues of a journal based on relevant articles they found in the PMC archives. In point of fact, the biomedical publishing industry benefits enormously from free and widely used NCBI products like PubMed and PubMedCentral. It is basically free advertising.
Having run many online journals, I haven’t seen PMC contributing traffic in any meaningful way. And it’s not “free advertising” — Google search results are free advertising without any downside as far as presenting a competing store of content at no cost to the same audience. PMC is competitive. It does not drive traffic in any meaningful way. If you have evidence to the contrary, I’d be very interested to know of it. However, beyond that, you’re merely stating theoretical possibilities that I’ve never seen actually amount to anything in the real world.
Also, PubMed has introduced search designs that prefer the PMC version over the publisher’s version, driving traffic to PMC instead of to the publisher. This is competitive, beyond merely hosting free, duplicative versions of the same content.
This brings us back to the point of, “Why does PMC need to host a competing version of the content?” If the NIH rules were enforced at journal sites (which journals are willing and able to accommodate), then PubMed and MEDLINE could serve as the discovery tool, but send all (ALL) traffic to the journal sites, not dilute brands, not steal traffic, etc.
I take it the policy solution is that PMC should be basically a search engine for NIH funded results, preferably available at the publisher’s site, not a repository. Is this correct? If not then what is the policy solution?
In developing nations like India, where many researchers (and many editors too) are not aware of the differences between PubMed/MEDLINE and PMC, inclusion in PMC is an active marketing tool for publishers (“Indexed in PubMed Central” is the tag line). This is one effect of PMC too.
Elsevier sends articles with NIH funding to PMC on behalf of authors. However, this is the article as it was accepted by the society and before it is edited, typeset, and proofed by Elsevier. Authors who want the version on PMC to be updated with changes made after society acceptance will come up against resistance. They either have to pay a fee to Elsevier or, if the change is something that affects the scientific accuracy of the article, contact PMC and jump through some hoops.