Recently, traffic from a set of journals from the American Physiological Society (APS) was studied, and it was found that PubMed Central’s (PMC’s) version of APS content decreased HTML views at the journal sites by about 14%. Later, in writing a separate post about the potential costs the PMC traffic drag might be creating for publishers, I did a quick comparison with PLoS data, and found that PLoS journals overall lose 22% of their traffic to PMC.
The fact that PMC draws traffic away from PLoS was really puzzling to me — why would journals that are always free for readers to access give up nearly 1/4 of their traffic to another free resource?
The puzzle grew more complex after I spent an hour or two looking further into the PLoS data. It turns out that the 22% overall average is part of a fairly wide spread of traffic migrations, ranging from 13.5% to more than 30% among PLoS titles.
Traffic is important to PLoS, since they sell advertising against every title included in the dataset they publish. In 2010, they generated more than $280,000 in online advertising revenues, a figure that has increased in the two years since, judging from their current media kit’s claims. If nearly one-quarter of their traffic is being lost to PMC, the economic cost could be well into five figures, and may by now be approaching six figures.
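To put a rough number on that, here is a back-of-the-envelope sketch. It assumes two things the data don’t confirm: that ad revenue scales linearly with page views, and that the 22% figure is PMC’s share of combined PLoS-plus-PMC views. The function name and the model are mine, for illustration only, and the sketch pairs 2010 revenue with a traffic share measured over a longer period.

```python
def estimated_lost_revenue(ad_revenue: float, pmc_share: float) -> float:
    """Rough ad revenue forgone to PMC.

    ASSUMPTIONS (not from the PLoS data):
      - revenue is proportional to page views on the publisher site
      - pmc_share is PMC's fraction of combined (site + PMC) views
    If the site keeps (1 - pmc_share) of all views and earns ad_revenue,
    the views that went to PMC would have been worth
    ad_revenue * pmc_share / (1 - pmc_share).
    """
    if not 0 <= pmc_share < 1:
        raise ValueError("pmc_share must be in [0, 1)")
    return ad_revenue * pmc_share / (1 - pmc_share)


# Figures quoted in this post: 2010 ad revenue and the overall PMC share.
loss = estimated_lost_revenue(280_000, 0.22)
print(f"Estimated forgone ad revenue: ${loss:,.0f}")
```

Under those assumptions the estimate lands a little under $80,000, which is still a five-figure sum. A different reading of the 22% figure, or a non-linear relationship between traffic and ad sales, would move the number.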
Again, there was the basic question: Why would one free resource take traffic from another free resource? But now, there was more — Why would one resource take traffic differentially from a set of free resources? What economic theory, behavioral theory, or traffic theory supports that finding? What might influence the degree of attrition? What factors might contribute to a lower or higher rate? And why would there be any difference at all?
Here’s how the PLoS journals’ traffic losses break down, from launch through October 2012:
- PLoS Biology: 13.5% lost to PMC
- PLoS Computational Biology: 15.0% lost to PMC
- PLoS Medicine: 17.1% lost to PMC
- PLoS Genetics: 21.7% lost to PMC
- PLoS Pathogens: 28.4% lost to PMC
- PLoS Neglected Tropical Diseases: 28.6% lost to PMC
- PLoS ONE: 30.6% lost to PMC
Is there something in Google diverting people? I tested a few articles, and couldn’t find a discernible pattern. Are there big spikes in the data driving the average for each title? There are some spikes, but they aren’t enough to skew the overall effect, and why would one free resource be preferred in any case?
There was another interesting phenomenon in the data — there were occasionally articles for which PMC provided the bulk of the traffic. That is, more views (sometimes by a factor of 1.5-2.5) occurred on the PMC version of the article than on the PLoS version of the article.
I asked my Kitchen Cabinet (never before used that term here, feel like it’s overdue) what they thought, and there were multiple ideas and recommendations. But we’d really need time-series data and clickstream data to definitively answer the question. Also, because the data are just numbers and have no demographic dimensions, there’s no way to know if the users differ between the venues. Some speculation circulated around use of PubMed as the search engine of choice, which has been designed to point to the PMC version in the results list while suppressing the publisher’s version. Some speculation involved social media pointers. But there was one line of reasoning I found compelling — branding.
Looking down that list above, the stronger PLoS brands — more distinctive, more clearly matching a domain of knowledge, and, in the cases of Medicine and Biology, more entrenched — are less subject to traffic migrations than the weaker brands, like PLoS ONE, which is undifferentiated and therefore less entrenched.
Brands make promises, with relevance being a promise audiences look for. The promises of the stronger brands are clearer: focused editorial content for an addressable domain. Even brands like PLoS Pathogens or PLoS Neglected Tropical Diseases aren’t clear about exactly what they are. Basic science, clinical, or a mix of both? For researchers, infectious disease specialists, microbiologists, virologists, or others? Hence, they’re weaker brands with a weaker implicit promise of relevance.
In fact, looking over the list above, if I had to rank the brands by specificity and clarity, I’d put them in something resembling the same order.
Lacking enough brand punch to promise relevance, the next level that can work is the article level, where specific keywords can deliver on the promise of relevance. Hence, there is more article-level activity off-site for these weaker brands than for the stronger brands with the clearer promises.
Of course, if branding carried value, you’d expect price differentials around APCs. Here is the list of PLoS APC pricing:
- PLoS Biology — US$2900
- PLoS Medicine — US$2900
- PLoS Computational Biology — US$2250
- PLoS Genetics — US$2250
- PLoS Pathogens — US$2250
- PLoS Neglected Tropical Diseases — US$2250
- PLoS ONE — US$1350
There seems to be a decent inverse correlation between traffic leakage and APC pricing (the pricier titles tend to leak less), suggesting again that branding is playing a role in establishing value.
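For anyone who wants to check that impression, here is a minimal sketch that computes a Pearson correlation from the two lists in this post. The numbers are copied from above; the helper function and variable names are mine. With only seven titles and several tied prices, treat the result as a rough indicator rather than a statistical finding.

```python
from math import sqrt

# (percent of traffic lost to PMC, APC in US$), copied from the lists above.
journals = {
    "PLoS Biology": (13.5, 2900),
    "PLoS Computational Biology": (15.0, 2250),
    "PLoS Medicine": (17.1, 2900),
    "PLoS Genetics": (21.7, 2250),
    "PLoS Pathogens": (28.4, 2250),
    "PLoS Neglected Tropical Diseases": (28.6, 2250),
    "PLoS ONE": (30.6, 1350),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


leakage = [lost for lost, _ in journals.values()]
apc = [price for _, price in journals.values()]

r = pearson(leakage, apc)
print(f"Pearson r between PMC leakage and APC: {r:.2f}")
```

On these seven data points the coefficient comes out negative, at roughly -0.74: higher-priced titles lose a smaller share of traffic to PMC. That fits the branding argument, though it can’t establish cause on its own.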
In addition, based on the data above and its likely connection with branding, it seems PLoS might want to revise its pricing, as it is underpricing PLoS Computational Biology and overpricing PLoS Neglected Tropical Diseases: one has the ability to attract readers based on brand and content, while the other leaks readers to other venues because of its weaker brand.
And what about those outliers, those cases where PMC actually outperformed the PLoS sites for traffic? I think they’re the results of links from media or social media coverage, since the articles for which this occurred seemed to be on topics that would generate such linking, either by virtue of topic or headline. Once a link is made, it can get amplified through social media especially (retweeting, copy and paste), and if an initial link were made to a PMC version, it might persist throughout the Interwebz for a long time.
There is another aspect to these findings — namely, that the mere presence of PMC diverts traffic from publisher sites and harms the associated businesses even if those sites are free from the moment of publication. There is no clearer evidence I can think of indicating that PMC is both competitive and redundant.