
In 2004, a number of commercial publishers were brought before the UK House of Commons Science and Technology Committee to explain and defend their business models.

Crispin Davis, the CEO of Reed Elsevier (now Sir Crispin Davis, having joined the British knighthood along with Sir Mick Jagger and Sir Elton John), was asked very directly how his company could justify large and systematic increases in the subscription prices of its journals over time. It was an unambiguous question, to which he replied:

The biggest single factor is usage. That is what librarians look at more than anything else and it is what they determine whether they renew, do not renew and so on. We have usage going up by an average of 75 per cent each year. In other words, the cost per article download is coming down by around 70 per cent each year. That is fantastic value for money in terms of the institution.

Sir Crispin avoids the question, but what he is really saying is that the metric that matters is not cost but cost per download, and that given enough downloads, any subscription price (even Brain Research's) could be justified.

On the surface, a unit-based pricing model makes sense.  We purchase cheese by the pound (or kilogram), apples by the peck, wine by the bottle, and gold by the ounce.  But information behaves very differently from physical goods.  Producing journal articles incurs high fixed costs and very low marginal costs.  Sending another PDF over the Internet costs the publisher virtually nothing, which is why Elsevier wants to send you more.  Lots more.
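The arithmetic behind Sir Crispin's "value for money" claim can be sketched with a toy calculation. The subscription price and download counts below are hypothetical, not real Elsevier figures:

```python
# Toy arithmetic (all figures hypothetical): with a fixed subscription
# price and near-zero marginal cost per copy, cost-per-download can be
# driven arbitrarily low just by inflating the download count.

SUBSCRIPTION_PRICE = 20_000.0  # hypothetical annual subscription fee (USD)

def cost_per_download(downloads: int) -> float:
    """Unit cost as a library would compute it from a usage report."""
    return SUBSCRIPTION_PRICE / downloads

for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9,} downloads -> ${cost_per_download(n):,.4f} each")
```

The same subscription looks a thousand times cheaper per unit once bulk downloading inflates the denominator, even though the publisher has incurred essentially no additional cost.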

Elsevier recently unveiled a new feature on its ScienceDirect platform, the Document Download Manager, which allows a reader to download multiple articles simultaneously.  If you don't pre-select any articles, it will simply send you the first 20.

But why stop at 20?  As computer storage gets bigger and cheaper and bandwidth grows, there is no reason why they couldn't send you entire collections.  I can just imagine the press release: "Don't waste a second of your precious time waiting for a download!  We've just dumped our entire journal contents onto your machine!"  One could then sign up for an RSS-like feed that automatically updates one's computer with new issues as they become available.

Measuring and comparing the unit cost of a download becomes meaningless in an environment where bulk downloading is not only facilitated but actively encouraged. In addition, a publisher's interface can produce different usage patterns, making comparison of journal usage across publishers (the explicit goal of Project COUNTER) very difficult.

While there are many reasons to consider usage-based metrics, the development of a Usage Factor, a project undertaken by the United Kingdom Serials Group (UKSG), has unintended consequences.  By focusing on usage metrics, it reifies the article download.  Article downloads cease to be a measure of readership and become a goal in and of themselves, as publishers become fixated on maximizing the number of documents they send out into the ether.  Not only does this create a new kind of spam (a Tragedy of the Commons on the Internet), it obfuscates any meaning one might derive from usage reports, making it impossible to distinguish the intention behind a single human click from bulk machine downloading.

One could argue that what the UKSG is doing is no different from what the Institute for Scientific Information (ISI) did with the citation in creating the Impact Factor.  But there is a difference.  Citations are public, transparent, and can be validated.  If I suspect that a journal is artificially inflating its numbers, I can go back to the articles and start counting myself.

The UKSG is relying on the honesty of publishers to send usage reports that reflect true download counts.  While I don't question the honesty of most publishers, I do question some, and there is no way for a skeptic to validate the numbers.  Even if a publisher were willing to send raw transaction logs upon request, few institutions have the resources or ability to digest the data.  It is a system built on blind faith: "Trust me."

When rewards are high and risk is low, any opaque system is open to gaming and abuse.  Usage Factor will be no different.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist.


10 Thoughts on "The Article Download Game"

‘Citations are public, transparent, and can be validated.’

Not to mention that citations have context. I could download a paper that seems interesting (based on the abstract) but may turn out to be not very interesting after all. In that case, I can’t “take back the download count” to show that I didn’t have much use for it anyway.

Excellent point. One could say that the act of citation provides public acknowledgment of the contribution of another author. Downloading, however, is strictly a private process.

Well, to be very nitpicky, I’m not entirely correct. Citations actually lack context as well. For instance, certain fraudulent papers can be cited often in a “this paper is wrong and misled many scholars” kind of way. But in comparison to the citations that are meant to acknowledge the positive contribution of another author’s work, they are very rare.

Also, for citations of papers that have turned out to be wrong, the respective publishers/journals can (and, I think, usually do) have those "fixed" for their journal impact factors.

Downloads are, as you have said in this post, easy to abuse, unaccountable, and lacking any kind of context. What's more, if download counts become publicly available, as in download rankings, they will cause a snowball effect, drawing more eyes to a specific paper or journal and further inflating its download count. And considering how poorly download counts represent the quality of a paper or journal (i.e., they don't), using them is not remotely desirable, in my opinion.

Since we are weeks away from a national election in the United States, it is not difficult to think of counting article downloads like counting votes. One is looking to rank candidates based on popularity. Now imagine a system for counting article downloads where:

1) The machines are designed by a small group of powerful individuals with strong ties to industry and a stake in the outcome
2) Voters are allowed (even encouraged) to vote as many times as they like
3) The machines count slightly differently depending on the state they are in. Some will just count PDF, others HTML plus PDF, and others will consider a Reference view a vote as well.
4) There is no requirement to produce paper receipts, nor any form of validation at the time of enumeration (i.e. no transparency to the voting process)
5) There is no responsibility or accountability to the board governing the voting process

Would any rational person accept this as a valid voting procedure?

A side effect of this phenomenon is the emergence of new business models based heavily on usage statistics: the more we use a resource, the more we pay.

When I worked at the charity INASP, part of what we did was facilitate access to electronic content from both the North and the South for users in developing countries. Since this work was government funded, there was considerable emphasis on measuring impact and effectiveness. In an imperfect world, the best measure we had of the true impact of electronic resources was the increase in citations in articles produced in developing countries, not the number of downloads. In the developing-country context, counting downloads was certainly useful, but more as an indicator of infrastructure capacity (e.g., connectivity and bandwidth); citation was the quality metric, as it gives some indication of utility, not just availability.

I raised this issue at the recent ALPSP meeting with Richard Gedye, who was speaking on behalf of COUNTER: whilst having commonly derived and presented quantitative metrics is useful, an industry this sophisticated should be able to establish common qualitative usage metrics as well. He agreed.

Comments are closed.