In July, Project COUNTER released its report and statistical appendix on the feasibility of the Journal Usage Factor, a complement and challenge to the deeply established impact factor.
Like the impact factor, the Journal Usage Factor (JUF) is a simple calculation that divides the total number of article downloads by the number of articles published in a journal over a specified window of time. The simplicity, however, stops there.
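In symbols, the calculation described above would look something like this (the notation is mine, not the report’s; which downloads and which article types should count is precisely what the rest of the report wrestles with):

```latex
% A rough sketch of the JUF arithmetic described above.
% Notation is mine, not Project COUNTER's; P is the specified window of time.
\[
  \mathrm{JUF} \;=\;
  \frac{\text{total downloads, during } P\text{, of articles published in } P}
       {\text{number of articles published in } P}
\]
```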
Usage is a complex, multi-dimensional construct that makes the impact factor look like a simple grade school test. Usage varies by article type and version, by time, by access mode, and by publisher interface. Journal articles may be hosted on multiple platforms and may exist simultaneously in public repositories and personal web pages.
The report identifies many of these caveats and proposes potential solutions. I’m not going to pick apart the details or what I believe are weaknesses in the statistical analyses. Instead, I’ll focus on some theoretical issues that underpin the creation of a JUF and why I believe the JUF, while an interesting idea, will ultimately collapse under its own practical weight.
What do we mean by “usage?” — The term “Journal Usage Factor” is a misnomer. It should really be called the “Journal Download Factor,” as the word “usage” implies some utility at the receiving end. Coming to understand what utility a download brings is fundamentally problematic, for one cannot discern with certainty who downloaded an article and for what purpose. More importantly, downloads should never be confused with readership.
A download is a download is a download. It is a successful request for a file between two networked computers. Anything that goes beyond this simple definition is conjecture.
The COUNTER report addresses this issue in part from the perspective of protecting the JUF from abuse, proposing sophisticated algorithms to detect when the system is being manipulated by nefarious software agents (or human agents) attempting to game the numbers; however, I’m not talking about gaming.
A biostatistician who downloads an entire corpus of literature for analysis is not gaming the system, nor is a graduate student who uses a browser plug-in to prefetch articles in order to speed up browsing, nor a professor of a large undergraduate psychology class who directs hundreds of students to download an article the night before a prelim. An algorithm, judging entirely from the pattern of article downloads, may treat all three scenarios as gaming and discount them all.
Without knowing the intention behind a download, the best one can do is look for general patterns in a world that is punctuated, for most journals, by infrequent events. Deriving aggregate statistics from these sporadic events is not a problem in and of itself; it becomes a problem when those statistics are used to compare the value of one journal against another.
Why indicators require transparency and accountability — Many journal editors in the sciences are obsessed with their impact factor, and rightly so, for a journal’s impact factor conveys academic and financial rewards to its authors. Editors who do not agree with their journal’s impact factor can go into the Thomson Reuters system and count the citations themselves. If the editor uncovers errors, the Journal Citation Reports will issue a correction and update the system, a process that will take place for over 100 journals this week. Here you have both transparency and accountability.
In comparison, validating a journal’s usage factor is technically and practically impossible for a journal editor. The editor would have to request the original transaction log files from the publisher, extract the relevant data, apply COUNTER’s Code of Practice, and perform the appropriate calculations. If the journal is hosted on multiple platforms, those efforts are duplicated or triplicated. And because these log files are considered the property of the publisher, you can imagine how willing some publishers would be to hand usage logs to the editors of competing journals.
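To give a concrete sense of what such a validation would involve, here is a minimal sketch of the arithmetic an editor would have to reproduce, assuming a hypothetical per-transaction log in CSV form with doi and timestamp columns. Real publisher logs, and the filtering rules in COUNTER’s Code of Practice (double-click filtering, robot exclusion, and so on), are far more involved than this.

```python
# Hypothetical sketch only: re-deriving a usage factor from a raw transaction log.
# Assumes a CSV log with "doi" and "timestamp" (ISO 8601) columns; real publisher
# logs and COUNTER's Code of Practice (double-click filtering, robot exclusion,
# article versions, multiple platforms) are considerably more involved.
import csv
from datetime import datetime

def usage_factor(log_path, published_dois, start, end):
    """Count downloads of the journal's articles within the window and divide
    by the number of articles published in that window: the basic JUF arithmetic."""
    downloads = 0
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            requested = datetime.fromisoformat(row["timestamp"])
            if row["doi"] in published_dois and start <= requested <= end:
                downloads += 1  # no de-duplication or robot filtering applied here
    return downloads / len(published_dois) if published_dois else 0.0

# Example with made-up DOIs for articles published during the window.
published = {"10.1000/xyz.001", "10.1000/xyz.002", "10.1000/xyz.003"}
juf = usage_factor("transactions.csv", published,
                   datetime(2010, 1, 1), datetime(2011, 12, 31))
print(f"Usage factor: {juf:.2f}")
```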
If the JUF were run like an election, it would be a system where each party runs its own polls, hoards its own votes, provides no paper trail, and has the power to ignore any appeal.
What do downloads measure? — Calling aggregate downloads “usage” implies that journals provide some level of utility and that this utility can be normalized and compared across journals. Usage also implies popularity: the more downloads a journal receives, the greater its popularity.
Oddly, the statistical analysis reveals that the Journal Usage Factor has absolutely no relationship with the Journal Impact Factor (see p. 35 of the CIBER report). If we believe that science is an intellectual endeavor that values consensus and builds upon prior work, we would postulate a priori that these two variables would be related in some way. In science, popularity and prestige are tightly linked — not always, but most of the time. A complete lack of connection between the two should, in CIBER’s case, have raised validity concerns. Instead, the authors look for exceptions to help validate general findings:
This report finds no evidence that usage and citation impact metrics are statistically associated. This is hardly surprising since author and reader populations are not necessarily co-extensive. Indeed in the case of practitioner-facing journals, the overlap will be minimal
The statement is true, but for the bulk of research journals, readers and authors are drawn from the same population, and we should expect a strong correlation: if not at the low end of the journal scale, then certainly at the high end. Top-tier journals are both highly read and highly cited. A complete lack of relationship between readership and citations could imply a major problem in their analysis, or it could reveal that download data is just pure noise. Either conclusion is a big problem for the validity of the JUF.
The curious case of scientific indicators — Last, I wish to deal with the issue of indicators in science, for there is always a tendency for an indicator, once it is broadly accepted, to cease serving as a proxy for some external goal and to become that goal itself.
The impact factor is no exception, as many scientists believe the extensive use of citation metrics in promotion, grants, and awards has transubstantiated the impact factor from an indicator of quality into quality itself.
The US News and World Report College Rankings have had a similar effect on administrators in higher education, in spite of the fact that the variables that go into the rankings bear little relationship to the goals of education.
If download statistics are a valid indicator of readership now, they will not remain so once the JUF is widely implemented. In a system where transparency and accountability are hidden behind layers of technical and political barriers, there is little to keep the Journal Usage Factor from being grossly manipulated to serve the purposes of its various constituents.
When editors and authors change their online behavior in order to raise their usage scores, a download ceases to be an indicator of readership and becomes something to maximize for its own sake. Articles are downloaded not to be read, but solely to generate a statistic, and publishers will simply provide the tools to make this happen.
The result of this collective behavior is a clogging of collective Internet bandwidth and a worsening of the service for those who do wish to read. It’s a Tragedy of the Commons that benefits no one but those responsible for generating the rankings.
Where usage statistics are useful — Usage statistics have been immensely useful at the local level for allowing librarians to calculate their return on investment for purchasing journals and books. For this, Project COUNTER has done exceedingly well.
To me, it makes little sense for a librarian to care about how a journal is used collectively by a billion users in China when all that really matters is whether the journal is used locally. Focusing on a global download metric therefore repeats the folly of focusing on a global citation metric and, in the process, opens up the possibility of an even more distorted measure of the value of scholarly publishing. Some things simply operate better at a local level.
When it comes to usage statistics, we should think and act locally.