Image: “Think globally, act locally” (by Vlastula via Flickr)

In July, Project COUNTER released its report and statistical appendix on the feasibility of the Journal Usage Factor, a complement and challenge to the deeply established Journal Impact Factor.

Like the impact factor, the Journal Usage Factor (JUF) is a simple calculation that divides the total number of article downloads by the number of articles published in a journal over a specified window of time. The simplicity, however, stops there.
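As a minimal sketch of that arithmetic (the figures and function name below are mine, purely illustrative; COUNTER’s actual specification involves further definitional choices):

```python
# Illustrative sketch only: the Journal Usage Factor as described above,
# i.e. total downloads over a window divided by articles published in that window.
# The numbers are invented, not taken from the COUNTER report.

def journal_usage_factor(total_downloads: int, articles_published: int) -> float:
    """Total article downloads divided by the number of articles published."""
    return total_downloads / articles_published

# A hypothetical journal: 120,000 downloads of 400 articles over the window
print(journal_usage_factor(120_000, 400))  # 300.0
```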

Usage is a complex, multi-dimensional construct that makes the impact factor look like a simple grade school test. Usage varies by article type and version, by time, by access mode, and by publisher interface. Journal articles may be hosted on multiple platforms and may exist simultaneously in public repositories and personal web pages.

The report identifies many of these caveats and proposes potential solutions. I’m not going to pick apart the details or what I believe are weaknesses in the statistical analyses. Instead, I’ll focus on some theoretical issues that underpin the creation of a JUF and why I believe the JUF, while an interesting idea, will ultimately collapse under its own practical weight.

What do we mean by “usage?” — The term “Journal Usage Factor” is a misnomer. It should really be called the “Journal Download Factor,” as the word “usage” implies some utility at the receiving end. Coming to understand what utility a download brings is fundamentally problematic, for one cannot discern with certainty who downloaded an article and for what purpose. More importantly, downloads should never be confused with readership.

A download is a download is a download. It is a successful request for a file between two networked computers. Anything that goes beyond this simple definition is conjecture.

The COUNTER report addresses this issue, in part, from the perspective of protecting the JUF from abuse, proposing sophisticated algorithms to detect when the system is manipulated by nefarious software agents (or human agents) attempting to game the numbers. I’m not talking about gaming, however.

A biostatistician who downloads an entire corpus of literature for analysis is not gaming the system, nor is a graduate student who uses a browser plug-in to prefetch articles in order to speed up browsing, or a professor of a large undergraduate psychology class who directs hundreds of students to download an article the night before a prelim. An algorithm may treat all three scenarios as gaming, based entirely on the pattern of article downloads, and discount them all.

Without knowing the intention behind a download, the best one can do is look for general patterns in a world that is punctuated, for most journals, by infrequent events. Deriving aggregate statistics from these sporadic events is not a problem in and of itself; it becomes a problem when these statistics are used to compare the value of one journal against another.

Why indicators require transparency and accountability — Many journal editors in the sciences are obsessed with their impact factor, and rightly so, for the impact factor conveys academic and financial rewards to its authors. Editors who do not agree with their journal’s impact factor can go into the Thomson Reuters system and count the citations themselves. If the editor uncovers errors, the Journal Citation Reports will issue a correction and update the system, a process that will take place for over 100 journals this week. Here you have both transparency and accountability.

In comparison, validating a journal’s usage factor is, for the editor, both technically and practically infeasible. The editor would have to request the original transaction log files from the publisher, extract the relevant data, apply COUNTER’s Code of Practice, and perform the appropriate calculations. If the journal is hosted on multiple platforms, these efforts are duplicated or triplicated. And because these log files are considered the property of the publisher, you can imagine how willing some publishers would be to provide usage logs to editors of competing journals.

If the JUF were run like an election, it would be a system where each party runs its own polls, hoards its own votes, provides no paper trail, and has the power to ignore any appeal.

What do downloads measure? — By calling aggregate downloads “usage,” the language implies that journals provide some level of utility and that this utility can be normalized and compared with other journals. Usage also implies popularity, as the more downloads a journal receives, the greater its popularity.

Oddly, the statistical analysis reveals that the Journal Usage Factor has absolutely no relationship with the Journal Impact Factor (see p. 35 of the CIBER report). If we believe that science is an intellectual endeavor that values consensus and builds upon prior work, we would postulate a priori that these two variables would be related in some way. In science, popularity and prestige are tightly linked — not always, but most of the time. A complete lack of connection between these two, in CIBER’s case, should have raised validity concerns. Instead, the authors look for exceptions to help validate general findings:

This report finds no evidence that usage and citation impact metrics are statistically associated. This is hardly surprising since author and reader populations are not necessarily co-extensive. Indeed in the case of practitioner-facing journals, the overlap will be minimal

This statement is true, but for the bulk of research journals, readers and authors are drawn from the same population, and we should expect a strong correlation — if not at the low end of the journal scale, certainly at the high end. Top-tier journals are both highly read and highly cited. A complete lack of relationship between readership and citations could imply a major problem in their analysis, or it could reveal that download data is just pure noise. Either conclusion is a big problem for the validity of the JUF.

The curious case of scientific indicators — Last, I wish to deal with the issue of indicators in science, for there is always a tendency for an indicator, once broadly accepted, to cease serving as a proxy for some external goal and to become that goal itself.

The impact factor is no exception, as many scientists believe the extensive use of citation metrics in promotion, grants, and awards has transubstantiated the impact factor from an indicator of quality into quality itself.

The US News and World Report College Rankings have had a similar effect on administrators in higher education, in spite of the fact that the variables that go into the rankings bear little relationship to the goals of education.

If download statistics are a valid indicator of readership now, they will not remain so if the JUF is widely implemented. In a system where transparency and accountability are shrouded behind layers of technical and political barriers, there is little to keep the Journal Usage Factor from being grossly manipulated for the purposes of its various constituents.

When editors and authors change their online behavior in order to raise their usage scores, a download ceases to be an indicator of readership and becomes something to maximize for its own sake. Articles are downloaded not to be read but solely to generate a statistic, and publishers will simply provide the tools to make this happen.

The result of this collective behavior is a clogging of collective Internet bandwidth and a worsening of the service for those who do wish to read. It’s a Tragedy of the Commons that benefits no one but those responsible for generating the rankings.

Where usage statistics are useful — Usage statistics have been immensely useful at the local level for allowing librarians to calculate their return on investment for purchasing journals and books. For this, Project COUNTER has done exceedingly well.

To me, it makes little sense for a librarian to care how a journal is collectively used by a billion users in China when all that really matters is whether the journal is used locally. Focusing on a global download metric therefore repeats the folly of focusing on a global citation metric, and in the process opens up the possibility of an even more distorted measure of the value of scholarly publishing. Some things simply operate better at a local level.

When it comes to usage statistics, we should think and act locally.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/

Discussion

45 Thoughts on "The Journal Usage Factor — Think Locally, Act Locally"

Metrics that measure popularity are a poor substitute for those that measure quality. Science is a meritocracy, not a democracy. We want to reward research that is valuable, not research that is popular (though the two are not necessarily contradictory).

A quick glance at the “most viewed” videos at any given moment on YouTube gives a picture of the future of research if popularity metrics take hold. The tyranny of the masses leads to mediocrity, or as Slashdot’s Rob Malda put it, “you get ‘Man Gets Hit in Crotch With Football’ and Everybody Loves Raymond, where it’s just good enough to not suck.”

Articles that see high numbers of downloads often do so for reasons of infamy (e.g., the Wakefield paper on autism and vaccines) or sensationalism (e.g., all the press hoopla around the arsenic life paper). PLoS’ article-level metrics provide an excellent set of data for looking at these sorts of questions. It should be noted that PLoS’ number 3 all-time most downloaded article is “Fellatio by Fruit Bats Prolongs Copulation Time”. This article has been downloaded over 200,000 times, yet cited only twice. I’m not expecting a Nobel Prize for pioneers in the field of bat reproduction any time soon.

A system that rewards these things will lead to a literature that is driven by sensationalism and press conferences. Scientists will hire public relations firms to hype their latest release, journal editors will look for the flashy and lurid rather than the meaningful. The techniques of online spammers and scammers will come into play (“click here for a free iPad” with a link that actually leads to a paper download). Networks will be set up offering “I’ll download your paper if you download mine.”

I don’t think these are the sorts of things we want driving research.

Unfortunately, science is already influenced by sensationalism – this is particularly acute in my field of paleontology (Darwinius is a fantastic example, but the arsenic life case and its response is an example in another field).

The bottom line is that anything that sticks a number to a fuzzy entity (impact factor, download counts, individual article citations, citations of an individual author’s work, number of downloads by an individual user or library) has been gamed, can be gamed, and will be gamed. It’s up to the users of these metrics to understand what the metrics can and can’t say.

The question, though, is whether that sensationalism translated into career advancement and funding, rather than just getting a lot of notice (and, after the fact, a lot of criticism). Everyone does want their work to be noticed, but sensationalism is still the exception rather than the rule. In a world where a usage factor becomes the standard for determining funding, tenure, and hiring, sensationalism becomes the norm. I don’t think this is a good thing.

I do agree that the current metric of choice, the impact factor, has some problematic flaws and is terribly misused. But it’s nowhere near as gameable as something like downloads. To game downloads, all I have to do is click on a link. To game the impact factor, I have to publish a paper that cites another paper. That requires either doing a new and unique piece of research, or at the very least writing a review article good enough to get published. Gameable to some extent, but a much higher hurdle to clear than clicking on a link.

I think sensationalism has a mixed effect on career advancement and funding – it might work once or twice, but after a while I think people (colleagues, funders) get a little tired of it. Then again, I can think of at least a few examples where borderline, sensational science frequently gets a high profile and high funding. This may be a problem more restricted to dinosaurs and less a concern for workers on the genetics of algae.

The extent to which journals game the IF is really not a secret – if one wants to bring up the IF a bit, just invite a review article that happens to cite lots of the journal’s articles, or write a preface to a special issue that cites each of the articles in the issue (I’ve seen the latter practice especially frequently). Then again, the problem is less with the IF and more with the way it is used (and hence the incentive to increase it).

The JCR monitors self-citation (e.g., a review article in the journal that cites other articles in the journal). When a practice of journal self-citation results in a significant distortion of the Journal Impact Factor and of the rank in category, the journal receives no listing in the JCR for at least 2 years. During those two years of “suppression,” the journal is listed in the JCR Notices file explaining why it was suppressed. The prior year’s distorted metrics – with self-citations displayed – remain in the product. The reason for the subsequent years’ suppression is completely transparent. You get that one high Impact Factor – but it is made very clear in the interface just how that value was “created.” 33 journals are listed in the Notices file this year.

Citations in an issue to items in that same issue have no effect on Journal Impact Factor. JIF only considers citations to the prior two years.
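To illustrate that two-year window with hypothetical numbers (invented for illustration, not JCR data):

```python
# Hypothetical illustration of the standard two-year Impact Factor window
# for a 2010 JIF. Citations made in 2010 to items also published in 2010
# (e.g., same-issue citations) fall outside the window and do not count.

citations_2010_to_2009_items = 250   # invented counts
citations_2010_to_2008_items = 150
citable_items_2009 = 120
citable_items_2008 = 100

jif_2010 = (citations_2010_to_2009_items + citations_2010_to_2008_items) / (
    citable_items_2009 + citable_items_2008
)
print(round(jif_2010, 2))  # 1.82
```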

Is JCR perfect? No.
Is it careful? I’d like to say Yes.

Marie,
I was not aware of the practice of putting journals in “time-out” for poor citation behavior. When did the JCR begin this practice and has it reduced the kind of explicit gaming that you describe?

Phil –
Some aspects of journal suppression have been in place since the 2003 JCR data were published in 2004. We refined the process somewhat in the following few years, by listing the journals in the Notices file, by formalizing the 2 year interval before re-evaluation, by including a consideration of the effect of journal self-citation on the rank of the journal in category, and by increasing the visibility of journal self-citations in the JCR journal record.

In the 2003 JCR, I found that 80% of journals in the Science JCR showed self-citation rates under 20%. In the 2010 JCR, fewer than 15% of journals had more than 20% self-citations. Most often, a high journal self-citation rate is associated with a low overall citation count – and we do adjust for that, otherwise we would unfairly concentrate on journals with low citation counts and low impact. Only journals with a serious distortion of metrics and ranking are suppressed from coverage. Individual cases of journal self-citation are serious, but, overall, we believe most publishers earn their citations by focusing on content.

From the perspective of the JCR, we act in order to maintain the accuracy of the Journal Impact Factor – not to stand as the referee of publishing practices. The goal is to report the source of citations and reveal the content of the metrics; we let the users really see where the numbers come from so that they can make the most rational use of the data.

Marie, thanks for this further information. I often see people claiming that the Impact Factor and citation as a metric are easily gamed, yet I’ve yet to see a particularly compelling method for doing so. If you’re publishing commentaries that cite your own journal’s articles, then you must write a commentary this year that cites articles from last year or two years ago, and that’s not really a common practice. Even so, the JCR as you note has safeguards against such things and any boost will be limited at best. Authors do preferentially cite their own papers, though I don’t really see that as “gaming” as much as self-promotion.

For those claiming that the IF and citation are easily gamed, can you give us some insight into how that would work?

One other question here Phil, how does the JUF define a “usage”? Is it full-text html, pdf download or a combination of these? If I find an article of interest, scan the full-text html online, decide I want to read it in detail later and download the pdf, and then a few weeks later return to the html version to view the movie figures that accompany it, does that count as one “usage” or three?

Unfortunately, the report, “The Journal Usage Factor project” (PDF) by Peter Shepherd does not define the term “usage.”

You do bring up a practical question about how the JUF would reconcile publishers who provide multiple versions of the same article (HTML, PDF, XML, etc.) with those who provide just the PDF. Early research that Jason Price and I did on usage patterns revealed that each publisher interface exerts a unique effect on the pattern of article downloads (see: http://dx.doi.org/10.1002/asi.20405). As a result of this study, OUP changed its HighWire interface so that readers were not required to view the HTML before downloading the PDF.

If the Journal Usage Factor becomes reality, there would be incentives for publishers to change their interface to increase their counts. One could imagine a New York Times display, where an article is broken down into multiple pages, each page requiring a separate fulltext download (for example). For academic papers, this would mean chopping the fulltext of a paper down into smaller chunks for display (Intro, Methods, Results, Discussion, References, Acknowledgements, etc.) rather than providing contiguous text. Higher counts but more work for the reader.

I am fascinated by the lack of correlation between downloads and citations. Technically the downloads are the specific product, not the article viewed as an abstract entity. This disconnect suggests that we may not know how our product is actually being used, which limits our ability to improve that experience.

I would think it would be possible to game the system by writing overly broad abstracts and titles utilizing common keywords, whether appropriate or not. If the goal is simply to produce a download, these tactics would likely get plenty of clickthroughs from undergrads conducting quick and casual searches.

Also, there are still those publishers and platforms that do not adhere to the COUNTER code of practice. It seems they would be left out. Smaller publishers, or open access publishers (who have no subscribers requesting COUNTER stats) might be at a disadvantage.

I’m surprised that there is not more discussion of what the term “usage” means. I agree that a download is not usage. If that were the case, we would say that the number of hits on a web page is an accurate measure of usage, even though there may be little correlation between the actual number of hits on that web page and the use of the information communicated by that web page.

With journal articles it’s somewhat similar. Yes an article download can take place, but why was the article downloaded in the first place, and how was the information communicated by that article used? In fact, how did the downloader find out about the article in the first place, and did the downloader know the author or have prior knowledge of the research being reported by the article?

Usage of an article’s information may be different if the downloader already knows about the research being reported by the article. This is where the social and professional networks and communities surrounding the downloader come into play. Tightly knit and formally structured traditional scientific fields may have quite different social and community structures from loose, fringe, or nascent fields. Usage may also differ according to where in the downloader’s task the download takes place, e.g., at the beginning of a project or at the end of a project (after all the work is done).

This reminds me of the state of social media metrics where “conversations” that take place in various places on the web are tracked to measure whether or not, for example, certain brands or products (or organizations, or candidates) are being discussed in a positive or negative light. Such tracking of conversations is possible, but what does it tell us about what happens as a result of those conversations? Unless we can connect the conversations with some type of before or after activities, it’s almost like we’re measuring something because we can measure it, not because we know what it means.

The same is true of downloads. There will be those who simply say, “more is better” and emphasize volume. Whether or not the numbers can be gamed is an issue, clearly, but if we don’t know exactly what the numbers mean, what’s the point?

Dennis D. McDonald, Ph.D.
Alexandria, Virginia
http://www.ddmcd.com

Where are these downloads happening? Because technology is presenting opportunities for publishers to license their content for access in a variety of places, not just their own websites. I think Phil is right on.

I am puzzled by the hostility toward counting downloads shown on this thread. Everyone is pointing to obvious problems, but no one is suggesting solutions. Yet downloads should rank up there with sales and citations as measures of success. It is the Internet measure, part of the revolution, warts and all. Kiss the frog.

David, I don’t interpret the discussion in the same way. In spite of their warts, I’ve pointed out that downloads are an excellent indicator of Return on Investment for a librarian whose job it is to allocate collections funds appropriately based on external indicators of utility. As a measure to evaluate the merit of scientific work, I am much more hesitant.

Phil, I don’t see a single comment above that refers to library ROI. Everyone is dumping on downloads as measures of scientific merit, and they are wrong. Perhaps the difference is that I have studied downloads. One of my principal clients is OSTI.gov, which is the leading publisher of US federal research reports. As an OA publisher, OSTI has no subscribers to measure. And since it is conventional not to cite research reports, OSTI has few citations to work with. What they do have are downloads, and I can say that these downloads contain a wealth of information about what the scientific community is thinking about. This is the goal, isn’t it: to know where science is, and where it is going?

But note that I am indifferent to the whole notion of using simple statistics as promotional criteria. If people want to be stupid that is a different issue. (I also do not favor banning the car just to reduce traffic deaths.) The people are the problem, not the statistics. The statistics are very useful.

David, you are correct that the comments have not focused on the merits of usage statistics for library ROI; it is my hope that a discussion thread will start.

Fundamentally, there may be two different arguments taking place in the comments. David Crotty is discussing how we give credit and reward for the contribution of scientific authors. You are discussing how downloads can tell us something about the nature of attention and interest in science. These are separate but not mutually exclusive ideas and both are correct.

Essentially, I see scientific publication as the exchange of scientific ideas for peer recognition, with peer being an important qualifier in this statement.

I’m with Phil on this. I don’t see it as hostility as much as it is recognition of what downloads measure, and trying to place it into context as far as what one can learn from such a measurement. As an indicator of interest, sure, it can tell you when a paper draws a lot of attention. As an indicator of quality, however, it’s fairly useless.

I disagree strongly. First of all, as I have said repeatedly (apparently to myself), I think this concept of “quality” that gets used here is bogus to begin with. What counts is importance, and that is very close to popularity. Importance is measured by other people reading and using your stuff. The fact that there are several kinds of importance (including sexual) does not make this any less so. Other things being equal, if 100,000 subscribers download my paper, it is important. It is worth noting that citations have similar problems, albeit not the sexual one, or not usually. People cite papers for many reasons.

We are not talking about hyperbolic “bat sex” examples, we are talking about statistics on all the published papers and reports. Downloads are immediate and important, by far the best short term measure we have of importance.

What do you mean by “importance”? If an article is flagrantly fraudulent, and a lot of gawkers download it to check out the audacity, is it “important”? Is that something we want to reward?

A download does not mean an article is read (every academic I know has a stack of unread PDFs sitting on their hard drive), nor does it mean that, when read, it is of interest or use to the reader. If 100,000 subscribers download your paper, maybe you’ve done a spectacular marketing campaign for that article, complete with a press conference, ads, and incentives offered for downloads. How can I tell the difference between that and a paradigm-shifting paper that will revolutionize a field merely from looking at download statistics?

Downloads do not measure “importance”. They measure “attention”. One can make a case for the value in a work that draws attention, but that says nothing of the quality or “importance” of the work. By just looking at downloads, you can’t separate out bat sex from the theory of relativity.

First of all, I am not rewarding anything. You folks are objecting to a statistical analysis on the grounds that it may become a reward function. That is not my concern, nor my argument. Let’s look at the statistic. You say, correctly, that a download is not necessarily read. I agree, but most citations are added after the research is done. They have no impact whatsoever on the work. These are both crude measures.

I am not claiming that the simple statistics have all the answers, far from it. They are just simple statistics. What I am claiming is that they are important, as far as they go. Big discoveries usually get big attention. The community is not stupid, so there is a real connection between attention and importance.

Well, the subject of this blog entry is an article that looks at the feasibility of a usage measure as, “a complement and challenge to the deeply-established Journal Impact Factor.” Given that the IF is primarily used as a means of measuring performance (rightly or wrongly), it’s not surprising that the discussion has gone the way it has. Usage has value, as noted particularly for librarians and for yourself, and I also found it tremendously informative when I was a journal editor. It does give an immediate measure, rather than the slow pace of citation (I’m told by history journal editors that in their field the peak of citation for an article is usually around 5 years after it has been published) and it does give a sense of what interests the community. For those sorts of things, usage is a valuable metric.

But as the initial question was whether it could replace (or supplement) a metric meant to measure impact, I think the usefulness of the metric is lower for this purpose. Big discoveries do get high levels of attention, but many attention-grabbers fizzle out over time, and many studies grab attention for all the wrong reasons. Mere usage numbers alone can’t differentiate between these, making it difficult to directly tie attention to impact.

And I’m not sure what you mean by “citations are added after the research is done.” Without those citations, without the knowledge learned from those papers, the work could not be done in the first place. Citations are tacked on in the writing process but these are in acknowledgement of the role the previous work played during the experimental process.

Good point, David C. I agree entirely that download statistics should not and cannot compete with citation statistics. Both are valuable, but very different, like weather and climate. Both are also weak as evaluative measures, but both are valuable as analytical tools.

My point about citations being after the fact might make an interesting discussion in a different post. I have done a lot of research on the logic of citations. Very few of them actually refer to recent work that the citing work builds upon, perhaps 20% is my guess. My conjecture is that most of them are collected after the work is done, in conjunction with the editorial requirement to explain the problem being worked on. Some are even to alternative approaches.

I suspect that many of the cited articles were never even read by the authors prior to being found for purposes of citation. Of course citation theory assumes the opposite, namely that citations are growth vectors. I think this is a fundamental flaw in citation theory.

I think most useful metrics are flawed in their own way. I tend to favor the idea of a panel of metrics that balance one another as a way of judging things like “impact”. I’m not sure though that I’d include popularity measures in that panel, or if I did, I’d probably weight them fairly low in comparison with things like citation.

I agree somewhat on your take on citations, but I think there’s more nuance to it than that. I don’t think one lines up all the relevant papers and makes a list before doing an experiment. But one has likely read the papers, and they’ve influenced what you plan to do and how you plan to do it. There may not be a conscious reckoning of from where things came until one sits down to write the paper. But that doesn’t mean they played no role or are just being tacked on after the fact.

Though that may be true for some part of citations. I’m thinking of things in an introduction, where one probably looks for the latest review article to use as one citation for a lifetime’s worth of reading about a subject.

David C: In its way this statement of yours is an elegant summary of my research: “…one probably looks for the latest review article to use as one citation for a lifetime’s worth of reading about a subject.”

Scientific communication is a diffusion process. The pathways are not traceable, so citation networks are crude approximations at best, not actual maps of the flow of ideas, which they are often taken to be. The impact factor is correspondingly crude.

I gave a talk on “The Use and Abuse of Usage Statistics” at the UKSG meeting in 2008, subsequently written up in Serials. Comparing downloads to citations, I wrote “The act of downloading is often meaningless, done by mistake, done by a robot, done because the interface encouraged you to do something that you might not have intended. Downloading requires little investment and is practically anonymous; citation is usually meaningful and requires significant investment of time, effort and reputation.”

The full article is available here http://dx.doi.org/10.1629/2193 but please don’t distort the usage stats by downloading it unless you really, really want to read it. I’ve not been aware of many critical voices raised since then, so was heartened to see this piece, thanks Phil.

Ian, I agree that downloads are not the same as citations as far as effort is concerned. But the time frames are fundamentally different, so it is like saying that weather (downloads) is not important because it is not climate (citations).

Moreover, I seriously doubt that most downloads are “done by mistake, done by a robot, done because the interface encouraged you to do something that you might not have intended.” Do you have any data to back up this strong claim? I do lots of downloads and I have yet to do one by mistake or unintentionally.

The issue is not comparing citations to downloads, it is the analytical value of downloads, which I consider to be great.

David, I wouldn’t say that most downloads are done by mistake (and I was careful not to) but when aggregated at journal level, anomalies occur often enough to suggest that the data should be interpreted with caution.

I agree that downloads provide powerful analytical data; we use them extensively and have gained real insight as a result. It is the misinterpretation of usage data that gives cause for concern and if the Journal Usage Factor is to become a proxy for quality or value then we need to consider the unintended consequences.

I agree with David here, and I think both citations and downloads have a valuable part to play in assessing the quality of a publication (be it article, journal or book). It’s especially true in the case of practitioner- or student-facing journals, the readers of which don’t publish themselves and therefore don’t cite. Does that make the content any less meaningful, because it’s not being cited by academics? Don’t we care about papers that have a wider impact than this – through policy, teaching, or practice?! How else do you suggest we measure this impact other than by downloads?

It’s true an article might be downloaded by people laughing at it, but can’t the same be said for citations? When you criticize a theory or finding from someone else’s study, you’re still citing it, even though you’re saying it’s bad. Citations are also just as open to manipulation – think about citation clubs and editors that enforce a number of citations before accepting an article. It’s horribly unethical, but it happens.

As for people downloading articles by accident – I too have never done this. I read an abstract carefully before I download a paper – especially if it’s a paper I have to pay for!
I firmly believe download statistics can and should complement citations – they are a measure of impact outside of academia as well as inside.

I do think “attention” is worth tracking and provides valuable information. I’m still not sold on how effectively it indicates “impact”. Perhaps it’s worth including in a panel of measurements meant to determine impact, but I’d likely weight it lower than other factors, due to the uncertainty of what one is really measuring.

In particular, I worry about gaming, beyond the sorts of manipulation you mention above. I’m not convinced that there’s a huge level of citation manipulation going on, and even so, the effects are fairly limited (see the comments from Marie above on how self-citing journals are limited in how much that affects the Impact Factor). Gaming citation has a high barrier to entry–the requirement of publishing a new paper. But with attention, if measured by downloads, there are near-infinite ways to massively inflate the statistic, not all of them unethical (think of the advantage to researchers at an institute that could afford to hire a top advertising firm to promote their papers).

Most, if not all metrics are flawed. Some are more flawed than others. That does not mean they are valueless, just that the flaws have to be incorporated into an understanding of what they really tell us.

Do also bear in mind that for citations, we’re generally talking about very small figures – 1 or 2 extra cites can dramatically change an impact factor. For downloads we’re talking about hundreds of thousands – the odd extra click here and there won’t have anything like as big an impact. Does that mean any manipulation (intentional or not) isn’t as big a concern, because it would have to be huge to make a difference? Isn’t this a barrier to entry for gaming downloads, then, since to make a difference to download figures you’d have to make an extremely concerted effort?

Despite the difference in scale, the ease of adding a download makes it much easier to manipulate in large numbers than a small number of citations. When I was a journal editor, I would pick two featured articles in the journal each month, make them freely available, put out a press release about them and write a blog entry about each one. These featured articles would have much higher usage levels than non-featured articles (even freely accessible ones), generally at least in the range of hundreds more downloads if not thousands.

And that was a fairly minimal effort. Imagine what you could do with press conferences, buying advertisements, providing incentives, etc. In order to add one citation though, you have to write another paper and go through the review and publication process. That’s still a much bigger hurdle for one citation than the fairly trivial steps above which led to 1,000 plus additional downloads.

Phil and other commentators make some interesting (but sometimes inaccurate) points about the Usage Factor (UF). As co-chairs of the project, we welcome this discussion, which is very timely as we strive to develop and refine the measure. Two important points to note are, firstly, that the UF is very much a work in progress (we are just entering Phase 3 of the project), and secondly, that we did not create it without detailed analysis from an independent scholarly body. During the first two phases we interviewed 29 key authors/editors, librarians and publishers, and we undertook two web-based surveys of 155 librarians and 1,4000 authors. The majority of the respondents welcomed a new measure. So at this point in the project we are still assessing the feasibility of the UF as a metric. All we have claimed in our recent report is that the results so far look very promising and that the UF is worth developing further (www.projectcounter.org/usage_factor.html). We would encourage interested people to read the report in detail and feed back comments to us. Our objective is to have a transparent measure, which is why an independent audit of publisher-calculated UFs will be a sine qua non.

The question of ‘what is usage’ is an important one, and in the COUNTER context we have always been clear about what we are measuring and the limitations of such metrics. A parallel point can be made about whether citations/impact factors are a valid measure of impact and quality. Both full-text downloads and citations are simply convenient proxies, and metrics based on them should be quoted with appropriate health warnings. Also, we feel it is important to highlight that usage of journal content is not confined to that subset of readers who also cite. This was clearly demonstrated in the second stage of the RIN report “E-journals: their use, value and impact”. Indeed, the quality of a journal’s content can be considered to be as much about how it influences the future body of research (as evidenced by citation) as about the use that readers can make of the information (as evidenced by downloads).

Some of the comments point out that gaming is an issue for citations, just as it is for usage. We should stress that we are very alert to the risk of gaming, which is why a major aspect of the project is to test the resilience of the proposed UF to gaming.

Hazel Woodward, Cranfield University
Jayne Marks, Sage
Co-Chairs, COUNTER Usage Factor Project

Hazel and Jayne,
Thanks for your response. Can you be specific about the inaccurate points made in the story?

Well you start with: “Like the impact factor, the Journal Usage Factor (JUF) is a simple calculation that divides the total number of article downloads by the number of articles published in a journal over a specified window of time.”

Recommendation 1 of the exploratory data analysis report that the project commissioned from CIBER (http://www.projectcounter.org/documents/CIBER_final_report_July.pdf) states that “The journal usage factor should be calculated using the median rather than the arithmetic mean”. The report presents detailed evidence to support this conclusion. The arithmetic mean is found to be “too sensitive to a few very highly downloaded items”. Modelling has shown that the geometric mean would be a more stable and reliable metric, but the median gave results very close to the geometric mean and is probably more widely understood.
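A small, invented set of per-article download counts makes the point concrete (a sketch only, not data from the report):

```python
# Invented per-article download counts for a hypothetical journal, with one
# runaway hit. The arithmetic mean is pulled strongly toward the outlier,
# while the median (and geometric mean) stay near the "typical" article.
from statistics import mean, median, geometric_mean  # geometric_mean needs Python 3.8+

downloads = [40, 55, 60, 70, 85, 90, 110, 130, 150, 9000]

print(mean(downloads))                      # 979   -> dominated by the single outlier
print(median(downloads))                    # 87.5  -> close to the typical article
print(round(geometric_mean(downloads), 1))  # ~129.9 -> far less sensitive than the mean
```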

I would re-iterate the suggestion from Hazel and Jayne that you should study the data analysis report where you will hopefully appreciate that a lot of the issues raised in your post and in the comments have already been explored or have been identified for further analysis or consultation.

I would like to add to the recent posting by Terry. Let me take a couple of examples.

“If JUF were run like an election, it would be a system where each party runs its own polls, hoards its own votes, provides no paper trail, and has the power to ignore any appeal”.

Not so. We have stated that an independent audit will be an essential feature of JUF compliance. Inclusion in the official register of JUFs will depend on passing this audit, and failure to do so will result in de-registration. Independent audits are a well-established feature of online services, whose credibility, as well as the prices they can charge advertisers, depends on passing them. There are ways of ensuring the credibility of a measure that do not require open access to all the raw data that underlie the measure.

Secondly: “there is little to keep the JUF from being grossly manipulated for the purposes of its various constituents”.

Not so. Attempts will be made to manipulate any quantitative measure and the project team has a keen awareness of this threat in the context of the JUF. This is why a rigorous independent audit and a robust strategy for dealing with gaming is acknowledged as being essential to the measure’s success. We stated in our earlier comment that this project is still ‘work in progress’. Further development of the independent audit and approaches to dealing with gaming will be addressed in the next stages of the project.

There is another important factor that doesn’t seem to have been discussed about the usage factor, and that is open access vs. behind-the-paywall content. I would think that content that is freely available to the world would lead to more downloads (if HTML full text qualifies as a download). For example, on many publisher platforms, when a user reaches an article from a Google search and that article is designated as open access by the publisher, the platform will automatically redirect the user to the full-text version. In that case, the abstract-only “paywall” isn’t presented to the user before they make a “read” decision. This would, in my estimation, support the argument that there is a lot of ‘usage’ that is ‘peripheral’ in nature (i.e., an expecting mother landing on a research article about the impact of prescribed steroids on pregnancy, full of rich scientific data that doesn’t make sense to her, as opposed to the WebMD article she would most likely read). There are a lot of journals out there that open access to their entire legacy archive after 12 months (and in some cases 6).

Great discussion here though, Phil, thanks for providing the forum!

Thanks Matt,
Your speculation is exactly what I found in our randomized controlled trial of open access publishing (link to free article). Freely-available articles received many more fulltext (HTML) hits while abstract views decreased. PDF downloads also increased, but not nearly as much as HTML, suggesting a different behavioral intention of the reader.

A very good point Matt, and another reason to look at usage data with a grain of salt. We discussed the implications of a citation advantage for open access papers (though the most reliable evidence says no such advantage exists), as it would provide a means for gaming the system, letting wealthier labs that could afford author fees essentially buy prominence for their papers. The same thing would apply here but with a real world proven advantage, as in my experience, making a paper free does significantly increase its usage levels.

