Image of a tape measure via Simon A. Eugster.

In my last post for the Kitchen I explored what citations might mean within any given publication. Do citations necessarily indicate the significance of the cited publication? And if we don’t really know what individual citations mean, why do we think we can draw important meaning from their aggregation?

Citation metrics are insidious. That’s the implication of plenty of conversation in the media and the sciences, the impetus for the San Francisco Declaration on Research Assessment (DORA), and the strong suggestion of a recent report from the Higher Education Funding Council for England (HEFCE). Regular readers of the Kitchen will be familiar with debates about such metrics as the Journal Impact Factor (JIF), the h-index, and altmetrics, and with concerns about the screwy incentives that such metrics produce. Regular readers might also wonder how much more can be said about the technical details of such metrics and the ethics of their use.

As with so much in higher education policy, particularly concerning scholarly communications, the problems with citation metrics are global, largely led by trends in the STEM disciplines, and often ignored or ill-understood by and on behalf of the humanities. And, I might add, they are often ignored or poorly understood by humanities and social sciences (HSS) faculty, particularly in the U.S. It is not surprising that these issues are being taken up assertively in the UK, where the Research Excellence Framework (REF) assesses “the quality of research in UK higher education institutions” through a review of the “impact” of the “outputs” (publications) of their faculty. Even Wikipedia has captured the main discontents with the REF: that a systemic emphasis on “impact,” measured by the media chatter, social policies, or economic benefits that scholarship quickly generates outside the academy, stifles creative inquiry, and that measuring those impacts is itself a dodgy business.

HEFCE’s report, The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management, synthesizes, updates and reiterates some of the most regularly repeated concerns about an over-reliance on metrics to assess the quality and significance of research. The report includes two substantial supplements: a lucid literature review and a correlation of the 2014 REF results with metrics. Calling for “responsible metrics,” the report also asks for, among other things, “humility” in the use of metrical evaluation, “recognising that quantitative evaluation should support—but not supplant—qualitative, expert assessment.”

There is much for individual faculty, their department heads and other administrators, as well as publishers, to leverage in the report’s language. For humanists, it is helpful that The Metric Tide makes a pretty big deal out of the disciplinary differentials in citation metrics: “metrics should not become the ‘tail that wags the dog’ of research practice in all disciplines. Rather, it is incumbent on those who design and use metrics to take a fuller account of the existing diversity of research, and design sensitive and meaningful metrics to reflect and support this.”

Not surprisingly, given the implication that qualitative evaluation is as important as, or more important than, quantitative indicators, peer review, “arguably the most important method of quality control in all disciplines,” gets some rehabilitation in The Metric Tide. Or perhaps it’s more accurate to say that the myriad problems with peer review are weighed against the myriad problems with the “Cultures of Counting” (a full chapter in the report, pp. 79–95). The latter include but are not limited to perverse incentives, gaming the system, and biases of all stripes; in other words, many of the same problems that critics have identified with peer review.

Quantitative methods are inherently biased: algorithms are built by humans, and in this case they are measuring human products, so they may share many of the same biases as other human creations (such as peer review). The primacy effect and gender bias, both given recent and renewed attention, represent just two pretty devastating challenges to citation and other kinds of metrics. Merely appearing first in a publication can guarantee not only more social media attention but also more citations, which strikes me as clearly discriminatory against the other authors. Gender bias in citations is a clear and present danger.

Given that these and other biases exist, can there be any reasonable argument in favor of citation metrics? Well, we have long known about gender and other biases in student evaluations of teaching, and yet even my own august institution still employs them as a component of faculty assessment. Here The Metric Tide’s call for “Responsible Metrics” may be a helpful guide to using metrics in evaluation beyond citations.

Other profoundly important issues still need to be addressed more fully, among them the pressure for quick if not immediate research “impact.” Measures of journal impact such as those in Google Scholar Metrics rely on a woefully short window of five years. This is, in the most literal sense, short-sighted, particularly for the humanities, where views, downloads and citations tend to accumulate over time. It isn’t surprising that the most-cited articles in Google Scholar Metrics for, say, both Nature and the American Historical Review come from the oldest years captured in that five-year window, and that none come from the newest. (All ten of the AHR’s most-cited articles date from the first year, 2009, while Nature’s are distributed a little more widely: five from 2009, two each from 2010 and 2011, and one from 2012.)

If we look at article access, however, I suspect we would see a much more divergent pattern. For the William and Mary Quarterly, the most-accessed articles on JSTOR in 2014 were, well, long-appreciated essays. The median year of publication was 1992. The top five, in order, were published in 1983, 1996, 2008, 1997, and 2001. To be sure, these are classic essays. One, Neal Salisbury’s 1996 essay “The Indians’ Old World: Native Americans and the Coming of Europeans,” is in part a response to an older classic of the literature, James Merrell’s “The Indians’ New World: The Catawba Experience,” published in the WMQ in 1984.

I think this suggests more than just the slow digestion rate of historians. We simply don’t always know (maybe we rarely know) when research will be brought to bear on other scholarship, or on a social issue. Historian and frequent Inside Higher Education contributor Johann Neem has raised this issue more than once, most recently in a piece about the rich and slowly developed scholarship on marriage and family that helped frame the Supreme Court decision in Obergefell v. Hodges earlier this summer. Justice Anthony Kennedy, writing for the majority, cited prominent works on the history of marriage and family in America by Nancy Cott, Stephanie Coontz, and Hendrik Hartog that demonstrated the changing bases and legal status of marriage and marital partners (thus refuting the argument that marriage is universal and unchanging). Kennedy also cited the amicus brief filed by Cott and others, which had recounted some of the larger scholarly literature on these issues, in noting that “developments in the institution of marriage over the past centuries were not mere superficial changes. Rather, they worked deep transformations in its structure, affecting aspects of marriage long viewed by many as essential.”

People, I think we can call that impact. A lot of research may in fact labor in obscurity, but that doesn’t mean it won’t at some point become of signal importance. Basic research takes time to germinate. Some of it will never have any measurable impact at all. But that doesn’t mean it’s low quality, and it doesn’t mean it lacks value.

Karin Wulf


Karin Wulf is Director of the Omohundro Institute of Early American History & Culture and Professor of History at the College of William & Mary. She is a scholar of early American and Atlantic history working on gender, family and sexuality.



7 Thoughts on "If We don’t Know What Citations Mean, What Does it Mean when We Count Them?"

The first two findings of The Metric Tide seem pretty balanced:

“There is considerable scepticism among researchers, universities, representative bodies and learned societies about the broader use of metrics in research assessment and management.
Peer review, despite its flaws, continues to command widespread support as the primary basis for evaluating research outputs, proposals and individuals. However, a significant minority are enthusiastic about greater use of metrics, provided appropriate care is taken.”

I am among the enthusiastic.

One qualitative metric that has not been much discussed for assessing the impact of books is the prizes that many associations annually bestow on authors. This is surely a form of peer review, as the books are selected for this honor by committees of scholars. Of course, some associations hand out a lot more prizes than others. Historians have plenty of book prizes, whereas philosophers have very few. In fact, the American Philosophical Association has only one major prize for books, the Matchette Prize, and to qualify for it, the author must have been under the age of 40 at the time of the book’s publication. Why some associations have far more prizes than others is a mystery; in any case, this metric is not equally distributed, so to speak.

Doesn’t that difference make sense in light of the fact that history is a far more book-oriented discipline than philosophy is?

Only Anglo-American analytic philosophy is mainly article-oriented. Continental philosophy is very book-oriented. Certain branches of history, like quantitative history, are more article-oriented than book-oriented.
