That was my reaction to the news that Clarivate would add an Impact Factor to more Humanities and Social Science journals. This will, it's said, help "level the playing field for all quality journals." The "category normalization" is, in theory, a way in which like can be compared to like, so that no essay in early American history is going to be scored in the same pool as a research article on vaccine efficacy, for example.
In the summer of 2015 I was thinking a lot about citation metrics and how they were fouling up a lot of analyses about the significance of humanities scholarship. I was worrying about the attention economy, yes, but even more so about the creeping structures of evaluation pegged to "impact," and specifically impact as measured by citation metrics. What does it even mean to have a work of historical scholarship cited? Well, it can mean a number of things. So I wrote two posts. In my first piece I analyzed one then-recent essay from The William and Mary Quarterly, evaluating how many citations were for information or even refutation; after all, every citation "counts." In the second, revisited here, I wrote about a publication, The Metric Tide, that tackled how this metric-mania was playing out in the UK, where a centralized higher education bureaucracy was making ever more regimented decisions via the Research Excellence Framework (and then the Teaching Excellence Framework, and so on).
My basic concern was that if, as demonstrated in my first post, a citation can mean many things, then what does it mean to count them, and by extension to make judgments about quality based on that counting? I argued that impact is rarely as immediate as monolithic structures of evaluation, like the Journal Impact Factor, suggest, and that there are other ways to measure it in any case. I made a reference to a long tradition of historical scholarship on the complexity and diversity of marriage practice, and to its citation in the US Supreme Court's decision in Obergefell v. Hodges. "People, I think we can call that impact." Given the court's recent reference to the significance of history, and its invocation of something we can only describe as Not History in both Dobbs v. Jackson (abortion) and NYSRPA v. Bruen (guns), despite extensive amicus briefs from scholars steeped in, well, actual history, I think we must call that something else.
So here's what I wrote in 2015. Let me know what you think about my assessment of what counting citations actually measures, what it portends, and whether we're too far gone to even have this conversation:
If We Don't Know What Citations Mean, What Does It Mean When We Count Them?
In my last post for the Kitchen I explored what citations might mean within any given publication. Do citations necessarily indicate the significance of the cited publication in question? And if we don’t really know what individual citations mean, why do we think we can draw important meaning from their aggregation?
Citation metrics are insidious. That’s the implication of plenty of conversation in the media and the sciences, the impetus for the San Francisco Declaration on Research Assessment (DORA), and the strong suggestion of a recent report from the Higher Education Funding Council for England (HEFCE). Regular readers of the Kitchen will be familiar with debates about such metrics as the Journal Impact Factor (JIF), the h-index, and altmetrics, and concerns about the screwy incentives that such metrics produce. Regular readers might also wonder how much more can be said about the technical details of such metrics and the ethics of their utility.
As with so much in higher ed policy, particularly concerning scholarly communications, the problems with citation metrics are global, largely driven by trends in the STEM disciplines, and often ignored or poorly understood by, and on behalf of, the humanities (and, I might add, by HSS faculty in the U.S. in particular). It is not surprising that these issues are being taken up assertively in the UK, where the Research Excellence Framework (REF) assesses "the quality of research in UK higher education institutions" through a review of the "impact" of the "outputs" (publications) of their faculty. Even Wikipedia has captured the main discontents with the REF: a systemic presumption that "impact," measured by the media chatter, social policies, or economic benefits that scholarship quickly generates outside the academy, stifles creative inquiry; and that measuring those impacts is itself a dodgy business.
HEFCE's report, The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management, synthesizes, updates, and reiterates some of the most regularly repeated concerns about an over-reliance on metrics to assess the quality and significance of research. The report includes two substantial supplements: a lucid literature review and a correlation of the 2014 REF results with metrics. Calling for "responsible metrics," the report also asks for, among other things, "humility" in the use of metrical evaluation, "recognising that quantitative evaluation should support — but not supplant — qualitative, expert assessment."
There is much for individual faculty, their department heads and other administrators, as well as publishers, to leverage in the report’s language. For humanists, it is helpful that The Metric Tide makes a pretty big deal out of the disciplinary differentials in citation metrics: “metrics should not become the ‘tail that wags the dog’ of research practice in all disciplines. Rather, it is incumbent on those who design and use metrics to take a fuller account of the existing diversity of research, and design sensitive and meaningful metrics to reflect and support this.”
Not surprisingly, given the implication that qualitative evaluation is as important as or more important than quantitative indicators, peer review, "arguably the most important method of quality control in all disciplines," gets some rehabilitation in The Metric Tide. Or perhaps it's more accurate to say that the myriad problems with peer review are weighed against the myriad problems with the "Cultures of Counting" (a full chapter in the report, pp. 79-95). The latter include but are not limited to perverse incentives, gaming the system, and biases of all stripes; in other words, many of the same problems that critics have identified with peer review.
Quantitative methods are inherently biased: algorithms are built by humans, and in this case they are measuring human products, so they may share many of the same biases as other human creations (such as peer review). The primacy effect and gender bias, both given recent and renewed attention, represent just two pretty devastating challenges to citation and other kinds of metrics. Merely appearing first in a publication can guarantee not only more social media attention but more citations, which strikes me as clearly discriminatory against other authors. Gender bias in citations is a clear and present danger.
Knowing that these and other biases exist, can there be any reasonable argument in favor of citation metrics? Well, we have long known about gender and other biases in student evaluations of teaching, and yet even my own august institution still employs them as a component of faculty assessment. Here The Metric Tide's call for "responsible metrics" may be a helpful guide to using metrics in evaluation beyond citations.
Other profoundly important issues still need to be addressed more fully, among them the pressure for quick if not immediate research "impact." Measures of journal impact rely on a woefully short window of five years. This is, in the most literal sense, short-sighted, particularly for the humanities, where views, downloads, and citations tend to accumulate over time. It isn't surprising that the most cited articles on Google Scholar Metrics for both, say, Nature and the American Historical Review are from the oldest years captured in that five-year window, and that none are from the newest (all of the top ten cited articles for the AHR were from the first year, 2009, while Nature citations are distributed a little more widely: five from 2009, two each from 2010 and 2011, and one from 2012). If we look at article access, however, I suspect we would see a rather different pattern. For the William and Mary Quarterly, the most accessed articles on JSTOR in 2014 are, well, long-appreciated essays. The median year of publication was 1992. Among the top five, the publication dates were, in order: 1983, 1996, 2008, 1997, 2001. To be sure, these are classic essays. One, by Neal Salisbury, published in 1996, "The Indians' Old World: Native Americans and the Coming of Europeans," is in part a response to an older classic of the literature published in the WMQ in 1984 by James Merrell, "The Indians' New World: The Catawba Experience."
I think this suggests more than just the slow digestion rate of historians. We simply don't always know — maybe we rarely know — when research will be brought to bear on other scholarship, or on a social issue. Historian and frequent Inside Higher Ed contributor Johann Neem has raised this issue more than once, most recently in a piece about the rich and slowly developed scholarship on marriage and family that helped frame the Supreme Court decision in Obergefell v. Hodges earlier that summer. Justice Anthony Kennedy, writing for the majority, cited prominent works on the history of marriage and family in America by Nancy Cott, Stephanie Coontz, and Hendrik Hartog that demonstrated the changing bases and legal status of marriage and marital partners (thus refuting the argument about the universal and unchanging nature of marriage). Kennedy also cited the amicus brief filed by Cott and others that had recounted some of the larger scholarly literature on these issues in noting that "developments in the institution of marriage over the past centuries were not mere superficial changes. Rather, they worked deep transformations in its structure, affecting aspects of marriage long viewed by many as essential."
People, I think we can call that impact. A lot of research may in fact labor in obscurity, but that doesn't mean it won't at some point become of signal importance. Basic research takes time to germinate. Some won't have any measurable impact at all. But that doesn't mean it's low quality, and it doesn't mean that it doesn't have value.