Good lord.

That was my reaction to the news that Clarivate would add an Impact Factor to more Humanities and Social Science journals. This will, it’s said, help “level the playing field for all quality journals.” The “category normalization” is, in theory, a way in which like can be compared to like, so that no essay in early American history is scored in the same pool as a research article on vaccine efficacy, for example.
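To make the mechanics concrete, here is a minimal sketch of the general logic behind category normalization, using invented citation counts and a deliberately simplified formula (this is not Clarivate’s actual method): each article is scored against the average for its own subject category rather than against the whole database.

```python
# A minimal, illustrative sketch of category normalization: invented numbers
# and a simplified formula, not Clarivate's actual method. Each article is
# compared only to the average citation count within its own subject category.
from statistics import mean

early_american_history = [0, 1, 2, 3, 4]   # hypothetical citations per article
vaccine_efficacy = [40, 60, 80, 90, 130]   # hypothetical citations per article

def normalized_score(citations: int, category_counts: list[int]) -> float:
    """Citations relative to the average for the article's own category."""
    return citations / mean(category_counts)

# Raw counts differ by a factor of 40, but once each article is measured
# against its own field, the two look equally "impactful."
print(normalized_score(3, early_american_history))   # 3 / 2   -> 1.5
print(normalized_score(120, vaccine_efficacy))       # 120 / 80 -> 1.5
```

The whole exercise, of course, still assumes that the categories themselves are drawn sensibly.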

In the summer of 2015 I was thinking a lot about citation metrics and how those were fouling up a lot of analyses about the significance of humanities scholarship. I was worrying about the attention economy, yes, but even more so about the creeping structures of evaluation pegged to “impact” — and specifically about impact as measured by citation metrics. What does it even mean to have a work of historical scholarship cited? Well, it can mean a number of things. So I wrote two posts. In my first piece I analyzed one then-recent essay from The William and Mary Quarterly, evaluating how many of its citations were for information or even refutation; after all, every citation “counts.” In the second, revisited here, I wrote about a publication, The Metric Tide, that tackled how this metric-mania was playing out in the UK, where a centralized higher education bureaucracy was making ever more regimented decisions via the Research Excellence Framework (and then the Teaching Excellence Framework, and so on).

My basic concern was that if, as demonstrated in my first post, a citation can mean many things, then what does it mean to count them — and, by extension, to make judgments about quality based on that counting? I argued that impact is rarely as immediate as monolithic structures of evaluation, like the Journal Impact Factor, suggest, and that there are other ways to measure it in any case. I referenced a long tradition of historical scholarship on the complexity and diversity of marriage practices, and its citation in the US Supreme Court’s decision in Obergefell v. Hodges. “People, I think we can call that impact.” Given the court’s recent reference to the significance of history, and its invocation of something we can only describe as Not History, in both Dobbs v. Jackson (abortion) and NYSRPA v. Bruen (guns) — despite extensive amicus briefs from scholars steeped in, well, actual history — I think we must call that something else.

So here’s what I wrote in 2015. Let me know what you think about my assessment of what counting citations actually measures, what it portends, and whether we’re too far gone to even have this conversation:

If We Don’t Know What Citations Mean, What Does It Mean When We Count Them?

In my last post for the Kitchen I explored what citations might mean within any given publication.  Do citations necessarily indicate the significance of the cited publication in question? And if we don’t really know what individual citations mean, why do we think we can draw important meaning from their aggregation?

Citation metrics are insidious. That’s the implication of plenty of conversation in the media and the sciences, the impetus for the San Francisco Declaration on Research Assessment (DORA), and the strong suggestion of a recent report from the Higher Education Funding Council for England (HEFCE). Regular readers of the Kitchen will be familiar with debates about such metrics as the Journal Impact Factor (JIF), the h-index, and altmetrics, and concerns about the screwy incentives that such metrics produce. Regular readers might also wonder how much more can be said about the technical details of such metrics and the ethics of their utility.


As with so much in Higher Ed policy, particularly concerning scholarly communications, the problems with citation metrics are global, largely led by trends in the STEM disciplines, and often ignored or ill-understood by and on behalf of the humanities. And, I might add, often ignored or poorly understood by HSS faculty themselves, particularly in the U.S. It is not surprising that these issues are being taken up assertively in the UK, where the Research Excellence Framework (REF) assesses “the quality of research in UK higher education institutions” through a review of the “impact” of the “outputs” (publications) of their faculty. Even Wikipedia has captured the main discontents with the REF: a systemic presumption that “impact,” measured by the media chatter, social policies, or economic benefits that scholarship quickly generates outside the academy, stifles creative inquiry, and a sense that measuring those impacts is itself a dodgy business.

HEFCE’s report, The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management, synthesizes, updates, and reiterates some of the most regularly repeated concerns about an over-reliance on metrics to assess the quality and significance of research. The report includes two substantial supplements: a lucid literature review and a correlation analysis of metrics against the 2014 REF results. Calling for “responsible metrics,” the report also asks for, among other things, “humility” in the use of metrical evaluation, “recognising that quantitative evaluation should support — but not supplant — qualitative, expert assessment.”

There is much for individual faculty, their department heads and other administrators, as well as publishers, to leverage in the report’s language. For humanists, it is helpful that The Metric Tide makes a pretty big deal out of the disciplinary differentials in citation metrics: “metrics should not become the ‘tail that wags the dog’ of research practice in all disciplines. Rather, it is incumbent on those who design and use metrics to take a fuller account of the existing diversity of research, and design sensitive and meaningful metrics to reflect and support this.”

Not surprisingly, given the implication that qualitative evaluation is as important as or more important than quantitative indicators, peer review, “arguably the most important method of quality control in all disciplines,” gets some rehabilitation in The Metric Tide. Or perhaps it’s more accurate to say that the myriad problems with peer review are weighed against the myriad problems with the “Cultures of Counting” (a full chapter in the report, pp. 79-95). The latter include, but are not limited to, perverse incentives, gaming the system, and biases of all stripes — in other words, many of the same problems that critics have identified with peer review.

Quantitative methods are inherently biased: algorithms are built by humans, and in this case they are measuring human products; ergo, they may share many of the same biases as other human creations (such as peer review). The primacy effect and gender bias, both given recent and renewed attention, represent just two pretty devastating challenges to citation and other kinds of metrics. Merely appearing first in a publication can guarantee not only more social media attention but also more citations, which strikes me as clearly discriminatory toward other authors. Gender bias in citations is a clear and present danger.

Knowing that these and other biases exist, can there be any reasonable argument in favor of citation metrics? Well, we have long known about gender and other biases in student evaluations of teaching, and yet even my own august institution still employs them as a component of faculty assessment. Here The Metric Tide’s call for “responsible metrics” may be a helpful guide to using metrics in evaluation beyond citations.

Other profoundly important issues still need to be addressed more fully, among them the pressure for quick if not immediate research “impact.” Measures of journal impact rely on woefully short windows: just two years for the standard Journal Impact Factor, and five for Google Scholar Metrics. This is, in the most literal sense, short-sighted, particularly for the humanities, where views, downloads, and citations tend to accumulate over time. It isn’t surprising that the most cited articles on Google Scholar Metrics for both, say, Nature and the American Historical Review are from the oldest years captured in that five-year window, and that none are from the newest (all of the top ten cited articles for the AHR were from the first year, 2009, while Nature’s citations are distributed a little more widely: five from 2009, two each from 2010 and 2011, and one from 2012). If we look at article access, however, I suspect we would see a much more divergent pattern. For the William and Mary Quarterly, the most accessed articles on JSTOR in 2014 were, well, long-appreciated essays. The median year of publication was 1992. Among the top five, the publication dates were, in order: 1983, 1996, 2008, 1997, 2001. To be sure, these are classic essays. One of them, Neal Salisbury’s 1996 essay “The Indians’ Old World: Native Americans and the Coming of Europeans,” is in part a response to an older classic of the literature, James Merrell’s “The Indians’ New World: The Catawba Experience,” published in the WMQ in 1984.
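To put some arithmetic behind that, here is a minimal sketch with partly hypothetical numbers: the first five publication years are the WMQ top five listed above, and the remaining five are invented so that the median lands at the reported 1992 (this is not the actual JSTOR dataset).

```python
# Partly hypothetical sketch: the first five publication years are the WMQ top
# five reported above; the remaining five are invented so the median matches
# the reported 1992. Not the actual JSTOR access data.
from statistics import median

most_accessed_years = [1983, 1996, 2008, 1997, 2001,   # reported top five
                       1979, 1986, 1988, 1990, 1994]   # invented for illustration

print("Median publication year:", median(most_accessed_years))  # -> 1992.0

# How many of these long-appreciated essays would a five-year metric window
# (here, a 2014 metric covering 2010-2014) even be able to see?
window_start = 2010
visible = [year for year in most_accessed_years if year >= window_start]
print(f"Articles inside the window: {len(visible)} of {len(most_accessed_years)}")  # -> 0 of 10
```

Even with numbers chosen this way, a window that opens in 2010 simply cannot register essays whose readership peaked decades after publication.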

I think this suggests more than just the slow digestion rate of historians. We simply don’t always know — maybe we rarely know — when research will be brought to bear on other scholarship, or on a social issue. Historian and frequent Inside Higher Ed contributor Johann Neem has raised this issue more than once, most recently in a piece about the rich and slowly developed scholarship on marriage and family that helped frame the Supreme Court decision in Obergefell v. Hodges earlier this summer. Justice Anthony Kennedy, writing for the majority, cited prominent works on the history of marriage and family in America by Nancy Cott, Stephanie Coontz, and Hendrik Hartog that demonstrated the changing bases and legal status of marriage and marital partners (thus refuting the argument about the universal and unchanging nature of marriage). Kennedy also cited the amicus brief filed by Cott and others that had recounted some of the larger scholarly literature on these issues, noting that “developments in the institution of marriage over the past centuries were not mere superficial changes. Rather, they worked deep transformations in its structure, affecting aspects of marriage long viewed by many as essential.”

People, I think we can call that impact. A lot of research may in fact labor in obscurity, but that doesn’t mean it won’t at some point become of signal importance. Basic research takes time to germinate. Some won’t have any measurable impact at all. But that doesn’t mean it’s low quality, and it doesn’t mean that it doesn’t have value.

Karin Wulf

Karin Wulf is the Beatrice and Julio Mario Santo Domingo Director and Librarian at the John Carter Brown Library and Professor of History, Brown University. She is a historian with a research specialty in family, gender and politics in eighteenth-century British America and has experience in non-profit humanities publishing.

Discussion

11 Thoughts on "Still Ambiguous at Best? Revisiting ‘If We Don’t Know What Citations Mean, What Does It Mean When We Count Them?’"

We’re tackling part of this issue at scite.ai. Specifically, we’ve built a robust system that allows researchers, and anyone interested in research, to see how and why an article has been cited by providing the citation context. We also indicate where the citation was made (the Introduction versus the Discussion section, for example) and the type of citation (does it provide supporting or contrasting evidence for the cited claim?).

We’ve published a paper on our approach here: https://direct.mit.edu/qss/article/2/3/882/102990/scite-A-smart-citation-index-that-displays-the, and importantly, we are working directly with publishers to bring this contextual information and nuance to the version of record (our citation statements showing citation context now live on millions of articles published by ACS, Wiley, PNAS, and many others).

You can see more announcements and relevant pieces here: https://scite.ai/news-and-press.

Thanks so much, looking forward to checking that out. Though to be contrary, I still wonder what citations, more or less, really mean at a deeper level. Sometimes a work is sui generis, really the only thing on a subject, and so (this is really the case in my field) it gets cited almost any time someone touches on a related topic, whereas there are works on other topics, just as good but more subtle and differentiated, that rarely get cited. In some fields maybe citation reflects impact, but in others really not so much?

By your so-called smart citation, you are adding a further layer of bias to this distorted metric. There are no “smart” and “stupid” citations. A citation is just a reference to a source. How can this be smart or less smart?
It is time to stop adding new distortions to citations.

Hi Bob,

It sounds like you take issue with the branding and marketing of our citations but not the actual implementation. We show how and why an article has been cited by showing the in-text citation context. This citation context provides a richer and more nuanced view of the citation record than traditional citations do. Some have called these “rich citations,” “citances,” and “smart citations.” What you call them matters less than what they are, which is not a traditional citation.

I hope that clarifies, and thanks for the feedback.

A case in point is the published PhD thesis of the distinguished historian William B. Provine (1942-2015), The Origins of Theoretical Population Genetics (1971). This was heavily cited by population geneticists and their followers as giving an accurate account of the early history of their subject. The problem was that Provine’s mature viewpoint, which began to emerge in his numerous writings in the 1980s, differed greatly. The second edition reprinting (2001) had an “afterword” that greatly changed the picture. However, the 1971 first edition continued to be cited.

I know works like that in my field — and there is the reverse politics of citation where work gets ignored for reasons of bias.

It’s a great pleasure to revisit this excellent post. It raises the question: if metrics are not reliable for various reasons, what is? I think the answer to that is the reputation (in a particular field) of a journal. The journal brand is the best measurement we have ever had. It’s tragic that it is being put to death.

Thank you, Joe, high praise! I think I’ve just gotten more grumbly about this over time, but I do think at some level we have to have qualitative evaluation, even if that’s problematic too.

Great post. The Clarivate update is how we remembered to ask our publishing partner Wiley for the updated impact factor numbers.

One thing I find challenging about the impact factor is how much of it is noise versus meaningful signal. The editors of our journals care about it insofar as, when it is up, it is good shorthand to authors to submit to our journals; but when it is down (which is our current situation, due in part to some inflation from the Clarivate changes in measurement), it is to be ignored and/or someone needs to find a magic bullet to correct it. There is no magic bullet other than to constantly publish highly cited, high-quality papers at as high a volume as possible.

The other challenge in general with journal metrics/analytics, at least from the society/association lens, is trusting the data supplied to us. Getting access data in the first place is challenging, and when we do get data, it does not square with the traffic numbers on our journal pages, clickthroughs, etc. While we are working with Wiley to increase data transparency on both sides of the equation, there’s not a great understanding of “what does it all mean?” when trying to develop a long-term content strategy for the journals that makes the access numbers or impact factors align with the revenue numbers.

“And if we don’t really know what individual citations mean, why do we think we can draw important meaning from their aggregation?” – as you suggest, “we” think this because, astonishingly, research funders and academic institutions still cling to citation-counting metrics as the basis for multi-billion-dollar research investment decisions and researcher career advancement decisions.

Citations are a great way to show how ideas are connected, but they are actually worse than useless when it comes to measuring research integrity, impact, reproducibility, and productivity (https://figshare.com/articles/preprint/Article_Citations_are_an_Insufficient_Building_Block_for_Measuring_Research_Impact_in_the_Twenty-First_Century/8141783).

I would just call attention to the new Co-Citation Percentile Rank (CPR). Yes, another citation metric, but with a different twist: https://oscsolutions.cc.jyu.fi/jyucite/about/

It seems there are many people concerned with alternative, fairer “citation evaluation,” and this will go on until everybody either leaves metrics behind or agrees on a “gold standard” metric (which I guess will never happen).
