Over the past few weeks, I’ve been involved in a number of discussions about the role of alternative metrics in research evaluation. Amongst them, I moderated a session at SSP on the evaluation gap, took part in the short course on journal metrics prior to the CSE conference in Philadelphia, and moderated a webinar on the subject. These experiences have taught me a lot about both the promise of, and the challenges surrounding, altmetrics, and how they fit into the broader research metrics challenge that funders and institutions face today. In particular, I’ve become much more aware of the field of informetrics, the academic discipline that underpins research metrics, and have begun to think that we, as scholarly communication professionals and innovators, have been neglecting a valuable source of information and guidance.
It seems that, broadly, everybody agrees that the Impact Factor is a poor way to measure research quality. The most important objection is that it is designed to measure the academic impact of journals, and is therefore only a rough proxy for the quality of the research contained within those journals. As a result, article-level metrics are becoming increasingly common and are supported by Web of Science, Scopus and Google Scholar. There are also a number of alternative ways to measure citation impact for researchers themselves. In 2005 Jorge Hirsch, a physicist from UCSD, proposed the h-index, which is intended to be a direct measure of a researcher’s academic impact through citations: a researcher has an h-index of h if h of their papers have each been cited at least h times. There are also a range of alternatives and refinements with names like m-index, c-index, and s-index, each with its own particular spin on how best to calculate individual contribution.
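For readers unfamiliar with how the h-index works in practice, here is a minimal sketch of the calculation as Hirsch defined it (the function name and example citation counts are my own, for illustration only):

```python
def h_index(citations):
    """Return the h-index: the largest h such that the researcher
    has at least h papers with at least h citations each."""
    # Sort citation counts from most-cited to least-cited
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(cites, start=1):
        # The paper at position `rank` must itself have >= rank citations
        if count >= rank:
            h = rank
        else:
            break
    return h

# A researcher whose five papers are cited [10, 8, 5, 4, 3] times has
# an h-index of 4: four papers each have at least 4 citations, but
# there are not five papers with at least 5 citations.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Note how the measure deliberately discounts both a long tail of rarely cited papers and a single blockbuster paper — a design choice that is itself the subject of the refinements mentioned above.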
While the h-index and similar metrics are good attempts to tackle the problem of the Impact Factor being a proxy measure of research quality, they can’t speak to a problem that has been identified over the last few years and is becoming known as the Evaluation Gap.
Defining the Evaluation Gap
The Evaluation Gap is a concept that was introduced in a 2014 post by Paul Wouters on the Citation Culture blog, which he co-authors with Sarah de Rijcke; both are scholars at the University of Leiden. The idea of the gap is summed up by Prof Wouters as:
…the emergence of a more fundamental gap between on the one hand the dominant criteria in scientific quality control (in peer review as well as in metrics approaches), and on the other hand the new roles of research in society.
In other words, research plays many different roles in society. Medical research, for example, can lead to new treatments and better outcomes for patients. There are clear economic impacts of work that leads to patents or the formation of new companies. Add to that legislative, policy and best-practice impact, as well as education and public engagement, and we see just how broadly research and society interact. Peer review of scholarly content and citation counts are a good way to understand the impact of research on the advancement of knowledge within the academy, but a poor representation of the way in which research informs activities outside of the ivory tower.
Goodhart’s Law: When a measure becomes a target it ceases to be a good measure
In April of this year, the Leiden Manifesto, written by Diana Hicks, Paul Wouters and colleagues, was published in Nature. There has been surprisingly little discussion about it in publishing circles. It certainly seems to have been met with less buzz than the now iconic altmetrics manifesto, which Jason Priem et al. published in 2010. As Cassidy Sugimoto (@csugimoto) pointed out in the session at SSP that I moderated, the Leiden Manifesto serves as a note of caution.
Hicks and Wouters point out that obsession with the Impact Factor is a relatively new phenomenon, with the number of academic articles with the words ‘impact factor’ in the title having steadily risen from almost none to around 8 per 100,000 a few years ago. The misuse of this simple and rather crude metric to inform decisions that it was never intended to inform has distorted the academic landscape by over-incentivizing the authorship of high-impact articles and discounting other valuable contributions to knowledge, as well as giving rise to more sinister forms of gaming like citation cartels, citation stacking and excessive self-citation. In many ways, citation counting and altmetrics share some common risks. Both can be susceptible to gaming and, as Hicks and Wouters put it…
…assessors must not be tempted to cede the decision-making to the numbers
Is history repeating itself?
Eugene Garfield is the founder of ISI and an important figure in bibliometrics. In his original 1955 article, he makes an argument uncannily similar to the argument that Jason Priem made in the altmetrics manifesto (emphasis my own):
It is too much to expect a research worker to spend an inordinate amount of time searching for the bibliographic descendants of antecedent papers.
As the volume of academic literature explodes, scholars rely on filters to select the most relevant and significant sources from the rest. Unfortunately, scholarship’s three main filters for importance are failing.
In the case of both citation tracking and altmetrics, the original problem was one of discovery in the face of information overload, but people inevitably start to look at anything that you can count as a way to increase the amount of automation in assessment. How do we stop altmetrics heading down the same path as the Impact Factor and distorting the process of research?
Engagement exists on a spectrum. While some online mentions, for example tweets, are superficial, requiring little effort to produce and conveying only the most basic commentary, some mentions are of very high value. For example, a medical article that is cited in up-to-date.com would not contribute to traditional citation counts but would inform the practice of countless physicians. What is important is context. To reach their full potential, altmetrics solutions and processes must not rely purely on scores but place sufficient weight on qualitative, context-based assessment.
The Research Excellence Framework (REF) is a good example of how some assessors are thinking positively about this issue. The REF is an assessment of higher education institutions across the UK, the results of which are used to allocate a government block grant that makes up approximately 15-20% of university funding. The framework currently contains no metrics of any kind and, according to Stephen Hill of HEFCE, assessment panels are specifically told not to use the Impact Factor as a proxy for research quality. Instead, institutions submit written impact statements and are assessed on a broad range of criteria including their formal academic contributions, the economic impact of their work, influence on government policy, their public outreach efforts and their contribution to training the next generation of academics. HEFCE are treading carefully when it comes to metrics and are consulting with informaticians about how to properly incorporate metrics without distorting researcher behavior. Unfortunately, as Jonathan Adams, chief scientist at Digital Science, notes, some researchers are already seeing evidence that the REF is affecting researcher behavior.
The importance of learning from the experts
I’ve only touched lightly on some of the issues facing altmetrics and informetrics. When I’ve spoken to people who work in the field, I get the impression they feel there isn’t enough flow of information from the discipline into the debate about the future of scholarly communication, leading to a risk that new efforts will suffer the same pitfalls as previous endeavors.
As a result, many in the field have been trying very hard to be heard by those of us working at the cutting edge of publishing innovation. The Leiden Manifesto (which has been translated into an excellent and easy-to-understand video), as well as earlier documents like the San Francisco Declaration on Research Assessment (DORA) (available as a poster, here), are examples of these outreach efforts. These resources are more than opinion pieces; they are attempts to summarize state-of-the-art thinking in the discipline, making it easier for publishers, librarians, funders and technologists to learn about it.
Funders and institutions clearly feel that they need to improve the way that research is assessed for the good of society and the advancement of human knowledge. Much of the criticism of altmetrics focuses on problems that traditional bibliometrics also suffer from: over-metricization, the use of a score as an intellectual shortcut, the lack of subject normalization, and the risks of gaming. At the same time, people working in the field of informetrics have good ideas to address these issues. Publishers have a role to play in all of this by supporting the continued development of tools that enable better assessment.
Instead of thinking about criticisms of altmetrics as arguments against updating how we assess research, let’s instead treat them as helpful guidance on how to improve the situation further. Altmetrics as they stand today are not a panacea, and there is still work to be done. Now that we have the power of the web at our disposal, however, it should be possible, with some thought, and by learning from those who study informetrics, to continue to work towards a more complete and more useful system of research assessment.