This Spring has been a season of altmetrics, at least for me, as I have spent much of the last two months either moderating panel discussions on alternative and new metrics, or giving requested talks on the subject. I’ve spoken with librarians, representatives from funding agencies, executives from altmetrics service providers and academics involved in hiring and career advancement decisions. The impressive level of interest in the subject of altmetrics is telling. There is great discontent with the current system for understanding and evaluating scholarly research, and in the era of “big data”, an understandable desire to put some of that data to good use.
The question still remains, though: what exactly should that “use” be?
When we look at altmetrics, or really any kind of metrics, we have to carefully examine the stories they tell us to understand whether they hold any significance.
Our brains like patterns. Stephen Jay Gould, in his famous 1988 essay on Joe DiMaggio’s record hitting streak, talks about how evolutionary adaptations for seeing patterns continue to drive the way we see the world:
“We must have comforting answers. We see pattern, for pattern surely exists, even in a purely random world… Our error lies not in the perception of pattern but in automatically imbuing pattern with meaning, especially with meaning that can bring us comfort, or dispel confusion…we must impart meaning to a pattern—and we like meanings that tell stories about heroism, valor, and excellence.”
There’s a natural tendency in humans to want to create stories: to take the available data and make sense of it by finding patterns that tell a story. The danger, though, comes from our tendency to do this even when the meaning behind the story isn’t really there.
Much of the data collected and presented in the altmetrics world revolves around measurements of attention. For many, “altmetrics” and “attention metrics” are synonymous. The very nature of online publishing, social media and the interlinked internet itself makes these sorts of data readily available. But measuring attention is not the same thing as measuring quality or value, and for important decisions like career advancement and funding, a sense of quality (this is an important result, this person does excellent research) is what we are hoping to learn.
A company called Chartbeat, which provides real-time data to major web publications, recently did a study looking at 2 billion visits across the web over the course of a month. A few key findings stood out. First, most usage statistics, like pageviews and clicks, are essentially meaningless: in 55% of cases, a reader spends less than 15 seconds on a given webpage.
Second, there is no correlation between someone sharing an article via social media and the attention paid to that article:
We looked at 10,000 socially-shared articles and found that there is no relationship whatsoever between the amount a piece of content is shared and the amount of attention an average reader will give that content…
Bottom line, measuring social sharing is great for understanding social sharing, but if you’re using that to understand which content is capturing more of someone’s attention, you’re going beyond the data. Social is not the silver bullet of the Attention Web.
This should give us pause when considering the value of attention metrics. Does an HTML view of a paper really mean that someone read it, or just that the title was intriguing enough to click on when it appeared in Google search results? As David Colquhoun and others have pointed out, tweets very rarely seem to show any sort of understanding of the content of the articles being shared.
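As a rough illustration of the kind of check Chartbeat describes, here is a minimal sketch in Python. The numbers are entirely made up, and a real analysis would need a far more careful definition of “engaged time” than this, but it shows how one could test the shares-versus-attention relationship on one’s own usage data:

```python
# A rough sketch of the check Chartbeat describes: is the number of times an
# article is shared related to how long readers actually engage with it?
# The data below are entirely made up for illustration.
import math

# (social shares, average engaged seconds per visit) for a handful of articles
articles = [
    (1200, 9), (15, 140), (340, 22), (4800, 11),
    (60, 95), (900, 18), (25, 30), (2300, 12),
]

def pearson(pairs):
    """Plain Pearson correlation coefficient; no external libraries needed."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(articles)
print(f"correlation between shares and engaged time: r = {r:.2f}")
# Chartbeat's finding was that, across 10,000 real articles, this sort of
# correlation is essentially absent -- sharing and reading are different behaviors.
```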
Even if perfect measuring tools were available, we would still need to decide where measurements of attention should matter. There are clear use cases: knowing what your colleagues are reading can help you filter the literature; knowing which research projects have piqued the public’s interest can help an institution’s development officer solicit donations. Serious questions remain though, on whether these types of measurements should play any role whatsoever in the assessment of a researcher’s work or the value of a research publication.
If you set up a metric to be used in performance assessment, researchers will change their behaviors to maximize their performance on that metric.
Right now we ask researchers to do a certain amount of self-promotion, but this is a means to an end. Promoting their work by publishing, giving talks at meetings, doing interviews and the like helps raise awareness of the work and hopefully increases its impact. And impactful research is the goal. But by rewarding researchers for doing the things attention metrics measure, we instead make attention itself the goal, rather than the impact we were hoping to gain through that attention. By focusing on the means rather than the end, we may end up favoring self-promotion over the production of meaningful research results.
Metrics must line up with goals and offer incentives for meeting those goals. How do we really want researchers spending their time? How much of the job is about research and how much should be about other things? The PeerJ blog suggests that funding agencies should ask researchers to prioritize things like “inspiring citizen scientists” as much as generating results, but I’m not so sure. If publicity is the goal, why not hire an expert? Wouldn’t it be more cost effective for a funding agency to hire a professional publicity firm, rather than offering a research grant and expecting something other than research?
(As an aside, that PeerJ blog post, particularly the now struck-through retracted text, is the perfect example of why peer review should be done anonymously, a clear case of how doing a review could lead to retaliation against a researcher).
The skeptical part of my brain always lights up when I hear proposals to reward researchers for doing things other than research. Is what’s being discussed an important part of a researcher’s job, or an attempt to change the game to better fit someone else’s skillset? If you’re not so great at doing groundbreaking research, but you’re really good at communicating and building community, and you enjoy arguing online all day, could this be a way to shift the criteria for career success toward something that favors you? I worry that altmetrics reward things like effort, participation and talking about science, rather than actually doing science.
Measurements of the quality of work, and fuzzy concepts like “impact,” remain at the heart of what we need for decision making in areas like funding and career advancement. Attention metrics offer us, at best, correlative information, and those correlations are unlikely to hold up if attention becomes an official metric for gauging success.
If attention is going to factor into how we judge the work of researchers, this will change the way that researchers plan their experiments and write their papers. If we set up a system that rewards popularity and sensationalism, researchers will understandably start to chase those goals, planning experiments like those at the top of altmetric scoring lists like this and this. We’ll see a rise in the flashy and sensationalistic: research on fad diets, Facebook and Sudoku, rather than meaningful but more mundane work that advances human health and society.
Don’t get me wrong: there’s great value in measuring attention, interest in science, and the communication of science. These are fascinating subjects, but they are not a replacement for measurements of quality and importance. We know the Impact Factor is flawed, but we must be careful not to replace it with something even more flawed.
The good news is that, from speaking with all of these panels, there seems to be little traction for serious use of attention metrics in researcher assessment. Funding agencies stated that they were very interested in altmetrics, but are not using them (or any metric) in funding decisions. Attention metrics were intriguing to researchers, but very far off their radar as far as having any real impact on their careers. Even those from metrics companies suggested that attention metrics were just a small part of the picture.
It may be that the overemphasis on attention metrics is slowing the growth and acceptance of altmetrics. The very term “altmetrics” may be so intertwined with attention measures that it is no longer useful for the very important task of finding better ways to measure the quality of researcher performance. To many, “altmetrics” means “how many Facebook likes did the article get?” The altmetrics toolset needs great refinement, and attention metrics may be best set off to one side and used only where strictly appropriate.
As one of the better metaphors I heard suggests, measuring attention tells us how well the movie did at the box office, when what we really want to know is whether it is any good.
Discussion
What “altmetrics” represent is a moving target. Yesterday, I did a “flash” session at the STM Spring Conference in DC for a new metric I’m developing, SocialCite. This metric is meant to help measure the quality of citations and the care with which they are used in context, as well as the type of citation being used. I awoke this morning seeing that I’ve been identified by some as “entering the altmetrics game.” That was surprising, because I view SocialCite and others (PRE-score is another that I’m helping to cultivate) as NEW metrics, but not ALTmetrics. They don’t challenge any existing metric (as others have pointed out, a lot of the altmetrics suggest they can be used as a proxy or alternative to the impact factor). They are new metrics that do other things.
Is “alt” just the cool prefix these days?
We need to and can measure more things. Does everything have to be “alt”? Or can we just have new metrics?
SocialCite, based on looking at the website, is an interesting concept and could be a useful tool for evaluating citations. I hope it works out.
It appears, if I understand the site correctly, to be different from what most people are looking for with a citation rate or “altmetrics”. They want a single number, or set of numbers, to easily and conveniently compare articles (or journals) on quality, impact, etc. The problem, of course, is that you can’t describe a complex concept like the quality or importance of research with a single number.
I’d disagree that “most people” who are interested in altmetrics are looking for a single number. Most in the altmetrics advocacy camp agree that the “one number to rule them all” approach is flawed.
There’s an inherent compromise with any metric. The benefit you gain is an ability to deal with scale, and to gain some understanding of fields in which you lack a depth of knowledge. If I’m asked to review 500 job applications, or grant applications in a field I don’t know all that well, then having a reliable system that helps me quickly get a handle on the quality of work done by different individuals would be very useful. Unfortunately, those tools remain crude.
But I think we both have a level of unease in turning to quantitative methods to answer qualitative questions. Metrics can provide insight, but at some point in the process, a human is going to have to make a judgement. Although they are currently out of fashion, we shouldn’t immediately dismiss methods just because they aren’t based on an algorithm. I’m told there’s great value in the NIH Biosketch, which is increasingly required for applications. As part of that, the researcher must succinctly explain who they are, what they do and why their work is important. This can be a valuable tool for narrowing down a huge stack of applicants to a workable number.
And if one thinks that the best way to evaluate the work of a researcher is to have a team of subject experts carefully read their papers and review them for value, placing them at the appropriate level ranking for the field, then isn’t this almost exactly what, if done properly, the peer review process does? How well a journal does this process and how high a standard they set for acceptance leads to journal reputation and brand. It may be unpopular with the tech crowd, but I suspect there’s still some value in journal brand, even if we do away with the Impact Factor.
I agree with you David. I definitely see a value in metrics. As you said, they have to be used in context, and for complex decisions like promotion and tenure there need to be human beings evaluating a whole range of material, including some metrics.
I also agree there is value in the branding provided by journals through good peer review and editing, but it also ends up creating a very inefficient system when acceptance rates are generally under 50%, and in some cases as low as 10%. When you get very selective, the process also becomes pretty unreliable. It gets very frustrating for authors. I think that is why megajournals have become so popular.
Most of the folks that I’ve spoken with agree that “altmetrics” was actually an unfortunate name. “Alt” sets you up for a fight (will X replace Y?) when the reality is we just need appropriate metrics. The “appropriate” metrics, like anything else, will (and should) vary with the use case. What are you trying to understand, prove, measure, improve? Altmetrics has become the bucket under which almost anything but the Impact Factor gets lumped together, and that’s unfortunate.
Good point, Ann, and it goes to something I find myself saying a lot in libraries: it doesn’t much matter whether an idea represents an innovation, or a retrenchment, or a wholly new practice, or the continuation of an old practice. What matters is whether, if implemented, the idea is going to improve our ability to accomplish the tasks that our library is trying to accomplish. We absolutely have to be open to new thinking, but not because new thinking is always right — we have to be open to it because we shouldn’t be privileging old thinking. All ideas (new or old) need to be given due consideration on a level playing field, and selected or rejected based on their strategic fit to mission. Same goes for altmetrics, IF, etc.: what matters is what works. If it doesn’t do what you need done, it doesn’t matter one way or the other whether it’s in or out of fashion, whether it’s newfangled or old-fashioned, or whatever.
David, I don’t disagree with your assessment of altmetrics, but I think citation rates are almost as meaningless as a measure of “quality” or even real impact, for somewhat different reasons.
Articles get cited for a lot of reasons. Sometimes it’s because they have real impact on a field and describe a seminal piece of research, but it can be for a lot of other reasons as well. Just from my personal experience that is clearly true. I’ve published about 70 articles, and looking at their citation rates, they seem to have little to do with what I would consider quality and impact. The ones that are cited a lot tend to have to do with hot topics.
The one that is cited by far the most, probably around 150 or 200 times (it’s in an esoteric journal not indexed in Scopus), is an article on web-based surveying that was published in 1999 or 2000. I was doing some web-based surveying, which wasn’t being done that much back then, and for the hell of it wrote an article summarizing the little research on the topic, with some pointers on how to do it and links to a few PHP scripts readers could use, since this was well before Survey Monkey. Web-based surveying took off, and I guess people needed some reference to cite for the methodology, and that’s why it got cited so much.
My point is that high citation rates can come about for reasons as quirky as that, yet they are taken as such an important measure of quality. It’s not that citation rates are a bad measure, just that, like altmetrics potentially, they are interpreted as measuring more than they really do.
You might be interested in a new venture called SocialCite which aims to characterize individual citations:
http://social-cite.org/
Thanks, it is an interesting concept. I hope it works out.
David:
I am not so sure you have correctly evaluated the impact of your article. Being one of the first, it probably served as a spark for others. I am reminded of the story (maybe even true) of Melville, who, when asked about Moby Dick, replied that he wrote a book about ships and whales, nothing more!
Thanks Harvey, but I don’t think it was that great of an article, just the right topic at the right time. Moby Dick on the other hand really was a great book. 🙂
David C:
Data in search of meaning, when there may not be any, seems to be the driving force behind altmetrics. Additionally, it seems social media can be manipulated. We see this all the time in “viral” YouTube ditties.
Although there are shortcomings in the IF, at least there is a methodology designed to demonstrate something that scientists deem of worth. Those who rail most vociferously against it seem to be those who are, or were, adversely affected by it. In short, are many of the complaints against the IF sour grapes?
Lastly, the researchers I have been associated with seem to love what they do and when asked to “hype” their work kindly decline.
In some 40 years of doing this thing we call publishing, I have met but a handful of great communicators, and they often become presidents of societies and appear before Congressional committees.
How do we distinguish between “good” cites (“This article helped me in my research.”) and “bad” cites (“This article is totally wrong!”)? As I recall, one of the most cited articles in Science was the one on cold fusion.
David, an additional point I feel your post doesn’t really touch on, and that I think is worth mentioning, is that now with Altmetric.com (I’m not sure about all the other altmetrics providers) you can actually see who is mentioning and talking about an article. Altmetric.com recently added Chinese Weibo posts as a source too, widening the discovery net. The point being, this is not only measuring an alternative or new metric (which I agree is still an evolving field and a moving target), but it is also connecting science, researchers, authors, readers and publishers together in a more global, and technically easier, way.
This is indeed a useful benefit of attention metrics. Research is in many ways a reputation-based career. In controversial fields like climate change, it’s probably helpful for researchers to be able to track the public conversation: how are people re-using my work and my name, and can I check what they’re saying to make sure it’s accurate?
This is a very useful critique of the value of the metrics we have been trying to use to measure, well, value. It reminds me of something we in scholarly book publishing have known for a very long time: sales do not necessarily correlate with quality. This is like the example David gives at the end, of the movie doing well at the box office. Popularity is just not a good measure of quality. Unfortunately, in scholarly book publishing, the economics of the business have compelled publishers to pay more attention to popularity (i.e., sales potential) than to quality in making decisions about what gets published. One major virtue of open access is that it shifts this dynamic away from market-based decision making and gives more weight to quality. Probably the best measure of quality in academe for books comes in the form of reviews in the major scholarly journals, but the irony is that reviews often take so long to appear that decisions on tenure, for example, can’t wait for them to be available. Journal articles, of course, are not reviewed in this way, so measuring their quality remains an even greater challenge than it is for scholarly books.
Sales do not necessarily correlate with impact, but what about when they DO? What about the biomedical literature that’s not necessarily highly cited, but is highly read, and used in a clinical setting by nurses and doctors? Or popular papers that go on to have an impact, positive or negative, on law and policy? (I’m thinking here of the Regnerus study and how it was used by the courts in Michigan recently to strike down an anti-gay law, with the judge pointing out how biased the original study is, and how it can’t serve as a basis for enacting such laws.) The point is, “quality vs. non-quality” is a straw man. There are a lot of different types of impact, and the study of altmetrics has a long way to go towards helping us define these impacts. To dismiss it outright, when the field is still in its infancy, is a bit like throwing the baby out with the bathwater.
On another note, I wholeheartedly agree with Crotty: the measure should not become the target. This is true with altmetrics, the same as with IFs and citation counts. University administrators need to ensure that this point is driven home with their tenure-track faculty. We can’t blame the researchers themselves if they’re just trying to get by in a system that stresses quantitative over qualitative demonstrations of impact.
The point about clinical journals is an important one. An article could never see a single citation but could improve treatment for millions of people. For many engineering journals, the work is solutions-based rather than hypothesis-based. If your paper finds the definitive solution to a problem, that may not see a lot of citations because the problem is solved, hence little future work in the area. Should these sorts of journals be judged by the same criteria as a journal for say, theoretical physics? It’s why consolidating down to one metric or one number is an archaic and unhelpful approach.
Though to be fair, I’m not sure how one would use metrics to show the impact of these sorts of papers, particularly whether usage/readership would give us helpful insight.
Yet this is the issue that is troubling many funding agencies in the UK and US (particularly NIH). The “shelves” are filled with vetted research, but the difficulty of getting it into practice is throttling the value of applying this knowledge (theory to practice in applied disciplines). In education, particularly K-12, practice is being dominated by the for-profits and very practically minded VCs, with scant nod to academic theory. Somewhere, The Academy may need to rethink this before the resource flows start to be repurposed, in more than a small part.
Your observation is interesting. NIH and the National Academy of Sciences have too many examples of research just sitting on the shelf until someone sees an application. We see shifts in grants from one subfield to another. In short, the research plays out and the scientist moves on, if able, to a new and related subfield. I think it is called re-tooling.
There are various statistics as to the number of scholarly journals (somewhere around 25,000), the number of articles (about a million or so per year) and the average number of reads per article (from 3, the editor and reviewers, to about 12). Thus metrics of any kind start with select journals in various fields, which are ranked and which filter out materials that may never surface, even when “forced” by clever use of citations and other “marketing tools”. It’s a game where the stakes are promotion and tenure and large grants to academics. It’s also a game played by publishers. So all this allusion to quality and relevance is so much persiflage. Ego, self-worth and similar issues block the possibilities of changing the game for academics, and wading through the rankings lets granting agencies and institutions avoid the work of evaluating the research itself. Alt or new metrics which do not change the game represent rear-guard actions in the world of post-secondary education and research, which needs to face up to the change that is on the horizon.
It would be interesting to see if altmetrics are on the radar of either tenure and promotion committees or hiring committees for medical school faculty. I know from working with various medical schools that the hiring of new faculty often includes an evaluation of their research publications, and committees do pay attention to impact factors. Also important is the candidate’s success rate with NIH research grants. Certainly there is a lot of interest in altmetrics, but is this new metric being used at the working level?
We recently saw that UC Denver’s med school includes metrics in their P&T dossier prep guidelines (see p 84) http://www.ucdenver.edu/academics/colleges/medicalschool/facultyAffairs/Documents/DossierBuildingGuide2013.pdf
As for other med schools, I haven’t sought out those examples but would be keen to know if others have found such evidence.
Today’s technology allows us to have data that we would always have wanted but was impossible to get: specifically, how many times any given article is accessed. In a paper-based world the only metric was citations, because there was no way of knowing how many times an article was read. Libraries still do shelving counts for bound volumes to detect use, but that is still by journal title, not even by issue, let alone by article. An article I wrote a few years ago has been cited 11 times, which for my area of librarianship is massive, but that same article was downloaded several hundred times in its first year. Downloads are, of course, only a measure of interest, not of impact, but they’re certainly a better indication of impact than zero downloads. If an article is cited in presentations, it’s still being cited, but that impact is currently invisible. We do need to find meaningful ways of folding additional information into the evaluation process, but we also need to stop clinging to those older, exclusionary models like lichens to a stone.
To be fair, no one is clinging to the old; what one is doing is questioning the relevance of the new. Several hundred downloads, but how many reads? In the early days one talked of eyeballs, only to find that eyeballs did not mean sales. With this observation the dot-com bubble burst!
Thus, William Bruce Cameron’s observation is astute and to the point.
‘what we really want to know is whether it is any good.’
Not to be pernickety, but if you could just define ‘good’ for me?
A fair question. To quote US Supreme Court Justice Potter Stewart, “I know it when I see it.” https://en.wikipedia.org/wiki/I_know_it_when_I_see_it
That gets to the fundamental question of how much one can rely on algorithms and numerical scales to make what is essentially a qualitative judgement call.
So that’s the thing — leaning on particular metrics prematurely to the detriment of the whole enterprise. ‘Good’ is obviously as much about the reader as the material, so different metrics may come to be useful for different people, but only if not drowned at birth. We need the freedom to play around with this stuff (for a start, hardly any scientists are social so it’s bound to be patchy).
Experimentation is greatly welcomed, better solutions desperately needed. I just worry about declaring something meaningful because it can be measured, rather than because it’s meaningful. Or assuming there’s a quantitative solution to every qualitative question. As William Bruce Cameron famously said, “not everything that can be counted counts, and not everything that counts can be counted.”
And it should be encouraged because it is another experiment. While we don’t want to declare mission accomplished too quickly, we also don’t want to shut down explorations because they don’t immediately bear fruit. Rather than say that altmetrics or attention metrics are a good or bad thing, why not say that they are a new thing and worth studying for their own sake?
Well put Joe.
It’s all about vanity, right? We post to Facebook and look to see how many of our friends “like” what we had to say. We post on Twitter to see if we get retweeted. LinkedIn emails us all the time to say that someone “viewed” our profiles. A non-digital version of this in years past would have been creepy. Some of us blog and look at the number of comments or views to determine our “sphere” of influence. It is a fact that if you want people to listen to what you have to say, you need to say something interesting. I suppose this facet of human nature plays perfectly into the social networks we enjoy.
So it is certainly human nature for researchers to care very deeply about how they are influencing their community, and now even audiences beyond it. And now we have the metrics available to measure this. Of course, the downside to this potpourri of metrics is assessing their value and usefulness. One editorial board member recently suggested giving an award to the paper with the most downloads each year. I suppose that is one way to get our usage up, as authors download their own papers hundreds of times.
What happens when the metrics are aggregated and analyzed and funding gets cut to discipline A or problem B because the social metrics are low? What if a publisher has 90 percent of the articles in a specific journal with donuts holding a big fat zero? Does the publisher look to cease publishing the journal because “no one cares”?
I guess my point is that metrics are great if what they measure is clearly defined, and if they aren’t used to extrapolate value they don’t actually measure.
But isn’t “extrapolating value” exactly what we’ve been doing all these years by depending primarily on citation counts and JIFs to understand the impact of articles? Kent’s SocialCite project highlights the fact that a citation can have many meanings; the same goes for any other metric, including altmetrics. Much research remains to be done, for both citations and other types of metrics, into the “why”, now that we can more easily count the “how many”.
>> What happens when the metrics are aggregated and analyzed and funding gets cut to discipline A or problem B because the social metrics are low? <<
On the one hand, this is pure speculation. On the other hand, I'd rather have funding get cut because of evidence than because of "intuition" or political agendas.
Interesting point, but I don’t think it’s all about vanity; I think these social metrics are useful not just as a count of, say, tweets mentioning your paper, but for the connections those mentions provide. Being able to track and interact with people interested in your research is potentially valuable as a precursor to future collaboration or, at the very least, networking with people who find your research of some value to them. It’s still hard to know what the metrics are in fact measuring, but maybe it’s time to stop simply counting ‘likes’ and tweets and start extracting people-centric information from them. These are *social* metrics after all.
I’m always a bit dubious about the value of online networks and collaboration, though this may just be a bias from the type of research I’ve done. For a bench biologist, time is a precious resource, as are many reagents. You’re often working with rare tissues that are hard to come by, or things that take months to build up enough of a supply to do experiments. Collaboration is a carefully vetted process. I recall at one meeting a scientist telling the audience that she’s more selective in who she collaborates with than who she sleeps with.
This may be very different for other fields where collaboration is less intense a process and reagents are readily duplicated. For a computational researcher, digital data or an algorithm can be easy to share and a collaboration may require little time or effort.
So the value offered is going to be widely variable, depending on the type of work one does.
This may be of practical interest: “Independent review of the role of metrics in research assessment” (by the Brits). Comments welcome.
http://www.hefce.ac.uk/whatwedo/rsrch/howfundr/metrics/
I think it is a mistake to characterize all non-citation-based metrics (aka altmetrics) as strictly attention metrics. Gathering metrics from a wide variety of sources, categorizing them by the type of activity they represent, and then providing tools that allow people to compare like-with-like gives tremendous insight into scholarly communication. It can be especially helpful for early-career researchers to get traction long before they hit a critical mass of citations to their work.
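As a rough sketch of what that might look like in practice (the source names and category mapping below are hypothetical, not any vendor’s actual schema), the idea is to roll raw per-source counts up into broad activity categories before any comparison is made:

```python
# Hypothetical sketch of "compare like-with-like": raw per-source counts are
# rolled up into broad activity categories before articles are compared.
# The source names and category mapping are illustrative, not a real schema.
from collections import defaultdict

CATEGORY_OF_SOURCE = {
    "twitter": "discussed",
    "facebook": "discussed",
    "mendeley": "saved",
    "citeulike": "saved",
    "pdf_download": "viewed",
    "html_view": "viewed",
    "crossref_citation": "cited",
}

def categorize(raw_counts: dict[str, int]) -> dict[str, int]:
    """Collapse per-source counts into per-category totals."""
    totals = defaultdict(int)
    for source, count in raw_counts.items():
        totals[CATEGORY_OF_SOURCE.get(source, "other")] += count
    return dict(totals)

article_a = {"twitter": 240, "html_view": 1800, "crossref_citation": 2}
article_b = {"mendeley": 95, "pdf_download": 1200, "crossref_citation": 14}

# Comparing "discussed" to "discussed" and "cited" to "cited" avoids the
# apples-to-oranges problem of a single blended score.
print(categorize(article_a))
print(categorize(article_b))
```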
We debated this point during the closing of the Allen Press Emerging Trends in Scholarly Communication seminar that David moderated last week.
You can view the video from the closing remarks here: https://www.youtube.com/watch?v=eWBsx9Ejv1k
Agreed, and as noted in the post above, there’s perhaps been too much attention paid to attention, rather than looking more broadly at other measures that perhaps aren’t quite as obvious or easy. Similarly, perhaps too much attention paid to replicating the use of the Impact Factor, rather than thinking in a more open-minded manner about the enormous variety of things that metrics can help us with.
Good to know the video has been posted.