A new initiative to make academic assessment less reliant on the impact factor has been launched. The initiative is the San Francisco Declaration on Research Assessment, adorably dubbed “DORA” by scientists who have, over the past two decades, become masters of the acronym. The DORA Web site currently lists about 400 individual and 90 organizational signatories to the declaration.
As described by Bruce Alberts, the editor-in-chief of Science, in one of the editorials published in a coordinated fashion last week, the declaration was:
. . . initiated by the American Society for Cell Biology (ASCB) together with a group of editors and publishers of scholarly journals, recognizes the need to improve the ways in which the outputs of scientific research are evaluated.
The main impetus for the group stems from what they perceive to be an over-extension and misapplication of the impact factor in the assessment of scholarly achievement across disciplines. As Alberts writes:
. . . [the impact factor] has been increasingly misused . . . with scientists now being ranked by weighting each of their publications according to the impact factor of the journal in which it appeared. For this reason, I have seen curricula vitae in which a scientist annotates each of his or her publications with its journal impact factor listed to three significant decimal places (for example, 11.345). And in some nations, publication in a journal with an impact factor below 5.0 is officially of zero value. As frequently pointed out by leading scientists, this impact factor mania makes no sense.
The head of the ASCB, Stefano Bertuzzi, sought to be both aggressive and conciliatory when quoted in a Science Insider article, indicating that this is not an attack on Thomson Reuters or the impact factor itself:
I see this as an insurrection. We don’t want to be at the mercy of this anymore. We’re not attacking [Thomson Reuters] in any way. [We are attacking] the misuse of impact factors.
This all sounds laudable, but there are problems. One of the problems is a lack of novelty in the thinking — that is, these tendencies are well-known and have dogged the impact factor for years, as shown in this quote from a paper published in 2005:
The use of journal impact factors instead of actual article citation counts to evaluate individuals is a highly controversial issue. Granting and other policy agencies often wish to bypass the work involved in obtaining actual citation counts for individual articles and authors. . . . Presumably the mere acceptance of the paper for publication by a high impact journal is an implied indicator of prestige. . . . Thus, the impact factor is used to estimate the expected influence of individual papers which is rather dubious considering the known skewness observed for most journals. Today so-called “webometrics” are increasingly brought into play, though there is little evidence that this is any better than traditional citation analysis.
Even so, there are indirect measures that can be applied given what the impact factor measures, as one researcher into citation metrics has noted:
Impact Factor is not a perfect tool to measure the quality of articles but there is nothing better and it has the advantage of already being in existence and is, therefore, a good technique for scientific evaluation. Experience has shown that in each specialty the best journals are those in which it is most difficult to have an article accepted, and these are the journals that have a high impact factor. Most of these journals existed long before the impact factor was devised. The use of impact factor as a measure of quality is widespread because it fits well with the opinion we have in each field of the best journals in our specialty.
The impact factor has clear utility, especially for distinguishing between journals within a field. Abandoning it for the wrong reasons would be akin to throwing the baby out with the bath water.
Yet, instead of focusing solely on researchers, funders, and academic institutions — the main bodies with control over how research assessment occurs — the draft DORA statement also draws publishers in as extrinsic elements reinforcing this purported environmental problem. The declaration also has some unsupportable elements, something Philip Campbell, the editor of Nature, noted in an interview in the Chronicle of Higher Education, in which he explained that Nature has not signed on to DORA:
. . . the draft statement contained many specific elements, some of which were too sweeping for me or my colleagues to sign up to.
Campbell picked up on one of the statements that caught my eye, as well — that journals should:
. . . [g]reatly reduce emphasis on the journal impact factor as a promotional tool, ideally by ceasing to promote the impact factor or by presenting the metric in the context of a variety of journal-based metrics (e.g., 5-year impact factor, EigenFactor, SCImago, h-index, editorial and publication times, etc.) that provide a richer view of journal performance.
So, while this is ostensibly not an attack on Thomson Reuters, one recommendation is for journals to cease promoting the impact factor. That feels like an attack on both journals and Thomson Reuters. The editorial published in the EMBO Journal spends an inordinate amount of time complaining about the impact factor. In Molecular Biology of the Cell, the acronym-happy scientists create a mock impact factor and call it the Metric for Evaluating Scientific Scholarship (MESS), a definite slight against the impact factor. Despite acknowledging the utility of the impact factor in evaluating journals in many ways, some publishers have over-reacted, with eLife saying:
If and when eLife is awarded an impact factor, we will not promote this metric.
Other vocabulary emerging includes a “declaration of independence” and “scientific insurgents,” again creating the impression of throwing off an oppressive regime of publishers wielding the impact factor.
The core issue isn’t the existence of the impact factor or its utilization by journals, but its lazy adoption by academics. Conflating other issues is equally sloppy, and eLife is certainly falling on its sword for no particular reason.
More disconcerting from a group of sophisticated academics is some confusion within various statements — for instance, in the quote above, the DORA authors list a “variety of journal-based metrics” but include the h-index, which is not a journal-based metric but a hybrid article- and researcher-based metric. In addition, while decrying the use of one journal-based metric, they ask publishers to deploy more journal-based metrics to . . . make it less likely that academics will use journal-based metrics?!? Things like this make the entire document seem hasty and substandard.
There are specific recommendations pertaining to publishers that also seem unnecessary or ill-conceived:
- Remove all reuse limitations on reference lists in research articles and make them available under the Creative Commons Public Domain Dedication. Publishers can spend a lot of time and not an insignificant amount of money ensuring that reference lists are accurate. What incentive is there for them to then make these lists freely available? Who does it help? Commercial entities wishing to make products off the backs of publishers?
- Remove or reduce the constraints on the number of references in research articles. Constraints on reference lists are intended to force researchers to point to the best sources and not throw in everything but the kitchen sink. These limits also help publishers control the costs of checking references. As with many points in the declaration, this one is actually irrelevant. If the goal is to make academic institutions stop using the impact factor as a proxy for academic achievement, why would this issue matter?
- Relax unnecessary limits on the number of words, figures, and references in articles. This isn’t an enumerated recommendation, but is included in the narrative preceding the list. The idea itself is completely author-centric — shorter articles are more readable and usable, but authors often find the work of shortening or revising their work to be burdensome and difficult. There is an editorial balance to be struck, and DORA should not stick its nose into that balance. It’s a superfluous issue for the initiative, and adds to the impression that an ill-defined animus informs the declaration.
There’s a deeper problem with the DORA declaration, which is an unexpressed and untested inference in their writing about how the impact factor may be relevant to academic assessment groups. They assert repeatedly, and the editorials expand on these assertions, that tenure committees and the like connect the impact factor to each individual article, as if the article had attained the impact stated. I don’t believe this is true. I believe that the impact factor for a journal provides a modest signal of the selectivity of the journal — given all the journal titles out there, a tenure committee has to look for other ways to evaluate how impressive a publication event might be. In fact, research has found that impact factor is actually a better marker than prestige in some fields, especially the social sciences, because it is more objective. If we dispense with something useful, we need something better in its place. That is not on offer here.
In a more narrative editorial about DORA published in eLife, the wedge politics of DORA and the reliance on personal stories both become more clearly evident:
Anecdotally, we as scientists and editors hear time and again from junior and senior colleagues alike that publication in high-impact-factor journals is essential for career advancement. However, deans and heads of departments send out a different message, saying that letters of recommendation hold more sway than impact factors in promotion and tenure decisions.
Another way of saying this is that you need both publication in high-impact journals and letters of recommendation for career advancement. Since that’s not an inflammatory statement but common sense, and controversy is a prerequisite, the situation is cast as an either-or choice — you have to drop one to have the other, a clearly false premise.
Another anecdote comes next:
. . . researchers on the sub-panels assessing the quality of research in higher education institutions in the UK as part of the Research Excellence Framework (REF) have been told: ‘No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs’. However, there is evidence that some universities are making use of journal impact factors when selecting the papers that will be included in their submission to the REF.
Again, the prohibition is against the sub-panel’s use of impact factors. But if a university is trying to submit a tight, powerful application, and if the impact factor provides a reliable path for this activity, the complaint isn’t clear to me at all. If I’m whittling down a list, I’m going to use some tools to whack away the underbrush, including impact factor, authorship factors, recency, and so forth. Conflating this activity with the activity of a sub-panel isn’t fair, but doing so highlights the group’s singular focus on the impact factor.
Alternatives are mentioned in many editorials, and of course various alt-metrics groups have signed onto DORA. Yet, alt-metrics still provide metrics, and any metric is going to be susceptible to manipulation and perceived misuse. In addition, the transparency demanded by DORA would likely be challenging for alt-metrics providers, who have their own “secret sauce” approach or unclear standards in many cases.
To me, the authors of the DORA declaration have not done sufficient legwork to see exactly how the impact factor is being used by tenure committees, have not given sufficient thought to their goals, have offered mixed messages about how to improve the situation, have thrown journals into an issue that could be dealt with separately, and have allowed some petty resentments to mix into what could have been a loftier approach to improving academic assessments.
That’s too bad, because the best point of the declaration — that academics should be evaluated using a variety of inputs and mostly on their potential and fit — risks being lost in what is being interpreted broadly as an attack on the impact factor. And if academia thinks the problem is not its own practices and cultural attitudes, it could miss the point of the DORA declaration entirely.
26 Thoughts on "Impact Crater — Does DORA Need to Attack the Impact Factor to Reform How It Is Used in Academia?"
Impact Factors, 5-year Impact Factors, and Article Influence Scores are all highly correlated; advertising more related metrics does not add much more information for authors. Instead, it just gives a journal another chance to claim to be top in its field.
Conversely, editorial and production times are unique to each journal and do provide authors with extra information for making an informed choice about submission.
There does seem to be a pervasive conspiracy theory that journal publishers are responsible for the Impact Factor when the truth is that they are instead merely reacting to the priorities set by academia. At nearly every editorial board meeting I attend, much time is spent complaining about how misused the IF is, yet the conclusion every time is that we can’t ignore it because our authors and readers want it. Our job is to provide the academic community with the services they need. If academia can reform the standards used for promotion/funding, then journals will follow.
As you note, that’s easier said than done. One phrase that’s pervasive throughout the DORA declaration is the notion of assessing research on its own merits. That would be ideal, but it requires that a depth of expertise and time be spent at each level of evaluation, which makes for a much slower, much less efficient system. When you’re on a grant study section and you have a two-foot-high stack of grant applications, you need a quick and easy way to narrow things down. When you advertise a job opening and get 500 seemingly qualified applications, you need some shortcuts to bring that down to a manageable level. There’s no way you can read every single paper published by every single applicant in depth, or consult on each paper with an expert in that field. That time simply doesn’t exist. And so we need standards and shorthand ways of clearing through these unmanageable quantities, which continue to increase. The IF may indeed not be the best measure for doing so. I think the move to article-level metrics should be applauded, as the article itself, and its citation count, tells me more about the individual work than the citations given to articles around it.
The other problematic issue raised in DORA is the idea of changing the criteria for evaluation to include activities beyond the production of scientific results. There’s some merit here, to be sure, for particular types of research where the outputs vary. But it’s also something of a slippery slope: we often see calls for career rewards for things like blogging or leaving online comments on papers. I worry about an attempted “end run” by scientists who aren’t particularly productive, who are trying to change the evaluation criteria to better fit their strengths, which are in areas other than discovery. Research institutions and funding agencies have to carefully assess their priorities and strictly limit the credit and rewards offered to activities that directly serve their needs and goals. Just because you’re particularly good at doing X doesn’t mean that your institution sees any return from X, or has any incentive to reward you for X.
It seems to me that the alternatives to IF are measuring popularity and not worth. So I would posit the question to those detractors of IF: What is the value of popularity?
Regarding article length: writing is hard work, and most people, including scientists, do not like to work hard at writing. Writing is an onerous task, and the most challenging comment a reviewer can attach to a sentence or paragraph is the single word: vague.
I see the attack on IF coming in tandem with OA. If there is no standard, then the journal Science becomes just as authoritative as the magazine Popular Science.
Is there part of a paragraph missing toward the end? (Fourth from the bottom — it cuts off in the middle of the story…)
I just saw Eugene Garfield, the founder of ISI and the Impact Factor, a few days ago at MLA in Boston. Amazing that something that was started all those years ago is still so strong and influential. I know that many medical school search committees often use the IF as a filtering device when searching for new faculty. Department chairmen and deans often are looking for highly talented individuals, and the IF is used as a tool to identify candidates. Papers published in journals with low IF are discounted. The same can be said for tenure and promotion committees.
Researchers actively look for high-impact-factor journals in which to publish.
There have been challenges to the IF in the past, but it is still the gold standard. Publishers work hard to maintain quality and build their brands, so why not promote their IF? Users like to know.
Does anyone in DORA realize the world’s largest search engine uses an algorithm not very different from the Impact Factor? Links (cites?) to pages are a very significant ranking criterion, and the importance (i.e., the links to them) of those citing pages counts as well. So, if your page scores highly in a search, it means it has been cited by other well-regarded pages. Sound familiar?
This is a really good point, and you just made me dope-slap myself for not thinking of it!
The networked elaboration of the impact factor concept has been extremely powerful, of course. It also plays out in networks of influence on the personal level. There’s a profound aspect to the concept that shouldn’t be overlooked, and which might help explain its durability.
I think it’s an interesting comparison. But doesn’t Google weigh both the source/site of the article and the article itself? A page gets a better ranking if it’s in the NY Times, but the links to the actual article itself are a bigger driver of ranking, correct? The IF though just looks at the site, so a move to be more Google like would look at a combination of both journal level metrics and the article level metrics.
I believe Google utilizes the PageRank algorithm, which does indeed account for source/site. I believe PageRank inspired the SCImago Journal Rank as well.
Google’s algorithms are indeed more complex than IF, and include sophisticated routines to prevent gaming like abusive self-cites. (Hmmmmm. Might be useful!) But the basic idea is that being cited is good.
Nope. Google looks at the specific item (‘page’) and is not fussed about the ‘journal’ (‘site’) in which that page appears (so long as it’s not buried too many pages down!). That’s the whole point of this debate. Impact Factor (or the Ephemerality Index, as I’ve long described it …) doesn’t look at individual papers, even if we accept the problem-ridden sense of equating ‘citation’ with ‘worth’. Being cited isn’t everything, and being a journal that has many papers highly cited (within two years of appearing, remember) isn’t everything either. But it is not merely that IF isn’t *everything*, but that it genuinely isn’t *anything*. The argument is not that IF is a slightly unreliable index; it is that IF is a grotesquely misleading pseudo-index.
When IF first emerged, there was at least a plausible knock-on argument in its favour. In ‘paper-only’ days, being physically close to other papers in a widely read (and thus cited) journal surely raised the chance that readers might stumble upon your efforts. But now most researchers don’t read the physical journal or even have to scan-read a volume’s article listings. Readers are electronically directed to the paper of ‘choice’, driven by search engine criteria (maybe author, maybe broad topic, maybe specific buzz-words). There are far fewer serendipitous ‘near-neighbour’ delights in library work these days! The decline of this knock-on benefit reduces still further any *potential* relevance of IF.
And remember, Google ranks don’t stop at 2 years ….. they’re cumulative.
My understanding is (assuming one trusts Wikipedia at this point):
The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it (“incoming links”). A page that is linked to by many pages with high PageRank receives a high rank itself.
That means the PageRank is not just about the number of links a page receives, but also factors from where those links originate. This creates a context, not that different from that offered by the journal title–a page in the NY Times, which is linked to throughout the NY Times, a reputable site, gets more from those links than a page in a lesser site linked from that lesser site. That would suggest that there’s value in contextual level ranking and looking beyond just the article level itself.
The PageRank algorithm (named after Larry Page) principally counts the inflow of links to single pages. It’s radically different from the calculation of the Impact Factor, which has an averaging effect and a hard time cutoff.
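To make that contrast concrete, here is a minimal Python sketch. The toy citation graph, damping factor, and iteration count are illustrative assumptions only; the real PageRank and the actual Journal Impact Factor calculations both involve many more rules.

```python
# Simplified PageRank: credit flows recursively through the link graph,
# so a page cited by well-cited pages ranks higher. The impact factor,
# by contrast, is a plain average over a journal's recent items with a
# hard two-year window -- no recursion, no weighting of who cites.

def pagerank(links, damping=0.85, iters=50):
    """links maps each page to the list of pages it links to (cites)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for src, targets in links.items():
            for t in targets:
                # Each page splits its own rank among the pages it cites.
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank

# Toy graph: B cites A and C; A cites C; C cites nothing.
ranks = pagerank({"A": ["C"], "B": ["A", "C"], "C": []})
# C ends up highest because it is cited by both pages, including the
# better-ranked A; B, cited by nothing, ends up lowest.
assert ranks["C"] > ranks["A"] > ranks["B"]
```

Notice that a citation from a well-ranked source is worth more than one from a poorly ranked source, which is exactly the property an averaged, windowed journal metric cannot capture.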
If you want to look into algorithmic appropriateness, then the seminal paper by Bollen et al. applied both the PageRank algorithm and the impact factor algorithm to a large set of data (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0006022).
I’ll let their conclusion speak for itself
“Our results indicate that the notion of scientific impact is a multi-dimensional construct that can not be adequately measured by any single indicator, although some measures are more suitable than others. The commonly used citation Impact Factor is not positioned at the core of this construct, but at its periphery, and should thus be used with caution.”
If the DORA recommendations are accepted, this will result in nothing but paper spamming and publish-or-perish. There is a reason why some things go right into Nature and some “findings” end up in Journal of fill-in-your-discipline or PLoS One. Granted, sometimes there are hidden pearls and sometimes crap makes it to Nature. But overall, a high-impact article usually reports something of higher impact. And because faculty search committees are not necessarily total experts on each applicant’s work, impact factor is important for evaluation. Otherwise, it will result in a race to the bottom, just publishing whatever goes as fast as possible. Even right now, it is risky to pursue a high-effort-high-impact project, because even with your Science paper, in the end you might lose out to the guy with the three or four generic impact-around-4 style publications because of his perceived superior productivity. It’s bad enough that things like the h-index already reward those certain co-author types who show up on everything that is published within a mile radius, or the high-productivity types who spam out some generic self-referential iteration of their standard experimental repertoire thrice a year.
DORA says that the IF is OK for journals, bad for science. Every scientist, particularly those on study sections, will say “it’s not great, but it’s the best we have.”
It’s NOT THE BEST WE HAVE ANY MORE. We now have the means to look at individual articles, so why don’t we do that?
This isn’t a researchers vs. publishers thing. If you sum up the article level metrics for all the papers in a journal and divide by the output of the journal, you have the impact factor, right? So publishers can still tout that if they like, whereas researchers can actually be judged on their own work, not on the reflected glory of the journal brand, which we now know doesn’t allow you to make useful predictions.
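That arithmetic is easy to demonstrate with invented numbers, and the same toy example shows the catch: because citation distributions are skewed, the journal-level average can sit far above what a typical article in the journal actually achieves.

```python
from statistics import mean, median

# Hypothetical citation counts: citations received this year by the ten
# articles this journal published in the previous two years.
article_citations = [0, 0, 1, 1, 2, 2, 3, 4, 8, 120]  # one blockbuster

jif = mean(article_citations)      # the journal-level average, i.e., the IF
print(jif)                         # 14.1
print(median(article_citations))   # 2.0 -- what a typical article receives
```

A committee reading “impact factor 14.1” next to any one of these articles would be badly misled about nine of the ten papers.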
One would think that “this is good for researchers” would be enough to get behind it by itself…
A researcher publishes a paper in Nature three months before his tenure committee meeting. No citations have accrued. Does the impact factor have no utility here?
A researcher publishes a paper in Science, but it receives only three citations after five years because it’s so novel that it has started a new sub-branch of a field, and there are a bunch of funded projects ongoing but no steady flow of papers hitting journals yet. Does the impact factor have utility here?
Any metric has limitations. Article-level metrics have latency problems, and can’t account for many situations.
The problem isn’t the metric, but how it is interpreted. The DORA declaration has some problems that keep it from being as incisive and important as it could have been. Dragging the impact factor into the town square for its annual flogging is just one of these problems.
“The problem isn’t the metric, but how it is interpreted. ”
The debate about ‘Impact Factor’ [sic] and its deployment by all and sundry is precisely whether it is a ‘metric’ or not. Given how IF is defined (why <=2 years? etc.) and the multiple limitations on plausible interpretations of what it *might* indicate, it cannot justify being considered ‘a metric’ of any relevance to the assessment of the quality of an individual researcher, their published work, or even their work published in that particular journal. IF is so far away from contact with any of those evaluations as to be useless in that context. But worse, if it is deployed in these evaluations then it becomes positively misleading – if one chooses to be led!
Whether IF retains some, at best narrow, field-specific, significance for the *journal* is a separate consideration. As for "its annual flogging", well, Impact Factor is indeed a dead horse.
What article level metrics would you suggest be used in this manner? I do think that citation count for an individual article is much more informative about the work than the journal’s Impact Factor (though as Kent points out, it’s a slow metric). Are there any others in use that offer indications of quality (rather than popularity)?
And I’m not sure this post is really against the sentiment behind DORA, just in the methodologies chosen and in the finger pointing at publishers, whose role in the use of the IF is merely a reaction to the priorities of academia. Note that Kent’s other post from this same day points at those who have taken a different path toward improving things in this area and offers praise for their efforts (http://scholarlykitchen.sspnet.org/2013/05/21/populism-vs-activism-encountering-limitations-in-the-age-of-online-petitions-and-signatures/).
I think that DORA is demonising a metric that has been promoted simply because the stakeholders (researchers) demanded it. Indirectly, of course, I am concerned that DORA demonises Thomson Reuters and by doing so, promotes Scopus.
I’m not clear. Do you mean that the “stakeholders (researchers)” demanded Impact Factor’s use be promoted, or that DORA demonise it?
So far as I can report, researchers (the people doing the research and the publishing) haven’t ‘demanded’ the ‘use’ of Impact Factor for anything … at least not those of us in the numerate sciences! (At its most trivial, you can’t describe a highly non-normal distribution by quoting its mean value … etc etc)
I wonder if ‘too much to do, too little time’ is maybe the real disease here? This decades-long discussion of Impact Factors as a poor treatment of outcomes may be symptomatic of a disease for which proposed alternative treatments quickly become preoccupied with targeting the old formula instead of the disease itself.
A question for all involved in this discussion, given the controversy here:
Should attribution of the original source journal matter? If the goal is that work should only be judged on the merits of that individual paper, then knowing where it was originally published should not play a role in one’s interpretation of it, correct? So does a license requiring attribution to the original journal title for a paper work against the goals stated in DORA?
Science does not progress when one promotes people who specialize in generating repetitious least-publishable units and/or in knowing best how to game the bureaucracy and how to please failed-Ph.D. kingmakers at major journals, the mafiosi at NIH/NSF study sessions, and their peers by citing them or giving them openings for further façade publications.
Science progresses when important scientific breakthroughs are made.
Therefore, whatever contributes to increasing the probability of such breakthroughs is an important contribution to science (this includes onerous teaching beyond the textbook).
I propose to evaluate scientists and their output according to how many established theories (or “scientific” fads) they have refuted, how many seminal hypotheses and crucial new questions they have proposed, how many breakthrough new methods they have developed, etc.
5, 10, and 15 years after the Ph.D., the scientist would write his or her own explicitly argued and heavily footnoted evaluation describing his or her vision and merits as a breakthrough thinker and scientist, commenting explicitly on established-theory refutations, novel hypotheses and questions proposed, breakthrough new methods developed, etc., and contrasting everything to how things were before his or her work.
The factuality of the listed results and presented context, and the relevance that the evaluated person attributes to the topics and results mentioned in the self-evaluation, would then be critiqued by
a) a group of experts recommended and justified by the evaluated person, and
b) a group of international experts chosen by a panel of national experts, themselves chosen by the country’s professional organization.
(This could be refined, of course; the most important thing is to avoid both invidious and crony reviewing.)
The two reviews would then be exchanged between groups and contradictions would be eliminated.
Those who come up empty-handed would start performing more and more work for others who have delivered breakthrough work in the past (and the technical training of such “support specialists” would be augmented, and everybody would get paid the same to avoid careerists).
These “specialists” would also carry out work for starting postdocs.
Say one would start working 25%, 50%, 75%, and 100% for others after coming up empty-handed after 5, 10, 15, and 20 years….
Of course, after delivering an important breakthrough (“hard work” and a “steady output” do not qualify as such), one would regain all of the “lost” ground (quotation marks because it must be a nightmare to have to feign that one is a creative scientist when one is not, especially if one is paid the same either way).
Whenever I read anything by a researcher that refers to journal editors as “failed PhDs” or “failed postdocs,” I tend to immediately dismiss everything else that person has to say, as it’s a clear sign they are woefully out of touch with both economic realities and how journals work. First, many, if not most, journals have editors and editorial boards made up of working scientists, chosen carefully for their expertise and high level of respect from their peers.
Second is the notion of the “failed PhD” being someone not working as an academic PI. This is an outdated and quaint, if also terribly rude and insulting, position. As shown in these charts, fewer than 40% of PhDs get jobs at graduation:
Are more than 60% of the students at your institution failures? According to this article, the percentage of PhD biologists who hold tenure track positions is less than 30%:
I suggest you walk around your institution and call 70% of those you see “failures” to their faces. This may not enhance your reputation or your chances of career advancement.
As for your suggested system for career advancement, it seems to require quite a bit of magic. First, you will need a source of funding to provide support for 15 years of independent research from every single recipient of a PhD. Given the ever increasing number of students graduating, this will massively overwhelm current levels of research funding. Then again, that may not be a problem as likely every researcher will be spending most of their time serving on panels evaluating, conferring and reviewing the constant flow of self-evaluations flooding the system. Who would have time to do any research? Finally, requiring everyone be paid the same amount to “avoid careerists” means there is no incentive provided for excelling, no chance of bettering oneself through success, and is a surefire means of driving the best and brightest minds out of science and into the private sector where they can hope to improve their living conditions, maybe even support and feed their families.