Last Monday the Center for Open Science (COS) formally launched TOP Factor, an alternative to the Journal Impact Factor (JIF) based primarily on the Transparency and Openness Promotion (TOP) Guidelines. For full and early disclosure, I’m Vice Chair of the COS Board, but my interest here is in situating this new initiative in the longer history of attempts to shift research assessment – in other words, to fundamentally change research culture.

[Image: walnut and nut-cracking tools on a table]

The need for fundamental culture change

A recently released Wellcome Trust report, What Researchers Think About the Culture They Work In, makes sobering reading. Collectively, it paints a “shocking portrait of the research environment” according to Wellcome’s Director. A system that puts more value on research metrics and quantity than quality has led to unhealthy competition, bullying, harassment, and mental health issues.

“These results paint a shocking portrait of the research environment – and one we must all help change. The pressures of working in research must be recognised and acted upon by all, from funders, to leaders of research and to heads of universities and institutions. As a funder, we understand that our own approach has played a role. We’re committed to changing this, to foster a creative, supportive, and inclusive research environment.” Jeremy Farrar, Director of Wellcome

Rapid and wide-ranging improvement is required not only for researchers themselves but also because of the ways in which this hyper-competitive environment reduces the time available for thinking creatively, the likelihood that scientists will take risks to pursue their most imaginative ideas, and, in some cases, the reliability of results.

At the core of the challenges facing the research enterprise is a broken incentive system that rewards novelty and publication in a small number of highly selective journals. 29% of Wellcome survey respondents felt that publishers have a “high responsibility” for changing this culture, with another 42% believing that we have a “medium responsibility”. But, as noted in the publication of the original TOP Guidelines, the situation is a classic collective action problem. Individual researchers lack strong incentives to be more transparent (and, in fact, are rewarded for quite the opposite), yet there is no centralized means of aligning individual and community incentives. Universities, funders, and publishers each unwittingly align to maintain the status quo because all of their incentive systems rely on journal name and/or quantity of publication to evaluate, fund, and promote researchers.

The case against JIF has been clear for many years: primarily, that a factor derived from citations to all articles in a journal cannot tell us anything about the quality of any specific article, nor about the quality of the work of any specific author. These points become even more evident when we understand the ways in which JIF can be manipulated (for example, by publishing more review articles). Despite all of these evident limitations (and the fact that publishers of many different types widely accept that JIF is flawed), JIF remains highly influential.

There have been a number of bold attempts to reduce reliance on JIF, or at least to provide alternatives for consideration alongside it. Most notably, the Declaration on Research Assessment (DORA), launched in 2013, aims to promote robust and efficient ways of evaluating research and researchers that do not rely on JIF. While the declaration has garnered an impressive number of signatories (1,870 organizations and 15,562 individuals), its progress toward meaningful change has been frustratingly slow (no doubt influenced by the powerful allure of the status quo).

The development of alternative metrics – altmetrics – has also sought to provide a viable alternative or complement to JIF. The most important advance with altmetrics is that they are article-based rather than journal-based. There is clear evidence of their growing use – for example, within the UK Research Excellence Framework as a demonstration of wider public reach. Altmetric itself has interesting cases of how social media activity can support promotion and tenure, while other research suggests that the online activity measured by altmetrics may drive citations.

Along comes TOP Factor

The development of TOP Factor followed from the work of the original TOP Guidelines committee, published in 2015. At the outset, the committee established the eight standards and the levels of stringency that correspond to the scoring (both of which are maintained by a standing committee). TOP Factor scoring itself has been developed internally at COS over the last year by testing and maturing a rubric for evaluating journal policies’ adherence to the TOP Guidelines. The TOP Guidelines now have over 5,000 signatories – organizations and journals that have expressed support for the principles – along with about 1,140 journals that have TOP-compliant policies (although the vast majority only minimally so).
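
To make the scoring concrete, here is a minimal, purely illustrative sketch in Python. It assumes, per the description above, that each of the eight TOP standards is assigned a stringency level from 0 (not implemented) to 3 (most stringent) and that a journal’s score is the simple sum of those levels. The standard names below paraphrase the TOP Guidelines, and the function, the example journal, and the simple-sum assumption are illustrative only; the authoritative rubric is the one COS publishes on the TOP Factor website.

```python
# Illustrative sketch only: assumes a journal's TOP Factor is the sum of the
# stringency level (0-3) assigned to each of the eight TOP standards.
# The real rubric is maintained by COS on the TOP Factor website.

TOP_STANDARDS = [
    "Data citation",
    "Data transparency",
    "Analysis code transparency",
    "Materials transparency",
    "Design and analysis reporting",
    "Study preregistration",
    "Analysis plan preregistration",
    "Replication",
]

def top_factor(policy_levels):
    """Sum the 0-3 level recorded for each standard; unscored standards count as 0."""
    total = 0
    for standard in TOP_STANDARDS:
        level = policy_levels.get(standard, 0)
        if not 0 <= level <= 3:
            raise ValueError(f"Level for {standard!r} must be 0-3, got {level}")
        total += level
    return total

# Hypothetical journal: requires data sharing (level 2), asks authors to
# disclose code (level 1), and verifies preregistrations (level 3).
example_journal = {
    "Data transparency": 2,
    "Analysis code transparency": 1,
    "Study preregistration": 3,
}
print(top_factor(example_journal))  # -> 6
```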

TOP Factor assesses journal policies for the degree to which they promote core scholarly norms of transparency and reproducibility. The hope is that this new alternative to JIF may reduce the dysfunctional incentives for journals to publish exciting results with little regard for their reliability. Authors can use TOP Factor to identify journals whose policies align with their values and credit their efforts to be more rigorous and transparent. Funders can use TOP Factor to assess which journals are most likely to support their policy mandates for grantees. And publishers can use TOP Factor to identify journals with progressive policies for inspiration, and to monitor trends in policies by discipline. There are some clear benefits to this approach:

  • Perhaps most importantly, this is the first rating that focuses on something other than the novelty or newsworthiness of results. (New discovery is of course central to the scientific process, but a focus on novelty alone has too often led to bold claims of groundbreaking advances that cannot be fully scrutinized because so much – including the peer review itself – remains closed.)
  • It is a rating, not a ranking. (And in fact, seeing how their competitors are doing may encourage journals to improve their own practices.)
  • At the TOP Factor website, users can filter scores on a number of different dimensions, which is important because practices are evolving differently in different domains (for example, economics is leading the charge in requiring transparency of data and code, while psychology has been more assertive in promoting preregistration).
  • The scoring process and data behind TOP Factor are freely available and verifiable on the website (although the scoring itself still retains some level of subjective judgment).

COS is clear that TOP Factor is not a magic bullet and that its role is to complement other efforts to improve research culture and practice. But there are clearly limitations:

  • Most significantly, it evaluates only stated policies – the text on a journal’s website – not enforcement of those policies, leaving it open to abuse by journals that state policies strongly but do little to enforce them effectively. To say nothing of the differential ways in which journals implement the “same” policy. (This is one reason why the Research Data Alliance has been pushing for more standardized approaches to policy features and wording, and for the use of simple “action item” lists in those policies, making clear when things are mandatory and enforced and when they are not.)
  • It shares a limitation with JIF in that it is a journal-level metric, and therefore says nothing directly about the transparency or rigor of any single article.
  • A journal’s TOP Factor score doesn’t solve the problem of measuring the quality of research published in that journal. As Brian Nosek, COS’s Executive Director, notes, “It is important to remember that research can be completely transparent and terrible at the same time. Policies promoting transparency and reproducibility make it easier to evaluate research quality.”

Can TOP Factor really shift the culture?

The real question about TOP Factor – and any other attempt to reform research culture – is whether anyone who matters (promotion and tenure committees, funding agencies, etc.) will care about or use it. The lesson of past attempts is that this is an incredibly tough nut to crack. While DORA has been hugely successful in gaining signatories, it has been hard to get those signatories to actually change behavior – and that’s because behavior change is really hard.

Brian Nosek’s strategy for culture change acknowledges that “people are embedded in social and cultural systems” and that “those systems shape behavior by communicating norms…providing incentives…and imposing policies”. As such, any successful change strategy must focus on comprehensive, systemic reform and not simply individual behavior. One of the most powerful aspects of this model is the focus on disciplinary communities, which are, after all, where research culture is established and maintained.

Changes in research practice and publication in psychology provide a compelling example of the community-driven approach. Psychology’s replication challenges have been garnering headlines for a number of years. As a result, both leading scientists and professional societies have raised their expectations for transparency and rigor. These efforts have led to measurable changes in behavior, evidenced by:

  • The number of articles in Psychological Science, the leading APS journal, that now carry one or more badges, a practice launched in 2013. In 2019, 65% of articles were badged for open data, 50% for open materials, and 28% for preregistration. (Data from D.S. Lindsay, EIC of Psychological Science, 2015-2019)
  • The level of market penetration of the Open Science Framework (OSF) in psychology. In a recent crowdsourced survey, 35% of research faculty were found to have an OSF account (across a sample of 69 departments).
  • The rapid growth in adoption of registered reports.

Chris Chambers has documented the lessons he learned from the frontlines of the registered reports revolution. One of his key takeaways is that “should” arguments fail because they offer only judgment, not solutions. Registered reports break this stalemate by “turning the pursuit of high quality science into a virtuous transaction”. Earlier feedback in the research process helps to increase the robustness and reproducibility of a study, and guarantees an outcome-neutral assessment and publication of the final research. But perhaps Chris’s most powerful observation is that the notion that culture must change before any reforms can successfully be adopted is a red herring. Rather:

“In academia, culture is the shadow created by the machine of rules, norms, mandates and incentives that drive everyday decisions. If we want to fix the machine, it makes no sense to direct our efforts at the shadow. We must instead replace the parts, one by one, and eventually – if necessary – the entire machine. If we succeed, the culture will have changed, but only because we changed everything else.”

Perhaps one of the primary reasons why JIF retains its primacy is that it’s such an easy cognitive shortcut – I can make assumptions about papers and researchers without a lot of work. And this is also why it’s so hard to replace – in fact, impossible to replace with anything meaningful, for why should we expect any single metric to provide a reliable summary of an inherently subjective judgment? TOP Factor does contribute a new measure to inform choices by researchers and, along with other tools such as the TOP Guidelines, expands the models and tools we have to work with. By themselves, none of these initiatives is wholesale reform, but they should be welcomed as pieces of a multi-pronged attempt to nudge incentives in the publication process in the right direction, creating a publication system that is aligned with the true values of research and researchers.

Alison Mudditt

Alison Mudditt joined PLOS as CEO in 2017, having previously served as Director of the University of California Press and Executive Vice President at SAGE Publications. Her 30 years in publishing also include leadership positions at Blackwell and Taylor & Francis. Alison also serves on the Board of Directors of SSP and the Center for Open Science.

Discussion

10 Thoughts on "Reforming Research Assessment: A Tough Nut to Crack"

Reforming the research and incentive system is undoubtedly an epic challenge!

Organizations such as COS, PLOS and DORA are clear policy leaders, and in due course the impact on journal publishers, scholarly societies and academic institutions will be dramatic.

At Rescognito we have been working on the “plumbing” needed to support such a transition. For example, we offer Open Research Checklists used to capture verifiable assertions from researchers. The Checklists are driven by persistent identifiers connected to an Open recognition system. This means the Checklists are massively scalable and can be rolled out with minimal cost. Any Crossref DOI (with associated ORCID data) can be used to generate forms related to research objects such as manuscripts, software, protocols, datasets, etc. For example, this bioRxiv DOI 10.1101/2020.02.05.936096 can be used to display:

Data Availability Checklist:
https://rescognito.com/dac/10.1101/2020.02.05.936096

Funder Information Checklist:
https://rescognito.com/fic/10.1101/2020.02.05.936096

Contributor CRediT Checklist:
https://rescognito.com/ccc/10.1101/2020.02.05.936096

Completed Checklists create attributable assertions that can be visualized in a variety of ways on our recognition platform (e.g. https://rescognito.com/v/10.1111/1365-2656.13053 )

We also register and deposit DOIs for completed Checklists so as to create a persistent connection between the “parent” digital object and the Checklist assertions made about it (e.g. https://doi.org/10.37473/dac/10.1101/2020.02.06.935783)
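
As a small aside for anyone who wants to script against this, here is a purely illustrative Python sketch that assembles those checklist links from a DOI. The only assumption is the URL pattern visible in the examples above (the dac, fic and ccc path prefixes); nothing else about Rescognito’s service is implied.

```python
# Illustrative only: builds Rescognito checklist URLs from a Crossref DOI,
# using the path prefixes shown in the example links above.
CHECKLIST_PREFIXES = {
    "data_availability": "dac",
    "funder_information": "fic",
    "contributor_credit": "ccc",
}

def checklist_url(doi, checklist="data_availability"):
    """Return the checklist form URL for a DOI (pattern inferred from the examples)."""
    prefix = CHECKLIST_PREFIXES[checklist]
    return f"https://rescognito.com/{prefix}/{doi}"

print(checklist_url("10.1101/2020.02.05.936096"))
# -> https://rescognito.com/dac/10.1101/2020.02.05.936096
```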

We’re partnering with scholarly societies and publishers to improve researcher recognition. If you are interested in knowing more, please direct message me on LinkedIn.

Richard Wynne
Rescognito

We have X journal and Y journal. More articles are cited from X journal than from Y journal, but some articles are cited from Y journal. Hasn’t the audience stated that X journal is publishing more articles of importance than Y? I published about 5 of the highest ranked journals. I was curious as to why. The reason was simple: they were review journals, and a researcher would often go to a review article to help separate, so to speak, the wheat from the chaff! The articles were penned by leaders in the field, and getting those leaders to write depended upon a sterling editor in chief and international editorial board.
Having spent years in S&T publishing, I found that each editor in chief strongly desired to attract the best and worked hard to do so. In so doing, they also strove to raise their JIF ranking.
The haste to publish is driven more by the conditions of granting agencies and by job security.
The popularity of an article as measured by crowdsourcing may or may not reveal its impact and worth. It is like Supertramp or disco music – remember them?
Thus, I think the attack on JIF is specious at best.

Well, Harvey, there’s a pretty clear consensus about the limitations of JIF – not only from the likes of PLOS, but also from key players such as the National Academies, who are seeking to shift the ways in which we assess research (see, for example, https://sites.nationalacademies.org/pga/brdi/open-science-roundtable/index.htm). As discussed above, that doesn’t mean there’s an easy or single solution, but the focus on quantitative metrics is widely viewed as responsible not only for the at-times toxic culture in research but also for the challenges we’ve seen in recent years to the reliability of the science itself. Both seem to me like good reasons to look for a different way to do this.

Alison, I do not believe that there is a consensus about the alleged limitations of JIF. One would have to define the universe of people/voters narrowly to come to such a consensus.

Fair point, Joe – perhaps I should have said that there are many people and organizations from many different backgrounds who see the limitations of JIF. I think my wider point – which goes to Harvey’s note below – is that it’s unrealistic to think that any single metric can provide us with a rich assessment of any piece of research. The analogy I always use – which, okay, isn’t perfect – is hiring: no single score tells me whom to hire, so it turns out I need to do the hard work of reading and assessing letters and resumes myself.

I agree with your statement; I think it is very well put. The one point I would like to emphasize (you get to this at the end of your comment) is that the problem with metrics, any metric, is that they are sometimes used as the end point of coming to a judgment. In fact, metrics, whatever they are, can only bring you to the portico; the hard work of judgment is done in the room inside, where the numbers prove to be suggestive but not determinative. It would be a terrible thing if one metric, used as an end in itself, were to be replaced by another metric, also viewed as an end in itself. This is the spirit in which I would defend JIF: as one input among many. I understand the attraction of numbers, but we can’t let them distract us from this terribly hard business of being human.

This debate has been aired here ad infinitum. What I am waiting for is a better way of evaluating worth. Crowdsourcing is a snapshot of popularity – this method too has been debated on these pages and its flaws brought to light. Is there a better way than JIF? Maybe, and maybe not. In the meantime, the market seems to say that JIF is the best of flawed techniques.

Harvey – I’m curious as to what part of the original post or topic, other than your curious mention of it, is related to crowdsourcing. Alison’s post is about using observable criteria related to commitment to scientific rigor and reproducibility to rate journals. If a journal’s commitment to rigor isn’t a good thing to measure and to compare between journals (in addition to citations, usage, etc.), can you tell us why? Wouldn’t this metric be a great complement for, say, the 5 highly ranked journals you used to publish? Wouldn’t it increase their standing? If not, why not?

To be clear, I have no idea which journals these are, but you can see why I should ask that question if you claim that a metric rating journals on their commitment to rigor is not helpful. You are surely familiar with science that has seemed, or even been rated as, important, and yet has proven over time not to be so rigorous. All of this, always, is about improving the integrity of science. To clarify my comments in this thread: it is, of course, the *use* and *interpretation* of the JIF that is the problem. (Although I am certain that all readers of SK, by now, have the know-how and context to understand that.)

Pardon my ignorance, but has anyone looked at whether badges mean anything? So, if I have a badge for something like open methods or data, has anyone gone back in and asked whether my paper is more or less likely to actually have open methods or data than the average paper?
