The traditional peer-review system used by academic journals has come under a great deal of recent scrutiny. There’s a vocal contingent arguing that the current system is broken and unsustainable, though that opinion does not seem to be shared by the majority of researchers. The Scholarly Kitchen has been covering new proposals for fixing, enhancing or completely replacing peer-review in recent weeks (see here, here, here, here and here). Many of these new proposals present interesting, innovative ideas, but most also share a common trait that brings to mind the image of Ouroboros — the mythical snake devouring its own tail.
As noted earlier, many proposals for revamping science publishing or ranking researcher performance rely on social reputation systems, and thus fall into the trap of losing sight of the real objectives they’re trying to address. Social scientists seem obsessed with these sorts of “karma” systems, perhaps because they readily generate data that is easy to analyze.
Ouroboros comes to mind when thinking about these reputation systems because they are self-reflexive. The metrics provided are essentially meaningless outside of the context of the metrics themselves. They’re proposed as systems for incentivizing researcher participation, but beyond a high score in the metric and perhaps some bragging rights, there is no connection to any real-world reward. The recent PubCred proposal suffered from this problem, as do two other recent proposals.
In “Redesigning Scientific Reputation,” published in The Scientist, UC-Santa Cruz graduate students Bo Adler and Ian Pye along with their graduate advisor Luca de Alfaro (apparently now on leave and working at Google) suggest a two-pronged approach:
Our work in building large-scale reputation systems suggests that it may be possible to build such a system on two pillars: a system of incentives for authors of papers and reviewers alike, and a content-driven way of measuring merit and attributing rewards. The reputation of people as authors would depend, as usual, on their publishing works of high measured value. And crucially, the reputation of people as reviewers would depend on their ability to be early reliable predictors of the future value of a work. Thus, two skills would be required of a successful reviewer: the ability to produce reviews that are later deemed by the community to be accurate, and the ability to do so early, anticipating the consensus. This is the main factor that would drive well-respected people to act as talent scouts, and to review freshly published papers, rather than piling up on works by famous authors. Reviews would be ranked by reputation, thus diminishing irrelevant comments, as Amazon has shown it is possible to do.
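The quoted proposal gives no concrete algorithm, but the core idea — that a reviewer’s reputation should depend on both the accuracy of a review and how early it was filed — can be made concrete. A minimal sketch, with all names, scales, and formulas invented for illustration (the authors specify none of them):

```python
def reviewer_score(predicted, consensus, days_early, horizon=365):
    """Hypothetical score for one review: reward agreement with the
    eventual community consensus (both on a 0-1 scale), weighted by
    how far ahead of the crowd the review was filed."""
    accuracy = 1.0 - abs(predicted - consensus)
    earliness = min(days_early, horizon) / horizon  # capped at one year
    return accuracy * earliness

# A reviewer who nails the consensus a full year early scores 1.0;
# a late review, or one far off the consensus, scores near zero.
```

Even this toy version exposes the self-reflexivity problem the post describes: the “consensus” value that anchors the score is itself a product of the same rating system.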
In “Towards Scholarly Communication 2.0: Peer-to-Peer Review & Ranking in Open Access Preprint Repositories,” published by the Social Science Research Network, Roeland H. R. M. Aernoudts from the Erasmus University Rotterdam (though I can find no indication of his affiliation or position there; updated, see comment 6 below) and the mysterious Chao-man Chang (no affiliation provided, though his blog is here) propose a peer-to-peer system for overhauling peer-review. Their system is designed to begin in open access preprint repositories and then potentially spread into traditional journals. The proposal is full of gaping holes, including the need for a magical automated mechanism that will somehow select qualified reviewers for papers while eliminating conflicts of interest, an over-reliance on citation as the sole metric of impact, and numerous ways to game the system.
The proposal doesn’t seem to solve any of the noted problems with traditional peer-review, as it seems just as open to bias and subjectivity as the current system. It’s filled with potential waste and delays: reviewers can apparently stall the process endlessly, and authors can repeatedly demand new reviews if they’re unhappy with the ones they’ve received. Reviewers are asked to do a tremendous amount of additional work beyond their current responsibilities, including reviewing the reviews of other reviewers and taking on jobs normally done by editors. If one of the problems of the current system is the difficulty of finding reviewers with time to do a thorough job, then massively increasing that workload is not a solution. There’s a reason editors are paid to do their jobs — scientists don’t want to spend their time doing those things. Scientists are more interested in doing actual research.
Like the PubCred proposal, it fails to address the uneven availability of expertise, and assumes all reviewers are equally qualified. Also like PubCred, the authors’ suggestions for paying for the system seem unrealistic. In this case, they’re suggesting a subscription model, which seems to argue against the very open access nature of the repositories themselves, limiting functionality and access to tools for those unwilling to pay.
The proposal is based around (surprise) a new metric, called the “Reviewer Impact”:
The Reviewer Impact is the numerical representation of the peer review “impact” of a scholar: their peer review proficiency and output. Other than output, it essentially represents several core peer review competencies. First is their ability to identify the strengths and weaknesses of significant characteristics of manuscripts. Second is their ability to provide practical suggestions for dealing with the weaknesses and strengthening the strengths of the manuscripts. A potential third element is their ability to inform (the authors) of the suitability of their manuscripts with regard to the journals in their field.
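The quoted definition names the ingredients of the Reviewer Impact — review output plus three qualitative competencies — but no formula for combining them. A hypothetical sketch of how such a composite might be computed (the weights, scales, and the logarithmic damping of volume are all my assumptions, not the authors’):

```python
import math

def reviewer_impact(review_count, diagnosis, suggestions, fit):
    """Illustrative composite of the components named in the proposal:
    output (number of reviews) and three competency scores, each 0-1:
    identifying strengths/weaknesses, offering practical suggestions,
    and judging a manuscript's fit for journals in the field."""
    competency = (diagnosis + suggestions + fit) / 3
    output = math.log1p(review_count)  # diminishing returns on sheer volume
    return output * competency
```

Note that every input here is itself a judgment someone must score, which is exactly where the gaming opportunities and the “who cares?” question arise.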
The authors spend several pages going into fetishistic detail about every aspect of the measurement, but just as in the proposed Scientific Reputation Ranking program suggested in The Scientist, they fail to answer key questions:
Who cares? To whom will these metrics matter? What is being measured and why should that have an impact on the things that really matter to a career in science? Why would a funding agency or a hiring committee accept these metrics as meaningful?
Unless there’s a clear-cut answer to those sorts of questions, the metric fails as an incentive for participation. Generating a meaningless number may provide a fun challenge in a video game, but what’s the point here? If you’re hoping to provide a powerful incentive for participation, you must offer some real-world benefit, something meaningful toward career advancement.
The Impact Factor, flawed though it may be, at least tries to measure something that directly affects career advancement–the quality and impact of one’s research results. It’s relevant because it has direct meaning toward determining the two keys to building a scientific career, jobs and funding.
Researchers are hired based on the likelihood of their running a successful research program, bringing prestige through their experimental findings, and generating funding for the institution, both in terms of grants and patents. Tenure is based on actual performance in those categories. Grants are given based on the usefulness of the proposal and its likelihood of success.
Being a popular member of the online community or reviewing lots of papers well is at best meaningless for these purposes, and at worst detrimental to achieving these goals. Time spent achieving a high score reviewing papers and commenting online is time not spent doing experiments or securing funding. It’s unclear why any of these proposed social reputation metrics should matter in any significant way. Should a university deny tenure to a researcher who is a poor peer reviewer, even if he brings in millions of dollars in grants each year and does groundbreaking research? Should the NIH offer to fund poorly designed research proposals simply because the applicant is well-liked and does a good job interpreting the work of others?
If these metrics aren’t measuring anything practically meaningful, then how will they serve as incentives? Why bother?
The uncharitable answer is that social reputation metrics are being proposed by researchers who are good at socializing but not so good at research and discovery, as an attempt to change the criteria for success in their own favor. A less cynical view is “because they’re there.” These systems are being proposed simply because they already exist in other contexts, and it’s easier to shoehorn in an inappropriate system than to come up with a novel approach that better solves the problem. A system that works well for deciding whether a Slashdot comment is “+5, Informative” or “-1, Flamebait” is not necessarily good for evaluating a scientist’s career performance. The things that really matter are entirely separable from the things these metrics count. Are there any other professions where one’s salary or position is directly determined by how popular one is in an online forum?
Peer-review and communication are important activities for researchers, and their value should be recognized and rewarded. But those rewards must be placed in context. Creating imaginary economies based on self-reflexive metrics requires an unrealistic set of priorities and buy-in from parties unlikely to care. The power structure in science selects for achievement, not for sociality. If social reputation metrics are going to be accepted as a meaningful part of that structure, they’re going to need a strong practical justification for inclusion, which, so far, is lacking.