A PLoS ONE article recently went viral, hitting the front page of Reddit and garnering an amazing amount of reader interest. This was great news for the journal and the paper's authors, but it raises questions about the notion of post-publication peer review.
As Kent Anderson recently discussed, the idea of post-publication peer review is nothing new: it's called "science". Publication of a paper is the end of one process but the beginning of another. The paper is read and discussed in private conversations, in lab meetings, at journal clubs, at the bench, in the bar, and beyond. It's how we analyze and understand what has been accomplished, and how we plan the experiments that follow.
The proposed revolution, then, is not in the concept but in the tools: new ways to open that conversation worldwide and to track the life of a paper after publication, the better to measure its true impact. Despite initial momentum, implementation of these new technologies seems to have stalled.
Article commenting is increasingly seen as a futile pursuit. Nearly every publisher has tried to drive commenting in one way or another, with little success. The reasons for this failure are fairly obvious, chief among them the question of why someone would spend time commenting on another researcher's article when they could be doing their own research instead.
Doing away with pre-publication peer review and replacing it entirely with post-publication review seems to have garnered little support in the research community. F1000 Research will be the biggest test of whether the idea has any viability. Its approach seems more a strategy meant to increase publisher revenue than one meant to benefit researchers. By collecting fees from authors with incomplete or unpublishable results, F1000 Research does away with the costs of rejection. All authors are expected to pay for publication, whether or not their articles eventually pass a peer review process (which conveniently happens after payment is made).
It's a clever strategy, but one that seems to benefit private investors more than readers or authors. Most researchers I speak to want less to read, not more. The idea of slogging through an enormous slush pile of bits and pieces from strangers' lab notebooks does not hold great appeal. Further, if peer review is to be applied to this nearly unlimited stream of salami-sliced results, won't that overload the system? How much time will researchers need to devote to reviewing a massive influx of leftovers and half-baked ideas?
That leaves the search for new metrics ("altmetrics") as perhaps the greatest hope for near-term improvement in our post-publication understanding of a paper's value. The Impact Factor is a reasonable, if flawed, measurement of a journal, but a terrible method for measuring the quality of work in individual papers or from individual researchers. A move away from journal-level metrics to article-level metrics is certainly welcome. A researcher's work should be judged on its own merits, not on the company it keeps within a journal's pages.
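To see why the Impact Factor is journal-level by construction, here is a minimal sketch of the standard two-year calculation; the journal and its figures below are invented for illustration, not real data:

```python
# A minimal sketch of the standard two-year Impact Factor calculation.
# The numbers below are invented for illustration, not real journal data.

def two_year_impact_factor(citations_this_year: int,
                           citable_items_prior_two_years: int) -> float:
    """Citations received this year to items the journal published in the
    previous two years, divided by the number of citable items it
    published in those two years."""
    return citations_this_year / citable_items_prior_two_years

# Hypothetical journal: 500 citations in 2012 to its 2010-2011 articles,
# of which there were 200 citable items.
print(two_year_impact_factor(500, 200))  # 2.5
```

Note that every article in the journal inherits the same score, whether it drew a hundred citations or none, which is precisely why the number says so little about individual papers.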
The ideal rating system would employ a deep and nuanced understanding of a researcher's work. But we don't live in an ideal world, and people seem intent on having quantitative ranking systems for decision-making. As a species, we seem to like ordered lists. If we want to replace the Impact Factor, then we need to offer something that does a better job of measuring quality. Unfortunately, many of the newly proposed metrics measure something different altogether. They seem to have been chosen because they're easy to determine, rather than because they're important.
Metrics based on social media coverage of an article tell us more about the authors' ability to network than about their actual experiments. Metrics based on article usage are even harder to interpret, as they offer information on reader interest and subject popularity rather than on the quality of the article itself.
A look at PLoS' treasure trove of article metrics makes these questions evident. Two of the most-read articles in the history of all PLoS journals are about antidepressant medications. Does this mean that these two articles are significant, important studies about depression, or does it instead indicate a tremendous level of reader interest in the subject of depression? Both are valuable pieces of information, but they hold different meanings and offer different value to different parties.
The presence of quirky, oddball articles in the most-read list makes a strong case against usage as an indicator of quality or impact. A paper on sexual activity among fruit bats has long been one of the top 5 most-read papers from PLoS. In March, a group of Japanese researchers published a study on a line of fruit flies that had been maintained in constant dark conditions for 57 years. The study hit the front page of Reddit and within a week it became one of the top 5 most-read articles in the publisher’s history.
The fruit fly study is interesting, to be sure: the authors maintained a colony of flies in total darkness for more than half a century. How cool is that? But the results are not terribly significant scientifically. The take-home message from the study is that evolution takes a really long time. Fifty-seven years meant 1,400 generations of flies, the equivalent of some 30,000 years for humans. No, the flies didn't lose their eyes like blind cave fish. All that was found were some inconclusive changes in gene sequence and an observation that the flies seemed to breed a bit better in the dark than in the light.
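The arithmetic holds up, assuming a human generation time of roughly 21 years; that figure is my assumption for illustration, not one taken from the study:

```python
# Back-of-the-envelope check of the generational arithmetic above.
# The ~21-year human generation time is an assumed value for
# illustration, not a figure from the study itself.

fly_years = 57
fly_generations = 1_400
human_generation_years = 21.4  # assumed average human generation time

print(fly_generations / fly_years)               # ~24.6 fly generations per year
print(fly_generations * human_generation_years)  # ~29,960, i.e. roughly 30,000 years
```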
Still, the study drew over 180,000 readers in a very short period. But would this level of interest mean much to you if you were making funding or career-advancement decisions? Really, all we learn here is that people are interested in weird things. The usage metrics don't seem to correlate with scientific impact in any immediate way. And that's a problem if you're looking to replace the Impact Factor as the key metric in those sorts of decisions.
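If you wanted to test that intuition rather than assert it, one approach would be to rank-correlate article views against eventual citation counts across a set of papers; a weak or negative correlation would confirm that usage and impact measure different things. The sketch below uses invented placeholder numbers, not real PLoS data:

```python
# A sketch of one way to test whether usage tracks impact:
# rank-correlate article page views against eventual citation counts.
# All numbers here are invented placeholders, not real PLoS data.
from scipy.stats import spearmanr

views = [180_000, 2_300, 15_000, 900, 41_000]  # hypothetical page views
citations = [4, 12, 3, 9, 2]                   # hypothetical citation counts

rho, p_value = spearmanr(views, citations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A rho near zero (or negative) would suggest views and citations
# are measuring different things.
```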
That said, there are places where usage metrics are likely very useful. Librarian purchasing decisions are largely based on usage numbers. This seems a perfectly reasonable way of allocating subscription dollars, putting them toward the publications that one’s institution wants to read.
And there are types of publications and fields of research where our standard method for measuring impact, citation, doesn’t really work all that well. Clinical practice journals offer tremendous value to the medical community. There’s much to be learned from the experiences of others. But the value offered translates into treatment, not further research that yields citations.
Engineering journals can suffer the same fate. Publications in some of these fields are often more about solving specific problems than hypothesis-driven basic research. If the problem is solved, then there may not be many future projects based in the same area, and hence few citations.
For both of these areas, usage may offer a better measure of impact than citation. Do we then need to think about different classes of journals and apply different sets of criteria and metrics to the value of the research published?
For the mainstream of science journals, usage-based metrics don't seem to offer the much-desired replacement for the Impact Factor. There is value in understanding the interest drawn by a piece of research, but that value is not the same as a measure of the research's quality.
So far, we're mining all the easy and obvious metrics we can find, but they don't offer us the information we really need. Until metrics are offered that truly deliver meaningful data on impact, the altmetrics approach is in danger of stalling out. The field is at a major crossroads.
Like so many new technologies, there’s an initial rush of enthusiasm as we think about how it could fit with scholarly publishing. But then we hit a point where the easy and obvious approaches are exhausted without much return. Now the hard work begins.