
Scholarly publishing’s reputation is that it uses peer-review and editorial judgment to separate the wheat from the chaff. This is why “getting published” is such a big deal. The reputation an author garners by being published in a scholarly journal is that he or she has fit through the tight filter on scholarly communications, where only the best of the best gets published.

But that reputation is no longer deserved. Scholarly publishing, under pressure to conform to a “publish or perish” academic culture, an undifferentiated (except in quantity) purchasing universe, and other incentives for more instead of better, is failing.

What we’re dealing with now is not the problem of information overload, because we’re always dealing (and always have been dealing) with information overload. . . . Thinking about information overload isn’t accurately describing the problem; thinking about filter failure is. – Clay Shirky

In many fields, most papers get published in some journal. For the New England Journal of Medicine, a recent analysis showed that 90% of submissions are published somewhere else. Rates from other journals run between 47% and 75%. So, in aggregate — at the system level, not the journal level — the rate of non-publication across all papers is somewhere between 10% and 53%, with most studies showing it to be between 10% and 45%.
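
The arithmetic behind that range can be sketched in a few lines. This is only an illustration: it assumes the quoted 90% and 47% figures can be read as eventual-publication rates, and the names `published_elsewhere` and `non_publication_rate` are made up for the example.

```python
# Eventual-publication rates quoted above (assumed to bound the range).
published_elsewhere = {"high": 0.90, "low": 0.47}


def non_publication_rate(published_rate):
    """Return the share of papers that never appear in any journal."""
    return 1.0 - published_rate


# Convert each publication rate to its complement and order the bounds.
bounds = sorted(non_publication_rate(r) for r in published_elsewhere.values())
low_pct, high_pct = (round(b * 100) for b in bounds)

print(f"Non-publication rate: {low_pct}% to {high_pct}%")
```

The complement of the highest publication rate gives the lower bound on non-publication, and the complement of the lowest gives the upper bound.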

Most papers get published. In fact, it’s more likely that your paper will get published than not — if you’re persistent and willing to submit to multiple journals.

Since the majority of papers get published, being published isn’t such a big deal anymore.

Instead, where you get published is the big deal. The journal that publishes you is the signal of quality, right?

Really?

There’s another mechanism that seems broken — the vaunted impact factor. Not only is its algorithm far too simple for a networked world, but a recent example shows the flaw of averages in a real way.

Just last month, PLoS ONE’s impact factor came in well above expectations, at 4.3, with only a slight amount of self-citation. PLoS ONE has an acceptance rate of 69%, which sits in the middle of the aggregate acceptance range for the journals system as a whole but is quite high for a single journal. And PLoS ONE publishes a lot of papers, meaning that there are hundreds of authors who submitted to a journal that provides a 7 in 10 chance of being published. Now, each can claim they were published in a journal with an impact factor of 4.3.

Yet in a recent Chronicle of Higher Education article, Bauerlein and collaborators write about how only a minority of articles published are cited within 5 years of publication:

Only 45 percent of the articles published in the 4,500 top scientific journals were cited within the first five years after publication. In recent years, the figure seems to have dropped further. In a 2009 article in Online Information Review, Péter Jacsó found that 40.6 percent of the articles published in the top science and social-science journals (the figures do not include the humanities) were cited in the period 2002 to 2006.

So, impact factor may actually be reflective of a minority of the published literature, yet every author gets to claim the aggregate, average impact factor for the journal in which they were published.
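
To see how an average can be carried by a minority of papers, here is a rough sketch. The citation counts are invented purely for illustration, and a plain mean is used as a simplified stand-in for a journal-level average, not the actual impact factor calculation.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one journal (invented data).
citations = [0, 0, 0, 0, 0, 0, 1, 2, 7, 30]

average = mean(citations)              # the figure every author gets to claim
median_citations = median(citations)   # what the typical paper actually earns
cited_share = sum(c > 0 for c in citations) / len(citations)
below_average = sum(c < average for c in citations)

print(f"Journal average: {average:.1f} citations per paper")
print(f"Median paper: {median_citations:.1f} citations")
print(f"Share of papers cited at all: {cited_share:.0%}")
print(f"Papers below the journal average: {below_average} of {len(citations)}")
```

In this made-up set, one heavily cited paper supplies most of the average while the median paper is never cited at all.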

Did I mention that this system seems to be broken?

When a pooled resource’s impact factor is higher than those of dozens of more selective specialty and niche journals that carefully filter their material for specific audiences and have much lower acceptance rates, maybe scholarly publishing is about quantity more than quality.

The more cups of water you pour forth (papers you publish), the better your ability to wick up the impact factor?

It may be that PLoS ONE is ahead of its time, pooling papers in biology and related fields rather than forcing authors to ship them off to other journals. By doing so, it gets a diluted impact factor (about 1/4 of the main PLoS journals’), but even at that level of dilution, it makes waves due to sheer volume, even if a minority of its papers are cited. An average for a field will beat a large percentage of the journals in that field, especially if there’s skew in the distribution — a skew that will occur naturally in a citation dataset and can be driven by active blogging and other promotional means.
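
The claim that a pooled average beats most journals in a skewed field can be checked with a toy calculation. Everything here is hypothetical: the five journals, their paper counts, and their citation totals are invented to show the mechanism, not drawn from any real field.

```python
# Hypothetical journals in one field: name -> (papers published, citations received).
journals = {
    "A": (100, 100),
    "B": (100, 150),
    "C": (100, 200),
    "D": (100, 250),
    "E": (100, 1300),  # one highly cited outlier creates the skew
}

# Citations per paper for each journal on its own.
per_journal = {name: cites / papers for name, (papers, cites) in journals.items()}

# Citations per paper if every journal's output were pooled together.
total_papers = sum(papers for papers, _ in journals.values())
total_cites = sum(cites for _, cites in journals.values())
pooled_average = total_cites / total_papers

# Journals whose own average falls below the pooled figure.
beaten = [name for name, avg in per_journal.items() if avg < pooled_average]

print(f"Pooled average: {pooled_average:.1f}")
print(f"Journals below the pooled average: {len(beaten)} of {len(journals)}")
```

With these invented numbers, the pool inherits the outlier's citations and ends up ahead of four of the five journals it absorbed.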

Also, authors publish by the demitasse, dividing studies into 2-3 papers and submitting them to different journals. In an interesting discussion of the Chronicle article on Reddit, one contributor states:

The problem is when papers are not cited because other papers already say the same thing better.

It’s hard to imagine that the New England Journal of Medicine or Nature would ever publish 70% of the papers they receive. Their impact factors would fall dramatically, probably only as far as the average impact factor for medical journals, but that’s a sea change for a top-tier journal. But for a niche journal? Opening the floodgates and pooling more articles could yield an improvement as judged by current measurement and reputation technologies.

Should we change from a set of struggling specialty journal buckets with mediocre impact factors into a larger pool of information that captures the average? Should we have a swimming pool of papers to make sure that we have a lot of high-scoring articles to drive our impact factor? Or should we carefully boil a few cups of water to create a pure puddle of papers?

From a journal user’s standpoint, the literature is most often viewed as a pooled resource these days — PubMed and Google gather it all together and present it as a list of search results. No longer is there much of a time benefit to be had by searching in branded content silos. That use-case seems to be a brief and fleeting one — a glance at an email table of contents, the cover of a journal as it passes from mailbox across desk toward wastebasket. It’s an anachronism.

Now, one or two searches can generate a swath of results across all sources and provide users with the confidence that they’ve seen most of what exists on a topic.

Because users commonly view the literature as a pooled resource, maintaining separate journal cultures and practices seems a little silly, especially given all the forces routinizing, automating, and normalizing behaviors among journals — from manuscript submissions systems to online publishers to consolidated composition vendors to publishing organizations to disclosure standards to funder mandates.

We’re being “pooled” no matter how you slice it — we’re using the same systems, attending the same conferences, modeling the same behaviors, and perpetuating the same beliefs.

The buyers of journals are increasingly pooling their resources, as well — from consortia to package buys to federations. Because pricing is viewed in pools, large pricing differentials are impossible. After all, Brain Research, a journal offered for $15K a decade ago, can be causally linked to the open access movement, despite the fact that a sober economic analysis showed that it was reasonably priced on a per-use basis. Because quality differentials can’t drive business growth in site licensing, quantity differentials are used. “Big deal” sales, the continuous rebucketing of content, and cynical new launches are responses to a purchasing environment that rewards quantity over quality.

Is this move to pooling information concealing a dangerous undertow for scholarly publishing? Is it a form of “filter failure” itself? Are we fooling ourselves that pooled resources are superior to more distilled resources?

It’s worth noting that even the advocates who state that the majority of studies should be published also state that filters are vital to making such information usable. They just don’t accept that human judgment placed into the hands of a few is where the filters should reside, even though this is how history tends to resolve filters (and filter customization counts as human judgment filtration).

Even Wikipedia evolved from a playground of thousands into the filtration done by a few dozen.

The problem with pooling resources is that pools contain undifferentiated liquids (do we need that dye that turns urine blue?). Some of these liquids can be added without immediate harm, or even filtered out later. But some are corrosive, uncomfortable, or simply unwanted.

We are in the age in which publishing is an expectation — not getting published is an exception on any front. But scholarly communication is supposed to be different — more exacting, a higher standard, peer-review and editorial judgment.

Yet the majority of papers are being published, and the growth rate means that we’ll experience a doubling of output in another 20 years, if not sooner.
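
For a sense of scale, a 20-year doubling time can be turned into an implied yearly growth rate. This sketch assumes steady compound growth, which is a simplification; the variable names are illustrative.

```python
import math

# Doubling time mentioned above, assuming steady compound growth.
doubling_years = 20

# Solve (1 + g) ** doubling_years == 2 for the annual growth rate g.
annual_growth = 2 ** (1 / doubling_years) - 1

# Sanity check: recover the doubling time from the growth rate.
years_to_double = math.log(2) / math.log(1 + annual_growth)

print(f"Implied annual growth in output: {annual_growth:.1%}")
print(f"Years to double at that rate: {years_to_double:.0f}")
```

A growth rate of roughly 3.5% a year is all it takes, so "if not sooner" only requires output to grow slightly faster than that.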

And one of the deep questions of our age will continue to be how we filter information.

With financial rewards, philosophical pressures, academic incentives, and potentially false equivalencies driving us toward publishing more and more, filtering less and less, are we already in the midst of a “filter failure” of immense proportions?

Are we abdicating our filtration role at the upstream end, in the pursuit of short-term gains, short-sighted philosophies, and trendy group-think?

Are we deserving of a reputation for quantity instead of quality?

Are scholarly publishers, academic leaders, and information purchasers racing toward a pool in which they will drown?

Who will filter our increasingly brackish waters?
