I think by the end of this post, you won’t think of your editorial filter in quite the way you did when you woke up this morning.

The metaphor of a filter has informed our thinking about information ever since Alvin Toffler popularized the concept of “information overload” in the 1970s. We scholarly publishing types take filtering very seriously. Journals filter out the dross, and editors filter out errors. Our pages are as high-quality and error-free as possible. For editors who eliminate errors and reject unwanted papers, filtering is a private, one-time, reductive process — we confidentially reduce the amount of information to only allow through the highest quality, eliminating the rest.

The junk is filtered out before the public sees it.

At least that’s how we think about it.

Yet there are changes the networked world introduces to our concept of the filter, and they dance together in interesting ways:

  1. Everything that’s published in the networked world is just a click away from any other resource.
  2. In the macrocosm of scholarly publishing, very little is ever really filtered out anymore. Any author with a little bit of persistence can get published and included in major indexing services and online searches.
  3. Many of the filters no longer eliminate information, but rather (obviously or inadvertently) add information.
  4. Filtering is no longer a private activity but a public, participatory activity.

In an interesting post by David Weinberger on Joho the Blog, Clay Shirky’s idea that “[i]t’s not information overload, it’s filter failure” is extended to introduce the notion that filters are no longer silent, private, and reductive. Instead, more and more are public, verbose, and increasing the size of what’s filtered.

Take this blog post, for example, which filters information by selecting and contextualizing it, just like a journal in some senses. I scanned a number of blogs and news items over the past few days, but the link above is what I wanted to share with you.

Now, because my blog will ping David’s blog, there will be a pingback. Many systems will register that pingback, and it’s potentially important. I filtered out a host of things I didn’t think you’d care about, but by choosing one, I have increased its reach and connectivity. I’m no longer isolated from it. Nor are other filters, and they know our linkage now. When Google indexes this site and David’s, it will use the link from here to Joho the Blog to help rank David’s blog as authoritative. You may add a comment to this post. We may debate the merits. This type of interactive filtering in plain sight of the community and the network only adds more information to David’s original post and to the Scholarly Kitchen as a filter. In fact, the more we debate in the spirit of getting the filter right, the larger the resulting information context around this single linkage — new words, new links, new ideas.

When filtering was private and isolated around the single chunk (the article), this didn’t happen unless the editor knew it was happening. Now, it happens without us realizing it, and the filters in the network concatenate it all rapidly into new modifications to be applied immediately.

The filter doesn’t work the same way it used to.

In the networked information space, filtering can add information in new ways. Google filters to the top of its rankings the most authoritative sites for a particular search query. Your contribution of the search term adds information to Google, driving not only its search filters but also its advertising system, its zeitgeist, its auto-suggest, its analytics, and other systems at Google and beyond. Your attempt to filter the Web through search added information to it. Google and others know how to turn filter use and refinement into ongoing business advantages.

Filtering is a dynamic system in the networked world.

This is a fundamentally different filtering system than the ones we’re accustomed to. And it consumes things in a way that shows how porous our traditional editorial filters are, even when we think they’re tight.

Our coarse, article-level filters aren’t suited to the current filtering environment. Why? Because we don’t apply the only filters, the fastest filters, or the finest filters. By comparison, our filters are light, slow, and non-recursive.

With coarse editorial filtration in an information world of abundance, it’s clear that traditional filters are potentially minor and brief impediments. And now we get to why the macrocosm of lots of papers matters more than it used to.

Many journals have studied what happens to rejected papers, and — no surprise — find that rejected papers usually get published somewhere, in some form. With more author-pays publishing, what used to be the small chance of getting published in a journal has probably reversed, and now there’s only a small chance that a slightly persistent author won’t get published in a journal.

So, while a publisher may be proud of its local filter — a journal’s article rejection rate, for instance — the fact is that the ecosystem allows for nearly universal publication. And the ecosystem is now linked and networked, everything just one click away.

Of course, your filter keeps those bad articles out of your journal, so you can rest easy. Your brand isn’t contributing to the prominence of bad articles elsewhere.

Really? Or do your filter’s relatively wide pores inadvertently let through network amplifiers?

Let’s say you just accepted a really good manuscript that cites a paper you rejected, even one that went way down the food chain, from your perspective. Lo and behold, the reference links to the paper which you (and maybe many others) rejected. A citation service makes sure the link works well. Suddenly, the rejected paper and its journal are more authoritative because your good journal threw it a reference. Your filtering process threw off a spark that lit up part of the network. You just increased another journal’s authority in Google. Your filter wasn’t fine enough to catch this loan of credibility.
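To make the "loan of credibility" concrete, here is a minimal sketch of a simplified PageRank-style calculation. It is an illustration under stated assumptions, not Google's actual ranking system: the three journal names, the link structure, and the damping value of 0.85 are all hypothetical. It compares the small journal's score before and after the top journal links to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each node to the list of nodes it links to.
    Returns a dict mapping each node to its score (scores sum to 1).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node splits its authority evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its authority evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network: two established journals cite each other,
# and the small journal cites the top journal but gets nothing back.
before = {
    "top_journal": ["peer_journal"],
    "peer_journal": ["top_journal"],
    "small_journal": ["top_journal"],
}

# Same network after the top journal publishes one reference
# pointing at a paper in the small journal.
after = {
    "top_journal": ["peer_journal", "small_journal"],
    "peer_journal": ["top_journal"],
    "small_journal": ["top_journal"],
}

rank_before = pagerank(before)
rank_after = pagerank(after)

for journal in sorted(rank_before):
    print(f"{journal}: {rank_before[journal]:.3f} -> {rank_after[journal]:.3f}")
```

In this toy network the small journal's score rises once the link exists, and the peer journal's score falls because the top journal's authority is now split two ways. Real ranking systems weigh many more signals, so treat this only as a picture of the direction of the effect.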

Your filter is tuned to papers, not to the network. If it were tuned to the network, you might have rejected that reference, knowing its effects on a paper you rejected.

In the old days, this citation would have meant you’d increased the impact factor of that other journal by the tiniest amount. That effect was slow to be felt, and isolated to one measure. Now, the effect happens instantaneously, and it gets networked. It most likely stays in circulation for longer than an impact factor’s two-year window, and the link to the other journal will persist.

This is just one example of the ways in which what we call “filtering” now extends information instead of reducing it. In the networked digital environment, information links with other information at tiny points we don’t currently really deal with. Is the article good? We’ll accept it. Is each reference worth allowing into the information expansion machine? That’s a new question.

And think about how often the most competitive journals cite each other. Are they just SEO-ing each other, swapping context and brand authority in the network at a fairly high rate? They may compete for papers, but are they really competing in the network? If one were smart, it would prohibit citations to the other, slowly depriving it of borrowed authority, leaving it to fend for itself in the network, isolating it.

It’s the opposite of the citation-packing scandals of a decade or more ago. Instead of packing your journal with self-citations, you want to eliminate citations to your competitors.

By focusing on the power links have in the information economy, filtering (as in “eliminating junk”) becomes a less clearly effective act in scholarly publishing, which focuses on the articles, not the links or comments or other network drivers. We might want to do more granular filtering, realizing that legitimacy and prominence aren’t accomplished solely (or even primarily) through brands, impact factors, article selections, and reputations.

The papers we once rejected now have a back door, passing through our coarse article-centric filters and straight into networked authority systems, networked linking systems, and the myriad filtering systems (news reports, blogs, society sites, tweets) that actually expand them. Instead of small effects, the network amplifies and extends the effects of these traditional points of borrowed legitimacy while introducing a whole range of new ones.

Do you think about filtering differently now?

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.


19 Thoughts on "How Networked Information Changes the Filter Metaphor for Journals"

Metaphors are always dangerous because they are false by definition. In this case the “filtering” metaphor has been pushed far too far. Journals are aggregators, not filters. They collect the best stuff for a small community. This is a difficult task, which is why they get a lot of money.

Specifically, the premise that “Everything that’s published in the networked world is just a click away from any other resource” is very false. One might call it the “small world fallacy.” In reality it is extremely laborious and difficult to track down the most important research on a narrow research topic for which there is no journal. It takes months, not clicks. This is why journals continue to thrive in the networked world; the best comes to them so we don’t have to find it.

Metaphors are untrue statements that are not lies. Most importantly, they’re usually how we fundamentally think about the world, especially parts that aren’t completely defined (aka, most of it). Your comparison demonstrates that. You compare journals to aggregators, but then say they “collect the best stuff for a small community,” clearly a filtering function. It is a difficult task. But the way we’ve done it isn’t translating directly into the networked world. What we filtered before (articles) contains pieces that link directly to things we filtered out, and the network’s filters count those links as shared authority in many cases. They reverberate through the network space in ways we don’t control.

It is not “very false” to assert that everything that’s published is just a click away from everything else. It’s very true. Your point is that it’s hard to track down the best stuff, again a filtering effort. And that’s my point. What we think of as “filtering” isn’t cutting the mustard in some important ways, so this is what might be making your literature searches all the more difficult. Too much is published, it all links, the connections confound the filters, and you’re stuck digging through it all with your personal filters.

If metaphors are untrue statements (as you say), then I hope they are not “how we fundamentally think about the world,” because that means what we fundamentally think is untrue. Aggregation is not a form of filtering, in fact it is the opposite of filtering. Writing is not a form of filtering, nor is searching. You are pushing the metaphor to the point of uselessness. If everything is filtering then there is no such thing.

Journals do perform a filtering function, when they reject papers that have been submitted. But their primary value is in having the papers submitted in the first place. That is the aggregation function. No filtering is involved.

You might try rewriting your essay without using the filtering metaphor. It would eliminate a lot of untrue statements (as you yourself characterize them). You might be saying something important. As it is, it is impossible to tell, because the filtering metaphor swamps the piece.

The filtering metaphor isn’t mine. It’s a common metaphor among journal editors and the STM journal marketplace.

Metaphors are how we think, even if we’re unaware of them (in fact, that’s part of their power — we barely realize it). My rhetorical point is that metaphors are untrue statements that are not lies — for example, “a heart of stone” doesn’t mean that the heart is literally made of igneous rock, but is cold and unfeeling. You compare journals to aggregators. I’d argue that’s your metaphor for journals (so, yes, you do think in metaphors to understand something, as you’ve illustrated twice now), but “journal as aggregator” is one that’s not widely shared. To me, aggregators collect journal outputs. So, I think your metaphor is wrong.

I also think the metaphor of the filter is wrong . . . now. Network connectivity has changed this, as I’ve tried to explore in this post. What will it mean to editors? I think it means that publishing more of what you receive makes sense. If authority in the network space is shared, why share it? Is the model of the multi-title publisher going to make even more sense?

As a cognitive scientist I do not share your theory of metaphorical cognition. Describing journals as aggregators is not a metaphor, it is a fact. Journals collect articles for a community. Since this is their primary function the network growth you talk about is not important to them.

I could tell you don’t agree. Thanks for making it clear. But other cognitive scientists do agree with this, and publish on it regularly.

It is not a “fact” that journals are aggregators, any more than it’s a “fact” that journals are filters. Both are metaphorical constructs. If it were a “fact” that Journal A is an aggregator for Community A, then every article in Journal A would be relevant to Community A and nothing else would be needed for Community A. Since this is clearly not the case, you can’t state it’s a fact. I think it’s just your way of thinking about it. Sorry. Everything you say just makes that all the more apparent.

But it is generally true that every article in a journal is relevant to the community that journal covers. How could it be otherwise? Journals even help define communities. There is no metaphor here, just a fact. Journals collect (i.e., aggregate) the output of the communities they cover.

That “nothing else would be needed” is not my claim; you have made that up. For one thing, in many cases several journals compete for the community output. Then too there are community portals, blogs, listservs, conferences, etc., all of which are useful, hence needed. The network has become quite complex, but the journal’s function remains intact.

Well, David, I’ve been in journal publishing for a long time, and I know that not every article in a journal is relevant to the community the journal covers. Is every article in “Nature” relevant to everyone who receives “Nature”? Not by a long shot. That’s not a ding on “Nature,” just a normal condition in the editorial world. Editors guess what their readers want, and that’s getting harder to do well. With increasing knowledge specialization, journals are treading on thinner and thinner relevance ice each year. You can’t legitimize your ideas by calling them “facts.” You like to use the word “fact,” but asserting an argument as fact doesn’t change reality. Your point is still just an argument, not a fact, and I disagree with your argument. I think it’s flawed.

Yes, I did extend your point to illustrate how it breaks down — logically, if there is a 1:1 correlation between Journal and Community, then each community would only need one journal, which is I think a fair extension of your aggregation metaphor. But, as you say, if journals compete for a community’s output, if they were aggregators, they’d take it all. But journals reject the majority of what they receive, so instead they filter. Aggregators don’t reject, they collect. That’s why I think your metaphor is wrong.

The journal’s filtration function is warped by the presence and functions of network filters. I try to explain all that in the post. To respond to those changes effectively and adopt functions that fit the new filters, journals must change, or accept the faster, recursive filters as a fact of life and all the changes they portend. As you note, all the other filters entering the network are needed (blogs, listservs, conferences, etc.), but you’re just agreeing with me. These filters add information to things they cover, so they’re different filters than the ones we think of when we think of journals — they don’t reduce the amount of information by filtering, but increase the amount of information by filtering it. Hence, Shirky’s “filter failure” becomes untenable, since filter success in the networked world leads to increased information overload.

That’s an interesting point, but I’m also bemused by the fact that whenever I pick up a journal I subscribe to, different articles in the very same issue seize my attention, because I have different things on my mind that day. I also get a huge amount of what I read by encountering it through semi-filtered blogs, etc.

Of course Nature is meant to give a wide-angle view of science, but even highly focused journals give me that same weird optical effect – like I’m at an optician’s and he’s putting different lenses in to test my eyes, and I stare at the same thing but see something different.

That was probably not helpful, but I agree, my filtration system is complex and not really well mapped to any particular journal.

Sorry for the confusion Kent. I use the term fact to mean “not a metaphor.” As when I say it is a fact that my dog is white, not a metaphor. If you have a better term I will be happy to use it.

The point is this. I do the science of information flow in science and “aggregator” seems like the most accurate description of one of the journals’ two central roles. People send in their articles and the journal compiles them and publishes the aggregate, for the convenience of the community. If you have a better word to describe this pattern I would like to hear it. (The other role, which I acknowledged early on, is filtering the input to pick the best.)

The aggregation function is quite interesting because it is an example of self-organization in a distributed system. That is, the journal does not go out and collect the articles, rather the distributed agents decide when, and to whom, to send them.

There is nothing in this many-one mapping to suggest that there is no room for competition, or that journals are perfect maps of communities, whatever that might mean. Nor is Nature a good example because its target community is the whole of science. Most journals serve small communities.

But the important point is that I see no “filter failure,” or any other reason for the journals to change, in what you say. But then, as I said in the beginning, you are arguing from a metaphor. I prefer a scientifically accurate description. As far as I can see all the new media support the journals, by acting like word of mouth advertising. Blogs, listservs and conferences are not alternatives to journals. In fact they make journals more important. Nor are they filters, if anything they are amplifiers, metaphorically speaking.


The solution is open, multiple metrics. Citation alone has inflated power right now, but with Open Access, it will have many potential competitors and complements. Multiple joint “weights” on the metrics can also be controlled by the user. And abuses can be detected as departures from the norm — and named and shamed where needed. It’s far easier to abuse one metric, like citations, than to manipulate the whole lot. (As with spamming and spam-filtering, and other online abuses, it is more like the old “Spy vs. Spy” series in Mad Magazine, where each spy was always getting one step ahead of the other.)

Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise. Scientometrics 79 (1) Also in Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.  (2007)

Notice that this solution would expand information, not filter it by reducing it to what we intended. Even “naming and shaming” would likely lead to linking, which is not qualitatively different based on intent, so to Google and other network filters, an authority “shaming” carries the same weight as an authority “naming.” Users contributing votes and the like would also extend the initial information, likely in unintended ways.

Networked information seems to have the same challenges as non-networked, in that intent of linking is not measured. However, there are additional challenges that make “filter failure” even more problematic — verbose filters, fast filters, and recursive filters. How do we manage all those to accomplish filtration?

I agree with the comparison to the crazy “Spy vs. Spy” strips. But imagine if that were happening quickly, from millions of artists simultaneously, and the panels coming together whether they are coherent stories or not. And that we thought our initial drawings were the definitive cartoons, while the ones we threw away weren’t going to find their way into our cartoon catalog. That’s networked “Spy vs. Spy” for information linking.

Kent, I think you might be underestimating the power of multiple metrics to quell and quench spurious run-away amplification of single metrics, like citation. Google is not the sole arbiter, nor links the sole metric, especially in OA space.

But you are right that one big bug of all spy-vs-spy developments in technology is the time-constant (turn-around time): To pick just one cartoon example, a bogus report that generates a huge short-term burst could poison us all before the negative feedback has a chance to close the loop.

(Let’s hope it doesn’t happen. Right now, it looks like analog technology, like weapons, is more likely to do us in, in one fell burst, than digital technology, but who knows, that might just be a temporary failure of imagination on the part of those bent on doing harm. Meanwhile waste might be doing us in before we twig that it’s too late…)

Gloomy thoughts for the morning!

I think things are generally getting better, with mis-steps here and there.

But my point is that even OA metrics are going to be networked in some way (talked about, announced, linked to, etc.), and they will not filter the information so much as supplement it, at least in the sense of a filter as removing unwanted particles. OA publishing or subscription publishing or what have you, most articles get published somewhere and linked across the network somehow. More metrics might just generate more links.

Information overload and filtering are complex issues. Some of what I’m going to argue below may be semantics, but I think there are aspects that are often overlooked when discussing such things. I’m splitting this into two comments that each address a particular issue, hopefully to help focus the discussion, but would love to hear your thoughts on each, Kent.

First, I think that information overload goes beyond just a question of filtering. We live in, as you put it, the systems age. Even with the most rigorous, the most sophisticated set of filters, there’s still more information that one needs to take in now than has ever been the case. As an example, there are more labs doing research and subsequently more journal articles being published than ever before. Let’s say your lab started researching biological process X twenty years ago. At the time, there were 10 labs working on that process. You had to keep up with the output of those 10 labs to stay current in your field. Now there are 200 labs working on the process, and the output is 20 times greater. That output is still vital information that you need to know in order to do your own research. There are cases where it’s not possible to filter things out, there are cases where you simply need to know more. Bioinformatics used to be its own field, now it’s just part of being a biologist. In the past, you didn’t need a detailed working knowledge of bioinformatic techniques, if you needed a sophisticated analysis you just found an expert and collaborated. Now if you’re going to be a good biologist, these are tools and knowledge-sets you need to have yourself, as part of your own arsenal. The same can be argued for Imaging, if you don’t have a good understanding of the physics of imaging and current fluorescent molecules and proteins, your work is going to be sub-par.

Many of the researchers I know are trying to deal with these sorts of problems. Is it really “filter failure” when you just need to know more stuff?

Okay, next issue. I think it’s very important to distinguish between “discovery” and “filtering what you’ve already found”. Both processes are often lumped together when discussing information overload.

Much of what you’re talking about in this blog posting relates to discovery: how do I find new information that’s useful for me to take in, and how do a journal’s actions affect the discovery process? Providing discovery is a lot of what we do here at the Scholarly Kitchen: we find interesting pieces of information, explain why they’re interesting, and provide links to the best sources for that information. That helps readers find information they may not have found on their own, and trusted sources like this are incredibly valuable time and effort savers for the discovery process.

But that’s different from the process that takes place once discovery has occurred. I’ve read your blog post, how do I decide which of the links to follow? I can use the first crude level of filtering to find lots of information on a subject that interests me–does it contain my keyword and is it published in a journal? Is it on the first three pages of a Google search? There, that’s discovery covered. I think people make too much of discovery; that’s not the hard part, and it’s not the real issue most are facing. The hard part is, how do I winnow that huge haystack down to the needles that I need? That’s a separate process that demands a different toolset. Discovery tools are a dime a dozen in the Web 2.0 world. Blogs and things like Connotea or Mendeley just add more hay to the stack, not less.

I think Stevan is on the right track in his comment above, suggesting better customizable search engines for research literature. Start with something like GoPubMed, which is a tremendously useful tool already. You’d have the already built-in filters of who, what, where and when. Then each user could select the specific metrics that they think are important, citation, journal, impact factor, links, blog coverage, user ratings, whatever. I’m sure we could argue all day about what metrics are best-suited, so why not include a suite and let the individual user decide?
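As a rough illustration of "include a suite and let the individual user decide," here is a minimal sketch of a user-weighted ranking. Everything in it is hypothetical: the paper names, the metric names, and the scores, which are assumed to be already normalized to a 0 to 1 scale. It is not how GoPubMed or any existing search tool works.

```python
def combined_score(metrics, weights):
    """Weighted average of a paper's metrics using the user's own weights.

    metrics: dict of metric name -> score, assumed normalized to 0..1.
    weights: dict of metric name -> non-negative weight chosen by the user.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    weighted = sum(weights[name] * metrics.get(name, 0.0) for name in weights)
    return weighted / total_weight


def rank_papers(papers, weights):
    """Return paper names ordered from highest to lowest combined score."""
    return sorted(
        papers,
        key=lambda name: combined_score(papers[name], weights),
        reverse=True,
    )


# Hypothetical, made-up scores for two papers.
papers = {
    "paper_a": {"citations": 0.9, "blog_coverage": 0.1, "user_rating": 0.3},
    "paper_b": {"citations": 0.4, "blog_coverage": 0.8, "user_rating": 0.9},
}

# One reader only trusts citations; another weighs all three equally.
citation_only = {"citations": 1.0, "blog_coverage": 0.0, "user_rating": 0.0}
balanced = {"citations": 1.0, "blog_coverage": 1.0, "user_rating": 1.0}

print(rank_papers(papers, citation_only))
print(rank_papers(papers, balanced))
```

The same two papers come out in opposite orders for the two readers, which is the point of letting each user pick the weights rather than arguing over a single best metric.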

As I said in my earlier comment, I may just be arguing semantics. It may be impossible to separate the two processes, as they should be complementary. I just think too much emphasis is being given to finding information, rather than eliminating information, probably because it’s an easier problem to address.
