
One of the major powers of Google is that it can crawl sites at a deep level and unearth interesting and relevant content independent of any hierarchy otherwise imposed by the content’s owner or publisher. The home page, the table of contents, the chapter structure: such things are superfluous. You run a search on a topic, and you see an ordered list of results based on what Google’s algorithm thinks is most relevant and useful for the term you’ve entered. PageRank is about relevance and granularity, not elegance and long-form thinking.

Load up digital music, and depending on the system, you can shuffle it, find it reordered alphabetically, or play all of an artist’s songs sequentially. Digital music carries with it only so much information: genre, artist, song title, duration, and date downloaded are some common elements. But the information about what order to play the songs in often gets lost as the music moves around your digital ecosystem.

Follow a Twitter stream, and you’re exposed to articles elevated by who you follow. Are these articles important in the minds of experts in their areas? Or filler being elevated by your friend or associate? Are you in the deep end of the pool? Or in the shallow end?

In each case, the order established by the originator of the content — the journal editor, the book editor, the newspaper editor, the producer, the artist — is usually lost. The article or chapter that Google thinks is most relevant to your search query may be something an editor put toward the end on a whim, or as a concession, or as filler. But Google doesn’t know this. The song that leads the digital music pile of your latest Florence + The Machine indulgence may be something from the middle of the album, and so a song meant to thematically bridge the somber early section with the later section more redolent of absolution becomes the album’s lead because of technology, not artistic design.

Journals often put their papers in order, but each journal’s ordering system and approach is unique. For some, news leads. For others, it’s opinion or a round-up. In the scientific world, the lead article is usually the one the editors deemed most important at the time, with a rough pecking order following. Toward the bottom of the list, some really esoteric stuff might find its way in.

That’s not to say that journal editors always get it right. The original Watson-Crick paper on DNA was somewhere in the middle of its 1953 issue of Nature. Outside of print, you can’t tell where in the issue the paper appeared. Nor was peer review then what it is today, and the sorting pressures weren’t nearly as intense. But it would be interesting to see what else was in the issue. Instead, our age of atomization has broken apart these old relationships, leaving little trace of the packaging of yesteryear.

Packaging is merely the physical manifestation of authority. If you control the packaging of content, you have a certain authority over it. You can put a lesser article in the proper spot. So when Google or another search engine responds to a user query and puts that same article at the top of a results list, the authority shifts — from editor to user/algorithm. This is why algorithms are forms of editorial expression. They recast authority.

Packaging is lacking in many newer publishing initiatives, which are filled with streams, refresh-dependent sortings, and the like. Order and organization are secondary to flow and dynamism. That is democratic, but perhaps there is some signal a leader could send to the community. How do you send a signal about hierarchy in a stream? In a newsfeed?

A question that lingers in my mind constantly is, “Does [blank] make things better or worse?” And the honest answer is usually, “I don’t know.” It’s hard to judge where certain initiatives or design approaches will end up, what benefits might lurk at their heart. Were the old ways of ordering information actually misleading and far too idiosyncratic to defend? Or were they better, more interesting, and important? Is the speed and precision of search far more valuable than a carefully considered package of information? I don’t know.

But as a firm believer that “and” is preferable to “either/or” in most cases, I do find having fewer clear, local controls a little distressing. The technology behind this blog assumes that the list on its front page is in reverse-chronological order. I can set which articles are sorted into that framework, but not their order. In a more controlled publishing environment, I’d be able to set not only which articles appear, but the order in which they appear. I could send more, and different, messages.

Now that we live in a world of increasingly undifferentiated lists assembled by opaque technologies, are the advantages we’re gaining in access to information coming at the price of organization, comprehension, and community? Or were these old approaches feeble attempts at precision, curation, and messaging, a low-yield and vain attempt at information control that we’re better off without?

I have a feeling we may never know. That’s the thing with change. Sometimes, the alternative overwhelms the predecessor so completely that it pushes it off the page.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.


25 Thoughts on "Digital Disorder — Losing the Signal of Priority and Selection"

Packaging is merely the physical manifestation of authority. If you control the packaging of content, you have a certain authority over it. You can put a lesser article in the proper spot. So when Google or another search engine responds to a user query and puts that same article at the top of a results list, the authority shifts — from editor to user/algorithm. This is why algorithms are forms of editorial expression. They recast authority.

This is a pretty good manifesto for 21st Century communication. The consistent trend, in so far as anything is consistent, is away from the notion of there being any single arbiter of taste and importance, towards approaches that let each individual assemble collections that are right for him or her at that particular moment.

One kind of right-for-me-at-the-moment collection is of course a curated one, and there will always be a place for high-quality curation services. The important point is that those curation services are now multiple and can be chosen as needed — they are layered on top of the units of information rather than being a by-product of how that information is produced. More flexibility, more competition — everyone wins. (Except whoever used to be the sole authority, of course.)

If the creators of the algorithms are our new editors in certain venues, I wish they would be transparent. Who are they? How do their algorithms work? What are they leaving out? What assumptions have they made about me, my query, or my link? Spider-Man has it right: power brings responsibility. The filter bubble and other manifestations of opaque and fungible algorithms may not meet the level of responsibility we need.

Well, the glory of the new way is, if you don’t like or don’t trust a certain editor — don’t use that editor’s work. There are other editors/curators out there. At a simple level, our blog Sauropod Vertebra Picture of the Week acts (among other things) as a curation and aggregation of newly published sauropod research. But the research is in no way dependent on our doing that: to use a much-abused phrase, we add value; but you can take or leave that value as you wish. And anyone else who wants to provide a similar service is welcome to — they don’t need our permission to compete with us, or the papers’ authors’ permission to discuss and aggregate them.

That seems to me like a much better world than the old one where there was One True Curator, and you were obliged to use him and only him.

But why do we have to sacrifice one for the other? That’s my essential question, I guess. Why can’t we have it both ways? Or are the technology lords being lazy, power-hungry, or both? It seems like you could present both the atomization and the integration.

Well, there is certainly no reason why the editor of a journal shouldn’t be one of the (potentially many) people to offer a curated view of the information contained in it! And of course the editor is in a great position to start out with a competitive advantage, both because of the prestige he starts out with and because he knows the material well, having handled it.

I think the reason why that doesn’t tend to happen is that what editors want to curate is the set of articles in a particular issue of a particular journal. But that is a rather arbitrary division that’s not really of much interest to most researchers. I know that in my own work, which journal a given paper was published in is a pretty small part of what would make me care about it — often it doesn’t register at all, and I couldn’t even tell you what journal something was in. And that I do think is wholeheartedly a good thing: it means we can partition the space of papers according to something relevant like subject or author rather than the historical accidents of venue and publication date.

So to follow up my own comment, the question for editor/curators becomes: are they in a position to better curate cross-journal collections (e.g. a subject collection on sauropod neck posture) than anyone else? Does their editorship give them an advantage? And I’m not sure that it does. (Except of course when the editor is also a subject expert; but in that case, it’s not his editorship that contributes the advantage.)

More filters are better.

If one assumes that everything worth publishing gets published, then we’ve effectively decoupled the filtering mechanisms from the yes/no question of whether the material is made available at all.

Most of the researchers I know value the filtering done by journal title. They have in mind a set of journals that is the “core” set of outlets for their particular research interests. They pay particular attention to these journals, subscribe to eTOCs, etc. They then fill in around that core set through a variety of other means: saved PubMed searches (inclusion in PubMed provides another level of filtering), Google Scholar, F1000, and suggestions from colleagues.

As other commenters here have noted, some researchers don’t use this particular filtering mechanism. But it does not prevent them from finding the material they seek. More options are better as it allows the user to refine things as they best see fit.

I see the danger here is less the absence of transparency and more the presence of monopoly.

I doubt that many would take the time to understand Google’s algorithms to answer “Is it right for me?” Those who would study them would do it to answer “How do I confound them for my own evil purposes?”

I am happy to test “Is it right for me” by the results. I am grateful to Google news for providing me with a selection of news suppliers, so that I can read several versions of the same story, knowing the different biases of the various suppliers. But I’d like there to be competitors to Google News so that I can defend against its biases.

Google is great. But when it goes evil – which it surely will – it is going to be a great evil.

Of course, monopoly is always a danger, and one to be avoided.

But that is exactly what decoupling production from curation does for us: it allows multiple curators to compete for our attention on the basis of quality, price, service, and relevance. I am not too worried that Google is winning in many categories at the moment, because it has no very compelling way of locking us in. Competitors exist, and will start to see more business as Google loses the qualities that enabled it to become the market leader in the first place.

In short, what we have now is a market. Whereas in the Bad Old Days we had a de jure monopoly.

This becomes particularly evident when those algorithms shift from their initial purpose of providing the most relevant answers to instead being used to drive new business opportunities such as the Google+ social network. The filter bubble issue is particularly pernicious because so few people realize that their search results are customized and based upon their previous behavior.

Transparency helps one make the sorts of informed decisions Mike is celebrating in his response above, but that same transparency is often not in the best interests of those trying to sell ads or make money in other ways. I suspect that most companies will offer as little transparency as they can get away with.

Well, I am cautiously celebrating. Because whatever else we do or do not have now, we at least have a market, which means competition and choice. That’s nearly always good for consumers, even though it’s never perfect and tends to be messy.

Truth be told, the ordering of papers and their place of publication is usually of only minor importance for the kind of research I do. I’m more concerned about the content of an individual paper; only in the rarest of circumstances do I read a journal issue or technical volume like I listen to an album. There are enough journals out there spread out over so many years that it would be futile to do so when trying to investigate a topic. Tools like Google Scholar are most powerful in that they aggregate knowledge from across the years and the journals. Of course, these tools are also annoying when they start throwing in “creation science” articles next to legitimate publications. There are trade-offs with every system. The table-of-contents system (for lack of a better descriptor) has its primary advantage for those who are following the field in real-time, rather than post hoc research. As Mike points out, though, this function is at least partially being superseded by other modes of aggregation.

(BTW, I am a real-world curator, and wish that the current buzzwords “curator”/“curation” would just die a quick, fiery death. Yes, curators put things into neat little boxes and sometimes decide what goes on exhibit, but they also ensure long-term preservation and accessibility, something that most digital “curators” definitely do NOT do. I suppose the overlap between the two worlds is in the selection function, but maybe in that case the word “editor” is more appropriate than “curator”? Apologies to the editors reading this.)

Andy, apologies for the unwelcome use of “curate”. But what else do you suggest for this? “Edit” really won’t do, especially in the context of a conversation where we’re asking whether editors can be good curators!

BTW, good point about “only in the rarest of circumstances do I read a journal issue or technical volume like I listen to an album”. Although I long ago converted to listening almost exclusively to MP3s rather than physical media, 90% of my listening is whole albums. I certainly wouldn’t recommend that anyone listen to random tracks selected from Dark Side Of The Moon!

No need to apologize – the term is everywhere! One can’t really fight societal usage. I think “editor” is still appropriate because of the overlap in functions (collation, discernment of value, etc.), but “content collector” might also be. That too, unfortunately, leaves out the fact that these digital “curators” are practicing some selectivity. In the end, I’ll probably just have to accept that the word “curator” is going to have different meanings in different contexts – just like “doctor” or “theory.”

Moving from a journal/issue economy to an individual article economy does raise a new set of challenges. The Watson/Crick paper you cite is a great example. It was originally published as a set of three papers. Rather than pooling their resources into one paper, Maurice Wilkins and Rosalind Franklin chose to publish separate papers together in the same issue with Watson and Crick. These other papers offer more information about the evidence at hand and historical context for the relationships between the different groups. If one merely sees one of the three papers alone via a search result, important information is lost.

Which makes it important for journal publishers to create new ways of linking these sets of information, particularly from within each individual paper. There are lots of interesting new methods of calling the reader’s attention to related articles, both based on manual curation and automated methods like semantic analysis or user behavior. The age of the carefully crafted issue is long gone.

One particularly useful new method has been having editors feature articles that they think are particularly important or interesting. This can replace the outdated method of putting the top article first in the issue’s table of contents. To be truly effective though, this designation must be made clear on the article itself, both the html and pdf version, so the curation is noticeable on the article level.

Most online journals post the table of contents for each issue. If the reader wants to know the order, it’s there. Also, many show “related articles” links to companion materials. True, you can’t flip through an electronic journal page by page, but a good website provides the alternative online functionality.

I’m not sure I see the connection.

Beltrao’s post is about a book that came out a while ago. The issue has been covered repeatedly in much bigger outlets than that blog (see http://mediadecoder.blogs.nytimes.com/2012/02/07/my-dinner-with-clay-shirky-and-what-i-learned-about-friendship/ or http://news.yahoo.com/breaking-internet-filter-bubble-195831633.html or http://boingboing.net/2011/05/23/the-filter-bubble-ho.html as examples).

Beltrao does a nice job discussing the inherent worries around personalized search. Kent’s posting here asks questions about the loss of curation by the originator of the content, and whether that’s made up for by all the other means of filtering available. Kent doesn’t even get into questions of personalized search; his description of Google is perhaps a bit antiquated, as it talks about finding the most relevant results for the user, rather than the ones most favorable to Google’s commercial goals.

Even if it’s not Beltrao that prompted this, you’d think The Filter Bubble (http://www.thefilterbubble.com/) itself would get a mention in the post, no?

To me, it seems to be clearly where the idea for this post came from. Kent even uses the phrase himself in the comments section. Is proper scholarly attribution not important in The Scholarly Kitchen?

I can see how they’re at least somewhat related. One could have brought up the subject, but there are endless things one could have brought up as well. Is Kent wrong for not mentioning and linking Google+ and Google’s new “Search Plus Your World” initiative? Should he be faulted for not mentioning and linking filtering methodologies like Mendeley or Faculty of 1000? If anything that has anything to do with a subject must be cited, then you’d have an insanely enormous posting that lacked focus.

To me, the “filter bubble” issue is about using one particular information source that tailors the information you see to your previous activities. This post is not about that. This post is about the way editorial boards would tailor a journal issue (or even an overall journal) in ways that added information to what could be gleaned from the individual article. That means an expert outside of your own experience was putting information into your worldview, rather than just having your existing worldview reinforced. We’ve lost that context, in some ways for better (we have vastly more efficient ways of sorting vastly larger amounts of information) and in some ways for worse (there are some spectacularly smart and talented journal editors out there who offer great insight that’s useful for filtering).

One could certainly make a related argument about the filter bubble and how it plays into this issue, but the issue itself is a separate one and the idea presented here is different than that driving the “filter bubble”.

Never saw his post. Wrote this on the train earlier this week. Coincidence.

Thinking on a purely practical, production-focused track. . . an article’s metadata can represent a relationship to another article in a very basic way (“if you are reading this, here’s a related article”), and these relations can be used by particular publishing platforms to aid a reader’s discovery once he or she is at the primary article. Could more metadata be added to indicate one or several of the following: the kind or degree of relationship between the articles; a special relationship such as suggested order (an especially useful piece of information for the types of “dialogs” that are published in humanities journals’ issues); the editor’s sense of significance of a given article.

That of course would have no bearing on how, say, Google chooses to read that metadata and factor it in, but it’s there if another entity (the publishing platform, a research tool like Mendeley or Zotero, etc.) wanted to make use of it.

The answer to why we can’t have both is scale. In the words of David Weinberger, the Web is too big to know. Search and social curation are the only methods that scale to the scope of the web. Before the web, organized knowledge represented only a tiny fragment of all the data that was created, all the thoughts that were written, all the songs that were sung. Massive rejection, eliminating better than 99%, including much that was valuable as well as much dreck, was the first step in organization. The Web has given us access to the long tail of information, and in this case 99% of everything is in the long tail.

Whatever the shortcomings of search and social curation, therefore, their killer app is scope. What I think we have to do is start taking a different approach to local data sets. Organizing them into sequences and hierarchies is not very productive, because people can enter them from anywhere. Hence my own contention that on the Web, Every Page is Page One (this also happens to be the title of my blog: http://everypageispageone.com). Rather than arranging our local collections, therefore, we should focus on linking them.

Of course, organizing web pages and sites into sequences and hierarchies is what the old Yahoo! did, back when it was an acronym for Yet Another Hierarchical Officious Oracle. It didn’t take them too long to realise that the Web is just too big for that, and that an automated approach to classification (i.e. a search engine with a crawled index) is necessary if you’re going to keep pace with the growth of information.

So I agree that in general scale is the problem. Throwing away 99% of everything was never a good way of doing things, but back when the only way to publish was the expensive way, there wasn’t an alternative. Now, in 2012, it’s a feature, not a bug, that Google Scholar searches find blog pages as well as conventionally published papers.
