Surprise, Surprise - The Web Turns Out to Be Too Persistent

I was surprised recently when a former co-worker mentioned that he’d found a hobbyist blog I’d abandoned years ago, still available and discoverable. I can no longer delete it, because the email address I used when establishing it is defunct, and I don’t recall the password. Without a major effort, one I’m not willing to exert, I am stuck with it.

The recent court ruling in Europe establishing a “right to be forgotten” brings up interesting issues for scholarly and scientific publishers, who have spent the better part of the last decade bringing vast archives of old research reports online.

Last week, in the New Yorker, Jeffrey Toobin’s excellent article detailed some of the issues involved in the ruling and its aftermath. He starts with a story of a girl who was decapitated in a car crash, pictures of which leaked out of the coroner’s office and onto the Internet. These devastating pictures have proven nearly impossible to remove and difficult to conceal. The parents have had to deal with this nightmare on top of the nightmare of their young daughter’s death.

Remember the days when editors and others would scoff at the Internet as something that was too unstable and fleeting to compete with paper, the more durable and enduring medium? Toobin’s article touches on the nice way paper tends to vanish or become very obscure around the time it suits humans, while the Internet remains stubbornly persistent and timeless:

“Back in the day, criminal records kind of faded away over time,” [Sharon] Dietrich[, director of Community Legal Services in Philadelphia,] said. “They existed, but you couldn’t find them. Nothing fades away anymore. I have a client who says he has a harder time finding a job now than he did when he got out of jail, thirty years ago.”

Viktor Mayer-Schönberger, author of “Delete: The Virtue of Forgetting in the Digital Age,” is also quoted in Toobin’s story:

. . . digitization and cheap online storage make it easier to remember than to forget, shifting our “behavioral default,” Mayer-Schönberger explained. Storage in the Cloud has made information even more durable and retrievable. . . . “We do not know what the future holds in store for us, and whether future governments will honor the trust we put in them to protect information privacy rights.”

While some portray this as a “freedom of the press” issue, I don’t see it that way. Are we supposed to take the blinds off our windows to serve the press? The press having the freedom to say what it wants and to operate without answering to a governmental authority is far different from the press having access to all our information forever. The press can still contact us, ask questions, verify sources, and report as it always has. We have no obligation to serve it all up on a silver platter.

For scientific and scholarly publishers, there is a corollary issue, and it relates to our archives. They are, at best, mixed bags, a fact that has become more apparent as assumptions about journals as utilitarian resources for broad, non-expert audiences — rather than historical records for specialist research communities — have emerged. Some of the papers are classics and have aged well. Some are forgettable but harmless. Still others are misleading, wrong, and should be taken out of practical circulation, as they only matter for historical purposes. But we mostly manage our archives as if they are uniformly relevant and useful — or purely historical — as if there aren’t differences in what they contain.

Some journals and publishers manage these issues by segregating the archive at a clear cutoff point, say 1990 or 1995. Others build this cutoff point into their internal search engine, but that only addresses searches on their own sites, not general web search. Some label their archive articles accordingly. Each of these approaches acknowledges the issue but addresses it in a blunt and convenient way.

When journals do dive into their archives with intellectual intent, the picture often becomes more nuanced. Anniversaries and milestones are common reasons to do this work, and editors often find the archive to be quite different from what they’d imagined. There are funnier, stranger, more interesting, and more astounding things in there than they’d thought.

In fast-moving fields, the archive can be downright misleading unless accessed by an expert who knows the field. In medical fields in particular, historical information can include procedures that are no longer practiced, diagnoses that are no longer used, and tests that have been proven irrelevant and misguided. A disease once treated only surgically may now be treated medically, with radiation therapy, or with some combination.

And in an era with more access given to less qualified people (laypeople and an increasingly unqualified blogging corps presenting themselves as experts or journalists), not to mention to text-miners and others scouring the literature for connections, the obligation to better manage these materials seems to be growing. We can no longer depend on the scarcity of print or the difficulties of distance or barriers of professional expertise to narrow access down to experts with a true need. More and more, a simple search can unearth materials of questionable relevance, presented without condition or qualification.

An interesting issue with archival articles is what “peer review” meant in the period in question. If your journal stretches back to before 1900, chances are the peer review of those earlier eras was much less rigorous than it was after, say, 1950. Journals were scrambling for articles; many articles emanated from groups of colleagues who published one another’s work almost exclusively; and many journals were regional and therefore limited to a small group of labs or hospitals. Peer review under these conditions was highly variable and far less diverse than it is today. Should your archive provide some insight into what publication practices were like in 1895 vs. 1995?

Taking the time to sort through these vast archives and decide what to put in historical shadow is a daunting task. Toobin writes about how Google, Bing, and other search engines are scaling up efforts to assess requests to be forgotten. Our job is less daunting: our archives are static and finite, so sorting them is a one-time effort. But it is a major editorial effort nonetheless.

Such work is fraught with uncomfortable and unfamiliar editorial decisions. Leeches provide an interesting example. They are no longer mainline medical therapy, but they are of great historical interest and still inform some drug development approaches. Are papers about the application of leeches to be put in the “history” bin? Or should they remain in the main retail outlet? Surely, within the category “leeches,” there are some papers of purely historical interest while others remain somewhat relevant to current research pursuits. Making these distinctions would give our users not a right to be forgotten, but the right to forget.

The Internet has proven to be almost uncomfortably persistent. Our archives are, to some unknown degree, part of the problem, adding to search results articles of variable applicability and relevance. We are already facing recurring problems with a widened funnel of journals publishing papers of dubious quality. Over the last decade, we’ve also widened the funnel at the base, by adding huge archives. Mismanagement at either end is a sort of filter failure. Do we need to do some more work here, to provide our audiences with an implicit “right to an ahistorical literature,” one that is based more purely on pragmatic relevance than on historical habits?

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

22 Thoughts on "Surprise, Surprise — The Web Turns Out to Be Too Persistent"

In an era with more access given to less qualified people (laypeople and an increasingly unqualified blogging corps presenting themselves as experts or journalists), not to mention to text-miners and others scouring the literature for connections, the obligation to better manage these materials seems to be growing. We can no longer depend on the scarcity of print or the difficulties of distance or barriers of professional expertise to narrow access down to experts with a true need.

I think this may be the most revealing thing ever written on The Scholarly Kitchen. It’s hard to see a way of reading it that isn’t contemptuous of everyone outside the Magic Circle. Ideally, the great unwashed should be excluded altogether; but if we can’t do that, then at least we must tell them what to read and how to use it. Heaven forfend that we let Ordinary People make such decisions for themselves. That is for the priestly caste to do.

By Mike Taylor
Sep 30, 2014, 7:40 AM

Your so-called “Magic Circle” has no magic involved at all. It is, in reality, built of all sorts of varied expertise and training and educational attainments. Most of what we publish can’t be comprehended by most adults simply because of the reading level involved, not to mention the specialized vocabulary and jargon developed by each domain. Even cross-domain reading is severely inhibited by the differences in training and experience. Acknowledging these realities is what excellent educators do every day. Rather than just throwing out materials people can’t possibly understand or interpret properly, they translate to the level of the student and work forward.

As for the vast archives we’ve put online, they are replete with dead ends and misguided hypotheses. Putting up content, no matter its age, creates obligations. Managing our archives creates obligations, as well. With journals being viewed more and more as practical outlets and not merely records of scientific reports (an implicit assertion in your comment), those obligations intensify and change. Meeting them is not a simple task, and it takes expertise to do it well.

By Kent Anderson
Sep 30, 2014, 9:37 AM

Kent, are you presenting yourself as an expert on this topic? It sounds like it. If so then might you be an example of that which you complain of? Or perhaps intelligent discussion by people who are not experts is actually okay, a good thing even. I kind of think it is.

By David Wojick
Sep 30, 2014, 3:08 PM

I know a rhetorical device when I see one, and I like to flag it as such, even if it’s clever.

Examples are everywhere in archives, which often hold definitions of phenomena that are outdated (predating blood types, genetics, genomics, mosaicism, vaccines, germ theory, antibiotics, sterile surgery, anesthesia, imaging technology, and anti-inflammatories in medicine, just to name a few; prior to quantum theory, radio astronomy, discovery of numerous elements, nuclear theory, tectonics, and more in general science; and the list goes on) and therefore misleading to some degree or another if viewed in isolation or without the expertise to put them into context. One way to manage the archives would be to identify the concepts that emerged subsequent to the publication of the paper in question, leading people forward from the historical material more appropriately. But I think the more basic questions are, “Why are they there? Who are they for?”

By Kent Anderson
Sep 30, 2014, 3:39 PM

Publishers are keen to find a role for publishers in the 21st Century; librarians are keen to find a role for librarians in the 21st Century. It may be that there are indeed roles for both, but it will not be retrospectively rewriting the published record. We simply have to accept that what’s published stays published, and assume that people, given access to the corpus, will find solutions to the problems of searching and navigating it, just as people given access to the corpus of the developing World Wide Web came up with search engines, third-party tagging, RSS feeds, overlay collections, and more.

Simply put: whatever solutions the gatekeeper organisations might come up with, the lesson of history is that those solutions will be hugely surpassed and superseded by the far more numerous ideas that come from a worldwide community, some of which will be bad, but some of which will be brilliant.

Best just to get out of the way and let it happen. Any one of us might be able to contribute to the process; but none of us will benefit the world by standing in its way.

By Mike Taylor
Sep 30, 2014, 3:47 PM

Publishers are keen to find a role for publishers in the 21st Century; librarians are keen to find a role for librarians in the 21st Century.

As someone who has worked with many, many publishers, and who has directly supervised quite a few librarians, I can say with some confidence that this is also often true in reverse.

By Rick Anderson
Sep 30, 2014, 7:45 PM

Actually Kent, I am quite serious. You have a tendency to claim that small problems are grand and to propose strong measures to solve them, and this seems like a good case of that. If you want to control access to the content of the web and feed it to people in your prescribed fashion, then I have to disagree. In fact I think that Mike put it quite well. If your goal is to prevent error by controlling access to information, then I oppose you.

By David Wojick
Sep 30, 2014, 4:05 PM

You and Mike are framing the issues raised here as being about access, when I’m making points about responsibility. When we publish things, prospectively or retrospectively, we take on some responsibilities. I don’t think we’ve thought all of these through with regard to archival materials.

By Kent Anderson
Sep 30, 2014, 4:28 PM

So long as “responsibility” here means “providing additional annotations alongside published works to optionally help guide people’s interpretation”, then that’s helpful. Some of your earlier statements indicated that “responsibility” meant “restricting access to certain materials”. That’s what won’t fly.

By Mike Taylor
Sep 30, 2014, 4:30 PM

One of my thesis advisors, a Nobel laureate, told me never to read anything less than 50 years old that isn’t still cited. He was only half joking (a great deal of that would have been his own work! 🙂). I now work primarily in the biomedical domain, and there is a similar, nearly formal principle to the effect of: only read review articles, and only the most recent (or at least start there). This is essentially codified in PubMed: it reports results latest first, and has a special button to filter so that you only see review articles (although this is not the default). A large part of what makes a profession a profession, as opposed to a trade, is that the knowledge and skill are so difficult or obscure, and the domain so important or dangerous, that there needs to be what amounts to a peer review process to filter real from bogus practitioners; thus medical and legal licensing (as well as, I guess, driver licensing). Perhaps there should be a licensing authority for journalists, as opposed to the practice now, where pretty much anyone can call themselves a journalist. Perhaps one for professional editors as well.

By Jeff Shrager
Sep 30, 2014, 7:49 AM

The Internet has also made the term “out of print” archaic. This matters for the future of the rare book business because, almost by definition, no book published in the Internet age can ever be considered rare, given the availability of print on demand (POD). POD also renders senseless the idea of a “first edition.” So I guess the rare book business will be confined to books published before the Internet age.

The notion that an article or book may become regarded, in any given field, as of “historical interest” only plays right into the hands of the ARL and its Code of Best Practices on Fair Use, which argues that use of any such work is sufficiently different from the use of the work when originally published that it becomes use for a different purpose and hence “transformative” fair use.

By Sandy Thatcher
Sep 30, 2014, 11:29 AM

Reblogged this on DailyHistory.org and commented:
Ever notice that the comments you made in an obscure online chatroom in 1998 are still online? Kent Anderson’s article on The Scholarly Kitchen discusses the surprising persistence of postings on the Internet. The Internet Archive’s Wayback Machine seeks to keep web pages and other Internet content available even after they are left unattended.

Anderson addresses how this issue affects search results for peer-reviewed journals. While the issue facing research libraries is somewhat different, Anderson argues that research libraries increasingly need to distinguish between current content and historical content. When does current medical or scientific knowledge become historical, as opposed to immediately relevant for researchers and scientists? Search results do not necessarily make those distinctions. What role should libraries play in making sure that they provide search results that best fit the needs of the researcher?

By sandvick
Sep 30, 2014, 1:12 PM

Largely agree with other commenters on this. Kent, surely you do not mean to say that a publisher or aggregator should take it upon themselves to edit the corpus of historical scientific content and pick and choose what to delete? You can imagine the uproar this would generate. It seems to me that there is already a process for ensuring that invalid arguments and poor research are not given credence, and that is the process of peer-reviewed argument and counterargument, citations, and academic dialogue.

There are a few issues that would arise if we removed ‘incorrect’ scholarly content from the records.

First, what seems wrong today may prove to be of interest or even not wrong in future, as old ideas and research can become relevant again.

Second, when I did my PhD some years ago, looking back over the literature to see how arguments, methods, and thinking evolved was enormously helpful in understanding the academic debate and the evolution of thinking in my field.

Third, the information contained in an article may be of interest even if its conclusions are wrong, e.g., the methods used, specific data points, or the author and the context for their wider activity.

Fourth, most academics would discount older content anyway, in which case what exactly is the source of bias, and why is it necessary to edit or delete such content from the record?

Declaration of interest: I work for Elsevier, and shortly will be presenting a personal take on ‘The Right to Be Forgotten’ at the Cambridge Union. I have lots of sympathy for those who want stronger data protection online, but EU rulings forcing companies to edit search results and so on are not the right way to go about it, in my view.

By Gabriel Hughes (@gabehughes)
Oct 2, 2014, 9:30 AM

I am arguing for us to think about these things, and not to just put archives up in an era of increased access and more lay access without taking some responsibility for what we have retrospectively published again.

Many of your points are good points if you assume the audience is a sophisticated scientifically literate audience, but they become weaker if you acknowledge the increasing use of archives by bots, scientifically illiterate audiences, and algorithms. In fact, all four of your points assume scientific sophistication. With the barriers of distance, library curation, and others, those assumptions were largely true. They are no longer, and I believe we need to rethink how we curate our own archives. I agree, the EU decision isn’t correct, but it hints at something core to the Internet Age and how humans need to continue to craft the blunt technologies into something that works for us.

By Kent Anderson
Oct 2, 2014, 6:47 PM

Many of your points are good points if you assume the audience is a sophisticated scientifically literate audience, but they become weaker if you acknowledge the increasing use of archives by bots, scientifically illiterate audiences, and algorithms.

One of these things is not like the others. Is “scientifically literate audience” an existential category? Is the upshot to make sure that it remains a close approximation? Or is this an insinuation about general intelligence?

By Boris Ogon
Oct 3, 2014, 2:14 AM

In fact, all four of your points assume scientific sophistication.

They really don’t make that assumption. They do the opposite: they make no assumptions up front about who is going to be using published materials or how, instead leaving it up to the other 6,999,999,999 of us to come up with our own ideas and make our own choices.

The issue is this: who’s smarter? One publisher, or seven billion citizens?

With the barriers of distance, library curation, and others, those assumptions were largely true.

Is that what this is all about? Trying to sell the idea of barriers as something desirable?

By Mike Taylor
Oct 3, 2014, 2:40 AM

You need to re-read his points. The first assumes that someone is scientifically literate enough to distinguish between old ideas that don’t matter and old ideas that might matter. That’s very high-level scientific thinking and knowledge. The second point is about when he did his PhD. Of your 7 billion people, this is a fraction of a percent of people, with reading skills and knowledge that are both orders of magnitude beyond the mean. The third point mentions using knowledge from an article in a scientific setting, which again sets the bar very high. And the fourth point characterizes “most academics,” which is not anywhere near your 7 billion people.

I mentioned barriers of old because we are behaving as if they still exist. They do not. Therefore, our responsibilities as curators need to be rethought and reconsidered. Contextualization, education, and expansion around archives are all legitimate approaches to making them clearer and safer to use.

You seem to be arguing that editors, publishers, and authors have no responsibility to their audiences (especially when audiences broaden significantly because physical access is no longer inhibiting the flow of information) and that these highly trained specialists should just put materials out into the world without any differentiating signals and let the 7 billion figure out what matters. I think that’s actually a pretty elitist view — that everyone is to be treated as if they are elite, and there are no other types of people. Your approach strikes me as our version of “let them eat cake.”

By Kent Anderson
Oct 3, 2014, 8:13 AM

I’m surprised this needs to be spelled out, but here it is:

No-one expects all seven billion people to be capable of coming up with brilliant new transformative uses for research papers old and new. The issue is that we can’t tell which of the seven billion people are capable (or what ideas they will come up with). The most effective way to find out is to make everything available to everyone, and see who comes up with what. However well intentioned, introducing artificial barriers will not help this to happen.

By Mike Taylor
Oct 3, 2014, 8:22 AM

No one objects to adding explanatory content to the flow of information. You sounded like you wanted to restrict the flow, which is quite different. I would like to know more about the problem you are trying to solve. Only then can we discuss solutions, if any. But if you are trying to prevent stupidity then I am skeptical.

By David Wojick
Oct 3, 2014, 9:08 AM

I do think that while Mike and Boris are perhaps being overly kind to the world’s population in their comments, they do make a valuable point about our inability to sort both the material and the members of the population to whom it can be of value.

We do have clear examples where there are problems, and in fact dangerous problems threatening human health. The obvious example where the general public has failed to properly understand the scientific literature is the anti-vaccination movement, which continues to be spurred forward by citations of the infamous Wakefield article on the subject, and is responsible for outbreaks of long-dormant diseases.

While I agree that trying to hide or restrict access to articles such as this is the wrong approach, the question is worth asking: can we, as stewards of the scholarly record, do more to help the non-expert accurately steer through the historical record?

And for what it’s worth, we might consider doing the same for the academic community, given the propensity for citing retracted articles:
http://scholarlykitchen.sspnet.org/2012/08/10/the-secret-life-of-retracted-articles/
I wonder if one solution might be to use CrossRef functionality to build a database of retracted articles, which could power a tool that flags them for the editor’s attention whenever they are cited.

By David Crotty
Oct 3, 2014, 9:27 AM

“But if you are trying to prevent stupidity then I am skeptical.”

Very nicely put. That is the issue in a nutshell. Nothing we do can ever prevent stupidity; but by trying to prevent stupidity, we can also — as collateral damage — prevent cleverness.

By Mike Taylor
Oct 3, 2014, 9:28 AM
