In a recent piece in Library Journal’s “Digital Shift” section, Michael Kelley pointed out what looks like an alarming and growing problem:
A recently released study of e-journal preservation at Columbia and Cornell universities revealed that only about 15 percent of e-journals are being preserved and that the responsibility for preservation is diffuse at best.
The article goes on to point out that libraries and publishers are aware of this problem and some are taking concrete steps (evidenced by projects like LOCKSS and Portico and the Cornell/Columbia initiative 2CUL) to solve it. However, even those research libraries that participate in such initiatives usually archive only a few of their eligible holdings, and not all publishers allow their e-journals to be archived by third parties.
This article, and the study it cites, together raise a couple of interesting and difficult questions.
First, are the data accurate? This is a good question, but probably not a really contentious one. Even if the 15% figure is way off, the fundamental issue remains: lots of scholarly content is not being preserved in any kind of rigorous or even reasonably systematic way. I don’t think anyone would dispute that.
Second, how big a deal is this? That question is tougher and more fraught.
During the print era, scholarly publishers weren’t generally expected to perform a robust and reliable archiving function; they produced books and articles, sent them out into the world, and left it to others to worry about ensuring those products’ permanent curation. It was understood by everyone in the scholarly information chain that the fact that Yale University Press published a book in 1945 didn’t mean the press would necessarily still be making it available in 1965, let alone 2005. For the most part, archiving the book and ensuring its long-term availability to scholars was simply not part of the publisher’s remit. The same was generally true for scholarly journals.
The archiving-and-access function was performed by libraries—more specifically, by very large academic research libraries. But today, research libraries increasingly pay for online access (usually hosted by the publisher or a third-party aggregator) rather than purchasing physical copies of documents and curating them locally. Such an approach solves lots of problems for students and scholars by making access available remotely, around the clock, and to multiple simultaneous users, and by making it possible for libraries to offer access to far more content than they ever could have provided during the print era. But it also creates problems, among them the one pointed up by this report: a diffuse and ambiguous archiving mandate.
The report raises obvious and fairly urgent operational questions, and in largely setting them aside for the rest of this post I hope I don’t give the impression that I’m dismissing them. It’s not that I think they’re unimportant—it’s just that a) I have no answers to those questions and b) I know there are lots of very smart people (like Vicky Reich and Kate Wittenberg and the folks involved in the wonderful 2CUL initiative) working on them.
What I want to do here, instead, is back up and ask a larger and maybe even more troubling question: how important is it that we archive all of the scholarly record?
I realize this question may sound crazy. How could any reasonable person (a librarian, no less) suggest that the scholarly record doesn’t need to be robustly and fully archived? I’m not saying that it doesn’t, but I am suggesting that we should stop and think before we automatically assume that it does—and that if we do decide that it does, we need to make ourselves fully aware of the scale of the project we’re talking about.
Because let’s be clear about this: to say that we must archive 100% of the scholarly record is to propose an unbelievably monstrous undertaking. In 2010, the University of Ottawa’s Arif Jinha estimated that roughly 50 million scholarly articles had been published since 1665, and that about 1.5 million more would be published during the year in which he was writing. Citing Mark Ware, he predicted that this annual output would keep growing at a rate of 3%. If these numbers are accurate, then simply identifying and tracking the creation of all scholarly articles is a gargantuan task, and it will be dwarfed by the project of systematically capturing, describing, and robustly archiving them.
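To make that scale a little more concrete, here is a quick back-of-the-envelope sketch (my own illustration, not anything from Jinha’s paper): it simply takes his 50-million backlog and his 1.5-million-per-year figure and compounds the latter at 3%.

```python
# Rough projection of how many scholarly articles would need archiving,
# assuming Jinha's 2010 estimates (a backlog of ~50 million articles since
# 1665, and ~1.5 million published in 2010) plus the 3% annual growth rate
# he cites from Ware. Purely illustrative; the inputs are themselves estimates.

backlog = 50_000_000   # articles published from 1665 through 2009 (estimate)
annual = 1_500_000     # articles published in 2010 (estimate)
growth = 0.03          # assumed steady annual growth rate

total = backlog
for year in range(2010, 2031):
    total += annual
    if year in (2010, 2020, 2030):
        print(f"{year}: ~{annual / 1e6:.1f}M new articles, ~{total / 1e6:.0f}M total to archive")
    annual *= 1 + growth
```

Even on those assumptions, the pile of material needing to be captured, described, and preserved passes 90 million articles before 2030, and that counts journal articles only.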
Now obviously, no one expects that this project would be taken on by a single organization. The only way a comprehensive archive could possibly be created would be as a coordinated effort on the part of many entities. And in that word — “coordinated” — lies a challenge far greater than the already massive one of simply identifying and tracking 1.5 million+ articles per year.
One of the nice things about the old approach to archiving was that it was pretty much inadvertent — it happened organically and mostly without coordination as thousands and thousands of libraries around the world independently built their local collections. But that organic inadvertence hid enormous cost and terrible inefficiency. It also provided only an illusion of completeness and robustness; since there was no coordination, there was never any guarantee that the distributed archive resulting from all that collecting was truly comprehensive, or that if it was comprehensive today, it would remain so next year. If a well-coordinated, robust, and comprehensive scholarly archive was illusory in the print realm, it’s little more than a pipe dream in the online era, given the explosion of new documents and the wild and expanding variety of scholarly products.
Okay, so maybe we just have to accept the fact that an incomplete scholarly archive is inevitable. But this leaves us with another problem, because to say that it’s okay to archive less than 100% of the scholarly record is to reject a (probably impossible) program of comprehensive collecting in favor of an (overwhelmingly difficult) program of discrimination. Who will decide what will be robustly archived and what will not? What are the criteria, and who will determine them? Who will manage the process of discrimination? Who will pay for it?
The older I get, the more impatient I become with people who approach difficult issues with the attitude of “I have no answers; I bring only questions.” (I always want to respond “Whoa, dude, that’s really deep. But thanks for nothing.”) Honestly, though, I don’t know what else to say about this issue. The only really constructive proposal I can make is this: before we try to tackle the logistically daunting problem of comprehensive e-journal preservation, we’d better make sure we’ve addressed the politically daunting problems of deciding — in a rigorous and rational way — exactly how much of that problem we’re able to tackle, and then how we’re going to choose what gets left out. Because make no mistake: there is no way to avoid leaving something out. Better we should make that decision consciously (and painfully) than leave it (more comfortably, but less usefully) to chance and inertia. Do we have the guts to do that?