Continuing our Kitchen Essentials series of interviews with leaders of infrastructure organizations, this week we are speaking with leaders of preservation initiatives. Today we’re hearing from Alicia Wise, Executive Director of the CLOCKSS Archive, the digital archive for academic publishers and research libraries.


Please tell us a bit about yourself – your role at CLOCKSS, how you got there, and why you embarked on a career in preservation infrastructure?

I started university a bit early at 13 and supported myself with an eclectic mix of odd jobs after moving away from home at 16. After finishing a PhD in Anthropology/Archaeology at the University of North Carolina – Chapel Hill, I landed my first proper job in the Archaeology Data Service which was one of the earliest data archives. From there I followed my nose to interesting challenges and spent a spell doing consortial licensing at Jisc for UK academic libraries, learning about copyright and publishers as CEO of the Publishers Licensing Service, helping book publishers transition from print to electronic at the Publishers Association, and eight years at Elsevier pathfinding toward open access.

Common threads running throughout were community building and engagement, successfully leading through strategic change, and innovation at the intersection of content, copyright, and digital technology. At that intersection there are terrific people and a very complex set of stakeholder relationships. I really enjoy enabling groups with potentially disparate agendas and objectives to talk and work well together.

This all made me a good fit for the CLOCKSS digital archive, happily, and I was thrilled to be entrusted with the next phase of its development. My predecessors Vicky Reich, Gordon Tibbitts, Randy Kiefer, and Craig Van Dyck and the many talented librarians and publishers on the Board and in the CLOCKSS community have got the organization to a very strong point.

What is digital preservation and why is it important?

In the 1990s, concerns began to crystalize about the long-term preservation of digital information. Traditionally, libraries preserved materials in print format, but in the digital age, libraries license access to, or pay for open access to, content that is controlled by others. Books and journals are stored remotely by publishers on a variety of platforms and this content is accessed by libraries and their users over the web. While super convenient for immediate access and use, this creates challenges for long-term preservation, access, and use. If a publisher ceases to publish or to support a title, or the library cancels their agreement, then the research content may no longer be available.

Digital preservation requires effort from both libraries and publishers working together to serve the research community. Scholarship is built up over time, and researchers need reliable, sustainable access. They also want to be confident that their contributions to the scholarly record are secured. This is important as it is their intellectual legacy!

Under the leadership of organizations such as the Commission on Preservation and Access, OCLC, and the Research Libraries Group, academic libraries began to systematically explore how digital preservation of academic resources could be accomplished. Various projects launched in the late 1990s, and some have led to the development of preservation services.

CLOCKSS is one example, and it originated in the LOCKSS project launched in 1999. Led by David Rosenthal and colleagues at Stanford University, the purpose of this project was to operationalize the principle that “lots of copies keeps stuff safe”. The copies are not random copies, but instead carefully documented, ingested, managed, and preserved copies in sophisticated open-source software. This software is used around the world by CLOCKSS and many sister preservation services.

CLOCKSS was established as an independent charity in 2008 and is jointly governed by libraries and publishers. We have a community of supporting libraries and participating publishers in 62 countries and we are united to protect our shared intellectual heritage.

What sort of infrastructure does CLOCKSS provide, and who are your users?

We are an archive that preserves digital scholarship, particularly formally published books, journals, and related materials. We work with libraries and publishers around the world. CLOCKSS is a dark archive which means that no one uses the content in the archive until it has been abandoned by those who placed it on the web.

To date we’ve been entrusted to preserve over 54 million journal articles and almost 500,000 e-books. We work with publishers of all shapes and sizes and business models all around the world.

Scholarly content is archived in a network of carefully controlled servers distributed around the world at leading academic institutions, and the nodes in this network are in constant contact to check and, if necessary, restore the authenticity of the content protected within. When content entrusted to us permanently disappears from the web, we then make it accessible to everyone.

The scholarly record is at risk without long-term preservation, and here I mean much more than a remote back-up copy. Long-term preservation requires active management to ensure that the content remains healthy, and vigilance in the face of changing technology, disk failures, hacking, and worse. Sadly, failure to preserve at all (or until it is too late) is the key challenge. These risks can impact any publisher, large or small.

When I took up the role of executive director, I rather naively imagined that most books and journals – at least academic ones – would now be safely preserved in digital archives. But this is sadly very far from being the case, and more is needed. According to the ISSN International Centre, in 2021 there were c. 2.8 million ISSNs issued. Fewer than c. 69,000 of these titles are fully preserved, which means that they are archived in at least three independent digital archives. Martin Eve of Crossref recently shared some equally sobering news. Keep in mind that if there’s no preservation service protecting the content to which a DOI is assigned, then if a publisher goes out of business or stops publishing a title, the DOI will stop working. In a sample of 7.5 million articles with DOIs, he could find no evidence of any kind of preservation for 2 million. Scholarly content needs to be preserved before adverse events, and preservation should be part of any publisher’s disaster recovery plan.

How is CLOCKSS sustained financially?

We are a not-for-profit charity based in California and sustained by financial contributions and service fees from libraries and publishers around the world. We would be delighted to welcome your organization to the CLOCKSS community. For more information about the benefits of membership and how your participation can positively impact the future, please see our website or email me at awise@clockss.org.

What do you like most and least about working in preservation infrastructure?

Two terrific things are collaboration and the mission. It’s inspiring to see libraries and publishers working in partnership with one another, sharing knowledge and best practice to achieve a common goal. The team at C/LOCKSS, as I’ve taken to thinking of CLOCKSS and the LOCKSS team at Stanford, is full of wonderful human beings and it is a real joy to work with each one. Our mission is inspiring: to ensure our intellectual heritage remains accessible through time. This motivates us every day.

In trying to answer this question, I’ve re-discovered just how Pollyannaish I can be. There are lots of challenges to this multi-faceted role, but I enjoy challenges and the variety that CLOCKSS offers.

Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in research infrastructure?

That’s a brilliant question; however, I question whether anyone wakes up and thinks “I want a career in research infrastructure” or indeed any kind of infrastructure. It’s super important stuff, but not necessarily on the radar. In fact, infrastructure works best if it’s invisible and just happens without apparent effort.

An archaeological analogy might be interesting here. The Romans and Victorians built excellent water systems, and in many parts of the world these are still in use. Rather incredible, really, given the passage of so much time. Anyway, good physical infrastructure is not seen or heard and so too for research infrastructure.

This of course is one of the reasons it is challenging to fundraise for research infrastructure. We need more people with creative ways of making infrastructure compelling and important… especially when it is working really well, and it is invisible.

As the leader of a preservation infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?

It would be a real relief to know that all parts of the scholarly record are properly preserved in at least 3 trusted archives. A trusted archive is one that has demonstrated its ability to preserve content and its usability in the long term. This can be demonstrated through means such as:

  • Relevant certification including peer review by library experts (e.g. CRL TRAC audit, ISO 16363)
  • Demonstrated mandate and funding
  • A proven track record of preserving academic content
  • Clear, transparent documented agreements, workflows, and processes to ensure long-term access to the repository’s contents
  • Open provision of information about their holdings on their websites, and via the KEEPERS registry (for content with an ISSN)
  • A succession plan so it is clear what happens to content if the archive goes under

To properly look after scholarship requires global collaboration and partnership, in fresh and creative ways. It’s time to decolonize collections and organizations, and find ways to be more diverse, equitable, and inclusive.

Organizational governance and structure are so important, necessary both to ensure long-term sustainability in a changing environment and to empower change and innovation. Flexibility, resilience, and strength are needed!

We need to debate more explicitly the interplay of print and digital curation if we are to formulate the correct balance. Both are needed. Both have environmental and financial costs, and neither our purses nor our planet is indefatigable.

Looking at your own organization, what are you most proud of, and what keeps you awake at night?

I’m incredibly proud that CLOCKSS is a harmonious community of academic librarians and publishers who believe in our mission. These stakeholders need to work together on digital preservation, and it’s a privilege to be part of an environment in which they want to work together.

Sleep is super important, and it’s a post-pandemic goal to live my life in ways more conducive to getting enough of it. That’s a fancy way of saying I don’t lie awake worrying about work anymore. I did have one very sleepless night early in 2023 when it seemed likely that our bank might go out of business before dawn, but that night ended well, and we have a much more robust system in place.

What keeps me on my toes during the day is thinking about what the CLOCKSS archive needs to do now to position it to be successful 50 or 100 years from now. One of those things is a closer focus on collections management. We don’t make the content we preserve accessible right away, but the pace and volume of content we are opening is increasing and will continue to increase over time.

What impact has/does/will AI have on CLOCKSS?

As a dark archive, the content entrusted to us is not available for users, whether human or machine, until it is no longer available in other ways. The content we preserve is a tempting collection of quality-assured content. We therefore need to remain vigilant about security.

We also need to think carefully about whether and how the entire online ecosystem in which scholars collaborate, discover, and share new knowledge needs to be preserved. To make sense of 21st century scholarship, do the AIs also need to be preserved alongside the persistent identifiers and knowledge graphs in our ecosystem? The task of digital archivists has evolved beyond ‘merely’ preserving content. In truth at present, we cannot preserve the entire online ecosystem of research. We can take snapshots and preserve meaningful points in a scholar’s journey through this ecosystem, and it is essential that we do so as we continue to reflect on responses to the many ways that our current boundaries are extending.

What changes do you think we’ll see in terms of the overall preservation infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at CLOCKSS?

Earlier I mentioned the parlous state of digital preservation in journal resources with a DOI or ISSN. No one even knows what the figure is for e-books. The infrastructure that exists in the journal world simply is not in place for the book world. We’ve got an exciting Proof of Concept project underway to try to crack this. It’s led by a wonderful consultant named Ruth Jones who is working in partnership with organizations such as EDItEUR, Nielsen Book Data, and OCLC.

Standards such as MARC and ONIX have a clear role to play in that project, and standards more broadly are an important growth area in preservation overall. I’m privileged to be chairing an ISO committee developing the EPUB archival standard, and this should help more e-books to be more preservable. With standards must come better ways to assess whether practices align, along with training and support to enable more good practice.

We need fresh approaches that make it easier for more materials to be preserved. Let me illustrate this point with reference to the JASPER project. CLOCKSS is a partner in this initiative along with the Directory of Open Access Journals, the ISSN International Centre’s KEEPERS registry, the Internet Archive, and the Public Knowledge Project private LOCKSS network (PKP-PN). The project was established in response to studies by Mikael Laakso and others showing that hundreds of open access journals have disappeared entirely from the web in the last 20 years, and that more than 7,000 titles registered with DOAJ have no preservation policy or archive in place. We’ve created a content pipeline which enables content to flow easily from DOAJ via the Internet Archive to CLOCKSS and PKP-PN. By working together in new ways we’ve been able to provide a really easy way for diamond OA publishers to archive their titles.

Vital services, such as those provided by CLOCKSS and other research infrastructure providers, must be underpinned by robust and sustainable business models and funding sources. The content and services entrusted to us continue to grow and to evolve in nature and complexity. There is increasing diversity in the organizations that publish and when and how scholarly outputs are disseminated and managed. The pace of innovation is lightning fast, and this certainly looks set to continue.

Roger C. Schonfeld


Roger C. Schonfeld is the vice president of organizational strategy for ITHAKA and of Ithaka S+R’s libraries, scholarly communication, and museums program. Roger leads a team of subject matter and methodological experts and analysts who conduct research and provide advisory services to drive evidence-based innovation and leadership among libraries, publishers, and museums to foster research, learning, and preservation. He serves as a Board Member for the Center for Research Libraries. Previously, Roger was a research associate at The Andrew W. Mellon Foundation.

Discussion

1 Thought on "Kitchen Essentials: An Interview with Alicia Wise of CLOCKSS"

Alicia, thank you for such a clear and insightful overview of CLOCKSS’s history and mission! I appreciated the overview and learned a lot reading it. And, Roger, thanks for shining a light on these important corners of the scholarly infrastructure – often the unseen work that keeps things running – literally in the face of an apocalypse in this case, or perhaps a paper-clip maximizing misaligned superintelligence – we could need CLOCKSS as part of the audit trail for the accumulated knowledge of humanity.

Reading this interview, I couldn’t help but think of the “perpetual motion machine” analogy from the article Alicia co-authored in 2022 (Cramer et al., https://onlinelibrary.wiley.com/doi/full/10.1002/leap.1494). The preservation challenge feels even more dynamic with the rise of GenAI and its ability to transform and synthesize content. This just reinforces the importance of what CLOCKSS does and the need for the funding and collaborative effort you both highlighted. Thanks for this post!
