As the new academic year began late last month, the University of Michigan was faced with a major interruption in internet service. This downtime affected not only access to online resources but also the university’s ability to provide resources to the rest of us, including those from HathiTrust, ICPSR, and the University of Michigan Press. Given the recent attention that colleagues and I have been giving to shared infrastructure for scholarly communication, I have been reflecting on the importance — and limitations — of academy owned infrastructure.
Principles and Marketplace
Shared infrastructure for scholarly communication takes many forms. As our project team has scoped the field, it includes all the platforms and services necessary for publishing organizations to do their work, everything from identifiers to repositories, from hosting platforms to preservation services. In our work on this project, my colleagues Tracy Bergstrom and Oya Y. Rieger and I have heard many arguments in favor of academy owned or academy controlled infrastructure.
These arguments take several forms. Some advocates and observers are looking for an alternative to publisher-owned journals that allows the academy to control its publishing activity. Others are looking for a stack of infrastructure that is more completely academy controlled, recognizing that newer systems and platforms, not just content and publications, are enormously significant to the research and teaching enterprise. I continue to hold the view I first advanced in 2018, when it was still novel: the academy needs to think with far greater seriousness of purpose about how to avoid the monopoly or oligopoly dynamics that can result from commercially provided research infrastructure. Through an array of models, organizations such as Invest in Open Infrastructure and the Center for Open Science, among others, have taken up the call for academy controlled infrastructure in one way or another.
Many of the arguments for academy owned infrastructure are easy to accept in principle. All else being equal, it would indeed be preferable for the academy's interests and values to be reflected more squarely in the research systems on which it increasingly relies. But as we move beyond principle, we face tradeoffs in allocating resources. In one tangible example of this dynamic, my colleague Oya Y. Rieger and I found that while librarians and archivists may believe in the importance of academy controlled infrastructure, when collecting organizations choose the digital preservation and curation systems they will adopt, system providers "compete within a marketplace that recognizes organizational values as one characteristic among many, such as the total cost of implementation and the feasibility of local implementation." Similar dynamics shape enterprise software decisions elsewhere: among not-for-profit publishers choosing journal hosting platforms, and among academic libraries choosing library systems platforms.
Downtime and Recovery
These issues are perhaps best examined through a specific example, one in which options and opportunities are constrained and risks and tradeoffs can be weighed. Last week's news at the University of Michigan is such an example.
Michigan experienced some kind of systems breach or threat that led to the university's "intentional decision to sever" its internal network and systems from the Internet for several days. I have heard speculation that few scenarios other than a nation-state infiltration (perhaps in search of classified or sensitive information) could have prompted such a response, but whatever the cause, the consequences were substantial. Not only were university operations thrown into chaos at the very start of the academic year, affecting everything from course registration to interlibrary loan and from payroll to research, but services provided to the academic sector beyond the university itself, including publishing platforms maintained by the University of Michigan Press/Michigan Publishing as well as HathiTrust and ICPSR, were also affected, and in some cases went dark for days.
This outage is notable not because a university experienced downtime (this has happened periodically at many institutions) but because the University of Michigan has long been a leader in providing digital library services, not only locally but for the higher education sector as a whole. JSTOR was founded at Michigan, and Google Books could not have developed as it did but for the university's leadership. Today, the university provides a number of important services, through various business models, for the higher education and research sector. These include the university press and its Fulcrum publishing infrastructure, which also hosts other publishers and imprints, including the ACLS Humanities Ebook Collection, Big Ten Open Books, Lever Press, and Amherst College Press. They include HathiTrust, a partnership of several hundred academic libraries whose archive of many millions of digitized books became a lifeline during the worst disruptions of the pandemic. And they include ICPSR, a collaboration of hundreds of universities and other organizations that operates a major research data archive vital to the social sciences. All of these services were disrupted by the recent outage at the University of Michigan.
I am at pains to emphasize that all kinds of publishers and digital service providers experience downtime. Reliability has improved since the early Internet days, but it is still common on librarian discussion forums to see questions about whether a given resource, or a specific authentication service for it, is down. That said, a main reason reliability has improved is the steady movement away from local hosting of servers, for example in a server room at a company or on a university campus, toward outsourcing to cloud services such as those provided by Amazon, Google, and Microsoft. Because of their enormous scale and sophistication, including various forms of redundancy, these cloud providers deliver high levels of uptime, which is one of the dimensions on which they compete with one another for business. To be sure, they are not infallible; when Amazon Web Services has gone down, even partially, the outage has affected numerous services and become a major news item (here is coverage of one recent episode). And a critic could argue that their very scale, notwithstanding the built-in protections and redundancy, makes them a more centralized point of failure.
The Michigan services were locally hosted, on academy owned infrastructure, rather than outsourced to a commercial cloud. They were subject to the network administration, security, and other resiliency policies and expenditures deemed appropriate for the university as a whole. Even if the Press, HathiTrust, or ICPSR had a service level agreement with the university's IT department, as units of the same organization they would not have had any contractual protections available to them.
That said, academic services like these can also make their own choices about backup and resiliency practices. For example, HathiTrust lost some services but not all of them, because it mirrors many of them at Indiana University (and uses some cloud services for further replication and disaster recovery). So the choice is less a binary one between local hosting and the cloud than a more nuanced question about failover models under a variety of risk scenarios.
Beyond the downtime itself was the process of recovery and service restoration. As Charles Watkinson, Associate University Librarian for Publishing and director of Michigan's press, rightly pointed out, the university may have understandably, if frustratingly, prioritized the needs of its own campus constituencies, rather than its outward-facing digital platforms, in restoring service. Restoration was not simply a matter of turning internet access back on: some of Michigan's services, including some that may have supported sales and customer service, were actually hosted in the cloud yet were also temporarily unavailable.
Thankfully, it appears at present that no data held by these services were lost. HathiTrust, ICPSR, and the University of Michigan Press are among the most preservation-conscious organizations in our sector, and their approaches hold a lesson for others, some of whom may accept a different level of risk. Their community-minded communications also deserve praise.
Nor is this a case where anyone clearly made the wrong choice about hosting or recovery. It would be too simple to suggest that every academic digital service should abandon local hosting and migrate to a commercial cloud provider. Such a step can require meaningful upfront costs as well as continuing fees and might not be the right choice for any number of reasons. Yet the episode does illustrate that infrastructure choices carry costs as well as service level consequences.
Thus Michigan's experience provides a tangible illustration of a set of questions that have no simple answers. What kinds of infrastructure must be academy controlled, and what tradeoffs are worth making to meet that imperative?
In this case, commercial cloud hosting does not raise the kinds of monopoly or oligopoly concerns specific to the academic sector that arise, for example, with research management platforms. With strong options that extend well beyond the higher education sector, cloud services may be more like other non-core university operations, such as food services. In such cases, academic institutions make a calculation of costs and benefits without having to be concerned about the impact on the market overall.
Another set of questions is more pointed and specific, recognizing that we are dealing not just with infrastructure choices and values but also with threat modeling and risk. For example, what steps should academic providers take to protect their assets and services in an environment in which hostile actors (including nation-states and their agents) routinely probe digital infrastructure?
These questions sit within a broader dilemma. The scholarly communication and research communities need shared infrastructure more than ever, particularly if we want to ensure a diverse publishing environment and a strong, competitive services market. Yet providers face this dilemma in a context where scale yields enormous efficiencies and other returns, all the more so given the new services that emerging technologies such as generative AI will bring in the months ahead. I suspect the tradeoffs among the various models for providing shared infrastructure (everything from purely commercial to academy owned, with many points in between) will become even more acute in the years to come.
I thank Tracy Bergstrom, Mike Furlough, Kimberly Lutz, Oya Y. Rieger, and Charles Watkinson both for a set of ongoing discussions on these topics and for their very helpful comments on an earlier draft.