Continuing our Kitchen Essentials series of interviews with leaders of infrastructure organizations, today we’re hearing from Stephanie Orphan, Program Director of arXiv, the e-print repository that serves a number of scientific and quantitative fields.
Please tell us a bit about yourself – your role at arXiv, how you got there, and why you embarked on a career in research infrastructure?
I’ve been the Program Director at arXiv since October 2022. Before that, I spent 15 years working on digital preservation at Portico, which is important infrastructure in its own right. I didn’t consciously embark on a career in research infrastructure so much as landed in it because of my skillset and interests. I trained as a librarian, with an emphasis on information systems, though I never ended up actually working in a library. The commitment to service and the belief that access to information is a cornerstone of a just society, which led me down the librarianship path, combined with my natural curiosity and enthusiasm for technology solutions to create the perfect storm. As Program Director, I am part of the arXiv leadership team, with responsibility for operational oversight and administrative management: essentially, making sure that daily operations are (hopefully!) running smoothly. I also work with our advisory councils, leadership, and staff on planning, priority setting, and executing on those plans.
What do you like most and least about working in research infrastructure?
I love working with smart, mission-driven people, within arXiv and across the scholarly communications ecosystem, who are truly interested in solving problems and doing good for our corner of the world (with ripple effects out to the world at large). There is always something compelling to think about, and very few days are the same. I suppose my least favorite thing, which isn’t limited to working in research infrastructure but is certainly highlighted in this environment, is knowing how much more could be accomplished with more resources at our disposal.
Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in research infrastructure?
That’s an interesting question! My advice would be to be a flexible thinker, come with an open mind, and have an expectation that you may be in learning mode for a long time. Whatever expertise you bring with you will serve you well, but you may be applying it in different ways. It’s also good to remember that you are not an island; there is a welcoming community out there full of people who may not be doing the exact same thing as you, but who surely can relate and are always willing to share their experiences.
What sort of infrastructure does arXiv provide, and who are your users?
arXiv is a secure, permanent, open archive for both published and unpublished work, which allows researchers from around the world to read and share original research quickly, before or after peer review. We consider ourselves an e-print, rather than a preprint, repository because of this mix. arXiv currently serves the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Services for researchers include article submission, compilation (from TeX), production, retrieval, search and discovery, and web distribution for human readers. We also offer API access for machines. Our primary users are researchers, but users also include scholarly communications innovators, such as our arXivLabs collaborators, and other services and repositories, for example, Astrophysics Data System and HAL.
How is arXiv sustained financially?
arXiv is sustained through a combination of funding streams. We are thankful to have the Simons Foundation as a major donor, whose generous ongoing contributions cover about a quarter of arXiv’s annual operating costs. We are particularly grateful to the Simons Foundation for their recent additional three-year gift to support our migration to the cloud and the modernization of our code base. In addition, we have a successful membership program through which universities, libraries, research institutes, and labs contribute to arXiv, as well as an affiliate program (professional societies, government agencies, and other nonprofits) and a sponsorship program for corporate giving. arXiv also receives some funding from grants and accepts donations from individuals. And, of course, through our home institution, Cornell Tech, we benefit from in-kind contributions of things like shared services (finance, HR, etc.), office space, and the like. It really does take a village to keep arXiv going.
As one of the leaders of a research infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?
Research infrastructures in many ways function as the workhorses of scholarly communications and don’t always have a profile that matches their actual impact. arXiv is a bit of an outlier in that regard: it is well known within scholarly communications and academia, and also has some recognition among the general public. However, even a well-known research infrastructure is not necessarily a well-understood research infrastructure, and this is true even within our relatively small community. I think there is an opportunity here for us all to get to know each other a little better and to gain a deeper understanding of the activities that the various organizations are engaged in, the challenges they face, and the processes underlying their work. There is already a great deal of transparency in the research infrastructure space, and there are increasing efforts to collaborate, but somewhere between kind of knowing what an organization does and being an expert on that organization is the sweet spot of having an informed, solid grasp of an organization or service. This level of understanding can really open the doors to meaningful partnerships and collaboration. What’s stopping us are the usual suspects: time, competing priorities, and the understandable need to focus on your own operations.
Looking at your own organization, what are you most proud of, and what keeps you awake at night?
I am incredibly proud to be part of an organization that plays an important part in moving science forward and is, for many, a beloved service that is part of their daily life. Beyond that, the dedication and skills of our small but mighty team and the generosity of our many volunteer moderators and advisors, who lend their considerable expertise and time to arXiv, are quite inspiring. If you’ve been following arXiv over the last couple of years, you’ll have noticed that change is an operative word. Change is hard (and sometimes messy!), and it can be easy to continue on with business as usual, so I am especially grateful to the staff, volunteers, and other stakeholders for doing the difficult work and embracing the changes that are happening at arXiv. I’m also proud of the work arXiv is undertaking to make the content we host readable by screen readers and other technologies. You can learn about what we’re doing in the accessibility space at Accessibility – arXiv info.
What keeps me up at night? Thinking about keeping up with changes in the ecosystem, the steadily increasing flow of papers into arXiv, and the expectations of our community (submitters, readers, moderators, related infrastructures). Getting on a better footing in terms of our technology migration will definitely help on all these fronts, but what is exciting about operating in a dynamic environment can also sometimes give you pause.
What impact has AI had on arXiv so far, and what impact do you expect it to have?
As a repository for which AI and related fields are fast-growing categories, an immediate impact on arXiv stems from the large number of AI-related papers that are being submitted and require some level of attention from staff and moderators. As AI-assisted and AI-written papers become easier to create and harder to detect, we anticipate that people will use AI more, potentially resulting in papers being written more quickly and leading to an overall increase in arXiv submissions. We also face side effects from having a large number of AI tools come to arXiv to pull new content, at times overloading the service.
While we already benefit from AI implementation (for example, within our classifier), we, of course, expect that we will soon be relying on it more to help us with the issues I just described.
AI will be further implemented to improve our tools and assist with the tedious aspects of content moderation. There are also possibilities for AI to lead to drastic improvement in search and discovery—prototypes exist and it is only a matter of time before those with real possibility rise to the surface.
What changes do you think we’ll see in terms of the overall research infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at arXiv?
I’ve never been good at crystal ball gazing, but I do believe that in five to ten years the needs of researchers will have continued to evolve and research infrastructure organizations will be expected to keep pace. The real benefits of AI to systems and processes will be much better known than they are now, and implementation will have happened or be under way. I staunchly believe that it is extremely important to pay attention to the human side of technology operations, and I can imagine the need for tighter coordination across organizations as well as for more resources devoted to coordinating stakeholder groups. For arXiv, in particular, technology will solve some of the issues we see related to expanding into new subject areas and the sustained, increased pace of submissions, but I imagine staffing will need to adjust to better meet the needs of the corresponding growth in volunteer moderators, committees, and users.
Discussion
3 Thoughts on "Kitchen Essentials: An Interview with Stephanie Orphan of arXiv"
Thanks to both Roger and Stephanie for this great interview. Follow-up question for Stephanie: when did arXiv move from the Cornell Library to Cornell Tech? An announcement last October about the Simons Foundation grant says that “arXiv was founded in 1991 by… Paul Ginsparg… and is now maintained and operated by Cornell Tech,” but doesn’t mention that it became part of the Cornell Library in 2011. The Wikipedia page for arXiv mentions that it moved to the Cornell Library in 2011 but doesn’t say anything about its subsequent move to Cornell Tech.
arXiv briefly transitioned from Cornell University Library to Cornell’s Computing and Information Science (CIS) department in January 2019 before moving to Cornell Tech in 2020. This was due to the CIS dean at the time (Greg Morrisett), who is a champion of arXiv, becoming the dean of Cornell Tech.