At the recent RAVE Technologies annual publishing conference (#pubstech), I sat down with Digirati’s Paul Mollahan for an on-stage conversation about open source, its advantages, its challenges, and the potential for the open source movement to help change the way that academic and scholarly publishers use technology. We began by focusing on the importance of shared standards and workflows — we ended with a lively debate about sustainability and the nature of community.
What has open source got to do with Scholarly Publishing?
In 2011, Marc Andreessen said that software is eating the world, predicting that technology companies would continue to significantly disrupt an increasingly broad range of industries. Since then, publishers have embraced technology, and specifically the internet, an infrastructure and platform set dominated by open source software.
Outside of technology communities, few realize the ubiquity of open source solutions. Open source spreads risk and reduces overall costs in everything from desktop computers and mobile phones to air traffic control systems. Meanwhile, business models have matured, which has raised investor confidence. As an industry that has successfully adapted to Andreessen’s software-eaten world, we certainly need to pay attention to the importance of open source.
Over the past few months, I’ve been working alongside a colleague, Fiona Murphy, on a project for Knowledge Exchange — a collaboration of six national organizations across Europe who work together to support the development of digital infrastructure to enable open scholarship. The project is called the openness profile, which is intended to document an individual’s or group’s contributions to open scholarship. To provide information to help shape the initiative, we have conducted a series of interviews with research contributors who practice openness. We have developed a sense of how they currently share everything from experimental protocols and computer code, to research data and articles. We also explored the barriers they face and what support they might need in order to do more. Although not the primary focus of our research, Fiona and I noticed that the concept of open source was a frequent topic of conversation.
There are increasingly noticeable connections between open source and open research. Both are promoted as mechanisms to improve quality by creating faster and more robust feedback loops; both are intended to reduce waste and unnecessarily duplicated effort (validation is not duplication of effort; they’re different things); and both draw on, and depend on, communities in order to be valuable and sustainable.
As John Maxwell noted in his recent eBook on open source publishing solutions, which was reviewed here by Roger Schonfeld, open source isn’t just about making your code publicly available by sticking it up on GitHub, any more than posting something on WordPress constitutes publishing it. Both open research and open source are predicated on healthy, interactive communities. Open source software communities contribute code, provide testing, and take part in discussions. Open researchers contribute to and reuse shared data resources like GenBank or the NERC data centers, and use services like arXiv to share work in progress, often asking community members directly for feedback.
As we interviewed scholars, data stewards, librarians, administrators, and policy makers, we found that many saw open source as a mechanism to secure transparency and community governance around the tools that they use. In some cases, interviewees saw the use of open source tools and software as inherent to an eventually fully open infrastructure for research.
The desire for community governance has parallels with the autonomy concerns felt by many smaller publishers. As Kent Anderson noted in this piece in 2011, platform providers offer the promise of easy and cost-effective access to the latest technology, but once somebody else controls how you access your market, you become critically dependent on another organization that has its own business needs and interests. Since 2011, we’ve seen the landscape evolve significantly to include supercontinents of vertical integration, a movement upstream in the research cycle, and the potential for a loss of economic control for smaller publishers, as large ones bundle workflow services, platform, marketing, sales, and integration into a big deal.
Sheep, common land… etc., etc.
In the same way that open research advocates find themselves fighting the powerful network effects of embedded incentive structures in academia, technology infrastructures have network effects of their own. I discovered this personally when I moved from physics to biology many years ago and tried to share a manuscript I’d written in LaTeX with colleagues, only to be repeatedly asked if I had a Word version.
The reliance on community is both a strength, or even a goal, of openness and a challenge; communities don’t just build themselves. They need a reason to exist and they need to be driven. As Ian Mulvany of SAGE pointed out during the conference discussions on the merits of open source, there is an emergent sustainability issue. Mulvany cited the cURL project, which is essentially supported by a single developer. With the best will in the world, relying on a piece of software with a bus factor of 1 is a risk for any organization. Open source projects require a critical mass of contributors to be sustainable. That can be tough because, for any given project, the number of users outstrips the number of contributors by about two orders of magnitude. That raises the question of who, among all the people and organizations who benefit, is prepared to spend resources on maintaining any given project.
The most talked-about open source project in the publishing space recently has been Coko. At the end of last year, I wrote a post about the #AllThingsCoko meeting and wondered whether Coko would be the community project that offered publishers a third choice between buying into a publishing platform and building their own.
After publishing that post, I had the pleasure of seeing demos of three projects built using the Coko framework of components: Hindawi’s Phenom, UCP/CDL’s Editoria, and eLife’s Libero. On the one hand, these projects have all taken a lot of work to reach their current level of maturity, and that might give some technology leaders pause before they consider a similar strategy. On the other hand, what I saw were three well-made platforms that represent, at the very least, a proof of principle. Arguably, these early adopters have been the ones getting the grass into shape on the proverbial common land.
Now, it just needs a community. Oh, is that all?
The next challenge will be to build that much-needed community if the project is to expand beyond a small group of dedicated pioneers. Between eLife, Hindawi, University of California Press, and the California Digital Library, there’s a lot of coverage in terms of influence across potential community groups. Looking at the community page on the Coko website, there are also some interesting members.
Funders, open access publishers, university presses, and library publishing service providers seem like potential growth areas, if there are enough resources available to drive development and support maintenance. Learned societies would be a valuable addition to the community. As yet, nobody has connected the platform to a subscriber database, and there aren’t yet any tools to help publishers with legacy data challenges. Some of those challenges may be addressed by service providers, but that market may still need to mature. With all those caveats, there is an interesting opportunity for coopetition that I think publishers would be well served to consider.
I would like to see open source continue to grow as part of our industry’s technology strategy. Part of my reasoning is admittedly philosophical. A wise developer by the name of Melvin Conway once said:
organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.
As our industry tries to adapt to the changing needs of academia, we’re going to need to support more collaborative and open workflows. In addition, we will likely face increasing calls to be more open ourselves as academics begin to see open source as a necessary part of transparency and perhaps even a sine qua non of open research. Changing the way we work will not be easy. It will mean forming new types of business relationships with different types of service providers, who in turn will have to actively nurture development communities to ensure that standards and good practice are met. As Paul Mollahan said:
…this is a key challenge for these emerging communities; without the right level of time and effort dedicated to these activities, divergence can quickly occur reducing the potential benefits for the wider communities. Ensuring this effort and cost is recognised and managed is often a barrier to long term success.
In the end, however, there should be economic benefits. By standardizing further around use cases, open standards, and common workflows, there’s an opportunity to share risk and deliver significant efficiencies.
If we’re to figure out how to support researchers in being more open and collaborative, we need to better understand what it means to be open. By doing so, we can hopefully follow the example of other technology driven industries and collaborate in a way that shares risk, reduces costs and improves sustainability.