Open access (OA) scholarly publishing is a contentious subject. The loudest, most-readily heard voices are often those from the extreme ends of the spectrum, those lost in unrealistic idealism or those mired in the mundane details of running a business. Pairing these extremes leads to interesting, but ultimately unproductive conversations.
Instead, we might turn our attention away from the shouting, and focus on those somewhere in the middle, a group more interested in finding real-world, pragmatic solutions to translate idealism into functional publishing models. Cameron Neylon has, for several years now, been one of the most thoughtful and thorough proponents of OA and opening science in general.
Trained as a biophysicist, Neylon has been at the forefront of thought in understanding the way new means of communication interface with scientific research. He’s recently taken a new position as PLoS’ Director of Advocacy, and will concentrate on both immediate improvements and on long-term strategic development.
I don’t always agree with Cameron, but have great respect for his approach, his willingness to listen, and his ability to drive high-minded goals in realistic and achievable terms. At the height of the Research Works Act (RWA) furor, I asked Cameron if he’d be willing to do an interview with the Scholarly Kitchen, as I felt it would benefit our readers to see a different side of the OA movement than is often portrayed here.
Due to our busy schedules (including the announcement of Cameron’s new position), it’s taken a few months to get this together, and we both apologize for the delay.
Q: In a blog posting about the RWA, you predict a near future where all major funders will require Creative Commons licensing and full OA for any publications arising from their funded research. But your vision is not of a world without publishers; instead, it’s one of new opportunities, as you see new players arising to meet the community’s needs. What are these new opportunities likely to be, and what areas and services should current publishers focus on to better serve this changing landscape?
A: The core effect of the Web is that it makes the need for publishers (in the very narrow sense of “organizations that make things public and disseminate”) simply go away. We have traditionally bundled lots of functions together in the organizations we call “publishers,” and the question we face is which of these services we want, and indeed which ones we can continue to afford. At the same time, there is a whole new set of user needs arising from the sheer scale that results when publishing (again in the very narrow sense) becomes so cheap.
I’ve argued often that the traditional mode of pre-publication filtering, by blocking the appearance of some works in specific channels, doesn’t really add any value, or at least doesn’t provide a good return on investment. But we clearly can’t abandon filtering. It is at the core of coping with information abundance. So the core service that those organizations that used to be publishers can provide, and which has real value, is discovery — the services that validate, collect, index, summarize, but above all bring the right content to me at the right time. The idea of overlay journals might be a good stepping stone here. Journals themselves don’t need to go away, and collections of works are useful, but I think it’s a stepping stone towards providing the infrastructure that will allow anyone to collect works together and act as an editor or curator.
The other big area where there are massive opportunities is to sort out the back end. Getting our current literature properly organized and properly searchable would be a big step, as would be thinking about collecting and indexing other kinds of research outputs. And in an author-pays world, these are valuable services that I think authors will be willing to pay for. In a sense it is academic SEO (search engine optimization), but if the interest of the researcher is in ensuring that their work gets the widest possible play and if enabling that is a value offering from publishers, then my sense is that it helps align everyone’s interests more effectively.
Q: In that same article, you suggest that, “several major publishers will not survive this transition.” If anything, OA seems to favor economies of scale in lowering per-article costs (e.g., the success of PLoS ONE as compared with PLoS Biology). This would suggest that an OA mandate would lead to market consolidation around the major publishing houses that can offer that level of scale. Can you elaborate on why you think major publishers would instead fail, and on where that future leaves the smaller, independent publishing houses and the self-publishing societies?
A: There are two big issues here — one is economies of scale for those publishers who make an orderly move (or start off) in open access article processing charge (APC)-based models. There is actually a really interesting question here, because scale, as it spreads across disciplines, also creates some problems. It’s much easier to automate the handling of a conventional peer review process if it has a narrow scope. The best case of this is the IUCr journal Acta Cryst E, which runs a surplus on an APC of around $150 an article. It does this because it accepts only one form of article: descriptions of crystal structures. The authors conform to a form-based authoring process, and a lot of the technical validation is done by computer, which makes the human peer review process much easier and cheaper to manage.
Because PLoS ONE and other wide-scope journals cover broad areas of research, they necessarily need processes that can handle many data types and different disciplinary approaches, and these are still human, and therefore relatively expensive, processes. It is also fairly difficult at scale to place as much reliance as you might like on community peer pressure to contribute — and this may be a real advantage that society and independent journals have — a close-knit community can effectively run shoestring-budget journals such as the Journal of Machine Learning Research. With open-source publishing platforms getting to the point where they are both very powerful and quite usable (probably not quite usable enough yet — but getting very close), I think we will see a resurgence of community- and society-based small journals and publishing houses, which will offer very interesting competition to the mega-journals. I would very much like to see the conversation around OA for societies shift from it being a threat to it being an opportunity to focus on peer review as their core community service.
But the issue of scale also brings institutional inertia. And this is what I was really referring to in the article. My sense is that organizations like Elsevier, Wiley, and the American Chemical Society have so much institutional inertia built in that it is very difficult for them to even think about the kinds of change required to adapt to this new world. There has been next to no innovation around business models from these players; all of that has come from the new players, largely BMC, PLoS, and Hindawi.
Q: Are there areas of compromise that could help speed the acceptance process for OA mandates? For example, many publishers would be much more supportive if they were allowed to serve the free versions of papers, rather than losing that traffic to PubMed Central (PMC). Is that a reasonable request? Are there ways we can continue to experiment to find the appropriate lengths for embargoes?
A: I disagree with embargoes on principle — really, they are a compromise within a compromise en route to sorting out how we pay for the set of services we need for good research communication. What we really need is a grown-up conversation about how we can set up a market for those services that works and enables a transition for everyone involved. The question of who hosts “the” free version is peculiar to me — if publishers think they can do a better job of that, then they should, and demonstrate it by successfully competing with other sites that host that work. But funders and others have good reasons for not trusting subscription-based publishers to do this properly — every time NPG does a system update, the access systems manage to forget the custom settings that make genome papers freely accessible. No one is being evil here, but the defaults are all set to limit access — doing anything else is non-standard. Show us you can do this properly and well, and then there’s a discussion to be had. But at the same time, funders are always going to want to keep a copy themselves; it’s just good practice.
What we need to figure out is the stepping-stones that will get us from where we are to where we want to be. There are, I guess, three possible routes here. The first is the one we seem to be stuck in: funders propose another step up in policy, subscription publishers get upset, funders row back slightly and then implement, and publishers grumble but accede. The second route would be a real collaborative exercise. The Finch report in the UK is an effort in this direction, although I have concerns about how successful it will be. The third would be for some traditional publishers to really step up to the plate and offer something exciting, perhaps entirely different, but in any case a real step forward going beyond the current round of debate. This just really hasn’t happened.
Fundamentally, there just isn’t the level of trust either between subscription publishers and funders or between those publishers and the OA movement. And without that trust, it is difficult to see how these kinds of conversations can happen effectively. From my perspective, the bottom line is that it is much more effective to lobby funders to move that ratchet step by step.
Q: In a recent blog posting, you discuss the tremendous advantages offered by networked systems. It’s easy to see in terms of the examples used, which feature abstract subject matter (mathematics) and easily digitized datasets (images of the sky). How do those advantages translate to research where there is a need for physical work that can’t be as readily distributed: clinical research that requires seeing patients, or wet-bench laboratory work that involves hands-on testing of cells or tissues?
A: Yes, this is exactly the challenge, but I think it’s promising that we see good examples exactly where we’d expect to see early successes. That means we’ve got a reasonable understanding of what is going on. The key point I make in that piece is that these advantages arise in systems where you have well-connected networks with very low friction in the transfer of resources. For information, the Internet is incredible in both its scale and its lack of friction for the transfer of digital information resources. So the question in transferring this into the physical lab world is: can we build that scale, and can we make transfer more frictionless?
In terms of scale, we can, because the major issue at scale is discovery — and we can do that with metadata and information resources. So I can discover that someone somewhere has exactly the plasmid, protein, sample, mouse, or cell line that I need, as long as that information is available somewhere. This is clearly technically possible (one of the arguments behind open notebooks) but equally culturally difficult, as it gives away information on what people are currently working on. But if we could create the right incentives, then the discovery part is relatively easy.
Currently, the actual transfer of the material has a lot of friction. Institutional obsession with non-standard material transfer agreements is one example. We could reduce this by setting up standardized agreements. Physical storage can be a problem, as can transport, but in many cases we have centralized infrastructure for storage and delivery (e.g., for cell lines, animal strains, plasmids, etc.). This infrastructure is often very poorly funded, of course. But the bottom line is that we could do a better job of this if we chose to address it. And we don’t need to reduce the friction to zero; we can get significant benefits from each reduction we achieve.
There have been some interesting efforts at providing private infrastructure to support public research in this space, letting researchers register their materials and capabilities for sale. This hasn’t taken off yet, but I’ve recently been working with a completely outsourced biotech company, one where there is one person and a laptop at the center, distributing a set of testing and development tasks for the building of a prototype medical device. I’m also seeing interesting signs of collaborations spontaneously developing online between grad students and postdocs who discover that someone somewhere has the instrument, techniques, or skills needed to solve their problem. We have to expect this to be slower than in the digital information space, because there are more significant local costs, but that doesn’t mean there isn’t value to be obtained. And the smart people will figure out how to turn that to their advantage.
Q: In that article, you suggest the research paper is part of a continuum of data sharing among collaborators. Is that the true purpose of a research paper? During my research career, I read many papers in subject areas where I was never going to work. I didn’t care about work in progress; what I wanted was an efficient summary of work that was completed, an understanding of what was already known. If we’re to build a new system that serves the communication needs of researchers, can that one system satisfy both the needs of information sharing for works in progress among collaborators and the needs of historically documenting completed research? Is it better to separate the two and create systems optimized for each? Or is this an artificial separation?
A: I feel that’s an artificial separation, in part because I have a strong sense that science is never finished. Given recent reports that some horrendous proportion of widely cited published biomedical experiments couldn’t be reproduced inside big pharmaceutical companies, I find it difficult to think of a single paper as anything but an artificial construct that gives the impression of being a whole story — but is actually only a piece of it. This distinction colors a lot of my thinking: the artificially closed narrative that we need to create for a rhetorical purpose versus the “real situation,” which is always much more fluid and incomplete.
That said, we clearly need summarization and integration at lots of different levels. In a weekly lab meeting, the PI doesn’t want all the details, but wants more than will end up in a traditional paper. If I am looking at an area for the first time, I probably don’t want the most recent paper; I may not even want the most recent review. In practice, I will usually start with Wikipedia, head for a few websites from there, and then dig into reviews. We have layers upon layers, with different levels of confidence and completeness. But my problem is that currently all of these layers are constrained into one kind of structure, which carries with it a certain set of assumptions about indexing and discovery that are often orthogonal to the actual problem I have at hand.
I much more frequently need to know, “Has someone tried this? How did it go?”, than what their scientific story is, or even what question they were trying to answer. Day to day, there are lots of different information needs, and many of them could in principle be served by access to an underlying layer of more immediate but incomplete information. And a lot of this ends up unpublished, so we never reap any benefit from it.
If you follow the logic of the network opportunities, then the system will be at its global best when everyone shares everything instantly and there are good mechanisms to support discovery and trust (being unable to find the thing you need when it does exist is another form of friction). This raises two immediate problems. The first is the social one of people not being willing to share that much; we’re not going to solve human nature, but I think we will see a continuum where there is much more sharing than occurs at the moment. The second is that this sharing requires effort and resources, and that we don’t have these perfect discovery tools. So a working system will find a balance somewhere between the costs of sharing, the costs of good discovery, and the benefits that accrue.
But that doesn’t really answer your question. Fundamentally, I don’t personally see a qualitative difference between sharing my notebook, authoring a paper, or writing up a Wikipedia article. They are all summaries at different levels that are likely to be of interest to different possible audiences in different ways. But the core of the networked viewpoint is not to assume that we know exactly who those audiences are, or what kind of interest there might be in any given output, but to be open to the idea of the unexpected user and use. If there were no additional cost, then there would be no harm in supporting them. In reality there is a cost, sometimes very small, sometimes a bit larger, and we’re going to have to work through the process of deciding when and where that cost is worth a benefit that, by definition, we can’t quantify up front. What we can do is try to understand what the benefits are at the system level and plan our resource allocation accordingly.
Q: The Bayh-Dole Act gives researchers and institutions full ownership of the intellectual property (IP) derived from federally funded research. Given this ownership, can a federal agency legally compel a researcher to publicly release that IP in the form of a data mandate? Should the research paper that results from the grant be considered researcher-owned IP as well?
A: I think it is interesting to explore both the technical legal reality here and the intention behind Bayh-Dole. The reality is that it vests IP in the institution and places obligations on the institution to optimally exploit that IP. I have wondered whether US institutions are actually properly discharging their obligations when they allow authors to sign copyright over to publishers. I am not a lawyer, and certainly not a US IP lawyer, but I would imagine that the fact that the NIH mandate and existing data policies, alongside the developing NSF policies, haven’t really raised serious issues of incompatibility with Bayh-Dole means there isn’t really a legal issue. But I don’t pretend to understand all the subtleties.
If we go back to what Bayh-Dole was supposed to achieve, I think the answer becomes much clearer. The intention was to ensure that research was appropriately exploited, with an emphasis on commercial exploitation, but my sense is that the underlying aim was to ensure that research got results. In the context of US political philosophy, the way to do this was to hand over IP, because it meant that the institutions had an interest in maximal exploitation, since they got the direct benefit. However, there is an interesting question: most institutions have focused on the narrow question of how to optimize IP exploitation on a project-by-project basis. If we take a slightly different view, that of the global optimization of exploitation, the picture may look quite different. Is it in the interest of a given institution to pursue IP protection for each and every project? Or would they do better if they gave away the majority of IP to support innovation more generally? In one narrow sense it is easy to argue that institutions have done badly. They still have tech transfer offices in most cases; if those offices were any good and actually making money, they would have spun themselves out.
If we take another couple of steps back and look globally, then there is also good evidence that giving government data away maximizes the economic return on that data. This has been most studied with respect to geographical data, but it seems to hold well for most government data, and there is no particular reason for that not to extend to research data. So if we assume that governments invest in research to generate (in part) economic returns, and that it is the role of government to globally optimize those returns, then it makes perfect sense for government to mandate open data.
So I’d like to see Bayh-Dole re-interpreted for the 21st century. I think it entirely appropriate that there is an obligation on both researcher and institution to maximize the impact of their work. If one way to do this is to vest IP in the institution, then that’s great. My suspicion is that it is slightly counterproductive in practice. But the principle that there should be a global optimization is a good one. And that should include the economic return on the copyright on researcher-authored papers.
Q: F1000 Research is a new journal that is going to attempt to create something like the continuum you’ve suggested. You served as a consultant in its formation. How does this journal fit with the way you see research being conducted and communicated in the future?
A: I think F1000 Research is a really interesting experiment that takes another step down the road towards pulling apart the different services that the organizations we call publishers offer. The biggest component of the cash cost of conventional peer-reviewed journal publication is managing the peer review process (the biggest cost overall is the cost of the peer review itself, but this is a non-cash cost). We also don’t really have good data on whether our conventional processes are any good, or, more specifically, what they are good for and what they are not. At the same time, the community is still very wedded to those traditional processes.
The recent history of successful innovation in scholarly communication has been dominated by projects that offer something new enough to be interesting and to offer real advantages, but not so different as to be unrecognizable. PLoS ONE is a great example of this: recognizably a journal, containing articles, and with what is basically a very conventional peer review process, but with a twist on the selection criteria. Alongside this, a lot of attention was paid to the internals to keep costs down, but most of that isn’t immediately obvious from the outside. So while PLoS ONE was a radical shift, it was still within the bounds of what people expected.
Because of PLoS ONE, the world has now shifted a lot. That idea of “publish everything publishable” is a lot more acceptable today than it was five years ago. I see F1000 Research as another step towards what might be a better validation system. It might be a step too far for most of the community, or it might hit a sweet spot and take off. But it’s a valuable experiment. Another effort, in some ways similar but a bigger jump, is Figshare, developed by Mark Hahnel. I thought at the time that Figshare was unlikely to get major traction because it seemed like too much of a jump: a kind of preprint or data repository, or a venue for micropublications, it felt to me like it was too different to take off. I think I was probably wrong about that, and that suggests to me that the community might be ready for more radical experimentation than we’ve seen up until now.
Q: F1000 Research is a privately owned for-profit venture. Much of the resentment voiced against commercial publishers seems to stem from the profits they generate. Is offering profits for providing services like this an acceptable means of driving progress? Is there a recognizable line that can be drawn between reasonable rewards and exploitation?
A: I think this is a misunderstanding. While there are parts of the OA community who have a strong anti-commercial ethos, the majority of us are fairly economically liberal (in the European sense!) and perfectly happy with people making money. Good value products need to be sustainable and that means they need to generate a reasonable return. We also see good returns as a strong signal that something is sustainable — which is also very important. I also think there are massive neglected business opportunities in the scholarly communications space.
The objection to the level of profits that Elsevier in particular are making is subtler. First, there is the scale of those profits, regularly and consistently over 35%. In most other markets this would be a signal of market failure, and a recent JISC report has said that there is clear evidence of market failure in this sector. Making 40% in one year is the sign of a company ahead of the curve, but in a functioning market, returns usually hover around 5-15% when averaged over time. So there is a real objection to commercial concerns taking large sums of money as a result of effective monopolies and market failure. The second issue is that this money is taken out of the research system — if it were being re-invested in infrastructure or in improved services, there would be a discussion to be had, but it is just being siphoned out to shareholders.
Finally, there is a lack of understanding on both sides of what costs really are and what the price could be. The opacity some publishers maintain around their costs and returns doesn’t help. It is clear that shoestring journals can run very cheaply and that preprint repositories are also very cheap. It’s also clear, at the moment at least, that much of the customer base wants more than these basic services, but we don’t really have a functioning market. Nor do we have particularly well-informed customers making rational choices.
So overall I’m very comfortable with the profit motive being a driver as long as we have a functional market. And that to me means that authors need to be making choices about the services they purchase and how much they are prepared to pay for them, and what they expect to get in return. Others disagree with me, but I think this is the best way to drive real technical innovation where we need it at least in the medium term. But there is one other important point. The vast majority of the resource we are talking about is government money of one sort or another. Subscription publishers often get in a lather about government encroaching on their business model but there never was a business model, just a government subsidy, whether it’s paid through research grants and article processing charges or through libraries. If private companies want to play there and can provide good value then they should do that, but equally they should never forget that this is a market created by government largesse and one that is therefore ultimately subservient to the policy direction of government and the funders.