Editor’s Note: I was recently asked to present a “news roundup” at this year’s US International Society of Managing and Technical Editors (ISMTE) meeting. Rather than just reporting the news, I took it upon myself to do more of a “state of the union” address, essentially distilling the last few years of The Scholarly Kitchen down into a half-hour talk. (Or, at least, what I’ve learned from The Scholarly Kitchen over that time.) And like my fellow bloggers (here and here most recently), I like to get as much mileage as possible out of any effort, and so have converted the transcript of my talk into a blog post. I apologize in advance for the length — it was a 30-minute talk, and so is a bit longer than our usual posts, so feel free to abandon it when you start to nod off. Slides are at the end of the text if you wish to follow along.
As an industry, we are inundated by threats and opportunities, both real and imagined. There are hordes of startup companies desperate for high value online content that are trying to intermediate themselves into our processes or cut us out altogether, and a seemingly infinite number of advocates, each fighting for their own particular cause. Because we all have limited time and limited funds to put toward any one effort, we need to do an enormous amount of filtering and figure out where we need to put our attention.
Traditionally, publishers are pessimists. “The sky is falling” seems to be the theme of an awful lot of publishing meetings, and Chicken Little (Henny Penny if you’re British) is regularly the keynote speaker. There’s a very old joke about how the first book off of the Gutenberg printing press was the Bible and the second book was about the decline of the publishing industry.
Despite this pessimism, scholarly publishing continues to thrive, to grow and to experiment with new ways of communicating. Scholarly publishing has moved into the digital age more gracefully and more successfully than nearly any other media. Compare where we are with the current state of newspapers or magazines. They’re struggling for their very survival while we’re expanding and experimenting with new business models, new technologies and new ways to better fulfill our mission. More people now have more access to more research than ever before.
I’m not sure whether all the publisher pessimism is central to that success, but we do devote an enormous amount of energy toward an enormous number of ideas and questions. A colleague recently suggested that publishing was subject to the famous Chinese curse, “may you live in interesting times” (a saying which, I’m told, is not of Chinese origin and can be traced back to British diplomats of the early 20th century). We do indeed live in interesting times, with never a dull moment.
There is, however, a gap that emerges between the issues that publishers spend years arguing about at meetings or online, and the practical, real-world technologies and practices that emerge. That gap sometimes comes from losing sight of the needs and desires of the researchers who make up our authors and readers. Publishers can be so inwardly focused that they drift away from the things that the research community really cares about. We tend to listen to the loudest, and often angriest, voices from the research community, and these frequently turn out to be edge cases that do not represent the needs or desires of the mainstream.
At many publishing meetings, there is a session called a “researcher panel” where a bunch of scientists or social scientists or humanities researchers get up on stage and publishers get to ask them what they think about publishing and all the cool new things publishers are doing.
The results are remarkably consistent. We ask the researchers, what do you think about this incredibly important issue that we’ve been agonizing over, having meetings to discuss, and arguing about on the internet for years? Their answer is almost always, “Huh, never heard of it. What is that?” Or perhaps most telling, “Why would I want to do that?”
Separating out what really matters is important. With that in mind, I want to give a quick rundown of the things that I think are likely to emerge as important and useful, the ideas to keep floating somewhere in the back of your mind.
Let’s start at a high level on the industry as a whole. We are in the midst of an era of consolidation. We continue to see mergers and acquisitions, and the biggest of publishers continue to get bigger. You’ll note that most now have names that are conglomerations of their former entities, Springer Nature, Wiley Blackwell, etc.
Scale is a driver of nearly everything in the journal publishing market. It is increasingly difficult to remain an independent or small publishing house. Big companies can do nearly everything cheaper, and often better, than small companies. A small publisher can likely afford one or two marketing experts, while a big corporation can offer a global team of hundreds of marketers.
Journals are no longer sold as individual journals to individuals, and less and less to individual institutions, but instead are sold as collections to consortia. “The Big Deal”, or buying all of a publisher’s content as a package, is the dominant economic force at play in the market. And of course, the bigger a package one can offer, the more weight it carries, hence the drive toward consolidation, and the smaller publishers or independent societies are getting squeezed out.
Along with consolidation, we’re just starting to see the beginnings of a move away from content as the thing that publishers sell, and toward the selling of services. In many ways, this has long been the case anyway.
We are a service industry — journals perform a long list of services for researchers, meeting important needs that they have for things like communication, career advancement, and filtering out where they should spend their very limited time and attention. And doing these things, particularly doing these things well with rigor and care, costs money. Traditionally, we provide those services to authors for free and the costs are passed along to consumers of the literature through sales of the content produced.
Given that we are in an era where content (whether words, music, or images) is being devalued by the idea that everything on the internet should be magically free, publishers are starting to shift their business further and further away from relying on the sale of that content. Open access (OA) is a good example of this — the author needs services performed and they pay for them in a direct exchange, and the content is made freely available.
Elsevier is a company particularly worth watching in this space. In recent years they’ve acquired assets like Mendeley and the Social Sciences Research Network, and built their own powerful systems like Scopus. Digital Science, currently spun off from Springer Nature but possibly set to rejoin it after the company’s IPO, is another good example, with a publisher investing heavily in the things around the publications, rather than the publications themselves. An interesting recent thought from a colleague: the paper’s metadata may be more valuable than its content.
One sign of this loosening grip on the material itself and a move toward services is the rise of preprints. I should probably put the word “rise” in quotation marks, since the use of preprints is nothing new in many fields like the social sciences and physics, but the last year or so has seen a sudden interest in preprints from the biomedical world, whose researchers sometimes seem to think they just invented the concept. For those unfamiliar, basically we’re talking about circulating an early draft of a paper before it’s submitted to a journal for formal review. Unlike the old days, where you’d mail a copy of a preprint around to colleagues, we’re now using digital technologies to expose them to the world.
This is breaking down the somewhat stale concept of the Ingelfinger Rule, the idea that journals will only publish material that has never appeared in public before. We’re moving away from the idea that journals are the sole point of dissemination and registration for a researcher’s work, and focusing more on the last 3 services on this list (see slide number 8, below), validation, filtration and designation, as the key services journals provide.
OA is of course another key service and a path to growth in the industry. It’s important to put that in perspective though—although we’ve seen enormous levels of growth in OA, it’s still a small portion of the market, around 4%.
In general, OA remains a low priority for most researchers. When you speak to researchers, they almost always think of themselves as authors, rather than as readers. As authors, access is not a primary concern. Here’s an annual survey from Nature; the chart shows how science researchers choose which journal to submit their articles to, and the top reasons are journal reputation and relevance, quality of peer review, and Impact Factor.
Here are the same survey results for humanities and social sciences authors, showing basically similar priorities, and a study just out from the University of California showed the same, with OA falling last on the list of priorities for authors choosing a journal.
So if OA isn’t a priority for most researchers, what’s driving the expansion of OA?
There are two real driving forces in play here, both economic. Library budgets are flat, if not declining. If you start a new journal, in order to sell it to a library, they have to drop a journal that they currently buy. That used to be fairly easy, given that there were lots of weak journals that could be dropped. But now libraries have really pruned all the low-hanging fruit from their budgets, and many journals are tied up in “Big Deal” collections and can’t be dropped. So it’s really hard to get a new subscription journal into the market.
OA lets us serve the needs of the research community without further burdening the libraries. What it has done is open up new sources of support that reach beyond the library. Commercial publishers in particular, driven by Wall Street’s demands, have to continually increase their earnings. In a flat economy, they can do this by gobbling up more and more of the current market, hence all the consolidation. OA fits in here as route 2 (see slide 12, below) because it’s seen as a new revenue stream, additive to the current market, basically money from funding agencies and researchers, rather than libraries. And route 3 is the new services built around the content.
The other major driver for OA is demand from funding agencies. Funding agencies want to get the most bang for their buck, and they see OA as a great tool for better dissemination of the discoveries they’re funding. Many funders are instituting new policies regarding open or public access to papers describing research they’ve funded, and those with existing policies are now getting strict about enforcing them, withholding funds from researchers who are not in compliance. And where the funding goes, the researchers are certain to follow.
This map (see slide 13, below) shows a sampling of policies, and ROARMAP currently tracks over 770 funder and institutional access policies. The color-coding (green for Green OA policies, orange for Gold OA) hopefully lets you see something of a consensus that has formed.
The strongest proponents of Gold OA, other than some small but wealthy private funding agencies, are in the UK and the EU. Given the recent Brexit vote and the likely economic upheaval we’ll be seeing for the near term, it’s unclear how much follow-through there will be on these policies, particularly because they’re proving to bring in significant additional expense, both in terms of having to pay to provide OA to their own authors while still having to pay for subscriptions to read papers from the rest of the world, and also because the policies have resulted in surprisingly high administrative costs for monitoring and compliance.
The big stumbling point for all of these policies is compliance. It costs a lot to monitor and enforce researcher behavior, no matter the policy. Institutions and funders have, for years, been trying to build repositories of their researchers’ work, and generally, researchers can’t be bothered to deposit the requested articles. The University of California system recently noted that the OA policy that years ago their own faculty voted to implement still only sees compliance from around 25% of those same faculty members.
PMC, the NIH’s repository, has a much higher compliance rate, but this is significantly driven by publishers depositing on behalf of authors. This brings us back around to the notion of selling services, rather than content. Elsevier is in the midst of a pilot program with the University of Florida, where basically the library is outsourcing their repository to ScienceDirect. Authors and librarians don’t have to do anything, and papers are added to the repository automatically. This is an experiment worth watching, as it could open up a new area of author services for all journals and publishers.
But how will those services work? You have 770-plus policies, and each paper has multiple authors and multiple funding sources, and the authors often come from multiple institutions in multiple countries. It’s simply too much to track by hand. We need automated ways to process all of this information, and the way we’re doing it is through Persistent Identifiers, or PIDs.
You’re probably familiar with DOIs, the digital object identifiers we use to permanently keep track of a published paper. ORCID iDs tag the paper to identify the researchers behind it, and what used to be called FundRef, now the Crossref Open Funder Registry, tags the paper with its funding sources. That gives us the paper, who wrote it, and who paid for the work. The next missing piece is an institutional identifier: how do we tag a paper and a researcher with the place where the work was done? Was the research done at the University of York in England, York University in Toronto, or York College in Pennsylvania? Right now there are more than 20 competing standards for this, which need to be winnowed down to one. We’re also seeing pilots for identifiers for the version of the paper (preprint, author’s manuscript or version of record) and for the reuse rights available for the paper.
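To make this concrete, here’s a minimal, hypothetical sketch of how PIDs enable automation. The field names, the record structure, and the mandate list below are all invented for illustration (this is not an actual Crossref or ORCID schema), but the idea is real: once a paper’s metadata carries a DOI, ORCID iDs, and Open Funder Registry IDs, a script rather than a person can decide whether a funder’s deposit policy applies.

```python
# Hypothetical, simplified metadata record. Field names are
# illustrative only, not an actual Crossref or ORCID schema.
paper = {
    "doi": "10.1234/example.5678",                 # identifies the paper
    "authors": [
        {"name": "A. Researcher", "orcid": "0000-0002-1825-0097"},
    ],
    "funders": ["10.13039/100000002"],             # Open Funder Registry IDs
    "version": "accepted-manuscript",              # version identifier
}

# Toy set of funder IDs with OA deposit mandates (invented data --
# a real system would pull this from a policy database).
FUNDERS_WITH_OA_MANDATE = {"10.13039/100000002"}

def needs_deposit(record):
    """Return True if any funder on the paper carries an OA mandate."""
    return any(f in FUNDERS_WITH_OA_MANDATE for f in record["funders"])

print(needs_deposit(paper))  # prints True
```

A production system like CHORUS obviously does far more than this, but the principle is the same: reliable identifiers turn a compliance question that is hopeless to answer by hand across 770-plus policies into a simple lookup.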
PIDs let us do an enormous amount of things that we never could in the past, or at least it makes things a lot easier. We can follow the flow of research better than ever before and create automated systems to handle the complex things we can no longer do by hand. CHORUS, the system built to provide public access to US federally funded research, is an example here, where compliance is handled automatically, without direct intervention by the author, funder or publisher.
If you’re not already integrating and working with these PIDs, I urge you to get moving on them as soon as is possible. You will need them, and they will make your life much easier.
The policies around research papers are pretty simple and straightforward compared to those around research data, and the potential payoff from data availability is probably much higher. It’s an incredibly complex undertaking with issues involving intellectual property, informed consent and patient confidentiality. Many researchers are extremely hesitant if not downright hostile toward releasing their data. Last week the New England Journal of Medicine featured a letter from 300 medical researchers asking for a slowing of these requirements.
In short, many funding agencies have policies that require, or at least encourage the release of data from funded studies. Data availability offers a really interesting opportunity for publishers. There are a lot of interesting ways that publishers could serve the data needs of the research community which opens up a slate of new business opportunities.
One of the big benefits of open data, beyond its reuse for new experiments, is adding transparency, and hopefully better reproducibility to the literature. Right now most papers are very light on detail, materials and methods sections are tiny, if they exist at all, and usually, “typical results are shown”.
This is helpful because we’re increasingly seeing questions about how reliable the scholarly literature really is. Estimates range from “very reliable” to “not at all”. Many of these claims are extremely sensationalistic, based on little evidence, or on things like economists trying to understand cancer cell biology, so the true reliability of the literature is really unclear. You may have seen a recent study from the Reproducibility Project, in which only 39% of the psychology studies tested were found to be reproducible; that was followed by a study from a group at Harvard showing that those reproducibility efforts were themselves not reproducible and that most of the original studies in question were actually valid. Much of the confusion here stems from how we define “reproducible”. Do we mean that the conclusions of the study hold up when tested in different ways? Do we mean that I will reach the same conclusion if I test the same question? Do we mean that I can exactly replicate their results if I follow their exact experimental protocol?
More to the point, what can journals do to drive better reproducibility? Do we need to change our standards for statistical significance? Should we be doing more statistical reviewing of manuscripts? Are we part of the problem because we don’t publish detailed experimental protocols? Can we improve reproducibility by helping to make the data behind studies available?
Better data availability will help drive reproducibility, and hopefully reduce the next subject on our minds: misconduct. Why are we seeing such a rise in retractions across the literature? Are more and more people either being sloppy or cheating? Or is there just better scrutiny these days, and better technology for detecting misconduct?
Certainly digital publishing offers a host of new scams including citation rings among journals looking to boost their impact factors, the use of fake peer reviewers (or fake email addresses for real peer reviewers), or even, as we see with predatory publishers, not bothering with peer review at all. Though we’re 20 years into our digital transition, we’re still in many ways in the Wild West, with much to still be sorted out.
That digital environment brings many benefits which we’re still discovering, but also a dark side, with new problems continuing to arise. The move away from the physical object of the article to an instantly copied and distributed bit of digital code has many repercussions.
On the plus side, it massively improves the spread of information, and that’s why we do what we do in the first place. Article sharing, if done in a responsible and fair manner, is becoming codified, with best practices emerging. Where journals used to just turn a blind eye toward authors emailing someone a PDF of an article, now they are explicitly stating that private sharing, among colleagues or research groups, is perfectly fine with journal policies. Again, a shift away from rigid control of content.
There’s been a real effort to delineate the different versions of an article that exist, the Author’s Original Version, basically the preprint version before it was submitted to the journal, the Accepted Manuscript version, the article after it’s been peer reviewed and at the point where it was accepted by the journal, and the Version of Record, the final, edited and typeset published version. What you can legally do with each version, and when you can do it, is in the course of being defined.
Before we travel to the dark side, we need to pass through a grey area, that of scholarly collaboration networks, like ResearchGate and Academia.edu. It is important to recognize what these are: privately-owned, for-profit, venture-capital backed business ventures. Don’t be fooled by that .edu name, the domain was actually purchased from someone who held it before they started restricting .edu domains to actual educational institutions.
These are commercial organizations, with business models that range from selling ads to spying on researchers and selling data on what they’re reading and talking about to anyone willing to pay for it. They are both backed by tens of millions of investment dollars, but neither has shown any sort of viable business model as of yet. It’s hard to get by on ad dollars unless you’re the size of Google or Facebook, and there aren’t that many researchers on earth. It’s also unclear if anyone wants or is willing to pay for all that data.
Further, it’s not really clear that a significant proportion of the research community is using either site for any sort of social interaction, other than what has become the mainstay of activity for both sites, the quasi-legal downloading of research papers. There continue to be allegations that these networks are greatly infringing copyright as a means of driving traffic and activity.
The sites remind me of YouTube in its early days, which was filled with illegal content, and faced years of costly lawsuits from content companies until eventually realizing that it was best for all to work together in their mutual interest, to actively filter uploads for copyright violations, to sign licensing agreements and to share revenue with copyright holders. The questions here are whether these sites have the staying power of YouTube, and whether we can reach that point of mutually beneficial cooperation without going through all the costly litigation.
Now we come to the dark side and flagrant piracy, which seems an unavoidable and unfortunate part of the digital environment. You’re all probably aware of Sci-Hub, the website accused of both copyright infringement and criminal hacking and stealing of passwords. Scholarly publishing has come somewhat late to the world of organized theft in this manner, but given that every other media has had to deal with it, it’s perhaps not surprising.
In many ways, piracy is just part of the digital landscape, a cost of doing business in a digital environment. But there is much that can be done to minimize the damage. Legal cases against Sci-Hub continue; they make it harder to find online hosts for the website and make life increasingly difficult for the people behind it. While legal action may make the site harder to find, it is unlikely to make it go away altogether.
One valuable lesson from Sci-Hub is an industry-wide recognition that our security and authentication systems are no longer fit for purpose. Journal access is based on IP range, which is a technology that has long been abandoned by nearly every other company doing commerce on the internet. It is time for journals to catch up with Facebook, Google and the rest of the world and move to more secure systems like multifactor authentication. This will make it harder for pirates to gain illegal access to journals, and make it easier to track and shut down any security breaches.
Just as important are approaches that make the use of pirated materials less attractive. A few studies have shown that a good deal of traffic to Sci-Hub comes from researchers who have legal access to the journals, but see Sci-Hub as an easier way to get to the papers. The user experience at many universities is miserable, particularly for trying to gain access when one is away from campus or using a mobile device like a phone or a tablet. We need to do a better job of easing the work needed to get into our journals, to get that pathway down to the point where legal access is easier than piracy.
These are both big undertakings, particularly because so many of the systems in use are not under the control of publishers. Librarians and university IT directors need to be brought on board and need to upgrade their systems as well. They need to be clear on the dangers they currently face, both the legal liability for violating the contracts they’ve signed with publishers, and more importantly the huge security holes currently existing in their systems. Once a criminal gains access to a university’s systems to access journals, they very often have access to many other things including financial records and payment systems, medical records, student records and grading systems.
This is an enormous threat to universities and more and more are gaining recognition of the problem and starting on the path to better technologies. Expect to hear much more about this shift over the next few years.
I could go on for a few more hours but want to stop here because I think that last point, turning a piracy threat into an industry-wide effort to improve the quality of services offered, does a nice job of summing up our current lives as publishers. We’re venturing into unknown territories which come with both dangers and opportunities. You can go the Chicken Little route and see everything in terms of doom and gloom, or you can rise to the challenge of navigating uncharted waters and turn threats into new services and assets.
These are, without a doubt, “interesting times.”