Today, journal content distribution is largely a publisher activity. Researchers access journal articles through hundreds of publisher-specific content distribution websites. But, as the extensive use of Sci-Hub, repository versions, and other workarounds by entitled users makes clear, publishers are losing online traffic on their own platforms. What does this mean for the future of the publisher site and of the hosted platform companies?
In their existing platform environment, publishers have struggled to manage the intersections of content discovery and access, publisher and publication brand and context, and business issues. All three of these aspects are vital individually. But publishers seem to have become accustomed to thinking that their content distribution strategy needs to address all three of these key elements in the same way. In recent years, some publishers and workflow providers have attempted to move towards a model that separates content distribution from the business issues of pricing and sales. In light of this, I want to explore the individual publisher content distribution strategy — and question the long-term role of publisher-specific content distribution platforms.
My piece today is focused on the issues facing journal publishers, particularly in the sciences. For a number of reasons, including the slower format transition to digital, content distribution models for books — especially for humanities and social sciences monographs — have developed differently. I will consider those separately in the future.
Three Content Distribution Models
We think of publishers as having platforms, but in fact publishers rely on many types of discrete systems. These vary from manuscript submission and management systems, which are increasingly important to publishers of all kinds, to user access tools. In this piece, I am focusing on the access platforms through which content is made available to readers, whether on a licensed or open basis. As we look across the industry, we see that three fundamental models for these types of platforms have emerged in different sectors or to serve different needs, recognizing of course that some publishers use more than one of these models for different categories of their content.
Some smaller publishers, for example a variety of scholarly societies, publish through a larger publisher. Wiley, Elsevier, Springer Nature, Oxford, and Cambridge are among those that have major businesses in publishing on behalf of smaller publishers. In these cases, pricing, marketing, and sales are generally handled by the publishing service provider. The smaller publisher essentially outsources much of the business side of publishing, retains the editorial component, and generates revenue from the deal.
Seeing the opportunity to help smaller scientific publishers maintain their independence while achieving scale, early leaders created alternatives, including Ingenta and HighWire Press. Thus began the era of each publisher having a branded site on common infrastructure. Today, many publishers contract with vendors, a group that has also come to include Wiley’s Atypon and the privately held Silverchair, to provide the technology for their distribution websites. Libero is one example of an open source effort to provide similar types of content hosting and distribution, albeit with a smaller current installed base. These common infrastructures allow the costs of developing these distribution platforms to be spread widely. Regardless of whether these platforms are cloud-based or locally hosted, they enable the publisher to retain control of its business — pricing, branding, and sales — as well as editorial control.
In the third route, we have seen a handful of the largest publishers build their own platforms. Elsevier and Springer Nature are examples of the class of publishers that develop platform infrastructure for many categories of their publications in house, and their platforms are intended to serve as a value differentiator. In Elsevier’s case, this is clearly connected with its intentions to integrate across a variety of platforms it owns — its journal distribution platform (ScienceDirect), with article preprint platforms (SSRN and bepress Digital Commons), with manuscript submission and management (Aries), with research information management (Pure), etc., as Lisa Janicke Hinchliffe and I have each detailed in a series of posts here at The Kitchen.
Not all of the largest publishers that have sought to build their own platform infrastructure have found that the resources required are justified by the value added. After efforts to build and maintain its own proprietary platform, Wiley eventually purchased Atypon, allowing it greater control than if it were just one of the many customers of this shared platform.
In contrast with the largest publishers, which have fairly clear strategic rationales for maintaining their own platforms, I wish to consider the dilemma of the smaller publishers that could not possibly each develop their own platform technology. These smaller publishers face a choice: in Model 1, to outsource their business and thereby integrate with their business partner’s hosting platform; or, in Model 2, to adopt a shared technology platform while retaining branding, pricing, marketing, and sales (or, in some cases, to use a combination of the two). In neither case are they positioned to integrate with the elements of a broader reader workflow process (beyond reliance on third party discovery services for traffic flow), let alone a larger researcher workflow process.
The current situation has a number of fairly serious drawbacks for readers and publishers alike. Each publisher-specific platform represents a meaningful impediment to seamless access to content and its use in an integrated fashion. The near term drawback is in legitimate reader access to scholarly content. The broader drawback of publisher-specific access silos is the inability to work with and across content without substantial impediments. In a piracy-laden environment, one that is transitioning fitfully towards open, these are not only user impediments but, as a result, business impediments as well.
Previously, I have examined the challenges in legitimate reader access to scholarly content, framing it as a problem in user experience. It is one of the issues leading users to gray market and pirate sites, and it is absolutely a key factor driving users, even those who have otherwise legitimate access to materials, to Sci-Hub.
Open access advocates are quick to pin blame for these challenges on authentication mechanisms. And, in a way, the STM community seems to have agreed that this is a key problem by fostering the RA21 initiative that is attempting to improve authentication paths to licensed resources.
But the challenges that are driving readers to Sci-Hub are not just about authentication. Even if authentication were successfully addressed, or no authentication were required at all, users would still be frustrated by publisher-specific platform silos. Today, they simply try to grab PDFs and pull them into a local context, because they cannot even imagine the kind of system and interface that would allow them actually to work with and across individual articles seamlessly. To whatever extent there is a longer term transition towards working across content silos, whether through analysis of underlying datasets, machine analysis of articles, or a variety of other use cases, the drawbacks of having content siloed on an unnecessary number of websites will grow in significance. As one publishing technology executive explained to me a few months ago, RA21, without addressing these more fundamental challenges, would be “a disembodied spirit”: lacking the key ingredients for solving the full user experience challenge, not just the more limited authentication issues.
Cross-platform issues have been addressed to some degree, but largely in ways that chip away at the value of publisher brand. The “discovery service to access platform” workflow has been optimized to enable “grab and go” content seeking behaviors, whether from Google Scholar, Web of Science, Summon, Meta, Scopus, EDS, or any of the many others that compete to serve the discovery starting point role. Many of these services are moving beyond search as their primary use case, to include various kinds of increasingly advanced alerts that are steadily chipping away at the importance of journal-specific table of contents alerts for current awareness. CrossRef’s efforts have enabled cross-platform content access. All of these dramatic improvements in user experience address, each in its own way, the shortcomings of publisher-specific content hosting, while commensurately diluting the potential brand value to the individual publisher of maintaining its own hosting.
Aggregation vs. Syndication
These efforts to work around the shortcomings of publisher-specific content distribution have two major alternatives, both of which are about creating cross-publisher content hubs which ideally would contain “all the content” — the approach widely known as aggregation, which is a well established business model, and what I call syndication, which could emerge.
First, in aggregation, services like those from ProQuest and EBSCO license rights to redistribute selected parts of many publishers’ portfolios to a broader array of institutions than typically would license extensive sets of scholarly journals from the publishers directly. Aggregation models return revenue to the publishers, but typically not enough to serve as a replacement to direct subscriptions (although EBSCO in particular has some exclusive arrangements that push further in this direction). Aggregations represent a meaningful secondary source of income for publishers. But, given the competitive pressures between publishers and aggregators that Joe Esposito has discussed, only the most carefully balanced aggregation models provide a long-term stable relationship. Rarely, if ever, in scholarly publishing have they alleviated the need for the publisher to maintain a primary publishing platform and business.
It is possible that syndication can offer a more viable alternative. Unlike aggregation, syndication involves a clear separation between hosting and access provision, on the one hand, and business considerations such as pricing and sales, on the other. Through syndication, the primary publisher can continue to set its own prices at a title or bundled level. It transmits information about resulting entitlements to cross-publisher content hubs (which I have called “supercontinents”). Those cross-publisher hubs would provide content access to anyone with an appropriate entitlement. Through distributed usage logging, information about usage would be transmitted from the cross-publisher hubs to the primary publishers, to help them with author relationships and marketing as well as library marketing, pricing, and sales. Syndication is a strategy for publishers to retain at least some degree of control of their content — and usage information about it — in an open access environment, by providing legitimate or preferred avenues for its distribution.
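The data flow just described — publisher-issued entitlements flowing to a hub, and usage records flowing back — can be illustrated with a minimal sketch. This is purely hypothetical: the class and field names below are my own invention, not any actual Distributed Usage Logging schema or hub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    """A publisher-issued grant: an institution may access a set of titles."""
    institution_id: str
    publisher_id: str
    title_ids: frozenset


@dataclass
class UsageRecord:
    """A single access event, to be reported back to the publisher."""
    institution_id: str
    publisher_id: str
    doi: str


class SyndicationHub:
    """A cross-publisher hub: grants access against publisher entitlements
    and accumulates per-publisher usage for later reporting."""

    def __init__(self):
        self.entitlements = []
        self.usage_by_publisher = {}  # publisher_id -> list of UsageRecord

    def register(self, entitlement):
        # The publisher, not the hub, decides pricing and sales;
        # the hub only receives the resulting entitlements.
        self.entitlements.append(entitlement)

    def access(self, institution_id, publisher_id, title_id, doi):
        # Grant access only if a matching entitlement exists.
        entitled = any(
            e.institution_id == institution_id
            and e.publisher_id == publisher_id
            and title_id in e.title_ids
            for e in self.entitlements
        )
        if entitled:
            # Log usage for transmission back to the primary publisher.
            self.usage_by_publisher.setdefault(publisher_id, []).append(
                UsageRecord(institution_id, publisher_id, doi)
            )
        return entitled
```

The key design point the sketch captures is the separation of concerns: the hub never sees prices or sales terms, only entitlements in and usage out.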
If syndication develops, it appears that these cross-publisher hubs will emerge from some of the discovery services. Contenders include citation databases like Web of Science, Scopus, Dimensions, and their open alternatives, as well as other discovery services. Many of them are working to address the barriers from discovery to access, including various forms of direct linking whose logical conclusion is enabling access directly on the discovery platform. Digital Science is perhaps furthest along in this transition, through Anywhere Access on Dimensions, which enables same-platform discovery and then access to an article, an early version of the supercontinent model. Depending on how Web of Science chooses to integrate Kopernio, and how Elsevier’s strategy proceeds, we may see strong supercontinents from these players as well. On the other hand, we could see a series of hubs emerge, each with tools optimized to the workflows and methodologies of an individual discipline or disciplinary group.
Syndication might have different meanings across publishing houses. Some might wish to try to preserve their existing platforms and restrict syndication only to other types of content distribution. For example, some publishers or providers might urge that syndication be focused only on “scholarly collaboration networks” like Mendeley. But there is also every reason to wonder if ResearchGate could emerge as a cross-publisher content hub; the strategic tussle about ResearchGate is in my analysis one of the key factors impeding the move towards syndication. It is easy to see how ResearchGate blurs the lines between scholarly collaboration network and discovery service. Syndication can only proceed to the extent that a critical mass of major publishing houses can agree on a common set of protocols.
In recent years, some publishers have begun to face the prospect not of content “leakage” — which is inevitable in a digital environment due to piracy — but actually losing control, as one executive described it to me, of the library channel. Syndication may not have been anyone’s ideal. But if it allows publishers to maintain the licensed content environment and provides a reasonable pathway towards an open access environment with a stable platform order, it may be adopted pragmatically. And if, as a result, it becomes possible for a publisher to control the terms of its publishing business while simply distributing its content through third party platforms, then syndication raises all sorts of questions for publisher technology investments.
Taken to the extreme, we might wonder for what purposes, if any, publishers would need to continue to have their own distribution platforms. Can syndication provide an alternative distribution model allowing publishers to reduce or eliminate their investments in branded silos? If publishers agree to pursue content syndication models, and a growing share of content access takes place through cross-publisher hubs, might we expect to see a meaningful reduction in publisher investment away from publisher-specific hosting platforms?
Reflecting on brand, one of the most significant needs for publisher-specific access platforms may not be for readers but rather for author marketing purposes. Author marketing could perhaps take place through a slice of a supercontinent that is effectively branded by publisher and title for author marketing purposes. It might also become a separate interface altogether.
The most acute implications of syndication may be not for publishers themselves, most of which might be perfectly happy to redirect their technology investments, but rather for those that provide publishing platforms. What role will platform providers, such as Atypon, HighWire, Ingenta, and Silverchair, have for themselves if syndication takes hold? Is there a scenario where their publisher-specific platform offerings are no longer needed or in which the value that they afford declines substantially? Are some of them already in the process of transitioning to alternative models? Or might some or all of them — presumably, ones with the least technical debt and the most strategic agility — see an opportunity to compete as one of the “supercontinents” or at least a disciplinary continent providing syndicated access to all content?
I thank David Crotty, Joe Esposito, Lisa Janicke Hinchliffe, Kimberly Lutz, and Todd Toler for helpful comments on drafts of this piece.
17 Thoughts on "Will Publishers Have Platforms?"
“Publishers are losing traffic on their own platforms.” Is this an industry-wide phenomenon? Not for Elsevier. Downloads from ScienceDirect increased from 900m in 2017 to 1bn in 2018.
https://www.relx.com/~/media/Files/R/RELX-Group/documents/investors/relx-overview-feb-19.pdf Page 27. PS: I work for RELX, Elsevier’s parent company.
Paul, Can you share publicly Elsevier’s estimates about the amount of “leakage” to green platforms, pirate sites etc? The numbers you provided offer a very selective version of the real story. -Roger
Roger, we’re also seeing consistent growth of content usage across the publishers we host on the Silverchair platform (note that the “C” is not capitalized). It seems to me that you and Paul are speaking different languages. “Losing traffic” is a pretty absolute statement (is traffic higher this year than last year?) and I understand why Paul (and I) would refute that phrasing with straightforward-to-measure figures. I think what you’re getting at is actually “not fully capturing all potential traffic” or “losing share of traffic.” I don’t think this is a new phenomenon; before Sci-Hub and sharing sites, there were PDFs downloaded once and shared via email/office systems to colleagues. It is a lot easier to measure use of these new systems, however, so perhaps the “losing share” is in part an overdue recognition of content activity that was once hidden from view? Certainly the number of online venues in which to lose online traffic has increased.
Overall: Working with societies, we are careful not to make broad projections of their situations–as soon as I think I’ve got it figured out I walk into a next society and learn something new! To follow your point in the article, we have seen some small societies band together for shared technology/services (some with commercial publishers, some with more comprehensive publishing service providers like Allen Press and some in discipline-focused collaborations like BioOne or GeoScienceWorld). The larger societies seem to have renewed their focus on serving their researcher communities in a multi-faceted way with publications as one piece (a key one!) tied to other society services (meetings, membership, career advancement, grants, education, professional certifications, etc.). As they desire a close, direct relationship with their community of researchers and a strong brand experience, I see them wanting to directly orchestrate these online experiences in a manner that syndication would not fully allow. Times can and do change, of course, so we are always looking for emerging ways to serve these societies–some may come to resemble what you’ve laid out in this piece.
Thanks for your thoughtful comments about the piece. I am sorry for getting us distracted in a discussion about whether usage is down or rather the share of usage accounted for by publisher sites is down. For some publishers, both are the case. For all publishers where I have seen estimates, though, the latter is the case, which is to say that “leakage” is growing and the relative impact of the publisher site is therefore worth examining critically. That was my intention in the piece today.
I agree that there are important downsides from a publisher perspective to syndication. One key potential downside, which I discussed in my Supercontinent piece in the fall, is the loss of the detailed usage data that are needed to foster analytics businesses. Another, as I referenced perhaps too obliquely here, is the opportunity to engage deeply, well beyond publications, with scholars in a given field, who are typically members of the key society/societies in that field.
That any publishers would consider syndication suggests that these types of downsides may be outweighed by concerns about leakage and other strategic shifts. From what I can see of discussions about syndication, I think these are being led more by commercial publishers than societies, but I can’t claim to have a perfect view of the landscape in that respect.
And, as I hope I have made clear, these are nothing other than discussions at this point. The announcement in the fall about Distributed Usage Logging was one key enabler. We’ll see what further developments, if any, will be announced in 2019.
I too would be very interested in “leakage” data. I presume it must be significant or we wouldn’t see SciHub and ResearchGate lawsuits, takedown notices to websites/ networks/repositories, etc.
Although it won’t cover piracy sites, this Distributed Usage Logging project from CrossRef may lead to better measurements of these numbers: https://www.crossref.org/community/project-dul/
Elsevier encourages the use of green platforms. For example, it encourages researchers to post their pre-publication submitted manuscripts on pre-print servers, including its own SSRN. It also helps universities showcase their researchers’ work through bepress. Also, where an author has identified themselves as being NIH funded or an NIH employee, Elsevier automatically deposits the accepted manuscript to PubMed Central on behalf of the author, to be made publicly available after 12 months. I’m not sure that Elsevier views that as “leakage”. As for pirate sites, of course there is “leakage”, but I’m not aware of any public data quantifying the impact. The point I was making was that despite pirate sites, downloads from ScienceDirect continue to grow at a rapid rate.
Today’s Elsevier is definitely doing a lot to enable off-platform use – no doubt about that. Even so, it is working hard (and more effectively than almost anyone else) to control the channels through which off-platform use transpires. This is a key ingredient in the transition to becoming the analytics business that you and your colleagues regularly trumpet.
I do not mean to suggest that there is public data quantifying leakage, only that there is plenty of internal data at many major publishers quantifying leakage. These are the estimates that contribute to the strategic deliberations I analyzed in my piece.
I’m sorry that I wasn’t clearer in referring to share of leakage vs absolute usage figures in my first paragraph.
There are a number of aggregator, analysis, and synthesizing sites that support researchers outside of the academic and related areas using a growing number of “intelligent” AI bots. They do not scrape the journals that sit behind paywalls but are sensitive to this material as it filters into the larger publishing sphere, whether it comes from pirate sites or from authors seeking new exposure without violating publishers’ terms. With the rise of the alternatives cited by Roger bringing this alternative media closer in time, the issue seems headed for convergence.
Another problem is the continuing demand for increasing scholarly publication, reducing the value of publications except those of breakthrough quality, which, in turn, gives the latter a higher escape velocity into other spheres of interest.
When faced with an abundance of student PDFs to grade, the need to deep-dive the sources they cite goes more quickly when I can find everything in one place. That’s not the library. And it’s not a publisher or green platform. No, for sheer speed and ease, it’s good old sci-hub from the former Soviet Union. Yes, a twinge of guilt. Yes, my campus library has (almost) most of this stuff already and pays dearly for it. But who wants to spend more time looking for the ones that are missing, faced with so many citations to check (for details not found in the Abstract)? At least the publishers are getting paid by my library, even if I choose the faster route to information.
This is interesting. I can’t help but wonder whether the students who wrote those research papers used the online subscription for which the library dearly paid or some other ‘faster route’ to the information. We may witness the path of convergence toward syndication becoming a slippery slope greased by the need for speed and convenience. In the absence of usage data for subscriptions, libraries may no longer be able to rationalize renewals at year-end. If publishers cannot gather meaningful usage data from their own platforms, the business case may behoove them to move to a more measurable alternative such as the ones Roger describes in his post. We’re clearly in a watershed moment in publishing. Time and technology will reveal which way the tides will flow.
In the absence of usage data for subscriptions, libraries may no longer be able to rationalize renewals at year-end. If publishers cannot gather meaningful usage data from their own platforms, the business case may behoove them to move to a more measurable alternative such as the ones Roger describes in his post.
Wanted to make clear here that libraries receive extensive usage data from publishers. See: https://www.projectcounter.org/
Thank you, David. I am familiar with COUNTER as an excellent oversight tool for tracking usage, attempted usage, type of content accessed, etc. by subscribers on a publisher platform. In my comment, I was referring to the usage described by DF — not a publisher or a green platform, but sci-hub and the like. This type of access cannot be measured by COUNTER and therefore cannot be “counted” when renewal decisions are made. The circumventing of the content silos creates a domino effect.
Thanks for the clarification. As noted in another comment, CrossRef has a project underway called “Distributed Usage Logging” which may be able to help count usage from other platforms, at least from legitimate/non-criminal platforms: https://www.crossref.org/community/project-dul/
Roger, I agree the role of the publisher site is worth a critical examination at this point in time, and I’m glad you have kicked off a fresh discussion. What’s most interesting to me is how complete the shift in collective consciousness towards the “article” economy versus the “publication” economy now is. What you refer to as “publisher brand” I think of as “publication context.” The branded site, if you will, is where the content is framed in contexts that have always been important to publishers and editors and is part of the craft — for instance, the “issue” only lives on the publisher site. Editorials and other front matter, errata, themed collections, etc. In the world of “just get me the PDF” this sort of thing is dismissed. Does that mean readers don’t care about this stuff at all? Our usage data would imply that about 15% of the time they do. Or maybe 15% of the people do. Or maybe 15% of the traffic is editors looking at their own handiwork. Who knows? Print-era traditions hang on long after print is the consumption medium, and in many ways the workflows aren’t adequately serving the research enterprise in the age of the linked-data web. As long as there are publications, there will be publisher sites which they call home. But I do see publishers having more sophisticated views on how to maximize all online channels for distribution, and less willingness to invest in bells and whistles on their publication sites unless it really drives engagement and new monetization opportunities. Maybe it’s only the era of the publisher site McMansions that is over.
Todd, I’m loving the McMansion metaphor. With today’s Springer Nature news, I’m humored at the thought of ResearchGate as the fancy summer camp you send your content to, counting on receiving regular letters (analytics) sent back home.
In response to both Todd and Lisa, I noted some time ago that journals are gateways to the publishing arena, but going back to the old print versions of “Current Contents,” most researchers with whom I have been familiar would scan the journal contents page to selectively find those articles of particular interest to their work, rarely interested in the entire mixed grill except in select instances. With ICT and intelligent search, this has been notched up.
The second issue is with today’s increasingly intelligent search. The knowledge in academic journals has been of interest to academics, but that information rapidly leaks into the world outside of academia; hence these engines often ignore materials behind paywalls, again, with areas of select interest. But, again, it’s at the article level, though the journal imprimatur counts as a validator.