I have long thought that the transition of publishing from print to online was a natural byproduct of the opportunities created by the Internet. The changes have driven both growth and periods of consolidation that are not unique to our industry — a fact Cory Doctorow noted at the NISO Plus meeting this year when he recited a long list of industries dominated by a few companies that earn the majority of the revenue.
Reflecting on the last two decades, I see the combined forces of technology and economics as waves of change that have reshaped our industry. Functions that were essential twenty years ago (such as book vendors and subscription agents) barely exist today. Looking ahead, the next big wave will be the use of analytics and AI as we complete the transition to open content. Operating at scale will reduce the cost of publishing and enable the creation of new products. What follows are a few observations on the role of scale in driving consolidation and on the need for scale in future developments.
First Wave – 2000s – Digitization and Consortia
The Internet revolutionized publishing with “create once — view many times.” Only the largest publishers could develop their own systems, and vendors created platforms to serve the middle market. They were complemented by the aggregators (EBSCO, ProQuest) that made digital distribution possible for small publishers.
The Big Deal enabled publishers to offer a package of content to a group of libraries. Confronted by the need to adopt new technologies and methods for sales, small and medium publishers were acquired en masse by larger companies whose goal was to achieve the scale necessary to bring the content online.
Second Wave – 2010s – Workflow and Open Access
By the second decade, many societies and publishers relied on vendor software to support editorial workflows and to host content. Over time, workflow software became a mature market with limited growth opportunities, which led to the acquisition of Atypon by Wiley and Aries by Elsevier, while HighWire became part of MPS. Open Access (OA) received a boost in 2013 from UK/EU/US mandates. Five years later, Plan S aligned funders to require a shift from OA articles to OA journals. This change drove more societies to partner with large publishers who could handle Transformative Agreements that leverage consortial library budgets in support of OA. Adapting to this transition required the stability that large publishers could provide.
Third Wave – 2020s – AI and Open Content
This decade will see open research content reach a tipping point, driven by both the top-down expansion of OA initiatives from commercial publishers and the bottom-up support for Open Science efforts from within the academy. Having more content freely available, and more of it on the same platforms, enables large-scale analyses. Economic models are shifting from the value of content at the unit level to the deployment of tools that uncover intelligence in a large body of content.
Current AI initiatives
The closing session of the STM Spring Conference Innovations Day 2021, led by Jabe Wilson, Global Commerce Director, Data & Analytics at Elsevier, provided snapshots of new tools created to streamline existing operations and to create new products.
- Journal Selection – American Chemical Society
AI-assisted journal recommendations were incorporated into the review process, reducing rejections and increasing author satisfaction.
- Language Check – Writefull, Hindawi
An AI-based tool was created to offer a language check during the submission process. Authors accepted 87% of its suggestions, which reduced the time to acceptance by 20 days and required fewer rounds of review.
- Selection of reviewers – Optical Society of America
AI tools were used to profile submissions and published papers in order to help identify reviewers beyond those already known to the editor.
- Annotation of articles – Springer Nature
Tools were used to harvest the relationships between entities to help annotate articles and construct knowledge graphs.
- Repurposing drugs – Elsevier
AI predictions from a neural network trained with machine learning were used to repurpose drugs for a specific disease.
In each case, the investment and expertise involved require scale to achieve the desired results.
Challenges and Opportunities
The transition to Open in a global research environment requires that we lower the cost of publishing. High production costs and customized systems are giving way to standardized models that produce efficiencies. Current trends are driving scholarly publishing in two directions: small and unsophisticated versus large and structured. Where do we go from here?
During this year’s PIDapalooza, the Open Festival of Persistent Identifiers, the PID graphs being developed in Finland and Australia demonstrated the structure and analytics needed to answer a variety of questions about research outcomes and the relationships between projects. The volume of content is only going to increase, providing the scale needed to obtain the desired insights.
On the occasion of Crossref’s 20th anniversary last year, Ed Pentz noted in a blog post that DOI resolutions had grown to 470 million per year by 2010 and were close to 470 million per month by 2020. While these figures apply mostly to journal articles, including backfiles, they reflect the expansion of research content in a global environment. With growth comes the need for speed and the opportunity to use technology in new ways.
I think of the following quote often as a reference point and a reminder that we live in a time when new discoveries and developments are occurring rapidly. Some days that is little consolation in the effort to keep up.
It may feel like the pace of technology disruption and change these days is so dizzying that it could not possibly get any more intense. Yet here’s the science fact: the pace of change right now is the absolute slowest it will be for the rest of your life. Fasten your seatbelts, it’s going to be a fascinating ride.
14 Thoughts on "Content at Scale – The Third Wave"
Thanks Judy, great post and summary. For further examples of how data and AI are being used in interesting ways, including outside publishing for public health, I’d encourage people to come to the SSP Annual Meeting panel I’m helping organize on Infodemiology and Infoveillance, Monday 24th, 2pm: https://43ssp.cd.pathable.com/meetings/virtual/CwYHPt2vyGWa67eWm
Thanks for highlighting this session, Adrian. See you during the conference.
This doesn’t change your conclusions, Judy, just your wave count, but of course, if you go back 30 years instead of 20, the emergence of high-quality desktop publishing was a tsunami that completely upended the book publishing industry as thousands of small publishers (including my own company at the time) came online. This digitization, combined with the growth of less traditional sales outlets (like Costco) and the rapid consolidation of the retail book industry (Borders, B&N), completely revolutionized the industry (at about the same time, digitization was also revolutionizing the music industry). Was digitization less of a force in transforming academic journals (relative to the impact of the Internet)? I don’t know the answer here; this is an actual question, not a rhetorical one 😊. With regard to your article’s main thesis, in addition to the AI tools you mention, a couple of other interesting ones that readers may want to explore are DARPA’s SCORE system (https://bit.ly/3f2fzoI) to help quickly evaluate social science research, and new big data tools like those proposed by Wang and Barabasi (in their new book, https://amzn.to/3fr30SE) that may lead to better impact metrics.
Glenn, that’s interesting, as I almost included the CD-ROM era of the early 1990s and remember how CD-ROMs were replaced once the Internet became accessible in 1995. The late 1990s were heady days with a lot of hype. In answer to your question, I’d say that the Internet’s ability to deliver content to a much wider audience drove the digitization of journals. I agree with you that the reliability and impact of science are key topics as we look ahead.
A thoughtful and insightful overview, Judy, but not much hope for established and successful mid-level independent journals whose subscriptions dwindle with each passing year while their usage worldwide increases substantially. Regarding your comment about EBSCO and ProQuest, you are probably correct in the general case that they help smaller journals extend their markets. The experience of the Canadian Journal of Communication is that EBSCO and ProQuest add 15% to our usage stats, primarily in developed Anglophone countries, with very little penetration into Europe, India and China, for example. Maybe Clarivate’s purchase of ProQuest will change things. The difficulty with the current situation is that the aggregators arguably diminish our subscription numbers.
Rowland, I appreciate your kind words. As content becomes more plentiful, though, I think we’re headed towards a “subscription” being an indicator of interest (alerts) rather than a source of revenue. The new revenue models are based on the analytics around the content, which requires scale. It’s an entirely different rulebook.
I agree, unfortunately.
Most technologies of the last half of the 20th century and of the 21st century were predicted at the 1939 New York World’s Fair except one: the computer, and it made those predictions as passé as the buggy whip. I tend to think the third wave is old hat and what the future holds is yet to be seen. Beam me up, Scotty!
Judy, I’ve been thinking about this all day: “Language Check — Writefull, Hindawi
AI based tool was created to offer a language check during the submission process. It resulted in authors accepting 87% of suggestions which reduced the time to acceptance (by 20 days) with fewer rounds of review.” This is really quite a gain. Is there a link you can provide to learn more about this implementation/project? Thanks!
Lisa, I’m glad you asked, as I learned more about it at https://www.writefull.com/. There is a free version as well as plans for publishers. It looks like a great team developed it, and it’s part of Digital Science. I expect there are many more applications beyond its use in the submission process.
Hi Judy, I’m Juan, one of Writefull’s co-founders. Thanks a lot for including us in this overview and for pointing to our website. We do indeed offer different levels of integration and more types of feedback (in addition to language, we also offer metadata extraction and structural checks). We have an API for integration purposes that I’d like to refer to: https://www.writefull.com/language-api.
Thanks Judy for the summary. The Third Wave is indeed coming, and open content will allow the aggregation of data and the insights that result from it. With respect to AI overall, it is true that a number of these technologies have yet to hit the growth phase. While improvements to the technology are necessary, I believe they may not be sufficient for adoption at scale. Convenience of use through a superior user experience will have a telling impact.
To plug another product here, we have Paperpal Preflight, which is an AI-powered language and technical reporting checker rolled into one. Paperpal Preflight is plug and play already, and we’re now building a self-service UI to allow journal editors to simply configure their language and technical checks and offer it directly to incoming authors just before submission. This simple intervention has a non-trivial impact in elevating the base quality of language and reporting in the submissions to the journal.
Thanks Judy for this interesting article. Indeed, many publishers are already productively employing AI and contributing to its development in many ways. For its promise to be fulfilled and to truly improve research, science, technology, medicine and broader society, however, AI has to be grounded in the values of trust and integrity that are fundamental to scholarly communication. With this in mind, STM recently published a white paper exploring principles for a trustworthy, ethical and human-centric AI. The white paper can be downloaded here: https://www.stm-assoc.org/2021_05_11_STM_AI_White_Paper_April2021.pdf
Joris, thank you for your reference to the report on STM Best Practice Principles for Ethical, Trustworthy and Human-centric AI. That topic was an important part of the STM session that day and is of primary importance for the successful implementation of an AI application.