Today’s guest post is by Anita de Waard, VP of Research Collaborations at Elsevier. It is a recap of the recent SSP webinar, “Ask the Experts: AI in Publishing”, held on October 6, 2022, which she moderated.
There are numerous conferences, workshops, and keynotes about how or whether techniques developed under the moniker ‘Artificial Intelligence’ (AI) can support (or ruin!) scholarly publishing (not to mention two recent Scholarly Kitchen posts on ChatGPT and the issues it presents). But what is actually meant by AI, according to people who do this for a living? How, precisely, can this mysterious set of technologies help or harm scholarly publishing, and what are some current trends? What are the risks of AI, and what should we look out for?
At the SSP ‘Ask the Experts’ Webinar on “AI and Publishing” I (AdW) posed these and other questions to our three invited experts:
- Helen King (HK), Head of Transformation at SAGE Publishers, who hosts the influential blog PubTech Radar
- Lucy Lu Wang (LLW), Assistant Professor at the University of Washington Information School and visiting researcher at the Allen Institute for AI, who helped build the influential and innovative ‘Semantic Scholar’ platform and helps lead a number of workshops on natural language processing of scholarly text, including the SDP and SciNLP Workshops
- Paul Groth (PG), Professor of Algorithmic Data Science at the University of Amsterdam, and scientific director of the UvA Data Science Center. He previously worked as Disruptive Technologies Director at Elsevier and is a former Board member of Force11
The following is a condensed version of that conversation, edited for clarity.
To begin with, let’s ask a simple question: how do each of you define ‘AI’?
HK: For me, AI is an umbrella term for a range of algorithm-based technologies that solve complex tasks which previously required human thinking. I talk about ‘software solutions that help with decision making’.
PG: As Larry Tesler says: “AI is whatever hasn’t been done yet.” The textbook definition is: artificial intelligence is about the design and building of intelligent agents. The word ‘intelligence’ has two parts: on the one hand, learning; and on the other, problem solving. In publishing this means the ability for machines to learn patterns in order to do things.
LLW: I define AI as a set of technologies that can perform tasks that have typically been done by humans and require higher level intelligence or knowledge to perform. Recently, AI mostly refers to models learned directly from data, rather than explicitly encoding human knowledge in a structured way.
Great definitions, especially the focus on tasks that were ‘previously’ or ‘typically’ done by humans. Talking about things that people do, which AI technologies do you see currently being used in (scholarly) publishing?
HK: It’s hard to think of an area where AI isn’t touching the publishing workflow! If I go through the full scientific publishing flow, there are AI tools to support article writing (such as PaperPal and Writefull) and article submission (such as Wiley’s Rex, which automatically extracts data from your paper); tools to screen manuscripts on submission, like Penelope and RipetaReview; and tools to support peer review, such as SciScore for method checking and Proofig and ImageTwin for scientific image checking, though statistics checking is not yet quite so developed. There are many AI-based tools and services to support finding reviewers. Interesting work is being done by Scite.ai around citation analysis to show how citations support the arguments in the paper. At the production stage there are lots of tools to create proofs, especially in book publishing, and many publishers are using automated triaging or copyediting services. Post-publication there are search engines and recommender tools that use AI to categorize content, and help with the ‘marketing’ of papers by using information about what you have read to suggest ‘what should I look at next?’
PG: That’s a great list! Next to those I see two main areas: first, summarization: e.g., by Scholarcy, which is right here, right now! I think we will see a lot more coming for publishing in that regard. A second topic I see is using published material as data, especially in the area of natural language processing (NLP). This can lead to better semantic search, which you can use as a pipeline for papers you can provide as a publisher to different companies. I’m seeing a move to look at citation-driven recommenders, which, using information extraction, allows you to expand to other areas.
LLW: There are many steps in the publishing and consuming process: search, recommendation, access, reading, writing. One can’t focus on everything, but I am most interested in AI for assisting reading, as well as interpreting documents in the context of the scholarly literature. I see a host of multi- and cross-document tools that can create connections between one work and the rest of the literature. There is a lot we can do in terms of reading. For example, AI can support extreme summarization: on Semantic Scholar we have a TLDR (“Too Long, Didn’t Read!”) feature which provides one- to two-sentence summaries of a paper, which can help you figure out whether you should read a paper. Once you decide to open a paper you have 5-50 pages of very dense text, and books are even denser. How can we help people get to the right place in those papers? Another technology we are interested in is Question-Answering systems that allow for interfaces to actively search within papers to find the right section.
It seems quite a few of these tools fall into the general category of ‘recommender systems’. Speaking of that, are there any new tools that specifically look for similarities between documents?
LLW: Similarity can mean a lot of different things. An interesting development is building out systems that can verify claims made in one article with evidence from other work. Other preliminary models are out there that can perform something like a literature review search, and I think models for this application can help scholars be a lot faster. A particular area of interest for me is clinical research: I study ways to speed up systematic reviews in the clinical domain.
PG: AI can tell you about similarities between documents. Two examples are: first, tools for automatically clustering papers at a recent AI conference; second, what to do when you have zero results for a query? With some of the new tools, you can find papers that are similar to what you are looking for, even if there are no results to your query directly. Algorithms are very good at learning representations that help, here!
LLW: Reviewer recommendation is an area where this technology can be very helpful! I don’t believe in automating the full review process, but this is an interesting area.
Lucy, you explicitly say that peer review should not be automated. What are everyone’s thoughts on this topic: should and can peer review be done or supported by AI?
LLW: There are two bottlenecks in the reviewing process: one is finding appropriate reviewers, and the other is getting them to write high-quality reviews. For the first, there’s a lot of work on automatic assignment within a pool of reviewers. Mostly, this works okay, but if you send out a lot of invitations without having a human connection to the reviewer, people tend to say no, be less responsive, or write weaker reviews. If we could merge the AI aspect of finding reviewers with the human touch of explaining why you should review, that would be great. On writing the reviews themselves: some parts could be automated, e.g., finding references, as opposed to making suggestions to improve the work itself, which requires a human touch.
PG: I think it depends on what kind of reviewing system we want. These tools can’t really help to detect whether the authors are publishing novel things, but in terms of thinking about whether the science was done correctly, maybe we can have a deeper involvement for intelligent systems. Tools can help answer questions about whether the research was done correctly: “Did you follow STAR Methods?” We have a lot of checklists for authors: if we move to environments where automated systems are directly involved in enforcing them, methodological rigor could be a lot higher! This does not replace what I do when I review a paper, which is more about taste, about why the work was done. Systems can help assess if the science is done right.
HK: If you’re talking about screening papers, then tools like Penelope, SciScore, or Ripeta can help with checklists. They can make sure figures and tables are there, which is frustrating if not done right! Using AI in checking images, e.g., to detect fraud, is very important, and there are ongoing efforts there. I think this is more a publisher than a reviewer responsibility. On the publisher side I think it’s also important to do basic identity checks: is the author or reviewer who they say they are? Papers are sometimes not coming from actual people [i.e., fake authors], this needs to be checked. Is this collaboration probable? For example, are the authors from wildly different departments, which could be a sign of fraud?
PG: Helen, I have a question for you. It is so hard to get reviewers for a paper, and there is so much to do for a reviewer! To what degree do you think we can lighten the load?
HK: Systems to find reviewers are being built. Perhaps we need to move towards [writing] semi-automated papers, where only a subsection of the paper is reviewed by a person. Or, if you can use a machine to automate [writing] the methods [section], then we should support that!
AdW: I find it interesting that we are seeing a merger between human and AI work: Lucy says we still need the human touch, but Helen and Paul both point out that there are elements in science that are largely done by machines, which can be both written and checked by machines. If we think about an extreme scenario, in a lab that is fully automated: what if the machine itself can write a report? Who would read these: other machines?
This is a larger question: currently, we are seeing the rise of computationally generated papers. How can we make sure we keep on top of this?
LLW: One question we should be asking is: what is the purpose of papers, are they for humans to read? If there were a way of highlighting the specific contributions the paper is making, you may be able to lighten the load on reading. Or are we making more papers for computers to read? For example, if you are running an experiment where the machine can generate a report that replicates all settings such that another machine can redo them, that would improve reproducibility. I think there is room for both: AI can ingest huge volumes of data, so maybe there is room for machine consumption, as well as papers for human consumption. We (humans) can focus on interpretation and communication!
HK: This is different in the social sciences. I do see developments in the experimental sciences for machine-generated scientific work, but I’m not really seeing this in the humanities.
PG: On the topic of plagiarism I want to point people to the Content Authenticity Initiative: an effort by Adobe and others on how to identify whether an image is authentic. Their goal is to track and report transformations and the provenance of images: that seems interesting for publishers to look at!
Any thoughts on bias as a result of AI, what we should explore more?
LLW: There are issues, for instance, the so-called ‘rich-get-richer’ or the ‘Matthew effect’: it is hard for people to not exhibit bias towards people, institutions, work that they are closer to. How can we make this process more equitable? Creating tools that grant broader access to scholars from all over the world with different resources helps to level the playing field. Giving everybody access to these tools, such as reading or writing assistants, can be helpful in some contexts. There are also different norms in different countries around things like borrowing from other work. You could help people detect this as they are writing papers and support their writing or citation habits to be more aligned with the scientific norm. That could be a potential intervention that would make AI more helpful to a broader part of the scientific community.
HK: It is important that we understand what we are doing. For instance, if we are writing an algorithm that predicts the impact [of a paper], is that really true, or is it just predicting that white males from prestigious institutions will have a greater impact, because historically, they always have? COPE has some very good guidelines in this regard. There is no magic wand to ensure these things don’t happen; it is important to keep involving stakeholders.
PG: I agree that mindfulness is key. These are systems, not just single models: there are humans feeding data, making decisions about data, supply chains, visualizations. The way to think about this issue is to first define, for us as a publisher: what are our values? And how do we represent those values with regard to the systems we are building? Maybe this means we don’t do something, or maybe we put filters on the top, and keep checking what we are actually doing. These are socio-technical systems, not just simple algorithms. Again, a human touch is key!
What are key risks of AI systems, how do we make sure humans remain in the loop?
PG: For publishers, I think you really need to be aware of the legal risks around automated decision-making processes. You need to talk to your lawyers: what are implications of involving AI in some of the core publishing processes? You can ameliorate some of the potential risks around algorithms that are performing badly, but the legal risks are very complex. You need to sort that out first!
LLW: I have a comment on this idea of trying to remove things like demographic features from data. It may seem like a good idea to promote equity, but there are many places where identity matters! Removing someone’s name or affiliation is not sufficient to truly anonymize, nor is it always better to do so: it can do some injustice to the work in certain cases. Sometimes factors like gender and ethnicity affect how we write, and how the work should be interpreted! If we get back to our original goals for reviewing: is the goal to judge each paper on a singular standard, or do we evaluate each paper on its own merit? These are complex questions, but publishers should consider them. Publishing is not a very sustainable process right now and we need to think about how to make it more valuable for the community.
HK: I agree with Lucy: you may want to keep some of the demographic data as a part of the manuscript, because you want to actively change the proportion of people from different groups represented. We need to think this through with diverse groups in the room: not everyone feels the same about these issues.
You have offered a lot of thoughts on what AI can do for publishing, so in closing I want to ask you: what can publishing give to the AI Community?
PG: The number one thing is to reach out to AI researchers! It is such a cutting-edge field that cooperating with researchers in AI is a great idea for publishers.
HK: Give AI lots of nice, clean, well-processed datasets!
Is there a list of all of these tools somewhere?
HK: I have something from two years ago, but I am not sure if there is an up-to-date list of tools anywhere.
Here is a brief list of the tools mentioned in this interview:
- Writing tools: PaperPal, Writefull
- Submission tool: Rex
- Citation context: Scite.ai
- Plagiarism checking: Similarity Check, STM efforts
- Checking reproducible elements: Penelope, SciScore, Ripeta
- Summarization: Scholarcy
- Figure checking: Content Authenticity Initiative, Proofig, ImageTwin
- Checking for computationally generated papers, see this challenge: Dagpap
- COPE Guidelines on Ethical publishing: https://publicationethics.org/
- A further helpful list of paper screening tools is here: https://www.bihealth.org/en/quest/service/service/automated-screening-tools
- A list of tools from Helen, from two years ago: https://docs.google.com/spreadsheets/d/1M0LvD0AqTwYlW99fsU_GSI0FTt2gCovOaP8hU4pbTBw/edit#gid=0