While some journal publishers have been addressing AI (or, at least, text and data mining) for more than a decade, many organizations are at the earliest stages of planning their AI strategy. However, the American Psychological Association (APA) has been consistently ahead of the curve in addressing the implications of AI as a publisher, a licensor, and a user.
On March 11, during this year’s London Book Fair, I moderated a session entitled The Power of Licensing & Responsible AI – A Proactive Path for Publishers. During the session, I had the opportunity to interview Aaron Wood, Head of Product and Content Management for APA, about APA’s multi-faceted approach to artificial intelligence and the change management it engenders. Given how comprehensive that approach has been, I was especially excited about this interview. The following is adapted from the talk.
Can you tell me a bit about what APA has done and how it has approached AI from a management perspective?
We have been involved in text and data mining, artificial intelligence, and natural language processing for a while in terms of workflow. But we had the same panic as anybody else upon the launch of OpenAI’s ChatGPT.
Instead of taking the daunting approach of trying to come up with a complete solution, we have taken a very stepped approach, small piece by small piece.
The first thing that we did was to protect our intellectual property as best we could. This involved creating an internal policy for staff. It’s one thing to have an AI developer come and take your content off your site. It’s a completely different thing to have staff unwittingly uploading content into a large language model (LLM). We started with a quite restrictive policy, but then opened up by finding the right tools for staff to use. We found tools that discarded the inputs and the outputs, ensuring that staff could take advantage of these new ways of conducting their work without risking the content.
Another area was rights reservation. When the EU AI Act came out, it was very clear that you needed to be explicit about your reservation of rights for text and data mining and artificial intelligence. We didn’t initially know what to do. We looked to industry providers and associations like STM, where there were Task and Finish Groups looking into various aspects of AI. We slowly rolled out the recommendations that came from STM in the TDM Reservation Protocol (TDMRep), an effort among STM, other publishing associations, and the W3C to develop the beginnings of a standard around machine-readable and human-readable rights reservations for LLMs.
For a publisher, protecting rights includes creating a path to legal and ethical reuse of content. We quickly decided to work with CCC by joining its AI reuse offering as part of its Annual Copyright License. That allowed us to put our article content, as well as our book content at the chapter level, into a pool that can be used for AI purposes in research.
We didn’t do everything in one stroke, because the technical side of that was a little too daunting. We just did little pieces that helped us protect our content along the way: updating copyright statements, changing what we have in the footers of our sites, and then slowly getting to all of the technical pieces, like the rights reservations in HTML meta tags and HTTP response headers.
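To make those technical pieces concrete, here is a minimal sketch of what a TDMRep-style reservation can look like when expressed both as an HTTP response header and as an HTML meta tag. The field names tdm-reservation and tdm-policy come from the TDMRep specification; the Flask app, the route, and the policy URL are illustrative assumptions, not APA’s actual implementation.

```python
# A minimal sketch of TDMRep-style rights reservation (illustrative only,
# not APA's implementation). The same reservation is asserted twice: as an
# HTML meta tag for parsers of the page, and as an HTTP response header for
# crawlers that never read the HTML. The policy URL is a placeholder.
from flask import Flask, make_response

app = Flask(__name__)

PAGE = """<!doctype html>
<html>
  <head>
    <!-- tdm-reservation: 1 signals that TDM rights are reserved -->
    <meta name="tdm-reservation" content="1">
    <!-- tdm-policy points to a licensing policy (placeholder URL) -->
    <meta name="tdm-policy" content="https://example.org/tdm-policy.json">
  </head>
  <body>Article landing page</body>
</html>"""

@app.route("/article")
def article():
    resp = make_response(PAGE)
    # Assert the same reservation at the HTTP layer.
    resp.headers["tdm-reservation"] = "1"
    resp.headers["tdm-policy"] = "https://example.org/tdm-policy.json"
    return resp

if __name__ == "__main__":
    app.run()
```

The spec also allows a site-wide JSON file as an alternative to per-page signals, which is why a publisher can, as Wood describes, roll these pieces out incrementally.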
“AI strategy” can be overwhelming and mean many things, from monetizing outbound rights to changing editorial workflows. The stepped approach has obviously been successful for you. How do you handle change management at the APA?
There is psychological and sociological research indicating that people’s attitudes towards technologies such as AI tend to be reductive. An individual will either completely mistrust and fear a technology like AI or, on the other side, will overly trust it and be extremely enthusiastic about it. Essentially, people look at it as a broad category instead of as individual cases. That’s quite a challenge because we need to work internally in our organizations, across organizations, and within the industry. I’ve found that to be more of a challenge than choosing when to automate with new tools.
We needed staff to be comfortable with the concepts within generative AI. In product and content management, this led to specific upskilling goals for staff. People got their “feet wet” by beginning to understand the technology in general, and then they started to look at it within their areas. Whether it’s a workflow or a license, they needed an understanding of the technology and a space to discuss it with their peers.
I also found it helpful to ground people’s considerations in a few areas that resonate well with them: research integrity, copyright, piracy, freedom of expression, and academic freedom. Doing that has helped people begin to look at things individually instead of looking at it all very broadly. Of course, we still have the two extremes within staff, but with some middle ground, too.
Research integrity often means ensuring that papers are original, created by the named authors, and that peer review is impartial. However, there is a larger concern with AI: ensuring that research draws on the highest-quality materials, such as peer-reviewed versions of record. How does APA approach these larger questions of research integrity and AI?
It’s about the appropriate use of science and the curation of scientific knowledge. I think a good way of illustrating it is to think about retractions. Consider cases where a published research paper is later retracted, either by the author (because, for example, the methodology was flawed and therefore the results were flawed) or by the editors (because there was data manipulation or other dubious activity). That retracted content is part of the scientific record. You should know that it’s there, but it’s not really part of the scientific knowledge. Your knowledge is supposed to go past that.
With the advent of generative AI systems, this becomes somewhat problematic. If you think of a typical search system like PubMed or Google Scholar, essentially what’s happening is you have a researcher or a student putting in a query. They get back results, they look at those results, and they determine for themselves the relevance of those results to their query. As they investigate them further, they also look at the reliability. Is this from a peer reviewed journal? Has this content been retracted? These are things that the researcher or end user investigates.
With generative AI technologies, much of that context has been stripped out. For example, retracted content is treated the same as anything else. And that’s problematic for the science because it could be perpetuating flawed research.
So what does that mean? How do you deal with that as a publisher? There are a number of angles to that.
At APA, when we’re developing features or generative AI technologies for our content and products, we make sure to remove that content from our vector database and our RAG models. Luckily, there are tools for that. Crossref has an API, for example, through which publishers contribute information about updates to records, including retractions. That API also includes retractions from Retraction Watch, which means there’s a central place to go to identify that content, pull it out of your systems, and therefore improve the science. We also use licensing to reinforce content integrity.
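For readers who want to see what that looks like in practice, here is a minimal sketch of pulling retraction records from Crossref’s public REST API. The works endpoint, cursor paging, and the update-type:retraction filter are part of Crossref’s documented API; everything else, including how the DOIs would feed a particular vector database, is an illustrative assumption rather than APA’s production code.

```python
# A hedged sketch (not APA's code) of finding retracted works via the
# public Crossref REST API so they can be dropped from a local vector
# store before a RAG index is (re)built.
import itertools
import requests

CROSSREF_WORKS = "https://api.crossref.org/works"

def retracted_dois(rows=100):
    """Yield DOIs of works that Crossref records as retracted."""
    cursor = "*"  # Crossref's deep-paging cursor starts at "*"
    while True:
        resp = requests.get(CROSSREF_WORKS, params={
            "filter": "update-type:retraction",
            "rows": rows,
            "cursor": cursor,
        }, timeout=30)
        resp.raise_for_status()
        message = resp.json()["message"]
        items = message["items"]
        if not items:
            return
        for item in items:
            # Each item is the retraction notice itself; its "update-to"
            # field points at the DOI(s) of the work being retracted.
            for update in item.get("update-to", []):
                yield update["DOI"]
        cursor = message["next-cursor"]

if __name__ == "__main__":
    # Smoke test: print a handful of retracted DOIs. In a real pipeline,
    # each DOI would be passed to the vector store's delete-by-ID call.
    for doi in itertools.islice(retracted_dois(), 10):
        print(doi)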
Aaron, if you had one piece of advice, what should organizations do when they’re starting this holistic, stepped AI path?
I think the most important thing is to create the space in your organization for the conversation, because it can’t really be led from just one area or the other. You really need staff involved to have an open discussion about what’s going on. That’s the only way to get to the point where you’re looking at each application, each license, and each tool individually, and not thinking of AI in generalities.
About Aaron Wood
Aaron Wood leads product development and strategy, as well as indexing and production systems, for the American Psychological Association, a publisher of books, journals, courseware, video, and discovery solutions. His experience in scholarly communications and publishing is broad and multinational. Wood has led metadata and technical services at academic libraries and consortia, streaming video and audio platform development, print and electronic book acquisitions and ecommerce solutions, full-text journal and book production and distribution, and abstracting and indexing discovery solutions. He is a member of STM’s Standards and Technology Committee and the Crossref Board of Directors.