In considering the supply and demand problem posed by an ever-greater supply of written content, why can’t we just use machines to help us cope? Building on Esther Dyson’s ideas about the value of “garbage subtraction” in a post in November, I pondered the increased value of selection and curation for the reader. The editorial and peer review processes are two such mechanisms that scholarly communications has developed to incorporate a measure of quality control into what is distributed. There are other ways to reduce the burden of increased content, to be sure, such as increasing the cost of content, reputation filtering, narrowing one’s content interests, or applying similar information-selectivity approaches. Each of these options comes with its associated benefits and downsides. For example, selecting by cost, be it subscription or author fees depending on your model of choice, creates economic barriers to accessing content. Only listening to trusted colleagues, or to the most-cited papers in one’s favorite journals, for what to access could turn into an echo chamber.

In response to the article and the growth of content more generally, several have suggested that our community needs more machine learning tools to help with that curation process. Of course, we have been using machine intermediation for search and discovery for years. For all the excitement around artificial intelligence technology, the approach du jour seems to be “Let’s have the AI do it for us.” On cue, practically every technology vendor has embraced and touted its AI integrations, highlighting how its new, improved tool will solve your information management problems.

What are the implications of ceding this control to the machines? In the same community, some people express concerns about research integrity and decry the use of large language model-based processing tools for content generation, while others take the opposite tack, doubling down on the technology. They claim we should rely on those same technologies to recognize fake or machine-generated content, or to provide the most relevant resources for our queries.

The solution to ever more machine-generated texts isn’t necessarily more machine-generated curation. This is an arms race in which neither side can win. More mid-range-quality information curated by mid-range tools will not advance science in the best possible ways. We need to recognize that these tools can be used for both useful and malign purposes, and that we can learn from previous waves of techno-euphoria. We need to insist that better-quality content and better-quality results can only be sustained by human involvement in the process.

To this end, it makes sense to draw people’s attention to some of the problems associated with relying on machines to do the thinking for us.  Below is a brief list of some of these issues:

  • Computers don’t always identify the correct results, from inaccurate search results to nonsensical computational output to outright hallucination.
  • Computers can only triangulate your interests as expressed in behavioral data, and will interpret your query based on those estimations of your interests, all of which is prone to error.
  • Feedback loops, whether from reinforcing existing biases and patterns of behavior or from algorithmically produced results being reintroduced into training data, can lead to model collapse (see the sketch after this list).
  • The currently ascendant computational models rest on particular mathematical representations of the world. This probabilistic approach to problem solving has its benefits, but also its pitfalls.
  • Computers cannot reason in the way that humans do, nor apply subjective rules to a system. This delves deeply into what it means to “think,” the theory of mind, and how humans perceive the world (for more, consider the perspectives on consciousness and machine thinking of Searle, Rosenthal, McCarthy, Alston, and many others).
  • Many, if not most, of the questions researchers address are not simply matters of which inputs correlate with which outputs, or of how to determine the processes that achieve __________ (insert your own research question here). Simply adding randomness to machine analysis isn’t likely to help a model escape its local minima.
  • Who is writing these algorithms, and what transparency do users have into their motivations and the outcomes they expect from the application of their tools? Do they align with the user’s interests?
  • What unknown benefits might accrue to those in whose favor the algorithms have been biased?
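To make the model-collapse bullet concrete, here is a minimal, hypothetical sketch (Python with NumPy; the Gaussian setup and sample sizes are my illustrative assumptions, not anything from the examples above) of a model repeatedly retrained on its own synthetic output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for generation in range(31):
    # "Train" the model: estimate a mean and standard deviation from the data.
    mu, sigma = data.mean(), data.std()
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    # The next generation is trained only on synthetic samples from this model,
    # with no fresh human-generated data mixed back in.
    data = rng.normal(loc=mu, scale=sigma, size=100)
```

Because each finite sample slightly understates the true spread, the fitted distribution tends to narrow generation after generation; that shrinking variance is the statistical core of what researchers have described as model collapse.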

Many of these issues are not necessarily new, nor specific to the past year’s conversation around artificial-intelligence-based tools. For example, some of them were identified as areas of concern when search tools were first gaining significant traction in the community. In their paper describing the PageRank algorithm and the basic infrastructure of Google, Sergey Brin and Larry Page explicitly call out the bias that can be introduced by advertising. They state unambiguously, “we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.” Given the business model that drives Google’s success, it is worth questioning whether Google’s search results are as good as they used to be and whether this guidance from the company’s founders has been set aside in pursuit of profit. From the perspective of providing trusted results, the final bullet above is important to reflect upon. How these motivations change over time, or could be expected to change, is also worth considering before one locks oneself into a service model.
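For readers who have never looked under the hood of that paper, the core of PageRank is a simple power iteration over the link graph with a damping factor (0.85 in the original paper). Below is a minimal sketch in Python with NumPy; the four-page link graph is invented purely for illustration:

```python
import numpy as np

# A tiny, invented link graph: page i links to the pages in links[i].
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

# Column-stochastic transition matrix: M[j, i] = 1/outdegree(i) if i links to j.
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

d = 0.85                    # damping factor used in the original paper
rank = np.full(n, 1.0 / n)  # start from a uniform distribution over pages

for _ in range(100):        # power iteration; converges quickly on small graphs
    rank = (1 - d) / n + d * M @ rank

print(np.round(rank, 3))    # pages with more, and better-ranked, in-links score higher
```

The point of spelling this out is that the ranking is fully determined by the link structure and the damping parameter; advertising or other commercial bias enters through whatever gets layered on top of this core.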

In December, a paper by the Google DeepMind team was published in Nature, describing FunSearch, a model that devised a novel mathematical solution to the cap set problem. Because this solution existed neither in the training data nor anywhere in the literature, some have taken it as a sign that the much-discussed machine sentience has arrived. Alternatively, have we simply reached the point where the proverbial million monkeys pecking away at keyboards, given enough time, eventually type out a Shakespearean sonnet? In this case, the FunSearch algorithm tried “a couple of million suggestions and a few dozen repetitions of the overall process — which took a few days.” Perhaps ‘infinite monkeys’ working away can get things done faster than we think.
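As a rough sense of scale for the monkey analogy, a back-of-the-envelope calculation helps (the 27-character alphabet is an assumption for illustration, not anything from the Nature paper): the expected number of uniformly random attempts needed to produce one specific string grows exponentially with its length.

```python
import math

ALPHABET = 27  # assumed: 26 letters plus a space character

for length in (5, 10, 20, 40):
    # Uniform random typing: expected attempts to hit one exact string once.
    expected_tries = ALPHABET ** length
    print(f"target length {length:2d}: ~10^{math.log10(expected_tries):.0f} tries expected")
```

Even a 20-character phrase would demand on the order of 10^29 blind attempts, so a couple of million suggestions can only suffice when, as in FunSearch, the search is guided rather than purely random.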

How to consider and incorporate the use of algorithms in mathematical proof discovery is not new, dating back at least to the 2000s, and much has been written lately about its implications, both in the mathematical literature and in mainstream publications. It has wider implications for domains outside of mathematics as well. More than 20 years ago, in the digital humanities, a discussion began around the concept of Distant Reading, which captures how researchers had taken to studying machine outputs about texts (statistics, samples, and connectivity of terms, among other things) instead of the texts themselves. The term “distant reading” was coined by Franco Moretti, who described it in his article Conjectures on World Literature. There is now a counterpart in distant viewing, as machine capabilities in image analysis begin to take shape. Certainly the past two decades of research have shown that interesting insights can be gleaned from computational analysis of content. However, the ensuing debate among humanities scholars has also shown that these approaches shouldn’t be the only, or even the primary, approaches to studying literature.

The key fallacy here is the assumption that the most important things are quantifiable; inherently, many are not. As much as we like to assign quantitative metrics to things, much of life, particularly its most important parts, is subjective and not quantifiable. How do you rank excitement? Is a 1-10 scale appropriate, or should it be on a more granular 100-point scale? If you think about these questions, you quickly arrive at the foundational question of what such a number would even mean. How beautiful something is, or how much you love your family, are not quantities; they are qualities. The entire notion of expressing intelligence on a metric scale is absurd. Whether something is novel or interesting is not something that can be expressed algorithmically.

Just as there should be caution about relying on citation counts for promotion and tenure decisions, we should all be reminded that the best way to assess quality or applicability is to read the content. Perhaps deference to other humans who’ve read the content is a decent proxy at times, but it is certainly not without its flaws. It is important that one not read this post as opposition to AI tools, or as a failure to recognize the benefits these tools can provide. We are seeing real and important progress in the domain of neural nets, large language models, and artificial intelligence tools. We are also far from being able to put our full faith in these tools without significant human intervention. Reflecting on the ways these tools can fail us is an important check on complacency and will reinforce the value that people bring to these processes.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles at a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.

Discussion

1 Thought on "Let’s Be Cautious As We Cede Reading to Machines"

This is an important conversation to be having, and thank you to Todd Carpenter and the Scholarly Kitchen team for keeping the torch lit.

That being said, as I have commented multiple times before on previous AI posts, it might actually be prudent to speak with those of us in scholarly and scientific communications who are also experts in AI before jumping to conclusions as to how these platforms are used in the world today.

Having just successfully wrapped up ISMPP-EU as Co-Chair of ISMPP’s AI Task Force, I can tell you that most life sciences firms, publishers, technologists, and medical writers who are collaborating with generative AI as a copilot are not ceding their intellect to the machines but rather catalyzing their creativity, speeding time to content creation, and enabling clinicians and patients to access scientific information of higher quality, faster than historically possible. A number of important research questions remain, but we can only begin to address them if we directly engage with those who are innovating and learning.
