It almost seems quaint now: a case of old-school information manipulation, outlined recently in a riveting New Yorker article about the Sackler Foundation and its development of OxyContin and the pain medicine field needed to justify the drug’s existence and distribution. A 1960s memo from Senator Estes Kefauver of Tennessee gets to the gist:
The Sackler empire is a completely integrated operation in that it can devise a new drug in its drug development enterprise, have the drug clinically tested and secure favorable reports on the drug from the various hospitals with which they have connections, conceive the advertising approach and prepare the actual advertising copy with which to promote the drug, have the clinical articles as well as advertising copy published in their own medical journals, [and] prepare and plant articles in newspapers and magazines.
Seeds the Sackler family planted decades ago via their vertically integrated drug and information operation are now in full bloom, devastating communities all across America.
It’s a reminder of what long-term harm can be done by misinformation and information manipulation. Yet, compared to the scope and scale of the potential information pollution today, the Sacklers had relatively limited reach (physicians and policymakers) over a relatively small subject area (opioids) at a relatively slow pace (monthly magazine advertising). We now face commercial biases emerging 24/7/365 on billions of screens and millions of topics, while 2-3 algorithms from corporate technology companies dependent on advertising revenue effectively control what billions of people see in a week.
Despite all our supposed advances, have we really made progress? Or have we embedded the manipulation into the system more deeply? And are we feeding the systems the manipulators use?
Today’s information manipulation techniques have really coalesced over the past few years, at an accelerating pace. With all these active intermediaries dreaming up ways to manipulate the information they disperse and that we see, the entire playbook is being utilized in one way or another:
- Intimidation. The US Environmental Protection Agency has famously mistreated scientists and scientific information.
- Persecution. Michael Lewis’ recent story in Vanity Fair about the Trump Administration’s transition at agencies like the US Department of Agriculture and the US Department of Energy describes witch hunts for climate scientists.
- Disregard and Neglect. Lewis’ story also highlights deep disregard from the new political administration for scientific evidence to bolster long-term planning, along with incompetence when it comes to basic regulatory responsibilities.
- Engineered Censorship. China is growing increasingly adept at managing its Great Firewall, blocking VPNs traditionally used to circumvent it while also focusing in on content with offending keywords to trigger blockades of information, most recently from Springer Nature, and earlier from Cambridge University Press. Springer Nature is apparently allowing its content to be censored, while CUP ultimately determined it would fight the censors.
- Traditional Censorship. The South African State Security Agency and the South African Revenue Service have been attempting to ban a work of investigative journalism (The President’s Keepers) because it might expose corruption in the government.
- Total Noise. Hiding information can be done merely by overwhelming the senses, the condition David Foster Wallace termed “total noise,” a snowstorm of information that makes sorting meaning out of static almost impossible. With more articles than ever, discerning good information is harder than ever, which oddly makes bad information look more valid because it’s competing as effectively in the noise.
- Ad-algorithm Censorship. Perhaps the most insidious kind, this is the de facto censorship that occurs in conjunction with services that depend on clicks for their revenues: your Googles and Facebooks and Twitters. These services tend to bury, immediately or very quickly, information that may be rock solid and important but which delivers fewer clicks than questionable information that triggers reactions. This polarizes opinions and creates “the upside-down” for information consumers.
With governmental and in some ways cultural views of science becoming politicized and commercialized, other forms of information manipulation become possible, as false equivalency infiltrates our decentralized (i.e., scattered and ill-managed) knowledge sources and new players sense opportunity. There are simply too many outlets for anyone to manage effectively, given the lack of human intermediaries, a supine trust of technology, and lower barriers to publication across society.
Lacking broad cultural support via regulation or funding for quality information filtering, commercial bias becomes the informing ethos of information delivery. Google’s recent inability to reliably surface accurate search results around developing events has painfully illustrated the pitfalls of commercial curation. Some of these inherent commercial biases have exacerbated the same opioid epidemic the Sacklers started, with Google recently caught cultivating advertising from shady rehab centers that have bilked addicts and their families while only making the addicts’ problems worse.
So, we collectively manage an industry with a history of manipulation achieved by information producers seeding findings into papers. This same industry now has a commercial and technological overlay that itself is prone to strong biases from giant corporations more interested in their own revenues than in scientific facts and trusted information.
Are we thinking about our information purveyance practices with all of this in mind? Are we being careful and thoughtful enough?
For example, the area of preprints is one we may want to consider more carefully given the propensity for planting papers and exploiting free information to drive users to findings that trigger a click.
Preprints have become a sleight of hand in a way, allowing us to elide the fact that we are allowing full access to non-peer reviewed manuscripts in a manner that is inherently validated as “pre,” as if its fate is known, that it will be “printed,” a proxy for being reviewed, accepted, and published in a peer-reviewed journal. It’s a neat linguistic trick, the preprint, turning manuscripts into something more credible, via semantics.
A preprint used to be a printed version of a peer-reviewed article. Printers would run a few dozen copies for authors, and mail them to the author a few weeks before the completed journal arrived. Each author would share these preprints with selected colleagues — a limited distribution of a peer-reviewed article accepted and edited by an identifiable journal.
Now, when we say “preprint,” the definition has expanded in ways we may not have fully considered. We are now saying, effectively, “Here is an article destined for a journal, we know not which, but you can have access early.” We don’t say, “Here is a manuscript that has not been peer-reviewed, may change dramatically before publication, may not ever be published in a peer-reviewed journal, or which may be quietly dropped by its authors before any next steps are taken, you just don’t know, and there are no tools for you to really track its fate in the information system’s current whiteout conditions.”
Because readers are human, and prone to intellectual biases that can be easily manipulated, preprints have the ability to establish what will seem like facts in hundreds or thousands of minds. Once a preprint is out, it’s hard to unring the bell, so to speak, or to get the toothpaste back in the tube. Once the message of a preprint is heard, minds can change, and people can believe things that are not vetted or proven. We’ve had decades of problems from peer-reviewed papers taking society down dead-end paths, from opioids to fake vaccine scares. Jack Andraka illustrates an extreme case of this, in which media attention alone created a “science hero” who ultimately flamed out without producing a paper of any kind. In September, a preprint server in medicine was proposed. Do we really want this?
We have managed preprints as if their major aspects (the stage being “manuscript” and not “article,” the distribution being global and not local, and the discoverability being immense and not personal) have not been modified by our current practices. We invoke a term of old in an imprecise and, some might argue, deceptive manner. (ArXiv calls its server an e-print server, which at least is a novel term and not loaded with the baggage of “preprint.”) Is this the right approach? Is there really not much at stake in the larger information system when we make thousands of preliminary and non-peer-reviewed manuscripts available for indexing, discovery, and use?
Perhaps we need to think harder about these things. Facebook’s indirect integration of scholarly information (via CZI’s purchase of Meta and funding of bioRxiv) is reminiscent of the Sackler integration between drug maker and publisher, a similar wine in a spray bottle instead of a decanter. If Mark Zuckerberg and family purchased Meta and are funding bioRxiv in order to discover treatments through algorithms, who is going to check their work if and when they assert success? Is their mere presence in the market a sign of a bias for positive results? Who will check their algorithms?
Recently, an initiative headed by a group from McGill University set out to establish a checking precedent, partly driven by a preliminary finding from a group in the UK showing that 30 studies of AI they examined left out key information about how the AI worked, interrogated the data, and established operational conclusions:
While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order as used in the original study (as against raw data), the software environment used, randomization control and the implementation of proposed techniques. In order to increase the chances of being reproduced, researchers should ensure that details about and/or access to information about these factors are provided in their reports.
The professor leading the charge, Joelle Pineau, an associate professor at McGill University and head of Facebook’s AI research lab in Montreal, has issued a challenge coordinated with professors from five other universities across the world. Students have been tasked with reproducing papers submitted to the 2018 International Conference on Learning Representations, one of AI’s biggest meetings.
This kind of precedent is important. AI and algorithms should be validated in some manner. The lesson of peer review is obvious: anyone can assert an algorithm works, but what biases does it hold? Currently, algorithms are the technology equivalents of commercial preprints: not yet validated, potentially misleading. Yet are we pushing to ensure that the technology world learns from some of the methods we’ve employed successfully to move the world from germ theory to space travel to quantum computing? Or are we supine in the face of technologists?
We should insist that governments no longer censor scientific or scholarly information. We should insist that technology companies create algorithms and use business models that incentivize the presentation of facts and careful opinion over false claims and click-triggered controversies. We should support the review and validation of system algorithms, and insist on a level of transparency that seems easy to achieve — when was it last changed, which tests did it pass and at what level, and so forth. Proprietary elements can still be protected while trust and transparency increase.
The practices of scientific publishers and scholars need to adapt to these new players seeking to manipulate or divert or suppress scientific information — for political or commercial reasons. A few powerful people with a few algorithms can now change the world. Such entities need to have their power checked, their overreach called out, and their opacity decreased. It’s time to get ahead of where this is going. We can’t continue to behave as if there aren’t forces out there trying to get us to do things in ways that suit their larger goals, which may not be the goals of science and scholarship. It’s the same as it ever was, and yet, it’s completely different.
Why do we allow PubMed to drift into a morass, when we could build our own search engine superior to Google Scholar or PubMed, one that all our sites might be able to use to unify the discovery experience for our users? Why do we hesitate to adopt ORCID when it clearly serves useful purposes, from disambiguation to potentially a source of authentication that would serve our industry and not feed the mega-corporations? Why don’t we have cloud solutions of our own to support our industry without creating dependencies on the AWSes or Googles of the world?
Perhaps it’s time to wield technology in ways that protect the integrity of the scientific and scholarly endeavors, and once again work on differentiation rather than integration and assimilation. After all, the larger information ecosystem is largely controlled by puppetmasters with algorithms and interests of their own. We also need to keep inserting smart people into systems to make sure those systems are working well, with McGill’s initiative showing us how this can be done.
It might be time to cut some strings while we pull hard to make sure others are going where we need them.