We all recognize that the internet has transformed not only how much information is available, but also how, or whether, that information is discovered, consumed, and disseminated. But how do we separate trustworthy information from questionable “facts” in a world where technology has disrupted the traditional knowledge supply chain controlled by libraries and publishers? In this post, Julia Kostova (Oxford University Press) and Patrick H. Alexander (The Pennsylvania State University Press) consider some of the issues this disruption raises for the role of publishers as custodians of reliable, high-quality information.
Last March, a report that Google was planning to change its search algorithm made the news. According to the story, picked up by numerous media outlets, Google would rank pages not on popularity, as it currently does with its PageRank algorithm, but on veracity. Instead of ranking sites by the number of in-links, a signal that inadvertently conflates trustworthy results with spurious claims, Google would score sites by checking the facts they contain against its knowledge repository, the Knowledge Vault. The more accurate the information, the higher a site would rank in search results.
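For readers who want a concrete sense of the difference between the two approaches, here is a minimal, purely illustrative sketch in Python. The page names, link counts, asserted facts, and the tiny stand-in “knowledge base” are all invented for this example, and both ranking functions are deliberate simplifications; neither corresponds to how PageRank or the Knowledge Vault actually works. The point is only to show how a popularity signal and an accuracy signal can order the same pages differently.

```python
# Toy illustration (not Google's actual method): ranking a handful of
# hypothetical pages by in-link counts versus by agreement with a small
# stand-in "knowledge base" of accepted facts.

# Invented pages: the number of sites linking in, and the facts each asserts.
pages = {
    "popular-but-sloppy.example":   {"in_links": 950, "facts": {"A", "X", "Y"}},
    "accurate-but-obscure.example": {"in_links": 40,  "facts": {"A", "B", "C"}},
    "middling.example":             {"in_links": 300, "facts": {"A", "B", "Y"}},
}

# A stand-in for a fact repository such as the Knowledge Vault.
knowledge_base = {"A", "B", "C", "D"}

def rank_by_popularity(pages):
    """Order pages by raw in-link count (a crude proxy for link-based ranking)."""
    return sorted(pages, key=lambda p: pages[p]["in_links"], reverse=True)

def rank_by_accuracy(pages, kb):
    """Order pages by the share of their asserted facts found in the knowledge base."""
    def accuracy(p):
        facts = pages[p]["facts"]
        return len(facts & kb) / len(facts) if facts else 0.0
    return sorted(pages, key=accuracy, reverse=True)

print("By popularity:", rank_by_popularity(pages))
print("By accuracy:  ", rank_by_accuracy(pages, knowledge_base))
```

In this toy example, the most-linked-to page asserts the fewest facts found in the knowledge base, so the two orderings diverge, which is precisely the shift in ranking signal the report described.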
Given the exponential, and unstructured, growth of online resources, organizing and vetting information in the digital space has become a priority. For Google, which in 2014 handled 68% of all internet searches (some 5.7 billion searches per day, or roughly 2 trillion per year), and where the top-ranked result captures 33% of search traffic, providing dependable information is critical if users are to continue to view the search engine as essential to information discovery. But Google’s mission to organize knowledge raises questions about who decides what information is trustworthy in the digital-era knowledge-supply economy, and what role scholarly publishers will play in it.
The internet has transformed not only how much information is available, but also how, and whether, that information is discovered, consumed, and disseminated. It has also altered how we determine what is trustworthy and what is not. Before the internet, publishers and libraries, together with the academic community, controlled the knowledge supply chain that assured “trustworthiness” and minimized the spread of misinformation. The academic community vetted knowledge, and research libraries, in particular, provided the final validation by collecting trustworthy (i.e., “peer-reviewed” or otherwise verified) information. But in the digital supply chain, where the internet, not the library, has become the researcher’s first stop, libraries have been disintermediated: cut out not only of the distribution link in the supply chain but also of the work of establishing what is trustworthy. As a result, in the digital space, information, whether reliable or questionable, swirls together in an algorithmic stew, affecting research outcomes as well as their interpretations.
Internet search results, for example, are highly selective, both at the search and discovery level and at the level of the information itself. Resources that are not available digitally are simply excluded. Conversely, an overabundance of information distorts research outcomes: buried under an avalanche of (mis)information, researchers can’t always dig the good out from the bad. At the other end of the spectrum, misinformation persists online despite its untrustworthiness. The Scholarly Kitchen has discussed in the past the afterlife of retracted papers, which often continue to circulate in the digital space even after retraction, possibly still influencing research. A recent report in Inside Higher Ed noted that “predatory publishing” by Open Access journals with “questionable peer review and marketing” is on the rise. Information presented as factually accurate when it is not can trip up even the most diligent researcher.
The question of trustworthiness is therefore more important than ever, because it directly affects research outcomes. Google’s ambition to determine veracity raises questions about the future place of scholarly publishers in the knowledge supply chain. To be sure, university presses do far more than simply manage and ensure rigorous peer review. In an environment of information overload, where “facts” can easily be collected, published, and located, the vetting and curating of expert knowledge that scholarly publishers provide is arguably more critical than ever, and cannot simply be displaced by better search technology.
But as new players enter the fray (technology companies like Google and Facebook, crowd-sourced projects like Wikipedia, commercial publishers, and others whose commitment to ensuring the accuracy of knowledge may differ from that of peer-reviewed scholarly research), they upend the established knowledge-validation process. Currently, publishers of all stripes obligingly send Google and other search engines their metadata, DOIs, and keywords, and are rushing to supply full-text XML so that it can be crawled. We cross our fingers and throw salt over our left shoulders in the hope that our prestige as scholarly publishers will ensure the integrity of our publications, and that this, in turn, will keep them at the top of search results.
But what will happen as information from outside the traditional, peer-reviewed supply chain enters the stream via metadata, online publications, or other forms of content? The lines between commercial, not-for-profit, society, university, and institution-based publishing are blurring, as vividly illustrated by the recent announcement of Random House’s partnership with Author Solutions to create “a new model of academic publishing,” Alliant University Press, and new players are entering the space of trustworthy research. Can scholarly publishers maintain control of the standards for trustworthy scholarship, or do they risk a disintermediation not unlike what happened to libraries?
As it happens, the story about Google’s trust-based algorithm turned out to be less than accurate: the search engine giant was merely exploring the possibility in theory, in a research paper, not that you would know it was only a theory from the 1.5 million hits on the search phrase “Google truth algorithm.” Many took the report as reflecting concrete plans, or at least as a portent of the future. Whatever Google’s actual intentions, the episode provides an occasion to reconsider the role of academic publishers as custodians of trust in the knowledge-creation supply chain.