Over the last few months, the Scholarly Kitchen has featured a number of posts exploring the new world of data-driven tools: ways to better enable reader discovery (here and here also), to identify emerging areas of scholarship, and to customize content to meet reader needs. Each piece makes a compelling argument for taking advantage of digital technologies for the benefit of users. But each suggestion raises as many questions as it answers. As we explore these services, we must ask basic questions about their utility, their trustworthiness, and the impact they will have on the creative process.
Discovery versus filtering: are we barking up the wrong tree?
I’m not sure whether this is just a semantic argument and different people simply mean different things by “discovery”, but frankly, finding stuff to read is not a major problem for most researchers. We live in an age of abundance with powerful search tools at our fingertips. The days of scouring the latest print edition of Current Contents and heading over to the library with a copy card are, thankfully, long over. But those stacks of unread papers (whether living as printouts on a researcher’s desk or as PDFs on a hard drive) have grown enormously, and given the increasing workload foisted upon researchers, this overload has become an unconquerable challenge.
Every time I’ve discussed recommendation systems with researchers (“…more articles like this…” or “…readers who read this also read this…”), the most common response is, “no thanks, I already have enough to read.” I don’t know of any researcher who wants more discovery, who wants an even bigger pile through which to wade.
What they want is not discovery, but filtering: find a way for me to reduce my pile of papers to read; help me know which papers to prioritize. Most researchers are already experts at collecting information about what’s going on in their fields. They don’t need help with that. What they need is help processing the overwhelming amount of information that is available.
This is one reason why journal brands persist, and are perhaps more important now than they have ever been. Forget about the Impact Factor for a moment: if you know your field, you have a very clear sense of which journals are most relevant, and of the relative quality and importance of the research they publish. It’s an imperfect system, but one that helps readers prioritize. Journal X only peripherally touches on my research and publishes a lot of low-level, incremental work. Journal Y is the key venue for research in my sub-specialty, and it has very high standards. So papers from Journal Y go to the top of the stack, and papers from Journal X go to the bottom.
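To make the point concrete, here is a toy sketch in Python of the implicit filter that journal brands already provide. The journal names and weights are made up for illustration; this is not a real recommendation system, just the reader’s own judgment written out as a sorting rule:

# Purely illustrative: rank an unread pile by the reader's own sense of
# which journals matter. Journal names and weights are hypothetical.
unread_pile = [
    {"title": "Incremental result on a familiar topic", "journal": "Journal X"},
    {"title": "Key advance in my sub-specialty", "journal": "Journal Y"},
    {"title": "Tangentially related review", "journal": "Journal Z"},
]

# The reader's own weighting of relevance and quality
journal_weight = {"Journal Y": 1.0, "Journal X": 0.2}

def priority(paper):
    # Journals the reader doesn't recognize default to a low weight
    return journal_weight.get(paper["journal"], 0.1)

for paper in sorted(unread_pile, key=priority, reverse=True):
    print(f"{paper['journal']}: {paper['title']}")

The point is not that anyone literally writes this out; it is that a reader’s own weighting of journals already does the prioritization work that discovery tools promise to automate.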
Can we trust these services?
If you’re like most people, when you want to learn about a new subject, you hop into your web browser and do a Google search. You assume that Google will give you search results that are trustworthy and that best reflect the nature of the question you’re asking. But given Google’s secrecy around their search algorithms, can you trust those results?
The Wall Street Journal made a Freedom of Information Act request to the Federal Trade Commission (FTC) to see the FTC’s staff report that recommended filing an antitrust lawsuit against Google. The FTC inadvertently sent the newspaper an unredacted version of the report, and the revelations are startling.
First, the report showed that Google illegally took content from competitors such as Yelp, TripAdvisor and Amazon and used it to improve the content of its own services. When competitors asked Google to stop doing this, Google threatened to delist them from search results.
That abuse of power is scary enough on its own, but what’s really relevant here is evidence of how Google cooked the books to favor its own sites:
In a lengthy investigation, staffers in the FTC’s bureau of competition found evidence that Google boosted its own services for shopping, travel and local businesses by altering its ranking criteria and “scraping” content from other sites. It also deliberately demoted rivals.
We are in the midst of an era of market consolidation. The large commercial publishers are gobbling up any interesting startup, many of which are the very companies that we’re turning to for help with content discovery and filtering.
The question then must be asked: do you trust commercially driven companies to play fair? How long did Google’s “don’t be evil” pledge last after its IPO? Would it surprise you in any way if the recommendations coming out of a service owned by Publisher X favored articles in journals from that same publisher? Can these tools only be trustworthy if they are run by neutral third parties freed from profit motives (think CrossRef, ORCID, etc.)?
There are also important questions to ask about whether algorithms can truly determine trustworthy information. One really interesting recent development is seeing Google move away from automated search results and toward good old-fashioned editorial oversight. For medical information, Google is essentially admitting that popular, well-linked information is not the same thing as accurate information. It proposes to use a panel of experts to curate information rather than relying on crowdsourcing and data collection. So does that mean automated recommendation systems will eventually come around to employing editorial boards and peer review for their suggestions?
What does “spoonfed” information do to the creative process?
While Roger Schonfeld recently wrote about the idea of building “serendipity” into automated discovery tools, I remain unconvinced that intellectual leaps can be pre-programmed at the press of a button, or that the information-assimilation process can be well served by broad, generic tools.
For creative endeavors, whether choreographing a new dance or making a scientific breakthrough, we rely on the vision of the individual. Joe Esposito suggests that books could be improved by paying attention to reader data and tailoring content based on their usage patterns. Personally, I find this concept somewhat horrifying.
If you rely on user data and focus groups, and your goal is to appeal to the broadest section of the bell curve, then you are much more likely to end up with Two and a Half Men than with Breaking Bad. By many measures, Two and a Half Men could be seen as one of the most successful creative endeavors ever. But when we look back on the current “golden age” of television, I suspect it is unlikely to enter the conversation.
Further, the main reason I avoid Google these days is not so much concern about privacy as a desire to stay out of the “filter bubble”. Google’s algorithms are trained over time to give you answers that resemble the things you have clicked on in the past. Google’s goal is to anticipate what you want to know.
That may make sense for something like shopping (David prefers skinny ties and metallic-colored combat boots, so let’s show him more of those), but it’s harmful when you’re trying to break new ground or learn something new. I don’t want my previous behaviors reinforced; I want my beliefs challenged. I don’t want to see research that’s like what I’ve already read or what I’ve already done; my job is to make a breakthrough into something unknown.
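To see why this narrows rather than broadens what you encounter, here is a toy sketch of click-history personalization. This is illustrative Python, not Google’s actual algorithm, and the shopping history and result topics are invented:

# Toy illustration: results that overlap with past clicks get boosted,
# so anything unfamiliar always sinks to the bottom of the list.
past_clicks = {"skinny ties", "combat boots"}  # hypothetical click history

results = [
    ("skinny ties, new arrivals", {"skinny ties"}),
    ("metallic combat boots", {"combat boots"}),
    ("a topic you have never looked at before", set()),
]

def personalized_score(result):
    title, topics = result
    # More overlap with past behavior means a higher rank
    return len(topics & past_clicks)

for title, _ in sorted(results, key=personalized_score, reverse=True):
    print(title)

Whatever does not resemble past behavior ends up last, which is exactly the opposite of what a researcher hunting for the unknown needs.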
There is a homogenization of culture that comes from feeding everyone through the same algorithm. If everyone working on Hirschsprung’s disease is fed the same discovery cues pointing to the same papers, that limits the range of approaches taken and potentially slows research progress. The job of a researcher is to make new connections. We know that there is tremendous power in interdisciplinary work. Researchers who are not deeply invested in a field’s dogma are often able to bring in new viewpoints that would not occur to someone thoroughly enmeshed in that field.
If we leave researchers to find their own roads, does that increase the number of roads taken, and does that make all the difference?