When a light is blocked by an object, it casts a shadow. The dark part is called the umbra, but I’ve always been intrigued by the penumbra, the adjacent area where only some of the light is blocked while some still penetrates. I think of the penumbra as the most interesting part of the scholarly discovery process. A researcher’s core interests may be in a specific set of areas, but effective discovery also helps that researcher to stay aware of adjacent areas of interest or even potential areas of unknown interest.
We sometimes treat these important aspects of discovery somewhat dismissively, calling them “serendipity.” There is a long and fascinating history to the term itself, but it is currently used to emphasize the chance associated with a happy outcome. In a recent piece, Patrick Carr has noted that “serendipity is problematic because it is an indicator of a potential misalignment between user intention and process outcome.”
Many of the systems that researchers have relied upon for serendipity have become bent if not broken in recent decades. Perhaps the most vital system for serendipity has been the journal title, which bundled together a variety of articles in a general topic area. Browsing through the newest issue would help scholars to maintain current awareness in a field that mattered to them, even if many of the articles were not ones they would have intentionally sought out. In the transition to online journals, content platforms enabled a researcher to subscribe to an email containing the table of contents of each new issue of every journal of interest. It has been widely reported that these table of contents alerts are failing scholars, who feel overwhelmed by the number of them they might wish to retrieve. The system is insufficiently granular, and I would argue insufficiently personalized, to meet their needs. For humanists, stack browsing is also finding its limits, as so many tangible collections are increasingly fragmented and poorly integrated with digital materials. In both cases, suitable substitutes are sorely needed.
With today’s discovery environment being mindfully designed, we should build systems that intentionally bring forward materials that would benefit scholars and not just be grateful for the happy accidents, especially if they are in decline. The question is how best to do so.
When I recently argued that data can and should be used to personalize discovery and bring greater efficiency to research, David Crotty raised some very important questions about whether these systems would yield a decline in serendipity. We fear what one commentator has called a filter bubble, in which almost nothing that we aren’t looking for comes our way. One sees this all too frequently in simple follower-driven systems, like Twitter, where too many intelligent people only follow individuals likely to share their own opinions, rather than experiencing the full diversity of smart views available on almost any subject. Scholars need discovery systems that allow them to encounter not only the resources they seek intentionally, but also those unsought resources that would benefit their research.
Designing systems that avoid such a filter bubble, so that researchers discover more than just the items they are seeking in their core areas of interest, is essential to scholarly discovery. Serendipity can be designed into data-rich systems just as surely as it has been designed into earlier information retrieval mechanisms.
Take the case of an academic chemist focused on polymers. This individual wants to keep up with virtually everything in her specific subfield, exhaustively and as close to real time as possible. But at the same time, she also might wish to maintain an overall awareness of some of the more important work in adjacent subfields, in methodologies that may be important, and broader yet across the discipline. It is in these latter areas that providing enough focus but also enough serendipity is vital.
Several years ago, Ben Showers and I proposed adding a “serendipity button” to a personalized information platform. While results in the umbra of the researcher’s interests would appear in a discovery result set as ever, we were beginning to envision mechanisms for incorporating an appropriately sized selection of especially key results from adjacent and related subfields and methodologies – the research penumbra. One approach would be to apply usage data as a mechanism to gauge the importance or prominence of an individual item, allowing for materials to be discovered from further afield only insofar as they were relatively important. Such an approach need not be linear, but could also be pursued on multiple dimensions. Ultimately, we were looking for stronger filtering than the journal issue provides while avoiding any sort of filter bubble.
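To make the idea concrete, here is a minimal sketch of one way such a filter might work. It is only an illustration, not a description of any system Ben Showers and I specified. The `Item` fields, the integer notion of "distance" from a researcher's core interests, and the `base_threshold` and `growth` parameters are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    distance: int  # hypothetical: 0 = core subfield, 1 = adjacent, 2 = wider discipline
    usage: int     # hypothetical usage count, e.g. downloads


def select_results(items, base_threshold=10, growth=5):
    """Keep every core item; keep a farther item only if its usage clears
    a bar that rises with its distance from the researcher's core interests."""
    selected = []
    for item in items:
        if item.distance == 0:
            # The umbra: everything in the core subfield is shown.
            selected.append(item)
            continue
        # The penumbra: the usage bar grows geometrically with distance
        # (an assumed rule, chosen only for illustration).
        bar = base_threshold * growth ** item.distance
        if item.usage >= bar:
            selected.append(item)
    # Core items first, then higher-usage items first within each band.
    return sorted(selected, key=lambda i: (i.distance, -i.usage))
```

With the default settings, an item in an adjacent subfield would need at least 50 uses to appear, and one from the wider discipline at least 250, while everything in the core subfield is retained. A multi-dimensional variant might replace the single `distance` value with separate measures for subfield and methodology.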
This type of approach underscores the importance of controlling, or at least having access to, data not only about researchers’ interests and practices, but also about research materials and how they are used. A few years ago, usage data were more concentrated with publishers, but due to efforts to make articles freely available online (whether via institutional repositories or article management tools like Mendeley, ReadCube, and ResearchGate), it is not clear what share of usage data any single party actually possesses today.
Of course, usage data are by no means the only option for engineering serendipity more effectively as a process outcome of discovery services. Recent efforts to provide a reintegrated virtual browsing experience may supplement, if not supplant, stack browsing. But without designing this process outcome more mindfully into discovery systems, we run the real risk of having only umbra, potentially at real loss to scholarship.