Many information professionals are concerned about the loss of serendipitous discovery in research pursuits (see this 2015 Kitchen post by Roger Schonfeld). Depending upon what an individual user knows about a topic when framing a search, our sophisticated systems may either direct the person’s thinking into too narrow a groove — precluding discovery of more loosely relevant items — or inundate the user with too many content possibilities. Online information resources are tightly engineered and heavily dependent upon well-structured metadata. What’s needed may be a different approach — one that allows the user more latitude in thinking out the scope of the question before becoming too precise.
Yewno is a semantic-analysis engine that was formally launched at ALA this year, although its creators offered some low-key presentations earlier in 2016 at meetings held by SSP and AAUP. The Yewno technology is run across full-text content, with the system creating a matrix of semantic entities found in each document. Yewno uses a mix of computational semantics, graph theory, and machine learning to retrieve relevant documents without reliance on restrictive conventions imposed by external technology or data format requirements. According to Michael Keller of Stanford, this means that Yewno enables searching of ideas rather than specific expressions, such as keywords. The technology is currently in beta-testing and/or trials at eight institutions: Harvard, Stanford, MIT, the University of Michigan, University of California–Berkeley, Stonehill College, Oxford University, and the Bavarian State Library.
There are two panels in the highly visual interface. On the left is the graph or concept map, while the panel on the right is referred to as the context bar. Run a search and the system presents the graph showing orange and blue nodes. The orange nodes represent the central concepts with which the user is concerned (perhaps the proper name of an individual or an umbrella phrase representing a school of thought or a body of knowledge). That node will be centrally placed within the concept map. Surrounding it will be blue nodes: circles representing related concepts, connected by lines to the concept contained in the orange node. Double-clicking on the blue nodes adds those concepts to the user’s map and enables discovery of further correlations.
Clicking on any concept node brings up a description of the concept in the right-hand context bar. The user’s clicks move through an iterative process that allows the original point of inquiry to be either broadly expanded or narrowly refined, and ultimately the click process yields relevant content.
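The click-to-expand interaction described above can be thought of as incremental graph expansion. Here is a minimal sketch of that interaction model; the class, the `RELATED` table, and all names are my own invention for illustration — Yewno’s actual internals and API are not public:

```python
# Illustrative model of a click-to-expand concept map.
# RELATED stands in for a semantic-relatedness index; the
# entries are invented, not Yewno output.
RELATED = {
    "Jane Austen": ["Church of England", "Georgian era", "Novel"],
    "Church of England": ["Anglicanism", "Clergy"],
}

class ConceptMap:
    """A user's growing map: one focal concept plus expanded neighbors."""

    def __init__(self, focus):
        self.focus = focus          # the central "orange" node
        self.visible = {focus}      # nodes currently drawn on screen
        self.edges = set()          # (concept, related concept) pairs
        self._show_neighbors(focus)

    def _show_neighbors(self, concept):
        # Surround a node with its related "blue" nodes.
        for other in RELATED.get(concept, []):
            self.visible.add(other)
            self.edges.add((concept, other))

    def expand(self, concept):
        """Double-click: pull a visible node's own relations into the map."""
        if concept in self.visible:
            self._show_neighbors(concept)

m = ConceptMap("Jane Austen")
m.expand("Church of England")   # now "Clergy" and "Anglicanism" appear
```

Each `expand` call mirrors a double-click: the map grows outward one hop at a time, which is what lets the user broaden or refine the original point of inquiry iteratively.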
Sitting in the Yewno booth at ALA, I input one of my standard test queries, one having to do with Jane Austen’s depiction of the clergy in the Georgian Church of England; I first developed it when Google and Microsoft began introducing search tools specifically for academic literature. For the record, the test bed of content used for demonstrations at the conference included content from Wikipedia, Springer Nature, and Taylor & Francis, a respectable disciplinary mix. (It’s worth noting that the benefits to publishers of allowing their content to be crawled in this fashion include not only enhanced discoverability of their content, but also access to useful referral data and metrics.)
Yewno recognized two concepts from my query, “Jane Austen” and “Church,” but did not directly connect the two as I might have expected (Austen was the daughter of an Anglican clergyman). However, Connor Shepherd, Product Lead for Yewno, explained that this might have been due to one of two factors: the specificity of my initial six-word query, or the scope of the content collection over which Yewno was running. The collection might not have been sufficient to allow Yewno to capture the relationship, weight its relevance, and build the related connection.
This is not to suggest that Yewno was unable to retrieve appropriate content for me, however. Clicking about a bit brought up more information in the context bar, specifically a scholarly review of Juliette Wells’s Everybody’s Jane: Austen in the Popular Imagination (Bloomsbury Academic, 2012). The system showed me a “snippet” of that book review with a highlighted passage. Readers might shrug and note that at this point any content platform worth its salt can do that, but there’s a nuance here. The displayed highlight represents what the Yewno algorithms have identified and extracted as an important thesis statement from within the document. The output is drawn from the system’s semantic analysis, and one’s original concept phrases need not appear in the highlighted portion of the snippet. See another demonstration here.
The user can scroll down the right-hand context bar to discover the scope of material that may be immediately available in the local library. (Like all discovery systems, Yewno is not itself a host platform, but ingests content and constructs its matrices based on what the particular institution’s licensed access allows.)
The approach here is to de-emphasize search based on pre-assigned metadata and thereby minimize system influence over the direction adopted by the researcher; instead, Ruggero Gramatica and Ruth Pickering, the founders behind this nascent start-up, hope that their technology will assist students in visualizing the shape and scope of a research topic and aid in the subsequent work of refining the focus of the investigation. Content retrieval might almost be seen as a secondary aspect of this discovery tool — a needed outcome, but one kept in proper proportion.
The two librarians talking about Yewno during the ALA launch session noted that the product is particularly valued as a mechanism for supporting the development of critical-thinking skills. Cheryl McGrath, Director of the MacPháidín Library at Stonehill College in Easton, MA, and Jason Price, Director of Licensing Operations, Southern California Electronic Library Consortium (SCELC), both indicated that in early trials the system is being enthusiastically embraced by undergraduate and graduate students and some faculty advisors.
From my perspective, the central idea behind Yewno is a good one. The intent to enable more fluid forms of discovery for undergraduates need not eliminate professional-grade indexing services or discovery tools, such as those provided by the Modern Language Association, CAS (Chemical Abstracts Service), OCLC, or any commercial player one might name. Yewno represents an alternate approach that will be suitable for some — but by no means all — user needs. I recommend that followers of the Scholarly Kitchen take the opportunity to explore the possibilities of the system when a publicly accessible demonstration site opens; according to the discussion at ALA, such a site is due before summer’s end.
19 Thoughts on "Have You Looked At This? Yewno"
Document retrieval need not be the goal at all. Simply seeing the concept structure of a specific body of knowledge can be quite useful, even fundamental. In this context it would be a bad thing if the Yewno analysis were limited to just the locally licensed content. There should also be a “universal” mode.
One can do something similar, albeit without the cool analytical tools, simply by using the Related Articles button on Google Scholar. Starting with a given article rather than keywords, GS applies “more like this” full-text semantic analysis and returns about 100 closely related articles. Scanning the language in these related snippets quickly gives one a concept overview of the local knowledge domain.
One can then choose a related article and repeat the process to crawl slowly away from the starting point in a chosen direction. I have an algorithm that even measures concept distances this way.
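[Ed. note: the hop-count idea in the comment above can be sketched as a breadth-first search over a “related articles” function. The `RELATED` table and all names below are invented for illustration; this is not Wojick’s actual algorithm, merely one plausible reading of it:]

```python
from collections import deque

# Toy "related articles" index standing in for Google Scholar's
# Related Articles button; the data is purely illustrative.
RELATED = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

def concept_distance(start, target):
    """Minimum number of related-article hops from start to target (BFS)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        article, hops = queue.popleft()
        if article == target:
            return hops
        for nxt in RELATED.get(article, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # target not reachable in this index

print(concept_distance("A", "E"))  # 3 hops: A -> B -> D -> E
```

Repeatedly following related articles in one direction is exactly a walk along such a graph, and the hop count gives a crude but usable measure of conceptual distance.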
Thanks for this write-up. I had intended to go to the demo at ALA, but then Orlando and the distances between meeting rooms/hotels prevented it. I’m curious about one thing you write: “Like all discovery systems, Yewno is not itself a host platform, but ingests content and constructs its matrices based on what the particular institution’s licensed access allows.” I’m puzzled that the matrices are dependent on an institution’s subscribed content. E.g., will the matrices for U of IL Urbana be different from those for IL State U? Did the presenters offer an explanation of why? I understand why access to the full-text documents is bound to a subscription, but why the conceptual analysis isn’t “web-scale” isn’t clear to me. Thanks!
Lisa, that’s the understanding I took away from the demonstration I received in the Yewno booth. I really don’t have an adequate response for you, beyond that. I’m hopeful that someone from Yewno will see this post and weigh in on your question.
Hello and thank you for the question! Yes, you are right: users will all see the same graphs and document snippets, but access to the full text depends on the institution’s licensed content.
Thanks! I am already anticipating my first query about how to cite a snippet when one does not have access to the full-text. I hope they are addressable in some way.
The snippet is part of the full text so you can merely cite the document to cite the snippet. If you wanted to you could add a note to the citation saying you only saw the snippet.
This makes my point, David: to indicate that you saw the snippet (sort of analogous to “Hinchliffe as cited in Wojick”), the snippet itself needs to be addressable.
I am afraid you have lost me, Lisa. The citation is normally to the whole document so there is no occasion to indicate that the part of the document one has seen is a search snippet. Likewise for the title, the abstract, or even a review of the document.
Well, let me take one more try, though it is possible that you aren’t as familiar with the myriad of citation manuals out there as those of us who work at library reference desks, so this may still not make sense. 🙂
You are correct that the citation is typically to a whole document. But not always. To continue with what I was trying to indicate above: suppose you wrote an article and included a quote from one of my works, and someone who read your article wanted to quote my work but couldn’t get access to it. That person would cite it as “Hinchliffe as cited in Wojick,” not cite my work directly, as they hadn’t actually seen it.
Lisa, I do not see what “Hinchliffe as cited in Wojick” has to do with search snippets, which are part of the original work. Are you thinking of citing a Google Scholar snippet of an article as “cited by GS” rather than simply citing the original article that the snippet is taken from? I see no reason to do this. A snippet is not a secondary source; it is a primary source.
I readily confess that I am not familiar with the apparent myriad of citation manuals, in fact I have never seen one, but I have studied actual logic of citation. Perhaps the key point is that citation does not require access. In scientific journal articles most citation occurs up front, where the research issue is being introduced. This citation effort is basically historical in nature and one does not have to have actually read the cited documents; one merely has to know their importance to the history of the topic. Thus a search snippet may give one enough information to justify citing the snippeted document.
This may be true in scientific fields: “This citation effort is basically historical in nature and one does not have to have actually read the cited documents; one merely has to know their importance to the history of the topic.” But I don’t think it is true of all fields.
So, perhaps it is that the snippet is sufficient in some cases. I am confident that it is not sufficient in all cases to see only a snippet and cite the entire document. So, for the sake of the students I’ll be assisting who are going to need to be clear that they are citing the snippet that they read and not the entire document that they did not, I remain hopeful that the snippets are citable.
If my “as cited in” analogy distracted, I apologize.
As Watson’s children mature, we are getting an increasing number of semantic search engines, many free with fees for upgrades. Meta.com is one in the STEM/STM area, as discussed here previously.
Yewno reminds one of the search trees that grew out of US government work with Dialog and other systems. Watson’s performance on Jeopardy shows that semantic search, and even search that can interpret images, already exists.
It might be interesting and important to see what exists, is about to emerge and what is in the wings.
Jill, is the ALA endorsing this particular product to member libraries, or just facilitating its product development and marketing to potential library customers? There are many other semantic indexing and search products on the market, as you know.
Phil, I didn’t mean to suggest in any way that the ALA was formally endorsing the product; that is certainly not the case. My piece was simply noting that I visited the Yewno booth in the ALA exhibit hall and that there was a session on the program about it. You should not assume any formal relationship other than that between the two entities.
This is very similar to the work being done on Semantic Medline by Marcelo Fiszman, M.D., Ph.D., for the National Library of Medicine and the National Institutes of Health. I attended the Woods Hole Biomedical Informatics immersion school in 2011 and he presented the topic then.
The problem is that there are a lot of different ways of doing this sort of thing and no easy way to specify the differences, or what difference these differences make.
For that matter, Watson, which Tabeles mentions above, uses a complex combination of at least a dozen different semantic algorithms. Bit of a kludge actually, one that works well in some cases but not others.