Like many other people, I often find myself trying to figure out which articles are likely to be most relevant or important for a project I am working on. I use well-established heuristics such as scanning the article title, author(s), journal name, abstract, and keywords. I also greatly appreciate various services that “push” documents to me based on algorithms that use my past reading and personal publication history to predict my future interests. None of these approaches are perfect but I benefit from them all and so my interest was piqued immediately when I heard about Paper Digest.
I was already following Paper Digest on Twitter and so made sure to attend the Previews Session at the 2019 Annual Meeting of the Society for Scholarly Publishing when I saw the founders were presenting. After they won the “People’s Choice” award at the session, I asked if they would do an interview with me for The Scholarly Kitchen as I thought others might find this technology as promising and intriguing as I do, particularly if they are — like me — often frustrated that abstracts are not as useful as it feels that they should be.
As a fan of Ranganathan’s five laws of library science, I am particularly drawn to “Law 4: Save the time of the reader.” Personally, I’m hoping that someday the “push” alerts I get might be automatically enhanced by the kind of summary that Paper Digest envisions. That would be a great time-saver!
My thanks to Yasutomo Takano, co-founder of Paper Digest, for answering my questions, and to his colleagues Cristian Mejia and Nobuko Miyairi for their contributions as well.
What is Paper Digest and why did you create it?
Paper Digest is an AI-based academic article summarization service. Being researchers ourselves, we thought the world could use a solution to quickly grasp the core ideas of a paper without reading the whole thing. Simply put, there is too little time to read all the papers we want to. One study says that it takes an average US faculty member 32 minutes to read a paper. Another study reports that, when the sample is not limited to native English speakers, it takes nearly one hour per article, or even longer for those in their early career stage. If we can free up time spent reading, how much more productive could our research be? Using our experience in machine learning and bibliometrics, we decided to take up this challenge.
Can you give us an example (or more than one!) of how Paper Digest can assist a researcher?
We all want to know if the paper at hand is worth reading, so we usually skim through the paper before reading more carefully. Paper Digest is trying to imitate this researcher behavior by automatically summarizing what the paper is about and what you can learn at the end. You might think “that’s what the abstract is for,” but we think the abstract is like a movie trailer with Paper Digest offering a “spoiler.” Our algorithm tries to determine seemingly important sentences from across the full text and list them out in a single page summary. The goal is to list the most central concepts in the paper so you can quickly decide whether to read the whole thing.
If someone wants to try Paper Digest, how do they get started?
We offer a simple web interface, where you can enter a DOI or URL of the PDF full text. Paper Digest automatically lists out the key sentences of the paper, taking about 10 seconds to do so. For this to work, the paper being requested must be open access. Everything works within the web browser and you don’t have to install anything.
We also allow users to upload a PDF article and generate a summary but this works only for registered users. When registered, you can retain up to 20 recent digests on your dashboard, and can “like” sentences that you think are most helpful to understanding the paper. This reader input is also used to improve our algorithm.
What kinds of documents is Paper Digest most successful analyzing?
Articles with clear section headers, such as introduction, results, or conclusion, work best, for obvious reasons. So-called “original articles” in STM journals usually have such a structure, even if section headers may differ from one field to another. When the key concepts of the paper are in a non-text format, such as math equations, our algorithm may fail because it currently works only on text. Some document types (review articles, editorials, etc.) or articles from domains that do not follow the standard STM article structure can be challenging, but we aim to come up with ways to handle them.
What is the future development path for Paper Digest?
First and foremost, we want to improve our current algorithm. And for that, we need lots of datasets to feed the machine learning. We recently released a new feature to gather user feedback so the user can “like” an extracted sentence. We also want to hear from our users to understand which domains Paper Digest needs to improve in. We are also developing an API so that Paper Digest can be “called upon” rather than the user having to come to our website. Publishers, database providers, peer-review platforms and the like will be able to embed Paper Digest through this API.
The current algorithm uses an extraction-based summarization technique; that is, it extracts sentences verbatim from the full text. We know this approach works well for researchers, as they want to see the exact sentences as they appear in the full text; however, a reader with no research background may struggle to understand extracted sentences with no context. Abstraction-based summaries would be ideal for a more general audience, or for research promotion purposes, and we aim to work on another algorithm to accommodate these needs in the future.
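As an editorial illustration of what “extraction-based” means (this is not Paper Digest’s algorithm, which is not described in this interview), a minimal Python sketch might score each sentence by the average document-wide frequency of its words and return the top-scoring sentences verbatim, in their original order. The function names, the tiny stopword list, and the scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "it", "that",
    "we", "for", "on", "this", "with", "as", "are", "was",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, n=2):
    """Return the n highest-scoring sentences, verbatim, in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Pick the indices of the top-n sentences, then restore original order.
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]),
                 reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Because the sentences are copied rather than rewritten, every line of the output can be found word for word in the source text. An abstraction-based summarizer would instead generate new sentences, which is a substantially harder problem.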
Who are the people on your team and what are their backgrounds?
I (Yasutomo) am a postdoc at the University of Tokyo. My co-founder, Cristian, is also a postdoc, at the Tokyo Institute of Technology. We are constantly challenged by the volume of research papers, especially when expanding into new research areas. Leveraging our specialties, we conducted citation network analysis to effectively reduce our reading pile, but the biggest pain, going through the full text, persisted, since English is not our mother tongue. We often had to turn to textbooks to gain basic domain knowledge before reading the full text, only to find it less useful than expected. As such, Paper Digest originates from our own experiences as early-career researchers who are not native English speakers. In early May 2018, we started working on this project, with Yasutomo conceptualizing the algorithm and Cristian working on the web application. Nobuko, an open science enthusiast with 15+ years of experience in the STM industry, has been advising us on product positioning and business strategy. If it were not for her, we wouldn’t have applied for the Catalyst Grant and received support from Digital Science. She also introduced us to the opportunity at the SSP annual meeting where we won the People’s Choice Award. The three of us have complementary skills and it’s been a great collaboration so far.
Is there anything else you’d like to share about Paper Digest?
We are looking for publishers and STM solution vendors to join our pilot tests to trial some new features on their own platforms or conduct user studies. If you are interested, please drop us a line at firstname.lastname@example.org.