Article Summaries that "Spoil" the Paper to Save Reader Time

Like many other people, I often find myself trying to figure out which articles are likely to be most relevant or important for a project I am working on. I use well-established heuristics such as scanning the article title, author(s), journal name, abstract, and keywords. I also greatly appreciate various services that “push” documents to me, using algorithms that predict my future interests from my past reading and publication history. None of these approaches is perfect, but I benefit from all of them, so my interest was piqued immediately when I heard about Paper Digest.

I was already following Paper Digest on Twitter, so I made sure to attend the Previews Session at the 2019 Annual Meeting of the Society for Scholarly Publishing when I saw the founders were presenting. After they won the “People’s Choice” award at the session, I asked if they would do an interview with me for The Scholarly Kitchen, as I thought others might find this technology as promising and intriguing as I do, particularly if they are (like me) often frustrated that abstracts are not as useful as they should be.

As a fan of Ranganathan’s five laws of library science, I am particularly drawn to “Law 4: Save the time of the reader.” Personally, I hope that someday the “push” alerts I get will be automatically enhanced by the kind of summary that Paper Digest envisions. That would be a great time-saver!

My thanks to Yasutomo Takano, co-founder of Paper Digest, for answering my questions, and to his colleagues Cristian Mejia and Nobuko Miyairi for their contributions as well.

What is Paper Digest and why did you create it?

Paper Digest is an AI-based academic article summarization service. As researchers ourselves, we thought the world could use a way to quickly grasp the core ideas of a paper without reading the whole thing. Simply put, there is too little time to read all the papers we want to. One study found that it takes an average US faculty member 32 minutes to read a paper. Another study reports that, when non-native English speakers are included, it takes nearly an hour per article, and even longer for early-career researchers. If we can free up time spent reading, how much more productive could our research be? Drawing on our experience in machine learning and bibliometrics, we decided to take up this challenge.

Can you give us an example (or more than one!) of how Paper Digest can assist a researcher?

We all want to know whether the paper at hand is worth reading, so we usually skim through it before reading more carefully. Paper Digest tries to imitate this behavior by automatically summarizing what the paper is about and what you can learn from it. You might think “that’s what the abstract is for,” but we think of the abstract as a movie trailer, with Paper Digest offering a “spoiler.” Our algorithm tries to identify the most important sentences from across the full text and list them in a one-page summary. The goal is to surface the most central concepts in the paper so you can quickly decide whether to read the whole thing.

If someone wants to try Paper Digest, how do they get started?

We offer a simple web interface where you can enter a DOI or the URL of the full-text PDF. Paper Digest then automatically lists the key sentences of the paper, which takes about 10 seconds. For this to work, the requested paper must be open access. Everything runs in the web browser, so you don’t have to install anything.

Registered users can also upload a PDF article to generate a summary. Once registered, you can keep up to 20 recent digests on your dashboard and can “like” the sentences you find most helpful for understanding the paper. This reader input is also used to improve our algorithm.

What kinds of documents is Paper Digest most successful analyzing?

Articles with clear section headers, such as Introduction, Results, or Conclusion, work best, for obvious reasons. So-called “original articles” in STM journals usually have such a structure, even if the section headers differ from one field to another. When the key concepts of a paper are expressed in non-text form, such as mathematical equations, our algorithm may fail because it currently works only on text. Some document types (review articles, editorials, etc.) and articles from domains that do not follow the standard STM article structure can be challenging, but we aim to find ways to handle them.

What is the future development path for Paper Digest?

First and foremost, we want to improve our current algorithm, and for that we need large amounts of data to train our machine learning models. We recently released a new feature that gathers user feedback by letting users “like” an extracted sentence. We also want to hear from our users to understand in which domains Paper Digest needs to improve. We are also developing an API so that Paper Digest can be “called upon” rather than requiring users to come to our website. Publishers, database providers, peer-review platforms, and the like will be able to embed Paper Digest through this API.

The current algorithm uses extraction-based summarization; that is, it extracts sentences verbatim from the full text. We know this approach works well for researchers, who want to see the exact sentences as they appear in the full text; however, a reader with no research background may struggle to understand extracted sentences without their context. Abstraction-based summaries, which rephrase the content in newly generated sentences, would be better suited to a general audience or to research promotion, and we aim to develop another algorithm to meet these needs in the future.
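The interview does not describe how Paper Digest actually selects sentences, so the following is only a minimal sketch of the general extraction-based idea, assuming a simple word-frequency heuristic: each sentence is scored by how often its content words occur across the whole text, and the top k sentences are returned verbatim in their original order. The stopword list, the function names, and the scoring rule are all illustrative assumptions, not Paper Digest's algorithm.

```python
import re
from collections import Counter

# Illustrative sketch only; NOT Paper Digest's algorithm.
# Tiny hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
    "is", "are", "was", "were", "be", "we", "our", "this", "that", "it",
    "as", "by", "at", "from", "can", "which",
}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 3) -> list[str]:
    """Return the k highest-scoring sentences, verbatim, in document order."""
    sentences = split_sentences(text)
    # Count content-word frequencies across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average normalized frequency of the sentence's content words.
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    # Stable sort: ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in chosen]
```

Returning the chosen sentences in document order rather than by score keeps the summary readable as a sequence, and because nothing is rewritten, every output sentence appears word for word in the source, which is the property the interview says researchers value.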

Who are the people on your team and what are their backgrounds?

I (Yasutomo) am a postdoc at the University of Tokyo. My co-founder, Cristian, is also a postdoc, at the Tokyo Institute of Technology. We are constantly challenged by the volume of research papers, especially when expanding into new research areas. Leveraging our specialties, we used citation network analysis to effectively reduce our reading pile, but going through the full text remained the biggest pain, since English is not our native language. We often had to turn to textbooks to gain basic domain knowledge before reading the full text, only to find this less helpful than expected. Paper Digest therefore originates from our own experiences as early-career researchers who are not native English speakers. In early May 2018, we started working on this project, with Yasutomo conceptualizing the algorithm and Cristian building the web application. Nobuko, an open science enthusiast with more than 15 years of experience in the STM industry, has been advising us on product positioning and business strategy. Without her, we would not have applied for the Catalyst Grant and received support from Digital Science. She also introduced us to the opportunity at the SSP Annual Meeting, where we won the People’s Choice Award. The three of us have complementary skills, and it has been a great collaboration so far.

Is there anything else you’d like to share about Paper Digest?

We are looking for publishers and STM solution vendors to join our pilot tests to trial some new features on their own platforms or conduct user studies. If you are interested, please drop us a line at info@paper-digest.com.

Lisa Janicke Hinchliffe

@lisalibrarian

Lisa Janicke Hinchliffe is Professor/Coordinator for Research Professional Development in the University Library and affiliate faculty in the School of Information Sciences, European Union Center, and Center for Global Studies at the University of Illinois at Urbana-Champaign. lisahinchliffe.com

Discussion

11 Thoughts on "Article Summaries that “Spoil” the Paper to Save Reader Time"

I tried the beta version. In two out of three cases, the algorithm did not recognise the DOI. It worked with PLOS ONE and gave me a summary of about half a page, based on the conclusions. I will try the final version when it is available. For non-English users, this may be fine.

By LB
Aug 7, 2019, 8:06 AM

Thank you very much for trying out Paper Digest. Please refer to the following link (https://www.paper-digest.com/faq#faq4) to see which publishers/journals work with Paper Digest. If the DOI doesn’t work, please try the URL of the full-text PDF instead. More reasons here (https://www.paper-digest.com/faq#faq5).

By Yasutomo Takano (co-founder of Paper Digest)
Aug 19, 2019, 12:35 AM

Interesting to have a librarian report on this service, since our own scholarly/professional literature is notorious for having abstracts that do NOT summarize the conclusion of the research but instead seem to be written to “tease” the reader. It’s been a peeve of mine since entering the profession, and your post title’s use of the word “spoil” may explain why. It’s as if librarians think that giving away the ending somehow allows readers to “cheat”. That’s absolutely antithetical to the spirit of scholarly communication and, as you point out, R’s law about saving the reader time. I find the latest two generations of librarians actually haven’t heard of R and his “laws”.

By Melissa Belvadi
Aug 7, 2019, 8:42 AM

Amen. Abstracts should reveal the main finding and other key details not in the title or the Introduction; e.g., a reader should not have to find out in the Method section that the participants all live on some relatively obscure island where behavior might be different.

By DF
Aug 7, 2019, 4:33 PM

I’m happy to see Paper Digest resonating with the 4th law about saving the reader time, as I’m a librarian by training 🙂 The very fact that this service was proposed by two young researchers prompted me to help, as I think there is a lot for librarians, and perhaps publishers too, to think about in how we can help researchers navigate the sea of ever-proliferating literature. That has been my motivation for supporting this endeavor so far. Many thanks for your comment!

By Nobuko Miyairi
Aug 8, 2019, 5:32 AM

It seems to me that what is needed is a CAS for the masses!

By harvey kane
Aug 7, 2019, 10:56 AM

Thanks for your comment. I would like to know more about what you mean by ‘CAS for the masses’.

By Yasutomo Takano (co-founder of Paper Digest)
Aug 8, 2019, 5:17 AM

Providing a summary is definitely what the abstract is for! How often are readers so excited by a spoiler-type abstract that they are then desperate to devote the quoted half hour or so of their time to read the whole paper? Not often, I suspect. Far better to give the top line results in the abstract so you can be sure that whether people read any further or not, at least they will get the key message.

By Beverley Moore
Aug 8, 2019, 6:29 AM

Paper Digest is not the only actor in this field.

SciencePOD (https://www.sciencepod.net) is also developing automated summaries of research papers that are placed in context (keywords, definitions of technical terms, etc.).

What matters in this field is the quality of the algorithms. This is an area where we have received positive feedback from scholarly publishers on the quality of our summaries.

You may check samples of our work in this webinar: https://youtu.be/UXpVuoHVUbM

Feel free to get in touch by contacting us at editor@sciencepod.net if you want to know more.

By Sabine Louet
Aug 14, 2019, 8:05 AM

A few things that are important when automatically summarising research papers, particularly for non-experts – and these are quite challenging to solve in the general case:

– providing context and definitions for technical terms, as mentioned in a previous comment
– expansion of abbreviations in context: does AI refer to aortic insufficiency or artificial intelligence?
– coreference resolution: if you pluck a sentence from the paper, it may refer back to entities in the previous sentence. Displaying a sentence that begins with ‘As a result of this feature …’ is not helpful without knowing what ‘this feature’ refers to.
– dealing with atypical document structures – not all papers have the standard Introduction, Methods, Results, Conclusion structure
– resolving citations: if you generate a summary sentence containing a reference to [3, 4] or (Smith 2017) then it’s helpful to link out to those citations (or even provide the main findings of them).

Some of the latest developments in language modelling, transfer learning, and sequence modelling can be brought to bear to help solve some of these challenges.

One thing we’ve found with evaluations we’ve done with journal editors and publishers is that although you can calculate standardised scores for summarisation quality (e.g. BLEU or ROUGE metrics), readers’ opinions on the utility of a given summary will vary widely. So some element of customisation or personalisation is also often necessary.
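ROUGE, mentioned above, scores a generated summary against a human-written reference by counting shared n-grams. As an illustration of what such a score measures, here is a simplified ROUGE-N sketch, assuming lowercase whitespace tokenization and a single reference summary; established implementations add details such as stemming and support for multiple references, so numbers from this sketch will not match theirs exactly.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Simplified ROUGE-N: clipped n-gram overlap with a single reference.

    Assumption: plain lowercase whitespace tokenization, no stemming or
    stopword handling, unlike fuller ROUGE implementations.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage: compare a candidate summary with a reference summary.
scores = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
print(scores["recall"])
```

Because the score only counts overlapping words, two summaries with similar ROUGE values can still differ in how useful readers find them, which is consistent with the wide variation in reader opinion described above.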