Impactstory recently announced a new tool in development. Called “Get The Research” and aimed at serving the general public rather than an audience of scholars and specialists, it promises to provide a new level of accessibility (in multiple senses of the term) to published scholarship. I asked Impactstory cofounder Jason Priem a few questions about how this new tool will work.
Give us a quick overview of what this new tool does and how it does it.
Get The Research is a website where regular people can find, read, and understand the scholarly research on any topic. It’ll be built on the 20 million open access articles in the Unpaywall index, and feature AI-powered tools that help make the content and context of scholarly articles more clear to readers. We won’t be dumbing down or interpreting science, but we will be giving motivated readers scaffolding to help them build their understanding.
There are a lot of risks — and commensurate rewards — with a project this ambitious. We’re dealing with that by building the system in three levels, each with increasing risk and reward. The idea is that even if the riskiest parts fail, we’ll still have something useful to show for our efforts.
- A search engine on the Unpaywall database. It’ll allow anyone to search our corpus of 20M OA articles for free. It’ll also offer an open API, allowing anyone to leverage our search infrastructure for their own projects, commercial or otherwise.
- Learning tools built on top of the literature. This will include an automatic annotation layer that will flag up difficult jargon, get definitions from Wikipedia, and show them in the margins. We want to build an experience like reading Hamlet for school, where the tough words are explained for you as you read. It’ll also include ways to traverse a subject area (e.g., recommended articles, relevant lit reviews, etc.) and categorizations of articles (randomized controlled trial [RCT], non-peer-reviewed, systematic review, etc.).
- Tools that actually translate articles into plain language, much as Simple English Wikipedia does for regular Wikipedia, but done automatically (we probably wouldn’t go full Up Goer Five, but that’s another fun example). This is pretty hard, and may not work. We’ll certainly be proceeding cautiously here, given the risks of mistranslation.
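As a rough illustration of the second level (the automatic annotation layer), here is a minimal sketch in Python. The glossary, function names, and definitions below are invented for the example; the real system would pull definitions from Wikipedia rather than a hardcoded dictionary.

```python
# Sketch of the jargon-annotation idea: scan article text for glossary
# terms and collect margin notes. Glossary is hardcoded for illustration;
# a real implementation would fetch definitions from Wikipedia.
import re

GLOSSARY = {
    "p-value": "The probability of results at least this extreme if the null hypothesis were true.",
    "randomized controlled trial": "A study that assigns participants to treatment or control groups at random.",
}

def annotate(text):
    """Return (term, definition) notes for glossary terms found in text."""
    notes = []
    for term, definition in GLOSSARY.items():
        if re.search(re.escape(term), text, re.IGNORECASE):
            notes.append((term, definition))
    return notes

notes = annotate("We ran a randomized controlled trial and report the p-value.")
```

In a reading interface, each note would be rendered in the margin next to the first occurrence of its term, much like the annotated-Hamlet experience described above.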
How does Get the Research interact with Unpaywall? Does the former replace the latter, or are they complementary tools?
Unpaywall is an open database of 20M free scholarly articles, harvested from over 50,000 publishers and repositories.
Get The Research will be a free search engine that helps people find and understand papers in the Unpaywall database.
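For developers, Unpaywall already exposes a simple REST API: a GET request to its v2 DOI-lookup endpoint, with an `email` parameter identifying the caller. The sketch below builds such a request URL and extracts the best open-access PDF link from a response record; the DOI and sample response here are abbreviated, illustrative stand-ins, not real data.

```python
# Build an Unpaywall v2 lookup URL and read the best OA PDF link from a
# response record. The sample record below is illustrative only.
from urllib.parse import quote

def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL for a DOI."""
    return f"https://api.unpaywall.org/v2/{quote(doi)}?email={email}"

def best_oa_pdf(record):
    """Return the best open-access PDF URL from an Unpaywall record, if any."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")

url = unpaywall_url("10.1234/example.doi", "you@example.com")

sample = {  # abbreviated, made-up example of an Unpaywall response
    "doi": "10.1234/example.doi",
    "is_oa": True,
    "best_oa_location": {"url_for_pdf": "https://example.org/paper.pdf"},
}
pdf = best_oa_pdf(sample)
```

Get The Research’s planned search API would sit on top of this same corpus, adding full-text search rather than per-DOI lookup.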
You’ve said that one important aspect of Get the Research is that it provides not only access to the content, but also to the context of the research literature. Tell us more about that.
By “context,” we mean the subject matter knowledge that makes an article understandable for a specialist, but hard or impossible to read for everyone else. That includes everything from field-specific jargon, to strategies for skimming to key findings, to knowledge of core concepts like p-values. Specialists have access to that kind of context. Non-specialists don’t. This makes reading the scholarly literature like reading Shakespeare without notes: you get glimmers of beauty, but without some help it’s mostly just frustrating — unless you’ve put in the years of study needed to grok the context.
How (if at all) does Get the Research interact with sites like ResearchGate and Sci-Hub?
Not at all. We don’t use Sci-Hub because it’s illegal (more on our thoughts about Sci-Hub here). We don’t use ResearchGate (RG) because, unlike credible institutional and disciplinary repositories:
- RG works hard to keep people from indexing or harvesting its “open” content.
- Half of RG content is posted illegally.
- RG makes no effort to ensure persistence of author-uploaded content.
One of your stated goals with this product is to provide access to “trustworthy research about anything.” What are your criteria of trustworthiness, and how do you filter out the stuff that doesn’t meet those criteria?
Someday, it would be cool to try and quantify trustworthiness automatically, using information about authors, the citation graph, clinical guidelines, and so on…there are certainly lots of interesting efforts in automated credibility detection right now, much of it directed at spotlighting fake news.
However, our approach for now is much more modest. Simply by limiting search to peer-reviewed literature, we improve the trustworthiness of content quite a lot, compared to many other free-to-read sources that readers encounter online. We’ll also be automatically flagging up articles with characteristics likely to make them extra useful (for instance systematic reviews, meta-studies, large-sample RCTs, and so on). We’ll be hiding retracted papers. And importantly, we’ll be working to educate readers on why those types of articles are more generalizable than, say, small-scale case studies.
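The triage logic described here (hide retractions, flag study designs that generalize well) can be sketched in a few lines. The metadata field names below are hypothetical, chosen for the example rather than taken from Unpaywall’s actual schema.

```python
# Sketch of the flagging logic: drop retracted papers, then mark study
# types likely to be extra useful. Field names are hypothetical.
STRONG_TYPES = {"systematic review", "meta-analysis", "randomized controlled trial"}

def triage(articles):
    """Drop retracted articles; label the rest 'flagged' or 'normal'."""
    results = []
    for article in articles:
        if article.get("retracted"):
            continue  # hide retracted papers entirely
        label = "flagged" if article.get("study_type") in STRONG_TYPES else "normal"
        results.append((article["title"], label))
    return results

out = triage([
    {"title": "Large RCT of X", "study_type": "randomized controlled trial"},
    {"title": "Case report", "study_type": "case study"},
    {"title": "Withdrawn paper", "study_type": "systematic review", "retracted": True},
])
```

The harder, unsolved part is upstream of this function: reliably inferring `study_type` and retraction status from metadata and full text in the first place.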
Does this product rely on any kinds of partnerships with publishers?
We’re exploring that, and have had some promising discussions.
What kinds of projects are on the horizon for Impactstory?
Our goal is to build open infrastructure to support the Open Science revolution, and there’s certainly no shortage of things to do there. We’ve got quite a few exciting projects in the pipeline! Here are a few that spring quickly to mind:
- Enlarging the scope of the Unpaywall dataset to include research papers without DOIs, which will add 50M or so new fulltext records.
- Disambiguating all the authors of all scholarly papers, along with their institutions, and then making that dataset available to anyone for free. We’ll use ORCID where we can, and AI everywhere else. We think this would be quite useful in many assessment contexts, as well as in building cool tools that interact with academic authorship and identity.
- Using Unpaywall to assess the openness of faculty publishing behaviors (i.e., the percentage of a faculty’s papers that are available OA, that are associated with open data and open code, etc.), with the goal of increasing openness over time. Unpaywall data is already being used in national-level assessments by policy bodies in the Netherlands, Switzerland, the UK, and elsewhere. We’re building a web-based dashboard to support doing the same at the institutional or departmental level, and are currently trialing this system with the US National Institutes of Mental Health.
- Building a research software citation index, by mining research software mentions and citations from our fulltext records.
- Supporting an easy-to-use, fully-open API for altmetrics, built on the Crossref Event Data system.
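To make the software citation index idea concrete, here is a toy sketch of mining software mentions from full text. The software list and the matching approach are illustrative only; real extraction would need NLP and disambiguation, not bare keyword matching.

```python
# Toy sketch of mining research-software mentions from fulltext records.
# The software list and regex matching are illustrative only.
import re

KNOWN_SOFTWARE = ["SPSS", "ImageJ", "scikit-learn", "BLAST"]

def find_software_mentions(fulltext):
    """Return the known software names mentioned in a fulltext string."""
    found = []
    for name in KNOWN_SOFTWARE:
        if re.search(r"\b" + re.escape(name) + r"\b", fulltext):
            found.append(name)
    return found

mentions = find_software_mentions(
    "Images were processed in ImageJ and statistics computed with scikit-learn."
)
```

Aggregated across 20M+ full-text records, mention counts like these would form the raw material of a citation index for research software.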
6 Thoughts on "Get The Research: Impactstory Announces a New Science-Finding Tool for the General Public"
These were interesting enough for me to actually take a look. One big disappointment: there is no information whatsoever about privacy policies and implications on any of these sites. In particular, Impactstory is keen to advertise itself by luring researchers to opt in. Since Twitter is also closely integrated, I would assume that the platform does (or aims to do) massive privacy-violating data mining about researchers. All in the name of open science, open data, and open source, of course…
According to your policy, you, among other things, retain the right to: store IP addresses, email addresses, and online identifiers (e.g., Twitter handles), store any data supplied to you by third-parties (e.g., employers), including demographic and location data, share any data with arbitrary third-parties, including creepy third-party web trackers used by web advertisement companies. In addition, you collect all the usual tracking information (referrers and rest of the HTTP header fields, etc.), which can be used for browser fingerprinting and other profiling. You also use creepy email tracking with web beacons and of course store a hefty amount of cookies.
And you justify all of this with the same “legitimate interests” clause used by traditional web trackers. (The legality of this clause in Europe will be determined in future court cases, by the way.) Even if a user requests that data be deleted (e.g., under the GDPR), you give no guarantees that “shadow profiles” would not exist. And so on and so forth.
So I guess there exist open data and open science, and then there are “open data” and closed research.
Now I am not accusing you of anything specific. This is pretty standard stuff for any online platform. But it is irksome that you use the openness label to pursue things you are not willing to fully disclose. Because scientists are identified by their real names, the data you collect are a gold mine for advertisers and other parties.
All these points should also deserve a wider debate among open science advocates.
Reviewer 1: ANONYMOUS COWARD
Reviewer 2 (me): Reviewer 1 makes some good and fair points that should be addressed (and are worthy of further debate).
Neither I nor my colleagues, but still worth listening to:
I am trying to understand how some of the ideas presented could mesh together, specifically the claim that this search system will be limited to “peer-reviewed literature” while at the same time expanding to include 50 million full text research papers without DOIs.
How can you determine whether a particular paper has undergone peer review? I’m curious how you think you can do this reliably even for papers with DOIs; for papers without DOIs it would seem to be pure guesswork.
Do you have a source or any particular reason to think that there are really 50 million “research papers” without DOIs? What kinds of things do you expect to be included in this?