In today’s Kitchen Essentials post, we will hear from Anita Bandrowski, CEO and Co-founder of SciCrunch. SciCrunch was founded in 2016, with the goal of helping to reduce scientific irreproducibility. Its mission is to improve the scientific literature through the development of tools and services around the provisioning of research resource identifiers (RRIDs) — persistent identifiers for biological resources, which were first launched in 2013.
Please tell us a bit about yourself — your role at SciCrunch, how you got there, and why you embarked on a career in research infrastructure?
So I am a bit of an oddball. I started out as a psychologist. Neuroscience wasn’t a thing at that point, but I would probably call myself a “recovering neuroscientist”. Recovering, because I have been working on infrastructures for the last 20 years, starting with the Human Genome Project, where someone finally explained to me what a database was; and then working on the Neuroscience Information Framework, which later became the SciCrunch.org platform.
You could say that I am an accidental infrastructure provider.
At the Neuroscience Information Framework, one of my tasks was to create a robot that could find antibodies in the scientific literature, because our National Institutes of Health (NIH) funders told us that “people were wasting lots of time trying to find them.” I discovered that, actually, no matter how great our text mining colleagues were, they could not divine things from the literature that were omitted by the authors…
So we told our NIH funder that we failed to accomplish the task that we were given. But we did create the antibodyregistry.org, and populated it with ~2M antibody records to answer the questions, “how many antibodies are out there for scientists to use?” and “which ones are they?”. We also began working with the Journal of Comparative Neurology, which had the reputation of being “true, but not interesting”, unlike some higher impact journals, which had the reputation of being “interesting, but not necessarily true”. We were into verifiable truth, which meant that we were tagging all antibodies with persistent identifiers called Research Resource Identifiers (RRIDs), and capturing their validation in our brand new database.
It then became apparent that this one journal doing a reasonable job was a drop in the proverbial bucket, and our team started to encourage more journals to join us to address the scientific reagent problem. So we cobbled together a project to ask authors to tag their antibodies and other resources with an “RRID” tag for three months. We are now in the tenth year of this three-month pilot project and have brought on board ~1000 journals – and I still run the infrastructure.
However, we found out fairly early that running a scholarly infrastructure like the RRID portal, and even “bringing a journal on board”, i.e., having RRIDs in the instructions to authors, is relatively ineffective at getting RRIDs adopted, because only ~1% of authors complied with instructions. We were not really interested in running an infrastructure that is not used, so we had to do something else to change scholarly communication – we had to get adoption.
So, if RRIDs would be our answer to changing how authors report reagents, then we needed someone to follow up with non-compliant authors to add them to manuscripts or check that the right ones are included. Editorial staff doing this manually for the Journal of Comparative Neurology was tedious; doing this for PLOS ONE was impossible … without an automated agent. I then applied for and received funding to create an automated agent, now called SciScore, to help journals enforce the RRID and other standards that authors are not always keen to remember.
What do you like most and least about working in research infrastructure?
Infrastructure is a weird thing: there is no great recognition for working on it. When you do a great job you blend into the background and nobody knows you exist. When you do a bad job people get really mad because you now stand between them and what they need. It is a lot like Eastern European mothers in that sense, so I am pretty used to that.
Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in research infrastructure?
Wow, yeah, this job is not for the person who likes recognition or has much of an ego. I think that we now have about a million RRIDs out in the “wild” – a tiny number when you look at something like DOIs or maybe even ORCIDs, but not insignificant when you look at how many antibody papers there are. With all of that success, I don’t think that authors really know or care that there are people behind the infrastructure. If you are the type of person who can be happy making small, incremental but meaningful changes rather than big sweeping ones, you might be happy working in infrastructure.
What sort of infrastructure does SciCrunch provide, and who are your users?
SciCrunch primarily serves as a search portal for research reagents and other resources, with a single goal: to find the persistent identifier (RRID) for the right product. Authors use the portal to find RRIDs for the reagents they already have, and they may also browse for reagents they wish to buy. We reduce some of the complexity for authors and coordinate between authors’ lab reagent management, journals, and the resource providers – which could be a single laboratory in a university, or a giant commercial company. The recently created not-for-profit rrids.org is a way to make sure that no commercial concern ever owns RRIDs.
SciCrunch also runs the SciScore tool, the Grammarly-like robot that helps to identify places in text where RRIDs belong, and determines whether the author has identified those resources. SciScore also checks for ~50 additional things like authors’ compliance with the ARRIVE guidelines, review board statements, and also asks questions about whether or not the cell lines that authors used are contaminated. All of this seemed useful to our journal partners, and this tool has now “talked to” over 250,000 authors about their feelings… actually, their papers.
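To make the RRID-tagging idea concrete, here is a minimal sketch (not SciScore’s actual implementation, which uses much richer text mining) of how a tool might spot RRID citations in a methods section. The regex, the prefix list, and the `find_rrids` helper are illustrative assumptions; the three identifiers shown are real registry entries for a Dako anti-GFAP antibody, ImageJ, and the HEK293 cell line.

```python
import re

# Illustrative sketch only: a simple regex pass over methods-section text
# to flag RRID citations. AB_ (antibodies), SCR_ (software), and CVCL_
# (cell lines) are common RRID namespaces; real tooling handles many more
# and validates each accession against the registry.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_]+)")

def find_rrids(text: str) -> list:
    """Return the RRID accessions cited in a block of text."""
    return RRID_PATTERN.findall(text)

methods = (
    "Sections were stained with anti-GFAP (Dako, Cat# Z0334, "
    "RRID:AB_10013382) and imaged in ImageJ (RRID:SCR_003070). "
    "HEK293 cells (RRID:CVCL_0045) were cultured as described."
)

# Lists the three accessions cited in the example methods text.
print(find_rrids(methods))
```

A production checker would go further, for example flagging antibody or cell-line mentions that lack any RRID at all, which is the gap SciScore was built to close.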
How is SciCrunch sustained financially?
The main question for us and many infrastructure providers is: how do you sustain infrastructure that has to be fully open? RRIDs are CC-0 licensed and they need to be protected, but also the infrastructure needs to be maintained. For us, membership makes sense, and I think this is something that other infrastructures have had some success with. SciCrunch was a university-based project supported by grants, which became an infrastructure that is both a commercial company as well as a not-for-profit. The company became self-sustaining through a “membership” model, comprising resource providers who pay to include their resources, and this model is slated to sustain the RRID infrastructure. Journals seem less keen on sustaining infrastructure, but they are sufficiently happy to pay for the SciScore service, which improves the compliance of their papers with accepted guidelines. So the company can be sustained financially by selling services around RRIDs such as compliance checking and analytics.
As the leader of a research infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?
I think that PIDs, persistent (unique) identifiers, are not yet fully realized, no matter how many cool fuzzball graphs there are out there. PIDs are amazing! They are associated with metadata about whatever thing they identify! So, for example, if an author wants to find out whom to collaborate with on a particular experiment, really what they should be able to do is triangulate on things like experiment types, reagents they are using, locations, expertise availability. I think that we are seeing some bits of this in place now, but we are barely scratching the surface of what is possible.
Looking at your own organization, what are you most proud of, and what keeps you awake at night?
I still answer email requests from authors — if you are asleep you really can’t answer emails so that keeps me up. In all seriousness, I am very proud of the work we have all done to get RRIDs adopted and I do know that we have made a difference in people’s lives. I really can’t measure how many people left science because of bad antibodies, but I bet that, with our efforts and the efforts of great resource providers that are taking the problems seriously, we have a situation that is improving. I know that we directly impacted at least the ability to find (FAIR anyone?) antibodies by about 25-35%. Not bad, for a handful of people toiling in anonymity.
What impact has AI had, and what impact will it have, on SciCrunch?
We are AI, shhhh!!!! Don’t tell anyone.
AI refers to a group of technologies that we have used for over 15 years, before all of the current hoopla. These technologies largely help find things that are not based on PIDs. With better information you really don’t need AI to divine anything, so we use it just to get the people who have the information (for example, which antibody or mouse did I use in this experiment?) to provide it in the paper they are trying to publish. I actually crack up when I get funky solicitations from some johnny-come-lately companies that promise to “revolutionize my business with AI”. We were winning text analysis competitions against the National Library of Medicine while you were in diapers!
What changes do you think we’ll see in terms of the overall research infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at SciCrunch?
I do think that the things we care about and use will be at the forefront of publishing over the next few years. Research integrity and artificial intelligence are things that you really can’t get away from. We need to have people who similarly care deeply about research integrity, not just as a box-checking exercise, but as a lifelong passion. We will be hiring those passionate individuals who will train robots to recognize ever more issues that will take the guesswork and deep reading out of the hands of the hard-working managing editors, but which will never fully replace them. We will need AI experts who can build those new tools that help the authors and publishers, but in a way that is informed by the goal at hand – to improve science.