Type the word “bioRxiv” into Google’s search box, and the first autocomplete suggestion is “biorxiv impact factor.” Number three is “biorxiv journal.” To be clear, bioRxiv is not a journal but a preprint server. It is not indexed by the Web of Science and, more importantly, has never received an Impact Factor. At the same time, Google’s algorithm has discerned that enough users have searched for bioRxiv’s Impact Factor to suggest the result.
Since its launch in November 2013, bioRxiv has been wildly successful, attracting submissions, readers, social media attention, and citations.
Each bioRxiv record summary includes the paper’s altmetric score and usage figures. It does not include how many times the preprint has been cited. Nevertheless, with a little work, you can find this out too.
Consider the preprint “HTSeq – A Python framework to work with high-throughput sequencing data” by Simon Anders and others. It was first deposited in bioRxiv on 20 February 2014, and a revised version was posted on 19 August 2014. The paper was subsequently published on 15 January 2015 in the journal Bioinformatics.
If you do a cited reference search in the Web of Science for Anders S* as the author, biorxiv as the work, and limit the publication date to 2014, you’ll get the following results:
For the first three references, biorxiv is listed as the source of the paper; references five and six list “preprint” as the source; and number four omits a source completely. In total, we find 27 references to the preprint, which, in a practical sense, is a pretty small number of citations compared to the 2304 counted toward the journal article. Even so, of these 27 citations to the bioRxiv version, 3 were made in 2015 (the year the article was published in Bioinformatics), 3 were made in 2016, 8 were made in 2017, and 2 have come out so far in the first few months of 2018.
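To make the arithmetic in this example explicit, here is a minimal Python sketch that tallies the figures quoted above. The variable and function names are my own, chosen for illustration. The sketch also assumes that the 27 preprint citations and the 2304 journal citations do not overlap, and it makes no claim about when the citations not itemized by year were made, since the counts above do not say.

```python
# Figures quoted in the paragraph above (Web of Science, early 2018).
preprint_citations_total = 27
journal_citations_total = 2304

# Preprint citations by citing year, only for the years reported in the text.
preprint_citations_by_year = {2015: 3, 2016: 3, 2017: 8, 2018: 2}


def summarize(preprint_total, journal_total, by_year):
    """Tally preprint citations made in or after the journal publication year."""
    in_or_after_publication_year = sum(by_year.values())
    return {
        "in_or_after_publication_year": in_or_after_publication_year,
        # Citations the text does not assign to a year.
        "not_itemized": preprint_total - in_or_after_publication_year,
        # Assumes the two citation counts are disjoint sets.
        "preprint_share_pct": round(
            100 * preprint_total / (preprint_total + journal_total), 2
        ),
    }


summary = summarize(
    preprint_citations_total,
    journal_citations_total,
    preprint_citations_by_year,
)
print(summary)
```

Under these assumptions, 16 of the 27 preprint citations appeared in or after the year of journal publication, and the preprint drew roughly one percent of the combined citations.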
Why do authors continue to cite a preprint years after it has been published in a journal?
The reasons are hard to pin down. Readers may have downloaded an early version from bioRxiv and continue to cite it as a preprint; they may be copying or reusing old references; or Google Scholar may be preferentially sending readers to bioRxiv instead of the journal. While bioRxiv does its best to search for published papers and update its website with accurate metadata, this information is obviously not reaching all readers.
As with authors who continue to cite papers that have been corrected or retracted, citing earlier versions of a paper may promote incorrect or invalid scientific work. Still, even if the bioRxiv version were identical in every respect to the published version (in the above example, the final version on bioRxiv was submitted the day after the final version was submitted to the publisher), a citation to bioRxiv is a citation that cannot be counted towards a journal’s Impact Factor and associated metrics.
This redirection of citations is not just a bioRxiv problem. You’ll find references to journal articles as if they were published by arXiv, PubMed Central, Academia.edu, and ResearchGate (see example above). Understandably, some publishers are not happy that these repositories are stealing eyeballs and citations.
The scope of this problem is not well understood. While there is great enthusiasm for incorporating preprints into the lifecycle of publication, there is a dearth of research on this topic. There are currently well over 8,000 citations to bioRxiv in the Web of Science.
At present, there is no mechanism for a user to export metadata from bioRxiv to match with citation records from the Web of Science, Scopus, or Dimensions. Unfortunately, my request for a dataset from bioRxiv to study this problem was rejected, as Cold Spring Harbor Laboratory looks to develop a long-term solution for providing metadata to interested parties.
A citation is much more than a directional link to the source of a document. It is the basis for a system of rewarding those who make significant contributions to public science. Redirecting citations to preprint servers harms not only journals, which lose public recognition for publishing important work, but also the authors themselves, who may find it difficult to aggregate public acknowledgements of their work.