Editor’s Note: With Peer Review Week on the horizon, today we turn to the question of preprints, and how they can best be integrated into the permanent research record. Today’s post is by Sylvia Izzo Hunter (Marketing Manager), Igor Kleshchevich (Senior Software Engineer), and Bruce Rosenblum (VP of Content and Workflow Solutions, and the 2020 NISO fellow), all with Inera, an Atypon company. For more on preprints, the Society for Scholarly Publishing is hosting a webinar, The Future of Preprints: Coronavirus as a Case Study on September 22.

The COVID-19 pandemic has produced an explosion of postings on preprint servers to meet the critical need for rapid dissemination of new biomedical and clinical research findings. Citations to these preprints, both in other preprints and in peer-reviewed articles, have also exploded, as the research–cite–publish cycle shortens to weeks or even days, and citing the most up-to-date information becomes vital.

The preprint explosion created an urgent need for development work at Inera: the ability to process preprint citations went quickly from a nice-to-have feature to an imperative. This post was born out of our discoveries about (and frustrations with) the current preprint citation landscape as we set out to update our software solutions to include support for citations to preprints. As COVID-19 increased the prevalence of preprint citations during the spring of 2020, and we worked to adapt our software, we uncovered one technical challenge after another, illustrated below with real-world examples.

As we were working on our software development and this post, a working group assembled by ASAPbio in collaboration with EMBL-EBI and Ithaka S+R was working on a set of recommendations for “building trust in preprints” (posted by Beck et al., as a preprint to OSF Preprints on July 21, 2020). These separate but parallel activities show that there is a growing awareness of the issues; however, this workshop report has, to our knowledge, not previously been mentioned in The Scholarly Kitchen. Many of our recommendations overlap with those of Beck et al.; we advise that their report be read and considered alongside the data and recommendations we present here.

We will not discuss the pros or cons of preprints, because we believe that irrespective of anyone’s opinions of them, preprints are now an integral part of the scholarly publishing landscape. What we will discuss are the challenges of recognizing, linking, and retrieving information that has not yet been peer-reviewed, and that may be cited in ways that make it difficult for readers to recognize when a cited work has not been peer reviewed. Preprint citations, unless managed well, may weaken the refinement of scholarship, and we make the case that current management needs to be improved.

Some preliminary notes

Preprint servers, early publication articles, and metadata are constantly in flux. All of the examples below were accessed on September 8, 2020. We cannot promise that future readers will find them in exactly the same state.

We use the term “preprint” to mean an item that has been posted on an open site for purposes of viewing and commenting, either prior to or in parallel with the peer-review process, but is not formally undergoing peer review on the site where it has been posted. “Preprint server” means a site that hosts preprints. “Article” or “journal article” means an item that has been accepted by and published in a peer-reviewed journal.

Our data set was collected by culling sample citations from randomly selected bibliographies in preprints on medRxiv, bioRxiv, OSF-hosted preprint servers, and the WHO COVID-19 preprint server from April to September 2020. It was supplemented by example citations to preprints provided by our customers.

Preprint servers do not always identify their content as not peer-reviewed

Inconsistencies and ambiguities exist across preprint servers with respect to how (and, indeed, whether) they identify their content as posted without peer review. For example, this note appeared at the top of most pages we viewed on bioRxiv and medRxiv:

bioRxiv is receiving many new papers on coronavirus SARS-CoV-2. A reminder: these are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

The original preprint server, arXiv, is similarly clear about the status of manuscripts. Other sites are less clear. MindRxiv, for example, has no indication on its homepage that the content it hosts has not been peer reviewed, only a note stating: “MindRxiv is a service provided by the Mind & Life Institute, and is not affiliated with other preprint servers.” Neither do individual preprint pages on this server (see, for example, this preprint).

It is imperative that all preprints be clearly labeled as such: at a time when many people consider any information they find online to be true, being anything less than explicit is misleading.

Recommended citations on preprint servers do not indicate that the citation is a preprint

Many preprint servers provide recommended citations in a variety of editorial styles, many of which don’t make it clear that the work in question is a preprint. For the Beck et al. manuscript, OSF Preprints conveniently provides these suggestions, none of which explicitly states “Preprint”:

APA
Beck, J., Ferguson, C. A., Funk, K., Hanson, B., Harrison, M., Ide-Smith, M. B., … Swaminathan, S. (2020, July 21). Building trust in preprints: recommendations for servers and other stakeholders. https://doi.org/10.31219/osf.io/8dn4w

MLA
Beck, Jeffrey, et al. “Building Trust in Preprints: Recommendations for Servers and Other Stakeholders.” OSF Preprints, 21 July 2020. Web.

Chicago
Beck, Jeffrey, Christine A. Ferguson, Kathryn Funk, Brooks Hanson, Melissa Harrison, Michele B. Ide-Smith, Rachael Lammey, et al. 2020. “Building Trust in Preprints: Recommendations for Servers and Other Stakeholders.” OSF Preprints. July 21. doi:10.31219/osf.io/8dn4w.

The APA citation even lacks the name of the preprint server. When we searched for this preprint at https://search.crossref.org/, the Vancouver format citation provided by Crossref was:

Beck J, Ferguson CA, Funk K, Hanson B, Harrison M, Ide-Smith M, et al. Building trust in preprints: recommendations for servers and other stakeholders. Center for Open Science; 2020 Jul 21; Available from: http://dx.doi.org/10.31219/osf.io/8dn4w

This citation gives the Center for Open Science as the preprint server with no indication the item is a preprint. So, when an author copies one of these citations into a new article, future readers of that article may have no idea that the cited content has not been peer reviewed.

Recommended citations on preprint servers may fail to include a DOI

The MLA example citation, above, does not include a DOI, which can make it difficult or impossible for a reader to follow the citation back to the authoritative copy. Here’s a suggested MLA citation from MindRxiv:

MLA
Weng, Helen, et al. “Focus on the Breath: Brain Decoding Reveals Internal States of Attention During Meditation.” MindRxiv, 7 Nov. 2018. Web.

When we Googled the title of this paper to find it from the DOI-less citation, MindRxiv did not even appear in our first page of results. Interestingly, the first two matches link to a preprint of this paper posted on November 4, 2018, to bioRxiv at https://doi.org/10.1101/461590. Only the third link points to the published article.

bioRxiv’s version of this preprint directs us to “View current version of this article” in a big red banner at the top of the page, with a link to the final publication in Frontiers in Human Neuroscience. The MindRxiv page gives no indication that the paper has been published in a peer-reviewed journal.

Preprint citations provided by authors often do not include a DOI

Today, the majority of journal articles are assigned DOIs that are deposited with Crossref, and the majority of preprint archives also assign DOIs to their content (arXiv, which assigns its own persistent identifiers, is the most notable exception). But assigning a DOI does not guarantee that it will always be used in subsequent citations.

In a random (though unscientific) sampling of preprint citations we found in preprints posted to a variety of servers over the past six months (i.e., new preprints citing older preprints), fewer than half included a DOI.

Sometimes the DOI is missing because the recommended citation on the preprint server did not include one. In other cases, reference management software doesn’t distinguish between a journal article and a preprint, and may not include the DOI when formatting a reference to a specific editorial style. Sometimes authors don’t understand why it’s important to include a DOI when citing content that is not part of a traditional issue-based journal.

In fact, it’s even more essential to include a DOI when citing content that is not part of a traditional paginated publication to enable readers to locate the authoritative copy of the preprint.
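A check for missing DOIs is easy to automate. The following is a minimal sketch, not the method we used for our sampling: the pattern is a deliberately loose assumption about what a DOI looks like, and the function names are hypothetical. The two sample strings are the Chicago and MLA citations for Beck et al. quoted earlier.

```python
import re
from typing import List, Optional

# Loose, assumed pattern for modern DOIs: "10.", a 4-9 digit prefix,
# a slash, then a suffix that runs to the next whitespace or quote.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(citation: str) -> Optional[str]:
    """Return the first DOI-like string in a citation, or None if absent."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Sentence punctuation often trails the DOI in a formatted reference.
    return match.group(0).rstrip(".,;)")


def doi_coverage(citations: List[str]) -> float:
    """Fraction of citations that carry a DOI-like string."""
    if not citations:
        return 0.0
    return sum(find_doi(c) is not None for c in citations) / len(citations)


CHICAGO = (
    "Beck, Jeffrey, Christine A. Ferguson, Kathryn Funk, Brooks Hanson, "
    "Melissa Harrison, Michele B. Ide-Smith, Rachael Lammey, et al. 2020. "
    "“Building Trust in Preprints: Recommendations for Servers and Other "
    "Stakeholders.” OSF Preprints. July 21. doi:10.31219/osf.io/8dn4w."
)
MLA = (
    "Beck, Jeffrey, et al. “Building Trust in Preprints: Recommendations "
    "for Servers and Other Stakeholders.” OSF Preprints, 21 July 2020. Web."
)

print(find_doi(CHICAGO))
print(find_doi(MLA))
print(doi_coverage([CHICAGO, MLA]))
```

A check like this only reports whether a DOI-like string is present; it does not confirm that the DOI resolves or that it points to the preprint rather than the published article.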

Preprint servers are not always updated to indicate publication in a peer-reviewed journal

In the preprint servers we reviewed, failure to indicate that a preprint has since been published in a peer-reviewed journal is not uncommon.

Publishers have told us that it’s essential to know when a preprint has been peer-reviewed and published. Peer reviewers and editors need to understand whether or not a preprint citation has subsequently been published when evaluating new research. Often, publishers will ask the author of an accepted article to review preprint citations and update them to cite the final article, if the content is substantially the same in both versions. It’s also important for future readers to know if a preprint was peer reviewed and published in a journal, and, if so, which one.

Whose responsibility is it to notify the preprint server that an article has been published? Authors may not see it as their responsibility and, in any case, are notoriously overburdened with administrative tasks. So notification and updating must be handled via scholarly publishing workflows and infrastructure.

Consider the paper “Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset.” It was posted to medRxiv on March 27, 2020, and published online in the European Respiratory Journal on May 19, 2020. As of September 8, medRxiv had not been updated, nor does Crossref metadata indicate the relationship between these items.

The lines of responsibility for notification, website updates, and Crossref metadata updates are not entirely clear, and this is the cause of many metadata disconnects. As Beck et al. note in the ASAPbio workshop report, the problem can be mitigated if publishers, server hosts, and Crossref work together to identify a clear set of workflows that minimize the burden on authors and, to the greatest extent possible, automate metadata and site updates.

Automated link formation between preprints and articles may not always work

Given the importance of including a DOI in a preprint citation, editors should locate and add any that are missing when editing a peer-reviewed article. Ideally, this can be done more or less automatically by looking up the citation on Crossref.

But the cross-references between items are not always updated. In theory, Crossref could watch for items with similar first author surnames and titles and then automatically create linkages based on a set of matching criteria. If this is beyond Crossref’s purview then they could, at a minimum, automatically send out notifications to preprint servers based on matches, prompting the servers to update their site and redeposit their metadata.

But such matching logic is not infallible. In particular, it may not work if an article’s title changes significantly between the preprint and the final publication. For example, a preprint posted on the World Health Organization COVID-19 preprint server entitled “A simple model to assess Wuhan lock-down effect and region efforts during COVID-19 epidemic in China Mainland” was published in the Bulletin of the World Health Organization as “Modeling the effects of Wuhan’s lockdown during COVID-19, China.”
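To make the failure mode concrete, here is a minimal sketch of the kind of matching rule described above. It is an illustration under our own assumptions, not Crossref's actual logic: the word-overlap (Jaccard) measure, the 0.8 threshold, and the function and field names are all hypothetical.

```python
import re


def title_tokens(title: str) -> set:
    """Lowercase a title and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two titles' token sets (0.0 to 1.0)."""
    tokens_a, tokens_b = title_tokens(a), title_tokens(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_probable_match(preprint: dict, article: dict, threshold: float = 0.8) -> bool:
    """Assumed rule: same first-author surname and sufficiently similar titles."""
    same_author = (
        preprint["first_author"].casefold() == article["first_author"].casefold()
    )
    similar = title_similarity(preprint["title"], article["title"]) >= threshold
    return same_author and similar


# The two titles quoted above for the WHO preprint and the Bulletin article.
PREPRINT_TITLE = (
    "A simple model to assess Wuhan lock-down effect and region efforts "
    "during COVID-19 epidemic in China Mainland"
)
ARTICLE_TITLE = "Modeling the effects of Wuhan’s lockdown during COVID-19, China"

print(title_similarity(PREPRINT_TITLE, ARTICLE_TITLE))
```

Under this measure the two titles share only 5 of 25 distinct words, a score of 0.2, so any rule with a reasonably strict title threshold would miss the pair even when the first author is identical.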

Crossref returns inconsistent results for the same query

Part of our software development process includes automatic nightly regression testing. We have found that Crossref structured queries return inconsistent results for preprints. Here are two references to preprints that are part of our regression test set:

E. Hassanien, L. N. Mahdy, K. A. Ezzat, H. H. Elmousalami, H. A. Ella, Automatic X-ray COVID-19 Lung Image Classification System based on Multi-Level Thresholding and Support Vector Machine, medRxiv.
[citation copied from bibliography of https://doi.org/10.1101/2020.05.01.20087254]

Lachmann, A., Jagodnik, K. M., Giorgi, F. M., and Ray, F. (2020). Correcting under-reported covid-19 case numbers: estimating the true scale of the pandemic. medRxiv
[citation copied from bibliography of https://doi.org/10.1101/2020.07.01.20144279]

Although our test system makes identical queries every night, our daily data review has found that Crossref doesn’t consistently return a DOI for each of these references, and it’s unclear why.
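The comparison behind that daily review can be sketched as follows. This is not our test harness: the function name, the reference keys, and the DOIs (which use the 10.5555 test prefix) are placeholders, and the sketch assumes each nightly run is stored as a mapping from a reference key to the DOI returned, or None when no DOI came back.

```python
from typing import Dict, List, Optional


def changed_lookups(
    previous: Dict[str, Optional[str]],
    current: Dict[str, Optional[str]],
) -> List[str]:
    """List reference keys whose looked-up DOI differs between two runs.

    A key that is missing from one run is treated like a lookup that
    returned no DOI in that run.
    """
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


# Placeholder results for two consecutive nights (hypothetical DOIs).
monday = {"hassanien": "10.5555/example-1", "lachmann": "10.5555/example-2"}
tuesday = {"hassanien": "10.5555/example-1", "lachmann": None}

print(changed_lookups(monday, tuesday))
```

Because the queries are identical from night to night, any key this comparison flags points to a change on the lookup side rather than in the test input.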

Authors may post preprints to two or more preprint servers

Sometimes a preprint appears on multiple servers. (Note: our discussion here focuses on the technical challenges associated with such postings; we leave the principle of multiple postings for commenters on this post to discuss.)

We found the following paper on three different preprint servers. The first two citations are the recommended ones from those servers:

Rodriguez Llanes, Jose Manuel and Castro Delgado, Rafael and Pedersen, Morten Gram and Arcos Gonzalez, Pedro and Meneghini, Matteo, Confronting COVID-19: Surging Critical Care Capacity in Italy (3/25/2020). Available at SSRN: https://ssrn.com/abstract=3564386 or http://dx.doi.org/10.2139/ssrn.3564386

Rodriguez-Llanes JM, Castro Delgado R, Pedersen MG, Arcos González P & Meneghini M. Confronting COVID-19: Surging critical care capacity in Italy. [Preprint]. Bull World Health Organ. E-pub: 6 April 2020. doi: http://dx.doi.org/10.2471/BLT.20.257766

Rodriguez-Llanes JM, Castro Delgado R, Pedersen MG, Arcos González P & Meneghini M. Confronting COVID-19: Surging Critical Care Capacity in Italy. medRxiv. Posted 6 April 2020: 2020.04.01.20050237; doi: https://doi.org/10.1101/2020.04.01.20050237

According to the statement that accompanies the SSRN version, “Authors have either opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet [the Lancet-branded preprint service on SSRN], or submitted directly via SSRN.” We can therefore presume that the authors either submitted this paper to The Lancet or posted it directly on SSRN on March 25, although there is nothing in the recommended citation to indicate that it was submitted to The Lancet. Two weeks later, the authors submitted the same manuscript to the Bulletin of the World Health Organization, which posted it on their COVID-19 preprint site. The authors also posted the preprint to medRxiv.

This situation creates several problems. First, if and when this paper is published, multiple preprint servers will need to update their site and their metadata. Second, despite the (laudable) inclusion of DOIs in the recommended citations, if — as is frequently the case — future citations to any of these versions do not include a DOI, looking up the DOI at Crossref presents challenges.

As a test, we took the three citations above, removed the DOIs, and then tried to find their DOIs via Crossref services. With structured queries a DOI was only returned for the first citation, which had been incorrectly deposited at Crossref as a journal article. The others were correctly deposited as “posted-content”. Querying the same three references using the Crossref Simple Text Query service returned the medRxiv DOI for all three citations, incorrectly for two and correctly only for the third.
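Readers who want to try a similar lookup can do so with Crossref's public REST API. Note that this is a different interface from the structured query and Simple Text Query services we tested, so results may differ. The sketch below only builds the request URL and summarizes a decoded response; the helper names are hypothetical, and the sample response is a hand-written stub in the shape of a /works reply, not real output.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def bibliographic_query_url(
    citation: str, preprints_only: bool = False, rows: int = 3
) -> str:
    """Build a /works URL that searches on a free-text citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    if preprints_only:
        # Crossref registers preprints under the "posted-content" type.
        params["filter"] = "type:posted-content"
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def summarize_items(response: dict) -> list:
    """Pull (DOI, type) pairs out of a decoded /works JSON response."""
    items = response.get("message", {}).get("items", [])
    return [(item.get("DOI"), item.get("type")) for item in items]


CITATION = (
    "Rodriguez-Llanes JM, Castro Delgado R, Pedersen MG, Arcos González P "
    "& Meneghini M. Confronting COVID-19: Surging critical care capacity "
    "in Italy."
)

# Hand-written stub that mirrors the deposits described above.
STUB_RESPONSE = {
    "message": {
        "items": [
            {"DOI": "10.2139/ssrn.3564386", "type": "journal-article"},
            {"DOI": "10.1101/2020.04.01.20050237", "type": "posted-content"},
        ]
    }
}

print(bibliographic_query_url(CITATION, preprints_only=True))
print(summarize_items(STUB_RESPONSE))
```

Looking at the type of each candidate, rather than taking the top hit, is what exposes a preprint that was deposited as a journal article.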

Preprint sites replace preprints with the published journal article

We have seen multiple cases in which preprint servers replace the preprint with the final journal article after publication. For example, if we look at the article “Habitat risk assessment for regional ocean planning in the U.S. Northeast and Mid-Atlantic” on marXiv, we find that the original preprint has vanished without a trace. In its place is a PDF of the final publication in PLOS One, and the following recommended citation:

Wyatt, K., Griffin, R., Guerry, A., Ruckelshaus, M., Fogarty, M., & Arkema, K. K. (2018, January 26). Habitat risk assessment for regional ocean planning in the U.S. Northeast and Mid-Atlantic. https://doi.org/10.1371/journal.pone.0188776

This means that, if any other paper has cited the preprint, and if the final published journal article differs from the preprint in any significant way, future readers will be unable to view the research as it was expressed in the preprint.

A preprint later published in a peer-reviewed journal is like an article for which a correction has been published — you don’t make the original article go away; instead, you publish a correction and leave the original as it was.

Recommendations

Through the examples above, we’ve illustrated a number of challenges for the current preprint environment. We’ve come to think of preprints as much like an unruly teenager: we see tremendous promise, but they need more adult supervision to achieve their potential.

In addition to endorsing the recommendations of Beck et al., we recommend the following steps to make preprints more trusted and sustainable.

  • NISO, in conjunction with ASAPbio:
  • Preprint servers:
    • Clearly identify when content has not been peer-reviewed.
    • Update recommended citation formats, when provided, to always include a DOI, the preprint server name, and a “preprint” indicator.
    • Work with vendors of reference management software to improve integration between preprint metadata and software that consumes the metadata, so that preprints are handled as a unique citation type and not shoehorned into data structures built for journal articles.
    • Define and implement workflows to ensure that preprint webpages are promptly updated with final publication information.
    • Refrain from replacing preprints with published articles.
  • Journal publishers:
    • Update editorial style guides/instructions to authors to indicate how preprints should be cited (we recommend, at a minimum, first author, preprint title, preprint server name, date of posting, “preprint” indicator, version, and DOI).
    • Educate authors about the correct use of DOIs in preprint citations.
    • Take steps to ensure that preprint metadata are not deposited to Crossref as journal article metadata.
  • Reference management software vendors:
    • Design and implement structures for new reference types for preprints, so that they will be formatted correctly according to journal styles and always indicate that the item is a preprint.
  • Crossref:
    • Update query logic (structured query, Simple Text Query) when multiple items may have the same authors, title, and year of publication, to better distinguish between a journal article and an earlier preprint, or between preprints of the same manuscript on multiple preprint servers.
    • Consider implementing systems to automatically notify preprint services when journal articles have been published with the same authors and title.
    • Consider creating automatic Crossref metadata links between preprints and journal articles that have identical authors, title, and year of publication.

Many of these recommendations cannot be implemented in isolation. We hope that the interested parties listed above will work effectively together on solutions.
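As one illustration of the minimum citation elements we recommend to journal publishers above, a reference manager could model a preprint as its own record type. The sketch below is hypothetical throughout: the class, its field names, and the output layout are our assumptions and do not reproduce any published style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreprintCitation:
    first_author: str
    title: str
    server: str
    posted: str  # date of posting, already formatted for display
    doi: str
    version: Optional[str] = None  # omitted from the output when unknown


def format_preprint(c: PreprintCitation) -> str:
    """Render a citation that always names the server and says 'Preprint'."""
    label = "Preprint" if c.version is None else f"Preprint, version {c.version}"
    parts = [
        f"{c.first_author} et al.",
        f"{c.title}.",
        f"{c.server}.",
        f"{label}, posted {c.posted}.",
        f"https://doi.org/{c.doi}",
    ]
    return " ".join(parts)


beck = PreprintCitation(
    first_author="Beck J",
    title=(
        "Building trust in preprints: recommendations for servers "
        "and other stakeholders"
    ),
    server="OSF Preprints",
    posted="July 21, 2020",
    doi="10.31219/osf.io/8dn4w",
)

print(format_preprint(beck))
```

Keeping the server name, the preprint label, and the DOI as required fields means no output style can silently drop them, which is the failure we saw in the recommended citations quoted earlier.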

Sir Isaac Newton famously commented that if, in his work, he saw further than others, it was “by standing on the shoulders of giants.” The knowledge we are collectively building is only stable if the citations that underpin new research are sound. When our citations do not clearly indicate whether a source has been peer-reviewed, and when preprint metadata is handled incorrectly, we risk undermining the stability of future research chains. We have an opportunity to bring consistency and stability to the preprint environment; if we choose not to take it, we risk harming the integrity of research.

Sylvia Izzo Hunter

Sylvia Izzo Hunter is Manager, Product Marketing, Community & Content, at Wiley Partner Solutions; previously she was marketing manager at Inera and community manager at Atypon, following a 20-year career in scholarly journal and ebook publishing. She is a member of the SSP Diversity, Equity, Inclusion, and Accessibility Committee. She lives in Toronto.

Bruce Rosenblum

Bruce Rosenblum is the Vice President of Content and Workflow Solutions at Atypon Systems, and the 2020 NISO Fellow. He was formerly CEO at Inera, which was acquired by Atypon in 2019.

Discussion

14 Thoughts on "Guest Post — What’s Wrong with Preprint Citations?"

Thank you for this post. Your assessment of citation recommendations missed the relevant recommendation from the AMA Manual of Style, 11th ed, released in February of this year. In the References chapter, the following is included, in 3.11.4.1 Preprints and Publication of Unedited Manuscripts:

“Preprints are another online method for publication in which a manuscript is uploaded by authors to a public server, without editing or formatting, and typically without peer review.9 A preprint may be a predecessor to publication in a peer-reviewed journal; it is “archived” and citable. Preprint servers include arXiv.org, bioRxiv.org, MedRxiv, and many others. Preprints were initially used more often in the physical sciences than in medicine, but they are becoming more common in the biological sciences.10 Preprints may have DOIs and can follow this citation format:

1. Bloss CS, Wineinger NE, Peters M, et al. A prospective randomized trial examining health care utilization in individuals using multiple smartphone-enabled biosensors. bioRxiv. Preprint posted online October 28, 2015. doi:10.1101/029983

If a preprint is subsequently published in a peer-reviewed journal, the reference citation should include complete data as outlined in this chapter. Note: The version cited should be the version used.

2. Bloss CS, Wineinger NE, Peters M, et al. A prospective randomized trial examining health care utilization in individuals using multiple smartphone-enabled biosensors. PeerJ. 2016;4:e1554. doi:10.7717/peerj.1554”

https://www.amamanualofstyle.com/view/10.1093/jama/9780190246556.001.0001/med-9780190246556-chapter-3-div2-71#

Disclosure: I am a committee member and coauthor of the AMA Manual of Style.

Your post made me think to dig out my copy of the 17th Edition of the Chicago Manual of Style to see if they address preprints, too—and they do. In section 14.173, the manual says, “Not having been subject to peer review, preprints are treated as unpublished material,” and they give this example:

Huang, Zhiqi. “Revisiting the Cosmological Bias Due to Local Gravitational Red-shifts.” Preprint, submitted April 24, 2015. http://arxiv.org/abs/1504.06600v1.

So, even though recommended citations on preprint servers may not indicate that the citation is a preprint, style guides (at least for APA and Chicago) do indicate that researchers should include this information in their citations. Whether people use the guides or copy the recommended citation from the preprint server is a different question.

Thank you for sharing the Chicago Manual example. We note that this sample citation does not include the name of the preprint server. In this citation, it is implicit in the URL, but that’s because arXiv uses its own PID system rather than DOIs. However, if this were a citation to almost any other preprint server where a DOI is used rather than a URL, the citation would not include the preprint server name. Especially in cases where a preprint may appear on multiple preprint servers, we believe it is essential to include the preprint server name in the citation.

Thank you for highlighting the very timely information in the latest edition of the AMA Manual of Style. We are delighted to see that the AMA style includes all of the citation elements in the recommendations of our post and Beck et al. We hope that other publishers who have not yet updated their style guides, or who include less than the “necessary” information in their current preprint citation style, will follow your lead, and also that of IEEE as indicated in the comment further below.

Great article — and very timely!
IEEE’s preprint server (https://www.techrxiv.org) operates on the figshare platform. The preprint citation style designed by figshare includes a notation that the item is a “preprint” as well as a link to the doi:
Hamid, Sufian; Shah, Syed Attique; Draheim, Dirk (2020): A Scalable Key and Trust Management Solution for IoT Sensors Using SDN and Blockchain Technology. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.12950090.v1

Thank you for this timely and important article. I would add bibliographic database producers and vendors to your list of parties that need to consider identifying preprints if they include them in their databases. The National Library of Medicine’s PubMed, the public version of MEDLINE, is piloting the inclusion of preprints related to COVID-19. They index preprints as an article type and show that the article is a preprint, along with the name of the hosting repository, in the citation:

Eg, Zietz M, Tatonetti NP. Testing the association between blood type and COVID-19 infection, intubation, and death. Preprint. medRxiv. 2020;2020.04.08.20058073. Published 2020 Apr 11. doi:10.1101/2020.04.08.20058073

Thanks for this analysis, very interesting. Just to note that at Crossref we already notify preprint servers when there is a match with title and authors (details at https://www.crossref.org/education/content-registration/content-types-intro/posted-content-includes-preprints/#00082) and we expect the member who posted the preprint to update the preprint metadata if needed – there is no additional cost for doing so. It would be ideal if publishers collected the DOI of any previous version, such as a preprint, and included it in the metadata when the paper is published. I take the points about the search results, it’s an area that can always be improved.

Following on Martyn’s comment — IEEE’s preprint server, TechRxiv, hosted by figshare, takes advantage of the Crossref service (thank you Crossref!). Following the process that Martyn describes — titles and first-author names of in-coming journal article DOI records are compared against the same elements in Crossref preprint DOI records. See an example at: https://dx.doi.org/10.36227/techrxiv.11474400.

We have noted, though, that errors can occur if a journal is improperly registered at Crossref. For example, SSRN, a preprint server, apparently is not using the Crossref preprint DOI schema and is seen at Crossref as a “peer-reviewed” journal. (The Crossref preprint record contains a status label that clearly indicates that it is a preprint: "… dx.doi.org\/10.36227\/techrxiv.11566725","relation":{},"subtype":"preprint"}}.)

So using the automatic matching process to determine the preprint-peer-review relationship fails in this case — it is matching a preprint against another preprint.

Hi Martyn, as an employee of one of the publishers used as an example above, I’m interested in your “…ideal if publishers…”. Would that be as simple as including @related-article-type in the article metadata, with the value “preprint”?

Thank you for this thoughtful piece!

We at ASAPbio agree that preprint labeling, metadata, and citation practices can be improved, and that these developments would help readers to fairly evaluate work they encounter. With regard to preprint labeling, the burden to improve transparency should not fall to preprint servers alone. There is often an assumption that all scholarly-looking objects are peer reviewed, unless stated otherwise. However, not all journals are peer reviewed and even at peer-reviewed journals, not all content necessarily undergoes peer review. Higher transparency around peer review practices at journals will also be beneficial. More practically, we are working to address preprint server labeling with a new project (https://asapbio.org/preprints-in-the-public-eye); we would be delighted for additional input from readers of The Scholarly Kitchen.

We’re excited to see efforts in this space, and we’re eager to collaborate with other groups on developing standards and solutions to the challenges enumerated here.

Let’s make this happen. Sylvia, Igor, Bruce, Jessica: I’d be happy to host a conversation with you and anyone else who is interested on launching a best practice initiative to deal with issues around preprints as described. I’ll email you all directly, but if anyone else is interested, please email me. My contact details are on the NISO website.

Hi Todd, I’m at ASTM and we recently had a discussion with our Committee on Publications about preprints/best practices. I’d be interested in the group you’re forming, but I can’t find your contact details on the NISO site.

Life comes at you fast, and we already have an addendum to propose!

We found and examined these 5 verifiably withdrawn preprints listed on Retraction Watch (from this list: https://retractionwatch.com/retracted-coronavirus-covid-19-papers/):

Chen S et al. (2020.) Mental health status and coping strategy of medical workers in China during the COVID-19 outbreak [preprint]. medRxiv. Posted 25 Feb 2020. https://doi.org/10.1101/2020.02.23.20026872

Chu P et al. (2020.) Computational analysis suggests putative intermediate animal hosts of the SARS-CoV-2 [preprint]. bioRxiv. Posted 5 Apr 2020. https://doi.org/10.1101/2020.04.04.025080

Pradhan P et al. (2020.) Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag [preprint]. bioRxiv. Posted 31 Jan 2020. https://doi.org/10.1101/2020.01.30.927871

Rimon Parves M et al. (2020.) Analysis of ten microsecond simulation data of SARS-CoV-2 dimeric main protease [preprint]. bioRxiv. Posted 12 Apr 2020. https://doi.org/10.1101/2020.04.10.036020

Yang Y et al. (2020.) Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China [preprint]. medRxiv. Posted 11 Feb 2020. https://doi.org/10.1101/2020.02.10.20021675

The preprint servers themselves indicate withdrawn status clearly and explicitly, in a human-readable way.

However, upon looking at the Crossref records for these items we discovered not only that the metadata had not always been updated (2/5 abstracts on Crossref include a narrative explanation that the authors have withdrawn the preprint), but that in none of these cases did the metadata include a machine-actionable indicator of withdrawn status.

Our findings indicate a need for preprint servers and Crossref to collaborate on protocols to ensure that when a preprint is withdrawn, the associated metadata deposited with Crossref is updated to reflect its new status, in a way that not only humans but also machines can recognize and understand.

Finally, it’s worth noting that all of these examples relate to biomedical research into COVID-19, currently the subject of intense and broadly distributed research—research that is therefore subject to robust critique. That these specific preprints have been withdrawn is the result of a system operating effectively, which unfortunately is not always the case in less publicly prominent research areas.
