While the use of preprints (public posting of an early draft of a paper before it’s submitted to a journal for formal review) has long been established in fields like physics and the social sciences, recent uptake in the biomedical world has raised some concerns. When clinical treatment and public health are involved, extra care must be taken to ensure that it is clear to the reader that the work being described has not been peer reviewed. Most preprint servers handle this well, watermarking their preprints and clearly labeling them as preliminary. But little thought seems to have been given to how we cite preprints. Should we treat them the same way that we treat reviewed and published material?
Where you stand on this largely depends on the purpose that you think reference lists in papers are supposed to serve. If you see them as providing empirical support for any statements made in the paper, then the inclusion of preprints in citations likely worries you. An author could make a dubious claim in a preprint that sees no editorial oversight or review, and then cite that claim as an accepted belief in the field in a subsequent published paper. If you see reference lists as a set of links providing further information, then inclusion of non-peer reviewed material isn’t a big deal, caveat lector.
A journal I work with was recently publicly criticized because they asked an author to remove a preprint from the list of references on an accepted paper. Their reference policy is a traditional one, espoused by many other journals — anything that goes into the reference list must have been peer reviewed. Anything that has not been peer reviewed is treated as a “personal communication” and can be referred to in the paper, but is noted as such. I’ve often heard preprints compared to the equivalent of giving a talk about unpublished work at a meeting, so there is some logic in treating them both the same way when referring to them in a published work.
I asked Richard Sever from bioRxiv about this, and while he conceded that this might have been the “traditional” policy, it is one that has largely become outdated:
I’m not sure that anyone really adheres to that view because for decades people have been happy to include theses, books, editorials, and, more recently, websites, data, and code in those Reference lists, none of which are peer reviewed. I think people sometimes think that but only because they forget all the above that they routinely include. Personal communications are different. There is nothing to actually point to there so it doesn’t make sense to include them in a reference list.
One benefit of citing preprints is that many (often most, depending on the field) are eventually formally published in a journal. Linking to the preprint version, assuming it is on a reputable preprint server, will bring the reader to an updated version that prominently displays a link to the peer reviewed, published version.
The tide seems to be flowing toward more inclusion in reference lists. As the way we communicate research results continues to evolve, it makes sense that our policies toward those communication channels should continue to evolve as well. But as with all things in the scholarly communications sphere, we should strive for clarity and transparency. If we are going to include different types of materials in reference lists, then we should make clear to the reader what they represent. We need a set of easily recognized standards for how this is done.
The National Institutes of Health (NIH), in their document discussing the use of preprints to report interim research results from grants, proposes the following form:
To cite the product, applicants and awardees must include the Digital Object Identifier and the Object type (e.g. preprint, protocol) in the citation. Also list any information about the document version (e.g. most recent date modified), and if relevant, the date the product was cited.
Example: Bar DZ, Atkatsh K, Tavarez U, Erdos MR, Gruenbaum Y, Collins FS. Biotinylation by antibody recognition- A novel method for proximity labeling. BioRxiv 069187 [Preprint]. August 11, 2016 [cited 2017 Jan 12]. Available from: https://doi.org/10.1101/069187.
These requirements help reviewers understand that the product is public and interim, and identify the specific version being referenced.
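The NIH form above is regular enough to generate mechanically. As a minimal sketch (the field names below are my own, not part of the NIH guidance), a formatter might look like:

```python
from dataclasses import dataclass

@dataclass
class PreprintRef:
    """Fields the NIH-suggested citation form calls for (names are illustrative)."""
    authors: str      # "Bar DZ, Atkatsh K, ..."
    title: str
    server: str       # e.g. "BioRxiv"
    archive_id: str   # server-assigned accession number
    object_type: str  # "Preprint", "Protocol", ...
    posted: str       # version date (e.g. most recent date modified)
    cited: str        # date the product was cited
    doi: str          # Digital Object Identifier

def format_nih_citation(ref: PreprintRef) -> str:
    """Render a reference in the NIH-suggested preprint style."""
    return (
        f"{ref.authors}. {ref.title}. "
        f"{ref.server} {ref.archive_id} [{ref.object_type}]. "
        f"{ref.posted} [cited {ref.cited}]. "
        f"Available from: https://doi.org/{ref.doi}."
    )
```

Fed the fields of the NIH example above, this reproduces the citation string shown, including the bracketed object type and the cited date.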
Cold Spring Harbor Press, the publisher behind bioRxiv, agrees with this methodology, and includes the label “PREPRINT” after the server name in citations in their journals. There’s a flexibility in this approach, as any new type of object could be clearly labeled in the reference to alert the reader (e.g., WEBSITE; BLOG COMMENT; etc.). One concern, though, is that the reader doesn’t get this information unless they dig down to the reference. The casual reader may continue to skim along and assume that the concept holds as much weight as any other reference in the paper. One way to resolve this would be to include the descriptor in the reference callout in the text (e.g., Smith et al., 2018 PREPRINT).
Other suggestions include putting non-peer reviewed material into a separate reference list. This would create greater editorial overhead though, as one would need to carefully delineate which references go where, and some, such as a book where one doesn’t know the review history, would remain ambiguous. One could also use the different DOI category for a preprint to automatically create some different way of displaying the reference (different colors for different sources?), but again, this may be difficult to standardize (and would entirely go away when someone prints out a copy of the PDF on a black and white printer — yes, people still do this).
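The DOI-category idea in the paragraph above can be illustrated with Crossref metadata, where preprints are deposited with the work type posted-content. This is a sketch, not a production rule set; the particular label mapping is an assumption for illustration:

```python
def reference_label(work: dict) -> str:
    """Map a Crossref-style work record to a display flag for the reference list.

    Assumes the Crossref REST API's `type` vocabulary, in which preprints
    are deposited as "posted-content". The labels themselves are illustrative.
    """
    crossref_type = work.get("type", "")
    if crossref_type == "posted-content":
        return "PREPRINT"
    labels = {
        "journal-article": "",  # peer-reviewed article: no flag needed
        "book": "BOOK",
        "dataset": "DATA",
    }
    # Fall back to showing the raw Crossref type for anything unrecognized.
    return labels.get(crossref_type, crossref_type.upper())
```

A journal’s production system could run something like this over each reference’s DOI metadata and append the flag automatically, which is the kind of automated tooling (and expense) discussed below.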
No matter the final chosen standard, this means extra work and extra expense for journals. We know we won’t be able to rely on authors to implement these changes, so that means careful review of manuscript reference lists to flag and characterize any suspect citations, or at least the expense of building automated tools to do so, and the time and effort needed to run them. As Tim Vines pointed out recently, advocates for change and new ways of publishing have a habit of unintentionally creating these sorts of editorial headaches.
Regardless, the use of preprints continues to roll forward and other than wreaking havoc in financial markets, so far seems to present benefits that outweigh the concerns raised. More work needs to be done though, and we need a broadly accepted standard for citation of preprints and how this is made clear to the reader. I’m told that COPE is working on a set of “best practices” for preprints which should be out soon and will help push the conversation forward. The role of publishers is not to try to prevent change, but to adapt to change and find ways to preserve quality, transparency, and trustworthiness of the scholarly literature, and preprint citation is an opportunity for us to provide responsible stewardship.
53 Thoughts on "Preprints and Citations: Should Non-Peer Reviewed Material Be Included in Article References?"
If I was in charge of the world I would have two lists — traditional References and then Other Source Material — code, websites, preprints, etc.
Rachel, you only work with publications that use parenthetical citations and then a reference list, not footnotes?
Yes, that is right; footnotes are very rare and not encouraged. I see how that could be another solution. Do you put website citations, for example, in footnotes (or parentheses) instead of the References list?
I’m not sure I understand your question. In footnote styles, you put all citations in the footnotes.
Lisa, I can’t speak for Rachel (who I don’t know), but in the journals which are specific to meteorology/climatology that I’ve worked with, references are indeed cited parenthetically and then listed in a reference section. Footnotes are occasionally used for parenthetical comments that would be disruptive if left in the body of the text (which I suppose technically makes them not parenthetical, but I can’t think of a better adjective). Footnotes are not particularly encouraged, although they are not forbidden either; you occasionally need to fight for them.
Another important point here is that inclusion of citations in the reference list, rather than the main text, is necessary for them to be identified, linked and counted by Google Scholar. Failure to include these and other non-traditional outputs is thus a disservice to authors who generate them and hides edges in the network from discovery tools like Google Scholar, reducing its completeness and utility.
Good point. Given that much of academia’s career advancement and funding structure is based around citation, excluding things like code and data from being cited puts some productive researchers at a disadvantage at a time when we are actively encouraging them to offer this valuable material to the community.
There has been a similar (more traditional) problem in the medical literature, where authors cite an abstract or conference paper published in a non-peer reviewed journal supplement. Often, the only way to distinguish a supplement citation is the letter “s” in the page number. Unfortunately, citing authors may drop the “s”, whether purposely or not, rendering the reference indistinguishable from a peer-reviewed paper. Greenberg documents this and other citation distortions; see:
Greenberg SA. 2009. How citation distortions create unfounded authority: analysis of a citation network. BMJ 339: b2680. https://doi.org/10.1136/bmj.b2680
I find this topic very interesting because I’m writing a paper on open access and would like to use what David Crotty had to say in his October 4, 2017 column, “Study Suggests Publisher Public Access Outpacing Open Access; Gold OA Decreases Citation Performance.” The issue for me is that his column is based upon a preprint. While he strongly criticizes many aspects of this preprint, I would like to quote some of his comments in my article.
What advice would the author of that column and today’s column give me?
Quote away! Note that the preprint in question has now been published:
I’ve not had the time to read the final version of the article to know how many, or if any at all, changes were made to address the questions I raised, as well as those raised in the comments on my blog post. So hopefully the article was improved, but the process does point out one of the issues of preprints for journals. The article got a lot of attention as a preprint, was blogged about and written about in publishing news articles. Now that it’s formally out, no one is going to repeat that same coverage, so if things like Altmetrics are important to you, the preprint has stolen the journal’s thunder.
Thank you very much for the prompt reply and the link to the published article, which I will now include directly even if it is technically outside the publication date range that I established for my article.
I’ll add that the two main resources for this article are the research studies that I found in Library literature & information science full text and columns in the Scholarly Kitchen, both with a limiting date of January 1, 2015 to February 15, 2018.
(Parenthetical … PeerJ using an open review process makes this much easier to compare, at least between versions … just download the “author rebuttal” letters. It is interesting to me that these are labeled rebuttals rather than responses, when much of the content is agreeing with, not rebutting, the reviewers. But, that’s a different topic!)
Thank you Bob and David for making my point for me. Please see my comment below.
It is not true that preprint publications are a “recent uptake in the biomedical world” (see Cobb, PLOS Biology 15:e2003995). In the 1960s we graduate students were always on the lookout for the latest batch of paper preprints (from the NIH) in the lab director’s office.
The “experiment” lasted a few years, and we recognized it as such, so were not too surprised when NIH declared it a failure (unlike our surprise when the NCBI recently dubbed its wonderful PubMed Commons commenting facility an “experiment” that had failed).
My own thought in the 1960s was how unfair it was for those who were not on the circulation list. By the same token, today’s disenfranchised folk on the wrong side of journal paywalls are helped by preprint servers which generally do not require overcoming this obstacle.
Thus, the 1960s paper preprints were not really “publications” to the extent that circulation was restricted, so they were not really public. Today’s preprints are truly public, so deserve to be called “preprint publications.”
If I were in charge, there would be a mechanism for having authors update pre-print references to post-review references. As a reader I want to know the following.
1. Which journal accepted the cited pre-print, if any?
The implications of journal reputation have been discussed in this blog many times.
2. Did the pre-print stand up to peer review or were there major changes to procedure, data, analysis, or conclusions due to reviewer/editor comments?
In other words, the citation must be applicable in context, or the scientific record for the citing paper should be changed. Ideally the citation can be updated during the citing paper’s review cycle. However, it still should be done post-publication via erratum or other mechanism.
I would think authors would rather have citations for the final version as well.
Any author who cites a preprint runs the risk that, if and when the cited work gets peer reviewed and published, the finding may change, subtly or otherwise. Authors updating multiple references during review or page proofs would be messy enough, potentially slowing the publication of peer-reviewed content, but authors updating them after publication is also problematic. An erratum for every updated reference?
How about, instead, an XML tag that programmatically pulls in and displays the updated peer-reviewed reference at any point after publication, without deleting the preprint reference, for full transparency?
But who or what would associate the updated reference with its preprint forerunner? We may need the preprint servers to do that. David states that reputable servers will bring the reader to the updated version (and bioRxiv does do that). But not every notable preprint server out there does it. If preprints are going to be posted and referenced, I think all preprint servers need to commit to linking programmatically to any updated version of the preprint.
I think you’re on the right track here. Jonathan’s suggestions have merit, but I’m not sure how they would be accomplished without a huge amount of additional expense and effort. But as I understand it, Crossref has a system that can find the formally published versions of the preprints they’ve issued a DOI for, and that then gets used by a repository like bioRxiv to automatically add a link to the published version on the preprint. As you note, not all repositories do this, and personally, I think it’s an important criterion for choosing a preprint server.
Just a quick clarification that it is bioRxiv that identifies the match with the journal article and passes this on to Crossref – not the other way round.
Thanks, Richard. It’s fantastic that bioRxiv does that. That, along with bioRxiv’s programmatic searching to match up subsequently published versions with the preprints, really raises the bar for preprint servers. I think establishing best practices such as these will help all researchers going forward.
I wonder why this issue is not thrown to the editors and the editorial board instead of the publishers.
Certainly where I work this is the case — the editors and the editorial board set policy on such matters, and we supply background information, advice and support. But ultimately it is their decision.
That said, if new infrastructure is needed to support those policies, then it falls to publishers to build and maintain that infrastructure. Our editors are busy enough without asking them to code automated systems for recognizing different DOI characteristics in reference lists.
Why do you conclude that the publisher has a say regarding this matter? I suggest you look at the SK issue on what publishers do.
I am responding to the discussion in the Kitchen. And while publishers can claim that their thumbs are not in the Christmas pie, there are purple thumb prints all over the place.
There is such an interesting underlying issue here … is the point to cite what you used in developing your work or to curate a list of the “right” things?
I am not a fan of pre-publications. For one thing, we have no guarantee that the pre-pub will actually end up being published as a peer-reviewed article. If this practice becomes acceptable, then we must seriously re-think the purpose of peer review. If, however, any and all contributions are in some way “another tick on the tenure meter,” where have our standards gone? I also am perhaps more concerned than I otherwise might be due to consistent errors in the lit reviews I have seen in manuscripts I have reviewed. I appreciate the opportunity to comment on this issue, and if I have “already said this” then perhaps it is worth saying again.
As I recall it, at Molecular Ecology we asked authors to cite preprints as (Leroy et al, unpublished) in the text, and point to the preprint DOI in the references. If the paper was accepted for publication, the typesetters asked the authors to check all the ‘unpublished’ citations to see whether they should be updated to point to a published article.
So, Molecular Ecology assumes that the final version does not differ substantially from the preprint?
This is an important point, particularly when one considers humanities journals, which do an enormous amount of revision and editing post-acceptance. The final, published version is very different from the accepted version, and the claim or quote being cited from the preprint may not even exist in the final, published version.
The claim here is that “the typesetters asked the authors to check all the ‘unpublished’ citations to see whether they SHOULD be updated to point to a published article.” If it has substantially changed, then the authors should say that it SHOULD NOT (or, depending on the situation, possibly the text should change (which in turn may lead to other actions)). It is not update automatically or by the typesetters.
Thanks for raising this point, David. It’s hard to get across how important that post-acceptance stage of writing via editing-revising-checking is for humanities journals. The article is simply not the same as the pre-print. We’ve done the forensics for our journal (William & Mary Quarterly) on the number of skilled editorial hours that take place post-acceptance and pre-typesetting–for substantive editing by experts, manuscript editing by also-grad trained more-than-copyeditors, and source verification by our editorial apprentices. But I think we likely need to do a graphic that captures the difference between the two stages of an essay. Another post! But it’s why the pre-print, while some see it as such a valuable and accessible form of scholarship, is so problematic.
Proposals to create a second reference list for non-journal products or to “warn” readers that the work is a preprint within the text itself present a false dichotomy between papers on preprint servers (which may include versions revised in response to both formal and informal peer review) and papers in journals (which, in the case of some “predatory” journals, may not have been peer reviewed at all). As an alternative, additional information in the reference list (analogous to NIH’s preprint tag) could be used to inform readers of *how* the work was peer reviewed (or better yet, link to the reviews themselves, if the reports are open). Some interesting ideas on metadata to describe the peer review process were discussed at a meeting earlier this year: https://www.prtstandards.org/
For the record, I know many journals that work very hard to carefully go through reference lists for accepted manuscripts and require the author to eliminate or replace any citations to journals that they consider unreliable or predatory (this is another area that would greatly benefit from automation). And if some slip through, that’s still not a justification for letting everything through.
But essentially, I think it’s less a dichotomy and more an attempt at clarity and transparency. It’s more informative for a reader to know what is being cited, and we can fairly easily spell out some basic categories. Like you, I don’t think separate reference lists is a good solution. But I would have a big problem with any attempt to make judgment calls to inform readers “how” a citation was reviewed. That seems like an enormous amount of overhead for editorial offices, having to carefully investigate every reference in a paper, then dig into the source to find their policies and practices. Most preprints don’t receive any public comments, so that means you’d miss any private reviews that were sent to the author. Would we need a separate tag for megajournals that only review for soundness and not significance? Seems like a huge can of worms that’s not worth the effort.
I understand this must be done on a case-by-case basis and with great care, but I find the idea of a journal compelling authors to remove citations to “predatory” journals troubling. Could this not force them to plagiarise that information?
I completely agree that editors shouldn’t be burdened with judging how peer review was performed. But perhaps peer review reports could be automatically linked from citations with Crossref data. Or, in the model apparently discussed at the PRT meeting (as I understand it), publishers could deposit metadata describing the peer review process each article underwent (# of reviewers, whether blinded or not…) which could then be automatically added to a citation of that article. I emphatically support your goal of making more information available to readers in an efficient way. I just think that labeling *only* preprints tells just one part of the story.
This is such a great example of disciplinary differences. To leave out a source that was influential in forming one’s thinking, regardless of its quality, would be scholarly malpractice in many disciplines. Of course, the text itself would include an assessment of the influence of the piece and its quality, etc.
As a physicist, I couldn’t agree more. Good scientific practice requires that any statement that is not the authors’ own has to be backed up by references, no matter whether they are to a peer-reviewed journal article, a book, conference proceedings, a preprint or whatever.
David, thanks for drawing attention to this issue. Preprint server managers are in discussion about a number of issues of common concern, and the method of citing preprints is one of them. No consensus yet but I think most would agree that having preprint citations in the reference list is preferable to putting them in the text where they will not be machine read. The style of citation needs more discussion. CSHL Press does not use the full cap style you mention: instead it uses the format bioRxiv recommends to everyone: Author A, Author B. 2013. Article title. bioRxiv doi: 10.1101/123456.
To Jonathon’s point above, yes, a preprint should link out to a published version and it can be done programmatically. bioRxiv does this already, finding the links through a script written by our colleague Ted Roeder that crawls PubMed and CrossRef for matching titles and authors. It is successful unless those identifiers have changed significantly in the preprint’s transition to publication.
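This is not Ted Roeder’s actual script, but the title-normalization step such a crawler depends on can be sketched along these lines (the matching rule here, exact equality after stripping punctuation and case, is an assumption; real matching would need to be fuzzier):

```python
import re

def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace so that
    near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def titles_match(preprint_title: str, article_title: str) -> bool:
    """Crude check that a published article title matches a preprint title."""
    return normalize(preprint_title) == normalize(article_title)
```

In practice a crawler would also compare author lists and tolerate larger title edits, which is why matches fail when those identifiers change significantly in the preprint’s transition to publication.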
Finally, a gentle but important correction: CSHL Press is not “the publisher behind bioRxiv”. The bioRxiv service comes from Cold Spring Harbor Laboratory, an academic institution with (as you know) a long history of assisting science and scientists in a variety of ways. Behind bioRxiv there are in fact numerous publishers: all those whose journals are willing to consider manuscripts that began life on bioRxiv and, in particular, the increasingly large number who collaborate with bioRxiv to enable manuscript transfers from their journals to the server and from the server to their journals. Those integrations have been important in sustaining bioRxiv’s current momentum.
Yes, absolutely, put them in the citations, just be transparent about it. The goal is for the reader to be better informed, so making a clear link to the preprint is important, as is the nature of what is being linked to.
And I apologize for any mis-characterizations — I’d like to blame them on how complicated the world of preprints and standards currently is, but they’re more likely a vestige of my jet-lagged brain. One question — Richard Sever told me via email that, “What we at CSHLP and Cell do is make sure the work PREPRINT is there in large letters.” Is this NIH-recommended style not the case for preprint citations in journals that CSHLP publishes? Let me know so I can correct the text above.
And as far as CSHLP’s role in bioRxiv, I’m sorry if it goes against your Scottish nature to accept praise, but your group deserves an enormous amount of credit here for your vision of a biomedical preprint server that would be community-owned, rather than in the control of a private company. This is increasingly important in an era of increased consolidation and lock-in in scholarly communications. Your (and Richard’s) vision is what got this whole thing off the ground, and while it has grown in scale and participants, I think it’s fair to give your organization kudos here.
It seems like ‘private communication’ disappeared from the citation list. Maybe it is only my impression.
Folks, this one’s easy.
As a matter of academic integrity, you must cite everything you meaningfully consulted in the preparation of an article. The traditional way to do this is through citations and/or inclusion in a bibliography. Leaving out sources you consulted–preprints or no–runs the risk of appropriating ideas that you have not given credit for, and thereby passing someone else’s research off as one’s own. At every academic institution I have ever studied, taught, or worked (in publishing) at, we call that plagiarism. Cite and give credit to everything you consulted for the research. Period.
Technical problems with preprint servers and evolving standards of formatting for how to signal preprints in a reference list do not absolve scholarly authors from this basic principle of academic integrity.
I’m not sure it’s that black and white. Citation practices vary enormously from field to field, and as noted in the post above, credit is frequently given through other means such as noting “personal communications” in the text of the paper for information from a colleague or seen at a meeting presentation for example. Your general point stands though, we want to give credit where credit is due, it’s more a question of finding the best form for doing so.
Karin Wulf wrote a great piece a few years ago, asking just what exactly citations mean:
I agree that the principle, at least, is easy–but you might be surprised at how many writers find citation difficult.
Identifying non-peer-reviewed articles is not difficult. For many years the New Zealand Veterinary Journal has identified all non-peer-reviewed material in the reference list with an asterisk linked to a footnote. This works in monochrome as well as colour, although it does require some editorial input.
Thanks for the good summary of issues around citation of preprints, it definitely merits some air time. There seems to be an assumption in most of the discussion above that citation implies that the author endorses and trusts the cited work, but of course this isn’t necessarily the case. There are a number of reasons to cite work, including that they disagree with your own work. This is a flaw inherent in using number of citations as a measure or quality – whether we’re talking about preprints, articles or anything else.
Thanks David, a bit of anecdotal information on this from the alternative metrics world, and from my colleagues at Altmetric. It’s also the case that other sources, like Wikipedia and news outlets, do not specifically call out that a citation is to a pre-print, although in this example Wikipedia does name the citation as coming from bioRxiv: https://www.biorxiv.org/content/early/2018/02/20/191569.article-metrics (click on the Altmetric donut to see all the Wikipedia references), and here is the Wikipedia page for one link: https://en.wikipedia.org/?curid=4004434. From a few spot checks it seems the Altmetric attention to the pre-print is about the same as the attention to the full article. I know there are discussions about finding better ways to link the attention and conversation captured on the pre-print to the final published article, and I’m not sure of Wikipedia’s policy on citing the final article vs. the pre-print once a fully peer reviewed and published article is available. No doubt this whole topic needs more research, study, and discussion, but the point is that it’s not only research articles that cite pre-print articles.
Considering that peer review did not become standard until well after 1960, that none of Einstein’s papers in his miracle year were peer reviewed, and that the bulk of scholarly books are not peer reviewed, combined with the imperfections of peer review, there is really no point. This is much ado about nothing. And what about journals that have varied, or do vary, whether content is or is not peer reviewed? The whole notion is absurd.
I raised concerns about citing preprints in the physics & astronomy literature back in 1998: http://crl.acrl.org/index.php/crl/article/view/15233
Not so much about the practice, but about the need for access to, and permanence of, the cited work for future generations. Once the preprint is cited in a published peer reviewed source, its legacy is established, and the item and the mechanisms for accessing it need to be preserved.
In economics, working papers (I take it that preprints are just a term used in the natural sciences for working papers) have been around for a long time. When it comes to literature reviews, I think it would be wrong to not cite them if they are closely related to your paper. A key purpose of a literature review is to show how your paper fits into the literature and makes a contribution. If you don’t mention a closely related paper (published or unpublished) you are wrongly implying that your paper is a bigger contribution than it may be.
Also, if anyone is really interested in the citation, they will check your references and see it is a working paper. Thus, I really don’t see a problem. Moreover, I will note that some of the best papers often take the longest to get published as their authors are holding out for top journals. Thus, working papers are a way of disseminating the information more quickly than would otherwise be the case.
I am intrigued this discussion and I definitely see merit in the ability to cite preprints (and unpublished datasets), provided that these are carefully labeled as such, and are properly archived so that others can access them. However, I note that this entire discussion seems to be focused on the felicitous situation where some authors would LIKE to be able to cite unpublished work found on a preprint server, presumably because that preprint lends some degree of support to their own work, or further illuminates it. Fine and dandy!
But what about the opposite situation, where a reviewer (or editor) issues a request to authors during the peer review process that they cite, and also discuss in their manuscript, someone’s preprint (possibly the reviewer’s)? Should including such a citation ever be a pre-condition of publication in the peer-reviewed journal? And what if that same preprint comes to very different conclusions, or contains an analysis that the authors disagree with, and are inclined to disbelieve? (This is not some idle musing; it’s actually happening to a paper of mine under review right now!) The authors may understandably be reluctant to cite a preprint for several reasons. First, the data or conclusions found in a preprint, which the author believes are suspect, may change prior to publication, rendering any attempt at discussion moot. Second, the publication of the authors’ own peer-reviewed paper may be the agent of such a change, prompting the unpublished authors to review and revise their work. And third, the request by a reviewer to cite an unpublished work from a competitor of the authors in their peer-reviewed publication comes across as a priority claim. So, does our current understanding of scientific priority need to change, so that we now accord it to the first preprint uploaded, and not to the first peer-reviewed paper? If so, this has the potential to set off a race to upload half-baked efforts to servers as placeholders, to the detriment of scholarship. I hope others will consider this scenario carefully and propose how best to deal with it. I look forward to your comments.