Pinocchio, when you wish upon a star
Image via Sam Howzit

What do you do as a scientist when you really, really want something to be true and yet the evidence just isn’t quite there? If you stay true to the scientific method, you draw only the conclusions that the data support, carefully note the limits of your study, and take great care not to make claims beyond those limits. Unfortunately, far too often, authors overstate the significance of their work.

One of the key responsibilities of journal editors, as well as peer reviewers, is to hold authors to a high standard. Authors should not be allowed to state any conclusion that is not fully borne out by the data. Speculation is fine, as long as it is clearly stated as such. As a former journal editor-in-chief (and frequent peer reviewer), I found this to be probably the most common problem in submitted manuscripts. The authors have a theory as to how their system works, and while the described research supports that theory, it does not prove it conclusively. This does not, however, prevent the authors from claiming that it does. A good editor/reviewer walks back these claims to an appropriate level. Sometimes this means changing words like “necessary” to “sufficient”, or “proves” to “suggests”.

This is particularly important for observational studies. Unlike experimental studies, observational studies tell us more about correlation than causation. Unless great care is taken to eliminate all confounding factors, the conclusions from an observational study must be very carefully phrased to avoid overstating their significance. “X correlates with Y” is often a better-supported conclusion than “X is responsible for Y.”

This seems particularly problematic for bibliometric studies, especially those done on the question of an open access (OA) citation advantage. Observational studies have regularly been done, and continue to be done, looking at whether open access to a research paper results in that paper receiving more citations than it would have if it were published under a traditional subscription model. SPARC Europe lists 70 such studies, the majority of which claim to show a citation advantage.

The problem with looking at raw numbers of studies is that, of course, popularity isn’t what matters in science; rather what matters is rigorous methodology and accuracy of conclusions. One well-done study with the right controls can overturn the conclusions of thousands of previous studies. That’s how science works.

The studies that claim an OA citation advantage are observational, generally comparing the performance of one set of articles to another, usually OA articles in a journal versus subscription articles in the same journal. At best, this can show you a correlation — the OA articles correlate with higher levels of citations (just as higher ice cream sales correlate with higher levels of murder). But nothing can be gleaned as to causation because there are too many confounding variables that come into play. The two sets of papers being compared must be as identical as possible, with the exception of the variable being tested (access). But for Gold OA studies, common sense tells us that authors with money to pay for OA may come from better-funded labs, and so the citation advantage perhaps then stems from superior funding levels. Similarly, many authors state that they save their OA funds and use them only on their top papers, not paying for OA on their more average or lower-quality outputs. So the citation advantage may just as likely be due to selection done by authors.

To truly test causation, one must perform a randomized, controlled trial, and when one does that, the direct connection between OA and increased citations does not appear to be reproducible. In an era where the reproducibility of published research is increasingly questioned, these studies should receive increased skepticism because of this failure. The problem gets even more pronounced when advocacy and lobbying groups or commercial companies loudly declare an unclear correlation to be an established causal fact.
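To make the selection-bias argument concrete, here is a minimal simulation sketch in Python, with entirely hypothetical numbers (nothing here is drawn from the studies discussed): open access has no causal effect on citations in the model, yet letting authors self-select their stronger papers for OA produces an apparent citation advantage, one that disappears when OA status is assigned at random.

```python
# Minimal sketch with made-up parameters: OA has zero causal effect on citations here,
# but self-selection of stronger papers still yields an apparent "OA advantage".
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

quality = rng.normal(0.0, 1.0, n)                      # latent paper quality
citations = rng.poisson(np.exp(1.0 + 0.5 * quality))   # citations depend on quality only

# Observational scenario: authors are more likely to pay for OA on better papers.
oa_selected = rng.random(n) < 1 / (1 + np.exp(-2 * quality))

# Experimental scenario: OA assigned by coin flip, as in a randomized controlled trial.
oa_random = rng.random(n) < 0.5

print("self-selected OA mean citations    :", citations[oa_selected].mean())
print("self-selected non-OA mean citations:", citations[~oa_selected].mean())
print("randomized OA mean citations       :", citations[oa_random].mean())
print("randomized non-OA mean citations   :", citations[~oa_random].mean())
```

In this toy model the naive comparison rewards the selection, not the access, while the randomized split shows no difference at all.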

This has been discussed and discredited ad nauseam (see here, here, here, here, and here on this blog alone), and yet I continue to regularly field questions about it from journal editors. It’s a subject that simply refuses to die, likely because many people really want it to be so, regardless of what the evidence says.

And so, more and more researchers, hoping to find an OA citation advantage, repeat the same mistakes. Either those pursuing these studies lack sufficient training in experimental design and the use of controls (this book is still my go-to for explaining necessary controls), or this is a case of advocacy overwhelming analysis. For example, in the last few weeks, two studies claiming citation advantage have been released.

The first comes from the companies 1Science and Science-Metrix. They perform a typical observational study, see a correlation, and declare that “a citation advantage exists for OA papers.” They note previous objections about selection bias in studies such as theirs and apparently decide to ignore the issue. The authors acknowledge that selection bias may be a major factor, yet do not temper their conclusion in any way (other than suggesting that a future study will rule out selection bias as a factor). To be fair, though, this is not a formally published scientific study. It has not been peer reviewed, and given the obvious conflicts of interest of the authors, whose livelihood depends on selling OA-related products and services, it can be dismissed as commercial marketing rather than objective analysis.

More problematic is a formally published paper that passed through the peer review and editorial process. In this paper, the author attempts to deal with selection bias, but rather than eliminating it, he ends up replacing the typical selection bias seen for such studies with a different flavor of selection bias. The study claims to use “the equivalent of a random sample” of articles by taking its experimental population from the University of Michigan (UM) repository, Deep Blue. These are compared to articles from the same journal issues that are not found in the repository.

But this is far from a “random” sample. The study is essentially comparing articles by authors from UM to those by non-UM authors. UM is rated one of the top 30 universities in the US, and has been declared the top “Public Research University in the US” by the National Science Foundation. Perhaps there’s a citation advantage to being a UM researcher? Further, there’s no evidence presented that the control group of “closed access/subscription” articles is indeed “closed”. The author apparently assumes that if an article is not freely available in the Deep Blue repository, it is not freely available anywhere. The data set available has been “anonymized” and all of the articles de-identified. There is no way to use it to confirm the results, nor to check whether “closed” articles are indeed “closed”. It is thus impossible to know if the study compares OA articles to subscription articles, or OA articles to other OA articles in different repositories (or freely available in the journal itself, as many journals make articles free after some embargo period). The citation advantage seen may be due to the efficacy of Deep Blue as a repository versus other, less discoverable outlets, rather than a direct effect of OA itself.

While the author does hedge his conclusions (even including a “probably” in the title), his claim that self-selection bias has been removed is negated by the myriad other confounding factors the experimental design has introduced. Without adequate controls, there is no way to tell from the study the causation of any effect seen. To do that, the gold standard is a randomized, controlled trial, but these are much harder to do than observational studies.

And so we have a literature that largely consists of lesser-quality correlative studies that are contradicted by the small number of those with a more rigorous experimental design. There may indeed be an OA citation advantage, but until there is reproducible experimental evidence that it exists, it remains, at best, speculation. Performing more and more correlative studies without proper controls will still not be adequate to prove causation. In science, quantity should not trump quality.

The likely next battleground for this type of wishful thinking is the impact that social media plays on readership and citation of scholarly articles. Initial observational studies and anecdotes suggest a correlation between a paper appearing in various types of social media and increased readership and citation. Yet two randomized, controlled trials done by the journals Circulation and International Journal of Public Health showed no impact of social media activity whatsoever on readership (both studies) or citation (the latter study). Both studies are fairly limited, but the authors do a good job of carefully stating their conclusions — neither makes a blanket statement and both suggest the subject to be complex, rather than social media being without value or providing an automatic boost across the board.

As with the OA citation advantage studies, conflicting agendas will come into play here, with both advocacy and financial gain as potential factors in the conclusions one draws.

There remains something distasteful, if not downright unethical, in suggestions that authors can buy a better reputation by paying for services, whether OA or promotional. Because of this, we must not lower our standards for scientific rigor, even if we really, really want something to be true. Bibliometricians, if you’re going to work in this area, you need to step up your game, and journal editors, you must hold them to the same high standards to which you hold other authors.

Remember, “I’ll believe it when I see it” is science. “I’ll see it when I believe it,” is religion.

David Crotty

David Crotty is a Senior Consultant at Clarke & Esposito, a boutique management consulting firm focused on strategic issues related to professional and academic publishing and information services. Previously, David was the Editorial Director, Journals Policy for Oxford University Press. He oversaw journal policy across OUP’s journals program, drove technological innovation, and served as an information officer. David acquired and managed a suite of research society-owned journals with OUP, and before that was the Executive Editor for Cold Spring Harbor Laboratory Press, where he created and edited new science books and journals, along with serving as a journal Editor-in-Chief. He has served on the Board of Directors for the STM Association, the Society for Scholarly Publishing and CHOR, Inc., as well as The AAP-PSP Executive Council. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing.

Discussion

47 Thoughts on "When Bad Science Wins, or 'I'll See It When I Believe It'"

If the history of science and science and technology studies have shown us just one thing, it is that science and scientific work are complex (like most other things we encounter). The depiction of science as the search for facts just waiting to be found and exposed, where “experiments” are the main route to finding these facts, is at best juvenile, at worst self-delusional. Science is a building process in which the scientist, through all sorts of methods and approaches (including experiments if possible), attempts to construct an image, a depiction of something. That something may be the speed of light, climate change on Earth, crime, political parties, hydrogen, etc. As Heisenberg pointed out some time ago, observing changes the observed and the observer. So scientists change, help create what they observe, and are in turn changed by what they create. This process is thus complex. Your simplistic description of it does a disservice to science, scientists, and the events and actions being constructed. It also undercuts the relevance and usefulness of science work. Apparently you do not accept that “clockwork” science doesn’t work.

David really wanted to write about OA, but he began with generalizations: “One well-done study with the right controls can overturn the conclusions of thousands of previous studies. That’s how science works.” No problem with the first sentence. But the second needs some qualification. In the long run it is correct, but quite often in the short term it’s not “science,” but science politics that “works.” If those responsible for the “thousands of previous studies” have the peer-review high ground it may take some time for the focus to shift.

All TA vs. OA citation advantage research seems very superficial — and generally driven by commercial interests on both sides — because it merely looks at the numbers without understanding how those numbers came to be. In a truly experimental setting you’d find a whopping OA citation advantage, though.

Here’s the design so anyone can run the experiment and conclude for themselves:
1. Found two journals sharing the same editorial board and review process; one TA, the other OA.
2. Simultaneously publish each accepted manuscript in both journals.
3. Let the journals simmer like that for 5 years.
4. Tally up the citations to both journals and see which one has more.

Result: the OA journal has significantly more citations.

It pains my heart to even think how much cash and time has been, and apparently continues to be, thrown down the drain to fuel this essentially pointless debate.

I suspect there would be great objection and confusion if the same articles were published twice with different citations. It might look bad for an author if they had to continuously explain that they weren’t deliberately doing something unethical.

I would instead suggest the protocol used in this study. The results may surprise you:
http://www.fasebj.org/content/25/7/2129

Well it’s supposed to be performed as a thought experiment rather than a practical one.

That study… Yeah, I’m no Gordon Ramsay so I’ll refrain from putting a chef’s creation on blast in his own kitchen. It is fairly limited, though, and it would be interesting to see a similar study exploring not only the number of citations, but also the related citing author data compared with journal subscriber data. Is there a study like that anywhere out there?

As it stands, no, it’s not surprising at all that the OA citation advantage doesn’t manifest itself if a given journal’s core readership subscribes to it in the first place.

3,245 articles in 36 journals in the sciences, social sciences and humanities from 7 different publishers is too limited?

As it stands, no, it’s not surprising at all that the OA citation advantage doesn’t manifest itself if a given journal’s core readership subscribes to it in the first place.

This doesn’t logically track. Nearly every observational study looks at hybrid journals and compares subscription articles in those journals to OA articles in those same journals. If subscription to the journal by a “core readership” was a confounding factor in the Davis study, wouldn’t it have had the same effect in the observational studies that show the opposite result?

What you’re suggesting is that OA articles are cited more by non-journal subscribers than non-OA articles, but that’s not the question being asked here. The question is whether overall, OA articles are cited more than non-OA articles (by everyone). One would think that if there’s a huge body of non-subscribers adding in citations only to the OA articles, then those would have an overall citation advantage, which was not seen.

What I meant is that it’s very broad but lacking in depth. Plus there’s a hefty ‘Limitations’ section in there, so I assumed it’s really not that controversial to point out.

If subscription to the journal by a “core readership” was a confounding factor in the Davis study, wouldn’t it have had the same effect in the observational studies that show the opposite result?

I didn’t check out the other studies since you recommended this one as the legit one, but I assume it does have the same effect in those, too. If I were to guess, it likely skews the other way due to journal selection bias. On the flip side, the study you link mostly focuses on society journals, commonly subscribed to and read by society members themselves, especially if subscription is included in membership fees. It’s a bad place to look for the elusive OA citation advantage by default.

The question is whether overall, OA articles are cited more than non-OA articles (by everyone).

Do you feel that a five year old study of articles published in 2007 still reliably answers that question to this day?

Can you explain what you mean by “depth”?

And a hefty limitations section is a good thing — it means the author recognizes the limitations of their research and is making an honest effort to fairly evaluate the data without over-reaching.

I didn’t check out the other studies since you recommended this one as the legit one, but I assume it does have the same effect in those, too.

Actually just the opposite. Davis did a randomized, controlled study of articles within the same hybrid journal and saw no citation advantage. Others did observational studies on articles within the same hybrid journal and saw a citation advantage. If it definitively skews the results one way, it shouldn’t show the opposite effect in both types of studies.

On the flip side, the study you link mostly focuses on society journals, commonly subscribed to and read by society members themselves, especially if subscription is included in membership fees. It’s a bad place to look for the elusive OA citation advantage by default.

Why is this a bad place to do such a study? Do you have evidence showing that society journals are more broadly subscribed to/read than journals without a society association? I’d love to see it if you do. Regardless, should one only look at journals with a poor subscription base? Wouldn’t that bias the results? If the effect is as robust as it is claimed to be, it should be broadly consistent.

Do you feel that a five year old study of articles published in 2007 still reliably answers that question to this day?

That’s a really good question. I’d be willing to bet that if anything, the effect is largely reduced these days due to the factors that Marie McVeigh discusses in her comment here:
https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/#comment-161607
There are very few articles that remain solely subscription access any more, between repositories and piracy sites like Sci-Hub (not to mention quasi-legal sites like ResearchGate and Academia.edu). There is no longer as fine a line between “open” and “closed” these days, so even if there is some citation advantage, it would be reduced as more articles are more open than ever before.

Let’s do a thought experiment (one I first proposed in 2005): what if all articles were OA? Scholars could access, read and cite anything they wanted or needed? Articles by rock-star authors, or in highly respected journals would be cited more than articles by less well-known authors, or in less familiar outlets (reputation effect). Articles with lots of citations already would get lots more (Matthew effect – but also a simple fact of awareness). Articles in high-volume fields with short half-lives of research outputs would be cited more quickly and more than articles in smaller or more slowly-moving fields (time and tide effect, let’s call it). Articles in “hot” topics or with revolutionary findings would be cited more quickly (sex-appeal effect).
Good articles – on the whole – would be cited more than poor articles (the quality effect).

This sounds very like the 1980s subscription-only world.
And quite a bit like the hybrid world now.

If there are “OA” effects or “social media” effects, or “as the clock strikes midnight” effects (http://arxiv.org/abs/0907.4740), these might influence one or two citations on an individual article but do not hold up across the whole scholarly output.

Can you explain what you mean by “depth”?

I mean it’s not sufficiently taking the very practice of citing into account; it’s not sufficiently granular. Looking into detailed citing author data and journal subscriber data is required for that. The study glosses over the fact that all TA articles in a journal are de facto OA to its regular subscribers.

Ideally, a citation is born of relevance (to a citing author’s work at hand) and access (by a citing author), in that order. Comparing citations to different articles in the same journal assumes that order to be inverted to begin with. What you want in that situation is to at least control for those who have access either way (i.e. de facto OA) and see if, on average, articles placed in OA accrue more citations by non-subscribing citing authors over time compared with strictly TA articles. If they do, that’s your OA citation advantage at work right there; it’s just hiding a few layers deeper in the numbers.

Otherwise it’s just crunching numbers to essentially prove that the ‘access’ condition is satisfied for most citing authors.

Do you have evidence showing that society journals are more broadly subscribed to/read than journals without a society association? I’d love to see it if you do.

No, but I’m not claiming that either. I’m saying society journals likely have a substantial subscriber base, and a dedicated reader-cum-citer base in their society members, in the first place. If there are society journals willing to volunteer subscriber data, we can verify or disprove it together, because I’d love that too.

Regardless, should one only look at journals with a poor subscription base? Wouldn’t that bias the results?

No, not only, but shouldn’t they be included in a representative sample? Aren’t the results biased if they’re not?

You’re asking research questions that are beyond the scope of the study. This is a simple study. Does making an article Gold OA lead to more citations? Yes, there are differences between articles, but in any medical study, there are differences between individual patients. You overcome this by randomization and using enough examples to provide adequate statistical power. That’s why the correlative studies are questionable — their samples are likely not randomized, leading to biases.
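As a rough illustration of what “adequate statistical power” demands (a generic back-of-the-envelope calculation, not taken from the Davis study or any other study discussed here, and treating citation counts as if a simple standardized mean difference applied), the usual normal-approximation formula gives the per-group sample size needed to detect an effect of a given size:

```python
# Back-of-the-envelope per-group sample size for a two-sided two-sample test,
# using the standard normal approximation (all inputs are hypothetical).
from scipy.stats import norm

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """effect_size is Cohen's d: the difference in means divided by the pooled SD."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return int(round(2 * ((z_alpha + z_beta) / effect_size) ** 2))

# A small assumed effect (d = 0.1) needs roughly 1,570 articles per arm;
# a moderate one (d = 0.3) needs roughly 174.
for d in (0.1, 0.2, 0.3):
    print(f"d = {d}: ~{n_per_group(d)} articles per group")
```

The point is that a handful of articles per arm, however carefully compared, cannot distinguish a modest real effect from noise.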

And if OA consistently offers a citation advantage, shouldn’t that be observable over the spectrum of journals? Or does it only work under very specific conditions? That’s not the claim that has been made nor the claim that’s being tested.

I agree that the FASEB paper’s underlying protocol is good, but it only looked at what happened over a three year period, and the treatment articles differed from the control only in that they weren’t embargoed for the first of those three years.

As a result it says something about the effect of immediately opening an article versus waiting a year to do so, but that wasn’t the norm in academic publishing then, and it still isn’t. That this paper is cited as a significant study of the effect of OA vs. closed/toll access is what surprises me.

Not every study can cover every aspect of every subject. A good study is deliberately limited and focused on the question it attempts to answer. The FASEB study does exactly what it set out to do and makes no statements that go beyond the data collected. It is also a study that employs appropriate controls and attempts to test a phenomenon experimentally, rather than trying to piece together causation from correlation. That is why it is cited as a significant study in this area.

This experimental design makes no sense because no one would subscribe to the TA journal. The a priori “whopping OA citation advantage” claim is merely argument by assertion.

This is a genuine unresolved empirical issue. As with many social science issues it is very difficult to settle, precisely because controlled experiments are not possible. Note that this is also true of other observational sciences, such as astronomy, geology and ethology.

Forgive my ignorance if I’m being dense here, but wouldn’t the above scenario constitute a perfectly controlled experiment? There’s one publisher with two virtually identical journals starting from scratch, and the only variable to watch for is mode of access to articles (reader-paid or author-paid).

Of course, the experimental results would not translate well to the real-world uncontrolled environment in which reader-paid access is the norm.

First of all, it varies enough from the standard model of publication that it would call into question any findings — is the citation behavior affected at all by double publication? Does the attention that double publication draws add to the citations received by the papers or does it detract from them? If I see someone publishing the same work twice, that harms that author’s reputation in my mind, and I’m much less likely to cite their work.

You’re also looking at start-up journals, which vary significantly from established journals. Often, when one starts a new subscription journal, all articles are made freely available for several years in order to build a readership and then one starts to sell subscriptions. If you start your experiment immediately upon launch, you’re going to have an OA journal available to everyone and a subscription journal available to no one. That’s not a very fair test. Given how tight library budgets are, you’re very slowly going to be adding a few subscriptions here and there over a long period of time, if at all (it is extremely difficult to sell a new subscription journal into libraries these days). It is unlikely that the results from such a skewed experiment would be comparable to those seen in an established journal with tens of thousands of subscribers.

Then there are factors out of your control: do both journals receive an Impact Factor? Is it equal or is one higher than the other? Do both journals get picked up by all the same abstracting and indexing services? Are both journals marketed exactly equally? Which journal’s name falls alphabetically first on lists?

I could go on, but the most important points are made in the first two paragraphs. You’re creating a system that is too far removed from the system you’re trying to test. Better to compare articles within the same established journal, as it negates any of those differences.

If I see someone publishing the same work twice, that harms that author’s reputation in my mind, and I’m much less likely to cite their work.

Assume it’s a paper very relevant to your own work, and since we’re talking hypotheticals leave your ethical and reputational concerns aside for a moment. Given the choice between the two, would you pay to read and cite the TA one, or would you read and cite the OA one?

It is unlikely that the results from such a skewed experiment would be comparable to those seen in an established journal with tens of thousands of subscribers.

I agree, but that’s been my point this whole time: all other variables being equal, there is an OA citation advantage. It may not be readily apparent in the real world at this or that one specific moment and place in time due to various confounding factors, but it’s there alright.

Ah, okay, this makes clear some of the problems in your assumptions.

First, you’re assuming that for each article, the reader has to make a choice between purchasing it or not purchasing it. That is generally not how journals are sold (directly to individuals via pay-per-view). Pay-per-view makes up a tiny, nearly non-existent portion of journal readership. The vast majority of traffic (nearly the entirety of it) comes from readers at subscribing institutions. That choice simply never has to be made for most researchers.

Second, you’re making the assumption that increased readership automatically leads to increased citation. On the surface, this would seem to make sense, but it is not the case. For example, clinical practice articles are widely read, often much more widely read than research articles in medical journals, yet only very rarely cited. This is because they are read by clinicians and practicing physicians, not by researchers who are going to do the next experiment and write the next paper. These are people treating patients, not doing research.

The OA articles in the journals I run (or the ones we make free for promotional purposes) are often among the most-read, but not automatically the most-cited in the journals.

In the FASEB study cited above, making a random set of articles OA massively increased their readership, yet did not increase their citations whatsoever. One can speculate that what OA does is what it promises to do — to increase access to articles to those who do not currently have access. But the majority of those who do research (and would write the next paper and provide citations) already have good access to the papers (surveys of academic researchers confirm this). So it is likely that those most benefiting from OA are from outside of the formal research world — patients, policy makers, educators and those in private industry, all consumers of the literature, rather than producers.

This is surely a good thing on its own, more widespread readership of research.

I agree, but that’s been my point this whole time: all other variables being equal, there is an OA citation advantage. It may not be readily apparent in the real world at this or that one specific moment and place in time due to various confounding factors, but it’s there alright.

And here is where you veer away from science and into religion. It just is there, you know it, even though you have no proof and when it is experimentally tested, it fails to manifest. You are asking others to take this on faith.

Human observers want things to follow the logic of what we already know about how things work; we want the observational world to “make sense.”

The first reported associations of OA with citations landed in the mission-driven early days of OA publishing. It “made sense” that articles which were easily accessible would be cited. I can’t (or at least shouldn’t) cite something I haven’t read. I can’t read something to which I do not have access. Therefore, more access should generate more citations – and darned if the observation didn’t show just that!

The problem of course, surfaces when “therefore” and “should” work together to predetermine the interpretation of an observation. Because the “sense” arrived first, the observation was a perfect fit for what we already believed. A simple, inferred explanation for a simple, observed phenomenon.

That is why the notion is so hard to dislodge.

As early as 2005, a mechanistic study showed that the “Open Access Advantage” was more probably an Early-Access advantage and was short-lived (Kurtz et al. 2005: http://dx.doi.org/10.1016/j.ipm.2005.03.010 or http://arxiv.org/pdf/cs/0503029.pdf). That “made sense” too; indeed, it made more sense, but was less emotionally appealing, and it seemed to make no dent in the notion of OA Advantage.

The landscapes of Access and of Open have changed. OA versus Non-OA is no longer as clearly defined or easy to identify as it was in 2003 and 2004.

If a 2015, NIH-funded article is cited this year – is that citation due to the article being openly accessible on PMC in 2016 (per mandate)? or due to the citing author’s library-mediated access in 2015? or due to the Author Accepted MS in the university archive since late 2014?

Or, could it be that the article was cited because it was relevant to the work at hand?

Doesn’t that make sense?

Indeed, social science issues with strong policy implications are often unresolvable. Just look at economics.

I agree that the motivation for citation is not resolvable; the motivation for making an article OA is similarly not resolvable.

It is possible, however, to begin with a set of articles, make a random selection of them OA and the rest subscription access, and determine both timeline and net citation count across a set period of time. Phil Davis did just that (cited above and here: http://www.fasebj.org/content/25/7/2129).

“Articles placed in the open access condition (n=712) received significantly more downloads and reached a broader audience within the first year, yet were cited no more frequently, nor earlier, than subscription-access control articles (n=2533) within 3 yr.”

My question is: why do people still believe that OA confers advantage? I think that’s where the strong policy implications come in.

Phil’s work is excellent but hardly conclusive, just as a single trial cannot determine the efficacy of a new drug. I think that at a minimum this trial would have to be repeated many times, with different starting points, and a probability distribution built up.

I may be able to think of a simpler approach, using existing publications.

According to Crotty, “A good editor/reviewer walks back these claims to an appropriate level. Sometimes this means changing words like ‘necessary’ to ‘sufficient’, or ‘proves’ to ‘suggests’.” Per the post, “popularity isn’t what matters in science; rather what matters is rigorous methodology and accuracy of conclusions.” But none of this matters, especially “rigorous methodology” (whatever that is), if the scientists involved, and worse still the journal editors involved, start with a Science 101 understanding of science. Every 101 textbook includes the “rigorous methodology” and “science isn’t about popularity” quotes.

So you prefer poor methodologies, no controls, and the idea that lots of bad studies are better than one really good study. You feel that authors should be able to make unfounded and unsupported claims in their research. Okay.

Dr. Crotty, I prefer doing actual science with all the messiness, uncertainty, partiality, and observational blind spots. The sanitized view of science you present merely covers all this over, pretending that scientists can do things and achieve results that are simply not possible. Once you accept this, we can discuss how researchers and reviewers of their work and publications might be able to interact and with what possible results.

I understand this well from the fifteen or so years I spent at the bench. But if one is a good scientist, one acknowledges that partiality and those observational blind spots and designs adequate controls to overcome them as best as one can. Then, when forming conclusions, one recognizes that messiness and uncertainty and makes sure not to overstate what can be gleaned from one’s research.

The complexity of a subject being studied does not grant one carte blanche to be dishonest.

Just one example from science might help here. Was it okay for Pasteur to falsely present the results of his work with anthrax? Pasteur was brought in to “cure” anthrax on farms, a deadly killer of livestock. But Pasteur did not cure this anthrax. He cured the anthrax in his laboratory. While the two versions look nearly the same, they do not behave the same. Pasteur believed it counterproductive, or even harmful to the effort to make livestock safer from anthrax, to try to explain the difference and what it meant to farmers, politicians, and government bureaucrats. What do you think?

If, as an editor or peer reviewer, Pasteur submitted a scientific paper to my journal and made claims that were inaccurate and not supported by the experiments he did and the data derived, I would have rejected that paper or required him to restate his claims so that they represented the truth. Scientific journals should be in the business of accurately reflecting the scientific record, not deliberately distorting it to drive political advocacy.

I’ll throw a counter-hypothetical at you: I’m the editor of a journal, and I receive a submission of an article claiming that the MMR vaccine causes autism. The author fervently believes that this is a significant health crisis. In reading the study, it is clear that it was poorly designed and lacked adequate controls. The conclusion reached, that the vaccine causes autism, is not supported by the data, which merely shows a correlation of the onset of symptoms of autism with the age where the vaccine is received. It also becomes apparent that the author stands to gain financially from connections to a law firm that plans to sue vaccine makers.

Given “all the messiness, uncertainty, partiality, and observational blind spots” in science, should I publish this paper?

First, the illustration about Pasteur is not a hypothetical. But your response to my question is disturbing. When Pasteur did the work on anthrax the “germ theory” was new. So new that many scientists had not yet accepted it and the general public (including farmers and government bureaucrats) either did not know of it or did not understand it. If Pasteur had published all the results of his work it would have put this theory in jeopardy. Which would of course threaten the only cure for anthrax shown to work. Plus in order to make his explanation comprehensible to the “lay person” Pasteur would have had to provide each a background in germ theory, laboratory procedures (including microscope use),

Nor is the illustration about Andrew Wakefield and autism theoretical. And when Wakefield published his paper on autism, his theory that vaccines cause the disease was also new, “so new that scientists had not yet accepted it and the general public either did not know of it or did not understand it.” That’s why we require proof. Just because something is new and has potential merit does not mean we should automatically accept it because we want it to be true, or because it might be proven true at some point down the line.

Further, the role of scientific journals is to serve as a conversation between experts. They are not written for the lay public. Articles are jargon-laden and assume a high level of background knowledge.

If the science in journals is beyond the life experiences of the layperson, it’s not surprising that science and scientists are not trusted by these same laypersons. One final word on the words proof and fact. You bounce these around like they are something solid and fixed. They are more like quicksand. Proof can sometimes help us see events a bit more clearly. But proof can also lock us in a direction that later is difficult to escape. Best be skeptical of proof. And the more certain the scientist says it is, the more skeptical one ought to be. And then there’s what makes up proof, facts. Facts are not given in nature. They are made up via the interactions through which all events come to life. Mary Poovey’s book, “A History of the Modern Fact”, and Theodore Porter’s “Trust in Numbers” provide insightful histories of two of the major elements of what makes modern science modern science. Being schooled in these histories would seem to be a prerequisite for reviewing scientific research and publications. But maybe I’m off base here.

If journals were to publish new results for the layperson, each article would be as long as a textbook and it would destroy their efficacy. Journal articles are a highly evolved form that is meant to serve a very specific purpose, the high level conversation between experts noted above. If you want to communicate to the lay person, there are other forms of communication that are better suited for that purpose.

We can quibble over vocabulary, and perhaps I’ve lost the thread of what you’re arguing here. I do recognize the fluid nature of science, and that this week’s results will likely be expanded upon, if not completely overturned, by next week’s. That’s a given, particularly in fast moving fields like biomedicine. All the more reason to be skeptical and to have very high standards when looking at an extraordinary claim that’s made on scant evidence and where there are obvious controls that have not been performed. My argument is less epistemological and more pragmatic. When trying to understand a set of experiments, having a strong grounding and high standards in experimental design is important.

Apology. Accidentally sent last post before it was complete. Aside from the problems facing Pasteur already cited, he also faced the usual problem of conflicting and contradictory results from the treatment trials of the vaccines he created for anthrax. Some showed positive results, some negative, and some no results at all. But the general direction was that several versions of the vaccine worked to stop anthrax. So if the no-effect and partial-result cases are dismissed, as they often are in such experiments, then anthrax is cured. Over time the cure generally was a success, though not 100% so. But then what is 100% anything?

But you’re also cherry-picking an example where the researcher happened to be right. What of the horrors perpetrated by eugenicists? What of the harm to public health due to Wakefield being pretty sure he was right without adequate evidence? Are we to accept everything someone thinks might be true? Would you let your own child be given a treatment for which there was no clear evidence, and in fact (as in the papers mentioned above), there were holes in the conclusions big enough to drive a truck through?

You keep missing the point. The point is truth and facts are always uncertain and changeable. So there is always a risk that even the most diligent researcher and research reviewer will be wrong. We accept that risk or not. That’s a choice in doing research and in reviewing research findings. You seem to want a “bright line” and to believe that you can always be on the correct side of that line. You cannot. There is always some evidence for and against almost anything. At what point that makes it to “clear evidence” is a judgement call. Considering the factors from which the judgement emerges is vital in both research and in reviewing research results. I’m merely asking that you consider these questions.

As to journals and laypersons I agree that journals are not the best place to communicate between scientists and laypersons. But at the same time laypersons ought to be able to recognize some relevant connections between what’s said in journals and their own lives. It’s one of my major concerns with scientific research publication and journals today that there is often no such recognition.

I think we’ve been arguing past one another the entire time. You’re making a point about the nature of knowledge, that it’s fleeting and mutable. That’s a given. I’m making a point that there should be a minimum standard of rigor for accepting someone’s claim about their research (fleeting and mutable as that conclusion may be). That person’s claim may later be shown to be 100% correct, but it would be irresponsible to accept it at face value without convincing evidence, particularly when it comes to public health, or in this case, sustainability of an industry. If you are labeling a correlation as a causation (as is the case in the studies discussed here) and if you haven’t done adequate controls (as is the case here), then you probably haven’t passed that minimum standard.

But at the same time laypersons ought to be able to recognize some relevant connections between what’s said in journals and their own lives. It’s one of my major concerns with scientific research publication and journals today that there is often no such recognition.

Which probably says something about the complexity of research at this point and the level of science education. I’d also argue that some of the most important research doesn’t have any relevant connection to the life of the lay person. A good example here:
https://scholarlykitchen.sspnet.org/2014/07/25/the-importance-of-funding-basic-science-research/
But journals aren’t designed to make those connections anyway, and there are other means of communication that do a better job (one example here: http://blog.oup.com/). If every molecular biology article had to start with a lengthy introduction to the structure and function of DNA and then cover every single development that’s happened since Watson and Crick, it would no longer serve its current purpose.

There is at least one reason why there are more downloads for OA articles – ready access for many of us in developing countries. We know that it may not be the best paper, but then there is little choice. But one should be able to judge which OA journal is credible for further citation. I suspect it actually is a much more complex issue.

Is it wrong to demand proof when someone declares as factual a causal relationship between two factors? Theoretical work and speculation are both valuable, but must also clearly be stated as such. I am asking here for clarity and honesty — don’t state something as proven unless you have actually proven it. Is that too much to ask?

It may be an interesting topic to the author, but it is hard to understand why. Pushing self-interest (“get more citations!”) in order to advocate for OA is very distasteful. It might be of interest to show that the claims are false, and stop people wasting their time on poorly designed studies. But since (a) OA articles surely won’t be less cited than in the subscription world, if the entire system converts to OA and (b) articles will surely be read by more people, including non-professional researchers if they are in OA outlets, all other things being equal, it is clear that an all-OA system has many benefits. Other than the loss of comfortable lifestyle of some in the publishing industry, how would converting all established TA journals to OA (for example, by reallocating subscription funds) not be a clear improvement on what we have now? Why is the publishing industry resisting it?

OA does clearly have great benefits, but one must approach it in a responsible and sustainable manner. Making false claims that it is a magical unicorn that can automatically elevate a researcher’s career may do damage to progress in the long term.

And as the recent study from the University of California library system showed, reallocating subscription funds does not work:
https://scholarlykitchen.sspnet.org/2016/08/09/the-pay-it-forward-project-confirming-what-we-already-knew-about-open-access/
Universities that do a lot of research would be hit with enormous costs, and since in the US, subscription funds come mostly from tuition and student fees, these are not easily transferable between individual private and state-run institutions.

The industry is resisting suddenly converting all established TA journals to OA for at least three very good reasons. First it is difficult and expensive. Second, it is extremely risky, as there is no reason to believe that specific journals or publishers will survive the transition (microeconomics trumps macroeconomics). Third, the authors in general are not asking to pay for what they now get for free. The hybrid journal is a good compromise, which is why it is widespread.

Regarding the PLoS article you refer to (and which I wrote), I’ll try to address the concerns you expressed.

The anonymization of the data was required by both the journal publishers, via the licensing agreements that made their articles OA, and Thomson-Reuters as a condition of using their citation data. This was their right as content owners, of course. As it is, because of that anonymization you seem to mistrust the assertion that the control group of published articles was in fact closed/toll access. I didn’t check all ~90,000 in that group, but did check a sample across the full span of dates and journals to confirm what common sense and Occam’s Razor suggest: the overwhelming majority of articles used as a control group were, and still are, closed.

(And at the risk of implicitly giving too much information, the publishers in question aren’t difficult to determine by looking through Deep Blue. If you did that you’d find they are prominent and prestigious, but not notable for their strong and active support of open access. There were a few exceptions—I found 3 out of ~3000 articles checked that needed to be removed—but I’m confident that there aren’t many, or enough to skew the results.)

Regarding the limitation of looking at only U-M articles, if there was another set of published articles made OA that included a broad range of disciplines, years of publication, and institutions I’d have been glad to use it. This was a concern of one of the paper’s reviewers as well, but such a dataset doesn’t currently exist, and publisher practices (read: the vast majority of contracts authors sign don’t allow no-cost OA for the published version of the article) mean it’s unlikely to exist soon.

Finally, while U-M is indeed a top-tier institution, the concern that a given U-M authored paper is necessarily better/more highly cited than other papers in the same journal issue isn’t borne out by the data. As you read in the article, “the mean for Oc [the number of citations to U-M authored articles during the period when it was closed] was 6% more than Cc [citations to the other articles during the same period], while the median was 14% less, so an acceptance bias might be argued either way.” The same can be said for a quality bias.

I certainly agree that conflicting agendas come into play here on both sides of the OA discussion. I know that as a repository manager I have a bias towards promoting open access, and would have been happier if my study led to an even stronger conclusion. And having (I hope) addressed your criticisms, I still acknowledge that the dataset is imperfect: the ideal would be for the OA papers to have been opened immediately upon publication, without embargo periods of one or more years. Again, we’re limited to the data we have or can collect, and I’ll note that the BMJ study you prefer looks at only eleven journals in a single discipline, did not have a control group of articles that remained closed, and only looked at one year of citation data. The FASEB study you also like is a little better, with more (and more diverse) journals in its dataset, but it also didn’t have a control group of articles that remained closed, and it only looked at citations for three years.

That doesn’t make them bad science; it’s not bad to work with the best data available and then make sure your claims accurately reflect your results. That’s what the PLoS article does, even though it arrives at a different conclusion from the studies you prefer.

Hi Jim,

Thanks for the further information. It’s really disappointing to hear that publishers required you to anonymize what is essentially public information. Anyone who subscribes to WoK can immediately look up any given paper and see its citation record. Journals are increasingly promoting Altmetrics, including citation counts, for articles, and many, if not most, platforms clearly display any citations for an article on that article. It also makes the data generally useless in terms of reproducibility.

Your paper would have been improved had some of the qualifications you note above been included, particularly efforts to ensure that papers labeled as “closed” were indeed “closed”. I suspect that a much higher proportion than the 0.1% you suggest would fall into that category. Anything with NIH funding, for example, is required to be freely available within 12 months of publication. Increasingly, publishers make all articles in a journal freely available after some period (most of OUP’s science journals do this after 12 months, Science is at six months, Nature 6-12 months depending on the journal). Given the increasing requirements from funding agencies and institutions, this practice will only increase, and as Marie McVeigh suggests in her comment above, there are very few articles that aren’t openly available through some means.

The reason I use the term “bad science” is the over-reach on conclusions discussed in the first few paragraphs above. At best, your study can only show a correlation between the OA papers and citation levels. The word “correlation” does not appear once in the paper. Stating that OA directly grants a citation advantage is unsupported by the data because no causal relationship has been shown.

And to be clear, the FASEB study was a look at the impact of immediate Gold OA, not an attempt to study further effects of later Green OA. It didn’t need a control group for something that it didn’t attempt to study nor make conclusions about.

Thanks David,

Like you, I wish publishers weren’t so restrictive about what is, as you say, essentially public information. The irony is that this is the open access conversation in a nutshell!

Regarding the 0.1%, you may be right that it’s higher, but I’m confident that it’s not a lot higher, given the 1990-2013 timeframe of my study. As you point out, some publishers have begun to open things up after a fixed period, but even now it’s still not the norm, especially for the largest commercial publishers. Either way, it would have been good to put this in the paper itself; I should have done that.

I also agree that the text of the FASEB paper itself is cautious. However, it’s often offered up as what you say it shouldn’t be used for. As you wrote above: “To truly test causation, one must perform a randomized, controlled trial, and when one does that, the direct connection between OA and increased citations does not appear to be reproducible.”

Qualified language to be sure! But it doesn’t acknowledge the many limitations of those studies’ datasets, and the implication that the FASEB and BMJ studies nailed down—with rigor lacking in studies that find an OA citation advantage—the absence of a citation advantage is pretty strong.

I think there’s an unfounded paranoia that comes with being in a competitive business. There’s a fear that if you let the slightest bit of information out about your company, others will seize on it and use it to their advantage against you. Look at how coy and oblique Amazon is about any of their sales figure announcements for example. And here it’s even more absurd, because they’re trying to protect information which is already readily available and in the hands of their competitors, all of whom likely subscribe to WoK.

2013 is pretty far along in these matters. The NIH policy has been codified by law since 2008. ResearchGate was launched in 2008 and had over 1M users by 2011. Academia.edu was similarly launched in 2008. OUP’s policy on making content free after some time period dates back to at least 2010 when I joined the company, and we were hardly a pioneer in this area.

Qualified language to be sure! But it doesn’t acknowledge the many limitations of those studies’ datasets, and the implication that the FASEB and BMJ studies nailed down—with rigor lacking in studies that find an OA citation advantage—the absence of a citation advantage is pretty strong.

As always, it’s impossible to prove a negative, but here, it can be stated that there has been no proof of the positive. When tested experimentally (at least so far, using rigorous methodologies), the proposed citation advantage does not happen. Without evidence of causality, we cannot reasonably conclude that the effect is anything more than the correlation shown in the observational studies.
