States that receive more research funding publish more papers, and publishing more papers means publishing more positive (described as “biased”) results. If you follow the argument that productive science is “biased” science, then you’ll appreciate a new article published in PLoS ONE last week.
The paper, “Do Pressures to Publish Increase Scientists’ Bias? An Empirical Support from US States Data,” by Daniele Fanelli, is a great example of a paper that may be technically fine, but fails miserably on theoretical grounds, and the methodology reveals it.
The author defines a scientist’s “research environment” as a geographical state — not an institution (where you’d imagine that most variation exists) — and examines how the percentage of papers reporting positive results relates to state productivity and expenditures in R&D.
This is like saying that New York State has a research environment, when in fact you’ll find disparate research environments depending on whether you are a scientist at Cornell University or SUNY Cortland, for example. Luckily, the author reveals the weakness of his hypothesis in the paper:
Although the confounding effect of institutions’ prestige could not be excluded (researchers in the more productive universities could be the most clever and successful in their experiments), these results support the hypothesis that competitive academic environments increase not only scientists’ productivity but also their bias.
Not only does the author exclude differences in research cultures across institutions, but also quality differences across journals. Higher-quality journals are much more selective of what they publish and many will only accept papers that advance their field — this means positive results.
In other words, the author ignores the nature of academia: that better scientists go to better institutions with stronger research cultures, receive more federal research funds, do better science, publish papers in better journals, and so on.
This doesn’t sound like “bias” in the negative sense.
Most ground-breaking scientific research in the United States is conducted within a small number of elite institutions of higher education. This is not an indication that the system is biased or flawed or broken — it is an indication that the system is working by concentrating talented individuals with resources in places where they can achieve more than by working alone. This is the social stratification of science at work.
Ultimately, “publication bias” is a paranoid topic. It assumes that there is an intent to distort what is known about a topic, and it can happen at different levels. At the funding level, a pharmaceutical company may condition the release of results based on whether the data support the efficacy of a new drug. Researchers, themselves, may self-censure, holding back negative results (or at least those that do not support a widely-held theory) under pressure to publish their work. Some editors may also favor manuscripts with positive results, deeming manuscripts with negative results just plain uninteresting and therefore not worthy of publishing. I’d like to think that these editors are doing what is good for science, and not practicing “publication bias.”
The fundamental problem with the PLoS ONE paper is not that it reports a positive association between publication output and positive results. Its problem is that it states its hypothesis backwards, assuming no geographic variation in results and then being shocked to discover that it has revealed “publication bias” without first ruling out higher-quality research as an explanation.
By twisting ordinary, uninteresting negative results into positive results, this is a good example of the very publication bias the author is attempting to illustrate.
23 Thoughts on "The Paranoia of Publication “Bias” — How a Study Proves Its Point by Making Its Point"
Is this paper really “technically fine”? As you point out, the theory, the methodology and the conclusions drawn are flawed. I’d go one step further and state that this is more likely a failure of PLoS One’s peer review process to do even the primitive job to which it aspires. If the selection of papers and judgment of results is based on the vocabulary used (they searched for the phrase, “test the hypothes*”), then isn’t it more a study of the semantics of writing about science, rather than any true examination of the quality of results?
I really would like detailed clarification of this:
“Some editors may also favor manuscripts with positive results, deeming manuscripts with negative results just plain uninteresting and therefore not worthy of publishing. I’d like to think that these editors are doing what is good for science, and not practicing “publication bias.” ”
So, this means that I conduct 10 randomized control trials (RCTs) on a drug. In 9 of the RCTS, the results are negative (by your definition, uninteresting). In fact, in 2 of those 9 RCTs, a few of the participants die. (You can’t get much more negative than death). But, ultimately, the results are not positive in these 9 RCTs.
In just 1 of the 10 trials, however, the results are positive. So I publish only the results of that 1 trial, (because I’m exercising “self-censure”, and it’s also more pleasing to editors who want to publish only positive results.)
And your comment on this is : “I’d like to think that these editors are doing what is good for science” ?
And this is supported by your earlier comment that “Higher-quality journals are much more selective of what they publish and many will only accept papers that advance their field — this means positive results.” So you equate “positive results” with advancing the field.
I would argue that this attitude of editors (if, indeed, it exists), combined with the other issues you raise (pharmaceutical companies conditioning “the release of results based on whether the data support the efficacy of a new drug” and researchers releasing only “positive” results) is exactly the kind of bad science that leads to deaths every year because of drugs that are dangerous. And frequently, these drugs are prescribed by doctors who believe they are acting in the best interests of their patients because there is strong evidence supporting the use of the drug (those doctors just never know about the RCTs that were not interesting enough to be published).
If there are any editors in the world who would be prepared to state publicly that they regard only positive results as advancing the field, and that they favor publishing positive results over negative results, I think they would be targets for major lawsuits.
On the other hand, I could be terribly wrong. Several people in the Scholarly Kitchen (and the Society) are strongly connected to very high-quality journals, so perhaps you could get editors from those journals to categorically support this view, in which I would stand corrected, and apologize.
I don’t think any editor would categorically state that only positive results are interesting, but negative results are much less likely to be interesting because the standard for interest/novelty/importance is much different. It can’t just be that the hypothesis was bad, the study underpowered, the lab results misread, etc. It has to be a robust negative finding, and those seem to be hard to come by. But they are published, and they do exist, even in top-tier journals. That said, positive results — i.e., results that show an effect — are much more interesting than a study that ends with, “Well, that didn’t work.”
I think you’re misinterpreting the phrases “positive results” and “negative results”. As a journal editor, if I get in a paper say that describes a gene knockout in a mouse, it’s more interesting to our readers if that knocked out gene had an effect on the mouse. For example, if the mouse is born without a head. That’s a positive result, indicating that the gene is involved in head development. But if the mouse is perfectly fine, with no apparent effect from the knockout, then that’s a negative result. A paper stating that the gene is probably not necessary for head development is less interesting.
The other problem with negative results like this is that they’re often difficult to discern from experiments that weren’t done right.
This has nothing to do with hiding data or faking results. That’s unethical and in some cases illegal.
Given your comment that “A paper stating that the gene is probably not necessary for head development is less interesting,” it appears that you and I are using the word “interesting” in exactly the same way. The difference is that you find such a paper less interesting. I say, if the study was valid, and conducted properly, then publish it – the first benefit is that it will ensure that others who have similar ideas won’t go down that route (or may spot something in your method that needs adjustment). (If the study was not valid, fine, bash it, but don’t bash it because you don’t find the outcome interesting).
As far as “they’re often difficult to discern from experiments that weren’t done right” goes, I agree – that makes reviewing tough. But it has nothing to do with the interest level of the paper.
The problem is that it’s more than just tough to tell the difference between a null result and experimenter error, it’s near impossible. Let’s say you followed all established protocols and reported a null result. But it turns out you were off by a decimal point in mixing up one of your buffers so it was 10X more concentrated than you thought. This had the effect of stopping the reaction you were hoping to see. How would a reviewer be able to detect this? Really, the only way to do so would be to “go down that route” and repeat the experiment.
Science requires some redundancy. I don’t see any way around it. You may see it as wasteful, but I see it as valuable cross-checking and verification. Here’s an example. A colleague working on breast cancer uses a very complicated assay involving tumor cells in a matrigel culture. In comparing the effects of extracellular proteins, he got a null result, but this was contradictory to earlier experiments done in another lab. It turns out that there were great discrepancies in the constitution of the collagen that each lab was using, different lots from different vendors. It took the concerted efforts of 7 labs each repeating those null results to work things out, and now they’ve got a powerful system for advancing our knowledge of cancer. But if the first experimenter had published those null results and everyone had accepted them and decided not to “go down that route” and repeat them, progress in science would have been halted.
I have no problem with a directory or database where people could list their failed experiments (including as much detail as they’d like). That might be helpful in guiding future labs interested in the same questions to troubleshoot or try alter conditions to actually make the assay work. I’m not sure how many labs would participate though, as people don’t like to air their dirty laundry and potentially tarnish their reputations as being someone who fails a lot of the time. I’m also not sure how useful full, detailed journal articles on failed experiments would be. I’m not sure I’d want to invest much time in reading the details nor writing them up when I could instead be concentrating on experiments that might pay off in a bigger way.
Your example about Randomized Controlled Trials and deaths is a good one since most medical journals require RCTs to be registered (see the CONSORT Statement) so that unpublished studies (especially those reporting negative results) can be identified.
Selecting articles that advance science is not about hiding the truth. Your cognitive frame for viewing the selection process needs some review.
Yes, that sort of registration is a good idea, but doesn’t solve the actual problem. Registration means that publication must have been preceded by registration; it does _not_ mean that all trials registered have to be published. Even looking at http://clinicaltrials.gov/ will show a number of trials that never make it to publication stage. So again, even if all 10 trials were registered, where are the published results of the 9 that failed? Nowhere, because in the ‘good science’ world of this report, these trials showed only negative results, and so are not interesting enough to be published.
Dear All, I am the author of the paper in question. A reader pointed me to this discussion, and since you took so much effort in criticising it, the minimum I can do is thank you for your interest and reply to a few points.
1) “The author defines a scientist’s “research environment” as a geographical state — not an institution (where you’d imagine that most variation exists)”
If you had read the paper accurately, you would have noticed that it is not me but the National Science Foundation itself that produces that data – unlike you, the NSF seems to find this information meaningful! And it calculates productivity very accurately, by fractioning authors by institution. I just took the data and used them.
I agree that looking at individual universities would be more informative, and I hope somebody will have the time and means to do that in the future (I didn’t). In the absence of that, an average per capita by state is better than nothing.
2) “Luckily, the author reveals the weakness of his hypothesis in the paper”
That’s not luck. I am simply discussing this point, which is an obvious one, although you make it sound like we needed an exceptionally brilliant mind to discover it.
3) “Higher-quality journals are much more selective of what they publish and many will only accept papers that advance their field — this means positive results.”
I would have agreed. Interestingly, however, it turns out that the proportion of positive results was not significantly associated with journals’ IF (the 5-year IF standardized by discipline).
For the record, in the first submission I was also controlling for IF, which made little difference to the results, but then I decided to drop it to simplify the results, after harsh criticisms from one of the peer-reviewers, a statistician.
You can check the IF issues in another paper of mine, which partly used the same sample, published just a few days before: “Positive” Results Increase Down the Hierarchy of the Sciences; http://www.plosone.org/article/comments/info:doi/10.1371/journal.pone.0010068
By the way, in this latter paper you’ll find better evidence that positive results are about 20% more frequent in Psychology or Economics than in Astrophysics. Which by your logic (“positive” results = “good” science) would mean that the former two are better sciences than the latter, I suppose.
4) “In other words, the author ignores the nature of academia: that better scientists go to better institutions with stronger research cultures, receive more federal research funds, do better science, publish papers in better journals, and so on.”
This is a central point in the discussion of the paper! And it’s in the Abstract too! Indeed, this is the very reason why I present this data as a “support” and nothing more.
(I nowhere in the paper declare to “being shocked to discover”)
So, how can you HONESTLY say that I ignored this point?! I understand that writing blogs is time-consuming. But reading and quoting accurately other people’s work is considered good scientific practice, if not basic good manners.
5) David Crotty: “If the selection of papers and judgment of results is based on the vocabulary used (they searched for the phrase, “test the hypothes*”), then isn’t it more a study of the semantics of writing about science, rather than any true examination of the quality of results?”
Again, amply discussed in the paper. However, please note that if it were really just a matter of semantics, that would make the results actually MUCH STRONGER and MORE INTERESTING. Because it would rule out the whole issue about some institutions doing better science. Leaving us with the finding that authors in more productive states are compelled to write their papers in more positive terms. A very cool finding indeed!
I’ll add a couple of points about PLoS ONE in the other discussion, which I find much more interesting than this, as your readers probably do, too.
Noting within the paper that your data is likely biased and without meaning is certainly honest, but it does not somehow remove that bias nor does it add meaning.
The problem I have with the methodology is that it introduces an unnecessary radical, and no controls were done to measure its effect on the data. If you acknowledge the likely bias you’re introducing, why not test for that bias? Where’s the equivalent control group that specifically does not include the phrase? Where’s the control group that includes a randomly chosen phrase? How do the numbers compare? If you had only used papers that included the words “fails”, “negated” or “disproved”, do you think that might have altered your results? How do things compare with papers that include the phrases “support the model” or “verify the theory”?
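As an aside for readers, the control comparison being asked for here could be sketched as a simple two-proportion z-test between a phrase-selected sample and a control sample. All counts below are invented for illustration, and the phrase labels are placeholders, not the study’s actual numbers:

```python
import math

# Hypothetical counts of positive results among papers matched by two
# different search phrases (all numbers invented for illustration).
pos_a, n_a = 430, 500   # papers containing "test the hypothes*"
pos_b, n_b = 380, 500   # control papers matched by a neutral phrase

p_a, p_b = pos_a / n_a, pos_b / n_b
# Pooled proportion and standard error under the null of no difference
p_pool = (pos_a + pos_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
# Two-sided p-value from the normal approximation
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A significant difference between the two samples would suggest the search phrase itself selects for positive results, which is exactly the confound being raised.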
You state in the paper that many of those papers examined contained multiple results, but you chose to only look at the first one. If a paper stated that it disproved a long held hypothesis, but that it instead offered an exciting new model for how a process works, is that a positive or a negative result? In my mind it’s a positive result, but by your methodology, it’s a negative result because you only count the first test.
Why limit the set of papers chosen to ones that just contained a particular phrase in the first place? Why not instead use a random sample of papers? Did you choose to only look at papers that included the phrase “tests the hypothesis” because it made it much easier for you, or your “untrained assistant” to make a snap judgment about the positive/negative nature of the paper? That seems like a sloppy shortcut to me, and a deeper, more involved reading and understanding of each individual paper would be necessary to make such a call.
“However, please note that if it were really just a matter of semantics, that would make the results actually MUCH STRONGER and MORE INTERESTING. Because it would rule out the whole issue about some institutions doing better science. Leaving us with the finding that authors in more productive states are compelled to write their papers in more positive terms. A very cool finding indeed!”
Also wanted to add that correlation does not equal causation. Choosing a small subset of papers that contain a certain phrase and finding that authors who used that particular phrase were more likely to offer positive results says nothing about compulsion. That phrase may just be a more accurate way of describing those types of experiments or those results. Perhaps it’s more linguistically native to the geographic areas where those studies were performed. Perhaps those authors predominantly bought and read the same book about scientific writing. How can you determine the reasoning behind a particular word choice merely by reading those words?
David, with all due respect, you seem to completely misunderstand the paper.
I don’t know who I am talking to, so I’ll only spend a few more minutes on this, and encourage you to read both works with a bit more focus.
I looked for the sentence “tested the hyp” and then looked at what the authors had concluded. The point is the frequency of what the authors conclude (support vs no support), and how this varies with field and with US state.
I like the idea of using some controls (although that’s not the point in logistic regression analysis, is it?). However, what you suggest would not be “controls”.
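For readers following the statistical back-and-forth: the kind of analysis at issue can be sketched, on invented toy data, as a logistic regression of each paper’s outcome (positive = 1) on its state’s productivity. The plain gradient-ascent fit below is only an illustration of the technique, not the paper’s actual code or data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, n)                  # toy state productivity (arbitrary units)
p = 1 / (1 + np.exp(-(-0.5 + 2.0 * x)))   # simulate rising log-odds with productivity
y = rng.binomial(1, p)                    # 1 = paper reports a positive result

X = np.column_stack([np.ones(n), x])      # intercept + productivity
beta = np.zeros(2)
for _ in range(5000):                     # gradient ascent on the log-likelihood
    mu = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (y - mu) / n

intercept, slope = beta
print(f"slope = {slope:.2f}")             # a positive slope means papers from more
                                          # productive states report positives more often
```

In a logistic regression like this, possible confounders (institutional prestige, journal impact factor, and so on) are handled by adding them as covariates rather than by experimental control groups, which is the distinction the two commenters are talking past each other on.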
Multiple-hypotheses papers are dealt with in the companion (larger) paper. I have given you the link above.
So please, if you really have nothing better to do than try to demolish my work, at least read both papers properly before.
Welcome to the future of science, at least according to PLoS One. What you view as “demolishing”, PLoS One promotes as “post-publication review”. You seem surprisingly uninformed as to the process PLoS encourages. Articles are reviewed solely for methodological accuracy–did the author do what the author says he did, no conclusions are made regarding the significance of the research. That’s meant to be determined after publication, in public forums such as this one, by the endless, faceless yobs of the internet, regardless of their level of expertise or understanding. Most proposed systems (karma) actually favor and weight the comments made by those who spend the most time online commenting (those with “nothing better to do” as you put it) rather than looking for actual expertise. Your response is indicative that most researchers will need to grow thicker skins in order to deal with this process, and be prepared to devote large swaths of time to defending published results.
I have little background in sociology or statistical analysis, and that’s why I have avoided commenting on those parts of your paper. I do, however, have a good deal of experience in evaluating scientific methods, and I do have issues with the methods you employed for data collection. As you yourself state in the paper, the data collected is quite likely biased because of the methodology you have used. You wish to dismiss this by stating that it would be “problematic to explain”. I’m sorry, but that’s not good enough. Your inability to come up with a good explanation for something does not mean it can be ignored. You like the idea of using controls, but then dismiss it and didn’t employ them to actually verify that you were analyzing a truly random sample. An experiment without controls is meaningless. Controls provide context and verification and are necessary, not optional. I highly recommend David Glass’ book on Experimental Design for a detailed explanation on the use of controls (a caveat: I was an editor on this book).
Is it possible that researchers who clearly felt that they had a “positive” result that supported a hypothesis were more likely to use the phrase “test* the hypothes*” than those whose data pointed toward a “negative” result? Does that then mean that you could have unintentionally cherry-picked your data from a pool that was more likely to support your hypothesis, that you possibly chose a non-random sample that was biased toward the positive reports you wanted to see? Are you guilty of the same publication bias of which you accuse others? What if you did the same analysis on a set of papers that included the phrase “failed to support” or “disproved the hypothes*”? Would the results likely have been different? You have offered no proof that your sample was indeed random for this part of the experiment. Why did you insist on the inclusion of that phrase? Why not a truly random selection of articles with no potential introduced bias via word choice?
Linking all of this to “pressure” is also speculative at best. Correlation does not equal causation.
And a further question that should be asked regarding the overall point the article presents, that researchers with negative results must either discard them or do something unethical to get them published. Is this really the case? You look down upon “HARKing” (reformulating a hypothesis after an experiment fails to support it). Isn’t this the way science works? You have a hypothesis, you do experiments and collect data that shows the hypothesis is inaccurate. Must you stop at that point and publish the failed hypothesis (or lie)? Or is it more likely that you would reformulate a new hypothesis based on the observations collected, and go on to test that with further experiments, refining the model as more data is collected? Those refining experiments would then be published as a “positive” experiment. Why is that considered “questionable”? Are scientists only allowed one guess as to how something works and if they’re wrong, they’re never allowed to offer up a new theory?
I’ll still try to be polite, but I really don’t think you know what you are talking about.
Please read the paper. It will relieve your anxieties concerning, for example, the randomness of the sample. The results section starts with an almost PERFECT CORRELATION between the number of papers I retrieved from each state and the actual papers each state publishes yearly. Which, as I then point out in the DISCUSSION, shows the sample to be truly random.
When I said I liked the idea of controls, I was half joking: I thought you would understand that logistic regression has nothing to do with controls. Now, of course, I know you don’t.
On one thing, and one only, you are right. And I have been thinking about this after reading your comments: If the PLoS system is meant to allow all random “faceless yobs of the internet, regardless of their level of expertise or understanding” to waste everybody’s time with superficial comments, then we have a problem.
Still, I am confident that most people would have the decency (i.e. respect for themselves and others) to actually read the papers and think thoroughly before commenting on them. Interested readers will know who is right anyway. And whoever makes silly comments will just make a fool of him- or herself.
Comments and criticisms are welcome, but only if sincere. I think you guys are just having a go at my paper not because you are sincerely interested in it (not interested enough to actually read it, at least!), but because you want to attack PLoS ONE.
If anything is against intellectual and scientific progress, it is attitudes like these.
Having said that, I’ll really stop. No offence, but I have more important things to do than argue with you. And I am surprised you don’t, too.
I have read the paper multiple times. I have no issues whatsoever with the geographical distribution of the articles chosen. I have little interest in the geographical nature of the material, as the connections drawn are speculative in nature, rather than grounded in fact.
I have an issue with the actual content of the papers chosen, and your use of a particular semantic phrase to choose those papers. You yourself state in your paper that, “we cannot exclude the possibility that authors in more productive states simply tend to write the sentence “test the hypothesis” more often when they get positive results.” This is highly problematic and invalidates your conclusions, yet it’s also something that could easily be accounted for by using the simplest of scientific controls. If you truly feel that controls are not necessary in a scientific analysis, I can’t say anything beyond that. I have no desire to attack PLoS One, but I am offended by sloppy science, overinterpretation of results and poor implementation of the scientific method.
If noting that a set of data is likely biased (as you acknowledge is probable) is indecent or silly then I think we have different expectations from the scientific literature.
Returning to the original post, I disagree with the statement “ultimately, ‘publication bias’ is a paranoid topic.” While I am also not convinced that Daniele Fanelli has identified a new form of publication bias, I very much believe that other forms of publication bias have been identified.
Many empiric studies demonstrate various flavors of publication bias. Publication bias happens in the best of journals and the best of institutions. Regarding high tier journals, research results are overly optimistic. Ioannidis’s study (http://pubmed.gov/16014596) found that in major journals the results of 16% of high-impact studies were attenuated over time and an additional 16% of studies were refuted. As most medical journals use an alpha level of 0.05, one would expect only 5% of studies to be refuted. If one accepts the assertion in the original post that “higher-quality research” may be an explanation for positive results, then we should expect talented researchers to concentrate on promising projects and the rate of false positive, or type I, errors to be even less than 5%. Regarding higher tier universities, publication bias has been quantified (http://pubmed.gov/1727960). While it is possible that publication bias is less frequent in higher tiers of journals and institutions, I am not aware of a comparative study.
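The arithmetic behind that expectation is worth making explicit. Under the standard positive-predictive-value calculation, the share of significant findings that are true depends on the prior probability that a tested hypothesis is true, not just the alpha level; the three inputs below are illustrative assumptions, not measured values:

```python
# Positive predictive value (PPV) of a "significant" finding.
# All three inputs are illustrative assumptions, not measured values.
alpha = 0.05    # significance threshold (type I error rate)
power = 0.80    # probability of detecting a true effect
prior = 0.10    # assumed fraction of tested hypotheses that are actually true

true_positives = power * prior            # 0.08 of all tests run
false_positives = alpha * (1 - prior)     # 0.045 of all tests run
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.2f}")                 # prints "PPV = 0.64"
```

On these assumptions roughly a third of significant results would be false positives, which is one reason observed refutation rates can exceed the nominal 5% even before any publication bias enters the picture.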
While we can argue about the many causes of overly positive research, whether some of the causes are nefarious, and what the contribution of publication bias to this problem is, I disagree with calling publication bias paranoid. Before we label a belief as paranoid, we should investigate whether the belief might be true. Case studies of publication bias that are clearly nefarious (http://pubmed.gov/16908919) encourage more healthy skepticism of research results.
I’ll weigh in on this one. I think the publication bias talked about in this paper — that people only publish “positive” findings — is a paranoid topic. Other types of bias, not so much. But for most areas of science, a result that doesn’t pan out doesn’t merit taking the time to write it up, submit it, review it, and publish it. So, it’s usually quickly filed away. Thinking this “file drawer” holds a lot of amazing truths that would rectify injustices in the world? That’s paranoid.
I agree that lots of amazing truths are unlikely; however, we should find in the file drawer that some medical interventions are either less effective than previously thought, or are not effective at all. A well documented example of the former is in NEJM 2008, http://pubmed.gov/18199864. Examples of the latter might be glucosamine and chondroitin for the treatment of osteoarthritis (JAMA 2000, http://pubmed.gov/10732937).
In clinical medicine, negative studies can be quite important – for example the lack of benefit from arthroscopic surgery for DJD of the knee (http://pubmed.gov/18784099). In one analysis, negative studies were more enduring than positive studies (http://pubmed.gov/12069563).
Readers following this discussion might be interested in results that were later obtained using the same method and in most cases the same dataset, by testing new blind-collected predictors.
Expanding the sample back to the 1990s showed that the frequency of positive results has increased rather markedly over the years. Independent studies support this finding.
The growth was stronger in softer sciences and biomedical research, and the overall frequency of positives was higher in the United States than UK and other countries (Scientometrics http://link.springer.com/article/10.1007/s11192-011-0494-7).
A higher bias from the United States was also noticed in a few meta-analyses, and was replicated, by me, using the same proxy on a sample from 2008 and 2009. These results are at the moment only published as conference proceedings (Proceedings of the 13th COLLNET Meeting, Seoul, South Korea; http://collnet2012.ndsl.kr/wsp/conference/program.jsp).
Moreover, by simply recording the citations to what had been coded as positive and negative in the 2010 study, positives turned out to be cited much more – as would be expected under a “pressures to publish” model. This finding, incidentally, confirms that the proxy in question measures something real about these papers, which should solve some of the doubts expressed above.
Surprisingly, the social sciences and other “high-positive” disciplines showed little under-citation of negatives, suggesting that we might be getting something wrong about the whole publication bias issue (Scientometrics, http://link.springer.com/article/10.1007/s11192-012-0757-y).
Given the interest that writers on this page showed for publication bias and for my work, I am surprised they did not report on these later results, too.
Is it, maybe, because these results don’t support their hypothesis?