
Ibn al-Haytham, a 10th-century philosopher and an early voice defining what would become the scientific method, is quoted as saying:

Truth is sought for its own sake. And those who are engaged upon the quest for anything for its own sake are not interested in other things.

Yet science today seems increasingly interested in “other things,” from academic advancement to financial rewards. And the scientific publishing process seems more and more geared to abetting these practices as the number and capacity of outlets has exploded over the past decade.

Meanwhile, we continue to encounter scandals and shortcomings. A recent case involving a Dutch researcher named Diederik Stapel will likely lead to dozens of retractions. The first domino fell recently in Science. Stapel has relinquished his PhD in light of his misbehavior. Obviously, in the risk-reward calculation likely at work, Stapel felt the risks of being caught were outweighed by the rewards of publication and citation.

A recent commentary in the Chronicle of Higher Education seeks to put a happy spin on the situation. Entitled “Despite Occasional Scandals, Science Can Police Itself,” it boils down to the argument that despite dozens of fraudulent papers sitting in the literature for years, the perpetrator was finally caught; therefore, science can police itself. I’m not sure I accept that argument.

There are a few problems with this cheerful line of thought. First is the obvious “we don’t know what we don’t know.” How many papers in the literature are currently packed with explosives, waiting to detonate under our noses? It’s hard to tell. And aside from the scandalous, there are the more pernicious but individually less harmful uninteresting, unread, and uncited papers, turning the stepping stones of science into something more akin to an intellectual swampland. It’s hard for the police to succeed when they’re up to their waists in sludge.

It seems malfeasance is typically uncovered by a tip from someone close to a perpetrator. It’s not as if science policed itself — rather, someone fed up with a cheater’s charade, and success perpetrating it, finally blows the whistle, a major journal investigates, and months later, there is a retraction. Humans police humans, the same as if a drug kingpin had been narced out by a crony. There’s usually nothing noble about how “science” polices itself. Baser human motivations feed the fraud, and ultimately they tip off authorities.

Exaggeration is another form of careerism — hardening hints and shadows into declarative certainties, all for the sake of a higher impact publication event. This plays on the desire to believe that scientific publishing yields “truth,” which seems fairly prevalent. At the recent STM Innovations meeting in London, a semantic specialist decried the tendency of semantic vendors and text miners to say that their algorithms can reveal factual statements. Her far more modest and correct interpretation is that these techniques can identify claims. And a claim or assertion is far from a fact.

How far claims can be from the truth was the subject of a recent article in the Wall Street Journal [subscription required]. Entitled “Scientists’ Elusive Goal: Reproducing Study Results,” the article details how scientists at various companies are finding it difficult to reproduce results published in the literature, wasting time and money. Apparently, this is a dirty little secret coming to light:

“I was disappointed but not surprised,” says Glenn Begley, vice president of research at Amgen of Thousand Oaks, Calif. “More often than not, we are unable to reproduce findings” published by researchers in journals.

The article notes that the preference for positive findings, the pressure to generate grant funding, and the “publish or perish” mentality lead some researchers to cherry-pick results. The number of available outlets also makes publication a high-likelihood event.

Amgen isn’t the only company dealing with a soft scientific literature:

In September, Bayer published a study describing how it had halted nearly two-thirds of its early drug target projects because in-house experiments failed to match claims made in the literature. The German pharmaceutical company says that none of the claims it attempted to validate were in papers that had been retracted or were suspected of being flawed. Yet, even the data in the most prestigious journals couldn’t be confirmed.

The lesson once again is that science is done by humans and is prone to the failings of its practitioners and their institutionalized practices. A comment by a Wall Street Journal subscriber, David Eyke, is quite illuminating:

Modern management science tells us that if you want more of something, all you have to do is measure it. . . . The measurement alone – even if you do nothing else such as attach consequences to the values produced by the measurement – helps to massively improve the value of the output. . . . What we aren’t measuring is the REPLICATION rate of scientific work by scientists. We aren’t measuring it, nor are we publishing it widely. In other words, we tell the scientific community that we ignore their poor efforts and wasted research dollars. . . . This is because we are not measuring it. And management science tells us that if you don’t measure, you are going to waste a whole lot of money.

Perhaps we’re measuring the wrong things — number of publications, number of citations, impact factors of publication outlets — as a way of measuring a scientist’s productivity, which we then reward with money, either directly or indirectly. Perhaps we should measure how many results have been replicated. Without that, we are pursuing a cacophony of claims, not cultivating a world of harmonious truths.
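To make the proposal concrete, here is a minimal sketch of what a replication-aware metric might look like next to a raw citation count. The data, field names, and weighting are entirely hypothetical assumptions for illustration, not an established or proposed standard:

```python
# Hypothetical sketch: contrasting a raw citation count with a score
# that only credits citations to independently replicated findings.
# Field names ("citations", "replications") are illustrative assumptions.

def citation_count(papers):
    """Total citations, regardless of whether findings were ever replicated."""
    return sum(p["citations"] for p in papers)

def replication_score(papers):
    """Count only citations to papers with at least one independent replication."""
    return sum(p["citations"] for p in papers if p["replications"] > 0)

papers = [
    {"title": "Flashy result", "citations": 120, "replications": 0},
    {"title": "Careful result", "citations": 30, "replications": 2},
]

print(citation_count(papers))    # 150 -- rewards volume and buzz alone
print(replication_score(papers))  # 30 -- rewards only verified work
```

Under a raw citation count, the unreplicated “flashy” paper dominates; under the replication-aware score, it contributes nothing until someone confirms it.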

Kent Anderson


Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.


47 Thoughts on "Measuring the Wrong Things — Has the Scientific Method Been Compromised By Careerism?"

On the “policing” issue, the peer-review process is not designed to root out fraud but to evaluate manuscripts on the assumption that they report honest work. So it can take a while for problems to come to light; typically this happens when someone else tries to repeat an experiment.
Turning to that point, a scientific paper does not typically contain enough detail in its methods (or equivalent) section for a piece of work to be repeated. Journals/publishers can take several approaches here. The obvious one is to make it a policy that all data, materials, and methods be made available as a condition of publication of the article. Another is to provide a place for the author to upload details of protocols, and so on, linked to the paper, so that those who want to repeat the experiments can find the recipe conveniently. Funders and institutions also play a role, as they can make funding or employment contingent on these types of policy, or can maintain a strong archiving policy for “grey” work (work that does not get formally published but is done while the scientist is employed or funded by that institution).

Peer-review is a kind of policing. And I’m more concerned with the macro-environment, which measures things not related to good science but related only to pumping out papers. It creates an environment in which it’s even hard to know who has been published and when, so finding fraud is more difficult. Pickpockets work in crowds for a reason.

If a paper doesn’t contain enough information in the methods to support reproducibility, why publish it? Then my point is complete — careerism has defeated the scientific method, because publication supports advancement of the author but not advancement of science.

We are measuring the things that computers can count, and for good reason. Measuring things that only humans can count takes a lot of humans, which we don’t have to spare. In the extreme, replicating everyone’s work, so we can count the failures, would take something like half of all the scientists out of research. This is not a desirable end.

The point of doing science is not to count human successes or human failures, but to try and understand reality.

Understanding reality is something that only humans can do. Trying to rely on computers to count progress in something only humans can do can only end in failure.

As Penn Jillette writes in his book on atheism:

If every trace of any single religion died out and nothing were passed on, it would never be created exactly that way again. There might be some other nonsense in its place, but not that exact nonsense. If all of science were wiped out, it would still be true and someone would find a way to figure it all out again.

I think citations, made by humans, are a good measure of importance in science. They don’t measure progress, just the relative importance of people’s work.

But they aren’t replication, and that’s their potentially fatal flaw.

More specifically Kent, while I am all for taking reasonable steps to limit fraud, I see no reason to restructure the scientific enterprise to do so. I have no sympathy for the argument that some are making, that science is screwed up. My research work involves figuring out ways to improve science and accelerate progress via improved communication, but while improvement is always desirable, science as a whole is doing just fine as it is.

I agree – in pretty much every field there is pressure to pretty up the conclusions for maximum appeal, citation, and impact. Two of the proposed solutions – rewarding reproduction of findings and improving the methods sections of papers – are sorely needed, but unlikely in the current research and publishing environment, as implied. Unfortunately (and this is certainly not news to any regulars here), in most cases it is very difficult to publish work that is a simple replication of previously published work – these manuscripts fail the “novelty” test of most journals, reviewers, and editors, and are not terribly rewarding (career-wise) for those who are doing the work. This is aside from the fact that funding bodies want to fund new stuff, not reproduction of previous stuff. It’d also be great to include more details on methods, but then page limits relegate a full listing of methods to the supplementary information (at best). It’s a frustrating situation all around, and feeds into your broader point!

Andy, I think you’ve hit the nail on the head. Wouldn’t it be great if major scientific journals devoted a section to the reproduction of findings, both successful and unsuccessful?

There is a good and recent example where a study with extraordinary positive results (precognition) was published, and then replications of that extraordinary study, which showed negative results (absence of precognition), were refused publication by the same journal that published the extraordinary positive results. The disingenuous excuse given was that they didn’t have space to print duplicative studies.


This is not the behavior of a stakeholder and participant in the enterprise of understanding that is science, of a scientific journal that serves as the publication of record for scientific research. This is the behavior of a crass for-profit publisher of titillating fluff, offered merely for amusement.

If a journal won’t publish follow on results that address extraordinary results they have published (precognition really is an extraordinary result), it is so biased as to be useless to scientists and to everyone else.

From the link you provide, the journal in question is the Journal of Personality and Social Psychology, which is owned and published by the American Psychological Association, not a “crass for-profit publisher”. It doesn’t excuse the behavior, but please point your finger in the right direction.

Methods have been a casualty of the print and issue mindset. As volume of papers has increased, journals have looked for ways to get more papers in each issue without massively increasing page budget. To do this, many have drastically cut the level of detail of the Methods sections of their papers to a bare minimum. Some journals that do this do allow authors to upload detailed methodologies in the supplemental material, but very few readers generally access supplemental material. The solution here will likely occur as journals move away from print and the page budgets print imposes.

As for replication studies, the problem is less the availability of data and methods than it is the availability of time and funding. Try writing a grant that states that your goal is to repeat something that someone else has already done and see how well that’s received.

Nor, I suspect, do people want to pay big bucks for journals that provide nothing new. If results are really important, they get tested when people try to build on them. I do not believe there is a problem worth solving here. It sounds like a prescription for over-regulation, in the foolish hope of stamping out all human dishonesty.

‘New’, in this respect, is a relative term. The publication of novel findings is fantastic for the development of science, but only in a limited way. What use are novel findings, published in top-tier, high-impact journals, if we cannot know for sure whether they are replicable? Surely the point of publishing findings is to develop and increase our knowledge of reality, but we can only be sure that those findings are accurate and genuinely reflective of reality if they can actually be replicated. If the results from these high-impact journals consistently fail to be replicated, to what extent can we suggest that they are representative of reality? I would not necessarily suggest that these measures are being proposed to stamp out human dishonesty. Rather, the goal of the scientific method is to promote findings that can be considered scientifically sound and rigorous. Replicable findings, and the processes of replicating them, are the very things that can assure us (to an extent) of the rigour of these studies.

Perhaps deliberate fraud in science should be taken as seriously as deliberate fraud in any other industry — subject to criminal penalties. Seeing a few old colleagues go to the slammer might be a good incentive to stay honest in the journals. Also, university policies that provide strong protection to lab whistleblowers and that encourage anonymous reporting.

I suppose the way to do this would be through the funding agencies. If they’ve paid out large sums of money and been deliberately deceived, they’d be the likely ones to pursue charges (or damages).

There’s an assumption that once one commits fraud, one is permanently banished from science, but I’ve heard many stories about fraudsters ending up at other positions at other institutions. Might be an interesting study, seeking out post-fraud publications from the allegedly “banished”.

The rewards that come along with publishing need to be balanced with the risks and responsibilities of making inaccurate or fraudulent truth claims. When science is a career, there are real repercussions for being caught gaming the system. In the case of Stapel, he will likely never work again as a scientist, which is a pretty punitive consequence compared to the fate of investment bankers, for example. There is no golden parachute in science and this helps to keep most of us honest.

Would returning to the age of aristocratic gentlemen scholars engaging in science as a hobby be a better alternative?

This is an interesting, but surprisingly narrow, view of the phenomenon of selective reporting of scientific results. To view this as new, increasing, or related to modern methods of peer evaluation might cause us to seek ineffective solutions. My first real encounter with ‘inaccuracies’ in published data was as a young Master’s researcher over 20 years ago. I was undertaking a geological examination of an area in which little had been done for a few decades. Among the better-known and respected researchers, icons in the field who had published data from the area some 50-60 years earlier, a great many inaccuracies had been recorded, and they only came to light when the mothballed mines of the region were revived and new research began. At least one described borehole data to a depth never even reached by the borehole! Perhaps reward, scientific survival, and research have always influenced each other.

Interesting post Kent, but realistically, this is not an area where publishers can lead. We are in many ways a service industry, and the things we provide are based on the needs of our readers and authors. Any real change has to come from within the research community and then the publishing industry will react and try to best serve those new needs. Journal publishers aren’t inextricably tied to the impact factor or any other metric, but we do focus on them because that’s what’s important to our customers.

It’s not a simple problem to solve, though. You’re talking about a wholesale restructuring of the nature of science funding, science careers, etc. If you want to base career advancement and funding on replication, then who does that replication? What’s their motivation for doing so? How do you pay for that replication without taking money away from original research? Does devoting a huge amount of time and effort to redundant studies slow the pace of discovery? Wouldn’t you want to be the person doing original research that others replicate rather than the copycat?

Not an area where publishers can lead? I beg to differ. Is ISI a publisher of data? Yes. Could we publish data about reproducibility? Yes. “Publishing” has a broad scope, from argumentation through presentation, and publishers can catalyze activity throughout that range. We aren’t just middle-men for authors to use to reach readers. We are supposed to do more than that. If we’re truly just asking authors if they want fries with that, count me out. Publishers have a role in scientific communication. That role includes some level of responsibility, and plenty of opportunities around better information and better measurement approaches.

That said, I agree, it’s not a simple problem, and it’s far too “baked in” today. But we know incentives drive behavior. It’s what we incentivize that matters.

Are you suggesting that publishers start funding replicative studies? Without funding, I don’t see how these studies are supposed to get done, how we’re supposed to get the data you want about reproducibility. I think we can demand a certain level of rigor and provide opportunities (I’m working on a new journal right now that will welcome replication studies, at least as a percentage of the total articles published), but we cannot drive the actual research unless we’re the ones paying the bills.

What incentives would you offer if you don’t mean funding? If Nature declares that you can’t publish a paper unless you have an independent lab that’s replicated your results, then people will just stop publishing there. As David Wojick notes above, if every study has to be done at least twice, then that’s essentially halving the budget for original research. I don’t think that’s something that publishers are going to have the power to decide.

Well, if you thought the debate over control of science was heated now, just wait until publishers start trying to control the research agenda…

I’m suggesting publishers might publish data bragging about how many of their studies have been replicated, not just cited; publishers might create data services that are about data publication and focus on quality, replication, and the legacy/connection of ideas, not just articles; etc.

We can do that, but without the actual replication studies, we’re not going to have much to show. And without funding, I don’t see how those replication studies are likely to be done.

I agree that there are many problems with the current scientific research and publishing process. We recently did a piece highlighting the types of errors that come up just within the biomedical computing field (http://biomedicalcomputationreview.org/7/2/index.html).

Measuring replication is an interesting solution, and though limited funding and resources would be logistical challenges, I wanted to point out that there is at least one group over at the NIH that is funding replication studies – Facilities of Research Excellence in Spinal Cord Injury (FORE-SCI, http://www.ninds.nih.gov/funding/areas/repair_and_plasticity/fore_sci.htm). Their justification is that research results aren’t moving into the clinical realm because they haven’t been replicated, and they are unable to identify which research avenues are truly promising. So I would argue that at least in some fields, there is a strong motivation for replication studies.

I agree that there is strong motivation, but the logistics are a bit mind-boggling. For computational studies, there’s obvious benefit in publishing the data and the algorithms used so other can both replicate and extend the work. But for other types of science, replication isn’t quite that easy.

Slight variations in cell lines, temperature, humidity, and reagents can make major changes in cell biological experiments. I know of several groups growing a developing organ in three-dimensional matrigel cultures who could not replicate each other’s experiments despite all following the same protocols. After close to a year of troubleshooting detail by detail, the issue came down to the collagen being used. It all came from the same manufacturer, but each lab was using different lots. By working with the supplier to purify and regulate the components of the collagen, they were finally all able to get on the same page.

This is not atypical: studies showing differentiation of stem cells into different tissues have been plagued with similar inconsistencies. If you extend this out across all of biology, you’re talking about a monumental effort. A worthy one, to be sure, and great evidence for why detailed methodologies should be published, but it does present an enormous undertaking.

There is another kind of ‘policing’ that pervades the whole community and is more nuanced than the guilty/not guilty detection of fraud, and this is reputation. Some researchers develop a reputation for overselling their results and cherry picking their numbers, and their papers are given less weight. Scientists who are clearly being very careful in how they conduct their studies develop a reputation as reliable and their papers do count for more.

I’m reminded of a colleague who described a particular scientist’s approach as, “I’ll see it when I believe it…”

Either a fraud is not discovered by investigations, or he is discovered by investigations.

Suppose he had *not* been discovered by investigations: say he turned himself in, with no investigations and no one the wiser. Wouldn’t you then say that was a damning indictment of science, that it had no idea about all his fraudulence and would have left it untouched for who knows how long? Sure, that’s reasonable – if that isn’t evidence against science, nothing is.

Now, you say, him being discovered by investigations and exposed as a fraud is also evidence against science.

So it seems that regardless of whether he is discovered via investigation or not, a fraud is evidence against science. Despite the usual understanding of science as a noisy self-correcting process from which error and fraud cannot be eliminated.

I see.

The point is that it wasn’t the scientific method that exposed him. Most likely, it was someone who knew he was up to no good who told someone else. With the high rate of studies that can’t be replicated (or that aren’t worth replicating), it’s hard to use the scientific method to expose fraud. So science isn’t self-correcting in the way we’re practicing it. Who knows how many studies are poorly done, non-replicable, fraudulent, or overblown? We can’t use science to find out right now because we’re focused on other things (citations, advancement, etc.). If we were focused on scientific truth, things might be different. As it is, we have to rely on jealousy, academic in-fighting, competitiveness, and so forth, more than on science.

Er… Why are you speculating ‘most likely’? This is all on the public record, is it not?

> In late August, three young researchers under Stapel’s supervision had found irregularities in published data and notified the head of the social-psychology department, Marcel Zeelenberg.


Smells like science to me – peer review, replication, investigation of anomalies, etc.

If you look across the literature, many cases, this one likely included, had inquests and re-analyses spurred by envy. This “wunderkind” certainly inspired jealousy. If he’d published less, at a slower rate, and not been in the spotlight, his papers might still be out there — which was one point of the post (i.e., science isn’t catching the errors; base human emotions, predicated on other factors, are spurring action). The Nature news story didn’t say, “In July, building resentment toward Stapel led some of his colleagues to want to deflate his reputation.” But I think that sentence is fair to assume.

There are cases where fraud persisted in the literature for years, if not decades, because the researcher(s) produced little more than a handful of compelling reports over a few years. There are also cases in which entire lines of inquiry have been based on faulty use of references without actual science to validate or replicate. Not all of the problems I’m talking about in the post are about fraud. Many are based on the fact that we count the wrong things. Fraud is one problem of this. Lazy science is another.

Yes, a fraud is evidence against science. Obviously. The _absence_ of a fraud would be evidence in favor of science.

And regarding him being discovered, the question is not what evidence his being discovered at all provides, but what evidence is provided by his being discovered in such a late and lucky way.

> Yes, a fraud is evidence against science. Obviously. The _absence_ of a fraud would be evidence in favor of science.

Really? You think it’s that simple? Imagine a year with no fraud discovered in any way. Obviously a good thing, since you say fraud is only ever evidence against science, regardless of how the fraud is discovered.

And every year without fraud, your faith increases.

Meanwhile, I’m over here saying ‘no news is bad news – this is impossible, all this means is that the fraud discovery mechanisms have broken down so completely that no fraud ever gets discovered, and the longer this goes on, the more pathological science has become.’

There is an optimal level of fraud being discovered, and just as too much fraud is bad news, too little fraud is bad news too.

Peer review is broken. It can be fixed by introducing continuous peer review to replace one-time review at publication, open peer review rather than review by just 2-3 colleagues, forced complete disclosure in scientific communications (something I argued for 5 years ago in http://alexbacker.pbworks.com/w/page/1721033/On%20Complete%20Disclosure%20in%20Scientific%20Publications), and live, wiki-based papers rather than dead, static ones. I argued that most scientific papers were wrong 5 years ago (http://alexbacker.pbworks.com/w/page/1721004/Most%20Scientific%20Papers%20Are%20Wrong%3A%20The%20Need%20for%20Live%20Papers%20and%20Ongoing%20Peer%20Review) and argued for this change 7 years ago (http://alexbacker.pbworks.com/w/page/1720833/A%20Future%20for%20Scientific%20Publishing).

If science only means ‘to publicize’, what is the difference between science and journalism?

Nobody is claiming that. Communication is vital because science is a cumulative social diffusion process, but it is only about 5% of the overall effort. As for the difference, it is between me reporting my work and you reporting it.
