A few years ago, I was sitting at my desk at a well-known American university as snow fell silently outside my window. A knock on the door stirred me from my contemplation. A grad student I occasionally worked with apologised for interrupting my thoughts and asked if he could borrow the analysis computer that sat in the corner, its overworked twin CPUs gently heating my office.
“Why did you not tunnel in with SSH like I showed you and use it from your own desk?” I noticed the principal investigator of the lab appear behind the student and instantly regretted the inquiry.
“You mean like remote control? That’s cool, you’ll have to show me that sometime.” I shook my head in sad resignation and gestured for them to go ahead.
Young Tim’s (that wasn’t his name) first scholarly article was nearing completion, a publication that he desperately needed in order to complete his Ph.D. and start the postdoctoral fellowship he had been conditionally offered. A successful result was vital to the young man’s endeavours. At a prestigious university such as this, success was expected.
The professor stood over Tim’s shoulder asking pertinent questions about the various false-colour images of brain slices being called forth from the machine’s silicon workings. Tim dutifully explained the analysis he had recently done and how he’d finally managed to tease out the result he’d been looking for all these months. There were just one or two minor details that needed another set of eyes before we could declare that we’d finally found the publishable result that would earn Tim that coveted Ph.D.
Within a few minutes, I grew concerned. It became clear to me that, entirely unintentionally, the experiment had been constructed and the analysis done in ways that all but guaranteed the desired result. This wasn’t p-hacking or any simple statistical sin. What had happened here was experimenter effect. Due to the nature of the experiment, it was impossible to blind the samples: even a cursory examination with the naked eye told you which samples were which.
In the gentlest way that I could manage, I talked them both through what I saw and why I was sure the results weren’t valid. At first they didn’t believe me, so I took some of the data, tweaked the analysis parameters a little and changed the apparent result, proving that I could pretty much make it come out any way I wanted. There was no statistically valid way to know if the effect was real. At some point, the realisation became unavoidable. The professor walked to the opposite side of the room, leaned on the wall and put his head in his hands. The graduate student slumped in his chair, defeated, tears welling in his eyes. I knew I’d done the right thing, but it didn’t stop me feeling like I’d single-handedly destroyed this gifted young man’s immediate career plans.
What happened next I will remember for the rest of my life, because it changed the way I think about how science is practised. The professor rallied and proposed another analysis strategy, one that could very well be the correct way.
The grad student stopped looking at his shoes and desperately began to re-analyse the data. The result didn’t come out right, so another idea was floated, then another. Eventually, a consistent criterion was found that preserved the needed result. The images looked good, the graph came out right, and it was a consistent analysis protocol, applied to all samples. Even the statistics came out okay, so the problem was solved.
Only it wasn’t really solved. The approach we eventually settled on might have been the correct way to do the analysis, but we chose it because it gave the answer we wanted. You can’t reverse-engineer an analysis protocol from the result you’re after; even if you apply the same protocol to every sample or data point, choosing it that way invalidates the statistics. I nearly spoke up, but then I made eye contact with the professor.
Deep down, he knew, as I did, that we hadn’t proved that our result was robust, but there were other things to think about. This one experiment was part of a bigger research project. We had other data that supported the overarching story of the science we were trying to tell. The probability was that, while this experiment may or may not have technically failed, it probably should have worked, so our analysis protocol was probably right. Right? To let this result enter the literature wouldn’t have a significant effect; it wouldn’t cause anybody to go down an experimental path they weren’t going to anyway, and it didn’t rule out any life-saving approaches. There was no point being puritanical about it.
That particular experimental technique was notoriously confounded. It would be unfair to punish the grad student because the technique created results that were difficult to interpret; he was only doing the project he was given. If we were to say that we couldn’t publish the data because it wasn’t completely robust, who knows how long it would have taken him to get this incredibly difficult experiment to work? A year? Maybe two? Perhaps it never would, and he’d have to start again with a new project. He’d certainly have lost that highly prestigious postdoc position, and who knows if he’d have found another. To prevent that article from being published, seriously delaying his graduation or even keeping him out of academic science altogether, would have been unfair and a loss to the academy.
So, the question that I ask you, dear reader, is: What would you have done?
Note: Almost all of the details in the above story are made up. It wasn’t even snowing; it was June.