In the midst of a hyperactive media landscape even more super-heated by Presidential politics, I find myself turning to the excellent *New York Times* blog called *FiveThirtyEight,* which I also followed in its independent days back in 2008. Run by statistical whiz Nate Silver, *FiveThirtyEight* offers sharp analyses and superlative number crunching. In 2008, Silver was the first to detect John McCain’s demise in the polls, and he called all but one race correctly; the one he missed fell within the margin of error of even his highly tuned statistical models.

Visiting the *FiveThirtyEight* blog and its data sets puts the hue and cry of polls into perspective. It separates the signal from the noise — which leads us nicely into Silver’s new book carrying a similar title.

“The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t” is an excellent read, and I found many of the issues contained within it germane to our work, as it touches on some publishing trends (Big Data) and statistical issues (uncertainty vs. risk, the importance of hypotheses, the danger of extrapolating from a few data points).

Silver has interesting things to say about Big Data, for instance. While some have been focused on the mechanics of Big Data — building the infrastructure, for instance — Silver has the practitioner’s pragmatism. His concern is that we may be seduced into thinking that more data is the same as better data or the right data:

. . . our predictions may be more prone to failure in the era of Big Data. As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate. . . . there isn’t any more truth in the world than there was before the Internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space.

He left me thinking, “With more data, we’ll have more correlation, but not more causation.” There is more to this, of course, and Silver covers it well.

Silver has great examples of how nonsense correlations can endure for a long period of time — for instance, the correlation between stock market performance and whether an original AFC vs. NFC team won the Super Bowl lasted for 30 years or more. Yet, these correlations are clearly irrational. Instead, Silver reminds us of the power of the right idea over more data.
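To make the point about nonsense correlations concrete, here is a minimal sketch of my own (not from the book). It assumes a 30-year "target" series of pure random noise, then generates more and more unrelated random series and reports the strongest correlation found purely by chance. The function names, the seed, and the Gaussian noise are all illustrative assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_candidates, n_years=30, seed=0):
    """Largest |r| between a noise target and n_candidates noise series.

    Every series is independent random noise (an assumption of this
    sketch), so any correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# More hypotheses examined means a stronger "best" nonsense correlation.
for n_candidates in (10, 100, 1000, 10000):
    print(n_candidates, round(strongest_chance_correlation(n_candidates), 2))
```

Nothing in the data changes as the loop runs; only the number of hypotheses examined grows, and the best-looking correlation grows with it.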

But more data can help in some cases, especially weather forecasting, which has improved scientifically a great deal over the past few decades. However, human foibles continue to work against science. In the case of weather forecasting, local weather people tend to deviate from the science, making their forecasts “wetter” than they should be. Their rationale? They’d rather predict a shower that never arrives than miss one that ruins a picnic. If the forecast shower never materializes, the picnic-goers have a happy day; if a stray, unforecasted shower pops up, they blame the weather forecaster.

The section on how scientific statistics have become quite unreliable is worth the price of admission in and of itself. Silver writes admiringly of the Bayesian approach to prior probability, then contrasts this with the practices of “frequentism,” the statistical approach we’re most familiar with in polls and scientific papers:

. . . [the frequentist] methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of prior probability. Thus, you will see apparently serious papers published . . . which apply frequentist tests to produce “statistically significant” (but manifestly ridiculous) findings.
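The role of the prior can be shown with a short calculation. This is my own simplified sketch, not Silver’s: it assumes a study with 80% power and a 5% significance threshold (both illustrative numbers), and asks how likely a “statistically significant” finding is to be true, given how plausible the hypothesis was beforehand.

```python
def posterior_true(prior, power=0.8, alpha=0.05):
    """Probability a hypothesis is true given a significant result.

    prior: plausibility of the hypothesis before the study (0 to 1).
    power: chance of a significant result if the hypothesis is true
           (0.8 is an assumed, illustrative value).
    alpha: chance of a significant result if it is false
           (0.05 is the conventional threshold, assumed here).
    """
    true_positive = prior * power
    false_positive = (1 - prior) * alpha
    return true_positive / (true_positive + false_positive)


# The same "significant" result means very different things
# depending on how plausible the hypothesis was to begin with.
for prior in (0.5, 0.1, 0.001):
    print(f"prior={prior}: posterior={posterior_true(prior):.3f}")
```

With a coin-flip prior, a significant result is quite convincing; with a one-in-a-thousand prior, the same result still leaves the hypothesis very unlikely, which is how a test can be passed by a manifestly ridiculous finding.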

The section on chess is fascinating, as well.

“The Signal and the Noise” is full of interesting stories and strong analysis. It’s not a book for a real statistics jockey — Silver doesn’t provide formulae or long tables in the main text. It’s a book for a fan of statistics and statistical thinking, a person who tries to make sense of things. The graphs are used judiciously, and tell stories well. But don’t expect a high-level statistics course here, something some readers have complained about.

The book is also quite current. Silver discusses earthquake predictions, and the scientists who were, at the time, on trial in Italy for failing to predict an earthquake and for downplaying a bogus prediction. When the bogus prediction jibed with events, the scientists were charged with manslaughter. The book came out before the verdict, which was released this week: the scientists were found guilty and sentenced to prison. After reading this book, you’ll know what a miscarriage of justice and statistics this ruling was, and why other Italian scientists are resigning in protest.

Written in a very readable and modern style, “The Signal and the Noise” is recommended for anyone who wants to wrap their head around the increasingly data-driven but noisy world we inhabit. Reading it might just help you separate the signal from the noise to greater effect.

It is rare, refreshing, and ultimately effective when statistics are given context and not used to bolster the status of assertions so they appear as facts (there’s a Halloween costume!).

More data does not always mean we are better informed.

Discoverability is a word we hear often in regard to the future success of published material. Will separating the legit stats/info become easier?

I think the beholder will require more training and better insight to sort through the deluge to achieve worthwhile discoverability.

Viable discoveries.

I do not share his love of Bayesian prior probabilities, which are subjective, merely the expression of feelings in probabilistic language. Frequentist probabilities are objective measures derived from probability theory. Ruling out implausible correlations is important, but it is not probabilistic. It is also not easy.

Yes, he addresses the “subjective” complaint quite convincingly. I came away feeling that science is ultimately more Bayesian (slowly gathering evidence, building on the shoulders of giants, incrementing your way to something resembling the truth), and there’s also a lot to be said for getting your assumptions out on paper. Frequentism seems to hold that you should be able to calculate a priori the chances of something happening, without context, and with your assumptions unspoken or any prior evidence unaccounted for. And “statistically significant” continues to mislead us.

What is misleading is that these subjective, metaphorical probabilities are often published as though they were mathematical facts, which they are not. They support fads and groupthink. If people misunderstand objective statistical significance, that is a fixable problem, and one I have seldom seen, by the way.

And the “mathematical facts” are often published as if they have real-world significance. There’s a lot to be said for doing it right, and that takes a bit of both, and a lot of integrity.