Data always sound more objective than stories. We should let the data speak for themselves, we’re told. However, a recent controversy involving the Tesla electric car and a review in the New York Times shows that letting the data speak for themselves is fraught with difficulty — that is, who knows what they’re saying?
Earlier this month, John M. Broder, a journalist at the Times, took a Tesla Model S on a long road test. At some point he needed to recharge the car’s batteries; he claims he couldn’t find a charging station, drove around looking for one, and the car ran out of juice. Elon Musk, based on data the car collected, claims Broder drove around a parking lot intentionally trying to exhaust the batteries.
The battle was shaping up — Musk’s exhaustive telemetry, speed, and battery status data versus a journalist’s story. And there were millions of dollars at stake.
When the dust settled, it turned out that neither was sufficient to tell the story accurately.
Musk’s data fit a couple of possible scenarios, one of which Broder claimed. Broder’s notes were incomplete, making his subsequent account seem more like an excuse than the full story. Broder was rightly criticized for this by the Times’ Public Editor a few days later.
Broder’s story became more detailed as he was pushed to divulge exactly what happened, and he wrote a blog post showing how the data Musk had access to weren’t complete. For instance, temperature data — Connecticut, where Broder drove for much of the time, was in the midst of a severe cold snap — weren’t recorded. Nor were the calls between Broder and Tesla personnel.
. . . we actually have no idea, based on the interpretation of the review data released by Tesla, which narrative is true. All the data show is a car driving around a parking lot. And herein lies the principal lesson from the whole Tesla affair: Data are laden with intentionality, and cannot be removed from the context in which they were derived. We do not know, from these data alone, what happened in that parking lot.
Data can help tell a story, but the story can’t necessarily be derived from the data alone. And big data don’t replace the right data, as Nate Silver showed us during last year’s US elections. Signal and noise are present — in data, in stories, and in the combination of the two.
This also points to an important distinction I’ve heard a wise person make before — that scientific papers contain claims, not facts. Facts come later, and some claims never advance beyond being merely strong claims. Remembering this is vital to doing text mining correctly. You can’t have triplets identifying facts, but you can have triplets identifying claims.
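To make the distinction concrete, here is a minimal sketch of the difference between a bare fact triplet and a claim triplet. The `Claim` dataclass, its field names, and the hedge labels are all hypothetical illustrations, not any established text-mining schema; the point is simply that a claim carries its source and its degree of assertion along with the triplet.

```python
from dataclasses import dataclass

# A bare (subject, predicate, object) triplet treats an extracted
# statement as a fact, with no record of who asserted it or how.
fact = ("Model S", "ran_out_of_charge_in", "Connecticut")

# Recording the same statement as a *claim* keeps the context it
# came from: who said it, and how strongly it was asserted.
@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str
    source: str   # who made the claim (hypothetical labels)
    hedge: str    # e.g. "asserted", "suggested", "denied"

claims = [
    Claim("Model S", "ran_out_of_charge_in", "Connecticut",
          source="Broder review", hedge="asserted"),
    Claim("Broder", "intentionally_drained", "battery",
          source="Musk blog post", hedge="asserted"),
]

# The two claims conflict. A fact table could hold only one of
# them; a claim table holds both and defers the judgment.
for c in claims:
    print(f"{c.source} {c.hedge}: ({c.subject}, {c.predicate}, {c.obj})")
```

Notice that the two claims above contradict each other, which is exactly the Tesla situation: a claim store can represent the disagreement, while a fact store would have to pick a winner prematurely.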
Ultimately, the best storytellers using the best data tell us the most. When a journalist’s notes fail and an inventor’s data aren’t complete, we’re still left wondering exactly what happened. And even if the notes and data were both complete, we might still be left with competing claims.