One of the success stories of a digital rebirth has been the Atlantic, once a trusted but fusty magazine that has done a great job of reinventing itself as a relevant and vibrant digital property, generating a renewed audience in both print and online media.
But journalism these days is highly variable, with a lot of pressures pulling at it, many of them financial. The Guardian — a newspaper that has embraced free content debates and business models — has not made money since 2003, and is drawing down its charitable trust rapidly. (Its recent coverage of the fantastic success of the paywall-based Financial Times — which has now made a profit every year since 2005 and has its largest readership ever, most of it online — is noteworthy for obvious reasons.) Dozens of papers have folded as technology competitors have eaten away at their margins, forcing them to cut editors and eliminate or consolidate beats over the past decade. Things continue to balance on a knife’s edge for most newsmagazines and newspapers, so it’s nice to see a success story.
But even at the Atlantic, you see some of the problems with modern journalism: a lack of rigor and a paucity of editors sometimes rear their ugly heads. Recently, an article at the Atlantic entitled “The Great Sieve: This Is What Browsing Scientific Research Looks Like,” highlighting PLoS ONE data, proved that even data (maybe especially data) need to be reported on with care and thoroughness. Here’s the graphic of the data in question:
The graphic is derived from an interesting set of data, but the Atlantic article is superficial and not incisive. The title alone gives that away — one mega-journal’s data representing how all scientific publishing works? With a definitive tone (“This Is!”)? And “research” is the same as “publications”? We’ll move on and give the headline writer a break.
I first started picking at this story when I found an error in the article’s link to the underlying blog post by Martin Fenner — it initially linked to a story by the same reporter about Greenland’s ice melt. I emailed the reporter about the error, and she fixed it promptly, but an editor should have caught this (and the reporter shouldn’t have made it, of course).
But that’s not the real problem.
The real problem is that the reporter doesn’t grapple with the data or its graphical representation. Instead, she goes off on a tangent about the efficiency of online information retrieval (quoting the abstract of a Science article) and mentions only some limitations of the dataset. One of those remarks (“. . . we don’t know, for example, how this sieve compares with offline searching . . .”) shows that she doesn’t understand that PLoS ONE is an online-only mega-journal. The story ends with a surprising bit of cheerleading:
And also, props to PLoS ONE. Fifty-three million-odd pageviews for some 37,000 papers is no joke. The number of downloads and Mendeley readers are likewise impressive. If there’s any real take-away here, it’s that their open model is attracting tons of readers and discussion . . .
This conclusion reveals a lack of analysis and context, limitations probably stemming from the lack of time and attention the reporter devoted to it — Fenner’s piece was published on July 24, 2012, as was the piece in the Atlantic. Fenner is in Europe, so he might have published his very early in the morning US time. Nonetheless, the Atlantic piece was published during the lunch hour of the same day. For me, writing this post took about three days of work, off and on, probably a total of eight hours, believe it or not. I emailed people for information, tried to find benchmarks, and so forth. So it’s no wonder the Atlantic piece is comparatively superficial.
In the Atlantic article, even basic math isn’t done to see what’s driving the numbers, and the reporter exhibits no skepticism and does no probing for context and perspective. For instance, if the “Oh wow!” tone had given way to something that parsed the data a bit, such as “The 37,267 articles generated on average 1,468 views each, and fewer than 400 PDF downloads each,” we might have a different impression of these data. If we had benchmark data from another source (say, a major publisher of toll-access journals or another OA journal publisher), we’d have an even better sense of whether these data are impressive or mediocre. The truth is, we just can’t tell. And if instead of 145,296 CrossRef citations, we were told that the papers averaged fewer than four CrossRef citations each, at least we’d be making some journalistic headway.
According to CrossRef data (freely available online), 18 million DOIs (out of 55 million) have references deposited. These 18 million generate 306 million cited-by links, and 23 million articles have at least one cited-by link pointing to them. Potentially, the average across this large dataset is 13 citations per article (306 million cited-by links to 23 million articles). As a potential benchmark for the PLoS ONE CrossRef citation data, this seems worth including in coverage. But even comparing this to the PLoS ONE data is problematic, because the CrossRef data span many more years, while the PLoS ONE data span only from late 2006, and, as we’ll see, most of the PLoS ONE articles were published too recently for much citation to have accumulated.
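The per-article arithmetic discussed above takes only a few lines to check. Here is a minimal sketch that uses nothing but the totals quoted in this post; the function and variable names are mine, chosen for illustration.

```python
# Totals as quoted in this post; nothing here comes from a fresh download.
PLOS_ONE_ARTICLES = 37_267             # articles claimed in the graphic
PLOS_ONE_CITATIONS = 145_296           # CrossRef citations claimed in the graphic
CROSSREF_CITED_BY_LINKS = 306_000_000  # "306 million cited-by links"
CROSSREF_CITED_ARTICLES = 23_000_000   # articles with at least one cited-by link


def per_item(total: float, items: float) -> float:
    """Average of `total` spread over `items`; refuses a non-positive count."""
    if items <= 0:
        raise ValueError("items must be positive")
    return total / items


plos_avg = per_item(PLOS_ONE_CITATIONS, PLOS_ONE_ARTICLES)
crossref_avg = per_item(CROSSREF_CITED_BY_LINKS, CROSSREF_CITED_ARTICLES)

print(f"PLoS ONE: {plos_avg:.2f} CrossRef citations per article")
print(f"CrossRef-wide: {crossref_avg:.1f} cited-by links per cited article")
```

Note that the two averages are not built the same way: the CrossRef figure divides only by articles that have at least one cited-by link, while the PLoS ONE figure divides by every article, cited or not. That alone pushes the benchmark upward, on top of the time-span problem.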
But there’s even more a reporter could do, because the PLoS ONE data are publicly available. Verification is one such step. Exporting and isolating the PLoS ONE data, I found the number of articles in the dataset (downloaded July 31, 2012, at 7 p.m. ET) to be about 9% lower than the number claimed in the graphic (33,782 vs. 37,267). This is noteworthy, as the timeframe stated in the graphic is “until July 2012” (which I interpret as “through June 30, 2012”). In other words, an additional month of data yielded fewer articles when I exported the data and analyzed the PLoS ONE slice.
Problems continued. The total CrossRef citations in the dataset downloaded at the same time were about 18% lower than the number claimed in the graphic (119,316 vs. 145,296). This puts the average CrossRef citations at about 3.53 per article in the set.
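For anyone who wants to check the size of these gaps, here is a small sketch of the calculation. It uses only the four figures quoted above; the helper name `shortfall_pct` is mine.

```python
def shortfall_pct(claimed: int, found: int) -> float:
    """Percent by which `found` falls short of `claimed`."""
    return (claimed - found) / claimed * 100


# Figures as quoted in this post: graphic's claim vs. my export.
claimed_articles, found_articles = 37_267, 33_782
claimed_citations, found_citations = 145_296, 119_316

article_gap = shortfall_pct(claimed_articles, found_articles)
citation_gap = shortfall_pct(claimed_citations, found_citations)
avg_citations = found_citations / found_articles

print(f"articles: {article_gap:.1f}% lower than claimed")
print(f"citations: {citation_gap:.1f}% lower than claimed")
print(f"average: {avg_citations:.2f} CrossRef citations per article")
```

The gaps are measured against the graphic's numbers, since those are the claims being tested. Measuring against my export instead would give larger percentages for the same raw differences.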
Now, I’ll admit, I haven’t had independent verification of my analysis, but here’s what I did — I downloaded the PLoS dataset, saved it as a .csv to get rid of the embedded sorting tools, manually and carefully isolated the PLoS ONE segment by deleting all the other data related to other journal titles, saved this data subset separately, and tallied from there. I did it twice, to make sure I didn’t make a mistake, and the numbers came out the same both times. I’m sure Martin Fenner did a more sophisticated data dive, but I still find it interesting that there are discrepancies between the two. Did the reporter bother to confirm the numbers in the few hours she had, given the public availability of the data?
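The manual steps described above could also be scripted, which removes the risk of deleting the wrong rows by hand. What follows is only a sketch: I have not reproduced the real export's layout, so the column names `journal` and `crossref` are assumptions, and the sample rows are made up purely to show the mechanics.

```python
import csv
import io

# Made-up sample rows; the column names "journal" and "crossref" are assumptions.
SAMPLE_CSV = """doi,journal,crossref
10.0000/example.1,PLoS ONE,4
10.0000/example.2,PLoS Biology,12
10.0000/example.3,PLoS ONE,0
10.0000/example.4,PLoS ONE,7
"""


def tally_journal(csv_file, journal, journal_col="journal", cites_col="crossref"):
    """Return (article_count, total_citations) for rows matching `journal`."""
    count = 0
    total = 0
    for row in csv.DictReader(csv_file):
        # Case-insensitive match so "PLoS ONE" and "PLOS ONE" both count.
        if row[journal_col].strip().lower() != journal.lower():
            continue
        count += 1
        # Treat a blank citation cell as zero rather than failing.
        total += int(row[cites_col] or 0)
    return count, total


count, total = tally_journal(io.StringIO(SAMPLE_CSV), "PLoS ONE")
print(count, total, round(total / count, 2))
```

To run this against a real export, you would pass an open file instead of the in-memory sample (for example, `open("export.csv", newline="")`, where the filename is hypothetical) and adjust the two column names to match the file's header row.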
But even if the numbers matched, there’s plenty of nuance to be had, and the graphic conceals almost all of it. As we all know, citation datasets are highly skewed toward a few big articles, with a long tail of less- or non-cited articles. It also takes a few years for citations to appear, as noted above. The PLoS ONE dataset is no different. In the case of the PLoS ONE data, 194 articles (0.5% of the articles in the set) account for 10% of the citations. This isn’t surprising, because 18,544 PLoS ONE articles (54%) have been published since January 2011, which gives very little time for citations to accumulate. Again, this is a major limitation of the dataset which is not represented in the chart or in the Atlantic reporting. The chart also fails to mention that the dataset’s first element is dated December 20, 2006, a helpful landmark for the user/reader. And the chart also conceals the fact that more than 400 articles published between 2006 and 2009 have zero CrossRef citations. (Expanding this to include 2010 articles, and the list of articles without a CrossRef citation grows to more than 1,300.) Also, the discrepancies between CrossRef citations, PubMed Central citations, and Scopus citations aren’t mentioned, despite sizable variations between the three (CrossRef tends to be higher than PubMed Central by a notable amount, and Scopus tends to be higher still).
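Both the skew and the zero-citation counts described above are easy to compute once the per-article data are in hand. This is a sketch on toy data, not the PLoS ONE export: the records below are invented, and the function names are mine.

```python
def articles_for_share(citations, share):
    """Smallest number of top-cited articles whose citations reach `share` of the total."""
    total = sum(citations)
    if total == 0:
        return 0
    running = 0
    # Walk from the most-cited article downward until the share is reached.
    for n, c in enumerate(sorted(citations, reverse=True), start=1):
        running += c
        if running >= share * total:
            return n
    return len(citations)


def uncited(records, first_year, last_year):
    """Count (year, citations) records in the year range that have zero citations."""
    return sum(1 for year, c in records if first_year <= year <= last_year and c == 0)


# Invented (publication_year, crossref_citations) pairs, for illustration only.
records = [(2007, 50), (2008, 0), (2009, 3), (2009, 6),
           (2010, 0), (2011, 1), (2011, 0), (2012, 0)]
citations = [c for _, c in records]

print(articles_for_share(citations, 0.10))  # articles needed for 10% of citations
print(uncited(records, 2006, 2009))         # uncited articles, 2006 through 2009
print(uncited(records, 2006, 2010))         # same, extended through 2010
```

Reporting the first number alongside the total article count is what yields a statement like "0.5% of the articles account for 10% of the citations."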
Another problem is that there are mixed data elements in the graphic, including different types of social media interaction data. Fenner makes an observation in his blog post about what he believes is a lower-than-desired amount of social Web commentary on scientific articles, one that comports with PLoS advocacy:
. . . there are still too many barriers for scientists to take part in the informal discussion of scholarly research on the web, in particular as comments on journal websites.
This implies that barriers to discussion are somehow imposed on the scientific community — yet, the barriers to robust online Web discussions about science are imposed within the scientific community, by the very incentives of science. Those incentives are not a barrier to discussion — they are a sign of self-discipline. Time taken away from doing research, getting grants, reading research, and keeping things together on big complex projects is time wasted. This is not a problem.
There is also a mix of signals in the social Web data, which carries over into Fenner’s graphic. A Facebook “like” is qualitatively different from a Facebook comment, yet the two are reduced (or elevated) to the same status in the data. I’d argue that a “like” is much less interesting and significant than a comment, and the two should be separated at the data level and in any subsequent graphic.
The lumping in the graphical representation is also a problem — seven proclaimed data elements are reduced to five visible elements, as the small boxes obscure one another. (You could argue it’s really four, as the two smallest ones are basically on top of each other.) Though essentially meaningless, this is the kind of chartjunk that would make a more rigorous chart creator think twice about the visual representation. Why embed them within each other in the first place?
Embedding the graphical elements gets to the false metaphor imposed on the data in what the Atlantic reporter labels the “sieve.” The sieve implies that one thing leads to another, and is blocked by a hurdle before getting to the next step — an HTML view leads to a PDF view, which leads to Mendeley readers, then to Facebook likes and comments, then to comments on PLoS ONE itself, then to citations. That is an unlikely flow, and the sifting analogy isn’t accurately realized. The technically strongest inference is from HTML to PDF, because of how the PLoS ONE site is architected — you have to render the HTML to get to the PDF link. Beyond that, all bets are off. Mendeley readers are not a subset of HTML views, but an independent track in the data with no clear dependency. Citations aren’t a subset of Facebook activity. I don’t need to read the PDF to “like” an article on Facebook or write a blog post about a PLoS ONE article. And so on.
There are dimensions, alluded to above, that the nested-blocks design shuts out. Time-series effects are an example. The number of PLoS ONE articles published has been growing dramatically year over year, yet the graphic omits these important data, which likely affect the velocity of views, citations, and social media activity. In fact, changes in any data element aren’t captured, which makes the graphic a waste of space: a table of seven data statements would have represented the information more efficiently, eliminated most of the false metaphor of the “sieve” or “funnel,” and made the data easier to read and understand.
We’re in a data-driven era, and we need to become better at presenting, analyzing, critiquing, and drawing conclusions from data. Based on the chart Fenner has created, I can’t tell much. There isn’t much context, the graphic forces dissimilar data together for no clear reason, it suggests causation that isn’t logical, and many complex subtleties are shut out because of the choices made.
As Edward Tufte wrote in “The Visual Display of Quantitative Information” (1983):
The conditions under which many data graphics are produced . . . guarantee graphical mediocrity. These conditions engender graphics that (1) lie; (2) employ only the simplest designs, often unstandardized time-series based on a small handful of data points; and (3) miss the real news in the data. . . . It wastes the tremendous communicative power of graphics to use them merely to decorate a few numbers.
It would be great if reporters had the time and resources to properly interrogate graphics and the stories they purport to tell — verifying the accuracy of the underlying data, placing the data into a meaningful context through reportage, and questioning the tacit assertions imposed by the graphical conceits being employed.
Unfortunately, in an age in which funding, staffing, and information skepticism all seem to be getting short shrift, we end up with situations like this: graphical representations of data that are misleading, incomplete, possibly wrong, and pulled through a journalistic outlet with nary a whimper of critique.