One of the success stories of digital rebirth has been the Atlantic, a trusted but once-fusty magazine that has done a great job of reinventing itself as a relevant and vibrant digital property, generating a renewed audience in both print and online media.
But journalism these days is highly variable, with a lot of pressures pulling at it, many of them financial. The Guardian — a newspaper that has embraced free-content arguments and business models — has not made money since 2003, and is rapidly drawing down its charitable trust. (Its recent coverage of the fantastic success of the paywall-based Financial Times — which has now made a profit every year since 2005 and has its largest readership ever, most of it online — is noteworthy for obvious reasons.) Dozens of papers have folded as technology competitors have eaten away at their margins, forcing them to cut editors and eliminate or consolidate beats over the past decade. Things continue to balance on a knife’s edge for most newsmagazines and newspapers, so it’s nice to see a success story.
But even at the Atlantic, you see some of the problems with modern journalism — a lack of rigor and a paucity of editors sometimes rear their ugly heads. Recently, an article at the Atlantic titled “The Great Sieve: This Is What Browsing Scientific Research Looks Like,” highlighting PLoS ONE data, proved that even data — maybe especially data — need to be reported on with care and thoroughness. Here’s the graphic of the data in question:
The graphic is derived from an interesting set of data, but the Atlantic article is superficial rather than incisive. The title alone gives that away — one mega-journal’s data representing how all scientific publishing works? With a definitive tone (“This Is!”)? And “research” treated as interchangeable with “publications”? We’ll move on and give the headline writer a break.
I first started picking at this story when I found an error in the article’s link to the underlying blog post by Martin Fenner — it initially linked to a story by the same reporter about Greenland’s ice melt. I emailed the reporter about the error, and she fixed it promptly, but an editor should have caught this (and the reporter shouldn’t have made it, of course).
But that’s not the real problem.
The real problem is that the reporter doesn’t grapple with the data or their graphical representation, choosing instead to go off on a tangent about the efficiency of online information retrieval (quoting the abstract of a Science article), and mentioning limitations of the dataset only in terms that show she doesn’t understand that PLoS ONE is an online-only mega-journal (“. . . we don’t know, for example, how this sieve compares with offline searching . . .”). The story ends with a surprising bit of cheerleading:
And also, props to PLoS ONE. Fifty-three million-odd pageviews for some 37,000 papers is no joke. The number of downloads and Mendeley readers are likewise impressive. If there’s any real take-away here, it’s that their open model is attracting tons of readers and discussion . . .
This conclusion reveals a lack of analysis and context, limitations probably stemming from the lack of time and attention the reporter could devote to it — Fenner’s piece was published on July 24, 2012, as was the piece in the Atlantic. Fenner is in Europe, so he might have published his post very early in the morning, US time. Nonetheless, the Atlantic piece was published during the lunch hour of the same day. For me, writing this post took about three days of work, off and on, probably a total of eight hours, believe it or not. I emailed people for information, tried to find benchmarks, and so forth. So it’s no wonder the Atlantic piece is comparatively superficial.
In the Atlantic article, no basic math is done to see what’s driving the numbers, and the reporter exhibits no skepticism, no probing for context and perspective. For instance, if, instead of the “Oh wow!” tone, the article had parsed the data a bit and said something like, “The 37,267 articles generated on average 1,468 views each, and fewer than 400 PDF downloads each,” we might have a different impression of these data. If we had benchmark data from another source — say a major publisher of toll-access journals or another OA journal publisher — we’d have an even better sense of whether these data are impressive or mediocre. The truth is, we just can’t tell. And if, instead of 145,296 CrossRef citations, we were told that the papers averaged fewer than four CrossRef citations each, at least we’d be making some journalistic headway.
According to CrossRef data (freely available online), 18 million DOIs (out of 55 million) have references deposited. These 18 million generate 306 million cited-by links, and 23 million articles have at least one cited-by link pointing to them. Potentially, the average across this large dataset is about 13 citations per cited article (306 million cited-by links to 23 million articles). As a potential benchmark for the PLoS ONE CrossRef citation data, this seems worth including in coverage. But even comparing this to the PLoS ONE data is problematic, because the CrossRef data span many more years, while the PLoS ONE data begin only in late 2006, and, as we’ll see, most of the PLoS ONE articles were published too recently for much citation to have accumulated.
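For readers who want to check the arithmetic in the last two paragraphs, here is a minimal sketch using only the totals quoted above. The page-view figure is the rounded “53 million-odd” number, so that average is approximate, and none of this is meant to reproduce Fenner’s actual analysis.

```python
# Back-of-the-envelope checks for the per-article averages discussed above.
# Totals are the ones quoted in the post; the page-view total is rounded,
# so that average is approximate.

plos_one_articles = 37_267          # articles in the graphic
page_views_total = 53_000_000       # rounded "53 million-odd" figure
crossref_citations_total = 145_296  # CrossRef citations in the graphic

views_per_article = page_views_total / plos_one_articles
citations_per_article = crossref_citations_total / plos_one_articles
print(f"about {views_per_article:,.0f} views per article")            # roughly 1,400
print(f"about {citations_per_article:.1f} CrossRef citations per article")  # roughly 3.9

# A rough CrossRef-wide benchmark from the figures cited above:
# 306 million cited-by links spread over 23 million cited articles.
crossref_citedby_links = 306_000_000
crossref_cited_articles = 23_000_000
print(f"about {crossref_citedby_links / crossref_cited_articles:.0f} citations per cited article")  # roughly 13
```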
But there’s even more a reporter could do, because the PLoS ONE data are publicly available. Verification is one of those things. Exporting and isolating the PLoS ONE data, I found the number of articles in the dataset (downloaded July 31, 2012, at 7 p.m. ET) to be 10% lower than the number claimed in the graphic (33,782 vs. 37,267). This is noteworthy, as the timeframe stated in the graphic is “until July 2012” (which I interpret as “through June 30, 2012”) — which means an additional month of data yielded fewer articles when I exported the data and analyzed the PLoS ONE slice.
Problems continued. The total CrossRef citations in the dataset downloaded at the same time was about 18% lower than the number claimed in the graphic (119,316 vs. 145,296). This puts the average at roughly 3.5 CrossRef citations per article in the set.
Now, I’ll admit, I haven’t had independent verification of my analysis, but here’s what I did — I downloaded the PLoS dataset, saved it as a .csv to get rid of the embedded sorting tools, manually and carefully isolated the PLoS ONE segment by deleting all the rows related to other journal titles, saved this data subset separately, and tallied from there. I did it twice, to make sure I didn’t make a mistake, and the numbers came out the same both times. I’m sure Martin Fenner did a more sophisticated data dive, but I still find it interesting that there are discrepancies between the two analyses. Did the reporter bother to confirm the numbers in the few hours she had, given the public availability of the data?
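For what it’s worth, here is a minimal sketch of the kind of tally described above, assuming a pandas environment. The filename, the “journal” and “crossref” column names, and the exact journal label are all assumptions on my part; the headers in the actual ALM dump may differ.

```python
# A sketch of the verification pass described above: load the public ALM
# dump, isolate the PLoS ONE rows, and tally articles and CrossRef citations.
# Filename and column names ("journal", "crossref") are assumptions -- check
# them against the header row of the file you actually download.
import pandas as pd

alm = pd.read_csv("alm_report.csv")            # hypothetical filename
plos_one = alm[alm["journal"] == "PLoS ONE"]   # keep only the PLoS ONE slice

article_count = len(plos_one)
crossref_total = plos_one["crossref"].sum()

print(f"{article_count:,} PLoS ONE articles")
print(f"{crossref_total:,} CrossRef citations")
print(f"{crossref_total / article_count:.2f} CrossRef citations per article")
```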
But even if the numbers matched, there’s plenty of nuance to be had, and the graphic conceals almost all of it. As we all know, citation datasets are highly skewed toward a few big articles, with a long tail of less-cited or uncited articles. It also takes a few years for citations to appear, as noted above. The PLoS ONE dataset is no different. In the case of the PLoS ONE data, 194 articles (0.5% of the articles in the set) account for 10% of the citations. This isn’t surprising, in part because 18,544 PLoS ONE articles (54%) have been published since January 2011, which leaves very little time for citations to accumulate. Again, this is a major limitation of the dataset that is not represented in the chart or in the Atlantic reporting. The chart also fails to mention that the dataset’s first element is dated December 20, 2006, a helpful landmark for the user/reader. And the chart conceals the fact that more than 400 articles published between 2006 and 2009 have zero CrossRef citations. (Expand this to include 2010 articles, and the list of articles without a CrossRef citation grows to more than 1,300.) Also, the discrepancies among CrossRef citations, PubMed Central citations, and Scopus citations aren’t mentioned, despite sizable variations among the three (CrossRef tends to be higher than PubMed Central by a notable amount, and Scopus tends to be higher still).
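Here is a sketch of how the skew and recency checks above could be reproduced, again with hypothetical column names (adding an assumed “publication_date” column). Treat it as an illustration of the checks, not as the exact procedure I used on the spreadsheet.

```python
# A sketch of the skew and recency checks described above, using the same
# hypothetical column names as before, plus an assumed "publication_date"
# column that can be parsed as a date.
import pandas as pd

alm = pd.read_csv("alm_report.csv", parse_dates=["publication_date"])
plos_one = alm[alm["journal"] == "PLoS ONE"]

# Concentration: what share of all CrossRef citations do the top ~0.5% of
# articles account for?
top_n = int(len(plos_one) * 0.005)
top_share = (plos_one["crossref"].nlargest(top_n).sum()
             / plos_one["crossref"].sum())
print(f"Top {top_n} articles hold {top_share:.0%} of CrossRef citations")

# Recency: how many articles have had little time to accumulate citations?
recent = plos_one["publication_date"] >= "2011-01-01"
print(f"{recent.sum():,} articles ({recent.mean():.0%}) published since Jan 2011")

# Long tail: older articles with zero CrossRef citations.
older = plos_one["publication_date"] < "2010-01-01"
uncited_older = plos_one[older & (plos_one["crossref"] == 0)]
print(f"{len(uncited_older):,} articles from 2006-2009 have zero CrossRef citations")
```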
Another problem is that there are mixed data elements in the graphic, including different types of social media interaction data. Fenner makes an observation in his blog post about what he believes is a lower-than-desired amount of social Web commentary on scientific articles, one that comports with PLoS advocacy:
. . . there are still too many barriers for scientists to take part in the informal discussion of scholarly research on the web, in particular as comments on journal websites.
This implies that barriers to discussion are somehow imposed on the scientific community from outside — yet the barriers to robust online discussion of science are imposed within the scientific community, by the very incentives of science. Those incentives aren’t so much a barrier to discussion as a source of self-discipline: time taken away from doing research, getting grants, reading research, and keeping things together on big, complex projects is time wasted. This is not a problem.
There is also a mix of signals in the social Web data, which carries over into Fenner’s graphic. A Facebook “like” is qualitatively different from a Facebook comment, yet the two are reduced (or elevated) to the same status in the data. I’d argue that a “like” is much less interesting and significant than a comment, and the two should be separated at the data level and in any subsequent graphic.
The lumping in the graphical representation is also a problem — seven proclaimed data elements are reduced to five visible elements, as the small boxes obscure one another. (You could argue it’s really four, as the two smallest ones sit basically on top of each other.) The nesting itself is essentially meaningless, the kind of chartjunk that would make a more rigorous chart creator think twice about the visual representation. Why embed the elements within one another in the first place?
Embedding the graphical elements gets to the false metaphor imposed on the data in what the Atlantic reporter labels the “sieve.” The sieve implies that one thing leads to another, with each stage filtering out some portion before the next step — an HTML view leads to a PDF view, which leads to Mendeley readers, then to Facebook likes and comments, then to comments on PLoS ONE itself, then to citations. That is an unlikely flow, and the sifting analogy isn’t accurately realized. The technically strongest inference is from HTML to PDF, because of how the PLoS ONE site is architected — you have to render the HTML to get to the PDF link. Beyond that, all bets are off. Mendeley readers are not a subset of HTML views, but an independent track in the data with no clear dependency. Citations aren’t a subset of Facebook activity. I don’t need to read the PDF to “like” an article on Facebook or write a blog post about a PLoS ONE article. And so on.
There are dimensions, alluded to above, that are shut out by the choice of nested blocks. Time-series effects are an example. The number of PLoS ONE articles published has been growing dramatically year-over-year, yet the graphic is empty of these important data, which likely have an effect on the velocity of views, citations, and social media activity. In fact, changes in any data element aren’t captured, meaning this is a waste of graphical material — a table of seven data statements would have more efficiently represented the information, eliminated most of the false metaphor of the “sieve” or “funnel,” and made the data easier to read and understand.
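To show how little machinery such a table would require, here is a minimal sketch of a per-year summary, using the same hypothetical column names as in the earlier sketches plus an assumed “html_views” column. It illustrates the presentation argument, not any table Fenner actually produced.

```python
# A sketch of the kind of simple table that would surface the time dimension
# the nested-box graphic hides: per-year article counts and per-article
# averages. Column names are the same assumptions as in the earlier sketches.
import pandas as pd

alm = pd.read_csv("alm_report.csv", parse_dates=["publication_date"])
plos_one = alm[alm["journal"] == "PLoS ONE"]

by_year = plos_one.groupby(plos_one["publication_date"].dt.year).agg(
    articles=("crossref", "size"),
    mean_views=("html_views", "mean"),   # hypothetical column name
    mean_crossref=("crossref", "mean"),
)
print(by_year.round(1).to_string())
```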
We’re in a data-driven era, and we need to become better at presenting, analyzing, critiquing, and drawing conclusions from data. Based on the chart Fenner has created, I can’t tell much. There isn’t much context, it forces dissimilar data together for no clear reason, the graphic suggests causation that isn’t logical, and many complex subtleties are shut out because of the choices made.
As Edward Tufte wrote in “The Visual Display of Quantitative Information” (1983):
The conditions under which many data graphics are produced . . . guarantee graphical mediocrity. These conditions engender graphics that (1) lie; (2) employ only the simplest designs, often unstandardized time-series based on a small handful of data points; and (3) miss the real news in the data. . . . It wastes the tremendous communicative power of graphics to use them merely to decorate a few numbers.
It would be great if reporters had the time and resources to properly interrogate graphics and the stories they purport to tell — verifying the accuracy of the underlying data, placing the data into a meaningful context through reportage, and questioning the tacit assertions imposed by the graphical conceits being employed.
Unfortunately, in an age in which funding, staffing, and skepticism about information all seem to be getting short shrift, we end up with situations like this — graphical representations of data that are misleading, incomplete, possibly wrong, and pulled through a journalistic outlet with nary a whimper of critique.
Discussion
Kent, thank you very much for highlighting one of my recent blog posts, and for taking the time to look at the PLOS ONE data. There are obviously many different ways the PLOS ONE Article-Level Metrics can be visualized, and a graphic always has to focus on particular aspects of the data. I’m happy to discuss the issues that you raised.
What I don’t like, though, is personal accusations in public. It would have helped if you had simply sent me an email with your questions about the PLOS ONE data. I stand behind the data visualized in the chart. As is clearly indicated in the chart, and as I also explained to you yesterday, the chart is based on the July Article-Level Metrics data dump, which obviously has slightly different numbers than the April data you looked at.
Martin,
I wasn’t making “personal accusations.” I was finding problems with your work product, and the work product of the Atlantic reporter.
There are many things wrong here, which I explain in the post. You’ve helped clarify one problem here. In your original blog post, you state, with a link, “As always, the data behind the graphic are openly available.” However, you’re now telling me that the data the chart is based on are not openly available, and are only available to you. There is a mismatch between your statement and reality. It made it impossible to replicate your numbers. I think that’s a deficiency in your data presentation, and a direct contradiction of what you state in your blog post.
Instead of calling my critique “personal accusations” and attempting to make me look like some sort of troll for even criticizing your published graphic and probing the underlying data, I’d be interested in your response to the substance of the critique.
Kent, it seems that we have different opinions on where to draw the line between scholarly discussion and personal accusations. The only conclusion I can draw is to no longer participate in the discussions on this blog.
Hi Martin. I’m in this discussion just to understand the graphic, not to trade insults and accusations. Could you respond to my question below?
I’m not sure I understand the figure. There are 5 boxes but 7 statistics. Which data do the two smallest boxes represent? (Facebook Likes and CrossRef Citations?) Are PLoS comments and blogs too small to show up on this graphic?
I love your brief discussion of the dynamics. The diffusion of scientific results is a complex, poorly understood process. What these numbers, and their changes over time, mean is indeed far from clear. But your criticism of Fenner and the Atlantic may not be justified. Each may well have done the best they could with the time and expertise they had. You probably know a great deal more about this stuff than they do. If you are criticizing the age we live in, you might want to separate that from what sound like personal attacks. But a better approach might be to simply provide a better analysis, with less criticism. Your analysis is very interesting.
I agree with David that separating the criticism of The Atlantic’s website from your underlying problems with PLoS/OA would be helpful here.
I think it is a stretch to call four-paragraph “articles” on websites like this “journalism,” though. But maybe that’s part of the problem. It has the imprimatur of a respected magazine even though it is something that is only marginally better than what one would find on a content farm.
Although I often disagree with you, it is clear that your blog posts here are labors of love. The Atlantic’s online articles are a labor of “must post three articles every day” (or whatever).
And, ironically, one of this author’s other posts from the same day is about the problem of crappy and lazy infographics:
Several times a day I receive emails bearing polished, colorful, and playful infographics. Wouldn’t I just *love* to spice up The Atlantic with some beautifully presented information? And the best part is, it requires just about no work on my part.
The problem with “separating” them is that there are a few related forces at work here — a not-very-good source infographic, a not-very-rigorous journalistic pass-along of same, and an environment that makes rigorous reporting and really good data analysis and presentation harder because there aren’t enough incentives for quality work.
I think what you call my “underlying problems with PLoS/OA” are things you’re reading into the post. I criticized Fenner for attributing a lack of social media engagement to “barriers” to the literature, but that was germane to the topic — it was his interpretation of the data, and I think it’s wrong, for the reasons stated (lack of incentives).
Thanks for pointing to the reporter’s other post, which only proves that irony is always with us.
I don’t know what the incentives are at the Atlantic, but if they are volume-based instead of quality-based, then we see a symptom of underfunded journalism. To me, all these issues culminated in a less-than-stellar infographic and less-than-exemplary reporting. And if we don’t fix the underlying causes of these symptoms, we’re only going to have more of the same in the future.
My fifth career, from 1994 to 2004, was as a science journalist. I would get sent a press release at 10am, on a topic I knew little about, with a 3 pm deadline for a 300 word story on it. I thought I did a very good job at this exciting challenge. Are you saying I did not?
If you do not like the way the world is, fine, say so. But do not criticize the people in it for being there.
My first impression upon reading this post was “Wow, what an interesting and detailed critique!” Then I read the 343-word Atlantic piece by Rosen and the 164-word blog post by Fenner. At that point my opinion radically changed to “Why in the world did you waste so much time on this, Kent?”
Rosen’s brief article barely makes sense, and Fenner’s graphic strikes me as a quick-and-dirty visualization. So what? Why spend so much effort in shredding some harmless items in a world of off-the-cuff comments and lazy articles that distort *important* facts?
My comments may be harsh, but seem a fitting response to Kent’s tone.
Believe me, I had no idea this post would happen at all when I came across a link on Twitter — as I said, I only gave this a second glance when I noticed the erroneous link in the original Atlantic piece. But then I started drilling down, and as the data failed to line up, the metaphorical problems became clear, and the claims started to seem wrong and outsized, I realized I actually had something I felt like exploring at length. This is how sweaters unravel — pull one thread, and you’re off to the races.
I don’t mind the criticism. Like you, I didn’t imagine there’d be that much there when I first glanced at the two items. But it did turn into a detailed (I won’t claim “interesting,” but it was interesting to write) critique, and it was refreshing to tackle something like data presentation.
In regards to the July PLOS Article-Level Metrics dataset, PLOS was in the process of making this dataset publicly available at the time of Martin’s blog post. This dataset is now publicly available at http://www.plosone.org/static/almInfo.action.
The data’s fine. The emphasis of the critique is fine; the visualisation could be done better.
This critique, though, is verbose to the point that it contradicts itself. Why give such a wordy answer to a problem when the problem is that the data isn’t easy to parse at a glance?
Inane, counterintuitive and cheerleading in itself.
I think this blog posting and the subsequent comments serve as a meta-commentary on why article commenting and discussion of research papers has failed to catch on.
Kent looked at a presentation of data and responded with a critique. The author of the piece took offense at Kent’s criticism and declined to respond.
Nearly every time The Scholarly Kitchen has offered detailed critiques of published papers, the authors have responded angrily, often very upset. This has been a near-constant quality of the responses, regardless of the professionalism or inflammatory nature of the critique offered.
It is clear that researchers don’t like to have their work publicly called-out, publicly criticized (even when that criticism is constructive). In a field of endeavor where peer review is the norm, this thin-skinned attitude seems somewhat contradictory.
It does make one question how much support there will be going forward for more open peer review systems. Are researchers going to be happy having their dirty laundry similarly aired publicly? Will they respond to postings of peer review reports with hurt, angry responses?
I am not sure I can agree, David, given that my first comment objected to what I saw as personal attacks. The previous post, which involved the use of the term “parasites,” was even worse. There is in science a highly refined language of criticism. It is as unemotional as language can be. The blogosphere does not use this language, to say the least, hence it lies outside of the language system of science.
This denigrates neither system, it merely explains why the overlap is so small.
It’s not just this post, though. Even if you think Kent went too far, or was too personal here, this happens regularly when someone’s work is singled out for analysis. Phil Davis has written some very carefully phrased, straightforward commentaries on the statistical analysis used in papers, and the authors still felt they were being picked on. A research paper represents both someone’s way of earning a living and something they’ve devoted their lives to. In a business where reputation is so important, it’s understandable that people don’t want to be singled out and have their work picked apart (even if it’s done dispassionately and fairly).
What happens to an author in F1000 Research who pays their article fees, gets their article posted, then has it torn to shreds by a vicious reviewer, with that review remaining permanently publicly visible? I’m thinking that may not be good for return business.
I’ve just launched a personal campaign to expunge the term “data-driven” from popular discourse: there is a meaningful alternative, “data-informed,” which lacks the faux objectivity and scientism of the notion that the conclusions we reach from data are always or mostly clear and obvious. Information cannot be given the authority to make decisions about the destinations we should look towards or the interpretations we put on it; human interpreters of data do that, and are merely attempting to obfuscate their choices by ascribing an impossible level of agency to the information they are arguing supports their perspective.
http://t.co/DexNTGFD
As to the questions of tone, emotionalism, and snark in the blogosphere vs. academia, academic discourse is rife with innuendo and personal attacks. One of the lessons of graduate school was, “Always look to the footnotes first — that’s where the really interesting stuff gets buried, with a wry or vicious comment about the work of a colleague or rival.” And it’s a large part of why I never finished my dissertation…