The new Guardian (Photo credit: OwenBlacker)

The Guardian recently caused a bit of a commotion by changing its article commenting system to a threaded format (replies to a comment are now listed directly below that comment). In response to the controversy, Chris Elliott, the Guardian’s Readers’ Editor, wrote a column that contains an interesting piece of previously undisclosed information:

The Guardian website publishes around 600,000 comments a month, with 2,600 people posting more than 40 comments a month.

Martin Belam then did the math, extrapolating from that initial figure to get a sense of how well the article comments represent the reading community.

  • 2,600 people posting at least 40 comments a month accounts for at least 104,000 comments, or at least 17% of the total.
  • That leaves, at most, 496,000 comments per month to be left by everyone else.
  • The Guardian’s total audience for November 2012 was 70,566,108 readers.
  • The Guardian’s commenters then, at best, represent 0.7% of the audience.
  • At least 17% of the Guardian’s comments come from 2,600 people, or 0.0037% of its readers.
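Belam’s arithmetic is easy to check directly. A quick sketch (note that the 40-comments figure is a floor, so these are bounds rather than exact counts):

```python
monthly_comments = 600_000
heavy_commenters = 2_600
min_comments_each = 40
audience = 70_566_108          # Guardian total audience, November 2012

heavy_total = heavy_commenters * min_comments_each   # at least 104,000
heavy_share = heavy_total / monthly_comments         # at least ~17%
remaining = monthly_comments - heavy_total           # at most 496,000

# Upper bound: every remaining comment comes from a distinct one-off commenter
max_commenter_share = remaining / audience           # at most ~0.7% of audience
heavy_reader_share = heavy_commenters / audience     # ~0.0037% of readers

print(heavy_total, remaining)
print(round(heavy_share * 100, 1),
      round(max_commenter_share * 100, 2),
      round(heavy_reader_share * 100, 4))
```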

Those numbers likely overestimate the community’s involvement in commenting, as they assume that no prolific commenter left more than 40 comments (Belam notes that in December, he could find some 1,500 comments left by a group of four prolific commenters alone), and that every other comment was left by a different individual who wrote only one comment.

What does this mean, then, for altmetrics approaches based around the public conversation inspired by a research article? If we assume that these sorts of numbers translate from a newspaper website to a journal website — given the paucity of comments left on articles and the consistency of the 90:9:1 rule, I don’t think this is a huge stretch — then should a tiny and likely non-representative population be allowed to drive the criteria for funding and career advancement in research?

How should post-publication comments play a role, if any, in the metrics used to judge the quality of an article and a researcher’s work? Blogging about academic research, tweeting links to research papers, and commenting on articles remain fringe activities (as do using Twitter and blogging in general). These sorts of activities cater to the extremes: to people who have an agenda they’re looking to promote, or to the minority of people who enjoy communicating in this manner.

As was recently discussed, the posts here in the Scholarly Kitchen in 2012 that drew the most comments were not the same as the most-read posts. There is a qualitative difference between ideas that are controversial versus ideas that are of great interest to the majority of a community. Comments seem to correlate better with the former than the latter.

This post is not meant to disparage the value of comments — they can be tremendously useful ways to exchange information, to correct problems in an article, to add new information, and to turn things into a conversation. This can benefit the reader, the author, and the commenter. But whether that value can be translated into a meaningful measure of article and researcher performance remains an open question. The fact that comments come from such a tiny and likely non-representative minority of readers makes the challenge even greater.

David Crotty

David Crotty is the Editorial Director, Journals Policy for Oxford University Press. He serves on the Board of Directors for the STM Association, the Society for Scholarly Publishing and CHOR, Inc. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing.



37 Thoughts on "The Guardian Reveals an Important Truth About Article Comments"

Having a quick look at the PLoS Article Level Metrics data, less than 8% of articles published in the first six months of 2012 were commented on. As David says, comments can add value to articles, and that should be their main purpose; in isolation they shouldn’t be used as a metric.

Twitter and Facebook do have much higher engagement, with around 40% of articles shared on either Twitter or Facebook. But there is still an outstanding question of what altmetrics measure; my anecdotal experience is that interesting titles appear to help. Is the ability of an academic to give a 10-word, tweetable sound-bite title a good measure of their research?

As a frequent Kitchen commenter I am especially interested in this issue. First a niggling point. That commenters are not representative of the community in the statistical sense has not been established. It is an interesting research question in the demographics of belief. If they are statistically representative then the fact that the sample is small does not make it useless.

But you are correct that comments probably best measure controversy rather than anything else. The question then becomes what is the value of that measure? (As I pointed out in a recent comment the real issue with altmetrics is what they are measuring.)

Surely controversy is important. In fact one of my chief complaints about the journal literature is that the articles are written in a style that makes the scientific controversies almost invisible, while controversy is a leading feature of the frontier. The controversies often only appear in the question period after conference presentations or on private listservs, neither of which typically get made public. Thus seeing the controversies might be quite useful.

I am not saying people should get promoted for being controversial. I leave it to the community to decide what controversy means in such cases. The point is that knowing what is controversial may be important information and measuring controversy may be a valuable metric. For example it might point to new research needs, or to persistent failings, or to the need for new policies, etc.

Making that call on how representative the commenters are is no easy task. Is this a behavioral issue, and the commenters speak for the silent majority who do not comment, or do they instead represent a fringe opinion with which most would disagree? I don’t think counting the number of comments alone gives much of a measure of this.

And while I agree there’s some value in knowing which subjects are controversial, I wonder whether that would work against a researcher in many ways. Given how conservative funding bodies are, they may choose to give their limited funds to a researcher whose work is more likely to pan out. Or a tenure/hiring committee may decide to pass on a researcher whose work passes point X on the controversy scale.

Given the mis-use of the Impact Factor, I don’t think it’s too hard to imagine…

I do not disagree with anything you have said but you may have missed my underlying point. I am not interested in evaluating researchers but rather in seeing what is going on in the science, which many in the community wish to know. Here comment based metrics may be quite useful. The widespread focus on evaluation may be a mistake. Metrics have many other uses.

Not so much missing your point as stating that comments alone may not tell you much about what’s going on in science. Are there a small number of cranks who find that particular paper controversial or is that attitude widespread? Can you tell the difference from counting the numbers of comments?

I have not suggested that we look at comments alone. I do the science of science and science is a complex beast. I will take all the measures I can get. The problem of cranks is an interesting one analytically. I have what I call Gresham’s law of blogs which says that bad comments will drive out the good but that seems to be topic specific. (It is not much of a problem here unless you mean me!) If one is looking for the issues related to a given topic I think comments are an invaluable new source, possibly the best available. Big numbers suggest something worth looking at, that is all. They are a gauge.

I agree that comments are no substitute for peer review, far from it. But it does not follow that they are not a useful metric, especially of controversy, perhaps also of novelty which often generates controversy. That most articles do not get comments may be quite useful.

There are of course different kinds of comments. The intelligence community has done a lot of work on the semantic analysis of message traffic. Perhaps we need comment analysis technology, or is someone doing that?

The relevant percentages were definitely low for the journals that had rolled out commenting a few years ago…

At the time of the studies above we also took a look at the commenting patterns on ScienceBlogs and Nature Network (both now more or less defunct). In both cases the top 25 commenters were responsible for a ridiculously high proportion of all comments left – can’t remember the exact stat but certainly >= 80%. I bet it’s the same on The Scholarly Kitchen.

FWIW I don’t think metrics that only look at the attention that articles have received online can be used to measure quality at all – that goes for downloads as well as comments and social media mentions.

Because I work on I’m now duty bound 😉 to point out that this isn’t to say that attention based metrics aren’t useful in other situations. It is useful / gratifying / important to know which papers are receiving interest – helps you decide where to direct your attention in a crowded field. Just because a metric relates to a scholarly paper doesn’t mean it has to be a measure of quality or be used by tenure committees.

I don’t think it changes your point any (90:9:1 probably still holds), but one flaw in Martin’s analysis is that it assumes that the 70,566,108 visitors are the pool from which commenters *should* come. For example, a large proportion of traffic will be to image slideshows, podcasts, or infographics, where you can’t leave a comment even if you wanted to. I’d imagine the Guardian would care more about (semi-)regular readers leaving comments than people who’d dropped in to read a lifestyle piece mentioned on Twitter and then dropped out again.

Thanks Euan. I think your point, and David Wojick’s above are important, and speak to the notion that metrics don’t necessarily have to be used the way we traditionally use things like the Impact Factor, for making calls on funding and career advancement. There’s still a lot of filtering out going on–which sorts of data are useful for which sorts of purposes.

But I still think there is likely an issue in judging “attention” based on simple counts of comments, and many factors come into play before one can get a sense of how representative that attention is. Are the comments due to a very small number of extremists who are passionate about their subject area, or do they represent a broad swath of concern among the community?

Euan, it is probably a power law distribution and these are common in social networks. There need be nothing ridiculous about it.
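The concentration Euan describes is consistent with an idealized Zipf-style rank-frequency curve. The exponent below is an illustrative assumption, not a fit to any real data, but it shows how a moderately steep power law puts most comments in the hands of the top few commenters:

```python
# Idealized power-law comment distribution: the i-th ranked commenter
# leaves a number of comments proportional to i ** (-s). With s = 1.5
# (an illustrative choice), the top 25 of 1,000 commenters account for
# the large majority of all comments.
s = 1.5
n = 1_000
counts = [i ** -s for i in range(1, n + 1)]
top25_share = sum(counts[:25]) / sum(counts)
print(f"top-25 share: {top25_share:.0%}")
```

Nothing here says real comment threads follow exactly this curve, only that an 80%-plus share for the top 25 requires no unusual behavior at all under a heavy-tailed distribution.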

As someone who comments frequently on InsideHigherEd and The Chronicle of Higher Education sites, I daresay you’d find the same pattern there as was found for The Guardian. There, also, controversy draws more comments than anything having to do with the basic importance of the subject matter. My standard counterexamples to the utility of citations as a metric for quality in scholarly work are The Bell Curve book and the controversy over cold fusion, which generated huge amounts of commentary but reflect work of low quality.

There is tremendous value in commenting, both at a personal level (it is one of the best ways to build relationships online) and as a general tool to discover topics and people. I completely agree with the 90:9:1 rule, but that 1% does contain gems, and that is one of the major features of a service called Engagio (full disclosure: I work for them). Engagio is working to highlight articles based on the level of conversation that they generate. It is one of the few sites highlighting comments as a filter to discover great content. It pulls data from various social media, including Twitter, Facebook, Disqus, and WordPress, and is an interesting way to discover interesting sites and people.

I disagree with the headline interpretation. What they found is that if the issue makes commenters angry then the readers become more concerned. This makes sense since emotion is a measure of seriousness. In fact strong emotions are one of the things the semantic analyses look for as I understand it. Which scientific issues are generating angry exchanges is a good question.

So once again the question is what is comment data good for, not what is wrong with comments? Comments are here to stay.

This is probably a discussion that should wait until the article is published so we can get a sense of the details. But I do think the impact of emotional arguments, how they affect the understanding and interpretation of scientific data is an interesting one. Some further fodder on the subject here, on how the anti-GM food movement is essentially an anti-science movement, one which specifically uses emotional manipulation to drive policy and public interpretation:

Indeed I am fascinated by the way people get angry. But as an issue analyst and when studying an issue scientifically I try not to take sides and to put my own beliefs and feelings aside. Thus I would not use pejorative terms like “anti-science” and “emotional manipulation.”

Some would view the term “anti-science” as a superlative, not a pejorative. And at some point after the analysis, one must draw conclusions, and if a phrase is an accurate description, I don’t think one should be afraid to use it.

Thank you for this insightful analysis on comments. I’m the CEO of a software product that helps users manage their online comments and conversations (Engagio), so this topic is very interesting to us, and we have even done some primary research on the State of Online Commenting:

I have a number of observations on your analysis.

I’m surprised that the most-read articles aren’t also the ones that are most commented on. I wonder what the extent of that gap is.

To gain further insights into the commenting practice, one might need to dive deeper into the habits and motivations of the 2,600 users that are regular commenters. I’m not sure that promoting an agenda was their main goal. Actually that was disdained by others, according to our survey. These 2,600 are closer to the writer than they are to the rest of the readers, because these 2,600 are regular content contributors.

One of the more important benefits of commenting is to form relationships with others, and to further debate and analyze the topic being discussed to enrich the opinion of the original writer. Often, conversations in the comments have more insights than the article itself.

Another factor is whether the publisher sees an online community forming around the discussions, or whether they consider these 600,000 comments a second-class extension of the article itself, with its associated headaches.

Having a small number of regular commenting contributors vs. the total number of readers is a norm actually, but these are your most valuable commenters. At the community where I’m a top commenter, there are about 1,000 regular commenters out of an audience of about 250,000 readers, but that small number carries a lot of weight and value to the 249,000 others.

I don’t think they should look at the small sample of high-frequency contributors as non-significant. These 2,600 are probably responsible for at least half of the 600,000 comments, because they are the instigators and pot stirrers of other discussions.

I like your point about commenters being writers not readers. As a Kitchen chef I prefer commenting, hence discussion, to writing articles for broadcast.

Good journals have long allowed Discussions and Replies, whereby a reader writes a note pointing out a shortcoming of an article, and the author responds. Far predating blogs, Discussion/Reply is an important quality assurance step where something may have been missed in review.

An article on a controversial topic is more likely to generate a discussion, typically from the other “camp.” While, as an editor, I like to encourage free expression, I have drawn limits where a discussion has ranged too far afield or fallen into a personal attack.

A key element is moderation of the exchange by an editor. Even the most professional exchanges of opinions sometimes can get heated. I’ve seen what people write even knowing I will edit it — Not always pretty!

I’m also thinking about the issue of anonymity. There is a big difference between a huge reader audience of a news outlet such as the Guardian or the NYT, and a specialty community where people could actually know each other in advance, or build communities over time. Then there are the humongous comments sections on sites like Huffington Post. I will often glance at the number of comments and gasp — literally tens of thousands on some articles. Very easy to get lost in the volume.

There is some debate about the need for authentic identity being used in some online communities, presumably for a higher level of consideration before posting, and allowing for some law enforcement officer to come knocking in some countries. In relation to David’s article, this is an important consideration for ‘trust’ of the comments. Are they ‘plants’ by a biased org with an agenda or fly-by bombers just out to disrupt? Or is this someone quite serious with a particular worldview that may or may not match our own? How do you tell the difference? Cultural differences can figure in a lot. Geography or country of origin or religion or …. pick your demographic …. is unknown as well.

I’ve just discovered your blog and am finding the articles and the exchanges excellent. They are cogent and literate, unlike some places one reads. You are making my procrastination from what I should be doing at least edifying!

You are probably just observing a fat-tailed distribution — maybe with a power-law tail. This is not so surprising given the subject. You can observe this in many similar settings, such as when people report their opinion on something. However, I don’t think the number of comments is so crucial. More important is the quality, and that those few comments tackle different aspects.

David W., William M, and Blattner,

You all point to useful information that can be gleaned from article comments, value that can be derived. I think though, that there are differences between the sorts of in-depth analyses you’re suggesting and what we generally think of as “metrics”. Usually, metrics are meant to provide something of a shortcut, an automated way to avoid the painstaking work of going through an enormous number of data points individually. The altmetrics manifesto refers to them as “filters” used to make sense of the enormous and expanding literature. When one has to get into the specific content of each comment, then the efficiency provided by a metric is lost.

David W.’s use is probably closest to what one would call a “metric”, using comment counts as a flag to point you toward papers that may be considered controversial. But it’s still not a very good filter, as it is clearly going to provide a lot of false positives, particularly given the likely non-representative nature of at least a significant percentage of the comments. It’s also likely a filter that misses quite a few positives as well. In science, rather than spending time leaving comments and arguing over a controversy, the better researcher will instead do the next experiment, one that settles the controversy or provides further information. And that gets published as a new paper, rather than as a comment on the original paper.

So while there’s definite value provided, I’m not sure it’s applicable in the form of what we think of as a “metric”.

David, I sympathize with your point. If altmetrics are thought of as a simple alternative to the impact factor or the h factor they are surely not. But I still think automated comment analysis can be quite useful especially with the proper semantic technologies. The metrical community actually does quite a bit of analytical work, far beyond shortcuts.

My interest is in issues not evaluation, both research issues and policy issues. So my question is how better to spot these than by looking at comment clusters?

Reading through this commentary, I am starting to suspect that you guys have no idea why the “cool” articles in the scientific literature don’t necessarily get “tons of comments”. Part of the reason is that scientists — esp. the good ones — are so busy publishing what we in the business call “interesting news”. This has become a global race, and we USA scientists are getting slowly outmatched by the Chinese. We’re just like general public news reporters — we send our students into the lab, they tell us what they found out, and we work together on a nice report so all of our friends can see what we found. It’s not about prestige for older PIs. It’s about rapid and accurate reporting, and “showing off” the good new scientific talent in the next generation. (All the truly good basic scientists love to teach. Some say those who can’t do, teach. I say, those who cannot teach never learned the concept that well in the first place. My mother was a high school chemistry teacher and I had to do a lot of TAing to get my chemistry PhD, so perhaps I am biased.)

So, if something is fine the way it is, then it’s “no comment (necessary)”, see ya at the next conference to orchestrate follow-up experiments with promising new students. If something is Wrong, then you get annoyed. Back in the mid-80s and early 90s you used to email the Editor — there’s a lot of hilarious (if somewhat abrasive) back-and-forth debate in the math and physics literature. They’d print up both sides of the story and let the readers (who are also authors in their own right, usually) make up their own minds. Nowadays arXiv skips the Editor and just publishes everything (so long as it’s not total gibberish) and lets the community (usually a pretty small and specialized readership) sort it out.

Science communication has changed a lot since I was born in 1982, as this site grudgingly acknowledges. (I’ve had at least a dial-up connection since I was a girl, so communicating on the internet is second nature to me. I actually prefer it over “going to lots of meetings that are not with my students”.) One of the big things that changed was that Nature and Science degraded in value due to a rapid increase in popularity, in turn causing a massive authorship shift to JACS/PNAS/PRL, where you could be sure to find a properly laid out experimental section and nice, big figures. Most recently it’s been PLoS ONE for biology, since the PNAS ref system can’t really handle the current huge volume of biomed lit. I suspect that this is part of why there’s a low number of comments on the Science and Nature websites — no one has time unless they’re being really wrong about something. They are too busy reading and writing for journals like (in my case) JChemPhysB, or JACS, since that’s where all the work gets done these days.

However, many general news journalists seem to still think that a publication in Nature means that you are an Important Scientist. (I know many chemical companies that would rather *hire* someone who’s done a nice JACS, since it proves you know how to work.) What Nature was intended for was “consensus reporting”— ie, spin the concept around in a flagship journal like JACS or PRL for five to 10 years (aka one or two USA PhD dissertations); then write a Nature when you’re sure you are on to something and want to alert the most general scientific audience possible; preferably in a manner in which little technical language is used so that even a particle physics paper can be skimmed by an evolutionary biologist (I tell my students that this is “the tricky bit”). In some ways, the “comments” section had already been debated out in a flagship journal, so further commentary is not required.

I’m not sure where you got the impression that we are confused as to why research articles (any research articles really, not just the ones you label “cool”) fail to draw comments. It’s a subject we’ve written about repeatedly:
and here:

Article commenting is increasingly seen as a futile pursuit. Nearly every publisher has tried to drive commenting in one way or another to little success. There are fairly obvious reasons for this failure, particularly the question of why someone would spend time commenting on another researcher’s article when they could be doing their own research instead.

Not sure where the anti-Nature and anti-Science screed comes into things either. Journals across the spectrum have tried and failed to implement commenting systems.

I’m not anti-Nature/Science; I think that sort of general audience thing has a place in the scientific record. It’s just been abused a bit of late due to the rapid increase in scientific discoveries (something I’m sure you’ve covered somewhere). Also, if I think an article is “cool”, then I post it to my Facebook account and discuss it with my old high school, college, and grad school lab buddies. I do use the internet to share papers, but I do not use the journal’s own commenting system.

You may be something of an anomaly in terms of Facebook use, as many struggle with the separation of personal and professional lives in social media–do I friend my boss and colleagues and allow them in to the posts I share with my relatives and friends?

Regardless, this also speaks to the more private nature of much scientific communication, taking place in smaller, trusted networks rather than open and fully public comments:

Scientists tend to have fairly small trusted circles, and opinions (at least negative ones) are only expressed within these small groups. Your preliminary data is only exposed to your labmates, perhaps to your department or a group of collaborators. It’s unlikely you’ll see truly open communication beyond these sorts of groups (especially from the younger scientists mentioned above) due to fear of committing career suicide. Both are unfortunate, but are parts of the current culture. Any network that hopes to succeed must adapt to the culture of the community, rather than trying to rewrite it. Networks can work on the level of these smaller groups, and there is certainly some benefit in providing more efficient ways for labmates to keep each other up to date. But by working on this level, one loses the positive effects of scale, the benefits of receiving varied opinions, and advice from beyond one’s circle.

Yeah, I know I’m probably one of the first to use Facebook to do “internet journal clubs”. However, I have seen Facebook as a way of…..warning my buddies about the rapidly changing science job market in both academia and industry since the site started my first year of grad school. I also use it to keep up with my students — and, yes, I am “friends” with my PhD adviser. I’ve actually stated in the “Research and Teaching” section of my USA assistant professor application (I’m doing a German postdoc with a neurosurgery bioimaging lab, and the application processes for PI positions are very different between the two countries) something about “using the internet as a means for rapid and reliable transmission of scientific ideas”. I am actually planning on switching to Askemos soon (it’s run by a German software engineer I met out here), since it has a highly secure way of ensuring electronic data is thoroughly backed up for the public good. The guy’s main goals are to (a) not ever let “the library burn again” (the one in Alexandria) and (b) notarizable software for things like banking transactions (or author comments, so we have an unimpeachable record of who-said-what-and-when).

However, I think the generation ahead of me is not as accustomed to putting potentially career-risking conversation “on the record”. I view myself as a public servant, since my (rather expensive) education was funded by Federal Student Loans. I owe the American people money, and would like to provide some return on the investment in my education. Therefore, it’s my job to at least try and make sure my linguistics major buddies- not to mention my parents and grandparents- do not promote scientifically fraudulent ideas. (They’re great guys, but the more liberal artsy ones can overreact sometimes– and many of them are these people that have several hundred to a thousand “friends”. If even a fraction of that audience is watching their posts about things like GMOs or the latest neurofad, then I do think it is important to “stand up and say something”.)

I think we have already covered your explanation of why cool articles get few comments. Multiple comments often involve controversy and include internal discussions among the commenters, as in the discussion between David C and myself here.

I suppose that’s a different type of analysis though, than what most consider a journal “metric”, and certainly requires that journals supply different sets of data (full text of comments in an analyzable form) than the sorts of standard “article level metrics” that most are offering or working on offering (which usually includes numbers of comments, ratings or mentions in various social media).

Even still, one must proceed with caution, as the question of how well any particular set of article comments represents a community is an important one. A subject that may be widely controversial may inspire follow-up papers, whereas one that only matters to a few may inspire a flood of comments from those few. Given the anonymous nature of comments, that may be difficult to determine.

Actually any text on the Web can be indexed unless it is blocked. I am fascinated by some tools coming out of ORNL that use agent technology to pull stuff out of periodicals based on their HTML structure.

Give me a set of journals and as long as their HTML structure is stable I can harvest the articles and the multiple comments thereon. The journals need supply nothing, just allow my crawling.

I then envision using taxonomies of semantic and logical comment types to sort out what the multiple comments are all about. There is a lot of government work going on in this area. Of course ultimately one has to be looking for something. But it is pretty clear that at this point we are not sure what to look for in comments. That is the frontier question.
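A minimal sketch of the kind of harvesting described above, using only the Python standard library. The `comment` class name and the toy page are hypothetical placeholders; a real journal site would need its own selectors, and a real crawler would fetch each article URL first:

```python
from html.parser import HTMLParser

class CommentScraper(HTMLParser):
    """Collects the text of every element whose class list contains
    'comment'. The class name is a hypothetical placeholder -- each
    journal's stable HTML structure would dictate its own selector."""
    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside a comment element
        self.comments = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1               # nested tag inside a comment
        elif "comment" in classes:
            self.comments.append("")      # start a new comment
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.comments[-1] += data

# Demo on a toy page standing in for a crawled article.
page = ('<div class="comment">First <b>bold</b> point</div>'
        '<p>Article text, not a comment.</p>'
        '<div class="comment">Second comment</div>')
scraper = CommentScraper()
scraper.feed(page)
print(scraper.comments)
```

The point is only that once the HTML structure is stable, comment extraction needs nothing from the journal beyond permission to crawl.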

I would think that having journals implement a set of metadata standards for faster identification and parsing of article comments would be helpful in this sort of analysis.

Interesting! My work is in full text analysis so I have never thought about metadata for comments. What would it look like? We are trying to keep this automated so we do not want manual cataloging of comments but computer generated metadata could be very useful, perhaps also some commenter supplied data. If there is reply nesting as here in the Kitchen then we certainly want that as a proxy for the issue tree. Or do you simply mean tagging comments as comments?
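One possible shape for such metadata, sketched as JSON. All of the field names here are hypothetical, but the key element is a `parent_id` that records the reply nesting discussed above, giving an automated analysis a proxy for the issue tree:

```python
import json

# A hypothetical machine-generated metadata record for one comment.
comment_meta = {
    "comment_id": "c-0042",
    "parent_id": "c-0017",             # None for a top-level comment
    "article_doi": "10.1000/example",  # hypothetical DOI
    "timestamp": "2013-01-15T09:30:00Z",
    "author_id": "reader-881",         # pseudonymous, commenter-supplied
    "word_count": 112,
}
record = json.dumps(comment_meta)
print(record)
```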

Note that there are now two very different discussions going on in our comments. How to get the computer to see that is the fun problem. The basic idea is that when people talk about different things they must use different words, and vice versa.
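That "different topics use different words" idea can be shown with a toy bag-of-words similarity measure. The example comments below are invented for illustration; real comment clustering would need stemming, stop-word removal, and much more text:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two comments."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb)

# Invented examples: two comments about metrics, one about crawling.
metric_1 = "comment counts as a metric for controversy"
metric_2 = "counting comments is a poor metric"
crawl_1 = "harvesting comments by crawling stable html structure"

# Same-topic pairs share more vocabulary than cross-topic pairs.
print(cosine(metric_1, metric_2) > cosine(metric_1, crawl_1))
```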

Maybe one has to define a reader, though: the Guardian’s output is huge, so most readers will only skim at best before lighting on just their personal interests. One would need to know the readers/commenters for each column of a newspaper before comparing it with scientific papers. Also, the preponderance of trolls and lobbyists on sections like CIF must surely put a great many people off bothering to join in. It is often better to engage with the author directly by email or on their own sites/blogs.
