Twitter logo (image via Wikimedia Commons)

A new paper on measuring the impact of microblogging via Twitter (tweets) on article citations is making a buzz within the alt-metrics community, with splashy headlines appearing in many of our Twitter feeds:

Cite and retweet this widely: Tweets can predict citations! Highly tweeted articles 11x more likely to be highly cited http://t.co/dcJLRj7y

The article (and the above tweet) was published recently by Gunther Eysenbach in the Journal of Medical Internet Research (JMIR). Eysenbach also serves as editor and shareholder of JMIR.

The article, "Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact," has been tweeted 527 times.

In his paper, Eysenbach analyzes a cohort of 55 articles published in JMIR between 2009 and 2010 and tracks their performance — in terms of tweets and citations — to investigate whether the former could predict the latter. Short answer: they could. However, we need to look at this paper in more detail than 140 characters allows in order to understand what is being reported and what it means.

Relying on data his journal collected via the Twitter API, Eysenbach focuses on tweets that contain an embedded link to the article, like the URL in the tweet above. Since the API registers the date and time when each tweet was published, Eysenbach was able to look at temporal trends in tweets very shortly after an article appeared. Citation data for each article were collected from two sources: Google Scholar and Scopus. At the time of analysis, the articles were between 17 and 29 months old.

Not surprisingly, he reports that most tweets were sent out on the first or second day after an article was published, followed by a rapid decay. Eysenbach explores the frequency and distribution of tweets over different periods and attempts to correlate them with citation data. Tweets show a moderate correlation with Google Scholar citations, he reports, but not with Scopus, a discrepancy he attributes to Google Scholar indexing many non-article sources.

The main message of the paper is that highly tweeted articles were 11 times more likely to be highly cited, a result that makes a great 140-character headline but needs much more context for interpretation.

"Highly tweeted" and "highly cited" articles are those falling above the 75th percentile (the top quartile) when ranked by frequency of tweets and citations, respectively. There were 12 highly tweeted articles, 9 of which were also highly cited. Conversely, just 3 of the 43 less-tweeted articles were highly cited. While the math for calculating the odds ratio looks pretty formidable, we should recognize that this is a very blunt instrument for measuring article performance. Given the highly skewed distribution of tweets and citations, the top quartile comprises a very large performance range.
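For the curious, here is a quick back-of-envelope check using only the counts quoted above. Assuming the "11 times more likely" headline refers to the simple ratio of proportions (a risk ratio), the arithmetic works out; the odds ratio computed from the same 2×2 table is considerably larger. This sketch is purely illustrative and is not a re-analysis of the paper's data.

```python
# Back-of-envelope check using the counts quoted above:
# 12 highly tweeted articles (9 of them highly cited) and
# 43 less-tweeted articles (3 of them highly cited).
highly_tweeted_cited, highly_tweeted_total = 9, 12
less_tweeted_cited, less_tweeted_total = 3, 43

# Ratio of proportions (risk ratio): 0.75 / 0.07 ~= 10.8, i.e. the "11x" headline
risk_ratio = (highly_tweeted_cited / highly_tweeted_total) / (
    less_tweeted_cited / less_tweeted_total
)

# Odds ratio from the same 2x2 table: (9/3) / (3/40) = 40
odds_ratio = (highly_tweeted_cited / (highly_tweeted_total - highly_tweeted_cited)) / (
    less_tweeted_cited / (less_tweeted_total - less_tweeted_cited)
)

print(round(risk_ratio, 1), round(odds_ratio, 1))  # 10.8 40.0
```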

Eysenbach did attempt to construct a continuous regression model including time and tweets as citation predictors and reported that he could explain 27% of total citation variation, a figure that is in the same predictive ballpark as article downloads.
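For readers unfamiliar with that kind of model, the sketch below shows the general shape of such a regression (citations modeled on tweet counts plus article age) fit to entirely synthetic data. The variable names, transformations, and numbers are my own assumptions for illustration, not the specification actually used in the paper.

```python
# Illustrative only: a linear model of the general form described above,
# fit to synthetic data. Nothing here reproduces the paper's actual model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 55                                   # cohort size reported above
tweets = rng.poisson(12, n)              # hypothetical tweet counts
age_months = rng.uniform(17, 29, n)      # hypothetical article ages (months)
citations = 0.4 * tweets + 0.3 * age_months + rng.normal(0, 4, n)

X = sm.add_constant(np.column_stack([tweets, age_months]))
fit = sm.OLS(citations, X).fit()
print(fit.rsquared)                      # the paper reports ~27% of variance explained
```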

Unlike an article citation — which requires an author to produce a new piece of research, have it vetted by peers, and have it published in a journal indexed by a reputable source that tallies citations — there are very low barriers to microblogging. Accounts on Twitter are free, and a post requires little more than a few words and a link. Retweeting takes even less effort — a simple click of a button. And if that clicking is too strenuous, many Twitter accounts are set up to automatically retweet posts that come their way. It should not be surprising that the 527 tweets to the JMIR article contained many repeat posts, some verging on the compulsive:

  • Gunther Eysenbach (3)
  • Richard Smith (2)
  • Brian S. McGowan PhD (28)
  • J Med Internet Res (7)
  • HOT Most Tweeted (6)
  • RT @ (>100)

What’s more, many of these counted tweets were not sent out by humans. The Journal of Medical Internet Research sends out an automatic tweet when a paper first appears and then sends out monthly tweets to promote the journal’s most tweeted papers. Tweets promoting the journal’s most viewed, most purchased, and most cited articles (from Scopus and Google Scholar) are also sent out automatically, many of which are then retweeted by other tweet bots (and human bots) to the blogosphere. That the author decided to count these tweets as measures of “buzz” leaves me concerned about what Twitter metrics measure and whether they can be considered a valid indicator of article impact.

I have deeper reservations about this paper.

In the shadows of an ongoing legal drama that pits a former editor of a scientific journal against its publisher and the news organizations attempting to cover it, I'm leery of editors who view their journal as a publication outlet for their own work. Although Eysenbach selected his own journal for this study, he decided against outsourcing the editorial and peer-review process. Had this been done, some of these methodological and interpretive problems might have been addressed and potential ethical conflicts could have been avoided. For instance, consider the following paragraph from the methods section:

For the tweetation-citation correlation analysis, I included only tweets that referred to articles published in issue 3/2009 through issue 2/2010—that is, tweetations of all 55 articles published between July 22, 2009 and June 30, 2010 [31-97].

What is wrong with this paragraph? First, Eysenbach cites 66 papers, not just the 55 papers included in his dataset. His reference list thus includes a total of 69 articles citing JMIR, only three of which are cited for their content; 55 serve to cite data points, and 11 are unaccounted for. The above paragraph contains enough information for the reader without serial self-citation. If listing each paper was important for understanding the paper, the author could have listed them in a data appendix. At least, this is what I imagine an external editor and reviewers might have recommended.

Whether or not this practice of citing data points is considered normal in medical research is beside the point: the practice of serial self-citation by an author simultaneously serving as editor and shareholder of his journal looks suspicious. The effect of this behavior on JMIR's Impact Factor will become apparent next June, when Thomson Reuters issues its 2011 Journal Citation Reports. Serial self-citation can result in a journal being delisted from Thomson Reuters' Journal Citation Reports.

Eysenbach has also purchased several domain names (twimpact.org, twimpactfactor.org and twimpactfactor.com) “with the possible goal to create services to calculate and track twimpact and twindex metrics for publications and publishers.” While I appreciate that he discloses his conflicts of interest, with so many sources of potential bias, it becomes hard to separate the paper’s contribution to science from its contribution to entrepreneurship.

Update: 4 Jan, 2012

Gunther Eysenbach has issued a correction to his JMIR paper removing 67 self-citations to his journal. The paper was edited to reflect the change.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/

Discussion

44 Thoughts on "Tweets, and Our Obsession with Alt Metrics"

This study is flawed, both in concept and execution.

1) As you note, the lapse in judgment regarding the citations makes one question the entire premise—is this an honest study or merely an attempt to game the Impact Factor? It should be pointed out that the articles chosen for the study were strictly limited to a set that would contribute to the next Impact Factor determination (citations in 2011 to articles written in 2009 and 2010).

2) The inclusion of a large number of tweets from the journal’s own marketing team greatly muddies the water. One could just as easily use this study to draw a conclusion that citations can be predicted by the amount of marketing a journal does for a particular article.  The more it is advertised via social media, the more likely it is to garner citations.  So is the measured effect here an accurate measurement of community interest or of marketing?

3) I’m disturbed by the lack of negative controls. How many articles total were published in the time period covered? How many were highly cited that were not tweeted? We know that there were 9 highly cited articles that were highly tweeted. What percentage of the total number of highly cited articles is that? How good a predictive tool is Twitter?

4) Any value found in this measurement will immediately be lost if it starts being used in a formal manner. The Observer Effect states that the act of observation of an object changes that object. While the conclusions may be accurate now, if tweets become a metric used for funding or career advancement, they will rapidly be gamed. If I run a lab, then every member of the lab, every member of my family, is going to tweet links to any paper I publish. As you note, the barrier to entry for using Twitter is too low to prevent this, thus rendering the metric unusable in any serious manner.

5) The author notes that the study is flawed because it exhibits the same sort of thinking that has doomed countless “Facebook for Scientists” efforts. This is a study looking at the behaviors of readers and authors of a journal that studies the internet. People whose life work revolves around communicating on the internet have a different culture than other types of professionals. It’s not surprising that people who write about Twitter use Twitter to discuss articles about Twitter. Other types of researchers may not use social media in the same manner. Repeat this study using a molecular biology journal, a cancer treatment journal, a particle physics journal, an organic chemistry journal, a history journal, etc., and then we can see if it has any meaning outside of a small self-reflexive sub-niche.

6) But the biggest question of all: is there any value in this metric? Essentially, who cares? We've discussed the value of studying the level of interest and conversation in science before. It clearly has value to those who study the sociology of science and those who study the use of internet tools. But does it have any value to the researchers themselves?

If at best Twitter is a semi-okay predictor of citation, does it tell us anything that citation doesn’t? Attention and popularity are poor proxies for quality and impact. We know what generates conversation online: humorous articles, salacious articles and articles about social media. Do we want these to dominate the research landscape?

Two things matter most to researchers: funding and career advancement. If I’m a funding agency, I don’t really care about buzz, I care about results, movement toward curing the disease I’m fighting. I have a publicity budget, that’s not why I’m funding researchers. If I’m an administrator at an institution, I’m heavily balancing my reward system toward those who bring in funding, not those who make a lot of noise.

It’s great to see more altmetrics coverage from The Kitchen, particularly of this important article. Phil and David, as usual you make some trenchant points. A few of my own thoughts on David’s points:

1) I wholeheartedly agree that Eysenbach’s choice to cite his data points was a very unfortunate one, and taints the rest of the paper. I certainly hope he will be quick to publish an erratum, moving the articles to supplemental data where they belong.

2) I think you're not quite right here. Articles getting extra Twitter marketing are the ones that are already performing well for the journal…what we have is a special case of the well-studied Matthew Effect. But you are spot on to wonder about the direction of causality, if any: do high tweet counts promote citations, or are they both influenced by a third factor? It's unclear, as Eysenbach himself points out.

3) I think you may have overlooked these; see Fig. 8 and Table 2 in the article.

4) You both observe that it's much easier to tweet something than to cite it, which opens up a lot of possibilities for easy gaming. That's an important point that bears repeating. But I think dismissal on these grounds is premature, for two reasons. First, communities can do a lot of self-policing to stop this. That certainly happens in the case of the IF, as this very post ably demonstrates. Second, and more importantly, anyone who expects the facile counting of Tweets to underlie robust metrics is indeed quite naive. This must be only a first step, just as the facile counting of links was only a first step for search engine rankings. SEOsters can easily buy thousands of inlinks for websites, but this kind of gaming generally fails because Google looks at the authority of those giving the links. Since Twitter makes the network properties of a given graph component quite clear, a similar approach with Twitter is a relatively straightforward exercise (a rough sketch of what this might look like appears at the end of this comment). It seems obvious that any successful Twitter-based metric will eventually have to do just this. In the meantime, frequency-based approaches do have some value as proofs of concept.

5) “Repeat this study using a molecular biology journal… etc.” — good plan…I’m working on it 🙂 But in the meantime, keep in mind that the available data suggest academic Twitter use is not at all limited to “people who write about Twitter,” but in fact spread quite evenly across disciplines.

6) “Who cares” is always a relevant question. I agree that one group is certainly those who study science itself. Another group is people who care about public impacts of science, which Twitter could make much clearer. And I wouldn’t be so quick to dismiss the value of being “an okay predictor of citations”…the world has been quick to reward slightly better predictions of anything important.

But I think the real value is in perhaps expanding our access to the stuff of scholarly impact itself. It's long been accepted that citations are only a gross measure of the more subtle–and more meaningful–information flows hidden away within invisible colleges. So far these have moved in the concealment of hallways, mailboxes, and conference hotel bars. But maybe Twitter and similar tools are a way in which this hidden impact landscape can be brought to light, creating metrics that are both faster and more nuanced than what we've got now. Of course it's early, and we're mostly using "blunt instrument[s]" now; Twitter is in its infancy. Heck, the Web is in its infancy. But early studies like these are exciting because they suggest that there are conversations worth listening to here.
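To make the authority-weighting idea in point 4 concrete, here is a minimal sketch of one way it might work, with a made-up follower graph and made-up account names; nothing here reflects how any existing service actually scores tweets.

```python
# A toy illustration of authority-weighted tweet counting: weight each
# tweeter by a PageRank score computed from a (hypothetical) follower graph,
# so a retweet from a well-followed account counts for more than a bot's.
import networkx as nx

follower_graph = nx.DiGraph()
follower_graph.add_edges_from([
    ("bot_account", "journal_feed"),     # edge u -> v means "u follows v"
    ("researcher_a", "journal_feed"),
    ("researcher_b", "researcher_a"),
    ("researcher_c", "researcher_a"),
])

authority = nx.pagerank(follower_graph)  # higher score = more "authoritative" account

# Accounts that tweeted a given article (made up):
tweeters = ["journal_feed", "bot_account", "researcher_a", "researcher_a"]
weighted_score = sum(authority.get(user, 0.0) for user in tweeters)
print(len(tweeters), round(weighted_score, 3))  # raw count vs. authority-weighted score
```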

Hi Jason, thanks for the response. Let’s see:

2) I still think it’s important to separate out the genuine interest of the community from the marketing efforts of the journal that published the paper. The journal’s additional tweets may indeed be in response to papers that have received notice, but it’s still in some ways an artificial amplification. If you’re truly interested in measuring community interest, should advertisements be included in that count? I say no, as the journal itself has different reasons for tweeting (promotion to drive readership and subscription) as opposed to genuine grassroots interest in the subject matter. Just as Google looks to exclude paid links, any study like this should avoid paid tweets.

3) Sorry, that’s what I get for writing a comment several days after having read the article.

4) The problem with an analysis constructed in this manner is that it rewards networking, rather than actual impact. If you're following Google's lead, you're weighting things toward popularity, not necessarily quality. If we are interested in metrics to measure quality and impact, then popularity is likely not meaningful. We (at least funding agencies) want to reward researchers who spend their time discovering important things, not researchers who spend their time network-building and being friendly.

5) I’m sure that academic use of Twitter is not strictly limited to people who write about Twitter. However, I guarantee you that people who study Twitter for a living use it in a different manner than those who study chemotherapy or Xenopus development. To assume all cultures are equal and behave in a similar manner is to ignore reality. Clinical researchers can’t discuss their work in the same way as historians, as confidentiality laws prohibit open discussion of research subjects. Those who spend their days at a wet bench in a crowded laboratory filled with colleagues discussing the latest research have a different social dynamic than those who work at a computer screen in a room by themselves.

6) I agree that it's an interesting phenomenon, but it's important to designate what is being studied, and to whom it matters. Alt-metrics are often proposed as replacements for the Impact Factor, and while that particular metric is very flawed, it aims to measure something very different from metrics like tweets or counts of blog postings. There's a difference between measuring the scientific impact of a study and the social impact of a study. A study about narwhal sex is likely to do poorly by the first and well by the second. This may tell us a great deal about human nature, about society and about communication, but it does not give us a system for rewarding quality research, distributing funding or career advancement.

As much as I like the example, David, I have to take issue with the way you’ve leveraged narwhal sex. The notion that social media favours fleeting and/or sensational topics to the point of obviating its usefulness has been steadily dispelled from other disciplines; why not scholarly communication? While we may not (yet) have any empirical reason to believe that the notorious PLoS bat fellatio article is the exception that proves the rule of headline-grabbing science links, it’s becoming a relatively tired counterpoint for these sorts of arguments. Does the “Cute Cats and the Arab Spring” principle (http://www.youtube.com/watch?v=tkDFVz_VL_I) not apply here?

Hey, we will not stand for any anti-Narwhal, species-ist comments!

The favoring of the fleeting and sensationalist does not obviate the utility of social networks, but it does make it more difficult to derive meaning from them. A quick look at the current trending topics on Twitter as I write this:
#momquotes
#thatex
#PlacesIveHadSex
Tim Howard
Everton
Man U
Newcastle
Ellen
Bolton
(5 of the bottom 6 are UK football teams or players, and I left out the sponsored trend at the top in keeping with my previous comments). This is not atypical, and the fluff generally buries the serious material (take a look at the top videos on YouTube for another example). It means that if we're going to use something like Twitter to evaluate science, we're going to need ways to separate the frivolous from the serious, which is not a trivial task. What's a sly piece of humor to you and me may be a serious piece of research to a Narwhal reproductive specialist.

And given the nature of social media, there are topics that always resonate more strongly than others, in particular the topic of social media. Any system that uses social media as the basis for rewarding researchers is going to inherently favor those studying social media. This tends to make researchers in other fields somewhat skeptical.

It seems that we're agreed on most counts, David, although I come away with a very different take-away:

2) Neither network effects nor self-citation are unique to Twitter; both have long been known to obscure the citation impact signal as well. But as citation research has also established, "some noise" isn't the same as "no signal." It's compelling that despite the noise of these marketing tweets, the empirical results show substantial predictive power for the tweetstream. One must assume that a more sophisticated algorithm with additional filtering would produce even better results.

4) Ah, now we get into really interesting territory. What does citation signify? Is it a token of intellectual debt, an edge in the graph of human knowledge, as the Mertonian normalists would have us believe? Or is it a rhetorical device employed by an actor enmeshed in a net of obligations, friendships, squabbles, and alliances, as the Social Constructivists claim? This latter group would (and does) strongly argue that traditional citation "rewards networking, rather than actual impact," as you say of tweets. And they'd be at least partially right. And partially wrong. Citers and tweeters alike traverse both intellectual and social worlds, as we've learned from decades of citation research, and are beginning to see after a few years of research on Twitter. Again, the question is not "does tweetation = impact" but "can we wring evidence of intellectual impact from tweets." The evidence in this study suggests (as with citations), yes.

5) Could not agree with you more that communication and citation practices vary widely across disciplines, and that this must be taken into account in creating and consuming impact metrics. Bibliometricians have been yelling this at anyone who will listen for years now (often to little effect). Let's by all means see replication for other domains. But for now, we must say that the available evidence supports the predictive value of Twitter citation mining.

6) The “often proposed” suggestion that altmetrics replace the IF is a bit of a straw man, methinks. As you observe, they’re quite different tools for quite different things. I’m foursquare behind the value of citation-based metrics, and I think that most of us actually working around altmetrics are. That said, I’d love to see the current uncritical, fetishistic overreliance on citation metrics–particularly the IF–replaced by a broader, more holistic, and more nuanced understanding of impact. There are lots of ways to game Twitter metrics, Mendeley metrics, whatever–just as there are lots of ways to game the Impact Factor. It’s only when we start algorithmically combining these approaches, using hardnosed empirical findings as guidance, that we’ll get truly robust metrics.

2) Where you say, "despite the noise of the marketing tweets", one could just as easily say, "because of the marketing tweets." One could make an argument here that editorial selection is what is responsible for the increased citations–the editors select the articles they think most significant and carpet bomb Twitter with links. This then leads to the network effects you describe, but its basis is in the editorial selection, rather than in the marketing tweets reflecting a pure selection by the community. As noted in my original comment, their presence greatly muddies the water.

While the Impact Factor can theoretically be gamed, doing so requires clearing a much higher bar than gaming a social network, hence a much better signal to noise ratio (for example, it is impossible to create a spambot that puts references into published papers and unlike Twitter, journals do not sell “sponsored references”).

I agree completely that relying on a panel of metrics would be vastly superior to shoehorning everything into one metric. Each metric I’ve seen has its strengths and weaknesses. The idea would be to weight each metric accordingly, and in particular to include metrics where the strength of one balances the weakness of the other. However, the simplicity of having one set of numbers, no matter how problematic or mis-used, seems to appeal greatly to the decision makers of the academic world. Overcoming this is perhaps a bigger hurdle than the creation of a more accurate set of metrics.

Great post Phil. These altmetrics (comments on webpages, likes, tweets etc) seem to concern only 10% of the journal's output, and the remaining articles are undifferentiated at 0 comments, 0 tweets and 0 likes. This doesn't mean that 90% of papers are worthless; it's probably that the authors didn't either a) study charismatic megafauna or b) include 'sex' in the title.

Someone out there should pick a journal (with which they’re not affiliated) and tweet a few random papers from every issue for 12 months. We can then all come back in 2-3 years to examine whether the tweets boosted citations for those papers compared to the remainder.

A randomized controlled trial of the effect of tweets on future citations…very interesting. That would certainly control for editorial and self-selection. WHO is doing the tweeting would be pretty important. I imagine that no molecular ecologist cares about my twitter feed.

Thanks, Phil. This is very interesting.

I agree with your conclusion: “While I appreciate that he discloses his conflicts of interest, with so many sources of potential bias, it becomes hard to separate the paper’s contribution to science from its contribution to entrepreneurship.”

As with downloads, it is one thing to simply note the existence of a paper by tweet, but something quite different to acknowledge its importance/influence by actually citing it. It is true that raising the media profile of an article by Tweet or other means can certainly hasten the recognition/citation of good work, but it seems a really thorough, unbiased study of this has yet to be undertaken.

For the record, JMIR has now issued a correction of the original article, removing the offending citations to dataset articles, and replacing them with a list of articles in an appendix.

Hats off to Gunther for making the change, and to Phil for being an Alert Science Citizen. A lovely example of how grassroots post-publication peer review is continuing to grow into a vital part of the scholarly communication system.

@ Phil Davis
Your critique consists of the following points:
1. automated tweets were included: “That the author decided to count these [automated] tweets as measures of “buzz” leaves me concerned about what Twitter metrics measure and whether they can be considered a valid indicator of article impact”.
Response: Yes, there is a fairly constant baseline chatter of automated tweets each article receives, currently about 8-10 tweets or so, but the rest is generally human chatter, which fluctuates, and which constitutes the buzz. Subtracting 8-10 tweetation counts from each article's tweetation count is unlikely to have any effect on the results. And the fact that there IS a correlation with citations even if automated tweets are included (see Figure 10) illustrates this. Whether or not the correlation is even stronger if certain kinds of tweets are filtered out is obviously a question for future research.

The rest of your critique seems to me more directed at “process” rather than addressing the substance of the paper.

2. “I’m leery of editors who view their journal as a publication outlet for their own work.”
JMIR has implemented a number of publishing innovations and will publish a series of editorials providing data on our experiences (and yes, there will be more, e.g. on open peer review). In my mind, there is nothing wrong and nothing unusual about editors talking about their work as an editor in their own journal, and most journal editors will publish editorials discussing innovations at their journal, discussing their impact factor or otherwise reflecting on the impact of their journal, in their own journal.

3. “he decided against outsourcing the editorial and peer-review process”
This is incorrect. Even though this was an editorial, it was in fact externally peer-reviewed and handled by another editor (see the information at the bottom of the paper), as I wanted to be sure it makes a contribution to the field. Peer reviewers were blinded to the identity of the author. Peer reviewers are acknowledged by name at the end of the paper.

4. “First, Eysenbach cites 66 papers and not just the 55 papers included in his dataset”
This was an innocent mistake in the references, with no impact on the results: when the DOIs were pasted into the reference list, some references that were not part of the articles discussed were accidentally included. An erratum (http://www.jmir.org/2012/1/e7/) regarding the 12 extra references has already been published, and the reference section has been corrected.

5. “If listing each paper was important for understanding the paper, the author could have listed them in a data appendix. At least, this is what I imagine an external editor and reviewers may have recommended.”
Reviewers did not recommend this, and citing references in an external appendix is against JMIR's policy if it can be avoided, as it requires the reader to open an additional file, readers cannot click through, the references are not CrossRef-linked, they do not appear in printed versions, etc. For this paper, I thought it would be important for the reader to easily see which articles we are talking about, and which quadrant of the 2×2 impact table each article falls into, which is why they were all cited. There is nothing wrong with this, and it was all done in the readers' interest. Having said that, we want to avoid any impression of improper conduct, and have moved the references into a separate file, so that the future impact factor is not "artificially" increased (though we could have a debate on whether this would really have been an "artificial" or significant increase).

6. “Eysenbach has also purchased several domain names (twimpact.org, twimpactfactor.org and twimpactfactor.com) “with the possible goal to create services to calculate and track twimpact and twindex metrics for publications and publishers.” While I appreciate that he discloses his conflicts of interest, with so many sources of potential bias, it becomes hard to separate the paper’s contribution to science from its contribution to entrepreneurship.”
How many research groups own domain names related to the subject of their research (and how many do not even disclose this)? In hindsight it was probably not even necessary to disclose this.
Tweets and citations are openly available for anybody who wants to check the data or do a re-analysis. In the interest of transparency, all tweets are even made available as Multimedia Appendix 1. I am not sure how the observer could possibly influence the results, and I have no motive to do so. The data are what they are. I have no revenue streams from altmetrics, and JMIR is well ranked impact-factor wise, so I have no reason to discredit the impact factor as a valid metric. In fact, the discussion is quite critical of altmetrics, saying explicitly that they should be seen as a complementary metric to citations, not an alternative (which is why I prefer to talk about infodemiology metrics, not altmetrics). The paper compares the impact of papers WITHIN my journal, not against papers in other journals.
As to the statement that it is "hard to separate the paper's contribution to science from its contribution to entrepreneurship": I am primarily a researcher (this pays my salary), but the institutions I am affiliated with value entrepreneurship as a mechanism for knowledge translation. It is NOT enough for research findings to be cited if those findings are not translated into real products. For this reason and this reason alone I have registered some domain names and _consider_ exploring the possibility of building services around these findings, with forward-thinking publishers or other partners who are open to the idea. A lot of good companies (including Eugene Garfield's ISI, Google etc) are built on scientific discoveries – and there is nothing wrong with this. You may hold a certain entrepreneurial spirit against me, but at the end of the day this is what drives innovation and progress.

@ David Crotty

1) While we normally do not send out editorials for peer-review (few journals do), this editorial was in fact peer-reviewed by two external reviewers (Jason Priem being one of them, and he is happy to make his peer-review report available). (JMIR occasionally sends editorials out for blinded review if they contain data). No reviewer said it is unusual to cite the studies whose scientometrics are discussed.

JMIR has _no_ space limitations and – as a general policy – prefers to cite all references in the article rather than in an Appendix. Citing these 55 articles is totally justifiable and necessary. In any case, all this is just a distraction and has nothing to do with the findings of the paper, and we reacted quickly to get this issue out of the way, by moving references into an appendix.

2) JMIR is a micro-journal, with no “marketing team”. As described in the discussion as a limitation of this study, JMIR does send out automated tweets from our “top10 articles” RSS feeds (which are in turn based on community behavior and interest). All tweets are provided as data appendix (Appendix 1), so anybody can test the hypothesis that excluding these would lead to different results.
What Phil Davis writes above, that "JMIR sends out monthly tweets to promote the journal's most tweeted papers," is slightly incorrect. As described in the discussion, an automatic tweet is sent whenever a new article enters a top 10 list (top viewed, top cited, top tweeted, top purchased); this is not a manual process, but rather a result of the collective intelligence of the users. Moreover, the results show that the volume of tweets after only 3 days is already predictive of citations. So there is very little opportunity for our non-existent "marketing team" to make a dent.

3)
“I’m disturbed by the lack of negative controls. “
There ARE negative controls (see Table 2).
“How many articles total were published in the time period covered? “
n=55, as explained in multiple sections of the manuscript. ALL 55 articles published in issue 3/2009-2/2010 were included.
“How many were highly cited that were not tweeted? “
See Table 2 and elsewhere in the ms. 3/12 (25%) of the highly cited articles were not highly tweeted.
“We know that there were 9 highly cited articles that were highly tweeted. What percentage of the total number of highly cited articles is that?”
75%
"How good a predictive tool is Twitter?"
All reported in the manuscript – in our dataset the positive predictive value was 75%. There are many different ways to express “how good a predictive tool” Twitter is – sensitivity, specificity, regression. This is all in the paper.
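For readers following along, the sketch below recomputes those figures from the 2×2 counts quoted earlier in this post (9 articles both highly tweeted and highly cited, 3 highly tweeted only, 3 highly cited only, 40 neither); the paper's own tables remain the authoritative source.

```python
# "Highly tweeted" treated as a diagnostic test for "highly cited",
# using the 2x2 counts quoted in the post above.
true_pos, false_pos = 9, 3    # highly tweeted: highly cited / not
false_neg, true_neg = 3, 40   # not highly tweeted: highly cited / not

ppv = true_pos / (true_pos + false_pos)            # 9/12  = 0.75
sensitivity = true_pos / (true_pos + false_neg)    # 9/12  = 0.75
specificity = true_neg / (true_neg + false_pos)    # 40/43 ≈ 0.93

print(f"PPV {ppv:.0%}, sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
```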

4) - 6) All these points are discussed in the discussion, are they not? Please read the paper carefully. That popularity/attention metrics measure a different concept from what citations measure (while they are nevertheless correlated and there are interactions) is clearly stated (even in the abstract) and also illustrated in Figure 12.

Still, as I said in the paper (discussion), I think there is value to these metrics, for funders and researchers. I do not agree with the notion that funders or research institutions do not care about publicity/popularity/social impact. As an aside, not all research is intended to “cure disease”.
The bottom line of the paper is that, at least in the context of JMIR, tweets are useful metrics, predictive for citations, and they are potentially also useful for other journals. That's what the paper says, not more and not less. I was very careful not to generalize this to other journals (READ THE DISCUSSION). As Jason mentioned, he is working on replicating this with other journals, and we are working on this too, as mentioned in the paper, and we welcome and encourage further papers on this topic, as the call for papers at the end of the editorial explains. One paper cannot answer all emerging questions, and I think a good, seminal paper raises more questions than it answers. This is how science works – in small incremental steps. We show something works for one case, then try to replicate and ultimately generalize it, or not.

@Tim Vines
“These altmetrics (comments on webpages, likes, tweets etc) seem to only concern 10% of the journal’s output, and the remaining articles are undifferentiated on 0 comments, 0 tweets and 0 likes.” – this is absolutely not the case in our journal, see Figure 2.

Thanks for the further clarification, a few responses:

Can you explain the difference between an “editorial” and an “article” in your journal? In my experience, editorials are usually opinion pieces or journal announcements, whereas data-driven research is instead published as articles.

2) Though your tweets are automated, this may not be apparent to those reading the tweets, or seeing retweets of those tweets. If one stumbles across such a tweet and does not know the journal's Twitter activity policy, then one is likely to assume it has been selectively highlighted by the journal's editorial or marketing staff (as is the case for the Twitter feeds of the publishing houses and journals where I've worked). This is marketing, pure and simple, and does have an influence on the reception of the articles. One could make an argument that advertising like this is part of the landscape and should be included, but I take the position that it is something different from the actual community reaction to an article. Having an element of editorial selection driving the process (whether real or implied) alters the process.

4-6) The acknowledgment of flaws in the study does not make those flaws go away. You do note several of these flaws, but for the reader of this blog, and the reader of a 140 character tweet stating that “tweets predict citations”, those flaws are not evident. Our discussion here is meant to illuminate and better understand the meaning of the study, and refining what we can clearly learn from it is an important part of that process.

Some editorials contain data, some don’t. The former are usually peer-reviewed, the latter aren’t.

2) Again: like most other journals, JMIR uses automated processes to tweet about recently published papers. I believe your concern would be valid if we selectively and manually promoted certain articles over others. But this is not the case.

4-6) Every study has limitations (and I prefer the term “limitation”, which is different from a “flaw”), which are acknowledged in the discussion. My discussion has two parts: one on the limitations of the dataset and potential confounders, and one on the limitations of the twimpact metric as a metric itself. In this blog everything appears to be a bit muddled, and concerns about the validity or meaning of twimpact metrics (which I share and which are extensively discussed, mirroring in part the discussion here) seem to be used to question the internal validity of the study. In your comments and discussions please distinguish between “flaws” or limitations that affect the internal validity of the study (and I haven’t heard anything that would undermine the internal validity) vs the external validity (generalizability). And this is further muddled in the discussion here by “ethical concerns”.

Again, I fail to understand the reason why an article is labeled as an “editorial” while another isn’t. Editorials are not described in the journal’s author instructions. Is the only distinction that the article has been written by the journal’s editor? I’m not aware of other journals that publish research articles by their editors as “editorials” rather than as “articles”, though this may be a product of my own ignorance.

2) Your perception of the tweets may differ from that of the recipient of those tweets (and the recipient of a retweet of those tweets). My experience with "most other journals" has been very different. In my experience, most other journals do not send out 8-10 automated tweets for every article published in the journal. For the journals I've been involved with, the editors use Twitter to point out particular highlights, rather than pouring out a constant stream of repetitive links. Most would be afraid to do this as it might be seen as spam. Given that the recipients of JMIR's automated tweets (and retweets from others) may not be aware of the nature of these tweets, is it possible that some might think of them as having been editorially curated?

I am not suggesting that the journal is doing anything wrong by using Twitter in this manner, but that it brings an additional variable to the study. Are you completely, 100% sure that in the absence of any tweets whatsoever from the journal, that the community behavior would have been identical? If not, then you are measuring a combination of the impact of both marketing and community social interaction. This is also a worthy subject for study, but clearly a different beast than a study looking solely at the social impact of the community itself.

4-6) I don’t have any particular issues about your analysis of the numbers themselves (the internal validity). I do have questions about the conclusions drawn, many of which echo your own doubts and deserve further discussion (isn’t that the point of post-publication peer review after all?). As noted in point 2 above, I do worry that a sample containing a mix of advertising and community reactions muddles the issue of exactly what is being studied. I also think it’s important (as you seem to agree) to clarify what these sorts of metrics really measure, and to whom those things are important.

There are often knee-jerk reactions to new metrics like this, either an immediate approval of anything crowd-sourced and anything not the impact factor, or an immediate rejection of anything seen as unvetted and dominated by the tyranny of the masses. But the questions are really more nuanced than that and working out the details has value beyond a simplified “tweets = citations”. In your paper you do mention the “limitations” and voice an opinion about the value presented despite these limitations. Surely that opinion is open for debate and not meant as an unassailable final word.

In JMIR, an editorial is a piece written by the editor related to the journal or articles in the journal.

2) To be clear, when I say 8-10 automated tweets, it does not mean they all come from JMIR. JMIR sends out 1 tweet per newly published article, and a tweet when a new article enters a top 10 list. But there are automated lists/bots from third parties which pick up journal feeds and rebroadcast them to their followers. These are what we mean by "automated tweets" and "baseline chatter". This is part of the community response (but a constant background noise).

“Are you completely, 100% sure that in the absence of any tweets whatsoever from the journal, that the community behavior would have been identical? ”
No. PLEASE PLEASE READ THE STUDY, esp the discussion. This is all discussed there.
I write: “There are further, JMIR-specific caveats. First, as shown in Figure 1, JMIR ranks the top-tweeted articles on its website, and also sends out automatic tweets whenever a new article enters the top 10 in any of the monthly categories; both may have reinforced and amplified the response from Twitter users. Also, tweetations are a metric of the social media response; hence, the social media strategy of a journal likely has an impact on the results. Journals with an active social media presence and tweet alerts such as JMIR will have a higher uptake. JMIR followers have to click on only one button to retweet or modify these alerts (seed tweets). Journals that do not send out alerts for each article may have very different tweetation characteristics (eg, more late-stage tweetations). Further, the tweetation characteristics and rates are almost certainly influenced by the number of followers a journal has (JMIR currently has over 1000 followers) and, even more so, by lists and Twitter bots redistributing content to specific communities.”
I never contend that the community behavior would be identical. It would certainly be different. But this is beside the point. I describe the situation for JMIR. The idea of the twimpact factor is an article-level, within-journal metric. I clearly say that what we observed for JMIR may not work for other journals.

“I do worry that a sample containing a mix of advertising and community reactions muddles the issue of exactly what is being studied”.
There is no "advertising", beyond what is described above. I completely get your point, and any future study that uses a journal with a more active approach to promoting selected articles, e.g. by sending out press releases for selected articles, is in danger of falling into this trap. I just don't see this for JMIR, as we have a rather passive approach and, more importantly, all articles are treated the same.

If someone disagrees with your conclusion, it may not be due to their not having read the article, but in fact, may be due to them drawing a different conclusion. I don’t think we’re really disagreeing here over the substance, but instead on the interpretation of what it means.

You did state, in a previous comment, that the promotional tweets had no effect on community behavior:

“The baseline chatter of 8-10 automated tweets is the current situation, back then it was probably less. Again, I fail to see how this would affect the findings.”

I think this is a point worth examining–I’m taking the viewpoint that the experimental design of a study meant to measure the social impact within a community should exclude the efforts of a commercial entity outside of the community trying to exert its influence. Your argument, if I understand it correctly, is that it is impossible to separate out the two, that by the very nature of Twitter, the promotional activity is part of the social impact. The social impact being measured is going to be highly dependent on the level of promotional activity driven by the journal.

If that is the case, then isn’t that problematic for this as an independent metric? If we are using it to measure social impact, and if funding agencies find social impact something worth pursuing, then won’t this just start a public relations arms race? If a funder measures success by the number of tweets a paper receives, then won’t authors choose to publish their papers in the journals with the strongest marketing efforts, with those willing to automatically tweet thousands of links to each paper?

This gets back to the big question of what is meant by “social impact”? Do marketing activities dilute the measure of the true community response? Can we separate out advertising from actual conversation?

“If someone disagrees with your conclusion, it may not be due to their not having read the article, but in fact, may be due to them drawing a different conclusion. I don’t think we’re really disagreeing here over the substance, but instead on the interpretation of what it means.”

Ok, this sounds more like a discussion on the external validity / generalizability of the findings or use of the suggested metrics twimpact factor, twindex, and tweeted half life, rather than a discussion over internal validity.
Just to repeat my stance on internal validity: My principal findings are that in our dataset, in our case, in our journal, tweets are predictive for citations. You said earlier that the study is "flawed in execution and concept". If you mean by this that the study is not internally valid and these findings (that tweets are somewhat predictive for citations) are questionable because there are confounders in our dataset, then please clearly explain what these confounders are. I understand you were worried that a human marketing team could have influenced these results, and I completely agree that if there had been efforts on the journal side to aggressively and selectively promote certain articles over others, this would have made the findings (the tweet/citation correlation) much less surprising, as we would have been measuring the predictive ability of the marketing team to spot articles that will be highly cited, or the impact of the marketing efforts on citations (though even this would be an interesting finding). But JMIR has no marketing team, the seed tweets are automatic, and equal for all articles (unless there are technical glitches). So this is not a factor.

In terms of external validity, the question is where we go from here and what this means for the applicability of the twimpact factor and twindex for measuring social impact, perhaps also for other journals apart from JMIR. You raise some interesting questions (which I also raised in the discussion), but to be clear, this discussion is not about "flaws" in the study, but more a discussion of what to make of these metrics going forward.
First you state that "The social impact being measured is going to be highly dependent on the level of promotional activity driven by the journal," and you are concerned that "if a funder measures success by the number of tweets a paper receives, then won't authors choose to publish their papers in the journals with the strongest marketing efforts, with those willing to automatically tweet thousands of links to each paper?"
I agree, but to be clear, the suggestion was to use these metrics to compare only similar articles from WITHIN the same journal, in the same topic field etc with each other. The twindex7 was suggested and is being used on the JMIR website at http://www.jmir.org/stats/mostTweeted/1 (where we display various metrics derived from Twitter, including the twimpact factor and twindex). The twindex7 is the rank percentile of an article's twimpact factor (tw7) compared against similar/previous articles in the SAME journal.
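A minimal sketch of that rank-percentile calculation, taking tw7 to simply be an article's tweet count in its first seven days, with made-up numbers purely for illustration:

```python
# Percentile rank of one article's twimpact factor (tw7) against the tw7
# values of comparable articles from the same journal. All numbers are
# invented for illustration.
from bisect import bisect_right

def twindex7(article_tw7, journal_tw7_values):
    """Return the percentile rank (0-100) of article_tw7 within the journal."""
    ranked = sorted(journal_tw7_values)
    at_or_below = bisect_right(ranked, article_tw7)
    return 100.0 * at_or_below / len(ranked)

previous_tw7 = [0, 1, 1, 2, 3, 4, 6, 9, 14, 25]   # hypothetical journal history
print(twindex7(12, previous_tw7))                  # -> 80.0
```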

There is an assumption here that the journal does not selectively promote certain articles over others, which is the case for JMIR. As an aside, if journals or other commercial entities do promote certain articles over others, then the interesting question remains to what degree, in the social media context, these marketing efforts will skew the results. One may argue that in the social media context it is not that easy to "make" something "stick" or go viral unless it hits a nerve, solves an important problem, etc. A marketing team may send out dozens of tweets, but if it doesn't stick or resonate with the community, it is unlikely to propagate through the network, go viral, and create more sustained interest.

To your question “if we are using it to measure social impact, and if funding agencies find social impact something worth pursuing, then won’t this just start a public relations arms race?”, several answers come to mind:
1) it remains to be seen to what degree PR interventions can “make” something go viral in the social media context and have an impact on the suggested metrics.
2) one may argue that funding agencies aren’t completely opposed to the notion that in order to create change, promote innovation, translate knowledge into practice change, a certain degree of PR is part of that game. Funders are interested in impact, and impact goes far beyond citations. Funders also need to make an impact on public perception of science, in order to garner support for taxes, donations etc. They do PR themselves in order to increase the impact of the research they are funding. So as long as we can exclude over-the-top marketing/spamming/gaming the system, the fact that social impact metrics are somewhat influenced by “PR” isn’t necessarily a bad thing if we want to measure social impact
3) you can call the pursuit of citations also a PR arms race, even though it is played with different weapons (going to conferences, presenting work, networking with colleagues)
4) even if I follow your earlier argument that funders/promotion committees are only interested in rewarding those who cure disease, citations aren’t necessarily a better or more timely way to measure if somebody has cured a disease than tweetations. If somebody would publish a breakthrough study showing how to cure cancer tomorrow, where do you think this would first be measurable, in citations or tweetations?

In JMIR we don't publish papers where researchers "cure disease" (few journals do), but it seems that the twimpact factor/twindex is often an early indicator of a paper that will also be highly cited. That in itself has value for me as editor, for readers, and presumably also for funding agencies, promotion committees etc. The overreliance on the journal impact factor as a proxy for article-level impact is partly due to the problem that we have to wait 2 years to get article-level citation data.
As said in the paper, apart from social impact metrics being a pure predictor for citations, they measure also other things (public attention, PR etc). It remains to be seen whether these different concepts can be parsed out, but in the meantime, looking at the aggregate metrics (twimpact factor and twindex) we are providing on an article level at http://www.jmir.org/stats/mostTweeted/1 is certainly of interest to me, both in my capacity as journal editor and in my capacity as researcher, to identify the “hot” areas that resonate in the social media space.
I leave it to others to decide what they make of this, but my prediction would be that you will see a lot more journals publishing these or similar metrics.

I’m not questioning your actual analysis of the numbers–the flaws (or limitations if you prefer that term) are in the experimental design, and, at least in my opinion, some of the concepts driving the study and the publicly stated conclusions. In particular, and probably due to the abbreviated nature of Twitter, much of the nuance of the study is lost in the public discussion of the results (including the nuance you point out yourself in the paper). I think in a discussion/review of the paper, that nuance should be brought forward, and made clear.

Here's the tweet quoted above:
"Cite and retweet this widely: Tweets can predict citations! Highly tweeted articles 11x more likely to be highly cited http://t.co/dcJLRj7y"

That’s a very different story from a statement that these results are only applicable to one journal, in one subject area with a variety of caveats.

You and Jason have both been in strong agreement that behaviors vary widely depending on the research culture, and you have repeatedly stressed that this may not apply at all outside of your own field (as I said, it’s not surprising that people who study Twitter use Twitter to discuss articles about Twitter). If the premise behind the experiment was indeed to test whether “tweets can predict citations” as stated in your own tweet, then as an editor, I would likely have rejected the paper for cherry-picking a data set likely to provide a particular conclusion.

Extraordinary claims require extraordinary evidence, and I’m not convinced this paper provides that. I’m reminded of the Darwinius paper from 2009, where the paper itself was even-handed and fair, but the press release made all sorts of exaggerated claims. If I’ve been overly harsh in reacting to your paper, then this perhaps explains why.

I have a problem with the experimental design as I don’t think self-citing promotional activities should be included as valid contributions to the social reaction to one’s work. You suggest this is irrelevant because your activities are automated. I don’t think that makes a difference. Either the promotional tweets exist or they don’t. Either they add to the total number of tweets per article or they don’t. In this case, since it is trivially easy to exclude them, why not present a cleaner set of numbers?

I’m also a bit confused by your explanation of the journal’s automated tweets. You’ve stated that the activity is equal for all articles, yet you’ve also stated a number ranging from 8-10, and a variety of situations where an article enters a top listing for a category and gets retweeted. Are all articles equally entering the top 10 of all categories and seeing the exact number of automated tweets and retweets?

As you point out, without those seed tweets, much of the activity seen may never happen, and that’s a really interesting phenomenon unto itself, perhaps one worth study. How manipulable is the metric here? Take a random sampling of articles, don’t tweet at all for some of them, tweet and retweet to a varying degree for the others. If the metric can be heavily manipulated by promotional activities, then it becomes problematic and less valuable. Again, this is the nature of social metrics, particularly those with low barriers to entry.

I worry that if this metric is taken seriously in terms of career reward or funding, even if it is used exactly as you suggest (only comparing articles within one journal, only using it on journals that tweet automatically and evenly), then it will still be widely open to manipulation by researchers and institutions looking to increase their standing. If it is used to spot rising stars and hot topics, then who wouldn’t want to be included there? And if you can get yourself on that hot list by sending out a few tweets, then isn’t everyone going to start doing so? If I’m a journal editor who is also a researcher, maybe I have the journal tweet papers in my field a bit more than those in other fields to raise the perceived importance of my own research.

This gets back to that idea of the “observer effect” mentioned way back in a past comment. In some ways, the less aware people are of this metric, and the less formally it is used, the more accurate it is likely to be. As soon as it becomes a target that provides rewards, it may become entirely useless.

And I do agree with you that we’ll see more and more journals throwing up more and more sorts of metrics like this against the wall to see if they stick. I think a lot of this is likely to be due to “me-too-ism” and wanting to seem progressive without a great deal of thought given to the actual value of the metrics themselves. To be clear, I’m not accusing you of this, as you have taken the time to perform some analysis to dig further into the actual meaning of what’s going on.

Well, if your primary concern is now that my tweet, within a 140-character limit, did not repeat the discussion or nuances of the article, then I can live with this. Tweets are a medium to draw attention to research, not to provide an in-depth discussion. Conversely, they can be used to MEASURE attention, but not necessarily sentiment. This is even alluded to in the paper itself:

a cursory scan through all the tweets suggests that the vast majority of tweets simply contained variants of the article title or the key conclusion, and rarely contained explicit positive sentiments (such as “Great article!”) or—even less common—negative sentiments (such as “questionable methods”—I have not seen any examples of the latter). This may be because the mere act of (re)tweeting an article is often an implicit endorsement or recommendation with which readers express their interest in and enthusiasm about a specific topic, support the research question and/or conclusion, or simply want to bring the article to the attention of their followers. Additional comments are not necessarily required to express this implicit endorsement.

You may decry this or not; it is the nature of the medium.

As an aside, if I had sent out a series of tweets with a more nuanced discussion, then Phil Davis would probably have accused me of excessive “self-tweetation” to boost my personal twimpact factor!
So it is slightly ironic that you decry that the tweets are not nuanced enough, while Phil lashes out against the one person who tried to do exactly that:

It should not be surprising that the 527 tweets to the JMIR article contained many repeat posts, some verging on the compulsive:
(…)
Brian S. McGowan PhD (28)

So Brian McGowan, who according to this sent 28 different tweets quoting different sections of the manuscript, is being labeled by Phil Davis as “verging on the compulsive.” Boy, am I glad I didn’t tweet my discussion!

As to “extraordinary claims require extraordinary evidence”, I don’t think this is how science works. The first hints that something works or not often come from observational studies or case studies, like this JMIR case study. If we find something extraordinary or surprising, others can try to build on it and test whether it holds under other circumstances, with larger sample sizes, with other populations, and with different premises, and refine the methodology. If it is replicated in other scenarios, we can build theories and products on top of it, and so on. Science is iterative. This is a JMIR case study, clearly described as such, with the JMIR dataset. Your reaction of “all very interesting, but show me that this works for other journals too” is a bit of an unfair critique. Yes, we are working on this, but in the meantime, here is a hopefully useful report (which some have called seminal) that shows in a detailed and thorough manner what the tweetation behavior for a specific journal looks like and how it relates to citations. This has never been done before, so many think this is an exciting finding, and the attention this article is getting on Twitter is testimony to that.

As another example of how science works: in 2006 I published another infodemiology study suggesting that Google searches for influenza-related terms may be predictive of influenza outbreaks. Google jumped on this, did a thorough investigation, published a larger investigation in Nature in 2008, and developed the Google Flu Trends system.

Again, there is great confusion about the “8-10” automated tweets. To repeat: by automated tweets I was referring to tweets which come from other bots and lists. JMIR sends out exactly one tweet per newly published article, and one tweet when a new article enters the top 10 list. If you are still confused about the level of tweet activity from the @JMedInternetRes account, just have a look at the JMIR Twitter account: https://twitter.com/#!/JMedInternetRes. To see automated tweets from others, look at http://www.jmir.org/stats/viewTweets/all/2012, where tweets #2-#5 seem to come from other lists and are automated (so I should revise my estimate from 8-10 to 5).

As to “Are all articles equally entering the top 10 of all categories”: no, only the top 10 articles are highlighted. So if an article is already highly tweeted, it enters the top 10, and the journal sends out another tweet (“HOT: …”). This is done only once (unless the article falls out of the top 10 and re-enters). That’s it. As acknowledged in the discussion, “this may have reinforced and amplified the response from Twitter users.”
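Roughly, the rule works like this (an illustrative sketch of the logic just described, not the journal’s actual code):

```python
# Illustrative sketch of the "HOT" tweet rule described above
# (logic only; this is not the journal's actual implementation).
def hot_tweet_candidates(previous_top10, current_top10):
    """Return article IDs that newly entered the top-10 list and should
    therefore trigger a single 'HOT: ...' tweet. Articles already on the
    list trigger nothing; an article that drops out and later re-enters
    would qualify again."""
    return sorted(aid for aid in current_top10 if aid not in previous_top10)

# Example with made-up IDs: only the newly entered article gets a HOT tweet.
print(hot_tweet_candidates({"e95", "e104"}, {"e95", "e104", "e123"}))
# -> ['e123']
```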

I agree with your RCT proposal and we are already considering it. However, I am not sure the results would be surprising (articles without seed tweets will be tweeted less; I think this is a foregone conclusion).

I think I was trying to better understand my own reaction to the paper. As you’ve pointed out, the paper itself is filled with caveats and qualifications. So why was I under the impression that it was claiming something it wasn’t, that this was a generalized phenomenon and that tweets can predict citations? Then I realized that the author of the paper, who is also the editor of the journal in which it was published, made a public statement claiming as much. Tweets predict citations. Highly tweeted articles 11x more likely to be highly cited. That’s a bold, broad statement, and one that requires a great deal of proof, proof that is not provided by this paper.

I think it speaks to the nature of social media, which is a medium of self-promotion with very little oversight. It’s probably why the vast majority of scientific researchers view it with some trepidation in terms of using it in a professional capacity. There’s been a strong negative reaction in recent years to “science by press conference” (http://boingboing.net/2011/01/06/science-by-press-con.html) and, if anything, the low barrier to entry for social media is only going to exacerbate this problem.

So, going into this paper, which I found via Twitter, I was expecting an elaborate set of experiments detailing the validity of Twitter in predicting citations. What I found was a paper that offered much less: a paper describing a fairly contrived situation, one that only claimed Twitter was predictive for one journal, for one brief time period, in one particular field of research prone to use Twitter, and so on.

The paper itself may do a fair job of not overstepping the bounds of what the research shows, but the paper does not exist in a vacuum. If we are moving into an era of post-publication peer review, then the author’s public comments on the work are also fair game for critical analysis. I previously cited the Darwinius paper as an example (http://classic.the-scientist.com/blog/display/56110/):

[in the published article, the authors] used cautious language to describe the evolutionary connection. Darwinius “could represent a stem group from which later anthropoid primates evolved, but we are not advocating this here,” they wrote in the study. But the researchers made bolder claims to the press. The Times of London quoted Franzen as saying the fossil, nicknamed Ida, was “the eighth wonder of the world.” The same article quoted Jorn Hurum of the University of Oslo, one of the study’s coauthors, as saying that the “fossil rewrites our understanding of the evolution of primates.” Headlines shouted that scientists had found the “missing link” between lemur-like and monkey-like primates, a discovery that offered new clues to the evolution of humans. The History Channel ran a documentary and two popular science writers published a book, The Link, soon after the announcement.

The evolutionary community was, rightly so, up in arms over this, despite the publication itself being fairly cautious and not overstepping its bounds.

If the study is meant to be a minor iterative work, showing that under a very particular set of conditions Twitter can predict citations, fair enough, but it certainly doesn’t support the broader public claims. A reaction of “all very interesting, but show me that this works for other journals too” is not in any way, shape, or form unfair. It is an accurate description of the work, and one that approaches the question with scientific skepticism and rigor. I’ll believe it when I see it. You haven’t done the experiments to make the bigger point. They may be in the works, and they may support your thesis. Until they’re done, you don’t get any credit for them. You can’t assume this is an important breakthrough unless you have the data to back it up, and you can’t expect anyone to just take it on faith that maybe someday it’ll turn out to be so.

As for the inclusion of automated self-tweets, I was trying to think of the right analogy. I’m from the world of biomedical wet bench research. Let’s say I’m doing a set of experiments on chemotaxis, basically measuring the response of worms to different chemicals in their environment. Do the worms move toward chemical X or away from it? If, in performing the experiments, I find that the worms do not move at all unless I “prime the pump” by setting up an automated robot that twice picks up and moves each worm in order to get it going, I would likely not include those two automated self-created movements in my data set. They may be automated, they may be necessary, and they may be equivalent across all trials in my data set, but they are not the external phenomenon I’m trying to measure.

Gunther, thank you for replying.
With regard to your treatment of the data, the inclusion of automatic tweets may be a problem. If each paper receives a “baseline chatter” of 8-10 tweets and you report that the median number of tweets per paper within the first 7 days is just 8 (mean=13.9, range 0–96), then about half of the articles in your study include just automatic tweets. Given that your Twitter feed includes the source of each tweet, it shouldn’t have been a problem to remove these marketing tweets. I would also argue that self-tweeting (like self-citation) should be removed from the analysis as well. I don’t know that removing these would change your results, but I’m questioning why they were left in the dataset.
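To make concrete how simple such a filter could be, here is a minimal sketch, assuming a plain tweet log with hypothetical column names (article_id, tweeter) and a made-up list of automated accounts; the journal’s actual data format may well differ:

```python
import csv
from collections import Counter

# Hypothetical handles for the journal's own/automated feeds; the real list
# would come from inspecting the tweet sources in the dataset.
AUTOMATED_SOURCES = {"JMedInternetRes", "some_retweet_bot"}

def count_organic_tweets(path, author_handles_by_article=None):
    """Count tweets per article, excluding automated tweets and, optionally,
    self-tweets by each article's own authors.

    Assumes a CSV log with columns 'article_id' and 'tweeter' (made-up layout).
    """
    author_handles_by_article = author_handles_by_article or {}
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            article, tweeter = row["article_id"], row["tweeter"]
            if tweeter in AUTOMATED_SOURCES:
                continue  # drop the journal's marketing/automated tweets
            if tweeter in author_handles_by_article.get(article, set()):
                continue  # drop self-tweets (analogous to self-citations)
            counts[article] += 1
    return counts
```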

As for the citing of datapoints as references, I cannot accept the premise of error or inexperience from you or the peer reviewers. You have a long history of medical publishing that extends back to the late 1990s. You have written articles on citation analysis without citing each article in your study. Including them in a paper in which you simultaneously served as author and editor seems very odd to a reader. Blaming your “scientometric experts” for failing to call you on this behavior cannot excuse your responsibility as author and editor. Even more troubling is the fact that you edited a published paper with the rationale that the paper was not really a paper, but merely a “manuscript,” because it was not yet indexed by PubMed Central and Web of Science or added to a content aggregator like Swets. While you contend in your comment (above) that this paper was handled by another editor, I do not see this anywhere in your paper, unless this part was edited post-publication as well. I am not a medical journal editor, but this behavior and justification seem quite odd. I’d be interested in what COPE and the ICMJE say about this approach.

I do commend you for issuing a prompt correction; however, I draw a different conclusion than you do. Your correction does not illustrate “the limitations and tyranny of the impact factor, and why we should consider additional metrics” but exactly the opposite.

“Can we separate out advertising from actual conversation?”

Unless you’re referring specifically to paid advertising in social media, on the grounds that it’s measurably more objectionable than regular lip service (shades of Citizens United here, in that case), I would say no — just as we can’t (and don’t) separate “advertising” in the form of citation from arbitrarily-more-valid academic attribution. Recall Bruno Latour — people will cite others for whatever fit or unfit reasons that they choose. Deriving intent is, for the purposes of all such currently functional systems, beyond us. I don’t think this particular portion of our debate is as tractable as you seem to believe.

While it is difficult to derive intent, most reasonable systems do filter out obvious examples of self-promotion. For example, Thomson Reuters penalizes journals that exhibit too much self-citation. Google reduces the rankings of sites that use link farming and other shady SEO tricks. So why shouldn’t that be the case here, where doing so is in many ways technologically trivial?

If one argues that it is impossible to filter out shilling and gaming in any way, then the metric loses much of its value. A lack of filtering combined with a low barrier to entry means an easily gamed, less meaningful metric.

The baseline chatter of 8-10 automated tweets is the current situation; back then it was probably less. Again, I fail to see how this would affect the findings.

I guess we just disagree on whether I merely discussed “datapoints,” or whether it is important for the reader to see and judge for themselves which articles ended up in which high/low cited/tweeted quadrant, etc. It also makes a difference whether somebody conducts a scientometric study with thousands or hundreds of articles (a purely quantitative exercise), as opposed to a study with 55 articles, which are citable and where there is a qualitative aspect to all this (WHY are they tweeted but not cited, etc.).

Ah, and now a new accusation.
I did not edit a “published paper with the rationale that the paper was not really a paper.” Where the hell is that coming from? We followed the standard protocol for making a correction to a published article: we publish an erratum describing the edits, submit the correction notice to PubMed, and resubmit the corrected article to PMC with XML tags crosslinking to the correction notice (only in this case it had not yet been submitted to PMC), as we have done in other cases before (http://www.jmir.org/2009/1/e2/) and as is standard practice. The corrected paper is marked as a corrected paper with a link to the correction notice (and if you really want to know more details, convince yourself that we used the right XML tags for this: http://www.jmir.org/2011/4/e123/XML), and we only made the changes described in the correction notice/erratum. As an open access journal we submit papers to a wide variety of aggregators, and we don’t always have control over which version goes up where, or over whether they replace articles with a corrected version, which is why we always describe when the corrections were made relative to the various database submissions. Feel free to educate yourself by asking COPE or ICMJE, but please stop these nonsense accusations.

No, there were no other edits made. The name of the editor who handled the manuscript and the names of the peer reviewers appear at the bottom of our manuscripts (Edited by A Federer; submitted 22.11.11; peer-reviewed by M Thelwall, J Priem …). This has always been there and is present on every single article.

Gunther, I’m sorry that you want to withdraw from the conversation. I feel that many would consider dialog about the validity and integrity of a paper to be a substantive form of post-publication review, something that counting citations or tweets simply cannot provide.

I think we have a misunderstanding of the term “manuscript.” I consider that your paper was published. It has a full citation (volume, issue and page number) along with a registered DOI. I consider this to be a “version of record” and something that should not be tampered with lightly. You may be referring to the document more generally.

Having to remove references from a manuscript to preserve the validity of a journal-level impact metric is somewhat troubling, but if anything, this perhaps illustrates the limitations and tyranny of the impact factor, and why we should consider additional metrics.

Yes, it is a published manuscript. A published manuscript is a scholarly record, which we NEVER ever change without publishing an erratum, which is exactly what we have done; I do not understand why you accuse us of “tampering with [this] lightly.” I find this an outrageous statement. We do not “tamper” with published manuscripts. The correction notice provides an exact record of what has been changed. This is the standard process in scholarly publishing and the only situation in which it is “allowed” to edit a published manuscript.

As an aside, I already told you in a personal email, BEFORE you published your blog post, that we would publish an erratum fixing the references. Despite that, you forged ahead with this blog post (not mentioning that we were working on a correction statement). Frankly, I find it rather dishonest that you 1) proceeded with publishing your blog post pointing out an error in the references (when by that time you knew we were working on an erratum, but didn’t mention this), and then 2) hold it against us and question our integrity by claiming that we “tamper” with a published manuscript. We do not tamper; we published an erratum, and we were already working on it before you published this blog post, and you know that.

I am all for post-publication peer review, but I am getting impatient if I feel that a) the discussants haven’t even read the paper (or have just skimmed through it), or b) there are below-the-belt attacks which have absolutely nothing to do with the substance of the paper but are solely intended to undermine the credibility of the journal, or my personal credibility. Please read the paper, gather your thoughts, and reanalyze the data with different parameters if you wish. We made the data available for exactly this purpose.

Perhaps you should start disclosing your own biases and conflicts of interest here.
Thanks.

And sorry, but I have to withdraw myself from this discussion on this site. I am all for a scholarly debate, but not at this level.
If anybody has something to say about the study, please submit a letter-to-the-editor, which we are happy to publish, with or without rebuttal. Thanks for your interest in the study.

Hmmm. This response reminds me of something, but I’m not sure exactly what.

Post-publication peer review is a new frontier, and it is fascinating to watch it unfold, to watch reactions and to get a handle on how welcoming the scientific community really is to open criticism and debate.
