There is an online market for so many intangible goods these days that it should come as no surprise that there is a market for Facebook “Likes” — the little thumbs-up rating that accompanies so many products and services we see on a daily basis.

For $75, a marketing company will sell you 1,000 Facebook “Likes,” according to NPR’s Planet Money. The marketing company does not supply the “likes” itself but works as a broker between real individuals who are willing to sell their online preferences to your product for very small sums of money — ten cents a “like” — and those who wish to artificially inflate their prestige.

Ten cents may not seem like a lot of money, but there is a huge workforce of individuals willing to be employed to undertake low-skilled, repetitive online work for pennies a task, as evidenced by mature markets like Amazon’s Mechanical Turk. Global outsourcing has never been easier.

Competing with this world market of prestige workers — and driving prices even lower — are legions of software robots willing to do the work of humans. According to the NPR story, Facebook is retaliating with its own army of “artificial bot hunters” that attempt to root out these robot accounts and kick them off the network.

The artificial trust market is not new and is found in environments where online trust is important, such as purchasing an antique love seat from a complete stranger on eBay, finding a reputable bed and breakfast in rural Ireland, selecting a new e-book from Amazon, or choosing an app from the Apple Store. When in doubt, our tendency is to turn to the wisdom of the crowds because we believe that these ratings are accurate evaluations generated by honest individuals and based on real experiences.

Trust — or at least consensus — works the same way in scientific publication through the accumulation of citations, only the barriers to participate in this market are much, much higher. To cast your votes, you need to publish a paper that is indexed by Thomson Reuters’ Web of Science (or alternatively, Elsevier’s Scopus). Like Facebook, Thomson Reuters does not take kindly to citation manipulation and will delist a journal when it exhibits such manipulation, whether through systematic self-citation or, more recently, through the formation of citation cartels.

With some new forms of alt-metrics, it is more difficult to detect when gaming is taking place. Measuring impact based on article downloads, for instance, requires one to trust that the publisher has properly processed proprietary log files and reported the statistics truthfully. Furthermore, as David Crotty expressed recently, downloads may be measuring something very different — popularity, rather than trust or prestige. It is not difficult to ask one’s closest colleagues, friends and family to provide support for a newly published paper in the form of laudatory or perfunctory comments left on a journal site, or to tweet (or retweet) links to one’s own paper. Those wishing to count scientific impact based on tweets should be aware that you can purchase thousands of them for very little money. According to one advertisement, $5 will purchase a tweet to 150,000+ followers. The ad does not mention whether these followers are humans or robots.

While a reputation market has spawned companies dealing in the buying and selling of Facebook “likes,” we have not yet witnessed a similar market for citations. Arjen Wals, professor at Wageningen University in the Netherlands, imagines an eBay-like market (hBay or PleaseCiteMe.com) where authors would offer perfunctory citation services in their manuscripts for a fee, or at least for a reciprocal exchange. Luckily, this market doesn’t exist yet, although I cannot rule out the existence of an informal shadow market that an engineering colleague of mine is convinced exists in China.

While we often hear that transparency (or “sunlight”) is the best solution for rooting out corruption in a trust market, it must be accompanied by accountability. Catching someone for attempting to game a trust market is only half of the solution. Finding an appropriate punishment that fits the deed is the more difficult part.

Trust requires both transparency and accountability. I’ll give that a thumbs-up.

Phil Davis


Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/


22 Thoughts on "The Black Market for Facebook "Likes," and What It Means for Citations and Alt-Metrics"

The same goes for “positive reviews”. I get “target spammed” … saying “look at what we have done for your [specifically named] competitor by [providing fake positive reviews]. We can do the same for you!”

Consequently my competitor has a glorious on-line “reputation” where fake reviews overwhelm the real, often highly negative, reviews. I get a small steady flow of their ex-customers who are deeply disappointed by them, but I imagine that a much larger number go away disappointed with the industry, and some just don’t know how bad the service is they are getting.

Thus humanity swirls down the sink hole of short-term greed.

Thanks Phil. Your post only amplifies some of the concerns expressed in my initial post at transformativelearning.nl What the scholarly kitchen needs to cook-up now is a mechanism that brings back things like passion, curiosity, societal relevance and meaning in a review system that is not pre-occupied with ranking people or universities but with improving science for society (and the planet that supports us all). Arjen Wals (you spelled my last name in such a way that it won’t help me getting cited 😉 )

My apologies for the spelling error. It is now fixed. You may now receive proper credit for your thoughtful contribution to the scholarly dialog. -phil

Should there be anybody who is working on or with ‘an extended peer review system’ for scholarly work — please make yourself known, as this, I think, might be a promising alternative for assessing ‘quality’ in academia. Extended in the sense of not just involving scientific peers but also peers from the community and/or those who are affected by and/or contribute to the research in one way or another. This could help bridge science and society and restore public ‘trust’ in science.

I’m glad you bring up that gaming is a problem for citation metrics, too…this is often overlooked. And we can assume that for every gamer greedy enough to attract attention, there are many more content to exploit the system under the radar.

But it’s good to spotlight the very real problem of gaming altmetrics; it’s something these new approaches will have to address. It’s particularly important for downloads, since (unlike Tweets or Mendeley bookmarks, for example), these don’t leave much of an audit trail. Usage metrics can be done successfully (think Nielsen’s SoundScan, for instance), but it takes careful oversight. Admirably, the folks involved with PIRUS and COUNTER have been thinking about these issues.

Although altmetrics don’t mine the peer-reviewed literature, the networks they do sample tend to do their own quality control, often leveraging robust communities; I’ve written about these previously, including Wikipedia’s impressive crowdsourced immune system. Reddit general manager Erik Martin has some great stories about Redditors doing sting operations, offering to sell upvotes, only to turn around and expose shady marketers.

More importantly, though, the growing diversity of metric sources means it’ll be easier to triangulate and catch gamers. Given enough data, computers get very, very good at spotting inconsistencies. This sounds a bit hand-wavy, especially since our brains aren’t really good at grokking the properties of big datasets. But it works, in production, now.

Algorithmic forensics can be as simple as using Benford’s Law to spot made-up numbers, or as complicated as the algorithms using purchase patterns to nab credit card thieves. Perhaps the most compelling example is Google, which daily dares an entire industry of black-hat SEOers to beat its anti-gaming algorithms — and wins, consistently. Anyone can make a webpage — or ten thousand of ‘em — but we seldom worry about Google results being gamed.
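To make the Benford’s Law technique concrete, here’s a minimal sketch (purely illustrative, not any production forensics system) that compares the first-digit distribution of a set of counts against Benford’s predicted logarithmic distribution; fabricated numbers tend to cluster around a few leading digits and deviate badly:

```python
import math
from collections import Counter

def benford_deviation(counts):
    """Total absolute deviation between observed first-digit frequencies
    and those predicted by Benford's Law. Larger values are more suspicious."""
    digits = [int(str(n)[0]) for n in counts if n > 0]
    observed = Counter(digits)
    total = len(digits)
    deviation = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)   # Benford's predicted frequency for digit d
        actual = observed.get(d, 0) / total
        deviation += abs(actual - expected)
    return deviation

# Organic-looking counts spanning orders of magnitude vs. a fabricated,
# suspiciously uniform series (both made up for illustration)
organic = [102, 1, 23, 3, 187, 12, 45, 2, 9, 310, 14, 1, 56, 7, 29]
faked = [512, 498, 505, 521, 509, 515, 503, 517, 511, 508]
print(benford_deviation(organic) < benford_deviation(faked))
```

Real forensic use would of course demand far larger samples and proper statistical tests, but the basic idea — naturally occurring counts lead with 1 about 30% of the time, while invented numbers rarely do — is exactly this simple.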

The algorithmic forensics approach can work in academic systems, too. SSRN, a preprint repository heavily used in finance and business, catches gamers left and right using tuned algorithms. I hear PLoS is working on a similar system.

Of course, it’s going to take some time to perfect these approaches. But the upshot is that the more data we’ve got, the better we can catch cheaters. This is especially true once we move from frequentist to network-based metrics–instead of simply counting tweets, we’ll start looking at who tweeted something. Once we start uncovering and attending to tastemakers across different networks and media, it’ll become pretty hard to make gaming pay off.

Jason, thanks for your detailed reply. If I could summarize your main message, it would be that alt-metrics people understand that gaming is going on and that data-driven algorithms are finding ways to catch those who attempt to manipulate the results. This is a fair approach, but it doesn’t adequately address two important issues:

1. Evaluation requires transparency. If I’m going to use an indicator to measure the performance of academics, I’m going to have to reveal exactly how I’m going to evaluate them. And, if I’m going to accuse one of attempting to game the system, I’m going to have to reveal the method for catching them and provide sufficient evidence to make a solid accusation. Google has elaborate — but opaque — algorithms for ranking and detecting SEO manipulation and keeps these algorithms trade secrets because they won’t work if they are public knowledge. Similarly, they tweak these algorithms constantly when SEO companies are able to reverse-engineer them and learn another way to game the system. This approach, while highly successful for a free search engine, will not work in academia where you are determining the lifetime careers of individuals. Can you imagine the response of a junior professor to this statement:

Mary, we’re sorry but we have denied your application for tenure. Our systems, which we are forbidden to tell you about, have detected a 64.89% probability that you have manipulated your article downloads. In addition, it appears that a network of pharmaceutical tweetbots have retweeted your last Cell paper more than 4,000 times. You have a year to dismantle your lab, fire your technicians and find a new job. We’ll try to help find homes for your graduate students and postdocs.

If you are ultimately interested in the evaluation of scholars and their work, then you have to provide clear, simple and transparent guidelines on how they are to be evaluated, and detailed, overwhelming evidence if you are going to change their careers.

2. Evaluation requires accountability. You have pointed to several papers in which researchers attempt to detect rogue behavior, which is a nice first step. However, I see no mechanism to tie the detection into accountability. In the example above, if I’m going to accuse Mary of metrics manipulation (and hold her responsible for her actions), then I need to find evidence that Mary was responsible for the inflated metrics for her work, and not, say, a pharmaceutical company that had a financial interest in promoting her work.

Good points both of you. One other thing to consider here is time and effort. If we’re to rely on crowdsourcing compliance, like the Reddit and Wikipedia examples Jason mentions, who does that and where does that time and effort come from? Most researchers I know are already working with an overloaded schedule. Should more time be taken away from doing research to be spent policing one’s fellow researchers?

There’s also likely the question of paying for and applying the sorts of computational methods that are in development. I understand why Google spends a huge amount of money policing their rankings, as they depend on them to drive the profit of their business. For the research world, where is that same funding and motivation? Does this mean we’ll remain reliant on private for-profit companies like Thomson-Reuters for metrics, as they’re the types of organizations with a reason to spend in such areas?

Should more time be taken away from doing research to be spent policing one’s fellow researchers?

Nope. Researchers do this already, quite casually. What would you do if you read a clearly-plagiarized paper? I expect it’s quite similar to what you’d do if you read a blog purporting to be your colleague, that clearly wasn’t. No one’s expecting researchers to moonlight as detectives, but rather to continue the sort of community-minding they’ve always done.

Does this mean we’ll remain reliant on private for-profit companies like Thomson-Reuters for metrics, as they’re the types of organizations with a reason to spend in such areas?

An important question. I don’t have any ceteris paribus objection to scholarship depending on commercial entities; such organizations have been with us from the beginning (Oldenburg made good money with Phil Trans, though less good than he’d hoped [paywall, poetically]).

But in some cases, and with apologies to Latin, ceteris ain’t paribus. The for-profit ISI, despite important historical contributions, is increasingly unresponsive, holding scholarship back instead of bearing it forward. As long as the profits keep rolling in–and they most certainly are–there’s no reason for them to do otherwise. This may be an inevitable consequence of entrusting not just bits and pieces but the basic infrastructure of science to a for-profit.

That’s why I think the ISI of altmetrics (and the ISI for citations, for that matter) needs to be a not-for-profit, like CrossRef or PubMed: infrastructure-providing organizations, built with the goal of improving science. For-profits can build on top of that infrastructure, but they can’t have it. We’re trying to build that with total-impact, which will be spinning off into a non-profit this year. But whether it’s us or someone else, custody over the vital signs of science is just too important to give away to for-profits.

There’s a difference though, between spotting a clearly plagiarized (or fraudulent) paper and monitoring social media statistics for suspicious patterns of behavior. One requires no more effort than the researcher is already expending in reading the literature; the other requires an entirely new set of time-consuming efforts. There’s also the question of enforcement. If you spot a plagiarized paper, you notify the editor of the journal, and they take it from there. If you spot suspicious behavior on Twitter, who would you notify and how would the researcher’s institution go about investigating things further? How does a for-profit company like Twitter or Facebook balance demands for privacy with this level of transparency? Is the scholarly market large enough for them to bother with all the grief they’d receive for releasing user data?

I do agree that this is something you’d want to keep in the not-for-profit realm (though to be clear, not-for-profit companies must make a profit if they hope to survive). I think it was Timo Hannay who said (about data repositories) that private companies have much too short a half-life to trust with something so important. The issue won’t be getting initial funding, which agencies seem willing to offer, but more in sustaining funding. I really like the idea of CrossRef as a model, as it would create a sustainable service that doesn’t rely on grants.

I think we’re confusing two different issues a bit here.

First is the imperative of commercial social media environments to maintain an immune system against spam. Generally, the ones who are successful are (by definition) pretty good at this. They have a bunch of different techniques, but they boil down to algorithmic forensics (Google’s algorithm being the prime example), moderation (Reddit mods), and community participation (reverting Wikipedia edits, clicking the “spam” button on a tweet). Many use all three (Reddit users can downvote, mods can swing the banhammer, and there’s an automated filter).

This is good, because it means that most altmetrics that come into the system have already been through one round of spam and gaming detection. Sources with consistently spammy data (Connotea, I’m sorry to say I’m looking at you here) just don’t get included.

Second is ensuring the reliability of the altmetrics stream itself — a second filter. Once we start combining metric sources, then the real fun can start, as we can use multidimensional data to triangulate inconsistencies. We can also crowdsource this problem out to humans; as a reader or evaluator, if I see a rubbish article with 10k Twitter citations, all from low-credibility users, I’m perfectly happy to mash an “ick, spam!” button. Finally, we can use these two alerting methods to flag problems for human admins to investigate and document.

I’m not saying this procedure will be perfect and catch all gamers. No metric is gaming-proof, including current ones. But I think the success of similar anti-gaming efforts in other fields demonstrates that altmetrics could be a reliable signal. If nothing else, by vastly expanding the work required for successful gaming, altmetrics could significantly alleviate the very real problem of citation gaming.

There’s no question that this is going to take money to do well–although, given the decreasing friction of sharing and processing structured information obtained through open APIs, it’ll be vastly more efficient than the ISI’s 1960s-style manual processing. CrossRef offers one proven sustainability model; another is a fee-for-service approach (based on number of articles tracked, for instance). We’re certainly looking into both for total-impact.

Thanks Jason. I’m still a bit confused and this may be too many questions to get into here, but:

If we’re talking about using common social channels like Twitter or Facebook, rather than specialized, community-owned channels created for this purpose, then there are no moderators, no community reverted edits. Does the lack of oversight from some services mean they should be disqualified from counting?

Where would this “ick, spam” button be located? It’s not a part of Twitter or Facebook, so I assume you’d need buy-in from the publishers to include that on the papers themselves where the overall metrics are displayed. How would you get from a reader clicking on that “ick, spam” button to a formal investigation by an author’s institution? Is there a threshold level of flags that have to be raised? Where is that line drawn? What’s to prevent the anonymous user from falsely making those accusations as a means of slowing down a competitor?

Once the threshold has been passed, how does an institution investigate activity that has happened on a privately owned social media network? Why should Twitter or Facebook cooperate with a university?

I’d also argue that it’s not imperative for a social media platform to have a foolproof immune system against spam. Many are building their business models on traffic, so it’s in their best interests to do anything that increases traffic numbers and exposure of the ads they’re selling. This has to be balanced with delivering actual customers to the advertisers. But I suspect that for many, they toe as close to this line as possible, favoring traffic (and hence profit) over strict vigilance.

David, this thread is getting a bit long, so I’ll give you the last word after this comment if you want it. I’ll attempt to answer your questions first, though:

Does the lack of oversight from some services mean they should be disqualified from counting?

Nope, we can just think of external services’ spam-prevention measures as a prefilter to the more robust central filters that a central altmetrics service would eventually provide, using multiple input streams to triangulate suspicious activity.

Where would this “ick, spam” button be located? It’s not a part of Twitter or Facebook, so I assume you’d need buy-in from the publishers to include that on the papers themselves where the overall metrics are displayed.

Yep, if one went this direction, a spam button would be embedded along with the metrics. Although I suppose it wouldn’t have to be on every article–just enough articles to train the algorithms better (as in the case of the Reddit system I linked to earlier, or with gMail’s spam detection).

How would you get from a reader clicking on that “ick spam” button to a formal investigation by an author’s institution?

I’m not sure, but I suspect the course from cheating to “formal investigation” would be long, expensive, acrimonious, and ugly, if the way such things work today is any indication. On the other hand, the course from “identified inflated counts” to “adjusted, accurate counts” could be pretty quick, as it is with Google and other reputation systems.

When you get down to cases, “consistency and trustworthiness” go a long way. If a central altmetrics service can, over time, demonstrate that its numbers reflect qualities decision-makers and readers value, few will worry about implementation details (although it remains imperative they be available for those that do).

Is there a threshold level of flags that have to be raised? Where is that line drawn?

Yes, and I don’t know. It’d need to be determined empirically, once we’ve got more data. Also, “threshold level” is perhaps misleading; it wouldn’t be a simple count of people who’d flagged things, but would rather use stuff like vector sums, boolean classifiers and other machine-learning approaches, combining spam flags with other features of the data — just like email spam filters do. Keep in mind that the chief goal here is not to catch spammers (that’s a side-effect), but rather to ensure quality data.
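A toy sketch of what “combining spam flags with other features” might look like (every feature name and weight here is hypothetical, chosen purely for illustration — a real system would learn weights from training data, as email filters do):

```python
def spam_score(article_metrics, weights=None):
    """Combine several weighted signals into one score, capped at 1.0.
    Feature names and weights are illustrative, not from any real system."""
    if weights is None:
        weights = {
            "flag_rate": 0.4,        # fraction of readers who clicked "ick, spam"
            "new_account_rate": 0.3, # fraction of tweeters with week-old accounts
            "single_ip_rate": 0.3,   # fraction of downloads from one IP block
        }
    score = sum(weights[f] * article_metrics.get(f, 0.0) for f in weights)
    return min(score, 1.0)

# Hypothetical profiles: heavily flagged, bot-boosted article vs. a normal one
suspicious = {"flag_rate": 0.6, "new_account_rate": 0.9, "single_ip_rate": 0.8}
normal = {"flag_rate": 0.01, "new_account_rate": 0.05, "single_ip_rate": 0.02}
print(spam_score(suspicious) > 0.5 > spam_score(normal))
```

The point is simply that no single flag decides anything; a high combined score just routes the article to a human admin for review.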

What’s to prevent the anonymous user from falsely making those accusations as a means of slowing down a competitor?

Interesting question. On the technical level, of course we can do things like having two levels of spam flagging, one for logged-in users, and the other anonymous; the logged-in one counts for more, but also allows identification of cheaters. We’d of course subject the spam button to the same filtering as other metrics; multiple clicks from the same IP, for instance, would raise a meta-spam flag.

But the deeper question is, what keeps scientists from engaging in this behavior now? It’s not hard to send funders or administrators an anonymous letter “revealing” a competitor has falsified data, or sexually harassed postdocs, or some other ghastly accusation. Proven or not, it’d certainly be a distraction.

I suspect the reason is that while such fraudulent attempts to deep-six the competition do in fact happen–and would continue, albeit at lower stakes, using “ick, spam” buttons–the majority of scientists would prefer to spend their time, well, doing and communicating science. In any case, it’s an interesting question, and certainly one that’d have to be dealt with in a successful system.

Once the threshold has been passed, how does an institution investigate activity that has happened on a privately owned social media network? Why should Twitter or Facebook cooperate with a university?

I’m not sure why they’d have to? For Twitter, and most other altmetrics data sources, the requisite evidence is all right there in the open. Give me twenty minutes and a Python terminal, and you’ve got everything you need.

But I suspect that for many, they toe as close to this line as possible, favoring traffic (and hence profit) over strict vigilance.

This seems like an odd conjecture to me, given that what evidence we have indicates that social networks are in fact pretty aggressive in combating spam. I can’t understand what Mendeley, for instance, could possibly stand to gain from additional spambot traffic. Bots don’t buy anything, they eat up bandwidth, and — much more importantly — they degrade the experience of actual users.

But assuming this is true, I don’t think it’s too much of a problem; a central Altmetrics Machine still needs to do its own filtering after combining all the data sources.

1. Evaluation requires transparency: couldn’t agree with you more here. Arguably the single biggest affliction of the current evaluation system is lack of this transparency: we rely on secret peer-reviews written by anonymous authors for journals whose rankings are decided in secret, back-room negotiations. If someone suggested this system today, folks would quite rightly assume it’d be heavily gamed, too.

But this is a bit of a cop-out. We all know the current system’s broken…that doesn’t make altmetrics any better. And if it relies entirely on secretive algorithms like Google does, it won’t be. That’s why I see the algorithmic forensics approach as just a first line of defense, a high-recall, low-precision burglar alarm. Once you know where to look, fake Twitter or Mendeley accounts, for example, are pretty easy to spot (hm, all 200 of you signed up last week, did you?). As with other high-stakes automation, you don’t rest on the (possibly secret) algorithm; there’s always a human in the loop.

2. Evaluation requires accountability: Agreed, but let’s not get ahead of ourselves. No one is giving (or rejecting) Mary’s tenure based solely on her Mendeley bookmarks or GitHub watchers for a while. The data we have just isn’t robust enough for that yet. For now, it’s just another signal that evaluators can use to get a fuller picture of Mary’s impact — the impact of her published software and datasets, along with her papers; her impact on clinicians or the general public, not just fellow scholars; and her impact now, not in two years when the citations finally surface.

But could the data be robust enough to support high-stakes decision making in the future? Certainly, once we move beyond facile counting (which, as you point out, is trivially manipulable). Reputation — and science is fundamentally a reputation system — requires identity. A tweet from my mom doesn’t mean the same thing as a tweet from a Nobel laureate; accounting for this is the genius of PageRank.
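The PageRank idea mentioned above can be sketched in a few lines of power iteration (a bare-bones illustration over a made-up endorsement graph, not Google’s actual algorithm, which adds many refinements such as handling of dangling nodes):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal power-iteration PageRank over a dict mapping each node
    to the list of nodes it links to (or endorses, or tweets about)."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline, plus shares from its endorsers
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, outs in links.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
        rank = new
    return rank

# Hypothetical graph: a laureate endorsed by many fans, and a lone booster ("mom"),
# both pointing at the same paper. The laureate's endorsement carries more weight.
links = {
    "laureate": ["paper"],
    "fan1": ["laureate"], "fan2": ["laureate"], "fan3": ["laureate"],
    "mom": ["paper"],
}
ranks = pagerank(links)
print(ranks["laureate"] > ranks["mom"])
```

The same endorsement (a link to “paper”) counts for more when it comes from a node that is itself well-endorsed, which is exactly the identity-weighting a network-aware altmetric would need.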

The great thing about these altmetrics sources is that they come with identity baked right in. Often, it’s pseudonymous, but that’s plenty. I don’t care who’s behind a given Twitter profile–if it consistently surfaces little-known articles that go on to become highly used, I’m paying attention to it. Other times it’s full names, which is even better. When a field’s top name tells me a paper is good, I don’t care whether the medium is peer review, including it in her Mendeley library, or blogging it: I listen.

Once altmetrics moves to a more network-aware model, pharmaceuticals or whoever can astroturf all they want…we’re only listening to sources with proven value.

Thanks Phil for your thoughtful post.

Measures of performance are neither valid nor invalid. It is the inferences or actions that are based upon them that have some level of validity. Complex issues like the quality of a journal or the scholarly achievement of a researcher cannot adequately be assessed by a few simple metrics.

We often give measures like an impact factor far more weight than they deserve simply out of convenience, and that is the problem. It is not that an impact factor is a bad measure; it is that it is often over-interpreted and given far too much weight. Furthermore, the higher the stakes placed on a measure, the more pressure there is to “game” it. That doesn’t make it right to do so, but we shouldn’t be surprised when it happens, and just about any metric can be gamed.

Comments are closed.