CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a commonly deployed technology used to increase the security around access controls, especially when there’s something at stake. You’ve certainly seen CAPTCHA systems before — distorted letters, letters behind grids, or letters half reversed out of circles. The goal is to make it difficult or impossible for hackers or spammers to automate machines to break into CAPTCHA sites, but to make it easy for humans to use.
In an impressive in-depth analysis of how CAPTCHA works, the automated attempts to defeat it, and the increasingly effective technique of paying human workers to overcome it, a group of researchers at the University of California, San Diego, show that economic trade-offs currently define how effective CAPTCHA systems are.
The technology works, but cheap labor markets have transformed its utility into its vulnerability.
Spammers like to set up validated email accounts, and the more of these they have, the better their returns. Hackers like to have dozens or hundreds of ways into payment systems and other access-controlled systems, which makes those systems porous.
As the researchers show, automated attacks on CAPTCHA systems leave the balance of power firmly in the defenders’ realm — it’s expensive to create sophisticated image manipulation and OCR programs but cheap for a CAPTCHA site to change its approach if it notices something amiss; the accuracy rate of automated recognition is about 30% at best, further reducing the ROI on what is an expensive and repeated investment in programming; and automated attacks are easier to detect. Overall, CAPTCHA has resisted automated attacks, as it was designed to do.
However, for CAPTCHA to work, humans have to be able to solve it at a rate of about 90%. Otherwise, it poses too much of a barrier. And this is what spammers and hackers are exploiting.
The researchers do a terrific job detailing how the labor market for human CAPTCHA entry has evolved over the past few years. Initially, for each 1,000 CAPTCHAs solved, a worker might make $10. Now, the going rate is closer to $1 or $2 per 1,000 CAPTCHAs solved, and in some cases, it’s as low as $0.75/1,000.
These solves are also highly accurate (75-90%), so the ROI on using “human solver systems” to generate CAPTCHA solves is quite good. As a result, business has been growing and moving to cheaper labor markets.
This downward price pressure reflects the commodity nature of CAPTCHA solving. Since solving is an unskilled activity, it can easily be sourced, via the Internet, from the most advantageous labor market—namely the one with the lowest labor cost. We see anecdotal evidence of precisely this pattern as advertisers switched from pursuing laborers in Eastern Europe to those in Bangladesh, China, India and Vietnam.
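To make the economics concrete, here is a minimal Python sketch of the cost per correct solve, using only the prices and the accuracy range quoted above. The function name `cost_per_correct_solve` and the pairing of every price with both ends of the accuracy range are my own illustration, not a calculation taken from the paper.

```python
def cost_per_correct_solve(price_per_thousand: float, accuracy: float) -> float:
    """Expected dollars spent per correctly solved CAPTCHA.

    price_per_thousand: what the buyer pays per 1,000 attempted solves.
    accuracy: fraction of attempts that are correct, in (0, 1].
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in the range (0, 1]")
    return price_per_thousand / 1000 / accuracy


# Prices per 1,000 solves mentioned in the post (USD).
prices = [10.00, 2.00, 1.00, 0.75]
# Low and high ends of the human-solver accuracy range mentioned in the post.
accuracies = [0.75, 0.90]

if __name__ == "__main__":
    print(f"{'price/1,000':>12} {'accuracy':>9} {'cost per correct solve':>24}")
    for price in prices:
        for accuracy in accuracies:
            cost = cost_per_correct_solve(price, accuracy)
            print(f"{price:>12.2f} {accuracy:>9.0%} {cost:>24.5f}")
```

Even at the pessimistic end of these figures (a $2 rate and 75% accuracy), a correct solve costs well under a cent, which is why the labor market route is so attractive to attackers.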
As in any service provider market, quality providers are more expensive while commodity brokers are cheaper. Analyzing eight service providers, the researchers find that while the general trends are the same across them, the accuracy rate of some justifies their expense. Moreover, as sites institute more sophisticated CAPTCHA systems, the high-end providers (such as one called ImageToText) become all the more necessary:
. . . the results for ImageToText are impressive. Relative to the other services, ImageToText has appreciable accuracy across a remarkable range of languages, including languages where none of the other services had few if any correct solutions (Dutch, Korean, Vietnamese, Greek, Arabic) and even two correct solutions of CAPTCHAs in Klingon.
The researchers conclude that while CAPTCHA is viewed by many as a technology, the way in which it depends on humans solving puzzles has made it vulnerable to a labor market solution:
. . . we have argued that CAPTCHAs, while traditionally viewed as a technological impediment to an attacker, should more properly be regarded as an economic one, as witnessed by a robust and mature CAPTCHA-solving industry which bypasses the underlying technological issue completely.
Overall, this is a fascinating paper that touches on many issues — security, access controls, economics, and technology — that are front and center in today’s publishing and services world.