Innovation Sighting: Task Unification and CAPTCHA

You’ve experienced this dozens, if not hundreds, of times. Before being allowed to enter a website, you must type words written in a bizarre, distorted script inside a box. Dr. Luis von Ahn, a professor in the Computer Science Department at Carnegie Mellon University, estimates that people decipher script like this more than 200 million times a day. He should know. He invented the system. Captcha, as it is called, protects websites by demanding that visitors take a simple test that humans can pass but computers cannot. Captcha, in fact, is an acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart. It requires website visitors to interpret the text correctly and type the right letters before they can enter the site.

Captcha is not without its flaws. Its words are generated randomly, and occasionally one pops up that can be easily misinterpreted. One woman trying to sign up for the Yahoo! email service was given the word WAIT. She took it literally. Only after staring at the unchanging screen for twenty minutes did she send a message to the Yahoo! help desk asking for assistance. It could have been worse: captcha sent another web user the word RESTART.

Despite these minor inconveniences, captcha has proven invaluable to website owners and managers who want to prevent computer-generated spam or computer viruses from invading their domains.

Take Ticketmaster. It sells millions of tickets to sporting, music, and arts events. Ticket scalpers would love to get their hands on the best seats in the house for headline shows and resell them at much higher prices for hefty profits. If they could, they’d storm the Ticketmaster website and buy thousands of tickets for popular events the instant they were available. Although Ticketmaster tried to prevent abuse by limiting the number of tickets that any one customer could purchase at a time, scalpers found a way around the rules by writing computer programs capable of posing as real people, logging on to the website, and purchasing tickets. With an automated method for transacting thousands of sales a minute, scalpers were scoring big at the expense of both Ticketmaster and ordinary consumers, who ended up with less desirable seats or had to pay more for good ones.

Captcha changed all that. Only humans can interpret the distorted letters—and gain entrance to the Ticketmaster website. Yes, it takes some effort and time—about ten seconds—for you to decipher the captcha letters and type them. But Ticketmaster, as well as webmasters for hundreds of thousands of other websites, is infinitely grateful to von Ahn for his invention. Few web users begrudge the ten seconds when they learn about the benefits they reap in the form of enhanced security and fair prices on high-demand items such as concert tickets.

Few people other than industry insiders know that von Ahn has good reason to be grateful to them as well. It is an open secret in the online world that von Ahn harnesses the hundreds of millions of daily captcha test responses to achieve a goal—one arguably more useful to society than thwarting ticket scalpers: scanning and digitizing every book in the world.

Most people don’t realize it, but their captcha answers serve two purposes. In addition to proving to websites that they are not machines, users are deciphering difficult-to-read words from old printed texts. When they type the words into the onscreen box, they are transforming printed content into digital form. It’s a perfect example of Task Unification: assigning a new task to an existing resource.

Digitizing old books is hard work even with today’s advanced scanning machines and powerful computers. Scanning accuracy remains poor, especially given the wide variety of fonts and poor print quality of many older publications. Von Ahn wrote a program, called reCaptcha, that feeds the words computer scanners can’t read into the captcha program, which, in turn, presents them to website visitors to crack. Major websites such as Yahoo! and Facebook use reCaptcha, and von Ahn gives the program away free to anyone who wants it.
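The post does not say how reCaptcha decides that a typed word is right when no computer could read it in the first place. One plausible arrangement, assumed here purely for illustration, is to show each visitor a word whose answer is already known alongside an unreadable one, and to accept a reading of the unreadable word once enough visitors agree on it. The class name, image ids, and vote threshold below are all hypothetical; this is a toy sketch of the idea, not von Ahn's implementation.

```python
import random
from collections import Counter, defaultdict


class WordPool:
    """Toy pool of scanned word images (an assumed design, not reCaptcha's code).

    control: images whose text is already known (used to test the visitor).
    unknown: images the scanner could not read (the "extra" task).
    """

    def __init__(self, control_words, unknown_ids, votes_needed=3):
        self.control = dict(control_words)   # image id -> known answer
        self.unknown = set(unknown_ids)      # image ids still unresolved
        self.votes = defaultdict(Counter)    # image id -> tally of typed answers
        self.votes_needed = votes_needed     # hypothetical agreement threshold
        self.resolved = {}                   # image id -> accepted transcription

    def challenge(self, rng=random):
        """Pick one control image and, if any remain, one unknown image."""
        control_id = rng.choice(sorted(self.control))
        unknown_id = rng.choice(sorted(self.unknown)) if self.unknown else None
        return control_id, unknown_id

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the visitor passes; count their reading of the unknown word."""
        if control_answer.strip().lower() != self.control[control_id].lower():
            return False  # failed the known word, so the other answer is ignored
        if unknown_id in self.unknown:
            guess = unknown_answer.strip().lower()
            tally = self.votes[unknown_id]
            tally[guess] += 1
            if tally[guess] >= self.votes_needed:
                self.resolved[unknown_id] = guess
                self.unknown.discard(unknown_id)
        return True


pool = WordPool({"img-001": "morning"}, ["img-917"], votes_needed=2)
pool.submit("img-001", "morning", "img-917", "upon")
pool.submit("img-001", "mourning", "img-917", "spam")  # fails the control word
pool.submit("img-001", "Morning", "img-917", "Upon")
print(pool.resolved)
```

In this sketch the security check and the transcription work happen in the same ten seconds of typing, which is the Task Unification point: the visitor does nothing extra, yet an unreadable word ends up with an agreed reading.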

Does it work? The results are, quite simply, astounding. Ordinary web surfers are helping to transcribe the equivalent of nearly 150,000 books a year—a job that would otherwise require 37,500 full-time workers. Among other accomplishments, reCaptcha helped digitize the complete printed archives of the New York Times dating back to 1851.

This is Task Unification at its best. Von Ahn came up with the idea after calculating how much human labor went into completing captcha tests. “I did a quick ‘back-of-the-envelope’ estimate that people solve captchas about two hundred million times per day,” he explains. “So if it takes ten seconds to solve one captcha, that’s fifty thousand hours of work per day! I kept wondering what that work effort could be used for.”
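The estimate is easy to reproduce. Here is a minimal sketch using only the two figures in the quotation; the function name is hypothetical. With these inputs the total comes to roughly 555,000 hours per day, about ten times the "fifty thousand" quoted, so the quoted total may be a transcription slip.

```python
SOLVES_PER_DAY = 200_000_000   # "about two hundred million times per day"
SECONDS_PER_SOLVE = 10         # "ten seconds to solve one captcha"


def daily_effort_hours(solves_per_day, seconds_per_solve):
    """Total human time spent on captchas per day, in hours."""
    return solves_per_day * seconds_per_solve / 3600


hours = daily_effort_hours(SOLVES_PER_DAY, SECONDS_PER_SOLVE)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```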

Dr. von Ahn didn’t stop with reCaptcha. If he could, he says, he’d harvest more social, economic, and intellectual benefits from every moment in every life on the planet. “I want to make all of humanity more efficient by exploiting human cycles that get wasted,” says von Ahn. And as more of humanity goes online, society has the potential to take advantage of what he calls “an extremely advanced, large-scale processing unit.”

The possibilities are tremendous, says von Ahn. For example, his latest venture, Duolingo, is an effort to translate the entire web into the world’s major languages. Today the web is written in hundreds of languages, but more than half of its content is in English. That makes the web inaccessible to most people in the world, especially in fast-developing regions such as China and Russia.

Once again, von Ahn’s solution involves Task Unification. A billion people worldwide are learning a foreign language. Millions of them use a computer. When they use Duolingo, they learn a foreign language while simultaneously translating text. It works much as reCaptcha does, by assigning an additional job (here, translation) to people while they are performing another task. Dr. von Ahn estimates that if one million people used Duolingo to learn Spanish, the entire Wikipedia could be translated into Spanish in just eighty hours.

Von Ahn is constantly thinking about how to “task-unify” the human race. “We’re still not thinking big enough,” he says. “But if we have that many people all doing some little part, we could do something insanely huge for humanity.”