Researchers battling Web bots
PITTSBURGH -- They swarm the Internet harvesting e-mail addresses and free accounts to spawn hordes of junk messages. They lurk in chat rooms waiting to sting unsuspecting surfers with gambling sites, get-rich-quick schemes and pornography.
But these automated computer programs -- known as Web robots or Web bots -- have what may be a fatal flaw: For all their ability to seem otherwise, they're not human.
So researchers at Carnegie Mellon University are designing software that can serve as an online gatekeeper. If you can't prove you're human, you won't get in.
The method involves a new breed of the Turing test used to distinguish real people from intelligent computer programs. Researchers term these tests "captchas" -- shorthand for "completely automated public Turing tests to tell computers and humans apart."
Web bots basically stumble trying to perform simple tasks that humans, even those not terribly bright, can do in their sleep.
Of course, not all Web bots are bad. Many have become crucial to the Internet. Search engines like Google build databases by sending bots to scour the World Wide Web -- and captchas aren't designed to stymie them. Their targets are, instead, the bots programmed for unsavory mischief.
These captchas won't stop bots from finding your e-mail address on the Internet or randomly trying e-mail addresses -- such as john1, john2, john3 and so on -- until they find a working one. They also can't stop bots from trying to fool you in chat rooms.
But the tests can block bad bots from getting the free accounts used to send spam and pull chatroom shenanigans.
Here's how they work:
One from Carnegie Mellon called Gimpy selects a word from an 850-word dictionary and converts it to a mangled image of itself, warping letters and adding distracting ink spots, colors and backgrounds.
To pass the test, a user has to type in the correct word. It's simple enough for a 5-year-old, but computer programs -- even ones that can read -- have difficulty with the distortions and distractions.
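The flow described above (pick a word, show it, check what the user types) can be sketched roughly as follows. This is a minimal illustration, not Carnegie Mellon's code: the `WordChallenge` class, the five-word `WORDS` list and the single-use rule are all assumptions made for the example, and the image-warping step is left out entirely.

```python
import secrets

# Hypothetical stand-in for the 850-word dictionary described in the article.
WORDS = ["apple", "river", "stone", "cloud", "train"]


class WordChallenge:
    """Issues a word challenge and checks the typed answer once."""

    def __init__(self, words):
        self.words = list(words)
        self.pending = {}  # challenge id -> expected word, kept on the server

    def issue(self):
        challenge_id = secrets.token_hex(8)
        word = secrets.choice(self.words)
        self.pending[challenge_id] = word
        # A real test would render `word` as a distorted image here.
        # The plain word is returned only so the sketch can be exercised.
        return challenge_id, word

    def verify(self, challenge_id, typed):
        # pop() makes each challenge single-use (an assumption of this sketch),
        # so a wrong guess cannot be retried against the same word.
        expected = self.pending.pop(challenge_id, None)
        return expected is not None and typed.strip().lower() == expected
```

A sign-up page built this way would call `issue()` when the form loads and `verify()` when it is submitted, letting the visitor through only on a match.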
A more complex version of Gimpy superimposes eight words over one another and adds similar distortions. Researchers are also working on captchas based on images and sound.
Internet portal Yahoo started using Gimpy last year to weed out bots trying to obtain free e-mail accounts. This month, Microsoft's Hotmail started using another captcha that uses random letters and numbers.
Search engine AltaVista uses one to block bots from automatically submitting sites for listings. And Internet auctioneer eBay uses one for people who wish to sign up for the PayPal payment service.
Ways around tests
The tests are not foolproof, though.
The captchas have to hold the answers, and hackers could steal them. If you know Gimpy's 850-word dictionary, the odds of guessing correctly are 1 in 850. That's steep for humans, but little problem for a software program, which could guess over and over again in no time.
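To see why repeated guessing matters, here is a rough calculation. It assumes each attempt faces a fresh word drawn uniformly from the 850-word dictionary mentioned earlier and that attempts are independent; the function name `chance_of_passing` is made up for this sketch.

```python
def chance_of_passing(dictionary_size, attempts):
    """Probability that at least one of `attempts` blind guesses is right."""
    miss = 1 - 1 / dictionary_size  # chance a single guess is wrong
    return 1 - miss ** attempts


for attempts in (1, 850, 5000):
    print(attempts, round(chance_of_passing(850, attempts), 4))
```

Under those assumptions a single guess almost never works, but a program that keeps guessing is more likely than not to get through within several hundred tries.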
Some people have written bots to break the tests that use random letters and numbers. Ari Juels, the principal research scientist at RSA Security, a computer security company, said he designed a program that could break AltaVista's test one in every 500 tries.
"If you can try 10,000 times a second and you are successful one in 500 times, you will manage to break 20 of these a second," Juels said.
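Juels' figure follows directly from the two numbers he quotes. The short check below just restates that arithmetic; the variable names are illustrative.

```python
attempts_per_second = 10_000  # rate quoted by Juels
tries_per_success = 500       # one success in every 500 tries

breaks_per_second = attempts_per_second / tries_per_success
seconds_per_break = tries_per_success / attempts_per_second

print(breaks_per_second)  # 20.0
print(seconds_per_break)  # 0.05
```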
And at the University of California at Berkeley, researchers using shape-recognition programs have often been able to beat Gimpy.
Luis von Ahn, a Carnegie Mellon doctoral student who designed Gimpy, said tests could be made more difficult by using longer words or making users take multiple tests. But making them too confusing or frustrating would weed out humans as well.
Dedicated spammers could also hire humans to sign up for free accounts. In fact, von Ahn said he's aware of bots that start the process and send any captchas they encounter to humans to complete.
Although the tests have faults, they are another layer of defense, said Peter Norvig, director of search quality at Google and author of a book on Web robots.
"All they have to do is make it so expensive their opponents don't want to do it," Norvig said, adding that time is money. "If you are doing this battle you want to have a lot of tools and frustrate your opponent step-by-step."
On the Net:
The Captcha Project: www.captcha.net/