web application security lab

Good Articles on CAPTCHAs

Mark Burnett has a few good articles on my single favorite love-to-hate security measure, the CAPTCHA. Check the articles out here and here. They do a good job of explaining some of the high-level problems with CAPTCHAs, but don’t be fooled: this is only the tip of the iceberg, as I’m sure Mark would agree. If you look on sla.ckers, there is post after agonizing post where people are building and then breaking CAPTCHAs.

Jeremiah had a good post on this a year ago describing what makes an effective CAPTCHA. I’d like to go one further. I have actually never seen said mythical beast. I’m not even sure it can be done with the technology we have at our disposal. What I’m getting at is this. People have deficiencies and those deficiencies must be dealt with for them to be able to solve a puzzle. Some deficiencies are pretty debilitating, blindness among them. Okay, so we have audio CAPTCHAs to address that issue. Then we have colorblind people. They too can use the audio CAPTCHAs.

Then we have things like pwntcha, pron proxies and a whole host of other ways to “break” CAPTCHAs in ways their designers never intended. Bummer. It’s getting to the point where I cannot even fathom what a good CAPTCHA would look like. Everything is either far too hard for people to solve, or far too easy for computers to solve. The stuff that’s in the middle is usually bad for both. I’m up for an experiment. Can anyone point to a good example of a CAPTCHA anywhere on the Internet - one that meets all the rules outlined by Jeremiah’s post?

21 Responses to “Good Articles on CAPTCHAs”

  1. MikeA Says:

    Sorry, no I can’t point to a good captcha either - I too believe that they are mythical beasts :)

    As Dan Kaminsky showed at DefCon, audio captchas are in some ways self-similar, so that provides a *weaker* path than the visual captchas. Because most captchas are ones where the user has to repeat “parrot fashion” something (text/audio), computers will eventually catch up.

    What we really should be working towards is one of two approaches.

    The first one is to move away from the “copy this” approach to more of a “solve this” approach - humans are much better at solving simple problems (e.g. “My name is Mike - how many letters in my name”, “the color of the sky is…”). There is a cultural problem (people in China will have different names, places, etc.), so we should be able to tune for cultural/geographic tendencies (and if someone who is trying to sign up for, say, a US webmail insists on solving Chinese captchas, that may be a hint).

    The other approach is to start working towards captchas that a computer can solve (and humans find difficult) but that take some time. I don’t mind waiting for some delay (how much is up for debate) for my webmail signup to be “verified”, but the attack on that technique could be a massively parallel system.

  2. kuza55 Says:

    @RSnake:
    I’m surprised reddit doesn’t implement things better, but then again they’re developers not hackers, so I probably shouldn’t be surprised. Now I don’t know how much spam reddit gets, but it clearly goes to show that the effort of looking at the reddit code and figuring this problem out is higher than the reward gained - or they don’t need to automate their spam, and the human cost of simply filling in the captcha is fairly low.

    Also, the articles failed to touch on the most common error I find - most websites do not invalidate the captcha when it has been used, and so if no requests are made to either the image or the page displaying the image (depending on what sets the captcha value in the session), the same value can be reused indefinitely.
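
    The reuse problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: `CaptchaStore` is a hypothetical in-memory stand-in for whatever session storage a site uses, and none of the names come from a real CAPTCHA library. The point is that the stored answer is removed on the first verification attempt, whether or not that attempt succeeds.

```python
import hmac
import secrets


class CaptchaStore:
    """Hypothetical in-memory stand-in for a server-side session store."""

    def __init__(self):
        self._answers = {}  # challenge id -> expected answer

    def issue(self, answer):
        """Register a new challenge and return its random id."""
        challenge_id = secrets.token_hex(16)
        self._answers[challenge_id] = answer
        return challenge_id

    def verify(self, challenge_id, submitted):
        """Check an answer; the challenge is consumed either way."""
        # pop() removes the entry before comparing, so a second request
        # with the same id finds nothing, even if the answer was right.
        expected = self._answers.pop(challenge_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(
            expected.lower().encode(), submitted.lower().encode()
        )
```

    With this shape, replaying a solved challenge fails on the second request, which is the behavior the comment says most sites are missing.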

    @MikeA:
    I disagree that the “solve this” type captchas are the way to go. The reason I say this is that while users are better at thinking laterally and problem solving than computers (though this is of course arguable), it’s not extensible. There are only so many different problems we can present, and if the CAPTCHA became popular, then it would take very little effort to simply solve all of them manually and then put the answers into the bot, and have it simply regurgitate, or if the values in the questions changed, simply program the method of answering the question into the bot.

    It just doesn’t stand up to targeted attack.

    Honestly though, unless the CAPTCHA is fairly weak the pr0n attack is much more of an issue, since it takes rather few resources (go hack some accounts to porn sites, or find them on some forums; they’re easy to come by, and put up some sites, and do some SEO), and is easily repeatable.

    We need some kind of solution which stops this. But needing isn’t getting.

  3. TarraDog52 Says:

    Can’t point you to a good CAPTCHA, but one of the worst ones would have to be in the comments section of Mark’s articles

  4. Ned Batchelder Says:

    I think CAPTCHAs are a losing battle. As they become more difficult, they become more of an annoyance. And they fall into the same bucket as DRM: “Prove to me I should trust you!” It sets up a bad precedent with new users right off the bat.

    I think a better approach is to add features to forms that make them difficult for computers to fill out properly. These can be completely invisible to people, leaving them with a friction-free user experience. I detail one method at http://www.nedbatchelder.com/text/stopbots.html . I use it on my blog, and I have had zero spam comments in the year it has been in use.
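
    To make the idea of invisible form checks concrete, here is a minimal sketch of one common variant: a honeypot field hidden with CSS plus a signed timestamp. It is not necessarily the method described at the link above, and the field name `website`, the secret, and the time limits are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"       # hypothetical server-side secret
HONEYPOT_FIELD = "website"  # hypothetical field, hidden from people with CSS
MIN_SECONDS = 3             # assumed: people rarely submit faster than this
MAX_SECONDS = 3600          # assumed: older forms are rejected


def sign(timestamp):
    """HMAC of the timestamp, so a bot cannot forge or alter it."""
    return hmac.new(SECRET, str(timestamp).encode(), hashlib.sha256).hexdigest()


def new_form_fields(now=None):
    """Hidden values to embed in the form when it is rendered."""
    ts = int(time.time() if now is None else now)
    return {"ts": str(ts), "sig": sign(ts), HONEYPOT_FIELD: ""}


def looks_human(form, now=None):
    """Return True if the submission passes the invisible checks."""
    now = time.time() if now is None else now
    # A person never sees the honeypot, so any value in it suggests a bot.
    if form.get(HONEYPOT_FIELD, ""):
        return False
    ts = form.get("ts", "")
    if not (ts.isascii() and ts.isdigit()):
        return False
    # Reject forged or altered timestamps.
    if not hmac.compare_digest(sign(int(ts)).encode(), form.get("sig", "").encode()):
        return False
    age = now - int(ts)
    return MIN_SECONDS <= age <= MAX_SECONDS
```

    A naive bot that fills in every field, or posts the form instantly, fails these checks without the visitor ever seeing a puzzle.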

  5. MikeA Says:

    @kuza55:

    I think there are *many* more ways of posing such questions, at least more than the “type these 5 (uppercase) characters” captchas which have a finite set. It certainly won’t stop the pr0n attack, but would make automated parsing of CAPTCHAs much more difficult as it would have to understand where the question was in the challenge as well as answer it.

    For an (admittedly crappy) example…

    My name is Mike, how many letters in my name
    The number of characters in Mike is
    I have to press how many keys to enter Mike
    Mike needs how many letters to enter his name
    etc, etc.

    Sure, you can pre-compute the values and put them in a bot, but isn’t this the same flaw as existing captchas? The difference is that I think you have a much bigger challenge/response set. Even bigger if you throw in other bits to the challenge (e.g. Mike is friends with Nick. How many letters in each of their names) and dead ends (Nick is Mike’s friend. What color is the grass they are playing on). One of the downsides I see is the increased storage, but disk space is cheap.

    I’m not saying that this is perfect - it certainly needs more research to see if it would work, and some non-computer experts (psychologists for example) to know which questions a base-line set of people can answer and ways of “mixing it up”, but I think it’s just one way of moving the bar on the problem (let’s say that we can implement it correctly - I’ve not lost all faith in that just yet!), although I stand to be corrected.

    In saying this though, CAPTCHAs are just a prevention technique, and are always going to be broken. Prevention followed by detection (just assume that so many attempts are simply going to get through which you then have to find and remove) is a more realistic, real-world, way to go.

  6. HYPERFUKBOT Says:

    how about javascript-based client-side generated captchas using the same dynamic obfuscation methods that malware authors use to hide their exploits?

  7. kuza55 Says:

    @MikeA:

    I still contend that parsing (understanding and answering) text is much easier to do than OCR.

    And that the number of possibilities is still much higher in simple CAPTCHAs. Can you come up with hundreds of thousands of different questions?

    @Ned Batchelder:
    CAPTCHAs are not only meant to stop spam. They’re also used to make sure transactions are being performed by people, such as signing up for accounts, and other things. And the ability to defeat the CAPTCHA on, e.g. GMail or Hotmail, or eBay, or PayPal or whatever is much more valuable than being able to spam your blog. Hell, I’m sure that even the simplest CAPTCHA which does no obfuscation is going to stop most spam.

    And I’ve also seen SPAM being done by hand…..it’s weird, but I remember we had problems with a _person_ spamming the boards with ads.

    @HYPERFUKBOT:
    Have a look at the various CAPTCHAs proposed in the CAPTCHAs forum section, e.g. http://sla.ckers.org/forum/read.php?7,10330 Essentially, we can easily just utilise the browser’s rendering engine and JS capabilities to write bots, and then interact with the DOM.

  8. Udi Says:

    I think this one is actually quite effective :)

    http://www.hotcaptcha.com/

  9. Gareth Heyes Says:

    Cheers for the links RSnake! I’ve still not given up you know lol

  10. Paul Prescod Says:

    @Ned:

    Your techniques work well because you are one of very few people using them and your blog is not worth individually hacking. Now try to use the same technique to protect Facebook or Myspace from specifically-written bots. I don’t see how it can work.

  11. Ned Batchelder Says:

    @kuza55:
    You are right that CAPTCHAs are used for more than protecting blog comments, but all of their applications are designed to do one thing: separate people from robots. My proposal also separates people from robots, but with a different technique that doesn’t burden the people.

  12. istari Says:

    OK, so here’s a question I’ve been meaning to ask for some time now. How fast can a pr0n proxy be? Does anybody have hands-on experience with one, and knows how much time it takes to complete the whole process of solving one CAPTCHA?

    For instance, does it take less than 30 seconds to download a CAPTCHA, show it to one of the proxy’s users, have him enter the text, and then upload the result to the bot that will eventually use the CAPTCHA?

    Because if it takes more than that (and I’m guessing it does, at least for low-traffic proxies), one could make a form with a CAPTCHA that expires really quickly, say 15 to 30 seconds, and then give the proxies a hard time solving the CAPTCHA in that timeframe…

    Of course, nobody fills out a form in less than 15 seconds, so one would have to modify the submit button in order to request and show the CAPTCHA (probably using javascript) only when the user is ready to send the form…

    Just a thought though, and I’m sure this won’t help with high-traffic proxies…
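
    The short-lived CAPTCHA proposed here could look roughly like the sketch below. The class name and the injectable `clock` parameter are assumptions made so the behavior is easy to test; the 30 second limit is the upper end of the range suggested above.

```python
import time

CAPTCHA_TTL = 30  # seconds; the comment above suggests 15 to 30


class ExpiringCaptcha:
    """Hypothetical challenge that is only valid for a short window."""

    def __init__(self, answer, clock=time.monotonic):
        self._answer = answer
        self._clock = clock
        self._issued = clock()  # start timing when the challenge is shown
        self._used = False

    def verify(self, submitted):
        """Accept one attempt, and only within CAPTCHA_TTL seconds."""
        if self._used:
            return False
        self._used = True
        if self._clock() - self._issued > CAPTCHA_TTL:
            return False  # too slow: possibly relayed to a third party
        return submitted == self._answer
```

    As the comment notes, the challenge would have to be requested only when the user is ready to submit, and a busy proxy with a solver already waiting may still answer within the window.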

  13. kuza55 Says:

    @Ned Batchelder:
    It’s also fallible. Bots these days are based on browser rendering engines that have been ripped out of browsers. Your approach stops only the most naive bots. Sophisticated bots can parse CSS and run JS, and can find out what you did to all the elements inside a form. And from there they can determine which form fields they should be filling in - they could even correlate what value to put in each field, by comparing the location of the text for “Name” with all the fields, the text for “Email” with all the fields, etc.

    You yourself stated that it will not stand up to sophisticated attack, or more or less implied that. The fact that bots are dumb doesn’t mean a targeted attack will be, bots are dumb because their point is to hit as many people for the least effort. And as I see it, if you want to protect more than just blog comments, you’re going to have to have a more resistant solution.

  14. Jason Macpherson Says:

    A proof-of-work system may be the answer in some situations. The idea is to increase the cost of spamming so that spam bots become less effective. See hashcash.org.

    Proof-of-work systems have drawbacks (pron-proxies, skipping mp3, etc), but I would argue they are still a better solution than CAPTCHAs. The average user may require a minute or so to manually fill out a web form. So why not put the CPU to use during that time?

    I don’t think we’ll see widespread usage of a proof-of-work system in email any time soon. However, HashCash may be part of the solution for web forms. There is already a HashCash Wordpress plug-in as well.
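
    For readers unfamiliar with proof-of-work, the sketch below shows the general idea: the client searches for a nonce whose hash has a number of leading zero bits, and the server verifies it with a single hash. This is a simplified illustration, not the actual Hashcash stamp format, and the use of SHA-256, the `challenge:nonce` layout, and the difficulty value are assumptions.

```python
import hashlib
import itertools

DIFFICULTY_BITS = 16  # assumed; roughly 65,000 hashes on average


def leading_zero_bits(digest):
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge, bits=DIFFICULTY_BITS):
    """Client side: search for a nonce (the expensive part)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def check(challenge, nonce, bits=DIFFICULTY_BITS):
    """Server side: one hash is enough to verify the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

    The server would issue a fresh `challenge` per form so that work cannot be reused, and the client could run `solve` in the background while the user types. Each extra bit of difficulty doubles the expected work for the client while leaving the cost of `check` unchanged.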

  15. Dean Brettle Says:

    Of Jeremiah’s requirements, the one that simply can’t be met in the anonymous world where CAPTCHAs are used is:

    4) Test should only be solvable by the human to which it was presented.

    The attacker can always pay someone else to solve the CAPTCHA. So, when a website receives the solution to a CAPTCHA, it isn’t proof that the sender is human. It is (at most) proof that the sender could afford to have the CAPTCHA solved.

    Given that, all we can really do in the long run is increase the cost. But that doesn’t require using a CAPTCHA at all. It currently costs about 2 cents to have a human solve a CAPTCHA. Why not just require that users spend 2 cents instead of dealing with CAPTCHAs? This could be 2 cents worth of CPU time via something like HashCash, or a real micropayment of 2 cents.

    Users that don’t have 2 cents and can’t wait for 2 cents worth of CPU time could earn the money over at the Mechanical Turk:

    http://www.mturk.com/mturk/welcome

    And spend it through Amazon’s Flexible Payment Service:

    http://www.amazon.com/b?ie=UTF8&node=342430011

  16. efex Says:

    Have any of you looked at http://research.microsoft.com/sn/asirra/ ? Came across it looking at more captcha stuff. It’s certainly less annoying than the usual letters game.

  17. drear Says:

    @Mike: I agree with you here. I was playing the other day with py and after a few drafts I came to the conclusion that problem solving would be the best solution. Consider a database of 500 questions, from which a random one would be picked each time. Each of these would be an easy question like “If ‘abcd’ would be ‘dcba’, what would ‘%s%s%s%s’ be?”. At least half of these would be simple math questions in which every number would be picked randomly. What is %d %s %d %s %d? How about %d %s %d? Replace %d’s with random integers and %s’s with random strings like ‘plus’, ‘minus’ and ‘times’.
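
    A minimal sketch of the two templates described above might look like this. It is not the original draft, which was never posted; the function names, the 1 to 9 operand range, and the choice of the two-operand form are assumptions.

```python
import operator
import random

# Operation words taken from the comment, mapped to their meaning.
OPERATIONS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}


def math_question(rng=random):
    """Return (question, answer) for a template like 'What is %d %s %d?'."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    word = rng.choice(list(OPERATIONS))
    return f"What is {a} {word} {b}?", str(OPERATIONS[word](a, b))


def reverse_question(rng=random):
    """Return (question, answer) for the 'abcd' -> 'dcba' template."""
    letters = "".join(rng.sample("abcdefghijklmnopqrstuvwxyz", 4))
    question = f"If 'abcd' would be 'dcba', what would '{letters}' be?"
    return question, letters[::-1]
```

    Note that the same few lines that generate a question also show how little a bot needs in order to answer one, which is kuza55’s objection to this style of CAPTCHA.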

    I didn’t care to put this kind of stuff online, but I am pretty sure this would prevent most of the blog spam and related annoyances. Nevertheless, starting to code this quickly revealed that it was nothing more than a waste of time IMHO.

    @kuza55: you couldn’t have said it better. When it comes to things like PayPal and eBay, the game is already over if you rely solely on CAPTCHAs. I agree with you here in that this broader picture essentially renders the whole concept obsolete. But on the other hand, if you request people to put their credit card numbers in some form, you might as well request some small problem solving from them.

    All in all, coding something that breaks these annoyances would be more fun. During a rainy day.

  18. RSnake Says:

    @efex - I wish I could tell you I love this CAPTCHA. Honestly, I think what they are doing is very noble, and I hope people do use that CAPTCHA just so more pets find good homes. That’s the good. The bad is pretty bad. It requires JavaScript, throws JS errors, has a relatively small and readily downloadable database of solutions, and actually links to the pages that describe the animals in question. Also, given the few examples I’ve seen about half of them are right. So even though it’s 3MM photos, it’s a relatively small keyspace if you were to simply guess. That also leaves image composition analysis, which might take you quite a ways towards being able to tell animals apart. This isn’t the weakest CAPTCHA I’ve seen, but it’s certainly not very strong.

  19. Ronald Says:

    Well, in the end they are trying to invent a perpetual motion device, a “water screw”: it seems a cute idea every time, but it just doesn’t work. Look up the history and save yourself the time. I kinda believe CAPTCHAs are dead; they just don’t work. Who is with me? ;)

  20. MustLive Says:

    Nice articles about captchas.

    But the most interesting thing is that the captcha at Mark’s site is vulnerable ;-) - it can be bypassed with my MustLive CAPTCHA bypass method. I’ll contact him about that soon.

    This captcha plugin (which Mark uses at his site) will be in my Month of Bugs in Captchas. The official announcement of my new project will be very soon.

  21. Niyaz PK Says:

    Online Captchas are never secure. They were secure only 10 years ago.