web application security lab

CAPTCHA issues

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. It’s the little box of numbers that sites ask you to type in before you can perform certain functions (usually posting or registering). There are a few pretty big problems with this technology.

Understandably this is more of a business issue than anything, but accessibility is becoming a huge issue. Specifically, how can you protect your site from computers that cannot “see” and still meet the criteria for ADA (Americans with Disabilities Act) compliance? The problem is that the blind use a number of tools to hear the words on the page, and their text-based readers (they generally use Lynx) cannot read an image unless you somehow pass that information along in plain text. Besides, doing that would sort of defeat the purpose anyway. And by the way, this is not a theoretical problem: the NFB (National Federation of the Blind) is notoriously litigious and has recently been entering the web space, as in NFB vs. AOL (America Online).

So the alternative is to offer a version that is usable by the blind, which is an audio version (assuming their text-based reader can handle sound files). The audio version reads out a series of numbers that the user transcribes into the box on the page. There are a few problems with this. The first is that you have now created a second transmission channel for the same access key (we’ll get back to that in a second). The second is that some businesses would like to record that the user chose the audio version, for security purposes or for future customization/personalization. Well, hate to throw a wrench into that idea, but that now forces you to be HIPAA (Health Insurance Portability and Accountability Act) compliant (at least in the United States), because you are now storing potentially sensitive medical information about people. Now you are liable under that act if you aren’t taking huge measures to ensure compliance. Lovely, huh?

Now, let’s assume you somehow deal with all of that, and you have your very own CAPTCHA on your page. How secure is it really? Well, I’d suggest you take a look at PWNTCHA. Its basic premise is that if a human can read it, a computer can too, with enough tweaking. Beyond that, the implementations themselves are very weak: the randomness is never high enough to keep a well-tuned computer at bay unless it is also so high that a person can no longer read it. Worse yet, remember our audio version? Now you have two versions to worry about, and speech recognition keeps getting better. You don’t just have to worry about one CAPTCHA; you have to worry about whichever one happens to be the weakest, because that is the one the attacker will try to break first. But who is really going to invest all that time into breaking a CAPTCHA? Well, what if it were easier than that?

The next huge problem with CAPTCHA is that you have to assume the entity that receives it is the entity that will fill it out. Wellllll… that’s not always the case. There is a concept of MITM (man-in-the-middle) attacks against CAPTCHA. If the attacker runs a porn site, or any site with a high traffic volume, they can use that traffic against your site. Here’s how. Their site requests a CAPTCHA image from your site and, instead of trying to solve it, replays it to one of their own visitors with a message like, “If you want free access to our pr0n, type in the numbers above”. The visitor looks at the image and types the solution into the attacker’s site, which then replays the solution back to your site. Poof, instant access for the robot, by way of a human proxy and a malicious website acting as a MITM.
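To make the relay concrete, here is a minimal Python simulation. Everything in it is hypothetical: `CaptchaSite` stands in for your site, the “image” is simply the answer string, and `human_visitor` stands in for a visitor to the attacker’s site who reads it. It shows that the site only ever checks a token plus an answer, so nothing distinguishes the attacker’s server from the human who actually did the reading.

```python
import secrets


class CaptchaSite:
    """Hypothetical victim site that issues CAPTCHAs and checks answers."""

    def __init__(self):
        self._pending = {}  # token -> expected answer

    def issue(self):
        token = secrets.token_hex(8)
        answer = "".join(secrets.choice("0123456789") for _ in range(5))
        self._pending[token] = answer
        # A real site would render `answer` into a distorted image; here the
        # "image" is just the answer string so a simulated human can read it.
        return token, answer

    def submit(self, token, guess):
        # pop() makes every token single-use
        return self._pending.pop(token, None) == guess


def human_visitor(image):
    """A visitor to the attacker's site who reads the image and types it in."""
    return image


def relay_attack(site, visitor):
    token, image = site.issue()        # 1. attacker's server fetches a fresh CAPTCHA
    answer = visitor(image)            # 2. shows it to a visitor of its own site
    return site.submit(token, answer)  # 3. replays the visitor's answer to your site


print(relay_attack(CaptchaSite(), human_visitor))  # True
```

A bot submitting its own blind guess, such as `"00000"`, would almost always fail; the relay succeeds only because a real person did the reading.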

There are a lot of gimmicks out there, like the kittenauth CAPTCHA, but they almost always suffer from flaws (kittenauth, for instance, has a small space of possible solutions). The human-proxy MITM issue, though, is probably the number one problem for all CAPTCHAs, and why would any company take that risk with the potential ADA lawsuits involved? Back to the drawing board.
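A rough way to see why a small solution space hurts: if a kittenauth-style test shows a grid of pictures and asks you to select the right ones, a bot that selects at random succeeds with probability 1 / C(n, k). The grid of 9 pictures with 3 targets below is an assumption for illustration, not kittenauth’s actual configuration.

```python
from math import comb


def guess_probability(n_images: int, n_targets: int) -> float:
    """Chance that selecting n_targets of n_images at random hits exactly the right set."""
    return 1 / comb(n_images, n_targets)


# Hypothetical grid: 9 pictures, 3 of which must be selected.
N_IMAGES, N_TARGETS = 9, 3
p_grid = guess_probability(N_IMAGES, N_TARGETS)
p_digits = 1 / 10**5  # a 5-digit numeric code guessed blindly, for comparison

print(f"grid: 1 in {comb(N_IMAGES, N_TARGETS)} ({p_grid:.2%})")  # grid: 1 in 84 (1.19%)
print(f"5-digit code: 1 in {10**5:,} ({p_digits:.3%})")
```

Under those assumptions a blind guesser gets in roughly once every 84 tries, versus once in 100,000 for a blindly guessed 5-digit code, so the small space is a problem before any image analysis is even attempted.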

8 Responses to “CAPTCHA issues”

  1. An Awesome Guy Says:

    I think a workable solution is to provide at least a dozen characters and then prompt for something like “the first four characters, but not the one before the last, that are blue and red, and even where appropriate”, naturally mixing up which characters are requested. This requires *comprehension*.
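    A sketch of how such prompts could be generated. It assumes digits only, a hypothetical four-color palette, and a simplified rule (“the first N even digits drawn in color X”); the real wording and rendering would be up to the site. Note that it leans on color, which is exactly the weakness raised in the reply below.

```python
import random

COLORS = ["red", "blue", "green", "black"]  # hypothetical palette


def make_challenge(rng: random.Random, length: int = 12):
    """Return (digits with their colors, prompt, expected answer)."""
    while True:
        # Each digit is paired with the color it would be drawn in.
        chars = [(rng.choice("0123456789"), rng.choice(COLORS)) for _ in range(length)]
        color = rng.choice(COLORS)
        count = rng.randint(2, 4)
        matches = [d for d, c in chars if c == color and int(d) % 2 == 0]
        if len(matches) >= count:  # otherwise the prompt has no full answer; retry
            prompt = f"Type the first {count} even digits drawn in {color}."
            return chars, prompt, "".join(matches[:count])


chars, prompt, answer = make_challenge(random.Random())
print(" ".join(f"{d}({c})" for d, c in chars))
print(prompt)
```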

  2. RSnake Says:

    Comprehension, yes, and also a mastery of the English language and of odd and even numbers, and hopefully they aren’t color-blind (which a large percentage of men are). I’ve heard statistics that 8%-10% of males have some measurable degree of color-blindness.

    With every additional level of obfuscation you make it measurably harder for computers to solve, and measurably harder for humans as well. I agree with the premise, but the downside is pretty significant. Jeremiah Grossman put together an interesting list of what makes a CAPTCHA useful that is worth a read: http://www.webappsec.org/lists/websecurity/archive/2005-08/msg00059.html

  3. Jeroen Haan Says:

    see also my contribution on http://ha.ckers.org/blog/20060605/kittenauth-captcha/

    I would like to make some technical notes:

    MITM doesn’t seem to be a problem to me, since my CAPTCHA (or any good CAPTCHA) is only valid for a limited period of time. On the first visit, information (IP, session ID) is stored server-side together with a date/time stamp.
    On submission the system time is checked against the stored time.
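    A minimal sketch of the scheme described above, assuming an in-memory store keyed by session ID, a 5-digit answer, and a 120-second expiry (all hypothetical choices). The clock is injectable so the expiry can be demonstrated without waiting.

```python
import secrets
import time


class TimedCaptchaStore:
    """Remember who was issued a CAPTCHA and when; accept answers only briefly."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._issued = {}  # session_id -> (ip, answer, issued_at)

    def issue(self, session_id, ip):
        answer = "".join(secrets.choice("0123456789") for _ in range(5))
        self._issued[session_id] = (ip, answer, self.clock())
        return answer  # would be rendered into the CAPTCHA image

    def verify(self, session_id, ip, guess):
        record = self._issued.pop(session_id, None)  # each CAPTCHA is single-use
        if record is None:
            return False
        issued_ip, answer, issued_at = record
        if ip != issued_ip:                          # must come from the same IP
            return False
        if self.clock() - issued_at > self.ttl:      # must arrive within the window
            return False
        return guess == answer


now = [0.0]  # fake clock for the demo
store = TimedCaptchaStore(ttl_seconds=120, clock=lambda: now[0])

answer = store.issue("sess-1", "203.0.113.5")
now[0] = 30.0
print(store.verify("sess-1", "203.0.113.5", answer))  # True: same IP, 30 s later

answer = store.issue("sess-2", "203.0.113.5")
now[0] = 300.0
print(store.verify("sess-2", "203.0.113.5", answer))  # False: 270 s old, expired
```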

    About readability and security of captchas:
    Why not mix numbers with symbols in a way that everybody can clearly see, including the color blind, and that is also usable in Asian regions?
    Don’t focus on fonts generated in real time, but look at it from an artistic point of view: browse Google Images, or do some sightseeing with your camera.

    Last but not least, you could combine several ways of checking whether someone is trying to abuse the form:
    - check if only one @ is present
    - check for line feeds
    - check on length
    - convert certain characters
    - etc, etc
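    The checks listed above could look roughly like this for a contact form’s sender-address field. The 254-character cap and the use of `html.escape` as the “convert certain characters” step are assumptions; the comment does not say which characters get converted.

```python
import html

MAX_ADDRESS_LENGTH = 254  # assumed cap on an email address field


def looks_safe_address(value: str) -> bool:
    """Cheap pre-checks on a sender address submitted through a contact form."""
    if value.count("@") != 1:                      # exactly one @
        return False
    if "\r" in value or "\n" in value:             # no line feeds (could inject extra headers)
        return False
    if not 3 <= len(value) <= MAX_ADDRESS_LENGTH:  # length check
        return False
    return True


def convert_body(text: str) -> str:
    """One possible 'convert certain characters' step: escape HTML metacharacters."""
    return html.escape(text)


print(looks_safe_address("jeroen@example.com"))             # True
print(looks_safe_address("jeroen@example.com\nBcc: list"))  # False
print(convert_body("<a href='x'>"))
```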

    Or what about putting the sender’s address in the body or subject instead, filtering and processing that information elsewhere (server-side or in the mail client), and using a static FROM address!

    Cheers,
    Jeroen Haan
    website developer
    Netherlands, Brasil

  4. RSnake Says:

    MITM would still work against an IP-address check, because the same IP address is the one connecting to you. Maybe I wasn’t super clear. Here’s what I meant:

    * Attacker/spammer connects to your server and requests the page and gathers the image.
    * Attacker/spammer holds on to this information for a few seconds until the next user drops by their site (high volume porn site).
    * User enters CAPTCHA to see their free porn
    * Attacker/spammer logs this information and replays the answer from their own server (same IP as when they requested the image originally).
    * Server allows entry based on legitimate IP answering legitimately.
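    The steps above can be replayed against an IP and time-window check like the one Jeroen describes. This sketch uses hypothetical values (a 120-second window, documentation-range IP addresses, a fixed answer) and shows that every condition passes, because the attacker’s server is both the requester and the submitter and the human proxy answers within seconds.

```python
from dataclasses import dataclass

TTL_SECONDS = 120.0           # hypothetical expiry window
ATTACKER_IP = "198.51.100.7"  # documentation-range addresses, illustrative only
OTHER_IP = "203.0.113.9"


@dataclass
class IssuedCaptcha:
    ip: str           # IP that requested the image
    answer: str       # text rendered into the image
    issued_at: float  # seconds


def passes_binding(record: IssuedCaptcha, ip: str, guess: str, now: float) -> bool:
    """IP + time-window check of the kind Jeroen proposes."""
    return ip == record.ip and now - record.issued_at <= TTL_SECONDS and guess == record.answer


# Step 1: the attacker's server requests the page and image at t = 0.
record = IssuedCaptcha(ip=ATTACKER_IP, answer="48213", issued_at=0.0)
# Steps 2-3: a visitor to the attacker's site reads the image and types it in.
visitor_answer = "48213"
# Step 4: the attacker replays that answer from the same server at t = 12 s.
print(passes_binding(record, ATTACKER_IP, visitor_answer, now=12.0))  # True
# Only a different IP or a long delay would trip the check.
print(passes_binding(record, OTHER_IP, visitor_answer, now=12.0))     # False
```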

    I think you’re missing the point of your second paragraph, though, because not all people can see at all, let alone see colors. So we have to consider alternatives as well. I’m really not sure what you are referring to by checking for @ symbols. Perhaps you can elaborate.

  5. Jeroen Haan Says:

    Did you invent this MITM yourself or does this actually happen?
    Are there any statistics on this?
    Do you have other links about MITM?

    What about this protection:
    http://www.htmlcenter.com/tutorials/tutorials.cfm/159/PHP/

    Or do you have alternatives or any clue in which direction we could search for a solution?

    As I stated in my other post http://ha.ckers.org/blog/20060605/kittenauth-captcha/ :
    I offer forms as an extra means of making contact; they are additional, not the only means.
    People who “listen” to my page or “feel” my page can give me a call, Skype me, send a PM, send an email, write a letter, visit me, or even ask someone to type in the numbers for them.
    Trying to satisfy everybody under all circumstances will drive many developers mad.
    In my time as a developer for a big US company I learned that, e.g., 80% (or 99%, whatever…) is good enough, since 100% doesn’t exist and chasing it keeps you from earning your income and having a nice time. And we still complied with several ISO standards.
    When I joined this company I aimed for 100% and didn’t reach the goals (economic output).
    However, I respect people who strive for 100%. They are often the ones who actually come up with the inventions.

    The good old email RFCs let you redefine parameters like “TO, Subject and Message” through the final headers.
    By checking that only one @ (presumably the supplied FROM) is present in the headers, you can prevent spam to multiple addresses.
    Of course I built in some extra checks just in case…
    So even if a MITM spams me, they can send at most one email at a time, to me personally.
    When this happens too many times, I simply log their IP address and block that IP for a certain period.
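    The log-and-block step could be sketched as a per-IP counter with a cool-off period. The thresholds (5 submissions per 10 minutes, a 1-hour block) are hypothetical, and as the next reply points out, this does little against attackers who rotate IPs through Tor, anonymizers, or large ISPs.

```python
import time
from collections import defaultdict, deque


class IpThrottle:
    """Block an IP for block_seconds once it exceeds max_hits submissions per window."""

    def __init__(self, max_hits=5, window_seconds=600.0, block_seconds=3600.0,
                 clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window_seconds
        self.block = block_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent submissions
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip):
        now = self.clock()
        if self._blocked_until.get(ip, 0.0) > now:
            return False                 # still serving a block
        hits = self._hits[ip]
        while hits and now - hits[0] > self.window:
            hits.popleft()               # forget submissions outside the window
        hits.append(now)
        if len(hits) > self.max_hits:
            self._blocked_until[ip] = now + self.block
            hits.clear()
            return False
        return True


now = [0.0]  # fake clock for the demo
throttle = IpThrottle(clock=lambda: now[0])
print([throttle.allow("192.0.2.10") for _ in range(6)])  # [True, True, True, True, True, False]
now[0] = 3601.0
print(throttle.allow("192.0.2.10"))  # True: the block has expired
```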

    Would it be possible to block IP addresses based on blacklists like the ones SpamAssassin uses?
    Or treat them in a different way?
    If this is a good way to secure forms, I would be happy to contribute.

    Cheers,
    Jeroen Haan

  6. RSnake Says:

    Jeroen, I didn’t invent the idea, but I am probably one of the few people it’s been used against. I’ve only actually seen it in use once, but remember, there’s really no way for me to know who has and who hasn’t used it against me. IP blacklisting only works if they come from the same IP address many times; if they use Tor, Anonymizer, or AOL, you’re pretty much out of luck. So to answer your question, my statistics are irrelevant, because there’s no way to know for sure, other than that I have seen it happen first-hand (only because they were incredibly sloppy about it).

    The first link relies on referrers, which can be spoofed incredibly easily since we are talking about a server-side request (in fact, the attacker has to craft those requests by hand anyway for this to work, so I don’t see that as a practical solution). Jeremiah Grossman and I have tossed around the idea of using Flash movies, or other dynamic content that cannot simply be replayed because it requires information from the server that changes regularly, or requires human interaction beyond a simple input. The short of it is that we haven’t come up with anything good.

    To your second point, I think the precautions you have taken would certainly hold up in any court, as you are going above and beyond the “reasonable” effort to make your site accessible. I’m certainly not targeting this post at you in particular, since you already know the issues, but rather at everyone else who has probably never heard of this. But you’re exactly right: the more avenues of communication they have, the less at risk you are.

    About the @ thing: I haven’t seen a MITM used for email spamming (I hadn’t even thought about it, actually), but that too is possible. Instead of relaying the email with the headers intact, you’d strip out the contents and replay them in your own email to the person on the other end. If they responded to you, you could strip out the contents of their email and replay it back to the user. That’s super sloppy and bound to get you caught, but it is definitely possible without acting as an email relay MITM.

  7. Jeroen Haan Says:

    If you are not talking about SPAM as the goal of MITM, what is it for?

    About Flash content…
    What is the advantage over images?
    Couldn’t they just show the Flash on the porn site, or take a screen dump?

    If Flash could be of any use;
    Could we use some form of encryption while sending the CAPTCHA code to the Flash object?
    Or do you want to generate Flash on the fly?

    Cheers,
    Jeroen Haan

  8. RSnake Says:

    There are dozens of reasons you may want a CAPTCHA dealt with. In some cases it’s comment spam (not spam emails, but actually adding content to the website that links to drug websites). But there are lots of other reasons: some are as simple as querying for too much information that could be used for competitive research, some are registering too many times on websites, and some are ensuring that the user is in control of their account before a password change (CSRF). There are dozens of reasons.

    Flash offers only two advantages over images. It can change dynamically after it has been served, so you can interact with the user and pull in more information remotely, and it can learn things about the page it is on via JavaScript, because it has access to the DOM, which makes replay slightly more involved, though not impossible. Ultimately we gave up on this idea in the short term, but there may still be something there.