web application security lab

Archive for the 'CAPTCHA' Category

Target Sued By The Blind

Thursday, October 26th, 2006

Once again, the blind are at it - wanting equality and accessibility. Those pesky blind people! No but seriously, this is really pretty important and although I am pretty anti-litigious I think the National Federation of the Blind is making a statement by suing Target. Yes I know I’ve mentioned this before, but I started thinking about this some more in the wake of this recent MSNBC article. Blind people cannot use the Internet in the same way people with vision can. They cannot “see” the page layout. One thing I haven’t talked much about is semantic relationships in HTML. It’s a very simple concept that eludes most people who claim to know HTML (at least they put it on their resume).

One of the major problems I see with the way HTML is constructed is tables. Tables are one of the most useful constructs in HTML. You put things in columns and rows, and it makes sense. The problem is that it’s not accessible. The way tables are constructed you read down the column instead of across the row. It’s easier to dump the contents of a select statement in SQL than to put it into a multidimensional array and output one row element at a time in order. Thus it is no longer semantically correct.

Let’s say I have a simple table that has this sort of data in it:

Name   Age  Sex
Alice  32   Female
Bob    53   Male
Cathy  38   Female

A person who is blind hears that as follows: “Name Alice Bob Cathy Age 32 53 38 Sex Female Male Female.” That’s not terrible with such a small list, but when the table grows to many columns with many rows in it, it’s nearly impossible for the person to understand which person you are now talking about. If the table were reconstructed to be in semantic order it would make more sense: “Name Age Sex Alice 32 Female Bob 53 Male Cathy 38 Female.” I understand CSS has come to the rescue, but with completely different looks and feels and bugs depending on what browser you are using. My question is, why haven’t we invented a new table structure in HTML that is semantically correct? It’s not radical thinking, it’s a simple solution that provides accessibility and still allows an easy, standard way to display data in HTML.
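To make the two reading orders concrete, here is a minimal Python sketch that linearizes the sample table both ways. The function names are made up for illustration, and this only models the spoken order described above; it is not a model of how any particular screen reader actually works.

```python
# Illustrative only: the sample table from the post, stored row by row.
table = [
    ["Name", "Age", "Sex"],
    ["Alice", "32", "Female"],
    ["Bob", "53", "Male"],
    ["Cathy", "38", "Female"],
]


def read_by_row(rows):
    """Speak every cell of a row before moving on to the next row."""
    return " ".join(cell for row in rows for cell in row)


def read_by_column(rows):
    """Speak every cell of a column before moving on to the next column."""
    return " ".join(cell for column in zip(*rows) for cell in column)


print(read_by_column(table))  # the hard-to-follow order
print(read_by_row(table))     # the order that keeps each person together
```

The row order keeps each person’s name, age and sex next to each other, which is what makes it easier to follow when listening.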

Anyway, sorry, that was probably a tangent. The real reason I’m writing this post is to drive home the fact that the CAPTCHAs people have been using on their enterprise websites are going to get them sued unless they have an alternative. We’ve talked about this before, and I’ve been given the impression that people just aren’t sensitive to this issue by the very same people who built those CAPTCHAs. I wonder what it will take for people to realize it’s just not a good idea from a security perspective (porn proxies completely circumvent the value since you can trick people in any context to type in those CAPTCHAs for you) and from a legal perspective. Hell, it doesn’t even have to be a porn site that relays the CAPTCHAs to unsuspecting users, it could be a blog… a web application security blog. Hmmm…

CAPTCHA Curiosity

Wednesday, September 6th, 2006

Tim Tucker posted an interesting solution to some of the CAPTCHA solving stuff going around. He posted that to comment on his blog you must enter any data, as long as it’s incorrect. So as long as you don’t type in whatever you see and it is six characters long, it will be solved.

As the post noted, this isn’t particularly good security, as a) it can be broken by anyone who views the site and knows that rule (therefore it’s not good against targeted attacks), and b) if it ever gains popularity it will become standard in splogging software. Still, it’s an interesting take on the same old problem of blog spam.
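The rule as described can be sketched in a few lines of Python. This is only my reading of the description, not Tim Tucker’s actual code: the function names are hypothetical, and I’m assuming an exact, case-sensitive comparison and a fixed length of six characters.

```python
def passes_inverse_captcha(displayed: str, submitted: str, required_length: int = 6) -> bool:
    """Accept any answer of the required length that does NOT match the displayed text."""
    return len(submitted) == required_length and submitted != displayed


def bot_answer(displayed: str) -> str:
    """A bot that knows the rule needs no image recognition at all (point a above)."""
    return "aaaaaa" if displayed != "aaaaaa" else "bbbbbb"


shown = "x7k2pq"  # hypothetical text shown in the CAPTCHA image
print(passes_inverse_captcha(shown, "x7k2pq"))           # typing what you see is rejected
print(passes_inverse_captcha(shown, bot_answer(shown)))  # any other six characters pass
```

It only stops generic software that dutifully tries to solve the image; software that knows the rule walks straight through.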

Target sued over inaccessibility

Monday, July 17th, 2006

I know I’ve been batting around a lot of the accessibility issues with CAPTCHAs and Turing tests in general when used to discriminate against robotic activity and how that relates to the blind.  Sometimes what I talk about is theoretical and sometimes it’s not.  Here’s a case in point.  Target had a class action suit filed against it by the National Federation of the Blind (NFB).

My general feeling on this matter is it is always better to allow alternatives to visual CAPTCHAs.  Audio versions are one example.  Email is another example.  Out-of-band methods can provide reliable alternatives to visual CAPTCHAs and can still allow users to proceed through the flows in question.

Hot CAPTCHA

Monday, July 17th, 2006

Jeremiah Grossman sent me this link today, which I thought was simply hilarious more than anything.  It is based off the Cutest KittenAuth CAPTCHA or the HumanAuth CAPTCHA but instead it uses a series of pretty women - it’s called Hot CAPTCHA.  Hot CAPTCHA uses the same basic Turing test that the others do, except that it uses a hyper-subjective series of photos.

Do I recommend using Hot CAPTCHA?  Well if you are trying to protect a porn site and blind users are basically not going to have fun with your site either way, sure.  Other than that, I’m not sure the corporate world is going to be embracing this one any time in the near future.  It would be hard to tell my boss “Yah, sorry, I’m just trying to post to MSNBC - no really!”

I find myself feeling bad for some of the guys I know who clearly could not pick out a cute girl if she were sitting on their faces. They will be deemed robots and never allowed to see the porn they sought - which may have cured them of their lack-of-hot-girl-insight.  Anyway, worth a laugh.

KittenAuth CAPTCHA

Monday, June 5th, 2006

I was thinking about the KittenAuth CAPTCHA since I messed with it over the weekend a little. As I said, the number one issue with that particular system is the small number of possible solutions. It’s not just about finding the right kittens, it’s also about the probability of getting the right answer. If you just guess an answer, the probability is 3 over 9 times 2 over 8 times 1 over 7 = 6 in 504, or 1 in 84 (given 3 correct values in a set of 9), whereas a normal CAPTCHA of, say, 6 digits would be 1 in 1,000,000 (just a tad worse odds there). The other problem with it is that it has such a small set of photos.
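The arithmetic is easy to check. A quick Python sketch, assuming the guesser picks 3 distinct photos out of 9 at random and that a six-digit code has 10^6 equally likely values (the variable names are just for illustration):

```python
from fractions import Fraction
from math import comb

# Pick 3 photos out of 9, all 3 must be kittens: 3/9 * 2/8 * 1/7.
kitten_guess = Fraction(3, 9) * Fraction(2, 8) * Fraction(1, 7)

# Same thing counted another way: one winning combination out of C(9, 3).
assert kitten_guess == Fraction(1, comb(9, 3))

# A six-digit numeric CAPTCHA has 10**6 possible answers.
digit_guess = Fraction(1, 10**6)

print(kitten_guess)                       # chance of a blind guess on KittenAuth
print(digit_guess)                        # chance of a blind guess on six digits
print(float(kitten_guess / digit_guess))  # how many times easier the kittens are
```

In other words a blind guess succeeds roughly once in every 84 tries, so a robot that simply retries gets through almost immediately.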

I did some cursory research on Yahoo images and Google images, and I found that Yahoo had a far superior data set of actual kitten images than Google. Google reported about half as many images, and it was also less accurate in what it found in the first 20-30 pages. If you were to use Yahoo’s data set there would be very little pruning needed, whereas if you used Google’s image search you’d be removing things like a band named “Atomic Kitten” and some Melbourne-based transvestite named “Kitten”. The point being, even if you could gather such a set and prune it, you’d still be at the mercy of a robot that could gather all of the images off the internet with the name “Kitten” and build such a large data set to compare against that it would be broken again. But that’s leaving out Bayesian heuristics.

There’s a company called MessageLabs that uses something beyond pixel by pixel comparison and even beyond pixel color densities to determine if something is porn (those are the most common methods of content filtering and also very flawed). MessageLabs also verifies what is in the photo. For instance they can tell what a hand is, or what a car is or what a sky is, so they are less likely to see something like a flesh colored door or a baby picture or something more grey area like a swimsuit photo at the beach as porn. Using something like this against KittenAuth could prove to completely break their system - as if it weren’t already broken enough the way it is today.

CAPTCHA issues

Sunday, June 4th, 2006

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. It’s the little box of numbers that people ask you to type to perform site functions (usually post or register). There are a few pretty big problems with this technology.

Understandably this is more of a business issue than anything, but accessibility is becoming a huge issue. Specifically, how can you protect your site from computers that cannot “see” and still meet the criteria for ADA (Americans with Disabilities Act) compliance? The problem is the blind use a number of tools to hear the words on the page, but unless you were to somehow pass that information along in plain text their text-based readers cannot read the image (they generally use Lynx). And besides, that would sorta defeat the purpose anyway. And by the way, this is not a theoretical problem: the NFB (National Federation of the Blind) is notoriously litigious and has recently been entering the web space with NFB vs. AOL (America Online).

So the alternative is to give a version that is useful for the blind, which is an audio version (assuming their text-based reader can handle sound files). The audio version reads a series of numbers that they are to transcribe into the box on the page. There are a few problems with this. The first is that you have now made a secondary transmission source for the same access key (we’ll get back to that in a second). The second problem is that some businesses would like to store the fact that the user went to the audio version, for security purposes or for customization/personalization in the future. Well, hate to throw a wrench into that idea, but that now forces you to be HIPAA (Health Insurance Portability and Accountability Act) compliant (at least in the United States) because you are now storing potentially sensitive medical information about people. Now you are liable under that act if you aren’t taking huge measures to ensure compliance. Lovely, huh?

Now, let’s assume somehow you deal with all of that stuff, and you have your very own CAPTCHA on your page. How secure is it really? Well, I’d suggest you take a look at PWNTCHA. The basic premise of this paper is that if a human can read it, a computer can too, with enough tweaking. Beyond that the implementations themselves are very weak. The randomness is never high enough to keep a computer at bay with enough tweaking, unless it is also strong enough to keep a person from being able to read it. Worse yet, remember our audio version? Now you have two versions to worry about. Speech recognition is getting better and better too. Now you don’t just have to worry about one CAPTCHA, you have to worry about which one happens to be the weakest, because that is the one that the attacker will attempt to break first. But who is really going to invest all that time into breaking a CAPTCHA? Well what if it were easier than that?

The next huge problem with CAPTCHA is that you have to assume the entity who is receiving it is the entity who will attempt to fill it out. Wellllll… that’s not always the case. There is a concept of MITM (man in the middle) attacks for CAPTCHA. If the attacker sets up a porn site or any site that has a high traffic volume they can use that against your site. Here’s how. Their site requests a CAPTCHA image from your site and instead of immediately trying to solve it, they replay it to one of their users, saying something like, “If you want free access to our pr0n, type in the numbers above”. The user looks at the image, types the solution into the attacker site, which then replays the solution back to your site. Poof, instant access for the robot by way of a human proxy with a malicious website acting as a MITM.
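Here is a toy, in-memory Python model of that flow, with no networking involved. Every name in it (TargetSite, human_visitor, relay_bot) is made up for illustration, and the “image” is just the answer text standing in for a picture a person would read. The point it tries to show is that the protected site only ever checks the answer, so it has no way to tell who actually read the image.

```python
import secrets


class TargetSite:
    """Hypothetical site protected by a six-digit CAPTCHA."""

    def __init__(self):
        self._pending = {}  # challenge id -> expected answer

    def new_challenge(self):
        challenge_id = secrets.token_hex(8)
        answer = f"{secrets.randbelow(10**6):06d}"
        self._pending[challenge_id] = answer
        # A real site would render the answer as a distorted image.
        return challenge_id, answer

    def submit(self, challenge_id, response):
        # Each challenge can be answered once; only the text is checked.
        expected = self._pending.pop(challenge_id, None)
        return expected is not None and response == expected


def human_visitor(image):
    """Stands in for a person who types whatever they see in the image."""
    return image


def relay_bot(target, visitor):
    """Fetch a challenge, let someone else read it, pass their answer back."""
    challenge_id, image = target.new_challenge()
    typed = visitor(image)
    return target.submit(challenge_id, typed)


print(relay_bot(TargetSite(), human_visitor))
```

The robot never solves anything itself, which is why making the image harder to read does nothing against this.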

There are a lot of gimmicks out there, like the KittenAuth CAPTCHA etc… but they almost always suffer from flaws (KittenAuth happens to suffer from a small number of possible solutions). The human proxy malicious website MITM issue is probably the number one problem for all CAPTCHAs, and why would any company risk it with the potential ADA lawsuits involved? Back to the drawing board.