Cenzic 232 Patent
Paid Advertising
web application security lab

Archive for the 'CAPTCHA' Category

Solving CAPTCHAs for Cash

Friday, April 27th, 2007

I had a really interesting conversation with a guy out of Romania this morning regarding a team of CAPTCHA solvers that he has set up. The basic premise is that he has 5 guys set up to solve CAPTCHAs like Yahoo, MySpace, and Hotmail. He does this for clients all over the world. The economics are probably the most interesting part, since his team is non-technical and only types in what they are given by their clients.

The economics are as follows: 300-500 CAPTCHAs per person per hour. The clients pay between $9-15 per 1000 CAPTCHAs solved. The team works around 12 hours a day per person. That means they can solve somewhere around 4800 CAPTCHAs per day per person, and depending on how hard the CAPTCHAs are that can run you around $50 per day per person (his estimate). The reason it’s not higher is because they take breaks, and fail sometimes.

He also admits it takes some time to ramp them up on new CAPTCHAs. Eventually they get faster at solving them. So for $50 a day, you can get your own human CAPTCHA solver. The ages of the solvers range from 18-23 years old. Pretty interesting stuff - what a crappy job!

Vidoop

Wednesday, April 18th, 2007

In an interesting email that was sent to me I was asked to take a peek at a new software tool, not yet released to the public called Vidoop (there is an interesting article on it here). While I was unable to actually take a look at the software, I’ve got a pretty good idea of how it works from the Wired article. After downloading a software certificate that allows you to use their software basically you say, “I like animals” and it shows you pictures of horses and cats and dogs all mixed in with a bunch of non-animal photos. You choose the the correct photos (a la kittenauth CAPTCHA) and you are granted access.

So here are the major problems with this that I see. Firstly, it’s probably not accessible (meaning there aren’t alt tags on the images) because if there were it would take only a few guesses to get in since the computer could build databases of “like” things. So basically, like in kittenauth, the blind are screwed (which we have talked about a dozen times and I really don’t want to start another conversation on it, I’m just sayin’). Secondly, it’s non-portable because you have to have the software installed on the computer you want to use. That means you can only use it from one computer (forget going over to a friend’s house and logging in) and if that one computer gets hosed you need to find an alternate path for getting the software installed (which is often the least secure part of these systems). This type of design is a lot less portable than tokens and for a consumer tokens are nearly unusable too.

Also something that makes me uncomfortable from a security perspective is the concept of single sign-in. I’ve always thought single sign-on was a great usability improvement but often terrible from a security perspective. Like the old motivational adage - you’re only as strong as your weakest link - the same is often true with single sign-on. You are often at the mercy of the weakest security model. If any one site is insecure you can (in many of the cases of single sign-on that I have seen) end up compromising all the other trusted sites. Perhaps Vidoop has a great way to solve that issue that revolutionizes the way authentication works and never opens itself up for attack under any scenario. Without looking at it, there’s no way for me to know.

Lastly, because Vidoop uses a relatively small set of photos to choose from, there are only a few general choices from which to brute force (otherwise you’d run into overlap and false positives). If I know the target is a male, chances are they aren’t going to pick the fuzzy animals. If I know the target is a 13 year old girl, chances are they aren’t going to pick photos of computers or sports cars and so on. Anyway, you see the problems with this, Unlike passwords, which are user specific (and still guessable), this is highly un-arbitrary. Does it stop phishing, keystroke logging, cure cancer or any other magical things? I can’t say without looking at it. Will I be using it for large scale mission critical secure production installs? Doubtful.

Internet Explorer Accepts Style Attributes on Closing HTML Tags

Monday, March 19th, 2007

There’s a really interesting thread on sla.ckers.org talking about bypassing some fairly rigid anti-XSS vectors that allow nothing that looks like HTML. Specifically it doesn’t allow <[A-Za-z] which does limit the vectors pretty substantially. In the process of working through the attack vector Hong mentioned that an attack could surface inside of an end HTML tag. Here’s the example code:

</a style="xx:expression(alert('xss'))">

It gets around the filter because there is no letter immediately following the open angle bracket, it is a forward slash. I’m not exactly sure why any end attribute should be allowed to have style information associated with it, since that doesn’t really make sense contextually. This doesn’t appear to work in Firefox or Opera, but it does work in Internet Explorer, which makes up a vast majority of the browsing community. I wanted to wait until the exploit actually worked before posting it, as it was a very interesting way to bypass filters that probably wouldn’t have worked in any other way (with the possible exception of injecting nulls). Nice find, Hong!

Target Sued By The Blind

Thursday, October 26th, 2006

Once again, the blind are at it - wanting equality and accessibility. Those pesky blind people! No but seriously, this is really pretty important and although I am pretty anti-litigious I think the National Federation of the Blind is making a statement by suing Target. Yes I know I’ve mentioned this before, but I started thinking about this some more in the wake of this recent MSNBC article. Blind people cannot use the Internet in the same way people with vision can. They cannot “see” the page layout. One thing I haven’t talked much about is semantic relationships in HTML. It’s a very simple concept that eludes most people who claim to know HTML (at least they put it on their resume).

One of the major problems I see with the way HTML is constructed is tables. Tables are one of the most useful constructs in HTML. You put things in columns and rows, and it makes sense. The problem is that it’s not accessible. The way tables are constructed you read down the column instead of across the row. It’s easier to dump the contents of a select statement in SQL than put it into a multi dimensional array and output one row element at a time in order. Thus it is no longer semantically correct.

Let’s say I have a simple table that has this sort of data in it:

Name Age Sex
Alice 32 Female
Bob 53 Male
Cathy 38 Female

A person who is blind heard that as follows: “Name Alice Bob Cathy Age 32 53 38 Sex Female Male Female.” That’s not terrible with such a small list but when the table grows to many columns with many rows in it, it’s nearly impossible for the person to understand which person you are now talking about. If the table were re-constructed to be in semantic order it would make more sense, “Name Age Sex Alice 32 Female Bob 53 Male Cathy 38 Female.” I understand CSS has come to the rescue but with completely different look and feels and bugs depending on what browser you are using. My question is, why haven’t we invented a new table structure in HTML that is semantically correct? It’s not radical thinking, it’s a simple solution to giving accessibility and still allowing an easy standard way to display data in HTML.

Anyway, sorry, that was probably a tangent. The real reason I’m writing this post is to drive home the fact that the CAPTCHAs people have been using on their enterprise websites are going to get them sued unless they have an alternative. We’ve talked about this before, and I’ve been given the impression that people just aren’t sensitive to this issue by the very same people who built those CAPTCHAs. I wonder what it will take for people to realize it’s just not a good idea from a security perspective (porn proxies completely circumvent the value since you can trick people in any context to type in those CAPTCHAs for you) and from a legal perspective. Hell, it doesn’t even have to be a porn site that relays the CAPTCHAs to unsuspecting users, it could be a blog… a web application security blog. Hmmm…

CAPTCHA Curiosity

Wednesday, September 6th, 2006

Tim Tucker posted an interesting solution to some of the CAPTCHA solving stuff going around. He posted that to comment on his blog you must enter any data, as long as it’s incorrect. So as long as you don’t type in whatever you see and it is six characters long, it will be solved.

As the posted noted, this isn’t particularly good security, as a) it can be broken by anyone who views the site and knows that rule (therefore it’s not good against targeted attacks), and b) if it ever gains popularity it will become standard in splogging software. Still, it’s an interesting take on the same old problem of blog spam.

Target sued over inaccessability

Monday, July 17th, 2006

I know I’ve been batting around a lot of the accessability issues with CAPTCHAs and turing tests in general when used to discriminate against robotic activity and how that relates to the blind.  Sometimes what I talk about is theoretical and sometimes it’s not.  Here’s a case in point.  Target had a class action suit filed against it by the National Federation of the Blind (NFB).

My general feeling on this matter is it is always better to allow alternate applications to visual CAPTCHAs.  Audio versions are one example.  Email is another example.  Out of bound methodologies can provide reliable alternatives to visual CAPTCHAs and can still allow users to proceed through the flows in question.

Hot CAPTCHA

Monday, July 17th, 2006

Jeremiah Grossman sent me this link today, that I thought was simply hilarious more than anything.  It is based off the Cutest KittenAuth CAPTCHA or the HumanAuth CAPTCHA but instead it uses a series of pretty women - it’s called Hot CAPTCHA.  Hot CAPTCHA uses the same basic turing test that the others do, except that it uses a hyper subjective series of photos.

Do I recommend using hot captcha?  Well if you are trying to protect a porn site and blind users are basically not going to have fun with your site either way, sure.  Other than that, I’m not sure the corporate world is going to be embracing this one any time in the near future.  It would be hard to tell my boss “Yah, sorry, I’m just trying to post to MSNBC - no really!”

I find myself feeling bad for some of the guys I know who clearly cannot pick out a cute girl if she were sitting on his face. They will be deemed a robot and never allowed to see the porn for which they seeked - which may have cured them of their lack-of-hot-girl-insight.  Anyway, worth a laugh.

KittenAuth CAPTCHA

Monday, June 5th, 2006

I was thinking about the KittenAuth CAPTCHA since I messed with it over the weekend a little. As I said, the number one issue with that particular system is the low order of possible solutions. It’s not about finding the right kittens, necessarily, but it’s also about the probability of getting the right answer. If you just guess an answer, the probability is 3 over 9 times 2 over 8 times 1 over 7 = 6:504 odds (given 3 correct values in a set of 9), compared to a normal CAPTCHA of say 6 numbers would be 1:999999 (just a tad worse odds there). The other problem with it is that it has such a small set of photos.

I did some cursory research on Yahoo images and Google images, and I found that Yahoo had a far superior data set of actual kitten images than Google, although Google reported half the data set of images it was also less accurate in what it was finding for the first 20-30 pages. If you were to use Yahoo’s data set there would be very little pruning needed, where if you used Google’s image search you’d be removing things like a band named “Atomic Kitten” and some Melborne based transvestite named “Kitten”. The point being, even if you could gather such a set, and prune it, then you’d still be at the mercy of a robot who could accurately gather all of the images off the internet with the name “Kitten” and get such a large data set to compare against that it would be broken again. But that’s leaving out Bayesean heuristics.

There’s a company called MessageLabs that uses something beyond pixel by pixel comparison and even beyond pixel color densities to determine if something is porn (those are the most common method of content filtering and also very flawed). MessageLabs also verify what is in the photo. For instance they can tell what a hand is, or what a car is or what a sky is, so they are less likely to see something like a flesh colored door or a baby picture or something more grey area like a swimsuit photo at the beach as porn. Using something like this against KittenAuth could prove to completely break their system - as if it weren’t already broken enough the way it is today.

CAPTCHA issues

Sunday, June 4th, 2006

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart“. It’s the little box of numbers that people ask you type to perform site functions (usually post or register). There are a few pretty big problems with this technology.

Understandably this is more of a business issue than anything, but accessability is becoming a huge issue. Specifically how can you protect your site from computers that cannot “see” and still meet the criteria for ADA (American Disability Act) compliance. The problem is the blind use a number of tools to hear the words on the page, but unless you were to somehow pass that information along in plain text their text based readers cannot read the image (they generally use Lynx). And besides that would sorta defeat the purpose anyway. And by the way, this is not a theoretical problem, the NFB (National Federation of the Blind) is notoriously litigeous and has recently been entering the web space: NFB vs. AOL (America Online).

So the alternative is to give a version that is useful for the blind, which is an audio version (assuming their text based reader can handle sound files). The audio version reads a series of numbers that they are to transcribe into the box on the page. There are a few problems with this. The first being, you have now made a secondary transmission source for the same access key (we’ll get back to that in a second). The second problem is some businesses would like to store that information that the user went to the audio version for security purposes, or for customization/personalization in the future. Well, hate to throw a wrench into that idea, but that now forces you to be HIPAA (Health Insurance Portability and Accountability Act) compliant (at least in the United States) because you are now storing potentially sensitive medical information about people. Now you are liable under that act if you aren’t taking huge measures to insure compliance. Lovely, huh?

Now, let’s assume somehow you deal with all of that stuff, and you have your very own CAPTCHA on your page. How secure is it really? Well, I’d suggest you take a look at PWNTCHA. The basic premise of this paper is that if a human can read it, a computer can too, with enough tweaking. Beyond that the implementations themselves are very weak. The randomness is never high enough to keep a computer at bay with enough tweeking, unless it is also strong enough to keep a person from being able to read it. Worse yet, remember our audio version? Now you have two versions to worry about. Speech recognition is getting better and better too. Now you don’t just have to worry about one CAPTCHA, you have to worry about which one happens to be the weakest, because that is the one that the attacker will attempt to break first. But who is really going to invest all that time into breaking a CAPTCHA? Well what if it were easier than that?

The next huge problem with CAPTCHA is that you have to assume the entity who is recieving it is the entity who will attempt to fill it out. Wellllll… that’s not always the case. There is a concept of MITM (man in the middle) attacks for CAPTCHA. If the attacker sets up a porn site or any site that has a high traffic volume they can use that against your site. Here’s how. Their site requests a CAPTCHA image from your site and instead of immediately trying to solve it, they replay it to one of their users, saying something like, “If you want free access to our pr0n, type in the numbers above”. The user looks at the image, types the solution into the attacker site, which then replays the solution back to your site. Poof, instant access for the robot by way of a human proxy with a malicious website acting as a MITM.

There are a lot of gimmiks out there, like kittenauth CAPTCHA etc… but they almost always suffer from flaws (kittenauth happens to suffer from a small order of possible solutions), but the human proxy malicious website MITM issue is probably the number one problem for all CAPTCHAs, and why would any company risk it with the potential ADA lawsuits involved? Back to the drawing board.