web application security lab

US-ASCII and EUC-JP Character Injection

I spent a little time this weekend playing with my XSS fuzzer, which I’m trying to get to the point where I can release it for other researchers to use. In some preliminary testing I’ve found a number of issues worth mentioning to anyone doing this kind of research. Cheng Pang Su and I have been working on some of the more advanced variable-width encoding issues, and I’ll release more on that later, as I’ve found a number of additional problems there. In the course of that work I expanded the fuzzer to cover additional character encodings, which is how I began finding these.

In the meantime, I thought I’d release some additional research I’ve been doing on injecting characters into HTML tags to see how they behave under various character encodings. Here is the string I started with:

<[CHAR]IMG SRC="" onerror='XSS_ME(CHAR)'>

Here [CHAR] is the literal character being tested, CHAR is its numeric code, and XSS_ME is a function that logs which character fired. After removing false positives (60, which is < itself), I came up with the following lists in Internet Explorer for two of the encodings I am testing:

US-ASCII: 0, 128, 188, 1788, 1852, 1916, 1980, 2044, 2108, 2172, 2236, 2300, 2364, 2428, 2492, 2556, 2620, 2684, 2748, 2812, 2876, 2940, 3004, 3068, 3132, 3196, 3260, 3324, 3388, 3452, 3516, 3580, 3644, 3708, 3772, 3836, 3840, 8124, 12220, 16316, 20412, 24508, 28604, 32700, 36796, 40892, 44988, 49084, 53180, 57276, 61372, 65468

EUC-JP: 0, 142
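For anyone who wants to reproduce the idea, here is a hypothetical sketch of the probe generation described above. It only builds the HTML test cases: the XSS_ME logger body, the one-probe-per-page layout, and the decision to skip surrogate code points are assumptions, not the actual fuzzer, and serving each page under a given charset and loading it in IE is left out.

```python
from typing import Iterable, Iterator, List

# 60 is '<' itself: "<<IMG ..." still contains an ordinary <IMG> tag, so it always fires.
FALSE_POSITIVES = {60}


def make_probe(cp: int) -> str:
    """Place the candidate character between '<' and 'IMG', as in the test string."""
    return f"<{chr(cp)}IMG SRC=\"\" onerror='XSS_ME({cp})'>"


def make_probes(start: int = 0, stop: int = 0x10000) -> Iterator[str]:
    """Yield one probe per code point in [start, stop)."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:  # lone surrogates cannot be serialized into a real page
            continue
        yield make_probe(cp)


def make_test_page(cp: int) -> str:
    """Wrap a single probe in a page; XSS_ME here is a hypothetical logger."""
    return (
        "<html><head><script>"
        "var hits = []; function XSS_ME(c) { hits.push(c); }"
        "</script></head><body>" + make_probe(cp) + "</body></html>"
    )


def drop_false_positives(hits: Iterable[int]) -> List[int]:
    """Remove code points that fire for trivial reasons."""
    return [c for c in hits if c not in FALSE_POSITIVES]
```

Putting each probe in its own page keeps a character that breaks parsing from swallowing the next probe and skewing the results.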

The 0 character (null) is not particularly interesting, as it has been known for a while (it is documented on the XSS Cheat Sheet). The others, however, I’ve never seen properly documented. The wide characters are not as straightforward to inject as the single-byte ones (255 and below), but it’s still possible, especially if the input is stored rather than reflected.
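One arithmetic pattern stands out in the US-ASCII list, checked below. Every value is congruent to 0 or 60 (the code for <) modulo 64, and the single-byte hits 128 and 188 become 0 and 60 once the high bit is cleared (188 is 0xBC, 60 is 0x3C). That is consistent with the high bit being discarded for single-byte input, but it is only an observation, not an explanation of IE’s decoder: many values with the same residue (252, for example) do not appear in the list.

```python
# Values copied from the US-ASCII result list above.
US_ASCII_HITS = [
    0, 128, 188, 1788, 1852, 1916, 1980, 2044, 2108, 2172, 2236, 2300, 2364,
    2428, 2492, 2556, 2620, 2684, 2748, 2812, 2876, 2940, 3004, 3068, 3132,
    3196, 3260, 3324, 3388, 3452, 3516, 3580, 3644, 3708, 3772, 3836, 3840,
    8124, 12220, 16316, 20412, 24508, 28604, 32700, 36796, 40892, 44988,
    49084, 53180, 57276, 61372, 65468,
]


def residues(values, modulus=64):
    """Distinct remainders of the values modulo `modulus`, sorted."""
    return sorted({v % modulus for v in values})


def high_bit_cleared(values):
    """For single-byte values only, map each value to itself with bit 7 cleared."""
    return {v: v & 0x7F for v in values if v < 256}


print(residues(US_ASCII_HITS))          # [0, 60]
print(high_bit_cleared(US_ASCII_HITS))  # {0: 0, 128: 0, 188: 60}
```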

Lots more to come, but I wanted to give you all something so you could see what I’m up to. In the meantime, it might be interesting to write a scanner that records the character encodings different websites use, to see how common these encodings actually are. One scan I’ve seen found that fewer than 1% of sites used US-ASCII; for EUC-JP, it would take a scan of Japanese domains to get a better sense of how viable these attack vectors are.
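A scanner like that could start as small as the sketch below, which uses only the Python standard library. The function names are hypothetical, and the charset detection is a heuristic: it prefers the HTTP Content-Type header and falls back to a regex over the first 4 KB of the body for a meta tag, so it ignores BOMs and anything a browser would sniff.

```python
import re
import urllib.request
from collections import Counter

# Heuristic: matches charset=... in a Content-Type header or a <meta> tag.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def detect_charset(content_type, body):
    """Return the declared charset (lowercased), preferring the HTTP header over <meta>."""
    if content_type:
        m = CHARSET_RE.search(content_type.encode("latin-1", "replace"))
        if m:
            return m.group(1).decode("ascii").lower()
    m = CHARSET_RE.search(body[:4096])  # meta tags normally appear near the top
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def fetch_charset(url, timeout=10):
    """Fetch a page and report its declared charset (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return detect_charset(resp.headers.get("Content-Type"), resp.read(4096))


def tally(charsets):
    """Fraction of sites per declared charset; None is counted as 'undeclared'."""
    counts = Counter(c or "undeclared" for c in charsets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cs: n / total for cs, n in counts.items()}
```

Running `tally(fetch_charset(u) for u in urls)` over a list of domains gives the share of each encoding; a real scan would also need error handling for unreachable hosts.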

2 Responses to “US-ASCII and EUC-JP Character Injection”

  1. escman Says:

    I don’t understand it, but are these codes vulnerable:

    [i] text [/i]
    [size=1-6] text [/size]
    [color=#colorcode] text [/color]


  2. RSnake Says:

    I think what you are asking is whether BBCode is vulnerable. The problem is that BBCode must be translated by the system in question, so it’s not just a matter of getting the injection to work; it also requires that the code translating the BBCode does so the way you would expect. The short answer is that it probably is vulnerable if the page uses one of the vulnerable encodings (not ISO-8859-1).