I spent a little time this weekend playing with my XSS fuzzer, which I'm trying to get to a point where I can release for other researchers to play with. In some preliminary testing I've found a number of issues worth mentioning to anyone doing this kind of research. Cheng Pang Su and I have been working on some of the more advanced variable-width encodings, and I'll release more on that later, as I've found a number of additional issues. Along the way I expanded the fuzzer to look at additional character encodings, which is how I began finding these.
In the meantime, I thought I'd release some additional research I've been doing on injecting characters into HTML tags to see the outcome under various character encodings. Here is the string I started with:
<[CHAR]IMG SRC="" onerror='XSS_ME(CHAR)'>
Here, [CHAR] is the actual character, CHAR is its numerical representation, and XSS_ME is a function that logs the character. After removing false positives (60 = <), I came up with this list for the various encodings I am testing in Internet Explorer:
US-ASCII: 0, 128, 188, 1788, 1852, 1916, 1980, 2044, 2108, 2172, 2236, 2300, 2364, 2428, 2492, 2556, 2620, 2684, 2748, 2812, 2876, 2940, 3004, 3068, 3132, 3196, 3260, 3324, 3388, 3452, 3516, 3580, 3644, 3708, 3772, 3836, 3840, 8124, 12220, 16316, 20412, 24508, 28604, 32700, 36796, 40892, 44988, 49084, 53180, 57276, 61372, 65468
EUC-JP: 0, 142
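Generating the test cases themselves is the easy part. A minimal sketch of the idea (the XSS_ME logger and the exact page template here are placeholders standing in for my harness, not the fuzzer's actual code):

```python
# Build one fuzz case per candidate code point. The [CHAR] slot gets the raw
# character; the XSS_ME() argument gets its numeric value, so a hit in the
# log can be traced back to the code point that caused it.
def make_test_case(code_point):
    char = chr(code_point)
    return '<%sIMG SRC="" onerror=\'XSS_ME(%d)\'>' % (char, code_point)

def generate_cases(code_points):
    return [make_test_case(cp) for cp in code_points]

if __name__ == "__main__":
    for case in generate_cases([0, 128, 188]):
        print(repr(case))
```

The real work is serving each case under the charset being tested and collecting which XSS_ME calls actually fire in the browser.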
The 0 char (null) is not particularly interesting, as it's been known for a while (as documented on the XSS Cheat Sheet). The others, however, I've never seen documented properly. These wide chars are not as straightforward to inject as those below 255, but it's still possible (especially if the injection is stored rather than reflected).
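One plausible explanation for why exactly these code points fire, and I stress this is my guess at the mechanism rather than something I've confirmed against IE's decoder: if each byte of the character's UTF-8 encoding has its high bit stripped when the page is mis-decoded as 7-bit US-ASCII, every non-null entry on the US-ASCII list above yields a 0x3C ('<') byte:

```python
def ascii7_bytes(code_point):
    # Encode the code point as UTF-8, then clear the high bit of each byte,
    # modeling a decoder that forces the stream down to 7-bit US-ASCII.
    return bytes(b & 0x7F for b in chr(code_point).encode("utf-8"))

def yields_lt(code_point):
    # True if the masked byte sequence contains a literal '<' (0x3C).
    return 0x3C in ascii7_bytes(code_point)

suspects = [188, 1788, 1852, 3836, 3840, 8124, 65468]
print([yields_lt(cp) for cp in suspects])
```

Under the same masking, 0 and 128 come out as null bytes instead of '<', which lines up with the null trick being the separately known case.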
Lots more to come, but I wanted to give you all something so you could see what I've been up to. In the meantime it might be interesting to write a scanner that collects the character encodings used by different websites, to see how statistically significant these attack vectors are. In one scan I ran, fewer than 1% of sites used US-ASCII; for EUC-JP you would need to scan Japanese domains to get a better sense of the possibilities.
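A first cut at such a scanner could just pull the declared charset out of each response; the regex and sample below are mine, purely illustrative, and fetching pages in bulk is left to the reader:

```python
import re

# Find a declared charset either in an HTTP Content-Type header value or,
# failing that, anywhere in the HTML (which catches the usual
# <meta http-equiv="Content-Type" ...> declaration).
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([\w.\-]+)', re.IGNORECASE)

def declared_charset(content_type_header, html):
    for source in (content_type_header or "", html or ""):
        match = CHARSET_RE.search(source)
        if match:
            return match.group(1).upper()
    return None  # no declaration found; the browser will sniff or default

sample = '<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
print(declared_charset(None, sample))  # EUC-JP
```

Tallying the return values over a large crawl would give the per-charset percentages; the interesting follow-up is how often no charset is declared at all, since that is where browser sniffing kicks in.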