US-ASCII Issues Redux
As I’m nearing completion of my XSS fuzzer for people, I’m finding more and more interesting issues. Just so you know I’m not keeping everything from you all, here’s another interesting problem I uncovered. I’m sure you remember the original problem with US-ASCII encoding, where a character could be modified so that in US-ASCII it would render as an open angle bracket, or any other character, if you encoded it correctly. Well, it turns out that is only one very small part of the problem. Sure, you can look for everything higher than 7F (127 in decimal) up to FF (255 in decimal) and kill it, but that won’t solve your problem. One of the tests I ran was:
[CHAR]IMG SRC="" onerror="XSS_ME([DECIMAL-CHAR])">
Where [CHAR] was an enumerated list of characters and [DECIMAL-CHAR] was the decimal representation of that character. I expected to find only 60 (the decimal representation of the open angle bracket) and the additional character 188 (the US-ASCII issue that Kurt Huwig found). Alas, there were far, far more vulnerable characters. Here’s the list:
188, 316, 380, 444, 508, 572, 636, 700, 764, 828, 892, 956, 1020, 1084, 1148, 1212, 1276, 1340, 1404, 1468, 1532, 1596, 1660, 1724, 1788, 1852, 1916, 1980, 2044, 2108, 2172, 2236, 2300, 2364, 2428, 2492, 2556, 2620, 2684, 2748, 2812, 2876, 2940, 3004, 3068, 3132, 3196, 3260, 3324, 3388, 3452, 3516, 3580, 3644, 3708, 3772, 3836, 3840, 6588, 6652, 6716, 6780, 6844, 6908, 6972, 7036, 7100, 7164, 7228, 7292, 7356, 7420, 7484, 7548, 7612, 7676, 7740, 7804, 7868, 7932, 7936, 10684, 10748, 10812, 10876, 10940, 11004, 11068, 11132, 11196, 11260, 11324, 11388, 11452, 11516, 11580, 11644, 11708, 11772, 11836, 11900, 11964, 12028, 12032, 14780, 14844, 14908, 14972, 15036, 15100, 15164, 15228, 15292, 15356, 15420, 15484, 15548, 15612, 15676, 15740, 15804, 15868, 15932, 15996, 16060, 16124, 16128, 18876, 18940, 19004, 19068, 19132, 19196, 19260, 19324, 19388, 19452, 19516, 19580, 19644, 19708, 19772, 19836, 19900, 19964, 20028, 20092, 20156, 20220, 20224, 22972, 23036, 23100, 23164, 23228, 23292, 23356, 23420, 23484, 23548, 23612, 23676, 23740, 23804, 23868, 23932, 23996, 24060, 24124, 24188, 24252, 24316, 24320, 27068, 27132, 27196, 27260, 27324, 27388, 27452, 27516, 27580, 27644, 27708, 27772, 27836, 27900, 27964, 28028, 28092, 28156, 28220, 28284, 28348, 28412, 28416, 31164, 31228, 31292, 31356, 31420, 31484, 31548, 31612, 31676, 31740, 31804, 31868, 31932, 31996, 32060, 32124, 32188, 32252, 32316, 32380, 32444, 32508, 32512, 35260, 35324, 35388, 35452, 35516, 35580, 35644, 35708, 35772, 35836, 35900, 35964, 36028, 36092, 36156, 36220, 36284, 36348, 36412, 36476, 36540, 36604, 36608, 39356, 39420, 39484, 39548, 39612, 39676, 39740, 39804, 39868, 39932, 39996, 40060, 40124, 40188, 40252, 40316, 40380, 40444, 40508, 40572, 40636, 40700, 40704, 43452, 43516, 43580, 43644, 43708, 43772, 43836, 43900, 43964, 44028, 44092, 44156, 44220, 44284, 44348, 44412, 44476, 44540, 44604, 44668, 44732, 44796, 44800, 47548, 47612, 47676, 47740, 47804, 47868, 47932, 47996, 48060, 48124, 48188, 
48252, 48316, 48380, 48444, 48508, 48572, 48636, 48700, 48764, 48828, 48892, 48896, 51644, 51708, 51772, 51836, 51900, 51964, 52028, 52092, 52156, 52220, 52284, 52348, 52412, 52476, 52540, 52604, 52668, 52732, 52796, 52860, 52924, 52988, 52992, 55740, 55804, 55868, 55932, 55996, 56060, 56124, 56188, 56252, 56316, 56380, 56444, 56508, 56572, 56636, 56700, 56764, 56828, 56892, 56956, 57020, 57084, 57088, 59836, 59900, 59964, 60028, 60092, 60156, 60220, 60284, 60348, 60412, 60476, 60540, 60604, 60668, 60732, 60796, 60860, 60924, 60988, 61052, 61116, 61180, 61184, 63932, 63996, 64060, 64124, 64188, 64252, 64316, 64380, 64444, 64508, 64572, 64636, 64700, 64764, 64828, 64892, 64956, 65020, 65084, 65148, 65212, 65276, 65280, 65340, 65404, 65468, 65532
Forgive the mess, but yes, all those characters can substitute for an open angle bracket and run HTML and your cross-site scripting vectors. Looks like there are tons of other problems to look for. Thankfully US-ASCII encoding is not that prevalent (about 1% of the Fortune 500 by our estimates); however, I’ve only just begun my testing. Almost everything I’m trying works, which is pretty scary. Lots more to come…



August 29th, 2006 at 4:28 pm
Correct me if I’m wrong, but there are only 256 US-ASCII characters, aren’t there? As such, 316 isn’t its own character - it’s two. chr(1) and chr(60). 380 is chr(1) and chr(124), 444 is chr(1).chr(188), etc.
Of those, 316 and 444 aren’t at all surprising. 380 kinda is, though.
August 29th, 2006 at 5:13 pm
That’s probably what the intention was, but when you output a long width character in that encoding method it works. Like so:
IMG src="" onerror=alert(65532)>
August 29th, 2006 at 7:59 pm
But a filter that prevents these attacks according to Kurt Huwig’s suggested fix should be clearing the high bit of each *byte* before looking for vectors. It wouldn’t make any sense to look at each long character, right?
That said, I agree with yawnmoth that chr(1) chr(124) getting interpreted as an angle bracket is definitely interesting since the suggested fix would not have blocked it. Are there any other sequences of 7-bit chars that have that property?
August 30th, 2006 at 8:41 am
Dean, I hadn’t heard that fix… but what do you mean look at the high bit, exactly? Do you mean if the first bit of the char is greater than 7 (as in 7f) then ignore it? Well what if I have 32060 (7D3C in hex) the first bit is not greater than 7. Or am I misunderstanding?
August 30th, 2006 at 9:27 am
The idea behind the fix, I think, is that if you do an ‘and’ with 0x7f on every character that characters like chr(ord(’
August 30th, 2006 at 9:32 am
Hmmm… WordPress’s filtering out what it thinks are tags broke my post…
I’ll just use ‘i’, instead.
If you do an ‘and’ with 0x7f on every character like chr(ord('i') | 0x80), you won’t have two characters that represent an angle bracket - just one. ie. chr((ord('i') | 0x80) & 0x7f) == 'i', etc.
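For anyone who wants to try the masking idea from the comment above, here is a minimal Python sketch. The helper name clear_high_bit is made up for illustration, and it assumes each character is a single byte value (0 to 255); it is not taken from Kurt Huwig's fix itself.

```python
def clear_high_bit(ch: str) -> str:
    """Return ch with bit 7 cleared (assumes a code point in the range 0-255)."""
    return chr(ord(ch) & 0x7F)

# 'i' with the high bit set is 0xE9; masking brings it back to 'i'.
high_i = chr(ord("i") | 0x80)
assert clear_high_bit(high_i) == "i"

# The same trick on '<' (0x3C) gives 0xBC, i.e. decimal 188 from the post.
high_lt = chr(ord("<") | 0x80)
print(ord(high_lt), clear_high_bit(high_lt))  # prints: 188 <
```

After masking, the 8-bit variant and the plain 7-bit character compare equal, so a filter only has one form to look for.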
August 30th, 2006 at 3:36 pm
RSnake, the suggested fix is to clear the high bit of each *byte* (not each character) before filtering. For 7D3C, no bits would be cleared but a filter looking for “
August 30th, 2006 at 3:38 pm
[Trying again…]
RSnake, the suggested fix is to clear the high bit of each *byte* (not each character) before filtering. For 7D3C, no bits would be cleared but a filter looking for a left angle bracket would catch it because 3C is a left angle bracket. As a result, I’m saying that 7D3C isn’t really any more interesting than, say, 203C (space followed by left angle bracket). Both would be caught by a filter operating on bytes, and any filter operating on text in the US-ASCII encoding should clearly be operating on bytes.
017C (decimal 380) is interesting because it doesn’t contain a left angle bracket. As a result, a filter would probably miss it.
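A rough Python sketch of the byte-level filter described in this comment, in case it helps. The function names normalize_us_ascii and contains_left_angle are hypothetical, and treating the hex values as raw two-byte sequences is an assumption made only to mirror the comment's reasoning.

```python
def normalize_us_ascii(data: bytes) -> bytes:
    """Clear the high bit of every byte, mimicking a 7-bit reading of the data."""
    return bytes(b & 0x7F for b in data)

def contains_left_angle(data: bytes) -> bool:
    """True if '<' (0x3C) appears once every byte is reduced to 7 bits."""
    return b"<" in normalize_us_ascii(data)

# Hex sequences taken from the discussion above.
samples = ["7D3C", "203C", "BC", "017C"]
for name in samples:
    print(name, contains_left_angle(bytes.fromhex(name)))
# prints:
# 7D3C True
# 203C True
# BC True
# 017C False
```

Read as raw bytes, 017C is the only one of these that such a filter lets through, which is why it stood out.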
August 31st, 2006 at 4:53 am
Well, all these characters, when represented in UTF-8 contain either 0xbc or 0x3c which are equivalent to ‘
August 31st, 2006 at 4:55 am
Well, the previous posting is broken by the use of ‘<’ but the conclusion is that there are no hidden surprises.
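The UTF-8 observation can be spot-checked with a short Python sketch like the one below. It assumes the fuzzer emitted each character as UTF-8 bytes, which the post does not state outright; utf8_hits is a made-up helper name, and only a handful of values from the list are sampled.

```python
def utf8_hits(code_point: int) -> bool:
    """True if the UTF-8 encoding contains 0x3C, or 0xBC (0x3C with the high bit set)."""
    # "surrogatepass" lets values in the surrogate range (e.g. 55740) encode too.
    encoded = chr(code_point).encode("utf-8", "surrogatepass")
    return any((b & 0x7F) == 0x3C for b in encoded)

# A few values from the list in the post, plus 60 itself.
for cp in [60, 188, 316, 380, 444, 3840, 32060, 55740, 65532]:
    encoded = chr(cp).encode("utf-8", "surrogatepass")
    print(cp, encoded.hex(), utf8_hits(cp))
```

For example, 380 (hex 017C) encodes as C5 BC in UTF-8, so the BC byte is there even though the code point itself contains no 3C.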
August 31st, 2006 at 8:40 am
Amit and I discussed this as well, and I think I was falsely finding issues that weren’t there. Here is his diagnosis:
What’s confusing is that there actually are variable widths that modify the next character (I see the effect quite a lot now that I’m working with Cheng Peng Su and modifying the fuzzer to use that information). Using bvi, I was able to modify the tests in memory, and sure enough I can see the effect with anything I type:
55 02 BC -> works
09 BC -> works
Etc… As long as that last byte is BC it functions. So the serialization of the various characters strung together is having no effect on the last byte. Anyway, as a result this has uncovered a pretty major flaw in my fuzzer design. Don’t expect it anytime soon… I have to re-think some key issues.