Today, I spent about 30 seconds looking at Dean Brettle’s NeatHtml page, which is designed to sanitize HTML to remove XSS. At the moment I think it’s broken, so it’s probably not much of a valid test, but the live demo is designed to show what is and is not possible. Of course, the second or third thing I tried was to close the textarea that the text was being displayed in and pop open an alert box. Voila!
But it got me thinking: there are a number of HTML constructs that don’t render the HTML inside them as HTML, but only as plain text. Common ones I’ve seen in the wild are textarea, title, comment, and noscript. To the casual XSS penetration tester these can be easy to gloss over, unless you view the source and see which context the HTML has encapsulated the information in.
It’s actually very easy to break out of these, assuming HTML is allowed through. Just because the page title is created dynamically and the output is encapsulated by title tags does not mean it should be considered safe. Honestly, I don’t think webmasters are even thinking about this issue, or if they are, they are unaware of how it actually works.
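To make that concrete, here’s a minimal sketch of the breakout idea in Python. The page, parameter name, and payload are all hypothetical; the point is just that closing the enclosing tag turns a "plain text" context back into markup.

```python
# Minimal sketch: a hypothetical results page that echoes a search term into
# a <title> and a <textarea> without encoding it first.
def render_results_page(search_term):
    # Naive assumption: "it's only text inside <title> and <textarea>"
    return (
        "<html><head><title>Results for: " + search_term + "</title></head>"
        "<body><textarea>" + search_term + "</textarea></body></html>"
    )

# The attacker simply closes the enclosing tags and opens a new context:
payload = "</textarea></title><script>alert(document.cookie)</script>"
print(render_results_page(payload))
# The injected </textarea> and </title> end the plain-text contexts, so the
# <script> that follows is parsed by the browser as real markup.
```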
This, to me, points to a bigger issue with quality assurance testing in web application security. In a previous job I’d always be super frustrated to find that the application worked great until you entered a quote or a tab or some other random character, and then all of a sudden you’d end up with an application with serious security issues. I think fuzzing is part of the answer. At some point, every application should have every single character in the ASCII character set sent through it to see whether it produces the intended results or not.
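Here’s a rough sketch of what that kind of fuzzing could look like in Python; the target URL and parameter name are made up for illustration, and a real run would also want to cover control characters and non-ASCII bytes.

```python
# Rough sketch: push each printable ASCII character through one parameter of
# a hypothetical search page and flag anything that is reflected unencoded.
import urllib.parse
import urllib.request

TARGET = "http://localhost:8000/search"   # hypothetical test target

def reflects_raw(char):
    marker = "FUZZ" + char + "FUZZ"        # wrap the char so it's easy to spot
    query = urllib.parse.urlencode({"q": marker})
    with urllib.request.urlopen(TARGET + "?" + query) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return marker in body                  # the character survived unencoded

for code in range(32, 127):                # printable ASCII range
    ch = chr(code)
    if reflects_raw(ch):
        print("unencoded reflection of %r (0x%02x)" % (ch, code))
```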
The major problem with that is that a lot of these problems end up being how the browsers themselves render the content, not how the application serves it up. Don’t believe me? Look at the XSS Cheat Sheet and see how many vectors affect both Firefox and Internet Explorer; very few actually overlap. In fact, when I am testing vulnerabilities I have to test each vector no fewer than five times: once in IE, once in Netscape in IE mode, once in Netscape in Gecko mode (because the way it handles URL translation is different from both IE and Firefox), once in Firefox, and once in Opera.
What is lacking is a browser that understands all five DOMs and also how they interact with the user. Some of these vectors don’t fire unless there is some interaction with the user. Some require clicking through alert boxes. Some send the browser into infinite loops. All of this makes testing extremely difficult and time consuming. Automation is a real problem in XSS attack detection. A few web security consultancy firms that I’ve talked to make a blanket statement that if any HTML is allowed to be injected, the application is considered vulnerable. I’ve chewed on that one for the last year or so, and I think it’s half right.
Of course there is the scenario where you have a whitelist of usable tags with no allowed parameters (like the <BR> tag, for instance) that couldn’t really cause harm. If you allow style it’s all over, of course, but the simple standalone <BR> tag is pretty harmless. Does that mean allowing an unknown HTML tag like <XSS> through makes the application vulnerable? Of course not, because that’s not a valid tag. So is it a valid test? I’m still uncertain, but maybe what it does point out is that the particular application needs more testing.
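For what it’s worth, here’s one way such a bare-tag whitelist could be expressed in Python. This is only an illustration of the idea, not NeatHtml’s actual logic; the tag list and helper names are invented.

```python
# Sketch of a whitelist filter: only a few bare, attribute-free tags survive;
# everything else (including unknown tags like <XSS>) gets entity-encoded.
import html
import re

ALLOWED_BARE_TAGS = {"br", "b", "i"}       # tags allowed with no parameters

def filter_html(text):
    out, pos = [], 0
    # Only match tags that carry no attributes at all, e.g. <br> or </b>.
    for match in re.finditer(r"</?([a-zA-Z]+)\s*/?>", text):
        out.append(html.escape(text[pos:match.start()]))      # encode the gaps
        if match.group(1).lower() in ALLOWED_BARE_TAGS:
            out.append(match.group(0))                        # keep bare whitelisted tags
        else:
            out.append(html.escape(match.group(0)))           # encode anything else
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

print(filter_html("one<br>two <XSS>x</XSS> <img src=x onerror=alert(1)>"))
# -> one<br>two &lt;XSS&gt;x&lt;/XSS&gt; &lt;img src=x onerror=alert(1)&gt;
```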
So rather than taking a binary auditing position where HTML injection of any sort makes an application vulnerable, maybe it’s a heuristic flag that makes it suspicious. Ideally, any state change in the application (in the case of SQL injection, for instance) or in how the browser reacts to the returned data (in the case of XSS) is suspect. Anyway, food for thought.
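To make that a bit more concrete, here’s a very rough sketch of scoring findings as heuristic signals rather than a pass/fail verdict. The signal names and weights are invented purely for illustration.

```python
# Sketch: treat each injection finding as a weighted signal instead of an
# immediate "vulnerable" verdict. All names and weights are made up.
SIGNALS = {
    "unknown_tag_reflected": 1,    # e.g. <XSS> comes back intact
    "whitelisted_bare_tag": 1,     # e.g. a lone <BR> renders
    "attribute_survives": 3,       # style or event-handler attributes pass through
    "script_executes": 10,         # alert() actually fires in a browser
    "sql_state_change": 10,        # a quote or comment alters the query result
}

def suspicion_score(observations):
    return sum(SIGNALS.get(name, 0) for name in observations)

score = suspicion_score(["unknown_tag_reflected", "whitelisted_bare_tag"])
print(score, "needs more testing" if score < 5 else "treat as vulnerable")
```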