web application security lab

Preventing XSS Using Data Binding

Stefano Di Paola sent me an interesting email the other day. Honestly, it took me a good hour of playing with it before I finally wrapped my brain around what was going on. Using data binding, he can make JavaScript attach user content to the page while validating that it does not contain active content. That is, styles are okay, but JavaScript is not. Very interesting. Here’s the demo (warning, not for the technically faint of heart).

Stefano asked me to give my report on the good and the bad. The good is, this is pretty damned good at stopping XSS. It probably won’t stop abuse of styles that position themselves over other people’s content, but it would stop a good deal if not all XSS if implemented properly. That’s the good news (and that’s very good news for most people). Here’s the bad news.

The bad news is that it requires JavaScript to work. If you don’t have JS enabled, forget it. That’s bad news for security people, bad news for accessibility, and even worse news for robots that are trying to get contextual understanding of the page. It also forces the user-generated content to the bottom of the page. That’s also bad for SEO because it means the most relevant content is at the very last part of the page. Depending on how the page is built and on the spider, this may fall off the size limits of the robot. Not good. Lastly, it would wreak havoc on lots of those poor web application scanners. They would light up like Christmas trees because there is NO output encoding done. None. Zip. It’s funny to make web application scanners have false positives, but it’s also a pain in the butt if you’re the operator of said scanner. Herein lies one of the advantages of scanners that use built-in rendering engines (forgiving any other issues they may have).

So where would this be useful? Think about all those web2.0 applications out there that have to put dynamic content on the page, don’t have to worry about spiders, robots, and need to make sure that what they output is okay no matter what encoding, or any other craziness that users may put in. I’m not advocating being sloppy, and there may be other issues here that I haven’t found, but thus far, it’s looking like a promising technology. Very nice work by Stefano!

26 Responses to “Preventing XSS Using Data Binding”

  1. kuza55 Says:

    I talked to Stefano about this before, and I really like the idea; since it enforces the separation of code and data, and the whole reason we’re in this mess is because we have no effective way to separate them.

    The only issue I was able to find with that demo (but not quite able to exploit) is that, since you can specify the character set, you can specify one where the plaintext tag is not interpreted as an actual plaintext tag, e.g. a fixed-width 16-bit encoding like UCS-2 or similar, and then have your payload UCS-2 encoded. But this could easily be fixed, either by not allowing non-unicode character sets (and let’s face it, there are no sites where you are allowed to control the charset which is returned anyway, so this is a non-issue), or by using a library to convert the plaintext tag to the appropriate charset.
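
A minimal sketch of the byte-level effect described above, written for Node.js. It assumes the guard is the literal ASCII byte sequence `<plaintext>` and that the page is then parsed as UTF-16LE; the variable names and the sample payload are illustrative, and none of this is taken from the demo's actual code.

```javascript
// Assumption: the server writes the guard tag as plain ASCII bytes.
const marker = '<plaintext>';
const markerBytes = Buffer.from(marker, 'latin1');

// What a parser would see if it read those same bytes as UTF-16LE:
// each pair of ASCII bytes collapses into one unrelated 16-bit character.
const markerSeenAsUtf16 = markerBytes.toString('utf16le');

// An attacker-supplied payload that is itself UTF-16LE encoded
// decodes cleanly under the same charset.
const payload = '<script>alert(1)</script>';
const payloadBytes = Buffer.from(payload, 'utf16le');
const payloadSeenAsUtf16 = payloadBytes.toString('utf16le');
```

Under these assumptions, `markerSeenAsUtf16` no longer contains a `<` at all, while `payloadSeenAsUtf16` is the original markup, which is why pinning the charset (or emitting the guard in the matching encoding) matters.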

  2. Dean Brettle Says:

    This approach is definitely interesting, but the only case where I think it makes sense is for user-provided style (and maybe URL) attributes. Here’s why:

    The developer still needs to identify and pass all the user-provided content through something (one of the addBind* functions) on the server-side to be used with this technique. Those functions do some simple text processing on the content before injecting it at the end of the document. If you are going to do that, why not actually clean the content on the server-side and not deal with the disadvantages that RSnake mentioned? That is relatively easy to do for plain text (note: the approach doesn’t allow for user-provided HTML), comments, and most attributes. Just escape a few characters.
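
As a rough illustration of "just escape a few characters", here is a minimal sketch of escaping for HTML text and quoted attribute values. The function name `escapeHtml` and the choice of five characters are assumptions for the example, not something from Stefano's code.

```javascript
// Hypothetical helper: escape the characters that can break out of
// HTML text content or a quoted attribute value.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}
```

This only covers plain text and quoted attributes; it says nothing about style or URL values, which is the gap the following paragraphs discuss.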

    The exceptions are style and URL attributes. Whether the value of those attributes is dangerous can depend on exactly how the browser parses them. Stefano’s approach elegantly bypasses this issue by letting the browser do the parsing and then filtering the resulting objects on the client-side.

    For URLs, a simple whitelist of protocols is sufficient for most apps, and that can be implemented on the server-side pretty easily as well (i.e. make sure the URL starts with a valid prefix). Styles are harder to whitelist on the server-side because you don’t normally want to run a CSS parser there. That’s why Stefano’s approach is a *big* win for style attributes.
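
A minimal sketch of the prefix check mentioned above. The list of allowed prefixes and the name `isAllowedUrl` are assumptions chosen for illustration; a real application would pick its own list.

```javascript
// Assumption: only these schemes are acceptable for user-supplied links.
const ALLOWED_PREFIXES = ['http://', 'https://', 'mailto:'];

// Hypothetical helper: accept a URL only if it starts with an allowed prefix.
// Trimming and lowercasing first avoids trivial bypasses such as " JavaScript:".
function isAllowedUrl(url) {
  const candidate = String(url).trim().toLowerCase();
  return ALLOWED_PREFIXES.some((prefix) => candidate.startsWith(prefix));
}
```

Note that this sketch rejects relative URLs as well, since they carry no prefix; whether that is acceptable depends on the app.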

    Moreover, if you only use Stefano’s approach for style attributes, you don’t suffer any of the disadvantages in terms of accessibility and SEO.

  3. Pablo Says:

    If this is an anti-XSS defense how exactly is it a problem if it only works if JS is enabled?

  4. Stefano Di Paola Says:

    @RSnake :
    The javascript approach is just a proof of concept of what we could do if a separation of trusted/untrusted data is applied when dealing with user data. My aim was to publish a paper in September with a deep analysis of what could and could not be prevented with a logical data separation approach.

    As for all the drawbacks you mentioned, I agree completely! But if we could find a way to build it as a standard, I think it could be understood by every automatic scanner and search engine bot.

    @Kuza55: Thank you Kuza55, the charset parameter was just a way to demonstrate that problems like the one Google had with the UTF-7 XSS error page wouldn’t have taken place if data binding prepared statements were applied.

    @Dean: About mixed Html user data, you could apply some preprocessing stage too. Have a look at the poc implemented in test.html page (only works with Firefox), you can give a white list array of html or also you can simply stop potentially dangerous attributes and markups.
    About the Style stuff, it’s the same. If the developer wants the user to modify only the colour of a specific markup it could be easily implemented in the preprocessing stage.

    Anyway, thank you all. I’m planning to release a paper about this approach in September (I’m on vacation at the moment :)).
    Everyone feel free to test it and modify the scripts, because if this technique were adopted as a new standard, browser vendors could implement it and no javascript would be required!

  5. RSnake Says:

    @Kuza55 - you’re exactly right and I feel stupid for not having seen that myself. It’s an easy fix, but yes, allowing multibyte character sets could easily allow other issues. Herein lies the land of variable width encoding issues too. I still think you’re going to have to limit things to a select few charsets, or program around all charsets. Doable, but painful.

    @Pablo - because of the other non XSS related disadvantages I mentioned (SEO, accessibility, et al). Life doesn’t revolve around XSS. ;)

    @Dean - that’s a good point. If limited this could actually work better in terms of SEO since style information is generally ignored anyway. If you limit it to styles, it could actually help SEO rather than hurt it. Scanners are still screwed, but the rest is not a problem.

    @Stefano - agreed, I think there are a number of good ideas here. I think part of the difficulty here is how do you standardize the content so that it can live anywhere on the page, and still be “safe” from rendering? Plaintext doesn’t work, as we’ve seen, textareas can be jumped out of, etc… This is feeling like another place for content restrictions, where you can have embeddable safe content.

  6. ascii Says:

    @Stefano: great idea and great demo! i love the separation from visualization and contents and really like the javascript POC implementation (clean and easy to understand). basically it’s data binding for html done on the client side so inputs/contents are congruent with the browser environment (if not also the malicious xss data won’t work).

    i’m waiting to see your initial browser implementation proposal ;)

    @Kuza55: compliments for the bypass, you were the only one who spotted it in the pre-public stage :)

    @RSnake: i think this could be useful (if standardized) for SEO/spiders/accessibility since most web server support byte ranges (could be a problem for really dynamic data anyway)

    a solution for the final implementation could be an alternate data stream that could be requested alone (wouldn’t it be nice? like for xml/xslt)

    i really hope people understand this and work together to make this idea reality

  7. kuza55 Says:

    @Stefano:
    I realise it’s just a demo, but people *do* still need to realise that it won’t just work everywhere, and that in some circumstances tweaking would need to be applied.

    @RSnake:
    I really don’t see why you would want to let the user control the character set at all; (almost?) everything supports UTF-8, and (almost?) everything can be represented using it.

    And wouldn’t the SEO/SEM issues only apply to things like forums where most of the content is user generated?

    And like Stefano said, if we can get some kind of standard accepted by the browsers, then the bots will follow. And it could potentially make things easier for bots, since they won’t have to deal with presentation when looking for data.

    @ascii:
    Thanks; /me feels special, ;)

  8. Dean Brettle Says:

    I looked at Stefano’s example for handling untrusted HTML and I had an idea. We can put the untrusted HTML in a comment that we generate with document.writeln() from javascript. The result is that it is only in the comment if javascript is enabled. That avoids the SEO/accessibility issue. You can see the POC at:

    http://www.brettle.com/UntrustedHtmlWithJavascript.html

    Thoughts?

  9. Dean Brettle Says:

    For those that are interested, I just updated that code to fix a couple minor bugs and handle javascript in styles and urls.

  10. Stefano Di Paola Says:

    @kuza55: You are completely right, a simple (flawed ;) poc could confuse people and this is not what we want. I managed to exploit the plaintext tag with ISO-10646 charset, and prepend a %00 for every char….So..darn :)..i liked your smart idea when you told me and i like it now…it seems you are possessed by the hacking ghost!

    @Dean: Nice idea, the comment using document.write and the implementation style :)..it has still some drawbacks..

    I’m trying to think of some other solution as simple as the plaintext tag, but I still cannot see a workaround that leaves user data in the clear. The only option would be using a different channel in the same response (chunked data, multipart MIME encoding, headers, or base64 encoding).
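
One of the options listed above, base64, can be sketched in a few lines of Node.js. The helper names are hypothetical and this is only an illustration of why the encoded form is inert in markup: the base64 alphabet contains no angle brackets or quotes.

```javascript
// Hypothetical helpers: carry untrusted text as base64 so that no byte of it
// can be interpreted as markup until it is explicitly decoded.
function encodeUntrusted(text) {
  return Buffer.from(String(text), 'utf8').toString('base64');
}

function decodeUntrusted(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

const sample = '<script>alert("x")</script>';
const encodedSample = encodeUntrusted(sample);
const decodedSample = decodeUntrusted(encodedSample);
```

The decoded value still has to be attached to the page as data rather than markup, so this moves the problem to the decoding side rather than removing it.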

  11. Ronald Says:

    Interesting stuff, I had a similar idea but it only focused on the innerHTML tag, since it’s not allowed to initiate a new script tag inside innerHTML. Along with some replacing I came up with this script: http://0x000000.com/index.php?i=96 which allows html/css but strips all dangerous Javascript. Well, almost all. It was just a project of leisure. But, it’s portable.

    I really like this idea of data-binding Stefano, it is almost the holy grail of stopping XSS, good work my friends :)

  12. Dean Brettle Says:

    I’ve updated the test page again to close the hole where the untrusted content can display out of the box when javascript is disabled. See the test page for details:

    http://www.brettle.com/UntrustedHtmlWithJavascript.html

  13. John Ther Says:

    I know, all of you know, and you, RSnake, stated it yourself: it’s very bad on lots of points. The most important is probably that you have to enable Javascript.

    But the other funny thing: if you can write code with htmlentities and other stuff, just go and code it! Why would you want to introduce such a problematic solution instead of simply fixing the problem?

    Use htmlentities and write 3 more simple functions and there you go: you are totally secure without spending ages and killing your website on SEO, usability, and accessibility, and without making it even less secure by forcing visitors to use Javascript.

    And code has even got blacklisting itself while trying to deal with protocols!

    Don’t try to fix XSS with tricks just fix it, because it’s bloody easy if you are not trying to allow HTML at the same time.

  14. Dean Brettle Says:

    Yet another idea:

    Let’s say you want to allow some untrusted script to run but you want to limit the API it has access to. You could wrap the untrusted script like this:

    function() {
    var document = undefined; // or = new ProxyDocument(window);
    var window = undefined; // or = new ProxyWindow(window);
    // Repeat for all globals that you want to hide/proxy.

    UNTRUSTED SCRIPT GOES HERE

    } (); // calls the anonymous function we just defined.

    You’d also need to preprocess the untrusted script as follows:

    1. Make sure all the unquoted, noncommented curly braces match up so the untrusted script can’t run outside of the restricted function. The simplest way might be to use regexes (on a copy of the untrusted script) to remove all comments, remove all quoted text, and then check that the curly braces match. If they don’t, it’s malicious, so delete it.
    2. Transform all uses of the delete operator to a call to an UntrustedDelete() function. UntrustedDelete would prohibit deleting of the vars we defined at the top of the function.
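
The first preprocessing step can be sketched as follows. This is a minimal illustration, not a vetted filter: the function names are hypothetical, it removes strings and comments in a single regex pass rather than two separate ones, and it deliberately ignores regex literals and template strings, which a real implementation would have to handle.

```javascript
// Hypothetical helper: drop quoted strings and comments so that braces
// inside them are not counted.
function stripStringsAndComments(source) {
  const pattern =
    /"(?:\\[\s\S]|[^"\\\n])*"|'(?:\\[\s\S]|[^'\\\n])*'|\/\*[\s\S]*?\*\/|\/\/[^\n]*/g;
  return source.replace(pattern, '');
}

// Hypothetical helper: true only if every "}" closes an earlier "{"
// and nothing is left open at the end.
function bracesAreBalanced(source) {
  let depth = 0;
  for (const ch of stripStringsAndComments(source)) {
    if (ch === '{') depth += 1;
    if (ch === '}') depth -= 1;
    if (depth < 0) return false; // a stray "}" would close the wrapper early
  }
  return depth === 0;
}
```

The early return on negative depth is the important part: a script that begins with a closing brace is exactly the one trying to escape the wrapping function.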

  15. Dean Brettle Says:

    Just realized that the syntax from my anonymous function call isn’t quite right. It should be:

    (function() {
    var document = undefined; // or = new ProxyDocument(window);
    var window = undefined; // or = new ProxyWindow(window);
    // Repeat for all globals that you want to hide/proxy.

    UNTRUSTED SCRIPT GOES HERE

    }) (); // calls the anonymous function we just defined.
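
A minimal sketch of generating that wrapper programmatically. The function name `wrapUntrusted` and its parameters are hypothetical, and the example only shows the name-shadowing effect; it is not a sandbox, since sloppy-mode code can still reach the global object by other routes (for example through `this` inside a plain function call).

```javascript
// Hypothetical helper: build the wrapper text around an untrusted script,
// shadowing each listed global name with a local variable set to undefined.
function wrapUntrusted(untrustedSource, hiddenNames) {
  const identifier = /^[A-Za-z_$][A-Za-z0-9_$]*$/;
  for (const name of hiddenNames) {
    if (!identifier.test(name)) {
      throw new Error('not a valid identifier: ' + name);
    }
  }
  const shadows = hiddenNames
    .map((name) => 'var ' + name + ' = undefined;')
    .join('\n');
  return '(function() {\n' + shadows + '\n' + untrustedSource + '\n})();';
}

// Usage: a global named secretToken is visible unless it is shadowed.
globalThis.secretToken = 'abc';
const unshadowed = (0, eval)(wrapUntrusted('return typeof secretToken;', []));
const shadowed = (0, eval)(wrapUntrusted('return typeof secretToken;', ['secretToken']));
```

Here `unshadowed` is the type of the real global, while `shadowed` reports the local placeholder instead.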

  16. DoctorDan Says:

    This is some wonderful stuff. I think I found an issue with the style parsing that may be a pain to fix. If you use an invalid property it won’t show up in element.style[i] (not even element.style['cssText']), yet it will still execute anything in its expression(). For example, this will make its way through the style parser and execute in IE:
    gobbledegook: expression((window.r==1)?”:eval(’r=1;alert(”XSS”);’))

    As Dean said, the idea of having the browser parse everything before checking is a beautiful concept! Excellent idea, Stefano!!

    The ?: operation is simply to stop IE from crashing/looping the code.

    @Dean: Good thinking! It looks pretty nice. I’ll play around with it a bit.

    -Dan

  17. kuza55 Says:

    @John Ther

    The idea here is to solve the problem of XSS, not just filter things. The whole reason XSS is even an issue is because there is no separation between presentation/code and data; it’s all chucked in the HTTP body.

    This is an attempt to create some kind of separation between the presentation/code and the data, or at least the data supplied by users.

    This is about doing it better, not just doing something which seems to work, case in point: variable width encoding is still possible if htmlentities is called without the appropriate character set (and i think there was a post somewhere saying that variable width attacks were still possible on some specific charsets even then).

  18. Dean Brettle Says:

    My test page has been updated to work with (at least) IE7 and Firefox. It also plugs the style hole that DoctorDan pointed out. Thanks Dan!

    Test page is still at:

    http://www.brettle.com/UntrustedHtmlWithJavascript.html

  19. Nick Says:

    Definitely a good bookmark for webmasters. Thanks

  20. sirdarckcat Says:

    Click on test me xD

    http://www.wisec.it/ph/test.php?c=&search=&ref=javascript://www.google.com/%25250d%25250aalert(/I%20was%20here/);&style=&comm=

    Greetz!!

  21. Stefano Di Paola Says:

    @sirdarckcat: incredible IE! :D I can’t believe that there’s a difference between HtmlAnchorElement and internal parsing.
    Quite easy to fix, thanks sirdarckcat. Anyway, that’s why my real aim is to propose the DataBinding approach as a standard. Client side javascript is definitely _not_ the solution for this kind of technique.
    Firefox behaves well and identifies it as a javascript snippet.

    @Kuza: As a simple workaround on your Variable width encoding exploit i just added utf-8 encoding and then UTF-16, UTF-16LE and UTF-16BE plaintext markup. This (should:) stop your exploit.

    @DoctorDan: I cannot test it on IE7 at the moment, but my tests on IE6 behave as expected: it just adds the unknown properties to the list of style attributes, catching them and then removing them. Does it work on IE7?

    @John Ther: You have to think of the problem as you would SQL Injection. What really stops it? Prepared Statements. They are just a technique to fully separate Logical data from Scalar data (or trusted from untrusted). And remember all the issues generated by charset changes, as Kuza55 already told you.

    To everyone, i just posted a blog entry: http://www.wisec.it/sectou.php?id=46c5843ea4900
    It clarifies some technical details which, anyway, will be fully described in my paper, to be released in early September (ASAP).

  22. Stefano Di Paola Says:

    @DoctorDan: errata corrige:)… the style stuff works like a charm…
    there was some problem with your cut and paste:
    gobbledegook: expression((window.r==1)? ' ':eval(" r=1; alert('XSS'); "))

    and now i understand where the problem lies.. IE just executes your poc _before_ it is added to the real markup! Really, js in IE is quite incongruent …

    Anyway, as previously said this stuff should be definitely for internal browser implementation and not client side js.

  23. Dean Brettle Says:

    @sirdarckcat - what browser did your exploit work in? It’s not working for me from IE7 or Firefox. I’m not working because of the browser or because Stefano changed something in the code.

  24. Dean Brettle Says:

    Sorry, I meant to say:

    I’m not sure if it’s not working because of the browser or because Stefano changed something in the code.

  25. Fraggeleh Says:

    This is an old post, but I’m wondering what happened with this? From what I’ve observed with very BASIC testing, this is quite a cool tool. What is the problem with implementation (i haven’t read comments)?

  26. MustLive Says:

    Guys. As I wrote at my site http://websecurity.com.ua/2244/ about vulnerabilities at wisec.it, there are Cross-Site Scripting holes and they are in this PoC.

    I found them on 06.11.2007, right when I first read this RSnake post. The holes work in old versions of Mozilla and Firefox (versions before Firefox 2.0). So this method of protecting against XSS is not universal.