Fixing XSS Can Cause Command Injection
Kishor sent me a link to a recent post he wrote as a follow-up to my previous post about how forgetting global replace can cause XSS. What he talks about is how doing something as simple as turning HTML into its equivalent entities can cause command injection. This is yet another reason why modifying content is a dangerous proposition.
Kishor notes that if you change < into &lt;, an injected string like xyx<ls -l will turn into xyx&lt;ls -l, which still runs: to a shell, & and ; are command separators, so ls -l ends up as its own command. Obviously I’m not a fan of taking any user input and piping it through a system call, but if you have to do it, make sure to dump the script through a while loop to ensure that it’s not doing anything you don’t want it to. Something that’s okay for web content isn’t necessarily okay for SQL or commands or any other use. Just make sure you know what you’re doing with the text and don’t just blindly use it.
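Here is a rough sketch of that failure in Python, just to make it concrete. It assumes Python's standard html.escape stands in for whatever entity encoder the site uses, and it uses shlex only to approximate how a POSIX shell would split the line; the helper name shell_tokens is made up for illustration, and nothing here actually executes a command.

```python
import html
import shlex


def shell_tokens(text):
    """Approximate how a POSIX shell splits a line, keeping operators
    such as & and ; as separate tokens. (Hypothetical helper.)"""
    lexer = shlex.shlex(text, punctuation_chars=True)
    return list(lexer)


user_input = "xyx<ls -l"

# Correct escaping for an HTML page...
escaped_for_html = html.escape(user_input)

# ...but in a shell line the entity splits into three commands:
# "xyx" (backgrounded by &), "lt" (ended by ;), and "ls -l".
tokens = shell_tokens(escaped_for_html)

# Escaping for the context actually in use: shlex.quote wraps the whole
# value in single quotes so the shell treats it as one argument.
quoted_for_shell = shlex.quote(user_input)

print(escaped_for_html)
print(tokens)
print(quoted_for_shell)
```

The point of the last step is only that the escaping has to match the place the text is going; quoting for the shell does nothing useful for HTML, and vice versa.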



March 18th, 2007 at 6:05 pm
This seems to be more a case of escaping for the wrong context than anything else. Concluding that the idea of escaping is therefore flawed is a poor conclusion, in my opinion.
Escaping isn’t about modifying content; it’s about preserving content in a different context.
March 18th, 2007 at 6:10 pm
I think we agree actually. I’m not saying it’s bad to escape, I’m saying it’s bad to escape only in one context while using it in another.
March 18th, 2007 at 6:14 pm
“I’m not saying it’s bad to escape, I’m saying it’s bad to escape only in one context while using it in another.”
You’re right; we do agree.
March 19th, 2007 at 9:29 am
Though this raises the architectural question of whose job it is to encode/decode.
You’d like to do minimal input sanity checking, but if you actually want a person to edit HTML via a web interface you’re going to do a lot of crazy converting for no apparent good reason.
At the same time, relying on all downstream consumers of the data to make sure it is safe for their use can expand the number and type of filters you have to write, but you get filtering/escaping that is much more semantically aware of why/how to escape.
I haven’t seen any “standard” architectures published for this sort of thing…. have you?
March 19th, 2007 at 10:17 am
@Andy:
Generally, in my experience, it’s best to do whitelisting close to the input, and encoding/blacklisting close to the output. This makes a fair degree of sense architecturally, since you’re more likely to know what the valid characters are on the input side (e.g. an email address is “\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b”), and you’re more likely to know what the bad characters are on the output side (e.g. single quotes or non-digits for SQL). This is also advantageous in that you’ll often have two different people independently validating the input, which gives you greater depth of defense. Even if it’s not two people, it’s two contexts, which is still better than nothing.
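A minimal sketch of that split, assuming Python with an in-memory SQLite database. The function names, the comments table, and the sample values are all made up for illustration. Note that the SQL side uses a parameterized query rather than a character blacklist, which is a substitution on my part, and the HTML side uses html.escape.

```python
import html
import re
import sqlite3

# Whitelist close to the input: the example pattern from above,
# matched case-insensitively against the whole value.
EMAIL_RE = re.compile(
    r"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE
)


def accept_email(raw):
    """Input boundary: reject anything that is not on the whitelist."""
    if EMAIL_RE.fullmatch(raw) is None:
        raise ValueError("not an accepted email address")
    return raw


def store_comment(conn, email, body):
    """Output boundary for SQL: placeholders keep data out of the query text."""
    conn.execute(
        "INSERT INTO comments (email, body) VALUES (?, ?)", (email, body)
    )


def render_comment(email, body):
    """Output boundary for HTML: encode at the point of rendering."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(email), html.escape(body))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (email TEXT, body TEXT)")

email = accept_email("andy@example.com")
body = "it's <b>fine</b>"

store_comment(conn, email, body)
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comment(email, stored)
print(page)
```

The stored value is the original text, untouched; each consumer encodes it for its own context on the way out.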
March 19th, 2007 at 10:34 am
P.S. I know that that email regexp is wrong, it doesn’t work on foo+bar@cmu.edu or other such cyrus/IMAP/whatever addresses. And the blacklist for SQL doesn’t apply to all databases or even all queries. They’re just examples.
March 19th, 2007 at 1:51 pm
(not to mention that “xyz
March 19th, 2007 at 1:52 pm
….. err
(not to mention that “xzy*left angle bracket*ls -l” still runs “ls -l” when inserted into a shell call)
March 20th, 2007 at 6:30 pm
I’ve been waiting for a while for someone to release a vulnerability that involves XSS and metacharacters, but so far I don’t know of any public reports. See my Bugtraq post “Re: ISA Server 2004 Log Manipulation” from May 2006 for context. I use this as an example of why blindly encoding everything is the wrong approach. Agree with RSnake - you definitely have to keep close track of your context, which argues for processing input/output as close to the boundaries as possible.