I’ve put a lot of thought into why anti-virus is like an IDS (intrusion detection system), which in turn is like mod_security and other web application firewalls. As with any IPS (intrusion prevention system), you have to know what you are looking for. There are two widely accepted types of detection. The first is signature based, which can mean something as simple as looking for a SCRIPT tag in a URL parameter. The second is anomaly based: for example, you might notice the web server returning the string “document.cookie” when you had never seen that application return anything like it before.
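To make the two approaches concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular product works: the single SCRIPT-tag regex stands in for a real rule set, and the names `signature_match` and `AnomalyDetector` are hypothetical. The anomaly side simply learns which "interesting" strings an application normally returns and flags any response containing one it has never seen.

```python
import re

# Hypothetical one-rule signature set; real products ship far larger rule sets.
SIGNATURES = [re.compile(r"<\s*script", re.IGNORECASE)]


def signature_match(param_value: str) -> bool:
    """Signature based: flag input that matches a known-bad pattern."""
    return any(sig.search(param_value) for sig in SIGNATURES)


class AnomalyDetector:
    """Anomaly based (hypothetical): learn which watched strings an
    application normally returns, then flag responses containing a
    watched string that never appeared during training."""

    def __init__(self, watched):
        self.watched = list(watched)
        self.baseline = set()

    def train(self, response_body: str) -> None:
        # Record every watched string this known-good response contains.
        for token in self.watched:
            if token in response_body:
                self.baseline.add(token)

    def is_anomalous(self, response_body: str) -> bool:
        return any(token in response_body and token not in self.baseline
                   for token in self.watched)


print(signature_match("q=<SCRIPT>alert(1)</SCRIPT>"))  # True
detector = AnomalyDetector(["document.cookie"])
detector.train("<html>Welcome back</html>")
print(detector.is_anomalous("<script>new Image().src='x?'+document.cookie</script>"))  # True
```

Note that both checks only know what they were told or trained on, which is exactly the weakness discussed next.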
Both of these types of detection are pretty flawed, because both can be easily circumvented. But why? There is a well-known result in computer science called the halting problem. Alan Turing proved in 1936 that no general procedure can take an arbitrary program (for instance a Turing machine, a very simple model of a computer) together with its input and always decide correctly whether it will eventually halt or run forever. Without such a procedure you are left working from experience: when you find a combination of inputs that makes a program misbehave, you write it down so you know to avoid it in the future, then you find the next one and write that down, and so on. The end result is that you can tell someone what not to do based on past experience, but you cannot, in general, predict the next set of inputs that will cause the same problem.
This is the same problem anti-virus vendors and web application firewall vendors face. They know what has caused problems in the past, and they probably know a few variants of the same exploit code or vector, but they cannot know what they have not seen. Deciding in general whether an arbitrary input will behave maliciously runs into the same kind of limit as deciding whether an arbitrary program halts, so the obfuscated vectors the community comes up with will almost always circumvent the existing detection methodologies. Web application firewalls, anti-virus and IPSs in general all suffer from this deficiency.
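A small Python sketch can illustrate this "write it down and wait for the next one" cycle. It assumes a hypothetical filter that starts out knowing only the SCRIPT-tag signature, plus a short illustrative list of payloads; it is not a real WAF rule set. Vectors that never mention a SCRIPT tag slip past, and adding a signature for the vector just observed still does nothing about the next one.

```python
import re

# Hypothetical rule set that only knows the vector it has already seen.
signatures = [re.compile(r"<\s*script", re.IGNORECASE)]


def detected(payload: str) -> bool:
    return any(sig.search(payload) for sig in signatures)


# Illustrative payloads; the last two never mention a SCRIPT tag.
payloads = [
    "<SCRIPT>alert(1)</SCRIPT>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]


def missed():
    """Return every payload the current signatures fail to flag."""
    return [p for p in payloads if not detected(p)]


print(missed())  # the img and svg vectors slip through

# "Write down" the vector we just saw by adding a signature for it...
signatures.append(re.compile(r"onerror\s*=", re.IGNORECASE))
print(missed())  # ...which catches the img vector but not the svg one
```

Each new signature only covers the variant that prompted it, so the list of misses shrinks by one case at a time rather than closing the class of attack.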