web application security lab

Spam-me-not Obfuscation

I saw an interesting link today that reminded me a lot of the XSS Calculator. I wonder why. No, really, I actually don’t wonder why - it’s practically the same thing. Spam-me-not is designed to let people keep using mailto: links as they do today while obfuscating the URL with a mix of HTML and hex character encodings. Cool in concept, but pretty trivial to beat. It costs modern-day robots next to nothing to do the sort of backwards conversions required to get the real text.
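To make the "backwards conversion" concrete, here is a minimal Python sketch. It assumes the obfuscated link is built from decimal and hex HTML character references, possibly combined with URL percent-escapes; the tool's exact output format is not shown here, so `obfuscate` is a hypothetical stand-in for it, and `extract_mailto` is an illustrative name rather than anything a real harvester is known to use.

```python
import html
from urllib.parse import unquote


def obfuscate(text: str) -> str:
    """Hypothetical stand-in for the tool: alternate decimal and hex references."""
    return "".join(
        f"&#{ord(ch)};" if i % 2 == 0 else f"&#x{ord(ch):x};"
        for i, ch in enumerate(text)
    )


def extract_mailto(href: str):
    """Return the address from an (optionally obfuscated) mailto: link, else None."""
    # Undo HTML character references first, then any percent-escapes.
    decoded = unquote(html.unescape(href)).strip()
    if decoded.lower().startswith("mailto:"):
        return decoded[len("mailto:"):]
    return None


hidden = obfuscate("mailto:user@example.com")
print(hidden)                  # a run of &#...; references, no readable address
print(extract_mailto(hidden))  # the original address
```

Both decoding steps are standard-library calls, which is roughly why the obfuscation adds so little cost for a bot that bothers to handle it.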

I think the value in this is fairly limited. Spam is sort of a way of life these days. Email is really taking a back seat to other forms of internet communication like instant messaging, and overseas you’re beginning to see a lot of voice over IP traffic. It just makes more sense. Email has been around for 15 or 20 years now, and the spammers are always ahead of the anti-spammers. That’s not entirely true though. I’ve got a few accounts that get thousands of spam emails a day, and I see none of it. It’s pretty remarkable actually.

This form of obfuscation that spam-me-not provides is probably effective against the lowest common denominator of spam bots. And as more people use this form of obfuscation, developers will spend the 10 seconds necessary to write the code to decode it (they don’t even have to write it from scratch since it’s already out there). All of this reminds me of the DMCA problem. If you release software that has to be decoded at some point, there is nothing you can do to stop that decoded information from being logged elsewhere. It’s the nature of software. Oh well, maybe some people will find some use for spam-me-not - while the rest use email forms.

6 Responses to “Spam-me-not Obfuscation”

  1. Roger Blakes Says:

    I don’t know if you’ve googled this or not, but there’s a very interesting article on this same topic on NeoSmart Technologies:

    I have to agree with the article, it’s not the difficulty of reverse-engineering the obfuscation so much as it is the CPU cost of doing so, especially compared to the mass availability of non-obfuscated mailto addresses littered across the internet (think NNTP and Mailing List archives).

  2. RSnake Says:

    Interesting concept and in principle it’s probably true but in practice it’s more efficient than you are thinking… really all you need to know is if the first seven chars are “mailto:”. You can use a Boyer-Moore algorithm to greatly increase the speed required to do a check for that string by decoding the first char, seeing if it matches an “m” (obviously removing extraneous or non-relevant chars like ). If the first char is not “M” or “m” then leave it, if so, then go to the next char, and so on. It’s extremely efficient (far more so than regex, actually). There are other algorithms that do similar things that could be used as well. Very little processor time needed when compared to regex.

  3. Roger Blakes Says:

    That plus the fact that
    a) CPUs are really powerful now
    b) Most email harvesters use distributed computing
    c) Little to no actual harvesting is done on their own machines and as such doesn’t cost them anything.

    From the neosmart article:

    There really is no good way to prevent an email address from being listed in spam directories and sold in bulk along with thousands of others to spammers around the web

    That’s kinda our position, right? I mean, we don’t have any technology they don’t have, and whatever we get, they’ll have sooner or later… scary, huh?

  4. RSnake Says:

    That’s pretty true… There are some interesting things like one-time-use email addresses or double opt-in email systems (where you send an email, it has to return to the sender, and they have to confirm through a web-based interface that they received it). That does cut down on spam, but it doesn’t stop it from being listed.

  5. MERLiiN Says:

    >I have to agree with the article, it’s not the difficulty of reverse-engineering the obfuscation so much as it is the CPU cost of doing so, especially compared to the mass availability of non-obfuscated mailto addresses littered across the internet (think NNTP and Mailing List archives).

    I disagree, the encrypted emails are more valuable. Since you know the owners have spent the time it takes to encrypt these, they will almost certainly be read by humans. I know spammers aren’t too concerned about bounces since they let spoofing handle that, but I am sure they care about the percentage of emails that are viewed/read.


  6. Alistair Macneil Says:

    I am looking at the Hivelogic Enkoder above, have just done a test to implement it and am now having a google to see how crackable it is.

    One thing that strikes me about the above comment:
    “You can use a Boyer-Moore algorithm to greatly increase the speed required to do a check for that string by decoding the first char” would be to combine that method with a string reverse technique…
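
The early-exit check described in comment 2 can be sketched roughly as follows. This is a plain incremental prefix comparison, not a full Boyer-Moore search: it decodes one token at a time and stops at the first character that does not match “mailto:”. The token grammar (decimal references, hex references, percent-escapes), the choice to skip leading whitespace, and the names `decode_tokens` and `looks_like_mailto` are all assumptions made for illustration.

```python
import re

# One token: hex reference, decimal reference, percent-escape, or a literal char.
TOKEN = re.compile(
    r"&#[xX]([0-9a-fA-F]+);|&#([0-9]+);|%([0-9a-fA-F]{2})|(.)", re.DOTALL
)


def _to_char(digits: str, base: int) -> str:
    """Convert a numeric reference to a character, tolerating bad code points."""
    try:
        return chr(int(digits, base))
    except (ValueError, OverflowError):
        return "\ufffd"


def decode_tokens(href: str):
    """Lazily yield decoded characters, so callers can stop early."""
    for m in TOKEN.finditer(href):
        hex_ref, dec_ref, pct, literal = m.groups()
        if hex_ref is not None:
            yield _to_char(hex_ref, 16)
        elif dec_ref is not None:
            yield _to_char(dec_ref, 10)
        elif pct is not None:
            yield _to_char(pct, 16)
        else:
            yield literal


def looks_like_mailto(href: str, target: str = "mailto:") -> bool:
    """True if the decoded href starts with target (case-insensitive)."""
    matched = 0
    for ch in decode_tokens(href):
        if matched == 0 and ch.isspace():
            continue  # assumption: leading whitespace is irrelevant
        if ch.lower() != target[matched]:
            return False  # first mismatch: nothing further is decoded
        matched += 1
        if matched == len(target):
            return True
    return False


print(looks_like_mailto("&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;user@example.com"))
print(looks_like_mailto("http://example.com"))
```

Most links fail on the very first character, so only a handful of tokens are ever decoded per link. A string-reversal trick like the one suggested in comment 6 would defeat this particular check, since the decoded text would no longer start with “mailto:”.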