web application security lab

Tracking Back The Trackback Spam

I got 290ish trackback spams last night, and that’s after quite a bit of anti-spam filtering. For some reason spammers think I’ll approve their spam through sheer volume. Well, they couldn’t be more wrong. In fact, I’ve been thinking of interesting ways to detect them. For those of you who don’t run blogs, trackback spam is when robots pretend to be other blogs linking to my site. My site receives a POST request from the robot, which tells it a few things: the link to the site, a title, and some sample text. Trackback spam is difficult to stop because trackbacks don’t act like normal browser traffic, even when they’re legitimate. So today I came up with a few semi-clever tactics to end the madness.
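A trackback ping like the one described above arrives as an HTTP POST with a form-encoded body. The sketch below assumes the field names from the TrackBack specification (`url`, `title`, `excerpt`, `blog_name`, where only `url` is required); `parse_trackback_ping` is a hypothetical helper name, not code from this blog.

```python
from urllib.parse import parse_qs


def parse_trackback_ping(body: str) -> dict:
    """Parse a form-encoded TrackBack ping body into its fields.

    Assumes the TrackBack field names: url (required), title,
    excerpt and blog_name (optional).
    """
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; keep the first value of each known field.
    ping = {
        key: fields[key][0]
        for key in ("url", "title", "excerpt", "blog_name")
        if key in fields
    }
    if not ping.get("url"):
        raise ValueError("trackback ping is missing the required url field")
    return ping


if __name__ == "__main__":
    body = "url=http%3A%2F%2Fexample.com%2Fpost&title=Hello&excerpt=Some+text"
    print(parse_trackback_ping(body))
```

Everything in that body is supplied by the sender, which is why the checks that follow lean on things the sender cannot choose freely, like the connecting IP address.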

The first signal is the source IP address. This is one thing the robot cannot fake, because it has to complete a TCP connection to my server to deliver the trackback. For a legitimate trackback, the sender normally runs on the webserver that hosts the linking blog. If it doesn’t, that’s a huge signal that it’s a robot. So what if I connect back to the same IP address on port 80 and look for a webserver? If I don’t find one, I can be 99% sure it’s fake traffic. The only way that wouldn’t be true is if the site is temporarily down or the server runs on another port. Either way, do I really care?
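A minimal sketch of the port-80 check, assuming Python's standard `socket` module; `has_webserver` is a hypothetical helper name. It only tests whether something accepts a TCP connection on the port, not whether that something actually speaks HTTP.

```python
import socket


def has_webserver(ip: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on ip:port."""
    try:
        # create_connection performs the full TCP handshake; closing it right away is enough.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out: no listener found.
        return False
```

In a trackback handler this would be called with the address the POST came from. The short timeout matters: a spamming host that silently drops packets would otherwise stall every check.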

Next is the IP address of the link. The hostname in the trackback’s link should resolve to the same IP address that sent the trackback. Why would a site send a trackback for some other website? That makes no sense, so again it’s 99% likely to be spam. The only way the spammers could get around this is to temporarily spoof the DNS entry my server sees, and even then they’d have to be running a webserver on that IP address. That way you can quickly exhaust the number of sites they can spam from, because they must run a webserver on each one to get it to work (which they do in less than 1% of the cases I’ve looked at thus far). And even then they must also link to that same server. That greatly increases the work a spammer has to do just to get a link into my moderation queue, and I can simply ban that IP address going forward, since I know it really is the same IP as the spam site I don’t care to see anyway.
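One way to sketch the link-versus-sender check: resolve the hostname in the trackback's link and accept the ping only if the sender's IP is among the resolved addresses. Python's `socket.gethostbyname_ex` returns every IPv4 address for a name, much like PHP's `gethostbynamel()`, so round-robin and failover addresses are covered. `link_matches_sender` is a hypothetical name, and this sketch is IPv4-only (IPv6 would need `socket.getaddrinfo`).

```python
import socket
from urllib.parse import urlsplit


def link_matches_sender(link: str, sender_ip: str) -> bool:
    """True if the link's hostname resolves to an address that includes sender_ip."""
    host = urlsplit(link).hostname
    if not host:
        # No hostname at all (malformed or relative URL): treat as spam.
        return False
    try:
        # Returns (canonical_name, aliases, list_of_ipv4_addresses).
        _, _, addresses = socket.gethostbyname_ex(host)
    except OSError:
        # Name does not resolve: the "site" does not really exist.
        return False
    return sender_ip in addresses
```

Checking the whole address list rather than a single lookup result avoids rejecting a legitimate blog whose DNS hands out several addresses.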

It’ll be fun writing the software. They spammed the wrong guy 290 times!

19 Responses to “Tracking Back The Trackback Spam”

  1. thrill Says:

    And you could start storing/blocking the source address for x number of days, since they will probably try to spam again from that IP. Of course, keeping it indefinitely may not be the best thing, but keeping it for, say, 60 days might do the trick.
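A sketch of a time-limited block like the one suggested above, assuming an in-memory store and the 60-day window; `ExpiringBlocklist` is a hypothetical class, and a real blog would keep this in its database rather than a Python dict. The clock is injectable so expiry can be tested without waiting 60 days.

```python
import time

SIXTY_DAYS = 60 * 24 * 60 * 60  # seconds


class ExpiringBlocklist:
    """Remember blocked IP addresses until a per-entry expiry time."""

    def __init__(self, ttl_seconds: float = SIXTY_DAYS, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires: dict[str, float] = {}

    def block(self, ip: str) -> None:
        # Blocking again simply restarts the window for that address.
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip: str) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # The entry has aged out; forget it so the table does not grow forever.
            del self._expires[ip]
            return False
        return True
```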

    –thrill

  2. RSnake Says:

    What we really need to do is start classifying the people we block and putting that list online. Maybe we don’t need to block their IP address entirely, but we could block them from submitting any comments/trackbacks indefinitely. They still have email as recourse.

  3. Gaz Says:

    You could also delay them once you’ve detected them :)

    That’s what I’ve done with my comment spam plugin “sleep(30)” :)
    www.thespanner.co.uk/2007/02/12/spambam/

  4. drew Says:

    Sounds like a great Wordpress plugin. :)

    You could take it a step further and verify that the trackbacked page actually contains a link to your page.
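The link-back verification suggested above can be sketched as follows, assuming the linking page is fetched over HTTP and that a match means an `<a href>` pointing exactly at the post (ignoring a trailing slash). `fetch_page` and `page_links_to` are hypothetical helpers, and the 500 KB read cap is an arbitrary assumption to avoid downloading huge pages.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_page(url: str, timeout: float = 10.0, max_bytes: int = 500_000) -> str:
    """Download at most max_bytes of the page at url and decode it leniently."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read(max_bytes).decode("utf-8", errors="replace")


def page_links_to(html: str, my_url: str) -> bool:
    """True if the HTML contains a link pointing exactly at my_url."""
    collector = _LinkCollector()
    collector.feed(html)
    target = my_url.rstrip("/")
    return any(href.strip().rstrip("/") == target for href in collector.hrefs)
```

Usage would be along the lines of `page_links_to(fetch_page(link), post_url)`, with network errors from `fetch_page` treated as a failed check.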

  5. Wladimir Palant Says:

    I know at least one reason why somebody would post trackback for a link that isn’t his own: http://software.hixie.ch/utilities/cgi/pingback-proxy/, look at the pingback-to-trackback proxy.

    Also, when you compare the IP addresses of the trackback sender and the link target, I hope you mean comparing the subnet parts of the addresses? There is such a thing as load balancing; IP addresses don’t have to match exactly…
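A subnet comparison like the one Wladimir describes could look like this, assuming a /24 prefix as the definition of "same subnet" (an arbitrary choice). `same_subnet` is a hypothetical helper built on Python's standard `ipaddress` module. Some sites load balance across subnets, so a prefix match can still reject legitimate senders; checking every address the DNS returns is the stricter alternative.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """True if ip_b falls inside the /prefix network that contains ip_a."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        # An IPv4 sender can never share a subnet with an IPv6 link target.
        return False
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return b in network
```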

  6. RSnake Says:

    @Gaz - that would slow down the server by holding connections open like that, although I like the concept.

    @drew - I’ve stopped downloading new versions of Wordpress, so although I use the base framework my code is getting more and more divergent.

    @Wladimir - I really couldn’t care less about proxies. They can turn it off if they really absolutely must have their link on my page. Trackback links are a feature, not a right. And no, I wasn’t talking about subnets; I was talking about looping through the list of all possible IPs returned by DNS (including failover). It’s better than subnets, since some companies load balance across subnets. Look at gethostbynamel() in PHP to see what I mean.

  7. zeroknock Says:

    A new advisory has been released at metaeye regarding WordPress.

    You can check it at:

    http://www.metaeye.org

  8. beNi Says:

    @zeroknock

    Sorry, but you are kinda late:
    http://mybeni.rootzilla.de/mybeNi/2007/how_to_play_with_an_wordpress_admin/
    February 17th ;-)
    Also posted on fulldisclosure and webappsec

  9. Jungsonn Says:

    I don’t understand that trackback feature. I mean, what’s up with that? I know what it does, but do you need it? I never understood that feature in today’s blogs.

  10. RSnake Says:

    Trackbacks are just a way for websites to tell other websites that they are talking about them. Specifically so you can know who is linking to your blog, and give them some reciprocal traffic if it’s interesting enough to your users to follow the link and the snippet of text associated with the trackback.

  11. Zeroknock Says:

    @beNi

    It’s all about logout and login events. But it would be better if you had used
    labels like “Redirection Vulnerability” or “Flaw”.

    No doubt the events are the same, but there’s one difference: yours is logout and ours is login.

    Hhaha

    Well, the rest is almost the same. We’re in the same boat.

  12. Zeroknock Says:

    @beNi

    Well, one more thing: WordPress accepts it as a vulnerability, and they hadn’t addressed this issue previously. That’s why the advisory is there.

    No Temper.

  13. Security Tools News & Tips » Blog Archive » Quick Links for 23 March 2007 Says:

    […] Tracking Back The Trackback Spam - Interesting ways to detect the Trackback Spam. […]

  14. RSnake Says:

    $ grep trackback access_log | grep "23/Mar" | wc
    761 13117 184656

    Yessir, 761 trackback spams blocked (well minus two since two were actually real). I’m loving the massive drop in spam!

  15. ~silkenwitch~ Says:

    Help…

    I have a trackback where the source site returns some ebooking business, but the trackback title indicates a valid business which relates to my blog. Is this an indication that the trackback is spam?

  16. RSnake Says:

    It could be! Hard to say for sure without looking at it. When in doubt nuke it.

  17. ~silkenwitch~ Says:

    Thank-you, RSnake…Will do…

  18. william Says:

    “It’ll be fun writing the software. They spammed the wrong guy 290 times!” Have you finished the software? Will it make them disappear forever?

  19. RSnake Says:

    William - yes, the software has long been completed, but eventually I just turned off trackbacks completely because of all the xmlrpc vulns that Wordpress had.