I got 290-ish trackback spams last night, and that's after quite a bit of anti-spam filtering. For some reason spammers think sheer volume will get me to approve their spam. Well, they couldn't be more wrong. In fact, I've been thinking about interesting ways to detect them. For those of you who don't run blogs, trackback spam is when robots pretend to be other blogs linking to my site. My site receives a POST request from the robot, which tells it a few things: the link to the supposed linking post, a title, and some sample text. Trackback spam is difficult to stop because trackbacks don't look like normal browser traffic even when they're legitimate, so the usual tricks for spotting robots don't apply. So today I came up with a few semi-clever tactics to end the madness.
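To make the moving parts concrete, here is a rough Python sketch of what the receiving side gets. It assumes the form-encoded `url`, `title`, `excerpt`, and `blog_name` fields from the TrackBack specification; `TrackbackPing` and `parse_ping` are illustrative names, not part of any particular blog engine. Every field is whatever the sender chooses to say, except `remote_ip`, which the webserver records from the connection itself.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs


@dataclass
class TrackbackPing:
    url: str                  # link to the post that supposedly links to me
    title: Optional[str]
    excerpt: Optional[str]    # the "sample text"
    blog_name: Optional[str]
    remote_ip: str            # taken from the connection, not from the body


def parse_ping(body: bytes, remote_ip: str) -> Optional[TrackbackPing]:
    """Parse an application/x-www-form-urlencoded trackback POST body."""
    fields = parse_qs(body.decode("utf-8", errors="replace"))

    def first(name: str) -> Optional[str]:
        values = fields.get(name)
        return values[0] if values else None

    url = first("url")
    if not url:
        # The spec requires a url; without one there is nothing to link to.
        return None
    return TrackbackPing(url, first("title"), first("excerpt"),
                         first("blog_name"), remote_ip)
```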
The first is the IP address. This is the one thing the robot can't fake, since it has to receive my server's response to complete the request. A legitimate trackback is normally sent by the blog software running on the very webserver the trackback claims to come from. If the sending machine isn't a webserver, that's a huge signal that it's a robot. So what if I connect back to the same IP address on port 80 and look for a webserver? If I don't find one, I can be 99% sure it's fake traffic. The only exceptions would be a site that's temporarily down or a server running on a nonstandard port. Either way, do I really care?
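A minimal sketch of the port-80 check in Python, assuming a bare `HEAD /` request is enough to tell whether a webserver answers. The name `has_webserver` and the 3-second timeout are my own choices for illustration, not taken from any existing plugin.

```python
import socket


def has_webserver(ip: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Connect back to ip:port and check that something answers like an HTTP server."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            # HTTP/1.0 needs no Host header; only the status line prefix matters.
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            data = b""
            while len(data) < 5:
                chunk = sock.recv(64)
                if not chunk:  # closed without answering
                    break
                data += chunk
            return data.startswith(b"HTTP/")
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError).
        return False
```

Checking for an `HTTP/` status line rather than just an open socket means some random open port doesn't count as a webserver.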
Next is the IP address of the link. The hostname in the trackback's link should resolve to the same IP address the request came from. Why would a site send a trackback on behalf of some other website? That makes no sense, so again it's 99% likely to be spam. The only way spammers could get around this is to temporarily spoof the DNS entry so the link resolves to the spamming machine when my server looks it up, but even then they'd have to be running a webserver on that IP address. This quickly exhausts the number of machines they can spam from, because they must run a webserver on each one to get it to work (which they do in less than 1% of the cases I've looked at so far), and the link must point to that same server. That greatly increases the work a spammer has to do just to get a link into my moderation queue, and I can simply ban that IP address going forward, since I know it really is the same IP as the spam site I don't care to see anyway.
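A sketch of the link check, assuming the trackback's `url` field and the sender's IP are already in hand. `link_matches_ip` is a hypothetical helper; it accepts the ping if any address the link's hostname resolves to equals the sending IP, which allows for sites with several A/AAAA records. It also unwraps IPv4-mapped IPv6 addresses, which some dual-stack servers report for IPv4 clients.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def link_matches_ip(link: str, remote_ip: str) -> bool:
    """True if the link's hostname resolves to the IP the trackback came from."""
    host = urlsplit(link).hostname
    if not host:
        return False  # no usable hostname in the link at all
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except OSError:
        return False  # unresolvable hostname: treat as a mismatch
    remote = ipaddress.ip_address(remote_ip)
    # Normalize "::ffff:a.b.c.d" to a plain IPv4 address before comparing.
    if remote.version == 6 and remote.ipv4_mapped is not None:
        remote = remote.ipv4_mapped
    resolved = {ipaddress.ip_address(info[4][0]) for info in infos}
    return remote in resolved
```

A ping that fails both this and the webserver check is a good candidate for an outright IP ban.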
It’ll be fun writing the software. They spammed the wrong guy 290 times!