
Finding Cnames Via Google

By way of SEO Egghead I ran across Matt Cutts’ Google videos. I’m really surprised I didn’t see these before, so thanks, Jaimie! At first I thought they would be a lot of beating around the bush about the best ways to make your site rank using better HTML or some other nonsense, but instead he beat around the bush on a number of other issues. I’m actually really glad I watched the video about a guy who set up thousands of domains all linking to the same JavaScript. Talk about a blackhat SEO newbie mistake! But Matt Cutts also mentioned something else: lots of domains on a single IP address.

Wouldn’t it be great to have a map of virtually the entire internet, where you could see every hostname -> IP address pairing? Granted, as he says, it would produce false positives for things like virtual hosting services, but come on! Talk about predictive! A few dozen domains on one IP is plausible, especially for hosting providers, but if I see hundreds of domains on one IP that look even vaguely shady, that’s a huge indicator. Even if they aren’t on the same IP but are within the same class C network, that could still be highly predictive. IP addresses have come back to haunt us! Everything has to be routable, and if Google has to know where you are to index you, and has any interest in detecting spam, of course they’ll build a mapping like this.
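To make the class C idea a little more concrete, here’s a minimal Python sketch. It assumes you already have a table of hostname -> IP pairs (which is the hard part), groups the hostnames by the /24 network their IPv4 address falls in, and flags any network hosting at least some threshold of names. The function names, the threshold and the sample data are all hypothetical (the sample IPs come from the reserved documentation ranges); the grouping itself uses Python’s standard `ipaddress` module.

```python
import ipaddress
from collections import defaultdict


def group_by_class_c(host_to_ip):
    """Group hostnames by the /24 ("class C") network of their IPv4 address."""
    groups = defaultdict(set)
    for host, ip in host_to_ip.items():
        # strict=False lets us pass a host address and get its enclosing /24.
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(net)].add(host)
    return dict(groups)


def suspicious_networks(host_to_ip, threshold):
    """Return (network, sorted hostnames) for every /24 hosting at least
    `threshold` hostnames, largest groups first."""
    groups = group_by_class_c(host_to_ip)
    flagged = [(net, sorted(hosts)) for net, hosts in groups.items()
               if len(hosts) >= threshold]
    return sorted(flagged, key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical hostname -> IP pairs.
    pairs = {
        "pills-a.example": "192.0.2.10",
        "pills-b.example": "192.0.2.11",
        "pills-c.example": "192.0.2.200",
        "blog.example.org": "198.51.100.7",
    }
    print(suspicious_networks(pairs, threshold=3))
    # prints: [('192.0.2.0/24', ['pills-a.example', 'pills-b.example', 'pills-c.example'])]
```

A raw count like this will happily flag legitimate shared hosting too, which is exactly the false positive problem with virtual hosts, so treat a hit as a lead rather than proof.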

I had always wanted to build something like this myself, but a spider like that would take far more horsepower than I’ve got in my rack at home, plus a database with some serious space. We’re talking about millions of hostname-to-IP mappings. It gets harder because the data has to stay up to date. Six-month-old data is practically worthless when you’re talking about spam domains that may only stay up for a week or less.

Then I suddenly remembered a conversation I had a few weeks back with one of my readers, who shall remain nameless for the time being. He asked me a simple question: “How do you find all the cnames on a host?” Cname (or subdomain) spam has its ups and downs in the SEO world, depending, it seems, on the day of the week and on which search engine you’re talking about, but it’s a pain to correlate it all together no matter how you slice it. Finding cnames is also useful for auditing websites for vulnerabilities, since cnames almost always reside on the same host, or at minimum use the same backend. I thought for a few seconds and came up with a solution: use the search engine itself! Let’s say I want to find all the cnames on google.com. Let’s start with a simple query:

site:google.com -www

That gives us a list of links back, none of which contain “www”.  So now I see things like sketchup.google.com and finance.google.com and eval.google.com.  So let’s make a note of those and query again:
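Reading hostnames off a results page and retyping them into the next query gets old fast. Here’s a minimal Python sketch of just that bookkeeping step, assuming you’ve already copied the result URLs out of the page (it doesn’t talk to Google at all). `subdomain_labels` and `next_query` are hypothetical helper names, not part of any search API.

```python
from urllib.parse import urlsplit


def subdomain_labels(result_urls, base_domain):
    """Collect the part of each result's hostname in front of base_domain,
    e.g. 'finance' for http://finance.google.com/. Hits on the bare domain
    (like google.com/answers) and on unrelated domains are skipped."""
    base = base_domain.lower()
    labels = set()
    for url in result_urls:
        # urlsplit only recognizes the host when there is a scheme or a leading '//'.
        host = urlsplit(url if "://" in url else "//" + url).hostname
        if host and host.endswith("." + base):
            labels.add(host[: -len(base) - 1])
    return labels


def next_query(base_domain, excluded):
    """Build the next site: query, excluding every label seen so far."""
    terms = " ".join(f"-{label}" for label in sorted(excluded))
    return f"site:{base_domain} {terms}".rstrip()


if __name__ == "__main__":
    seen = subdomain_labels(
        ["http://sketchup.google.com/", "http://finance.google.com/", "eval.google.com/"],
        "google.com",
    )
    print(next_query("google.com", seen | {"www"}))
    # prints: site:google.com -eval -finance -sketchup -www
```

The exclusions come out in alphabetical order rather than the order you found them, but the search engine doesn’t care about the order of `-` terms.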

site:google.com -www -eval -sketchup -finance

Then take whatever new entries show up in those results and exclude them too (this may include things like subdirectories, which you can exclude as well):

site:google.com -www -eval -sketchup -finance -google.com/answers -google.com/trends -browsersync -desktop -toolbar -earth -picasa -toolbarqueries

And so on, until there is nothing left to search. In this way you can get all of the cnames Google has indexed for a domain with relatively few queries. Of course, Google is a huge site with lots of cnames, so this technique is pretty tedious there, but with smaller sites you can get through it pretty quickly. This still won’t give you an IP address to domain name lookup like the one Google has access to, but it does help you do your own investigation of cname-based spam. This technique came in handy for finding some of the other domains on one spammer’s site that you may remember from one of my previous posts.
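Put together, the loop looks something like this minimal Python sketch. It assumes a hypothetical `search` function that takes a query string and returns a page of result URLs; I’m not showing any real search API here, and the demo at the bottom runs against a tiny fake index instead of Google. Like the manual process, it only finds what the engine has indexed, and it only tracks hostnames, not subdirectory exclusions like google.com/answers.

```python
from typing import Callable, Iterable, Set
from urllib.parse import urlsplit


def enumerate_subdomains(base_domain: str,
                         search: Callable[[str], Iterable[str]],
                         max_rounds: int = 20) -> Set[str]:
    """Repeat site: queries, excluding every subdomain found so far, until a
    round turns up nothing new or max_rounds is reached. `search` is a
    hypothetical hook: query string in, result URLs out."""
    base = base_domain.lower()
    excluded = {"www"}           # start where the post starts: site:<domain> -www
    found = set()
    for _ in range(max_rounds):  # real engines cap query length, so don't loop forever
        terms = " ".join(f"-{label}" for label in sorted(excluded))
        query = f"site:{base} {terms}"
        new = set()
        for url in search(query):
            host = urlsplit(url if "://" in url else "//" + url).hostname or ""
            if host.endswith("." + base):
                label = host[: -len(base) - 1]
                if label not in excluded:
                    new.add(label)
        if not new:              # nothing new to exclude: we're done
            break
        found |= new
        excluded |= new
    return found


if __name__ == "__main__":
    # A fake "search engine" over a hypothetical index that returns at most
    # three results per query, to mimic only seeing the first page.
    index = ["http://www.google.com/", "http://finance.google.com/",
             "http://eval.google.com/", "http://sketchup.google.com/",
             "http://earth.google.com/"]

    def fake_search(query):
        ex = {t[1:] for t in query.split() if t.startswith("-")}
        hits = [u for u in index if urlsplit(u).hostname.split(".")[0] not in ex]
        return hits[:3]

    print(sorted(enumerate_subdomains("google.com", fake_search)))
    # prints: ['earth', 'eval', 'finance', 'sketchup']
```

The `max_rounds` cap is there because search engines won’t take queries of unlimited length; on a site the size of Google you’ll run out of query long before you run out of cnames.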

Finding cnames can help isolate spammers, but wouldn’t it be nice if we could somehow get access to all the IP address to hostname maps? There’s got to be a way. Hmmm… I’ll have to think about that one.

5 Responses to “Finding Cnames Via Google”

  1. countzero Says:

    Using the - operator will remove pages that contain the word. Maybe “-inurl:word” would work better. This would be far from finding all the subs on a domain, unless Google gave us queries of unlimited length.

    I have used similar techniques to get over 1000 results from google or yahoo.

  2. quadszilla Says:

    Definitely don’t host all your sites on the same class C. Even if you only have like 5 sites, if 2 of them are big, use IP addresses from at least 3 different class Cs. If they ain’t looking yet (and I think they are), they will be soon . . .

  3. Jaimie Sirovich Says:

    Matt Cutts already says they _are_ looking. And if Matt says it, it must be true :)

  4. SEO Egghead » Blog Archive » Finding Spammers’ Hideouts Says:

    […] RSnake of ha.ckers.org documents in this post how to conveniently obtain all the subdomains of a domain using a search engine.  This can help you isolate some spammers.  He later says " … wouldn’t it be nice if we could somehow get access to all the IP address to hostname maps?"  I agree, and here is a partial solution to that request leveraging a search engine — this time MSN Search.  Use the "IP" command to find all hosts located on a certain IP address like follows: […]

  5. RSnake Says:

    Countzero, that’s a great addition to the technique… probably way more effective too, but please read Jaimie’s recent post; I think it goes a long way toward finding even more hosts by using MSN’s “IP” flag. Pretty slick.