web application security lab

Archive for the 'SEO/SEM' Category

Preventing XSS Using Data Binding

Tuesday, August 14th, 2007

Stefano Di Paola sent me an interesting email the other day. Honestly, it took me a good hour of playing with it before I finally wrapped my brain around what was going on. Using data binding, he can have JavaScript attach user content to the page while ensuring that it contains no active content. That is, styles are okay, but JavaScript is not. Very interesting. Here's the demo (warning, not for the technically faint of heart).
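Roughly, the idea looks something like this - a minimal sketch of my own, with hypothetical element IDs, not Stefano's actual demo code:

```typescript
// Minimal sketch of the general idea (not the actual demo): raw user content
// ships at the bottom of the page inside a non-rendered container, and
// JavaScript then binds it into the visible page as plain text. Because the
// binding uses textContent rather than innerHTML, markup in the user data is
// treated as data and never parsed, so script can't run. Element IDs are
// hypothetical.
window.addEventListener("DOMContentLoaded", () => {
  const raw = document.getElementById("user-data");     // hidden container with raw user content
  const slot = document.getElementById("comment-slot"); // visible placeholder
  if (raw && slot) {
    slot.textContent = raw.textContent; // bind as inert text; no output encoding involved
  }
});
```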

Stefano asked me to give my report on the good and the bad. The good is, this is pretty damned good at stopping XSS. It probably won't stop abuse of styles that position themselves over other people's content, but if implemented properly it would stop a good deal of XSS, if not all of it. That's the good news (and that's very good news for most people). Here's the bad news.

The bad news is that it requires JavaScript to work. If you don't have JS enabled, forget it. That's bad news for security people, bad news for accessibility, and even worse news for robots that are trying to get a contextual understanding of the page. It also forces the user generated content to sit at the very bottom of the page. That's also bad for SEO, because it means the most relevant content is in the very last part of the page. Depending on how the page is built and the spider, this may fall outside the size limits of the robot. Not good. Lastly, it would wreak havoc on lots of those poor web application scanners. They would light up like Christmas trees because there is NO output encoding done. None. Zip. It's funny to make web application scanners throw false positives, but it's also a pain in the butt if you're the operator of said scanner. Herein lies one of the advantages of scanners that use built-in rendering engines (forgiving any other issues they may have).

So where would this be useful? Think about all those web 2.0 applications out there that have to put dynamic content on the page, don't have to worry about spiders and robots, and need to make sure that what they output is okay no matter what encoding or other craziness users may put in. I'm not advocating being sloppy, and there may be other issues here that I haven't found, but thus far it's looking like a promising technology. Very nice work by Stefano!

Is XSS Good For SEO?

Wednesday, May 30th, 2007

There's an interesting post over at the Venture Skills blog talking about whether XSS is actually good for SEO purposes. While I don't have any conclusive evidence that he is wrong or right (at least nothing that satisfies me enough to call it a correct or incorrect assessment), I will say I have seen evidence that blackhats definitely are using this and search engines definitely are indexing them.

I have also heard blackhats say that it works best when used as a "spice" within a mix of a lot of other normal links, rather than relying on it entirely. Again, I have no evidence of whether that is true, but I wouldn't refute other people's experience without evidence. One thing I think is important to mention is that XSS as it stands is NOT good for SEO, nor could it be. What blackhats use is HTML injection, not JavaScript injection. It should also be noted that XSS takes on three forms, only one of which is almost hopeless for a search engine to prevent, and that is stored XSS. What I will say is that it should be pretty easy for search engines to set up rules looking for commonly used reflected HTML injection techniques and devalue them.
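Very roughly, such a rule might look like this - the patterns and names below are my own illustration, not anything a search engine actually runs:

```typescript
// Hedged sketch: flag URLs whose query string carries literal or URL-encoded
// anchor markup, the telltale sign of reflected HTML injection used for link
// spam, so their outbound links can be devalued. Patterns are illustrative only.
const INJECTED_LINK_PATTERNS: RegExp[] = [
  /<a\s[^>]*href\s*=/i,      // literal <a href=...> in a parameter
  /%3Ca(%20|\+)[^&]*href/i,  // URL-encoded variant
];

function looksLikeReflectedLinkSpam(url: string): boolean {
  const query = url.split("?")[1] ?? "";
  return INJECTED_LINK_PATTERNS.some((pattern) => pattern.test(query));
}

// A URL like this one (hypothetical) would be flagged:
console.log(looksLikeReflectedLinkSpam(
  "http://victim.example/search?q=%3Ca+href%3D%22http%3A%2F%2Fspam.example%22%3Espam%3C%2Fa%3E"
));
```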

Anti-Splog Evasion

Monday, May 21st, 2007

I know I'm really going to kick myself for this one, as it will no doubt come back to haunt me, but I've been thinking about this one for a long time. One of the things that blackhat SEO types do is attempt to scrape other people's sites that have original content (such as mine). Then they post that content on their own sites as their own, attempting to raise their page rank. Because the search engines aren't smart enough to know who the original author is, the sploggers end up higher in the rankings.

One of the tactics to evade them is to deliver unique content to them (a one-time token or something of the like) that allows them to see the content, but if they attempt to replay it, the webmaster can tell who it is by going to their lookup table and seeing who scraped them. Oftentimes you can shut them off at the source or do something more evil like I did. But there's a way around it.
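For reference, the defense itself boils down to something like this - a bare-bones sketch of my own, assuming a Node-style backend, not anyone's actual implementation:

```typescript
// Per-visitor token defense: embed a unique marker in each response and record
// who received it, so a scraped copy found on a splog can be traced back to the
// scraper via the lookup table. Names and storage are illustrative.
import { randomUUID } from "node:crypto";

const tokenLog = new Map<string, string>(); // token -> requester IP

function tagContent(html: string, requesterIp: string): string {
  const token = randomUUID();
  tokenLog.set(token, requesterIp);
  // An HTML comment is invisible to readers but survives a naive scrape.
  return `${html}\n<!-- ${token} -->`;
}

function whoScrapedThis(scrapedCopy: string): string | undefined {
  const match = scrapedCopy.match(/<!-- ([0-9a-f-]{36}) -->/);
  return match ? tokenLog.get(match[1]) : undefined;
}
```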

If you click on the image you can get an idea of the concept. It revolves around using more than one scraper (not a new concept - see splog hubs for more details - but in the past that has only been used to hide the real IP address). The difference between that method and this one is that you use more than one scraper and then validate that the responses are the same. If they are, you're good; if they aren't (because there is a unique token in the content), the content can either be thrown away or the splogger can attempt to clean it up.
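Sketched under my own assumptions about how such a scraper would be wired up (the proxy helper below is hypothetical), the evasion is just a comparison:

```typescript
// Hypothetical helper standing in for however the splogger routes a request
// through a given proxy; only its shape matters for the sketch.
declare function fetchViaProxy(url: string, proxy: string): Promise<string>;

// Fetch the same page through two different proxies and keep the content only
// if both copies are identical. A per-visitor token embedded by the site owner
// makes the copies differ, so the tainted content gets thrown away (or cleaned).
async function scrapeIfConsistent(url: string, proxies: [string, string]): Promise<string | null> {
  const [first, second] = await Promise.all(
    proxies.map((proxy) => fetchViaProxy(url, proxy))
  );
  return first === second ? first : null;
}
```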

This would make it much harder for sites to protect themselves from sploggers attempting to steal copyrighted materials. So why am I writing this? Because I still have a few tricks up my sleeve to stop sploggers, but I thought it should at least be known that there are ways around some of the more obvious protection mechanisms.

Google Ads Spread Malware

Friday, April 27th, 2007

This is actually a really serious issue that was sent to me. The funny part is that I've known this was possible for years now and had already put it into a presentation I'm doing in a few weeks, but anyway, Google's ads have been spreading malware. A few people with Google accounts have been buying sponsored ads (no doubt with stolen credit cards/identities). It sure is easier than getting to the top of the search results page!

Although I don't think this signals the end of text ads, I think it's a wise choice to consider any paid links to be just as untrustworthy as anything on the SERPs. Neither Google nor any other search engine has been particularly good about vetting how good or bad a domain is before linking to it. Hey, money is money, right? And although I believe they will probably do a cursory scan of the domain in the future to make sure it isn't spreading malware, given the bad PR, it's pretty easy to fool spiders into not seeing malware. So I'm not sure what actual protection this will provide.

My next thought was CSRF - if you buy a search term and include a few images pointing to remote domains, you can pretty easily get users to do things on your behalf, and it's extremely targeted at the same time. Yah, that's bad. Don't trust those paid ads - it doesn't matter if they are "sponsored" or not. As a side note, I was a little annoyed to read that Matt Cutts wants people to snitch out paid links. I think Google should look at its own problems before trying to hurt people's revenue streams. At least with my paid links, I wouldn't be risking people's identities to click on them!
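To spell out the CSRF angle (the target URL here is made up purely for illustration), the ad's landing page only needs a few image requests pointing at state-changing URLs on sites where the visitor is already logged in:

```typescript
// Illustration only: each Image() fires a GET request that carries the victim's
// cookies, so a vulnerable site treats it as an authenticated action.
const forgedRequests = [
  "https://bank.example/transfer?to=attacker&amount=1000", // hypothetical vulnerable endpoint
];

for (const target of forgedRequests) {
  const img = new Image();
  img.src = target; // the browser sends the request automatically; no click needed
}
```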

Clickbot.a Writeup

Friday, April 20th, 2007

I was sent this link today on Clickbot.a, written by the Google AdWords guys. It's a pretty interesting high-level read for the most part, especially if you don't know much about click fraud, and it does get into some of the technical details near the end on how the bot actually worked. While the conclusions of the paper are fine, I was struck that the authors failed to address the most important point.

The most important point being that the only reason this bot existed, and the only reason the hackers used it to compromise 100,000+ machines, is that it was economically lucrative to do so. That means Google's detection was too slow to prevent the attackers from making enough money to make it worth their while. It also came at the expense of the advertisers, as well as the poor web sites that were compromised for this purpose, no less. Which means that Google's detection methods need to improve to pick up not just this particular variant but also polymorphic versions that are far less easy to detect. So while it is commendable for Google to fix this one issue, it shows they are lacking the technology to proactively defend against future, less immature variants.

While Google's executive management feels that economics will solve this issue, I feel that Google is failing to see how detrimental this is to the advertisers who depend on quality click traffic. In the absence of that quality, alternative solutions must be in place to allow advertisers to recoup their costs while Google struggles to build new technology to defeat the issue. However, without access to the actual landing pages that the advertisers use, Google cannot have deep insight into the full picture. Ultimately, this will cause a bigger rift over time, which the attackers can exploit on the vast majority of sites that don't use alternative click quality tools. Until Google can come up with a creative solution, companies like Click Forensics fill that void.

Spider Trap For Stopping Bots

Friday, April 20th, 2007

David Naylor (a semi-reformed SEO Blackhat) has an interesting writeup on how to stop badly behaved robots from spidering your site. I would hardly call this technique new (I've seen these scripts in one form or another for nearly a decade). However, it's a good primer for anyone who runs a big website and is otherwise powerless to stop people from doing it.

This technique doesn't just work on robots, though. People doing manual assessments will often look at the source code of a page, and if they find hidden links or commented-out pieces of code they will follow them, hoping to find something interesting from a security perspective. One alternative is to trap them and either put them into the matrix, ban them, or otherwise log the activity. David's article is worth a read if you are unfamiliar with how this stuff works.
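A minimal sketch along the lines David describes, assuming a plain Node HTTP server (the paths and ban handling are my own choices): the trap URL is linked invisibly and disallowed in robots.txt, so only misbehaving bots or source-reading humans ever request it.

```typescript
// Spider trap sketch: anything that requests the hidden, robots.txt-disallowed
// URL gets its IP logged and blocked on subsequent requests.
import { createServer } from "node:http";

const banned = new Set<string>();

createServer((req, res) => {
  const ip = req.socket.remoteAddress ?? "unknown";

  if (req.url === "/trap/do-not-follow") {
    banned.add(ip); // misbehaving bot (or curious human): log and ban
    res.writeHead(403);
    res.end("Forbidden");
    return;
  }
  if (banned.has(ip)) {
    res.writeHead(403);
    res.end("Forbidden");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/html" });
  // The trap link is hidden from normal visitors; robots.txt should also carry
  // Disallow: /trap/ so well-behaved spiders never touch it.
  res.end('<a href="/trap/do-not-follow" style="display:none">do not follow</a>Hello, world.');
}).listen(8080);
```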

Hacking Matt Cutts - Death By 1000 Cutts Case Study

Tuesday, April 3rd, 2007

About once a month I get someone asking me why knowing what users are running is useful. People don't seem to think reconnaissance is worth doing these days. I've heard people say things like, "Just try the attack and see if it works." While in un-targeted attacks it sometimes is totally worth just trying the attack, there are circumstances where that's just not true. The first circumstance is where the attack takes a prohibitively large amount of resources. The second is where the attack leaves a big signature when it runs and you want to minimize that signature. The last, however, is the most interesting: it's where I want to hack a single user, and I want it to work the first time, without fail. This is where recon is useful.

So I decided to pick a user out of the tens of thousands of people who have visited my site. As you all probably know by now, I've never been on super great terms with Google - it's a long story that I'll rant about over beers to almost anyone who asks. The point being, I represent what we like to call a determined attacker. Not so much that I want to hack Google directly - that's easy enough - but calling out their unofficial technology spokesperson while making a point about how important recon is to web application security is the best of both worlds. So I picked Matt Cutts, who runs the web-spam group at Google and who happens to be the person that SEO Blackhats most love to hate.

This case study has taken me a few months to put together, and I was thinking about releasing it at a conference at some point, but why wait? I think it's worthwhile to release it now, before the noise of BlueHat, Blackhat and DefCon is upon us. In this case study, which I've entitled Death by 1000 Cutts (as a jab at my own original case study, Death by 1000 Cuts), I take a series of extremely minor information disclosures and chain them together to mount a really nasty attack where I steal files directly from his machine, using anti-anti-anti DNS pinning against Google Desktop. Rather than type the whole thing out again, I encourage you to read it for yourself. I hope this at least partially puts to rest people's resistance to recon and proves why it is a powerful tool in a determined attacker's arsenal.

Windows Live Italy Being Used Maliciously

Tuesday, March 20th, 2007

Zach sent me a link to a Hack In The Box article about how Windows Live is being used by blackhat SEO (search engine optimization) types to bring malware links to the top of the search results. This marriage between blackhat SEO and hacking is starting to take off. It's unclear what tactic they used to get to the top of the search results, but clearly it worked, as they ended up taking over quite a bit of Live's Italian site.

Once users were on the Live.com site, they were apparently served up links to malware sites. The search engine itself was used as a conduit for sending people to the malicious search pages. This is yet another reason why search engines shouldn't index XSS: even if a site is benign, they would be indexing links to malicious pages on it. Anyway, it's an interesting read, and it's scary that the SEO community is now dabbling in hacking as well. It was only a matter of time.

Google Announces Invalid Domain Through Blacklisting

Thursday, March 1st, 2007

Click fraud is a big deal (Google claims it's as low as a few percent, but other leading industry experts disagree and put it much higher). I was actually fairly impressed that Google not only acknowledged the problem but is actually taking consumer-visible steps to prevent it. Google announced a blacklist for domains that advertisers feel are highly likely to commit fraud. I kinda like this concept, but like everything, the devil is in the details.

Firstly, will an advertiser be able to block specific URLs, specific domains, or specific URLs with keywords, or is it going to be ultra high level, like a "by category" type system? And how will the bad guys subvert that? I think we all know how poorly blacklisting works even with something as fine-grained as HTML, let alone entire classes of sites on the Internet. Also, what will happen when an advertiser blacklists a domain while other people are visiting that exact domain? Will the banner magically disappear? I think not, so what happens if someone clicks on that link? Has the website taken on the risk that the advertiser can turn off the link at will and refuse to pay (allowing for advertising fraud), or will Google force them to pay regardless?

Ultimately, I don’t think the solution to Google’s click fraud numbers has anything to do with blacklisting. It’s a neat consumer feature, and may give them some small clout with advertisers who ask for this sort of thing all the time, but really, it’ll make next to no dent in the overall fraud numbers that Google sees (at least that’s my prediction).

Hacked .EDU Sites Used For SEO

Saturday, February 24th, 2007

I'm sure this is old news to some people, but it's the first time I've seen it show up in my logs. In the last twenty-four hours three different hacked .edu domains have shown up in my logs: Stanford.edu, UCNE.edu and ISI.edu have all been at least somewhat compromised, and the domains now host spam sites. Not so good.

Clearly the administrators of those domains have some work to do to secure their sites. But it does cast some doubt on the "good" and "bad" domain concept. When a good domain goes bad, is it breakout (intentionally getting a good reputation and then converting to bad) or is it spam? Either way, it's clearly bad, but what to do about it? Do you blacklist the pages or the whole domain? That's gotta make life a little harder for the search engines that try to stay away from spammy domains. Perhaps reputation and link popularity is a bad model after all.