
DoSing Search Engines

When Dinis Cruz and I were talking about how viral propagation would work for an XSS Warhol worm, one of the things we discussed was a centralized command and control element. One of the main problems with an XSS worm is that it either needs to propagate itself blindly or it needs a central point to control all of the infected machines. The single point of control is the tough part. Eventually I think we settled on stealth over virulence as a concept, but let’s talk about that some other time.

Let’s assume for a moment that we did want a centralized command and control point. Let’s say we wanted a single point on the Internet that would allow the attack to surface. There are only a few networks in the world that could handle that load, and few of them would be super happy being used as a command and control point for a super massive worm. Fair enough. I don’t blame them. But what if it wasn’t so much a matter of whether the machines had your content on them, but whether they were there at all?

Due to the nature of the Warhol worm, you end up flooding almost any network in the world, and if the network doesn’t go down, the machines that host the site in question will go down trying to serve up your command and control point. But what if that weren’t just a side effect, but rather the goal? Let’s say I run a competing search engine and I want to take down victim.com. What does that mean for me?

One thing I researched early on in my security career was resource exhaustion. Back then it was pretty easy to do. Just fork some processes and the machine would pretty much come to a standstill. That was back when some moron flooded the whitehouse.gov domain with emails and brought the server to its knees before the Secret Service came and escorted him off to the interrogation room. Today things are more complex and you can’t as readily run things on other people’s servers. But what if it’s a search engine? By default you are running an application on their servers. In fact that application tends to take up resources and time. It has to hit a database of some sort (perhaps in memory, perhaps not). Maybe it can cache common requests, but we can avoid that too.

So it comes down to what is the most expensive query you can come up with. All of the modern search engines that I’m aware of have a concept of boolean operators (and/or/not). The most complex of the three is “and”, which means doing two queries in one and then doing some complex operation on the resulting datasets to combine them (yes, some work differently, but that’s the premise). So what would happen if a successful Warhol worm started querying a search engine for a phrase like “tall and car” (in some search engines the boolean operator “and” is implicit, but let’s just leave it there for clarity)? Well, first we have to define what a successful Warhol worm would look like. The first MySpace worm infected 1 million machines in a day. When Dinis Cruz and I talked about it, I was saying up to 100 million or more machines are vulnerable to this depending on what exploits you use. He was skeptical that all 100 million would be available at any given time given the quantity of computers and the follow-the-sun issue, so I think we settled on 15-20 million active machines at any given second over the period of a day as a realistic target number.
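
Here’s a rough back-of-envelope version of that sizing, just to make the assumptions explicit. The 100 million figure, the 15-20% online fraction, and the one-query-per-machine-per-second rate are all guesses from the conversation above, not measured data.

```python
# Back-of-envelope sizing for the worm estimate above. All of the inputs
# are the rough figures from the conversation, not measurements.

vulnerable_machines = 100_000_000        # upper-end guess of vulnerable browsers
online_low, online_high = 0.15, 0.20     # follow-the-sun discount: 15-20% online

active_low = int(vulnerable_machines * online_low)
active_high = int(vulnerable_machines * online_high)
print(f"active machines at any given second: {active_low:,} - {active_high:,}")

# If each active machine fires one search per second, that is also the extra
# query rate the search engine sees (the figure used in the next paragraph).
seconds_per_day = 24 * 60 * 60
print(f"extra queries over a day: {active_low * seconds_per_day:,} - "
      f"{active_high * seconds_per_day:,}")
```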

So what if I launched 15-20 million additional POST requests per second at a search engine? Well, it totally depends on the size of the infrastructure and whether they have caching. The weird part is that it’s possible it could actually survive that because of caching. So what if we made it so that it wasn’t cacheable? What if we strung arbitrary words together and used that as the search string? If we had 15-20 million requests a second using 5-10 million random search strings with between two and ten unique search terms each, you are now talking about a worm that could bring down any infrastructure that I’m aware of. You’re talking a complete denial of service of the infrastructure itself (not bandwidth exhaustion). In fact, if you are extra clever, you might be able to use the Flash worm to craft the headers so it only does a HEAD request, or otherwise chunk the data to reduce the returned information so that the client doesn’t pull the data down and you can launch even more attacks per second.
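
To see why caching probably wouldn’t save them, here’s a quick illustrative calculation. The vocabulary size and cache size are assumptions (roughly the word count discussed in the comments below, and a deliberately generous result cache), not figures from any real search engine.

```python
# Rough cache-hit estimate under the assumptions in the paragraph above:
# queries are random strings of words, so the odds that any given query is
# already in the cache are about (cache size) / (size of the query space).

vocabulary = 1_000_000      # roughly the word count discussed in the comments
cache_entries = 10**9       # a deliberately generous hypothetical result cache

for terms in (2, 3, 10):
    query_space = vocabulary ** terms            # ordered random word strings
    hit_rate = cache_entries / query_space
    print(f"{terms} terms: ~{query_space:.1e} possible queries, "
          f"hit rate ~{hit_rate:.1e}")
```

Even at two terms the space is half a trillion queries, so effectively every request is a cache miss and has to hit the real index.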

Now that the search engine is offline, that triggers the XSS Warhol worm to begin its attack on the real target - at least that’s the original idea. But if the target was the search engine itself, you have already succeeded. I’m not sure how you could defend yourself against that attack without doing some sort of heuristics on the users themselves to ensure that they came in the correct way (CSRF detection), which again takes CPU time and time to develop. I don’t think this is a super practical application unless you are a competing search company in another country that doesn’t care about or abide by the normal rules of business and you just want to turn your competitor off for a day and hurt global e-commerce a little.
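
As a rough illustration of the kind of heuristic I mean, here is a minimal sketch of that check: the real search page embeds a short-lived HMAC token, and query requests that don’t carry a valid one (or that arrive with a Referer from some other site) get rejected before they touch the index. The token scheme, the search.example.com domain, and the TTL are all hypothetical - nothing here describes what any real search engine actually does.

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me-regularly"   # hypothetical server-side secret
TOKEN_TTL = 300                   # seconds a search-page token stays valid

def issue_token(client_ip: str) -> str:
    """Embedded in the search form when the real search page is served."""
    ts = str(int(time.time()))
    sig = hmac.new(SECRET, f"{client_ip}:{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def query_allowed(client_ip: str, token: str, referer: str) -> bool:
    """Cheap first-pass check before the expensive query ever hits the index."""
    if not token or ":" not in token:
        return False
    ts, sig = token.split(":", 1)
    if not ts.isdigit() or time.time() - int(ts) > TOKEN_TTL:
        return False        # token missing, malformed, or stale
    expected = hmac.new(SECRET, f"{client_ip}:{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False        # token was not issued to this client
    # Secondary signal: reject requests whose Referer points at a foreign
    # site; an empty Referer is tolerated since some clients strip it.
    return not referer or "search.example.com" in referer
```

The point is just that the check costs an HMAC and a string comparison, which is cheap compared to running the query itself - though, as noted above, even that small cost plus the development time isn’t free.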

Anyway, just something I was thinking about.

2 Responses to “DoSing Search Engines”

  1. Albert Says:

    I’m pretty sure the cache of the search engines is pretty huge, but with about 10^7 machines sending arbitrary requests the cache is likely to fill up, assuming the words you generate are mostly unique. But then again, the number of possible words you can distribute is limited, so the attack will inevitably be limited to ‘number of words’ choose ‘number of unique words used per query’.

  2. RSnake Says:

    Well, there are just shy of 1 million words in the English language. And you need at least two words to make this work, so 1MM * 1MM = 1 trillion combinations. If you use three words it’s 1,000,000,000,000,000,000 combinations. And then if you use other languages….
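
    (A quick sanity check on those numbers, purely illustrative - the first column counts ordered word strings, which is what the multiplication above does; the second counts unordered combinations, which is closer to what Albert described:)

    ```python
    from math import comb

    words = 1_000_000   # "just shy of 1 million words in the English language"

    for k in (2, 3):
        ordered = words ** k         # the figures above: every ordered k-word string
        unordered = comb(words, k)   # Albert's bound: unique k-word combinations
        print(f"{k} words: {ordered:,} ordered strings, {unordered:,} combinations")
    ```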