web application security lab

Malware Stats or Ghost in the Browser

I found an interesting link via Zeno’s post: a paper produced by Google documenting malware on the internet. First, let me say this is a really good paper, as it discusses the ways in which malware propagates. Not that it’ll be news to anyone who reads this site religiously, but it’s still interesting to see all our theories validated.

Second, be wary of the statistic that 1 out of 10 websites hosts malware. Google hand-selected 17 million URLs and only did a deep dive into 4.5 million of them, drawn from its own repository. It’s well known that Google does not spider the entire internet (it covers a very small portion in reality), and they picked those URLs because they were likely conduits. They weren’t arbitrary. So let’s just take that statistic off the table. Yes, the Internet is a scary place, but not “1 out of 10 sites is actively trying to screw you” scary.

But back to the interesting stuff for a minute. They point out that a large number of the exploits found stem from website vulnerabilities, including flaws in ASP and PHP applications, and that a big chunk was delivered through holes in sites that allowed XSS. That XSS may have been intentional, as in the case of widgets or advertising, or not, but in the end it’s bad.

I should also point out that this doesn’t say anything about sites that attempt things like CSRF, or servers that have been compromised in other ways that allow an attacker to quietly steal user data: for instance, SQL injection, or server vulnerabilities that simply allow a back door into the system to pull confidential info out of the database.

One point that I’d like to make on top of this is that the two things responsible for most of these problems were remote JavaScript and iframes. I just don’t see many applications for those technologies that, as a user, I care about (ads and widgets are pretty low on the list of what I want to see in my browser as a consumer). I’ll admit I’m an edge case as a user. But as nice as Web 2.0 is, not getting malware is even nicer.
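To make the point concrete, here is a minimal sketch (my own illustration, not anything from the paper) of flagging those two delivery mechanisms in a page: script tags and iframes whose `src` points at a third-party host. The hostnames and HTML are made up for the example.

```python
# Sketch: flag remote <script src> and <iframe src> tags pointing off-site.
from html.parser import HTMLParser
from urllib.parse import urlparse

class RemoteContentFinder(HTMLParser):
    """Collects script/iframe sources that point at third-party hosts."""
    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).netloc
        if host and host != self.page_host:
            self.suspect.append((tag, src))

page = (
    '<p>family photos</p>'
    '<script src="http://evil.example/payload.js"></script>'
    '<iframe src="http://evil.example/drop.html" width="0" height="0"></iframe>'
)
finder = RemoteContentFinder("family.example")
finder.feed(page)
for tag, src in finder.suspect:
    print(tag, src)
```

Nothing about a remote script or iframe is malicious per se, which is exactly the problem: the same mechanism that serves ads and widgets serves exploit code.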

7 Responses to “Malware Stats or Ghost in the Browser”

  1. Andrew Says:

    It’s well known that Google does not spider the entire internet (it’s a very small portion in reality)

    I’d be curious to find out what Google isn’t spidering…

  2. Jeremiah Grossman Says:

    Anything after a login screen. :)

  3. Awesome AnDrEw Says:

    I like how Google specifically advises you that certain sites, generally falling under the .ws TLD, may contain potentially harmful files such as spyware. I believe malware will continue to grow along with the “Web 2.0” concept, because CSRF and XSS vulnerabilities are enablers for the automation of mostly-quiet attacks (up to and including the browser being compromised).

  4. RSnake Says:

    Andrew, Jeremiah is correct, but even of the stuff that isn’t behind a login screen (also known as the “surface web”), it’s not a big percentage. Here are some facts I pulled from one website:

    1. Google doesn’t search the whole Internet, it searches only the “surface web.”
    2. It is estimated that only 7% of the information on the surface web is appropriate for educational or scholarly purposes.
    3. Google’s database includes approximately 16% of the sites on the surface web.

  5. kaes Says:

    Not only login screens: anything that can be reached only via a POST form, like search forms.
    This excludes a tremendous amount of public databases, where the index/search page is (possibly) indexed by Google but the search results themselves often aren’t.
    Then there’s badly programmed AJAX/JS websites with dynamic content, robots.txt exclusions, pages that are linked too deep (Googlebot doesn’t crawl nearly as deep as Yahoo’s crawler) or in the wrong way, stuff that is intentionally hidden in a myriad of different ways, pages that are not linked at all or only from unspidered sites, etc.
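One of the mechanisms kaes lists, robots.txt exclusions, can be demonstrated with the standard library. This is a small illustration with made-up rules, showing how a well-behaved crawler decides to skip content:

```python
# Illustration: a robots.txt file telling all crawlers to skip /search.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "http://example.com/index.html"))    # allowed
print(rp.can_fetch("Googlebot", "http://example.com/search?q=foo"))  # excluded
```

Note this is purely advisory: it only hides content from crawlers that choose to honor it, which is why it keeps pages out of Google without keeping them away from malicious bots.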

  6. Legionnaire Says:

    Hello all,

    this paper is titled “The Ghost in the Browser” and focuses on exactly that: malicious code that may be running invisibly in a user’s web browser. That is more serious than an infected server, which will eventually be patched or fixed by an administrator. Also, according to the paper, a user’s web browser contains more useful information (passwords, financial and personal info) than a single hacked database. So indeed, Web 2.0 pushes us to place a large portion of our lives on the Internet, and the fact that our information may be quite easily compromised isn’t a happy thought.

    Now, about the sample pages of this paper: the deep web (as opposed to the “surface web”) is less likely to be infected by malware, since the same rules that apply to search engine spiders also apply to malicious bots. What I mean is that the main target really is the surface web: some page you might get from a search engine, a link from another site, etc.

  7. darkwall Says:

    1:10 does seem steep, but I am seeing a high ratio of malicious sites. One of the things I’ve been seeing a lot lately is random, old pages that have something malicious stuck on them: stuff like a site dedicated to family photos that was put up in 1997, last updated in 2002, and suddenly includes malicious JavaScript or an iframe trying to install something or other. I’m thinking it’s automated, like a script that crawls for old, highly vulnerable web servers or poorly set up FTP servers, and then injects every HTML file with something nasty. I can’t imagine that someone would go after such small, random sites by hand.
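The mass-injection pattern darkwall describes tends to leave telltale artifacts. Here is a rough heuristic sketch (my own, not from the post or the paper) for spotting two common ones in a static HTML file: a zero-size iframe, or markup appended after the closing </html> tag.

```python
# Heuristic sketch: flag pages with hidden iframes or content after </html>.
import re

HIDDEN_IFRAME = re.compile(
    r'<iframe[^>]*\b(?:width|height)\s*=\s*["\']?0\b', re.IGNORECASE)
TRAILING_CODE = re.compile(r'</html>\s*\S', re.IGNORECASE)

def looks_injected(page: str) -> bool:
    """True if the page shows either injection artifact."""
    return bool(HIDDEN_IFRAME.search(page) or TRAILING_CODE.search(page))

clean = "<html><body>Family photos, 1997</body></html>\n"
dirty = clean + '<iframe src="http://bad.example/x" width="0" height="0"></iframe>'

print(looks_injected(clean))  # False
print(looks_injected(dirty))  # True
```

A heuristic like this obviously misses obfuscated payloads; it only catches the lazy, automated appends that fit the "every html file got the same snippet" pattern.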