Cenzic 232 Patent
Paid Advertising
web application security lab

Hash Information Disclosure Via Collisions - The Hard Way

Every once in a while I have those discussions with id about “what I would do if I were the NSA and had no mission to accomplish.” It could also be called the overgrown “boys with toys” conversation. It typically goes off on tangents where we abuse system resources for entirely impractical applications, and this is no different. Today we started talking about the PS3 collisions stuff. Cool indeed. But what if we wanted to use something entirely unrelated to find something that’s barely worth knowing? Ahh, that’s where gigantic rainbow tables comes into play.

Every hashing algorithm has possible collisions once you allow a certain number of chars to be hashed. Let’s say you found out that “bob” and “sam” collided in whatever hashing algorithm. If you created an account on a web server with the password of “bob” and then later typed in the password of “sam” assuming no salts you would be able to get into the system. That’s not all that interesting because you could get into your own account anyway. The vaguely more interesting fact is that you now know what hashing algorithm is being used. Rinse and repeat for every salt (random set of chars preceding, after, or XOR’d typically), every password rule variant (must have upper case, or must have special chars, etc…) and every hashing algorithm (MD5, SHA1, SHA256, double hashed because people think they’re being super clever, etc…) and you have an extremely overkill way to get a very small amount of information disclosure. Yes, what a waste of taxpayer money!

The slightly less impractical implication of this is if you already had some collisions that you could use for this purpose you could attempt certain types of brute force against passwords that matched on the backend but were in fact different passwords when applied to a blacklist of typed passwords. Also, you could use these kinds of tricks for other sorts of database collisions where a primary key is a hash of some known data. What a complete waste of resources that are best used for far more interesting tasks, if you ask me. But hey - it’s possible.

10 Responses to “Hash Information Disclosure Via Collisions - The Hard Way”

  1. Andy Steingruebl Says:

    And this is why smart sites will use an HMAC for storing passwords thus defeating this and all other offline dictionary attacks against a stolen password database if the HMAC key hasn’t been compromised too…. :)

  2. RSnake Says:

    @Andy - correct, but in my experience HMAC implementations are done wrong more than 1/2 the time. They usually use some data stored in the database - or based on some user variable like Oracle ID, time of user creation or something else just as simple. If the attacker has access to the passwords in the database, it stands to reason that they at minimum have access to the rest of the database. The only way this really gets a lift is when the HMAC secret is stored in the source code. Because while SQL injection is terrible, many times it doesn’t lead to compromise of the source of the application. The disparity of the two pieces of information does provide quite a lift in security. But yes, HMAC is still harder because of that secret key - which incidentally isn’t much different than a salt if you think about it (which also dramatically increases cryptanalysis time) as well - if it isn’t known.

  3. Andy Steingruebl Says:

    I see your point, but typically salts are stored with the resulting hash, whereas with an HMAC the “salt” is stored separately and potentially protected more. And, people usually don’t use nearly as long a salt as they would a crypto key for something like AES.

  4. RSnake Says:

    @Andy - sure. Although that’s an implementation detail - where the salt or HMAC secret is stored. Either way, I think they both suffer from the same basic problem, which is that they are typically stored in the database along with the hash itself (maybe in a separate table, but still accessible to the same database user out of necessity) - thereby dramatically decreasing their usefulness.

  5. Picci Says:

    Random comment about collisions:
    1. There are programs out there that will bind a binary to another binary.
    2. The most widespread way of checking if you have the “right” binary is to hash it (md5?)
    3. There are programs out there that will generate a colliding md5 by adding junk to your binary, thus creating a second binary that has the same md5 as the original program.

    Now my question is…. how hard would it be to get a double collision on two hashing algo’s simultaneously ? (i.e sha1 and md5)
    (too bad it’s probably close to a few million years…)

  6. RSnake Says:

    @Picci - You don’t have to do all the processing of both algorithms at the same time if you don’t want. You can build a list of potentials in one algorithm and then test against that list in the other. Of course the larger the list the better chance of a collision…

    Yeah, that’s a computationally expensive problem, but it would be huge for defeating a lot of checksum checking programs out there.

  7. Picci Says:

    –> “You can build a list of potentials in one algorithm and then test against that list in the other. Of course the larger the list the better chance of a collision…”

    That’s what I was thinking… but that would mean generating billions of colliding md5’s and then checking if casually a sha1 collides as well… pretty rare situation. It would be nice to calculate how much that could potentially take in terms of time/computational power.
    If you’re lucky enough, someone will just check one hashing algo and stick with that: even if you don’t get an exact collision on a second hashing algo but it just “looks” similar, it’ll be enough. (Remember the fippling ltteres inisde a word doesn’t bthoer the human brain too much… will someone notice a few letters difference in a sha1 when the md5 coincides perfectly?)… This isn’t good enough against fully automated checksum programs, but is still effective against “human-powered” comparisons.

  8. RSnake Says:

    @Picci - I think that will depend on the size of whatever it is you’re hashing. An 8 char string will take a lot less computational time than a 200 Meg application, for instance. But yeah, I can completely see where you’re going with this. This would break a lot of content integrity management systems that only relied on a few hashes. I’m not sure what things like Tripwire and Ftimes use, but those would both be good candidates. My guess is what you want to do would probably way more processing power than it’s worth, unless it were a standard application you’d find on any system, like cmd.exe or explorer.exe.

  9. Picci Says:

    Yeah, i know i’m just messing with something potentially useless or that would anyways take a close to infinite time to generate.. but still, i’m enjoying it:

    In conclusion, it doesn’t seem to be doable.. it takes hours!! ( just to get 1 matching md5 )
    http://www.mscs.dal.ca/~selinger/md5collision/

    %CPU COMMAND
    100 evilize

    …and I know i’m gonna get a call soon from the guy who’s server i’m running that on :P

  10. Picci Says:

    (sidenote)
    I had once developped a smal web app to check the integrity of a website by comparing the md5’s of all files in /htdocs, for example, with ones stored in an external db… now I figure that more than 1 hashing algo should’ve been used to make it secure.