I’ve been interested in password research for over a decade. One of my very first “hello world”-type programs was a trojan horse that emulated a UNIX TTY session (figuring out how to suppress keyboard echo with “stty -echo” was the only tricky part). I’ve spent hundreds of hours looking at password uniqueness, entropy, etc… think of it as part of my passion. However, one aspect of passwords has eluded me for years, simply because I don’t have the necessary data to test my theory.
Once upon a time I was talking with a female friend of mine and I told her I could probably guess her password. Of course I was completely kidding, but she thought I was serious. Playing along, I said, “It is a word followed by a number, maybe two numbers.” Her eyes got big - maybe I was onto something. “It’s a feminine word, no more than six characters.” You should have seen her eyes at this point. “It’s not a word like pony, that’s too little-girl for you…” Of course I never guessed her password, and I had to admit at this point that I was full of crap, but herein lies my dilemma - was I full of crap?
If you look at user statistics you can often derive certain things about people. Younger people tend to like different things than older people do. So, too, security people tend to pick harder passwords than non-security people. By knowing a little bit of demographic information about users, you can quickly narrow down the possibilities (I think). Of course I have no proof of this. I would need a huge database of user interests, language, age, sex, profession, and of course passwords… The more data the better, including things like the password policies of the sites the passwords came from.
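To put a rough number on "narrowing down the possibilities", here is a minimal Python sketch. It assumes the guess from the story above holds (a word of at most six characters followed by one or two digits) and uses a tiny made-up word list; the function names and the words are purely illustrative, not real data.

```python
import math


def pattern_candidates(words, max_word_len=6, max_digits=2):
    """Yield every 'word + 1..max_digits digits' guess for words that are short enough."""
    for word in words:
        if len(word) > max_word_len:
            continue  # the guess was "no more than six characters"
        for n_digits in range(1, max_digits + 1):
            for number in range(10 ** n_digits):
                # Zero-pad so that both "angel7" and "angel07" are covered.
                yield f"{word}{number:0{n_digits}d}"


def brute_force_space(alphabet_size, max_len):
    """Count all strings of length 1..max_len over an alphabet of the given size."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


# Hypothetical word list standing in for "words this demographic might pick".
words = ["flower", "angel", "butterfly", "star"]

candidates = list(pattern_candidates(words))
naive = brute_force_space(alphabet_size=95, max_len=8)  # printable ASCII, up to 8 chars

print(f"pattern guesses: {len(candidates)} (~{math.log2(len(candidates)):.1f} bits)")
print(f"naive brute force: ~{math.log2(naive):.1f} bits")
```

Each word that passes the length check contributes 110 guesses (10 one-digit plus 100 two-digit suffixes), so the three short words here give 330 candidates. Whether real passwords actually follow the pattern is exactly the part I can't test without data.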
Of course there will be significant anomalies, like the fact that people often use a password that mixes in the name of the site they are on (“myspace01”), the common passwords like “password1”, and the obscenity passwords. And you’ll never guess whether people use random numbers or their cat’s name, but you can get close (I think). Has anyone done this sort of research before? I’d love to get my hands on some user data like this for testing (no usernames, email addresses, or details of where the passwords came from required). Anyway, food for thought.
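If I ever did get such a data set, the anomalies mentioned above would probably need to be set aside before looking for demographic patterns. A rough Python sketch of that filtering step, assuming a hypothetical `classify` helper and a made-up common-password list (an obscenity list could be handled the same way):

```python
def classify(password, site_name, common_passwords):
    """Label a password as 'common', 'site-derived', or 'keep' (hypothetical labels)."""
    lowered = password.lower()
    if lowered in common_passwords:
        return "common"
    if site_name.lower() in lowered:
        return "site-derived"  # e.g. "myspace01" on MySpace
    return "keep"


# Illustrative inputs only; not taken from any real data.
common = {"password1", "123456", "letmein"}
for pw in ["myspace01", "password1", "angel07"]:
    print(pw, "->", classify(pw, "MySpace", common))
```

Only the passwords labeled "keep" would then be compared against age, profession, and so on.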