GeodSoft

Good and Bad Passwords How-To

Creating and Optimizing High Quality Password Cracking Dictionaries and Transformation Rules
Creating Cracking Dictionaries

A cracker should have more than one password dictionary. One should be relatively small and focus on high yield words. Another should attempt to be comprehensive. "Word" is used here in the broadest possible sense and includes any character sequence that appears in any electronic list referred to as a dictionary. 111111 is a word because it appears in Dan Klein's numbers dictionary. Names are of course words. Names include the names of actors, famous people and sports figures, and every other subgrouping that regularly gets listed separately as an example of a bad password. Names also include the names of places and things, both real and fictional.

The high yield password cracking dictionary should start with a standard small dictionary. /usr/dict/words is probably as good as any. It then must include names. The source should be census information from the country where the target computer is located. In the U.S. the only real question is how much of the census lists to include. Given the very high efficiency of Dan Klein's list of more than two thousand common names, I would use all the common male and female first names from the census lists.

Dan Klein's surname list is very small (160 entries) and every entry looks like a very common last name; this is the second highest yield dictionary he used. The Census surname list is much larger than Dan Klein's complete dictionary. If I had a large database of accounts and passwords, I'd test the first 100, 1,000, 5,000 and 10,000 entries, and the bottom half of the list, to get a sense of where results start diminishing rapidly. Absent a test, I'd use the first 1,000. It's clear that a systematic list of very common last names will be very productive.
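The cutoff test described above can be sketched roughly as follows. This is only an illustration, assuming a surname list ranked most common first and a set of already known passwords from a test database. The function name `yield_by_cutoff` and the sample data are hypothetical, and matching is a plain case-insensitive comparison with no transformation rules applied.

```python
def yield_by_cutoff(ranked_names, known_passwords, cutoffs):
    """Count how many known passwords match the first N ranked names.

    ranked_names: list of names, most common first (assumed ordering).
    known_passwords: iterable of plaintext passwords from a test database.
    cutoffs: list sizes to test, e.g. [100, 1000, 5000, 10000].
    Returns a dict mapping each cutoff to the number of matching passwords.
    """
    passwords = [p.lower() for p in known_passwords]
    results = {}
    for n in cutoffs:
        # Only the first n names are considered for this cutoff.
        top = {name.lower() for name in ranked_names[:n]}
        results[n] = sum(1 for p in passwords if p in top)
    return results


# Tiny made-up example; real lists would hold thousands of entries.
names = ["smith", "johnson", "williams", "brown", "jones", "zuckerman"]
passwords = ["Smith", "jones", "x7#kQ", "brown", "zuckerman"]
hits = yield_by_cutoff(names, passwords, [2, 4, 6])
print(hits)
```

When the count stops rising noticeably from one cutoff to the next, the extra names are probably no longer earning their place in the high yield dictionary.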

I'd also put Dan Klein's very high yield "Phrases and patterns" dictionary into my high yield one, though I might review and update it first. I'd consider the character sequences and numbers dictionaries even though their yield was not very good.

I'd also collect several common password lists for inclusion in the high yield dictionary. I would combine them and strip out ordinary words and names before merging the result into the high yield dictionary. I'd want to be able to track the results for things like: ( !@#$% 123123 abc123 apollo13 hal9000 jordan23 qwerty star69 test123 thx1138 win95 ) and anything else that might really be a common password that's not a word in its normal meaning.
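A minimal sketch of that merge-and-strip step is shown here, assuming the lists are already loaded into memory as Python lists. The name `merge_and_strip` and the sample entries are hypothetical; a real run would read each list from a file.

```python
def merge_and_strip(password_lists, ordinary_words):
    """Merge several password lists, dropping duplicates and ordinary words.

    password_lists: iterable of iterables of strings.
    ordinary_words: words and names already covered by other dictionaries.
    Returns a sorted list of the entries that remain.
    """
    ordinary = {w.lower() for w in ordinary_words}
    merged = set()
    for plist in password_lists:
        for entry in plist:
            entry = entry.strip()
            # Keep only non-empty entries that are not plain words or names.
            if entry and entry.lower() not in ordinary:
                merged.add(entry)
    return sorted(merged)


list_a = ["qwerty", "abc123", "jordan", "thx1138"]
list_b = ["abc123", "apollo13", "password", "hal9000"]
words = ["jordan", "password"]  # stand-in for the word and name dictionaries
special = merge_and_strip([list_a, list_b], words)
print(special)
```

Keeping the stripped list as a separate file until the final merge makes it possible to track how many passwords these non-word entries find on their own.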

I'd want to consider some of the other things Dan Klein used such as myths and legends, sports, science fiction, movies and actors, cartoons, famous people. I'd want to find up-to-date and more comprehensive sources for these and probably want to do some preliminary testing before including any in the high yield dictionary. If I were considering movies and actors I'd also want to think about TV and music. Recommendations regarding passwords always seem to be both redundant and incomplete; Klein's dictionary choices also seem like this.

For the comprehensive list, I'd want to use just about any list I could find that wasn't gibberish or random character sequences. I'd start with any unabridged dictionaries I could find then add in all the specialty lists already mentioned as well as any others that were available. About the only question would be whether or not to add all the foreign language dictionaries that are available on-line. The decision would be determined by performance issues.

Optimizing Cracking Dictionaries

It does not matter if the comprehensive dictionary grows to 10 million or a hundred million words or how many transformations we can program. Any brute force sequence will eventually include all possible words and transformations of words but these will be a tiny fraction of all the character sequences generated. Any imaginable character sequence derived from a word, ordered sequence or other meaningful pattern is more likely to match human chosen passwords than the random sequences that have no apparent meaning, derivation or structure.

The practical limit of a dictionary based approach is reached when the dictionary size combined with the rule set and the computational overhead of the rule set will take more resources than the cracker has or is willing to expend on the search. This will be implementation dependent and could be dramatically different for different cracking tools.
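As a rough illustration of that limit, the arithmetic can be sketched like this. The figures are invented, and the model assumes each rule produces exactly one candidate per word and that the tool tests candidates at a constant rate, which real tools will only approximate. The function names are hypothetical.

```python
def estimated_hours(word_count, rule_count, guesses_per_second):
    """Rough run time, assuming one candidate per word per rule."""
    candidates = word_count * rule_count
    return candidates / guesses_per_second / 3600


def fits_budget(word_count, rule_count, guesses_per_second, budget_hours):
    """True if the estimated run time is within the available hours."""
    hours = estimated_hours(word_count, rule_count, guesses_per_second)
    return hours <= budget_hours


# Assumed figures for illustration only: 10 million words, 100 rules,
# and a tool that tests 50,000 candidates per second.
hours = estimated_hours(10_000_000, 100, 50_000)
print(round(hours, 2))
```

Because the overhead of individual rules differs between tools, an estimate like this is only a starting point; the actual runs still need to be timed.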

The cracker's job is to determine how to make the most effective use of the newly created dictionaries. Starting with a high yield dictionary as described and the default rules of any cracking tool, any normal password file of reasonable size will contain passwords that are cracked. By normal, we mean containing passwords created the way that most people, including system administrators, form passwords, as opposed to passwords created by someone trained to form passwords a cracking tool can't crack.

If a root or administrator password is cracked, the cracker has accomplished his or her purpose and nothing further needs to be done. If the cracker has a significant pre-existing database of accounts and passwords, the following approach should already have been applied to this.

After running and timing the high yield dictionary with the standard rules, the comprehensive dictionary should be run with the standard rules against the same set of accounts and passwords and the results timed. While this is in progress, new rules should be developed for use with the high yield dictionary. New rules should be plausible transformations. The cracker should use his or her knowledge of the cracking tool and add enough rules that the high yield dictionary and extended rule set run in about the same amount of time as the comprehensive dictionary with the standard rules. Both processes need to be carefully timed.
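A first guess at how many rules the high yield dictionary can carry in the same time might be worked out as below. This sketch assumes run time is proportional to the number of candidates (words times rules), which ignores per-rule overhead, so it only gives a starting figure to refine by timing. The name `rule_budget` and the numbers are hypothetical.

```python
def rule_budget(comprehensive_words, standard_rules, high_yield_words):
    """Rules the small dictionary can run for the same candidate count."""
    total_candidates = comprehensive_words * standard_rules
    return total_candidates // high_yield_words


# Assumed sizes: a 2 million word comprehensive dictionary with 50
# standard rules, against a 40,000 word high yield dictionary.
budget = rule_budget(2_000_000, 50, 40_000)
print(budget)
```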

When both are completed, the number of successfully cracked passwords should be compared. The approach that gets the most passwords for the same amount of CPU time is the more efficient and is where further efforts should be concentrated. The larger the password database, the more reliable the results will be. If the high yield dictionary got half as many passwords as the comprehensive dictionary but in a quarter of the time, it is twice as efficient per unit of CPU time, and more rules should be added for use with the high yield dictionary. As long as additional rules can be formulated that yield new passwords and the efficiency advantage over the comprehensive dictionary is maintained, this approach should be pursued. Eventually a point of diminishing returns is bound to be reached, at which point efforts should be shifted to the comprehensive dictionary.
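The comparison comes down to passwords per unit of CPU time. A small sketch, with hypothetical function names and made-up counts and timings, using the half-the-passwords, quarter-of-the-time case:

```python
def efficiency(cracked, cpu_seconds):
    """Passwords cracked per CPU second."""
    return cracked / cpu_seconds


def more_efficient(high_yield, comprehensive):
    """Each argument is a (cracked_count, cpu_seconds) pair."""
    hy = efficiency(*high_yield)
    comp = efficiency(*comprehensive)
    if hy > comp:
        return "high yield"
    if comp > hy:
        return "comprehensive"
    return "tie"


# Half the passwords in a quarter of the time (invented numbers).
result = more_efficient((50, 900), (100, 3600))
ratio = efficiency(50, 900) / efficiency(100, 3600)
print(result, round(ratio, 2))
```

Recomputing the same figure after each batch of new rules shows whether the high yield dictionary is keeping its advantage or has reached the point of diminishing returns.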

Once efforts focus on the comprehensive dictionary, either because it was more efficient initially or because rules have been pushed as far as practical on the high yield dictionary, there are two options. One is to continue to grow the comprehensive dictionary. The other is to start adding the more productive of the new rules tested on the high yield dictionary. If word sources have not been exhausted, the obvious approach is to grow the dictionary. If the dictionary is single language, adding foreign language dictionaries might be productive. If significant amounts are added, careful attention should be paid to the impact on efficiency. If a new addition does not produce results or significantly lowers the efficiency, it should probably be removed.

At some point there will be no more words or adding them will be less productive than adding new rules. This of course assumes that the initial runs were completed in an acceptable amount of time and that the cracker is willing to expend more resources to get more passwords, or to get the root or administrator password if it has not yet been obtained.

By carefully following a procedure like this, it should be possible to find a combination of rule set and dictionary size that is reasonably optimal, i.e. produces the most passwords in the shortest time. As the tools are learned, it should be possible to adjust the dictionary size and / or rule set to get the most passwords in whatever amount of time a particular target is worth.


Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in (or These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of (or cgi-bin/ from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.
