GeodSoft

Good and Bad Passwords How-To

Cracking "Good" Passwords With Custom Programmed Dictionaries

Passwords made from two unrelated short words and a non letter are often recommended as good. We show that all such passwords can be cracked in an afternoon on a low end desktop PC. The use of two non letters makes much stronger passwords but they are still within the reach of desktop PC technology. Some alternative ways of creating good passwords are examined, but our final conclusion is that humans just aren't very good at creating strong passwords and certainly not strong passwords that have a reasonable chance of being remembered.

Manual Passwords, Too Weak

We've now looked at a number of ways not to form passwords and in the process have eliminated nearly every method that most people like to use to make passwords. A common suggestion for creating good passwords is to take two short, unrelated words and combine them with one or more digits or punctuation or symbol characters. Sometimes the suggestion is explicit that there should be both a digit and a symbol (including punctuation). Sometimes both go between the words, and sometimes one or the other goes at the beginning or end of the password. I don't recall seeing suggestions that both should go at the beginning or end, or that one should go at the front and the other at the end so that the two words run together.

Using a '#' to stand for any non letter character, the suggested patterns look like:

worda#wordb
worda##wordb
#worda#wordb
worda#wordb#

I've used the linux.words dictionary and extracted all the 2, 3, 4 and 5 character words from it. Reviewing these lists, I'd say that most of the words are common but that a fifth to a third are not what I think of as common words. The lists contain enough uncommon words that most people would have some difficulty thinking of short words that aren't in them, so for a significant majority of users, both words would come from these lists. I don't know whether it's 60% or 98%, but enough people following the previous advice will be using words from these lists that it's worth examining the feasibility of cracking these passwords by creating custom dictionaries or having the cracking tool generate such passwords.

The linux.words list contains 49 two character words, 536 three character words, 2236 four character words and 4174 five character words. Making every possible combination of two and three, two and four, two and five, three and three, three and four, and four and four character words results in a little over 8.3 million combinations. If you separate every combination with all 42 non letter characters (10 digits and 32 symbols and punctuation marks, not including the space) there are just over 350 million combinations.
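The 8.3 million figure can be checked from the per-length word counts alone. A minimal Python sketch using only the counts given above; it assumes worda and wordb may appear in either order, so each unequal-length pairing (2 and 3, 3 and 4, and so on) is counted twice, while same-length pairings are a single cross product that includes a word paired with itself:

```python
# Number of 2-5 letter words in linux.words, as reported in the text.
COUNTS = {2: 49, 3: 536, 4: 2236, 5: 4174}

# Length combinations used in the text.
LENGTH_PAIRS = [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)]

NON_LETTERS = 42  # 10 digits + 32 ASCII symbols/punctuation, space excluded


def ordered_pairs(counts, length_pairs):
    """Count ordered (worda, wordb) pairs.

    Unequal lengths are counted twice (e.g. 2+3 and 3+2); equal lengths
    are a single cross product, which includes pairing a word with itself.
    """
    total = 0
    for a, b in length_pairs:
        n = counts[a] * counts[b]
        total += n if a == b else 2 * n
    return total


pairs = ordered_pairs(COUNTS, LENGTH_PAIRS)
print(f"{pairs:,} word pairs")                        # 8,364,692 word pairs
print(f"{pairs * NON_LETTERS:,} with one separator")  # 351,317,064 with one separator
```

Counting both orders is what brings the total to "a little over 8.3 million"; counting each unordered combination once would give only about 6.8 million.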

In about a half hour, I wrote a Perl script to generate all of these. It took the script between 30 and 45 minutes to write them to disk, or 16 minutes when generating them in memory without writing them out. The saved file was 3.7 gigabytes. L0phtCrack takes about 15 minutes to read the file initially, and its counters overflow with this number of words in the dictionary file. Despite the counter overflow, after the initial read it processed about 5 million words a minute, or somewhat over 70 minutes for the full list, and cracked two test accounts I'd created using passwords formed like this. A single non letter separating two words simply isn't good enough, not when a first attempt covered the most likely combinations in about three hours on a very modest machine (PIII 500). These can be cracked almost as easily as single dictionary word based passwords.
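The Perl script itself isn't shown. The following Python generator is a rough, hypothetical equivalent, not the original: it assumes one word per line in the input, keeps only ASCII alphabetic entries (lower-cased and de-duplicated, so its counts may differ slightly from the figures above), and streams candidates instead of holding 350 million strings in memory. All function names are illustrative.

```python
import itertools
import string

# 42 non-letters: 10 digits plus the 32 characters in string.punctuation.
NON_LETTERS = string.digits + string.punctuation


def by_length(lines, lengths=(2, 3, 4, 5)):
    """Group ASCII alphabetic words by length, lower-cased and de-duplicated."""
    groups = {n: set() for n in lengths}
    for line in lines:
        word = line.strip().lower()
        if word.isascii() and word.isalpha() and len(word) in groups:
            groups[len(word)].add(word)
    return {n: sorted(ws) for n, ws in groups.items()}


def one_separator(groups, length_pairs, seps=NON_LETTERS):
    """Yield worda#wordb candidates lazily, covering both word orders."""
    for a, b in length_pairs:
        orders = [(a, b)] if a == b else [(a, b), (b, a)]
        for la, lb in orders:
            for wa, wb in itertools.product(groups[la], groups[lb]):
                for sep in seps:
                    yield f"{wa}{sep}{wb}"


def write_candidates(words_path, out_path, length_pairs):
    """Stream every candidate to out_path, one per line."""
    with open(words_path) as fh:
        groups = by_length(fh)
    with open(out_path, "w") as out:
        for candidate in one_separator(groups, length_pairs):
            out.write(candidate + "\n")
```

For example, `write_candidates("linux.words", "pairs.txt", [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)])` would produce a file comparable to the one described; both paths are placeholders.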

Manual Passwords, Better

The number of passwords jumps dramatically with two words and two non letter characters. 8.3 million is multiplied by 42 * 42 * 3, giving 44 billion. The three is for the three different arrangements of non letter characters. At this number, saving the list becomes a serious issue for just about any person or organization, as it will require half a terabyte of disk space. It's been my experience that with loop and integer arithmetic intensive processes, compiled C programs run about 40 times faster than almost identically coded Perl scripts. The Perl script generated about 22 million passwords a minute (350 million in 16 minutes), so a C program could generate the simpler passwords at about 850 million per minute (on a PII 450). Even if the extra character imposed a significant overhead, 500 million a minute seems quite conservative, and that is less than an hour and a half for 44 billion.

The cryptography side is more CPU intensive. Returning to the nice round 100,000 per second, generating the hashes for our 44 billion passwords will take about 5 days. But we also know that rate is just a crude estimate that could easily be off by more than an order of magnitude.
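The time estimates in the last two paragraphs follow from simple division. A minimal sketch that takes the text's own rates (500 million generated candidates per minute, 100,000 hashes per second) as parameters, since both are rough guesses that could easily be off:

```python
SECONDS_PER_DAY = 86_400


def minutes_to_generate(count, per_minute):
    """Minutes needed to generate count candidates at per_minute."""
    return count / per_minute


def days_to_hash(count, per_second):
    """Days needed to hash count candidates at per_second."""
    return count / per_second / SECONDS_PER_DAY


# 8,364,692 word pairs * 42 * 42 separator pairs * 3 arrangements
candidates = 8_364_692 * 42 * 42 * 3
print(f"{candidates / 1e9:.1f} billion candidates")             # 44.3 billion candidates
print(f"{minutes_to_generate(candidates, 500e6):.0f} minutes")  # 89 minutes
print(f"{days_to_hash(candidates, 100_000):.1f} days")          # 5.1 days
```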

As long as I can remember, combining two unrelated words with two non letters has been one of the primary recommendations for creating strong passwords. I know that if I were seriously cracking passwords belonging to others, I would replace the brute force password generator with one that made passwords from short words. It's hard to see how it could be less productive than brute force, and it might pick up some root or administrator passwords that their owners thought were really good. Of course the yield or efficiency, due to the very large password universe, would be very poor, and all the standard dictionary methods should be tried first.

At least one other factor should be covered. The 42 * 42 figures are based on any non letter character in either non letter character position. This also means allowing both characters to be the same and both to be digits. 00 and 11 would be allowed between the words, and worda7wordb7 would also be OK. If we stipulate that each password must contain a digit and a symbol or punctuation mark, the multiplier becomes 8.3 million * 32 * 10 * 3, or about 8 billion passwords. This is significantly less than the 44 billion we were talking about. If we decide to allow two symbols but never two digits, the numbers are 8.3 million * 42 * 32 * 3, or just under 34 billion.

This shows how important it is that the cracker's assumptions match the methods by which the passwords in the file to be cracked were created. If the good guys use the broadest rules and thus create the largest password sets, there will necessarily be some relatively weak passwords that are acceptable. In this case a cracker working with the "small" set of 2.5 billion passwords that use just the words and two digits (8.3 million * 10 * 10 * 3) will get some passwords but completely miss any with symbols or punctuation. If you use the compromise rule set allowing two symbols but not two digits, then unless a cracker has inside information, they will have to search the larger universe, wasting processing time on two digit passwords that don't exist.
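Enumerating the pairs of non-letter characters directly is a useful check on these rule sets. A small Python sketch, assuming the 42 non-letters are the ASCII digits plus the 32 characters of Python's `string.punctuation`. Because the two characters occupy distinct positions, direct enumeration counts digit-then-symbol and symbol-then-digit separately, so it reports 640 pairs with exactly one digit (versus the shorthand 32 * 10 = 320) and 1,664 pairs that are not both digits (versus 42 * 32 = 1,344). The 100 all-digit pairs reproduce the 2.5 billion figure.

```python
import itertools
import string

DIGITS = set(string.digits)             # 10 characters
SYMBOLS = set(string.punctuation)       # 32 characters
NON_LETTERS = sorted(DIGITS | SYMBOLS)  # 42 characters
WORD_PAIRS = 8_364_692
ARRANGEMENTS = 3


def count_pairs(rule):
    """Count ordered (first, second) non-letter pairs accepted by rule."""
    return sum(1 for a, b in itertools.product(NON_LETTERS, repeat=2) if rule(a, b))


rules = {
    "any two": lambda a, b: True,
    "two digits": lambda a, b: a in DIGITS and b in DIGITS,
    "one digit, one symbol": lambda a, b: (a in DIGITS) != (b in DIGITS),
    "not two digits": lambda a, b: not (a in DIGITS and b in DIGITS),
}

for name, rule in rules.items():
    n = count_pairs(rule)
    total = WORD_PAIRS * n * ARRANGEMENTS / 1e9
    print(f"{name:22} {n:5} pairs  {total:6.1f} billion")
```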

There are additional ways to complicate the cracker's problem. Some two word patterns are never mentioned:

wordawordb##
#wordawordb#
##wordawordb

It's true that the words are run together, but so what? Is this going to make them harder to remember? Can we not tell where one ends and the other begins? Does it matter? The fact is these new patterns double the size of the password universe a cracker has to match. There is one group that we do need to pay special attention to: the combinations of two three letter words and of two four letter words include pairs in which the same word is used twice.
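All six arrangements are easy to produce once a word pair and two non-letter characters are chosen. A minimal sketch; the function name is illustrative. Calling it with the same word twice produces the duplicate-word forms just mentioned.

```python
def arrangements(worda, wordb, x, y):
    """Return the six placements of non-letters x and y around two words."""
    return [
        f"{worda}{x}{y}{wordb}",  # worda##wordb
        f"{x}{worda}{y}{wordb}",  # #worda#wordb
        f"{worda}{x}{wordb}{y}",  # worda#wordb#
        f"{worda}{wordb}{x}{y}",  # wordawordb##
        f"{x}{worda}{wordb}{y}",  # #wordawordb#
        f"{x}{y}{worda}{wordb}",  # ##wordawordb
    ]


print(arrangements("bad", "tuba", "&", "1"))
# ['bad&1tuba', '&bad1tuba', 'bad&tuba1', 'badtuba&1', '&badtuba1', '&1badtuba']
print(arrangements("tuba", "tuba", "&", "1")[3])  # tubatuba&1
```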

Though there is no evidence that they are currently doing so, cracking tools could get these by doubling the word and prepending or appending arbitrary character sequences. L0phtCrack can append arbitrary character sequences of arbitrary length but can't double words. Crack 5 and John the Ripper can double words and prepend or append short digit sequences. It's not clear their rule syntax provides a convenient method for making mixed symbol and digit sequences.

If you don't allow duplicate words or all-digit sequences, these passwords are much stronger than any that can be derived from single word dictionaries. Current tools cannot transform any single word to create these sequences. Separate words must be combined programmatically to obtain them.

It's very important not to confuse some of the DONT's that apply to variations on single dictionary words with superficially similar character sequences that appear as part of programmatically generated patterns. abduct66 is fundamentally different from bitaid66. The former is a trivial transformation of a common dictionary word, which cracking programs may find in a few minutes with current dictionaries and rules. The only ways to get the latter are brute force, which is unlikely to succeed even with a character set lacking symbols and punctuation, or a custom programmed dictionary in which multiple words and digits have been combined for the express purpose of cracking passwords. Without insider information, it's unlikely that bitaid66 will ever be found, because it does not fit any standard recommendations for forming good passwords.

Fortunately for the good guys, we haven't mentioned the case of letters as a factor in these passwords made from short words plus random non letter characters. If we start mixing the case of the letters in the words, we can add about two orders of magnitude of complexity to the cracker's problem.

While truly random mixed case on 8 letters expands the possibilities by 256 times (2^8), there is nothing in passwords that I find harder to remember or type than truly arbitrary case words. There is an approach that significantly complicates the cracker's job while not being so demanding of our memory and fingers: limit the capital letters to the first, last, inner or outer positions of the two words. Some examples will help: 1Bad&Tuba = first, *losT)baG = last, bolD^2Rug = inner, [3HateraT = outer.

In the examples, I always uppercased both positions to make what I meant clear. In practice, the choice should be between 1) either or both positions, or 2) either, both or neither. Keeping no upper case as an option once again creates more possibilities, but it means some individual passwords will be all lower case and thus more likely to be found by a cracker who tries only lower case. The either or both approach multiplies the choices by three and is the safer option.

But it's better than that. We don't tell the cracker which approach we use, and different people use different approaches; with four schemes (first, last, inner, outer) and three choices each, the choices are now multiplied by 12. To get these with any efficiency a cracker needs three custom programmed dictionaries. The first has all lower case, the second has the first, last, inner and outer upper case combinations, and the third all the other mixed case combinations. That's likely to be a pain to program; if we're lucky, the cracker jumps from all lower case to full mixed case.
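The capitalization schemes can be generated mechanically, which is roughly what a cracker's second custom dictionary would have to do. A minimal sketch applying the first, last, inner and outer schemes with "either or both" choices to an unseparated word pair (separators would be added in a separate step); the names are illustrative. Note that the single-letter choices overlap between schemes: uppercasing only the first letter of worda is both a "first" and an "outer" choice. So while a user faces 12 choices, a given word pair yields 8 distinct strings, and a cracker only needs to generate those.

```python
# Each scheme names two letter positions as (word index, character index);
# character index -1 is the last letter of that word.
SCHEMES = {
    "first": ((0, 0), (1, 0)),
    "last": ((0, -1), (1, -1)),
    "inner": ((0, -1), (1, 0)),
    "outer": ((0, 0), (1, -1)),
}


def case_variants(worda, wordb):
    """Apply each scheme with an 'either or both' choice: 4 x 3 = 12 results."""
    results = []
    for p, q in SCHEMES.values():
        for chosen in ((p,), (q,), (p, q)):
            letters = [list(worda), list(wordb)]
            for w, i in chosen:
                letters[w][i] = letters[w][i].upper()
            results.append("".join(letters[0]) + "".join(letters[1]))
    return results


variants = case_variants("bad", "tuba")
print(len(variants), len(set(variants)))  # 12 8
print(sorted(set(variants)))
```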

Let's review. There are 8.3 million word pairs. These are combined with two non letters, which cannot both be digits. There are six patterns created by where the non letters are placed relative to the words. We're using 12 capitalization variations. This is 8.3 million * 42 * 32 * 6 * 12, which gives 809 billion possible passwords. This is a lot better than dictionary derived passwords, but if our encryption estimates are off by very much, or a cracker has lots of computing power or is willing to wait a while (it's 94 days at 100,000 passwords per second), it's still clearly within the realm of today's technology. If we really want our passwords to be safe, we have to do better, a lot better.
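The running total can be checked directly. A minimal sketch using the exact ordered word-pair count (8,364,692) and the multipliers stated in this paragraph. The hashing rate is the same rough 100,000 per second assumption, so the loop also shows a couple of faster rates:

```python
WORD_PAIRS = 8_364_692  # ordered pairs of 2-5 letter words from linux.words
SEPARATORS = 42 * 32    # two non-letters, not both digits (the text's multiplier)
ARRANGEMENTS = 6        # where the two non-letters sit relative to the words
CASE_CHOICES = 12       # first/last/inner/outer x either/both

total = WORD_PAIRS * SEPARATORS * ARRANGEMENTS * CASE_CHOICES
print(f"{total / 1e9:.0f} billion candidates")  # 809 billion candidates

for rate in (100_000, 1_000_000, 10_000_000):   # hashes per second
    days = total / rate / 86_400
    print(f"{rate:>10,} hashes/s: {days:,.1f} days")
```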

Also, if decent frequency tables could be found for short words (like the Census name lists), it would be possible to build several smaller dictionaries and process the most likely word combinations first. This could dramatically alter the time it takes to get at least some of the passwords created as we have described.
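No suitable frequency table is cited, so the sketch below takes one as a hypothetical input: a dict mapping each word to a relative frequency. It yields word pairs in descending order of the product of their frequencies, treating the two words as independent (itself an assumption). A best-first search over the sorted word list means the most likely pairs come out first without sorting every combination up front.

```python
import heapq
import itertools


def pairs_by_likelihood(freq):
    """Yield (worda, wordb) pairs in descending order of freq[worda] * freq[wordb].

    freq is a hypothetical dict of word -> non-negative relative frequency.
    Best-first search over the grid of sorted indices: a cell's score never
    exceeds that of the cells above or left of it, so pops come out in order.
    """
    words = sorted(freq, key=freq.get, reverse=True)
    if not words:
        return
    heap = [(-freq[words[0]] ** 2, 0, 0)]
    seen = {(0, 0)}
    while heap:
        _, i, j = heapq.heappop(heap)
        yield words[i], words[j]
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(words) and nj < len(words) and (ni, nj) not in seen:
                seen.add((ni, nj))
                score = freq[words[ni]] * freq[words[nj]]
                heapq.heappush(heap, (-score, ni, nj))


# Hypothetical frequencies, for illustration only.
sample = {"the": 50, "cat": 5, "dog": 4, "zax": 1}
for pair in itertools.islice(pairs_by_likelihood(sample), 3):
    print(pair)
```

Splitting the output into several smaller dictionaries, as suggested above, amounts to cutting this stream at chosen score thresholds.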

Alternative Manual Passwords

Among the discussions of how to create good passwords, there are sometimes suggestions for creating passwords from the first letters or initial few characters of words in sentences or phrases that the user can remember. No specific suggestions were included in the list of Common Password DO's because no single suggestion is nearly as common as combining multiple words with non letters and there is no easy way to phrase such suggestions that won't result in a method that can be programmed to create dictionaries to crack the resulting passwords. Unless the recommendation also includes explicit discussion of minimum password length and the use of mixed case and non letters, there is a good chance resulting passwords may fall to brute force attacks.
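A first-letters rule shows how easily such advice turns into a program. A minimal sketch, assuming the phrase source is something like a list of well-known quotations or song lyrics (a hypothetical input). The output is only as long as the phrase has words, which is why explicit length guidance matters.

```python
import re


def initials(phrase, n=1):
    """Take the first n characters of each word in a phrase, keeping case and digits."""
    return "".join(w[:n] for w in re.findall(r"[A-Za-z0-9']+", phrase))


print(initials("Mary had a little lamb, its fleece was white as snow"))  # Mhallifwwas
print(initials("Four score and seven years ago", n=2))                   # Foscanseyeag
```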

For the next few years, any method that helps a user remember passwords that are of a reasonable length (minimum 10, 12+ preferred) and character diversity (4 character groups = full keyboard), and that do not contain dictionary words or simple transformations of dictionary words, will likely create passwords of moderate to good strength. The actual strength of the password will depend on its length and character diversity, as well as the avoidance of dictionary words. It's surprising just how obscure the result of any two of the following transformations of a dictionary word is likely to be: reverse, rotate, keyboard shift, collating sequence shift, drop or add a character. Passwords created this way look almost random to a human but are easily found by cracking tools. To be fair, program generated passwords may contain such sequences by chance, as may passwords derived from phrases or sentences.
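To see why two stacked transformations look random to a person but not to a program, consider this small sketch. The precise definitions are assumptions: rotate moves the first character to the end, the collating sequence shift adds one to each ASCII code, the keyboard shift moves each lower-case letter one key right on its QWERTY row (wrapping at the end), and only dropping the last character is shown. Real cracking tools express such operations in their own rule languages; this is not their syntax.

```python
import itertools

QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def reverse(w):
    return w[::-1]


def rotate(w):
    """Move the first character to the end."""
    return w[1:] + w[:1]


def collate_shift(w):
    """Replace each character with the next one in ASCII order."""
    return "".join(chr(ord(c) + 1) for c in w)


def keyboard_shift(w):
    """Replace each letter with the key to its right on its QWERTY row, wrapping."""
    out = []
    for c in w:
        for row in QWERTY_ROWS:
            if c in row:
                out.append(row[(row.index(c) + 1) % len(row)])
                break
        else:
            out.append(c)  # not a lower-case letter: leave unchanged
    return "".join(out)


def drop_last(w):
    return w[:-1]


TRANSFORMS = (reverse, rotate, collate_shift, keyboard_shift, drop_last)


def two_step_variants(word):
    """Yield the result of every ordered pair of transformations (5 x 5 = 25)."""
    for first, second in itertools.product(TRANSFORMS, repeat=2):
        yield second(first(word))


print(keyboard_shift(reverse("abduct")))  # yvifns
```

For abduct, reversing and then keyboard shifting gives yvifns, which looks arbitrary, yet it is one of only 25 two-step candidates generated from that single word.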

Passwords created via a personal algorithm that is more complex than multiple transformations of a single dictionary word are likely to be better than two short words and one non alpha character and depending on the algorithm, as good or better than two short words and two non alpha characters.

I'm going to suggest such a personal algorithm that deliberately violates some of the more common negative advice regarding passwords. A typical family has four or more members. A starting point for passwords might be the first two to four characters of the first and middle names of your family, avoiding any sequences that are by themselves dictionary words. Another component might be a variety of alphanumeric sequences that represent birth dates of family members but avoid using the "19" or "20" part of the year and the more common date formats and sometimes mix in one or two character month abbreviations instead of numbers. If you know the day of the week that family members were born on, the day abbreviation might sometimes substitute. If family members were born in different locations, city and state abbreviations might form an additional part. Putting these together in a variety of combinations would likely yield a number of not easily crackable passwords.

For a hypothetical family similar to mine, these might yield the following bits and pieces: ge geod ged geda gdav gda gd ph phi phl phly plly lly phlly pl sy sylv syli sli sl wa wac wacr wcr wcri cri wc 1222 de22 241222 dec22 d22 1224 2412 24d22 24de22 24dec22 0909 99 909 9928 90928 28909 280909 s9 se9 sep9 sep09 s28 se28 sep28 28s9 28se9 28sep9 28s09 28se09 718 0718 j18 ju18 jul18 j55 ju55 jul55 55718 550718 55ju 55jul 55j18 55ju18 18ju55 18j55 47 047 0407 4762 040762 40762 a7 ap7 apr7 apr07 a762 ap762 6247 620407 62a7 62ap7 62ap07 62apr7 fbhin bhin fthin infbh inftbh infth fnj fmnj mnj ftmnj monnj njft njftm njfm njftmo hpa pah paha pahar harpa penn hpen hapen wwv wvw whwv whewv wvwh weva wheewv. Pet names, car makes and models, and other easily remembered items might be added, but remember, we are trying to use recognizable bits and abbreviations that are not in themselves dictionary words.

Obviously, I wasn't completely consistent in terms of taking a set number of letters or using correct abbreviations, but each piece is easily derived from the names, birth dates and places of somewhat hypothetical family members. One of the birth places had a three piece name, parts of which were or were not used. If you start combining these, especially if you start changing case or using punctuation for separation, there end up being quite a few possibilities, which should not fall to a dictionary attack. If you use a single capital letter, don't make it the first character of the password; if you use more than one, they should not be the first and last characters. Most of these should be much easier to remember than arbitrary or random sequences of similar length.

I deliberately created a sample approach to creating passwords that violates some of the most common advice so that it won't be repeated elsewhere. If someone picks up on this, starts making similar passwords based on their own family, and is careful to avoid resulting dictionary words, they are likely to have secure passwords, especially if they add some personal variation to the method. If they tell someone else how they make their passwords, the method becomes somewhat less secure. If the method becomes publicized as an example of how to make good passwords, it becomes less secure still, but is there any way to estimate how much less secure?

If the cracker community believed that this had become a common way for users to create passwords, and crackers could program dictionaries of such passwords at a reasonable cost, much of the value the method once had would be lost. There are two seemingly plausible approaches: Paul Bobby's focused dictionary, and a generic approach based on common names, birth dates, and perhaps large and nearby cities. The problem with the focused approach is that it requires significant effort for each targeted account, with no knowledge that any of it has any value. No matter how much information is gathered, it has zero value if the target has not based their password on personal information, or has done so but used even one piece that is not in the collected information. Even if the attacker has collected all the relevant pieces on the target, the information is still useless unless the exact formats are used in the correct positions. Finally, in situations where accounts and passwords have been obtained from compromised systems, the attacker very rarely has any opportunity to link this data to real individuals, so such focused attacks are meaningless. Where account and password data can be matched with personally identifiable customer or employee data, that data should be exploited in relatively standard and predictable ways and otherwise forgotten.

The only statistical evidence that supports personal information being used to form passwords is that data from the password file itself has been useful in cracking passwords. So the information already included in the password file or Windows' SAM should surely be used. How it should be used should be guided by what is present. If largely the same kind of information is in nearly every record in a similar arrangement, that suggests it is there as the result of a policy. It should be used, but because it is there by policy rather than at the target's initiative, it should probably be tried only in common or standard formats. In other words, make a serious attempt to exploit what is given, but don't attempt to cover all possible variations at the cost of failing to pursue other avenues which are more general and may have a better payoff. If personal information is present in only some records, or in varying amounts and types, it is likely there at the target's initiative and should be used to the maximum, with special attention paid to the formats in which it is stored.

We also know that common first names are one of the most productive sources of passwords, but these should be covered by standard dictionary attacks, not target focused attacks. Common last names are also productive, but they too belong in standard dictionary attacks. For a generic focused dictionary, a first name plus a birthday seems the best single bet that falls outside standard dictionary attacks. We do not generally think of our loved ones (or ourselves) by first and middle or last name. So we'll try all common first names with all reasonable birthdate formats for the last 80 years. Most active computer users are younger than 80, and we are most likely to think of the birthdays of our wives or significant others and our children, but far less likely to think of our parents' birthdates. In addition to the full date, we often think of the month and day and, to a lesser extent, the month and year of the birth. I'd suggest the following common formats: m/d/yy, mm/dd/yy, m-d-yy, mm-dd-yy, m/d/yyyy, mm/dd/yyyy, m-d-yyyy, mm-dd-yyyy, mdyy, mmddyy, mdyyyy, mmddyyyy, dmyy, ddmmyy, dmyyyy, ddmmyyyy, d/m/yy, dd/mm/yy, d/m/yyyy, dd/mm/yyyy, d-m-yy, dd-mm-yy, d-m-yyyy, dd-mm-yyyy, md, mmdd, m/d, mm/dd, m-d, mm-dd, myy, mmyy, myyyy, mmyyyy, m/yy, mm/yy, m/yyyy, mm/yyyy, m-yy, mm-yy, m-yyyy, mm-yyyy. If 3 letter month abbreviations were used with full dates, month and day, and month and year, run together or separated by spaces and dashes, followed by periods or not, and in all lower case, first upper, and all upper case, the number of dates would somewhat more than double.
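The numeric formats above are mechanical enough to generate directly. A minimal sketch covering only those 42 formats (no month abbreviations); the helper names are illustrative. Because results are collected in a set, a date such as 11/11/99, where padded and unpadded forms coincide and month and day are interchangeable, yields far fewer distinct strings than one like 1/23/55.

```python
import datetime


def date_strings(d):
    """Distinct numeric renderings of date d in the formats listed above."""
    y4 = str(d.year)
    y2 = y4[-2:]
    out = set()
    # (month, day) unpadded, then zero-padded
    for mo, da in ((str(d.month), str(d.day)), (f"{d.month:02d}", f"{d.day:02d}")):
        for year in (y2, y4):
            for sep in ("/", "-"):
                out.add(f"{mo}{sep}{da}{sep}{year}")  # m/d/yy, mm-dd-yyyy, ...
                out.add(f"{da}{sep}{mo}{sep}{year}")  # d/m/yy, dd-mm-yyyy, ...
            out.add(f"{mo}{da}{year}")                # mdyy, mmddyyyy, ...
            out.add(f"{da}{mo}{year}")                # dmyy, ddmmyyyy, ...
            for sep in ("/", "-", ""):
                out.add(f"{mo}{sep}{year}")           # m/yy, mm-yyyy, myy, ...
        for sep in ("/", "-", ""):
            out.add(f"{mo}{sep}{da}")                 # m/d, mm-dd, md, ...
    return out


def dates_in_range(start, end):
    """Union of date_strings for every day from start to end inclusive."""
    out = set()
    day = start
    while day <= end:
        out |= date_strings(day)
        day += datetime.timedelta(days=1)
    return out


print(len(date_strings(datetime.date(1955, 1, 23))))   # 42
print(len(date_strings(datetime.date(1999, 11, 11))))  # 15
```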

The census names are based on census data and are thus nearly always legal given names and rarely nicknames or familiar names. When thinking of loved ones, we are more likely to use nicknames or familiar names than the name on a birth certificate. Thus it makes sense to supplement census names with first names obtained from other sources. When the date formats are adjusted to remove duplicates (for a date such as 11/11/99, the zero padded and unpadded forms are identical), there are 193,374 unique dates, based on the above formats, between 1934 and 2014, and slightly over one billion test passwords when combined with the 5,500 common first names from the census. This does not include 3 letter month abbreviations. If we were to add a non alphanumeric separator between the name and birth date, the number would increase to 33 billion. If we were to try to add populous cities and other possibly personally related information, I think it's clear the numbers would quickly get out of hand, while becoming less and less likely to match any real passwords.

If I were cracking a large password hash file, I would be very curious to see how productive first names and birthdays were, including with non alphanumeric separators. The results achieved would determine whether I would take it any further. My gut feeling is that both short names with the longer date formats and longish names with the shorter date formats might be moderately productive, but low compared with what Daniel Klein considered productive. If the one billion test passwords got 10 passwords in a 10,000 account password list, I'd consider that successful, and if the 33 billion list with the character separator got another 3 or 4, I'd consider that successful too. Since short names plus short dates would mostly be 6 to 8 characters, those might also produce results comparable to the first two groupings I mentioned, but I would not expect much from longish names and long date formats; very few people are willing to enter passwords over 12 characters.

Dan Klein's dictionaries were not systematic, and he got only 24% of the passwords he had to work with. In the 2012 compromise of the LinkedIn website, 93% of the passwords were in Mark Burnett's list of the Top 10,000 Passwords. I would use that list, plus each entry with its first letter uppercased (the top 10,000 are all lower case), as my first dictionary on any new password list, and I'd expect to get the large majority nearly every time. No passwords in the top 10,000 combine a name and birthdate, unless Bob, Sam, Joe, Mike, Max, Charlie, Bubba, Buddy, John, Chris, and Nancy were all born on Jan. 23 or Dec. 3. Another John may have been born on March 16, and it's possible Isacs was born in Jan. 1955.
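The first dictionary described here is trivial to build. A minimal sketch; the input is whatever copy of the Top 10,000 list is at hand, read one password per line (the file name in the note below is hypothetical):

```python
def first_dictionary(passwords):
    """Yield each password, then its first-letter-uppercased form, skipping repeats."""
    seen = set()
    for pw in passwords:
        for candidate in (pw, pw[:1].upper() + pw[1:]):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


print(list(first_dictionary(["password", "123456", "dragon"])))
# ['password', 'Password', '123456', 'dragon', 'Dragon']
```

With a local file, something like `first_dictionary(line.strip() for line in open("top10000.txt"))` would feed the list in; entries that start with a digit produce only one candidate.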

Regardless of the programmed dictionary details, the results would almost certainly be better than a brute force approach of passwords of similar lengths. Since the method depends on personally related information of the most obvious type, it is unlikely to ever be widely repeated as a suggested method for creating passwords. Still, used with knowledge of how password cracking tools work, it is in reality probably as good as any other manual method for creating passwords.

My conclusion is that any method that creates passwords the individual can remember, where the resulting passwords are reasonably long, have sufficient character diversity, and cannot be derived from the transformation of one or two dictionary words, is likely to result in a reasonably strong password that is unlikely to be cracked by today's or near future technology. The more widely known any method for creating "good" passwords is, and the more specific its instructions, the poorer it becomes. Any specific password used to illustrate a method for creating good passwords is immediately and forever after a poor password.

Humans Are Not Good Password Generators

People still think much as they did in the early 1970s, building their passwords from familiar parts, while computing power increases exponentially. It's only a matter of time until human created passwords don't stand a chance against automated cracking tools.

By the very nature of the way our minds work, humans are not good password generators. Even well trained and intelligent people who understand the security issues of computer passwords, can't help but fall into predictable patterns and repetition when they create passwords manually. Perhaps before the time that computers can reliably crack any human created passwords, all computer authentication will have become biometric or some other non password based approach. I expect that the disappearance of passwords will be like the arrival of simple and reliable voice recognition, one of those things that keeps taking longer than everyone thinks it should. But then reliable voice recognition did finally arrive early in the twenty-first century.

We want passwords that are easy to remember and naturally select words and character strings that come to mind quickly. Even if it were our goal, humans can't mentally create random character sequences. Where people have shown themselves inventive regarding passwords is in finding ways to make easily cracked passwords that circumvent system imposed rules meant to enforce strong passwords. For example, "Attack1" has mixed case and a digit, and thus includes three of the four character types, yet it will very quickly fall to all three cracking tools discussed above in Manual Passwords, Better.

As an experiment, I timed how long it took to think of and write 20 "good" passwords, similar to two short words and one or two non letter characters, but trying to avoid using actual words. It took just under 10 minutes. Nine real words still slipped in, and there would have been more if I hadn't added an extra letter to some when I realized I'd written a word. I understand that random does not mean an even distribution, and especially with small sample sizes, you shouldn't expect an even distribution of characters. I suspect, however, that 18 t's, when 3 letters were used only once and 5 only twice, was not a product of random variation but the start of a pattern. I also suspected that 12 a's and u's versus 4 e's might be part of a pattern. At the rate I was working, it would have taken a little under a year, without stopping for food or sleep, to create a million passwords.
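Whether 18 t's could plausibly be chance can be checked with a binomial tail probability. The total number of letters written is not given, so the sketch assumes about 140 (roughly seven letters per password) and treats every letter as equally likely, which is the uniform random benchmark the experiment is being compared against. Both numbers are assumptions, not data from the experiment.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_LETTERS = 140    # assumption: about 7 letters in each of 20 passwords
P_LETTER = 1 / 26  # assumption: uniform choice of letters

p_t = upper_tail(N_LETTERS, 18, P_LETTER)
print(f"P(one given letter appears 18+ times) = {p_t:.1e}")
print(f"allowing any of 26 letters (union bound) <= {26 * p_t:.1e}")
```

Under these assumptions the probability is tiny, and it stays small even after allowing for the fact that any of the 26 letters could have been the over-represented one.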

If we make them pronounceable and thus relatively easy to remember there is a strong tendency for the result to be real words. Any password that we have to memorize and think through character by character without a pattern or pronounceability to help us is just too hard to remember for practical purposes.

Computers, on the other hand, are ideally suited to generating lots of good passwords. I had an early version of the password.pl password generator create a number of one million password lists. Each took a few minutes on 500 MHz computers. With default settings there were fewer than 30 duplicates within a million password list and slightly fewer between lists. By making minor changes to the settings, making the passwords somewhat harder and drawing them from a larger universe of possibilities, I generated one million password lists with no duplicates.
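The password.pl generator itself is not shown here. As a point of comparison only, the sketch below is a plain uniform generator (not password.pl) using Python's `secrets` module over the 94 printable ASCII characters other than space, with a simple duplicate counter. With 94^10 (about 5 x 10^19) possible 10-character passwords, the expected number of duplicates in a million-password list is far below one.

```python
import secrets
import string

# 94 printable ASCII characters, excluding space.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=10, alphabet=ALPHABET):
    """Draw each character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def count_duplicates(n, length=10):
    """Generate n passwords and count how many repeat an earlier one."""
    seen = set()
    duplicates = 0
    for _ in range(n):
        pw = random_password(length)
        if pw in seen:
            duplicates += 1
        else:
            seen.add(pw)
    return duplicates


print(random_password(12))
print(count_duplicates(10_000))
```

Such passwords are strong precisely because nothing about them is memorable, which is the trade-off this whole section turns on.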


Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 