Good and Bad Passwords How-To
Cracking "Good" Passwords
With Custom Programmed Dictionaries
Passwords made from two unrelated short words and a non letter
are often recommended as good. We show that all such passwords
can be cracked in an afternoon on a low end desktop PC. The use
of two non letters makes much stronger passwords but they are still
within the reach of desktop PC technology. Some alternative ways
of creating good passwords are examined, but our final conclusion
is that humans just aren't very good at creating strong passwords
and certainly not strong passwords that have a reasonable chance
of being remembered.
Manual Passwords, Too Weak
We've now looked at a number of ways not to form passwords and in
the process have eliminated nearly every method that most people
like to use to make passwords. A common suggestion for
creating good passwords is to take two short, unrelated words and
combine them with one or more digits or punctuation or symbol
characters. Sometimes the suggestion is explicit that there should
be both a digit and a symbol (including punctuation). Sometimes
both go between the words and sometimes one or the other goes at
the beginning or end of the words. I don't recall seeing
suggestions that sometimes both should go at the beginning or end
of the words or that one should go at the front and the other the
end so that the two words run together.
Using a '#' to stand for any non letter character, the suggested
patterns look like:
worda#wordb
worda##wordb
#worda#wordb
worda#wordb#
I've used the linux.words dictionary and extracted all the 2, 3,
4 and 5 character words from it. Reviewing these lists, I'd say
that most of the words are common but that a third to a fifth are
not what I think of as common words. There are enough uncommon
words that I think most people would have some difficulty
thinking of short words that aren't in these lists so that for a
significant majority of users, both words would come from these
lists. I don't know whether it's 60% or 98%, but enough people
following the previous advice will be using words from these
lists that it's worth examining the feasibility of cracking these
by creating custom dictionaries, or having the cracking tool
generate such passwords.
The linux.words list contains 49 two character words, 536 three
character words, 2236 four character words and 4174 five
character words. Making every possible combination of two and
three, two and four, two and five, three and three, three and
four, and four and four character words results in a little over
8.3 million combinations. If you separate every combination with
all 42 non letter characters (not including the space) there are
just over 350 million combinations.
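The arithmetic above can be checked with a short sketch. It assumes that word order matters (worda#wordb differs from wordb#worda) and that a word may be paired with itself in the same-length pairings; under those assumptions the totals match the figures quoted in the text. The variable and function names are illustrative only.

```python
# Word counts by length, as reported above for the linux.words list.
WORDS_BY_LENGTH = {2: 49, 3: 536, 4: 2236, 5: 4174}

# Length pairings named in the text; each unequal pairing counts in both orders.
LENGTH_PAIRS = [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)]

NON_LETTERS = 42  # 10 digits + 32 punctuation/symbol characters, no space


def ordered_pair_count(counts, length_pairs):
    """Count ordered (worda, wordb) pairs over the given length pairings."""
    total = 0
    for a, b in length_pairs:
        if a == b:
            total += counts[a] * counts[b]        # includes worda == wordb
        else:
            total += 2 * counts[a] * counts[b]    # a-then-b and b-then-a
    return total


pairs = ordered_pair_count(WORDS_BY_LENGTH, LENGTH_PAIRS)
print(pairs)                 # number of word pairs
print(pairs * NON_LETTERS)   # number of worda#wordb candidates
```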
In about a half hour, I wrote a Perl script to generate all of
these. The script took between 30 and 45 minutes to write them
to disk, or 16 minutes when run in memory without outputting the
created passwords. The saved file was 3.7 gigabytes. L0phtCrack takes
about 15 minutes to read the file initially, and its counters
overflow with this number of words in the dictionary file.
Despite the counter overflow, after the initial read, it
processed about 5 million words a minute or somewhat over 70
minutes for the full list and cracked two test accounts I'd
created using passwords formed like this. A single non letter
separating two words simply isn't good enough, not when a first
attempt covered the most likely combinations in about three hours
on a very modest machine (PIII 500). These can be cracked almost
as easily as single dictionary word based passwords.
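A minimal sketch of such a generator follows. It is written in Python rather than Perl and is not the script described above; the three-word sample list is a stand-in for the short words extracted from linux.words, and the names are hypothetical.

```python
import string

# 10 digits + 32 punctuation characters = the 42 non-letters (no space).
NON_LETTERS = string.digits + string.punctuation

# Word-length pairings used in the text, stored with the shorter length first.
ALLOWED_LENGTHS = {(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)}


def word_sep_word(words):
    """Yield every worda#wordb candidate for the given short-word list."""
    for a in words:
        for b in words:
            if tuple(sorted((len(a), len(b)))) not in ALLOWED_LENGTHS:
                continue
            for sep in NON_LETTERS:
                yield a + sep + b


sample = ["at", "bad", "tuba"]          # stand-in for the real word lists
candidates = list(word_sep_word(sample))
print(len(candidates), candidates[:3])
```

Because the function is a generator, the candidates can be streamed to a file or piped to a cracking tool without holding the whole list in memory.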
Manual Passwords, Better
The number of passwords jumps dramatically with two words and two
non letter characters. 8.3 million is multiplied by 42 * 42 * 3
giving 44 billion. The three is for the three different
arrangements of non letter characters. At this number, saving
the list starts to become a serious issue for just about anyone
or any organization as it will require half a terabyte of disk
space. In my experience, for loop and integer
arithmetic intensive processes, compiled C programs run
about 40 times faster than almost identically coded Perl scripts.
Thus a C program could generate the simpler passwords at about
850 million per minute (on a PII 450). Even if the extra
character imposed a significant overhead, 500 million a minute
seems quite conservative or less than an hour and a half for 44
billion.
The cryptography side is more CPU intensive. Returning to the
nice round 100,000 per second, generating the hashes for our
44 billion passwords will take about 5 days. But we also
know that rate is just a crude estimate that could easily
be off by more than an order of magnitude.
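The estimates in this section can be reproduced with a few lines of arithmetic. The rates are the rough figures quoted above, not measurements, and the constant names are illustrative.

```python
PAIRS = 8_300_000            # word pairs, rounded as in the text
NON_LETTERS = 42
ARRANGEMENTS = 3             # worda##wordb, #worda#wordb, worda#wordb#

candidates = PAIRS * NON_LETTERS**2 * ARRANGEMENTS

GEN_RATE_PER_MIN = 500_000_000   # the conservative C generator rate above
HASH_RATE_PER_SEC = 100_000      # the round hashing estimate above

gen_minutes = candidates / GEN_RATE_PER_MIN
hash_days = candidates / HASH_RATE_PER_SEC / 86_400

print(f"{candidates / 1e9:.1f} billion candidates")
print(f"generate: {gen_minutes:.0f} minutes, hash: {hash_days:.1f} days")
```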
As long as I can remember, combining two unrelated words with two
non letters has been one of the primary recommendations for
creating strong passwords. I know if I were seriously cracking
passwords belonging to others, that I would replace the brute
force password generator with one that made passwords from short
words. It's hard to see how it could be less productive than
brute force and might pick up some root or administrator
passwords that their owners thought were really good. Of course
the yield or efficiency, due to the very large password universe,
would be very poor and all the standard dictionary methods should
be tried first.
At least one other factor should be covered. The 42 * 42 figures
are based on any non letter character in either non letter
character position. This also means allowing both characters to
be the same and both to be numbers. 00 and 11 would be allowed
in the middle of words and worda7wordb7 would also be OK. If we
make the stipulation that each password must contain a digit and
a symbol or punctuation, the correct multiplier is 8.3 million *
32 * 10 * 3 or about 8 billion passwords. This is
significantly less than the 44 billion we were talking about. If we
decide to allow two symbols but never two digits, the numbers are
8.3 million * 42 * 32 * 3 or just under 34 billion.
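As a cross-check, the number of two-character non-letter sequences allowed by each rule can be enumerated directly. This sketch counts ordered sequences, so "7&" and "&7" are distinct; that gives somewhat larger per-pattern multipliers than the 32 * 10 and 42 * 32 used in the text, which fix the position the symbol occupies. The rule names are illustrative.

```python
import string
from itertools import product

DIGITS = string.digits            # 10 characters
SYMBOLS = string.punctuation      # 32 characters
NON_LETTERS = DIGITS + SYMBOLS    # 42 characters


def count_pairs(rule):
    """Count ordered two-character non-letter sequences accepted by rule."""
    return sum(1 for a, b in product(NON_LETTERS, repeat=2) if rule(a, b))


any_two = count_pairs(lambda a, b: True)
digit_and_symbol = count_pairs(lambda a, b: (a in DIGITS) != (b in DIGITS))
not_two_digits = count_pairs(lambda a, b: not (a in DIGITS and b in DIGITS))

print(any_two, digit_and_symbol, not_two_digits)
```

Multiplying any of these by the number of word pairs and by the number of placement patterns gives the size of the set a cracker would have to cover under that rule.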
This shows how important it is that the cracker's assumptions
match the methods by which the passwords were created in the file
to be cracked. If the good guys use the broadest rules and thus
create the largest password sets there will necessarily be some
relatively weak passwords that are acceptable. In this case a
cracker working with a "small" password set of 2.5 billion that
comes from just the words and numbers will get some passwords but
completely miss any with symbols or punctuation. If you use the
compromise rule set allowing two symbols but not two digits,
unless a cracker has inside information, they will have to search
the larger universe wasting processing time on the two number
passwords that don't exist.
There are some additional ways to complicate the cracker's
problem. There are some two word patterns that are never
mentioned:
wordawordb##
#wordawordb#
##wordawordb
It's true that the words are run together but so what? Is this
going to make them harder to remember? Can we not tell where one
ends and the other begins? Does it matter? The fact is these new
patterns double the size of the password universe a cracker has
to match. There is one group that we do need to pay special
attention to. In the word combinations of two three letter words
and two four letter words, there will be duplicate words.
Though there is no evidence that they are currently doing so,
cracking tools could get these by repeating the word and
prepending or appending arbitrary character sequences. L0phtCrack
can append arbitrary character sequences of arbitrary length but
can't double words. Crack 5 and John the Ripper can double words
and prepend or append short digit sequences. It's not clear
their rule syntax provides a convenient method for making mixed
symbol and digit sequences.
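The six placement patterns discussed so far, the three commonly recommended ones plus the three that run the words together, can be sketched as a single function. The function name and the sample inputs are illustrative.

```python
def placements(worda, wordb, c1, c2):
    """Return the six arrangements of two words and two non-letters."""
    return [
        f"{worda}{c1}{c2}{wordb}",   # worda##wordb
        f"{c1}{worda}{c2}{wordb}",   # #worda#wordb
        f"{worda}{c1}{wordb}{c2}",   # worda#wordb#
        f"{worda}{wordb}{c1}{c2}",   # wordawordb##
        f"{c1}{worda}{wordb}{c2}",   # #wordawordb#
        f"{c1}{c2}{worda}{wordb}",   # ##wordawordb
    ]


print(placements("bit", "aid", "6", "&"))
```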
If you don't allow duplicate words and digit sequences, these
passwords are much stronger than any that can be derived
from single word dictionaries. Current tools cannot transform
any single word to create these sequences. Separate words
must be combined programmatically to obtain these sequences.
It's very important not to confuse some of the DONT's that apply
to variations on single dictionary words with superficially
similar character sequences that appear as part of
programmatically generated patterns. abduct66 is fundamentally
different than bitaid66. The former is a trivial transformation
to a common dictionary word which cracking programs may get in a
few minutes with current dictionaries and rules. The only ways
to get the latter are brute force, which is unlikely to succeed even
using a character set lacking symbols and punctuation, or a
custom programmed dictionary where multiple words and digits have
been combined for the express purpose of cracking passwords.
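The difference between the two examples can be illustrated with a small check. This is a sketch only: the word set is a toy stand-in for a real dictionary, and the "append digits" rule is a simplified version of what cracking tools' rule sets do.

```python
WORDS = {"abduct", "bit", "aid", "bad", "tuba"}   # toy stand-in dictionary


def split_suffix(password):
    """Split a password into its leading part and trailing digits."""
    stem = password.rstrip("0123456789")
    return stem, password[len(stem):]


def single_word_plus_digits(password, words=WORDS):
    """True if appending digits to one dictionary word yields password."""
    stem, digits = split_suffix(password)
    return bool(digits) and stem in words


def two_words_plus_digits(password, words=WORDS):
    """True if the leading part splits into two dictionary words."""
    stem, digits = split_suffix(password)
    return bool(digits) and any(
        stem[:i] in words and stem[i:] in words for i in range(1, len(stem))
    )


for pw in ("abduct66", "bitaid66"):
    print(pw, single_word_plus_digits(pw), two_words_plus_digits(pw))
```

A single-word rule reaches abduct66 from one dictionary entry, while bitaid66 only turns up when the tool is built to combine separate words.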
Without insider information, it's unlikely that bitaid66 will
ever be found because it does not fit any standard
recommendations on forming good passwords.
Fortunately for the good guys, we haven't mentioned the case of
letters as a factor in these passwords made from short words plus
random non letter characters. If we start mixing the case of the
letters in the words, we can add about two orders of magnitude of
complexity to the cracker's problem.
While truly randomly mixing case on 8 letters will expand the
possibilities by 256 times, there is nothing in passwords that
I find harder to remember or type than truly arbitrary case words.
There is an approach that significantly complicates the
crackers' job while not being so demanding of our memory and
fingers. This is to limit the capital letters to the
first, last, inner or outer positions. Some examples
will help: 1Bad&Tuba = first, *losT)baG = last, bolD^2Rug =
inner, [3HateraT = outer.
In the examples, I always uppercased
both positions to make what I meant clear. In practice,
the choices should be between 1) either or both or 2) either,
both or none. Keeping no upper case as an option once again
creates more possibilities but means some individual
passwords that are more likely to be found by a cracker
looking at only lower case options. The either or both
approach increases the choices by three and is the safer
option.
But it's better than that. We don't tell the cracker which
approach we use and different people use different approaches;
the choices are now increased by 12. To get these with any
efficiency a cracker needs three custom programmed dictionaries.
The first has all lower case, the second has the first, last, inner
and outer upper case combinations and the third all the other
mixed case combinations. That's likely to be a pain to program; if
we're lucky the cracker jumps from all lower case to full mixed
case.
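The four approaches with the "either or both" choice can be enumerated in a short sketch. It works on the letters only (the non-letters are placed separately) and the names are illustrative. Note that it yields 12 approach and choice combinations, but the single-capital cases are shared between approaches (capitalizing only the first letter of the first word belongs to both "first" and "outer"), so the number of distinct strings is smaller than 12.

```python
def capitalize_at(word_pair, positions):
    """Uppercase the given (word index, character index) positions."""
    words = [list(w) for w in word_pair]
    for w, c in positions:
        words[w][c] = words[w][c].upper()
    return "".join("".join(w) for w in words)


# Each approach names two positions as (word index, character index).
APPROACHES = {
    "first": [(0, 0), (1, 0)],
    "last":  [(0, -1), (1, -1)],
    "inner": [(0, -1), (1, 0)],
    "outer": [(0, 0), (1, -1)],
}


def case_variants(worda, wordb):
    """Map (approach, choice) to the resulting mixed-case letters."""
    out = {}
    for name, (p, q) in APPROACHES.items():
        for choice, positions in (("a", [p]), ("b", [q]), ("both", [p, q])):
            out[(name, choice)] = capitalize_at((worda, wordb), positions)
    return out


variants = case_variants("bad", "tuba")
print(len(variants), len(set(variants.values())))
print(sorted(set(variants.values())))
```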
Let's review. There are 8.3 million word pairs. These are
combined with two non letters; both cannot be digits. There
are six patterns created by where the non letters are placed
relative to the words. We're using 12 capitalization variations.
This is 8.3 million * 42 * 32 * 6 * 12 which gives 809 billion
possible passwords. This is a lot better than dictionary derived
passwords, but if our encryption estimates are off by very much, or a
cracker has lots of computing power or is willing to wait a while
(it's 94 days at 100,000 passwords per second), it's still clearly
within the realm of today's technology. If we really want our
passwords to be safe we have to do better, a lot better.
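The review figures can be reproduced as follows. The word pair count assumes ordered pairs from the linux.words counts given earlier, and every factor is taken from the text as stated; the constant names are illustrative.

```python
WORD_PAIRS = 8_364_692        # the "8.3 million" ordered word pairs
NON_LETTER_PAIRS = 42 * 32    # two non-letters, not both digits (as in the text)
PLACEMENTS = 6                # positions of the non-letters
CASE_VARIATIONS = 12          # capitalization variations, as counted in the text
RATE_PER_SEC = 100_000        # the round hashing estimate

total = WORD_PAIRS * NON_LETTER_PAIRS * PLACEMENTS * CASE_VARIATIONS
days = total / RATE_PER_SEC / 86_400

print(f"{total / 1e9:.0f} billion passwords, {days:.0f} days")
```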
Also, if decent frequency tables could be found for short
words (like the Census name lists), it would be possible to
build several smaller dictionaries and process the most
likely word combinations first. This could dramatically
alter the time it takes to get at least some of the passwords
created as we have described.
Alternative Manual Passwords
Among the discussions of how to create good passwords, there are
sometimes suggestions for creating passwords from the first
letters or initial few characters of words in sentences or phrases
that the user can remember. No specific suggestions were included
in the list of
Common Password DO's
because no single suggestion is nearly as common as combining
multiple words with non letters and there is no easy way to
phrase such suggestions that won't result in a method that
can be programmed to create dictionaries to crack the resulting
passwords. Unless the recommendation also includes explicit
discussion of minimum password length and the use of mixed case
and non letters, there is a good chance resulting passwords may
fall to brute force attacks.
For the next few years, any method that helps a user remember
passwords that are of a reasonable length (minimum 10, 12+ preferred)
and character diversity (4 character groups = full keyboard) and
do not contain dictionary words or simple transformations of
dictionary words, will likely create passwords of moderate to good
strength. The actual strength of the password will depend on the
actual length and character diversity, as well as the avoidance of
dictionary words. It's surprising just how obscure the result of
any two of the following transformations of a dictionary word is
likely to be: reverse, rotate, keyboard shift, collating sequence
shift, drop or add a character. In other words, passwords created by
these methods look almost random to a human, but are easily found by
the cracking tools. To be fair, program generated passwords may
contain such sequences by chance, as may sequences
derived from phrases or sentences.
Passwords created via a personal algorithm that is more complex than
multiple transformations of a single dictionary word are likely
to be better than two short words and one non alpha character and
depending on the algorithm, as good or better than two short words
and two non alpha characters.
I'm going to suggest such a personal algorithm that deliberately
violates some of the more common negative advice regarding
passwords. A typical family has four or more members. A
starting point for passwords might be the first two to four
characters of the first and middle names of your family,
avoiding any sequences that are by themselves dictionary words.
Another component might be a variety of alphanumeric sequences
that represent birth dates of family members but avoid using the
"19" or "20" part of the year and the more common date formats and
sometimes mix in one or two character month abbreviations instead
of numbers. If you know the day of the week that family members
were born on, the day abbreviation might sometimes substitute. If
family members were born in different locations, city and state
abbreviations might form an additional part. Putting these
together in a variety of combinations would likely yield a number
of not easily crackable passwords.
For a hypothetical family similar to mine, these might yield the
following bits and pieces: ge geod ged geda gdav gda gd ph phi
phl phly plly lly phlly pl sy sylv syli sli sl wa wac wacr
wcr wcri cri wc 1222 de22 241222 dec22 d22 1224 2412 24d22 24de22
24dec22 0909 99 909 9928 90928 28909 280909 s9 se9 sep9 sep09 s28
se28 sep28 28s9 28se9 28sep9 28s09 28se09 28sep09 718 0718 j18
ju18 jul18 j55 ju55 jul55 55718 550718 55ju 55jul 55j18 55ju18
18ju55 18j55 47 047 0407 4762 040762 40762 a7 ap7 apr7 apr07 a762
ap762 6247 620407 62a7 62ap7 62ap07 62apr7 fbhin bhin fthin infbh
inftbh infth fnj fmnj mnj ftmnj monnj njft njftm njfm njftmo hpa
pah paha pahar harpa penn hpen hapen wwv wvw whwv whewv wvwh weva
wheewv. Pet names, car makes and models, and other easily remembered
items might be added, but remember, we are trying to use recognizable
bits and abbreviations that are not in themselves dictionary words.
Obviously, I wasn't completely consistent in terms of taking a
set number of letters or using correct abbreviations but each
piece is easily derived from names, birth dates and places of
somewhat hypothetical family members. One of the birth places
had a three piece name, parts of which were or were not used. If
you start combining these, especially if you start changing case
or using punctuation for separation there end up being quite a
few possibilities, which should not fall to a dictionary
attack. With capital letters, avoid the first character of the
password unless there is more than one capital, in which case they should
not be the first and last characters. Most of these should be
much easier to remember than arbitrary or random sequences of
similar length.
I deliberately created a sample approach to creating passwords that
violates some of the most common advice so that it won't be repeated
elsewhere. If someone picks up on this and starts making passwords
similar to these, based on their own family and are careful about
resulting dictionary words, they are likely to have secure passwords
especially if they add some personal variation to the method. If
they tell someone else how they make their passwords, the method
becomes somewhat less secure. If the method becomes
publicized as an example of how to make good passwords it becomes
still less secure, but is there any way to estimate how much less
secure?
If the cracker community believed that this had become a common
way for users to create passwords, much of whatever value it once had
would be lost, provided the crackers could program dictionaries
of such passwords at a reasonable cost. There are two seemingly plausible approaches:
Paul Bobby's focused dictionary and a generic approach based on common names,
birth dates, and perhaps large and nearby cities. The problem with the focused
approach is that it requires a significant effort for each targeted
account, with no knowledge that any of it has any value. No matter how
much information is gathered, it has zero value if the target has not
based their password on personal information, or has, but used one bit
in forming the password that is not in the collected evidence.
Even if the attacker has collected all relevant bits on the target, the
collected information is still useless unless the exact formats are
used in the correct positions. Finally, in situations where accounts and
passwords have been obtained from compromised systems, the attacker very
rarely has any opportunity to link this data to real individuals, so
such focused attacks are meaningless. Where account and password data
can be matched with personally identifiable customer or employee data,
that data should be exploited in relatively standard and predictable
ways and otherwise forgotten.
The only statistical evidence that supports personal information being
used to form passwords is that data from the password file itself has
been useful in cracking passwords. So the information already included in
the password file or Windows' SAM should surely be used. How it should be
used should be guided by what is present. If largely the same kind of
information is in nearly every record in a similar arrangement, that
suggests it is there as the result of a policy. It should be used but
because it is there, not at the target's request, it should probably
only be used in common or standard formats. In other words, make a serious
attempt to exploit what is given, but don't attempt to cover all possible
variations at the cost of failing to pursue other avenues which are more
general and may have a better payoff. If personal information is absent
from some records, or is present in varying amounts and types, it is likely
there at the target's initiative and should be used to the maximum, with
special attention paid to the formats in which it is stored.
We also know that common first names are one of the most productive
sources of passwords, but these should be gotten as part of standard
dictionary attacks, not target focused attacks. Common last names are
also productive, but they also should be part of standard dictionary
attacks. For a generic focused dictionary, a first name and a birthday seem like
the best single bet that falls outside standard dictionary attacks.
We do not generally think of our loved ones (or ourselves) by first
and middle or last name. So we'll try all common first names with all
reasonable birthdate formats for the last 80 years. Most active computer
users are less than 80 and we are most likely to think of the
birthdays of our wives or significant others and our children but far
less likely to think of our parents' birthdates. In addition to the
full date we often think month and day and to a lesser extent the
month and year of the birth. I'd suggest the following common formats:
m/d/yy, mm/dd/yy, m-d-yy, mm-dd-yy, m/d/yyyy, mm/dd/yyyy, m-d-yyyy,
mm-dd-yyyy, mdyy, mmddyy, mdyyyy, mmddyyyy,
dmyy, ddmmyy, dmyyyy, ddmmyyyy, d/m/yy, dd/mm/yy, d/m/yyyy, dd/mm/yyyy,
d-m-yy, dd-mm-yy, d-m-yyyy, dd-mm-yyyy, md, mmdd, m/d, mm/dd,
m-d, mm-dd, myy, mmyy, myyyy, mmyyyy, m/yy, mm/yy, m/yyyy, mm/yyyy,
m-yy, mm-yy, m-yyyy, mm-yyyy. If 3 letter month abbreviations were used
with full dates, month and day and month and year, run together, and
separated by spaces and dashes, followed by periods and not, and
used as all lower case, first upper, and all upper, the number of dates
would somewhat more than double.
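A minimal sketch of generating the numeric formats listed above for a single date follows. It covers the month first, day first, month and day, and month and year forms, run together or separated by "/" or "-", zero padded or not; it does not attempt the 3 letter month abbreviations. Collecting the strings in a set drops the duplicates that appear when the padded and unpadded forms coincide. The function name is illustrative.

```python
from datetime import date


def date_strings(d):
    """Return the distinct strings for one date in the numeric formats."""
    yy, yyyy = f"{d.year % 100:02d}", str(d.year)
    out = set()
    for pad in (False, True):
        mo = f"{d.month:02d}" if pad else str(d.month)
        da = f"{d.day:02d}" if pad else str(d.day)
        for sep in ("", "/", "-"):
            out.add(sep.join((mo, da)))              # month and day
            for yr in (yy, yyyy):
                out.add(sep.join((mo, da, yr)))      # month first
                out.add(sep.join((da, mo, yr)))      # day first
                out.add(sep.join((mo, yr)))          # month and year
    return out


print(len(date_strings(date(1955, 7, 18))))
print(len(date_strings(date(1999, 11, 11))))
```

A date such as July 18, 1955 produces one string per listed format, while November 11, 1999 produces far fewer because padding changes nothing and the month and day are interchangeable.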
The census names are based on census data and are thus nearly always the
legal given names and rarely nicknames or familiar names. When thinking
of loved ones, we are more likely to use nicknames or familiar names than
the name on a birth certificate. Thus it makes sense to supplement census
names with first names obtained from other sources. After removing the
duplicates that arise when zero padded months and days are identical to
the unpadded ones (11/11/99), there are 193,374 unique dates, based on
the above formats, between 1934 and 2014, and slightly over one billion
test passwords when these are combined with the 5,500 common first names
from the census. This does not include 3 letter month
abbreviations. If we were to add a non alphanumeric separator between the
name and birth date, the number
would increase to 33 billion. If we were to try to add populous
cities and other possibly personally related information, I think it's
clear the numbers would quickly get out of hand, while becoming less
and less likely to match any real passwords.
If I were cracking a large password hash file, I would be very
curious to see how productive first names and birthdays were, including
with non alphanumeric separators. The results achieved would
determine if I would take it any further. My gut feeling is that
both short names with the longer date formats and longish names
with the shorter date formats might be moderately productive but
low compared with what Daniel Klein considered
productive. If the one billion test passwords got 10 passwords in a 10,000
account password list, I'd consider that successful and if the
33 billion list with the character separator got another 3 or 4
I'd consider that successful. Since short name plus short dates
would mostly be 6 to 8 characters, those might also produce results
comparable to the first two groupings I mentioned, but I would not
expect much from longish names and long date formats; very few
people are willing to enter passwords over 12 characters.
Dan Klein's dictionaries were not systematic and he got only 24%
of the passwords he had to work with. In the 2012 compromise of the
LinkedIn website, 93% of the passwords were in Mark Burnett's list
of the Top
10,000 Passwords. I would use that list, plus uppercasing the
first letter (the top 10,000 are all lower case) of each password
in the list as my first dictionary on any new password list, and
I'd expect to get the large majority nearly every time. No passwords
in the top 10,000 combine a name and birthdate, unless Bob, Sam,
Joe, Mike, Max, Charlie, Bubba, Buddy, John, Chris, and Nancy were
all born on Jan. 23, or Dec. 3. Another John may have been born on
March 16 and it's possible Isacs was born in Jan. 1955.
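The first-pass dictionary described above, a common password list plus a variant of each entry with the first letter uppercased, can be sketched as follows. The three sample entries are stand-ins, not a quotation from the actual Top 10,000 list, and the function name is illustrative.

```python
def first_pass_dictionary(common_passwords):
    """Return each password plus its first-letter-uppercased variant."""
    seen = set()
    ordered = []
    for pw in common_passwords:
        for candidate in (pw, pw[:1].upper() + pw[1:]):
            if candidate not in seen:       # digits-only entries do not change
                seen.add(candidate)
                ordered.append(candidate)
    return ordered


sample = ["password", "123456", "charlie123"]   # stand-in entries
print(first_pass_dictionary(sample))
```

Keeping the original order means the most common passwords are still tried first.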
Regardless of the programmed dictionary details,
the results would almost certainly be better than a brute
force approach of passwords of similar lengths. Since the
method depends on personally related information of the most
obvious type, it is unlikely to ever be widely repeated as a
suggested method for creating passwords. Still, used with
knowledge of how password cracking tools work, it is in reality
probably as good as any other manual method for creating
passwords.
My conclusion is, that any method that creates passwords that the
individual can remember, where the resulting passwords are
reasonably long and have sufficient character diversity, and
cannot be derived from the transformation of one
or two dictionary words, is likely
to result in a reasonably strong password, that is unlikely to be
cracked by today's or near future technology. The more widely
known any method for creating "good" passwords is, and the more
specific its instructions on creating passwords, the poorer it
becomes. Any specific password or passwords used to illustrate
the use of any method for creating good passwords are immediately
and forever after poor passwords.
Humans Are Not Good Password Generators
People think much as they did in the early 70's and
create their passwords from familiar parts while computing
power increases exponentially. It's only a matter of time until
human created passwords don't stand a chance against automated
cracking tools.
By the very nature of the way our minds work, humans are not good
password generators. Even well trained and intelligent people
who understand the security issues of computer passwords, can't
help but fall into predictable patterns and repetition when they
create passwords manually. Perhaps before the time that computers can
reliably crack any human created passwords, all computer
authentication will have become biometric or some other non
password based approach. I expect that the disappearance of
passwords will be like the arrival of simple and reliable voice
recognition, one of those things that keeps taking longer than
everyone thinks it should. But then reliable voice recognition
did finally arrive early in the twenty-first century.
We want passwords that are easy to remember and naturally select
words and character strings that come to mind quickly. Even if it
were our goal, humans can't mentally create random character
sequences. Where people have shown themselves to be inventive
regarding passwords, is in finding ways to make easily cracked
passwords that effectively circumvent system imposed limits that
attempt to enforce strong passwords. For example, "Attack1", has
mixed case and a digit, thus includes three of the four
character types, and yet will very quickly fall to all three
cracking tools discussed above in
Manual Passwords, Better.
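The "Attack1" example can be made concrete with a short sketch: a character group check of the kind a system policy might apply, and a simplified version of the capitalize and append a digit rule that cracking tools commonly apply to dictionary words. Both functions are illustrative and do not reproduce any particular tool's rule syntax.

```python
import string


def character_classes(password):
    """Count how many of the four character groups appear in password."""
    groups = (string.ascii_lowercase, string.ascii_uppercase,
              string.digits, string.punctuation)
    return sum(any(ch in g for ch in password) for g in groups)


def simple_rule_variants(word):
    """Capitalize a dictionary word and append one digit."""
    return {word.capitalize() + d for d in string.digits}


print(character_classes("Attack1"))
print("Attack1" in simple_rule_variants("attack"))
```

The password satisfies a three-of-four character group policy, yet it is one of only ten candidates produced from a single dictionary word.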
As an experiment, I timed how long it took to think of and write
20 "good" passwords, similar to two short words and one or two
non letter characters, but trying to avoid using actual words. It
was just under 10 minutes. There were nine words and would have
been more if I hadn't stuck an extra letter on some when I
realized I'd written a word. I understand that random does not
mean an even distribution, and especially with small sample sizes,
you shouldn't expect an even distribution of characters. I
suspect however that 18 t's when there were 3 letters that were
used only once and 5 only twice was not a product of random
variation but the start of a pattern. I also suspected that 12
a's and u's versus 4 e's might be part of a pattern. At the rate
I was working it would have taken a little under a year, without
stopping for food or sleep to create a million passwords.
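The letter tally and the time estimate from this experiment are easy to reproduce. In this sketch the three sample passwords are made up for illustration and are not the 20 from the experiment; the function names are hypothetical.

```python
from collections import Counter


def letter_counts(passwords):
    """Count letters, ignoring case, across a list of passwords."""
    return Counter(ch for pw in passwords for ch in pw.lower() if ch.isalpha())


def days_to_create(target, made, minutes):
    """Days of nonstop work to reach target at the observed rate."""
    return target * (minutes / made) / (60 * 24)


sample = ["tuT7tat", "raTtu#t", "at9uTTa"]   # made-up stand-ins
print(letter_counts(sample).most_common(3))
print(round(days_to_create(1_000_000, 20, 10)))
```

A heavily skewed tally, such as one letter appearing many times more often than several others, is the kind of pattern a cracker could exploit by ordering guesses.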
If we make them pronounceable and thus relatively easy to
remember there is a strong tendency for the result to be real
words. Any password that we have to memorize and think through
character by character without a pattern or pronounceability to
help us is just too hard to remember for practical purposes.
Computers on the other hand are ideally suited for generating
lots of good passwords. I had an early version of the password.pl
password generator create a number of one million password
lists. Each took a few minutes on 500 MHz computers. With
default settings there were fewer than 30 duplicates within a
million password list and slightly fewer between lists. By
making minor changes to the settings, thus making the passwords
somewhat harder and out of a larger universe of possibilities, one
million password lists were generated with no duplicates.
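A minimal sketch of machine generation and a duplicate check follows. It is not password.pl: it simply draws each character uniformly from an assumed alphabet using Python's secrets module, and the length, alphabet, and function names are illustrative. The last function is the usual birthday-style approximation for how many repeats to expect when drawing n passwords from a universe of a given size.

```python
import secrets
import string


def random_password(length=8, alphabet=string.ascii_lowercase + string.digits):
    """Draw each character independently from a cryptographic RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def count_duplicates(passwords):
    """Number of entries that repeat an earlier entry in the list."""
    return len(passwords) - len(set(passwords))


def expected_duplicates(n, universe):
    """Approximate number of repeats among n uniform draws from universe."""
    return n * (n - 1) / (2 * universe)


batch = [random_password() for _ in range(10_000)]
print(count_duplicates(batch))
print(expected_duplicates(1_000_000, 36 ** 8))
```

The estimate shows why enlarging the universe of possible passwords, by lengthening them or widening the character set, quickly drives the duplicate count in a million password list toward zero.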
Copyright © 2000 - 2014 by George Shaffer. This material may be
distributed only subject to the terms and conditions set forth in
http://GeodSoft.com/terms.htm
(or http://GeodSoft.com/cgi-bin/terms.pl).
These terms are subject to change. Distribution is subject to
the current terms, or at the choice of the distributor, those
in an earlier, digitally signed electronic copy of
http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the
time of the distribution. Distribution of substantively modified
versions of GeodSoft content is prohibited without the explicit written
permission of George Shaffer. Distribution of the work or derivatives
of the work, in whole or in part, for commercial purposes is prohibited
unless prior written permission is obtained from George Shaffer.
Distribution in accordance with these terms, for unrestricted and
uncompensated public access, non profit, or internal company use is
allowed.