Good and Bad Passwords How-To
Cracking "Good" Passwords
With Custom Programmed Dictionaries
Passwords made from two unrelated short words and a non letter
are often recommended as good. We show that all such passwords
can be cracked in an afternoon on a low end desktop PC. The use
of two non letters makes much stronger passwords but they are still
within the reach of desktop PC technology. Some alternative ways
of creating good passwords are examined, but our final conclusion
is that humans just aren't very good at creating strong passwords
and certainly not strong passwords that have a reasonable chance
of being remembered.
Manual Passwords, Too Weak
We've now looked at a number of ways not to form passwords and in
the process have eliminated nearly every method that most people
like to use to make passwords. A common suggestion for
creating good passwords is to take two short, unrelated words and
combine them with one or more digits or punctuation or symbol
characters. Sometimes the suggestion is explicit that there should
be both a digit and a symbol (including punctuation). Sometimes
both go between the words and sometimes one or the other goes at
the beginning or end of the word. I don't recall seeing
suggestions that sometimes both should go at the beginning or end
of the words or that one should go at the front and the other the
end so that the two words run together.
Using a '#' to stand for any non letter character, the suggested
patterns look like:
worda#wordb
worda##wordb
#worda#wordb
worda#wordb#
I've used the linux.words dictionary and extracted all the 2, 3,
4 and 5 character words from it. Reviewing these lists, I'd say
that most of the words are common but that a fifth to a third are
not what I think of as common words. There are enough uncommon
words that I think most people would have some difficulty
thinking of short words that aren't in these lists, so for a
significant majority of users both words would come from these
lists. I don't know whether it's 60% or 98%, but enough people
following the previous advice will be using words from these
lists that it's worth examining the feasibility of cracking these
by creating custom dictionaries, or having the cracking tool
generate such passwords.
The linux.words list contains 49 two character words, 536 three
character words, 2236 four character words and 4174 five
character words. Making every possible combination of two and
three, two and four, two and five, three and three, three and
four, and four and four character words results in a little over
8.3 million combinations. If you separate every combination with
all 42 non letter characters (not including the space) there are
just over 350 million combinations.
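The counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch in Python, not the Perl script described below. It assumes that word order matters, so a two and three character pairing is counted in both orders; that assumption is what reproduces the 8.3 million figure.

```python
# Word counts by length, as reported for linux.words in the text.
WORDS_BY_LENGTH = {2: 49, 3: 536, 4: 2236, 5: 4174}

# Length pairings listed in the text. Assumption: order matters, so the
# pairing (2, 3) also stands for (3, 2).
LENGTH_PAIRINGS = [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)]

NON_LETTERS = 42  # non letter characters, space excluded


def count_word_pairs(counts, pairings):
    """Count ordered (worda, wordb) pairs for the given length pairings."""
    total = 0
    for a, b in pairings:
        pairs = counts[a] * counts[b]
        # Mixed lengths can appear in either order; equal lengths already
        # cover both orders in the square.
        total += pairs if a == b else 2 * pairs
    return total


word_pairs = count_word_pairs(WORDS_BY_LENGTH, LENGTH_PAIRINGS)
one_separator = word_pairs * NON_LETTERS

print(f"word pairs:          {word_pairs:,}")     # 8,364,692
print(f"with one non letter: {one_separator:,}")  # 351,317,064
```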
In about a half hour, I wrote a Perl script to generate all of
these. It took the script between 30 and 45 minutes to write
these to disk or 16 minutes in memory not outputting the created
passwords. The saved file was 3.7 gigabytes. L0phtCrack takes
about 15 minutes to read the file initially, and its counters
overflow with this number of words in the dictionary file.
Despite the counter overflow, after the initial read, it
processed about 5 million words a minute or somewhat over 70
minutes for the full list and cracked two test accounts I'd
created using passwords formed like this. A single non letter
separating two words simply isn't good enough, not when a first
attempt covered the most likely combinations in about three hours
on a very modest machine (PIII 500). These can be cracked almost
as easily as single dictionary word based passwords.
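A generator for the worda#wordb pattern might look like the following sketch. It is written in Python rather than Perl and is not the script described above. The word lists are tiny placeholders, and the 42 non letter characters are assumed to be the ten digits plus the 32 ASCII punctuation and symbol characters.

```python
import string
from itertools import product

# Assumption: the 42 non letters are the digits plus ASCII punctuation
# and symbols (10 + 32), with the space excluded.
NON_LETTERS = string.digits + string.punctuation


def one_separator_passwords(first_words, second_words, separators=NON_LETTERS):
    """Yield every worda#wordb candidate without building the list in memory."""
    for a, b, sep in product(first_words, second_words, separators):
        yield f"{a}{sep}{b}"


# Placeholder word lists; a real run would read the 2 to 5 character
# words from a dictionary file such as linux.words.
three_letter = ["bad", "rug", "bit"]
four_letter = ["tuba", "hate"]

candidates = list(one_separator_passwords(three_letter, four_letter))
print(len(candidates))   # 3 * 2 * 42 = 252
print(candidates[:3])    # ['bad0tuba', 'bad1tuba', 'bad2tuba']
```

Because the function is a generator, the candidates can be streamed to a file or straight into a cracking tool instead of being held in memory.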
Manual Passwords, Better
The number of passwords jumps dramatically with two words and two
non letter characters. 8.3 million is multiplied by 42 * 42 * 3
giving 44 billion. The three is for the three different
arrangements of non letter characters. At this number, saving
the list starts to become a serious issue for just about anyone
or any organization as it will require half a terabyte of disk
space. In my experience, compiled C programs run about 40 times
faster than almost identically coded Perl scripts on loop and
integer arithmetic intensive processes.
Thus a C program could generate the simpler passwords at about
850 million per minute (on a PII 450). Even if the extra
character imposed a significant overhead, 500 million a minute
seems quite conservative, and at that rate 44 billion passwords
take less than an hour and a half.
The cryptography side is more CPU intensive. Returning to the
nice round 100,000 per second, generating the hashes for our
44 billion passwords will take about 5 days. But we also
know that rate is just a crude estimate that could easily
be off by more than an order of magnitude.
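The two time estimates can be checked with a short calculation. The sketch below uses the rates quoted in the text; the exact pair count of 8,364,692 is an assumption derived from the word counts given earlier (ordered pairs), which the text rounds to 8.3 million.

```python
WORD_PAIRS = 8_364_692   # assumed exact count behind "8.3 million"
NON_LETTERS = 42
ARRANGEMENTS = 3         # three placements of the two non letters

candidates = WORD_PAIRS * NON_LETTERS ** 2 * ARRANGEMENTS

GENERATE_PER_MINUTE = 500_000_000  # conservative C generator rate from the text
HASH_PER_SECOND = 100_000          # crude hashing rate from the text

generate_minutes = candidates / GENERATE_PER_MINUTE
hash_days = candidates / HASH_PER_SECOND / 86_400  # 86,400 seconds per day

print(f"candidates:      {candidates:,}")               # 44,265,950,064
print(f"generation time: {generate_minutes:.0f} minutes")
print(f"hashing time:    {hash_days:.1f} days")
```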
As long as I can remember, combining two unrelated words with two
non letters has been one of the primary recommendations for
creating strong passwords. I know if I were seriously cracking
passwords belonging to others, that I would replace the brute
force password generator with one that made passwords from short
words. It's hard to see how it could be less productive than
brute force and might pick up some root or administrator
passwords that their owners thought were really good. Of course
the yield or efficiency, due to the very large password universe,
would be very poor and all the standard dictionary methods should
be tried first.
At least one other factor should be covered. The 42 * 42 figures
are based on any non letter character in either non letter
character position. This also means allowing both characters to
be the same and both to be numbers. 00 and 11 would be allowed
in the middle of words and worda7wordb7 would also be OK. If we
make the stipulation that each password must contain a digit and
a symbol or punctuation, the correct product is 8.3 million *
32 * 10 * 3 or about 8 billion passwords. This is
significantly less than the 44 billion we were talking about. If we
decide to allow two symbols but never two digits, the numbers are
8.3 million * 42 * 32 * 3 or just under 34 billion.
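The effect of each rule set on the size of the password universe can be tabulated directly. This sketch evaluates the products exactly as they are written in the text. The pair count of 8,364,692 is an assumed exact value for the rounded 8.3 million, and the digits only row corresponds to a cracker who tries nothing but numbers in the two non letter positions.

```python
WORD_PAIRS = 8_364_692   # assumed exact count behind "8.3 million"
DIGITS, SYMBOLS = 10, 32
NON_LETTERS = DIGITS + SYMBOLS
ARRANGEMENTS = 3

# Multipliers for the two non letter positions, as written in the text.
RULE_SETS = {
    "any two non letters (42 * 42)": NON_LETTERS * NON_LETTERS,
    "one digit and one symbol (32 * 10)": SYMBOLS * DIGITS,
    "two symbols but never two digits (42 * 32)": NON_LETTERS * SYMBOLS,
    "digits only (10 * 10)": DIGITS * DIGITS,
}


def universe_sizes(rule_sets=RULE_SETS):
    """Return the number of candidate passwords for each rule set."""
    return {name: WORD_PAIRS * mult * ARRANGEMENTS
            for name, mult in rule_sets.items()}


sizes = universe_sizes()
for name, total in sizes.items():
    print(f"{name:45s} {total / 1e9:5.1f} billion")
```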
This shows how important it is that the cracker's assumptions
match the methods by which the passwords were created in the file
to be cracked. If the good guys use the broadest rules and thus
create the largest password sets there will necessarily be some
relatively weak passwords that are acceptable. In this case a
cracker working with the "small" password set of 2.5 billion that
comes from just the words and digits will get some passwords but
completely miss any with symbols or punctuation. If you use the
compromise rule set allowing two symbols but not two digits,
unless a cracker has inside information, they will have to search
the larger universe wasting processing time on the two number
passwords that don't exist.
There are some additional ways to complicate the cracker's
problem. There are some two word patterns that are never
mentioned:
wordawordb##
#wordawordb#
##wordawordb
It's true that the words are run together but so what? Is this
going to make them harder to remember? Can we not tell where one
ends and the other begins? Does it matter? The fact is these new
patterns double the size of the password universe a cracker has
to match. There is one group that we do need to pay special
attention to. In the word combinations of two three letter words
and two four letter words, there will be duplicate words.
Though there is no evidence that they are currently doing so,
cracking tools could get these by repeating the word and
prepending or appending arbitrary character sequences. L0phtCrack
can append arbitrary character sequences of arbitrary length but
can't double words. Crack 5 and John the Ripper can double words
and prepend or append short digit sequences. It's not clear
their rule syntax provides a convenient method for making mixed
symbol and digit sequences.
If you don't allow duplicate words and digit sequences, these
passwords are much stronger than any that can be derived
from single word dictionaries. Current tools cannot transform
any single word to create these sequences. Separate words
must be combined programmatically to obtain these sequences.
It's very important not to confuse some of the DONT's that apply
to variations on single dictionary words with superficially
similar character sequences that appear as part of
programmatically generated patterns. abduct66 is fundamentally
different from bitaid66. The former is a trivial transformation
of a common dictionary word which cracking programs may get in a
few minutes with current dictionaries and rules. The only ways
to get the latter are brute force, which is unlikely to succeed
even using a character set lacking symbols and punctuation, or a
custom programmed dictionary where multiple words and digits have
been combined for the express purpose of cracking passwords.
Without insider information, it's unlikely that bitaid66 will
ever be found because it does not fit any standard
recommendations on forming good passwords.
Fortunately for the good guys, we haven't mentioned the case of
letters as a factor in these passwords made from short words plus
random non letter characters. If we start mixing the case of the
letters in the words, we can add about two orders of magnitude of
complexity to the cracker's problem.
While truly randomly mixing case on 8 letters will expand the
possibilities by 256 times (2 to the 8th power), there is nothing
in passwords that I find harder to remember or type than truly
arbitrary case words.
There is an approach that significantly complicates the
crackers' job while not being so demanding of our memory and
fingers. This is to limit the capital letters to the
first, last, inner or outer positions. Some examples
will help: 1Bad&Tuba = first, *losT)baG = last, bolD^2Rug =
inner, [3HateraT = outer.
In the examples, I always uppercased
both positions to make what I meant clear. In practice,
the choices should be between 1) either or both or 2) either,
both or none. Keeping no upper case as an option once again
creates more possibilities but leaves some individual
passwords more likely to be found by a cracker
looking at only lower case options. The either or both
approach increases the choices by a factor of three and is the
safer option.
But it's better than that. We don't tell the cracker which
approach we use and different people use different approaches;
the choices are now increased by a factor of 12. To get these with any
efficiency a cracker needs three custom programmed dictionaries.
The first has all lower case, the second has the first, last, inner
and outer upper case combinations and the third all the other
mixed case combinations. That's likely to be a pain to program; if
we're lucky the cracker jumps from all lower case to full mixed
case.
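The four capitalization schemes and the either or both rule can be sketched as follows. This is an illustration in Python with hypothetical function names, not code from any cracking tool. It capitalizes one chosen letter in worda, in wordb, or in both.

```python
# (worda index, wordb index) of the letter to capitalize in each scheme.
SCHEMES = {
    "first": (0, 0),    # Bad Tuba
    "last": (-1, -1),   # losT baG
    "inner": (-1, 0),   # bolD Rug
    "outer": (0, -1),   # Hate raT
}


def cap_at(word, index):
    """Return word with the letter at index upper-cased."""
    letters = list(word)
    letters[index] = letters[index].upper()
    return "".join(letters)


def case_variants(worda, wordb):
    """Yield (scheme, choice, worda, wordb) under the either or both rule."""
    for scheme, (ia, ib) in SCHEMES.items():
        yield scheme, "worda only", cap_at(worda, ia), wordb
        yield scheme, "wordb only", worda, cap_at(wordb, ib)
        yield scheme, "both", cap_at(worda, ia), cap_at(wordb, ib)


variants = list(case_variants("bad", "tuba"))
distinct = {(a, b) for _, _, a, b in variants}
print(len(variants), "labelled choices,", len(distinct), "distinct pairs")
# 12 labelled choices, 8 distinct pairs
```

For a given word pair the single capital choices overlap between schemes: capitalizing only the first letter of worda is both a "first" and an "outer" choice. The sketch therefore reports eight distinct strings for the twelve labelled choices, so the factor of 12 is best treated as an upper bound.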
Let's review. There are 8.3 million word pairs. These are
combined with two non letters; both cannot be digits. There
are six patterns created by where the non letters are placed
relative to the words. We're using 12 capitalization variations.
This is 8.3 million * 42 * 32 * 6 * 12 which gives 809 billion
possible passwords. This is a lot better than dictionary derived
passwords, but if our encryption estimates are off by very much, or
if a cracker has lots of computing power or is willing to wait a
while (it's 94 days at 100,000 passwords per second), it's still
clearly within the realm of today's technology. If we really want
our passwords to be safe we have to do better, a lot better.
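Since the hashing rate is only a crude estimate, it helps to see how the search time scales with it. The sketch below multiplies out the factors from the review and converts the total to days at several rates. Only the 100,000 per second rate comes from the text; the other rates are illustrative assumptions, as is the exact pair count of 8,364,692.

```python
WORD_PAIRS = 8_364_692        # assumed exact count behind "8.3 million"
SEPARATOR_CHOICES = 42 * 32   # two non letters, never two digits
PATTERNS = 6                  # placements of the non letters
CASE_VARIATIONS = 12

TOTAL = WORD_PAIRS * SEPARATOR_CHOICES * PATTERNS * CASE_VARIATIONS


def days_to_search(total, hashes_per_second):
    """Days needed to hash every candidate at a constant rate."""
    return total / hashes_per_second / 86_400


print(f"total candidates: {TOTAL:,}")  # 809,434,515,456
for rate in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{rate:>12,} per second: {days_to_search(TOTAL, rate):8.1f} days")
```

An estimate that is off by one order of magnitude moves the answer between roughly nine days and two and a half years, which is why the rate matters more than any single factor in the product.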
Also, if decent frequency tables could be found for short
words (like the Census name lists), it would be possible to
build several smaller dictionaries and process the most
likely word combinations first. This could dramatically
alter the time it takes to get at least some of the passwords
created as we have described.
Alternative Manual Passwords
Among the discussions of how to create good passwords, there are
sometimes suggestions for creating passwords from the first
letters or initial few characters of words in sentences or phrases
that the user can remember. No specific suggestions were included
in the list of Common Password DO's because no single suggestion
is nearly as common as combining
multiple words with non letters and there is no easy way to
phrase such suggestions that won't result in a method that
can be programmed to create dictionaries to crack the resulting
passwords. Unless the recommendation also includes explicit
discussion of minimum password length and the use of mixed case
and non letters, there is a good chance resulting passwords may
fall to brute force attacks.
For the next few years, any method that helps a user remember
passwords that are of a reasonable length and character diversity and
do not contain dictionary words or simple transformations of
dictionary words, will likely create passwords of moderate to good
strength. The actual strength of the password will depend on the
actual length and character diversity, as well as the avoidance of
dictionary words. It's surprising just how obscure the result of
any two of the following transformations of a dictionary word is
likely to be: reverse, rotate, keyboard shift, collating sequence
shift, drop or add a character. In other words, passwords created by
these methods look almost random to a human, but are easily found by
the cracking tools. To be fair, program generated passwords may
contain such sequences by chance, as may sequences
derived from phrases or sentences.
Passwords created via a personal algorithm that is more complex than
multiple transformations of a single dictionary word are likely
to be better than two short words and one non alpha character and
depending on the algorithm, as good or better than two short words
and two non alpha characters.
I'm going to suggest such a personal algorithm that deliberately
violates some of the more common negative advice regarding
passwords. A typical family has four or more members. A
starting point for passwords might be the first two to four
characters of the first and middle names of your family,
avoiding any sequences that are by themselves dictionary words.
Another component might be a variety of alphanumeric sequences
that represent birth dates of family members but avoid using the
"19" part of the year and the more common date formats and
sometimes mix in one or two character month abbreviations instead
of numbers. If you know the day of the week that family members
were born on, the day abbreviation might sometimes substitute. If
family members were born in different locations, city and state
abbreviations might form an additional part. Putting these
together in a variety of combinations would likely yield a number
of not easily crackable passwords.
For a hypothetical family similar to mine, these might yield the
following bits and pieces: ge geod ged geda gdav gda gd ph phi
phl phly plly lly phlly pl sy sylv syli sli lit sl wa wac wacr
wcr wcri cri wc 1222 de22 241222 dec22 d22 1224 2412 24d22 24de22
24dec22 0909 99 909 9928 90928 28909 280909 s9 se9 sep9 sep09 s28
se28 sep28 28s9 28se9 28sep9 28s09 28se09 28sep09 718 0718 j18
ju18 jul18 j55 ju55 jul55 55718 550718 55ju 55jul 55j18 55ju18
18ju55 18j55 47 047 0407 4762 040762 40762 a7 ap7 apr7 apr07 a762
ap762 6247 620407 62a7 62ap7 62ap07 62apr7 fbhin bhin fthin infbh
inftbh infth fnj fmnj mnj ftmnj monnj njft njftm njfm njftmo hpa
pah paha pahar harpa penn hpen hapen wwv wvw whwv whewv wvwh weva
wheewv.
Obviously, I wasn't completely consistent in terms of taking a
set number of letters or using correct abbreviations but each
piece is easily derived from names, birth dates and places of
somewhat hypothetical family members. One of the birth places
had a three piece name, parts of which were or were not used. If
you start combining these, especially if you start changing case
or using punctuation for separation there end up being quite a
few possibilities, most of which would not fall to a dictionary
attack. Most should be much easier to remember than arbitrary or
random sequences of similar length.
I deliberately created a sample approach to creating passwords that
violates some of the most common advice so that it won't be repeated
elsewhere. If someone picks up on this and starts making passwords
similar to these, based on their own family and are careful about
resulting dictionary words, they are likely to have secure passwords
especially if they add some personal variation to the method. If
they tell someone else how they make their passwords, the method
immediately becomes much less secure. If the method becomes
publicized as an example of how to make good passwords it becomes
still less secure.
If the cracker community believed that this had become a common
way for users to create passwords, then much of any value it may
have once had would be greatly diminished because the crackers
could start trying to program dictionaries of such passwords. If
enough information about a specific individual were obtained a
focused attack could be launched but building a general purpose
dictionary would seem more productive.
An attempt to cover all possibilities suggested by the method
under discussion would make the programmed dictionaries discussed
so far look small. The number of unique two to four character
sequences derived from all common first and middle name
combinations is several thousand. The number of possible birth
dates in all formats is huge; by focusing on the most common
formats and possibly limiting them to month and day without year
the number is greatly reduced. For cities and states, by focusing
on the couple hundred largest cities, plus cities and states in
the immediate geographical vicinity of the targeted computer, this
method is brought to a manageable size.
Regardless of the programmed dictionary details, exact pieces
included and total size, the results would almost certainly be
better than a brute force approach, that is assuming the method
became a reasonably common method for creating passwords. If
not, the efforts would be a complete waste, producing no positive
results with typical password hash database sizes. Since the
method depends on personally related information of the most
obvious type, it is unlikely to ever be widely repeated as a
suggested method for creating passwords. Still, used with
knowledge of how password cracking tools work, it is in reality
probably as good as any other manual method for creating
passwords unless the user happened to have a family biography
on-line from which all the pieces might be extracted.
My conclusion is that any method that creates passwords that the
individual can remember where the resulting passwords are
reasonably long and have sufficient character diversity and
cannot be derived from the transformation of a single dictionary
word or the combination of multiple dictionary words, is likely
to result in a reasonably strong password that is unlikely to be
cracked by today's or near future technology. The more widely
known any such method is and the more specific its instructions
on creating passwords, the poorer it becomes. For these reasons,
I won't repeat any advice related to creating passwords from
sentences or phrases. Any specific password or passwords used to
illustrate the use of any method for creating good passwords are
immediately and forever after poor passwords.
Humans Are Not Good Password Generators
People think much as they did in the early 70's and
create their passwords from familiar parts while computing
power increases exponentially. It's only a matter of time until
human created passwords don't stand a chance against automated
cracking tools.
By the very nature of the way our minds work, humans are not good
password generators. Even well trained and intelligent people
who understand the security issues of computer passwords, can't
help but fall into predictable patterns and repetition when they
create passwords manually. Perhaps before the time that computers can
reliably crack any human created passwords, all computer
authentication will have become biometric or some other non
password based approach. I expect that the disappearance of
passwords will be like the arrival of simple and reliable voice
recognition, one of those things that keeps taking longer than
everyone thinks it should.
We want passwords that are easy to remember and naturally select
words and character strings that come to mind quickly. Even if it
were our goal, humans can't mentally create random character
sequences. Where people have shown themselves to be inventive
regarding passwords is in finding ways to make easily cracked
passwords that effectively circumvent system imposed limits that
attempt to enforce strong passwords. For example, "Attack1" has
mixed case and a digit, thus includes three of the four
character types, and yet will fall to all three cracking tools
discussed here.
As an experiment, I timed how long it took to think of and write
20 "good" passwords, similar to two short words and one or two
non letter characters, but trying to avoid using actual words. It
was just under 10 minutes. There were nine words and would have
been more if I hadn't stuck an extra letter on some when I
realized I'd written a word. I understand that random does not
mean an even distribution and especially with small sample sizes
you shouldn't expect an even distribution of characters. I
suspect however that 18 t's when there were 3 letters that were
used only once and 5 only twice was not a product of random
variation but the start of a pattern. I also suspected that 12
a's and u's versus 4 e's might be part of a pattern. At the rate
I was working it would have taken a little under a year, without
stopping for food or sleep to create a million passwords.
If we make them pronounceable and thus relatively easy to
remember there is a strong tendency for the result to be real
words. Any password that we have to memorize and think through
character by character without a pattern or pronounceability to
help us is just too hard to remember for practical purposes.
Computers on the other hand are ideally suited for generating
lots of good passwords. I had an early version of the password.pl
password generator create a number of one million password
lists. Each took a few minutes on 500 MHz computers. With
default settings there were fewer than 30 duplicates within a
million password list and slightly fewer between lists. By
making minor changes to the settings, thus making the passwords
somewhat harder and drawn from a larger universe of possibilities,
one million password lists could be generated with no duplicates.
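The duplicate check described above is easy to reproduce for any generator. The following Python sketch does not reproduce password.pl or its settings. The character set, password length and function names are assumptions chosen only to show how a batch can be generated and its duplicates counted.

```python
import secrets
import string

# Hypothetical character set; password.pl's actual settings are not
# described in the text.
ALPHABET = string.ascii_letters + string.digits


def generate_passwords(count, length=8, alphabet=ALPHABET):
    """Return a list of random passwords drawn with a cryptographic RNG."""
    return ["".join(secrets.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def count_duplicates(passwords):
    """Number of entries that repeat an earlier entry in the list."""
    return len(passwords) - len(set(passwords))


batch = generate_passwords(10_000)
print(len(batch), "generated,", count_duplicates(batch), "duplicates")
```

Duplicates within a list are a rough indicator of how small the generator's universe is: the larger the set of possible passwords, the less often two draws coincide.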
Copyright © 2000 - 2006 by George Shaffer.
This material may be distributed only subject to the
terms and conditions set forth on
http://GeodSoft.com/terms.htm.
These terms are subject to change. Distribution is subject to the then
current terms, or at the choice of the distributor, those defined in a
verifiably dated printout or electronic copy of
http://GeodSoft.com/terms.htm at the time of the distribution.
Distribution of substantively modified versions of GeodSoft content is
prohibited without the explicit permission of George Shaffer.
Distribution of the work or derivatives of the work, in whole or in part,
for commercial purposes is prohibited unless prior permission is
obtained from George Shaffer. Distribution in accordance with these
terms, for private, unrestricted and uncompensated public access, non
profit, or internal company use is allowed.