Good and Bad Passwords How-To
A Review of Some Automated Password Generators
Automated Password Generators
Today, small dictionaries or common password lists get a
significant percentage of passwords. Ways to extend and improve
the cracking dictionaries were reviewed. When the limits of
improving the dictionaries and rules are reached, we've shown how
to build front end generators to replace the cracking tools' low
yield, brute force methods. These build passwords along the lines
of standard good password recommendations. The yield should be
much better than brute force. As computing power increases,
password cracking tools should be able to generate ever larger
numbers and variations on what have generally been regarded as
strong passwords.
The response to automated cracking methods lies in letting
computers create passwords for us. An automated password
generator can almost instantaneously create as many strong
passwords as we want. If such a tool is properly configurable it
should be able to generate passwords with any degree of structure
or randomness as is desired. The default control pattern in the
current password.pl generates passwords from 147 trillion
possibilities. In line with the large set, many of these passwords
are more difficult than most users and administrators are used to
dealing with. There are alternative control patterns and option
settings that create more pseudo-word-like passwords, which will be
discussed later.
State Department Passwords
In the early eighties the State Department built an automated
password generator as part of their Controlled User Environment.
This generator did not generate random passwords but highly
structured passwords that were easy to remember. Truly random
sequences of the 95 printable ASCII characters become
difficult to remember and type even at 4 characters, and beyond
6 characters very few normal humans can remember a truly random
sequence, at least not without substantial effort and frequent use.
The State Department analysts realized the problem of truly
random passwords and recognized that easy to remember
passwords could be built from three small, easy to remember
pieces. The State Department passwords conformed to a very
specific pattern, CVC99CVC or consonant vowel consonant, digit
digit, consonant vowel consonant.
The letters were
all upper case. This looks too simple to have a significant
number of passwords but the arithmetic is quite simple: 21 * 5 *
21 * 10 * 10 * 21 * 5 * 21 = 486,202,500. That's nearly 500
million passwords, but that's only slightly larger than the word,
non letter, word pattern that we worked through in about three
hours on a now very obsolete PC. This was fine for the mid 80's
but not adequate for the twenty first century.
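The arithmetic above is easy to check. The following is a minimal Python sketch, not the CUE generator itself; the names POOLS, universe_size and cue_style_password are hypothetical and exist only for this illustration.

```python
import secrets

# Character pools for the CVC99CVC pattern described above.
POOLS = {
    "C": "BCDFGHJKLMNPQRSTVWXYZ",  # 21 upper case consonants
    "V": "AEIOU",                  # 5 upper case vowels
    "9": "0123456789",             # 10 digits
}


def universe_size(pattern="CVC99CVC"):
    """Multiply the pool sizes for each position in the pattern."""
    size = 1
    for code in pattern:
        size *= len(POOLS[code])
    return size


def cue_style_password(pattern="CVC99CVC"):
    """Pick one character per position from the matching pool."""
    return "".join(secrets.choice(POOLS[code]) for code in pattern)


print(f"{universe_size():,}")   # 486,202,500
print(cue_style_password())     # one random CVC99CVC password
```

Python's secrets module is used here rather than random because it draws from the operating system's cryptographic random source.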
Because of the consonant vowel consonant sequence, the alphabetic
sequences are fairly easy to pronounce. Anyone can remember two
digits. Most of these eight character passwords are really quite
easy to remember which is a significant accomplishment given that
the password universe is several hundred times larger than the number
of words in the English language. Of course a user only needs to
remember his or her password, not the nearly 500 million that are possible.
From the opening comments of my first command line password generator,
to these pages and the CGI password generator that I wrote in 2000, to
the current pattern based generator, I've always acknowledged
the State Department's Controlled User Environment (CUE) password generator
as the inspiration for those I've written. What I did not realize
until mid 2012, as I was reviewing and updating some pages and programs
on the site, was just how central the CUE password generator has been
to my thinking about passwords. One could easily say it is the
figurative cornerstone of my password thinking.
The analysts and programmers who developed the CUE password generator
have never been acknowledged. I never saw a line of CUE code or met anyone
who worked on the project. I don't know if one inspired individual
thought of and wrote the generator or if it was a small team. Whoever
it was, I, and any I may have influenced, owe them a
debt of gratitude. I could never have written any of the generators I
have written if I had not used the CUE password generator. In my opinion,
apart from the CUE password generator and my own generators, which are
its direct descendants, I had never seen a decent, let alone a good,
password generator. Since 2012 Mark Burnett's Pawfert needs to be
included; he wrote it long before that but I did not see it until then.
With three exceptions, nearly every generator I've seen (not inspired
by mine) seems to be obsessed with randomness and entropy, with no regard
to usability. These produce the worst passwords I've seen. Learning a truly
random 8 character password from the 95 typeable characters requires an
excessive amount of effort followed by continued frequent use. Now that
passwords really should be at least 12 characters long, it's insanity.
This guarantees not only that they will be written down (which, as I
explain elsewhere, is not necessarily bad) but that they will be kept in
an immediately accessible and usually obvious location. The favorite spot
is a post-it on the monitor. As most security breaches, at least
historically, have been internal and not network or external, this
immediately makes these random passwords worse than the number one
favorite password, "password", and obvious and trivially crackable ones
like Attack1. At least most systems can lock out people who sit
down and try to guess these, and someone may come and investigate.
If you don't follow passwords, you probably don't know
what is most popular. On the other hand, an Internet search for "common
passwords" turns up a lot of results. Unfortunately this interest seems
very superficial. It seems a fair number of people are interested
in what the #1 password, or the top 10 or top 25 are. The latter seems
to have been spurred on by an Internet security firm's release of a top
25 list late in 2011. A very different list was released by another firm
which listed it as the top 25 used in enterprises. I hate these lists.
So much of the talk is along the lines of: if your password is on this list,
you need to change it. The implication is often that if yours is not
on the list, then it's probably OK.
What BS. If your password is in any top 10,000 password list, any list
of compromised passwords ever released, any dictionary, book, movie, or song,
or has been published anywhere in print or on the Internet, you need
to change it just as quickly. Any capable cracker is interested in at least
the top 600 quadrillion (that's all 9 character passwords made of all 95
typeable characters on most English keyboards), and really serious crackers
are interested in much larger lists. All crackers are interested in any
password that appears in any list they can get their hands on; they know
if anyone has ever even seen it before it is MUCH MUCH more likely to
be a password than any random 8 character string.
Mark Burnett responded to the previous paragraph with "About your comments
on top password lists, I thought I would defend them some. Of course the fact
that your password isn't on that list doesn't guarantee that it is strong, but
these lists are important nonetheless.
It is important to note that if you took all the unique passwords from all my lists,
there are only about 2.3 million total unique passwords that I have seen. But if you
take that further, there are only 553,956 passwords that have shown up on the list more
than once. Despite how many thousands of passwords I add to that list, the number of
unique passwords grows very slowly."
He continued "What that means is that most people select passwords that are on
that list [his 10,000 Top Passwords list] and very few people select passwords
that don't already appear on there at least once. This is definitely illustrated
by the fact that 93% of those showed up on the linkedin leak. So you definitely
cannot disregard a list that represents such a large percentage of everyone's
passwords."
According to Mark Burnett's statistics, which are clearly supported by the
LinkedIn leak he referred to, 91% of all known passwords appear in his Top 10,000
list. This isn't just most but a huge, an overwhelming majority. I think he
may be suggesting that, as with all other people, crackers are not all equally
competent; not all have the time, knowledge, or resources I alluded to when
I said 600 quadrillion. If your password is on any common password list you
are just giving it to every cracker. If it's not on any of these lists,
especially his long one, which is far more comprehensive than any other
common password list released to date, you at least have some chance of not
getting cracked if a system you have an account on is compromised.
He has a point. It actually takes about 10,000 times the
power of a fast desktop to bring all 9 character passwords into a practical time
frame (just about 1 month). This probably takes it out of the realm of
most traditional "enthusiast" crackers and into the realm of organized
crime and government agencies.
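The scale involved can be worked out directly. This is a rough Python sketch of the arithmetic only; the 30 day month and the even split across 10,000 machines are simplifying assumptions, and the per-machine rate it prints is an inference from the figures above, not a measured benchmark.

```python
CHARSET = 95          # typeable characters on most English keyboards
LENGTH = 9            # password length under discussion
MACHINES = 10_000     # "10,000 times the power of a fast desktop"
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30 day month

total = CHARSET ** LENGTH                 # every 9 character password
rate_needed = total / SECONDS_PER_MONTH   # guesses per second overall
per_machine = rate_needed / MACHINES      # implied rate for one desktop

print(f"{total:,} possible passwords")
print(f"{rate_needed:,.0f} guesses per second to try them all in a month")
print(f"{per_machine:,.0f} guesses per second per machine")
```

The total comes to a little over 630 quadrillion, and the implied single desktop rate is on the order of tens of millions of guesses per second.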
I was thinking that people should be making passwords that will protect
their bank account, credit card, investment, and other accounts that may be
the target of organized crime and where it could really hurt if a criminal
gained access to such an account. There are suggestions and claims and
more claims that a single desktop-like computer equipped with the right
Graphics Processing Unit (GPU) and cracking software designed to use the
GPU can crack passwords perhaps 100 times faster than the times listed in
my most recent password cracking times table or the 2012 defaults set in
the Crack Time Calculator.
What should a password not on a common password list look like? I'd say first
consider what Mark Burnett has to say about pass phrases versus
passwords. An easy to remember and type pass phrase may actually be faster to
type and stronger than a conventional "strong" password. If you don't like
this approach, look at what can be done with my password generator. (I am
working on a much more flexible version, but it still has obvious bugs.) For
important or sensitive accounts you should seriously consider 12 to 14
character passwords if you want a margin of safety over what we know can be
cracked by a well equipped malicious organization.
If you use a real word in a password, remember that no matter how many
changes you make to it, it's still going to look like the same word to most
crackers and cracking tools. These have been developed for more than 25 years
to find every change to a word that anyone has ever thought of. Cracking tools
do nothing better than find the real word in a mangled word. Think of that
piece of your password as already known to the crackers. It's what you do
with the rest of your password that will give it away or save it. If you
start the password with your word, uppercase the first letter, and add some
numbers and/or letters to the end, you've probably given your password away.
If you start your password with something else, your chances improve. If you
uppercase the last letter, or any other letter but the first, your chances
improve. Crackers are great at finding changes to a word, including dropped
letters, especially vowels. One change they are not good at is finding multiple
non letters ADDED somewhere near the middle. If you add a
letter there is always a chance that the new string is actually a
different rotated, shifted, inverted, keyboard rotated, or keyboard
shifted word you cannot recognize, but a cracker will very likely
spot during a dictionary attack.
If you must use a word, be sure it is LESS THAN two thirds
of the total password length: 5 characters in 8 (too short),
5 in 9 (still too short), 6 in 10 (marginal), 7 in 11 (fair),
7 in 12 (potentially good), 8 in 13 (potentially strong), 9 in 14
(likely very strong). It's not hard to calculate password
strength (or relative strength). The size of the character set
raised to the power of the length of the password tells how many
possible passwords there are. You get 26 characters for using
lower case letters, and 52 when you add upper case to lower case,
but the upper case doesn't count if it's the first letter of the
password or of the word in the password. You get 10 more characters
when you add a digit, but don't use a 1. The digit "1" accounts for 43%
of all digits used in passwords, so you don't get to count digits unless
you use at least one that's not a "1". If you use a punctuation
mark or symbol, you get to add 33 more for 95 total. The
strength is then determined by dividing the total number
of passwords by the estimated cracks (or encryptions) per second
to determine how long it would take to crack all passwords.
On average, a single password will take half that time. If you've
made a common mistake that allows a dictionary attack to work,
the strength calculation is meaningless because a much, much faster
way to find it is available, and it's the cracker's preferred method.
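The counting rules in the preceding paragraph can be written down as a short Python sketch. The function names and the guesses_per_second parameter are hypothetical, and the way the rules are encoded (for example, treating anything that is not a letter or digit as a symbol) is one reading of the text, not a definitive implementation. It says nothing about dictionary attacks, which, as noted, make the result meaningless when they apply.

```python
def effective_charset(password, word=""):
    """Character set size under the counting rules described above."""
    lowered = password.lower()
    word_start = lowered.find(word.lower()) if word else -1
    # An upper case letter earns nothing at the start of the password
    # or at the start of the embedded word.
    discounted = {0, word_start}

    has_letter = any(ch.isalpha() for ch in password)
    has_lower = any(ch.islower() for ch in password)
    counted_upper = any(
        ch.isupper() and i not in discounted for i, ch in enumerate(password)
    )

    size = 0
    if has_letter:
        size = 52 if (has_lower and counted_upper) else 26
    # Digits only count if at least one of them is not "1".
    if any(ch.isdigit() and ch != "1" for ch in password):
        size += 10
    # Assumption: any character that is not a letter or digit is a symbol.
    if any(not ch.isalnum() for ch in password):
        size += 33
    return size


def seconds_to_exhaust(password, guesses_per_second, word=""):
    """Time to try every password of this length and character set."""
    total = effective_charset(password, word) ** len(password)
    return total / guesses_per_second


# "Attack1" earns only the 26 lower case letters under these rules.
print(effective_charset("Attack1", word="attack"))
# Hypothetical rate of one billion guesses per second; halve for the average.
print(seconds_to_exhaust("ab#Cd7ef!x", 1e9) / 2 / 86400, "days on average")
```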
One good way to start a password is with a symbol you never type. One that
you don't even know what it is called or used for is often good.
It's not expected. What is expected is upper casing the first
letter of a word and adding 1 to 3 digits to the end. If the password
is not simply a word, ninety some percent of the population
does just that. The strength of all such passwords is ZERO.
That's what crackers look for first.
Returning to the State Department's password generator, what did it
teach us? Primarily that good passwords need to balance security
and useability. In their day the CUE passwords were adequately
secure. Secure enough, that as recently as 2012 I still had one or
two on important accounts that almost matched the State Department's
password structure. Most of the patterns I use allow significant
variability in the generated passwords. Occasionally I select one
that is weaker than it should be. In addition to being strong enough
for the mid 1980's the CUE passwords were among the easiest decent
passwords I've ever seen to remember.
For most purposes it's good that technology improves, but this is not
true for passwords. People's memory and intelligence have not changed
as computers have progressed. Some would argue that education is not
as good as it used to be. We are incapable of creating anything by
mental processes that approximates randomness. We are excruciatingly
slow at creating passwords that are at all strong and still reasonably
easy to remember.
What I've tried to do over time is to extend the philosophy behind the
CUE passwords to ever larger universes of possible passwords.
My first generators clearly worked
off of the original CUE structure. But I added variations, most of which
had some adverse effect on useability or memorability. I think this
negative effect was well within the acceptable range. In a sense it had to
be acceptable; 3 small easy to remember pieces simply cannot provide adequate
security in the second decade of the 21st century. The other approach, which I am
fully open to, is fully pronounceable, all lower case passwords. These simply
require more length than the standard for the U.S. Government's
Automated Password Generator was willing to consider.
I question if
anyone can remember strong passwords they do not use for months at a time.
Any password produced by any of my generators that I use several times a
week, I learn to remember without any aids. When I create a new password I expect
to use with some regularity, I use it right away, multiple times.
I think beyond purely conscious thought, there is muscle memory at work
with frequently typed words, including passwords.
I was very pleased to learn in 2012 that upper case letters are almost
non existent at the right end of passwords. As long as I've been using
mixed case in passwords, I've considered the tail end of an
alpha string about as good, and just as likely in practice, to have an
upper case letter as the front. The sample strings I've presented have
often favored capitals at the front of alpha strings, acknowledging a
common user preference; the actual passwords I use have rarely if ever
been based on patterns and defaults included in my public samples.
As different patterns can create the same passwords, it's possible
passwords I've used could have also been created by one of my default
patterns. On the other hand, I have yet to see any password I've used
in the past 15 years ever appear in any list of common or cracked
passwords.
My first versions of password.pl are discussed below.
With the current pattern based generator I departed quite significantly
from CUE style passwords, while keeping descendants and remote descendants
an important part of the password generator. The "w"
and "e" characters represent the extreme examples of consonant development.
The "w" stands for all single consonants plus the 2 and 3 character consonant
sequences that often start English words. "e" is the end of word equivalent.
The multi character strings in "w" rarely work well in the middle of an
alpha string, but the "e" sequences usually do. The "W" and "E" variations
may include a capital letter. If a capital appears it can only be the first
character of a W or the last character of an E. Unless you want gibberish,
W's are only placed at the beginning of an alpha string and E's at the
end (or lower case e's in the middle). In the newest version I'm working on
"w" and "W" are used for words from electronic dictionaries and "B" and "b"
replace the w's for word beginnings.
The most extreme extension of "cvc" that I have made is
"Wvv0ev2evv0E". I believe this is the first time (June 2012) I've publicly
shown how far the “cvc” approach may be extended. Theoretically this
could produce an 18 character alpha string with 4 consonant groups
separated by 3 vowel pairs. The consonant
groups can be 1 to 3 characters but are overwhelmingly 1 or 2.
There are only a very few 3 character strings compared to 2 character
strings in w's and e's. It is highly unlikely a string over 14 characters will
actually be created. The shortest string this pattern can generate is 7.
Though many are not initially obvious, I'd estimate 80 - 90% of the
resulting strings are pronounceable.
Unlike the first password generators which were
limited to specific variations on CUE style passwords, the pattern based generator
has always been open ended in the structures that could be created. Users could use any
samples provided, modify samples, or develop new patterns of their own design.
Almost any kind of structure could be defined. Besides the pattern characters,
the users could control most of the important odds, and force certain character
variety (mixed case, digits, symbols), as long as the pattern letters included
the character types that were being forced.
Any pattern developed by a user, with all user changeable
settings, could easily be saved just by bookmarking the page with the
desired settings; this works because the password generator uses GET
rather than POST requests and all user settings are saved in the query part
of the bookmarked URL. Over time, the pattern based generator has had its
flexibility and user control extended multiple times. In the near future I
hope to release a major upgrade that significantly extends its capabilities.
The main danger is that it may have become so complex that no one can really
understand how the options are intended to be used. The biggest advantage
is that even if the cracker knows what password generator is being used, there are so
many variations that the cracker has little chance to match specific settings
to a specific user. Even if this can be done, the universe of possible
passwords may be so large, it gives the cracker only a small advantage.
The goal of a password generator should be to provide passwords that have some structure,
cues, or clues that aid human memory, while not giving a cracker who may suspect which
generator is in use sufficient information to create programmable dictionaries that would
greatly reduce the workload compared to a pure brute force attack.
I hope that the
GeodSoft password generators have accomplished this by providing multiple user changeable
core patterns (each user modifiable), each of which generates a few to many different
specific patterns, and, with the pattern based generator, by giving the
user the ability to define entirely new patterns. Many of the control patterns
overlap in various ways. The cracker not only has to figure out how to program all the
patterns, he also has to efficiently eliminate the identical results produced by
different patterns. As the potential password universe has greatly expanded beyond
the ability of hard disks, let alone RAM, to hold them, this is very much not a
trivial problem. The original password.pl may not have met these goals, but it was
replaced within a month of when I first submitted GeodSoft.com to the major search
engines. By the time GeodSoft.com was getting its first search engine placements
the first pattern based password generator was in place. I believe it does meet these
goals. And for those who choose to develop new password generators, based on either
the original command line version or the similar web version, who is to predict
what choices they have made, unless they share the new code publicly?
Of the three password generators that have clearly attempted something other than
random character strings, the one I know the least about is Mark Burnett's Pawfert.
Based on screen shots and the little I've read about it, I think I'd likely consider
it a good generator. I have not yet gotten a working copy. I have a Windows laptop
it should run on but my firewall settings are so restrictive they seem to have
blocked every attempt to download this from three different sites that have it. I
have not had the time to figure out what I need to do to relax the rules, at least
temporarily. I've never had a problem downloading a file before, but the sites I've
tried are JavaScript rich and seem to be Windows specific sites. I mostly use and
work on Linux and OpenBSD PCs.
U.S. Government Automated Password Generator
After the State Department CUE password generator, the U.S.
Government developed a password standard that was specifically
designed to create pseudo random letter sequences that were fully
pronounceable by those
who speak English. The character sequences are tested against
pronounce-ability tables as they are built. As soon as an
unpronounceable result is obtained, the password is discarded
and the process restarted. The standard excludes the use of
mixed case or any characters except letters.
There are supposed to be approximately 1.6 trillion pronounceable
ten character sequences. The authors of the standards thought
this was adequate for any purpose. Longer passwords would be
harder but these are not discussed in the standard. The standard
was finalized in late 1993 and was to go into effect March 1994.
Here the U.S. Government appears to have erred on the side of
user friendliness. It seems clear they did not consider desktop
computers. Windows was the dominant desktop system by the time
the standard was to go into effect. Though Windows 3.1 did not
have passwords, the LANMAN password hashing scheme had been in
use a few years. This was to be used by Windows 95, which was
released the year after the U.S. APG standard was to go into effect.
Professionals working on a government standard should be aware of
products in the development pipeline, including the serious
deficiencies in them. Anyone familiar with the LANMAN password hashing
technique and how it compared with other systems at the same time
knows it was fundamentally deficient.
1.6 trillion seems to have been marginal when the standard was adopted
and is simply inadequate today for anything of particular value. It
only takes 185 days at 100,000 passwords per second (2001) to do all
possibilities. This is not likely to be a major problem for
anyone with significant resources to apply to the problem.
Further, given the nature of the standard, a cracker knows
exactly what character set to work with for any site that employs
the standard, making such sites perfect targets for brute force
attacks. If a site using this standard experienced a breach in
which accounts and password hashes were compromised, it seems
almost a foregone conclusion that all passwords would be cracked.
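The 185 day figure follows from simple division, sketched here in Python. The function name is hypothetical, and the second rate is only a hypothetical "100 times faster" comparison, not a benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_exhaust(universe, guesses_per_second):
    """Days needed to try every password in the universe at a steady rate."""
    return universe / guesses_per_second / SECONDS_PER_DAY


apg_universe = 1.6e12   # approximate count of pronounceable 10 character strings

print(round(days_to_exhaust(apg_universe, 100_000)))        # 185
print(round(days_to_exhaust(apg_universe, 10_000_000), 2))  # hypothetical 100x rate
```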
Due to how NT stores password hashes,
all alphabetic passwords are a bad joke. NT and all later Windows versions,
through at least 8.1, have
included the deficient LANMAN hashing scheme. As of Vista, permanent
storage of these defective password hashes is turned off by default, but may
still be turned on by a simple registry change. In addition, these defective
password hashes are stored in the memory of all current Windows computers except
domain controllers and perhaps those which use Active Directory and Kerberos.
Hacker tools exist that allow these password hashes saved in memory, to be
used in attacks on any computer which provides networked services to
these already compromised computers. If an NT system is
compromised so that the password hashes are obtained, a fast
desktop system should be able to crack all possible pure alpha
passwords in less than half an hour (as of 2001). Increasing the password
length to NT's maximum of 14 would make no difference.
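The reason length does not help is that LANMAN upper cases the password, pads it to 14 characters, and hashes each 7 character half independently. The Python sketch below shows only that preprocessing and the resulting arithmetic; it does not compute the actual hash, the function name is hypothetical, and the printed rate is simply what "less than half an hour" implies.

```python
def lanman_halves(password):
    """Upper case, pad with NULs to 14 characters, split into two halves."""
    padded = password.upper()[:14].ljust(14, "\0")
    return padded[:7], padded[7:]


# Each half is attacked separately, so a pure alpha password never
# offers more than 26**7 possibilities per half, whatever its length.
alpha_half_universe = 26 ** 7

print(lanman_halves("abcdefghij"))
print(f"{alpha_half_universe:,} candidates per half")
print(f"{alpha_half_universe / (30 * 60):,.0f} guesses per second covers a half in 30 minutes")
```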
It seems to me someone put end user ease of use way ahead of security here.
PWGen
I've very recently (mid 2012) come to realize that I should consider a nod
of approval to PWGen, an open source product that attempts to
balance security and useability. When I first saw a list of PWGen
default passwords, my reaction was that they were clearly not random
alpha numeric strings, but they did not appear to be pronounceable
and certainly had no discernible structure as most of mine do,
which might help a user remember a specific password.
Even the documentation says the passwords are not pronounceable and
are not expected to become so.
On additional study however, I concluded that nearly all alpha strings
were pronounceable. In a file of 1000 twelve character passwords I found
every q whether upper or lower case to be followed by a u. In addition
the letters in each run appear in approximately the same frequency positions. Where
there is a large delta between one character and others it appears in the
same position from one run to the next. Other characters with close counts
switch places from one run to the next. There also seems to be a very rough
approximation to the character use distribution in the English language.
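Checks like these are easy to repeat on any list of generated passwords. This is a small Python sketch of the two observations above (q followed by u, and the ranking of letters by frequency); the function names are hypothetical and it assumes the passwords have already been read into a list of strings.

```python
from collections import Counter


def q_without_u(passwords):
    """Return the passwords containing a q (either case) not followed by u."""
    offenders = []
    for pw in passwords:
        lowered = pw.lower()
        for i, ch in enumerate(lowered):
            # A q at the very end of a password also counts as an offender.
            if ch == "q" and lowered[i + 1:i + 2] != "u":
                offenders.append(pw)
                break
    return offenders


def letter_ranking(passwords):
    """Letters ordered from most to least frequent, ignoring case."""
    counts = Counter(
        ch for pw in passwords for ch in pw.lower() if ch.isalpha()
    )
    return [letter for letter, _ in counts.most_common()]


sample = ["Quiet7abc", "aqb", "endq"]
print(q_without_u(sample))      # ['aqb', 'endq']
print(letter_ranking(sample)[:5])
```

Comparing the output of letter_ranking across two separate runs of a generator shows which letters hold their position and which swap places.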
When the security and symbols (including punctuation) options are used together
PWGen's output looks like any random password generator. Interestingly, PWGen
still defaults to 8 character passwords with no symbols or punctuation,
even when the security option is chosen. If you want anything but 8
character passwords you have to tell PWGen. Still it is a tool that with
the passwords lengthened to 12 or so characters and symbols added, but
the security option NOT used, can create passwords that may be
reasonably easy to remember but still acceptably secure. To know if
this security is real or an illusion, we need to know how good John
the Ripper's programmed dictionary is for passwords of a reasonable
length. John the Ripper's programmed dictionary specifically targets
PWGen. If PWGen has to fall back on fully random passwords for security
then it's no better than any other random password generator.
First password.pl
The idea behind my first (and subsequent) password.pl
was the State Department style passwords. I added some
configuration options for multiple purposes. I wanted variations
that would allow for increased complexity and thus a larger
universe of possible passwords. Since I gave the
command line Perl source code away
from the very beginning, I wanted sites to be able to make
adjustments suitable to their own needs. I assumed if they had
a web site and a competent Perl programmer, they could easily
add the password generator to their site if they desired,
especially if it were for internal use only. Beginning in late May 2012
I started offering the source for a slightly enhanced
CGI (web) version. The generated passwords
could be made easier or harder or more or less consistent than
the default settings. More important, sites could make
adjustments to the algorithm by changing several constants
or even the program logic. Even if a potential intruder knew that
a site he or she was attacking used password.pl, and had his or her
own copy, there would be no way of knowing that the passwords the
intruder saw looked like those being generated at the targeted
site.
If just the first consonant of each alpha sequence (in the State
Department default pattern) is randomly allowed to be upper case,
the number of passwords jumps to just under 2 billion. If the upper
casing is truly random then approximately one fourth of the
generated passwords will still be all lower case. If program logic
discards the all lowercase passwords and regenerates a new password
until there is one upper case letter, then the password population
is actually reduced by one fourth even though the resulting
passwords look more complex (and are better passwords).
Password.pl's first letter upper option forced at least one of
the leading alpha characters to be upper case. Both could be;
there will always be lower case letters in other positions.
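The effect of forcing an upper case letter can be seen in a short Python sketch. It covers only the counting and the regenerate-until-acceptable step for the two leading consonants; the names are hypothetical and this is not the password.pl code.

```python
import secrets

# Size of the single case State Department universe (CVC99CVC).
BASE = 21 * 5 * 21 * 10 * 10 * 21 * 5 * 21

# Each of the two leading consonants may independently be either case.
optional_upper = BASE * 2 * 2
# Discarding all lower case results removes one case combination in four.
forced_upper = optional_upper - BASE


def forced_case_flags():
    """Return (first_upper, second_upper), retrying until one is True."""
    while True:
        flags = (secrets.randbelow(2) == 1, secrets.randbelow(2) == 1)
        if any(flags):
            return flags


print(f"{optional_upper:,}")   # 1,944,810,000
print(f"{forced_upper:,}")     # 1,458,607,500
print(forced_case_flags())
```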
An additional variation on the original State Department password
pattern is to open up the digit positions to symbols.
If all are allowed in either position, then the number of possible
passwords with the pattern CvcnnCvc, where the n's are any non
letter, is over 34 billion.
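The "over 34 billion" figure can be reproduced as follows. This Python sketch assumes the non letter set is the 10 digits plus the 32 ASCII punctuation characters (space excluded), which is the count that matches the stated total; the variable names are hypothetical.

```python
import string

# Assumption: 10 digits + 32 punctuation characters, space not counted.
non_letters = len(string.digits + string.punctuation)   # 42

# "Cvc": leading consonant in either case, then vowel, then consonant.
alpha_group = (21 * 2) * 5 * 21

total = alpha_group * non_letters ** 2 * alpha_group
print(f"{total:,}")   # 34,306,448,400
```

Counting the space as a 43rd non letter would raise the total to just under 36 billion.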
With any password generator that pseudo randomly builds passwords
to match defined patterns, there will be instances where limiting
the displayed passwords by forcing a more complex appearance
reduces the possible number of passwords. Here we are looking
at two character positions that can be a digit or
symbol. If all possibilities are allowed, there will be more
possible passwords but there will be combinations that are all
numbers or all symbols. Users are likely to pick the easier all
numeric combinations. This is identical to the choice discussed
when two words were being combined with two non letter characters.
The issue is still the same with longer character sequences.
The original password.pl actually did something quite different.
There was a variable called symbolOdds. At 0, two digits were
always output. At 1 there would nearly always be only a single
digit. As it increased to 10 there was an increasing chance that
there would be two digits or a digit and a symbol. At 10 there
were always 2 non alphabetic characters. At least one was always
a digit with about a 50% chance the other would be a symbol. When
there was both a digit and a symbol they could occur in either
order. When I wrote the original, I thought two symbols was just
too difficult, so I created a deliberate bias toward the
easier passwords. The current version has no such biases.
An additional option randomly added an extra consonant following
each consonant, but more than two were never added in the whole
password. The resulting character pattern could be represented by
"Cc0vcc0nn0Cc0vcc0" where "c" is a consonant and "C" is an
optionally upper case consonant, "v" is a vowel and "n" is a non
alphabetic character. A "0" indicates the preceding character
may or may not be present. Finally a mixed case option could
override the first upper behavior and pseudo randomly mix the
case of all letters. Any generated password that
happened to have all lower or all upper case letters was
discarded and regenerated.
The original password.pl default behavior generated a mixture of
passwords ranging from 7 to 10 characters, with two alpha
sequences of 3 to 5 characters and either or both of the leading
characters upper case. The alpha sequences were separated by one
or two digits, or a digit and a symbol with the digit and symbol
in either order. In any batch of ten passwords there were
usually one or two very easy ones ("Cvcdcvc" where "d" is a
digit) and a few suitable for an admin on an important system.
The configuration options allowed anything from very easy to too
hard to be practical.
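The default behavior described above can be sketched in a few lines
of Python. This is a loose illustration, not the original Perl: the
function names, the symbol set, and the even odds used for every
optional element are assumptions made for the example, and the
symbolOdds setting is not modeled at all.

```python
import secrets

CONSONANTS = "bcdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"
DIGITS = "0123456789"
SYMBOLS = "!#$%&*+-=?@"  # assumed symbol set, not taken from password.pl


def coin():
    """Return True or False with even odds (an assumed probability)."""
    return secrets.randbelow(2) == 1


def sketch_password():
    """Build one password shaped like Cc0vcc0 nn0 Cc0vcc0."""
    extras_left = 2  # never more than two extra consonants per password
    groups = []
    for _ in range(2):
        letters = [secrets.choice(CONSONANTS)]
        if extras_left and coin():
            letters.append(secrets.choice(CONSONANTS))
            extras_left -= 1
        letters.append(secrets.choice(VOWELS))
        letters.append(secrets.choice(CONSONANTS))
        if extras_left and coin():
            letters.append(secrets.choice(CONSONANTS))
            extras_left -= 1
        groups.append(letters)

    # Upper case the leading character of either or both alpha sequences.
    caps = secrets.choice([(True, False), (False, True), (True, True)])
    for letters, make_upper in zip(groups, caps):
        if make_upper:
            letters[0] = letters[0].upper()

    # One digit, optionally joined by a second digit or a symbol,
    # with the pair in either order.
    separator = [secrets.choice(DIGITS)]
    if coin():
        separator.append(secrets.choice(DIGITS + SYMBOLS))
        if coin():
            separator.reverse()

    return "".join(groups[0] + separator + groups[1])


if __name__ == "__main__":
    for _ in range(10):
        print(sketch_password())
```

Every password this produces is 7 to 10 characters long, contains at
least one digit, and has an upper case letter at the start of at
least one alpha sequence.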
Password Universe Size Versus Strength
If an attacker knows the exact algorithm being used and that it
forces at least one upper case character out of 2 possible upper
case letters at the start of 2 alpha sequences, then they know that if
the first generated character is lower case, the first character
of the second alpha sequence must be upper case. If the all
lower case passwords are allowed to display, then the attacker
needs to generate a larger population to cover all possibilities.
The danger of displaying the all lower case passwords is that
users are likely to pick the easier of the options presented to
them. Users who pick all lower case will have their passwords
found by a cracker creating dictionaries from matching patterns
but using only lower case letters. If nearly all users pick the all
lower case and the attacker anticipates this behavior, they
won't bother to include the upper case in their generated
passwords and will get most passwords with a quarter of the
effort. If the ordinary users pick all lower case but the system
administrators pick mixed case, the attacker may get most
accounts but miss the really desirable administrator accounts.
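The "quarter of the effort" figure comes from counting the case
combinations of the two leading letters. A minimal sketch in Python
(the variable names are only illustrative):

```python
from itertools import product

# Each of the two leading letters is either lower ("l") or upper ("U").
all_cases = list(product("lU", repeat=2))
forced_mixed = [c for c in all_cases if "U" in c]      # at least one upper
lower_only = [c for c in all_cases if "U" not in c]    # the easy choice

print(len(all_cases), len(forced_mixed), len(lower_only))
```

For any fixed choice of letters there are four case combinations.
Forcing at least one upper case letter leaves three of them, and an
attacker who bets on all lower case needs to generate only one.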
There is no right or wrong choice, as making the best choice
depends on correctly anticipating human behavior (selection from
among the displayed passwords, assuming more than one is shown at
a time). It also assumes the user cannot and does not choose a
password which is not displayed. It also depends on anticipating
an attacker's potential reactions, based on the attacker's
assessment of the types of passwords users will choose and
assuming that the attacker knows a password generator is in use.
There are a lot of assumptions in this paragraph. Typically,
unless an attacker has singled out a specific organization for
specific reasons and done significant research, including social
engineering, they are not likely to know whether a site uses a
password generator, less likely to know which one, and very
unlikely to know which settings.
Though there is no right or wrong, a weaker password is a weaker
password regardless of the size of the set of passwords it was
drawn from or how it was created. A seven character password with
six same case letters and a digit will always be a much weaker password
than a password with nine characters including
mixed case letters, a digit and a symbol. No plausible algorithm
or approach to building a custom dictionary will get a poor nine
character password before a "good" seven character password.
!!!!!!1Aa is a poor password because ! is the first typeable character
in the ASCII collating sequence. While some, perhaps most, crackers using
brute force will use a frequency based table, some will inevitably resort
to the simpler ASCII collating sequence. !!!!!!1Aa will last longer than
a 7 character password but is likely to be one of the first 9 character
passwords to be broken. It's hard to describe a
rational way to find !!!!!!1Aa before finding rud7gek, which looks
semi-decent for a seven character password. On the other hand,
depending on the character sequence used, rud7_gocK may last
much longer than !!!!!!1Aa. (I switched from gecK to gocK because
gacK, gecK, gicK and gucK are all dictionary words, though I don't
know what they mean.) A stronger 9 character password might be
~zud7gocK. The "~" (tilde) is the last typeable character in the
ASCII sequence. It is also one of the least frequently used characters
on the keyboard, except in Perl (and perhaps some other programming
languages) regular expression comparisons and substitutions.
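To put rough numbers on this, the sketch below ranks the three 9
character examples as a plain ASCII order brute force would reach
them. It is only an illustration and rests on two assumptions: the
cracker tries the 94 typeable characters from "!" to "~" in ASCII
order, and it counts like an odometer with the last position
changing fastest. A tool that varies the first position fastest, or
uses a frequency table, would give very different ranks.

```python
FIRST, LAST = ord("!"), ord("~")
RADIX = LAST - FIRST + 1  # 94 typeable characters, space excluded


def ascii_rank(password):
    """Zero-based position of password among all same-length candidates
    when tried in ASCII order with the last position changing fastest."""
    rank = 0
    for ch in password:
        rank = rank * RADIX + (ord(ch) - FIRST)
    return rank


for pw in ("!!!!!!1Aa", "rud7_gocK", "~zud7gocK"):
    share = ascii_rank(pw) / RADIX ** len(pw)
    print(f"{pw}  rank {ascii_rank(pw):,}  {share:.6%} of the space searched")
```

Under these assumptions !!!!!!1Aa is reached after fewer than
150,000 guesses, while ~zud7gocK is not reached until more than 98%
of the 9 character space has been searched.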
Returning to patterned passwords,
it should be obvious that when passwords have a
structure which can be described and there are options such as
mixed case, the full set, including the weak passwords without
mixed case, will always be larger than the strong set that is
left when the single case passwords are removed. The goal is
to find a few arbitrary locations for the capitals, which will be
obvious to a user who knows the password, but which we hope a
potential cracker has no way of knowing.
If each component of the structure is fixed length, then there
can only be a very few places capitals may appear when the normal
case is lower case. As more variable length components are added,
the places a capital may appear begin to approach the full
password length. If multiple patterns are available at a
single site, then quite literally a capital may be located
in any location. And if approaches other than those derived
from the CUE style passwords are in use, the capitals will
include the entire alphabet.
Even with a single structure such as s0Cc0v2c0Cd3Cc0v2c0Cs0, it's
almost impossible to determine where a capital letter may appear,
though if the cracker were sure of the pattern, he would
know he only needs to be concerned with consonant capitals;
still that is 21 of the 26 letters. This pattern allows
an optional symbol at either or both ends of the password.
It contains two Cc0v2c0C sequences, either of which may
be 3 to 6 letters containing 1 or 2 consonants, 1 or 2
vowels, and 1 or 2 consonants. Any upper case letters
will either be the first or last consonant. 1 to 3 digits
may appear somewhere near the middle of the password.
We are looking at a pattern that can create 7 to 17
character passwords.
Assuming this is for the GeodSoft password generator, the
minimum and maximum password lengths will control the
actual lengths that appear. With this pattern a capital
letter may appear in any position from 1 to 16, and with
the zero odds set to .6 there will likely be two or more.
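The length range and the capital positions for this pattern can be
checked by enumeration. The Python sketch below is not the GeodSoft
generator; it only encodes the pattern notation as it is described
in the text: a letter with no number appears once, a "0" means the
preceding character is optional, and a number N means one up to N
of that character. The function names are made up for the example.

```python
import re
from itertools import product

PATTERN = "s0Cc0v2c0Cd3Cc0v2c0Cs0"


def parse(pattern):
    """Split a pattern into (class letter, allowed repeat counts) pairs."""
    elements = []
    for letter, count in re.findall(r"([A-Za-z])(\d?)", pattern):
        if count == "":
            counts = (1,)                              # exactly one
        elif count == "0":
            counts = (0, 1)                            # may be absent
        else:
            counts = tuple(range(1, int(count) + 1))   # one up to N
        elements.append((letter, counts))
    return elements


def analyse(pattern):
    """Return (min length, max length, positions where a "C" can land)."""
    elements = parse(pattern)
    lengths, capital_positions = set(), set()
    for combo in product(*(counts for _, counts in elements)):
        position = 0
        for (letter, _), n in zip(elements, combo):
            if letter == "C" and n:
                capital_positions.add(position + 1)    # 1-based position
            position += n
        lengths.add(position)
    return min(lengths), max(lengths), sorted(capital_positions)


shortest, longest, capitals = analyse(PATTERN)
print(shortest, longest, capitals)
```

Enumerating all 768 combinations of optional elements confirms
lengths of 7 to 17 characters and a possible capital at every
position from 1 to 16.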
Top 10,000 Passwords
If you are a password cracker, you probably want to know as much as you can about
how users form passwords. There are huge numbers of already cracked passwords
available online. I've done some analysis of Mark Burnett's list of
the top 10,000 passwords from his personal database of cracked passwords.
When discussing common passwords there is an important fact to keep
in mind. In every system that has been cracked, there are nearly always some
passwords that are not cracked. Depending on the system, I believe this
number often ranges from a few percent to around twenty percent. We
know nothing about these passwords except they have resisted the cracker's
efforts to reveal them. The flip side of this is that the cracked passwords
may well say almost as much about the cracker's assumptions and working methods,
and what are believed to be cost effective ways of finding passwords
from password hashes, as they do about how users form passwords, since a not
trivial minority may remain unaccounted for.
From Mark Burnett's list, users show an overwhelming preference for all alphabetic
passwords. 8,326 of the top 10,000 are all alpha. 6 characters is most frequent,
followed by 7, 8, 5 then 4. There are about a tenth as many 9 character alpha passwords
as 4, and the numbers drop quickly for 10, 11, and 12 characters. Not surprisingly,
the next most common group of passwords are all alpha except for 1 character on one end.
There are 754 of these ranging from 5 to 9 characters, with 6 characters being most
popular, followed by 7, 8, 5 then 9. Not surprisingly digits account for the
overwhelming majority of the non alpha characters and 1 is clearly most popular
among the digits. Actually only 2 passwords had a symbol or punctuation mark
attached.
Next come the 560 purely non alpha passwords ranging from 4 to 9 characters. Here
4 characters are most popular, followed by 6, 8, 7, 5 and 9 characters. These passwords
are almost all purely numeric. Some are simply the same digit repeated for the full
password length and most others are numeric sequences.
Of the 286 four digit passwords around 70 are probably dates. These include 1800,
1812, 1900, 1911, 1914, and 1919. I'd guess 1911 and 1919 are just easy to type
and remember. Every year from 1941 to 2005 is represented. I'd guess most are the
birth year of the password user or one of their children. The
only group left with any statistical significance is 178 passwords with 3 to 5 letters
and 2 or 3 non alpha characters attached at the end. Surprisingly, in each case of 6, 8, and 7
character passwords, groups of 3 non alpha characters slightly outnumber the pairs
of non alpha characters. The non alpha characters are
exclusively digits. There are 14 passwords with a 4 character alpha group and
a 4 digit group.
There are 35 passwords that contain 2 or more digits with one or more letters.
There is 1 four character password, 20 six character, 2 seven character,
11 eight character, and 1 ten character password. Most of these were alpha or
keyboard alpha (qwert) sequences interleaved with numeric sequences up to 12345.
There was an r2d2, r2d2c3po, and a pa55w0rd. The rest made no particular sense to
me.
There were 63 passwords where a single digit split alphabetic groups. Here
there were 44 six character passwords, 4 seven character, 14 eight character and 1 nine
character password. The digits appeared to occur in all possible positions without
any preference for any specific position. Here there were several phrases: just4fun,
just4me, all4one, dad2ownu, letme1n; several explicit sexual references; and some "words"
with a number substituted for a letter: passw0rd, k2trix, mounta1n, and drag0n.
For the rest, I could make some guesses at what some may have been shortened
or mangled from, but most made little or no sense to me.
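The groupings used in this analysis can be approximated with a few
regular expressions. The sketch below is not the script used to
produce the counts above; the rule names, their order, and the
expressions are assumptions chosen to mirror the groups described
in the text, applied to lower cased passwords.

```python
import re

# Rules are checked in order; the first full match names the group.
RULES = [
    ("all alpha",                      r"[a-z]+"),
    ("all non alpha",                  r"[^a-z]+"),
    ("alpha, 1 non alpha on one end",  r"[^a-z][a-z]+|[a-z]+[^a-z]"),
    ("alpha, 2 or 3 non alpha at end", r"[a-z]{3,5}[^a-z]{2,3}"),
    ("single digit splits alpha",      r"[a-z]+\d[a-z]+"),
    ("2+ digits mixed with letters",   r"(?=(?:.*\d){2})(?=.*[a-z])[a-z\d]+"),
]


def classify(password):
    """Return the name of the first group the password fits, or "other"."""
    for name, pattern in RULES:
        if re.fullmatch(pattern, password):
            return name
    return "other"


# Passwords quoted in the text, used here only as sample input.
for pw in ("123456", "password1", "homepage-", "just4fun", "drag0n",
           "r2d2", "pa55w0rd", "close-up"):
    print(f"{pw:10} {classify(pw)}")
```

Counting the group names over a whole password list, for example
with collections.Counter, gives the kind of totals discussed here.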
I find it interesting and strange that in all subsets I've looked at, there
is a clear preference for an even number of characters in a password. Six
always outnumbers five, and eight outnumbers seven. Only in the full set,
which was overwhelmingly single words, was this not true, and in the set with
one digit added to the letters. Even here, where
you would expect a close approximation to a bell curve, as the available
number of words is pretty close to a bell curve, that's not the case. Six
character passwords are by far the most common, but there are only a few
more five than four (1173 vs. 1140) and a modest number more seven than
eight (2053 vs. 1931). Most of the patterns have a fairly clear
explanation. I wonder what reason there could be for preferring an even
number over an odd one? Perhaps it's simply a preference for something that
suggests order rather than chaos in things.
Eight letters, e, a, r, o, n, i, s, and t, account for more than half of all the
characters. q (the password list contains no uppercase letters) is the least
frequent letter with 189 occurrences (compared to e, which just missed 6000).
1 is the runaway leading digit, appearing 1,376 times; it is more than 6 times as common as the
least common digit, 8 (220 times). When adjusted for the number of times each password
is used, the distribution remains much the same. n, i, and s shift places as do
a number of other characters that were close in their use. Several digits move
past several of the least frequently used letters because 123456, 12345678, 1234,
and 12345 are among the very most frequent at positions 2, 3, 4 and 6 in the list
respectively.
This is the list of every password that contained
any punctuation marks or symbols: iloveyou!, fuck_inside, 0.0.000, 0.0.0.000,
f**k, pic\'s, close-up, homepage-, films+pic+galeries, &, &,
*****, ****, ******, ??????, ?????. The commas and final period are not part of the
passwords. The comma, by far the most common punctuation mark in the English
language, never appears in the 10,000 most common passwords. For those not familiar
with it, "&amp;" is the HTML code for an ampersand; so the only password in the
entire list with an arbitrary non letter attached, which is not a digit, is
homepage-. Not a single keyboard or ASCII sequence with a symbol or punctuation
mark appears. This is not what you might expect given some other common password lists.
The passwords were converted to lower case. I understand some reasons for doing this.
I do it in my Password Evaluator as it makes it much easier to find dictionary words.
On the other hand it makes it much harder to evaluate the quality of the cracked passwords.
Mark Burnett states "Note that capitalization is not taken into consideration when matching
passwords so this list has been converted to all lowercase letters."
When I asked him about this he replied, "What I meant is that in my database when running
queries to get password totals for this particular list I don't take case into account."
I did the same thing when I put up a list of common passwords in 2001. I think the point
is, in nearly all passwords and just about always in common passwords, if there is a
capital letter, it is the first letter. This is one place crackers always check for
upper case, so if you vary your password by upper casing the first letter you are
doing nothing to make it stronger because the crackers know these patterns and will
always check to see if the first character is a capital letter, even if they are
lazy and check nowhere else.
I've seen, and made, shorter lists before, but after reviewing this list, I still find
myself shaking my head. How much of the weakness is real and how much is the result of a
large database collected over more than a decade? How much is because many, probably most,
passwords are from Internet web sites that are not highly sensitive? One recent short list based
on hacked "enterprise," often administrator, accounts suggests the situation is no
different where it matters and the passwords belong to those who ought to know
better. (This list is in mixed case. All capitals that are present are in
the first character position. "Password1" and "password1" are shown as
clearly different passwords.)
Is any part of the user population learning and using better ways? Why
is something that is so basic not even registering with those who ought to be the
ones teaching others how to use good passwords?
More Programmed Dictionary Thoughts
If you are a cracker, and your computer(s) have the speed, memory, and/or disk space,
and you are going for all 9 character passwords, you might consider
dictionary attacks before brute force. Likely candidates would include
words and variations, as well as all 4, 5, 6, 7, 8 and 9
character repeats (!!!! . . . ~~~~~~~~~), keyboard sequences, ~!@# . . .
XCVBNM<>? (which would include the familiar "qwerty"), and ASCII
sequences, ' !"#' (that's a space, exclamation, quote, and
pound sign) . . . vwxyz{|}~ (which would include ABCD and
rstuvwxyz). The unfamiliar ASCII symbol sequences might be omitted,
as well as the digits, which duplicate the keyboard digit sequences.
It might make sense to pad the repeats and sequences to 9 characters
with following and preceding 5, 4, 3, 2 and 1 character random
sequences. To gain any advantage, all tested passwords would need
to be stored in a hash or database with an efficient lookup
mechanism, so the brute force tool might avoid repeating them.
If I did the calculations correctly, all the above sequences, including
a 300,000 word dictionary (fairly large unabridged with relevant
additions) with 100 word variations each, plus all one, two, and
three character sequences from the 95 character keyboard at the
beginning and end of each word, come out to a little less than the 7 character,
95 character set of passwords. With the word part limited
to those combinations that create only 9 character passwords, it's
a lot less than the 7 character password universe.
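The word based part of this estimate can be reproduced roughly as
follows. This is one reading of the figures above, not the original
calculation: it assumes each word variation gets a one to three
character sequence either before it or after it, but not both at
once, and it leaves out the padded repeats and sequences.

```python
KEYBOARD = 95                     # typeable characters, including space
words = 300_000
variations_per_word = 100
word_forms = words * variations_per_word

# All 1, 2 and 3 character sequences from the 95 character keyboard.
affixes = sum(KEYBOARD ** n for n in (1, 2, 3))

# Assumption: an affix goes on one end of a word form at a time.
affixed_words = word_forms * affixes * 2

seven_char_universe = KEYBOARD ** 7

print(f"affixed words:        {affixed_words:,}")
print(f"7 character universe: {seven_char_universe:,}")
print(f"ratio:                {affixed_words / seven_char_universe:.2f}")
```

On this reading the affixed words alone come to about 52 trillion
candidates against roughly 70 trillion 7 character passwords, which
leaves room for the repeats and sequences while staying below the
7 character universe.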
Given what we've seen with the top 10,000 passwords, it might
make sense to only do the full 95 characters for one character
before and after words. Additional digits, and only digits, for
2 and 3 character sequences at the start and end of dictionary
words might make sense. It might make sense to limit repeats and
sequences to 7, 8 and 9 character groups with only 1 and 2 character
random sequences at the ends. These would bring the number way, way
below all 7 character brute force passwords. What makes better sense
depends on what you are trying to do. If you are looking for a
decent to moderate yield, and do not plan to do a full 9 character
brute force attack, using the smaller dictionaries makes sense.
If, however, after doing the dictionary and programmed repeats and
sequences, you plan to go on to a 9 character brute force attack,
then using the more comprehensive dictionaries and
programmed repeats and sequences makes better sense. How improbable
many of these seem compared to our top 10,000 password list
is not really relevant. All dictionary variations, plus every
easily described repeat and sequence group, result in passwords that
are relatively easy to remember and are therefore MUCH more likely
to be used as a password than a truly random 9 character password,
and a database lookup should be faster than the hashing
algorithm.
I do not know of any cracking tool that has the ability
to avoid specific passwords already used in previous attacks. To
get this would require modifying an open source cracking tool to add
these abilities, or creating a new cracking tool from scratch,
something out of the reach of nearly all individuals, and of
organizations without significant resources and determination.
So, in practice, for anyone or any group lacking the skills to
create or customize a cracking tool to work around programmed
dictionaries, the choice is simple. Without these
skills, one takes brute force as far as resources allow, then
uses dictionary attacks to extend the search to password lengths
which cannot be reached with available brute force. The
dictionary attacks are carried to whatever length, and at each
length as extensively, as seems productive.
There is a fairly good chance that there is no way to include database
lookups efficiently with GPU technology. The whole point of dictionary
lookups is to efficiently search a small part of the entire space
created by character set ^ length. If a significant amount of GPU
technology resources are available, there may be no point in doing
the dictionary lookups of any passwords below the length that the
available GPU technology can search with brute force "adequately
quickly," however that may be defined.
For those with the resources to develop customized cracking tools that
can work around attacks already performed, the limiting factor becomes
disk space. For the suggested dictionaries, I estimated about 64
trillion passwords. Even if there were a database with efficient
retrieval that only used 12 bytes per 9 byte password, that's about 769
terabytes. I'd guess a fast database would likely go over 1000
terabytes. With the developments in GPU password cracking, GPU speed
in brute force cracking of passwords may offset the need to go to the
more extended and less probable dictionary attacks.
It's a matter of calculating where the best trade off is between disk
capacity and CPU/GPU capacity.
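The storage figure is simple arithmetic, shown here as a small
Python sketch. It uses the rounded 64 trillion count and decimal
terabytes, both of which are assumptions about how the estimate
above was made.

```python
passwords = 64 * 10**12      # about 64 trillion, the rounded estimate
bytes_per_entry = 12         # storage cost per 9 byte password

total_bytes = passwords * bytes_per_entry
terabytes = total_bytes / 10**12     # decimal terabytes

print(f"{terabytes:,.0f} TB")
```

With the rounded count this gives 768 terabytes, in line with the
figure of about 769 quoted above, which presumably came from the
unrounded count.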
SALTS
Which Accounts Can Cause Significant Damage
Perhaps the most important question you need to ask yourself is
whether or not persons with ordinary user status have access
that can seriously compromise valuable resources. Depending on
the resource in question, simply having it copied and stolen may
or may not be a serious compromise. Having something defaced or
destroyed may be no more than a moderate nuisance if you have reliable
and current backups. One of the most insidious forms of data
damage is data corruption. To be serious this normally requires
long term access to your system. Most data entry clerks have the
ability to corrupt data. If random records are tampered with over an
extended time frame, all your backups will also be corrupt.
This may first show up as apparently unrelated incidents reported
by customers, users or members. At some point, after a lot of damage
has been done, someone will recognize that the complaints are
related, and you have a systemic issue. Denial of service attacks,
which are normally thought of as network attacks can also take
the form of periodic or intermittent failure of essential system
components. This will normally require root or admin level
access but may come from anyone or anywhere if it is the exploitation
of unpatched system bugs. I'm sure in this short list I've overlooked
something that you recognize as important.
So system admins need to know who can inflict what kinds of damage
and be sure those accounts are protected by adequate passwords. This
will require visible support from the top. It becomes primarily a
user training issue so you are likely to know best how to address
it. Once you are sure that some accounts cannot inflict any significant
damage, these accounts can be allowed to have comparatively weak
passwords. There is no point in wasting valuable resources on accounts
of little significance.
We know that users tend to prefer passwords that are dictionary words,
closely related to themselves, or are clever, crude or erotic.
Unless a user can take one of these and create a personal algorithm
that lets them make passwords you cannot crack, they mostly need
to be cured of these habits. See Alternative Manual Passwords.
They will have other preferences that are less dangerous. They like
capitals at the beginning of words. If you require mixed case, let
them have capitals on either or both of two alpha sequences. I'm
pretty sure users don't want any password that contains characters
that they rarely if ever type. Programmers and web developers tend
to use a much broader character set than most other users. Don't
force specific symbols on users. Either give them access to personal
character sets as the updated password.pl will soon have, or let
them choose from a sufficiently broad selection of strong passwords
that includes passwords without the characters they object to.
If you are going to make them use 15 character or longer passwords,
then let them have lots of repeat characters, character sequences, short
dictionary words or other things that make a password easy to remember.
If you can make sure each password has an upper case letter, a lower
case letter, a digit and a symbol or punctuation mark, in
arbitrary locations, let the rest be easy. They will still be strong
passwords because of their length and character diversity. Except for
old Windows systems, you either crack the whole password or don't
crack it at all. You don't do it bits at a time. While it's obvious that
passwords with easy sets of characters will be a tiny portion of the
possible password universe, this is still a huge set, and it is
very difficult to predict all the ways the many easy to remember
groupings may be combined.
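A check for this minimum is easy to write. The Python sketch below
is only an illustration: the function name is made up, the 15
character default follows the length discussed above, and the
sample passwords are hypothetical.

```python
import string


def meets_minimum(password, min_length=15):
    """True if the password is long enough and contains an upper case
    letter, a lower case letter, a digit, and a symbol or punctuation mark."""
    return (
        len(password) >= min_length
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


# Hypothetical samples: easy components, all four classes present or not.
for pw in ("qwerty-Bbbbbb-1234", "aaaaaaaaaaaaaaaaaa", "aB7-"):
    print(pw, meets_minimum(pw))
```

The check says nothing about where the four required characters sit,
which is the point: the rest of the password can be as easy as the
user likes.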
When crackers eventually figure out 15 character passwords, we
do something, like jump to 18, to change the game again. We
look at the specifics of what has been cracked, and modify
password generators not to use the kinds of components that
help the crackers crack long passwords.
Copyright © 2000 - 2014 by George Shaffer. This material may be
distributed only subject to the terms and conditions set forth in
https://geodsoft.com/terms.htm
(or https://geodsoft.com/cgi-bin/terms.pl).
These terms are subject to change. Distribution is subject to
the current terms, or at the choice of the distributor, those
in an earlier, digitally signed electronic copy of
https://geodsoft.com/terms.htm (or cgi-bin/terms.pl) from the
time of the distribution. Distribution of substantively modified
versions of GeodSoft content is prohibited without the explicit written
permission of George Shaffer. Distribution of the work or derivatives
of the work, in whole or in part, for commercial purposes is prohibited
unless prior written permission is obtained from George Shaffer.
Distribution in accordance with these terms, for unrestricted and
uncompensated public access, non profit, or internal company use is
allowed.