GeodSoft

Good and Bad Passwords How-To

A Review of Some Automated Password Generators
Automated Password Generators

Today, small dictionaries or common password lists get a significant percentage of passwords. Ways to extend and improve the cracking dictionaries were reviewed. When the limits of improving the dictionaries and rules are reached, we've shown how to build front end generators to replace the cracking tools' low yield, brute force methods. These build passwords along the lines of standard good password recommendations. The yield should be much better than brute force. As computing power increases, password cracking tools should be able to generate ever larger numbers of, and variations on, what have generally been regarded as strong passwords.

The response to automated cracking methods lies in letting computers create passwords for us. An automated password generator can almost instantaneously create as many strong passwords as we want. If such a tool is properly configurable, it should be able to generate passwords with any desired degree of structure or randomness. The default control pattern in the current password.pl generates passwords from 147 trillion possibilities. In line with the large set, many of these passwords are more difficult than most users and administrators are used to dealing with. There are alternative control patterns and option settings that create more pseudo word like passwords, which will be discussed later.

State Department Passwords

In the early eighties the State Department built an automated password generator as part of their Controlled User Environment. This generator did not generate random passwords but highly structured passwords that were easy to remember. Truly random sequences of the 95 printable ASCII characters become difficult to remember and type even at 4 characters, and beyond 6 characters very few normal humans can remember a truly random sequence, at least not without substantial effort and frequent use.

The State Department analysts realized the problem of truly random passwords and recognized that easy to remember passwords could be built from three small, easy to remember pieces. The State Department passwords conformed to a very specific pattern, CVC99CVC: consonant vowel consonant, digit digit, consonant vowel consonant. The letters were all upper case. This looks too simple to yield a significant number of passwords, but the arithmetic is quite simple: 21 * 5 * 21 * 10 * 10 * 21 * 5 * 21 = 486,202,500. That's nearly 500 million passwords, but that's only slightly larger than the word, non letter, word pattern that we worked through in about three hours. This was fine for the mid 80's but is not adequate for the twenty first century.
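The arithmetic above can be checked with a short script. This is only a sketch in Python, not the CUE code (which I never saw) and not password.pl; the names `CUE_PATTERN`, `universe_size` and `generate` are made up for illustration. It treats Y as a consonant, which is what the count of 21 consonants implies.

```python
import secrets

CONSONANTS = "BCDFGHJKLMNPQRSTVWXYZ"  # 21 letters, Y counted as a consonant
VOWELS = "AEIOU"                      # 5 letters
DIGITS = "0123456789"                 # 10 digits

# One character set per position of the CVC99CVC pattern (hypothetical name).
CUE_PATTERN = [CONSONANTS, VOWELS, CONSONANTS, DIGITS, DIGITS,
               CONSONANTS, VOWELS, CONSONANTS]


def universe_size(pattern):
    """Multiply the sizes of the per-position character sets."""
    total = 1
    for charset in pattern:
        total *= len(charset)
    return total


def generate(pattern):
    """Pick one character per position using a cryptographic random source."""
    return "".join(secrets.choice(charset) for charset in pattern)


if __name__ == "__main__":
    print(universe_size(CUE_PATTERN))  # 486202500
    for _ in range(5):
        print(generate(CUE_PATTERN))
```

Representing the pattern as a list of character sets keeps the count and the generator in step: changing one position changes both.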

Because of the consonant vowel consonant sequence, the alphabetic sequences are fairly easy to pronounce. Anyone can remember two digits. Most of these eight character passwords are really quite easy to remember which is a significant accomplishment given that the password universe is several hundred times larger than the words in the English language. Of course a user only needs to remember his or her password, not the nearly 500 million that are possible.

From the opening comments of my first command line password generator, to these pages and the CGI password generator that I wrote in 2000, to the current pattern based generator, I've always acknowledged the State Department's Controlled User Environment (CUE) password generator as the inspiration for those I've written. What I did not realize until mid 2012, as I was reviewing and updating some pages and programs on the site, was just how central the CUE password generator has been to my thinking about passwords. One could easily say it is the figurative cornerstone of my password thinking.

The analysts and programmers who developed the CUE password generator have never been acknowledged. I never saw a line of CUE code or met anyone who worked on the project. I don't know if one inspired individual thought of and wrote the generator or if it was a small team. Whoever it was, I and any I may have influenced owe them a debt of gratitude. I could never have written any of the generators I have written if I had not used the CUE password generator. In my opinion, apart from the CUE password generator and my own generators, which are its direct descendants, I've never seen a decent, let alone a good, password generator.

With three exceptions, nearly every other generator I've seen (not inspired by mine) seems to be obsessed with randomness and entropy, with no regard to usability. These produce the worst passwords I've seen. Learning a truly random 8 character password from the 95 typeable characters requires an excessive amount of effort followed by continued frequent use. Now that passwords really should be at least 12 characters long, it's insanity. This guarantees not only that they will be written down (which as I explain elsewhere is not necessarily bad) but that they will be kept in an immediately accessible and usually obvious location. The favorite spot is a post-it on the monitor. As most security breaches, at least historically, have been internal and not network or external, this immediately makes these random passwords worse than the number one favorite password, "password", and obvious and trivially crackable ones like Attack1. At least most systems can lock out people who sit down and try to guess these, and someone may come and investigate.

If you don't follow passwords, you probably don't know what is most popular. On the other hand, an Internet search for "common passwords" turns up a lot of results. Unfortunately this interest seems very superficial. It seems a fair number of people are interested in what the #1 password, or the top 10 or top 25, are. The latter seems to have been spurred on by an Internet security firm's release of a top 25 list late in 2011. A very different list was released by another firm, which described it as the top 25 used in enterprises. I hate these lists. So much of the talk is along the lines of: if your password is on this list, you need to change it. The implication is often that if yours is not on the list, then it's probably OK.

What BS. If your password is in any top 10,000 password list, any list of compromised passwords ever released, any dictionary, book, movie, or song, or has been published anywhere in print or on the Internet, you need to change it just as quickly. Any capable cracker is interested in at least the top 600 quadrillion (that's all 9 character passwords made of all 95 typeable characters on most English keyboards), and really serious crackers are interested in much larger lists. All crackers are interested in any password that appears in any list they can get their hands on; they know that if anyone has ever even seen it before, it is MUCH MUCH more likely to be a password than any random 8 character string.

Mark Burnett responded to the previous paragraph with "About your comments on top password lists, I thought I would defend them some. Of course the fact that your password isn't on that list doesn't guarantee that it is strong, but these lists are important nonetheless. It is important to note that if you took all the unique passwords from all my lists, there are only about 2.3 million total unique passwords that I have seen. But if you take that further, there are only 553,956 passwords that have shown up on the list more than once. Despite how many thousands of passwords I add to that list, the number of unique passwords grows very slowly."

He continued "What that means is that most people select passwords that are on that list [his 10,000 Top Passwords list] and very few people select passwords that don't already appear on there at least once. This is definitely illustrated by the fact that 93% of those showed up on the linkedin leak. So you definitely cannot disregard a list that represents such a large percentage of everyone's passwords."

According to Mark Burnett's statistics, which are clearly supported by the LinkedIn leak he referred to, 91% of all known passwords appear in his Top 10,000 list. This isn't just most but a huge, overwhelming majority. I think he may be suggesting that crackers, like all other people, are not equally competent, and that not all of them have the time, knowledge, or resources I alluded to when I said 600 quadrillion. If your password is on any common password list you are just giving it to every cracker. If it's not on any of these lists, especially his long one which is far more comprehensive than any other common password list released to date, you at least have some chance of not getting cracked if a system you have an account on is compromised.

He has a point. It actually takes about 10,000 times the power of a fast desktop to bring all 9 character passwords into a practical time frame (just about 1 month). This probably takes it out of the realm of most traditional "enthusiast" crackers and into the realm of organized crime and government agencies.
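To see what those figures imply, here is a small back of the envelope calculation in Python. It only rearranges the numbers given above (95 typeable characters, 9 positions, 10,000 desktops, one month). The per desktop rate it prints is derived from those numbers and is an inference, not a measured speed; a 30 day month is also an assumption.

```python
# All 9 character passwords over the 95 typeable characters.
KEYSPACE_9 = 95 ** 9            # 630,249,409,724,609,375, roughly 600 quadrillion

SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assuming a 30 day month
DESKTOPS = 10_000                        # "10,000 times the power of a fast desktop"


def required_rate(keyspace, seconds):
    """Guesses per second needed to try every password in the given time."""
    return keyspace / seconds


total_rate = required_rate(KEYSPACE_9, SECONDS_PER_MONTH)
per_desktop_rate = total_rate / DESKTOPS

if __name__ == "__main__":
    print(f"{KEYSPACE_9:,} possible passwords")
    print(f"{total_rate:,.0f} guesses per second in total")
    print(f"{per_desktop_rate:,.0f} guesses per second per desktop")
```

Under these assumptions each desktop would need to test on the order of 24 million passwords per second.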

I was thinking that people should be making passwords that will protect their bank account, credit card, investment, and other accounts that may be the target of organized crime and where it could really hurt if a criminal gained access to such an account. There are suggestions and claims and more claims that a single desktop-like computer equipped with the right Graphics Processing Unit (GPU), and cracking software designed to use the GPU, can crack passwords perhaps 100 times faster than the times listed in my most recent password cracking times table or the defaults set in the Crack Time Calculator.

What should a password not on a common password list look like? I'd say first consider what Mark Burnett has to say about pass phrases versus passwords. An easy to remember and type pass phrase may actually be faster to type and stronger than a conventional "strong" password. If you don't like this approach, look at what can be done with my password generator. For important or sensitive accounts you should seriously consider 12 character passwords if you want a margin of safety over what we know can be done by a well equipped malicious organization.

If you use a real word in a password, remember that no matter how many changes you make to it, it's still going to look like the same word to most crackers and cracking tools. These have been developed for more than 25 years to find every change to a word that anyone has ever thought of. Cracking tools do nothing better than find the real word in a mangled word. Think of that piece of your password as already known to the crackers. It's what you do with the rest of your password that will give it away or save it. If you start the password with your word, uppercase the first letter, and add some numbers or letters to the end, you've probably given your password away. If you start your password with something else, your chances improve. If you uppercase the last letter, or any other letter but the first, your chances improve. Crackers are great at finding changes to a word, including dropped letters, especially vowels. One change they are not good at finding is something ADDED arbitrarily somewhere near the middle, as long as it's not a letter that creates a new recognizable word. If you add a letter, there is always a chance that the new string is actually a different rotated, shifted, inverted, keyboard rotated, or keyboard shifted word that you cannot recognize but that a cracker will very likely spot during a dictionary attack.

If you must use a word, be sure it is LESS THAN two thirds of the total password length: 5 characters in 8 (too short), 5 in 9 (still too short), 6 in 10 (marginal), 7 in 11 (fair), 7 in 12 (potentially good), 8 in 13 (potentially very strong). It's not hard to calculate password strength (or relative strength). The size of the character set raised to the power of the length of the password tells how many possible passwords there are. You get 26 characters for using lower case letters, and 52 when you add upper case to lower case, but the upper case doesn't count if it's the first letter of the password or of the word in the password. You get 10 more characters when you add a digit, but don't use a 1. The digit "1" accounts for 43% of all digits used in passwords. So you don't get to count digits unless you use at least one that's not a "1". If you use a punctuation mark or symbol, you get to add 33 more for 95 total. The strength is then determined by dividing the total number of passwords by the estimated cracks (or encryptions) per second to determine how long it would take to crack all passwords. On average, a single password will take half that time. If you've made a common mistake that allows a dictionary attack to work, the strength calculation is meaningless because a much, much faster way to find it is available, and it's the cracker's preferred method.
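The counting rules in the previous paragraph can be written down as a short Python sketch. This is an illustration, not the Crack Time Calculator. The function names are invented. The rule about an upper case letter at the start of an embedded word is simplified to "the first character of the password", because a script cannot know where your word starts. The 33 symbols are taken to be the 32 ASCII punctuation marks plus the space.

```python
import string

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)
SYMBOLS = set(string.punctuation + " ")  # 32 punctuation marks plus space = 33


def effective_charset_size(password):
    """Character set size credited to a password under the rules above."""
    has_lower = any(ch in LOWER for ch in password)
    has_upper = any(ch in UPPER for ch in password)
    # Simplification: only an upper case letter after position one earns credit.
    upper_after_first = any(ch in UPPER for ch in password[1:])

    size = 0
    if has_lower or has_upper:
        size += 26
    if has_lower and upper_after_first:
        size += 26
    # Digits only count if at least one of them is not a "1".
    if any(ch in DIGITS and ch != "1" for ch in password):
        size += 10
    if any(ch in SYMBOLS for ch in password):
        size += 33
    return size


def keyspace(password):
    """Character set size raised to the power of the password length."""
    return effective_charset_size(password) ** len(password)


def days_to_exhaust(password, guesses_per_second):
    """Days to try every password; the average case is half of this."""
    return keyspace(password) / guesses_per_second / 86400


if __name__ == "__main__":
    rate = 100_000  # hypothetical guesses per second, pick your own estimate
    for pw in ("Attack1", "rud7gek", "~zud7gocK"):
        print(pw, effective_charset_size(pw), days_to_exhaust(pw, rate) / 2)
```

Note that "Attack1" is credited with only 26 characters: its capital is the first letter and its only digit is a 1. The result is still an upper bound, since a dictionary attack would find it long before brute force did.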

One good way to start a password is with a symbol you never type. One that you don't even know what it is called or used for is often good. It's not expected. What is expected is upper casing the first letter of a word and adding 1 to 3 digits to the end. If the password is not simply a word, ninety some percent of the population does just that. The strength of all such passwords is ZERO. That's what crackers look for first.

Returning to the State Department's password generator, what did it teach us? Primarily that good passwords need to balance security and usability. In their day the CUE passwords were adequately secure. Secure enough that I still have one or two on important accounts that almost match the State Department's password structure. Most of the patterns I use allow significant variability in the generated passwords. Occasionally I select one that is weaker than it should be. In addition to being strong enough for the mid 1980's, the CUE passwords were among the easiest decent passwords to remember that I've ever seen.

For most purposes it's good that technology improves, but this is not true for passwords. People's memory and intelligence have not changed as computers have progressed. Some would argue that education is not as good as it used to be. We are incapable of creating anything by mental processes that approximates randomness. We are excruciatingly slow at creating passwords that are at all strong and still reasonably easy to remember.

What I've tried to do over time is to extend the philosophy behind the CUE passwords to ever larger universes of possible passwords. My first generators clearly worked off of the original CUE structure. But I added variations, most of which had some adverse effect on usability or memorability. I think this negative effect was well within the acceptable range. In a sense it had to be acceptable; 3 small easy to remember pieces simply cannot provide adequate security in the second decade of the 21st century. The other approach, which I am fully open to, is fully pronounceable, all lower case passwords; it simply requires more length than the standard behind the U.S. Government's Automated Password Generator was willing to consider.

I question if anyone can remember strong passwords they do not use for months at a time. Any password produced by any of my generators that I use several times a week, I learn to remember without any aids. When I create a new password I expect to use with some regularity, I use it right away, multiple times. I think beyond purely conscious thought, there is muscle memory at work with frequently typed words, including passwords.

I was very pleased to learn recently that upper case letters are almost non existent at the right end of passwords. As long as I've been using mixed case in passwords, I've considered the tail end of an alpha string about as good, and just as likely in practice, to have an upper case letter as the front. The sample strings I've presented, have often favored capitals at the front of alpha strings acknowledging a common user preference; the actual passwords I use have rarely if ever been based on patterns and defaults included in my public samples. Of course the actual passwords output frequently could also be produced by one of the public samples, and there is nothing to stop me from using such a password. On the other hand, I have yet to see any password I've used in the past 15 years ever appear in any list of common or cracked passwords.

My first versions of password.pl are discussed below. With the current pattern based generator I departed quite significantly from CUE style passwords, while keeping descendants and remote descendants an important part of the password generator. The "w" and "e" characters represent the extreme examples of consonant development. The "w" stands for all single consonants plus the 2 and 3 character consonant sequences that often start English words. "e" is the end of word equivalent. The multi character strings in "w" rarely work well in the middle of an alpha string, but the "e" sequences usually do. The "W" and "E" variations may include a capital letter. If a capital appears it can only be the first character of a W or the last character of an E. Unless you want gibberish W's are only placed at the beginning of an alpha string and E's at the end (or lower case e's in the middle).

The most extreme extension of "cvc" that I have made is "Wvv0ev2evv0E". I believe this is the first time (June 2012) I've publicly shown how far the “cvc” approach may be extended. Theoretically this could produce an 18 character alpha string with 4 consonant groups separated by 3 vowel pairs. The consonant groups can be 1 to 3 characters but are overwhelmingly 1 or 2. There are only a very few 3 character strings compared to 2 character strings in w's and e's. It is highly unlikely a string over 14 characters will actually be created. The shortest string this pattern can generate is 7. Though many are not initially obvious, I'd estimate 80 - 90% of the resulting strings are pronounceable.

Unlike the first password generators, which were limited to specific variations on CUE style passwords, the pattern based generator has always been open ended in the structures that could be created. Users could use any samples provided, modify samples, or develop new patterns of their own design. Almost any kind of structure could be defined. Besides the pattern characters, the users could control most of the important odds, and force certain character variety (as long as the pattern letters included the character types that were being forced: mixed case, digits, symbols).

Any pattern developed by a user, with all user changeable settings, could easily be saved just by bookmarking the page with the desired settings; this works because the password generator uses GET rather than POST requests, and all user settings are saved in the query part of the bookmarked URL. Over time the pattern based generator has had its flexibility and user control extended multiple times. In the near future I hope to release a major upgrade that significantly extends its capabilities. The main danger is that it may have become so complex that no one can really understand how the options are intended to be used. The biggest advantage is that even if the cracker knows what password generator is being used, there are so many variations that the cracker has little chance to match specific settings to a specific user. Even if this can be done, the universe of possible passwords may be so large that it gives the cracker only a small advantage.

The goal of a password generator should be to provide passwords that have some structure, cues, or clues that aid human memory. At the same time, they should not give a cracker who suspects which generator is in use enough information to build programmable dictionaries that would greatly reduce the workload compared to a pure brute force attack.

I hope that the GeodSoft password generators have accomplished this by providing multiple user changeable core patterns (each user modifiable), each of which generates a few to many different specific patterns, and, with the pattern based generator, by giving the user the ability to define entirely new patterns. Many of the control patterns overlap in various ways. The cracker not only has to figure out how to program all the patterns, he also has to efficiently eliminate the identical results produced by different patterns. As the potential password universe has greatly expanded beyond the ability of hard disks, let alone RAM, to hold them, this is very much not a trivial problem. The original password.pl may not have met these goals, but it was replaced within a month of when I first submitted GeodSoft.com to the major search engines. By the time GeodSoft.com was getting its first search engine placements, the first pattern based password generator was in place. I believe it does meet these goals. And for those who choose to develop new password generators, based on either the original command line version or the similar web version, who is to predict what choices they have made, unless they share the new code publicly?

Of the three password generators that have clearly attempted something other than random character strings, the one I know the least about is Mark Burnett's "Pafwert". Based on screen shots and the little I've read about it, I think I'd likely consider it a good generator. I have not yet gotten a working copy. I have a Windows laptop it should run on, but my firewall settings are so restrictive they seem to have blocked every attempt to download this from three different sites that have it. I have not had the time to figure out what I need to do to relax the rules, at least temporarily. I've never had a problem downloading a file before, but the sites I've tried are Javascript rich and seem to be Windows specific sites. I mostly use and work on Linux and OpenBSD PCs.

U.S. Government Automated Password Generator

Later, the U.S. Government developed a password standard that was specifically designed to create pseudo random letter sequences that were fully pronounceable by those who speak English. The character sequences are tested against pronounce-ability tables as they are built. As soon as an unpronounceable result is obtained, the password is discarded and the process restarted. The standard excludes the use of mixed case or any characters except letters.

There are supposed to be approximately 1.6 trillion pronounceable ten character sequences. The authors of the standards thought this was adequate for any purpose. Longer passwords would be harder but these are not discussed in the standard. The standard was finalized in late 1993 and was to go into effect March 1994.

Here the U.S. government appears to have erred on the side of user friendliness. It seems clear they did not consider desktop computers. Windows was the dominant desktop system by the time the standard was to go into effect. Though Windows 3.1 did not have passwords, the LANMAN password hashing scheme had been in use a few years. This was to be used by Windows 95, which was released the year after the US APG standard was to go into effect. Professionals working on a government standard should be aware of products in the development pipeline, including the serious deficiencies in them. Anyone familiar with LANMAN password hashing techniques, and how they compared with other systems at the same time, knows LANMAN was fundamentally deficient.

1.6 trillion seemed marginal when the standard was adopted and is simply inadequate today for anything of particular value. It only takes 185 days at 100,000 passwords per second to do all possibilities. This is not likely to be a major problem for anyone with significant resources to apply to the problem. Further, given the nature of the standard, a cracker knows exactly what character set to work with for any site that employs the standard, making such sites perfect targets for brute force attacks. If a site using this standard experienced a breach in which accounts and password hashes were compromised, it seems almost a foregone conclusion that all passwords would be cracked.
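The 185 day figure is easy to reproduce. The Python lines below are a sketch of that one division; the faster rates in the loop are hypothetical multiples chosen for illustration, not measurements.

```python
SECONDS_PER_DAY = 86_400
APG_KEYSPACE = 1_600_000_000_000   # approximately 1.6 trillion ten character sequences


def days_to_try_all(keyspace, guesses_per_second):
    """Days needed to try every password in the keyspace at a steady rate."""
    return keyspace / guesses_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100,000 per second is the rate used in the text; the others are illustrative.
    for rate in (100_000, 1_000_000, 10_000_000):
        print(f"{rate:>12,} per second: {days_to_try_all(APG_KEYSPACE, rate):8.1f} days")
```

At 100,000 passwords per second this gives about 185 days. Every tenfold increase in speed divides the time by ten.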

Due to how NT stores password hashes, all alphabetic passwords are a bad joke. NT and later 2000 included the deficient LANMAN hashing scheme used in Windows 95. If an NT system is compromised so that the password hashes are obtained, a fast desktop system should be able to crack all possible pure alpha passwords in less than half an hour (as of 2001). Increasing the password length to NT's maximum of 14 would make no difference. It seems to me someone put end user ease of use way ahead of security here.

PWGen

I've very recently (mid 2012) come to realize that I should give a nod of approval to PWGen, an open source product that attempts to balance security and usability. When I first saw a list of PWGen default passwords, my reaction was that they were clearly not random alpha numeric strings, but they did not appear to be pronounceable and certainly had no discernible structure, as most of mine do, which might help a user remember a specific password. Even the documentation says the passwords are not pronounceable and are not expected to become so.

On additional study however, I concluded that nearly all alpha strings were pronounceable. In a file of 1000 twelve character passwords I found every q whether upper or lower case to be followed by a u. In addition the letters in each run appear in approximately the same frequency positions. Where there is a large delta between one character and others it appears in the same position from one run to the next. Other characters with close counts switch places from one run to the next. There also seems to be a very rough approximation to the character use distribution in the English language.

When the security and symbols (including punctuation) options are used together PWGen's output looks like any random password generator. Interestingly, PWGen still defaults to 8 character passwords with no symbols or punctuation, even when the security option is chosen. If you want anything but 8 character passwords you have to tell PWGen. Still it is a tool that with the passwords lengthened to 12 or so characters and symbols added, but the security option NOT used, can create passwords that may be reasonably easy to remember but still acceptably secure. To know if this security is real or an illusion, we need to know how good John the Ripper's programmed dictionary is for passwords of a reasonable length. John the Ripper's programmed dictionary specifically targets PWGen. If PWGen has to fall back on fully random passwords for security then it's no better than any other random password generator.

First password.pl

The idea behind my first (and subsequent) password.pl was the State Department style passwords. I added some configuration options for multiple purposes. I wanted variations that would allow for increased complexity and thus a larger universe of possible passwords. Since I gave the command line Perl source code away from the very beginning, I wanted sites to be able to make adjustments suitable to their own needs. I assumed if they had a web site and a competent Perl programmer, they could easily add the password generator to their site if they desired, especially if it were for internal use only. Beginning in late May 2012 I started offering the source for a slightly enhanced CGI (web) version. The generated passwords could be made easier or harder or more or less consistent than the default settings. More important, sites could make adjustments to the algorithm both by changing several constants or even the program logic. Even if a potential intruder knew that a site he or she was attacking, used password.pl and had their own copy, there would be no way of knowing that the passwords the intruder saw, looked like those being generated at a targeted site.

If just the first consonant of each alpha sequence (in the State Department default pattern) is randomly allowed to be upper case, the number of passwords jumps to just under 2 billion. If the upper casing is truly random then approximately one fourth of the generated passwords will still be all lower case. If program logic discards the all lowercase passwords and regenerates a new password until there is one upper case letter, then the password population is actually reduced by one fourth even though the resulting passwords look more complex (and are better passwords).

Password.pl's first letter upper option forced at least one of the leading alpha characters to be upper case. Both could be; there will always be lower case letters in other positions.

An additional variation on the original State Department password pattern is to open up the digit positions to symbols. If all are allowed in either position, then the number of possible passwords with the pattern CvcnnCvc, where the n's are any non letter, is over 34 billion.
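The counts quoted for these variations follow from the same position by position multiplication. The Python sketch below reproduces them. One assumption is needed to land on the "over 34 billion" figure: the non letter positions are taken to be 10 digits plus 32 symbols (the space excluded), for 42 choices each. The variable names are only for illustration.

```python
CONSONANTS = 21
VOWELS = 5
DIGITS = 10
SYMBOLS = 32   # assumption: ASCII punctuation without the space

# Original State Department pattern, one case only: cvc 99 cvc
single_case = (CONSONANTS * VOWELS * CONSONANTS) ** 2 * DIGITS ** 2

# First consonant of each alpha sequence may be upper or lower case: Cvc 99 Cvc
optional_upper = (2 * CONSONANTS * VOWELS * CONSONANTS) ** 2 * DIGITS ** 2

# Discarding the all lower case results removes exactly the single case set.
forced_upper = optional_upper - single_case

# Digit positions opened up to any non letter: Cvc nn Cvc
non_letters = DIGITS + SYMBOLS
with_symbols = (2 * CONSONANTS * VOWELS * CONSONANTS) ** 2 * non_letters ** 2

if __name__ == "__main__":
    print(f"{single_case:,}")     # 486,202,500
    print(f"{optional_upper:,}")  # 1,944,810,000
    print(f"{forced_upper:,}")    # 1,458,607,500
    print(f"{with_symbols:,}")    # 34,306,448,400
```

The third line shows the trade off discussed above: forcing at least one capital removes a quarter of the optional upper case population, even though every remaining password looks more complex.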

With any password generator that pseudo randomly builds passwords to match defined patterns, there will be instances where limiting the displayed passwords by forcing a more complex appearance reduces the possible number of passwords. Here we are looking at two character positions that can be a digit or symbol. If all possibilities are allowed, there will be more possible passwords but there will be combinations that are all numbers or all symbols. Users are likely to pick the easier all numeric combinations. This is identical to the choice discussed when two words were being combined with two non letter characters. The issue is still the same with longer character sequences.

The original password.pl actually did something quite different. There was a variable called symbolOdds. At 0, two digits were always output. At 1 there would nearly always be only a single digit. As it increased to 10 there was an increasing chance that there would be two digits or a digit and a symbol. At 10 there were always 2 non alphabetic characters. At least one was always a digit with about a 50% chance the other would be a symbol. When there was both a digit and a symbol they could occur in either order. When I wrote the original, I thought two symbols was just too difficult, so I created a deliberate bias toward the easier passwords. The current version has no such biases.

An additional option randomly added an extra consonant following each consonant, but no more than two were ever added in the whole password. The resulting character pattern could be represented by "Cc0vcc0nn0Cc0vcc0" where "c" is a consonant, "C" is an optionally upper case consonant, "v" is a vowel and "n" is a non alphabetic character. A "0" indicates the preceding character may or may not be present. Finally, a mixed case option could override the first upper behavior and pseudo randomly force all characters to be mixed case. Any generated password that happened to have all lower or all upper case letters was discarded and regenerated.
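To make the pattern notation concrete, here is a minimal Python sketch of how a string like "Cc0vcc0nn0Cc0vcc0" could be expanded. It is not the original Perl from password.pl. The function names, the 50/50 odds for optional characters and for upper casing, and the short symbol list are all assumptions made for illustration. The first upper and mixed case options are not modeled.

```python
import secrets

LOWER_CONSONANTS = "bcdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"
NON_ALPHA = "0123456789" + "!@#$%^&*"   # hypothetical subset of symbols
MAX_EXTRA_CONSONANTS = 2                # "no more than two" in the whole password


def parse(pattern):
    """Turn 'Cc0v...' into (symbol, optional) pairs; '0' marks the previous one optional."""
    tokens = []
    for ch in pattern:
        if ch == "0":
            symbol, _ = tokens[-1]
            tokens[-1] = (symbol, True)
        else:
            tokens.append((ch, False))
    return tokens


def pick(symbol):
    """Choose one character for a pattern symbol."""
    if symbol == "c":
        return secrets.choice(LOWER_CONSONANTS)
    if symbol == "C":
        letter = secrets.choice(LOWER_CONSONANTS)
        # Assumed even odds that an optionally upper case consonant is capitalized.
        return letter.upper() if secrets.randbelow(2) else letter
    if symbol == "v":
        return secrets.choice(VOWELS)
    if symbol == "n":
        return secrets.choice(NON_ALPHA)
    raise ValueError(f"unknown pattern symbol: {symbol!r}")


def generate(pattern):
    """Expand a pattern, skipping optional symbols at random."""
    extras = 0
    out = []
    for symbol, optional in parse(pattern):
        if optional:
            if secrets.randbelow(2) == 0:
                continue                      # optional character left out
            if symbol == "c":
                if extras >= MAX_EXTRA_CONSONANTS:
                    continue                  # cap on extra consonants reached
                extras += 1
        out.append(pick(symbol))
    return "".join(out)


if __name__ == "__main__":
    for _ in range(10):
        print(generate("Cc0vcc0nn0Cc0vcc0"))
```

With the cap of two extra consonants and one optional non alphabetic character, this sketch produces strings of 7 to 10 characters. Applying the cap left to right makes late extra consonants slightly less likely than early ones; a real implementation might choose the positions differently.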

The original password.pl default behavior generated a mixture of passwords ranging from 7 to 10 characters, with two alpha sequences of 3 to 5 characters and either or both of the leading characters upper case. The alpha sequences were separated by one or two digits, or by a digit and a symbol in either order. In any batch of ten passwords there were usually one or two very easy ones ("Cvcdcvc" where "d" is a digit) and a few suitable for an admin on an important system. The configuration options allowed anything from very easy to too hard to be practical.

Password Universe Size Versus Strength

If an attacker knows the exact algorithm being used and that it forces at least one upper case character out of 2 possible upper case letters at the start of 2 alpha sequences, then they know that if the first generated character is lower case, the first character of the second alpha sequence must be upper case. If the all lower case passwords are allowed to display, then the attacker needs to generate a larger population to cover all possibilities. The danger of displaying the all lower case passwords, is that users are likely to pick the easier of the options presented to them. Users who pick all lower case, will have their passwords found by a cracker creating dictionaries from matching patterns but using only lower case letters. If nearly all users pick the all lower case and the attacker anticipates this behavior, they won't bother to include the upper case in their generated passwords and will get most passwords with a quarter of the effort. If the ordinary users pick all lower case but the system administrators pick mixed case, the attacker may get most accounts but miss the really desirable administrator accounts.

There is no right or wrong choice as making the best choice depends on correctly anticipating human behavior (selection from among the displayed passwords, assuming more than one is shown at a time). It also depends on anticipating an attacker's potential reactions based on the attacker's assessment of the types of passwords users will choose.

Though there is no right or wrong, a weaker password is a weaker password regardless of the size of the set of passwords it was drawn from or how it was created. A seven character password with six same case letters and a digit will always be a much weaker password than a password with nine characters including mixed case letters, a digit and a symbol. No plausible algorithm or approach to building a custom dictionary will get a poor nine character password before a "good" seven character password.

!!!!!!1Aa is a poor password because ! is the first typeable character in the ASCII collating sequence. While some, perhaps most, crackers using brute force will use a frequency based table, some will inevitably resort to the simpler ASCII collating sequence. !!!!!!1Aa will last longer than a 7 character password but is likely to be one of the first 9 character passwords to be broken. It's hard to describe a rational way to find !!!!!!1Aa before finding rud7gek, which looks semi-decent for a seven character password. On the other hand, depending on the character sequence used, rud7_gocK may last much longer than !!!!!!1Aa. (I switched from gecK to gocK because gacK, gecK, gicK and gucK are all dictionary words, though I don't know what they mean.) A stronger 9 character password might be ~zud7gocK. The "~" (tilde) is the last typeable character in the ASCII sequence. It is also one of the least frequently used characters on the keyboard, except in Perl (and perhaps some other programming language) regular expression comparisons and substitutions.
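How early a password falls to an ASCII order brute force can be estimated by treating it as a number in base 94. The sketch below assumes the cracker tries the 94 typeable characters from "!" through "~" in collating order and works through one length at a time; a frequency based table would give a different order. The function names are illustrative.

```python
# Assumption: the brute force alphabet is the 94 typeable ASCII characters
# from "!" (0x21) through "~" (0x7E), tried in ASCII collating order.
FIRST, LAST = ord("!"), ord("~")
BASE = LAST - FIRST + 1  # 94

def ascii_order_rank(password):
    """Zero-based position of password among candidates of the same length."""
    rank = 0
    for ch in password:
        code = ord(ch)
        if not FIRST <= code <= LAST:
            raise ValueError(f"character {ch!r} is outside the assumed alphabet")
        rank = rank * BASE + (code - FIRST)
    return rank

def fraction_searched(password):
    """Share of the same-length search space tried before reaching password."""
    return ascii_order_rank(password) / BASE ** len(password)

for pw in ("!!!!!!1Aa", "rud7_gocK", "~zud7gocK"):
    print(pw, ascii_order_rank(pw), f"{fraction_searched(pw):.6f}")
```

Under these assumptions !!!!!!1Aa is reached after only 144,448 nine character guesses, while ~zud7gocK sits in the last 1/94th of the nine character space.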

Returning to patterned passwords, it should be obvious that when passwords have a structure which can be described, and there are options such as mixed case, the full set, including the weak passwords without mixed case, will always be larger than the strong set that is left when the single case passwords are removed. The goal is to find a few arbitrary locations, which will be obvious to a user who knows the password, but which we hope a potential cracker has no way of knowing.

If each component of the structure is fixed length, then there can only be a very few places capitals may appear when the normal case is lower case. As more variable length components are added, the number of places a capital may appear begins to approach the full password length. If multiple patterns are available at a single site, then quite literally a capital may be located in any location. And if approaches other than those derived from the CUE style passwords are in use, the capitals will include the entire alphabet.

Even with a single structure such as s0Cc0v2c0Cd3Cc0v2c0Cs0 it's almost impossible to determine where a capital letter may appear, though if the cracker were sure of the pattern, he would know he only needs to be concerned with consonant capitals; still, that is 21 of the 26 letters. This pattern allows an optional symbol at either or both ends of the password. It contains two Cc0v2c0C sequences, either of which may be 3 to 6 letters containing 1 or 2 consonants, 1 or 2 vowels, and 1 or 2 consonants. Any upper case letters will be either the first or last consonant. 1 to 3 digits may appear somewhere near the middle of the password. We are looking at a pattern that can create 7 to 17 character passwords.
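The 7 to 17 character range can be confirmed by parsing the pattern. The following is only a sketch of one reading of the notation, inferred from the description above: "s" is a symbol, "C" a consonant that may be upper case, "c" a consonant, "v" a vowel and "d" a digit; a trailing "0" makes the element optional and a trailing n greater than 0 means 1 to n repeats. It is not the parser used by password.pl, and the symbol set is an arbitrary choice.

```python
import re
import secrets

CONSONANTS = "bcdfghjklmnpqrstvwxyz"   # 21 letters
CLASSES = {
    "s": "!@#$%^&*-_=+",               # illustrative symbol set (assumption)
    "C": CONSONANTS,
    "c": CONSONANTS,
    "v": "aeiou",
    "d": "0123456789",
}

def parse(pattern):
    """Split a pattern into (class, min_count, max_count) tuples."""
    tokens = []
    for cls, count in re.findall(r"([sCcvd])(\d?)", pattern):
        if count == "":
            tokens.append((cls, 1, 1))           # exactly one
        elif count == "0":
            tokens.append((cls, 0, 1))           # optional
        else:
            tokens.append((cls, 1, int(count)))  # 1 to n repeats
    return tokens

def length_bounds(pattern):
    """Shortest and longest password the pattern can produce."""
    tokens = parse(pattern)
    return sum(t[1] for t in tokens), sum(t[2] for t in tokens)

def generate(pattern):
    """Build one password, choosing counts and characters at random."""
    out = []
    for cls, low, high in parse(pattern):
        for _ in range(low + secrets.randbelow(high - low + 1)):
            ch = secrets.choice(CLASSES[cls])
            if cls == "C" and secrets.randbelow(2):
                ch = ch.upper()
            out.append(ch)
    return "".join(out)

PATTERN = "s0Cc0v2c0Cd3Cc0v2c0Cs0"
print(length_bounds(PATTERN))   # (7, 17)
print(generate(PATTERN))
```

The sketch ignores the minimum and maximum length settings and the weighting options a real generator would apply, so every length in the range is possible here.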

Assuming this is for the GeodSoft password generator, the minimum and maximum password lengths will control the actual lengths that appear. With this pattern a capital letter may appear in any position from 1 to 16. The only way to stop a capital from being the last letter is to make a 17 character password likely by having that as the maximum length, with zero odds set above .5 and max zero characters set to 6.

Top 10,000 Passwords

If you are a password cracker, you probably want to know as much as you can about how users form passwords. There are huge numbers of already cracked passwords available online. I've done some analysis of Mark Burnett's list of the top 10,000 passwords from his personal database of cracked passwords. When discussing common passwords there is an important fact to keep in mind. In every system that has been cracked, there are nearly always some passwords that are not cracked. Depending on the system, I believe this number often ranges from a few percent to around twenty percent. We know nothing about these passwords except that they have resisted the cracker's efforts to reveal them. The flip side of this is that the cracked passwords may well say almost as much about the cracker's assumptions and working methods, and what are believed to be cost effective ways of finding passwords from password hashes, as they do about how users form passwords, since a not trivial minority remain unaccounted for.

From Mark Burnett's list, users show an overwhelming preference for all alphabetic passwords. 8,326 of the top 10,000 are all alpha. 6 characters is most frequent, followed by 7, 8, 5 then 4. There are about a tenth as many 9 character alpha passwords as 4, and the numbers drop quickly for 10, 11, and 12 characters. Not surprisingly, the next most common group of passwords are all alpha except for 1 character on one end. There are 754 of these ranging from 5 to 9 characters, with 6 characters being most popular, followed by 7, 8, 5 then 9. Not surprisingly digits account for the overwhelming majority of the non alpha characters and 1 is clearly most popular among the digits. Actually only 2 passwords had a symbol or punctuation mark attached.

Next come the 560 purely non alpha passwords ranging from 4 to 9 characters. Here 4 characters are most popular, followed by 6, 8, 7, 5 and 9 characters. These passwords are almost all purely numeric. Some are simply the same digit repeated for the full password length and most others are numeric sequences. Of the 286 four digit passwords around 70 are probably dates. These include 1800, 1812, 1900, 1911, 1914, and 1919. I'd guess 1911 and 1919 are just easy to type and remember. Every year from 1941 to 2005 is represented. I'd guess most are the birth year of the password user or one of their children. The only group left with any statistical significance is the 178 passwords with 3 to 5 letters and 2 or 3 non alpha characters attached at the end. Surprisingly, in each case (6, 8, and 7 character passwords) groups of 3 non alpha characters slightly outnumber the pairs of non alpha characters. The non alpha characters are exclusively digits. There are 14 passwords with a 4 character alpha group and a 4 digit group.

The only other arrangements that have double digit representation are 23 six character passwords where a non alpha (read digit) separates 1 alpha from the other 4, plus 21 where the digit splits the letters into 3 and 2. There are also 11 six character passwords where 2 digits split the letters 3 and 1. Other than to say that the remaining arrangements involve more interleaving of letters and digits, there is not much to generalize. Not including arrangements already discussed, there is 1 character arrangement that appears eight times, 1 that appears seven times, 4 six times, 2 five times, 2 four times and 2 three times. Seventeen character arrangements appear 2 times and 34 appear 1 time each.
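Grouping passwords by character arrangement, as in the counts above, amounts to reducing each password to its runs of alpha and non alpha characters and tallying the results. This is a small sketch of that idea, not the script used for the analysis; the run notation ("a3n1a3" for three letters, one non alpha, three letters) and the sample list are made up for illustration, with the sample drawn from passwords mentioned in this article rather than from real counts.

```python
from collections import Counter
from itertools import groupby

def arrangement(password):
    """Describe a password as runs of alpha (a) and non alpha (n) characters."""
    runs = groupby(password, key=lambda ch: "a" if ch.isalpha() else "n")
    return "".join(f"{kind}{len(list(group))}" for kind, group in runs)

# A handful of passwords mentioned in this article, not a statistical sample.
sample = ["iloveyou!", "123456", "12345678", "1234", "12345",
          "password1", "homepage-"]

counts = Counter(arrangement(pw) for pw in sample)
for shape, n in counts.most_common():
    print(shape, n)
```

Run against a full cracked password list, the most common shapes show where users actually put their digits and symbols.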

Eight letters, e, a, r, o, n, i, s, and t, account for more than half of all the characters. Q (lowercase, the password list contains no uppercase letters) is the least frequent letter with 189 occurrences (compared to e, which just missed 6000). 1 is the runaway leading digit, appearing 1,376 times; it is more than 6 times as common as the least common digit, 8 (220 times). When adjusted for the number of times each password is used, the distribution remains much the same. n, i, and s shift places, as do a number of other characters that were close in their use. Several digits move past several of the least frequently used letters because 123456, 12345678, 1234, and 12345 are among the very most frequent at positions 2, 3, 4 and 6 in the list respectively.

This is the list of every password that contained any punctuation marks or symbols: iloveyou!, fuck_inside, 0.0.000, 0.0.0.000, f**k, pic\'s, close-up, homepage-, films+pic+galeries, &amp, &, *****, ****, ******, ??????, ?????. The commas and final period are not part of the passwords. The comma, by far the most common punctuation mark in the English language, never appears in the 10,000 most common passwords. For those not familiar with it, "&amp;" is the HTML code for an ampersand; so the only password in the entire list with an arbitrary non letter attached, which is not a digit, is homepage-. Not a single keyboard or ASCII sequence with a symbol or punctuation mark appears. This is not what you might expect given some other common password lists.

The passwords were converted to lower case. I understand some reasons for doing this. I do it in my Password Evaluator as it makes it much easier to find dictionary words. On the other hand it makes it much harder to evaluate the quality of the cracked passwords. Mark Burnett states "Note that capitalization is not taken into consideration when matching passwords so this list has been converted to all lowercase letters." When I asked him about this he replied "What I meant is that in my database when running queries to get password totals for this particular list I don't take case into account." I did the same thing when I put up a list of common passwords in 2001. I think the point is, in nearly all passwords and just about always in common passwords, if there is a capital letter, it is the first letter. This is one place crackers always check for upper case, so if you vary your password by upper casing the first letter you are doing nothing to make it stronger, because the crackers know these patterns and will always check to see if the first character is a capital letter, even if they are lazy and check nowhere else.

I've seen, and made, shorter lists before, but after reviewing this list, I still find myself shaking my head. How much of the weakness is real and how much is the result of a large database collected over more than a decade? How much is it that many, probably most, passwords are from Internet web sites that are not highly sensitive? One recent short list based on hacked "enterprise," often administrator, accounts suggests the situation is no different where it matters and the passwords belong to those who ought to know better. (This list is in mixed case. All capitals that are present are in the first character position. "Password1" and "password1" are shown as clearly different passwords.) Is any part of the user population learning and using better ways? Why is something that is so basic not even registering with those who ought to be the ones teaching others how to use good passwords?

More Programmed Dictionary Thoughts

Before reviewing the top 10,000 passwords I had the following thoughts. If you are a cracker, and your computer has the memory and/or disk space, and you are going for all 9 character passwords, you might do all dictionary words and variations, as well as all 4, 5, 6, 7, 8 and 9 character repeats and sequences: !!!! . . . ~~~~~~~~~, ' !"#' (that's a space, exclamation, quote, and pound sign) . . . vwxyz{|}~ (which would include 1234 . . . rstuvwxyz), as well as ~!@# . . . XCVBNM<>? (which would include "qwerty" and all similar keyboard sequences). All of these would be followed and preceded by all 5, 4, 3, 2 and 1 character random character sequences, and you would store the results in a hash or database with an efficient lookup mechanism.

If I did the calculations correctly, all the above sequences plus a 300,000 word dictionary (fairly large unabridged with relevant additions) with 100 word variations each, and all one, two, and three character sequences from the 95 character keyboard at the beginning and end, comes out to a little less than a 7 character, 95 character set of passwords. With the dictionary part limited to those combinations that create only 9 character passwords, it's a lot less than the 7 character password universe. There does seem to be a rational way to go after !!!!!!1Aa before rud7gek after all.

In light of what we've seen with the top 10,000 passwords, it might make sense to only do the full 95 characters for one character, and digits only for 2 and 3 character sequences at the start and end of dictionary words. It might make sense to limit repeats and sequences to 7, 8 and 9 character groups with only 1 and 2 character random sequences at the ends. These would bring the number far below all 7 character brute force passwords. What makes better sense depends on what you are trying to do. If you are looking for a decent to moderate yield, and do not plan to do a full 9 character brute force attack, using the smaller dictionary does make sense.

If, however, after doing the dictionary and the programmed repeats and sequences, you plan to go on to a 9 character brute force attack, then you should stick to the more comprehensive dictionary and programmed repeats and sequences approach. No matter how improbable many of these will seem compared to our top 10,000 word dictionary, that's not really relevant. All dictionary variations, plus every easily described repeat and sequence group, result in passwords that are relatively easy to remember and are therefore MUCH more likely to be used as a password than a truly random 9 character password, and the database lookup should be much faster than the hashing algorithm.

The main issue here is having enough disk space. I estimated about 64 trillion passwords. Even if you could find a database with efficient retrieval that only used 12 bytes per 9 byte password, that's about 769 terabytes. With a fast database I'd guess it would likely go over 1000 terabytes. Even today that's rather a lot to come up with unless you are Google or NSA. I suppose at present there is no choice but to scale back and go only for the more likely passwords.
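The storage estimate is simple arithmetic and is easy to rerun with other assumptions. The sketch below uses a round 64 trillion entries, the 12 bytes per entry assumed above, and decimal terabytes; the variable names are illustrative.

```python
CHARSET = 95                        # printable ASCII characters
seven_char_universe = CHARSET ** 7  # every 7 character password

estimated_passwords = 64 * 10**12   # the 64 trillion estimate, rounded
bytes_per_entry = 12                # assumed storage cost per 9 byte password

total_bytes = estimated_passwords * bytes_per_entry
terabytes = total_bytes / 10**12    # decimal terabytes

print(f"{seven_char_universe:,}")   # 69,833,729,609,375
print(f"{terabytes:,.0f} TB")       # 768 TB
```

With the rounded count the result is 768 terabytes, in line with the roughly 769 quoted above, and the estimate does come in a little under the 7 character, 95 character universe of about 69.8 trillion.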

When technology has advanced so that disk space is no longer an obstacle, if you are going to do the brute force anyhow, you ought to try everything you can describe that seems even moderately more likely to be used as a password than a random sequence. If you're mostly dealing with repeats and sequences, plus generated random ends, these are simple enough to do all in memory, but if you are going to save any time in the brute force phase you still need to save each used password to an efficient database, so the brute force can look up each created password before doing the computationally intensive hashing.

Which Accounts Can Cause Significant Damage

Perhaps the most important question you need to ask yourself is whether or not persons with ordinary user status have access that can seriously compromise valuable resources? Depending on the resource in question simply having it copied and stolen may or may not be a serious compromise. Having something defaced or destroyed may be no more than a moderate nuisance if you have reliable and current backups. One of the most insidious forms of data damage is data corruption. To be serious this normally requires long term access to your system. Most data entry clerks have the ability to corrupt data. If random records are tampered with over an extended time frame, all your backups will also be corrupt.

This may first show up as apparently unrelated incidents reported by customers, users or members. At some point, after a lot of damage has been done, someone will recognize that the complaints are related and you have a systemic issue. Denial of service attacks, which are normally thought of as network attacks, can also take the form of periodic or intermittent failure of essential system components. This will normally require root or admin level access but may come from anyone or anywhere if it is the exploitation of unpatched system bugs. I'm sure in this short list I've overlooked something that you recognize as important.

So system admins need to know who can inflict what kinds of damage and be sure those accounts are protected by adequate passwords. This will require visible support from the top. It becomes primarily a user training issue, so you are likely to know best how to address it. Once you are sure that some accounts cannot inflict any significant damage, these accounts can be allowed to have comparatively weak passwords. There is no point in wasting valuable resources on accounts of little significance.

We know that users tend to prefer passwords that are dictionary words, closely related to themselves, or are clever, crude or erotic. Unless a user can take one of these and create a personal algorithm that lets them make passwords you cannot crack, they mostly need to be cured of these habits. See Alternative Manual Passwords. They will have other preferences that are less dangerous. They like capitals at the beginning of words. If you require mixed case, let them have capitals on either or both of two alpha sequences. I'm pretty sure users don't want any password that contains characters that they rarely if ever type. Programmers and web developers tend to use a much broader character set than most other users. Don't force specific symbols on users. Either give them access to personal character sets as the updated password.pl will soon have, or let them choose from a sufficiently broad selection of strong passwords that includes passwords without the characters they object to.

If you are going to make them use 15 character or longer passwords, then let them have lots of repeat characters, character sequences, short dictionary words or other things that make a password easy to remember. If you can make sure each password has an upper case letter, a lower case letter, a digit and a symbol or punctuation mark, in arbitrary locations, let the rest be easy. They will still be strong passwords because of their length and character diversity. Except for old Windows systems, you either crack the whole password or don't crack it at all. You don't do it bits at a time. While it's obvious that passwords with easy sets of characters will be a tiny portion of the possible password universe, this is still a huge set, and it is very difficult to predict all the ways the many easy to remember groupings may be combined.
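The rule described above (long, with at least one upper case letter, one lower case letter, one digit and one symbol, and anything easy for the rest) is simple to check mechanically. This is a minimal sketch of such a check; the function name and the 15 character default are illustrative, and the sample strings only demonstrate the check and are not recommended passwords.

```python
import string

def has_required_diversity(password, min_length=15):
    """True if the password is long enough and mixes all four character types."""
    return (
        len(password) >= min_length
        and any(ch.isupper() for ch in password)
        and any(ch.islower() for ch in password)
        and any(ch.isdigit() for ch in password)
        and any(ch in string.punctuation for ch in password)
    )

print(has_required_diversity("aaaaaaaaaaaaaaaaaa"))   # False
print(has_required_diversity("aaaa1234bbbbQ_cccc"))   # True
```

The check says nothing about where the four required characters sit; placing them in arbitrary locations remains up to the generator or the user.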

When crackers eventually figure out 15 character passwords, we do something, like jump to 18, to change the game again. We also look at the specifics of what has been cracked, and modify password generators so they do not use the kinds of components that help the crackers crack long passwords.

Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.