The Reality of Passphrase Token Attacks

In my blog, Password Constraints and Their Unintended Security Consequences, I advocate for the use of passphrases. Embedded in the comments section, one of our readers Ben makes a very astute observation:

What happens when attackers start guessing by the word instead of by the letter? Then a four-word passphrase effectively becomes a four-character password.

What Ben is describing is called a “passphrase token attack,” and it’s real. With a good passphrase, the attack is not much of a threat though. First, a definition, then I’ll explain why.

What’s a token?

In the context of a passphrase token attack, a token is a grouping of letters, AKA a word. The passphrase made famous by the comic xkcd, “correct horse battery staple,” is 28 characters long. But, in a passphrase token attack, I wouldn’t try to guess all possible combinations of 28 letters. I would guess combinations of entire words, or tokens, each representing a group of characters.

The math behind passphrases

One might assume, as Ben did, that a four-word password is the same as a four-character password. But that’s a math error. Specifically, 95≠1,000,000.

Here’s why: There are 95 letters, numbers, and symbols that can be used for each character in a password. However, there are over a million words in the English language. For simplicity’s sake, let’s call it an even million words. If I’m thinking of a single character, then at most you have to try 95 characters to guess it. But if I ask you to guess which word I am thinking of, then you may need to guess a million words before you have guessed the word that I am thinking of.

So while there are 95^4 possible combinations of characters for a four-character password, there are over 1,000,000^4 combinations of words for a four-word password.

You might be thinking “But nobody knows a million words,” and you are correct. According to some research, the average person uses no more than 10,000. So, as an attacker, I’d try combinations of only the most common words. Actually, I may be able to get by with a dictionary as small as 5,000 words. But 5,000^4 is still a whole lot more combinations than 95^4.

Here is one list of 5,000 of the most commonly used words in the English language, and another of the 10,000 most commonly used words. Choosing an uncommon word is great, but even words in the top 5,000 are still far better than a complex nine-character password.

Why and how to use a passphrase

There are two major strengths of passphrases:

Passphrases allow for longer, more secure passwords. It’s length that makes a passphrase a killer password. A password/passphrase that’s 20 lowercase characters long is stronger than a 14 character password that uses uppercase letters, lowercase letters, numbers, and symbols.
Passphrases can be easy to remember, making creating and using passwords a lot less painful. “Aardvarks eat at the diner” is easy to remember and, at 26 characters long and including uppercase and lowercase letters, is more than 9 trillion times stronger than the password “eR$48tx!53&(oPZe”, or any other complex, 16-character password, and potentially uncrackable.

Why potentially uncrackable? Because “aardvark” is not one of the 10,000 most frequently used words and, if a word is not in the attacker’s dictionary, then you win. This is why it helps to use foreign-language words. Even common foreign words require an attacker to increase the size of their dictionary, the very factor that makes passphrase token attacks impractical. Learning a word in an obscure foreign language can be fun and pretty much assures a passphrase won’t be cracked.

As we’ve seen, cracking a passphrase can be far more difficult than cracking a password, unless you make one of two common mistakes. The first is choosing a combination of words without enough characters. “I am a cat,” for example. Although it’s four words, it’s only 10 characters long and an attacker can use a conventional brute force attack, even for a passphrase. Spaces between words can be used to increase the length and complexity of passphrases.

The second most common mistake is using a common phrase as a passphrase. I can create a dictionary of the top 1,000,000 common phrases and, if you’re using one, then it only takes at most 1,000,000 guesses to crack (about the same as a complex three-character password).

So create your own unique passphrases and you’re all set. Most experts recommend passphrases be at least 20 characters long. But if you only go from eight characters to 16 upper and lower case letters, you’ll already be 430 trillion times better off. And if you’re creating a passphrase for a site requiring a number or symbol, it’s fine to add the same number and symbol to the end of your phrase, provided the passphrase is long to begin with.

As a side note, according to math, a five word passphrase is generally stronger than a four word passphrase, but don’t get too hung up on that.

So Ben, you are 100% right about the reality of passphrase token attacks. But, with a strong passphrase, the math says it doesn’t matter. Note: If this stuff fascinates you, or you suffer from insomnia, you might enjoy “Linguistic Cracking of Passphrases using Markov Chains.” You can download the PDF or watch this riveting thriller on YouTube. Sweet dreams.

About the Author

Randy Abrams

Senior Security Analyst

Randy Abrams is Senior Security Analyst and frequent contributor to the Webroot Blog and Community. Prior to joining Webroot, Randy was a research director responsible for analyzing and reporting the test results of antimalware products.