Regular Expressions Cookbook, 2nd Edition by Steven Levithan, Jan Goyvaerts

4.19. Validate Password Complexity

Problem

You’re tasked with ensuring that any passwords chosen by your website users meet your organization’s minimum complexity requirements.

Solution

The following regular expressions check many individual conditions, and can be mixed and matched as necessary to meet your business requirements. At the end of this section, we’ve included several JavaScript code examples that show how you can tie these regular expressions together as part of a password security validation routine.

Length between 8 and 32 characters

^.{8,32}$
Regex options: Dot matches line breaks (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Standard JavaScript doesn’t have a “dot matches line breaks” option. Use [\s\S] instead of a dot in JavaScript to ensure that the regex works correctly even for crazy passwords that include line breaks:

^[\s\S]{8,32}$
Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
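
As a quick sketch of why the [\s\S] form matters in JavaScript, the following snippet compares it with the dot-based version on a password that contains a line break. The function names hasValidLength and hasValidLengthDot and the sample strings are illustrative assumptions, not part of the recipe:

```javascript
// Hypothetical helper: true if the password is 8-32 characters long,
// counting line breaks like any other character.
function hasValidLength(password) {
    return /^[\s\S]{8,32}$/.test(password);
}

// Without a "dot matches line breaks" option, the dot does not match \n,
// so this version rejects passwords that contain a line break.
function hasValidLengthDot(password) {
    return /^.{8,32}$/.test(password);
}

console.log(hasValidLength("abcd\nefgh"));    // true
console.log(hasValidLengthDot("abcd\nefgh")); // false
console.log(hasValidLength("short"));         // false
```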

ASCII visible and space characters only

If this next regex matches a password, you can be sure it includes only the characters A–Z, a–z, 0–9, space, and ASCII punctuation. No control characters, line breaks, or characters outside of the ASCII table are allowed:

^[\x20-\x7E]+$
Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

If you want to additionally prevent the use of spaces, use ^[\x21-\x7E]+$ instead.
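
The two variants can be wrapped in one small function. This is only a sketch; the name isVisibleAscii and its allowSpaces parameter are assumptions made for illustration:

```javascript
// Hypothetical helper: true if every character falls in the range
// U+0020 (space) through U+007E (tilde), or U+0021 through U+007E
// when spaces are not allowed.
function isVisibleAscii(password, allowSpaces) {
    var pattern = allowSpaces ? /^[\x20-\x7E]+$/ : /^[\x21-\x7E]+$/;
    return pattern.test(password);
}

console.log(isVisibleAscii("correct horse", true));  // true
console.log(isVisibleAscii("correct horse", false)); // false: contains a space
console.log(isVisibleAscii("caf\u00E9", true));      // false: é is outside ASCII
console.log(isVisibleAscii("tab\there", true));      // false: tab is a control character
```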

One or more uppercase letters

ASCII uppercase letters only:

[A-Z]
Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Any Unicode uppercase letter:

\p{Lu}
Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9

If you want to check for the presence of any letter character (not limited to uppercase), enable the “case insensitive” option or use [A-Za-z]. For the Unicode case, you can use \p{L}, which matches any kind of letter from any language.
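
The flavor lists above do not include JavaScript for the Unicode regexes. JavaScript engines that implement ECMAScript 2018 or later do accept \p{Lu} and \p{L}, but only when the u flag is set. A minimal sketch, assuming such an engine is available (the variable names are illustrative):

```javascript
// Requires ES2018 Unicode property escapes; the "u" flag is mandatory.
var unicodeUpper = /\p{Lu}/u;
var unicodeLetter = /\p{L}/u;

console.log(unicodeUpper.test("\u00C9cole"));  // true: É is an uppercase letter
console.log(unicodeUpper.test("\u00E9cole"));  // false: no uppercase letters
console.log(unicodeLetter.test("\u00E9cole")); // true
console.log(unicodeLetter.test("12345!"));     // false: no letters at all
```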

One or more lowercase letters

ASCII lowercase letters only:

[a-z]
Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Any Unicode lowercase letter:

\p{Ll}
Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9

One or more numbers

[0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

One or more special characters

ASCII punctuation and spaces only:

[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Anything other than ASCII letters and numbers:

[^A-Za-z0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Disallow three or more sequential identical characters

This next regex is intended to rule out passwords like 111111. It works in the opposite way of the others in this recipe. If it matches, the password doesn’t meet the condition. In other words, the regex only matches strings that repeat a character three times in a row.

(.)\1\1
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

([\s\S])\1\1
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
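
Because this regex detects a violation rather than a valid password, its result has to be negated. A minimal sketch, with hasNoTripleRepeat as a hypothetical function name:

```javascript
// Hypothetical helper: true when no character appears three times in a row.
function hasNoTripleRepeat(password) {
    // The regex matches a violation, so the test result is negated.
    return !/([\s\S])\1\1/.test(password);
}

console.log(hasNoTripleRepeat("111111"));     // false
console.log(hasNoTripleRepeat("aab1bcc2"));   // true
console.log(hasNoTripleRepeat("paaassword")); // false
```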

Example JavaScript solution, basic

The following code combines five password requirements:

  • Length between 8 and 32 characters.

  • One or more uppercase letters.

  • One or more lowercase letters.

  • One or more numbers.

  • One or more special characters (ASCII punctuation or space characters).

function validate(password) {
    var minMaxLength = /^[\s\S]{8,32}$/,
        upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]/;

    if (minMaxLength.test(password) &&
        upper.test(password) &&
        lower.test(password) &&
        number.test(password) &&
        special.test(password)
    ) {
        return true;
    }

    return false;
}

The validate function just shown returns true if the provided string meets the password requirements. Otherwise, false is returned.
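
Keeping one regex per rule also makes it possible to tell users which rules they missed. The following sketch is one way to do that with the same five regexes; the rules table, the message strings, and the findFailedRules function are assumptions added for illustration:

```javascript
// Hypothetical rule table: each entry pairs a regex from this recipe
// with an illustrative error message.
var rules = [
    { regex: /^[\s\S]{8,32}$/, message: "Must be 8 to 32 characters long" },
    { regex: /[A-Z]/, message: "Must include an uppercase letter" },
    { regex: /[a-z]/, message: "Must include a lowercase letter" },
    { regex: /[0-9]/, message: "Must include a number" },
    { regex: /[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]/,
      message: "Must include a special character" }
];

// Returns the messages for every rule the password fails.
// An empty array means the password passes all five rules.
function findFailedRules(password) {
    var failed = [];
    for (var i = 0; i < rules.length; i++) {
        if (!rules[i].regex.test(password)) {
            failed.push(rules[i].message);
        }
    }
    return failed;
}

console.log(findFailedRules("password1")); // two messages: uppercase, special
console.log(findFailedRules("Passw0rd!")); // empty array
```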

Example JavaScript solution, with x out of y validation

This next example enforces a minimum and maximum password length (8–32 characters), and additionally requires that at least three of the following four character types are present:

  • One or more uppercase letters.

  • One or more lowercase letters.

  • One or more numbers.

  • One or more special characters (anything other than ASCII letters and numbers).

function validate(password) {
    var minMaxLength = /^[\s\S]{8,32}$/,
        upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[^A-Za-z0-9]/,
        count = 0;

    if (minMaxLength.test(password)) {
        // Only need 3 out of 4 of these to match
        if (upper.test(password)) count++;
        if (lower.test(password)) count++;
        if (number.test(password)) count++;
        if (special.test(password)) count++;
    }

    return count >= 3;
}

As before, this modified validate function returns true if the provided password meets the overall requirements. If not, it returns false.

Example JavaScript solution, with password security ranking

This final code example is the most complicated of the bunch. It assigns a positive or negative score to various conditions, and uses the regexes we’ve been looking at to help calculate an overall score for the provided password. The rankPassword function returns a number from 0–4 that corresponds to the password rankings “Too Short,” “Weak,” “Medium,” “Strong,” and “Very Strong”:

var rank = {
    TOO_SHORT: 0,
    WEAK: 1,
    MEDIUM: 2,
    STRONG: 3,
    VERY_STRONG: 4
};

function rankPassword(password) {
    var upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[^A-Za-z0-9]/,
        minLength = 8,
        score = 0;

    if (password.length < minLength) {
        return rank.TOO_SHORT; // End early
    }

    // Increment the score for each of these conditions
    if (upper.test(password)) score++;
    if (lower.test(password)) score++;
    if (number.test(password)) score++;
    if (special.test(password)) score++;

    // Penalize if there aren't at least three char types
    if (score < 3) score--;

    if (password.length > minLength) {
        // Increment the score for every 2 chars longer than the minimum
        score += Math.floor((password.length - minLength) / 2);
    }

    // Return a ranking based on the calculated score
    if (score < 3) return rank.WEAK; // score is 2 or lower
    if (score < 4) return rank.MEDIUM; // score is 3
    if (score < 6) return rank.STRONG; // score is 4 or 5
    return rank.VERY_STRONG; // score is 6 or higher
}

// Test it...
var result = rankPassword("password1"),
    labels = ["Too Short", "Weak", "Medium", "Strong", "Very Strong"];

alert(labels[result]); // -> Weak

Because of how this password ranking algorithm is designed, it can serve two purposes equally well. First, it can be used to give users guidance about the quality of their password while they’re still typing it. Second, it lets you easily reject passwords that don’t rank at whatever you choose as your minimum security threshold. For example, the condition if(result <= rank.MEDIUM) can be used to reject any password that isn’t ranked as “Strong” or “Very Strong.”

Discussion

Users are notorious for choosing simple or common passwords that are easy to remember. But easy to remember doesn’t necessarily translate into something that keeps their account and your company’s information safe. It’s therefore typically necessary to protect users from themselves by enforcing minimum password complexity rules. However, the exact rules to use can vary widely between businesses and systems, which is why this recipe includes numerous regexes that serve as the raw ingredients to help you cook up whatever combination of validation rules you choose.

Limiting each regex to a specific rule brings the additional benefit of simplicity. As a result, all of the regexes shown thus far are fairly straightforward. Following are a few additional notes on each of them:

Length between 8 and 32 characters

To require a different minimum or maximum length, change the numbers used as the upper and lower bounds for the quantifier {8,32}. If you don’t want to specify a maximum, use {8,}, or remove the $ anchor and change the quantifier to {8}.
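
The two ways of dropping the maximum give the same true or false answer. A short sketch using the JavaScript-friendly [\s\S] form (variable names and sample strings are illustrative):

```javascript
// Open-ended quantifier, still anchored at the end of the string
var atLeast8Anchored = /^[\s\S]{8,}$/;
// No $ anchor: succeeds as soon as eight characters have been found
var atLeast8Prefix = /^[\s\S]{8}/;

var samples = ["1234567", "12345678", "a much longer passphrase"];
for (var i = 0; i < samples.length; i++) {
    // Both columns agree: false for the first sample, true for the others
    console.log(
        atLeast8Anchored.test(samples[i]),
        atLeast8Prefix.test(samples[i])
    );
}
```

The second form only matches the first eight characters, which makes no difference when you just test for a match.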

All of the programming languages covered by this book provide a simple and efficient way to determine the length of a string. However, using a regex allows you to test both the minimum and maximum length at the same time, and makes it easier to mix and match password complexity rules by choosing from a list of regexes.

ASCII visible and space characters only

As mentioned earlier, this regex allows the characters A–Z, a–z, 0–9, space, and ASCII punctuation only. To be more specific about the allowed punctuation characters, they are !, ", #, $, %, &, ', (, ), *, +, -, ., /, :, ;, <, =, >, ?, @, [, \, ], ^, _, `, {, |, }, ~, and comma. In other words, all the punctuation you can type using a standard U.S. keyboard.

Limiting passwords to these characters can help avoid character encoding related issues, but keep in mind that it also limits the potential complexity of your passwords.

Uppercase letters

To check whether the password contains two or more uppercase letters, use [A-Z].*[A-Z]. For three or more, use [A-Z].*[A-Z].*[A-Z] or (?:[A-Z].*){3}. If you’re allowing any Unicode uppercase letters, just change each [A-Z] in the preceding examples to \p{Lu}. In JavaScript, replace the dots with [\s\S].
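
The (?:[A-Z].*){3} pattern generalizes to any count and any character class. The following sketch builds such a regex for JavaScript; the atLeast function and its parameters are hypothetical names introduced for this example:

```javascript
// Hypothetical helper: builds a regex that requires at least `count`
// characters from `charClass`, a character class written as a string
// such as "[A-Z]". Uses [\s\S] instead of the dot for JavaScript.
function atLeast(charClass, count) {
    return new RegExp("(?:" + charClass + "[\\s\\S]*){" + count + "}");
}

var twoUpper = atLeast("[A-Z]", 2);
var threeUpper = atLeast("[A-Z]", 3);

console.log(twoUpper.test("PassWord"));   // true
console.log(threeUpper.test("PassWord")); // false
console.log(threeUpper.test("PaSsWord")); // true
```

The same helper would work with "[a-z]" or "[0-9]" as the character class.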

Lowercase letters

As with the “uppercase letters” regex, you can check whether the password contains at least two lowercase letters using [a-z].*[a-z]. For three or more, use [a-z].*[a-z].*[a-z] or (?:[a-z].*){3}. If you’re allowing any Unicode lowercase letters, change each [a-z] to \p{Ll}. In JavaScript, replace the dots with [\s\S].

Numbers

You can check whether the password contains two or more numbers using [0-9].*[0-9], and [0-9].*[0-9].*[0-9] or (?:[0-9].*){3} for three or more. In JavaScript, replace the dots with [\s\S].

We didn’t include a listing for matching any Unicode decimal digit (\p{Nd}), because it’s uncommon to treat characters other than 0–9 as numbers (although readers who speak Arabic or Hindi might disagree!).

Special characters

Use the same principles shown for letters and numbers if you want to require more than one special character. For instance, using [^A-Za-z0-9].*[^A-Za-z0-9] would require the password to contain at least two special characters.

Note that [^A-Za-z0-9] is different than \W (the negated version of the \w shorthand for word characters). \W goes beyond [^A-Za-z0-9] by additionally excluding the underscore, which we don’t want to do here. In some regex flavors, \W also excludes any Unicode letter or decimal digit from any language.
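
The difference is easy to see in JavaScript, where \w covers only ASCII letters, digits, and the underscore. A brief sketch (variable names are illustrative):

```javascript
var special = /[^A-Za-z0-9]/;
var nonWord = /\W/;

console.log(special.test("pass_word")); // true: the underscore counts as special
console.log(nonWord.test("pass_word")); // false: \w includes the underscore
console.log(special.test("pass-word")); // true
console.log(nonWord.test("pass-word")); // true
// JavaScript's \w is ASCII-only, so é is matched by \W here
console.log(nonWord.test("caf\u00E9")); // true
```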

Disallow three or more sequential identical characters

This regex matches repeated characters using backreferences to a previously matched character. Recipe 2.10 explains how backreferences work. If you want to disallow any use of repeated characters, change the regex to (.)\1. To allow up to three repeated characters but not four, use (.)\1\1\1 or (.)\1{3}.

Remember that you need to check whether this regular expression doesn’t match your subject text. A match would indicate that repeated characters are present.
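
To make the allowed run length configurable, the regex can be built from a number. This is a sketch under that assumption; exceedsMaxRun and its maxRun parameter are hypothetical names:

```javascript
// Hypothetical helper: true if some character occurs more than `maxRun`
// times in a row. Built from the backreference pattern ([\s\S])\1{n},
// which matches a run of n + 1 identical characters.
function exceedsMaxRun(password, maxRun) {
    var tooLong = new RegExp("([\\s\\S])\\1{" + maxRun + "}");
    return tooLong.test(password);
}

console.log(exceedsMaxRun("baaad", 3));  // false: a run of three is allowed
console.log(exceedsMaxRun("baaaad", 3)); // true: a run of four is not
console.log(exceedsMaxRun("book", 1));   // true: any doubled character is rejected
```

A true result means the password breaks the rule, so reject it.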

Example JavaScript solutions

The three blocks of JavaScript example code each use this recipe’s regular expressions a bit differently.

The first example requires all conditions to be met or else the password fails. In the second example, acing the password test requires three out of four conditional requirements to be met. The third example, titled “Example JavaScript solution, with password security ranking,” is probably the most interesting. It includes a function called rankPassword that does what it says on the tin and ranks passwords by how secure they are. It can thus help provide a more user-friendly experience and encourage users to choose strong passwords.

The rankPassword function’s password ranking algorithm increments and decrements an internal password score based on multiple conditions. If the password’s length is less than the specified minimum of eight characters, the function returns early with the numeric equivalent of “Too Short.” Not including at least three character types incurs a one-point penalty, but this can be balanced out because every two additional characters after the minimum of eight adds a point to the running score.
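
To trace how the penalty and the length bonus interact, the following sketch repeats the arithmetic from rankPassword but returns the intermediate values instead of a ranking. The explainScore function and the property names in its result are hypothetical, added only for illustration:

```javascript
// Hypothetical helper that mirrors the arithmetic in rankPassword and
// returns the intermediate values instead of a ranking.
function explainScore(password) {
    var minLength = 8,
        tests = [/[A-Z]/, /[a-z]/, /[0-9]/, /[^A-Za-z0-9]/],
        types = 0;

    if (password.length < minLength) {
        return null; // rankPassword returns rank.TOO_SHORT here
    }

    // One point per character type that is present
    for (var i = 0; i < tests.length; i++) {
        if (tests[i].test(password)) types++;
    }

    // One-point penalty for fewer than three character types
    var penalty = types < 3 ? 1 : 0;
    // One point for every two characters beyond the minimum length
    var lengthBonus = Math.floor((password.length - minLength) / 2);

    return {
        types: types,
        penalty: penalty,
        lengthBonus: lengthBonus,
        score: types - penalty + lengthBonus
    };
}

console.log(explainScore("password1"));
// { types: 2, penalty: 1, lengthBonus: 0, score: 1 } -> "Weak"
console.log(explainScore("password1234"));
// { types: 2, penalty: 1, lengthBonus: 2, score: 3 } -> "Medium"
```

In the second call, four extra characters add two points, which offsets the penalty and lifts the score from “Weak” to “Medium.”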

The code can of course be customized to further improve it or to meet your particular requirements. However, it works quite well as-is, regardless of what you throw at it. As a sanity check, we ran it against several hundred of the known most common (and therefore most insecure) user passwords. All came out ranked as either “Too Short” or “Weak,” which is exactly what we were hoping for.

Caution

Using JavaScript to validate passwords in a web browser can be very beneficial for your users, but make sure to also implement your validation routine on the server. If you don’t, it won’t work for users who disable JavaScript or use custom scripts to circumvent your client-side validation.

Variations

Validate multiple password rules with a single regex

Up to this point, we’ve split password validation into discrete rules that can be tested using simple regexes. That’s usually the best approach. It keeps the regexes readable, and makes it easier to provide error messages that identify why a password isn’t up to code. It can even help you rank a password’s complexity, as we’ve seen. However, there may be times when you don’t care about all that, or when one regex is all you can use. In any case, it’s common for people to want to validate multiple password rules using a single regex, so let’s take a look at how it can be done. We’ll use the following requirements:

  • Length between 8 and 32 characters.

  • One or more uppercase letters.

  • One or more lowercase letters.

  • One or more numbers.

Here’s a regex that pulls it all off:

^(?=.{8,32}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).*
Regex options: Dot matches line breaks (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

This regex can be used with standard JavaScript (which doesn’t have a “dot matches line breaks” option) if you replace each of the five dots with [\s\S]. Otherwise, you might fail to match some valid passwords that contain line breaks. Either way, though, the regex won’t match any invalid passwords.
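
Here is a sketch of that substitution, with the dot-based original alongside for comparison. The variable names and test passwords are illustrative:

```javascript
// The combined regex with each of the five dots replaced by [\s\S]
var allRules = /^(?=[\s\S]{8,32}$)(?=[\s\S]*[A-Z])(?=[\s\S]*[a-z])(?=[\s\S]*[0-9])[\s\S]*/;
// The dot-based original, which JavaScript runs without "dot matches line breaks"
var allRulesDot = /^(?=.{8,32}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).*/;

console.log(allRules.test("Passw0rd"));       // true
console.log(allRules.test("password1"));      // false: no uppercase letter
console.log(allRules.test("Pw0\nrdabc"));     // true: the line break is accepted
console.log(allRulesDot.test("Pw0\nrdabc"));  // false: a valid password is rejected
```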

Notice how this regular expression puts each validation rule into its own lookahead group at the beginning of the regex. Because lookahead does not consume any characters as part of a match (see Recipe 2.16), each lookahead test runs from the very beginning of the string. When a lookahead succeeds, the regex moves along to test the next one, starting from the same position. Any lookahead that fails to find a match causes the overall match to fail.

The first lookahead, (?=.{8,32}$), ensures that any match is between 8 and 32 characters long. Make sure to keep the $ anchor after {8,32}, otherwise the match will succeed even when there are more than 32 characters. The next three lookaheads search one by one for an uppercase letter, lowercase letter, and digit. Because each lookahead searches from the beginning of the string, they use .* before their respective character classes. This allows other characters to appear before the character type that they’re searching for.
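
The effect of dropping the $ anchor from the first lookahead can be checked directly. This sketch uses the [\s\S] form so that it runs in standard JavaScript; the variable names and the 40-character test string are assumptions for illustration:

```javascript
var withAnchor = /^(?=[\s\S]{8,32}$)(?=[\s\S]*[A-Z])(?=[\s\S]*[a-z])(?=[\s\S]*[0-9])[\s\S]*/;
// Same regex, but with the $ removed from the length lookahead
var withoutAnchor = /^(?=[\s\S]{8,32})(?=[\s\S]*[A-Z])(?=[\s\S]*[a-z])(?=[\s\S]*[0-9])[\s\S]*/;

// 40 characters: too long for the 8-32 rule
var tooLong = "Aa1" + "x".repeat(37);

console.log(withAnchor.test(tooLong));    // false: the maximum is enforced
console.log(withoutAnchor.test(tooLong)); // true: only the minimum is enforced
```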

By following the approach shown here, it’s possible to add as many lookahead-based password tests as you want to a single regex, so long as all of the conditions are always required.

The .* at the very end of this regex is not actually required. Without it, though, the regex would return a zero-length empty string when it successfully matches. The trailing .* lets the regex include the password itself in successful match results.
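
The difference shows up only in the matched text, not in whether the match succeeds. A sketch using the [\s\S] form for JavaScript (variable names are illustrative):

```javascript
var withTail = /^(?=[\s\S]{8,32}$)(?=[\s\S]*[A-Z])(?=[\s\S]*[a-z])(?=[\s\S]*[0-9])[\s\S]*/;
var withoutTail = /^(?=[\s\S]{8,32}$)(?=[\s\S]*[A-Z])(?=[\s\S]*[a-z])(?=[\s\S]*[0-9])/;

var password = "Passw0rd";

// Both succeed, but only the first returns the password as the match
console.log(withTail.exec(password)[0]);           // Passw0rd
console.log(withoutTail.exec(password)[0].length); // 0 (zero-length match)
```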

Caution

It’s equally valid to write this regex as ^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{8,32}$, with the length test coming after the lookaheads. Unfortunately, writing it this way triggers a bug in Internet Explorer 5.5–8 that prevents it from working correctly. Microsoft fixed the bug in the new regex engine included in IE9.

See Also

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.2 explains how to match nonprinting characters. Recipe 2.3 explains character classes. Recipe 2.4 explains that the dot matches any character. Recipe 2.5 explains anchors. Recipe 2.7 explains how to match Unicode characters. Recipe 2.9 explains grouping. Recipe 2.10 explains backreferences. Recipe 2.12 explains repetition. Recipe 2.16 explains lookaround.
