Regular Expressions Cookbook, 2nd Edition by Steven Levithan, Jan Goyvaerts

4.4. Validate Traditional Date Formats

Problem

You want to validate dates in the traditional formats mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy. You want a simple regex that only checks whether the input looks like a date, without trying to weed out things such as February 31st.

Solution

Solution 1: Match any of these date formats, allowing leading zeros to be omitted:

^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 2: Match any of these date formats, requiring leading zeros:

^[0-3][0-9]/[0-3][0-9]/(?:[0-9][0-9])?[0-9][0-9]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 3: Match m/d/yy and mm/dd/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 4: Match mm/dd/yyyy, requiring leading zeros:

^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 5: Match d/m/yy and dd/mm/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 6: Match dd/mm/yyyy, requiring leading zeros:

^(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Solution 7: Match any of these date formats with greater accuracy, allowing leading zeros to be omitted:

^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

We can use the free-spacing option to make this regular expression easier to read:

^(?:
  # m/d or mm/dd
  (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
  # d/m or dd/mm
  (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Solution 8: Match any of these date formats with greater accuracy, requiring leading zeros:

^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The same solution using the free-spacing option to make it easier to read:

^(?:
  # mm/dd
  (1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
|
  # dd/mm
  (3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
)
# /yyyy
/[0-9]{4}$
Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
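As an illustration of the free-spacing option in one of the listed flavors, here is a minimal Python sketch. It assumes Python's re.VERBOSE flag as the way to turn on free-spacing; the names SOLUTION_8_CONDENSED, SOLUTION_8_FREE_SPACING, and SAMPLES are ours and not part of the recipe. It compiles Solution 8 in both forms and checks that they agree on a few sample inputs.

```python
import re

# Solution 8 in condensed form, written as one line.
SOLUTION_8_CONDENSED = re.compile(
    r"^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|"
    r"(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$"
)

# The same regex in free-spacing form. With re.VERBOSE, whitespace outside
# character classes is ignored and # starts a comment.
SOLUTION_8_FREE_SPACING = re.compile(
    r"""
    ^(?:
      # mm/dd
      (1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
    |
      # dd/mm
      (3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
    )
    # /yyyy
    /[0-9]{4}$
    """,
    re.VERBOSE,
)

SAMPLES = ["12/31/2008", "31/12/2008", "1/2/2008", "31/31/2008", "01/02/08"]

for text in SAMPLES:
    condensed = SOLUTION_8_CONDENSED.fullmatch(text) is not None
    free_spacing = SOLUTION_8_FREE_SPACING.fullmatch(text) is not None
    print(text, condensed, free_spacing)
# 12/31/2008 True True
# 31/12/2008 True True
# 1/2/2008 False False
# 31/31/2008 False False
# 01/02/08 False False
```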

Discussion

You might think that something as conceptually trivial as a date should be an easy job for a regular expression. But it isn’t, for two reasons. The first is that, because dates are such an everyday thing, humans are very sloppy with them. 4/1 may be April Fools’ Day to you. To somebody who writes the day first, it is January 4th, which may be the first working day of the year if New Year’s Day is on a Friday.

The other issue is that regular expressions don’t deal directly with numbers. You can’t tell a regular expression to “match a number between 1 and 31”, for instance. Regular expressions work character by character. We use 3[01]|[12][0-9]|0?[1-9] to match 3 followed by 0 or 1, or to match 1 or 2 followed by any digit, or to match an optional 0 followed by 1 to 9. In character classes, we can use ranges for single digits, such as [1-9]. That’s because the characters for the digits 0 through 9 occupy consecutive positions in the ASCII and Unicode character tables. See Chapter 6 for more details on matching all kinds of numbers with regular expressions.
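The claim that this alternation covers exactly the numbers 1 through 31 can be checked by brute force. The following is a minimal Python sketch under our own naming (DAY, candidates, and matched are not part of the recipe). It tries every one- and two-digit string and collects the values that match.

```python
import re

# Day-of-month alternation from the recipe, wrapped in a noncapturing group.
DAY = re.compile(r"(?:3[01]|[12][0-9]|0?[1-9])")

# Every string "0".."99", plus the zero-padded forms "00".."09".
candidates = [str(n) for n in range(100)] + [f"{n:02d}" for n in range(10)]

# fullmatch requires the whole string to match, like ^ and $ in the recipe.
matched = sorted({int(s) for s in candidates if DAY.fullmatch(s)})

print(matched == list(range(1, 32)))  # True
```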

Because of this, you have to choose how simple or how accurate you want your regular expression to be. If you already know your subject text doesn’t contain any invalid dates, you could use a trivial regex such as \d{2}/\d{2}/\d{4}. The fact that this matches things like 99/99/9999 is irrelevant if those don’t occur in the subject text.
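To see what the trivial regex accepts, consider this short Python sketch. The helper looks_like_date is a hypothetical name of ours, and passing re.ASCII is our choice so that \d means [0-9] only; by default, Python's \d on text strings also matches digits from other scripts.

```python
import re

# The trivial regex from the text: it checks only the shape dd/dd/dddd.
TRIVIAL = re.compile(r"\d{2}/\d{2}/\d{4}", re.ASCII)

def looks_like_date(text):
    """Return True if the whole text has the shape dd/dd/dddd."""
    return TRIVIAL.fullmatch(text) is not None

for text in ["12/31/2008", "99/99/9999", "1/1/2008"]:
    print(text, looks_like_date(text))
# 12/31/2008 True
# 99/99/9999 True
# 1/1/2008 False
```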

The first two solutions for this recipe are quick and simple, too, and they also match invalid dates, such as 31/31/2008, or 0/0/00 when leading zeros may be omitted. They use only literal characters for the date delimiters, character classes (see Recipe 2.3) for the digits, and the question mark (see Recipe 2.12) to make certain digits optional. (?:[0-9]{2})?[0-9]{2} allows the year to consist of two or four digits. [0-9]{2} matches exactly two digits. (?:[0-9]{2})? matches zero or two digits. The noncapturing group (see Recipe 2.9) is required, because the question mark needs to apply to the character class and the quantifier {2} combined. Without the group, the question mark would make the quantifier lazy instead of optional. That has no effect, because {2} cannot repeat more than two times or fewer than two times, so [0-9]{2}? matches exactly two digits, just like [0-9]{2}.
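The difference between the grouped and ungrouped question mark can be demonstrated directly. This is a minimal Python sketch; YEAR_GROUPED, YEAR_LAZY, and accepts are illustrative names of ours.

```python
import re

# With the noncapturing group, the first two digits are optional as a unit.
YEAR_GROUPED = re.compile(r"(?:[0-9]{2})?[0-9]{2}")

# Without the group, the ? only makes {2} lazy, so four digits are required.
YEAR_LAZY = re.compile(r"[0-9]{2}?[0-9]{2}")

def accepts(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

for year in ["08", "2008", "008"]:
    print(year, accepts(YEAR_GROUPED, year), accepts(YEAR_LAZY, year))
# 08 True False
# 2008 True True
# 008 False False
```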

Solutions 3 through 6 restrict the month to numbers between 1 and 12, and the day to numbers between 1 and 31. We use alternation (see Recipe 2.8) inside a group to match various pairs of digits to form a range of two-digit numbers. We use capturing groups here because you’ll probably want to capture the day and month numbers anyway.
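Here is one way the capturing groups might be used, sketched in Python with Solution 3. The helper month_and_day is a hypothetical name of ours. We call fullmatch rather than match because Python's $ also matches just before a trailing line break.

```python
import re

# Solution 3: m/d/yy through mm/dd/yyyy. Group 1 is the month, group 2 the day.
SOLUTION_3 = re.compile(
    r"^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$"
)

def month_and_day(text):
    """Return (month, day) as integers, or None if the text is rejected."""
    m = SOLUTION_3.fullmatch(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(month_and_day("2/29/08"))     # (2, 29)
print(month_and_day("12/31/2008"))  # (12, 31)
print(month_and_day("13/1/2008"))   # None
print(month_and_day("2/31/2008"))   # (2, 31)
```

The last call shows the limit of this recipe: February 31st passes, because the regex checks the month and day ranges independently.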

The final two solutions are a little more complex, so we’re presenting these in both condensed and free-spacing form. The only difference between the two forms is readability. JavaScript does not support free-spacing natively, which is why the free-spacing versions list XRegExp instead. The final two solutions allow all of the date formats, just like the first two solutions. The difference is that the last two use an extra level of alternation to accept only month/day pairs up to 12/31 and day/month pairs up to 31/12, disallowing invalid months, as in 31/31.
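The effect of the extra level of alternation shows up in which capturing groups take part in a match. The following Python sketch uses Solution 7 in condensed form; matched_groups is a hypothetical helper of ours.

```python
import re

# Solution 7 in condensed form. Groups 1 and 2 hold month and day when the
# m/d alternative matches; groups 3 and 4 hold day and month for d/m.
SOLUTION_7 = re.compile(
    r"^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|"
    r"(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$"
)

def matched_groups(text):
    """Return the four capture groups, or None if the text is rejected."""
    m = SOLUTION_7.fullmatch(text)
    return m.groups() if m else None

print(matched_groups("12/31/2008"))  # ('12', '31', None, None)
print(matched_groups("31/12/2008"))  # (None, None, '31', '12')
print(matched_groups("31/31/2008"))  # None
print(matched_groups("5/6/08"))      # ('5', '6', None, None)
```

Note that an ambiguous input such as 5/6/08 is always claimed by the first alternative. The regex accepts both orders, but it cannot tell you which one the writer intended.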

Variations

If you want to search for dates in larger bodies of text instead of checking whether the input as a whole is a date, you cannot use the anchors ^ and $. Merely removing the anchors from the regular expression is not the right solution. That would allow any of these regexes to match 12/12/2001 within 9912/12/200199, for example. Instead of anchoring the regex match to the start and end of the subject, you have to specify that the date cannot be part of longer sequences of digits.

This is easily done with a pair of word boundaries. In regular expressions, digits are treated as characters that can be part of words. Replace both ^ and $ with \b. As an example:

\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
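The following Python sketch compares the word-boundary version with a version that merely drops the anchors. The sample text and the names WITH_BOUNDARIES, WITHOUT_ANCHORS, and find_dates are ours, made up for illustration.

```python
import re

# The regex from this section, with \b in place of ^ and $.
WITH_BOUNDARIES = re.compile(
    r"\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b"
)

# The same regex with the anchors simply removed, for comparison.
WITHOUT_ANCHORS = re.compile(
    r"(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}"
)

def find_dates(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]

text = "Due 12/12/2001, ref 9912/12/200199."

print(find_dates(WITH_BOUNDARIES, text))  # ['12/12/2001']
print(find_dates(WITHOUT_ANCHORS, text))  # ['12/12/2001', '12/12/2001']
```

The second result contains a false positive taken from the middle of 9912/12/200199, which the word boundaries prevent.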

See Also

This chapter has several other recipes for matching dates and times. Recipe 4.5 shows how to validate traditional date formats more accurately. Recipe 4.6 shows how to validate traditional time formats. Recipe 4.7 shows how to validate date and time formats according to the ISO 8601 standard.

Recipe 6.7 explains how you can create a regular expression to match a number in a given range of numbers.

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.
