4.4. Validate Traditional Date Formats

Problem

You want to validate dates in the traditional formats mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy. You want to use a simple regex that only checks whether the input looks like a date, without trying to weed out impossible dates such as February 31st.

Solution

Match any of these date formats, allowing leading zeros to be omitted:

^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match any of these date formats, requiring leading zeros:

^[0-3][0-9]/[0-3][0-9]/(?:[0-9][0-9])?[0-9][0-9]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match m/d/yy and mm/dd/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match mm/dd/yyyy, requiring leading zeros:

^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match d/m/yy and dd/mm/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match dd/mm/yyyy, requiring leading zeros:

^(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match any of these date formats with greater accuracy, allowing leading zeros to be omitted:

^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|↵
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match mm/dd/yyyy and dd/mm/yyyy with greater accuracy, requiring leading zeros and a four-digit year:

^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|↵
(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The free-spacing option makes these last two a bit more readable:

^(?:
  # m/d or mm/dd
  (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
  # d/m or dd/mm
  (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:
  # mm/dd
  (1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
|
  # dd/mm
  (3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
)
# /yyyy
/[0-9]{4}$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Discussion

You might think that something as conceptually trivial as a date should be an easy job for a regular expression. But it isn’t, for two reasons. The first is that dates are such an everyday thing that humans are very sloppy with them. 4/1 may be April Fools’ Day to you. To somebody else, it may be January 4th, the first working day of the year if New Year’s Day falls on a Friday. The solutions shown match some of the most common date formats.

The other issue is that regular expressions don’t deal directly with numbers. You can’t tell a regular expression to “match a number between 1 and 31”, for instance. Regular expressions work character by character. We use 3[01]|[12][0-9]|0?[1-9] to match 3 followed by 0 or 1, or to match 1 or 2 followed by any digit, or to match an optional 0 followed by 1 to 9. In character classes, we can use ranges for single digits, such as [1-9]. That’s because the characters for the digits 0 through 9 occupy consecutive positions in the ASCII and Unicode character tables. See Chapter 6 for more details on matching all kinds of numbers with regular expressions.
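As a quick check of the character-by-character reasoning above, the following Python sketch tests the day pattern against every number from 0 to 99. The anchors and the helper name is_day are added here for illustration and are not part of the recipe.

```python
import re

# Day pattern from the recipe, wrapped in a noncapturing group and anchored
# so that the whole string must match (the anchors are an assumption).
DAY = re.compile(r"^(?:3[01]|[12][0-9]|0?[1-9])$")

def is_day(text):
    """Return True if text is a day number from 1 to 31, optionally zero-padded."""
    return DAY.match(text) is not None

# Collect every number from 0 to 99 whose decimal form the pattern accepts.
accepted = [n for n in range(0, 100) if is_day(str(n))]

print(accepted == list(range(1, 32)))   # True
print(is_day("07"), is_day("0"), is_day("32"), is_day("00"))
# True False False False
```

The pattern accepts exactly 1 through 31, plus the zero-padded forms 01 through 09, even though the regex never treats the text as a number.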

Because of this, you have to choose how simple or how accurate you want your regular expression to be. If you already know your subject text doesn’t contain any invalid dates, you could use a trivial regex such as \d{2}/\d{2}/\d{4}. The fact that this matches things like 99/99/9999 is irrelevant if those don’t occur in the subject text. You can quickly type in this simple regex, and it will be quickly executed.
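The trade-off can be illustrated with a small Python comparison. This is only a sketch: the anchors on the trivial regex and the names TRIVIAL and STRICT are assumptions added for the example.

```python
import re

# The trivial regex from the text, anchored here for whole-string validation.
TRIVIAL = re.compile(r"^\d{2}/\d{2}/\d{4}$")
# The mm/dd/yyyy solution from this recipe, which range-checks month and day.
STRICT = re.compile(r"^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$")

for s in ["12/31/2008", "99/99/9999", "02/31/2008"]:
    print(s, bool(TRIVIAL.match(s)), bool(STRICT.match(s)))
# 12/31/2008 True True
# 99/99/9999 True False
# 02/31/2008 True True
```

Note that even the stricter regex accepts 02/31/2008, because it checks the month and day ranges independently.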

The first two solutions for this recipe are quick and simple, too, and they also match invalid dates, such as 00/00/00 and 31/31/2008. They use only literal characters for the date delimiters, character classes (see Recipe 2.3) for the digits, and the question mark (see Recipe 2.12) to make certain digits optional. (?:[0-9]{2})?[0-9]{2} allows the year to consist of two or four digits. [0-9]{2} matches exactly two digits. (?:[0-9]{2})? matches zero or two digits. The noncapturing group (see Recipe 2.9) is required, because the question mark needs to apply to the character class and the quantifier {2} combined. Without the group, the question mark would make the quantifier lazy instead of optional. That has no effect, because {2} cannot repeat more than two times or fewer than two times, so [0-9]{2}? matches exactly two digits, just like [0-9]{2}.
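The difference between the grouped and ungrouped question mark can be seen in a short Python sketch. The anchors and the names YEAR_GROUPED and YEAR_LAZY are assumptions made for this illustration.

```python
import re

# With the noncapturing group, the first pair of digits is optional:
# the year has two or four digits.
YEAR_GROUPED = re.compile(r"^(?:[0-9]{2})?[0-9]{2}$")
# Without the group, the question mark only makes {2} lazy, which changes
# nothing: the pattern still demands exactly four digits.
YEAR_LAZY = re.compile(r"^[0-9]{2}?[0-9]{2}$")

for year in ["08", "2008", "008"]:
    print(year, bool(YEAR_GROUPED.match(year)), bool(YEAR_LAZY.match(year)))
# 08 True False
# 2008 True True
# 008 False False
```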

Solutions 3 through 6 restrict the month to numbers between 1 and 12, and the day to numbers between 1 and 31. We use alternation (see Recipe 2.8) inside a group to match various pairs of digits to form a range of two-digit numbers. We use capturing groups here because you’ll probably want to capture the day and month numbers anyway.
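As one way to use those capturing groups, here is a minimal Python sketch built on the m/d/yy solution. The function name parse_mdy is hypothetical, and the year is left uncaptured because the recipe's regex does not capture it.

```python
import re

# Solution 3 from this recipe: group 1 is the month, group 2 is the day.
MDY = re.compile(r"^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$")

def parse_mdy(text):
    """Return (month, day) as integers, or None if text does not match."""
    m = MDY.match(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(parse_mdy("2/29/2008"))   # (2, 29)
print(parse_mdy("13/1/2008"))   # None, because 13 is not a valid month
print(parse_mdy("02/31/08"))    # (2, 31), since February 31st is not weeded out
```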

The final two solutions are a little more complex, so we’re presenting these in both condensed and free-spacing form. The only difference between the two forms is readability. JavaScript does not support free-spacing. The final solutions allow all of the date formats, just like the first two examples. The difference is that the last two use an extra level of alternation to restrict the dates to 12/31 and 31/12, disallowing invalid months, such as 31/31.
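In Python, free-spacing is turned on with the re.VERBOSE flag. The following sketch compiles the condensed and free-spacing forms of the first of these two regexes and checks that they behave identically on a few samples. The names CONDENSED and FREE_SPACING and the sample strings are chosen only for this example.

```python
import re

# Condensed form, split across two adjacent string literals for line length.
CONDENSED = re.compile(
    r"^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|"
    r"(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$"
)

# Free-spacing form: whitespace and # comments in the pattern are ignored.
FREE_SPACING = re.compile(r"""
    ^(?:
      # m/d or mm/dd
      (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
    |
      # d/m or dd/mm
      (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
    )
    # /yy or /yyyy
    /(?:[0-9]{2})?[0-9]{2}$
""", re.VERBOSE)

for s in ["12/31/2008", "31/12/08", "31/31/2008", "4/1/10", "0/0/00"]:
    print(s, bool(CONDENSED.match(s)), bool(FREE_SPACING.match(s)))
# 12/31/2008 True True
# 31/12/08 True True
# 31/31/2008 False False
# 4/1/10 True True
# 0/0/00 False False
```

Groups 1 and 2 are filled when the m/d alternative matched, and groups 3 and 4 when the d/m alternative matched. An ambiguous date such as 4/1/10 is always matched by the first alternative, so the regex alone cannot tell you which format the writer intended.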

Variations

If you want to search for dates in larger bodies of text instead of checking whether the input as a whole is a date, you cannot use the anchors ^ and $. Merely removing the anchors from the regular expression is not the right solution. That would allow any of these regexes to match 12/12/2001 within 9912/12/200199, for example. Instead of anchoring the regex match to the start and end of the subject, you have to specify that the date cannot be part of longer sequences of digits.

This is easily done with a pair of word boundaries. In regular expressions, digits are treated as characters that can be part of words. Replace both ^ and $ with \b. As an example:

\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
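
The effect of the word boundaries can be sketched in Python by searching the same text with and without them. The sample text and the names ANCHORLESS and BOUNDED are invented for this illustration.

```python
import re

# The mm/dd/yyyy regex with the anchors simply removed.
ANCHORLESS = re.compile(r"(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}")
# The same regex with word boundaries in place of the anchors.
BOUNDED = re.compile(r"\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b")

text = "Invoice 9912/12/200199 was due 12/12/2001, paid 01/15/2002."

print([m.group(0) for m in ANCHORLESS.finditer(text)])
# ['12/12/2001', '12/12/2001', '01/15/2002']
print([m.group(0) for m in BOUNDED.finditer(text)])
# ['12/12/2001', '01/15/2002']
```

Without the boundaries, the regex finds a spurious date inside the invoice number. With them, only the two dates that stand on their own are found.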

See Also

Recipes 4.5, 4.6, and 4.7
