O'Reilly logo

Regular Expressions Cookbook, 2nd Edition by Steven Levithan, Jan Goyvaerts

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

4.5. Validate Traditional Date Formats, Excluding Invalid Dates

Problem

You want to validate dates in the traditional formats mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy, as shown in Recipe 4.4. But this time, you also want to weed out invalid dates, such as February 31st.

Solution

C#

The first solution requires the month to be specified before the day. The regular expression works with a variety of flavors:

^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10

This is the complete solution implemented in C#:

DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

The second solution requires the day to be specified before the month. The only difference is that we’ve swapped the names of the capturing groups in the regular expression.

^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10

The C# code is unchanged, except for the regular expression:

DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

Perl

The first solution requires the month to be specified before the day. The regular expression works with all flavors covered in this book.

^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

This is the complete solution implemented in Perl:

@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) 
{
    $month = $1;
    $day = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
                                          $year % 400 == 0)) {
    	$validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}

The second solution requires the day to be specified before the month. The regular expression is exactly the same. The Perl code swaps the meaning of the first two capturing groups.

@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) 
{
    $day = $1;
    $month = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
                                          $year % 400 == 0)) {
    	$validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}

Pure regular expression

You can solve this problem with one regular expression without procedural code, if that is all you can use in your application.

Month before day:

^(?:
  # February (29 days every year)
  (?<month>0?2)/(?<day>[12][0-9]|0?[1-9])
|
  # 30-day months
  (?<month>0?[469]|11)/(?<day>30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (?<month>0?[13578]|1[02])/(?<day>3[01]|[12][0-9]|0?[1-9])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9
^(?:
  # February (29 days every year)
  (0?2)/([12][0-9]|0?[1-9])
|
  # 30-day months
  (0?[469]|11)/(30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|↵
(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Day before month:

^(?:
  # February (29 days every year)
  (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
|
  # 30-day months
  (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[469]|11)
|
  # 31-day months
  (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[13578]|1[02])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9
^(?:
  # February (29 days every year)
  ([12][0-9]|0?[1-9])/(0?2)
|
  # 30-day months
  (30|[12][0-9]|0?[1-9])/([469]|11)
|
  # 31-day months
  (3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
^(?:([12][0-9]|0?[1-9])/(0?2)|(30|[12][0-9]|0?[1-9])/([469]|11)|↵
(3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02]))/((?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Regex with procedural code

There are essentially two ways to accurately validate dates with a regular expression. One method is to use a simple regex that merely captures groups of numbers that look like a month/day/year combination, and then use procedural code to check whether the date is correct.

The main benefit of this method is that you can easily add additional restrictions, such as limiting dates to certain periods. Many programming languages provide specific support for dealing with dates. The C# solution uses .NET’s DateTime structure to check whether the date is valid and return the date in a useful format, all in one step.

We used the first regex from Recipe 4.4 that allows any number between 0 and 39 for the day and month. That makes it easy to change the format from mm/dd/yy to dd/mm/yy by changing which capturing group is treated as the month. When we’re using named capture, that means changing the names of the capturing groups in the regular expression. When we’re using numbered capture, that means changing the references to the numbered groups in the procedural code.

Pure regular expression

The other method is to do everything with a regular expression. We can use the same technique of spelling out the alternatives as we did for the more final solutions presented in Recipe 4.4. The solution is manageable, if we take the liberty of treating every year as a leap year, allowing the regex to match February 29th regardless of the year. Allowing February 29th only on leap years would require us to spell out all the years that are leap years, and all the years that aren’t.

The problem with using a single regular expression is that it no longer neatly captures the day and month in a single capturing group. We now have three capturing groups for the month, and three for the day. When the regex matches a date, only three of the seven groups in the regex will actually capture something. If the month is February, groups 1 and 2 capture the month and day. If the month has 30 days, groups 3 and 4 return the month and day. If the month has 31 days, groups 5 and 6 take action. Group 7 always captures the year.

Perl 5.10, Ruby 1.9, and .NET help us in this situation. Their regex flavors allow multiple named capturing groups to share the same name. See the section Groups with the same name in Recipe 2.11 for details. We take advantage of this by using the same names “month” and “day” in each of the alternatives. When the regex finds a match, we can retrieve the text matched by the groups “month” and “day” without worrying about how many days the month has.

For the other regex flavors, we use numbered capturing groups. When a match is found, three different groups have to be checked to extract the day, and three other groups to extract the month.

The pure regex solution is interesting only in situations where one regex is all you can use, such as when you’re using an application that offers one box to type in a regex. When programming, make things easier with a bit of extra code. This will be particularly helpful if you want to add extra checks on the date later.

Variations

To show how complicated the pure regex solution gets as you add more requirements, here’s a pure regex solution that matches any date between 2 May 2007 and 29 August 2008 in d/m/yy or dd/mm/yyyy format:

# 2 May 2007 till 29 August 2008
^(?:
  # 2 May 2007 till 31 December 2007
  (?:
    # 2 May till 31 May
    (?<day>3[01]|[12][0-9]|0?[2-9])/(?<month>0?5)/(?<year>2007)
  |
    # 1 June till 31 December
    (?:
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[69]|11)
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[78]|1[02])
    )
    /(?<year>2007)
  )
|
  # 1 January 2008 till 29 August 2008
  (?:
    # 1 August till 29 August
    (?<day>[12][0-9]|0?[1-9])/(?<month>0?8)/(?<year>2008)
  |
    # 1 Janary till 30 June
    (?:
      # February
      (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
    |
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[46])
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[1357])
    )
    /(?<year>2008)
  )
)$
Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9

See Also

This chapter has several other recipes for matching dates and times. Recipe 4.5 shows how to validate traditional date formats more simply, giving up some accuracy. Recipe 4.6 shows how to validate traditional time formats. Recipe 4.7 shows how to validate date and time formats according to the ISO 8601 standard.

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.11 explains named capturing groups. Recipe 2.12 explains repetition.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required