4.1. Dates

Dates are great for trying out regexes because they are short strings with a lot of internal variation: 29 September 2006; 29 Sept. 2006; September 28th, 2006; Sept. 29th, 2006; 29-10-2006; 29-10-06; 10-29-2006; 10-29-06. The last four can also be written with slashes or dots instead of hyphens. The regular expression:

/\d\d-\d\d-\d\d\d\d/g

matches 29-10-2006 and 10-29-2006 in the list. To match single-digit days and months as well, we make one of the two digits in each optional (\d?\d); to capture years written in either four or two digits, we make the last two digits optional ((\d\d)?). The regex in the following script matches all the hyphen-separated date formats written in digits:

var txt = 'the date 29-1-2006 is memorable';
// With the g flag, match() returns an array of all matches: here, 29-1-2006.
txt.match( /\d?\d-\d?\d-\d\d(\d\d)?/g );
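To make the difference between the two patterns concrete, the following sketch runs both against a sample string. The sample string and the variable names (samples, strict, flexible) are made up for illustration; the code sticks to plain JavaScript string methods, so it should behave the same in ExtendScript and in other JavaScript engines.

```javascript
// Hypothetical test string: the four hyphenated dates from the list above,
// plus one date with a single-digit day and month.
var samples = '29-10-2006; 29-10-06; 10-29-2006; 10-29-06; 9-1-2006';

// Strict pattern: exactly two digits, two digits, four digits.
var strict = samples.match(/\d\d-\d\d-\d\d\d\d/g);

// Flexible pattern: one or two digits for day and month,
// two or four digits for the year.
var flexible = samples.match(/\d?\d-\d?\d-\d\d(\d\d)?/g);

// strict holds 2 matches (29-10-2006 and 10-29-2006);
// flexible holds all 5 dates in the string.
```

Neither pattern checks that the digits form a real date: a string such as 99-99-9999 matches both.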

Another type of variation in date formats is the separator, which can be a hyphen, a slash, or a dot. To account for all three, we define them as a character class:

/\d?\d[-\/\.]\d?\d[-\/\.]\d\d(\d\d)?/
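As a rough illustration of the character class at work, the sketch below applies the pattern (with the g flag added) to an invented sentence. Because each separator is matched independently, a date with mixed separators is accepted too. The second pattern is one possible tightening that is not part of the text above: it captures the first separator and reuses it with the backreference \1, assuming the regex engine in use supports backreferences.

```javascript
// Hypothetical sample text with all three separators and one mixed case.
var text = 'Due 29-10-2006, shipped 29/10/06, invoiced 1.11.2006, mixed 29-10/2006.';

// The pattern from the text, with the g flag so match() returns every hit.
var anySeparator = text.match(/\d?\d[-\/\.]\d?\d[-\/\.]\d\d(\d\d)?/g);

// Assumed variant: capture the first separator and require the same
// character again via \1, which rejects the mixed form 29-10/2006.
var sameSeparator = text.match(/\d?\d([-\/\.])\d?\d\1\d\d(\d\d)?/g);

// anySeparator holds 4 matches, including 29-10/2006;
// sameSeparator holds only the first 3.
```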

Matching dates in which the month is written as a name is a bit trickier. To match 23 September 2006 or 23 Sept. 2006, for example, we need something like the following:

/\d\d? [A-Z][a-z]+\.? \d\d\d\d/

This looks for one or two digits (\d\d?) followed by a space, followed by a word starting with an upper-case letter and possibly ending with a period (\.?, to capture abbreviated month names; note the escaped dot), followed by a space and four digits. To match September 23rd, 2006, another regex is needed. Something like this:

 /[A-Z][a-z]+ \d?\d(st|nd|rd|th), ...
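As a quick check of the day-month-year pattern /\d\d? [A-Z][a-z]+\.? \d\d\d\d/ from earlier in this section, the sketch below runs it on an invented sentence. It also shows a limitation: [A-Z][a-z]+ accepts any capitalized word, not only month names. The stricter pattern that follows is an assumption for illustration, not something the text prescribes: it spells out the first three letters of each English month name and allows any lower-case remainder.

```javascript
// Hypothetical sample text: two real dates and one false positive.
var text = 'Sent 29 September 2006, resent 29 Sept. 2006, filed 3 Boxes 2006.';

// Pattern from the text, with the g flag to collect every match.
var loose = text.match(/\d\d? [A-Z][a-z]+\.? \d\d\d\d/g);

// Assumed stricter variant: the word must start with a month prefix.
var months = '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)';
var strictPattern = new RegExp('\\d\\d? ' + months + '[a-z]*\\.? \\d\\d\\d\\d', 'g');
var strictMatches = text.match(strictPattern);

// loose holds 3 matches, including "3 Boxes 2006";
// strictMatches holds only the two real dates.
```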
