6.5. Drop Digits

So far we've been able to capture what we want in a single regex. But there are situations where that's either impossible or awkward. One such situation is dropping digits from page ranges, so that 23–28 is changed to 23–8, 254–259 to 254–9, 325–368 to 325–68, and so on. This would be simple enough if it weren't for the teens, whose digits should not be dropped: 12–17 remains 12–17. The reason is that you can't pronounce the shortened form: you can say "twenty-two to nine" (22–9), but not "twelve to five" (12–5) when you mean "twelve to fifteen". Nor is it enough to exclude every page range that has a "1" in penultimate position in the first or the second number, as some of these should still have one or more digits dropped (412–414 becomes 412–14 and 408–412 becomes 408–12). This, combined with JavaScript's lack of support for negative lookbehinds and conditionals, makes it difficult to formulate a single regex that matches all these different types of page range.
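To make the rule concrete, here is a minimal sketch of it in plain JavaScript, working on one range at a time. The helper name dropDigits and the constant EN_DASH are hypothetical and are not taken from the script in this section. The sketch assumes that both numbers arrive as strings of digits and that only ranges whose two numbers have the same length are shortened.

```javascript
// Sketch of the digit-dropping rule (ES3-style JavaScript).
// dropDigits and EN_DASH are illustrative names, not part of the book's script.
var EN_DASH = '\u2013';

function dropDigits(first, second) {
    // Assumption: ranges with numbers of unequal length are left alone.
    if (first.length !== second.length) {
        return first + EN_DASH + second;
    }
    // Count the leading digits the two numbers share, always leaving
    // at least the last digit of the second number in place.
    var shared = 0;
    while (shared < second.length - 1 &&
            first.charAt(shared) === second.charAt(shared)) {
        shared++;
    }
    // Teens: if the second number has "1" in penultimate position,
    // keep its last two digits together.
    if (second.length >= 2 && second.charAt(second.length - 2) === '1') {
        shared = Math.min(shared, second.length - 2);
    }
    return first + EN_DASH + second.slice(shared);
}
```

With these assumptions, dropDigits('23', '28') gives 23–8, dropDigits('325', '368') gives 325–68, dropDigits('12', '17') leaves 12–17 unchanged, and dropDigits('412', '414') and dropDigits('408', '412') give 412–14 and 408–12. Written as ordinary code, the rule takes a few lines; spelling out the same decisions inside one pattern is where the trouble starts.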

But even if a single regex to capture all these situations were possible, it would probably be utterly unreadable and, consequently, almost impossible to maintain. We therefore take a different approach from the one we've used so far. The complete script follows:

 var myregex = '\\b(([2-9])\\d\\u2013\\2\\d'        // two-digit numbers, skip doublets with teens
     + '|\\d\\d\\d\\u2013\\d\\d\\d'                  // three-digit numbers
     + '|\\d\\d\\d\\d\\u2013\\d\\d\\d\\d)\\b';       // four-digit numbers, to include years
 var myDoc ...
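To see which ranges the pattern above picks up, the following sketch runs the same pattern string through JavaScript's built-in RegExp outside InDesign. The names pattern, findCandidates and sample are illustrative only. How the script itself processes each match is not shown in this excerpt, so the sketch stops at collecting the matches.

```javascript
// The pattern string as given in the listing, compiled with the core RegExp object.
var pattern = '\\b(([2-9])\\d\\u2013\\2\\d'          // two-digit numbers, first digits equal, no teens
    + '|\\d\\d\\d\\u2013\\d\\d\\d'                    // three-digit numbers
    + '|\\d\\d\\d\\d\\u2013\\d\\d\\d\\d)\\b';         // four-digit numbers

// Hypothetical helper: return every matching range in a string.
function findCandidates(text) {
    var found = text.match(new RegExp(pattern, 'g'));
    return found === null ? [] : found;
}

// \u2013 is the en dash used in page ranges.
var sample = 'pp. 23\u201328, 12\u201317 and 412\u2013414';
var candidates = findCandidates(sample);
// candidates holds 23–28 and 412–414; 12–17 is skipped.
```

The two-digit alternative requires a first digit from 2 to 9 and, through the backreference \2, the same first digit in the second number, so teens such as 12–17 never match and neither do ranges such as 23–38 that have nothing to drop. The three- and four-digit alternatives accept any pair of numbers of the same length, teens included, so deciding how many digits to drop has to happen after the match.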
