6.4. Page Ranges: Continued

Let's now continue with the regexes that match page ranges. Earlier we came up with a regex that matched all types of range (arabic and roman page numbers, parentheticals, and single letters). We're now ready to apply the script to a document to replace the hyphens in those ranges with en dashes (Unicode \u2013). Here is the full script:

// Replace hyphens with en dashes in page and other ranges
myregex  =    '[)\\d]-[(\\d]|'               //arabic numbers and parentheses
         +    '\\b[ivxlcdm]+-[ivxlcdm]+\\b|' //roman numbers
         +     '\\b\\d*[a-z]-[a-z]\\b';      //single letters, perhaps preceded by a number

var myRanges = documentContents( app.activeDocument ).match( myregex, 'g' );
if( myRanges != null )
     {
     app.findPreferences = app.changePreferences = null;
     for( var i = 0; i < myRanges.length; i++ )
              app.activeDocument.search(
                   myRanges[i], false, false,
                   myRanges[i].replace( '-', '\u2013' ) );
     }
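Before running a script like this on a real document, it can help to see what the regex actually picks up. The following is a minimal sketch in plain JavaScript that applies the same three alternatives to an ordinary string. It does not use InDesign's search, the sample sentence and the names rangeRegex, sample, found, and converted are made up for illustration, and it assumes a standard JavaScript engine with a RegExp object rather than the string-plus-flags form of match used in the listing.

```javascript
// Plain-JavaScript sketch (not InDesign-specific): try the range regex on a string.
// The sample text and the variable names are hypothetical.
var rangeRegex = new RegExp(
    '[)\\d]-[(\\d]|' +                 // arabic numbers and parentheses
    '\\b[ivxlcdm]+-[ivxlcdm]+\\b|' +   // roman numbers
    '\\b\\d*[a-z]-[a-z]\\b',           // single letters, perhaps preceded by a number
    'g');

var sample = 'See pp. 12-15, xii-xiv, figs. 3a-c, and well-known ISBN 0-596-52937-0.';

// Collect the matched fragments, as the InDesign script does.
var found = sample.match(rangeRegex);

// Replace the single hyphen inside each matched fragment with an en dash.
var converted = sample.replace(rangeRegex, function (m) {
    return m.replace('-', '\u2013');
});

console.log(found);      // [ '2-1', 'xii-xiv', '3a-c', '0-5', '6-5', '7-0' ]
console.log(converted);
```

Note that the arabic alternative matches only the digits on either side of the hyphen ('2-1', not '12-15'), which is enough because each fragment contains exactly one hyphen. The word "well-known" is left alone thanks to the word boundaries, but every hyphen in the ISBN is converted.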

The page-range script follows the same pattern as the ones shown earlier: it creates an array of matched items and processes that array. It has one side effect that may be undesirable: it also replaces hyphens with en dashes in numbers that are not ranges, such as ISBNs and telephone, grant, and serial numbers. If you want hyphens there, use a second script to change the en dashes back to hyphens in number series that contain two or more of them. The regex to find numbers with more than one en dash can be formulated like this:

/\d+\u2013\d+(\u2013\d+)+/
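To check how this regex behaves, here is a small sketch in plain JavaScript that applies it to an ordinary string and puts the hyphens back. It is only an illustration of the pattern, not the InDesign script itself. The sample text and the names multiDash, text, and restored are hypothetical.

```javascript
// Plain-JavaScript sketch: restore hyphens in numbers that contain
// two or more en dashes. Sample text and names are hypothetical.
var multiDash = /\d+\u2013\d+(\u2013\d+)+/g;

var text = 'Pages 12\u201315; ISBN 0\u2013596\u201352937\u20130; tel. 555\u20130100.';

// Within each matched number series, turn every en dash back into a hyphen.
var restored = text.replace(multiDash, function (m) {
    return m.replace(/\u2013/g, '-');
});

console.log(restored);
```

In this sample the ISBN gets its hyphens back and the page range 12–15 keeps its en dash. The telephone number 555–0100 also keeps its en dash: with only one dash it looks exactly like a range to this regex, so such numbers would need separate handling.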

And the full script is as follows:

 var myDoc = app.activeDocument; var ...
