3.10. Making Regexes More Readable

The last two examples showed that as you add escape characters, regular expressions quickly become unreadable, and those examples are still simple. To overcome this, you can split the regex over several lines and add comments. To do so, you build the regex as a text string, turn that string into a regex object with RegExp( ), and pass the object to the string method. The last example could be written as follows:

var myString = 'one (two) three (four) five six';
var myregex =    '\\('    // match an opening parenthesis
            +    '[^)]+'  // match one or more characters up to (not including) the first closing parenthesis
            +    '\\)';   // and match that closing parenthesis as well
myString.match( RegExp( myregex, 'g' ) );   // returns ["(two)", "(four)"]

This is easier to read, though note that we now need to escape the escape character itself (the backslash) as well: inside a string literal, '\\(' is what produces the two characters \( that the regex needs. Notice also how the regex is passed to the match( ) method: the string is wrapped in RegExp( ), which turns it into a regex object. Regular expressions are in fact objects, just like String, Array, etc. When you build a regex from a string, you leave out the enclosing slashes, and any modifiers are given as a second string argument: here, 'g'. Later we shall see that in certain circumstances, regexes must be written as text strings.
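As a quick check on the points above, the following sketch builds the same pattern once as a regex literal and once from a string, and compares the two objects. It sticks to plain ES3-style JavaScript so that it should also run in ExtendScript; the variable names are illustrative only.

```javascript
var myString = 'one (two) three (four) five six';

// Regex literal: slashes delimit the pattern, the modifier follows the closing slash.
var literalRe = /\([^)]+\)/g;

// The same regex built from a string: no slashes, the modifier is a second
// string argument, and every regex backslash is written twice in the string.
var stringRe = new RegExp('\\(' + '[^)]+' + '\\)', 'g');

// Both objects carry the same pattern source and the same 'g' flag.
var sameSource = literalRe.source === stringRe.source;   // true
var bothGlobal = literalRe.global && stringRe.global;    // true

// And both return the same matches.
var fromLiteral = myString.match(literalRe);   // ["(two)", "(four)"]
var fromString  = myString.match(stringRe);    // ["(two)", "(four)"]
```

Calling RegExp( ) without new, as in the example above, also returns a new regex object when given a string; new is used here only to make the construction explicit.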

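Exceptionally, the backslash in Unicode notation doesn't need escaping, because the string literal itself already turns \xC0 or \u0100 into the actual character before RegExp( ) ever sees it. The regex given earlier that matches accented letters as well as unaccented ones can therefore be presented like this, which is much easier to read: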

myregex =    '[A-Za-z'          // ASCII
        +    '\xC0-\xFF'        // Latin-1 Supplement
        +    '\u0100-\u024F'    // Latin Extended-A and -B
        +    '\u1E00-\u1EFF]';  // Latin Extended Additional ...
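The claim that a single backslash suffices for \x and \u escapes can be checked directly. The sketch below is an illustration under stated assumptions: it reuses the four ranges above, adds a + quantifier so that whole words are matched, and runs it on made-up sample words; the names letterClass, wordRe, and sample are hypothetical. The sample is written with escapes so that it is guaranteed to use precomposed characters.

```javascript
// '\u0100' in a string literal is already the single character U+0100;
// the string parser, not the regex engine, resolves the escape.
var oneChar = '\u0100';
var isSingleChar = oneChar.length === 1;   // true

// The character class from above, assembled from single-backslash escapes.
var letterClass = '[A-Za-z'          // ASCII
                + '\xC0-\xFF'        // Latin-1 Supplement
                + '\u0100-\u024F'    // Latin Extended-A and -B
                + '\u1E00-\u1EFF]';  // Latin Extended Additional

// Hypothetical use: one or more such letters in a row, i.e. a "word".
var wordRe = new RegExp(letterClass + '+', 'g');

// i.e. 'Ångström, naïve Łódź façade' in precomposed form
var sample = '\u00C5ngstr\u00F6m, na\u00EFve \u0141\u00F3d\u017A fa\u00E7ade';
var words = sample.match(wordRe);   // ["Ångström", "naïve", "Łódź", "façade"]

// A Greek letter (U+03B1) lies outside all four ranges and is not matched.
var greekMatched = new RegExp('^' + letterClass + '$').test('\u03B1');   // false
```

Writing '\\u0100' with a doubled backslash would also work in JavaScript, because the regex engine understands \u escapes too; the single backslash is simply shorter.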
