Chapter 8. Look Before You Leap

All the scripts illustrated earlier in this short cut assumed that once the regex had been matched against a text, every match could be processed. But this will not always be the case. Some compound words, for instance, need to be linked with an en dash rather than a hyphen. Examples are English–Polish dictionaries and the Keenan–Comrie hypothesis. The latter does not contain a double-barreled surname; it names a hypothesis proposed by two people, Keenan and Comrie. It would be wrong, however, to assume that whenever two words with initial capitals are linked by a hyphen, that hyphen can be changed to an en dash. Such a blanket change would catch not only double-barreled names such as Robin Knox-Johnson but also combinations such as Afro-Asiatic and North-East.
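To see why a blanket replacement is unsafe, here is a minimal sketch in plain JavaScript (run with Node.js, not inside InDesign). The sample sentence is invented for illustration; the pattern `[A-Z]\w+-[A-Z]\w+` finds every pair of init-capped words joined by a hyphen, and it cannot tell the genuine en-dash compounds from the rest.

```javascript
// Plain JavaScript (Node.js), not InDesign ExtendScript.
// Invented sample text: two genuine en-dash compounds mixed with
// hyphenated names and compounds that must keep their hyphens.
const sample =
  'The English-Polish dictionary cites the Keenan-Comrie hypothesis, ' +
  'Robin Knox-Johnson, Afro-Asiatic languages and the North-East.';

// Two words with initial capitals joined by a hyphen
const hits = sample.match(/[A-Z]\w+-[A-Z]\w+/g);

// All five compounds match, not just the two that need an en dash
console.log(hits);
```

Only English-Polish and Keenan-Comrie should receive an en dash; nothing in the pattern distinguishes them from the other three, which is why the list has to be vetted by hand.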

To deal with situations like this, we take two steps. The first step is to collect all instances and display them in a new document. We then inspect this list, discarding every init-capped hyphenated pair that should keep its hyphen. A second script then processes the remaining list. Here is the first script:

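    var doctext = documentContents( app.activeDocument );
    // Two words with initial capitals joined by a hyphen
    var found = doctext.match( /[A-Z]\w+-[A-Z]\w+/g );
    if( found != null )
        newDoc().contents = deleteDuplicates( found );

    function deleteDuplicates( array ) {
        // Sort so that identical items end up on adjacent lines,
        // and give every line a return on both sides
        var str = '\r' + array.sort().join( '\r' ) + '\r';
        // Collapse each run of identical whole lines into one line;
        // the leading \r and the lookahead keep matches on line boundaries
        str = str.replace( /\r([^\r]+)(\r\1)+(?=\r)/g, '\r$1' );
        // Remove the returns at both ends
        return str.replace( /^\r|\r$/g, '' );
    }

    // start a new document, place a text frame on the first page,
    // and return that ...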
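A sort-then-collapse deduplication such as `deleteDuplicates()` can be tried outside InDesign. The sketch below is plain JavaScript for Node.js, not ExtendScript, and its entries are invented (LaVerne-Smith and Verne-Smith are hypothetical names chosen to trigger the problem). It compares an unanchored collapsing pattern, `([^\r]+\r)(\1)+`, with one pinned to line boundaries. Because the unanchored pattern may start a match partway through a line, a repeated entry that follows a longer line ending in the same characters is merged into that line and vanishes from the list.

```javascript
// Unanchored: a match may begin in the middle of a line.
function dedupeUnanchored(array) {
  const str = array.slice().sort().join('\r') + '\r';
  return str.replace(/([^\r]+\r)(\1)+/g, '$1').replace(/\r$/, '');
}

// Anchored: each match begins with a return, and the lookahead requires
// a return after the last repeat, so only whole lines are collapsed.
function dedupeAnchored(array) {
  const str = '\r' + array.slice().sort().join('\r') + '\r';
  return str
    .replace(/\r([^\r]+)(\r\1)+(?=\r)/g, '\r$1')
    .replace(/^\r|\r$/g, '');
}

// Invented entries: 'Verne-Smith' is also the tail of 'LaVerne-Smith'
const entries = [
  'Verne-Smith', 'LaVerne-Smith', 'Verne-Smith',
  'Keenan-Comrie', 'Keenan-Comrie',
];

console.log(JSON.stringify(dedupeUnanchored(entries))); // "Keenan-Comrie\rLaVerne-Smith"
console.log(JSON.stringify(dedupeAnchored(entries)));   // "Keenan-Comrie\rLaVerne-Smith\rVerne-Smith"
```

Only the anchored version keeps Verne-Smith. With ordinary init-capped compounds the two usually agree, but the anchored form costs nothing extra.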

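What the second step has to do can be sketched in plain JavaScript (Node.js, not ExtendScript). This is a hypothetical stand-in, not the author's second script: `applyEnDashes()` and the sample strings are invented, and the sketch assumes that the vetted list arrives as return-separated lines (the layout the first script writes to its new document) and that the text is an ordinary string rather than an InDesign story.

```javascript
// Hypothetical second step: swap the hyphen for an en dash (U+2013)
// in every compound that survived the manual check.
function applyEnDashes(text, approvedList) {
  // One compound per line, separated by returns ('\r')
  const items = approvedList.split('\r').filter(item => item.length > 0);
  for (const item of items) {
    // Escape regex metacharacters so the item is matched literally
    const literal = item.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // \b on both sides stops the match inside a longer word
    const re = new RegExp('\\b' + literal + '\\b', 'g');
    text = text.replace(re, item.replace('-', '\u2013'));
  }
  return text;
}

const text = 'The English-Polish dictionary cites the Keenan-Comrie ' +
  'hypothesis; Robin Knox-Johnson sailed from the North-East.';
// The list after Knox-Johnson and North-East were struck out by hand
const approved = 'English-Polish\rKeenan-Comrie';

console.log(applyEnDashes(text, approved));
```

Knox-Johnson and North-East keep their hyphens because they are no longer in the list. The `\b` anchors make sure an approved entry is not rewritten when it occurs only as part of a longer token.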