3.4. String and Word Boundaries

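^     Beginning of string

$     End of string

\b    Word boundary (the position between a word character and a non-word character such as a space or punctuation mark, or the start or end of the string next to a word character)

\B    Not a word boundary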

Take the string 'Manchester United took a hammering'. We can run several tests on it to check the behavior of string and word boundaries.

var myString = 'Manchester United took a hammering';
myString.match( /^Man/ );     // match
myString.match( /^United/ );  // no match
myString.match( /hammer$/ );  // no match
myString.match( /ring$/ );    // match
myString.match( /\btook\b/ ); // match
myString.match( /\btoo\b/ );  // no match

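These four operators are sometimes called zero-width characters (or zero-width assertions) because they match positions, not characters. The word-boundary markers give you the equivalent of a whole-word-only search. Punctuation marks count as non-word characters, so a word still matches when a comma, period, or other punctuation mark sits right next to it.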
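As a rough illustration of how word boundaries behave next to punctuation, and of how \B differs from \b, here is a minimal sketch. The string boundarySample and the variable names are invented for this example and do not come from the original text.

```javascript
// Hypothetical sample string, made up for this illustration
var boundarySample = 'He took it, retook it, and took off.';

// A comma is a non-word character, so \b matches between "it" and ","
var wholeIt = boundarySample.match( /\bit\b/g );       // ["it", "it"]

// "retook" is skipped: there is no boundary between "e" and "t"
var wholeTook = boundarySample.match( /\btook\b/g );   // ["took", "took"]

// \B demands that "took" does NOT start at a word boundary,
// so only the "took" inside "retook" qualifies
var insideTook = boundarySample.match( /\Btook/g );    // ["took"]
```

With the g flag, match() returns an array of every matched substring, so the length of each array tells you how many whole-word (or inside-word) hits were found.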

3.4.1. Alternatives

Alternatives are separated by the vertical bar (|) and enclosed in parentheses. This is a useful part of regexes for matching variant spellings. Each alternative can contain any number of characters. For example,

myString.match( /cent(re|er)/g );

matches all occurrences of centre and center. This could be combined with the optionality operator ? that we saw earlier

myString.match( /cent(re|er)s?/g );

to match center, centers, centre, and centres. When using alternatives, it is best to keep as much outside the parentheses as possible to improve efficiency. For example, the regex just given can be rephrased as /cent(res?|ers?)/ and gives the same result, but is less efficient ...
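To see that the two forms find the same matches, here is a small sketch. The string spellingSample is a made-up example, not taken from the original text; the sketch compares results only and says nothing about speed.

```javascript
// Hypothetical sample string, made up for this illustration
var spellingSample = 'The centre has two centers; other centres differ.';

// Optional "s" kept outside the parentheses
var compact = spellingSample.match( /cent(re|er)s?/g );
// ["centre", "centers", "centres"]

// Optional "s" repeated inside each alternative
var rephrased = spellingSample.match( /cent(res?|ers?)/g );

// Both patterns return the same list of matches
var sameResult = compact.join( ',' ) === rephrased.join( ',' );   // true
```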
