5.10. Match Complete Lines That Contain a Word

Problem

You want to match all lines that contain the word ninja anywhere within them.

Solution

^.*\bninja\b.*$
Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
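
As a rough illustration of how the options above translate into one flavor, here is a minimal Python sketch. The sample text and the variable names are made up for this example. The re.IGNORECASE and re.MULTILINE flags are assumed to correspond to the “case insensitive” and “^ and $ match at line breaks” options, and re.DOTALL is deliberately left off.

```python
import re

# Case insensitive; ^ and $ match at line breaks.
# re.DOTALL is intentionally NOT set, so the dot stops at line breaks.
LINE_WITH_NINJA = re.compile(r"^.*\bninja\b.*$", re.IGNORECASE | re.MULTILINE)

# Hypothetical sample text, invented for illustration only.
sample = (
    "The ninja waits.\n"
    "No match here.\n"
    "A NINJA and another ninja.\n"
    "ninjas do not count."
)

# With no capturing groups, findall() returns each complete match,
# which here is each complete line containing the word.
matching_lines = LINE_WITH_NINJA.findall(sample)
print(matching_lines)
```

The last sample line is skipped because “ninjas” does not end at a word boundary after “ninja”.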

Discussion

It’s often useful to match complete lines in order to collect or remove them. To match any line that contains the word ninja, we start with the regular expression \bninja\b. The word boundary tokens on both ends make sure that we only match “ninja” when it appears as a complete word, as explained in Recipe 2.6.
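
To see what the word boundaries contribute on their own, the following Python sketch tests \bninja\b against a few strings. The candidate strings are hypothetical and chosen only to show which ones count as containing the complete word.

```python
import re

# Only the core of the regex: the word surrounded by word boundaries.
word = re.compile(r"\bninja\b", re.IGNORECASE)

# Hypothetical test strings, invented for illustration.
candidates = ["ninja", "a ninja!", "ninjas", "superninja", "Ninja-like"]

# Keep the strings in which "ninja" appears as a complete word.
whole_word_hits = [s for s in candidates if word.search(s)]
print(whole_word_hits)
```

“ninjas” and “superninja” are rejected because a word character sits directly next to “ninja”, while the hyphen in “Ninja-like” is not a word character and so still leaves a boundary.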

To expand the regex to match a complete line, add .* at both ends. Each dot-asterisk sequence matches zero or more characters within the current line. The asterisk quantifiers are greedy, so they match as much text as possible. The first dot-asterisk therefore matches everything up to the last occurrence of “ninja” on the line, and the second dot-asterisk matches any non-line-break characters that occur after it.
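
The effect of greediness can be made visible by wrapping each dot-asterisk in a capturing group. This is only a sketch for inspection: the capturing groups are not part of the recipe's regex, and the sample line is invented.

```python
import re

# Same regex as the solution, with capturing groups added around each .*
# purely so we can inspect what each one matched.
pattern = re.compile(r"^(.*)\bninja\b(.*)$", re.IGNORECASE | re.MULTILINE)

# Hypothetical line containing the word twice.
line = "one ninja, two ninja, done"

m = pattern.search(line)
before, after = m.group(1), m.group(2)
print(repr(before))  # text consumed by the first .*
print(repr(after))   # text consumed by the second .*
```

Because the first dot-asterisk is greedy, it swallows the first “ninja” and gives back only enough text for the last one to match.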

Finally, place caret and dollar sign anchors at the beginning and end of the regular expression, respectively, to ensure that matches contain a complete line. Strictly speaking, the dollar sign anchor at the end is redundant since the dot and greedy asterisk will always match until the end of the line. However, it doesn’t hurt to add it, and it makes the regular expression a little more self-explanatory. Adding line or string anchors to your regexes, when appropriate, ...
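
The claim that the trailing dollar sign is redundant can be checked informally in Python by comparing the regex with and without it. This is a small sketch on invented sample text, not a proof, and it assumes the “dot matches line breaks” option (re.DOTALL) stays off.

```python
import re

flags = re.IGNORECASE | re.MULTILINE  # re.DOTALL deliberately not set

with_dollar = re.compile(r"^.*\bninja\b.*$", flags)
without_dollar = re.compile(r"^.*\bninja\b.*", flags)

# Hypothetical sample text, invented for illustration.
sample = "a ninja here\nnothing\nninja"

# The greedy trailing .* already runs to the end of each line,
# so both variants return the same complete lines for this sample.
lines_with = with_dollar.findall(sample)
lines_without = without_dollar.findall(sample)
print(lines_with == lines_without)
```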
