5.10. Match Complete Lines That Contain a Word
Problem
You want to match all lines that contain the word ninja anywhere within them.
Solution
^.*\bninja\b.*$
Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
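As a rough illustration of how the solution could be applied in one of the listed flavors, the following Python sketch compiles the regex with the options named above. The sample text and the function name lines_containing_ninja are assumptions made for this example and are not part of the recipe. Python's re.IGNORECASE and re.MULTILINE flags correspond to "Case insensitive" and "^ and $ match at line breaks"; re.DOTALL is deliberately left off so that the dot stops at line breaks.

```python
import re

# Hypothetical sample text, invented for this sketch.
SAMPLE = (
    "The ninja vanished.\n"
    "No warriors here.\n"
    "NINJA stars.\n"
    "Many ninjas train."
)

# re.MULTILINE lets ^ and $ match at line breaks; re.DOTALL is not set,
# so the dot cannot cross a line break.
LINE_WITH_NINJA = re.compile(r"^.*\bninja\b.*$", re.IGNORECASE | re.MULTILINE)


def lines_containing_ninja(text):
    """Return every complete line that contains 'ninja' as a whole word."""
    return LINE_WITH_NINJA.findall(text)


matches = lines_containing_ninja(SAMPLE)
for line in matches:
    print(line)
```

The last sample line is skipped because "ninjas" does not end at a word boundary right after "ninja".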
Discussion
It’s often useful to match complete lines in order to collect or remove them. To match any line that contains the word ninja, we start with the regular expression ‹\bninja\b›. The word boundary tokens on both ends make sure that we only match “ninja” when it appears as a complete word, as explained in Recipe 2.6.
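To see the effect of the word boundaries in isolation, here is a small Python sketch that tests ‹\bninja\b› against a handful of strings. The candidate strings are made up for illustration. Note that Python, like the other flavors listed, treats the underscore as a word character, so "ninja_star" is not matched, while the hyphen in "mega-ninja" does create a boundary.

```python
import re

# The starting point of the recipe: "ninja" as a complete word.
WORD = re.compile(r"\bninja\b", re.IGNORECASE)

# Hypothetical test strings, chosen to show where boundaries do and do not occur.
candidates = ["ninja", "a Ninja!", "ninjas", "ninja_star", "mega-ninja"]

results = {text: bool(WORD.search(text)) for text in candidates}

for text, found in results.items():
    print(f"{text!r}: {found}")
```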
To expand the regex to match a complete line, add ‹.*› at both ends. The dot-asterisk sequences match zero or more characters within the current line. The asterisk quantifiers are greedy, so they will match as much text as possible. The first dot-asterisk matches until the last occurrence of “ninja” on the line, and the second dot-asterisk matches any non-line-break characters that occur after it.
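The greedy behavior can be made visible by wrapping each dot-asterisk in a capturing group. The groups are an addition for this sketch only and are not part of the recipe's regex, and the sample line is invented. With two occurrences of the word on one line, the first group should run up to the last occurrence, leaving only the tail for the second group.

```python
import re

# Same regex as the solution, with capturing groups added around each
# dot-asterisk purely to inspect what each one consumed.
pattern = re.compile(r"^(.*)\bninja\b(.*)$", re.IGNORECASE | re.MULTILINE)

# Hypothetical line containing the word twice.
line = "one ninja, two ninja, done"

m = pattern.search(line)
before, after = m.group(1), m.group(2)

print(repr(before))  # text consumed by the first, greedy dot-asterisk
print(repr(after))   # text consumed by the second dot-asterisk
```

Because the first dot-asterisk initially grabs the whole line and then backtracks only as far as it must, it gives up just enough text for the final "ninja" to match.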
Finally, place caret and dollar sign anchors at the beginning and end of the regular expression, respectively, to ensure that matches contain a complete line. Strictly speaking, the dollar sign anchor at the end is redundant since the dot and greedy asterisk will always match until the end of the line. However, it doesn’t hurt to add it, and makes the regular expression a little more self-explanatory. Adding line or string anchors to your regexes, when appropriate, ...