Matching Newlines in Text

Problem

You need to match newlines in text.

Solution

Use \n (line feed) or \r (carriage return) in the pattern.

See also the flag constant RE.MATCH_MULTILINE, which makes ^ and $ match at the beginning and end of each line (that is, just after and just before a newline) rather than only at the ends of the whole input.
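As a minimal sketch of putting \n and \r into a pattern, here is a line-ending counter. It assumes the JDK's own java.util.regex package rather than the RE class this recipe uses, and the class name LineBreakFinder and method countLineBreaks are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: locate line endings with java.util.regex (not the Jakarta RE class). */
class LineBreakFinder {
    // The Java escapes put real CR and LF characters into the pattern.
    // Alternatives are tried left to right, so \r\n counts as one ending.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\n|\r");

    /** Returns how many line endings the text contains. */
    static int countLineBreaks(String text) {
        Matcher m = LINE_BREAK.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One \n, one \r\n, and one lone \r.
        System.out.println(countLineBreaks("one\ntwo\r\nthree\rfour"));
    }
}
```

Listing \r\n first is a deliberate choice: if \r or \n came first, a Windows-style ending would be counted twice.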

Discussion

While line-oriented Unix tools such as sed and grep match regular expressions one line at a time, not all tools do. The sam text editor from Bell Laboratories was the first interactive tool I know of to allow multiline regular expressions; the Perl scripting language followed shortly.

In the org.apache.regexp API used here, the newline character has no special significance by default. The BufferedReader method readLine( ) normally strips out whichever newline characters it finds, so lines read that way contain none. If you read in gobs of characters using some method other than readLine( ), you may have \n in your text string. Since it’s just an ordinary character, you can match it with .* or similar multipliers, and, if you want to know exactly where it is, \n or \r in the pattern will match it as well. You can write a newline in a pattern either as the escape \n or by its numerical value, \u000a.
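To make the point about readLine( ) and the two spellings of newline concrete, here is a small sketch. It assumes java.util.regex from the JDK instead of the RE class, and the names NewlineAsCharacter, found, and firstLine are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

/** Sketch: a newline is an ordinary character unless readLine() has removed it. */
class NewlineAsCharacter {
    /** True if the regex matches anywhere in the text. */
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    /** Returns the first line as readLine() delivers it, without its terminator. */
    static String firstLine(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "I dream of engines\nmore engines, all day long";

        // The regex escape for newline and the numeric escape for U+000A
        // both match the line feed in the raw text.
        System.out.println(found("engines\\nmore", text));
        System.out.println(found("engines\\u000amore", text));

        // The line returned by readLine() no longer contains a newline.
        System.out.println(found("\\n", firstLine(text)));
    }
}
```

Note the doubled backslash in the numeric form. A single backslash followed by u and four hex digits is translated by the Java compiler before the string literal is even parsed, so it has to be escaped to reach the regex engine intact.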

import org.apache.regexp.*;

/**
 * Show line ending matching using RE class.
 */
public class NLMatch {
    public static void main(String[] argv) throws RESyntaxException {

        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println();

        String[] patt = {
            "engines\nmore engines",    // literal newline embedded in the pattern
            "engines$"                  // end-of-line anchor
        };

        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);

            RE r = new RE(patt[i]);

            // First try with the default flags.
            boolean found = r.match(input);
            System.out.println("DEFAULT match " + found);

            // Then let ^ and $ match at embedded newlines too.
            r.setMatchFlags(RE.MATCH_MULTILINE);
            found = r.match(input);
            System.out.println("MATCH_MULTILINE match was " + found);
            System.out.println();
        }
    }
}

If you run this code, the first pattern (with the embedded \n) always matches, while the second pattern (with $) matches only when MATCH_MULTILINE is set.

> java NLMatch
INPUT: I dream of engines
more engines, all day long
 
PATTERN engines
more engines
DEFAULT match true
MATCH_MULTILINE match was true
 
PATTERN engines$
DEFAULT match false
MATCH_MULTILINE match was true
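
For comparison, the same experiment can be sketched with the JDK's java.util.regex package, where Pattern.MULTILINE plays the role that RE.MATCH_MULTILINE plays here. This is an illustration under that assumption, not part of the RE API, and the names MultilineDemo and found are hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch: the effect of multiline mode on $, using java.util.regex. */
class MultilineDemo {
    /** True if the regex, compiled with the given flags, matches anywhere in the text. */
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String input = "I dream of engines\nmore engines, all day long";

        // An embedded newline matches with or without multiline mode.
        System.out.println(found("engines\nmore engines", 0, input));
        System.out.println(found("engines\nmore engines", Pattern.MULTILINE, input));

        // By default $ anchors only at the end of the input.
        System.out.println(found("engines$", 0, input));

        // In multiline mode $ also matches just before each line terminator.
        System.out.println(found("engines$", Pattern.MULTILINE, input));
    }
}
```

The four calls print true, true, false, and true, the same pattern of results as the RE run.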
