Chapter 3. Boundaries

This chapter focuses on assertions. Assertions mark boundaries, but they don’t consume characters; that is, an assertion contributes no characters to a match result. They are also known as zero-width assertions. A zero-width assertion doesn’t match a character, per se, but rather a location in a string. Some of these, such as ^ and $, are also called anchors.
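To make "zero-width" concrete, here is a minimal sketch using Python's standard re module (Python is an assumption here; the chapter itself works in RegExr). The sample string is made up for illustration. A match for ^ alone starts and ends at the same position, so it returns an empty string:

```python
import re

text = "The Rime"  # hypothetical sample string

# ^ asserts a position (the start of the string); it consumes no characters.
m = re.search(r"^", text)
print(m.span())         # (0, 0): the match starts and ends at the same position
print(repr(m.group()))  # '': nothing was consumed

# Combined with a literal, the anchor only constrains WHERE the literal may
# match; the text returned is just the literal itself.
m = re.search(r"^The", text)
print(m.group())        # The
```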

The boundaries I'll talk about in this chapter are:

  • The beginning and end of a line or string

  • Word boundaries (two kinds)

  • The beginning and end of a subject

  • Boundaries that quote string literals
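As a rough preview of the boundaries listed above, the following sketch shows how they look in Python's re module (an assumption; other tools differ). The sample text is hypothetical. Note that Python has no \Q...\E quoting syntax as found in Perl; re.escape() is the closest counterpart for quoting a literal string.

```python
import re

text = "one two\nthree"  # hypothetical sample text

# \b: word boundary, e.g. the position just before the first letter of a word.
word_starts = re.findall(r"\b\w", text)
print(word_starts)   # ['o', 't', 't']

# \B: non-word boundary, any position that is not a \b position.
inner_letters = re.findall(r"\B\w", text)
print(inner_letters)

# \A and \Z: start and end of the whole subject, even with re.MULTILINE set.
print(re.findall(r"\A\w+", text, re.MULTILINE))  # ['one']
print(re.findall(r"\w+\Z", text, re.MULTILINE))  # ['three']

# Quoting a literal: re.escape() backslash-escapes metacharacters such as '+'.
quoted = re.escape("1+1=2")
print(bool(re.fullmatch(quoted, "1+1=2")))       # True
```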

To start, I’ll use RegExr again, but this time, for variety, I’ll use the Safari browser (however, you can use any browser you like). I’ll also use the same text I used last time: the first 12 lines of rime.txt. Open http://gskinner.com/regexr in Safari and copy the first 12 lines of rime.txt from the code archive into the lower box.

The Beginning and End of a Line

As you have seen a number of times already, to match the beginning of a line or string, use the caret or circumflex (U+005E):

^

Depending on the context, ^ matches the beginning of a line or of a string, and sometimes of a whole document. The context depends on your application and on the options you use with that application.
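One way to see how options change the meaning of ^ is a small sketch in Python's re module (assumed here as a stand-in for "your application"; the three-line sample text is hypothetical, not the contents of rime.txt). By default ^ matches only at the start of the string; with the re.MULTILINE flag it also matches right after every newline:

```python
import re

text = "alpha one\nbeta two\ngamma three"  # hypothetical sample text

# Default: ^ matches only at the very beginning of the string.
default_hits = re.findall(r"^\w+", text)
print(default_hits)    # ['alpha']

# With re.MULTILINE, ^ also matches at the start of each line.
multiline_hits = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_hits)  # ['alpha', 'beta', 'gamma']
```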

To match the end of a line or string, as you know, use the dollar sign:

$
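The dollar sign behaves symmetrically. A minimal Python sketch, again assuming the re module and hypothetical sample text: without re.MULTILINE, $ matches only at the end of the string (in Python, also just before a single trailing newline); with it, $ matches at the end of every line.

```python
import re

text = "alpha one\nbeta two\ngamma three"  # hypothetical sample text

# Default: $ matches only at the end of the string.
last_word = re.findall(r"\w+$", text)
print(last_word)     # ['three']

# With re.MULTILINE, $ also matches just before each newline.
line_enders = re.findall(r"\w+$", text, re.MULTILINE)
print(line_enders)   # ['one', 'two', 'three']

# Python-specific detail: $ also matches before a final trailing newline.
print(re.search(r"three$", "gamma three\n") is not None)  # True
```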

In RegExr, make sure that multiline is checked. The global option is checked by default when you open RegExr, but you can leave it checked or unchecked for this example. When multiline ...
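RegExr's multiline and global checkboxes have rough counterparts outside the browser. As a sketch, assuming Python's re module: re.MULTILINE plays the role of multiline, while Python has no global flag at all; instead you choose between search(), which reports only the first match, and findall(), which reports every match. The sample text is hypothetical.

```python
import re

text = "alpha one\nbeta two\ngamma three"  # hypothetical sample text

# re.MULTILINE is roughly equivalent to RegExr's "multiline" option.
pattern = re.compile(r"^\w+", re.MULTILINE)

# Like a non-global search: only the first match is reported.
first = pattern.search(text)
print(first.group())   # alpha

# Like a global search: every match is reported.
every = pattern.findall(text)
print(every)           # ['alpha', 'beta', 'gamma']
```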
