Regex Metacharacters, Modes, and Constructs

The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.

Character representations

Many implementations provide shortcuts to represent some characters that may be difficult to input. (See MRE 114-117.)

Character shorthands

Most implementations have specific shorthands for the alert, backspace, escape, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal) but can sometimes be CR (015 octal), depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (the position between a “word” character and a non-word character). For these implementations, \b means backspace in a character class (a set of possible characters to match in the string) and word boundary elsewhere.
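
The two meanings of \b can be seen in a short sketch. It assumes Python's built-in re module, which is one of the implementations that behaves this way; the sample strings and variable names are made up for illustration.

```python
import re

# Outside a character class, \b is a zero-width word boundary:
# "cat" matches as a whole word, but not inside "concat".
words = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, [\b] matches a literal backspace (0x08).
backspaces = re.findall(r"[\b]", "a\x08b")

# \n is the newline shorthand; in Python it is always LF (0x0A).
newline = re.search(r"\n", "line1\nline2")

print(words)            # ['cat', 'cat']
print(backspaces)       # ['\x08']
print(newline.start())  # 5
```

Other implementations may differ in which shorthands they accept, so check the flavor's documentation before relying on any of them.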

Octal escape: \num

Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.
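
As a minimal sketch of the CR/LF example, again assuming Python's re module (which treats an escape that starts with 0 or has three octal digits as an octal character, not a backreference); the sample text is hypothetical.

```python
import re

# \015 is CR (carriage return) and \012 is LF (line feed) in octal.
crlf = re.compile(r"\015\012")

# Sample text with one CR/LF pair and one bare LF.
text = "first\r\nsecond\nthird"

match = crlf.search(text)
parts = crlf.split(text)

print(match.start())  # 5
print(parts)          # ['first', 'second\nthird']
```

Only the CR/LF pair is matched; the bare LF between "second" and "third" is left alone.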

Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum

Represents a character corresponding to a hexadecimal number. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.
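
The hex and Unicode forms can be sketched the same way, assuming Python's re module. Python accepts \xhh (exactly two hex digits), \uhhhh (four digits), and \Uhhhhhhhh (eight digits) in string patterns, but not the \x{num} form, which belongs to other flavors. The sample strings are made up for illustration.

```python
import re

# Two-digit hex escapes: 0x0D is CR, 0x0A is LF.
crlf_hex = re.compile(r"\x0D\x0A")

# Four-digit Unicode escape: U+20AC is the euro sign.
euro = re.compile(r"\u20AC")

# Eight-digit Unicode escape for a code point above U+FFFF.
grinning = re.compile(r"\U0001F600")

crlf_found = crlf_hex.search("a\r\nb")
euro_found = euro.search("price: \u20ac5")
grin_found = grinning.search("hi \U0001F600")

print(crlf_found.start())          # 1
print(euro_found.group() == "€")   # True
print(grin_found.start())          # 3
```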

Control characters: \cchar

Corresponds to ASCII control characters encoded ...
