Regex Metacharacters, Modes, and Constructs
The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.
Character representations
Many implementations provide shortcuts to represent some characters that may be difficult to input. (See MRE 114-117.)
- Character shorthands
Most implementations have specific shorthands for the alert, backspace, escape character, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal) but can sometimes be CR (015 octal) depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (between a “word” character and a non-word character). For these implementations, \b means backspace in a character class (a set of possible characters to match in the string) and word boundary elsewhere.
- Octal escape: \num
Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.
- Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum
Represents a character corresponding to a hexadecimal number. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.
- Control characters: \cchar
Corresponds to ASCII control characters encoded ...
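
As a rough illustration of the escapes described in this list, the sketch below uses Python's re module, which is only one implementation among many. The sample strings and variable names are made up for the example. Python's re does not accept the \x{num} or \cchar forms, so those are left out.

```python
import re

# Outside a character class, \b is a zero-width word boundary.
word_hits = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, \b is the backspace character (010 octal).
backspace_hits = re.findall(r"[\b]", "a\bb")

# Octal, hex, and four-digit Unicode escapes for the same CR/LF pair.
sample = "first\r\nsecond"
crlf_patterns = {
    "octal": r"\015\012",
    "hex": r"\x0D\x0A",
    "unicode": r"\u000D\u000A",
}
crlf_spans = {
    name: re.search(pattern, sample).span()
    for name, pattern in crlf_patterns.items()
}

print(word_hits)       # ['cat', 'cat']
print(backspace_hits)  # ['\x08']
print(crlf_spans)      # every entry is (5, 7)
```

All three CR/LF patterns find the same two-character span, which is the point of the octal and hex examples above: the escapes are different spellings of the same characters. Other implementations may reject or reinterpret some of these forms, so check the flavor you are using.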