Unicode Support

Perl provides built-in support for Unicode 3.2, including full support in the \w, \d, \s, and \b metasequences.
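
As a rough illustration of the point above, the following sketch tests the shorthand classes against non-ASCII characters. The variable names are made up for the example, and the sample characters are written as \x{...} escapes above U+00FF so that Perl applies Unicode rules to the strings. It assumes no modifier such as /a is restricting the classes to ASCII.

```perl
use strict;
use warnings;

# Sample characters, written as escapes so the source stays pure ASCII.
my $omega    = "\x{3A9}";   # GREEK CAPITAL LETTER OMEGA
my $digit    = "\x{663}";   # ARABIC-INDIC DIGIT THREE
my $em_space = "\x{2003}";  # EM SPACE

# Each shorthand class is tested against a single non-ASCII character.
my $is_word  = $omega    =~ /\A\w\z/ ? 1 : 0;
my $is_digit = $digit    =~ /\A\d\z/ ? 1 : 0;
my $is_space = $em_space =~ /\A\s\z/ ? 1 : 0;

# \b is tested at the point between a word character and a space.
my $has_boundary = "$omega$em_space" =~ /\A\w\b\s\z/ ? 1 : 0;

print "word=$is_word digit=$is_digit space=$is_space boundary=$has_boundary\n";
```

If any of these flags comes out as 0, the usual suspects are a string that Perl is treating as bytes or a modifier that limits the classes to ASCII.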

The following constructs respect the current locale when use locale is in effect: case-insensitive (i) mode, \L, \l, \U, \u, \w, and \W.

Perl supports the standard Unicode properties (see Table 1-3) as well as Perl-specific composite properties (see Table 1-10). Scripts and properties may have an Is prefix, but do not require it. Blocks require an In prefix only if the block name conflicts with a script name.
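
The optional Is prefix can be checked with a short sketch like the one below. The variable names are illustrative only, and the test character is an arbitrary choice. Block names with the In prefix are left out because their handling has varied between Perl releases.

```perl
use strict;
use warnings;

# U+03A9 GREEK CAPITAL LETTER OMEGA, written as an escape so the
# source file itself stays pure ASCII.
my $omega = "\x{3A9}";

# A script name matches with or without the Is prefix.
my $script_plain  = $omega =~ /\A\p{Greek}\z/   ? 1 : 0;
my $script_prefix = $omega =~ /\A\p{IsGreek}\z/ ? 1 : 0;

# The same holds for a general-category property such as Lu.
my $prop_plain  = $omega =~ /\A\p{Lu}\z/   ? 1 : 0;
my $prop_prefix = $omega =~ /\A\p{IsLu}\z/ ? 1 : 0;

# \P{...} negates a property: omega is not a decimal digit.
my $not_digit = $omega =~ /\A\P{Nd}\z/ ? 1 : 0;

print "$script_plain $script_prefix $prop_plain $prop_prefix $not_digit\n";
```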

Table 1-10. Perl composite Unicode properties

Property    Equivalent
--------    ----------
IsASCII     [\x00-\x7f]
IsAlnum     [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]
IsAlpha     [\p{Ll}\p{Lu}\p{Lt}\p{Lo}]
IsCntrl     \p{C}
IsDigit     \p{Nd}
IsGraph     [^\p{C}\p{Space}]
IsLower     \p{Ll}
IsPrint     \P{C}
IsPunct     \p{P}
IsSpace     [\t\n\f\r\p{Z}]
IsUpper     [\p{Lu}\p{Lt}]
IsWord      [_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]
IsXDigit    [0-9a-fA-F]
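
One way to spot-check a few rows of the table is to compare each composite property with its listed equivalent, character by character. The sketch below does this for three rows. It stays within the ASCII range as a deliberate assumption, since later Perl releases may define some of these properties more broadly outside ASCII. The names @pairs and $mismatches are invented for the example.

```perl
use strict;
use warnings;

# Pairs of (composite property, listed equivalent) taken from the table.
my @pairs = (
    [ qr/\p{IsASCII}/,  qr/[\x00-\x7f]/ ],
    [ qr/\p{IsDigit}/,  qr/\p{Nd}/      ],
    [ qr/\p{IsXDigit}/, qr/[0-9a-fA-F]/ ],
);

# Count the ASCII characters on which the two forms disagree.
my $mismatches = 0;
for my $code (0 .. 0x7f) {
    my $char = chr $code;
    for my $pair (@pairs) {
        my ($composite, $equivalent) = @$pair;
        my $in_composite  = $char =~ $composite  ? 1 : 0;
        my $in_equivalent = $char =~ $equivalent ? 1 : 0;
        $mismatches++ if $in_composite != $in_equivalent;
    }
}

print "mismatches: $mismatches\n";
```

A nonzero count would suggest that the Perl in use defines one of these properties differently from the table, at least for ASCII input.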
