Chapter 5. Character Classes

I’ll now talk more about character classes, which are sometimes called bracketed expressions. Character classes help you match specific characters, or sequences of specific characters. They can be just as broad or far-reaching as character shorthands. For example, in ASCII text the character shorthand \d will match the same characters as:

[0-9]

But you can use character classes to be even more specific than that. For instance, [02468] matches only the even digits. In this way, they are more versatile than shorthands.
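If you want to check this equivalence outside a web tester, here is a minimal sketch using Python's standard `re` module. It is only an illustration, not the book's own tooling, and the subject string is made up for the example. It also shows one caveat: in Python 3, `\d` on a `str` pattern matches any Unicode decimal digit, so it equals `[0-9]` only for ASCII text or when the `re.ASCII` flag is set. Whether `\d` covers non-ASCII digits varies from engine to engine, so check your tool's documentation.

```python
import re

# Made-up subject: the ten ASCII digits surrounded by letters and punctuation.
subject = "abc 0123456789 xyz!"

# On ASCII text, the shorthand \d and the class [0-9] find the same digits.
shorthand = re.findall(r"\d", subject)
bracketed = re.findall(r"[0-9]", subject)
print(shorthand == bracketed)               # True

# A class can be narrower than any shorthand: only the even digits.
print(re.findall(r"[02468]", subject))      # ['0', '2', '4', '6', '8']

# Caveat for Python 3 str patterns: \d is Unicode-aware, so it also matches
# ARABIC-INDIC DIGIT THREE (U+0663). re.ASCII restricts \d to [0-9].
other = "\u0663"
print(bool(re.search(r"\d", other)))            # True
print(bool(re.search(r"\d", other, re.ASCII)))  # False
print(bool(re.search(r"[0-9]", other)))         # False
```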

Try these examples in whatever regex processor you prefer. I’ll use Rubular in Opera and Reggy on the desktop.

To test, enter the following text in the subject or target area of the web page:

! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9
: ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~

You don’t have to type all that in. You’ll find this text stored in the file ascii-graphic.txt in the code archive that comes with this book.
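If you don’t have the code archive handy, a short Python sketch can rebuild an equivalent subject text. It assumes that ascii-graphic.txt holds the 94 graphic ASCII characters (code points 0x21 through 0x7E) in the runs shown above, with one space between characters. The real file's whitespace may differ, and the helper name `ascii_graphic` is hypothetical.

```python
# Hypothetical helper: rebuilds a stand-in for ascii-graphic.txt, assuming
# the file lists every graphic ASCII character (0x21 through 0x7E) in the
# same runs as the listing above, separated by single spaces.
RUNS = [
    (0x21, 0x2F),  # ! " # $ % & ' ( ) * + , - . /
    (0x30, 0x39),  # digits 0 through 9
    (0x3A, 0x40),  # : ; < = > ? @
    (0x41, 0x5A),  # uppercase A through Z
    (0x5B, 0x60),  # [ \ ] ^ _ `
    (0x61, 0x7A),  # lowercase a through z
    (0x7B, 0x7E),  # { | } ~
]


def ascii_graphic() -> str:
    """Return one line per run, characters separated by single spaces."""
    lines = [" ".join(chr(c) for c in range(lo, hi + 1)) for lo, hi in RUNS]
    return "\n".join(lines)


if __name__ == "__main__":
    text = ascii_graphic()
    print(text)
    # Count the non-space characters: there are 94 graphic ASCII characters.
    print(sum(not ch.isspace() for ch in text))
```

Paste the printed text into the subject area of your tester.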

To start out, use a character class to match a set of English letters, in this case the English vowels:

[aeiou]

The lowercase vowels should be highlighted in the lower text area (see Figure 5-1). How would you highlight the uppercase vowels? How would you highlight or match both?


Figure 5-1. Character class with Rubular in the Opera browser
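One way to answer those questions, sketched with Python's `re` module rather than Rubular, is shown here. The subject is a two-line excerpt of the test text, chosen for the example, and `re.findall` (which returns every match in order) stands in for Rubular's highlighting.

```python
import re

# Two lines excerpted from the test text: lowercase letters, then uppercase.
subject = "abcdefghijklmnopqrstuvwxyz\nABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Lowercase vowels only.
print(re.findall(r"[aeiou]", subject))       # ['a', 'e', 'i', 'o', 'u']
# Uppercase vowels only.
print(re.findall(r"[AEIOU]", subject))       # ['A', 'E', 'I', 'O', 'U']
# Both cases: list all ten letters inside one class...
both = re.findall(r"[aeiouAEIOU]", subject)
print(both)  # ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U']
# ...or keep the class lowercase and request case-insensitive matching.
print(re.findall(r"[aeiou]", subject, re.IGNORECASE) == both)  # True
```

Listing both cases in one class works the same way in every engine. The case-insensitive option is shorter, but its spelling varies: `re.IGNORECASE` in Python, the `/i` modifier in Ruby and JavaScript.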

With character classes, you can also match a range of characters: ...
