Search and Replace with Regular Expressions

Search-and-replace is a common job for regular expressions. A search-and-replace function takes a subject string, a regular expression, and a replacement string as input. The output is the subject string with all matches of the regular expression replaced with the replacement text.
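As a minimal sketch of those three inputs, here is how they might look with the sub function in Python's re module. The sample subject string and the variable names are made up for illustration.

```python
import re

# The three inputs to a search-and-replace (sample values are illustrative).
subject = "2023-01-15 and 2024-12-31"
pattern = r"\d{4}"        # the regular expression: any run of four digits
replacement = "YYYY"      # the replacement text

# re.sub returns a new string with every match replaced.
result = re.sub(pattern, replacement, subject)
print(result)  # YYYY-01-15 and YYYY-12-31
```

The subject string itself is left unchanged; the function returns a new string.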

Although the replacement text is not a regular expression at all, you can use certain special syntax to build dynamic replacement texts. All flavors let you reinsert the text matched by the regular expression or a capturing group into the replacement. Recipes 2.20 and 2.21 explain this. Some flavors also support inserting matched context into the replacement text, as Recipe 2.22 shows. In Chapter 3, Recipe 3.16 teaches you how to generate a different replacement text for each match in code.
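To make this concrete, the following sketch uses Python's replacement text syntax, which is only one of the flavors discussed here. It reinserts capturing groups and the overall match, then falls back to a function to compute a different replacement for each match. The sample strings and the helper name double are illustrative assumptions.

```python
import re

# Reinsert capturing groups: \1 and \2 refer to the first and second group.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "John Smith, Jane Doe")
print(swapped)  # Smith John, Doe Jane

# Reinsert the whole match: Python spells this \g<0>.
bracketed = re.sub(r"\w+", r"[\g<0>]", "a bc")
print(bracketed)  # [a] [bc]

# Generate the replacement in code: re.sub also accepts a function
# that receives each match object and returns the replacement string.
def double(match):
    # Hypothetical helper: doubles the number that was matched.
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```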

Many Flavors of Replacement Text

Regular expression software developers have had different ideas over the years, which has led to a wide range of regular expression flavors, each with its own syntax and feature set. The story for the replacement text is no different. In fact, there are even more replacement text flavors than regular expression flavors. Building a regular expression engine is difficult, so most programmers prefer to reuse an existing one, whereas bolting a search-and-replace function onto an existing regular expression engine is quite easy. The result is that regular expression libraries without built-in search-and-replace features have each acquired many different replacement text flavors.
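One small illustration of how replacement text flavors differ, sketched in Python with made-up sample text: several flavors write a backreference to the first group as $1, but Python's re module expects \1 and treats $1 as literal text.

```python
import re

subject = "hello world"
regex = r"(\w+)"

# Python's flavor uses a backslash for group references.
backslash_style = re.sub(regex, r"<\1>", subject)
print(backslash_style)  # <hello> <world>

# The dollar syntax used by some other flavors has no special meaning
# in Python, so "$1" is inserted literally.
dollar_style = re.sub(regex, "<$1>", subject)
print(dollar_style)  # <$1> <$1>
```

The same replacement string can therefore produce very different results depending on which flavor interprets it.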

Fortunately, all the regular expression flavors in this book have corresponding replacement text flavors, except PCRE. This gap in PCRE complicates life for programmers who use flavors based on it. The open source PCRE library does not include any functions to make replacements. Thus, all applications and programming languages that are based on PCRE need to provide their own search-and-replace function. Most programmers try to copy existing syntax, but never do so in exactly the same way.

This book covers the following replacement text flavors. Refer to "Regex Flavors Covered by This Book" for more details on the regular expression flavors that correspond with the replacement text flavors:

.NET

The System.Text.RegularExpressions namespace provides various search-and-replace functions. The .NET replacement text flavor corresponds with the .NET regular expression flavor. All versions of .NET use the same replacement text flavor. The new regular expression features in .NET 2.0 do not affect the replacement text syntax.

Java

The java.util.regex package has built-in search-and-replace functions. This book covers Java 4, 5, 6, and 7.

JavaScript

In this book, we use the term JavaScript to indicate both the replacement text flavor and the regular expression flavor defined in editions 3 and 5 of the ECMA-262 standard.

XRegExp

Steven Levithan’s XRegExp has its own replace() function that eliminates cross-browser inconsistencies and adds support for backreferences to XRegExp’s named capturing groups. Recipes in this book that use named capture show additional solutions using XRegExp. If a solution shows XRegExp as the replacement text flavor, that means it works with JavaScript when using the XRegExp library, but not with standard JavaScript without the XRegExp library. If a solution shows JavaScript as the replacement text flavor, then it works with JavaScript whether you are using the XRegExp library or not.

This book covers XRegExp version 2.0, which you can download at http://xregexp.com.

PHP

In this book, the PHP replacement text flavor refers to the preg_replace function in PHP. This function uses the PCRE regular expression flavor and the PHP replacement text flavor. It was first introduced in PHP 4.0.0.

Other programming languages that use PCRE do not use the same replacement text flavor as PHP. Depending on where the designers of your programming language got their inspiration, the replacement text syntax may be similar to PHP or any of the other replacement text flavors in this book.

PHP also has an ereg_replace function. This function uses a different regular expression flavor (POSIX ERE), and a different replacement text flavor, too. PHP’s ereg functions are deprecated. They are not discussed in this book.

Perl

Perl has built-in support for regular expression substitution via the s/regex/replace/ operator. The Perl replacement text flavor corresponds with the Perl regular expression flavor. This book covers Perl 5.6 to Perl 5.14. Perl 5.10 added support for named backreferences in the replacement text, at the same time that it added named capture to the regular expression syntax.

Python

Python’s re module provides a sub function to search and replace. The Python replacement text flavor corresponds with the Python regular expression flavor. This book covers Python 2.4 through 3.2. There are no differences in the replacement text syntax between these versions of Python.
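A short sketch of the sub function with a compiled regular expression and named capturing groups follows. The date format, the group names, and the sample text are assumptions chosen for illustration.

```python
import re

# Compile once, then call sub() on the compiled pattern object.
date = re.compile(r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})")
text = "Due 12/31/2024 or 01/15/2025"

# \g<name> reinserts the text matched by a named group.
iso = date.sub(r"\g<year>-\g<month>-\g<day>", text)
print(iso)  # Due 2024-12-31 or 2025-01-15

# The optional count argument limits how many matches are replaced.
first_only = date.sub(r"\g<year>-\g<month>-\g<day>", text, count=1)
print(first_only)  # Due 2024-12-31 or 01/15/2025
```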

Ruby

Ruby’s regular expression support is part of the Ruby language itself, including the search-and-replace function. This book covers Ruby 1.8 and 1.9. While there are significant differences in the regex syntax between Ruby 1.8 and 1.9, the replacement syntax is basically the same. The only addition in Ruby 1.9 is support for named backreferences in the replacement text, because named capture is a new feature in Ruby 1.9 regular expressions.
