Mastering Regular Expressions, 3rd Edition

Book Description

Regular expressions are an extremely powerful tool for manipulating text and data. They are now standard features in a wide range of languages and popular tools, including Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.

If you don't use regular expressions yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regular expressions, this book is a stunning eye-opener.

As this book shows, a command of regular expressions is an invaluable skill. Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. Regular expressions can save you time and aggravation. They can be used to craft elegant solutions to a wide range of problems. Once you've mastered regular expressions, they'll become an invaluable part of your toolkit. You will wonder how you ever got by without them.

Despite their wide availability, flexibility, and unparalleled power, regular expressions are frequently underutilized. Yet what is power in the hands of an expert can be fraught with peril for the unwary. Mastering Regular Expressions will help you navigate the minefield to becoming an expert and help you optimize your use of regular expressions.

Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central "core" chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun's java.util.regex package, which has emerged as the standard Java regex implementation.

Topics include:

  • A comparison of features among different versions of many languages and tools

  • How the regular expression engine works

  • Optimization (major savings available here!)

  • Matching just what you want, but not what you don't want

  • Sections and chapters on individual languages

Written in the lucid, entertaining tone that makes a complex, dry topic become crystal-clear to programmers, and sprinkled with solutions to complex real-world problems, Mastering Regular Expressions, Third Edition offers a wealth of information that you can put to immediate use.

    Reviews of this new edition and the second edition:

    "There isn't a better (or more useful) book available on regular expressions." --Zak Greant, Managing Director, eZ Systems

    "A real tour-de-force of a book which not only covers the mechanics of regexes in extraordinary detail but also talks about efficiency and the use of regexes in Perl, Java, and .NET... If you use regular expressions as part of your professional work (even if you already have a good book on whatever language you're programming in) I would strongly recommend this book to you." --Dr. Chris Brown, Linux Format

    "The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples... Regular expressions are valuable tools that every developer should have in their toolbox. Mastering Regular Expressions is the definitive guide to the subject, and an outstanding resource that belongs on every programmer's bookshelf. Ten out of Ten Horseshoes." --Jason Menard, Java Ranch

    Table of Contents

    1. Cover Page
    2. Title Page
    3. Copyright Page
    4. Dedication
    5. Table of Contents
    6. Preface
    7. 1: Introduction to Regular Expressions
      1. Solving Real Problems
      2. Regular Expressions as a Language
        1. The Filename Analogy
        2. The Language Analogy
      3. The Regular-Expression Frame of Mind
        1. If You Have Some Regular-Expression Experience
        2. Searching Text Files: Egrep
      4. Egrep Metacharacters
        1. Start and End of the Line
        2. Character Classes
        3. Matching Any Character with Dot
        4. Alternation
        5. Ignoring Differences in Capitalization
        6. Word Boundaries
        7. In a Nutshell
        8. Optional Items
        9. Other Quantifiers: Repetition
        10. Parentheses and Backreferences
        11. The Great Escape
      5. Expanding the Foundation
        1. Linguistic Diversification
        2. The Goal of a Regular Expression
        3. A Few More Examples
        4. Regular Expression Nomenclature
        5. Improving on the Status Quo
        6. Summary
      6. Personal Glimpses
    8. 2: Extended Introductory Examples
      1. About the Examples
        1. A Short Introduction to Perl
      2. Matching Text with Regular Expressions
        1. Toward a More Real-World Example
        2. Side Effects of a Successful Match
        3. Intertwined Regular Expressions
        4. Intermission
      3. Modifying Text with Regular Expressions
        1. Example: Form Letter
        2. Example: Prettifying a Stock Price
        3. Automated Editing
        4. A Small Mail Utility
        5. Adding Commas to a Number with Lookaround
        6. Text-to-HTML Conversion
        7. That Doubled-Word Thing
    9. 3: Overview of Regular Expression Features and Flavors
      1. A Casual Stroll Across the Regex Landscape
        1. The Origins of Regular Expressions
        2. At a Glance
      2. Care and Handling of Regular Expressions
        1. Integrated Handling
        2. Procedural and Object-Oriented Handling
        3. A Search-and-Replace Example
        4. Search and Replace in Other Languages
        5. Care and Handling: Summary
      3. Strings, Character Encodings, and Modes
        1. Strings as Regular Expressions
        2. Character-Encoding Issues
        3. Unicode
        4. Regex Modes and Match Modes
      4. Common Metacharacters and Features
        1. Character Representations
        2. Character Classes and Class-Like Constructs
        3. Anchors and Other “Zero-Width Assertions”
        4. Comments and Mode Modifiers
        5. Grouping, Capturing, Conditionals, and Control
      5. Guide to the Advanced Chapters
    10. 4: The Mechanics of Expression Processing
      1. Start Your Engines!
        1. Two Kinds of Engines
        2. New Standards
        3. Regex Engine Types
        4. From the Department of Redundancy Department
        5. Testing the Engine Type
      2. Match Basics
        1. About the Examples
        2. Rule 1: The Match That Begins Earliest Wins
        3. Engine Pieces and Parts
        4. Rule 2: The Standard Quantifiers Are Greedy
      3. Regex-Directed Versus Text-Directed
        1. NFA Engine: Regex-Directed
        2. DFA Engine: Text-Directed
        3. First Thoughts: NFA and DFA in Comparison
      4. Backtracking
        1. A Really Crummy Analogy
        2. Two Important Points on Backtracking
        3. Saved States
        4. Backtracking and Greediness
      5. More About Greediness and Backtracking
        1. Problems of Greediness
        2. Multi-Character “Quotes”
        3. Using Lazy Quantifiers
        4. Greediness and Laziness Always Favor a Match
        5. The Essence of Greediness, Laziness, and Backtracking
        6. Possessive Quantifiers and Atomic Grouping
        7. Possessive Quantifiers, ?+, *+, ++, and {m,n}+
        8. The Backtracking of Lookaround
        9. Is Alternation Greedy?
        10. Taking Advantage of Ordered Alternation
      6. NFA, DFA, and POSIX
        1. “The Longest-Leftmost”
        2. POSIX and the Longest-Leftmost Rule
        3. Speed and Efficiency
        4. Summary: NFA and DFA in Comparison
      7. Summary
    11. 5: Practical Regex Techniques
      1. Regex Balancing Act
      2. A Few Short Examples
        1. Continuing with Continuation Lines
        2. Matching an IP Address
        3. Working with Filenames
        4. Matching Balanced Sets of Parentheses
        5. Watching Out for Unwanted Matches
        6. Matching Delimited Text
        7. Knowing Your Data and Making Assumptions
        8. Stripping Leading and Trailing Whitespace
      3. HTML-Related Examples
        1. Matching an HTML Tag
        2. Matching an HTML Link
        3. Examining an HTTP URL
        4. Validating a Hostname
        5. Plucking Out a URL in the Real World
      4. Extended Examples
        1. Keeping in Sync with Your Data
        2. Parsing CSV Files
    12. 6: Crafting an Efficient Expression
      1. A Sobering Example
        1. A Simple Change—Placing Your Best Foot Forward
        2. Efficiency Versus Correctness
        3. Advancing Further—Localizing the Greediness
        4. Reality Check
      2. A Global View of Backtracking
        1. More Work for a POSIX NFA
        2. Work Required During a Non-Match
        3. Being More Specific
        4. Alternation Can Be Expensive
      3. Benchmarking
        1. Know What You’re Measuring
        2. Benchmarking with PHP
        3. Benchmarking with Java
        4. Benchmarking with VB.NET
        5. Benchmarking with Ruby
        6. Benchmarking with Python
        7. Benchmarking with Tcl
      4. Common Optimizations
        1. No Free Lunch
        2. Everyone’s Lunch is Different
        3. The Mechanics of Regex Application
        4. Pre-Application Optimizations
        5. Optimizations with the Transmission
        6. Optimizations of the Regex Itself
      5. Techniques for Faster Expressions
        1. Common Sense Techniques
        2. Expose Literal Text
        3. Expose Anchors
        4. Lazy Versus Greedy: Be Specific
        5. Split Into Multiple Regular Expressions
        6. Mimic Initial-Character Discrimination
        7. Use Atomic Grouping and Possessive Quantifiers
        8. Lead the Engine to a Match
      6. Unrolling the Loop
        1. Method 1: Building a Regex From Past Experiences
        2. The Real “Unrolling-the-Loop” Pattern
        3. Method 2: A Top-Down View
        4. Method 3: An Internet Hostname
        5. Observations
        6. Using Atomic Grouping and Possessive Quantifiers
        7. Short Unrolling Examples
        8. Unrolling C Comments
      7. The Freeflowing Regex
        1. A Helping Hand to Guide the Match
        2. A Well-Guided Regex is a Fast Regex
        3. Wrapup
      8. In Summary: Think!
    13. 7: Perl
      1. Regular Expressions as a Language Component
        1. Perl’s Greatest Strength
        2. Perl’s Greatest Weakness
      2. Perl’s Regex Flavor
        1. Regex Operands and Regex Literals
        2. How Regex Literals Are Parsed
        3. Regex Modifiers
      3. Regex-Related Perlisms
        1. Expression Context
        2. Dynamic Scope and Regex Match Effects
        3. Special Variables Modified by a Match
      4. The qr/···/ Operator and Regex Objects
        1. Building and Using Regex Objects
        2. Viewing Regex Objects
        3. Using Regex Objects for Efficiency
      5. The Match Operator
        1. Match’s Regex Operand
        2. Specifying the Match Target Operand
        3. Different Uses of the Match Operator
        4. Iterative Matching: Scalar Context, with /g
        5. The Match Operator’s Environmental Relations
      6. The Substitution Operator
        1. The Replacement Operand
        2. The /e Modifier
        3. Context and Return Value
      7. The Split Operator
        1. Basic Split
        2. Returning Empty Elements
        3. Split’s Special Regex Operands
        4. Split’s Match Operand with Capturing Parentheses
      8. Fun with Perl Enhancements
        1. Using a Dynamic Regex to Match Nested Pairs
        2. Using the Embedded-Code Construct
        3. Using local in an Embedded-Code Construct
        4. A Warning About Embedded Code and my Variables
        5. Matching Nested Constructs with Embedded Code
        6. Overloading Regex Literals
        7. Problems with Regex-Literal Overloading
        8. Mimicking Named Capture
      9. Perl Efficiency Issues
        1. “There’s More Than One Way to Do It”
        2. Regex Compilation, the /o Modifier, qr/···/, and Efficiency
        3. Understanding the “Pre-Match” Copy
        4. The Study Function
        5. Benchmarking
        6. Regex Debugging Information
      10. Final Comments
    14. 8: Java
      1. Java’s Regex Flavor
        1. Java Support for \p{···} and \P{···}
        2. Unicode Line Terminators
      2. Using java.util.regex
      3. The Pattern.compile() Factory
        1. Pattern’s matcher method
      4. The Matcher Object
        1. Applying the Regex
        2. Querying Match Results
        3. Simple Search and Replace
        4. Advanced Search and Replace
        5. In-Place Search and Replace
        6. The Matcher’s Region
        7. Method Chaining
        8. Methods for Building a Scanner
        9. Other Matcher Methods
      5. Other Pattern Methods
        1. Pattern’s split Method, with One Argument
        2. Pattern’s split Method, with Two Arguments
      6. Additional Examples
        1. Adding Width and Height Attributes to Image Tags
        2. Validating HTML with Multiple Patterns Per Matcher
        3. Parsing Comma-Separated Values (CSV) Text
      7. Java Version Differences
        1. Differences Between 1.4.2 and 1.5.0
        2. Differences Between 1.5.0 and 1.6
    15. 9: .NET
      1. .NET’s Regex Flavor
        1. Additional Comments on the Flavor
      2. Using .NET Regular Expressions
        1. Regex Quickstart
        2. Package Overview
        3. Core Object Overview
      3. Core Object Details
        1. Creating Regex Objects
        2. Using Regex Objects
        3. Using Match Objects
        4. Using Group Objects
      4. Static “Convenience” Functions
        1. Regex Caching
      5. Support Functions
      6. Advanced .NET
        1. Regex Assemblies
        2. Matching Nested Constructs
        3. Capture Objects
    16. 10: PHP
      1. PHP’s Regex Flavor
      2. The Preg Function Interface
        1. “Pattern” Arguments
      3. The Preg Functions
        1. preg_match
        2. preg_match_all
        3. preg_replace
        4. preg_replace_callback
        5. preg_split
        6. preg_grep
        7. preg_quote
      4. “Missing” Preg Functions
        1. preg_regex_to_pattern
        2. Syntax-Checking an Unknown Pattern Argument
        3. Syntax-Checking an Unknown Regex
      5. Recursive Expressions
        1. Matching Text with Nested Parentheses
        2. No Backtracking Into Recursion
        3. Matching a Set of Nested Parentheses
      6. PHP Efficiency Issues
        1. The S Pattern Modifier: “Study”
      7. Extended Examples
        1. CSV Parsing with PHP
        2. Checking Tagged Data for Proper Nesting
    17. Index
    18. About the Author
    19. Colophon
    20. Footnotes
      1. Chapter 1
      2. Chapter 2
      3. Chapter 3
      4. Chapter 4
      5. Chapter 5
      6. Chapter 6
      7. Chapter 7
      8. Chapter 8
      9. Chapter 9
      10. Chapter 10