Mastering Regular Expressions, 3rd Edition

By Jeffrey E.F. Friedl. Published by O'Reilly Media, Inc.
3: Overview of Regular Expression Features and Flavors

Now that you have a feel for regular expressions and a few diverse tools that use them, you might think we’re ready to dive into using them wherever they’re found. But even a simple comparison among the egrep versions of the first chapter and the Perl and Java in the previous chapter shows that regular expressions and the way they’re used can vary wildly from tool to tool.

When looking at regular expressions in the context of their host language or tool, there are three broad issues to consider:

• What metacharacters are supported, and what they mean. This is often called the regex “flavor.”

• How regular expressions “interface” with the language or tool, such as how to specify regular-expression operations, what operations are allowed, and what text they operate on.

• How the regular-expression engine actually goes about applying a regular expression to some text. The method that the language or tool designer uses to implement the regular-expression engine has a strong influence on the results one might expect from any given regular expression.
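To make the three issues concrete, here is a minimal sketch in Python. Python is used purely as an illustration and is not one of the tools discussed so far; the sample strings and variable names are hypothetical. The last example relies on Python's `re` module being a backtracking engine that tries alternatives in order; an engine that follows the POSIX leftmost-longest rule would be expected to report `top` for the same regex and text.

```python
import re

# Issue 1 (flavor): which metacharacters exist and what they mean.
# In Python's flavor, \b is a word boundary and \d matches a digit;
# another tool may spell these differently or not support them.
numbers = re.findall(r"\b\d+\b", "room 42, floor 7")
print(numbers)  # ['42', '7']

# Issue 2 (interface): how the host language applies a regex.
# Python uses functions and compiled pattern objects, with the
# captured text read back from a match object.
pattern = re.compile(r"(\w+)@(\w+)")
match = pattern.search("mail jane@example today")
user, host = match.group(1), match.group(2)
print(user, host)  # jane example

# Issue 3 (engine): how the match is actually carried out.
# Python's backtracking engine tries alternatives left to right,
# so the first alternative that allows a match wins.
first_wins = re.search(r"to|top", "topology").group()
print(first_wins)  # to
```

The same regex and the same text can therefore give different results depending on the engine behind the tool.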

Regular Expressions and Cars

The considerations just listed parallel the way one might think while shopping for a car. With regular expressions, the metacharacters are the first thing you notice, just as with a car it’s the body shape, shine, and nifty features like a CD player and leather seats. These are the types of things you’ll find splashed across the pages of a glossy brochure, and a list ...