You are previewing Analytic Pattern Matching.
O'Reilly logo
Analytic Pattern Matching

Book Description

How do you distinguish a cat from a dog by their DNA? Did Shakespeare really write all of his plays? Pattern matching techniques can offer answers to these questions and to many others, from molecular biology, to telecommunications, to classifying Twitter content. This book for researchers and graduate students demonstrates the probabilistic approach to pattern matching, which predicts the performance of pattern matching algorithms with very high precision using analytic combinatorics and analytic information theory. Part I compiles known results of pattern matching problems via analytic methods. Part II focuses on applications to various data structures on words, such as digital trees, suffix trees, string complexity and string-based data compression. The authors use results and techniques from Part I and also introduce new methodology such as the Mellin transform and analytic depoissonization. More than 100 end-of-chapter problems help the reader to make the link between theory and practice.

Table of Contents

  1. Coverpage
  2. Halftitle page
  3. Title page
  4. Copyright page
  5. Dedication page
  6. Contents
  7. Foreword
  8. Preface
  9. Acknowledgments
  10. About the sketches
  11. I ANALYSIS
    1. Chapter 1 Probabilistic Models
      1. 1.1 Probabilistic models on words
      2. 1.2 Probabilistic tools
      3. 1.3 Generating functions and analytic tools
      4. 1.4 Special functions
        1. 1.4.1 Euler’s gamma function
        2. 1.4.2 Riemann’s zeta function
      5. 1.5 Exercises
      6. Bibliographical notes
    2. Chapter 2 Exact String Matching
      1. 2.1 Formulation of the problem
      2. 2.2 Language representation
      3. 2.3 Generating functions
      4. 2.4 Moments
      5. 2.5 Limit laws
        1. 2.5.1 Pattern count probability for small values of r
        2. 2.5.2 Central limit laws
        3. 2.5.3 Large deviations
        4. 2.5.4 Poisson laws
      6. 2.6 Waiting times
      7. 2.7 Exercises
      8. Bibliographical notes
    3. Chapter 3 Constrained Exact String Matching
      1. 3.1 Enumeration of (d, k) sequences
        1. 3.1.1 Languages and generating functions
      2. 3.2 Moments
      3. 3.3 The probability count
      4. 3.4 Central limit law
      5. 3.5 Large deviations
      6. 3.6 Some extensions
      7. 3.7 Application: significant signals in neural data
      8. 3.8 Exercises
      9. Bibliographical notes
    4. Chapter 4 Generalized String Matching
      1. 4.1 String matching over a reduced set
      2. 4.2 Generalized string matching via automata
      3. 4.3 Generalized string matching via a language approach
        1. 4.3.1 Symbolic inclusion–exclusion principle
        2. 4.3.2 Multivariate generating function
        3. 4.3.3 Generating function of a cluster
        4. 4.3.4 Moments and covariance
      4. 4.4 Exercises
      5. Bibliographical notes
    5. Chapter 5 Subsequence String Matching
      1. 5.1 Problem formulation
      2. 5.2 Mean and variance analysis
      3. 5.3 Autocorrelation polynomial revisited
      4. 5.4 Central limit laws
      5. 5.5 Limit laws for fully constrained pattern
      6. 5.6 Generalized subsequence problem
      7. 5.7 Exercises
      8. Bibliographical notes
  12. II APPLICATIONS
    1. Chapter 6 Algorithms and Data Structures
      1. 6.1 Tries
      2. 6.2 Suffix trees
      3. 6.3 Lempel–Ziv’77 scheme
      4. 6.4 Digital search tree
      5. 6.5 Parsing trees and Lempel–Ziv’78 algorithm
      6. Bibliographical notes
    2. Chapter 7 Digital Trees
      1. 7.1 Digital tree shape parameters
      2. 7.2 Moments
        1. 7.2.1 Average path length in a trie by Rice’s method
        2. 7.2.2 Average size of a trie
        3. 7.2.3 Average depth in a DST by Rice’s method
        4. 7.2.4 Multiplicity parameter by Mellin transform
        5. 7.2.5 Increasing domains
      3. 7.3 Limiting distributions
        1. 7.3.1 Depth in a trie
        2. 7.3.2 Depth in a digital search tree
        3. 7.3.3 Asymptotic distribution of the multiplicity parameter
      4. 7.4 Average profile of tries
      5. 7.5 Exercises
      6. Bibliographical notes
    3. Chapter 8 Suffix Trees and Lempel–Ziv’77
      1. 8.1 Random tries resemble suffix trees
        1. 8.1.1 Proof of Theorem 8.1.2
        2. 8.1.2 Suffix trees and finite suffix trees are equivalent
      2. 8.2 Size of suffix tree
      3. 8.3 Lempel–Ziv’77
      4. 8.4 Exercises
      5. Bibliographical notes
    4. Chapter 9 Lempel–Ziv’78 Compression Algorithm
      1. 9.1 Description of the algorithm
      2. 9.2 Number of phrases and redundancy of LZ’78
      3. 9.3 From Lempel–Ziv to digital search tree
        1. 9.3.1 Moments
        2. 9.3.2 Distributional analysis
        3. 9.3.3 Large deviation results
      4. 9.4 Proofs of Theorems 9.2.1 and 9.2.2
        1. 9.4.1 Large deviations: proof of Theorem 9.2.2
        2. 9.4.2 Central limit theorem: Proof of Theorem 9.2.1
        3. 9.4.3 Some technical results
        4. 9.4.4 Proof of Theorem 9.4.2
      5. 9.5 Exercises
      6. Bibliographical notes
    5. Chapter 10 String Complexity
      1. 10.1 Introduction to string complexity
        1. 10.1.1 String self-complexity
        2. 10.1.2 Joint string complexity
      2. 10.2 Analysis of string self-complexity
      3. 10.3 Analysis of the joint complexity
        1. 10.3.1 Independent joint complexity
        2. 10.3.2 Key property
        3. 10.3.3 Recurrence and generating functions
        4. 10.3.4 Double depoissonization
      4. 10.4 Average joint complexity for identical sources
      5. 10.5 Average joint complexity for nonidentical sources
        1. 10.5.1 The kernel and its properties
        2. 10.5.2 Main results
        3. 10.5.3 Proof of (10.18) for one symmetric source
        4. 10.5.4 Finishing the proof of Theorem 10.5.6
      6. 10.6 Joint complexity via suffix trees
        1. 10.6.1 Joint complexity of two suffix trees for nonidentical sources
        2. 10.6.2 Joint complexity of two suffix trees for identical sources .
      7. 10.7 Conclusion and applications
      8. 10.8 Exercises
      9. Bibliographical notes
  13. Bibliography
  14. Index