CHAPTER 3

FINITE AUTOMATA IN PATTERN MATCHING

Jan Holub

3.1 INTRODUCTION

Stringology is a part of computer science dealing with processing strings and sequences. It finds many important applications in various fields used by information society. Biological sequences processing is one of the most important fields. However, the finite automata theory is a well-developed formal system used for a long time in the area of compiler construction.

The chapter aims to show various approaches of the finite automata use in stringology. The approaches are demonstrated on practical examples. Of course, it is impossible to describe all approaches, as it would be out of scope of the chapter. However, we would like to present basic approaches that the reader can modify and combine to a given task.

The following four kinds of finite automata use were identified in the area of stringology:

1. A direct use of deterministic finite automata (DFA)

2. A simulation of nondeterministic finite automata (NFA)

3. A use of finite automata as a model for computation

4. A composition of various automata approaches for particular subproblems

The direct use of DFA is used in case of pattern matching automata when a DFA is built over the given pattern and then a text is given as an input to the DFA (the pattern is preprocessed). One also can construct a DFA over a given text and a pattern is given as an input of DFA, which is the case of factor automata (providing a complete index of the text; the text is preprocessed). ...

Get Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.