### 6.5 SELF-TIMED GENETIC STRING DISTANCE EVALUATION

#### 6.5.1 String Comparison

The *string comparison problem* is to find the minimal distance between two strings from a user-defined alphabet set, where the distance between two strings is defined as the cost of transforming one string into the other. Any string can be transformed into another with the aid of the operations insert, delete, and substitute. Distance is considered as the weighted sum of such operations. For example, one way to turn ‘ATGC’ into ‘AATG’, would start by substituting the second letter ‘T’ in ‘ATGC’ with ‘A’, substituting the letter ‘G’ in ‘ATGC’ with ‘T’, and substituting the fourth letter ‘C’ with ‘G’. Alternatively, we could insert an ‘A’ before ‘ATGC’, and delete the fourth letter, ‘C’, in ‘ATGC’. If we define the cost of both insertion and deletion operations as ‘1’, and a substitution operation as ‘2’, then the distance between ‘AATG’ and ‘ATGC’ is ‘6’ using the first method, and ‘2’ with the second. In the absence of any better alternative, the minimal distance between ‘AATG’ and ‘ATGC’ is ‘2’.

String comparison is a common and important operation in almost all information retrieval systems, such as text retrieval, dictionary searching, and biomorphic database retrieval. It is regarded as one of the Grand Challenge computing problems of this decade.

#### 6.5.2 Genetic Sequence Comparison

The *genetic sequence comparison* problem is a special case of the *string comparison problem* applied in the field of ...