Chapter 4

Multiple Sequence Alignment and Clustering with Dot Matrices, Entropy, and Genetic Algorithms

John Tsiligaridis

Abstract

This project presents a set of algorithms for Multiple Sequence Alignment (MSA) and clustering problems and examines their efficiency, including solutions for distributed environments with Hadoop. It highlights the strength, adaptability, and effectiveness of genetic algorithms (GAs) for both problems. MSA is among the most important tasks in computational biology. In biological sequence comparison, emphasis is given to the simultaneous alignment of several sequences. GAs are stochastic approaches for efficient and robust search that can play a significant role in MSA and clustering. ...
