Chapter 2

Protein Sequence Motif Information Discovery

BERNARD CHEN

2.1 Introduction

Proteins are among the most important elements in the processes of life, and they can be grouped into families according to their sequence or structural similarities. Many biochemical experiments suggest that a protein's sequence completely determines its conformation, because all the information needed to specify its interaction sites with other molecules is embedded in its amino acid sequence. This close relationship between sequence and structure plays an important role in current analysis and prediction technologies, so understanding the hidden relationships between protein structures and their sequences is an important task in modern bioinformatics research. The biological term sequence motif denotes a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins. These motif patterns may be able to predict structural or functional regions of other proteins, such as enzyme binding sites, DNA or RNA binding sites, prosthetic group attachment sites, and protein–protein interaction sites.
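To make the idea of a conserved sequence pattern concrete, the following minimal sketch scans amino acid sequences for a short motif written as a regular expression. It is an illustration only, not the discovery method developed in this chapter. The pattern used is the well-known N-glycosylation sequon (Asn, any residue except Pro, Ser or Thr, any residue except Pro); the function name find_motif and the toy sequences are hypothetical.

```python
import re

# Illustrative motif: N-glycosylation sequon.
# Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro.
# The lookahead (?=...) lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=MOTIF):
    """Return (0-based start, matched residues) for every hit, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequences, not real proteins.
sequences = {
    "toy_1": "MKNGSAALNPTQ",
    "toy_2": "AANVTGNNSSK",
}

for name, seq in sequences.items():
    print(name, find_motif(seq))
```

In toy_1 the second Asn is followed by Pro, so only the first site is reported; in toy_2 two of the three hits overlap. Hand-written patterns like this only locate motifs that are already known, whereas motif discovery aims to find such recurring patterns without being told them in advance.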

PROSITE [1], PRINTS [2], and BLOCKS [3] are three of the most popular motif databases. PROSITE is a method for determining the function of uncharacterized proteins translated from genomic or complementary DNA (cDNA) sequences. It consists of a database of biologically significant sites and patterns ...
