Bioinformatics - development of sequence alignment strategies
Sequence alignment of proteins and nucleotides is very important for biologists: it is the first step in many evolutionary and functional studies. These sequences have precise functions and have mutated over time. Thus, sequences that are similar probably have similar functions, and similarity between two sequences is usually indicative of common ancestry. By comparing homologous characters, we can reconstruct the evolutionary events that led from the common ancestor to the extant sequences. To compare two or more sequences, we use several sequence alignment strategies. All these algorithms involve identifying the correct locations of the deletions, insertions and substitutions that have occurred in a set of sequences since their divergence from a common ancestor. Two different modes of alignment are used: a local alignment aligns part of one sequence with part of the other sequences in an optimal way, whereas a global alignment aligns every element of each sequence with the other sequences, from one end to the other.
Their uses differ: global alignment algorithms are used in comparative and evolutionary studies of sequences that are expected to be related over their whole length. Two genes in different species may, however, be similar over short regions but very different over the rest of the gene, so a global alignment, which tries to align the entire sequence, may not find these homologies. Local alignment methods handle this case and have their greatest utility in database searching and retrieval.
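The difference between the two modes can be made concrete with a small dynamic-programming sketch. It is only an illustration, not a method described in this text: the function name `alignment_score` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example. The global variant follows the Needleman-Wunsch recurrence and the local variant the Smith-Waterman recurrence, and only the score is computed, not the alignment itself.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Return the best alignment score of strings a and b.

    local=False: global mode, all of a is aligned against all of b.
    local=True:  local mode, the best-scoring pair of substrings is aligned.
    The scoring values are illustrative assumptions, not standard parameters.
    """
    # prev[j] is the best score for the prefixes a[:i-1] and b[:j].
    # In global mode the first row pays for leading gaps; in local mode it is free.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0  # highest cell seen so far (only used in local mode)
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            # Substitution (or match) of a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b (deletion) or gap in a (insertion).
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of the bottom-right cell. Local: best cell anywhere.
    return best if local else prev[-1]


# Two sequences that share only the short region "GATTACA".
x = "AAAAGATTACA"
y = "GATTACATTTT"
print("global:", alignment_score(x, y))
print("local: ", alignment_score(x, y, local=True))
```

With these assumed scores, the local mode rewards the shared region on its own, while the global mode must also pay for the unrelated flanking residues, which is why a short homologous region can go unnoticed in a global alignment.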
Before the computer age, biologists aligned sequences manually. When there are only a few gaps and the two sequences are not too different from each other, a reasonable alignment can be obtained by visual inspection, but this method is subjective and does not scale.
A first algorithmic solution to this problem was proposed by Gibbs and McIntyre in 1970: the dot-matrix method. The two sequences are written as the row and column headers of a two-dimensional matrix, and a dot is