Xiaoqiu Huang, Douglas L Brutlag.
Abstract
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257,716 pairs of homologous sequences from 100 protein families. On 168,475 of the 257,716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.
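The idea of choosing a parameter set per alignment section can be illustrated with a toy dynamic program. The sketch below is not the GAP4 algorithm: it assumes linear gap penalties, simple match/mismatch scores, and a hypothetical `switch_penalty` charged whenever two consecutive alignment columns are scored with different parameter sets. The names `ParamSet` and `multi_param_score` are invented for this illustration.

```python
from typing import List, NamedTuple

class ParamSet(NamedTuple):
    """Hypothetical parameter set: per-column scores at one stringency level."""
    name: str
    match: float
    mismatch: float
    gap: float

NEG_INF = float("-inf")

def multi_param_score(a: str, b: str, sets: List[ParamSet],
                      switch_penalty: float) -> float:
    """Best global alignment score when each column may use any parameter set.

    dp[i][j][s] is the best score of aligning a[:i] with b[:j] such that the
    last column is scored with sets[s]. Changing sets between consecutive
    columns costs switch_penalty (an assumption of this sketch).
    """
    n, m, k = len(a), len(b), len(sets)
    dp = [[[NEG_INF] * k for _ in range(m + 1)] for _ in range(n + 1)]
    for s in range(k):
        dp[0][0][s] = 0.0  # the first column may start in any set for free

    def best_prev(cell: List[float], s: int) -> float:
        # Best way to arrive in set s from any set used in the previous column.
        return max(cell[t] - (0.0 if t == s else switch_penalty)
                   for t in range(k))

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s, p in enumerate(sets):
                candidates = []
                if i > 0 and j > 0:  # substitution column
                    sub = p.match if a[i - 1] == b[j - 1] else p.mismatch
                    candidates.append(best_prev(dp[i - 1][j - 1], s) + sub)
                if i > 0:            # residue of a aligned to a gap
                    candidates.append(best_prev(dp[i - 1][j], s) + p.gap)
                if j > 0:            # residue of b aligned to a gap
                    candidates.append(best_prev(dp[i][j - 1], s) + p.gap)
                dp[i][j][s] = max(candidates)
    return max(dp[n][m])

# Illustrative values only; they are not taken from the paper.
lenient = ParamSet("l", match=1.0, mismatch=-1.0, gap=-1.0)
strict = ParamSet("h", match=3.0, mismatch=-5.0, gap=-5.0)
print(multi_param_score("ACGTTTGA", "ACGAATGA", [lenient, strict], 2.0))
```

Because staying within one set never incurs the switching cost, the combined score is never lower than the score obtained with any single set alone; a traceback over the same table would recover which set scored each column.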
Year: 2006 PMID: 17182633 PMCID: PMC1802605 DOI: 10.1093/nar/gkl1063
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An initial part of a global alignment from GAP4 on two protein sequences (SwissProt accession nos Q9NTI2 and Q9Y2G3). A difference block is indicated by the plus sign +. Each column of the alignment is marked with one of the three letters l, m and h, which indicates which of the three parameter sets was used to score the column.
Groups of identical alignments from GAP4 with seven combinations of parameter sets on five pairs of SwissProt protein sequences
| Accession of sequence A | Length | Accession of sequence B | Length | Alignment groups (a) |
|---|---|---|---|---|
| Q9NY15 | 2570 | Q8R4U0 | 2559 | {l} {m, mh} {h} {lm} {lh} {lmh} |
| O12990 | 1153 | Q62120 | 1129 | {l, lm} {m} {h} {lh, lmh} {mh} |
| Q9H7F0 | 1130 | Q9NQ11 | 1180 | {l, lh, lmh} {m, mh} {h} {lm} |
| Q82Z40 | 1207 | Q9XPS7 | 1076 | {l} {m} {h} {lm} {lh, lmh} {mh} |
| Q9NTI2 | 1076 | Q9Y2G3 | 1177 | {l, lh, lmh} {m, mh} {h} {lm} |
(a) Each alignment is denoted by the parameter set combination used to produce the alignment. Each parameter set combination is indicated by the letters for the parameter sets in the combination, with the letter l for the low parameter set, m for the medium parameter set and h for the high parameter set.
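The seven combinations correspond to the non-empty subsets of the three parameter sets l, m and h. As a small consistency check, the sketch below enumerates them and confirms that the groups in each table row partition all seven. The helper names `parse_groups` and `is_partition` are hypothetical; the group strings are copied from the table.

```python
import re
from itertools import combinations

SETS = "lmh"
# All non-empty subsets of {l, m, h}, written in the table's notation.
ALL_COMBOS = {"".join(c)
              for r in range(1, len(SETS) + 1)
              for c in combinations(SETS, r)}

# "Alignment groups" column, keyed by the two accession numbers.
ROWS = {
    ("Q9NY15", "Q8R4U0"): "{l} {m, mh} {h} {lm} {lh} {lmh}",
    ("O12990", "Q62120"): "{l, lm} {m} {h} {lh, lmh} {mh}",
    ("Q9H7F0", "Q9NQ11"): "{l, lh, lmh} {m, mh} {h} {lm}",
    ("Q82Z40", "Q9XPS7"): "{l} {m} {h} {lm} {lh, lmh} {mh}",
    ("Q9NTI2", "Q9Y2G3"): "{l, lh, lmh} {m, mh} {h} {lm}",
}

def parse_groups(cell: str) -> list:
    """Split '{l, lm} {m}' into [['l', 'lm'], ['m']]."""
    return [[item.strip() for item in group.split(",")]
            for group in re.findall(r"\{([^}]*)\}", cell)]

def is_partition(groups: list) -> bool:
    """True if every combination appears in exactly one group."""
    flat = [combo for group in groups for combo in group]
    return len(flat) == len(set(flat)) and set(flat) == ALL_COMBOS

for pair, cell in ROWS.items():
    groups = parse_groups(cell)
    print(pair, len(groups), "distinct alignments", is_partition(groups))
```

The number of groups in a row is the number of distinct alignments that the seven combinations produced for that sequence pair.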
Proportions of the motif residues that fall in the l, m and h sections of the alignment, and proportions of the alignment positions that fall in the l, m and h sections, for each alignment and for all alignments combined
| Alignment | Motif residues, type l | Motif residues, type m | Motif residues, type h | Alignment positions, type l | Alignment positions, type m | Alignment positions, type h |
|---|---|---|---|---|---|---|
| 1 | 0.06 | 0.43 | 0.51 | 0.25 | 0.44 | 0.31 |
| 2 | 0.10 | 0.04 | 0.86 | 0.39 | 0.03 | 0.58 |
| 3 | 0.00 | 0.00 | 1.00 | 0.43 | 0.05 | 0.52 |
| 4 | 0.01 | 0.02 | 0.97 | 0.42 | 0.19 | 0.39 |
| 5 | 0.12 | 0.00 | 0.88 | 0.49 | 0.06 | 0.45 |
| All | 0.06 | 0.20 | 0.74 | 0.37 | 0.21 | 0.42 |
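One way to read this table is to compare, for each section type, the share of motif residues with the share of alignment positions. The ratio computed below is an illustrative measure introduced here, not a statistic reported by the authors; a value above 1 means motif residues are over-represented in that section type. The input values are copied from the "All" row.

```python
# Values copied from the "All" row of the table.
motif_share = {"l": 0.06, "m": 0.20, "h": 0.74}
position_share = {"l": 0.37, "m": 0.21, "h": 0.42}

def enrichment(motif: dict, positions: dict) -> dict:
    """Ratio of motif-residue share to alignment-position share per section."""
    return {section: motif[section] / positions[section] for section in motif}

ratios = enrichment(motif_share, position_share)
for section, ratio in ratios.items():
    print(f"type {section}: {ratio:.2f}")
```

For the combined alignments, the h sections hold 0.74 of the motif residues but only 0.42 of the alignment positions, so their ratio exceeds 1, while the l sections fall well below 1.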