Virpi Ahola, Tero Aittokallio, Mauno Vihinen, Esa Uusipaikka.
Abstract
BACKGROUND: Multiple sequence alignment is the foundation of many important bioinformatics applications that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.
Year: 2006 PMID: 17081313 PMCID: PMC1687212 DOI: 10.1186/1471-2105-7-484
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. MultiDisp visualization of part of the alignment of the Ras-like proteins (upper) and the corresponding scaled -log(p)-values (lower). The curves show the p-values calculated using the (red) Blosum62, (green) Gonnet250, (black) PAM250 and (magenta) identity scoring matrices, and (blue) a classification of the amino acids, for the Ras-like proteins.
Figure 2. MultiDisp visualization of the a) . The curves show (red) the scaled -log(p)-values, (blue) Mean Distance and (green) Information content scores for the alignment. Consensus sequence for the alignment positions in c) is F P S L P E L V E H Y.
Figure 3. MultiDisp visualization of the a) I, b) II, c) III and d) IV motifs of the peptidase M13, e) I, f) II and g) III motifs of the subtilase, and h) I and i) II motifs of the . MD = mean distance, IC = information content scores and maxZ = scaled -log(p)-values for the alignment.
Median (lower and upper quartiles) of the -log(p)-values obtained with different residue scoring schemes, together with the MD and IC scores.
| Score | Box11 | Box12 | Box22 | Box23 | Box33 |
|---|---|---|---|---|---|
| LogP Blosum62 | 708 (708, 708) | 611 (198, 708) | 208 (161, 547) | 120 (99, 177) | 75 (47, 123) |
| LogP Gonnet | 708 (708, 708) | 190 (164, 708) | 158 (131, 189) | 98 (78, 136) | 64 (56, 106) |
| LogP Indep | 708 (708, 708) | 202 (158, 708) | 171 (108, 202) | 75 (63, 113) | 57 (35, 96) |
| LogP PAM | 708 (212, 708) | 201 (166, 708) | 153 (125, 201) | 94 (81, 133) | 66 (56, 105) |
| LogP 6 groups | 644 (631, 683) | 312 (300, 333) | 279 (241, 341) | 216 (77, 240) | 43 (26, 91) |
| MD | 92 (86, 97) | 43 (29, 55) | 34 (24, 42) | 24 (19, 31) | 20 (15, 25) |
| IC | 57 (55, 59) | 39 (34, 48) | 31 (27, 35) | 21 (19, 23) | 13 (10, 19) |
Box11 and Box33 represent positions with low and high entropy and variability, respectively; the three middle columns represent moderately conserved positions. A more detailed description of the categories can be found in Oliveira et al. [35].
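To make the IC column of the table more concrete, the sketch below computes a per-column information content for an alignment. It assumes the common Shannon-based definition IC = log2(20) - H, where H is the entropy of the observed amino-acid frequencies in the column; gap handling, pseudocounts and background frequencies are assumptions, and the scaling behind the IC values reported above is not specified here. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def column_information_content(column: str) -> float:
    """Information content (bits) of one alignment column.

    Assumption: IC = log2(20) - H, with H the Shannon entropy of the
    observed amino-acid frequencies. Gaps and non-standard symbols are
    ignored; no pseudocounts or background frequencies are used.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0  # an all-gap column carries no residue information here
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(20) - entropy


def alignment_ic(sequences: list[str]) -> list[float]:
    """Per-column IC for a list of equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    return [
        column_information_content("".join(s[i] for s in sequences))
        for i in range(length)
    ]


if __name__ == "__main__":
    toy = ["FPSL", "FPAL", "FPS-"]  # hypothetical toy alignment
    for position, ic in enumerate(alignment_ic(toy)):
        print(position, round(ic, 3))
```

Fully conserved columns reach the maximum of log2(20) bits, while variable columns (like the Box33 category) score lower.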
Figure 4. Scatterplot of the AQ and SP scores for the Mafft (L-INS-i) alignments (r = 0.53). The four outlying alignments in the bottom right corner are from reference sets 11 and 40.
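The caption reports r = 0.53 without naming the coefficient; assuming it is the ordinary Pearson product-moment correlation, a minimal sketch of how such a value is computed from paired per-alignment scores is shown below. The function name and the toy scores are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (covariance up to a factor n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    aq = [0.91, 0.75, 0.62, 0.88, 0.40]  # hypothetical AQ scores
    sp = [0.95, 0.70, 0.55, 0.80, 0.52]  # hypothetical SP scores
    print(round(pearson_r(aq, sp), 3))
```

Because Pearson's r is sensitive to outliers, a handful of points such as the four alignments in the bottom right corner can noticeably lower the coefficient.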
Figure 5. Barplots of the median (red) AQ, (green) SP and (blue) CS scores in the BAliBASE reference sets. Error bars show the 25th and 75th percentiles.
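The SP and CS scores compare a test alignment with a BAliBASE reference. The sketch below assumes the usual definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and CS is the fraction of reference columns reproduced exactly. As simplifying assumptions, it scores all columns (BAliBASE restricts scoring to annotated core blocks), uses `-` as the only gap symbol, and expects both alignments to contain the same ungapped sequences in the same order. The AQ score is not reproduced, and all function names are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment: list[str]) -> list[tuple]:
    """Map each column to a tuple of per-sequence residue indices.

    Gaps ('-') become None; columns consisting only of gaps are dropped.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("rows must have equal length")
    counters = [0] * len(alignment)  # next residue index per sequence
    columns = []
    for col in range(length):
        entry = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        if any(e is not None for e in entry):
            columns.append(tuple(entry))
    return columns


def aligned_pairs(columns: list[tuple]) -> set:
    """All (seq_i, res_i, seq_j, res_j) residue pairs sharing a column."""
    pairs = set()
    for column in columns:
        for (i, ri), (j, rj) in combinations(enumerate(column), 2):
            if ri is not None and rj is not None:
                pairs.add((i, ri, j, rj))
    return pairs


def sp_and_cs(test: list[str], reference: list[str]) -> tuple[float, float]:
    """Return (SP, CS) of a test alignment against a reference alignment."""
    test_cols = residue_columns(test)
    ref_cols = residue_columns(reference)
    ref_pairs = aligned_pairs(ref_cols)
    test_pairs = aligned_pairs(test_cols)
    sp = len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0
    test_col_set = set(test_cols)
    cs = sum(col in test_col_set for col in ref_cols) / len(ref_cols) if ref_cols else 0.0
    return sp, cs
```

For example, `sp_and_cs(["ABC", "AC-"], ["ABC", "A-C"])` keeps one of the two reference residue pairs and one of the three reference columns. CS is the stricter of the two measures, since a single misplaced residue invalidates the whole column.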
The alignment programs that obtained the highest AQ, SP and CS scores in the different reference sets.
| Reference set | Top programs (AQ) | Top programs (SP) | Top programs (CS) |
|---|---|---|---|
| 11 | ProbCons, L-INS-i, Muscle | ProbCons | ProbCons, L-INS-i |
| 12 | ProbCons, Muscle, L-INS-i, TCoffee | ProbCons | ProbCons |
| 20 | L-INS-i, Clustal, Muscle, TCoffee | ProbCons | ProbCons, L-INS-i |
| 30 | L-INS-i, Clustal, TCoffee, FFT-NS-2, Muscle, ProbCons | L-INS-i, ProbCons, Muscle, TCoffee | L-INS-i, ProbCons, TCoffee, Muscle |
| 40 | L-INS-i, TCoffee | L-INS-i, ProbCons, TCoffee | L-INS-i, TCoffee, ProbCons |
| 50 | L-INS-i, TCoffee, ProbCons, Muscle, FFT-NS-2, Clustal | TCoffee, ProbCons, L-INS-i | L-INS-i, TCoffee, ProbCons, Muscle |
Programs listed together for a given reference set do not differ significantly from one another. Statistical analyses were performed using the Wilcoxon signed-rank test.
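Comparing two programs over the same alignments is a paired problem, which is why a signed-rank test fits here. The sketch below implements a two-sided Wilcoxon signed-rank test with the normal approximation; the specific variant (exact vs. approximate p-values, tie and continuity corrections, any multiple-testing adjustment) used in the analysis is not stated, so these choices are assumptions, and the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped,
    tied |differences| get average ranks, and p uses the normal
    approximation with a tie correction (no continuity correction).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var == 0:
        return w, 1.0
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, p
```

Here `x` and `y` would be, for example, the per-alignment SP scores of two programs on one reference set. For small samples an exact null distribution is preferable; SciPy's `scipy.stats.wilcoxon` offers both modes.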
Alignment programs and parameters used.
| Program | Version | Parameters (strategy) |
|---|---|---|
| Clustal [9] | 1.83 | default |
| TCoffee [10] | 2.66 | default |
| Dialign2 [11] | 2.2.1 | default |
| ProbCons [12] | 1.10 | default |
| Muscle [13] | 3.52 | default |
| Mafft [14, 15] | 5.667 | -localpair -maxiterate 1000 (L-INS-i) |
| Mafft [14, 15] | 5.667 | default (FFT-NS-2) |