Chan-seok Jeong, Dongsup Kim.
Abstract
BACKGROUND: Although conservation and correlated mutation (CM) both carry important information, reflecting different kinds of context in a multiple sequence alignment, most alignment methods use sequence profiles that represent only conservation. There is not yet a general way to represent correlated mutation and incorporate it into sequence alignment.
Year: 2010 PMID: 20406500 PMCID: PMC3165164 DOI: 10.1186/1471-2105-11-S2-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Selected parameters for the different combinations of scoring terms.
| Method | w (profile) | w (SS) | w (CM) | d0 | g (open) | g (extension) | base | Average MaxSub |
|---|---|---|---|---|---|---|---|---|
| PPA | 1.0 | - | - | - | 6.0 | 0.5 | 0.5 | 0.3099 |
| PPA_SS | 1.0 | 1.5 | - | - | 5.0 | 0.5 | 0.0 | 0.3183 |
| CMPA | - | - | 1.0 | 2.8 | 7.0 | 1.7 | 0.5 | 0.2873 |
| CMPA_PPA | 1.0 | - | 0.5 | 3.2 | 8.0 | 0.4 | 1.0 | 0.3291 |
| CMPA_PPA_SS | 1.0 | 1.5 | 0.5 | 3.2 | 8.0 | 0.6 | 0.0 | 0.3334 |
The three w columns (weights of the sequence profile, secondary structure, and CM profile terms) and d0 denote the corresponding notations in the alignment score function; g (open), g (extension), and base denote the gap-open cost, gap-extension cost, and baseline parameter used in the dynamic programming procedure. PPA and PPA_SS denote profile-profile alignment without and with secondary structure prediction, respectively; CMPA denotes alignment using the CM profile alone; and CMPA_PPA and CMPA_PPA_SS denote alignment combining the CM profile with PPA and PPA_SS, respectively.
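To make the role of these parameters concrete, the following is a minimal sketch of how term weights, a baseline, and affine gap costs typically enter a local dynamic programming alignment. It is not the authors' implementation: the function names `combined_score` and `best_local_score` are hypothetical, the score function is assumed to be a plain weighted sum with the baseline subtracted, d0 is not modeled, and a gap of length k is assumed to cost g_open + (k - 1) * g_ext.

```python
NEG_INF = float("-inf")


def combined_score(s_prof, s_ss=0.0, s_cm=0.0,
                   w_prof=1.0, w_ss=0.0, w_cm=0.0, base=0.0):
    """Weighted sum of the per-position term scores minus a baseline.

    Assumption: the excerpt does not give the exact score function, so a
    plain linear combination is used here for illustration only.
    """
    return w_prof * s_prof + w_ss * s_ss + w_cm * s_cm - base


def best_local_score(score, g_open, g_ext):
    """Best local alignment score with affine gaps (Gotoh-style recursion).

    score[i][j] is the combined score for aligning position i of the first
    profile with position j of the second.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in the first profile
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in the second profile
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - g_open, E[i][j - 1] - g_ext)
            F[i][j] = max(H[i - 1][j] - g_open, F[i - 1][j] - g_ext)
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score[i - 1][j - 1],
                          E[i][j],
                          F[i][j])
            best = max(best, H[i][j])
    return best


# Toy per-term scores for two positions in each profile (made-up numbers).
prof = [[2.0, -1.0], [-1.0, 2.0]]
ss = [[1.0, 0.0], [0.0, 1.0]]
cm = [[4.0, 0.0], [0.0, 4.0]]

# Weights and gap costs taken from the CMPA_PPA_SS row of the table.
S = [[combined_score(prof[i][j], ss[i][j], cm[i][j],
                     w_prof=1.0, w_ss=1.5, w_cm=0.5, base=0.0)
      for j in range(2)] for i in range(2)]
print(best_local_score(S, g_open=8.0, g_ext=0.6))
```

Under these assumptions, a positive baseline lowers every match score, which makes the local alignment more selective about which positions it extends through.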
Table 2. Average MaxSub scores of the test set by different methods
| Method | Family | Superfamily | All |
|---|---|---|---|
| PPA | 0.4485 | 0.2053 | 0.2851 |
| PPA_SS | 0.4524 | 0.2177 | 0.2947 |
| CMPA_PPA | 0.4524 | 0.2248 | 0.2994 |
| CMPA_PPA_SS | 0.4572 | 0.2338 | 0.3070 |
Family and Superfamily denote the average MaxSub score at SCOP family and superfamily level, and All denotes the average MaxSub score of all samples. PPA and PPA_SS denote profile-profile alignment without and with secondary structure prediction, respectively, and CMPA_PPA and CMPA_PPA_SS denote alignment combining CM profile with PPA and PPA_SS, respectively.
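The differences between methods are easier to read as relative gains over the PPA baseline. The short sketch below recomputes them from the values in the table above; the helper name `relative_gain` is hypothetical and the calculation is plain arithmetic on the reported averages, not a significance test.

```python
# Average MaxSub scores copied from the table above.
scores = {
    "PPA":         {"Family": 0.4485, "Superfamily": 0.2053, "All": 0.2851},
    "PPA_SS":      {"Family": 0.4524, "Superfamily": 0.2177, "All": 0.2947},
    "CMPA_PPA":    {"Family": 0.4524, "Superfamily": 0.2248, "All": 0.2994},
    "CMPA_PPA_SS": {"Family": 0.4572, "Superfamily": 0.2338, "All": 0.3070},
}


def relative_gain(table, method, baseline="PPA"):
    """Fractional improvement of `method` over `baseline` at each level."""
    return {
        level: (table[method][level] - table[baseline][level]) / table[baseline][level]
        for level in table[baseline]
    }


for method in ("PPA_SS", "CMPA_PPA", "CMPA_PPA_SS"):
    gains = relative_gain(scores, method)
    print(method, {level: f"{100 * g:.1f}%" for level, g in gains.items()})
```

For CMPA_PPA_SS this gives roughly 1.9% at family level, 13.9% at superfamily level, and 7.7% over all samples, so most of the reported gain comes from the more distant superfamily-level pairs.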
Figure 1. Average MaxSub scores of various methods, measured at different sequence identities at the family level. Average MaxSub scores above 25% sequence identity are not shown.
Figure 2. Average MaxSub scores of various methods, measured at different sequence identities at the superfamily level. Average MaxSub scores above 25% sequence identity are not shown.
Table 3. Average MaxSub scores of the test set with 50-99 MSA sequences by different methods
| Method | Family | Superfamily | All |
|---|---|---|---|
| PPA | 0.4708 | 0.1714 | 0.2631 |
| PPA_SS | 0.4762 | 0.1839 | 0.2734 |
| CMPA_PPA | 0.4811 | 0.1832 | 0.2744 |
| CMPA_PPA_SS | 0.4852 | 0.1949 | 0.2838 |
Family and Superfamily denote the average MaxSub score at SCOP family and superfamily level, and All denotes the average MaxSub score of all samples. PPA and PPA_SS denote profile-profile alignment without and with secondary structure prediction, respectively, and CMPA_PPA and CMPA_PPA_SS denote alignment combining CM profile with PPA and PPA_SS, respectively.
Table 4. Average MaxSub scores of the test set with 1-49 MSA sequences by different methods
| Method | Family | Superfamily | All |
|---|---|---|---|
| PPA | 0.4209 | 0.1816 | 0.2499 |
| PPA_SS | 0.4239 | 0.1925 | 0.2586 |
| CMPA_PPA | 0.4222 | 0.1896 | 0.2560 |
| CMPA_PPA_SS | 0.4305 | 0.2024 | 0.2676 |
Family and Superfamily denote the average MaxSub score at SCOP family and superfamily level, and All denotes the average MaxSub score of all samples. PPA and PPA_SS denote profile-profile alignment without and with secondary structure prediction, respectively, and CMPA_PPA and CMPA_PPA_SS denote alignment combining CM profile with PPA and PPA_SS, respectively.
Table 5. Protein pairs with the MaxSub scores of various methods
| Protein 1 (SCOP classification) | Protein 2 (SCOP classification) | Sequence identity (%) | PPA | PPA_SS | CMPA_PPA | CMPA_PPA_SS |
|---|---|---|---|---|---|---|
| d1xd7a_ (a.4.5.55) | d1ldja1 (a.4.5.34) | 14.9 | 0.1197 | 0.1256 | 0.4042 | 0.4089 |
| d1mvea_ (b.29.1.2) | d1ulea_ (b.29.1.3) | 13 | 0.0515 | 0.0483 | 0.2201 | 0.2278 |
| d1uwva2 (c.66.1.40) | d1p1ca_ (c.66.1.16) | 9.4 | 0.0472 | 0.0533 | 0.2743 | 0.2996 |
| d1lrza3 (d.108.1.4) | d1tiqa_ (d.108.1.1) | 15.4 | 0.2458 | 0.2908 | 0.4420 | 0.4418 |
PPA and PPA_SS denote profile-profile alignment without and with secondary structure prediction, respectively, and CMPA_PPA and CMPA_PPA_SS denote alignment combining CM profile with PPA and PPA_SS, respectively.
Figure 3. Correlated mutation matrices of the proteins listed in Table 5. The upper and lower triangular matrices represent the mutual information between the residues of the respective proteins: (a) d1xd7a_-d1ldja1, (b) d1mvea_-d1ulea_, (c) d1uwva2-d1p1ca_, and (d) d1lrza3-d1tiqa_. Note that the intensity and size of each image are scaled differently, according to the distribution of mutual information and the alignment length.
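As an illustration of the quantity plotted in these matrices, here is a minimal sketch that computes mutual information between every pair of columns of a multiple sequence alignment. It assumes plain frequency-based mutual information in bits, treats gap characters as ordinary symbols, and applies no sequence weighting or background correction; the estimator actually used in the paper is not specified in this excerpt, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    pairs = [(seq[i], seq[j]) for seq in msa]
    n = len(pairs)
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((marg_i[a] / n) * (marg_j[b] / n)))
    return mi


def mi_matrix(msa):
    """Symmetric L x L matrix of column-pair mutual information."""
    length = len(msa[0])
    mat = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            mat[i][j] = mat[j][i] = column_mutual_information(msa, i, j)
    return mat


# Toy alignment (made-up): columns 0 and 1 co-vary, column 2 is constant.
toy_msa = ["ACK", "ACK", "GTK", "GTK"]
for row in mi_matrix(toy_msa):
    print([round(v, 3) for v in row])
```

A perfectly co-varying pair of columns scores high, while a fully conserved column carries no mutual information with any other column, which is why conservation and correlated mutation are complementary signals.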
Figure 4. High-scoring residue pairs of the examples listed in Table 5. Each pair of proteins, (a) d1xd7a_-d1ldja1, (b) d1mvea_-d1ulea_, (c) d1uwva2-d1p1ca_, and (d) d1lrza3-d1tiqa_, is superimposed based on the CMPA_PPA alignment and colored cyan and yellow, respectively. The top-10 residue pairs ranked by sequence profile score and by CM profile score are shown as red and blue spheres, respectively; residue pairs that rank in the top 10 by both scores are shown in green.
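The coloring rule in this caption can be sketched as a simple set operation on two rankings. The function name `color_top_pairs`, the dictionary layout (aligned residue pair mapped to score), and the toy numbers are assumptions for illustration only.

```python
def color_top_pairs(profile_scores, cm_scores, k=10):
    """Assign a color to each aligned residue pair in either top-k list.

    red   = top-k by sequence profile score only
    blue  = top-k by CM profile score only
    green = top-k by both scores
    """
    top_profile = set(sorted(profile_scores, key=profile_scores.get, reverse=True)[:k])
    top_cm = set(sorted(cm_scores, key=cm_scores.get, reverse=True)[:k])
    colors = {}
    for pair in top_profile | top_cm:
        if pair in top_profile and pair in top_cm:
            colors[pair] = "green"
        elif pair in top_profile:
            colors[pair] = "red"
        else:
            colors[pair] = "blue"
    return colors


# Toy scores (made-up) keyed by (residue in protein 1, residue in protein 2).
profile_scores = {(1, 1): 5.0, (2, 2): 4.0, (3, 3): 1.0}
cm_scores = {(1, 1): 0.1, (2, 2): 3.0, (3, 3): 2.0}
print(color_top_pairs(profile_scores, cm_scores, k=2))
```

With k=2 in the toy data, one pair is selected by both rankings and one by each ranking alone, mirroring the green, red, and blue spheres described above.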