Jia-Ming Chang, Paolo Di Tommaso, Jean-François Taly, Cedric Notredame.
Abstract
BACKGROUND: Transmembrane proteins (TMPs) account for about 20-30% of all protein-coding genes. The relative lack of experimental structures has so far made it hard to develop specific alignment methods, and the current state of the art (PRALINE™) only manages to recapitulate 50% of the positions in the reference alignments available from BAliBASE2-ref7.
Year: 2012 PMID: 22536955 PMCID: PMC3303701 DOI: 10.1186/1471-2105-13-S4-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Typical colour output (tm_html). In this example, the protein Or9a of Drosophila melanogaster and its orthologues in other Drosophila species were aligned with the PSITM template. The colour code corresponds to the HMMTOP prediction: yellow, inside loop; red, TM helix; blue, outside loop. Notably, the predicted topology of the Or9a set is consistent with the conclusion of Benton et al. [20].
Comparison between the PSI-Coffee and other multiple sequence alignment methods on each BAliBASE2-ref7 family
| family | MSAProbs | Kalign | PROMALS | MAFFT | ProbCons | PRALINE™ | PSI-Coffee |
|---|---|---|---|---|---|---|---|
| SP | |||||||
| 7TM | 0.981 | 0.938 | 0.985 | 0.962 | 0.978 | 0.983 | |
| Nat | 0.789 | 0.765 | 0.797 | 0.777 | 0.732 | 0.779 | |
| ACR | 0.989 | 0.969 | 0.964 | 0.989 | 0.987 | 0.992 | |
| DTD | 0.972 | 0.961 | 0.965 | 0.975 | 0.967 | 0.960 | |
| ION | 0.817 | 0.810 | 0.761 | 0.788 | 0.837 | 0.783 | |
| MSL | 0.965 | 0.936 | 0.980 | 0.958 | 0.986 | 0.971 | |
| PHOTO | 0.957 | 0.928 | 0.954 | 0.949 | 0.957 | 0.955 | |
| PTGA | 0.899 | 0.826 | 0.863 | 0.886 | 0.903 | 0.808 | |
| avg | 0.921 | 0.892 | 0.913 | 0.916 | 0.907 | 0.921 | |
| Pairs | 3,117,244 | 3,014,033 | 3,109,227 | 3,093,269 | 3,108,377 | 3,080,356 | |
| TC | |||||||
| 7TM | 0.600 | 0.360 | 0.440 | 0.550 | 0.560 | 0.620 | |
| Nat | 0.190 | 0.190 | 0.100 | 0.110 | 0.180 | 0.180 | |
| ACR | 0.830 | 0.620 | 0.530 | 0.830 | 0.810 | 0.880 | |
| DTD | 0.540 | 0.580 | 0.400 | 0.540 | 0.520 | 0.580 | |
| ION | 0.270 | 0.130 | 0.260 | 0.260 | 0.000 | 0.210 | |
| MSL | 0.910 | 0.850 | 0.950 | 0.900 | 0.960 | 0.930 | |
| PHOTO | 0.510 | 0.440 | 0.510 | 0.510 | 0.540 | 0.520 | |
| PTGA | 0.320 | 0.280 | 0.370 | 0.270 | |||
| avg | 0.531 | 0.436 | 0.471 | 0.509 | 0.530 | 0.506 | |
| Cols | 1,066 | 863 | 814 | 1,057 | 1,054 | 1,058 | |
SP and TC are the accuracy figures reported by BAliScore 3.01 based on core regions (Additional file 1). The rows Pairs and Cols give the total numbers of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns. The best performance for each family is marked in bold.
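To make the two scores concrete, the following is a minimal sketch of how a sum-of-pairs (SP) fraction and a total-column (TC) fraction can be computed from a test alignment and a reference alignment. It is not BAliScore itself: the function names `column_residues` and `sp_tc` are hypothetical, every reference column with at least two residues is scored (BAliScore restricts scoring to annotated core regions), and the toy sequences are invented for illustration.

```python
from itertools import combinations


def column_residues(alignment):
    """Return, for each column, the (sequence index, residue index) of every non-gap cell."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        cells = []
        for s, ch in enumerate(col):
            if ch != "-":
                cells.append((s, counters[s]))
                counters[s] += 1
        columns.append(tuple(cells))
    return columns


def sp_tc(test, reference):
    """Return (SP, TC) of `test` against `reference`.

    Assumption: all reference columns with >= 2 residues are scored,
    whereas BAliScore only scores core regions.
    """
    ref_cols = [c for c in column_residues(reference) if len(c) >= 2]
    # Map every residue to the test column in which it was placed.
    where = {cell: k
             for k, col in enumerate(column_residues(test))
             for cell in col}
    total_pairs = correct_pairs = correct_cols = 0
    for col in ref_cols:
        pairs = list(combinations(col, 2))
        hits = sum(1 for a, b in pairs if where[a] == where[b])
        total_pairs += len(pairs)
        correct_pairs += hits
        if hits == len(pairs):  # every residue pair of the column is preserved
            correct_cols += 1
    return correct_pairs / total_pairs, correct_cols / len(ref_cols)


# Toy alignments (hypothetical data, gaps written as '-').
reference = ["AC-GT", "ACAGT", "A-AGT"]
test = ["ACG-T", "ACAGT", "A-AGT"]
sp, tc = sp_tc(test, reference)
print(f"SP = {sp:.3f}, TC = {tc:.3f}")
```

In this toy case one misplaced residue breaks two of the eleven reference pairs but invalidates a whole column, which illustrates why TC is the stricter of the two measures.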
Statistical significance test of the performance between two methods
| SP | MSAProbs | Kalign | PROMALS | MAFFT | ProbCons | PRALINE™ | PSI-Coffee |
|---|---|---|---|---|---|---|---|
| MSAProbs | NA | 0.547 | 0.483 | 0.675 | 0.779 | 0.726 | |
| Kalign | NA | 0.195 | 0.195 | 0.078 | |||
| PROMALS | 0.547 | 0.195 | NA | 0.575 | 0.742 | 0.742 | 0.528 |
| MAFFT | 0.483 | 0.575 | NA | 0.889 | 0.844 | 0.779 | |
| ProbCons | 0.675 | 0.742 | 0.889 | NA | 0.461 | 0.262 | |
| PRALINE™ | 0.779 | 0.195 | 0.742 | 0.844 | 0.461 | NA | 0.641 |
| PSI-Coffee | 0.726 | 0.078 | 0.528 | 0.779 | 0.262 | 0.641 | NA |
| TC | MSAProbs | Kalign | PROMALS | MAFFT | ProbCons | PRALINE™ | PSI-Coffee |
| MSAProbs | NA | 0.150 | 0.529 | 1.000 | 0.779 | 0.204 | |
| Kalign | NA | 0.742 | 0.078 | 0.055 | 0.272 | ||
| PROMALS | 0.150 | 0.742 | NA | 0.529 | 0.362 | 0.624 | 0.233 |
| MAFFT | 0.529 | 0.078 | 0.529 | NA | 0.362 | 0.945 | 0.262 |
| ProbCons | 1.000 | 0.055 | 0.362 | 0.362 | NA | 1.000 | 0.353 |
| PRALINE™ | 0.779 | 0.272 | 0.624 | 0.945 | 1.000 | NA | 0.195 |
| PSI-Coffee | 0.204 | 0.233 | 0.262 | 0.353 | 0.195 | NA | |
Wilcoxon signed-rank test (R: wilcox.test, paired = TRUE)
Null hypothesis: x and y have identical performance
Significance level 0.05: p-values smaller than 0.05 are marked in bold (null hypothesis rejected)
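The paired test used for these tables can be sketched in a few lines. The block below is an illustrative pure-Python version of the exact two-sided Wilcoxon signed-rank test, obtained by enumerating all sign assignments, which is feasible for eight paired family scores. The function name `wilcoxon_signed_rank_exact` and the score vectors are hypothetical and are not taken from the tables. In R, `wilcox.test` switches to a normal approximation when ties or zero differences are present, so the two can disagree in those cases.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y, digits=9):
    """Exact two-sided p-value of the paired Wilcoxon signed-rank test.

    Zero differences are dropped and tied absolute differences get average
    ranks. Differences are rounded to `digits` decimals so that floating-point
    noise does not hide ties.
    """
    diffs = [round(a - b, digits) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so the statistic is symmetric around sum(ranks) / 2.
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


# Hypothetical per-family scores for two methods (eight families).
method_a = [0.62, 0.21, 0.88, 0.59, 0.30, 0.97, 0.56, 0.40]
method_b = [0.60, 0.16, 0.80, 0.55, 0.29, 0.90, 0.50, 0.30]
print(wilcoxon_signed_rank_exact(method_a, method_b))
```

With only eight families, the smallest two-sided p-value the exact test can produce is 2/256, reached when one method wins on every family, so significance at the 0.05 level requires a nearly uniform advantage.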
Performance comparison of different database sizes for the BAliBASE2-ref7
| database | # of seqs | SP | TC | extension(s) | total(s) |
|---|---|---|---|---|---|
| default T-Coffee | 0 | 0.911 | 0.498 | 0 | 2,735 |
| UniRef50-TM | 87,989 | 0.916 | 0.561 | 1,483 | 8,177 |
| UniRef90-TM | 263,306 | 0.918 | 0.548 | 3,343 | 9,610 |
| UniRef100-TM | 613,015 | 0.925 | 0.545 | 6,499 | 12,111 |
| UniProt-TM | 818,635 | 0.923 | 0.536 | 7,871 | 13,285 |
| UniRef50 | 3,077,464 | 0.920 | 0.553 | 19,087 | 26,442 |
| UniRef90 | 6,544,144 | 0.924 | 0.561 | 40,448 | 46,478 |
| UniRef100 | 9,865,668 | 0.922 | 0.554 | 66,696 | 71,895 |
| UniProt | 11,009,767 | 0.923 | 0.563 | 66,964 | 72,199 |
| NCBI NR | 10,565,004 | 0.921 | 0.554 | 65,201 | 70,375 |
# of seqs indicates the number of sequences contained in the considered database subset. extension indicates the CPU time required by the homology extension process. total indicates the total CPU time usage.
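As a small worked example of how the two timing columns relate, the sketch below subtracts the extension time from the total and reports the share of CPU time spent on homology extension. The figures are copied from the table above; the name `breakdown` and the derived "remaining" quantity are illustrative assumptions (the table does not name the non-extension part).

```python
# (database, extension CPU seconds, total CPU seconds), copied from the table.
TIMINGS = [
    ("default T-Coffee", 0, 2735),
    ("UniRef50-TM", 1483, 8177),
    ("UniRef90-TM", 3343, 9610),
    ("UniRef100-TM", 6499, 12111),
    ("UniProt-TM", 7871, 13285),
    ("UniRef50", 19087, 26442),
    ("UniRef90", 40448, 46478),
    ("UniRef100", 66696, 71895),
    ("UniProt", 66964, 72199),
    ("NCBI NR", 65201, 70375),
]


def breakdown(timings):
    """Map each database to (remaining seconds, extension share of the total)."""
    result = {}
    for name, extension, total in timings:
        # Assumption: whatever is not extension time is counted as "remaining".
        result[name] = (total - extension, extension / total)
    return result


summary = breakdown(TIMINGS)
for name, (remaining, share) in summary.items():
    print(f"{name:18s} remaining={remaining:6d}s  extension share={share:.1%}")
```

For the full UniProt database the extension step takes over 90% of the total CPU time, whereas for UniRef50-TM it stays below 20%.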
Statistical significance test of the performance in different databases
| SP | default | UniRef50-TM | UniRef90-TM | UniRef100-TM | UniProt-TM | UniRef50 | UniRef90 | UniRef100 | UniProt | NR |
|---|---|---|---|---|---|---|---|---|---|---|
| default | NA | 0.447 | 0.141 | 0.195 | ||||||
| UniRef50-TM | 0.447 | NA | 0.833 | 0.675 | 0.483 | 0.410 | 0.141 | 0.446 | 0.160 | 0.446 |
| UniRef90-TM | 0.141 | 0.833 | NA | 0.161 | 0.395 | 0.799 | 0.074 | 0.172 | 0.205 | 0.495 |
| UniRef100-TM | 0.675 | 0.161 | NA | 0.713 | 0.834 | 0.674 | 0.202 | 0.786 | 0.293 | |
| UniProt-TM | 0.483 | 0.395 | 0.713 | NA | 0.933 | 0.752 | 0.735 | 0.892 | 0.779 | |
| UniRef50 | 0.195 | 0.410 | 0.799 | 0.834 | 0.933 | NA | 0.447 | 0.461 | 0.553 | 0.483 |
| UniRef90 | 0.141 | 0.074 | 0.674 | 0.752 | 0.447 | NA | 1.000 | 0.598 | 0.430 | |
| UniRef100 | 0.446 | 0.172 | 0.202 | 0.735 | 0.461 | 1.000 | NA | 0.798 | 0.203 | |
| UniProt | 0.160 | 0.205 | 0.786 | 0.892 | 0.553 | 0.598 | 0.798 | NA | 0.528 | |
| NR | 0.446 | 0.495 | 0.293 | 0.779 | 0.483 | 0.430 | 0.203 | 0.528 | NA | |
| TC | default | UniRef50-TM | UniRef90-TM | UniRef100-TM | UniProt-TM | UniRef50 | UniRef90 | UniRef100 | UniProt | NR |
| default | NA | 0.092 | 0.148 | 0.050 | 0.050 | |||||
| UniRef50-TM | NA | 0.834 | 0.281 | 0.207 | 1.000 | 0.799 | 0.396 | 1.000 | 0.396 | |
| UniRef90-TM | 0.834 | NA | 0.855 | 0.584 | 0.865 | 0.281 | 0.672 | 0.462 | 0.892 | |
| UniRef100-TM | 0.092 | 0.281 | 0.855 | NA | 0.174 | 0.798 | 0.554 | 0.916 | 0.423 | 0.832 |
| UniProt-TM | 0.148 | 0.207 | 0.584 | 0.174 | NA | 0.611 | 0.397 | 0.396 | 0.178 | 0.402 |
| UniRef50 | 1.000 | 0.865 | 0.798 | 0.611 | NA | 0.674 | 0.832 | 0.670 | 1.000 | |
| UniRef90 | 0.050 | 0.799 | 0.281 | 0.554 | 0.397 | 0.674 | NA | 0.581 | 1.000 | 0.588 |
| UniRef100 | 0.050 | 0.396 | 0.672 | 0.916 | 0.396 | 0.832 | 0.581 | NA | 0.528 | 1.000 |
| UniProt | 1.000 | 0.462 | 0.423 | 0.178 | 0.670 | 1.000 | 0.528 | NA | 0.528 | |
| NR | 0.396 | 0.892 | 0.832 | 0.402 | 1.000 | 0.588 | 1.000 | 0.528 | NA | |
Wilcoxon signed-rank test (R: wilcox.test, paired = TRUE)
Null hypothesis: x and y have identical performance
Significance level 0.05: p-values smaller than 0.05 are marked in bold (null hypothesis rejected)
Comparison of running time
| family | MSAProbs | Kalign | PROMALS | MAFFT | ProbCons | T-Coffee | TM-Coffee (extension) | TM-Coffee (alignment) |
|---|---|---|---|---|---|---|---|---|
| 7TM | 759.42 | 0.94 | 19,587.13 | 14.03 | 711.32 | 1,164.66 | 488.04 | 2,684.41 |
| Nat | 244.02 | 0.60 | 13,285.11 | 43.51 | 269.07 | 243.42 | 175.40 | 395.88 |
| ACR | 548.96 | 1.13 | 19,102.74 | 41.73 | 620.30 | 418.28 | 174.24 | 1,130.15 |
| DTD | 343.45 | 0.71 | 12,555.83 | 70.90 | 396.70 | 335.35 | 187.64 | 949.92 |
| ION | 413.04 | 1.06 | 10,034.86 | 75.68 | 498.60 | 364.19 | 203.80 | 931.70 |
| MSL | 0.75 | 0.01 | 646.19 | 0.12 | 0.72 | 0.84 | 16.72 | 9.17 |
| PHOTO | 16.73 | 0.06 | 2,388.75 | 1.66 | 15.40 | 18.05 | 55.89 | 25.85 |
| PTGA | 202.76 | 0.48 | 9,745.36 | 21.75 | 217.93 | 190.35 | 181.00 | 567.69 |
| SUM | 2,529.13 | 4.99 | 87,345.97 | 269.37 | 2,730.04 | 2,735.14 | 1,482.73 | 6,694.77 |
PRALINE™ was run through its web server (no standalone version is available), so it is excluded from this comparison. MSAProbs was run single-threaded for comparison with TM-Coffee on a single core. All times are in seconds.
Figure 2. Line chart of the average TC with respect to different e-value thresholds. The number of homologues is counted by summing all homologues found in the eight families and is plotted on a log10 scale. The dashed lines show the standard error of the TC score across the eight families. SP is omitted because it changes only slightly across e-value thresholds.