Renxiang Yan, Dong Xu, Jianyi Yang, Sara Walker, Yang Zhang.
Abstract
Protein sequence alignment is essential for template-based protein structure prediction and function annotation. We collect 20 sequence alignment algorithms, 10 published and 10 newly developed, which cover all representative sequence- and profile-based alignment approaches. These algorithms are benchmarked on 538 non-redundant proteins for protein fold recognition on a uniform template library. The results demonstrate a dominant advantage of profile-profile methods, which generate models with an average TM-score 26.5% higher than sequence-profile methods and 49.8% higher than sequence-sequence alignment methods. There is no obvious difference in results between methods whose profiles are generated from the PSI-BLAST PSSM matrix and those generated from hidden Markov models. The accuracy of profile-profile alignments can be further improved by 9.6% or 21.4% when predicted or native structure features, respectively, are incorporated. Nevertheless, TM-scores from profile-profile methods that include experimental structural features are still 37.1% lower than those from TM-align, demonstrating that the fold-recognition problem cannot be solved solely by improving the accuracy of structure feature predictions.
Year: 2013 PMID: 24018415 PMCID: PMC3965362 DOI: 10.1038/srep02619
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of template identification by different alignment methods
| Methods (a) | TM-score (b), First | TM-score (b), Best in top10 | RMSD (Å) (c), First | RMSD (Å) (c), Best in top10 | Coverage (d), First | Coverage (d), Best in top10 | CPU (min) (e) |
|---|---|---|---|---|---|---|---|
| Profile-to-profile alignments | | | | | | | |
| MUSTER | 0.435(0.449) | 0.487(0.512) | 10.3(15.2) | 8.7(12.3) | 0.875 | 0.875 | 27.0 |
| HHsearch-II | 0.429(0.449) | 0.477(0.507) | 9.5(21.6) | 9.0(14.1) | 0.767 | 0.820 | 13.0 |
| dPPAS | 0.426(0.438) | 0.481(0.502) | 9.6(20.6) | 8.5(15.3) | 0.819 | 0.844 | 17.0 |
| PPAS | 0.424(0.441) | 0.473(0.499) | 10.3(17.4) | 8.9(13.4) | 0.839 | 0.850 | 10.0 |
| SP3 | 0.424(0.438) | 0.476(0.499) | 10.7(15.7) | 9.1(12.3) | 0.873 | 0.873 | 11.0 |
| HHsearch-I | 0.422(0.444) | 0.472(0.502) | 9.5(20.3) | 9.0(14.7) | 0.763 | 0.817 | 16.0 |
| SPARKS | 0.421(0.433) | 0.469(0.493) | 11.0(15.7) | 9.4(12.1) | 0.891 | 0.886 | 36.0 |
| PROSPECT | 0.418(0.428) | 0.469(0.490) | 11.5(13.3) | 9.9(11.3) | 0.914 | 0.903 | 15.0 |
| PPA | 0.397(0.413) | 0.447(0.469) | 10.9(17.5) | 9.7(15.5) | 0.844 | 0.851 | 25.0 |
| FFAS | 0.393(0.406) | 0.444(0.465) | 9.5(24.2) | 8.6(18.9) | 0.758 | 0.790 | 4.0 |
| PRC | 0.372(0.388) | 0.417(0.442) | 8.6(32.9) | 8.0(24.3) | 0.668 | 0.712 | 23.0 |
| Sequence-to-profile alignments | | | | | | | |
| SAM | 0.344(0.358) | 0.405(0.426) | 10.6(27.5) | 9.9(18.3) | 0.717 | 0.778 | 8.0 |
| PSA | 0.338(0.333) | 0.371(0.392) | 12.9(17.5) | 12.0(15.0) | 0.870 | 0.873 | 9.0 |
| PSI-BLAST | 0.301(0.320) | 0.344(0.369) | 7.8(51.7) | 7.4(42.1) | 0.507 | 0.556 | 4.0 |
| Sequence-to-sequence alignments | | | | | | | |
| NW-align | 0.321(0.336) | 0.377(0.403) | 12.7(21.7) | 11.4(15.0) | 0.849 | 0.866 | 5.0 |
| SW-align | 0.265(0.285) | 0.324(0.348) | 9.9(49.5) | 9.2(35.7) | 0.560 | 0.625 | 4.0 |
| BLAST | 0.246(0.263) | 0.292(0.315) | 8.5(59.7) | 8.2(47.5) | 0.470 | 0.529 | 0.1 |
| Other controls | | | | | | | |
| TM-align | 0.661(0.664) | 0.663(0.683) | 3.1(7.5) | 3.0(7.1) | 0.856 | 0.846 | 90.0 |
| MUSTERSS + BTA + SA | 0.482(0.511) | 0.512(0.559) | 8.0(14.1) | 7.2(11.2) | 0.797 | 0.800 | 26.0 |
| MUSTERSS + BTA | 0.453(0.481) | 0.493(0.536) | 9.5(12.7) | 8.1(11.3) | 0.831 | 0.820 | 26.0 |
| MUSTERSS | 0.447(0.474) | 0.487(0.528) | 9.7(12.7) | 8.3(11.3) | 0.839 | 0.830 | 26.0 |
(a) Alignment methods are sorted by TM-score within each category.
(b) Average TM-score. Values in parentheses are for full-length models built by MODELLER. ‘First’ refers to the top-ranking model based on alignment score; ‘Best in top10’ refers to the model with the highest TM-score among the ten models with the highest alignment scores.
(c) RMSD to the native structure.
(d) Alignment coverage equals the number of aligned residues divided by the target length.
(e) Average CPU time in minutes, covering profile construction and alignment building, on an HP DL1000h computer.
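The percentages quoted in the abstract can be approximately re-derived from the rounded ‘First’ TM-scores in the table above. The sketch below is only an arithmetic check, not the authors' procedure. Two choices in it are assumptions: that PPA serves as the profile-profile baseline without structure features, and that MUSTER and MUSTERSS + BTA + SA represent the predicted-feature and native-feature cases. Both were chosen because the resulting ratios agree with the abstract. Because the inputs are rounded to three decimals, the computed values may differ slightly from the published ones.

```python
from statistics import mean

# Rounded "First" TM-scores copied from the table above (values outside parentheses).
profile_profile = {
    "MUSTER": 0.435, "HHsearch-II": 0.429, "dPPAS": 0.426, "PPAS": 0.424,
    "SP3": 0.424, "HHsearch-I": 0.422, "SPARKS": 0.421, "PROSPECT": 0.418,
    "PPA": 0.397, "FFAS": 0.393, "PRC": 0.372,
}
sequence_profile = {"SAM": 0.344, "PSA": 0.338, "PSI-BLAST": 0.301}
sequence_sequence = {"NW-align": 0.321, "SW-align": 0.265, "BLAST": 0.246}


def percent_higher(a: float, b: float) -> float:
    """Return how much higher a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


# Category averages over the methods listed in each group.
pp = mean(list(profile_profile.values()))
sp = mean(list(sequence_profile.values()))
ss = mean(list(sequence_sequence.values()))

pp_vs_sp = percent_higher(pp, sp)  # profile-profile vs sequence-profile
pp_vs_ss = percent_higher(pp, ss)  # profile-profile vs sequence-sequence

# Assumption: PPA is the baseline without structure features.
predicted_gain = percent_higher(profile_profile["MUSTER"], profile_profile["PPA"])
native_gain = percent_higher(0.482, profile_profile["PPA"])  # MUSTERSS + BTA + SA

# TM-align (0.661) relative to MUSTERSS + BTA + SA (0.482).
tmalign_gap = percent_higher(0.661, 0.482)

for label, value in [
    ("profile-profile vs sequence-profile", pp_vs_sp),
    ("profile-profile vs sequence-sequence", pp_vs_ss),
    ("MUSTER vs PPA", predicted_gain),
    ("MUSTERSS + BTA + SA vs PPA", native_gain),
    ("TM-align vs MUSTERSS + BTA + SA", tmalign_gap),
]:
    print(f"{label}: {value:.1f}%")
```

Under these assumptions, the 37.1% gap is obtained when the difference is expressed relative to the profile-profile score (0.482) rather than relative to the TM-align score (0.661).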
Figure 1. TM-score histogram of the top hits identified by different algorithms in Easy, Medium and Hard categories.
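For readers unfamiliar with the two quality measures used throughout this record, the following minimal sketch shows how a TM-score and an alignment coverage could be computed for one fixed superposition. It uses the standard TM-score formula, with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The function names, the 0.5 Å floor for short chains, and the assumption that per-residue distances are already available are illustrative choices, not details from this record. The actual TM-score additionally searches for the superposition that maximizes the sum, which this sketch omits.

```python
from typing import Sequence


def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in Å for the TM-score formula."""
    # Assumption: clamp to 0.5 Å for short chains, where the formula
    # would otherwise become very small or undefined.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one given superposition.

    distances: Cα-Cα distances (Å) of aligned residue pairs after superposition.
    target_length: number of residues in the target protein (normalization).
    """
    d0 = tm_score_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def coverage(n_aligned: int, target_length: int) -> float:
    """Alignment coverage: aligned residues divided by target length."""
    return n_aligned / target_length


# Hypothetical example: 80 of 100 target residues aligned, all 2 Å from the native.
example_tm = tm_score([2.0] * 80, 100)
example_cov = coverage(80, 100)
print(f"TM-score = {example_tm:.3f}, coverage = {example_cov:.2f}")
```

Because the sum is normalized by the full target length, unaligned residues contribute nothing, so a short but accurate alignment can still receive a low TM-score.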
Figure 2. TM-score of full-length models of 20 threading methods on 538 non-homologous proteins versus the alignment scores.
Easy, Medium and Hard targets are colored blue, green and red, respectively. PSI-BLAST, BLAST and PRC use bit score and others use z-score to score the alignments.
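This record does not state how each program derives its z-score. As a generic illustration only, the sketch below standardizes the raw alignment scores of one query against all templates in a library, so that a hit is scored by how many standard deviations it lies above the library mean. The function name and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(raw_scores: Sequence[float]) -> List[float]:
    """Standardize raw alignment scores of one query over a template library."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation (assumption)
    if sigma == 0:
        # All templates scored identically, so no hit stands out.
        return [0.0 for _ in raw_scores]
    return [(s - mu) / sigma for s in raw_scores]


# Hypothetical raw scores of one query against five templates.
library_scores = [12.0, 15.0, 11.0, 40.0, 13.0]
z = z_scores(library_scores)
top_index = max(range(len(z)), key=z.__getitem__)
print(f"top template index = {top_index}, z-score = {z[top_index]:.2f}")
```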
Score cutoffs and false positive and negative rates of different programs
| Methods | Cutoff | FPR | FNR | FPR + FNR |
|---|---|---|---|---|
| PSI-BLAST | 50.4 | 0.093 | 0.094 | 0.187 |
| SAM | 14.5 | 0.129 | 0.099 | 0.229 |
| FFAS | 12.9 | 0.170 | 0.100 | 0.270 |
| PPA | 7.8 | 0.126 | 0.158 | 0.284 |
| SP3 | 6.5 | 0.117 | 0.175 | 0.292 |
| SPARKS | 6.4 | 0.111 | 0.194 | 0.306 |
| SW-align | 8.0 | 0.162 | 0.149 | 0.311 |
| PRC | 20.8 | 0.110 | 0.205 | 0.316 |
| BLAST | 35.7 | 0.149 | 0.169 | 0.318 |
| PPAS | 6.9 | 0.186 | 0.145 | 0.331 |
| dPPAS | 13.2 | 0.113 | 0.233 | 0.346 |
| HHsearch-I | 8.1 | 0.172 | 0.179 | 0.351 |
| MUSTER | 6.2 | 0.147 | 0.205 | 0.353 |
| HHsearch-II | 9.3 | 0.160 | 0.200 | 0.360 |
| PROSPECT | 4.2 | 0.075 | 0.317 | 0.392 |
| PSA | 4.1 | 0.128 | 0.400 | 0.528 |
| NW-align | 1.5 | 0.464 | 0.160 | 0.625 |
* Methods are sorted by the sum of the false positive rate (FPR) and the false negative rate (FNR).
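The table reports one score cutoff per program together with its FPR and FNR, but this record does not say how the cutoffs were selected or how a correct template was defined. The sketch below shows one plausible procedure, stated as an assumption: scan the observed scores as candidate cutoffs and keep the one that minimizes FPR + FNR. The labels marking a hit as correct are supplied by the caller (for instance from a TM-score threshold), and all names here are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(scores: Sequence[float], is_correct: Sequence[bool],
                cutoff: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when hits with score >= cutoff are accepted."""
    pairs = list(zip(scores, is_correct))
    false_pos = sum(1 for s, ok in pairs if s >= cutoff and not ok)
    false_neg = sum(1 for s, ok in pairs if s < cutoff and ok)
    negatives = sum(1 for _, ok in pairs if not ok)
    positives = len(pairs) - negatives
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def best_cutoff(scores: Sequence[float], is_correct: Sequence[bool]) -> float:
    """Pick the observed score that minimizes FPR + FNR (assumed criterion)."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: sum(error_rates(scores, is_correct, c)))


# Hypothetical top-hit scores and whether each hit was a correct template.
demo_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
demo_labels = [False, False, False, True, True, True]
cutoff = best_cutoff(demo_scores, demo_labels)
fpr, fnr = error_rates(demo_scores, demo_labels, cutoff)
print(f"cutoff = {cutoff}, FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

Minimizing the sum weights both error types equally; a different weighting would shift the cutoff toward fewer false positives or fewer false negatives.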
Figure 3. Illustration of template identification for 2qudA.
(A) MUSTER with predicted secondary structure; (B) MUSTERSS with native secondary structure. The experimental structure and the MUSTER models are shown as red and green cartoons, respectively, and the first two beta-strands in (A) are shown in yellow on the template. The secondary structures are labeled ‘C’ for coil, ‘E’ for strand and ‘H’ for helix, where ‘Pred’ and ‘Obs’ denote the PSI-pred prediction and the native structure, respectively. ‘*’ in (A) marks the residues with mis-predicted secondary structure.