Yang Zhang, Jeffrey Skolnick.
Abstract
We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and Dynamic Programming (DP). The algorithm is approximately 4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these most often-used methods. TM-align is applied to an all-against-all structure comparison of 10 515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff <95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation (RMSD) of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions. The TM-align program is freely downloadable at http://bioinformatics.buffalo.edu/TM-align.
Year: 2005 PMID: 15849316 PMCID: PMC1084323 DOI: 10.1093/nar/gki524
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Structural alignments by different algorithms for 200 non-homologous PDB proteins
| | Average over all pairs (a) | | | | Average over pairs with TM_M (b) | | | | 〈t〉 (c) |
|---|---|---|---|---|---|---|---|---|---|
| Method | 〈R〉 | 〈L〉 | 〈cov〉 | 〈TM〉 | 〈R_M〉 | 〈L_M〉 | 〈cov_M〉 | 〈TM_M〉 | |
| Test set of all 39 800 structure pairs | | | | | | | | | |
| CE | 6.52 | 64.3 | 34.7% | 0.169 | 3.95 | 128.8 | 61.4% | 0.441 | 2.25 |
| SAL | 7.33 | 95.3 | 47.3% | 0.229 | 5.84 | 164.8 | 72.8% | 0.474 | 10.00 |
| TM-align | 4.99 | 87.4 | 42.0% | 0.253 | 4.45 | 166.2 | 73.1% | 0.510 | 0.51 |
| Test set of 17 086 pairs where DALI has an output | | | | | | | | | |
| CE | 6.36 | 73.0 | 34.7% | 0.185 | 3.95 | 129.2 | 61.2% | 0.440 | 2.28 |
| DALI | 14.25 | 123.2 | 53.5% | 0.223 | 9.40 | 175.2 | 76.8% | 0.471 | 12.22 |
| SAL | 7.53 | 108.4 | 47.5% | 0.241 | 5.83 | 164.4 | 71.7% | 0.471 | 10.13 |
| TM-align | 5.18 | 101.9 | 43.4% | 0.271 | 4.44 | 165.8 | 71.9% | 0.506 | 0.52 |
(a) Results are averaged over all structure pairs. R, L, cov and TM denote, respectively, the RMSD (in Å), the number of aligned residues, the coverage of the aligned regions over the target sequence, and the TM-score as defined in Equation 3.
(b) For each protein, only the pair with the maximum TM-score is considered, and the averages are taken over these pairs.
(c) Average CPU time (in seconds) per structure alignment on a 1.26 GHz PIII processor.
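Equation 3 is not reproduced on this page. As a rough illustration of the per-alignment quantities reported in the table (RMSD, coverage and TM-score), the sketch below assumes the standard TM-score form, TM = (1/L_target) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L_target − 15)^(1/3) − 1.8 Å. It evaluates the sum for one given superposition only, whereas the actual TM-score is the maximum over superpositions. The function names, the input format (a list of distances between aligned residue pairs) and the short-chain guard are illustrative assumptions, not part of the TM-align program.

```python
import math


def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 (in Å) of the standard TM-score."""
    if l_target < 19:
        # Assumption: reject very short chains, for which this formula
        # gives a non-positive (or undefined) d0.
        raise ValueError("l_target must be at least 19 residues")
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def alignment_metrics(distances, l_target: int) -> dict:
    """Summarize one alignment under one fixed superposition.

    distances: distance (Å) between each aligned residue pair.
    l_target:  length of the target protein used for normalization.
    """
    n_aligned = len(distances)
    if n_aligned == 0:
        raise ValueError("at least one aligned residue pair is required")
    d0 = tm_d0(l_target)
    rmsd = math.sqrt(sum(d * d for d in distances) / n_aligned)
    coverage = n_aligned / l_target
    # Each aligned pair contributes between 0 and 1; unaligned target
    # residues contribute nothing, so the score is at most the coverage.
    tm_score = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target
    return {"rmsd": rmsd, "aligned": n_aligned,
            "coverage": coverage, "tm_score": tm_score}


if __name__ == "__main__":
    # Hypothetical distances for 5 aligned pairs in a 100-residue target.
    print(alignment_metrics([0.8, 1.5, 2.1, 3.0, 4.2], l_target=100))
```

Because every term of the sum is bounded by 1, a short but very tight alignment cannot reach a high TM-score, which is why the table reports coverage and RMSD alongside it.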
Figure 1. Illustrative example of structure alignments produced by different alignment methods for 1atzA and 1auoA. The first row shows ribbon diagrams of the native structures of 1atzA (184 residues) and 1auoA (218 residues), which share 16% sequence identity and adopt the common αβα-sandwich topology. The second and third rows show the structure superposition of the aligned residues by CE (17) and SAL (18), and by DALI (38) and TM-align, respectively. The thick and thin backbones denote the aligned residues from 1atzA and 1auoA, respectively. The indicated numbers are the number of aligned residues, the RMSD between the aligned residues, and the TM-score normalized by the length of 1atzA. All pictures were generated with RASMOL, colored from blue to red running from the N- to the C-terminus.
Figure 2. Number of folds included in the representative protein sets collected from the PDB library on January 28, 2005 using different sequence identity cutoffs. A fold is defined using a TM-score threshold of 0.5.
Figure 3. Two examples of protein pairs that have high sequence identities but adopt entirely different folds. In both examples, the upper parts show the sequence alignments of the proteins, where ‘:’ denotes residues with identical amino acids; the lower parts show cartoon structures of the proteins, colored from blue to red running from the N- to the C-terminus. The proteins in the first example are 1a64A (32) and the N-terminal domain of 1hngB (39). Deletion of two key residues (K44 and M45) induces domain swapping between the two proteins. The proteins in the second example are from the calmodulin binding domain (CaMBD), where 1g4yB is the crystal structure of Ca²⁺-loaded CaMBD in complex with calmodulin (40) and 1kkdA is the NMR structure of Ca²⁺-free CaMBD in complex with calmodulin (33). Ca²⁺ binding is responsible for the conformational difference between the two structures.
Figure 4. Structure alignments of the computer models generated by TASSER (8) to non-homologous proteins in the PDB library (6). (A) TM-score between the native structure and the template closest to it found by TM-align versus the TM-score between the TASSER model and the native. (B) TM-score between the TASSER model and the closest (highest TM-score) template found versus the TM-score between the TASSER model and the native. (C) RMSD between the native structure and the template closest to it versus the RMSD between the model and the native. (D) RMSD between the model and the closest template versus the RMSD between the model and the native. The stars denote the alignment coverage of the closest templates found by TM-align. The solid yellow circles denote the average of the points falling within each interval of the horizontal axis in each panel. The black lines are guides to the eye.
Figure 5. A comparison of a computer model generated by TASSER (8) and the closest PDB structure (template) found by TM-align. This is a typical example in which the model has a much larger RMSD than the template because of misoriented tails and loops. The thick backbones are the model or the template, and the thin ones are the native structure of 1c0fS. Residues shown in red are those whose distances are <5 Å under the TM-score rotation matrix.
Comparison of the first model selected by different ranking methods
| | Free-energy (a) | TM-align (b) | Random (c) | Combination (d) |
|---|---|---|---|---|
| 〈TM-score〉 | 0.551 | 0.544 | 0.5042 | 0.559 |
| 〈RMSD〉 (Å) | 8.89 | 9.19 | 10.13 | 8.71 |
(a) Ranked by the cluster size from SPICKER (31).
(b) The models are ranked on the basis of their distances to the closest non-homologous PDB structures found by TM-align.
(c) The first model is randomly selected from the five largest clusters.
(d) Combined rank of free-energy and TM-align structural alignment. Here, for each model, a target function is defined as C = Rank1 + Rank2/2, where Rank1 and Rank2 are the ranks of the considered model on the basis of free-energy and TM-align, respectively. The first model is selected as the one having the lowest C.
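The combined selection rule in the last footnote can be sketched as follows. This is a minimal illustration of C = Rank1 + Rank2/2 only: the function name, the list-based input and the tie-breaking rule (the first model with the lowest C wins) are assumptions made for the example, and the ranks themselves are taken as given rather than computed from SPICKER or TM-align output.

```python
def select_first_model(rank_free_energy, rank_tm_align):
    """Pick the model with the lowest C = Rank1 + Rank2 / 2.

    rank_free_energy: Rank1 of each model (1 = best by free-energy ranking).
    rank_tm_align:    Rank2 of each model (1 = best by TM-align ranking).
    Returns (index of the selected model, list of C values).
    """
    if len(rank_free_energy) != len(rank_tm_align):
        raise ValueError("both rank lists must cover the same models")
    if not rank_free_energy:
        raise ValueError("at least one model is required")
    combined = [r1 + r2 / 2 for r1, r2 in zip(rank_free_energy, rank_tm_align)]
    # Assumption: on ties, the model listed first is kept.
    best = min(range(len(combined)), key=lambda i: combined[i])
    return best, combined


if __name__ == "__main__":
    # Hypothetical ranks for five candidate models.
    rank1 = [1, 2, 3, 4, 5]
    rank2 = [5, 1, 2, 3, 4]
    best, scores = select_first_model(rank1, rank2)
    print(best, scores)  # 1 [3.5, 2.5, 4.0, 5.5, 7.0]
```

Halving Rank2 gives the free-energy rank twice the weight of the TM-align rank, so the structural-analog signal can change the choice without overriding a clearly better free-energy rank.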