Zaixin Lu, Zhiyu Zhao, Bin Fu.
Abstract
BACKGROUND: Proteins show a great variety of 3D conformations, which can be used to infer their evolutionary relationships and to classify them into more general groups; protein structure alignment algorithms are therefore very helpful to protein biologists. However, an accurate alignment algorithm alone may be insufficient for effectively discovering structural relationships among tens of thousands of proteins. Because the amount of protein structural data is increasing exponentially, a fast and accurate structure alignment tool is needed for protein classification and protein similarity search; however, the complexity of current alignment algorithms is usually too high to make fully alignment-based classification and search practical.
Year: 2010 PMID: 20122207 PMCID: PMC3009506 DOI: 10.1186/1471-2105-11-S1-S34
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Results of multiple alignment algorithms. Comparison of the average alignment length, RMSD and SAS.
| | Dali | CE | SSM | SPSA | NPSA |
|---|---|---|---|---|---|
| Average alignment length | 130.43 | 132.82 | 117.78 | 119.20 | 122.65 |
| Average RMSD | 2.78 | 2.83 | 2.37 | 2.23 | 2.30 |
| Average SAS1 | 2.96 | 3.08 | 2.76 | 2.62 | 2.48 |
| Average SAS2 | 3.89 | 3.60 | 3.92 | 3.43 | 3.25 |
| Average SAS3 | 6.67 | 5.69 | 6.84 | 5.60 | 5.19 |
Figure 1. Q-score difference plots. Figure 1 shows the Q-score difference between our algorithm and each of CE [12], Dali [37], and SSM [8].
Figure 2. Precision and recall curves. Figure 2 shows the accuracy of multiple protein search methods. The left panel shows the precision and recall of 108 queries by multiple methods at the SCOP family level, and the right panel shows those of 129 queries by the same methods at the SCOP superfamily level.
Statistics on the reliability of scores. Precision is defined as n/N and recall is defined as n/T, where n is the number of true proteins in the result list whose Q-scores are higher than the threshold given in the column header. A true protein is one from the same family or superfamily as the query protein. N is the total number of retrieved proteins whose Q-scores are higher than that threshold, and T is the total number of proteins in the family or superfamily of the query protein.
| Q-Score | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 |
|---|---|---|---|---|---|---|---|---|
| avg.recall(%)-family | 14.65 | 20.20 | 28.14 | 39.28 | 48.99 | 62.05 | 75.84 | 84.31 |
| avg.precision(%)-family | 99.35 | 98.56 | 97.42 | 97.16 | 94.83 | 91.57 | 87.43 | 70.85 |
| avg.recall(%)-superfamily | 9.22 | 12.89 | 18.96 | 27.53 | 34.87 | 45.22 | 56.59 | 68.43 |
| avg.precision(%)-superfamily | 99.39 | 99.15 | 99.01 | 98.89 | 98.74 | 97.58 | 96.60 | 87.39 |
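The precision and recall definitions in the caption above can be sketched in a few lines of Python. This is only an illustration of the n/N and n/T formulas, not the authors' evaluation code: the function name `precision_recall`, the `(protein_id, q_score)` input format, the protein identifiers, and the handling of an empty result list are all assumptions made here.

```python
def precision_recall(hits, true_ids, threshold):
    """Compute (precision, recall) at one Q-score threshold.

    hits      -- list of (protein_id, q_score) pairs returned for one query
                 (hypothetical input format)
    true_ids  -- set of protein ids in the query's family or superfamily
    threshold -- Q-score limit value; only hits strictly above it count
    """
    # Retrieved proteins whose Q-score is higher than the threshold.
    retrieved = [pid for pid, q in hits if q > threshold]
    N = len(retrieved)                                   # total retrieved
    n = sum(1 for pid in retrieved if pid in true_ids)   # true proteins retrieved
    T = len(true_ids)                                    # all true proteins

    # Assumption: report 0.0 when a denominator is zero.
    precision = n / N if N else 0.0
    recall = n / T if T else 0.0
    return precision, recall


# Made-up example data, for illustration only.
example_hits = [("a", 0.95), ("b", 0.85), ("c", 0.75), ("d", 0.40)]
example_true = {"a", "b", "d", "e"}

p, r = precision_recall(example_hits, example_true, threshold=0.7)
print(f"precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold admits more hits, which is why recall rises and precision falls from left to right in the table.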
Average search time of each program on 108 queries in SCOP 1.59. Results for all the methods except ours are taken from [33]. Their experiments were performed on a computer with an Intel Pentium 2.8 GHz processor and 1,024 MB of RAM. Ours were performed on a computer with an Intel Pentium 2.66 GHz processor and 1,024 MB of RAM.
| Software | Total search time (s) | Average search time per query (s) |
|---|---|---|
| Our method | 12,117 | 112.20 |
| 3D-Blast | 34.35 | 0.318 |
| PSI-BLAST | 18.31 | 0.170 |
| CE | 13.5 days | 3 hours |
| MAMMOTH | 131,855 | 1220.88 |
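The per-query column appears to be the total search time divided by the 108 queries. The short Python check below illustrates that relationship, using only values copied from the table; the helper name `average_per_query` is hypothetical, and the CE total is converted from the "13.5 days" entry on the assumption that it is exact.

```python
NUM_QUERIES = 108  # number of queries stated in the table caption


def average_per_query(total_seconds, num_queries=NUM_QUERIES):
    """Return the mean search time per query, in seconds."""
    return total_seconds / num_queries


# Total search times in seconds, copied from the table.
# The CE entry is converted from 13.5 days (assumed exact).
total_seconds = {
    "Our method": 12117.0,
    "3D-Blast": 34.35,
    "PSI-BLAST": 18.31,
    "CE": 13.5 * 24 * 3600,
    "MAMMOTH": 131855.0,
}

for name, total in total_seconds.items():
    print(f"{name}: {average_per_query(total):.3f} s per query")
```

Small differences from the printed averages are expected wherever the totals in the table were themselves rounded.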
Figure 3. Flowchart. Figure 3 is a flowchart of our alignment algorithm.