Degui Zhi, Maxim Shatsky, Steven E Brenner.
Abstract
MOTIVATION: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities.
Year: 2010 PMID: 20371498 PMCID: PMC2859133 DOI: 10.1093/bioinformatics/btq127
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Writhe decomposition. As an example, we use PDB (Berman et al., 2000) entry 1UCS:A (Ko et al., 2003), the structure of a Type III antifreeze protein RD1. (a) Cartoon representation of the original chain. (b) Smoothed backbone. (c) Writhe matrix (only upper triangle entries are shown). (d) Local extreme values in the writhe matrix are mapped back to the smoothed backbone. One extreme value is used as an illustration. As is typical, it represents a close perpendicular crossing of secondary structure, in this case two adjacent strands of a highly twisted sheet. (e) A segment in the original chain corresponding to the red linker in (d).
Smoothing dependency of writhe decomposition
| Smoothing parameter k | 1 | 3 | 5 | 7 | 9 | 11 | 13 |
|---|---|---|---|---|---|---|---|
| Average number of fragments | 544.8 | 90.6 | 62.1 | 36.0 | 28.7 | 19.1 | 16.2 |
The average numbers of fragments from writhe decomposition over 10 representative structures (see caption of Supplementary Fig. S2 for PDB codes) with different values of the smoothing parameter k are shown. k = 1 means no smoothing. Even smoothing with k = 3 significantly reduces the number of fragments compared with no smoothing. We observe a generally exponential decay in the number of fragments with the smoothing parameter k, which we fit to log(n) = −0.18k + 7.3 by linear regression (Supplementary Fig. S2). The value k = 7 yields the largest negative deviation (regression residual) from the regression line, and thus represents a balance between structural detail (small k) and concise writhe decomposition (small n).
Fig. 2. Scatter plot of protein length versus number of fragments from writhe decomposition.
Accuracy on the Astral 1.65 40% non-redundant superfamily benchmark
| Method | Coverage (%) | Precision (%) | Time (s/query) |
|---|---|---|---|
| 3D-lookup | 89.0 | 90.0 | 163.8 |
| SSEA | 90.3 | 70.1 | 0.14 |
| SSEF | 90.3 | 70.2 | 3.8 |
| SGM | 85.6 | 64.8 | 0.28 |
| Writher | 83.6 | 65.6 | 4.3 |
| Writher with RandDecomp | 83.6 | 55.6 | 3.6 |
Coverage is the fraction of the 5345 queried proteins for which the program gives an answer. Precision is the fraction of answers that are correct, i.e. the top-scoring hit, excluding the query itself, is from the same SCOP superfamily. Time is the average time for handling a one-versus-all query. RandDecomp is Writher with random (not writhe) decomposition (see text in Section 2.2).
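The coverage and precision definitions in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function name `coverage_and_precision` and the input layout (one entry per query, `None` when the program gives no answer, otherwise a pair of query and top-hit superfamily labels) are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical input layout: one entry per query.
# None means the program returned no answer for that query;
# otherwise the entry is (query_superfamily, top_hit_superfamily),
# where the top hit already excludes the query itself.
QueryResult = Optional[Tuple[str, str]]


def coverage_and_precision(results: Sequence[QueryResult]) -> Tuple[float, float]:
    """Return (coverage, precision) as fractions in [0, 1]."""
    total = len(results)
    answered = [r for r in results if r is not None]
    # An answer is correct when the top hit shares the query's superfamily.
    correct = sum(1 for query_sf, hit_sf in answered if query_sf == hit_sf)
    coverage = len(answered) / total if total else 0.0
    precision = correct / len(answered) if answered else 0.0
    return coverage, precision


# Toy example with made-up superfamily labels.
toy = [("sf1", "sf1"), ("sf2", "sf3"), None, ("sf4", "sf4")]
cov, prec = coverage_and_precision(toy)
```

In the toy example three of four queries are answered and two of those three answers are correct. Multiplying each fraction by 100 gives percentages of the kind reported in the table.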
Fig. 3. (a) Multi-domain benchmark result. Multi-domain proteins are queried against the database of single-domain proteins. See text for the exact protocol for generating these ROC curves. (b) Multi-domain benchmark result using only the multi-domain proteins that belong to superfamilies with at least four database members.
Multi-domain benchmark test result
| Method | Sensitivity at 99.7% specificity | Sensitivity at 99% specificity | Sensitivity at 95% specificity | Sensitivity at 90% specificity | Time (s/query) |
|---|---|---|---|---|---|
| CE | 71 | 78 | 85 | 88 | 17748.9 |
| 3D-lookup | 67 | 70 | | | 404.2 |
| Writher | 50 | | | | 17.3 |
| SSEF | 37 | 44 | 60 | 68 | 1.9 |
| SSEA | 15 | 26 | 44 | 53 | 0.34 |
| SGM | 9 | 18 | 41 | 48 | 0.14 |
| Random | 2 | 7 | 27 | 41 | NA |
Multi-domain chains are queried against the Astral 1.65 40% non-redundant superfamily benchmark set. Only multi-domain chains with at least four members in their SCOP superfamily are used as queries. Sensitivities at several specificity levels are reported. Bold values signify the highest sensitivity achieved at a specificity level.
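As a rough illustration of what "sensitivity at a given specificity" means, the sketch below scans score thresholds over labeled query-hit pairs and reports the best sensitivity whose specificity meets the target. It is a generic ROC-style calculation under stated assumptions, not the protocol used to produce this table (that protocol is described in the article text and is not reproduced here). The function name, the convention that higher scores mean greater similarity, and the toy data are all hypothetical.

```python
from typing import Sequence


def sensitivity_at_specificity(
    scores: Sequence[float],
    labels: Sequence[bool],
    target_specificity: float,
) -> float:
    """Best sensitivity over thresholds whose specificity >= target.

    Assumes higher scores mean greater similarity; a pair is called
    positive when its score is >= the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    best = 0.0  # a threshold above every score gives sensitivity 0, specificity 1
    for t in sorted(set(scores)):
        true_neg = sum(1 for s in neg if s < t)
        true_pos = sum(1 for s in pos if s >= t)
        if true_neg / len(neg) >= target_specificity:
            best = max(best, true_pos / len(pos))
    return best


# Toy data: True marks a pair from the same superfamily (made-up values).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False, False, False, False]
strict = sensitivity_at_specificity(toy_scores, toy_labels, 1.0)
relaxed = sensitivity_at_specificity(toy_scores, toy_labels, 0.8)
```

On the toy data, demanding perfect specificity recovers two of the three true pairs, while relaxing the target to 0.8 recovers all three. This mirrors the trend in the table, where sensitivity rises as the specificity level is lowered.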