Jafar Razmara, Safaai Deris, Sepideh Parvizpour.
Abstract
BACKGROUND: In structural biology, similarity analysis of protein structure is a crucial step in studying the relationship between proteins. Despite the considerable number of techniques explored over the past two decades, developing new alternative methods remains an active research area because of the need for high-performance tools.
Year: 2012 PMID: 22336468 PMCID: PMC3298807 DOI: 10.1186/1748-7188-7-4
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Defined labels for secondary structure vectors
| Strand | Helix | Inter-SSEs |
|---|---|---|
| A | I | Q |
| B | J | R |
| C | K | S |
| D | L | T |
| E | M | U |
| F | N | V |
| G | O | W |
| H | P | X |
* (x1, y1, z1) and (x2, y2, z2) denote start and end points of SSE vectors.
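As a minimal sketch of the vector representation described in the footnote above, an SSE can be stored as a start point and an end point, with its direction obtained by subtraction. The class name `SSEVector`, the `kind` strings, and the `octant` helper are hypothetical names introduced here. The rule that maps a direction to one of the eight letters per vector type is not preserved in this record, so the sketch reports only the sign pattern of the direction and does not assign letters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSEVector:
    """One secondary structure element as a vector in 3D space (hypothetical type)."""
    kind: str                          # "strand", "helix" or "inter" (names chosen for this sketch)
    start: tuple[float, float, float]  # (x1, y1, z1)
    end: tuple[float, float, float]    # (x2, y2, z2)

    def direction(self):
        """Component-wise difference end - start."""
        return tuple(e - s for s, e in zip(self.start, self.end))

    def octant(self):
        """Sign of each direction component (+1 or -1).

        Assumption: a zero component is treated as positive. The mapping
        from this sign pattern to a letter label is NOT taken from the source.
        """
        return tuple(1 if c >= 0 else -1 for c in self.direction())


helix = SSEVector("helix", (0.0, 0.0, 0.0), (1.0, -2.0, 3.0))
print(helix.direction(), helix.octant())
```

Eight sign patterns exist in 3D space, which matches the eight labels listed per vector type, but whether the labels are defined this way is an assumption.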
Figure 1. Secondary structure elements represented as vectors in 3D space. Dashed vectors represent inter-SSE vectors.
Figure 2. A typical example of secondary structure modelling in a topology string.
Figure 3. The algorithm for secondary structure matching using topology strings.
Permutations on SSEs direction labels based on 90 degree rotation around axes
| | Strand | Helix | Inter-SSEs |
|---|---|---|---|
| Old | A B C D E F G H | I J K L M N O P | Q R S T U V W X |
| Rotate 90° around | B D A C F H E G | J L I K N P M O | R T Q S V X U W |
| Rotate 90° around | E A G C F B H D | M I O K N J P L | U Q W S V R X T |
| Rotate 90° around | E F A B G H C D | M N I J O P K L | U V Q R W X S T |
Figure 4. An example of matching the topology strings of two reference proteins with 24 permuted topology strings of the query protein.
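The label permutations in the rotation table can be applied to a topology string as a simple character substitution, and composing the three 90° rotations repeatedly gives the 24 permuted strings mentioned in the Figure 4 caption. The following is a minimal sketch under that reading: the permutation rows are copied from the table, while the function names and the example topology string are hypothetical. The rotation axes are not named here because they are missing from the table.

```python
# Rows copied from the permutation table: strand, helix and inter-SSE labels.
OLD = "ABCDEFGH" "IJKLMNOP" "QRSTUVWX"
ROT1 = "BDACFHEG" "JLIKNPMO" "RTQSVXUW"
ROT2 = "EAGCFBHD" "MIOKNJPL" "UQWSVRXT"
ROT3 = "EFABGHCD" "MNIJOPKL" "UVQRWXST"
GENERATORS = (ROT1, ROT2, ROT3)


def apply(perm, topology):
    """Relabel every letter of a topology string according to one permutation."""
    return topology.translate(str.maketrans(OLD, perm))


def all_rotations():
    """Close the three 90-degree rotations under composition.

    Each permutation is stored as the image of OLD. Starting from the
    identity, generators are applied until no new permutation appears.
    """
    seen = {OLD}
    frontier = [OLD]
    while frontier:
        perm = frontier.pop()
        for gen in GENERATORS:
            composed = apply(gen, perm)  # perm first, then gen
            if composed not in seen:
                seen.add(composed)
                frontier.append(composed)
    return sorted(seen)


def permuted_strings(topology):
    """All relabelled versions of a topology string, one per rotation."""
    return [apply(perm, topology) for perm in all_rotations()]


query = "AQIRB"  # hypothetical topology string, for illustration only
print(len(all_rotations()), permuted_strings(query)[:3])
```

Precomputing the permuted strings once per query keeps the comparison against each database entry a plain string operation, which is presumably why the permutations are defined on labels and not on coordinates.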
Semi-adjacent letters defined for Strand SSE vectors
| A | B | C | D | E | F | G | |
|---|---|---|---|---|---|---|---|
| * | * | * | |||||
| * | * | ||||||
| * | * | ||||||
| * | |||||||
| * | * | ||||||
| * | |||||||
| * | |||||||
Accuracy index adopted from Receiver Operating Characteristic (ROC) curve
| | 3-gram | 4-gram | 5-gram | 6-gram |
|---|---|---|---|---|
| TPR* | 0.437 | 0.523 | 0.852 | 0.982 |
| FPR* | 0.129 | 0.103 | 0.053 | 0.024 |
* True Positive Rate (TPR) and False Positive Rate (FPR)
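For reference, the two rates in the table follow the standard ROC definitions: TPR is the fraction of true matches that are retrieved, and FPR is the fraction of non-matches that are wrongly retrieved. The sketch below uses hypothetical counts and a hypothetical function name; it does not reproduce the data behind the table.

```python
def roc_rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN), FPR = FP / (FP + TN).
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical counts, for illustration only.
tpr, fpr = roc_rates(tp=90, fp=5, tn=95, fn=10)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```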
Figure 5. Average TM.
Alignment results summary for 200 non-homologous proteins averaged over all structure pairs
| Method | Length of alignment | Coverage | RMSD | TMscore |
|---|---|---|---|---|
| CE | 64.3 | 34.7% | 6.52 | 0.169 |
| TM-Align | 87.4 | 42.0% | 4.99 | 0.253 |
| 3D-BLAST | 65.7 | 36.2% | 6.69 | 0.172 |
| TS-AMIR | 91.4 | 46.6% | 6.17 | 0.237 |
- The results of CE and TM-Align were taken from [7].
- Coverage denotes the fraction of residues aligned within the target protein.
- Length of alignment denotes the number of aligned residues.
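The three quality measures in the table can be illustrated with a short sketch that takes the distances between aligned residue pairs after superposition. Coverage follows the definition given in the notes above. RMSD and TM-score use their standard formulas, with the TM-score normalised by the target length and the usual length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names, the lower bound of 0.5 on d0, and the example distances are assumptions of this sketch. A full TM-score also maximises over superpositions, which is not done here.

```python
import math


def coverage(n_aligned, target_length):
    """Fraction of target residues that are aligned."""
    return n_aligned / target_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for one fixed superposition, normalised by target length."""
    d0 = 1.24 * max(target_length - 15, 0) ** (1 / 3) - 1.8
    d0 = max(d0, 0.5)  # assumption: keep d0 positive for very short targets
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical distances for 4 aligned pairs in a 100-residue target.
dists = [0.8, 1.5, 2.2, 4.0]
print(coverage(len(dists), 100), rmsd(dists), tm_score(dists, 100))
```

Unlike RMSD, each TM-score term is bounded by 1, so a few badly aligned pairs lower the score only slightly. This helps explain why a method can have a higher RMSD and still a competitive TM-score in the table.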
Alignment results summary for the same dataset as in Table 5, averaged over the most similar pairs
| Method | Length of alignment | Coverage | RMSD | TMscore |
|---|---|---|---|---|
| CE | 128.8 | 61.4% | 3.95 | 0.441 |
| TM-Align | 166.2 | 73.1% | 4.45 | 0.510 |
| 3D-BLAST | 131.4 | 63.1% | 4.32 | 0.454 |
| TS-AMIR | 168.9 | 74.7% | 4.48 | 0.502 |
- The results of CE and TM-Align were taken from [7].
Figure 6. Average precision-recall for searching 108 query proteins.
Average running time of the methods to search in a database of 34,055 proteins (in seconds)
| Method | Average time per query | Average time per comparison |
|---|---|---|
| CE | 82789.20 | 2.43 |
| TM-align | 9273.41 | 0.272 |
| YAKUSA | 35.60 | 0.00105 |
| TS-AMIR | 11.47 | 0.000337 |
| 3D-BLAST | 9.07 | 0.000266 |
| SARST | 0.34 | 0.00000998 |
- Except for TM-align and TS-AMIR, the results were taken from the literature [22].
- The experiments were done on a 3.2 GHz CPU.
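The two time columns in the table appear to be related by the database size: dividing the average time per query by 34,055 reproduces the per-comparison values to the precision shown. The following sketch checks that arithmetic with the figures from the table; the variable and function names are hypothetical.

```python
DATABASE_SIZE = 34_055  # proteins in the searched database

# Average time per query in seconds, copied from the table.
TIME_PER_QUERY = {
    "CE": 82789.20,
    "TM-align": 9273.41,
    "YAKUSA": 35.60,
    "TS-AMIR": 11.47,
    "3D-BLAST": 9.07,
    "SARST": 0.34,
}


def time_per_comparison(method):
    """Average time for one pairwise comparison, in seconds."""
    return TIME_PER_QUERY[method] / DATABASE_SIZE


def speedup(slow, fast):
    """How many times faster `fast` is than `slow` per query."""
    return TIME_PER_QUERY[slow] / TIME_PER_QUERY[fast]


for name in TIME_PER_QUERY:
    print(f"{name:9s} {time_per_comparison(name):.3g} s per comparison")
print(f"TS-AMIR vs TM-align: {speedup('TM-align', 'TS-AMIR'):.0f}x")
```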
Figure 7. Retrieval effectiveness on different structural categories.
Figure 8. Retrieval effectiveness on low sequence identity.