Sourangshu Bhattacharya, Chiranjib Bhattacharyya, Nagasuma R Chandra.
Abstract
BACKGROUND: In recent years the number of protein structures in databases such as the PDB has grown exponentially, so the design of fast algorithms for querying these databases has become an increasingly important research problem. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of its residues. The algorithm is based on a novel characterization of residues, called projections, which leads to a similarity measure between the residues of two proteins. This measure is exploited to compute the optimal equivalences efficiently.
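The abstract does not say how projections are computed, so the following is only a generic illustration of the spectral idea it alludes to, not Matchprot's method. It assumes each residue is represented by its Cα coordinates and uses NumPy. The hypothetical helper `residue_signatures` builds a Gaussian-weighted contact graph (the width `sigma` and the number of eigenvectors `k` are arbitrary choices) and describes every residue by the magnitudes of its components in the leading eigenvectors. Because the graph depends only on inter-residue distances, the signatures are unchanged by rotating or shifting a structure, and relabelling residues simply permutes them.

```python
import numpy as np


def residue_signatures(coords: np.ndarray, k: int = 4, sigma: float = 8.0) -> np.ndarray:
    """Hypothetical spectral signature per residue (NOT Matchprot's projections).

    coords: (n, 3) array of C-alpha coordinates, one row per residue.
    Returns an (n, k) array: |components| of the k leading eigenvectors.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Gaussian-weighted contact graph; it depends only on pairwise distances,
    # so it is unchanged by rotating or translating the structure.
    adjacency = np.exp(-dist2 / (2.0 * sigma ** 2))
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    _, eigvecs = np.linalg.eigh(adjacency)
    leading = eigvecs[:, ::-1][:, :k]
    # Each eigenvector is only defined up to sign, so keep magnitudes.
    return np.abs(leading)


def residue_distances(sig_a: np.ndarray, sig_b: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between residue signatures (small = similar)."""
    return np.linalg.norm(sig_a[:, None, :] - sig_b[None, :, :], axis=-1)


# Usage: an orthogonally transformed, shifted and relabelled copy of random points.
rng = np.random.default_rng(0)
protein_a = rng.uniform(0.0, 20.0, size=(30, 3))
transform, _ = np.linalg.qr(rng.normal(size=(3, 3)))
perm = rng.permutation(30)
protein_b = (protein_a @ transform.T + 5.0)[perm]

dist = residue_distances(residue_signatures(protein_a), residue_signatures(protein_b))
best_match = dist.argmin(axis=1)  # nearest residue of B for each residue of A
print(np.array_equal(best_match, np.argsort(perm)))
```

Turning residue similarities into a one-to-one set of equivalences needs an assignment step (for example, `scipy.optimize.linear_sum_assignment` applied to `dist`); Matchprot's own procedure for computing optimal equivalences is not described in this excerpt. Unit-normalised eigenvectors also make these signatures size-dependent, so comparing proteins of very different lengths would need further normalisation.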
Year: 2006 PMID: 17254310 PMCID: PMC1764482 DOI: 10.1186/1471-2105-7-S5-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of results from Matchprot and Dali on Fischer's and Novotny's benchmark datasets (number of pairs for which Matchprot did better than, worse than, or level with Dali).
| Data set/Classification | Total pairs | Better | Worse | Level |
| Fischer et al. | 68 | 17 | 18 | 33 |
| Novotny et al. | | | | |
| 1.10.40 | 21 | 8 | 1 | 12 |
| 1.10.164 | 10 | 2 | 0 | 8 |
| 1.25.30 | 21 | 3 | 0 | 18 |
| 2.30.110 | 6 | 1 | 2 | 3 |
| 2.40.100 | 28 | 4 | 3 | 21 |
| 2.100.10 | 15 | 5 | 4 | 6 |
| 3.10.70 | 10 | 0 | 2 | 8 |
| 3.40.91 | 6 | 6 | 0 | 0 |
| 3.70.10 | 15 | 1 | 3 | 11 |
| 2.40.20 | 21 | 1 | 4 | 16 |
Figure 1: 2pelA – 5cnaA alignments generated by Dali and Matchprot.
Results from comparison of multi-domain proteins with partial structures and individual domains (Lali: number of aligned residues; RMSD in Å).
| ID1 – ID2 | No. of Deletions/Total Size | Matchprot (Lali/RMSD/Zscore) | Dali (Lali/RMSD/Zscore) |
| 2hcka – 2hcka-4 | 4/437 | 433/0.00/91.60 | 434/0.00/60.5 |
| 2hcka – 2hcka-8 | 8/437 | 429/0.00/91.21 | 430/0.00/60.5 |
| 2hcka – 2hcka-12 | 12/437 | 425/0.00/90.85 | 426/0.00/60.5 |
| 2hcka – 2hcka-20 | 20/437 | 417/0.00/90.75 | 418/0.00/60.3 |
| 2hcka – 2hcka-50 | 50/437 | 387/0.00/93.63 | 388/0.00/60.5 |
| 2hcka – 2hcka-100 | 100/437 | 337/0.00/76.49 | 338/0.00/49.3 |
| 2hcka – d2hcka1 | 374/437 | 34/2.81/-1.18 | 63/0.0/15.1 |
| 2hcka – d2hcka2 | 334/437 | 59/3.12/-0.63 | 103/0.0/21.6 |
| 2hcka – d2hcka3 | 166/437 | 271/0.00/58.53 | 272/0.0/43.3 |
| 2src – d2src_1 | 387/449 | 45/3.33/-1.56 | 62/0.0/15.1 |
| 2src – d2src_2 | 346/449 | 68/3.31/-0.83 | 103/0.0/22.9 |
| 2src – d2src_3 | 165/449 | 284/0.00/64.61 | 285/0.0/46.4 |
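As a hedged illustration of the RMSD column only, the sketch below computes the RMSD between two equal-length sets of matched Cα coordinates after optimal rigid superposition, using the standard Kabsch algorithm with NumPy. It assumes the residue correspondence (the alignment, whose size is Lali) is already given; the helper name `superposed_rmsd` is hypothetical, and neither Matchprot's nor Dali's alignment procedure is reproduced here. An RMSD of 0.00, as reported for 2hcka against its own truncated versions, is what one expects when every aligned residue is paired with an identical copy of itself.

```python
import numpy as np


def superposed_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD (input units, e.g. Å) between matched (n, 3) coordinate sets
    after optimal rotation and translation (Kabsch algorithm)."""
    if p.shape != q.shape:
        raise ValueError("p and q must contain the same number of aligned residues")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Usage: a rotated and shifted copy superposes perfectly.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 30.0, size=(50, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([3.0, -2.0, 10.0])
print(round(superposed_rmsd(coords, moved), 6))
```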
Results from comparison of proteins with internal repeats.
| PDB1 – PDB2 | Matchprot (Lali/RMSD/Zscore) | Dali (Lali/RMSD/Zscore) |
| 1gyhA – 1tl2A | 179/3.36/7.79 | 196/3.9/7.4 |
| 1nscA – 3sil | 291/2.99/37.52 | 289/3.2/23.5 |
| 1bd8 – 1ihbA | 154/1.26/27.99 | 154/1.3/25.2 |
| 1l4aA – 1n7sA | 62/1.58/5.61 | 62/1.7/5.6 |
| 2pec – 1bn8A | 282/2.07/49.15 | 287/2.5/32.1 |
| 1kapP – 1sat | 444/1.43/70.71 | 448/1.7/49.9 |
Detection of Similar Proteins by Matchprot.
| Query ID (SCOP classification) | SCOP similarity level | Z-score cutoff | No. of structures (actual/detected/false positives) |
| d101m__ (a.1.1.2) | Family | 12 | 64/64/0 |
| | Superfamily | 5 | 93/93/0 |
| | Fold | 5 | 97/93/0 |
| d1htia_ (c.1.1.1) | Family | 20 | 15/15/0 |
| | Superfamily | 20 | 15/15/0 |
| | Fold | 6 | 327/272/56 |
| d1jzba_ (g.3.7.1) | Family | 4 | 17/17/0 |
| | Superfamily | 2 | 55/23/0 |
| | Fold | 2 | 238/23/0 |
| d2pela_ (b.29.1.1) | Family | 25 | 26/26/0 |
| | Superfamily | 5 | 87/70/50 |
| | Fold | 5 | 87/70/50 |
| d7rsa__ (d.5.1.1) | Family | 5 | 18/18/0 |
| | Superfamily | 5 | 18/18/0 |
| | Fold | 5 | 18/18/0 |
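Each row of this table tallies database structures against a Z-score cutoff. A minimal sketch of that tally, under two assumptions not stated in the source: a structure counts as retrieved when its Z-score is at or above the cutoff, "detected" counts retrieved structures that SCOP places at the given similarity level with the query, and "false positives" counts retrieved structures outside it. The `Hit` record, the helper `count_at_cutoff`, and the scores are made up for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One database structure scored against the query (hypothetical record)."""
    z_score: float
    same_class: bool  # True if SCOP groups it with the query at this level


def count_at_cutoff(hits: Iterable[Hit], cutoff: float) -> Tuple[int, int, int]:
    """Return (actual, detected, false_positives) for one SCOP level.

    Assumption: a hit is retrieved when its Z-score is at or above the cutoff.
    """
    actual = detected = false_pos = 0
    for hit in hits:
        if hit.same_class:
            actual += 1
        if hit.z_score >= cutoff:
            if hit.same_class:
                detected += 1
            else:
                false_pos += 1
    return actual, detected, false_pos


# Made-up scores for six database structures.
hits = [Hit(14.2, True), Hit(9.8, True), Hit(4.1, True),
        Hit(6.3, False), Hit(2.0, False), Hit(1.1, False)]
print(count_at_cutoff(hits, cutoff=5.0))  # (3, 2, 1)
```

Raising the cutoff trades false positives for missed members, which is consistent with the different cutoffs chosen per query and level in the table.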
Comparison of results for SCOP database search from Matchprot with those from CE.
| Query ID | Matchprot (detected/false positives/precision/recall) | CE (detected/false positives/precision/recall) |
| d101m__ | 93/0/1/0.95 | 96/2/0.97/0.99 |
| d1htia_ | 272/56/0.82/0.83 | 307/29/0.91/0.93 |
| d1jzba_ | 23/0/1/0.1 | 33/270/0.1/0.14 |
| d2pela_ | 70/50/0.58/0.8 | 61/36/0.62/0.70 |
| d7rsa__ | 18/0/1/1 | 17/1/0.94/0.94 |
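Every precision and recall value above agrees, to within 0.01, with precision = detected / (detected + false positives) and recall = detected / actual, where "actual" is the fold-level count from the preceding table. This reading is inferred from the numbers rather than stated in the source. The sketch applies those formulas to the d2pela_ row; the helper name `precision_recall` is hypothetical.

```python
from typing import Tuple


def precision_recall(detected: int, false_pos: int, actual: int) -> Tuple[float, float]:
    """Precision and recall, treating 'detected' as true positives (an inference)."""
    retrieved = detected + false_pos
    precision = detected / retrieved if retrieved else 0.0
    recall = detected / actual if actual else 0.0
    return precision, recall


# d2pela_ at the fold level: 87 actual members (preceding table).
for method, detected, false_pos in [("Matchprot", 70, 50), ("CE", 61, 36)]:
    p, r = precision_recall(detected, false_pos, actual=87)
    print(f"{method}: precision={p:.3f} recall={r:.3f}")
```

This prints precision 0.583 and recall 0.805 for Matchprot, and 0.629 and 0.701 for CE, close to the reported 0.58/0.8 and 0.62/0.70.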
Figure 2: Comparison of time taken by CE, Matchprot and Dali for different sizes of structures.
Figure 3: Non-topological similarity between two proteins. The sequence order of the first protein is A-B-C and that of the second protein is A'-C'-B'.
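One way to see why an alignment that preserves sequence order cannot capture all of the similarity in Figure 3: list the segments of the second protein (A', C', B') by the position of their counterpart in the first protein (A=0, C=2, B=1). An alignment that keeps sequence order in both proteins can only retain segments whose positions form an increasing subsequence. The sketch below computes the length of the longest strictly increasing subsequence with the standard patience-sorting method; encoding segments as integers is an illustrative assumption, and this is not a description of how Matchprot itself handles such cases.

```python
from bisect import bisect_left
from typing import List, Sequence


def max_sequential_matches(order_in_first: Sequence[int]) -> int:
    """Largest number of matched segments an order-preserving alignment can keep.

    order_in_first[i] is the position, in the first protein, of the segment
    matched by the i-th segment of the second protein. Order is preserved in
    both proteins exactly when the kept positions are strictly increasing, so
    the answer is the length of the longest strictly increasing subsequence.
    """
    tails: List[int] = []  # tails[k]: smallest tail of an increasing run of length k+1
    for pos in order_in_first:
        idx = bisect_left(tails, pos)
        if idx == len(tails):
            tails.append(pos)
        else:
            tails[idx] = pos
    return len(tails)


# Figure 3: A'-C'-B' matches first-protein positions 0, 2, 1.
print(max_sequential_matches([0, 2, 1]))  # 2 of the 3 segments
print(max_sequential_matches([0, 1, 2]))  # 3: same topology
```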