Mallika Veeramalai, David Gilbert, Gabriel Valiente.
Abstract
BACKGROUND: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. We have previously described the basic mechanisms of our method for structure comparison based on the TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization; we call the resulting optimised method the advanced TOPS+ comparison method, or advTOPS+.
Year: 2010 PMID: 20236520 PMCID: PMC2858036 DOI: 10.1186/1471-2105-11-138
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. ROC curves for the PDB40 dataset. ROC curves for SCOP classes all-alpha, all-beta, alpha/beta and alpha+beta from the TOPS, TOPS+ and advanced TOPS+ methods on the PDB40 dataset.
ROC curve and F-measure analysis of structural homology for the PDB40 dataset.
| No. | SCOP Class | TOPS | TOPS+ | advTOPS+ |
|---|---|---|---|---|
| 1 | All alpha | 0.76/0.79 | 0.83/0.85 | 0.82/0.88 |
| 2 | All beta | 0.89/0.85 | 0.85/0.83 | 0.87/0.86 |
| 3 | Alpha/beta | 0.82/0.75 | 0.75/0.70 | 0.77/0.70 |
| 4 | Alpha+beta | 0.84/0.75 | 0.84/0.74 | 0.90/0.81 |
AUC/F-measure values for all the comparison methods for the PDB40 dataset.
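To make the two reported quantities concrete, the following is a minimal sketch of how an AUC and an F-measure could be computed from pairwise comparison scores. It is not the authors' evaluation code: the function names, the toy data, the decision threshold, and the convention that a higher score means "more similar" are all assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (homologous, non-homologous) pairs ranked correctly.

    Assumption: a higher score means the pair is more likely homologous.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_measure(scores, labels, threshold):
    """Harmonic mean of precision and recall at a given (hypothetical) threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only; these are not values from the PDB40 dataset.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False, False]
print(roc_auc(toy_scores, toy_labels))
print(f_measure(toy_scores, toy_labels, threshold=0.5))
```

The pairwise formulation of AUC is quadratic in the number of examples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on a dataset of this size.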
Figure 2. Clusters for the Chew-Kedem dataset. Chew-Kedem dataset clusters obtained from the advanced TOPS+ comparison method (for more information see the supplementary material page at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/optTOPSplus-results.html).
Biological significance of protein domain clusters for the Chew-Kedem dataset.
| Method | F-measure |
|---|---|
| SSAP | 0.966 |
| TOPS | 0.955 |
| TOPS+ | 0.931 |
| advTOPS+ | 0.985 |
F-measure of the clusters obtained from different protein structure comparison methods.
Figure 3. Protein domain 3D cartoon diagrams. 3D cartoon diagrams (top view) of the alpha-beta protein domains (a) d1aa9__, (b) d1ct9a1, and (c) the TIM barrel protein domain d6xia__. Beta-strands, alpha-helices and loops are colored yellow, red and green, respectively. Ligand molecules are indicated by spheres and dots (metals).
advTOPS+ comparison scores for the Chew-Kedem dataset.
| Protein Fold | Domain | SSE Ln | LCS PAT Ln | Adv TOPS+ Score | LCS SSE PATTERN |
|---|---|---|---|---|---|
| Alpha-beta | d1aa9__ | 23 | 20 | 0.49 | uEUhuEuhuEuhuEUhuEhu |
| Alpha-beta | d1gnp__ | 25 | 20 | 0.51 | uEUhuEhuEuhuuEUhuEhu |
| Alpha-beta | d6q21a_ | 21 | 16 | 0.59 | uEUhuEuhuEUhuEhu |
| Alpha-beta | d1qraa_ | 25 | 20 | 0.51 | ueUHueHueuHuueUHueHu |
| Alpha-beta | d5p21__ | 25 | 20 | 0.51 | ueUHueHueuHuueUHueHu |
| TIM-barrel | d6xia__ | 60 | 40 | 0.30 | uhuhuHuHueuHueuHuhuehhuHuHueuuuHuhuhuuhu |
| TIM-barrel | d2mnr_1 | 37 | 29 | 0.37 | uuuuHueuHueuHueuHueuHuuhuuuhu |
| TIM-barrel | d1chra1 | 41 | 31 | 0.35 | uuuHuHueuHueuHueuHuHueuHuhuuuhu |
| TIM-barrel | d4enl_1 | 55 | 35 | 0.37 | uhuhuHuueuuuHuhueuHuHuuhuuHuuHuHuhu |
advTOPS+ comparison scores for alpha-beta and TIM-barrel folds in the Chew-Kedem dataset.
Figure 4. TOPS+ representation of a protein domain. TOPS+ diagram and string representation of the protein domain 1fnb01. (a) TOPS+ diagram. (b) TOPS+ string. (c) 3D cartoon.
Identity and dissimilarity scoring matrices for TOPS+ diagrams.
| SSE | ISM: E | ISM: e | ISM: H | ISM: h | ISM: U | ISM: u | DSM: E | DSM: e | DSM: H | DSM: h | DSM: U | DSM: u |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 2 | 2 | 2 | 2 |
| e | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 2 | 2 | 2 |
| H | 1 | 1 | 0 | 1 | 1 | 1 | 2 | 2 | 0 | 1 | 2 | 2 |
| h | 1 | 1 | 1 | 0 | 1 | 1 | 2 | 2 | 1 | 0 | 2 | 2 |
| U | 1 | 1 | 1 | 1 | 0 | 1 | 2 | 2 | 2 | 2 | 0 | 1 |
| u | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 2 | 2 | 2 | 1 | 0 |
Identity Scoring Matrix (ISM) and Dissimilarity Scoring Matrix (DSM) of secondary structure elements used for matching TOPS+ diagrams.
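The two matrices can be encoded as simple lookup tables. The sketch below is illustrative only: it assumes the matrix columns follow the same order as the rows (E, e, H, h, U, u), which is consistent with the zero diagonal, and the helper `mismatch_cost` is a hypothetical position-wise sum over two equal-length strings, not the authors' matching procedure.

```python
SSE_SYMBOLS = "EeHhUu"  # assumed column order, inferred from the zero diagonal

ISM_ROWS = [
    [0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
]

DSM_ROWS = [
    [0, 1, 2, 2, 2, 2],
    [1, 0, 2, 2, 2, 2],
    [2, 2, 0, 1, 2, 2],
    [2, 2, 1, 0, 2, 2],
    [2, 2, 2, 2, 0, 1],
    [2, 2, 2, 2, 1, 0],
]


def as_lookup(rows):
    """Turn a 6x6 row list into a {(symbol_a, symbol_b): cost} dictionary."""
    return {
        (a, b): rows[i][j]
        for i, a in enumerate(SSE_SYMBOLS)
        for j, b in enumerate(SSE_SYMBOLS)
    }


ISM = as_lookup(ISM_ROWS)
DSM = as_lookup(DSM_ROWS)


def mismatch_cost(s, t, matrix):
    """Hypothetical helper: sum the per-position cost of two equal-length SSE strings."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(matrix[a, b] for a, b in zip(s, t))


# ISM treats every mismatch alike; DSM penalises a change of SSE type
# (E/e vs H/h vs U/u) more than a change of case within the same type.
print(mismatch_cost("EHu", "Hhu", ISM))
print(mismatch_cost("EHu", "Hhu", DSM))
```

Under this encoding, E against H costs 1 in the ISM but 2 in the DSM, while H against h costs 1 in both, which reflects the distinction the two matrices draw.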
Normalized similarity score between secondary structure elements.
| Absolute Differences | Equations | Description |
|---|---|---|
| | | total incoming arcs |
| | | total incoming arcs type_R |
| | | total incoming arcs type_L |
| | | total incoming arcs type_P |
| | | total incoming arcs type_A |
| | | total outgoing arcs |
| | | total outgoing arcs type_R |
| | | total outgoing arcs type_L |
| | | total outgoing arcs type_P |
| | | total outgoing arcs type_A |
| | | total ligand arcs |
Equations used for computing the normalized similarity score between secondary structure elements of TOPS+ strings.
Figure 5. Parameter optimization results for SCOP classes. Parameter optimization experimental results for SCOP classes all-alpha, all-beta, alpha/beta, alpha+beta, and all classes together. Each element in the matrix represents an AUC value for one of the 17 nC scores and the corresponding basic parameter setting. The x-axis represents the 1,134 basic parameters and the y-axis represents the 17 nC scores (1 through 17) as a block that is repeated five times, one block for each of the SCOP classes all-alpha, all-beta, alpha/beta, alpha+beta, and all classes together.
Structural homology of protein domains for the PDB40 dataset.
| SCOP Class | Hom | % | NonHom | % | Total | % |
|---|---|---|---|---|---|---|
| All alpha | 129 | 69 | 58 | 31 | 187 | 10 |
| All beta | 219 | 68 | 102 | 32 | 321 | 18 |
| Alpha/beta | 452 | 48 | 487 | 52 | 939 | 52 |
| Alpha+beta | 167 | 46 | 193 | 54 | 360 | 20 |
| Total | 967 | 54 | 840 | 46 | 1807 | 100 |
Superfamily homologous (Hom) and non-homologous (NonHom) statistics of the PDB40 dataset.
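The percentages in the table can be reproduced from the raw counts. The short sketch below is only an arithmetic check, assuming the Hom and NonHom percentages are taken within each class and the Total percentage is taken over the whole dataset; the function name `summarize` is hypothetical.

```python
# (homologous, non-homologous) counts per SCOP class, copied from the table above.
COUNTS = {
    "All alpha": (129, 58),
    "All beta": (219, 102),
    "Alpha/beta": (452, 487),
    "Alpha+beta": (167, 193),
}


def summarize(counts):
    """Return per-class (hom %, non-hom %, share of dataset %) and the grand total."""
    grand_total = sum(hom + non for hom, non in counts.values())
    rows = {}
    for name, (hom, non) in counts.items():
        total = hom + non
        rows[name] = (
            round(100 * hom / total),          # Hom % within the class
            round(100 * non / total),          # NonHom % within the class
            round(100 * total / grand_total),  # class share of the dataset
        )
    return rows, grand_total


rows, grand_total = summarize(COUNTS)
for name, percentages in rows.items():
    print(name, percentages)
print("Total", grand_total)
```

The rounded values agree with the table, which also shows that alpha/beta domains make up about half of the dataset.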