Ganesan Pugalenthi, Ke Tang, P N Suganthan, Saikat Chakrabarti.
Abstract
MOTIVATION: Various bioinformatics and machine learning techniques have been applied to identify sequence-conserved and functionally conserved residues in proteins. Although a few computational methods are available for predicting structurally conserved residues (SCRs) from protein structure, almost all of them require homologous structural information and structure-based alignments, which remain a bottleneck in protein structure comparison studies. In this work, we developed a neural network approach that identifies structurally important residues from a single protein structure without using homologous structural information or structural alignment.
Year: 2008 PMID: 19038986 PMCID: PMC2638999 DOI: 10.1093/bioinformatics/btn618
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Residue conservation between SCR and non-SCR residues
| Residue conservation (%) | No. of residues | No. of SCR |
|---|---|---|
| 30–40 | 31648 | 2518 |
| 41–50 | 15848 | 1976 |
| 51–60 | 5226 | 1122 |
| 61–70 | 1620 | 254 |
| 71–80 | 276 | 88 |
| 81–90 | 238 | 58 |
| 91–100 | 102 | 26 |
| Total | 54958 | 6042 |
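The share of SCRs in each conservation bin follows directly from the two count columns. The sketch below only re-derives percentages from the counts in the table above; the function and variable names are illustrative and do not come from the original work.

```python
# Rows copied from the table above:
# (residue conservation bin in %, no. of residues, no. of SCRs).
rows = [
    ("30-40", 31648, 2518),
    ("41-50", 15848, 1976),
    ("51-60", 5226, 1122),
    ("61-70", 1620, 254),
    ("71-80", 276, 88),
    ("81-90", 238, 58),
    ("91-100", 102, 26),
]


def scr_percentage(n_residues, n_scr):
    """Return the percentage of residues in a bin that are SCRs."""
    return 100.0 * n_scr / n_residues


# Percentage of SCRs per conservation bin.
percentages = {label: scr_percentage(n, s) for label, n, s in rows}

# Column sums, which should reproduce the "Total" row of the table.
total_residues = sum(n for _, n, _ in rows)
total_scr = sum(s for _, _, s in rows)

for label, pct in percentages.items():
    print(f"{label}%: {pct:.2f}% of residues are SCRs")
print(f"Total: {total_scr} of {total_residues} residues are SCRs")
```

The column sums match the "Total" row, so the table is internally consistent.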
Fig. 1. Distribution of spatial distances between pairs of SCRs. The spatial distance between two SCRs was calculated from the Cβ–Cβ atom coordinates supplied in the individual PDB (Berman et al., 2000) files.
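A Cβ–Cβ distance of the kind described in the caption is a Euclidean distance between two 3D coordinates. The following is a minimal sketch under stated assumptions: the coordinates are hypothetical placeholders rather than values from any PDB entry, the function names are illustrative, and the handling of glycine (which has no Cβ atom) is not stated in this record and is therefore left out.

```python
import math
from itertools import combinations


def cb_distance(coord_a, coord_b):
    """Euclidean distance (in the units of the input) between two 3D points."""
    return math.dist(coord_a, coord_b)


def pairwise_cb_distances(cb_coords):
    """Map each unordered residue pair to its C-beta to C-beta distance.

    cb_coords: dict of residue number -> (x, y, z) C-beta coordinate.
    """
    return {
        (i, j): cb_distance(cb_coords[i], cb_coords[j])
        for i, j in combinations(sorted(cb_coords), 2)
    }


# Hypothetical C-beta coordinates for three residues (NOT taken from a real structure).
example_coords = {
    10: (0.0, 0.0, 0.0),
    25: (3.0, 4.0, 0.0),
    40: (3.0, 4.0, 12.0),
}

distances = pairwise_cb_distances(example_coords)
for pair, d in distances.items():
    print(pair, round(d, 2))
```

Collecting these values over all SCR pairs in a structure and binning them would give a distribution of the kind the figure describes.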
Classification results achieved for the testing data using different feature subsets
| No. of features | Sensitivity (%) | Specificity (%) | PPV (%) | MCC |
|---|---|---|---|---|
| 5 | 92.59 | 85.95 | 86.83 | 0.787 |
| 8 | 90.47 | 91.30 | 91.23 | 0.818 |
| 10 | 92.19 | 91.63 | 91.68 | 0.836 |
| 50 | 91.03 | 93.52 | 93.35 | 0.846 |
| 100 | 91.72 | 92.73 | 92.66 | 0.845 |
| 150 | 92.06 | 92.57 | 92.53 | 0.846 |
| 200 | 91.00 | 93.20 | 93.05 | 0.842 |
| 212 | 92.82 | 92.50 | 92.52 | 0.852 |
Fig. 2. ROC curves were plotted using the fractions of TP and FP values obtained with the top 10 features and with all features.
Classification results achieved for the training data using 5-fold cross-validation on different feature subsets
| No. of features | Sensitivity (%) | Specificity (%) | PPV (%) | MCC |
|---|---|---|---|---|
| 5 | 93.81 (1.60) | 79.84 (3.37) | 82.70 (2.26) | 0.747 (0.014) |
| 8 | 92.09 (1.49) | 85.44 (2.77) | 86.68 (2.12) | 0.780 (0.012) |
| 10 | 94.34 (1.10) | 87.62 (2.24) | 88.62 (1.77) | 0.823 (0.012) |
| 50 | 94.17 (0.95) | 90.10 (0.99) | 90.54 (0.79) | 0.844 (0.006) |
| 100 | 93.41 (1.36) | 90.60 (1.23) | 90.94 (0.98) | 0.842 (0.008) |
| 150 | 92.68 (0.51) | 91.99 (0.70) | 92.07 (0.62) | 0.847 (0.006) |
| 200 | 92.68 (0.33) | 92.02 (0.41) | 92.08 (0.36) | 0.847 (0.004) |
| 212 | 93.03 (0.10) | 91.48 (0.13) | 91.61 (0.12) | 0.845 (0.002) |
Standard errors associated with the average sensitivity, specificity, PPV and MCC are given in parentheses.
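A mean and standard error over five cross-validation folds can be obtained as follows. This is a sketch only: it assumes the standard error is the sample standard deviation divided by the square root of the number of folds, which this record does not confirm, and the per-fold values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of per-fold scores."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 in the denominator), divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical per-fold sensitivities (%) from a 5-fold cross-validation run.
fold_sensitivity = [92.0, 94.0, 93.0, 95.0, 91.0]

mean, se = mean_and_standard_error(fold_sensitivity)
print(f"{mean:.2f} ({se:.2f})")
```

The printed form mirrors the "value (standard error)" layout used in the table.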
List of best performing features
| Feature | SCR related | SCR neighbors related | Structural feature | Sequence feature |
|---|---|---|---|---|
| Helix content in SCR | Yes | No | Yes | No |
| Strand content in SCR | Yes | No | Yes | No |
| Coil content in the SCR | Yes | No | Yes | No |
| Helix content in the spatial neighbor | No | Yes | Yes | No |
| Solvent accessibility in SCR | Yes | No | Yes | No |
| Hydrogen bonding information in SCR | Yes | No | Yes | No |
| Residue compactness in SCR | Yes | No | Yes | No |
| Residue compactness in the spatial neighbor | No | Yes | Yes | No |
| Leucine content in spatial neighbor | No | Yes | Yes | Yes |
| Cysteine content in SCR | Yes | No | No | Yes |
Fig. 3. Distribution of the 20 amino acid types within the spatial neighbors of SCRs. (a) Percentage of residues having at least one of each of the 20 amino acids within their spatial neighbors; (b) fraction of each amino acid within the spatial neighbors. White bars, with standard errors from five trials, show data obtained from randomly selected non-SCRs.
Evaluation of performance of different feature groups
| Feature group | Sensitivity using 5-fold CV (%) | Sensitivity without CV (%) | Specificity using 5-fold CV (%) | Specificity without CV (%) |
|---|---|---|---|---|
| Group 1 | 33.79 (10.03) | 30.22 | 89.57 (2.79) | 93.77 |
| Group 2 | 11.35 (3.46) | 7.55 | 97.85 (1.06) | 98.83 |
| Group 3 | 41.90 (9.90) | 41.31 | 90.41 (4.79) | 85.48 |
| Group 4 | 97.91 (0.26) | 97.38 | 79.11 (1.87) | 79.13 |
| Group 1+2 | 13.01 (3.66) | 10.76 | 98.94 (0.28) | 99.07 |
| Group 1+3 | 44.15 (8.53) | 32.17 | 93.12 (2.43) | 96.12 |
| Group 1+4 | 96.62 (0.36) | 96.46 | 85.57 (1.14) | 85.31 |
| Group 2+3 | 26.45 (6.88) | 12.45 | 96.79 (1.07) | 99.17 |
| Group 2+4 | 95.13 (0.61) | 94.97 | 88.25 (1.62) | 88.80 |
| Group 3+4 | 97.85 (0.26) | 97.25 | 79.48 (2.19) | 80.47 |
| Group 1+2+3 | 22.28 (4.36) | 14.86 | 98.54 (0.51) | 99.22 |
| Group 1+2+4 | 92.85 (0.22) | 93.11 | 91.89 (0.78) | 92.18 |
| Group 1+3+4 | 96.29 (0.63) | 96.19 | 85.67 (1.28) | 86.28 |
| Group 2+3+4 | 95.10 (0.65) | 94.74 | 88.28 (1.60) | 89.73 |
| All Groups | 93.03 (0.10) | 92.82 | 91.48 (0.13) | 92.50 |
Standard errors associated with the average sensitivity and specificity are given in parentheses. CV, cross-validation.
Fig. 4. Examples of successful prediction of SCRs. SCRs predicted by NCL-NNE are shown in purple. Predicted SCRs that are experimentally verified are shown as ball-and-stick models. (a) Wild-type CheY from Escherichia coli (PDB code: 3CHY); (b) serum RBP (PDB code: 1JYD) and (c) Cu–Zn superoxide dismutase (SOD) (PDB code: 2SOD). Regular secondary structures are colored blue (helix), green (strand) and yellow (loops).
Execution time for NCL-NNE method
| Protein PDB code | Chain identifier | Length | Execution time (in s) |
|---|---|---|---|
| 1BY5 | A | 698 | 74 |
| 1EZ0 | A | 504 | 57 |
| 1CPT | – | 412 | 43 |
| 1EZF | A | 323 | 39 |
| 1A7T | A | 227 | 31 |
| 1DOI | – | 128 | 25 |
| 2HPQ | P | 79 | 21 |