Huiling Chen, Huan-Xiang Zhou.
Abstract
Residues that form the hydrophobic core of a protein are critical for its stability. A number of approaches have been developed to classify residues as buried or exposed. To optimize the classification, we refined a suite of five methods over a large dataset and proposed a metamethod based on an ensemble average of the individual methods, which reaches a two-state classification accuracy of 80%. Many studies have suggested that hydrophobic core residues are likely sites of deleterious mutations, so we examined to what extent these sites can be predicted from the putative buried residues. Residues that were most confidently classified as buried were proposed as sites of deleterious mutations. This proposition was tested on six proteins for which sites of deleterious mutations had previously been identified by stability measurements or functional assays. Of the 130 residues predicted as sites of deleterious mutations, 104 (80%) were correct.
Year: 2005 PMID: 15937195 PMCID: PMC1142490 DOI: 10.1093/nar/gki633
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
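The ensemble-average metamethod described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each method emits a per-residue probability of being buried, weights all methods equally, and applies a hypothetical cutoff of 0.5 to the averaged score. The function names and toy scores are made up for illustration.

```python
from statistics import mean


def ensemble_classify(method_scores, threshold=0.5):
    """Average the methods' buried-probabilities residue by residue and
    return 'B' (buried) or 'E' (exposed) for each residue.

    Assumptions (hypothetical): every method outputs a probability that the
    residue is buried, all methods get equal weight, and `threshold` is the
    cutoff applied to the averaged score.
    """
    n = len(method_scores[0])
    if any(len(scores) != n for scores in method_scores):
        raise ValueError("all methods must score the same residues")
    labels = []
    for i in range(n):
        avg = mean(scores[i] for scores in method_scores)
        labels.append("B" if avg >= threshold else "E")
    return labels


def two_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy scores from three hypothetical methods for a four-residue chain.
scores = [
    [0.9, 0.2, 0.6, 0.4],
    [0.7, 0.1, 0.3, 0.7],
    [0.8, 0.3, 0.4, 0.6],
]
labels = ensemble_classify(scores)
print(labels, two_state_accuracy(labels, ["B", "E", "B", "B"]))
```

Averaging lets one method's confident call outweigh another's marginal one at the same residue; a weighted average would be a drop-in change to the `mean` line.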
Table 1. Accuracy (%) and correlation (in parentheses) of multiple-sequence predictions for solvent accessibility on a test set of 464 protein chains
| Test set | Chains | BS | MLR | DT | NN | SVM | WE |
|---|---|---|---|---|---|---|---|
| Small | 208 | 76.0 (0.48) | 75.5 (0.46) | 77.5 (0.52) | 79.6 (0.56) | 79.5 (0.56) | 80.1 (0.57) |
| Medium | 198 | 75.6 (0.51) | 76.4 (0.53) | 77.1 (0.54) | 79.2 (0.58) | 79.8 (0.60) | 80.2 (0.60) |
| Large | 58 | 74.7 (0.49) | 75.5 (0.51) | 76.3 (0.52) | 78.4 (0.57) | 78.9 (0.58) | 79.4 (0.59) |
| All | 464 | 75.5 (0.51) | 76.0 (0.52) | 77.0 (0.54) | 79.1 (0.58) | 79.5 (0.59) | 80.0 (0.60) |
Every protein residue was given a prediction, so the coverage was 1.
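The tables report two-state accuracy together with a correlation coefficient. This section does not define the coefficient; the sketch below assumes it is the Matthews correlation coefficient, a common choice for buried/exposed classification, and computes both quantities from confusion counts. At coverage 1, as here, every residue enters the counts. The function names and the 'B'/'E' labels are hypothetical.

```python
import math


def confusion_counts(predicted, observed, positive="B"):
    """Count true/false positives and negatives, treating `positive`
    (buried, labelled 'B' by assumption) as the positive class."""
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == positive:
            if o == positive:
                tp += 1
            else:
                fp += 1
        elif o == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_and_mcc(predicted, observed, positive="B"):
    """Two-state accuracy (%) and Matthews correlation coefficient."""
    tp, fp, tn, fn = confusion_counts(predicted, observed, positive)
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return accuracy, mcc


acc, mcc = accuracy_and_mcc(list("BBEEBE"), list("BEEEBB"))
print(f"accuracy = {acc:.1f}%, correlation = {mcc:.2f}")
```

Reporting the correlation alongside accuracy guards against a predictor that scores well simply by favoring the more common state.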
Accuracy (%) and correlation (in parentheses) of accessibility classification methods on CASP6 targets
| Test set | Chains | NN | SVM | WE | ROBETTA | PROFacc | SABLE | ACCpro |
|---|---|---|---|---|---|---|---|---|
| CM/FR(H) | 48 | 78.6 (0.57) | 78.9 (0.57) | 79.4 (0.58) | 74.5 (0.49) | 76.7 (0.54) | 78.4 (0.57) | 78.9 (0.58) |
| FR(A)/NF | 16 | 78.6 (0.56) | 77.7 (0.54) | 78.8 (0.56) | 65.3 (0.26) | 76.4 (0.52) | 77.2 (0.54) | 76.9 (0.53) |
| All | 64 | 78.6 (0.57) | 78.6 (0.57) | 79.2 (0.58) | 72.5 (0.44) | 76.6 (0.54) | 78.2 (0.57) | 78.5 (0.57) |
Figure 1. Relation between accuracy and coverage, as found for the 464-protein test set. Values of accuracy at coverage of 1 are listed in Table 1. Coverage of MLR, NN and SVM was controlled by the confidence threshold (higher confidence corresponds to lower coverage). The unanimity metamethod was based on predictions of MLR, NN and SVM at the same confidence threshold.
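Figure 1 trades coverage for accuracy by varying a confidence threshold, and the unanimity metamethod keeps only residues on which MLR, NN and SVM all make a confident, identical call. The caption does not say how confidence is computed, so this sketch uses a hypothetical confidence |2p - 1| derived from a per-residue buried probability p; the helper names and toy scores are likewise illustrative.

```python
def confident_predictions(probs, threshold):
    """Return {residue_index: 'B' or 'E'} for residues whose confidence
    reaches the threshold.

    Hypothetical confidence: |2p - 1|, where p is the method's probability
    that the residue is buried (0 at p = 0.5, 1 at p = 0 or p = 1).
    """
    preds = {}
    for i, p in enumerate(probs):
        if abs(2 * p - 1) >= threshold:
            preds[i] = "B" if p >= 0.5 else "E"
    return preds


def unanimity(pred_sets):
    """Keep residues that every method predicts, and only when all
    methods assign the same state."""
    common = set.intersection(*(set(p) for p in pred_sets))
    result = {}
    for i in sorted(common):
        states = {p[i] for p in pred_sets}
        if len(states) == 1:
            result[i] = states.pop()
    return result


def accuracy_and_coverage(preds, observed):
    """Accuracy (%) over predicted residues; coverage = predicted / total."""
    coverage = len(preds) / len(observed)
    if not preds:
        return float("nan"), coverage
    correct = sum(preds[i] == observed[i] for i in preds)
    return 100.0 * correct / len(preds), coverage


# Toy data (not from the paper): observed states and three methods' scores.
observed = list("BBEEBE")
mlr = confident_predictions([0.9, 0.7, 0.2, 0.4, 0.85, 0.1], 0.5)
nn = confident_predictions([0.95, 0.6, 0.1, 0.3, 0.2, 0.15], 0.5)
svm = confident_predictions([0.8, 0.8, 0.3, 0.6, 0.9, 0.05], 0.5)
una = unanimity([mlr, nn, svm])
print(una, accuracy_and_coverage(una, observed))
```

Raising `threshold` can only remove residues from each method's prediction set, which is why higher confidence corresponds to lower coverage; the unanimity set is never larger than any single method's set.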
Statistics of proteins with identified sites of deleterious mutations
| Protein | Real sites | Fraction buried | Predictions: MLR | Predictions: NN | Predictions: SVM | Predictions: Una | Accuracy (%): MLR | Accuracy (%): NN | Accuracy (%): SVM | Accuracy (%): Una |
|---|---|---|---|---|---|---|---|---|---|---|
| λ Repressor | 22 | 0.50 | 16 | 8 | 12 | 6 | 44 | 50 | 50 | 50 |
| HIV-1 | 42 | 0.74 | 22 | 7 | 12 | 6 | 50 | 14 | 25 | 0 |
| Snase | 32 | 0.78 | 23 | 28 | 35 | 17 | 78 | 57 | 57 | 77 |
| T4 lyso. | 60 | 0.77 | 31 | 28 | 31 | 16 | 84 | 82 | 84 | 94 |
| Gene V | 15 | 0.47 | 10 | 5 | 5 | 4 | 70 | 100 | 100 | 100 |
| LacI | 145 | 0.75 | 89 | 101 | 125 | 81 | 83 | 81 | 80 | 85 |
| Total | 316 | 0.72 | 191 | 177 | 220 | 130 | 75 | 74 | 73 | 80 |
Una, unanimity metamethod. Results of MLR, NN and SVM were obtained from predictions with a confidence threshold of 0.5.
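The Total row pools counts across proteins rather than averaging the per-protein percentages. The sketch below back-calculates the number of correct predictions for each protein as round(predictions × accuracy / 100), an approximation because the table reports only rounded percentages, and then pools them. Applied to the unanimity column, it recovers the 104 correct out of 130 predictions (80%) quoted in the abstract. `pooled_accuracy` is a hypothetical helper name.

```python
def pooled_accuracy(rows):
    """rows: (number of predictions, reported accuracy %) per protein.

    Correct counts are back-calculated as round(n * acc / 100); this is an
    approximation because the table only reports rounded percentages.
    Returns (total correct, total predictions, pooled accuracy %).
    """
    total_pred = sum(n for n, _ in rows)
    total_correct = sum(round(n * acc / 100) for n, acc in rows)
    return total_correct, total_pred, 100.0 * total_correct / total_pred


# Unanimity (Una) column of the table, in row order: λ Repressor, HIV-1,
# Snase, T4 lyso., Gene V, LacI.
una_rows = [(6, 50), (6, 0), (17, 77), (16, 94), (4, 100), (81, 85)]
correct, total, pooled = pooled_accuracy(una_rows)
print(correct, total, pooled)  # 104 130 80.0

# Unweighted mean of the per-protein percentages, for comparison.
print(sum(acc for _, acc in una_rows) / len(una_rows))
```

The unweighted mean comes out near 68%, well below the pooled 80%, because LacI contributes 81 of the 130 unanimity predictions.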
Figure 2. Details of the predictions for sites of deleterious mutations. Upper-case letters in each sequence denote buried residues (as determined by solvent accessibility calculated on the X-ray structure). Asterisks below each sequence denote sites of deleterious mutations determined experimentally. Vertical lines below the sequence are positioned at every tenth residue (if not already occupied by an asterisk). Bars above the sequence denote predictions by MLR (white), NN (gray) and SVM (black). A prediction by the unanimity metamethod is signified by the simultaneous presence of all three types of bars.
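The annotation convention in the Figure 2 caption can be generated mechanically. A minimal sketch, assuming 1-based residue positions and plain characters in place of the colored bars; `marker_line`, `unanimity_sites` and the toy sequence and site sets are hypothetical.

```python
def marker_line(length, sites):
    """Return the line printed under a sequence: '*' at experimentally
    identified sites (1-based), '|' at every tenth residue not already
    occupied by '*', and spaces elsewhere."""
    chars = [" "] * length
    for pos in range(10, length + 1, 10):
        chars[pos - 1] = "|"
    for pos in sites:
        if not 1 <= pos <= length:
            raise ValueError(f"site {pos} outside 1..{length}")
        chars[pos - 1] = "*"  # asterisks take precedence over tick marks
    return "".join(chars)


def unanimity_sites(mlr, nn, svm):
    """Positions carrying all three bars (MLR, NN and SVM), i.e. the
    unanimity metamethod's predicted sites."""
    return sorted(set(mlr) & set(nn) & set(svm))


# Toy 20-residue sequence: upper case marks buried residues.
sequence = "mkVLaeGIklAVlgeaRKvL"
una = unanimity_sites({3, 4, 8}, {3, 8, 12}, {3, 8, 20})
bars = "".join("#" if i + 1 in una else " " for i in range(len(sequence)))
print(bars)
print(sequence)
print(marker_line(len(sequence), {3, 8, 20}))
```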