François Ancien, Fabrizio Pucci, Maxime Godfroid, Marianne Rooman.
Abstract
The classification of human genetic variants into deleterious and neutral is a challenging issue, whose complexity is rooted in the large variety of biophysical mechanisms that can be responsible for disease conditions. For non-synonymous mutations in structured proteins, one of these mechanisms is the change in protein stability, which can lead to loss of protein structure or function. We developed a stability-driven knowledge-based classifier that uses protein structure, artificial neural networks and solvent accessibility-dependent combinations of statistical potentials to predict whether destabilizing or stabilizing mutations are disease-causing. Our predictor yields a balanced accuracy of 71% in cross validation. As expected, it has a very high positive predictive value of 89%: it predicts with high accuracy the subset of mutations that are deleterious because of stability issues, but is by construction unable to classify variants that are deleterious for other reasons. Its combination with an evolutionary-based predictor increases the balanced accuracy up to 75%, and allows more than 1/4 of the variants to be predicted with 95% positive predictive value. Our method, called SNPMuSiC, can be used with both experimental and modeled structures and compares favorably with other prediction tools on several independent test sets. It constitutes a step towards interpreting variant effects at the molecular scale. SNPMuSiC is freely available at https://soft.dezyme.com/.
Year: 2018 PMID: 29540703 PMCID: PMC5852127 DOI: 10.1038/s41598-018-22531-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of: (a) the artificial neural network and (b) the probabilistic neural network used in the classification of the variants.
Figure 2. Probability density distributions of deleterious mutations (red curve) and neutral mutations (blue curve) for: (a) the solvent accessibility A (0–100%) of the mutated residues; (b) the change in volume ΔV (in Å³) for residues with A ≤ 60%; (c) the change in volume ΔV for residues with A > 60%. In our conventions, mutations of smaller into larger residues have a positive ΔV value.
Figure 3. Probability density distributions of deleterious mutations (red curve) and neutral mutations (blue curve) for the changes in folding free energy ΔΔW (in kcal/mol) computed with the following statistical potentials: (a) a first distance potential; (b) a second distance potential; (c) the torsion angle and solvent accessibility potential. In our conventions, positive values correspond to destabilizing mutations.
Performance of the different prediction methods in mutation-based 5-fold cross validation on the learning set.
| Method | Sensitivity | Specificity | PPV | NPV | BACC | Threshold | AUROC |
|---|---|---|---|---|---|---|---|
| PoPMuSiC | 0.65 | 0.62 | 0.84 | 0.37 | 0.63 | 0.75 kcal/mol | 0.68 |
| HoTMuSiC | 0.59 | 0.65 | 0.84 | 0.34 | 0.62 | −1.8 °C | 0.66 |
| Solvent Accessibility | 0.71 | 0.66 | 0.86 | 0.42 | 0.68 | 18.0% | 0.72 |
| PNN | 0.69 | | 0.88 | 0.43 | 0.71 | — | 0.76 |
| ANN | 0.70 | | | 0.44 | 0.71 | 0.74 | 0.77 |
| Provean | | 0.58 | 0.86 | | 0.72 | −2.5 | 0.80 |
| SNPMuSiC | 0.79 | 0.71 | | 0.52 | | 0.66 | |
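Sensitivity is defined as TP/(TP + FN), specificity as TN/(TN + FP), positive predictive value (PPV) as TP/(TP + FP), and negative predictive value (NPV) as TN/(TN + FN), where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, with deleterious variants taken as the positive class. The scores and threshold values correspond to averages on the 5-fold cross-validation experiments. The values in bold indicate the highest scores in each category; the AUROC score in bold is statistically different from the other AUROC scores, as estimated by DeLong’s test.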
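The rates reported in this table can be computed directly from confusion-matrix counts. The following is a minimal Python sketch, assuming the standard definitions of the four rates, deleterious variants as the positive class, and balanced accuracy (BACC) taken as the mean of sensitivity and specificity. The class name, function name and counts are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; 'positive' is assumed to mean deleterious."""
    tp: int  # deleterious variants predicted deleterious
    fn: int  # deleterious variants predicted neutral
    tn: int  # neutral variants predicted neutral
    fp: int  # neutral variants predicted deleterious


def classification_rates(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV, NPV and balanced accuracy."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    # Assumed definition: BACC is the mean of sensitivity and specificity.
    bacc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "bacc": bacc,
    }


# Made-up counts, for illustration only.
example = classification_rates(Confusion(tp=70, fn=30, tn=60, fp=40))
```

Because PPV and NPV depend on the ratio of deleterious to neutral variants in the dataset, while sensitivity and specificity do not, a set dominated by deleterious variants can show a high PPV together with a low NPV.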
Figure 4. Probability density distributions of deleterious mutations (red curve) and neutral mutations (blue curve) for (a) the change in folding free energy ΔΔG computed by PoPMuSiC (Eq. (3)) (in kcal/mol), (b) the Provean score [3,4], and (c) the pathogenicity index I computed by the ANN model (Eq. (8)).
Figure 5. Probability density distribution of deleterious mutations (red curve) and neutral mutations (blue curve) for the pathological index (Eq. (11)) computed by SNPMuSiC. The distribution curves in the high-confidence intervals, which lie on either side of the two vertical lines, are depicted on a white background.
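The idea of restricting predictions to high-confidence intervals can be sketched as follows. This is a hedged Python illustration, not the published procedure: it assumes that a higher index means more likely deleterious, that the high-confidence region is everything at or below a lower cutoff or at or above an upper cutoff, and that labels are 1 for deleterious and 0 for neutral. The function name, cutoffs and data are hypothetical.

```python
import math


def high_confidence_summary(scores, labels, low_cut, high_cut):
    """Return (coverage, ppv) for predictions outside the [low_cut, high_cut] band.

    coverage: fraction of variants whose score falls in a high-confidence region.
    ppv: fraction of truly deleterious variants (label 1) among those with
         score >= high_cut; NaN if no variant reaches high_cut.
    """
    confident = [(s, y) for s, y in zip(scores, labels)
                 if s <= low_cut or s >= high_cut]
    coverage = len(confident) / len(scores)
    called_deleterious = [y for s, y in confident if s >= high_cut]
    if called_deleterious:
        ppv = sum(called_deleterious) / len(called_deleterious)
    else:
        ppv = math.nan
    return coverage, ppv


# Hypothetical scores, labels and cutoffs, for illustration only.
scores = [0.05, 0.2, 0.5, 0.6, 0.9, 0.95, 0.97, 0.4]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
coverage, ppv = high_confidence_summary(scores, labels, low_cut=0.1, high_cut=0.9)
```

Moving the two cutoffs further apart trades coverage for reliability: fewer variants receive a call, but those that do are further from the region where the two distributions overlap.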
Comparison of the performances of the different predictors on the test sets S and S.
| Method | Sensitivity | Specificity | PPV | NPV | BACC | AUROC |
|---|---|---|---|---|---|---|
| First test set | | | | | | |
| VIPUR | 0.87 | 0.55 | | 0.62 | | |
| Polyphen-2 | | 0.33 | 0.78 | 0.71 | 0.64 | 0.70 |
| Provean | 0.94 | 0.39 | 0.80 | | 0.67 | 0.72 |
| SNPMuSiC | 0.76 | | | 0.49 | 0.68 | |
| Second test set | | | | | | |
| Polyphen-2 | | 0.30 | 0.77 | | 0.63 | |
| Provean | 0.94 | 0.34 | 0.77 | 0.70 | 0.64 | 0.70 |
| SNPMuSiC | 0.78 | | | 0.51 | | 0.70 |
These datasets have no overlap with the training sets of Polyphen-2 and SNPMuSiC. For VIPUR, the scores for S are those labelled as being cross-validated in [12], while for the S set, no cross-validated scores are available. For the sequence-based method Provean, the dataset overlap has not been considered, although it plays a role in the identification of the threshold values. See Table 1 for further details. The values in bold indicate the highest scores in each category; the AUROC scores that are not significantly different from the highest score (as estimated by a DeLong test P-value ≥ 0.05) are also in bold.
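The AUROC values in the table have a simple interpretation: the AUROC equals the probability that a randomly chosen deleterious variant receives a higher score than a randomly chosen neutral one, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. It assumes that higher scores indicate deleteriousness, uses made-up scores, and does not implement DeLong's test.

```python
def auroc(deleterious_scores, neutral_scores):
    """AUROC as the fraction of (deleterious, neutral) pairs ranked correctly.

    A pair counts 1 if the deleterious score is higher, 0.5 on a tie,
    and 0 otherwise. Runs in O(n * m), which is fine for small examples.
    """
    if not deleterious_scores or not neutral_scores:
        raise ValueError("both score lists must be non-empty")
    total = 0.0
    for d in deleterious_scores:
        for n in neutral_scores:
            if d > n:
                total += 1.0
            elif d == n:
                total += 0.5
    return total / (len(deleterious_scores) * len(neutral_scores))


# Made-up scores, for illustration only.
value = auroc([0.9, 0.8, 0.4], [0.5, 0.3])
```

An AUROC of 0.5 corresponds to a score that does not separate the two classes, and 1.0 to perfect separation. Unlike sensitivity or PPV, it does not depend on the choice of a threshold.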
List of the 10 mutations from the S dataset which are annotated as disease-causing and are predicted as the most stabilizing (i.e. with the most negative ΔΔG values) by PoPMuSiC.
| Protein | Chain | Mutation | Biophysical effect | PoPMuSiC | HoTMuSiC | ANN | SNPMuSiC |
|---|---|---|---|---|---|---|---|
| 1m6i | A | E493V | Increase of NADH affinity | −2.24 | 1.7 | 1.00 | 4.8 |
| 2wzb | A | D163V | Decrease of enzymatic activity | −1.62 | 1.0 | 1.00 | 4.1 |
| 3hcn | A | T283I | Decrease of enzymatic activity | −1.46 | 1.1 | 0.92 | 1.9 |
| 2izz | A | G206W | Loss of function | −1.35 | 2.8 | 1.00 | 1.4 |
| 2nt0 | A | D399Y | Decrease of enzymatic activity | −1.31 | 0.8 | 1.00 | 4.4 |
| 3f9m | A | G385V | Decrease of enzymatic activity | −1.23 | −0.2 | 0.84 | 1.3 |
| 4do4 | A | R329W | Decrease of enzymatic activity | −0.99 | 0.3 | 0.99 | 2.3 |
| 1aly | A | G227V | Loss of ligand binding | −0.96 | 1.8 | 0.99 | 1.7 |
| 4az3 | A | S23Y | Loss of phosphorylation | −0.95 | 1.8 | 1.00 | 0.7 |
| 2nt0 | A | D380H | Decrease of enzymatic activity | −0.86 | 0.3 | 1.00 | 3.1 |
All variants are predicted as neutral by the PoPMuSiC- and HoTMuSiC-based classifiers while they are predicted as deleterious by ANN and SNPMuSiC.
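The statement that these variants are called neutral by the stability-only classifiers but deleterious by ANN and SNPMuSiC can be checked against the cross-validation thresholds of the performance table (0.75 kcal/mol, −1.8 °C, 0.74 and 0.66). The Python sketch below rests on several assumptions that the text does not state explicitly: the four numeric columns are, in order, the PoPMuSiC, HoTMuSiC, ANN and SNPMuSiC scores; a variant is called deleterious when the PoPMuSiC, ANN or SNPMuSiC score exceeds its threshold, or when the HoTMuSiC score falls below its threshold; and the averaged cross-validation thresholds apply to this dataset.

```python
# Assumed thresholds, taken from the cross-validation performance table.
THRESHOLDS = {"PoPMuSiC": 0.75, "HoTMuSiC": -1.8, "ANN": 0.74, "SNPMuSiC": 0.66}

# (protein, mutation, PoPMuSiC, HoTMuSiC, ANN, SNPMuSiC); column order is assumed.
ROWS = [
    ("1m6i", "E493V", -2.24, 1.7, 1.00, 4.8),
    ("2wzb", "D163V", -1.62, 1.0, 1.00, 4.1),
    ("3hcn", "T283I", -1.46, 1.1, 0.92, 1.9),
    ("2izz", "G206W", -1.35, 2.8, 1.00, 1.4),
    ("2nt0", "D399Y", -1.31, 0.8, 1.00, 4.4),
    ("3f9m", "G385V", -1.23, -0.2, 0.84, 1.3),
    ("4do4", "R329W", -0.99, 0.3, 0.99, 2.3),
    ("1aly", "G227V", -0.96, 1.8, 0.99, 1.7),
    ("4az3", "S23Y", -0.95, 1.8, 1.00, 0.7),
    ("2nt0", "D380H", -0.86, 0.3, 1.00, 3.1),
]


def classify(popmusic, hotmusic, ann, snpmusic):
    """Apply the assumed decision rules and return one call per method."""
    def call(is_deleterious):
        return "deleterious" if is_deleterious else "neutral"

    return {
        "PoPMuSiC": call(popmusic > THRESHOLDS["PoPMuSiC"]),
        # Assumed direction: a strongly negative score counts as destabilizing.
        "HoTMuSiC": call(hotmusic < THRESHOLDS["HoTMuSiC"]),
        "ANN": call(ann > THRESHOLDS["ANN"]),
        "SNPMuSiC": call(snpmusic > THRESHOLDS["SNPMuSiC"]),
    }


calls = {(protein, mutation): classify(*values)
         for protein, mutation, *values in ROWS}
```

Under these assumptions every row yields the same pattern, neutral for the two stability-only scores and deleterious for ANN and SNPMuSiC, which matches the note under the table.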