Lukas Gerasimavicius, Xin Liu, Joseph A Marsh.
Abstract
Attempts at using protein structures to identify disease-causing mutations have been dominated by the idea that most pathogenic mutations are disruptive at a structural level. Therefore, computational stability predictors, which assess whether a mutation is likely to be stabilising or destabilising to protein structure, have been commonly used when evaluating new candidate disease variants, despite not having been developed specifically for this purpose. We therefore tested 13 different stability predictors for their ability to discriminate between pathogenic and putatively benign missense variants. We find that one method, FoldX, significantly outperforms all other predictors in the identification of disease variants. Moreover, we demonstrate that employing predicted absolute energy change scores improves performance of nearly all predictors in distinguishing pathogenic from benign variants. Importantly, however, we observe that the utility of computational stability predictors is highly heterogeneous across different proteins, and that they are all inferior to the best performing variant effect predictors for identifying pathogenic mutations. We suggest that this is largely due to alternate molecular mechanisms other than protein destabilisation underlying many pathogenic mutations. Thus, better ways of incorporating protein structural information and molecular mechanisms into computational variant effect predictors will be required for improved disease variant prioritisation.
Year: 2020 PMID: 32958805 PMCID: PMC7506547 DOI: 10.1038/s41598-020-72404-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Protein stability predictors used in this study.
| Predictor | Link | Description |
|---|---|---|
| DynaMut | | Consensus predictor which uses outputs from Bio3D, ENCoM and DUET to assess the impact of mutations on protein stability. Due to its nature, the predictor leverages multiple methodologies, such as normal mode analysis and statistical potentials |
| ENCoM | No longer available as a stand-alone server, but available from DynaMut | A prediction method based on normal mode analysis that relates changes in vibrational entropy upon mutation to changes in protein stability. Uses coarse-grained protein representations that account for residue properties |
| DUET | | A machine-learnt consensus predictor that leverages output from SDM and mCSM, integrated using support vector machines |
| SDM | No longer available as a stand-alone server (succeeded by the SDM2 webserver), but available from DynaMut | A knowledge-based energy potential, derived using evolutionary environment-specific residue substitution propensities |
| FoldX | | A full-atom force field consisting of physics-based interaction and entropic terms, parametrised on empirical training data. Allows predictions to be run easily on multi-chain assemblies |
| Rosetta | | Rosetta macromolecular modelling software suite, which includes algorithms for stability impact prediction. Driven by a scoring function that is a linear combination of statistical and empirical energy terms. Highly modular and customisable |
| INPS3D | | INPS3D builds upon its sequence and physicochemical conservation-based predecessor INPS, and employs structure-derived features such as solvent accessibility and local energy differences. The predictor is trained by employing support vector regression |
| mCSM | | A machine-learned approach that evaluates structural signature changes imparted by mutations. Derives a graph representation of physicochemical and geometric residue environment features |
| SDM2 | | Updated version of SDM, a knowledge-based potential, which uses environment-specific residue substitution tables, information on residue conformation and interactions, as well as packing density and residue depth, to assess protein stability changes |
| CUPSAT | | Prediction method that uses a residue torsion angle potential and an environment-specific atom pair potential (an improvement upon amino acid potentials) to assess stability changes |
| PoPMuSiC | | A potential consisting of 13 statistical terms, the volume difference between the wild-type and mutant residues, as well as the solvent accessibility of the original residue to differentiate core and surface substitutions |
| MAESTRO | | Combines 3 statistical scoring functions of solvent exposure and residue pair distances, as well as 6 protein properties, in a machine-learning framework to derive a consensus stability impact prediction |
| I-Mutant 3.0 | | A machine-learning derived method that takes into account the spatial environment of the mutated residue in terms of surrounding residue types and surface accessibility |
Figure 1. Using ΔΔG values from protein stability predictors to discriminate between pathogenic and putatively benign missense variants. Receiver operating characteristic (ROC) curves are plotted for each predictor, with the classification performance being presented next to its name in the form of area under the curve (AUC). (A) ROC curves for classification performance using the native ΔΔG value scale for each predictor. (B) ROC curves for predictor classification performance when using absolute ΔΔG values. The figure was generated in R v3.6.3 (https://www.r-project.org) using ggplot2 v3.3.0 (https://ggplot2.tidyverse.org/), both freely available.
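The comparison between panels (A) and (B) can be illustrated with a minimal sketch. It assumes a sign convention in which positive ΔΔG means destabilising, and the ΔΔG values and labels below are invented for illustration only; they are not data from the study. The helper `roc_auc` is a hypothetical name that computes the AUC as the probability that a pathogenic variant outscores a benign one.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random pathogenic variant (label 1)
    scores higher than a random benign one (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical ddG values (kcal/mol); 1 = pathogenic, 0 = putatively benign.
# Two pathogenic variants are predicted to be strongly *stabilising*.
ddg = [3.1, 2.4, -2.8, -1.9, 0.2, 0.4, -0.3, 0.9, -0.6, 1.2]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

auc_native = roc_auc(ddg, labels)                      # native scale
auc_absolute = roc_auc([abs(x) for x in ddg], labels)  # magnitude only

print(f"native AUC:   {auc_native:.2f}")    # 0.48
print(f"absolute AUC: {auc_absolute:.2f}")  # 0.80
```

In this toy example the strongly stabilising pathogenic variants rank below the benign ones on the native scale but above them once only the magnitude of the predicted change is considered, which is the effect the absolute ΔΔG transformation is meant to capture.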
Best stability predictor classification thresholds according to the ‘distance-to-corner’ metric.
| Predictor | Absolute ΔΔG threshold | False positive rate (95% confidence interval) | True positive rate (95% confidence interval) |
|---|---|---|---|
| FoldX | 1.578 | 0.339–0.357 | 0.591–0.624 |
| INPS3D | 0.674 | 0.389–0.409 | 0.595–0.628 |
| Rosetta | 1.886 | 0.390–0.409 | 0.572–0.605 |
| PoPMuSiC | 0.795 | 0.417–0.437 | 0.584–0.618 |
| CUPSAT | 1.455 | 0.415–0.434 | 0.549–0.583 |
| MAESTRO | 0.321 | 0.418–0.437 | 0.544–0.578 |
| SDM | 1.025 | 0.350–0.370 | 0.477–0.511 |
| SDM2 | 0.875 | 0.365–0.385 | 0.510–0.544 |
| mCSM | 0.889 | 0.433–0.453 | 0.542–0.575 |
| DUET | 0.803 | 0.400–0.421 | 0.548–0.582 |
| I-Mutant 3.0 | 0.915 | 0.405–0.424 | 0.545–0.578 |
| ENCoM | 0.221 | 0.415–0.436 | 0.598–0.632 |
| DynaMut | 0.476 | 0.446–0.467 | 0.570–0.605 |
The performance metrics and their 95% confidence intervals were derived from 2000 bootstraps of the data.
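As a rough illustration of how such thresholds can be chosen, the sketch below assumes that the ‘distance-to-corner’ metric is the Euclidean distance from a point on the ROC curve to the ideal corner (false positive rate 0, true positive rate 1), and that variants with an absolute ΔΔG at or above the threshold are called pathogenic. The function names and the data are hypothetical and are not taken from the study.

```python
import math

def rates_at(threshold, scores, labels):
    """Return (FPR, TPR) when scores >= threshold are called pathogenic."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos

def distance_to_corner_threshold(scores, labels):
    """Scan every observed score as a candidate threshold and keep the one
    whose ROC point lies closest to the corner (FPR = 0, TPR = 1)."""
    best = None
    for t in sorted(set(scores)):
        fpr, tpr = rates_at(t, scores, labels)
        dist = math.hypot(fpr, 1.0 - tpr)
        if best is None or dist < best[0]:
            best = (dist, t, fpr, tpr)
    _, threshold, fpr, tpr = best
    return threshold, fpr, tpr

# Hypothetical absolute ddG values (kcal/mol); 1 = pathogenic, 0 = benign.
abs_ddg = [3.1, 2.4, 2.8, 1.9, 0.2, 0.4, 0.3, 0.9, 0.6, 1.2]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

threshold, fpr, tpr = distance_to_corner_threshold(abs_ddg, labels)
print(threshold, fpr, tpr)  # 1.9 0.0 0.8
```

Repeating this selection on resampled copies of the data would give intervals for the false and true positive rates of the kind reported in the table.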
Figure 2. The heterogeneity of protein-specific missense variant classification performance. All the stability predictors exhibit very high degrees of heterogeneity in their protein-specific performance, as measured by the ROC AUC on a per-protein basis. Absolute ΔΔG values were used during protein-specific tool assessment. The mean performance of each predictor is indicated by a red dot and shown numerically below the plot. Boxes inside the violins illustrate the interquartile range (IQR) of the protein-specific performance points, with the whiskers measuring 1.5 IQR. Boxplot outliers are designated by black dots. The figure was generated in R v3.6.3 (https://www.r-project.org) using ggplot2 v3.3.0 (https://ggplot2.tidyverse.org), both freely available.
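A per-protein assessment of this kind can be sketched as follows. The protein identifiers, ΔΔG values and labels are invented, the function names are hypothetical, and the rule of skipping proteins that lack either pathogenic or benign variants is an assumption made here so that the AUC is defined; the inclusion criteria used in the study may differ.

```python
from collections import defaultdict
from statistics import mean

def roc_auc(scores, labels):
    """Probability that a pathogenic variant outscores a benign one (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_protein_auc(variants):
    """variants: iterable of (protein_id, ddg, label) tuples.
    Returns {protein_id: AUC computed on absolute ddG}."""
    grouped = defaultdict(list)
    for protein, ddg, label in variants:
        grouped[protein].append((abs(ddg), label))
    result = {}
    for protein, rows in grouped.items():
        labels = [y for _, y in rows]
        # AUC is undefined unless both classes are present for this protein.
        if 0 < sum(labels) < len(labels):
            result[protein] = roc_auc([s for s, _ in rows], labels)
    return result

# Hypothetical variants: (protein, ddG in kcal/mol, 1 = pathogenic / 0 = benign).
variants = [
    ("P1", 2.5, 1), ("P1", -3.0, 1), ("P1", 0.3, 0), ("P1", -0.5, 0),
    ("P2", 0.1, 1), ("P2", 0.4, 1), ("P2", 1.5, 0), ("P2", 0.2, 0),
    ("P3", 1.0, 1),  # only one class, so P3 is skipped
]

aucs = per_protein_auc(variants)
mean_auc = mean(aucs.values())
print(aucs)      # {'P1': 1.0, 'P2': 0.25}
print(mean_auc)  # 0.625
```

The two toy proteins give very different AUCs despite using the same score, which mirrors on a small scale the heterogeneity that the violin plots summarise.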
Figure 3. Performance comparison of protein stability and variant effect predictors for identifying pathogenic variants. Error bars indicate the 95% confidence interval of the ROC AUC as derived through bootstrapping. Stability predictors are shown in red, while other variant effect prediction methods are shown in green. Absolute ΔΔG values were used for stability-based methods. The figure was generated in R v3.6.3 (https://www.r-project.org) using ggplot2 v3.3.0 (https://ggplot2.tidyverse.org), both freely available.
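One common way to obtain such error bars is a percentile bootstrap, sketched below. The record does not state which bootstrap variant was used, so the percentile method, the resampling of individual variants with replacement, the fixed seed, and the function names are all assumptions; the scores and labels are invented.

```python
import random

def roc_auc(scores, labels):
    """Probability that a pathogenic variant outscores a benign one (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, seed=0):
    """Approximate 95% percentile interval of the AUC from n_boot resamples."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        # Redraw if the resample happens to contain only one class.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc([scores[i] for i in idx], ys))
    aucs.sort()
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[min(n_boot - 1, int(0.975 * n_boot))]
    return lower, upper

# Hypothetical absolute ddG values (kcal/mol); 1 = pathogenic, 0 = benign.
abs_ddg = [3.1, 2.4, 2.8, 1.9, 0.2, 0.4, 0.3, 0.9, 0.6, 1.2]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

point = roc_auc(abs_ddg, labels)
lower, upper = bootstrap_auc_ci(abs_ddg, labels)
print(f"AUC = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With only ten toy variants the interval is very wide; with thousands of variants, as in a real benchmark, it narrows enough to compare predictors.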