Douglas E V Pires, David B Ascher, Tom L Blundell.
Abstract
MOTIVATION: Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein-protein and protein-nucleic acid interactions.
Year: 2013 PMID: 24281696 PMCID: PMC3904523 DOI: 10.1093/bioinformatics/btt691
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Predicting the impact of mutations with mCSM. (a) Highlights the key steps of the methodology and how the main components of the signatures are computed. As an example, we use the published crystal structure of p53 (PDB ID: 2OCJ) and the mutation R282W, discussed further in Section 3.3. Given a mutation site in a wild-type protein, its structural environment is extracted and the distance patterns among its atoms are summarized in the mCSM signature. To account for the change in atom types caused by the mutation, a pharmacophore count is performed for the wild-type and the mutant residue. The changes in pharmacophore counts are then appended to the signature, which is used to train and test predictive models. Eight pharmacophore types are considered: hydrophobic (green), positive (blue), negative (red), hydrogen acceptor (red), hydrogen donor (blue), aromatic (green), sulphur (yellow) and neutral (white). (b) Summarizes the mCSM predictive workflow, which can be divided into the following steps: gathering and preprocessing the thermodynamic and structural data, extracting the residue environments, signature calculation and noise reduction, supervised learning, and mutation impact prediction and validation.
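The signature construction described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It assumes a simplified input: each atom in the residue environment is given as a set of pharmacophore labels plus Cartesian coordinates, and the distance part of the signature is taken to be a cumulative count of atom pairs per pharmacophore-class pair within each distance cutoff, followed by the mutant-minus-wild-type change in pharmacophore counts. The function names, label strings, cutoff values and toy atoms are hypothetical; the actual mCSM atom typing, cutoff range and step size are not given in this excerpt.

```python
import itertools
import math
from collections import Counter

# The eight pharmacophore classes listed in the Fig. 1 caption (label strings are hypothetical).
PHARMACOPHORES = ("hydrophobic", "positive", "negative", "acceptor",
                  "donor", "aromatic", "sulphur", "neutral")

# All unordered class pairs, same-class pairs included: 8 + 28 = 36 pairs.
CLASS_PAIRS = list(itertools.combinations_with_replacement(PHARMACOPHORES, 2))


def _pair_key(label_a, label_b):
    """Order a class pair the same way combinations_with_replacement does."""
    return tuple(sorted((label_a, label_b), key=PHARMACOPHORES.index))


def distance_signature(atoms, cutoffs):
    """Cumulative atom-pair counts per (class pair, cutoff).

    atoms: list of (set of pharmacophore labels, (x, y, z)) for the environment.
    cutoffs: increasing distance thresholds, in angstroms.
    """
    counts = dict.fromkeys(((pair, c) for pair in CLASS_PAIRS for c in cutoffs), 0)
    for (labels_a, xyz_a), (labels_b, xyz_b) in itertools.combinations(atoms, 2):
        d = math.dist(xyz_a, xyz_b)
        # Each class pair is counted at most once per atom pair (simplifying assumption).
        keys = {_pair_key(la, lb) for la in labels_a for lb in labels_b}
        for key in keys:
            for c in cutoffs:
                if d <= c:  # cumulative: a pair within c is also within every larger cutoff
                    counts[(key, c)] += 1
    return [counts[(pair, c)] for pair in CLASS_PAIRS for c in cutoffs]


def pharmacophore_change(wild_type_atoms, mutant_atoms):
    """Mutant minus wild-type count of atoms carrying each pharmacophore class."""
    wt = Counter(label for labels in wild_type_atoms for label in labels)
    mt = Counter(label for labels in mutant_atoms for label in labels)
    return [mt[p] - wt[p] for p in PHARMACOPHORES]


def mcsm_like_signature(environment, wild_type_atoms, mutant_atoms, cutoffs):
    """Distance-pattern signature with the pharmacophore change appended."""
    return (distance_signature(environment, cutoffs)
            + pharmacophore_change(wild_type_atoms, mutant_atoms))


# Toy example: three environment atoms and made-up label sets for an R -> W change.
environment = [
    ({"positive", "donor"}, (0.0, 0.0, 0.0)),
    ({"acceptor"}, (3.0, 0.0, 0.0)),
    ({"hydrophobic"}, (0.0, 4.0, 0.0)),
]
wild_type = [{"positive", "donor"}, {"positive", "donor"}, {"hydrophobic"}]
mutant = [{"aromatic", "hydrophobic"}, {"aromatic"}, {"donor"}]
signature = mcsm_like_signature(environment, wild_type, mutant, cutoffs=[2.0, 4.0, 6.0])
print(len(signature))  # 36 class pairs x 3 cutoffs + 8 pharmacophore changes = 116
```

Counting each class pair at most once per atom pair keeps atoms with several labels (for example, donor and acceptor) from inflating a single bin; whether mCSM makes the same choice is not stated here.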
Summary of the data sets used, the experiments performed and the validation procedure applied to each
| Experiment | Data set | Task | Validation |
|---|---|---|---|
| Protein stability change | S2648 | Regression | 5-fold cross-validation |
| Protein stability change | S1925 | Regression and classification | 20-fold cross-validation |
| Protein stability change | S350/S309/S87 | Regression | Blind test (trained on S2298) |
| Protein–nucleic acid affinity | ProNIT | Regression and classification | 10-fold cross-validation |
| Protein–protein affinity | SKEMPI | Regression and classification | 10-fold cross-validation |
| Protein–protein affinity | BeAtMuSiC | Regression | 10-fold cross-validation |
| Disease-related mutations | KIN | Classification | 20-fold cross-validation |
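The 5-, 10- and 20-fold cross-validation schemes listed in the table follow the usual k-fold pattern, sketched below in Python under the assumption of a plain random split of mutation indices. How the original study assigned mutations to folds (for example, whether mutations in the same protein were kept together) is not stated in this excerpt, and `k_fold_splits` is a hypothetical helper, not part of any mCSM code.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for plain random k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield sorted(train), sorted(test)
        start += size


# Example: 5-fold split over the 2648 mutations of S2648 (indices only).
splits = list(k_fold_splits(2648, 5))
print([len(test) for _, test in splits])  # [530, 530, 530, 529, 529]
```

Each mutation appears in exactly one test fold. A random split of this kind can place mutations from the same protein in both training and test folds, which tends to give more optimistic estimates than a protein-level split.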
Fig. 2. Regression results for the mCSM signature-based predictive models trained using Gaussian process regression for different tasks. From left to right: stability change prediction (S1925 data set), protein–protein affinity change (SKEMPI data set) and protein–DNA affinity change (ProNIT data set). For each data set, Pearson's correlation coefficient (ρ) and the standard error (σ) are shown in the top-left corner of the graph.
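The two statistics reported in each Fig. 2 panel can be computed as in the following Python sketch. Pearson's ρ is standard; the caption does not define σ, so the sketch assumes it is the standard error of the estimate from a least-squares line fitted to experimental versus predicted values, sqrt(SS_res / (n - 2)). Treat that definition, the function names and the toy values as assumptions.

```python
import math


def _check(predicted, experimental):
    if len(predicted) != len(experimental) or len(predicted) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")


def pearson_r(predicted, experimental):
    """Pearson's correlation coefficient between predictions and measurements."""
    _check(predicted, experimental)
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(experimental) / n
    sxy = sum((p - mx) * (e - my) for p, e in zip(predicted, experimental))
    sxx = sum((p - mx) ** 2 for p in predicted)
    syy = sum((e - my) ** 2 for e in experimental)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(predicted, experimental):
    """sqrt(SS_res / (n - 2)) for the fit experimental ~ a + b * predicted (assumed meaning of sigma)."""
    _check(predicted, experimental)
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(experimental) / n
    sxx = sum((p - mx) ** 2 for p in predicted)
    sxy = sum((p - mx) * (e - my) for p, e in zip(predicted, experimental))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((e - (intercept + slope * p)) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(ss_res / (n - 2))


# Toy values in kcal/mol, invented for illustration only.
pred = [1.0, 2.0, 3.0, 4.0]
expt = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(pred, expt), 3))  # 0.8
print(round(standard_error_of_estimate(pred, expt), 3))  # 0.949
```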
Comparative regression experiments using the S350 data set
| Method | Number of predictions | Pearson's coefficient | Standard error (kcal/mol) |
|---|---|---|---|
| Automute | 315 | 0.46/0.45/0.45 | 1.43/1.46/1.99 |
| Cupsat | 346 | 0.37/0.35/0.50 | 1.91/1.96/2.14 |
| Dmutant | | 0.48/0.47/0.57 | 1.81/1.87/2.31 |
| Eris | 334 | 0.35/0.34/0.49 | 4.12/4.28/3.91 |
| I-Mutant-2.0 | 346 | 0.29/0.27/0.27 | 1.65/1.69/2.39 |
| PoPMuSiC-1.0 | | 0.62/0.63/0.70 | 1.24/1.25/1.66 |
| PoPMuSiC-2.0 | | 0.67/0.67/0.71 | 1.16/1.19/1.67 |
| SDM | | 0.52/0.53/0.63 | 1.80/1.81/2.11 |
Note: Results obtained directly from Worth . Bold values highlight the best-performing metrics.
The three values given per column correspond, respectively, to the whole validation set of 350 mutants; the 309 mutants for which all predictors returned a prediction; and the 87 mutants, a subset of those 309, whose experimental ΔΔG is >2 kcal/mol.