Ezequiel Juritz, Maria Silvina Fornasari, Pier Luigi Martelli, Piero Fariselli, Rita Casadio, Gustavo Parisi.
Abstract
BACKGROUND: Non-synonymous coding SNPs (nsSNPs) that are associated with disease can also be related to alterations in protein stability. Computational methods are available to predict the effect of single amino acid substitutions (SASs) on protein stability based on a single folded structure. However, the native state of a protein is not unique; it is better represented by the ensemble of its conformers in dynamic equilibrium. The maintenance of the ensemble is essential for protein function. In this work we investigated how protein conformational diversity can affect the discrimination of neutral and disease-related SASs based on protein stability estimations. For this purpose, we used 119 proteins with 803 associated SASs, 60% of which are disease related. Each protein was associated with its corresponding set of available conformers as found in the Protein Conformational Database (PCDB). Our dataset contains proteins with different extents of conformational diversity, totaling 1023 conformers.
Year: 2012 PMID: 22759653 PMCID: PMC3303731 DOI: 10.1186/1471-2164-13-S4-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Distribution of maximum RMSD between conformers corresponding to the 119 proteins in the dataset used in this study.
Figure 2. Distribution of ΔASA (Å²) for substituted positions derived from the analysis of the conformational ensemble for each of the 119 proteins in the dataset.
Figure 3. Distributions of maximum and minimum values of ΔΔG obtained for the different conformers for each protein in the dataset. Disease and neutral SASs are shown as separate distributions.
Figure 4. Distribution of maximum Δ(ΔΔG) (kcal/mol) for the 803 SASs studied. Bars represent the frequency and dots the cumulative frequency.
Figure 5. Scatter plot of average differences of Δ(ΔΔG) (kcal/mol) among conformers as a function of averaged ΔASA (Å²), with their respective error bars. Dots represent the average over ΔASA intervals of 10 units and are plotted at the centre of each interval.
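The binning described in the Figure 5 caption can be sketched as follows. This is only an illustration under stated assumptions: the function name `bin_by_dasa`, the input layout (one `(ΔASA, Δ(ΔΔG))` pair per SAS) and the sample values are hypothetical and do not come from the study, and the error bars are omitted.

```python
from collections import defaultdict
from statistics import mean


def bin_by_dasa(points, width=10.0):
    """Average Δ(ΔΔG) inside fixed-width ΔASA intervals.

    points: iterable of (dasa, dddg) pairs, one per substitution.
    Returns a list of (interval_centre, mean_dddg, n_points), sorted by centre.
    """
    bins = defaultdict(list)
    for dasa, dddg in points:
        # Interval index: 0 for [0, 10), 1 for [10, 20), and so on.
        bins[int(dasa // width)].append(dddg)
    # Each interval is reported at its centre, as in the caption.
    return [((i + 0.5) * width, mean(v), len(v)) for i, v in sorted(bins.items())]


# Hypothetical (ΔASA in Å², Δ(ΔΔG) in kcal/mol) pairs, for illustration only.
points = [(2.0, 0.4), (8.0, 0.6), (14.0, 1.0), (35.0, 2.0), (39.0, 3.0)]
binned = bin_by_dasa(points)
for centre, avg, n in binned:
    print(f"centre={centre:5.1f}  mean={avg:.2f}  n={n}")
```

Intervals that contain no substitution are simply absent from the result, so gaps along the ΔASA axis stay visible instead of being filled with zeros.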
Scoring the capability of discriminating between disease-related and neutral SASs on different sets of conformers.
| Conformer set | MCC | Accuracy | Specificity | Sensitivity |
|---|---|---|---|---|
| Global* | 0.19 | 0.54 | 0.76 | 0.44 |
| Minimum° | 0.23 | 0.54 | 0.85 | 0.34 |
| Maximum° | 0.36 | 0.68 | 0.69 | 0.68 |
| Average° | 0.25 | 0.62 | 0.80 | 0.50 |
| Random^ | 0.25 | 0.60 | 0.68 | 0.55 |
*Global uses all data, neglecting the correspondence between proteins and their conformers. °Minimum, Maximum and Average characterize the values of ΔΔG per SAS, taking into account the conformers for each protein. ^Random MCC was calculated as an average over 50 independent selections in which one ΔΔG value was taken at random per SAS and per protein. In all cases the threshold is ±2 kcal/mol. The difference between the MCC values for Maximum and Random is significant at P-value = 0.02. For the definition of the different scoring indexes, see text.
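As a rough illustration of how such scores can be obtained, the sketch below summarizes the per-conformer ΔΔG values of each SAS and computes the four indexes from the resulting confusion matrix. Several points are assumptions and not taken from the record: the standard textbook definitions of MCC, accuracy, specificity and sensitivity are used; a SAS is predicted as disease related when its ΔΔG lies outside the ±2 kcal/mol interval; Minimum and Maximum are chosen by magnitude |ΔΔG|; and all names and ΔΔG values are hypothetical.

```python
from math import sqrt
from statistics import mean


def summarize(ddg_values):
    """Minimum, maximum (both by magnitude, an assumption) and average ΔΔG."""
    return {
        "minimum": min(ddg_values, key=abs),
        "maximum": max(ddg_values, key=abs),
        "average": mean(ddg_values),
    }


def predict_disease(ddg, threshold=2.0):
    """Assumed rule: disease related if ΔΔG falls outside [-threshold, +threshold]."""
    return abs(ddg) > threshold


def scores(labels, predictions):
    """Standard confusion-matrix indexes; 'disease' is the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l and p)
    tn = sum(1 for l, p in pairs if not l and not p)
    fp = sum(1 for l, p in pairs if not l and p)
    fn = sum(1 for l, p in pairs if l and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / len(pairs),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: (is_disease, ΔΔG in kcal/mol for each conformer).
sas_data = [
    (True, [0.5, 2.6]),
    (True, [3.1, 2.4]),
    (False, [0.3, 1.1]),
    (False, [1.8, 2.2]),
]

labels = [is_disease for is_disease, _ in sas_data]
results = {}
for key in ("minimum", "maximum", "average"):
    preds = [predict_disease(summarize(ddg)[key]) for _, ddg in sas_data]
    results[key] = scores(labels, preds)
    print(key, {k: round(v, 2) for k, v in results[key].items()})
```

In this toy data the Minimum summary favours specificity and the Maximum summary favours sensitivity, because taking the largest ΔΔG magnitude per SAS pushes more substitutions past the threshold.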
Figure 6. Example showing how conformational diversity affects ΔΔG estimation and MCC values. The protein fructose-bisphosphate aldolase B (P05062) is shown in the example. Two conformers were found for this protein, with an RMSD of 1.3 Å. At the top of the figure are cartoon representations of the two conformers, with the mutated amino acids shown as sticks (in yellow). Bar graphs represent ΔΔG values (red for disease-associated mutations and blue for neutral mutations), followed by the corresponding MCC for each conformer. Black arrows indicate wrong predictions based on the reference interval of ±2 kcal/mol. Because structural changes produce variations in ΔΔG, the different conformers have different MCC values.