Ning Zhang¹, Yuting Chen¹, Feiyang Zhao¹, Qing Yang¹, Franco L Simonetti², Minghui Li¹.
Abstract
Protein-DNA interactions play important roles in regulating many vital cellular processes, including transcription, translation, DNA replication and recombination. Sequence variants in DNA-binding proteins that alter protein-DNA interactions may significantly perturb or completely abolish function, potentially leading to disease. A mechanistic understanding of how variants affect protein-DNA interactions is therefore a persistent need. To address this need, we introduce a new computational method, PremPDI, that predicts the effect of a single missense mutation in the protein on the protein-DNA interaction and calculates the quantitative change in binding affinity. PremPDI is based on molecular mechanics force fields and fast side-chain optimization algorithms, with parameters optimized on experimental data for 219 mutations from 49 protein-DNA complexes. Predicted and experimental values agree well, with a Pearson correlation coefficient of 0.71 and a root-mean-square error of 0.86 kcal mol⁻¹. The PremPDI server can map mutations onto a protein-DNA complex structure, calculate the associated changes in binding affinity, determine whether a mutation is deleterious, and produce a mutant structural model for download. PremPDI can be applied to many tasks, such as identifying potentially damaging mutations in cancer and other diseases. PremPDI is available at http://lilab.jysw.suda.edu.cn/research/PremPDI/.
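The abstract describes two outputs: a quantitative binding affinity change and a deleterious/neutral call. The minimal sketch below shows how these outputs relate. It assumes that ΔΔG is the mutant binding free energy minus the wild-type binding free energy, so positive values mean weaker binding. It also uses a hypothetical cutoff of 1.0 kcal mol⁻¹. This abstract states neither the sign convention nor the cutoff, and the function names are illustrative. They are not part of the PremPDI server.

```python
# Hypothetical cutoff in kcal/mol; the abstract does not state the value PremPDI uses.
DELETERIOUS_CUTOFF = 1.0


def binding_ddg(dg_wild_type: float, dg_mutant: float) -> float:
    """Binding free energy change on mutation: ddG = dG(mutant) - dG(wild type).

    Assumed sign convention: a positive ddG means the mutant binds DNA more weakly.
    """
    return dg_mutant - dg_wild_type


def is_deleterious(ddg: float, cutoff: float = DELETERIOUS_CUTOFF) -> bool:
    """Call a mutation deleterious when it weakens binding by at least `cutoff`."""
    return ddg >= cutoff


if __name__ == "__main__":
    # Illustrative numbers only, not PremPDI output.
    ddg = binding_ddg(dg_wild_type=-12.0, dg_mutant=-10.5)
    print(f"ddG = {ddg:.2f} kcal/mol, deleterious: {is_deleterious(ddg)}")
```

Keeping the numeric prediction separate from the binary call means the cutoff can be changed without recomputing any energies.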
Year: 2018 PMID: 30533007 PMCID: PMC6303081 DOI: 10.1371/journal.pcbi.1006615
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
The p-value and importance of each feature in the energy function for binding affinity change, as determined by multiple linear regression (MLR).
| Feature | P-value | Importance |
|---|---|---|
| | 5.74e-09 | 0.47 |
| ΔΔ | 6.34e-08 | 0.41 |
| | 5.37e-06 | 0.33 |
| | 1.98e-07 | 0.28 |
| | 2.33e-06 | 0.27 |
| | 9.42e-04 | 0.26 |
| | 4.30e-04 | 0.18 |
| Δ | 3.03e-03 | 0.17 |
| | 2.28e-03 | 0.17 |
All features contribute significantly to the model (p-value < 0.01, t-test). Importance is given by the standardized MLR coefficients.
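MLR determines the weights of the features in the table above. The pure-Python sketch below shows how standardized coefficients of the kind reported in the Importance column are obtained. It is an illustration, not the authors' code. The method z-scores every feature column and the target, then solves the least-squares normal equations. After standardization, the intercept vanishes and each weight is expressed in standard deviations of the target per standard deviation of the feature. The helper names `solve` and `standardized_coefficients` are hypothetical. The t-test p-values are not reproduced here.

```python
from statistics import mean, pstdev


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def standardized_coefficients(features, target):
    """Least-squares fit of target on features after z-scoring every column.

    `features` is a list of columns (one list of values per feature).
    The returned weights are the standardized (beta) coefficients.
    """
    def z(col):
        mu, sd = mean(col), pstdev(col)
        return [(v - mu) / sd for v in col]

    zx = [z(col) for col in features]
    zy = z(target)
    k = len(zx)
    # Normal equations (X^T X) w = X^T y on standardized data; no intercept needed.
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Toy data: target built as 3*x1 + x2 (illustrative only).
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
    y = [3 * a + b for a, b in zip(x1, x2)]
    print(standardized_coefficients([x1, x2], y))
```

Features can be ranked by the absolute value of these coefficients, and that ordering does not depend on each feature's units. A ranking by raw coefficients would depend on those units.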
Fig 1. PremPDI performance.
Pearson correlation coefficients between experimental and calculated changes in binding free energy (ΔΔG) for the “Prempdi” training/test set (a), for two types of cross-validation, CV1 and CV2 (b), and for “leave-one-complex-out” cross-validation, CV3 (c). ROC curves for predicting deleterious mutations in the “Prempdi” set (d).
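The “leave-one-complex-out” scheme (CV3) groups mutations by the protein-DNA complex they come from. Each fold therefore tests on a complex whose mutations were never used for fitting. The sketch below covers only the fold assignment. It assumes each mutation carries a complex identifier, and the labels "A", "B", "C" are placeholders. Refitting and predicting within each fold are omitted. CV1 and CV2 are not described in this section, so they are not sketched.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_complex_out(
    complex_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out complex, train indices, test indices), one fold per complex.

    All mutations from the same protein-DNA complex fall into the same test fold,
    so the model never trains on mutations from the complex it is tested on.
    """
    by_complex: Dict[str, List[int]] = defaultdict(list)
    for i, cid in enumerate(complex_ids):
        by_complex[cid].append(i)
    for cid in sorted(by_complex):
        test = by_complex[cid]
        held_out = set(test)
        train = [i for i in range(len(complex_ids)) if i not in held_out]
        yield cid, train, test


if __name__ == "__main__":
    # Five mutations drawn from three placeholder complexes.
    for held_out, train, test in leave_one_complex_out(["A", "A", "B", "C", "B"]):
        print(held_out, train, test)
```

With the 49 complexes of the training data, this scheme would produce 49 folds of unequal size.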
PremPDI performance.
| Test set | Method | R | RMSE (kcal mol⁻¹) | Slope |
|---|---|---|---|---|
| Prempdi | PremPDI | 0.71 | 0.86 | 1 |
| | PremPDI (CV1) | 0.68 | 0.90 | 0.94 |
| | PremPDI (CV2) | 0.68 | 0.90 | 0.95 |
| | PremPDI (CV3) | 0.63 | 0.95 | 0.90 |
| Alanine-scanning mutations | PremPDI | 0.68 | 0.87 | 0.97 |
| | PremPDI (CV3) | 0.58 | 0.96 | 0.87 |
| Non-alanine-scanning mutations | PremPDI | 0.64 | 0.81 | 0.88 |
| | PremPDI (CV3) | 0.58 | 0.88 | 0.72 |
| Interfacial mutations | PremPDI | 0.71 | 0.86 | 1.01 |
| | PremPDI (CV3) | 0.64 | 0.95 | 0.89 |
| Non-interfacial mutations | PremPDI | 0.69 | 0.85 | 0.98 |
| | PremPDI (CV3) | 0.59 | 0.95 | 0.91 |
R: Pearson correlation coefficient between experimental and predicted ΔΔG values. RMSE: root-mean-square error. Slope: slope of the regression line between experimental and predicted ΔΔG values. All correlation coefficients are statistically significantly different from zero (P-value << 0.01). CV1 and CV2 results for the “Alanine-scanning mutations”, “Non-alanine-scanning mutations”, “Interfacial mutations” and “Non-interfacial mutations” test sets are shown in S5 Table.
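The sketch below computes the three columns of the table above from paired ΔΔG values. The caption does not say which variable is regressed on which to obtain the slope. The sketch assumes that experimental ΔΔG is regressed on predicted ΔΔG. This choice is consistent with the slope of exactly 1 reported for the Prempdi training set, because an ordinary least-squares fit with an intercept always yields a slope of 1 when the observed values are regressed on the fitted values. The function name `evaluate` is hypothetical.

```python
from math import sqrt
from statistics import mean


def evaluate(experimental, predicted):
    """Return (Pearson R, RMSE, slope) for paired ddG values in kcal/mol.

    Assumption: the slope comes from an ordinary least-squares fit of
    experimental ddG (y) on predicted ddG (x).
    """
    mx, my = mean(predicted), mean(experimental)
    sxy = sum((p - mx) * (e - my) for p, e in zip(predicted, experimental))
    sxx = sum((p - mx) ** 2 for p in predicted)
    syy = sum((e - my) ** 2 for e in experimental)
    r = sxy / sqrt(sxx * syy)
    rmse = sqrt(mean((p - e) ** 2 for p, e in zip(predicted, experimental)))
    slope = sxy / sxx
    return r, rmse, slope


if __name__ == "__main__":
    # Toy values only, not data from the study.
    print(evaluate([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

R and the slope measure different properties. A predictor that is perfectly correlated with experiment but has a systematic offset still gives R = 1 and slope = 1, while its RMSE equals the offset.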
Comparison of method performance on different test sets.
| Test set | Training set | Method | R | RMSE (kcal mol⁻¹) | AUC-ROC | AUC-PR | MCC |
|---|---|---|---|---|---|---|---|
| P.O.M | Prempdi | PremPDI | 0.80 | 0.81 | 0.88 | 0.87 | 0.54 |
| | Mcsm | mCSM | 0.76 | 0.95 | 0.82 | 0.79 | 0.50 |
| P.O.S | Prempdi | PremPDI | 0.68 | 0.63 | 0.88 | 0.81 | 0.52 |
| | Sampdi | SAMPDI | 0.39 | 0.80 | 0.66 | 0.53 | 0.27 |
| P.D.M | Prempdi | PremPDI (CV3) | 0.51 | 0.97 | 0.78 | 0.72 | 0.54 |
| | P.O.M | PremPDI (Ind) | 0.51 | 1 | 0.77 | 0.72 | 0.41 |
| | Mcsm | mCSM | 0.54 | 1.17 | 0.69 | 0.65 | 0.28 |
| P.D.S.I | Prempdi | PremPDI (CV3) | 0.70 | 1.10 | 0.85 | 0.82 | 0.62 |
| | Prempdi − P.D.S.I | PremPDI (Ind) | 0.74 | 1.08 | 0.85 | 0.83 | 0.68 |
| | Sampdi | SAMPDI | 0.53 | 1.32 | 0.79 | 0.71 | 0.35 |
R: Pearson correlation coefficient between experimental and predicted ΔΔG values. RMSE: root-mean-square error. AUC-ROC: area under the ROC curve. AUC-PR: area under the precision-recall curve. MCC: Matthews correlation coefficient. All correlation coefficients are statistically significantly different from zero (p-value < 0.01). The training and test sets are described in S1 Table. Nine mutations in the P.D.S.I test set have no SAMPDI scores, so they were excluded from the comparison.
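AUC-ROC and MCC can both be computed from predicted ΔΔG scores and deleterious/neutral labels. The sketch below uses the rank (Mann-Whitney) formulation of AUC-ROC and the standard confusion-matrix formula for MCC. The function names are hypothetical. Binary labels would come from thresholding ΔΔG at a cutoff that this section does not specify. AUC-PR is not covered.

```python
from math import sqrt


def roc_auc(scores, labels):
    """AUC-ROC: probability that a random deleterious mutation (label 1) scores
    higher than a random neutral one (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(predicted, labels):
    """Matthews correlation coefficient for binary predictions (1 = deleterious).

    Returns 0.0 when a confusion-matrix margin is empty (a common convention).
    """
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy scores and labels only, not data from the study.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    print(mcc([1, 0, 1, 0], [1, 1, 0, 0]))
```

AUC-ROC does not depend on any cutoff on the predicted scores, whereas MCC is evaluated at a single cutoff. For this reason the table can report different rankings of the methods on the two measures.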
Fig 2. Assessment of classification performance in distinguishing deleterious from neutral mutations.
ROC curves for the PremPDI, mCSM-NA and SAMPDI methods applied to different training and test sets. Further details are given in Table 3.
Fig 3. Top left: the entry page of the PremPDI server. Top right: the third step, selecting mutations, with the wild-type residue at the mutated site (R124) shown in the 3D viewer. Bottom: the final results table and the alignment of homologous binding sites.