Ning Zhang, Haoyu Lu, Yuting Chen, Zefeng Zhu, Qing Yang, Shuqin Wang, Minghui Li.
Abstract
Protein-RNA interactions are crucial for many cellular processes, such as protein synthesis and regulation of gene expression. Missense mutations that alter protein-RNA interactions may contribute to the pathogenesis of many diseases. Here, we introduce a new computational method, PremPRI, which predicts the effects of single mutations occurring in RNA-binding proteins on protein-RNA interactions by quantitatively calculating the resulting changes in binding affinity. The multiple linear regression scoring function of PremPRI is composed of three sequence-based and eight structure-based features and is parameterized on 248 mutations from 50 protein-RNA complexes. Our model shows good agreement between calculated and experimental binding affinity changes, with a Pearson correlation coefficient of 0.72 and a corresponding root-mean-square error of 0.76 kcal·mol−1, and it outperforms three other available methods. PremPRI can be used to find functionally important variants, understand the molecular mechanisms, and design new inhibitors of protein-RNA interactions.
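The abstract specifies only that the scoring function is a multiple linear regression over 11 features. As a rough illustration of that model class, and not a reproduction of PremPRI, the sketch below fits ΔΔG ≈ b0 + Σ b_k·x_k by ordinary least squares, solving the normal equations in pure Python. The function names and the toy feature matrix are hypothetical; PremPRI's actual features and coefficients are not given in this record.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk.

    Hypothetical helper: builds the normal equations (A^T A) b = A^T y, where A is
    X with a leading column of ones, and solves them by Gauss-Jordan elimination
    with partial pivoting. Returns [b0, b1, ..., bk].
    """
    A = [[1.0, *map(float, row)] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    # The left block is now the identity; the solution is the last column.
    return [M[i][p] for i in range(p)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Score one mutation: b0 + sum_k b_k * x_k."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))
```

In practice a numerical library's least-squares solver would be the usual choice; the hand-written elimination only keeps the example dependency-free.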
Keywords: Mutation; Protein–RNA interaction; binding affinity change; computational approach
Year: 2020 PMID: 32756481 PMCID: PMC7432928 DOI: 10.3390/ijms21155560
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Pearson correlation coefficients between experimental and calculated changes in binding affinity for PremPRI (a) trained and tested on the S248 dataset, (b) under two types of cross-validation (CV1 and CV2) and (c) under leave-one-complex-out validation (CV3).
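Leave-one-complex-out validation (CV3) holds out every mutation of one complex at a time, so the model is never trained on the complex it is tested on; with the 50 complexes in S248 this gives 50 folds. A minimal sketch of the split, assuming each mutation is tagged with an identifier for its complex (the function name and the identifiers in any usage are hypothetical placeholders):

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_complex_out(
    complex_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_complex, train_indices, test_indices), one fold per complex.

    All mutations from the held-out complex go to the test set together,
    so no mutation from that complex appears in the training set.
    """
    by_complex: Dict[str, List[int]] = defaultdict(list)
    for idx, cid in enumerate(complex_ids):
        by_complex[cid].append(idx)
    # dicts keep insertion order, so folds follow first appearance of each complex.
    for cid, test_idx in by_complex.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(complex_ids)) if i not in held_out]
        yield cid, train_idx, test_idx
```

Predictions from each held-out fold would then typically be pooled over all 248 mutations before computing R and RMSE.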
Performance of PremPRI trained and tested on the S248 dataset and under three types of cross-validation.
| Method | R | RMSE | Slope |
|---|---|---|---|
| PremPRI (trained and tested on S248) | 0.72 | 0.76 | 1.00 |
| CV1 | 0.68 | 0.80 | 0.94 |
| CV2 | 0.68 | 0.80 | 0.95 |
| CV3 (leave-one-complex-out) | 0.61 | 0.87 | 0.89 |
R: Pearson correlation coefficient between experimental and predicted ΔΔG values. RMSE (kcal·mol−1): root-mean-square error. Slope: the slope of the regression line between experimental and predicted ΔΔG values. All correlation coefficients are statistically significantly different from zero (p-value < 0.01).
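The three metrics in the footnote can be computed as below. One assumption: the source does not say which variable is regressed on which for the slope, so this sketch regresses experimental on predicted ΔΔG. Under that convention, an ordinary least-squares model scored on its own training data always yields a slope of exactly 1, which matches the 1.00 reported for PremPRI trained and tested on S248. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def evaluate(experimental: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R, RMSE, slope) for experimental vs. predicted ddG values.

    slope: regression of experimental on predicted ddG (assumed convention),
    i.e. cov(exp, pred) / var(pred).
    """
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in R and slope.
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_e * var_p)
    rmse = math.sqrt(sum((e - p) ** 2 for e, p in zip(experimental, predicted)) / n)
    slope = cov / var_p
    return r, rmse, slope
```

Note that R and the slope are unaffected by a constant offset in the predictions, while RMSE is not, which is why the table reports all three.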
Comparison of method performance on the S248 dataset.
| Method | R | RMSE | AUC-ROC | AUC-PR | MCC |
|---|---|---|---|---|---|
| PremPRI (CV3) | 0.61 | 0.87 | 0.76 | 0.76 | 0.45 |
|  | 0.24* | 5.41 | 0.56* | 0.60 | 0.22 |
|  | 0.20* | 1.57 | 0.53* | 0.59 | 0.24 |
|  | - | - | 0.58* | 0.61 | 0.26 |
R: Pearson correlation coefficient. RMSE (kcal·mol−1): root-mean-square error. AUC-ROC: area under the ROC curve. AUC-PR: area under the precision-recall curve. MCC: maximal Matthews correlation coefficient. All correlation coefficients are statistically significantly different from zero (p-value < 0.01). * indicates a statistically significant difference between PremPRI and the other method for the marked R or AUC-ROC value (p-value < 0.01; the Hittner2003 and DeLong tests are used to compare correlation coefficients and AUC values, respectively).
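The footnote calls MCC "maximal", which suggests the best Matthews correlation coefficient over all classification thresholds on the predicted score. The sketch below implements that reading; it assumes, as for the ROC curves in Figure 2, that positives are highly decreasing mutations (experimental ΔΔG ≥ 1 kcal·mol−1) labeled 1, and that a higher predicted ΔΔG means a more likely positive. Function names are illustrative.

```python
import math
from typing import Sequence


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def maximal_mcc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest MCC over thresholds t, predicting positive when score >= t.

    labels: 1 for a highly decreasing mutation, 0 otherwise (assumed convention).
    scores: predicted ddG values; each distinct value is tried as a threshold.
    """
    best = -1.0
    for t in sorted(set(scores)):
        tp = fp = tn = fn = 0
        for y, s in zip(labels, scores):
            if s >= t:  # predicted positive
                if y:
                    tp += 1
                else:
                    fp += 1
            elif y:  # predicted negative, actually positive
                fn += 1
            else:
                tn += 1
        best = max(best, mcc(tp, fp, tn, fn))
    return best
```

Trying every distinct score as a threshold is quadratic in the number of mutations, which is negligible for 248 mutations.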
Figure 2. Performance of three methods, PremPRI, mCSM-NA and FoldX, applied to the S248 dataset. The leave-one-complex-out validation (CV3) results of PremPRI are used. (a) ROC curves for predicting highly decreasing mutations; S248 contains 124 highly decreasing mutations (ΔΔG ≥ 1 kcal·mol−1). (b) Pearson correlation coefficients between predicted and experimental ΔΔG values for mutations located at the protein–RNA binding interface and outside it (noninterface). Only correlation coefficients significantly different from zero are shown (p-value < 0.01, t-test).
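Panel (a) treats the task as binary classification: a mutation is a positive if its experimental ΔΔG is at least 1 kcal·mol−1. The area under the resulting ROC curve equals the probability that a randomly chosen positive receives a higher predicted ΔΔG than a randomly chosen negative, with ties counted as one half. A minimal pairwise sketch of that identity follows; the function name is illustrative, and it does not implement the DeLong test used to compare AUC values.

```python
from typing import Sequence


def roc_auc(
    experimental_ddg: Sequence[float],
    predicted_ddg: Sequence[float],
    cutoff: float = 1.0,
) -> float:
    """ROC AUC for separating highly decreasing mutations (experimental ddG >= cutoff).

    Uses the Mann-Whitney identity: the fraction of (positive, negative) pairs
    in which the positive has the higher predicted ddG, ties counting one half.
    """
    pos = [p for e, p in zip(experimental_ddg, predicted_ddg) if e >= cutoff]
    neg = [p for e, p in zip(experimental_ddg, predicted_ddg) if e < cutoff]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative mutation")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For S248, with 124 positives and 124 negatives, this is about 15,000 pair comparisons, so the quadratic loop is not a concern.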
Figure 3. Left corner: the entry page of the PremPRI server. Right corner: the third step, selecting mutations, for which three options are provided: “Specify One or More Mutations Manually”, “Upload Mutation List” and “Alanine Scanning for Each Chain” (see also Figure S3). Bottom: the final results (see also Figure S4). “Processing time” refers to the running time of a job, excluding time spent waiting in the queue.