Thanh Binh Nguyen, Yoochan Myung, Alex G C de Sá, Douglas E V Pires, David B Ascher.
Abstract
While protein-nucleic acid interactions are pivotal for many crucial biological processes, limited experimental data has made the development of computational approaches to characterise these interactions a challenge. Consequently, most approaches to understand the effects of missense mutations on protein-nucleic acid affinity have focused on single-point mutations and have shown limited performance on independent data sets. To overcome this, we have curated the largest dataset of experimentally measured effects of mutations on nucleic acid binding affinity to date, encompassing 856 single-point mutations and 141 multiple-point mutations across 155 experimentally solved complexes. This was used in combination with an optimized version of our graph-based signatures to develop mmCSM-NA (http://biosig.unimelb.edu.au/mmcsm_na), the first scalable method capable of quantitatively and accurately predicting the effects of multiple-point mutations on nucleic acid binding affinities. mmCSM-NA obtained a Pearson's correlation of up to 0.67 (RMSE of 1.06 kcal/mol) on single-point mutations under cross-validation, and up to 0.65 on independent non-redundant datasets of multiple-point mutations (RMSE of 1.12 kcal/mol), outperforming similar tools. mmCSM-NA is freely available as an easy-to-use web-server and API. We believe it will be an invaluable tool to shed light on the role of mutations affecting protein-nucleic acid interactions in diseases.
Year: 2021 PMID: 34805992 PMCID: PMC8600011 DOI: 10.1093/nargab/lqab109
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. mmCSM–NA workflow and application on real-world data. The method relies on graph-based structural signatures that model distance patterns in the wild-type residue environment. In addition, the effect of mutations was evaluated using protein dynamics and interaction networks. Complementary information, including information from the mutated residues and the predicted change in protein stability upon mutation, is also used to train and test the predictive models.
Figure 2. Regression plot between the experimental and predicted changes in binding affinity (in kcal/mol) during cross-validation. mmCSM–NA obtained a Pearson's correlation of 0.67 across the original dataset (A). The performance of the model on complexes containing ssRNA (B), dsRNA (C), ssDNA (D) and dsDNA (E) is also shown, highlighting the accuracy and applicability of mmCSM–NA across the different types of protein–NA complexes. The overall Pearson correlation coefficient, including outliers, is shown in red, and the correlation after removing outliers is shown in black. Pearson's, Spearman's and Kendall's correlations are abbreviated as p, s and k, respectively.
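The agreement measures named in this caption and in the abstract (Pearson's, Spearman's and Kendall's correlations, plus RMSE) can be illustrated with a minimal, standard-library sketch. The function names and the toy ΔΔG values below are illustrative assumptions, not data or code from the study, and the Kendall variant shown is tau-a (no tie correction), which may differ from the variant the authors used.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def rmse(x, y):
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical experimental and predicted changes in binding affinity (kcal/mol).
experimental = [-1.2, 0.3, -0.5, 0.8, -2.0]
predicted = [-0.9, 0.1, -0.7, 0.5, -1.4]

print("p =", round(pearson(experimental, predicted), 3))
print("s =", round(spearman(experimental, predicted), 3))
print("k =", round(kendall_tau_a(experimental, predicted), 3))
print("RMSE =", round(rmse(experimental, predicted), 3))
```

In practice, library routines such as those in SciPy would normally be preferred; the hand-written versions here only make the definitions explicit.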
Benchmark with other servers that predict the hot-spots and non-hot-spots in protein–NA complexesᵃ

| Nucleic acid | Cutoff (kcal/mol) | Method | SEN | SPE | PRE | ACC | F1-Score | MCC | p-value |
|---|---|---|---|---|---|---|---|---|---|
| DNA+RNA | −2 | mCSM-NA | 0.31 | 0.96 | 0.64 | 0.85 | 0.42 | 0.37 | <0.001 |
| | | iPNHOT | 0.37 | 0.90 | 0.43 | 0.80 | 0.40 | 0.28 | |
| DNA | −1 | mCSM-NA | 0.63 | 0.77 | 0.61 | 0.72 | 0.62 | 0.39 | 0.006 |
| | | PrPDH | 0.48 | 0.81 | 0.59 | 0.70 | 0.53 | 0.31 | |
| RNA | −1 | mCSM-NA | 0.54 | 0.73 | 0.63 | 0.64 | 0.58 | 0.28 | <0.001 |
| | | PrabHot | 0.69 | 0.37 | 0.48 | 0.52 | 0.57 | 0.07 | |

ᵃ Non-predicted residues in other servers are considered non-hot-spots.
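The benchmark columns (SEN, SPE, PRE, ACC, F1-Score, MCC) are standard confusion-matrix metrics, and a short sketch may help show how they relate to a ΔΔG cutoff. Everything below is an illustrative assumption rather than the authors' procedure: the function names are hypothetical, the toy values are invented, and a residue is treated as a hot-spot when its experimental ΔΔG is at or below the cutoff. Following the table footnote, a missing prediction (`None`) is counted as a non-hot-spot.

```python
import math

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def label_hot_spots(ddg_values, cutoff):
    """Assumed convention: hot-spot if ddG <= cutoff (e.g. -1 kcal/mol)."""
    return [ddg <= cutoff for ddg in ddg_values]

def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN; a prediction of None counts as non-hot-spot."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        p = bool(p)
        if a and p:
            tp += 1
        elif a:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def benchmark_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, accuracy, F1 and MCC."""
    sen = safe_div(tp, tp + fn)
    spe = safe_div(tn, tn + fp)
    pre = safe_div(tp, tp + fp)
    acc = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * pre * sen, pre + sen)
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"SEN": sen, "SPE": spe, "PRE": pre, "ACC": acc, "F1": f1, "MCC": mcc}

# Hypothetical experimental ddG values (kcal/mol) and server predictions.
experimental_ddg = [-2.5, -1.2, -0.3, 0.4, -1.8, -0.1]
server_calls = [True, False, False, True, True, None]  # None = not predicted

actual = label_hot_spots(experimental_ddg, cutoff=-1.0)
counts = confusion_counts(actual, server_calls)
results = benchmark_metrics(*counts)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Treating missing predictions as negatives keeps every residue in the denominator, so servers that skip residues are scored on the same set as those that predict all of them.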