Vera G Pshennikova, Nikolay A Barashkov, Georgii P Romanov, Fedor M Teryutin, Aisen V Solov'ev, Nyurgun N Gotovtsev, Alena A Nikanorova, Sergey S Nakhodkin, Nikolay N Sazonov, Igor V Morozov, Alexander A Bondar, Lilya U Dzhemileva, Elza K Khusnutdinova, Olga L Posukh, Sardana A Fedorova.
Abstract
In silico predictive software makes it possible to assess the effect of amino acid substitutions on the structure or function of a protein without conducting functional studies. The accuracy of in silico pathogenicity prediction tools has not previously been assessed for variants associated with autosomal recessive deafness 1A (DFNB1A). Here, we identify the in silico tools that give the most accurate clinical significance predictions for missense variants of the GJB2 (Cx26), GJB6 (Cx30), and GJB3 (Cx31) connexin genes associated with DFNB1A. To evaluate the accuracy of the selected in silico tools (SIFT, FATHMM, MutationAssessor, PolyPhen-2, CONDEL, MutationTaster, MutPred, Align GVGD, and PROVEAN), we tested nine missense variants whose clinical significance had previously been confirmed in a large cohort of deaf patients and control groups from the Sakha Republic (Eastern Siberia, Russia): Cx26: p.Val27Ile, p.Met34Thr, p.Val37Ile, p.Leu90Pro, p.Glu114Gly, p.Thr123Asn, and p.Val153Ile; Cx30: p.Glu101Lys; Cx31: p.Ala194Thr. We compared the performance of the in silico tools (accuracy, sensitivity, and specificity) on these missense variants. The correlation coefficient (r) and the area under the Receiver Operating Characteristic (ROC) curve (AUC) were used as alternative quality indicators of the tested programs. The resulting ROC curves showed that the largest AUC was achieved by three programs: SIFT (AUC = 0.833, p = 0.046), PROVEAN (AUC = 0.833, p = 0.046), and MutationAssessor (AUC = 0.833, p = 0.002). The most accurate predictions were given by two of the tested programs, SIFT and PROVEAN (Ac = 89%, Se = 67%, Sp = 100%, r = 0.75, AUC = 0.833). The results of this study may be applicable to the analysis of novel missense variants of the GJB2 (Cx26), GJB6 (Cx30), and GJB3 (Cx31) connexin genes.
Year: 2019 PMID: 31015822 PMCID: PMC6446107 DOI: 10.1155/2019/5198931
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Localization of the tested nonsynonymous (missense) amino acid substitutions in the structure of connexin 26. Note. The structure of Cx26 was obtained from the PDB database of three-dimensional structures of proteins and nucleic acids, PDB ID: 2ZW3 (https://www.ncbi.nlm.nih.gov/Structure/pdb/2ZW3) [22]. The positions of the studied amino acids in the Cx26 structure were visualized with the 3D-structure viewer applet of PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) with the protein structure loaded. Detailed structural models of the human Cx30 and Cx31 proteins are currently not available.
Evaluation of missense variants by predictive in silico tools.
| Gene | Missense variant | Clinical significance | SIFT | FATHMM | MutationAssessor | PolyPhen-2 | CONDEL | MutationTaster | MutPred | Align GVGD | PROVEAN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GJB2 | c.79G>A | Benign | | Damaging | Medium | Probably damaging | Deleterious | | | | |
| GJB2 | c.101T>C | Pathogenic | | | | Benign | | | hypotheses are absent | | |
| GJB2 | c.109G>A | Pathogenic | Tolerated | | | | | | hypotheses are absent | Unclassified | Neutral |
| GJB2 | c.269T>C | Pathogenic | | | | | | | | | |
| GJB2 | c.341A>G | Benign | | Damaging | Medium | | Deleterious | | | Deleterious | |
| GJB2 | c.368C>A | Benign | | Damaging | | | | Disease causing | | Deleterious | |
| GJB2 | c.457G>A | Benign | | Damaging | | | | Disease causing | | | |
| GJB6 | c.301G>A | Benign | | Damaging | | | | Disease causing | Actionable | Deleterious | |
| GJB3 | c.580G>A | Benign | | Damaging | | | Deleterious | Disease causing | | Deleterious | |
Note. The correct results (both “true” positive and “true” negative results) are highlighted by bold font.
Performance of in silico tools.
| Tool | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| SIFT | 89% | 67% | 100% | 100% | 86% |
| MutationAssessor | 78% | 100% | 67% | 60% | 100% |
| FATHMM | 33% | 100% | 0% | 33% | 0% |
| PolyPhen-2 | 78% | 67% | 83% | 67% | 50% |
| MutationTaster | 56% | 100% | 33% | 43% | 33% |
| PROVEAN | 89% | 67% | 100% | 100% | 86% |
| Align GVGD | 44% | 33% | 33% | 33% | 67% |
| MutPred | 67% | 33% | 83% | 50% | 71% |
| CONDEL | 67% | 100% | 50% | 50% | 100% |
Note. Accuracy (Ac): the proportion of correct test results (the sum of true positive and true negative results) among all results; in our case, the proportion of correctly classified pathogenic and benign variants. Sensitivity (Se): the ability of a diagnostic method to detect the condition when it is present, defined as the proportion of true positive results among all truly positive cases; in our case, the proportion of pathogenic variants correctly identified as pathogenic. Specificity (Sp): the ability of a diagnostic method not to give false positive results in the absence of disease, defined as the proportion of true negative results among all truly negative cases; in our case, the proportion of benign variants correctly identified as benign. Positive predictive value (PPV): the proportion of variants predicted to be pathogenic that are truly pathogenic. Negative predictive value (NPV): the proportion of variants predicted to be benign that are truly benign.
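The definitions in the note above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the `Confusion` class and `performance` function are hypothetical names, and the SIFT counts (2 true positives, 1 false negative, 6 true negatives, 0 false positives) are an assumption back-calculated from the reported Se = 67% and Sp = 100% for three pathogenic and six benign variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts, with 'positive' meaning 'pathogenic'."""
    tp: int  # pathogenic variants predicted pathogenic
    fn: int  # pathogenic variants predicted benign
    tn: int  # benign variants predicted benign
    fp: int  # benign variants predicted pathogenic


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def performance(c: Confusion) -> dict:
    """Compute Ac, Se, Sp, PPV and NPV from confusion-matrix counts."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": ratio(c.tp + c.tn, total),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Assumed counts for SIFT (back-calculated, not taken from the source).
sift = Confusion(tp=2, fn=1, tn=6, fp=0)
metrics = performance(sift)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With these assumed counts the sketch gives 89%, 67%, 100%, 100%, and 86%, which agrees with the SIFT row of the table. The `ratio` helper returns NaN when a denominator is zero, for example the NPV of a tool that never predicts a benign variant.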
Figure 2. Histogram of the correlation coefficient (r). Note. r: the relationship between the known clinical significance of the missense variants and the in silico evaluation given by the nine predictive tools; α: the significance level of the correlation coefficient. For a sample size of n = 9, the critical value is 0.933, above which the correlation is significant at p < 0.001 [23].
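The text here does not state which correlation coefficient was computed. As a minimal sketch, assume the known and predicted classes are coded as binary values (pathogenic = 1, benign = 0); Pearson's r then reduces to the phi coefficient of the 2x2 confusion matrix. The function name `phi_coefficient` is hypothetical, and the SIFT counts are an assumption back-calculated from the reported Se = 67% and Sp = 100% for three pathogenic and six benign variants.

```python
import math


def phi_coefficient(tp: int, fn: int, tn: int, fp: int) -> float:
    """Pearson correlation between two binary variables (phi coefficient)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when either variable is constant.
        return float("nan")
    return (tp * tn - fp * fn) / denominator


# Assumed counts for SIFT (back-calculated, not taken from the source).
r_sift = phi_coefficient(tp=2, fn=1, tn=6, fp=0)
print(round(r_sift, 3))
```

Under these assumptions the result is about 0.756, in line with the r = 0.75 reported for SIFT and PROVEAN.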
Figure 3. ROC curves showing the relationship between the sensitivity and specificity of the tested programs. These graphs illustrate the performance of the studied in silico tools. The overall accuracy of a test can be described by the area under the ROC curve (AUC); a higher AUC indicates better performance. The diagonal line corresponds to the relationship between true-positive and false-positive rates for completely uninformative in silico tools (FATHMM and Align GVGD). 95% CI indicates the 95% confidence interval (Binomial Exact). The ROC curves were constructed using the MedCalc statistical software for biomedical research (https://www.medcalc.org).
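When a tool's output is reduced to a single binary call (pathogenic or benign), its ROC curve has one operating point, and the area under the polyline through (0, 0), (1 - Sp, Se), and (1, 1) equals (Se + Sp) / 2. The sketch below illustrates this under that assumption; it is not a description of how MedCalc computes the AUC, and `auc_single_point` is a hypothetical helper name. The sensitivities and specificities are entered as fractions matching the rounded percentages in the performance table.

```python
def auc_single_point(sensitivity: float, specificity: float) -> float:
    """Area under the ROC polyline (0,0) -> (1 - Sp, Se) -> (1,1)."""
    fpr = 1.0 - specificity
    # Trapezoid rule over the two segments of the polyline.
    left = 0.5 * fpr * sensitivity
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)
    return left + right


# (Se, Sp) as fractions of 3 pathogenic and 6 benign variants.
tools = {
    "SIFT": (2 / 3, 6 / 6),
    "MutationAssessor": (3 / 3, 4 / 6),
    "FATHMM": (3 / 3, 0 / 6),
}
auc = {name: auc_single_point(se, sp) for name, (se, sp) in tools.items()}
for name, value in auc.items():
    print(f"{name}: AUC = {value:.3f}")
```

Under this assumption SIFT and MutationAssessor both come out at 0.833, matching the reported AUC, and FATHMM lands on 0.500, the value of the uninformative diagonal.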