Shalaw R Sallah1,2, Panagiotis I Sergouniotis3, Stephanie Barton3, Simon Ramsden3, Rachel L Taylor3, Amro Safadi4, Mitra Kabir4, Jamie M Ellingford3, Nick Lench5, Simon C Lovell4, Graeme C M Black4,3.
Abstract
Advances in DNA sequencing technologies have revolutionised rare disease diagnostics and have led to a dramatic increase in the volume of available genomic data. A key challenge that needs to be overcome to realise the full potential of these technologies is that of precisely predicting the effect of genetic variants on molecular and organismal phenotypes. Notably, despite recent progress, there is still a lack of robust in silico tools that accurately assign clinical significance to variants. Genetic alterations in the CACNA1F gene are the commonest cause of X-linked incomplete Congenital Stationary Night Blindness (iCSNB), a condition associated with non-progressive visual impairment. We combined genetic and homology modelling data to produce CACNA1F-vp, an in silico model that differentiates disease-implicated from benign missense CACNA1F changes. CACNA1F-vp predicts variant effects on the structure of the CACNA1F-encoded protein (a calcium channel) using parameters based upon changes in amino acid properties; these include size, charge, hydrophobicity, and position. The model produces an overall score for each variant that can be used to predict its pathogenicity. CACNA1F-vp outperformed four other tools in identifying disease-implicated variants (areas under the receiver operating characteristic and precision-recall curves = 0.84; Matthews correlation coefficient = 0.52) using a tenfold cross-validation technique. We consider this protein-specific model to be a robust stand-alone diagnostic classifier that could be replicated in other proteins and could enable precise and timely diagnosis.
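To make the kind of parameter named in the abstract more concrete, the following minimal sketch computes changes in hydrophobicity, charge, and size for a single substitution, using the p.(Leu216Arg) variant that appears in this record. It is an illustration only and is not the published CACNA1F-vp parameterisation: the choice of the Kyte-Doolittle hydropathy scale and Zamyatnin residue volumes, the function name `property_changes`, and the absence of any weighting or position term are all assumptions made for this example.

```python
# Illustrative sketch only; NOT the published CACNA1F-vp scoring scheme.
# The property scales below are assumptions chosen for this example and
# cover just the two residues involved in p.(Leu216Arg).

KD_HYDROPATHY = {"Leu": 3.8, "Arg": -4.5}       # Kyte-Doolittle hydropathy index
SIDE_CHAIN_CHARGE = {"Leu": 0, "Arg": 1}        # formal charge near neutral pH
RESIDUE_VOLUME = {"Leu": 166.7, "Arg": 173.4}   # cubic angstroms (Zamyatnin)


def property_changes(ref: str, alt: str) -> dict:
    """Return alt-minus-ref differences for three amino acid properties."""
    return {
        "delta_hydropathy": KD_HYDROPATHY[alt] - KD_HYDROPATHY[ref],
        "delta_charge": SIDE_CHAIN_CHARGE[alt] - SIDE_CHAIN_CHARGE[ref],
        "delta_volume": RESIDUE_VOLUME[alt] - RESIDUE_VOLUME[ref],
    }


# Hypothetical usage for the leucine-to-arginine substitution.
changes = property_changes("Leu", "Arg")
for name, value in changes.items():
    print(f"{name}: {value:+.1f}")
```

In this example the substitution replaces a strongly hydrophobic, uncharged residue with a hydrophilic, positively charged one of similar volume. How such differences are combined into an overall score, and how residue position is encoded, is not specified in this record.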
Year: 2020 PMID: 32313206 PMCID: PMC7608274 DOI: 10.1038/s41431-020-0623-y
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Comparison of true positive (TP) and false positive (FP) predictions for CACNA1F variants (NM_005183.3; ENST00000376265.2) from four tools, each at its optimal threshold. Total positives = 72; total negatives = 322. FPR: false positive rate.
| Tools | Optimal threshold | TP | FP | FPR (%) |
|---|---|---|---|---|
| SIFT | 0.05 | 65 | 171 | 53 |
| PolyPhen2 | 0.85 | 62 | 132 | 41 |
| CADD | 15 | 70 | 277 | 86 |
| CONDEL | 0.52 | 69 | 255 | 79 |
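The FPR column follows directly from the FP counts and the 322 negatives stated in the table caption. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative and are not taken from the original work.

```python
# Recompute the FPR (%) column from the FP counts in the table above.
# NEGATIVES is the total number of benign variants given in the caption.

NEGATIVES = 322

# tool -> (true positives, false positives) at the tool's optimal threshold
COUNTS = {
    "SIFT": (65, 171),
    "PolyPhen2": (62, 132),
    "CADD": (70, 277),
    "CONDEL": (69, 255),
}


def fpr_percent(fp: int, negatives: int) -> float:
    """False positive rate as a percentage: FP / (FP + TN) = FP / negatives."""
    return 100.0 * fp / negatives


rates = {tool: round(fpr_percent(fp, NEGATIVES)) for tool, (_tp, fp) in COUNTS.items()}
for tool, rate in rates.items():
    print(f"{tool}: FPR = {rate}%")
```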
Fig. 1 Model representation of the template structure (PDB ID 5GJV) used in homology modelling.
The structure is the mammalian voltage-gated calcium channel Cav1.1 complex, resolved at 3.6 angstroms [43]. (a) Side view; the transmembrane domain lies approximately between the lines. (b) Top view, showing the pore (indicated by an arrow) and the first four of the six segments (highlighted in the rectangle) of each of the four domains.
Fig. 2 Protein structure modelling for a disease-implicated variant.
Molecular goodness-of-fit testing for the disease-implicated CACNA1F (NM_005183.3; ENST00000376265.2) variant c.647T>G p.(Leu216Arg). The red spikes mark van der Waals overlap between the surrounding residues and the introduced arginine (orange), which replaces the leucine (green); the arrows highlight these residues.
Comparison of the predictions and overall performance of the different tools. At their optimal thresholds, all tools except CACNA1F-vp achieve high recall at the expense of precision. CACNA1F-vp also has the highest MCC (MCC ranges from −1 to 1, where 1 indicates perfect correlation between predictions and classes and −1 indicates an inverse correlation). Total disease-implicated and benign variants = 72 and 322, respectively (NM_005183.3; ENST00000376265.2). AUC ROC: area under the receiver operating characteristic curve; AUC PR: area under the precision-recall curve; TPR: true positive rate; FPR: false positive rate; PPV: positive predictive value; MCC: Matthews correlation coefficient.
| Tools | Threshold | Recall/TPR (%) | FPR (%) | Precision/PPV (%) | AUC ROC | AUC PR | MCC |
|---|---|---|---|---|---|---|---|
| SIFT | 0.05 | 88 | 53 | 28 | 0.77 | 0.61 | 0.3 |
| PolyPhen2 | 0.85 | 86 | 41 | 32 | 0.83 | 0.59 | 0.35 |
| CADD | 15 | 97 | 86 | 20 | 0.79 | 0.43 | 0.12 |
| CONDEL | 0.522 | 96 | 79 | 21 | 0.85 | 0.61 | 0.17 |
| CACNA1F-vp | 0.567 | 86 | 33 | 72 | 0.84 | 0.84 | 0.52 |
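For readers unfamiliar with the MCC and PPV columns, the sketch below shows how both are computed from a single confusion matrix. It uses the TP and FP counts reported in this record for SIFT (65 and 171) together with the stated totals of 72 disease-implicated and 322 benign variants; the function names are illustrative. Small differences from the tabulated values are possible if the published figures were derived differently, for example averaged across cross-validation folds, which is an assumption rather than something this record states.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def precision_percent(tp: int, fp: int) -> float:
    """Positive predictive value as a percentage: TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)


POSITIVES, NEGATIVES = 72, 322   # totals stated in the table caption
sift_tp, sift_fp = 65, 171       # SIFT counts reported in this record
sift_fn = POSITIVES - sift_tp    # missed disease-implicated variants
sift_tn = NEGATIVES - sift_fp    # correctly classified benign variants

sift_mcc = mcc(sift_tp, sift_fp, sift_fn, sift_tn)
sift_ppv = precision_percent(sift_tp, sift_fp)
print(f"SIFT: MCC = {sift_mcc:.2f}, PPV = {sift_ppv:.0f}%")
```

Because MCC uses all four cells of the confusion matrix, a tool that labels almost every variant as disease-implicated scores poorly on it even when recall is high.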
Fig. 3 ROC curves for the different classifiers.
Evaluated on 72 disease-implicated and 322 presumably benign CACNA1F variants, the protein-specific model (CACNA1F-vp) has an area under the receiver operating characteristic (ROC) curve of 0.84, comparable to that of the four other tools.
Fig. 4 PR curves for the different classifiers.
Evaluated on 72 disease-implicated and 322 presumably benign CACNA1F variants, the protein-specific model (CACNA1F-vp) has an area under the precision-recall (PR) curve of 0.84, higher than that of any of the four other tools.
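As context for reading PR curves on an imbalanced set such as this one, a classifier that assigns labels at random has an expected precision equal to the fraction of positives, whereas its expected area under the ROC curve is 0.5 regardless of class balance. With the variant counts given in the caption, the worked value is as follows (the symbols P and N, for the numbers of disease-implicated and benign variants, are introduced here for illustration):

```latex
\[
\text{precision}_{\text{random}}
  = \frac{P}{P + N}
  = \frac{72}{72 + 322}
  \approx 0.18
\]
```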