Shalaw Rassul Sallah, Jamie M Ellingford, Panagiotis I Sergouniotis, Simon C Ramsden, Nicholas Lench, Simon C Lovell, Graeme C Black.
Abstract
BACKGROUND: Improving the clinical interpretation of missense variants can increase the diagnostic yield of genomic testing and lead to personalised management strategies. Currently, due to the imprecision of bioinformatic tools that aim to predict variant pathogenicity, their role in clinical guidelines remains limited. There is a clear need for more accurate prediction algorithms and this study aims to improve performance by harnessing structural biology insights. The focus of this work is missense variants in a subset of genes associated with X linked disorders.
Keywords: clinical decision-making; genetic variation; missense; mutation; point mutation; protein; structural homology
Year: 2021 PMID: 33766936 PMCID: PMC8961765 DOI: 10.1136/jmedgenet-2020-107404
Source DB: PubMed Journal: J Med Genet ISSN: 0022-2593 Impact factor: 6.318
Figure 1. The number of missense variants on the modelled and unmodelled regions from 21 disease-associated X linked genes. Data sets P and B comprised pathogenic and benign variants, respectively. The modelled variants represent variants found on regions with a known structure, or those found in regions shared by both the homologous template and the sequence for the proteins without a structure. The unmodelled variants represent variants outside of these regions.
MCC used to evaluate the performance of the seven tools
| Genes | SIFT | PolyPhen2 | VEST4 | REVEL | ReVe | ClinPred | CAPICE |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | 0.41 | 0.42 | 0.59 |  | 0.56 | 0.52 | 0.33 |
|  | – | 0.41 | – | 0.63 | 0.61 |  | 0.58 |
|  | 0.38 | 0.43 |  | 0.59 |  |  | 0.43 |
|  | 0.60 | 0.67 |  | 0.58 | 0.56 | 0.66 | 0.55 |
|  | 0.38 | 0.38 | 0.61 | 0.46 | 0.46 |  | 0.51 |
|  | 0.42 | 0.55 |  | 0.58 | 0.48 |  | 0.41 |
|  | 0.38 | 0.61 | 0.75 |  | 0.77 | 0.74 | 0.63 |
|  | 0.52 | 0.59 | 0.74 |  | 0.61 |  | 0.58 |
|  | 0.42 | 0.49 |  | 0.65 | 0.64 | 0.58 | 0.51 |
|  | 0.55 |  | 0.59 | 0.42 | 0.56 | 0.58 | 0.53 |
|  | 0.62 | 0.72 | 0.68 | 0.64 | 0.70 |  | 0.67 |
|  | 0.26 | 0.29 |  | 0.53 | 0.51 | 0.52 | 0.31 |
|  | 0.60 | 0.63 | 0.72 | 0.68 | 0.69 |  | 0.53 |
|  | 0.25 | 0.30 | 0.39 | 0.57 |  | 0.50 | 0.41 |
|  | 0.31 | 0.35 | 0.49 |  | 0.53 | 0.53 | 0.34 |
|  | 0.56 | 0.59 |  | 0.62 | 0.75 | 0.73 | 0.65 |
|  | 0.62 | 0.53 | 0.70 | 0.60 | 0.55 |  | 0.57 |
|  | 0.69 | 0.52 |  |  | 0.76 | 0.72 | 0.58 |
|  | 0.54 | 0.76 |  | 0.76 | 0.68 | 0.62 | 0.71 |
|  | – | 0.57 | – | 0.64 | 0.59 |  | 0.48 |
|  | 0.44 | 0.24 | 0.73 | 0.62 | 0.65 |  | 0.48 |
*As suggested by their respective authors, the pathogenicity thresholds used for SIFT, PolyPhen2, VEST4, REVEL, ReVe, ClinPred and CAPICE were 0.05, 0.85, 0.5, 0.5, 0.7, 0.5 and 0.02, respectively.
†The highest MCC value for each gene is highlighted in bold.
‡The SIFT and VEST4 prediction scores for ALAS2 and NDP variants in the transcript of interest were unavailable.
MCC, Matthews correlation coefficient.
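For reference, an MCC value of the kind reported in the table can be computed from a confusion matrix once each tool's score has been converted to a binary call at its threshold. The sketch below is illustrative and is not the authors' code: the thresholds are copied from the footnote above, while the function names, the inclusive comparison at the threshold boundary and the toy scores are assumptions. SIFT is treated as the one tool for which lower scores indicate a damaging variant.

```python
import math

# Default pathogenicity thresholds as listed in the table footnote.
DEFAULT_THRESHOLDS = {
    "SIFT": 0.05, "PolyPhen2": 0.85, "VEST4": 0.5, "REVEL": 0.5,
    "ReVe": 0.7, "ClinPred": 0.5, "CAPICE": 0.02,
}

# Assumption: SIFT is the only tool here where a LOWER score means "damaging".
LOWER_IS_PATHOGENIC = {"SIFT"}


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_for_tool(tool, scores, labels):
    """Score one tool at its default threshold (labels: 1 = pathogenic, 0 = benign)."""
    threshold = DEFAULT_THRESHOLDS[tool]
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        # Inclusive comparison at the boundary is an assumption of this sketch.
        if tool in LOWER_IS_PATHOGENIC:
            called_pathogenic = score <= threshold
        else:
            called_pathogenic = score >= threshold
        if called_pathogenic and label == 1:
            tp += 1
        elif called_pathogenic and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return mcc(tp, tn, fp, fn)


# Toy data (invented for illustration, not taken from the study).
revel_scores = [0.91, 0.72, 0.55, 0.40, 0.12, 0.08]
labels = [1, 1, 0, 1, 0, 0]
print(round(mcc_for_tool("REVEL", revel_scores, labels), 2))  # prints 0.33
```

MCC ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (every call correct), which makes it less sensitive to class imbalance than plain accuracy.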
Figure 2. The performance of ProSper (protein-specific variant interpreter) evaluated using Matthews correlation coefficient (MCC) in the classification of variants in 21 genes associated with X linked disorders. For each gene, the line shows the SD from the repeated (n=10) 10-fold cross-validation with random subsampling.
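The spread described in this caption comes from repeating a k-fold split several times and summarising the resulting MCC values. A minimal sketch of that bookkeeping follows. It assumes plain shuffled folds with no stratification or subsampling, and `repeated_kfold_indices` and `mean_and_sd` are hypothetical helper names rather than part of ProSper.

```python
import random
import statistics


def repeated_kfold_indices(n_items, k=10, repeats=10, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[i::k] for i in range(k)]
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield repeat, train_idx, test_idx


def mean_and_sd(values):
    """Mean and sample standard deviation of per-repeat MCC values."""
    return statistics.mean(values), statistics.stdev(values)


# Example: 2 repeats of 5-fold CV over 25 variants gives 10 train/test splits.
splits = list(repeated_kfold_indices(25, k=5, repeats=2))
print(len(splits))
```

In practice a model would be fitted on each `train_idx`, scored on the matching `test_idx`, and the per-repeat MCC values passed to `mean_and_sd`.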
Figure 3. The Matthews correlation coefficient (MCC) values for the gene-specific approach ProSper (protein-specific variant interpreter) compared with REVEL, VEST4 and ClinPred using the complete data sets. VEST4 MCC values are missing for ALAS2 and NDP because VEST4 predictions were unavailable for the variants in the transcript of interest.
Figure 4. A comparison of the default Matthews correlation coefficient (MCC) with the optimised MCC for the performance of VEST4, REVEL and ClinPred (left, middle and right panels, respectively) using all of the data sets (the top three panels) and using balanced data sets (the bottom three panels) for the 21 genes. For each gene, the data set was balanced using undersampling, that is, using a random subset from the majority class to match the number of variants in the minority class. The default MCC values were generated using the default threshold of 0.5. The optimised MCC values were generated using gene-specific thresholds. The gene-specific threshold was identified using 80% of all the predictions from each tool through repeated (n=10) fivefold cross-validation with random subsampling. The optimised MCC value was generated using the rest (20%) of the predictions from each tool at the threshold identified for each gene. VEST4 predictions were unavailable for ALAS2 and NDP variants in the respective transcripts of interest. The lines between the default MCC and the optimised MCC values for each gene are for visualisation purposes only.
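The balancing and threshold-selection steps in this caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's procedure: it uses a single 80/20 split in place of the repeated fivefold cross-validation, picks the threshold that maximises MCC on the 80% portion, and all function names and toy scores are hypothetical.

```python
import math
import random


def mcc_at(scores, labels, threshold):
    """MCC when scores >= threshold are called pathogenic (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def undersample(scores, labels, rng):
    """Randomly shrink the majority class to the size of the minority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [scores[i] for i in keep], [labels[i] for i in keep]


def best_threshold(scores, labels):
    """Candidate thresholds are the observed scores; keep the one with the highest MCC."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: mcc_at(scores, labels, t))


def optimised_mcc(scores, labels, rng, train_fraction=0.8):
    """Choose a threshold on 80% of the predictions, then score the remaining 20%."""
    idx = list(range(len(scores)))
    rng.shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, held_out = idx[:cut], idx[cut:]
    threshold = best_threshold([scores[i] for i in train], [labels[i] for i in train])
    held_out_mcc = mcc_at(
        [scores[i] for i in held_out], [labels[i] for i in held_out], threshold
    )
    return threshold, held_out_mcc


# Toy predictions for one gene (invented for illustration).
rng = random.Random(0)
scores = [0.95, 0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
balanced_scores, balanced_labels = undersample(scores, labels, rng)
threshold, held_out_mcc = optimised_mcc(balanced_scores, balanced_labels, rng)
```

Selecting the threshold and reporting the MCC on disjoint subsets matters here: tuning and scoring on the same predictions would inflate the optimised MCC.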
Figure 5. A comparison of the Matthews correlation coefficient (MCC) values for ProSper (protein-specific variant interpreter) with the optimised MCC values for VEST4, REVEL and ClinPred using all of the data sets (on the left) and using balanced data sets (on the right). Optimised MCC values were generated using gene-specific or protein-specific pathogenicity thresholds. The data set for each gene was balanced using undersampling, that is, using a random subset from the majority class to match the number of variants in the minority class. The gene-specific threshold was identified using 80% of all the predictions from each tool through repeated (n=10) fivefold cross-validation with random subsampling. The optimised MCC value was generated using the rest (20%) of the predictions from each tool at the threshold identified for each gene. VEST4 predictions were unavailable for ALAS2 and NDP variants in the respective transcripts of interest.