Adam C Gunning, Verity Fryer, James Fasham, Andrew H Crosby, Sian Ellard, Emma L Baple, Caroline F Wright.
Abstract
BACKGROUND: Pathogenicity predictors are integral to genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken.
Keywords: genetic testing; genetic variation; genetics; genomics; human genetics
Year: 2020 PMID: 32843488 PMCID: PMC8327323 DOI: 10.1136/jmedgenet-2020-107003
Source DB: PubMed Journal: J Med Genet ISSN: 0022-2593 Impact factor: 6.318
Figure 1. Flow diagram of selection and filtering steps used for the generation of the open (A) and clinical (B) datasets. Oval: variant source; box: selection criteria; rounded box: dataset. Red text (right) shows the number of pathogenic variants, green text (left) shows the number of benign variants. MAF, minor allele frequency.
Figure 2. In silico pathogenicity predictor feature usage and source. Shading indicates that a category of evidence is used by the tool. Codes within each box indicate that the feature is inherited from another tool. Feature lists were taken from the tools' original publications, supplementary materials and available online material. C, CADD; D, DANN; F, FATHMM; FC, FitCons; MP, MutPred; MT, MutationTaster; P, PolyPhen-2; S, SIFT; V, VEST. An extended version is shown in online supplementary figure S1.
Results of variant classification for individual tools and for two consensus-based combinations, using the (A) open (n=8480) and (B) clinical (n=1757) datasets
| | Tool | True positive | True negative | False positive | False negative | Sensitivity | Specificity | MCC | LR+ | LR− |
| (A) Open dataset | | | | | | | | | | |
| Individual | SIFT | 2302 | 3857 | 1878 | 443 | 0.84 | 0.67 | 0.48 | 2.6:1 | 1:4.2 |
| | PolyPhen-2 | 2387 | 4177 | 1558 | 358 | 0.87 | 0.73 | 0.56 | 3.2:1 | 1:5.6 |
| | REVEL | 2394 | 5445 | 290 | 351 | 0.87 | 0.95 | 0.83 | 17.2:1 | 1:7.4 |
| | GAVIN | 2615 | 5611 | 124 | 130 | 0.95 | 0.98 | 0.93 | 44.1:1 | 1:20.7 |
| | ClinPred | 2469 | 5731 | 4 | 276 | 0.90 | 1.00 | 0.93 | 1289.6:1 | 1:9.9 |
| Consensus | SIFT+PolyPhen-2 | 2240 | 3410 | 2325 | 505 | 0.82 | 0.59 | 0.39 | 2:1 | 1:3.2 |
| | REVEL+ClinPred | 2233 | 5442 | 293 | 512 | 0.81 | 0.95 | 0.78 | 15.9:1 | 1:5.1 |
| (B) Clinical dataset | | | | | | | | | | |
| Individual | SIFT | 1031 | 212 | 406 | 108 | 0.91 | 0.34 | 0.31 | 1.38:1 | 1:3.62 |
| | PolyPhen-2 | 1021 | 211 | 407 | 118 | 0.90 | 0.34 | 0.29 | 1.36:1 | 1:3.3 |
| | REVEL | 983 | 370 | 248 | 156 | 0.86 | 0.60 | 0.48 | 2.15:1 | 1:4.37 |
| | GAVIN | 1100 | 157 | 461 | 39 | 0.97 | 0.25 | 0.33 | 1.29:1 | 1:7.42 |
| | ClinPred | 1107 | 167 | 451 | 32 | 0.97 | 0.27 | 0.36 | 1.33:1 | 1:9.62 |
| Consensus | SIFT+PolyPhen-2 | 960 | 135 | 483 | 179 | 0.84 | 0.22 | 0.08 | 1.08:1 | 1:1.39 |
| | REVEL+ClinPred | 973 | 142 | 476 | 166 | 0.85 | 0.23 | 0.11 | 1.11:1 | 1:1.58 |
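For consensus-based results, non-concordant predictions (where the tools disagree on the classification) were considered incorrect. Matthews correlation coefficient (MCC) was calculated as follows:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$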
LR+ is the positive likelihood ratio; LR− is the negative likelihood ratio.
FN, false negatives (ie, pathogenic variants predicted to be benign); FP, false positives (ie, benign variants predicted to be pathogenic); TN, true negatives (ie, benign variants predicted to be benign); TP, true positives (ie, pathogenic variants predicted to be pathogenic).
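The summary statistics in the table can be recomputed from the four confusion-matrix counts. The following is a minimal Python sketch, not the authors' code: it assumes the standard definitions of sensitivity, specificity, MCC and the likelihood ratios (LR+ = sensitivity / (1 − specificity); LR− = (1 − sensitivity) / specificity), and the function name `classification_metrics` is hypothetical. The example counts are the SIFT row of the open dataset.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Summary statistics for a binary confusion matrix (standard definitions assumed)."""
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants called pathogenic
    specificity = tn / (tn + fp)  # fraction of benign variants called benign

    # Matthews correlation coefficient; reported as 0 if any marginal total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    # Likelihood ratios; guard against division by zero at the extremes.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# SIFT on the open dataset (counts taken from the table).
sift_open = classification_metrics(tp=2302, tn=3857, fp=1878, fn=443)
for name, value in sift_open.items():
    print(f"{name}: {value:.2f}")
```

The table reports LR− as a ratio of the form 1:x, so the comparable figure is the reciprocal of `lr_neg`.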
Figure 3. Violin plot showing variant scores for SIFT, PolyPhen-2, REVEL and ClinPred using two datasets. Open dataset: blue; clinical dataset: red; pathogenic variants: filled; benign variants: unfilled. Plot was generated in R using the 'vioplot' function in the 'vioplot' library. For ease of comparison, SIFT scores have been inverted.
Figure 4. Receiver operating characteristic (ROC) curves for SIFT, PolyPhen-2, REVEL and ClinPred using two datasets. Open dataset: blue; clinical dataset: red. Generated in R using the 'roc' and 'plot.roc' functions in the 'pROC' library. Area under the ROC curve (AUC) was calculated in R using the 'roc' function. For ease of comparison, SIFT scores have been inverted.
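As an illustration of what the AUC measures, and of why SIFT scores need inverting before comparison, the sketch below computes the empirical AUC as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one (ties counted as half). This is equivalent to the area under the empirical ROC curve. It is not the authors' R workflow: the function name `empirical_auc` is hypothetical, the scores are made-up toy values, and inversion is assumed to mean 1 − score (lower SIFT scores indicate a more damaging prediction).

```python
def empirical_auc(pathogenic_scores, benign_scores):
    """AUC as the fraction of pathogenic/benign pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Toy SIFT-like scores (invented for illustration only).
sift_pathogenic = [0.00, 0.01, 0.20]
sift_benign = [0.05, 0.40, 1.00]

# Raw SIFT scores run in the opposite direction to the other tools ...
auc_raw = empirical_auc(sift_pathogenic, sift_benign)

# ... so invert them (assumed here to be 1 - score) before comparing.
auc_inverted = empirical_auc(
    [1 - s for s in sift_pathogenic],
    [1 - s for s in sift_benign],
)

print(f"raw: {auc_raw:.3f}, inverted: {auc_inverted:.3f}")
```

With no tied scores, the raw and inverted AUCs sum to 1, which is why inversion only changes the orientation of the curve and not the tool's discriminative ability.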
Figure 5. Concordance between tools separated by dataset and classification (pathogenic and benign). Open dataset: blue; clinical dataset: red; pathogenic variants: top graph; benign variants: bottom graph. True concordance indicates that the tools agree and were correct. False concordance indicates that the tools agree but were incorrect. Discordance indicates that the tools disagreed on the classification.
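The three concordance categories, together with the rule that disagreeing tools count as an incorrect consensus call, can be expressed in a few lines. The Python sketch below is illustrative only: the function names and the toy variant list are hypothetical and are not taken from the study.

```python
from collections import Counter


def concordance_category(truth, call_a, call_b):
    """Classify a pair of tool calls against the known classification."""
    if call_a != call_b:
        return "discordance"  # the tools disagree
    if call_a == truth:
        return "true concordance"  # the tools agree and are correct
    return "false concordance"  # the tools agree but are incorrect


def consensus_is_correct(truth, call_a, call_b):
    """A consensus call is correct only when both tools agree with the truth."""
    return concordance_category(truth, call_a, call_b) == "true concordance"


# Toy data: (known classification, tool A call, tool B call).
toy_variants = [
    ("pathogenic", "pathogenic", "pathogenic"),
    ("pathogenic", "benign", "pathogenic"),
    ("benign", "pathogenic", "pathogenic"),
    ("benign", "benign", "benign"),
]

category_counts = Counter(concordance_category(*v) for v in toy_variants)
n_consensus_correct = sum(consensus_is_correct(*v) for v in toy_variants)

print(dict(category_counts))
print("consensus correct:", n_consensus_correct, "of", len(toy_variants))
```

Under this rule a consensus can never be correct more often than its weaker member, since every variant either tool misclassifies becomes a false concordance or a discordance.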