Dajun Qian, Shuwei Li, Yuan Tian, Jacob W Clifford, Brice A J Sarver, Tina Pesaran, Chia-Ling Gau, Aaron M Elliott, Hsiao-Mei Lu, Mary Helen Black.
Abstract
There is a growing need to develop variant prediction tools capable of assessing a wide spectrum of evidence. We present a Bayesian framework that involves aggregating pathogenicity data across multiple in silico scores on a gene-by-gene basis and multiple evidence statistics in both quantitative and qualitative forms, and performs 5-tiered variant classification based on the resulting probability credible interval. When evaluated in 1,161 missense variants, our gene-specific in silico model-based meta-predictor yielded an area under the curve (AUC) of 96.0% and outperformed all other in silico predictors. Multifactorial model analysis incorporating all available evidence yielded 99.7% AUC, with 22.8% predicted as variants of uncertain significance (VUS). Use of only 3 auto-computed evidence statistics yielded 98.6% AUC with 56.0% predicted as VUS, which represented sufficient accuracy to rapidly assign a significant portion of VUS to clinically meaningful classifications. Collectively, our findings support the use of this framework to conduct large-scale variant prioritization using in silico predictors followed by variant prediction and classification with a high degree of predictive accuracy.
Year: 2018 PMID: 30212499 PMCID: PMC6136750 DOI: 10.1371/journal.pone.0203553
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Algorithm modules of multifactorial model analysis for variant prediction and classification.
(A) Modules of Bayesian multifactorial model analysis for variant prediction and classification. SLR = stepwise logistic regression. (B) 5-tiered variant classification scheme based on the estimated 95% probability credible interval (PCI) of variant pathogenicity.
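The 5-tiered scheme in panel (B) can be illustrated with a minimal sketch. The cut points below (0.001, 0.05, 0.95, 0.99) are the widely used IARC-style probability thresholds and are an assumption here, since the thresholds used by the authors are not given in this record. The decision rule (compare the conservative end of the interval against each cut point) and the function name `classify_from_pci` are likewise hypothetical.

```python
def classify_from_pci(lower: float, upper: float) -> str:
    """Map a 95% probability credible interval (PCI) to one of five classes.

    Assumed rule: a variant earns a pathogenic-side class only if the LOWER
    bound clears the cut point, and a benign-side class only if the UPPER
    bound stays under the cut point; everything else is a VUS.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    if lower >= 0.99:       # assumed cut point
        return "pathogenic"
    if lower >= 0.95:       # assumed cut point
        return "likely pathogenic"
    if upper <= 0.001:      # assumed cut point
        return "benign"
    if upper <= 0.05:       # assumed cut point
        return "likely benign"
    return "VUS"


if __name__ == "__main__":
    for interval in [(0.992, 0.999), (0.96, 0.995), (0.30, 0.97), (0.01, 0.04)]:
        print(interval, "->", classify_from_pci(*interval))
```

Under this rule a wide interval such as (0.30, 0.97) stays a VUS even though its upper end is high, which is the practical effect of classifying on the interval rather than on a point estimate.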
Fig 2. Outcome of 5-tiered predicted classes in MGPT data.
(A) Proportions of predicted classes from gene-specific IVP model analysis, in which each prediction was evaluated from a subset of 16 in silico predictors. The analysis was based on 1,161 missense variants of known class in 10 genes, using leave-one-out cross-validation (LOOCV). (B) Proportions of predicted classes from MVP model analysis, in which each prediction combined a prior distribution from the IVP model with the available evidence predictors. The analysis was based on the 1,016 variants with any available evidence statistics.
Table 1. In silico variant prediction in MGPT data.
| Method | No. by Predicted Outcomes | | | | | Performance Statistics | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | TP | TN | FP | FN | VUS | Sen | Spe | PPV | NPV | Acc | AUC | PVUS |
| Standalone predictor: | ||||||||||||
| MutPred | 101 | 496 | 1 | 17 | 546 | 0.244 | **0.664** | 0.990 | 0.967 | 0.971 | | **0.470** |
| phyloP vertebrate | 75 | 438 | 6 | 15 | 627 | 0.181 | 0.586 | 0.926 | 0.967 | 0.961 | 0.915 | 0.540 |
| MutationAssessor | 71 | 315 | 0 | 9 | 766 | 0.171 | 0.422 | **1.000** | 0.972 | 0.977 | 0.891 | 0.660 |
| FATHMM | 99 | 276 | 2 | 6 | 778 | 0.239 | 0.369 | 0.980 | 0.979 | 0.979 | 0.884 | 0.670 |
| AGVGD | 217 | 0 | 18 | 0 | 926 | **0.524** | 0.000 | 0.923 | NA | 0.923 | 0.878 | 0.798 |
| Polyphen2 HVAR | 0 | 414 | 0 | 21 | 726 | 0.000 | 0.554 | NA | 0.952 | 0.952 | 0.878 | 0.625 |
| Siphy | 0 | 277 | 0 | 5 | 879 | 0.000 | 0.371 | NA | **0.982** | **0.982** | 0.863 | 0.757 |
| LRT | 0 | 291 | 0 | 8 | 862 | 0.000 | 0.390 | NA | 0.973 | 0.973 | 0.862 | 0.742 |
| GERP++ | 0 | 329 | 0 | 12 | 820 | 0.000 | 0.440 | NA | 0.965 | 0.965 | 0.860 | 0.706 |
| Polyphen2 HDIV | 0 | 404 | 0 | 26 | 731 | 0.000 | 0.541 | NA | 0.940 | 0.940 | 0.859 | 0.630 |
| SIFT | 0 | 315 | 0 | 12 | 834 | 0.000 | 0.422 | NA | 0.963 | 0.963 | 0.856 | 0.718 |
| PROVEAN | 50 | 102 | 1 | 2 | 1,006 | 0.121 | 0.137 | 0.980 | 0.981 | 0.981 | 0.855 | 0.866 |
| phastCons mammalian | 0 | 361 | 0 | 14 | 786 | 0.000 | 0.483 | NA | 0.963 | 0.963 | 0.838 | 0.677 |
| phastCons vertebrate | 0 | 455 | 0 | 10 | 696 | 0.000 | 0.609 | NA | 0.978 | 0.978 | 0.836 | 0.599 |
| phyloP mammalian | 0 | 213 | 0 | 5 | 943 | 0.000 | 0.285 | NA | 0.977 | 0.977 | 0.742 | 0.812 |
| Grantham | 0 | 0 | 0 | 0 | 1,161 | 0.000 | 0.000 | NA | NA | NA | 0.662 | 1.000 |
| Meta-predictor: | ||||||||||||
| IVP | 148 | 489 | 0 | 4 | 520 | **0.357** | 0.655 | **1.000** | **0.992** | **0.994** | **0.960** | **0.448** |
| REVEL | 90 | 475 | 4 | 20 | 572 | 0.217 | 0.636 | 0.957 | 0.960 | 0.959 | 0.942 | 0.493 |
| MetaSVM | 0 | 518 | 0 | 16 | 627 | 0.000 | **0.693** | NA | 0.970 | 0.970 | 0.940 | 0.540 |
| Eigen | 22 | 497 | 0 | 4 | 638 | 0.053 | 0.665 | **1.000** | **0.992** | 0.992 | 0.932 | 0.550 |
| Eigen PC | 12 | 506 | 0 | 5 | 638 | 0.029 | 0.677 | **1.000** | 0.990 | 0.990 | 0.927 | 0.550 |
| CADD | 61 | 330 | 17 | 3 | 750 | 0.147 | 0.442 | 0.782 | 0.991 | 0.951 | 0.906 | 0.646 |
| MutationTaster | 0 | 517 | 0 | 13 | 631 | 0.000 | 0.692 | NA | 0.975 | 0.975 | 0.894 | 0.543 |
a. The in silico variant prediction analyses used gene-specific prediction models built from each of 16 standalone predictors and 7 meta-predictors; the IVP predictor was derived from the 16 standalone predictors. Each analysis was evaluated on the MGPT data of 1,161 missense variants (747 negatives, 414 positives) using LOOCV. Results are listed in descending order of AUC, separately for standalone predictors and meta-predictors.
b. Predicted outcomes were derived from the predicted positive/negative categories and the known ClinVar consensus classes. TN = true negative, TP = true positive, FN = false negative, FP = false positive.
c. Performance statistics are reported as Sen = sensitivity, Spe = specificity, PPV = positive predictive value, NPV = negative predictive value, Acc = accuracy, AUC = area under the receiver operating characteristic curve, and PVUS = proportion of variants classified as VUS. NA = not able to calculate. The best performance statistics among the compared in silico prediction methods are highlighted in bold.
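The performance statistics can be recomputed from the outcome counts, which is a useful check when reading the table. The sketch below reflects denominators inferred from the tabulated numbers rather than stated by the authors: sensitivity and specificity appear to be taken over all known positives and negatives (so VUS calls lower them), while PPV, NPV, and accuracy appear to be taken over non-VUS calls only. The function name `performance` and its signature are illustrative.

```python
from typing import Dict, Optional


def performance(tp: int, tn: int, fp: int, fn: int, vus: int,
                n_pos: int, n_neg: int) -> Dict[str, Optional[float]]:
    """Recompute the table's statistics from outcome counts (3 decimals).

    Returns None where a ratio has a zero denominator (the table's "NA").
    """
    def ratio(num: int, den: int) -> Optional[float]:
        return round(num / den, 3) if den else None

    called = tp + tn + fp + fn  # variants given a non-VUS prediction
    return {
        "Sen": ratio(tp, n_pos),         # over ALL known positives (inferred)
        "Spe": ratio(tn, n_neg),         # over ALL known negatives (inferred)
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "Acc": ratio(tp + tn, called),   # over non-VUS calls only (inferred)
        "PVUS": ratio(vus, called + vus),
    }


if __name__ == "__main__":
    # phyloP vertebrate row; 414 positives and 747 negatives per footnote a.
    print(performance(tp=75, tn=438, fp=6, fn=15, vus=627, n_pos=414, n_neg=747))
```

Applied to the phyloP vertebrate row of the table above, this reproduces the reported Sen 0.181, Spe 0.586, PPV 0.926, NPV 0.967, Acc 0.961, and PVUS 0.540.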
Fig 3. Comparison of AUC statistics of standalone and meta in silico predictors in MGPT data.
(A) AUC statistics of the top 10 standalone in silico predictors. (B) AUC statistics of the 7 meta in silico predictors. Models in each legend are listed in descending order of AUC. Abbreviations: AUC = area under the receiver operating characteristic curve; IVP = in silico variant prediction; MGPT = multigene panel test.
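For readers less familiar with the statistic, AUC equals the probability that a randomly chosen positive variant receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores are made-up illustration values, not data from the study, and the function name `auc` is hypothetical.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    pathogenic_scores = [0.9, 0.8, 0.4]    # illustrative values only
    benign_scores = [0.7, 0.3, 0.2, 0.1]   # illustrative values only
    print(auc(pathogenic_scores, benign_scores))  # 11 of 12 pairs correct
```

The pairwise form is quadratic in the number of variants, which is fine for a dataset of about a thousand variants; a sort-based implementation would be preferable at larger scale.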
Table 2. Multifactorial variant prediction in MGPT data.
| Method and Data (n-, n+) | No. by Predicted Outcomes | | | | | Performance Statistics | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | TP | TN | FP | FN | VUS | Sen | Spe | PPV | NPV | Acc | AUC | PVUS |
| Using all evidence data: | ||||||||||||
| Any evidence data (686, 330) | 263 | 519 | 2 | 0 | 232 | 0.797 | 0.757 | 0.992 | 1.000 | 0.997 | 0.998 | 0.228 |
| LR < 0.1 or > 10 (372, 226) | 210 | 358 | 2 | 0 | 28 | 0.929 | 0.962 | 0.991 | 1.000 | 0.996 | 0.999 | 0.047 |
| LR < 0.01 or > 100 (223, 74) | 74 | 223 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 |
| LR < 0.001 or > 1,000 (155, 23) | 23 | 155 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 |
| Using only auto-computed evidence: | ||||||||||||
| Any evidence data (618, 255) | 95 | 287 | 2 | 0 | 489 | 0.373 | 0.464 | 0.979 | 1.000 | 0.995 | 0.986 | 0.560 |
| LR < 0.1 or > 10 (225, 35) | 31 | 212 | 2 | 0 | 15 | 0.886 | 0.942 | 0.939 | 1.000 | 0.992 | 0.996 | 0.058 |
| LR < 0.01 or > 100 (174, 8) | 8 | 174 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 |
| LR < 0.001 or > 1,000 (107, 0) | 0 | 107 | 0 | 0 | 0 | NA | 1.000 | NA | 1.000 | 1.000 | NA | 0.000 |
a. Multifactorial variant predictions by MVP model analyses were conducted using either all available evidence data, which include all quantitative and qualitative evidence predictors (total variants = 1,016), or only the 3 auto-computed predictors from readily available databases: family history, co-occurrence, and mutation hotspot indicator (total variants = 873). The prior model for the MVP analysis was constructed for each variant using LOOCV. The n- and n+ values are the numbers of negative and positive variants, respectively, in the analytical dataset. LR = total LR from all evidence statistics or from the auto-computed ones only. Abbreviations for predicted outcomes and performance statistics are the same as in footnotes b and c of Table 1.
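The role of the total LR can be illustrated with the odds form of Bayes' rule, in which posterior odds equal prior odds multiplied by the likelihood ratio. The sketch below is a point-estimate simplification under stated assumptions: it treats the prior as a single probability (the framework described here works with a prior distribution and a credible interval), and it assumes the total LR is the product of per-evidence LRs, which holds only if the evidence items are conditionally independent. The function name and example numbers are hypothetical.

```python
import math
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a prior probability of pathogenicity with evidence LRs.

    Assumes independent evidence, so the total LR is the product of the
    individual LRs. LR > 1 favors pathogenic; LR < 1 favors benign.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    total_lr = math.prod(likelihood_ratios)  # empty input gives 1.0 (no evidence)
    posterior_odds = prior / (1.0 - prior) * total_lr
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical variant: in silico prior of 0.5, two evidence LRs of 4 and 5.
    print(round(posterior_probability(0.5, [4.0, 5.0]), 3))  # total LR = 20
```

This also shows why the table's subsets defined by LR < 0.01 or > 100 contain no VUS: evidence that strong moves almost any moderate prior close to 0 or 1.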