James M Holt, Melissa Kelly, Brett Sundlof, Ghunwa Nakouzi, David Bick, Elaine Lyon.
Abstract
PURPOSE: Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity, it also results in increased turnaround time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data to reduce the need for orthogonal testing.
Year: 2021 PMID: 33767343 PMCID: PMC8257489 DOI: 10.1038/s41436-021-01148-3
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
Summary of trained models for Dragen-based pipeline.
| Variant/genotype | Best model | CV capture rate (%) | Final capture rate (%) | CV TP flag rate (%) | Final TP flag rate (%) |
|---|---|---|---|---|---|
| SNV—heterozygous | GradientBoosting | 99.76 ± 0.18 | 99.58 | 12.78 ± 2.26 | 12.20 |
| SNV—homozygous | EasyEnsemble | 99.94 ± 0.14 | 99.75 | 17.25 ± 2.07 | 17.40 |
| SNV—complex heterozygous | — | — | — | — | — |
| Indel—heterozygous | GradientBoosting | 99.62 ± 0.26 | 99.68 | 43.11 ± 3.35 | 43.41 |
| Indel—homozygous | GradientBoosting | 99.78 ± 0.27 | 99.50 | 55.65 ± 4.16 | 55.16 |
| Indel—complex heterozygous | GradientBoosting | 99.86 ± 0.14 | 99.60 | 53.45 ± 5.65 | 54.22 |
For each variant–genotype combination, the table lists the best model under our criteria, the cross-validation (CV) mean and standard deviation of the capture rate and true positive (TP) flag rate, and the final-evaluation capture rate and TP flag rate.
SNV, single-nucleotide variant.
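The "mean and standard deviation" entries in the CV columns can be illustrated with a minimal sketch. The per-fold values below are hypothetical placeholders, not data from the study, and the use of the sample standard deviation is an assumption, since the record does not say which estimator was used.

```python
from statistics import mean, stdev

def summarize_cv(per_fold_rates):
    """Format per-fold rates (in %) as 'mean ± SD' with two decimals.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return f"{mean(per_fold_rates):.2f} ± {stdev(per_fold_rates):.2f}"

# Hypothetical per-fold capture rates (%), for illustration only.
hypothetical_folds = [99.5, 99.7, 99.9, 99.8, 99.6]
print(summarize_cv(hypothetical_folds))
```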
Summary of retrospective variant analysis.
| Variant/genotype | Confirmed true calls | False calls | False calls captured | True calls flagged | Model TP flag rate |
|---|---|---|---|---|---|
| SNV—heterozygous | 176 | 0 | — | 29 (16.48%) | 12.20% |
| SNV—homozygous | 34 | 0 | — | 1 (2.94%) | 17.40% |
| SNV—complex heterozygous | 0 | 0 | — | — | — |
| Indel—heterozygous | 20 | 2 | 2 (100.00%) | 5 (25.00%) | 43.41% |
| Indel—homozygous | 0 | 0 | — | — | 55.16% |
| Indel—complex heterozygous | 0 | 0 | — | — | 54.22% |
Here we report the total number of variants confirmed to be true positive (TP) or false positive calls, the number of false positive calls correctly identified (capture rate), and the number of true calls incorrectly labeled as false calls (TP flag rate). The model TP flag rate (i.e., expected TP flag rate) from the final evaluation is also provided here for comparison. Models used for this analysis were generated from the Dragen-based pipeline.
SNV, single-nucleotide variant.
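As a quick check of the definitions above, the following sketch recomputes the capture rate and TP flag rate from the counts in the retrospective table. The function and variable names are illustrative assumptions; only the counts come from the table.

```python
def percentage(numerator, denominator):
    """Return 100 * numerator / denominator rounded to 2 places, or None if undefined."""
    if denominator == 0:
        return None  # shown as a dash in the table
    return round(100.0 * numerator / denominator, 2)

# (confirmed true calls, false calls, false calls captured, true calls flagged)
retrospective = {
    "SNV-heterozygous": (176, 0, 0, 29),
    "SNV-homozygous": (34, 0, 0, 1),
    "Indel-heterozygous": (20, 2, 2, 5),
}

for label, (true_calls, false_calls, captured, flagged) in retrospective.items():
    capture_rate = percentage(captured, false_calls)  # false calls correctly identified
    tp_flag_rate = percentage(flagged, true_calls)    # true calls incorrectly flagged
    print(label, capture_rate, tp_flag_rate)
```

With no false calls among the SNVs, the capture rate is undefined for those rows, which is why only the heterozygous indel row reports one.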
Summary of prospective variant predictions.
| Variant/genotype | Predicted false positive calls | Predicted true positive calls | Orthogonal order reduction |
|---|---|---|---|
| SNV—heterozygous | 29 | 164 | 84.97% |
| SNV—homozygous | 1 | 34 | 97.14% |
| SNV—complex heterozygous | 0 | 0 | — |
| Indel—heterozygous | 6 | 18 | 75.00% |
| Indel—homozygous | 0 | 0 | — |
| Indel—complex heterozygous | 0 | 0 | — |
| Overall | 36 | 216 | 85.71% |
This table details the outcome of the use of the models in clinical cases. It shows the total number of variants that were predicted to be false positive or true positive in the clinical cases along with the percentage of variants that were not sent for orthogonal confirmation.
SNV, single-nucleotide variant.
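The orthogonal order reduction can be reproduced from the counts as the share of variants predicted true positive, which are therefore not sent for confirmation. This is a minimal sketch with illustrative names; the counts are taken from the table.

```python
def order_reduction(predicted_false, predicted_true):
    """Percentage of variants not sent for orthogonal confirmation."""
    total = predicted_false + predicted_true
    if total == 0:
        return None  # no variants of this type were observed
    return round(100.0 * predicted_true / total, 2)

# (predicted false positive calls, predicted true positive calls)
prospective = {
    "SNV-heterozygous": (29, 164),
    "SNV-homozygous": (1, 34),
    "Indel-heterozygous": (6, 18),
}

overall_false = sum(f for f, _ in prospective.values())
overall_true = sum(t for _, t in prospective.values())

for label, (pred_false, pred_true) in prospective.items():
    print(label, order_reduction(pred_false, pred_true))
print("Overall", overall_false, overall_true,
      order_reduction(overall_false, overall_true))
```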
Summary of clinical approaches.
| Category | All variants | GIAB benchmark only | Nonactionable only | GIAB benchmark + nonactionable only |
|---|---|---|---|---|
| Risk of reporting false positive | Higher | Lower | | Lower |
| Risk of adverse impact on patient care | Higher | Higher | | Lower |
| Not eligible—no passing model | 6 | 6 | | 6 |
| Not eligible—outside benchmark region AND actionable | N/A | N/A | | 4 |
| Not eligible—actionable | N/A | N/A | | 44 |
| Not eligible—outside benchmark region | N/A | 88 | | 84 |
| Eligible—predicted true | 240 | 164 | | 141 |
| Eligible—predicted false | 60 | 48 | | 27 |
| Confirmation order rate | 21.57% | 46.41% | | 53.92% |
This table summarizes the prospective results for all variants (n = 306) under different clinical approaches. The methods are organized from highest risk of reporting a false positive to lowest, where reporting actionable variants without confirmation is considered highest risk. Approaches that allow the models to be applied to any variant interpretation (specifically primary or actionable) have a higher risk for adverse impact on patient care. Approaches that allow for variants from any genomic region (specifically outside Genome in a Bottle [GIAB] benchmark regions) have a higher risk of reporting a false positive. Variants that are classified as “not eligible” either did not have a validated model or require a confirmation test due to the approach. Confirmation order rate is the percentage of variants that are either not eligible or predicted false, indicating that a confirmation test would be ordered for that variant prior to reporting. The results from our clinical approach (nonactionable only) are emphasized.
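The confirmation order rate defined above can be recomputed from the counts. The sketch below uses the three populated columns of the table, in left-to-right order, and treats N/A entries as zero. The dictionary keys are illustrative names, not terms from the source.

```python
# One dict per populated column of the table, left to right; N/A is entered as 0.
approaches = [
    {"no_model": 6, "outside_and_actionable": 0, "actionable": 0,
     "outside_benchmark": 0, "predicted_true": 240, "predicted_false": 60},
    {"no_model": 6, "outside_and_actionable": 0, "actionable": 0,
     "outside_benchmark": 88, "predicted_true": 164, "predicted_false": 48},
    {"no_model": 6, "outside_and_actionable": 4, "actionable": 44,
     "outside_benchmark": 84, "predicted_true": 141, "predicted_false": 27},
]

def confirmation_order_rate(counts):
    """Percentage of variants that are not eligible or predicted false."""
    total = sum(counts.values())
    ordered = total - counts["predicted_true"]  # everything except predicted-true calls
    return round(100.0 * ordered / total, 2)

totals = [sum(c.values()) for c in approaches]
rates = [confirmation_order_rate(c) for c in approaches]
print(totals, rates)
```

Each column sums to the 306 variants stated in the caption, which is a useful consistency check when reading the table.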