Jeffrey N Dudley, Celine S Hong, Marwan A Hawari, Jasmine Shwetar, Julie C Sapp, Justin Lack, Henoke Shiferaw, Jennifer J Johnston, Leslie G Biesecker.
Abstract
BACKGROUND: The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging.
Keywords: Mosaic variants; Prediction of mosaic variants; Somatic overgrowth disorder
Year: 2021 PMID: 33832433 PMCID: PMC8028235 DOI: 10.1186/s12859-021-04090-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Overview of approach. a Model building from a control dataset. Nucleotide counts from each BAM file in the control dataset were used to build the model. Nucleotides (d) are summed at each position (n) and aggregated across all BAM files in the control dataset (N). Position-Based Variant Identification (PBVI) splits the counts into two separate matrices depending on whether the count is on the forward (f) or reverse (r) strand. b Variant calling overview and workflow
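The aggregation described in panel a can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_model`, the input layout (pre-extracted `(position, nucleotide, strand)` tuples rather than BAM records), and the toy data are all assumptions made for illustration only.

```python
NUCLEOTIDES = "ACGT"  # column order of each count matrix (an assumption)


def empty_matrix(n_positions):
    """Return an n_positions x 4 matrix of zero counts."""
    return [[0] * len(NUCLEOTIDES) for _ in range(n_positions)]


def build_model(samples, n_positions):
    """Aggregate base observations from all control samples.

    samples: iterable of samples; each sample is a list of hypothetical
    (position, nucleotide, strand) tuples, with strand "f" or "r".
    Returns one count matrix per strand, summed over every sample.
    """
    model = {"f": empty_matrix(n_positions), "r": empty_matrix(n_positions)}
    for observations in samples:
        for position, nucleotide, strand in observations:
            column = NUCLEOTIDES.index(nucleotide)
            model[strand][position][column] += 1
    return model


# Toy control dataset: two samples covering two positions.
control = [
    [(0, "C", "f"), (0, "C", "r"), (1, "A", "f")],
    [(0, "C", "f"), (0, "T", "f"), (1, "A", "r")],
]
model = build_model(control, n_positions=2)
```

Keeping forward and reverse counts in separate matrices means a nucleotide that appears on only one strand stays visible in the model instead of being averaged away.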
Sensitivity and proxy positive predictive value (pPPV) summarized for each caller at varying depths. The pPPV values are shown in parentheses
| Depth | Simulation 1 | Simulation 2 | Simulation 3 | Mean | SE |
|---|---|---|---|---|---|
| 150× | 0.46 (0.90) | 0.44 (0.88) | 0.45 (0.88) | 0.45 (0.89) | 0.01 (0.01) |
| 300× | 0.71 (0.92) | 0.72 (0.92) | 0.70 (0.93) | 0.71 (0.92) | 0.01 (0.00) |
| 600× | 0.87 (0.91) | 0.85 (0.88) | 0.89 (0.92) | 0.87 (0.90) | 0.01 (0.01) |
| 1200× | 0.96 (0.88) | 0.96 (0.80) | 0.98 (0.78) | 0.97 (0.82) | 0.01 (0.03) |
| 150× | 0.55 (0.82) | 0.36 (0.76) | 0.31 (0.79) | 0.41 (0.79) | 0.07 (0.02) |
| 300× | 0.74 (0.87) | 0.55 (0.83) | 0.46 (0.85) | 0.58 (0.85) | 0.08 (0.01) |
| 600× | 0.84 (0.89) | 0.71 (0.86) | 0.60 (0.88) | 0.71 (0.87) | 0.07 (0.01) |
| 1200× | 0.93 (0.87) | 0.77 (0.87) | 0.67 (0.89) | 0.79 (0.87) | 0.08 (0.01) |
| 150× | 0.89 (0.45) | 0.82 (0.40) | 0.84 (0.39) | 0.84 (0.42) | 0.02 (0.02) |
| 300× | 0.90 (0.77) | 0.86 (0.71) | 0.86 (0.73) | 0.87 (0.74) | 0.01 (0.02) |
| 600× | 0.87 (0.87) | 0.85 (0.83) | 0.83 (0.87) | 0.85 (0.86) | 0.01 (0.01) |
| 1200× | 0.87 (0.87) | 0.85 (0.83) | 0.83 (0.87) | 0.85 (0.86) | 0.01 (0.01) |
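The Mean and SE columns can be checked against the three simulation values. The sketch below assumes that SE is the sample standard deviation divided by the square root of the number of simulations; the table does not state this, but the assumption reproduces the tabulated values for the first 150× row. The helper name `mean_and_se` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Return the mean and standard error (sample SD / sqrt(n)) of values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Sensitivity and pPPV from the first 150x row of the table.
sens_mean, sens_se = mean_and_se([0.46, 0.44, 0.45])
ppv_mean, ppv_se = mean_and_se([0.90, 0.88, 0.88])

print(round(sens_mean, 2), round(sens_se, 2))
print(round(ppv_mean, 2), round(ppv_se, 2))
```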
Fig. 2 Sensitivity comparison at different depths. a The average sensitivity and standard error are plotted for each depth tested. b The average total number of variant calls and standard error are plotted. c The proxy positive predictive value (pPPV) and standard error are plotted for each depth tested
Fig. 3 Sensitivity plot for different depths (DP) and variant allele fractions (VAFs). a–d Different depths tested. DP150, DP300, DP600, and DP1200 indicate sensitivity plots for 150×, 300×, 600×, and 1200× depths, respectively. The VAF range is given as lower bound (inclusive) to upper bound (exclusive)
Fig. 4 Simulation of model variables. a Classification performance on a simulated dataset of low-level variants as a function of model size. b Minimum variant allele fraction (VAF) as a function of the alternate nucleotides present in the model. The line color represents depth in the test sample
Fig. 5 Minimum variant allele fraction (VAF) callable by Position-Based Variant Identification with the OverGrowth model (PBVI-OG) at positions in PIK3CA where the reference nucleotide is cytosine, at 600× test sample depth. a The minimum callable VAF was calculated for each alternate nucleotide at every position where the reference nucleotide is cytosine and plotted for the model: blue (adenine), green (guanine), yellow (thymine). b Boxplot of the minimum VAF detected for all nucleotide changes