Leilei Wu, Qinfang Deng, Ze Xu, Songwen Zhou, Chao Li, Yi-Xue Li.
Abstract
BACKGROUND: Hybrid capture-based next-generation sequencing of DNA has been widely applied in the detection of circulating tumor DNA (ctDNA). Various methods have been proposed for ctDNA detection, but variants with a low allelic fraction (AF) remain a major challenge. In addition, no panel-wide calling algorithm is available, which hinders the full use of ctDNA-based 'liquid biopsy'. Thus, we developed VBCALAVD (Virtual Barcode-based Calling Algorithm for Low Allelic Variant Detection) in silico to overcome these limitations.
Keywords: Low-AF SNV; Panel-wide calling algorithm; Stereotypical noise; Stochastic noise; Virtual barcode; ctDNA
Year: 2020 PMID: 32245364 PMCID: PMC7118954 DOI: 10.1186/s12859-020-3412-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flowchart of the virtual barcode-based calling algorithm for low allelic variant detection
Fig. 2 Comparison between the virtual tag and real tag using 3 Oncosmart2 UMI samples. a Mean family numbers and corresponding SD from 10 samples (20,000 random genomic positions per sample) obtained using the UMI alone (blue bar), the real tag (orange bar) and the virtual tag (red bar). b Significant linear relationship between virtual family numbers and real family numbers for 20,000 genomic positions (R² = 1.0). c Recovery rate distribution for real family numbers at 20,000 genomic positions. d Percentage of wrongly clustered virtual families calculated by different assigned numbers of real families. e Comparison between f = 1.0 virtual family numbers (orange bar) and f = 1.0 real family numbers (blue bar) among six positive sites in one UMI sample with an AF level of 0.1%. f Mean fraction and corresponding SD of the panel-wide error-free positions before (green bar) and after application of the real tag (blue bar) and the virtual tag (red bar) in 3 UMI samples
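The family counts compared in Fig. 2 can be illustrated with a minimal sketch. This record does not define how the virtual tag is built, so the sketch assumes that a tag is the fragment's mapped coordinates plus strand, and that an "f = 1.0" family is one in which every read carries the variant allele. The `Read` type, the function names and the toy reads are all hypothetical and are not taken from the published algorithm.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # leftmost mapped position of the fragment
    end: int     # rightmost mapped position of the fragment
    strand: str
    base: str    # base observed at the site of interest


def group_by_virtual_tag(reads):
    """Cluster reads into families keyed by fragment coordinates.

    ASSUMPTION: the (chrom, start, end, strand) tuple stands in for the
    virtual barcode; the real tag construction may differ.
    """
    families = defaultdict(list)
    for r in reads:
        families[(r.chrom, r.start, r.end, r.strand)].append(r.base)
    return families


def count_unanimous_families(families, allele):
    """Count families in which every read carries `allele`.

    ASSUMPTION: this is our reading of an "f = 1.0" family.
    """
    return sum(
        1 for bases in families.values() if all(b == allele for b in bases)
    )


# Toy reads, invented purely for illustration.
reads = [
    Read("chr7", 100, 266, "+", "T"),
    Read("chr7", 100, 266, "+", "T"),
    Read("chr7", 100, 266, "+", "T"),
    Read("chr7", 104, 270, "+", "T"),
    Read("chr7", 104, 270, "+", "C"),
    Read("chr7", 110, 281, "-", "C"),
]

families = group_by_virtual_tag(reads)
n_families = len(families)
n_f1_variant = count_unanimous_families(families, "T")
print(n_families, n_f1_variant)
```

In this toy input the six reads collapse into three families, and only the first family is unanimous for the variant base `T`. The mixed family is not counted, which is how family-level consensus can suppress isolated sequencing errors.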
Fig. 3 Noise profile among the 30 background samples (BGs) before and after application of the virtual barcode. a Panel-wide error position percentage in every BG before and after application of the virtual barcode (Oncosmart2 BGs: blue square to red square; Oncosmart3 BGs: gradient blue to gradient yellow). b Numbers of non-reference alleles at six positive sites among the top 7 high-sequence-depth controls. c Numbers of the variant f = 1.0 virtual family at six positive sites among the top 7 high-sequence-depth controls. d Significant linear relationship between the panel-wide mean depth and the panel-wide error position percentage among 30 BGs (green dots: 16 RSDs; red dots: 14 controls; R² = 85.56%). e Relationship between the fraction of error positions with f = 1.0 virtual family and the panel-wide mean depth among 30 BGs after application of the virtual barcode (16 RSDs, green dots; 14 controls, red dots). A significant linear relationship was observed in the 16 RSDs (R² = 80.82%). f Boxplot of family degree for 11 Oncosmart2 RSDs, 5 Oncosmart3 RSDs, 14 Oncosmart2 controls, HWT samples and 2 tumor samples. Compared with controls, the Oncosmart2 and Oncosmart3 RSDs had significantly higher family degrees; *** means P < 0.001. g Boxplot of panel-wide median family size between 14 controls and 16 RSDs; *** means P < 0.001. h Significantly higher error percentage in the 16 high-template RSDs (blue line) than in the 14 low-template controls (red line)
Fig. 4 Stereotypical noise characteristics and effectiveness of fine-tuning filters. a Stereotypical site numbers from 14 Oncosmart2 controls and 11 Oncosmart2 RSDs: 121 shared sites among the 14 Oncosmart2 controls and 11 Oncosmart2 RSDs (brown region), nine sites only from the controls (red region), and 135 sites only from the RSDs (green region). b Significant linear relationship between the incidence rate in 25 Oncosmart2 BGs and the incidence rate in 529 Oncosmart2 cfDNA samples among 121 shared polishing sites (R² = 67.8%). c Percentages of 12 substitution types among 265 polishing sites (blue bar) and 121 shared sites (red bar). d Fraction of positions that completely consisted of false families (orange bar) at stochastic f = 1.0 sites in every Oncosmart2 BG sample. e Direct correlation between family degree and panel-wide singleton ratio among all samples (dashed line represents 2.0). f Significant linear relationship (R² = 96%) between panel-wide singleton ratio and mean variant singleton ratio from high-AF sites (AF ≥ 0.05) among 30 BGs. g Effectiveness of the sample-level strategy to remove variant singleton ratio outliers at the FDR < 0.01 level for all samples; blue bar: filtered numbers; orange bar: the corresponding mean variant singleton ratio. h ROC curve based on the optimal template feature (updated f = 1.0 virtual family numbers plus qualified variant singletons) at every AF level under a theoretical confidence level ranging from 80 to 99.5%
Information on the best distribution among 265 polishing sites
| Distributions | Best numbers | Percentage (%) | Mean sample size | Sample size range |
|---|---|---|---|---|
| Dweibull | 11 | 4.15 | 21.181818 | 8~95 |
| Lognorm | 18 | 6.79 | 108.052632 | 19~354 |
| Alpha | 19 | 7.17 | 104.157895 | 8~475 |
| Exponnorm | 24 | 9.06 | 122.791667 | 21~545 |
| Weibull_min | 25 | 9.43 | 89.16 | 7~550 |
| Nct | 27 | 10.19 | 173.962963 | 9~514 |
| Gamma | 33 | 12.45 | 139.757576 | 8~525 |
| Beta | 39 | 14.72 | 139.794872 | 6~479 |
| Johnsonsu | 69 | 26.04 | 109.115942 | 8~529 |
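As a quick consistency check on the table above, the Percentage column can be recomputed from the Best numbers column. The sketch below assumes that each percentage is the count divided by the column total and rounded to two decimals; the record does not state this explicitly. The distribution labels resemble distribution names in `scipy.stats`, although the record does not say which library was used for fitting.

```python
# "Best numbers" column, copied from the table above.
best_counts = {
    "Dweibull": 11,
    "Lognorm": 18,
    "Alpha": 19,
    "Exponnorm": 24,
    "Weibull_min": 25,
    "Nct": 27,
    "Gamma": 33,
    "Beta": 39,
    "Johnsonsu": 69,
}

# The counts should add up to the 265 polishing sites in the caption.
total = sum(best_counts.values())

# ASSUMPTION: percentage = count / total * 100, rounded to two decimals.
percentages = {
    name: round(100 * n / total, 2) for name, n in best_counts.items()
}

for name, pct in percentages.items():
    print(f"{name:12s} {best_counts[name]:3d} {pct:6.2f}")
```

Under that assumption the counts sum to 265 and every recomputed percentage agrees with the tabulated value, for example 69 / 265 gives 26.04% for Johnsonsu.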
Fig. 5 Systematic evaluation of the effectiveness of all filters. a, b Fraction of panel-wide error-free positions in the 14 Oncosmart2 controls and 11 Oncosmart2 RSDs obtained with each filter. c Numbers of false-positive sites retained among 11 Oncosmart2 RSDs. d–h Panel-wide sensitivity and PPVs obtained with our algorithm (red circles) and five published calling algorithms using Oncosmart2 RSDs with AF values ranging from 0.1 to 5%