Dimitrios Kleftogiannis, Marco Punta, Anuradha Jayaram, Shahneen Sandhu, Stephen Q Wong, Delila Gasi Tandefelt, Vincenza Conteduca, Daniel Wetterskog, Gerhardt Attard, Stefano Lise.
Abstract
BACKGROUND: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs).
Keywords: Cancer genomics; Deep sequencing; Error correction; Ion torrent; Liquid biopsies; Next generation sequencing (NGS); Targeted sequencing; Variant calling
Year: 2019 PMID: 31375105 PMCID: PMC6679440 DOI: 10.1186/s12920-019-0557-9
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Graphical representation of AmpliSolve’s workflow for estimating the noise levels and detecting SNVs. The workflow comprises the following steps: (a) Screening the available normal samples to identify reads supporting alleles other than the reference. (b) Error estimation per position, per nucleotide and per strand for all positions in the gene panel, based on the distribution of alternative allele counts in (a); only alternative counts corresponding to VAF < 5% are taken into consideration. (c) For each genomic position in a tumor sample, the method identifies the total coverage of the position and the number of reads supporting the alternative alleles, if any. (d) Given the information from steps (b) and (c), the method applies a Poisson distribution-based model to compute the p-value that the variant (red line) is real. This p-value is then transformed to a quality score that is used by AmpliSolve together with additional quality criteria to identify SNVs.
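Steps (b) to (d) of the workflow can be illustrated with a minimal Python sketch. It is not AmpliSolve's actual implementation: the function names, the way the pseudo-count enters the error estimate, the choice to skip a normal sample entirely when its VAF is at or above the cutoff, and the Phred-style transform Q = -10·log10(p) with a cap are all assumptions made for illustration.

```python
import math

def estimate_error_rate(alt_counts, depths, pseudo_count, vaf_cutoff=0.05):
    """Pooled error rate for one position/nucleotide/strand across normal samples.

    Assumption: samples whose alternative-allele VAF is >= vaf_cutoff are skipped
    (treated as likely germline), and the pseudo-count is added to the pooled
    alternative count. The source does not spell out either detail.
    """
    alt_total = 0
    depth_total = 0
    for alt, depth in zip(alt_counts, depths):
        if depth == 0 or alt / depth >= vaf_cutoff:
            continue
        alt_total += alt
        depth_total += depth
    if depth_total == 0:
        raise ValueError("no usable normal samples at this position")
    return (alt_total + pseudo_count) / depth_total

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with k a non-negative integer."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i):
        # Evaluate each term in log space to avoid overflow in lam**i / i!.
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # At or below the mean: subtract the short lower tail from 1.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Above the mean the terms shrink monotonically; sum until negligible.
    total = 0.0
    i = k
    while True:
        term = pmf(i)
        total += term
        if term <= total * 1e-17:
            break
        i += 1
    return min(total, 1.0)

def quality_score(p_value, cap=100.0):
    """Phred-style score; the cap for p == 0 is an arbitrary illustrative choice."""
    if p_value <= 0.0:
        return cap
    return max(0.0, min(cap, -10.0 * math.log10(p_value)))

# Hypothetical numbers: three normals, then a tumor position with
# 30 alternative reads out of 6000 on one strand.
rate = estimate_error_rate([2, 0, 3], [4000, 3500, 5000], pseudo_count=1)
p = poisson_upper_tail(30, rate * 6000)
print(rate, p, quality_score(p))
```

Under this transform, a Q-score threshold of 20 corresponds to p = 0.01. The upper tail is summed directly above the mean so that very small p-values are not lost to cancellation in 1 - CDF.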
Fig. 2 Assessing AmpliSolve’s performance using normal samples. (a) Median AmpliSolve FDR (%) as a function of the model pseudo-count parameter, when using different numbers M of normal samples as training set and testing on the remaining normal samples. We consider as TP all normal variants with VAF ≥ 20% and as FP all normal variants with VAF < 20% (see Text). We consider all AmpliSolve calls that have Q-score ≥ 20. (b) Same as (a) when considering only AmpliSolve calls with a ‘PASS’ quality flag (see Text).
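The FDR definition used in this figure can be sketched in a few lines of Python. The record layout (dictionaries with `vaf` and `q` keys) and the function name are hypothetical; only the VAF ≥ 20% and Q-score ≥ 20 thresholds come from the caption.

```python
def normal_sample_fdr(calls, min_q=20.0, germline_vaf=0.20):
    """FDR among calls made in a normal sample.

    Calls with Q-score >= min_q are kept. A kept call counts as a true
    positive if its VAF is >= germline_vaf (presumed germline variant)
    and as a false positive otherwise.
    """
    kept = [c for c in calls if c["q"] >= min_q]
    if not kept:
        return None  # FDR is undefined when nothing is called
    false_positives = sum(1 for c in kept if c["vaf"] < germline_vaf)
    return false_positives / len(kept)

# Hypothetical calls from one held-out normal sample.
calls = [
    {"vaf": 0.50, "q": 60.0},  # germline-like, kept: TP
    {"vaf": 0.02, "q": 25.0},  # low VAF, kept: FP
    {"vaf": 0.01, "q": 10.0},  # below the Q-score threshold, ignored
    {"vaf": 0.48, "q": 90.0},  # germline-like, kept: TP
]
print(normal_sample_fdr(calls))
```

Repeating this over the held-out normals and taking the median would give one point on a curve like the one in panel (a), assuming the calls were produced with a given pseudo-count and training-set size.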
Fig. 3 Assessing AmpliSolve’s sensitivity using synthetic data. (a-e) AmpliSolve TPR (Sensitivity) values in in-silico synthetic variant experiments. We test different combinations of VAF, depth of coverage and C parameter values (see Text).
Fig. 4 Benchmarking AmpliSolve calls with Illumina WGS calls. (a) Venn diagram of mutations in 10 samples sequenced with both Ion Torrent and Illumina platforms and called by AmpliSolve and by MutPlat, respectively. Low coverage positions denote mutations excluded by AmpliSolve because they are poorly covered (< 100 reads on at least one strand, ‘uncallable’ by AmpliSolve). (b) Scatter plot of VAFs in WGS and AmpliSeq data. Note that all the SNVs not called by AmpliSolve (green points) have some support in the data and are reported in its output (hence they have AF > 0) but are filtered out, mostly because of strand bias. (c) Same as (b) but for VAFs < 20%. Note that some concordant calls (purple points) have WGS AF = 0; these are real germline variants with no support in the tumor (Methods). For the sake of this comparison, both in (b) and in (c) we don’t consider the 49 mutations at positions of low coverage in Ion AmpliSeq data (see (a)), which are ‘uncallable’ for AmpliSolve.
Comparison between AmpliSolve and SiNVICT calls across the targeted panel. MutPlat calls on Illumina WGS data have been used as ground truth. SiNVICT levels correspond to confidence levels in the calls (6 being the highest). TP = True Positives, FP = False Positives, FN = False Negatives, TPR = True Positive Rate (Sensitivity), PPV = Positive Predictive Value (Precision), F1 = Harmonic mean of Precision and Sensitivity
| Method | Level | TP | FP | FN | TPR | PPV | F1 |
|---|---|---|---|---|---|---|---|
| AmpliSolve | | 525 | 31 | 78 | 87% | 94% | 90% |
| SiNVICT | Level 1 | 591 | 156 | 12 | 98% | 79% | 88% |
| | Level 2 | 587 | 154 | 16 | 97% | 79% | 87% |
| | Level 3 | 575 | 34 | 28 | 95% | 94% | 95% |
| | Level 4 | 575 | 34 | 28 | 95% | 94% | 95% |
| | Level 5 | 141 | 12 | 457 | 24% | 92% | 38% |
| | Level 6 | 104 | 3 | 494 | 17% | 97% | 29% |
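The summary columns follow directly from the TP, FP and FN counts, as defined in the table caption. A short Python sketch (the function name is illustrative) makes the arithmetic explicit:

```python
def classification_metrics(tp, fp, fn):
    """Return (TPR, PPV, F1) as fractions from raw confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")  # sensitivity
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")  # precision
    if tpr + ppv > 0:
        f1 = 2 * tpr * ppv / (tpr + ppv)  # harmonic mean of PPV and TPR
    else:
        f1 = float("nan")
    return tpr, ppv, f1

# AmpliSolve row of the table above: TP = 525, FP = 31, FN = 78.
tpr, ppv, f1 = classification_metrics(525, 31, 78)
print(f"TPR={tpr:.3f} PPV={ppv:.3f} F1={f1:.3f}")
```

Values computed from the raw counts may differ by about one percentage point from the tabulated percentages, presumably because of how the published figures were rounded.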
Fig. 5 Validating AmpliSolve performance with ddPCR experiments. (a) Venn diagram of mutations in 96 samples at 3 positions as determined by AmpliSolve and ddPCR experiments. False positives refer to variants called by AmpliSolve and not detected by ddPCR, false negatives the opposite. In 256 out of 288 cases neither AmpliSolve nor ddPCR detects a mutation. (b) Scatter plot of the VAFs in the ddPCR and Ion Torrent data. Most of the SNVs missed by AmpliSolve (green points) have some support in the NGS data but cannot be distinguished from noise. Because of the log scale, we arbitrarily set AF = 10⁻⁴ for negative calls with AF = 0. Similarly, we set AF = 1 for ddPCR calls for which no allele frequency information is available.
Comparison of SNV calling on 96 samples at 3 genomic positions. The 3 positions on the AR gene were screened by ddPCR, used here as ground truth. SiNVICT Levels 5 and 6 and Levels 1, 2, 3 and 4 have been grouped as they give the same results. TP = True Positives, FP = False Positives, FN = False Negatives, TPR = True Positive Rate (Sensitivity), PPV = Positive Predictive Value (Precision), F1 = Harmonic mean of Precision and Sensitivity
| Method | Level | TP | FP | FN | TPR | PPV | F1 |
|---|---|---|---|---|---|---|---|
| AmpliSolve | | 19 | 2 | 11 | 63% | 90% | 74% |
| SiNVICT | Levels 1,2,3,4 | 17 | 0 | 13 | 57% | 100% | 73% |
| | Levels 5,6 | 9 | 0 | 21 | 30% | 100% | 46% |
| deepSNV | | 15 | 3 | 15 | 50% | 83% | 62% |