Nicola D Roberts, R Daniel Kortschak, Wendy T Parker, Andreas W Schreiber, Susan Branford, Hamish S Scott, Garique Glonek, David L Adelson.
Abstract
MOTIVATION: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity, and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer-normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer-normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm.
Year: 2013 PMID: 23842810 PMCID: PMC3753564 DOI: 10.1093/bioinformatics/btt375
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Frequency distribution of probability scores for somatic candidates in the raw output from the CML exome, with sites unique to each caller in a light shade and sites returned by multiple callers in a dark shade. Note that the gaps between the SomaticSniper and Strelka frequency peaks are an artefact of the phred scaling used by these tools.
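The gaps mentioned in the Fig. 1 caption follow from how phred-scaled scores map onto probabilities. The following is a minimal sketch, assuming the two tools report integer phred-scaled qualities Q that are converted to a probability score as 1 − 10^(−Q/10); the function name is illustrative and is not taken from either tool.

```python
def phred_to_probability(q: int) -> float:
    """Convert a phred-scaled quality Q to a probability, p = 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Integer phred scores can only produce a discrete set of probabilities,
# so a histogram on the probability scale has empty bins between them.
probabilities = [phred_to_probability(q) for q in range(0, 6)]

# Spacing between consecutive attainable probabilities (assumed integer Q).
gaps = [b - a for a, b in zip(probabilities, probabilities[1:])]
```

Under this assumption the attainable values are most widely spaced at low scores and crowd together as the probability approaches 1.0.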
Fig. 2. Probability scores of somatic candidates in common between pairs of algorithms for the CML exome (VS, VarScan; SS, SomaticSniper; JS, JointSNVMix; ST, Strelka). Pearson correlation coefficients between pairs are VS&SS 0.50, VS&JS 0.59, VS&ST 0.42, SS&JS 0.23, SS&ST 0.21 and JS&ST 0.46.
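A pairwise comparison of this kind can be sketched as follows: restrict two callers to the sites they both report, then compute the Pearson correlation of their scores at those sites. This is only an illustration with made-up toy data; the dictionary layout, site keys and function names are assumptions, not the authors' pipeline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def shared_site_correlation(scores_a, scores_b):
    """Correlate scores at the sites reported by both callers."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    return pearson([scores_a[s] for s in shared],
                   [scores_b[s] for s in shared])


# Toy data (hypothetical): (chromosome, position) -> probability score.
caller_a = {("chr1", 100): 0.90, ("chr1", 200): 0.95,
            ("chr2", 50): 0.99, ("chr3", 7): 0.80}
caller_b = {("chr1", 100): 0.80, ("chr1", 200): 0.85,
            ("chr2", 50): 0.98, ("chrX", 9): 0.70}
r = shared_site_correlation(caller_a, caller_b)
```

Sites reported by only one of the two callers do not enter the calculation, so each coefficient describes agreement on the shared candidates only.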
Pass rates (%) of candidate sites (somatic and LOH) through the strand bias filter and the combination of all other filters
| Algorithm name | Strand bias | All other filters |
|---|---|---|
| VarScan | 50.3 | 58.3 |
| SomaticSniper | 50.4 | 66.2 |
| JSM2 | 45.8 | 47.9 |
| Strelka | 70.9 | 89.9 |
The combined filters for variant base and mapping quality, nearby SNVs, spanning deletions and adjacent indels were applied after the removal of sites with 100% strand bias.
All differences are significant, except between VarScan and SomaticSniper with the strand bias filter.
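The strand bias step and its pass rate can be illustrated with a short sketch. It assumes that "100% strand bias" means every variant-supporting read at a site maps to the same strand; the exact definition is not given in this excerpt, and the function names and read counts below are hypothetical.

```python
def has_total_strand_bias(forward_variant_reads: int,
                          reverse_variant_reads: int) -> bool:
    """True if all variant reads come from one strand (assumed definition)."""
    if forward_variant_reads + reverse_variant_reads == 0:
        return False  # no variant reads at all: nothing to judge
    return forward_variant_reads == 0 or reverse_variant_reads == 0


def pass_rate(sites) -> float:
    """Percentage of sites that survive the strand bias filter."""
    if not sites:
        raise ValueError("no candidate sites supplied")
    kept = [s for s in sites if not has_total_strand_bias(*s)]
    return 100.0 * len(kept) / len(sites)


# Toy candidates (hypothetical): (forward, reverse) variant read counts.
toy_sites = [(5, 4), (7, 0), (0, 3), (2, 6)]
rate = pass_rate(toy_sites)
```

In the toy data two of the four sites have all variant reads on one strand, so half of the candidates pass.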
Fig. 3. Overlaps between somatic SNV candidate sets in the filtered output for the CML exome.
Fig. 4. Proportion of somatic sites found by multiple callers as the probability score threshold of each caller is increased to 1.0 and the number of candidate sites reduces.
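The threshold sweep described in the Fig. 4 caption can be sketched as below. The sketch assumes that only the caller under study is thresholded while the other callers' candidate sets are held fixed; the excerpt does not state this, and all names and scores are hypothetical toy values.

```python
def multi_caller_proportion(scores, caller, threshold):
    """Return (sites kept, proportion also reported by another caller)."""
    kept = {site for site, p in scores[caller].items() if p >= threshold}
    if not kept:
        return 0, None  # proportion is undefined with no sites left
    # Union of sites reported by every other caller (assumed unthresholded).
    others = set()
    for name, site_scores in scores.items():
        if name != caller:
            others.update(site_scores)
    return len(kept), len(kept & others) / len(kept)


# Toy data (hypothetical): caller -> {site id -> probability score}.
toy = {
    "caller_a": {"s1": 0.99, "s2": 0.60, "s3": 0.95, "s4": 0.50},
    "caller_b": {"s1": 0.90, "s3": 0.70, "s5": 0.80},
}
sweep = [multi_caller_proportion(toy, "caller_a", t) for t in (0.0, 0.9)]
```

In this toy example, raising the threshold from 0.0 to 0.9 halves the number of candidates and raises the shared proportion from 0.5 to 1.0, which is the kind of trade-off the figure plots for each caller.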
Fig. 5. Proportion of somatic candidates present in dbSNP as the probability score threshold of each caller is increased to 1.0 and the number of candidate sites reduces.
Fig. 6. The proportion of total depth contributed by the most common variant base in the cancer (smooth lines) and normal (jagged lines) for somatic sites uniquely returned by VarScan (red), SomaticSniper (green), JSM2 (orange) and Strelka (blue). The horizontal axis is the scaled index of each site after sorting by variant proportion in the cancer (a scaled index was chosen to allow comparisons across different sample sizes).