Karin S Kassahn, Oliver Holmes, Katia Nones, Ann-Marie Patch, David K Miller, Angelika N Christ, Ivon Harliwong, Timothy J Bruxner, Qinying Xu, Matthew Anderson, Scott Wood, Conrad Leonard, Darrin Taylor, Felicity Newell, Sarah Song, Senel Idrisoglu, Craig Nourse, Ehsan Nourbakhsh, Suzanne Manning, Shivangi Wani, Anita Steptoe, Marina Pajic, Mark J Cowley, Mark Pinese, David K Chang, Anthony J Gill, Amber L Johns, Jianmin Wu, Peter J Wilson, Lynn Fink, Andrew V Biankin, Nicola Waddell, Sean M Grimmond, John V Pearson.
Abstract
Somatic mutation calling from next-generation sequencing data remains a challenge due to the difficulties of distinguishing true somatic events from artifacts arising from PCR, sequencing errors or mis-mapping. Tumor cellularity or purity, sub-clonality and copy number changes also confound the identification of true somatic events against a background of germline variants. We have developed a heuristic strategy and software (http://www.qcmg.org/bioinformatics/qsnp/) for somatic mutation calling in samples with low tumor content and we show the superior sensitivity and precision of our approach using a previously sequenced cell line, a series of tumor/normal admixtures, and 3,253 putative somatic SNVs verified on an orthogonal platform.
Year: 2013 PMID: 24250782 PMCID: PMC3826759 DOI: 10.1371/journal.pone.0074380
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Variant calling software tools.
| Software | Tumor/normal joint analysis | Output germline variants | Indels | Statistical method | Reference |
| qSNP | X | X | | empirically determined set of heuristics optimized for sensitivity in low-purity tumors | present study |
| GATK | n/a | X | | Bayesian model for genotype likelihood; can take multiple samples into account for calibration | |
| Strelka | X | | X | Bayesian model of the tumor as a mixture of the normal sample with somatic variation | |
| SomaticSniper | X | X | | Bayesian comparison of genotype likelihoods based on the MAQ genotype model | |
| diBayes | n/a | | | Bayesian model for the presence of a non-reference allele (color-space data) | Applied Biosystems BioScope™ |
| VarScan2 | X | X | X | heuristics to determine genotypes and Fisher's exact test to examine read count differences; also outputs CNV regions for exome data | |
| SNVMix | n/a | | | probabilistic binomial mixture model accounting for tumor ploidy and purity | |
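VarScan2's entry above mentions Fisher's exact test on read-count differences. The sketch below is a generic two-sided Fisher's exact test on a 2×2 table of reference and variant read counts in the normal and tumor. It is not VarScan2's code, and the function names and example counts are hypothetical. In practice, `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb
from typing import List


def _pmf(x: int, row1: int, row2: int, col1: int) -> float:
    """Hypergeometric probability that the top-left cell equals x, with all margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher's exact test for [[normal_ref, normal_var], [tumor_ref, tumor_var]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = _pmf(a, row1, row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = _pmf(x, row1, row2, col1)
        # Sum every table that is no more likely than the observed one.
        if p <= p_obs * (1 + 1e-7):  # small relative tolerance for float ties
            total += p
    return min(total, 1.0)


# Hypothetical site: normal has 40 reference / 0 variant reads, tumor has 30 / 10.
print(fisher_exact_two_sided([[40, 0], [30, 10]]))
```

A small p-value indicates that the variant reads are concentrated in one sample, which is the read-count difference that Table 1 attributes to VarScan2.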
Classification of germline and somatic events.
| Normal genotype | Tumor genotype | Details | Classification |
| Hom | Het | Variant is reference allele; G/G>A/G | Germline |
| Hom | Het | Variant novel; A/A>A/G | Somatic |
| Het | Hom | Tumor allele same; A/G>G/G | Germline |
| Het | Hom | Tumor allele different; A/G>T/T | Somatic |
| Hom | Hom | Same; G/G>G/G | Germline |
| Hom | Hom | Different; A/A>G/G | Somatic |
| Het | Het | Same; A/G>A/G | Germline |
| Het | Het | Different; A/G>T/G | Somatic |
All examples assume ‘A’ as the reference allele, ‘G’ as the variant, and ‘Hom’ and ‘Het’ denote homozygous and heterozygous respectively.
Check coverage in the normal sample to exclude under-calling.
Could indicate loss of heterozygosity (LOH) in the tumor.
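All eight cases in the classification table fit one simple rule. This rule is our reading of the table and is not stated in the source: an event is somatic if the tumor carries a non-reference allele that is absent from the normal genotype; otherwise it is germline. The sketch below encodes that rule. The function name is hypothetical, and the reference allele defaults to ‘A’ as in the table note.

```python
from typing import List, Tuple


def classify(normal: str, tumor: str, ref: str = "A") -> str:
    """Classify a normal/tumor genotype pair written as e.g. "A/G".

    Assumed rule: somatic if the tumor has a non-reference allele not seen in the normal.
    """
    normal_alleles = set(normal.split("/"))
    tumor_alleles = set(tumor.split("/"))
    novel = tumor_alleles - normal_alleles - {ref}
    return "Somatic" if novel else "Germline"


# The eight rows of the classification table: (normal, tumor, expected class).
TABLE_ROWS: List[Tuple[str, str, str]] = [
    ("G/G", "A/G", "Germline"),
    ("A/A", "A/G", "Somatic"),
    ("A/G", "G/G", "Germline"),
    ("A/G", "T/T", "Somatic"),
    ("G/G", "G/G", "Germline"),
    ("A/A", "G/G", "Somatic"),
    ("A/G", "A/G", "Germline"),
    ("A/G", "T/G", "Somatic"),
]

for normal, tumor, expected in TABLE_ROWS:
    assert classify(normal, tumor) == expected, (normal, tumor)
```

Excluding the reference allele is what makes G/G>A/G germline. Without this exclusion, reappearance of the reference allele in the tumor would be misread as a novel somatic allele.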
Post-processing checks performed by qSNP.
| Annotation | Variant type | Description |
| PASS | Somatic, Germline | Passed all post-processing checks AND at least 5 mutant reads AND at least 4 novel starts (not considering read pair) |
| COVN12 | Somatic | Fewer than 12 reads of coverage in the matched normal sample |
| COVN8 | Germline | Fewer than 8 reads of coverage in the matched normal sample |
| SAN3 | Germline | Fewer than 3 reads of the same allele in the normal |
| COVT8 | Germline | Fewer than 8 reads of coverage in the tumor |
| SAT3 | Germline | Fewer than 3 reads of the same allele in the tumor |
| GERM | Somatic | Mutation is a germline variant in another patient |
| MIN | Somatic | Mutation also found in the pileup of the normal BAM |
| MIUN | Somatic | Mutation also found in the pileup of the unfiltered normal BAM |
| NNS | Somatic, Germline | Fewer than 4 novel starts (not considering read pair) |
| MR | Somatic, Germline | Fewer than 5 variant reads |
| MER | Somatic | Mutation same as the reference |
| SBIAS | Somatic | Strand bias (Illumina only) |
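The annotation logic in the table above can be sketched as a single function that collects every failing check and returns PASS only when none apply. The thresholds are taken from the table. The `SiteEvidence` container and its field names are hypothetical. The yes/no checks (GERM, MIN, MIUN, MER, SBIAS) are assumed to be computed upstream, as is the restriction of SBIAS to Illumina data. This is an illustration, not qSNP's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteEvidence:
    """Hypothetical per-site summary; field names are illustrative, not qSNP's."""
    somatic: bool               # True for a somatic candidate, False for a germline call
    normal_cov: int             # read depth in the matched normal
    tumor_cov: int              # read depth in the tumor
    normal_same_allele: int     # reads of the called allele in the normal
    tumor_same_allele: int      # reads of the called allele in the tumor
    variant_reads: int          # reads supporting the variant
    novel_starts: int           # distinct read starts, ignoring the mate
    germline_elsewhere: bool = False    # GERM
    in_normal: bool = False             # MIN
    in_unfiltered_normal: bool = False  # MIUN
    same_as_reference: bool = False     # MER
    strand_bias: bool = False           # SBIAS (assumed to be evaluated only for Illumina data)


def annotate(site: SiteEvidence) -> List[str]:
    """Return the applicable annotations, or ["PASS"] if the site fails no check."""
    flags: List[str] = []
    if site.somatic:
        if site.normal_cov < 12:
            flags.append("COVN12")
        if site.germline_elsewhere:
            flags.append("GERM")
        if site.in_normal:
            flags.append("MIN")
        if site.in_unfiltered_normal:
            flags.append("MIUN")
        if site.same_as_reference:
            flags.append("MER")
        if site.strand_bias:
            flags.append("SBIAS")
    else:
        if site.normal_cov < 8:
            flags.append("COVN8")
        if site.normal_same_allele < 3:
            flags.append("SAN3")
        if site.tumor_cov < 8:
            flags.append("COVT8")
        if site.tumor_same_allele < 3:
            flags.append("SAT3")
    # Checks shared by somatic and germline calls.
    if site.novel_starts < 4:
        flags.append("NNS")
    if site.variant_reads < 5:
        flags.append("MR")
    return flags or ["PASS"]


weak = SiteEvidence(somatic=True, normal_cov=10, tumor_cov=60,
                    normal_same_allele=0, tumor_same_allele=4,
                    variant_reads=4, novel_starts=4)
print(annotate(weak))  # ['COVN12', 'MR']
```

Returning every failing flag, rather than stopping at the first one, shows why a call was withheld. This matches the way the table lists independent annotations.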
Details of verification using amplicon-based sequencing on the Ion Torrent.
| Verification across 65 primary pancreatic adenocarcinomas with mean tumor purity 38% (range 6 to 83%) | |
| Total verified somatic (TP) | 717 |
| qSNP pass calls | |
| | 704 |
| | 28 |
| | 506 |
| Precision TP/(TP+FP) | 57% |
| Sensitivity TP/(TP+FN) | 98% |
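The two summary rows can be recomputed from the counts in the table. The labels of the three rows under "qSNP pass calls" did not survive extraction, so the sketch makes two assumptions. First, 704 is the number of qSNP pass calls that verified as somatic (TP). Second, 28 and 506 are both false-positive categories. Under these assumptions, FN is 717 − 704. The code reproduces the reported 57% and 98%.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN)."""
    return tp / (tp + fn)


total_verified = 717        # total verified somatic SNVs (TP) across 65 tumors
tp = 704                    # assumption: qSNP pass calls that verified as somatic
fp = 28 + 506               # assumption: both unlabeled rows are false positives
fn = total_verified - tp    # verified somatic SNVs without a qSNP pass call

print(f"precision   {precision(tp, fp):.0%}")    # precision   57%
print(f"sensitivity {sensitivity(tp, fn):.0%}")  # sensitivity 98%
```

If only one of the two unlabeled rows were counted as false positives, precision would not round to 57%. This is why both are treated as FP here.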
Figure 1. Non-independent reads confounding mutation calls.
Read pairs are colored by the chromosome map position of the second read in the pair. MarkDuplicates fails to correctly identify these non-independent read pairs as PCR duplicates due to the different map locations of the second read.
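The failure mode in Figure 1 can be illustrated by comparing two ways of counting independent evidence. The first is a pair-aware key that includes the mate's position, which is similar in spirit to pair-based duplicate marking but is not Picard's exact rule. The second is the "novel starts, not considering read pair" count used by the NNS check. The `ReadPair` record and the function names are hypothetical, and we assume a novel start is a distinct (start, strand) of the variant-supporting read.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for a pair whose first read carries the variant."""
    start: int        # alignment start of the variant-supporting read
    reverse: bool     # True if that read maps to the reverse strand
    mate_chrom: str   # chromosome of the second read
    mate_start: int   # alignment start of the second read


def pair_aware_keys(pairs: Iterable[ReadPair]) -> Set[Tuple[int, bool, str, int]]:
    """Simplified duplicate key that uses both ends of the pair."""
    return {(p.start, p.reverse, p.mate_chrom, p.mate_start) for p in pairs}


def novel_starts(pairs: Iterable[ReadPair]) -> int:
    """Distinct (start, strand) positions of the supporting reads, ignoring the mate."""
    return len({(p.start, p.reverse) for p in pairs})


# Four pairs share the first read's start but have mates on different chromosomes.
pairs = [
    ReadPair(1000, False, "chr2", 5_000),
    ReadPair(1000, False, "chr5", 12_000),
    ReadPair(1000, False, "chr7", 800),
    ReadPair(1000, False, "chrX", 33_000),
]
print(len(pair_aware_keys(pairs)))  # 4: looks like four independent pairs
print(novel_starts(pairs))          # 1: a single start site, below the NNS minimum of 4
```

Because the mates map to different locations, a pair-aware key treats these pairs as distinct. Counting starts while ignoring the mate collapses them to one observation.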
Controlled mixture experiment to assess the effect of reducing tumor purity on somatic mutation detection using the SOLiD v4 platform.
| Mixture (% tumor) | Cov. 80% | Mean cov. | qSNP VS | qSNP FP | qSNP U | GATK VS | GATK FP | GATK U |
| 100 | 17× | 62.16 | 84 | 17 | 2 | 50 | 7 | 1 |
| 80 | 19× | 72.13 | 73 | 5 | 10 | 49 | 1 | 2 |
| 60 | 18× | 67.49 | 66 | 6 | 6 | 45 | 0 | 4 |
| 40 | 19× | 67.67 | 57 | 1 | 8 | 38 | 0 | 3 |
| 20 | 23× | 81.96 | 35 | 3 | 2 | 15 | 0 | 1 |
| 10 | 22× | 79.35 | 13 | 5 | 5 | 0 | 0 | 1 |
| 20 | 49× | 161.11 | 48 | 5 | 6 | 18 | 0 | 8 |
| 10 | 47× | 152.11 | 15 | 4 | 5 | 0 | 0 | 8 |
Raw .vcf files were passed through the qSNP post-processing checks outlined in Table 3 to remove likely false positives, such as positions with evidence in the matched normal.
VS verified somatic; FP false positive; U untested.
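The loss of calls at 10% tumor in the table above is consistent with simple read-count arithmetic. This reasoning is our addition, under stated assumptions. Assume a clonal heterozygous SNV in a diploid, copy-neutral region, whose expected variant allele fraction is half the tumor fraction. Assume also that variant reads at a site are binomially distributed, with the mixture's mean coverage used as the site depth. The sketch then estimates the chance of reaching the 5-mutant-read minimum from Table 3. The function names are hypothetical.

```python
from math import comb


def expected_vaf(purity: float) -> float:
    """Assumed VAF of a clonal heterozygous SNV in a diploid region: purity / 2."""
    return purity / 2


def prob_at_least(k: int, depth: int, vaf: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf), the variant read count at one site."""
    below = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k))
    return 1.0 - below


# 10% tumor mixtures from the SOLiD table; mean coverage rounded to whole reads.
for depth in (79, 152):
    vaf = expected_vaf(0.10)
    print(f"depth {depth}: expected variant reads {depth * vaf:.2f}, "
          f"P(>= 5 reads) {prob_at_least(5, depth, vaf):.2f}")
```

At about 79× coverage, the expected variant read count is below 5, which is consistent with the few calls made at 10% tumor. Roughly doubling the depth raises the chance of meeting the threshold. Real sites deviate from these assumptions because of copy number, subclonality and uneven coverage.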
Controlled mixture experiment to assess the effect of reducing tumor purity on somatic mutation detection using the HiSeq2000 platform.
| Mixture (% tumor) | Cov. 80% | Mean cov. | qSNP VS | qSNP FP | qSNP U | GATK VS | GATK FP | GATK U | Strelka VS | Strelka FP | Strelka U |
| 100 | 26× | 61.43 | 82 | 1 | 72 | 80 | 1 | 72 | 77 | 1 | 66 |
| 80 | 19× | 43.05 | 77 | 0 | 60 | 76 | 0 | 57 | 75 | 2 | 57 |
| 60 | 17× | 40.57 | 65 | 1 | 45 | 62 | 1 | 39 | 60 | 2 | 44 |
| 40 | 18× | 43.36 | 60 | 0 | 45 | 55 | 0 | 30 | 56 | 1 | 45 |
| 20 | 22× | 51.83 | 47 | 0 | 22 | 37 | 0 | 14 | 48 | 1 | 26 |
The .vcf files were passed through the qSNP post-processing checks outlined in Table 3 to remove likely false positives, such as positions with evidence in the matched normal.
Calls from the ‘pass’ category.
VS verified somatic; FP false positive; U untested.
Figure 2. Overlap in somatic mutation calls.
Verified somatic mutation calls were compared across three callers in five tumor-purity mixtures. Values are the numbers of calls in the 100%, 80%, 60%, 40% and 20% tumor-content mixtures, from top to bottom.
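The overlap counts in a three-caller Venn diagram like Figure 2 come from assigning each site to the exact set of callers that reported it. The sketch below shows that bookkeeping. The call sets are toy data with hypothetical (chromosome, position) sites, and the function name is ours.

```python
from typing import Dict, FrozenSet, Hashable, Set


def overlap_regions(calls: Dict[str, Set[Hashable]]) -> Dict[FrozenSet[str], int]:
    """Count sites in each Venn region.

    Each key is the exact set of callers that reported a site, so the regions
    are disjoint and their counts sum to the size of the union of all calls.
    """
    regions: Dict[FrozenSet[str], int] = {}
    for site in set().union(*calls.values()):
        members = frozenset(name for name, sites in calls.items() if site in sites)
        regions[members] = regions.get(members, 0) + 1
    return regions


calls = {
    "qSNP": {("chr1", 100), ("chr1", 200), ("chr2", 300), ("chr3", 400)},
    "GATK": {("chr1", 200), ("chr2", 300)},
    "Strelka": {("chr2", 300), ("chr3", 400), ("chr4", 500)},
}
for members, n in sorted(overlap_regions(calls).items(), key=lambda kv: -len(kv[0])):
    print(sorted(members), n)
```

Keying on the full caller set, rather than on pairwise intersections, avoids double-counting sites shared by all three callers.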
Benchmarking qSNP on sequencing data from the SOLiD v4 and HiSeq 2000 platforms using COLO-829 variants verified by either WTSI (WTSI only, qSNP+WTSI) or QCMG (qSNP only).
| Caller | Details | SOLiD v4 VS | SOLiD v4 C | SOLiD v4 U | HiSeq 2000 VS | HiSeq 2000 C | HiSeq 2000 U | SOLiD v4 and HiSeq 2000 VS | SOLiD v4 and HiSeq 2000 C | SOLiD v4 and HiSeq 2000 U |
| | | 381 | 33 | 23,544 | 385 | 39 | 23,660 | 333 | 30 | 19,276 |
| | <12× coverage in normal | 18 | 5 | 1,329 | 0 | 0 | 104 | 0 | 0 | 26 |
| | mutation also in normal | 8 | 0 | 455 | 19 | 2 | 1,105 | 0 | 0 | 19 |
| | germline in another patient | 0 | 0 | 7 | 1 | 0 | 6 | 0 | 0 | 5 |
| | did not pass post-filters | 16 | 1 | 1,548 | 24 | 0 | 1,623 | 1 | 0 | 86 |
| | qSNP germline call | 0 | 0 | 24 | 0 | 0 | 63 | 0 | 0 | 10 |
| | no call - <3 reads evidence | 0 | 0 | 5,735 | 22 | 2 | 5,945 | 0 | 0 | 3,531 |
| | no call - other | 31 | 4 | 200 | 3 | 0 | 336 | 2 | 0 | 0 |
| | | 25 | 0 | 6,486 | 26 | 0 | 13,098 | 22 | 0 | 2,674 |
Minimum of 5 mutant reads and 4 novel starts, not considering read pair.
VS verified somatic; C cosmic; U untested.