
Mutascope: sensitive detection of somatic mutations from deep amplicon sequencing.

Shawn E Yost¹, Hakan Alakus, Hiroko Matsui, Richard B Schwab, Kristen Jepsen, Kelly A Frazer, Olivier Harismendy.

Abstract

SUMMARY: We present Mutascope, a sequencing analysis pipeline specifically developed for the identification of somatic variants present at low-allelic fraction from high-throughput sequencing of amplicons from matched tumor-normal specimens. Using datasets reproducing tumor genetic heterogeneity, we demonstrate that Mutascope has a higher sensitivity and generates fewer false-positive calls than tools designed for shotgun sequencing or diploid genomes.

AVAILABILITY: Freely available on the web at http://sourceforge.net/projects/mutascope/.

CONTACT: oharismendy@ucsd.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2013 PMID: 23712659 PMCID: PMC3712217 DOI: 10.1093/bioinformatics/btt305

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The accurate detection of somatic mutations in tumors is critical for precise diagnosis and selection of targeted therapies (Boyd, 2013), but the low-allelic fraction frequently encountered in heterogeneous or poor-cellularity clinical specimens makes this task challenging. In current clinical assays, amplicons covering the exons of 10–100 cancer genes are amplified via polymerase chain reaction-based or analogous approaches and sequenced at high depth to identify mutations present in <5% of a DNA sample (Harismendy et al.). Despite high coverage depth, the error rate resulting from systematic sequencing bias (Harismendy et al.) can hinder the detection of mutations. Although experimental (Hiatt et al.) or analytical (McKenna et al.) methods, or comparison with the normal DNA (Cibulskis et al.; Koboldt et al.), can mitigate this effect, most analysis strategies were developed for sequencing of random shotgun DNA fragments and thus do not take into account systematic errors specific to amplicon sequencing. In amplicon sequencing, loci are covered by reads with identical genomic starting positions, and because the error rate increases along the length of the read (Fig. 1a), the consensus error rate varies over the target (Fig. 1b). Analytical strategies specifically designed for amplicon sequencing have the potential to enhance the mutation detection accuracy of current clinical assays, especially at low-allelic fraction.

Fig. 1.

Mutascope principle and performance. (a) The sequencing error rate varies based on the read type (blue and red), position in the read (x-axis) or reference base sequenced (lines). (b) Paired reads (red and blue) from shotgun and amplicon sequencing distribute differently over the targeted region (gray box), resulting in different consensus error rates (right panel). (c–e) Comparison of 4–6 tools by ROC analysis showing the classification of mutations at low-allelic fraction (1–10%) in the MIX samples (c), after down-sampling reads to 50 or 10% of maximum coverage (d), or using 1 and 10% allelic fraction variants from TNS pairs (e). (f) Evolution of the true-positive rate and positive predictive value from the MIX sample low-allele frequency variants (1–10%) before (dotted line) and after (continuous line) application of high-confidence filters.

Here, we present Mutascope, a software tool dedicated to the detection of mutations at low-allelic fraction from amplicon sequencing of matched tumor-normal sample pairs. Mutascope determines the amplicon of origin for each read and measures the specific experimental error rate from sequencing the normal DNA. The mutations in the tumor are then identified by comparison with the error rate using a binomial test and classified as germ line or somatic by comparison with the normal DNA. A set of filters adapted to amplicon sequencing then eliminates false-positive calls. We used two experimental datasets, a mixture of 8 normal DNA samples (MIX) and a set of 80 tumor-normal spiked-in (TNS) pairs derived from 38 different normal germ line DNA samples, to measure the performance of the approach in comparison with other mutation callers.

2 METHODS

Data generation: The data from the MIX sample or used in the preparation of the TNS pairs were generated using microdroplet polymerase chain reaction amplification (Harismendy et al.) of 1736 amplicons from 47 genes clinically actionable for breast cancer (Supplementary Methods), followed by high-throughput sequencing of 151 nt long paired-end reads on a MiSeq sequencer (Illumina, San Diego, CA, USA), resulting in 981-fold average coverage depth. The data are available through the Sequence Read Archive (SRA) at the NCBI (SRA067609 and SRA067610).

Analysis principle: Mutascope aligns the reads to the genome using the BWA-SW algorithm (Li and Durbin, 2009). Multi-mapping reads, reads with a low Smith–Waterman score and reads not aligning to the specified amplicons are removed. Mutascope then determines the amplicon of origin for each read and measures the error rate using the normal DNA sequencing, stratified by the major drivers of sequencing errors: nucleotide, position in the read and read type (forward or reverse). The allelic fraction of the mutation is compared with this error rate using a binomial test for significance; the mutations are then classified as germ line or somatic using a Fisher exact test. The germ line genotypes are determined using a Bayesian likelihood method. Finally, Mutascope filters out false-positive variants using, for example, read group bias, low average mutant allele quality or predicted false positives from non-specific amplification.

Benchmarking: The benchmarking was performed using the ROCR package (Sing et al.). The prediction score used for the classification corresponded to the binomial P-value for Mutascope, somatic P-value for VarScan (Koboldt et al.), tumor Fstar LOD score for MuTect (Cibulskis et al.) and VCF quality score for LoFreq, GATK and Illumina MiSeq Reporter (McKenna et al.; Wilm et al.). All false-negative prediction scores were set to 0. All benchmarked tools relied on the same alignment performed by Mutascope, except for Illumina MiSeq Reporter, which performs its own alignment. Whenever allowed, each tool was run without extensive previous filtering to strictly compare the accuracy of the mutation detection. Complete methods are available as Supplementary Methods.
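The statistical core of the analysis principle can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Mutascope's actual code: the function names, the single per-site error rate, the one-sided form of both tests and the 0.05 thresholds are all hypothetical choices made for the example. Error rates are estimated from normal-sample base calls per stratum (reference nucleotide, position in the read, read type), the tumor alternate-allele count is tested against that rate with a binomial test, and significant variants are labeled somatic or germ line by a Fisher exact test on tumor versus normal allele counts.

```python
from collections import defaultdict
from math import comb, exp, lgamma, log


def stratified_error_rates(normal_observations):
    """Estimate per-stratum error rates from normal-sample base calls.

    Each observation is (ref_base, read_position, read_type, is_mismatch).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [mismatches, total]
    for ref_base, position, read_type, is_mismatch in normal_observations:
        stratum = counts[(ref_base, position, read_type)]
        stratum[0] += int(is_mismatch)
        stratum[1] += 1
    return {key: mism / total for key, (mism, total) in counts.items()}


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def fisher_greater(tumor_alt, tumor_ref, normal_alt, normal_ref):
    """One-sided Fisher exact test: is the alt allele enriched in the tumor?"""
    row1 = tumor_alt + tumor_ref
    row2 = normal_alt + normal_ref
    col1 = tumor_alt + normal_alt
    upper = min(row1, col1)
    numerator = sum(comb(row1, x) * comb(row2, col1 - x)
                    for x in range(tumor_alt, upper + 1))
    return numerator / comb(row1 + row2, col1)


def classify_variant(tumor_alt, tumor_depth, normal_alt, normal_depth,
                     error_rate, alpha=0.05):
    """Hypothetical decision rule; alpha is an assumed threshold."""
    if binomial_sf(tumor_alt, tumor_depth, error_rate) >= alpha:
        return "no call"  # indistinguishable from sequencing error
    p_somatic = fisher_greater(tumor_alt, tumor_depth - tumor_alt,
                               normal_alt, normal_depth - normal_alt)
    return "somatic" if p_somatic < alpha else "germline"


# Toy data: 2 mismatches in 1000 normal base calls for one stratum.
normal_obs = ([("A", 10, "forward", False)] * 998
              + [("A", 10, "forward", True)] * 2)
rates = stratified_error_rates(normal_obs)
error_rate = rates[("A", 10, "forward")]

somatic_call = classify_variant(30, 1000, 2, 1000, error_rate)
germline_call = classify_variant(480, 1000, 500, 1000, error_rate)
noise_call = classify_variant(3, 1000, 0, 1000, error_rate)
```

In this toy setting, a variant seen in 30 of 1000 tumor reads but only 2 of 1000 normal reads is labeled somatic, a variant near 50% in both samples is labeled germ line, and 3 alternate reads out of 1000 cannot be separated from the 0.2% error rate. A real implementation would combine the stratum-specific rates of all reads covering a site rather than use one rate per site.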

3 RESULTS

We benchmarked Mutascope against other mutation callers using sequencing data generated from a mixture of 8 normal DNA samples with known genotypes (MIX sample), resulting in ‘somatic mutations’ at variable allelic fraction. The classification of the 162 somatic mutations at low-allelic fraction (0.01–0.1) by Mutascope was more accurate than that of other standard tools (area under the curve: 0.97; Fig. 1c). Not surprisingly, tools designed to identify heterozygotes in diploid genomes missed most mutations (GATK), whereas tools dedicated to tumor-normal pairs performed better (VarScan and MuTect).

To estimate the impact of coverage depth, we down-sampled reads from the MIX sample to 50 or 10% (490 and 98×, respectively) of the maximum. As expected, the sensitivity decreased equally for all the tools considered (Fig. 1d).

To expand the performance evaluation to additional mutations and experimental conditions, we prepared a set of 80 TNS pairs by mixing reads obtained from sequencing 38 normal DNA samples. Using these, we interrogated 402 unique ‘somatic mutations’ (between 17 and 55 per pair) at an allelic fraction of 0.01 or 0.1 (40 pairs each). Mutascope detected mutations more accurately at an allelic fraction of 0.1 than at 0.01 (Fig. 1e), and in the former case, its performance was comparable with MuTect and superior to VarScan or LoFreq.

Finally, we tested the effect of the empirical filters applied by each tool after the classification. These filters are important to eliminate false positives that result from unpredictable sources of error and are not accounted for by the statistical model. Although Mutascope’s filters, such as the read group bias and non-specific amplification filters, are specifically designed for amplicon sequencing, we adjusted the parameters of the other tools, such as the strand bias and minimum alternate allele frequency filters, to ensure a fair comparison.
When applied to the mutations at low-allelic fraction in the MIX samples, these filters decrease the sensitivity and increase the positive predictive value (Fig. 1f and Supplementary Discussion). The set of high-confidence filters from MuTect affects the sensitivity the most. This observation highlights synergies between Mutascope’s two core statistical components, the experimentally driven mutation detection (binomial test) and the tumor-normal comparison (Fisher test), which result in superior performance. Therefore, by design, Mutascope specifically optimizes the mutation detection and filtering for deep amplicon sequencing. The resulting more accurate detection of somatic mutations at low-allelic fraction increases its utility in cancer molecular diagnostics.
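The benchmarking metrics used above can be illustrated with a small, self-contained Python sketch. It is not the ROCR-based analysis from the paper: the site names, P-values and the `passes_filters` flag are invented toy data, and converting P-values to a score with -log10 is an assumption made so that higher scores mean more confident calls. True mutations that a tool does not report receive a score of 0, as described in the Methods. The area under the ROC curve is computed with the rank-sum identity, and the true-positive rate and positive predictive value are computed before and after an assumed filter.

```python
from math import log10


def roc_auc(scores, labels):
    """AUC as the probability that a true mutation outscores a false one
    (ties count 0.5); equivalent to the area under the ROC curve."""
    positives = [s for s, is_true in zip(scores, labels) if is_true]
    negatives = [s for s, is_true in zip(scores, labels) if not is_true]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def tpr_ppv(called, truth):
    """True-positive rate and positive predictive value of a call set."""
    true_positives = len(called & truth)
    tpr = true_positives / len(truth)
    ppv = true_positives / len(called) if called else 0.0
    return tpr, ppv


# Hypothetical toy data: site -> (P-value, passes_filters).
truth = {"m1", "m2", "m3", "m4"}
calls = {
    "m1": (1e-12, True),
    "m2": (1e-8, True),
    "m3": (1e-3, False),
    "fp1": (1e-4, True),
    "fp2": (1e-2, False),
}

# Score every candidate site; true mutations that were not called get 0.
sites = sorted(truth | set(calls))
scores = [-log10(calls[s][0]) if s in calls else 0.0 for s in sites]
labels = [s in truth for s in sites]
auc = roc_auc(scores, labels)

before = tpr_ppv(set(calls), truth)
after = tpr_ppv({s for s, (_, ok) in calls.items() if ok}, truth)
```

With these toy values the AUC is 0.625, and applying the filter changes the (TPR, PPV) pair from (0.75, 0.6) to (0.5, 2/3).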

REFERENCES

1. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

2. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing.

Authors: Daniel C Koboldt; Qunyuan Zhang; David E Larson; Dong Shen; Michael D McLellan; Ling Lin; Christopher A Miller; Elaine R Mardis; Li Ding; Richard K Wilson
Journal: Genome Res Date: 2012-02-02 Impact factor: 9.043

3. ROCR: visualizing classifier performance in R.

Authors: Tobias Sing; Oliver Sander; Niko Beerenwinkel; Thomas Lengauer
Journal: Bioinformatics Date: 2005-08-11 Impact factor: 6.937

4. Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation.

Authors: Joseph B Hiatt; Colin C Pritchard; Stephen J Salipante; Brian J O'Roak; Jay Shendure
Journal: Genome Res Date: 2013-02-04 Impact factor: 9.043

5. Diagnostic applications of high-throughput DNA sequencing.

Authors: Scott D Boyd
Journal: Annu Rev Pathol Date: 2012-11-01 Impact factor: 23.472

6. Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing.

Authors: Olivier Harismendy; Richard B Schwab; Lei Bao; Jeff Olson; Sophie Rozenzhak; Steve K Kotsopoulos; Stephanie Pond; Brian Crain; Mark S Chee; Karen Messer; Darren R Link; Kelly A Frazer
Journal: Genome Biol Date: 2011-12-20 Impact factor: 13.583

7. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

8. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.

Authors: Andreas Wilm; Pauline Poh Kim Aw; Denis Bertrand; Grace Hui Ting Yeo; Swee Hoe Ong; Chang Hua Wong; Chiea Chuen Khor; Rosemary Petric; Martin Lloyd Hibberd; Niranjan Nagarajan
Journal: Nucleic Acids Res Date: 2012-10-12 Impact factor: 16.971

9. Evaluation of next generation sequencing platforms for population targeted sequencing studies.

Authors: Olivier Harismendy; Pauline C Ng; Robert L Strausberg; Xiaoyun Wang; Timothy B Stockwell; Karen Y Beeson; Nicholas J Schork; Sarah S Murray; Eric J Topol; Samuel Levy; Kelly A Frazer
Journal: Genome Biol Date: 2009-03-27 Impact factor: 13.583

10. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples.

Authors: Kristian Cibulskis; Michael S Lawrence; Scott L Carter; Andrey Sivachenko; David Jaffe; Carrie Sougnez; Stacey Gabriel; Matthew Meyerson; Eric S Lander; Gad Getz
Journal: Nat Biotechnol Date: 2013-02-10 Impact factor: 54.908
