Rapid and powerful detection of subtle allelic imbalance from exome sequencing data with hapLOHseq.

F Anthony San Lucas¹, Smruthy Sivakumar², Selina Vattathil², Jerry Fowler³, Eduardo Vilar⁴, Paul Scheet⁵.

Abstract

MOTIVATION: The detection of subtle genomic allelic imbalance events has many potential applications. For example, identifying cancer-associated allelic imbalanced regions in low tumor-cellularity samples or in low-proportion tumor subclones can be used for early cancer detection, prognostic assessment and therapeutic selection in cancer patients. We developed hapLOHseq for the detection of subtle allelic imbalance events from next-generation sequencing data.
RESULTS: Our method identified events of 10 megabases or greater occurring in as little as 16% of the sample in exome sequencing data (at 80×) and 4% in whole genome sequencing data (at 30×), far exceeding the capabilities of existing software. We also found hapLOHseq to be superior at detecting large chromosomal changes across a series of pancreatic samples from TCGA.
AVAILABILITY AND IMPLEMENTATION: hapLOHseq is available at scheet.org/software, distributed under an open source MIT license.
CONTACT: pscheet@alum.wustl.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2016 PMID: 27288500 PMCID: PMC5039922 DOI: 10.1093/bioinformatics/btw340

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

A critical mechanism by which cancer cells operate is through activation of oncogenes or inactivation of tumor suppressor genes. This may happen via acquired chromosomal alterations, such as amplification, deletion or copy-neutral loss-of-heterozygosity (cn-LOH), that result in allelic imbalance (AI). Because AI provides insights into the progression towards malignancy and metastasis, AI detection can be applied to help in cancer prognosis and therapeutic decision-making (Diep et al.). In the last decade, array comparative genomic hybridization (aCGH) and single-nucleotide polymorphism (SNP) array based approaches have become popular technologies for AI detection (Carter, 2007). More recently, the advent of next-generation sequencing (NGS) technologies for studies of cancer genomic variation (Zhao et al.) has brought the potential for finer resolution (detection and boundary refinement) due to a denser marker set and arbitrarily high coverage. However, costs limit depth of coverage through whole-genome sequencing (WGS) and therefore whole-exome sequencing (WES) is often preferred for genomic studies. WES presents a particular challenge for inferring AI, since coverage is variable across regions and the target is limited to ∼3% of the genome. Several tools model expected coverage based on summaries from a WES reference panel to account for technical variability in coverage and then infer copy number changes or AI. However, these approaches fail to detect subtle chromosomal AI events in NGS data, i.e. aberrations such as amplifications, deletions and cn-LOH occurring in a small fraction (<30% for WES, <10% for WGS) of the DNA in a heterogeneous sample, scenarios highly relevant to comprehensive tumor profiling and diagnostics. For example, the median tumor cellularity for TCGA pancreatic adenocarcinoma (PDAC) samples is 53% (n = 186).
Of these, 57 samples (31%) have tumor fractions of 30% or less, which is below AI detection levels of currently available tools (Supplementary Fig. S1). This gap is therefore significant, especially since tissue samples are often limited and additional surveys with complementary technologies such as aCGH may not be possible.
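
To give a sense of how small the signal is at these cell fractions, the following sketch computes the expected read fraction of the over-represented haplotype at a heterozygous site when an event is carried by a fraction f of cells. The function names, the event labels and the simple model (diploid background, a single copy lost or gained, no sequencing bias) are illustrative assumptions and are not taken from hapLOHseq.

```python
import math


def expected_major_fraction(event, f):
    """Expected read fraction of the retained or gained haplotype at a het site.

    Assumes a diploid background and an event carried by a fraction f of cells.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if event == "deletion":  # affected cells keep 1 of 2 copies
        return 1.0 / (2.0 - f)
    if event == "cn-loh":    # affected cells carry 2 copies of one haplotype
        return (1.0 + f) / 2.0
    if event == "gain":      # affected cells carry 2 + 1 copies
        return (1.0 + f) / (2.0 + f)
    raise ValueError(f"unknown event: {event}")


def per_site_sd(depth):
    """Binomial standard deviation of an allele fraction near 0.5."""
    return math.sqrt(0.25 / depth)


for event in ("deletion", "cn-loh", "gain"):
    shift = expected_major_fraction(event, 0.16) - 0.5
    print(f"{event}: shift {shift:.3f} vs per-site SD {per_site_sd(80):.3f}")
```

Under these assumptions, at f = 0.16 and 80× depth the shift away from 0.5 is of the same order as the sampling noise at a single site, which suggests why evidence has to be pooled across many sites.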

2 Methods

hapLOHseq is an application for detecting AI events in NGS data, motivated by the logic underlying a method for AI detection in SNP array data (Vattathil and Scheet, 2013). Inputs to hapLOHseq consist of a variant call format (VCF) file with an allelic depth (AD) field containing the read depths of the reference and alternate alleles, generated from either WGS or WES, and a set of haplotype estimates. The output from hapLOHseq is a list of putative AI regions of the genome built from a detailed report of probabilities for each heterozygous genotype residing in a region of AI. When assessing high-purity tumor samples for which paired normal samples are available for inferring germline genotypes, one can directly compare genotypes of the tumor and normal samples and clearly characterize copy number changes. However, when the sample contains a high proportion of normal cells (low tumor cellularity), the genotypes called from the tumor sample will reflect the germline rather than the tumor. To extract maximal information from the data, hapLOHseq leverages a lower-level data source: the allele-specific read counts. hapLOHseq achieves its power by capturing signals among multiple sites jointly (haplotype level) rather than relying on imbalances observed marginally at heterozygous sites. First, a user statistically estimates germline haplotypes from variant sites called in a paired normal sample, or the tumor sample itself when cellularity is low. (For convenience, the hapLOHseq download contains a companion phasing utility, allowing direct application to a single VCF.) The AI detection method then: (i) assesses similarity between the observed reference allele frequencies (RAFs) from the NGS data and the haplotypes; and (ii) identifies regions where this similarity achieves statistical significance, indicating haplotype, or allelic, imbalance. 
Similarity between the alleles in relative abundance and one of the germline haplotypes suggests an imbalance of a segment of an inherited chromosome due to, perhaps, acquired alterations. To assess the correlation, we first determine a putative ‘excess haplotype’ by applying a threshold to the RAFs at each marker independently in a ‘frequency-based phasing algorithm’. By default, the threshold is defined as the median variant allele frequency across the genome (but should be close to 0.5). The alleles with frequencies above the threshold constitute one putative haplotype. Where no imbalance exists, the RAF-based haplotype estimate reflects noise. Otherwise, where AI does exist, the RAF-based haplotypes should bear some resemblance to the statistically estimated haplotypes. This resemblance is quantified with phase concordance, a measure of similarity that accommodates errors in the statistical haplotype estimates. A hidden Markov model (HMM) is then applied to assess the spatial aggregation of markers with evidence for haplotype-RAF consistency and to compute a probability of regions of the genome being in AI (Vattathil and Scheet, 2013).
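
The steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the hapLOHseq implementation: the function names are hypothetical, phase concordance is taken here to be agreement of the phase relationship between adjacent heterozygous sites (one definition that tolerates switch errors; the exact definition is given in Vattathil and Scheet, 2013), and the two-state HMM uses made-up emission and transition parameters.

```python
from statistics import median


def raf_from_ad(ref_depth, alt_depth):
    """Reference allele frequency from the two AD read depths."""
    return ref_depth / (ref_depth + alt_depth)


def frequency_phase(rafs, threshold=None):
    """Label each het site 1 if its RAF exceeds the threshold, else 0."""
    if threshold is None:
        threshold = median(rafs)
    return [1 if r > threshold else 0 for r in rafs]


def pairwise_agreement(raf_hap, stat_hap):
    """For each adjacent pair, does the phase relationship agree?

    Comparing neighbours rather than single sites keeps one switch error
    in stat_hap from penalising every later site.
    """
    return [
        (raf_hap[i] == raf_hap[i + 1]) == (stat_hap[i] == stat_hap[i + 1])
        for i in range(len(raf_hap) - 1)
    ]


def phase_concordance(raf_hap, stat_hap):
    """Fraction of adjacent pairs whose phase relationship agrees."""
    agree = pairwise_agreement(raf_hap, stat_hap)
    return sum(agree) / len(agree)


def ai_posteriors(agree, p_normal=0.5, p_ai=0.8, stay=0.9):
    """Posterior P(AI) per indicator from a two-state forward-backward pass.

    State 0 = balanced, state 1 = AI; all parameters are hypothetical.
    """
    states = (0, 1)
    p_agree = (p_normal, p_ai)

    def emit(s, a):
        return p_agree[s] if a else 1.0 - p_agree[s]

    def trans(s, t):
        return stay if s == t else 1.0 - stay

    fwd, prev = [], [0.5, 0.5]
    for k, a in enumerate(agree):
        if k == 0:
            cur = [prev[s] * emit(s, a) for s in states]
        else:
            cur = [emit(t, a) * sum(prev[s] * trans(s, t) for s in states)
                   for t in states]
        total = sum(cur)
        prev = [c / total for c in cur]  # rescale to avoid underflow
        fwd.append(prev)

    bwd = [[1.0, 1.0] for _ in agree]
    for k in range(len(agree) - 2, -1, -1):
        nxt = [sum(trans(s, t) * emit(t, agree[k + 1]) * bwd[k + 1][t]
                   for t in states) for s in states]
        total = sum(nxt)
        bwd[k] = [x / total for x in nxt]

    post = []
    for f, b in zip(fwd, bwd):
        w = [f[s] * b[s] for s in states]
        post.append(w[1] / sum(w))
    return post


# Toy data: six het sites as (ref depth, alt depth) pairs.
depths = [(33, 27), (26, 34), (32, 28), (25, 35), (34, 26), (27, 33)]
rafs = [raf_from_ad(r, a) for r, a in depths]
raf_hap = frequency_phase(rafs)
stat_hap = [1, 0, 1, 1, 0, 1]  # statistical phasing with one switch error
concordance = phase_concordance(raf_hap, stat_hap)
```

In this toy example the single switch error costs one adjacent pair out of five, whereas a site-by-site comparison would count three of the six sites as mismatches. Passing a longer run of agreement indicators to `ai_posteriors` gives a per-position probability that can be thresholded into candidate regions.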

3 Results

We obtained the WGS and WES reads for the tumor and paired normal sample of a glioblastoma patient (TCGA-19-2620) from The Cancer Genome Atlas (TCGA) in addition to the SNP-array-based copy number and LOH calls published by the TCGA consortium (Brennan et al.). To assess the sensitivity of hapLOHseq, we created in silico mixtures of the reads from the tumor and normal BAM files at multiple ratios, mimicking varying tumor purities. The published purity estimate for the original tumor sample was 80%, which we accounted for in our simulations. We created in silico mixtures of 4, 8, 12, 16, 20, 28, 40, 56 and 80% tumor. We then applied hapLOHseq with default settings to the VCFs created from processing the mixed BAM files, first phasing haplotypes with MaCH (Li et al.). Results are depicted in Figure 1. In WES data, hapLOHseq is able to detect large regions of AI at tumor purities of 16% and virtually all events at 40%. The sensitivity of hapLOHseq is greater on WGS data, with all events discovered at 8% purity and some events at a mere 4%. For comparison, we assessed performance of the following exome AI detection tools: ADTEx, FREEC and ExomeDepth (Amarasinghe et al.; Boeva et al.; Plagnol et al.). hapLOHseq outperformed these methods (Supplementary Fig. S1). ADTEx, the best among the other methods, was able to detect events at 28% tumor purity, in line with its published detectable limit of 30% (Amarasinghe et al.).
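
The mixing arithmetic implied above can be written out as follows. This is a sketch under stated assumptions rather than the procedure actually used: it assumes reads are drawn in proportion to cell fractions, that the normal BAM contains no tumor DNA, and that tumor and normal cells contribute equal amounts of DNA. The function name is hypothetical.

```python
def tumor_read_fraction(target_purity, source_purity=0.80):
    """Fraction of mixed reads to draw from the tumor BAM.

    The tumor BAM is itself only source_purity tumor, so reaching a target
    purity requires proportionally more tumor-BAM reads.
    """
    if not 0.0 <= target_purity <= source_purity:
        raise ValueError("target purity must lie between 0 and source purity")
    return target_purity / source_purity


targets = [0.04, 0.08, 0.12, 0.16, 0.20, 0.28, 0.40, 0.56, 0.80]
for t in targets:
    print(f"{t:.0%} tumor -> {tumor_read_fraction(t):.0%} of reads from tumor BAM")
```

Under these assumptions the 4% mixture draws 5% of its reads from the tumor BAM, and the 80% mixture is the unmixed tumor BAM.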

Fig. 1.

hapLOHseq discovers subtle AI in NGS. (A) Top panel shows AI events from a TCGA SNP array. Red lines show hapLOHseq evidence for AI in WES (16, 40%) and WGS (4, 8%) mixtures. (B) WES of two adenomas from an FAP patient; hapLOHseq detects AI on chr. 5q.

We also downloaded and analyzed the exome sequences of 12 tumor-normal pairs of TCGA PDAC samples using hapLOHseq and the other tools (Supplementary Fig. S2). hapLOHseq consistently outperformed these methods in its ability to efficiently identify chromosomal AI regions of the genome (Supplementary Figs S2 and S3). Finally, we applied hapLOHseq to WES experiments on two adenomas from a patient with familial adenomatous polyposis (FAP), a cancer syndrome resulting from a germline mutation in APC (5q), where the acquired second somatic mutation (or ‘hit’; Knudson, 1971) may be an LOH event (Galiatsatos and Foulkes, 2006). Figure 1 shows discovery of AI in both adenomas, where in the second sample there is no visual perturbation of the RAFs. In summary, hapLOHseq is able to detect AI in WES and WGS data at low cell fractions. Further, it does not require a paired normal sample (whereas ADTEx, for example, does). Interestingly, we observe sufficiently strong dependency among alleles (linkage disequilibrium) in WES data for our approach to excel. hapLOHseq should be useful for the detection and profiling of AI in tumor samples that are either heavily diluted with normal tissue cells or in heterogeneous tumors or premalignant lesions.

Funding

This work was supported by the Department of Translational Molecular Pathology Fellowship at The University of Texas MD Anderson Cancer Center, NIH grants R03CA176788, U24CA143883, U01GM92666, R01HG005859 and by The University of Texas MD Anderson Cancer Center Core Support Grant.

Conflict of Interest: none declared.

References

1. Methods and strategies for analyzing copy number variation using DNA microarrays.

Authors: Nigel P Carter
Journal: Nat Genet Date: 2007-07 Impact factor: 38.330

2. The somatic genomic landscape of glioblastoma.

Authors: Cameron W Brennan; Roel G W Verhaak; Aaron McKenna; Benito Campos; Houtan Noushmehr; Sofie R Salama; Siyuan Zheng; Debyani Chakravarty; J Zachary Sanborn; Samuel H Berman; Rameen Beroukhim; Brady Bernard; Chang-Jiun Wu; Giannicola Genovese; Ilya Shmulevich; Jill Barnholtz-Sloan; Lihua Zou; Rahulsimham Vegesna; Sachet A Shukla; Giovanni Ciriello; W K Yung; Wei Zhang; Carrie Sougnez; Tom Mikkelsen; Kenneth Aldape; Darell D Bigner; Erwin G Van Meir; Michael Prados; Andrew Sloan; Keith L Black; Jennifer Eschbacher; Gaetano Finocchiaro; William Friedman; David W Andrews; Abhijit Guha; Mary Iacocca; Brian P O'Neill; Greg Foltz; Jerome Myers; Daniel J Weisenberger; Robert Penny; Raju Kucherlapati; Charles M Perou; D Neil Hayes; Richard Gibbs; Marco Marra; Gordon B Mills; Eric Lander; Paul Spellman; Richard Wilson; Chris Sander; John Weinstein; Matthew Meyerson; Stacey Gabriel; Peter W Laird; David Haussler; Gad Getz; Lynda Chin
Journal: Cell Date: 2013-10-10 Impact factor: 41.582

3. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.

Authors: Yun Li; Cristen J Willer; Jun Ding; Paul Scheet; Gonçalo R Abecasis
Journal: Genet Epidemiol Date: 2010-12 Impact factor: 2.135

4. The order of genetic events associated with colorectal cancer progression inferred from meta-analysis of copy number changes.

Authors: Chieu B Diep; Kristine Kleivi; Franclim R Ribeiro; Manuel R Teixeira; Ole C Lindgjaerde; Ragnhild A Lothe
Journal: Genes Chromosomes Cancer Date: 2006-01 Impact factor: 5.006

5. Familial adenomatous polyposis.

Authors: Polymnia Galiatsatos; William D Foulkes
Journal: Am J Gastroenterol Date: 2006-02 Impact factor: 10.864

6. Mutation and cancer: statistical study of retinoblastoma.

Authors: A G Knudson
Journal: Proc Natl Acad Sci U S A Date: 1971-04 Impact factor: 11.205

7. Haplotype-based profiling of subtle allelic imbalance with SNP arrays.

Authors: Selina Vattathil; Paul Scheet
Journal: Genome Res Date: 2012-10-01 Impact factor: 9.043

8. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data.

Authors: Valentina Boeva; Tatiana Popova; Kevin Bleakley; Pierre Chiche; Julie Cappo; Gudrun Schleiermacher; Isabelle Janoueix-Lerosey; Olivier Delattre; Emmanuel Barillot
Journal: Bioinformatics Date: 2011-12-06 Impact factor: 6.937

9. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling.

Authors: Vincent Plagnol; James Curtis; Michael Epstein; Kin Y Mok; Emma Stebbings; Sofia Grigoriadou; Nicholas W Wood; Sophie Hambleton; Siobhan O Burns; Adrian J Thrasher; Dinakantha Kumararatne; Rainer Doffinger; Sergey Nejentsev
Journal: Bioinformatics Date: 2012-08-31 Impact factor: 6.937

10. Inferring copy number and genotype in tumour exome data.

Authors: Kaushalya C Amarasinghe; Jason Li; Sally M Hunter; Georgina L Ryland; Prue A Cowin; Ian G Campbell; Saman K Halgamuge
Journal: BMC Genomics Date: 2014-08-28 Impact factor: 3.969
