Misko Dzamba, Arun K Ramani, Pawel Buczkowicz, Yue Jiang, Man Yu, Cynthia Hawkins, Michael Brudno.
Abstract
The genomic alterations associated with cancers are numerous and varied, involving both isolated and large-scale complex genomic rearrangements (CGRs). Although the underlying mechanisms are not well understood, CGRs have been implicated in tumorigenesis. Here, we introduce CouGaR, a novel method for characterizing the genomic structure of amplified CGRs that leverages both depth of coverage (DOC) and discordant paired-end mapping. We applied our method to whole-genome sequencing (WGS) samples from The Cancer Genome Atlas and identified amplified CGRs in at least 5.2% (10+ copies) to 17.8% (6+ copies) of the samples. Furthermore, ∼95% of these amplified CGRs contain genes previously implicated in tumorigenesis, indicating the importance and widespread occurrence of CGRs in cancers. CouGaR also identified 'chromoplexy' in nearly 63% of prostate cancer samples and 30% of bladder cancer samples. To validate the accuracy of our method, we experimentally tested 17 predicted fusions in two pediatric glioma samples. We validated 15 of them (88%) using qPCR and Sanger sequencing, which resolved the breakpoints precisely and showed nearly perfect copy count concordance. Finally, to help display and understand the structure of CGRs, we have implemented CouGaR-viz, a generic stand-alone tool for visualizing the copy count of regions, breakpoints, and relevant genes.
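One ingredient of the method is grouping discordant read pairs into candidate novel adjacencies. The sketch below is a minimal, greedy illustration of that idea and is not CouGaR's actual clustering procedure. It groups pairs that share chromosomes and orientations and whose two ends both fall within a window of a seed pair. The `DiscordantPair` record, the 500 bp window, and the minimum support of 3 are hypothetical choices.

```python
from collections import defaultdict
from typing import NamedTuple


class DiscordantPair(NamedTuple):
    """One discordant read pair (hypothetical minimal record)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def cluster_pairs(pairs, window=500, min_support=3):
    """Greedily cluster compatible discordant pairs into candidate adjacencies.

    Pairs are compatible when they share both chromosomes and both strands.
    Within a compatible group, a pair joins the current cluster when both of
    its ends lie within `window` bp of the cluster's first (seed) pair.
    Returns (seed_pair, support) for clusters with >= `min_support` pairs.
    """
    # Only pairs with the same chromosome pair and orientation may merge.
    groups = defaultdict(list)
    for p in pairs:
        groups[(p.chrom1, p.strand1, p.chrom2, p.strand2)].append(p)

    clusters = []
    for members in groups.values():
        members.sort(key=lambda p: (p.pos1, p.pos2))
        current = []
        for p in members:
            if (current
                    and abs(p.pos1 - current[0].pos1) <= window
                    and abs(p.pos2 - current[0].pos2) <= window):
                current.append(p)
            else:
                # Close the previous cluster and start a new one at p.
                if len(current) >= min_support:
                    clusters.append((current[0], len(current)))
                current = [p]
        if len(current) >= min_support:
            clusters.append((current[0], len(current)))
    return clusters
```

Each surviving cluster can be read as one candidate adjacency, that is, one edge to add to the tumor genome graph. The seed-based grouping keeps the sketch short. It can split clusters that a single-linkage or interval-tree approach would merge.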
Mesh:
Year: 2016 PMID: 27986820 PMCID: PMC5204335 DOI: 10.1101/gr.211201.116
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Overview of the CouGaR algorithm. Tumor and normal samples are processed through a five-step algorithm. (A) We identify regions that are potentially amplified (dark blue) across two different chromosomes (red and yellow lines) in the tumor samples (left two contigs) compared with normal samples (right two contigs). We compute depth of coverage (DOC) information and cluster discordant read pairs to represent adjacencies that are novel with respect to hg19. (B) We identify continuous regions of amplification in the tumor genome using a hidden Markov model (HMM) and DOC information from both tumor and normal samples. (C) We add a single super-source/super-sink node and solve for the copy count of each region in the tumor genome using a min-cost circulation algorithm. (D) Finally, we find a minimal set of circular and linear contigs that explain the coverage by formulating an integer programming problem that penalizes the number of unique contigs used.
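Step (B) segments the tumor genome with an HMM over depth of coverage. The sketch below shows how Viterbi decoding can recover contiguous amplified runs. It assumes a two-state model (baseline versus amplified) with Gaussian emissions on the per-bin log2 tumor/normal depth ratio. The state count, emission means, standard deviation, switch probability, and function names are illustrative assumptions. The paper's HMM may use more states and different parameters.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def call_amplified(tumor_depth, normal_depth, amp_mean=1.5, sd=0.3, p_switch=0.01):
    """Label each bin 0 (baseline) or 1 (amplified) with two-state Viterbi.

    Emissions are Gaussians on log2(tumor/normal) depth. All parameter
    defaults are hypothetical, not CouGaR's.
    """
    means = [0.0, amp_mean]
    log_stay, log_move = math.log(1 - p_switch), math.log(p_switch)
    # Pseudocount of 1 avoids log(0) in empty bins.
    obs = [math.log2((t + 1) / (n + 1)) for t, n in zip(tumor_depth, normal_depth)]
    if not obs:
        return []

    # Uniform start; score[s] = best log-probability of a path ending in s.
    score = [math.log(0.5) + log_gauss(obs[0], m, sd) for m in means]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[r] + (log_stay if r == s else log_move) for r in (0, 1)]
            best = max((0, 1), key=lambda r: cands[r])
            new_score.append(cands[best] + log_gauss(x, means[s], sd))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back the most probable state path.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def amplified_segments(path):
    """Collapse a 0/1 path into half-open (start_bin, end_bin) amplified runs."""
    runs, start = [], None
    for i, s in enumerate(path + [0]):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            runs.append((start, i))
            start = None
    return runs
```

The transition penalty discourages isolated single-bin calls. A run must raise the emission likelihood by more than the cost of two state switches before the decoder reports it.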
Figure 2. Analysis of DIPG29. (A) The predicted CGRs are convolved into a unique structure, as visualized by CouGaR-Viz (the four contigs identified are illustrated in Supplemental Fig. S1). Genomic segments are represented by red lines, interrupted by black vertical lines that mark breakpoints. Directional arrows show connections between segments, and the thickness of the arrows and red segments represents the identified copy counts. Genes overlapping the positive strand are depicted as green boxes, and genes overlapping the negative strand are shown in purple. Breakpoints selected for testing are labeled with letters (A–I) in circles. Green circles indicate breakpoints that were validated, and yellow circles indicate breakpoints that failed to validate. (B) Nine breakpoints were selected for validation. For each, unique primers were designed and copy counts were estimated with qPCR, using the GPX7 gene as a control to normalize the counts. For seven of the positively tested breakpoints, the copy counts estimated by qPCR are shown in purple (error bars show standard deviation), and those estimated by CouGaR are shown as orange bars. In all cases, the qPCR results match the predicted copy counts.
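The caption states that GPX7 served as the normalization control for qPCR copy counts, but it does not give the formula. A common choice is the ΔCt method. The sketch below assumes perfect amplification efficiency (the product doubles each cycle) and assumes GPX7 is present at two copies. Both assumptions, and the function names, are hypothetical.

```python
from statistics import mean, stdev


def copy_count(target_ct, reference_ct, reference_copies=2.0, efficiency=1.0):
    """Estimate copies of a breakpoint amplicon relative to a control gene.

    Assumes each cycle multiplies product by (1 + efficiency). A target that
    reaches threshold dCt cycles earlier than the reference is therefore
    (1 + efficiency) ** dCt times more abundant.
    """
    delta_ct = reference_ct - target_ct
    return reference_copies * (1.0 + efficiency) ** delta_ct


def summarize(replicates):
    """Mean and standard deviation of copy counts over (target_ct, reference_ct) replicates."""
    counts = [copy_count(t, r) for t, r in replicates]
    return mean(counts), (stdev(counts) if len(counts) > 1 else 0.0)
```

For example, with hypothetical Ct values, an amplicon that crosses threshold three cycles before GPX7 would be estimated at 2 × 2³ = 16 copies under these assumptions. `summarize` reports a mean and standard deviation, mirroring the error bars described for panel B.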
A comparison of the results from the analysis of four tumor samples
TCGA tumor samples analyzed and enrichment of cancer genes for each subtype
Figure 3. Distribution of multichromosome contigs. For each tumor type, we determined the chromosomes spanned by each contig. We then binned each sample by the largest number of chromosomes spanned by any of its contigs. The bar chart shows the percentage of samples in each bin (1–9 chromosomes). For each cancer type, we also report the fraction of samples with a contig spanning three or more chromosomes. Prostate (PRAD) and bladder (BLCA) cancers have a high occurrence of multichromosome contigs. At the other end of the spectrum are colon (COAD) and rectal (READ) cancers, in which most contigs are contained within a single chromosome.
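The per-sample binning in Figure 3 can be sketched as follows. The sketch assumes each contig is available as a list of (chromosome, start, end) segments and that samples are grouped by cancer type. The data layout and function names are hypothetical.

```python
from collections import Counter


def max_chromosomes_spanned(contigs):
    """Largest number of distinct chromosomes touched by any single contig.

    Each contig is a list of (chromosome, start, end) segments.
    """
    return max((len({chrom for chrom, _, _ in contig}) for contig in contigs), default=0)


def summarize_cancer_type(samples):
    """Bin samples by their widest contig and compute the >=3-chromosome fraction.

    `samples` maps sample IDs to lists of contigs. Returns (bins, fraction),
    where bins counts samples per maximum chromosome span.
    """
    widest = {sid: max_chromosomes_spanned(contigs) for sid, contigs in samples.items()}
    bins = Counter(widest.values())
    multi = sum(1 for n in widest.values() if n >= 3)
    fraction = (multi / len(widest)) if widest else 0.0
    return bins, fraction
```

Dividing each bin count by the number of samples gives the bar-chart percentages. The returned fraction corresponds to the three-or-more-chromosome figure reported per cancer type.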
Figure 4. Tumor adjacency graph. (A,B) Reference and tumor genomes, respectively. (C) The bidirected graph representation of the tumor genome. (D) The directed graph equivalent to the bidirected graph in C.
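A bidirected graph can be turned into an equivalent directed graph by giving each segment a forward node and a reverse node, then adding every junction once per strand. This is a standard construction. Whether CouGaR builds panel D exactly this way is an assumption, as are the (segment, strand) node encoding and the function names below.

```python
from collections import defaultdict


def flip(node):
    """Reverse-complement an oriented segment, e.g. ('B', '+') -> ('B', '-')."""
    name, strand = node
    return (name, "-" if strand == "+" else "+")


def to_directed(adjacencies):
    """Convert oriented adjacencies into a directed graph over oriented segments.

    Each adjacency (u, v) means "u is followed by v", where u and v are
    (segment, strand) pairs. Reading the same junction on the opposite
    strand gives flip(v) -> flip(u), so each bidirected edge yields two
    directed edges, one per strand.
    """
    graph = defaultdict(set)
    for u, v in adjacencies:
        graph[u].add(v)
        graph[flip(v)].add(flip(u))
    return graph
```

For instance, a reference junction A+ → B+ together with an inversion-like junction B+ → C− produces four directed edges: A+ → B+, B− → A−, B+ → C−, and C+ → B−. Walks in the directed graph then correspond to reading a tumor contig on one strand or the other.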