F Favero, T Joshi, A M Marquard, N J Birkbak, M Krzystanek, Q Li, Z Szallasi, A C Eklund.
Abstract
BACKGROUND: Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described.
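Why normal-cell contamination complicates copy number calling can be made concrete with a standard tumor-normal mixture model. The sketch below is an illustrative assumption, not necessarily Sequenza's exact parameterization: normal cells are taken to be diploid, cellularity is the fraction of tumor cells, the depth ratio is normalized so that a segment at the tumor's average ploidy has ratio 1, and mutations are assumed clonal. All function names are hypothetical.

```python
"""Expected signals for a tumor sample mixed with diploid normal cells.

Illustrative sketch of a standard purity/ploidy mixture model (an assumption,
not necessarily Sequenza's exact parameterization).
"""


def expected_depth_ratio(cn: int, cellularity: float, ploidy: float) -> float:
    """Tumor/normal depth ratio, normalized so the genome average is 1."""
    normal = 2.0 * (1.0 - cellularity)  # normal cells contribute 2 copies
    return (cellularity * cn + normal) / (cellularity * ploidy + normal)


def expected_baf(cn: int, minor_cn: int, cellularity: float) -> float:
    """B allele frequency (minor allele) at a germline heterozygous SNP."""
    normal = 1.0 - cellularity  # normal cells carry one copy of each allele
    return (cellularity * minor_cn + normal) / (cellularity * cn + 2.0 * normal)


def expected_maf(mutated_copies: int, cn: int, cellularity: float) -> float:
    """Mutant allele frequency of a clonal somatic mutation (hypothetical helper)."""
    total = cellularity * cn + 2.0 * (1.0 - cellularity)
    return cellularity * mutated_copies / total


if __name__ == "__main__":
    # One-copy deletion with loss of heterozygosity (CN = 1, minor CN = 0)
    # in a diploid tumor, evaluated at decreasing cellularity.
    for rho in (1.0, 0.6, 0.3):
        print(rho,
              round(expected_depth_ratio(1, rho, 2.0), 3),
              round(expected_baf(1, 0, rho), 3),
              round(expected_maf(1, 1, rho), 3))
```

As cellularity falls, the expected depth ratio of the deletion drifts from 0.5 toward 1 and its BAF from 0 toward 0.5, so the copy number signal is progressively masked by the normal cells.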
Keywords: cancer genomics; copy number alterations; mutations; next-generation sequencing; software
Year: 2014 PMID: 25319062 PMCID: PMC4269342 DOI: 10.1093/annonc/mdu479
Source DB: PubMed Journal: Ann Oncol ISSN: 0923-7534 Impact factor: 32.976
Figure 1. Representative output of the Sequenza algorithm. Exome sequencing data from an ovarian tumor specimen (TCGA-42-2591-01A) and a matched normal specimen (TCGA-42-2591-10A) were analyzed with Sequenza. (A) The log posterior probability (LPP) of the observed data was calculated for a range of candidate ploidy and cellularity values. The point estimate is the ploidy and cellularity pair with the maximum LPP. The 95% confidence region is the smallest (not necessarily contiguous) set of points whose total posterior probability exceeds 0.95. The background color indicates the rank of the LPP (blue = most likely, white = least likely); it is shown to highlight other parameter values that are very unlikely under our model but might still be of interest. Local maxima are marked with ‘+’ and represent possible alternative solutions. (B) Observed depth ratio (DR) and B allele frequency (BAF) values for each genomic segment (black circles and dots), overlaid on the representative joint LPP density (colors). The representative joint LPP density is calculated at the cellularity and ploidy estimates identified in (A), for a hypothetical 10 Mb segment. Because the actual joint LPP density depends on segment size and variability, it differs quantitatively, but not qualitatively, from segment to segment. Observed segments with highly unlikely DR and BAF values may indicate subclonality, measurement errors, or incorrect model parameters. (C) Chromosome plot showing mutant allele frequency (top panel), B allele frequency (middle panel), and depth ratio (bottom panel) along genomic position; chromosome 1 is shown. The mutant allele frequency at a given position is the fraction of reads carrying a mutation; it is displayed for each position with sufficient sequencing depth where it exceeds 0.1. For visualization, the B allele frequency and depth ratio are summarized within 1 Mb windows staggered every 0.5 Mb.
Within each window, a thick black line indicates the median value and a blue bar indicates the interquartile range. Red lines indicate segmented values. The thin dotted lines indicate the expected values under the fitted model; their placement is based on the estimated cellularity, ploidy, and copy number profile. In the top panel, the dotted lines indicate the number of mutated alleles, with the lowest line corresponding to one. In the middle panel, the dotted lines indicate the minor allele copy number, with the lowest line corresponding to zero. In the bottom panel, the dotted lines indicate the copy number.
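The point estimate, the 95% confidence region, and the ‘+’ markers in panel (A) rest on simple grid operations: normalizing LPP values into posterior probabilities, then collecting the most probable points until their mass exceeds 0.95. A minimal pure-Python sketch, assuming the LPP has already been computed on a rectangular (cellularity, ploidy) grid and that the grid is treated as the whole posterior support. The names `confidence_region` and `local_maxima`, and the 8-neighbour definition of a local maximum, are illustrative assumptions rather than Sequenza's API.

```python
import math
from typing import List, Tuple


def confidence_region(cellularity: List[float], ploidy: List[float],
                      lpp: List[List[float]], level: float = 0.95):
    """Return the max-LPP (cellularity, ploidy) pair and the smallest set of
    grid points whose total posterior probability exceeds `level`.

    lpp[i][j] is the log posterior probability at (cellularity[i], ploidy[j]).
    """
    points = [(lpp[i][j], cellularity[i], ploidy[j])
              for i in range(len(cellularity)) for j in range(len(ploidy))]
    top = max(v for v, _, _ in points)
    # exp(LPP - max) avoids underflow; dividing by the sum normalizes over the grid.
    weights = [math.exp(v - top) for v, _, _ in points]
    total = sum(weights)
    # Adding points in order of decreasing probability yields the smallest set.
    ranked = sorted(zip(weights, points), key=lambda wp: wp[0], reverse=True)
    region: List[Tuple[float, float]] = []
    mass = 0.0
    for w, (_, c, p) in ranked:
        region.append((c, p))
        mass += w / total
        if mass > level:
            break
    return region[0], region


def local_maxima(lpp: List[List[float]]) -> List[Tuple[int, int]]:
    """Grid indices (i, j) whose LPP is strictly greater than every 8-neighbour."""
    n, m = len(lpp), len(lpp[0])
    peaks = []
    for i in range(n):
        for j in range(m):
            neighbours = [lpp[a][b]
                          for a in range(max(i - 1, 0), min(i + 2, n))
                          for b in range(max(j - 1, 0), min(j + 2, m))
                          if (a, b) != (i, j)]
            if all(lpp[i][j] > v for v in neighbours):
                peaks.append((i, j))
    return peaks


if __name__ == "__main__":
    cell = [0.2, 0.4, 0.6]
    ploi = [2.0, 3.0]
    grid = [[-5.0, -9.0], [0.0, -4.0], [-6.0, -1.0]]  # toy LPP values
    print(confidence_region(cell, ploi, grid))
    print(local_maxima(grid))
```

Because points are admitted purely by probability rank, the resulting region need not be contiguous, which matches the caption's description.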
Figure 2. Comparison of cellularity, ploidy, and copy number profiles estimated from exome sequencing with those estimated from SNP arrays, and testing on simulated data. (A–C) Matched tumor-normal exome sequencing and SNP array data from 10 ovarian cancer patients and 20 renal cell carcinoma patients were obtained from TCGA. Exome data were analyzed with Sequenza, and SNP array data were analyzed with ASCAT. (A) Ploidy and (B) cellularity estimates were compared between the two platforms. (C) Copy number profiles were compared by calculating the absolute difference in estimated copy number at each genomic position (ΔCN); the figure shows the fraction of the covered genome at each level of ΔCN. Asterisks indicate tumors for which the Sequenza cellularity estimate is lower than 0.4. (D and E) Sequenza (D) ploidy and (E) cellularity estimates from simulated whole-genome sequencing data with varying cellularity for cell lines HCC1954 and HCC1143. Vertical lines indicate 95% confidence intervals on the estimates. Dashed horizontal lines indicate ploidy estimates for the same cell lines obtained by SNP array in an independent study [4].
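Panel (C) reports the fraction of the covered genome at each level of ΔCN. A minimal sketch, assuming both copy number profiles are given as non-overlapping, half-open segments of integer total copy number, and that the "covered genome" is the set of bases present in both segmentations. The `Segment` layout and the name `delta_cn_fractions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, start, end, total copy number); intervals are half-open [start, end).
Segment = Tuple[str, int, int, int]


def delta_cn_fractions(profile_a: List[Segment],
                       profile_b: List[Segment]) -> Dict[int, float]:
    """Fraction of the bases covered by both profiles at each level of |CN_a - CN_b|.

    Assumes segments within each profile do not overlap one another.
    """
    segments_b = defaultdict(list)
    for chrom, start, end, cn in profile_b:
        segments_b[chrom].append((start, end, cn))

    bases_at_delta: Dict[int, int] = defaultdict(int)
    for chrom, start, end, cn in profile_a:
        # Quadratic scan for clarity; a sorted sweep would scale better.
        for start_b, end_b, cn_b in segments_b[chrom]:
            overlap = min(end, end_b) - max(start, start_b)
            if overlap > 0:
                bases_at_delta[abs(cn - cn_b)] += overlap

    covered = sum(bases_at_delta.values())
    return {delta: n / covered for delta, n in sorted(bases_at_delta.items())}


if __name__ == "__main__":
    sequenza_like = [("1", 0, 100, 2), ("1", 100, 200, 3)]
    ascat_like = [("1", 0, 150, 2), ("1", 150, 200, 4)]
    print(delta_cn_fractions(sequenza_like, ascat_like))
```

Weighting by overlapping bases, rather than counting segments, keeps the result a fraction of the genome regardless of how finely each method segments it.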
Performance of various algorithms on TCGA exome data
| Algorithm | |||||
|---|---|---|---|---|---|
| Sequenza | 0.90 (0.91) | 0.42 (0.94) | 0.69 | 0.095 (0.087) | 0.95 (0.25) |
| ABSOLUTE | 0.19 (0.61) | 0.13 (0.50) | 0.08 | 0.35 (0.19) | 1.81 (1.08) |
| absCN-seq | 0.46 (0.65) | −0.26 (0.46) | 0.02 | 0.16 (0.13) | 1.91 (0.76) |
, = Pearson correlation of cellularity or ploidy estimates (respectively) with those of ASCAT. = median (over all samples) fraction of the genome with copy number estimate equal to that of ASCAT. = median (over all samples) Pearson correlation of copy number profile with that of ASCAT. The numbers in parentheses indicate the result when the set of alternative solutions is visually inspected.
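The table's summary statistics combine a per-sample comparison with a median over samples, plus a single correlation across samples for cellularity and ploidy. A minimal sketch, assuming each sample's copy number profile has already been projected onto a shared set of equally sized bins for both methods, so the fraction of bins approximates the fraction of the genome. The bin representation and the names `pearson` and `summarize_agreement` are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def summarize_agreement(estimates: Dict[str, List[int]],
                        reference: Dict[str, List[int]]) -> Dict[str, float]:
    """Median over samples of the fraction of bins with identical copy number
    and of the per-sample Pearson correlation between copy number profiles."""
    frac_equal, correlations = [], []
    for sample, profile in estimates.items():
        ref = reference[sample]
        frac_equal.append(sum(a == b for a, b in zip(profile, ref)) / len(profile))
        correlations.append(pearson(profile, ref))
    return {"median_fraction_equal": statistics.median(frac_equal),
            "median_profile_correlation": statistics.median(correlations)}


if __name__ == "__main__":
    est = {"s1": [2, 2, 3, 4], "s2": [2, 1, 1, 2], "s3": [2, 2, 2, 3]}
    ref = {"s1": [2, 2, 3, 3], "s2": [2, 1, 1, 2], "s3": [3, 3, 3, 4]}
    print(summarize_agreement(est, ref))
    # Cellularity (or ploidy) agreement is one correlation across samples.
    print(pearson([0.5, 0.7, 0.3], [0.55, 0.65, 0.35]))
```

Sample s3 shows why both profile metrics are useful: a profile shifted by one copy everywhere correlates perfectly with the reference yet agrees at no bin.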