
Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.

F Favero, T Joshi, A M Marquard, N J Birkbak, M Krzystanek, Q Li, Z Szallasi, A C Eklund.

Abstract

BACKGROUND: Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described.
MATERIALS AND METHODS: We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm.
RESULTS: Comparison between Sequenza/exome and SNP/ASCAT revealed a strong correlation in cellularity estimates (Pearson's r = 0.90) and a weaker correlation in ploidy estimates (r = 0.42), which increased to r = 0.94 after manual inspection of alternative solutions. This performance was noticeably superior to that of previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%.
CONCLUSIONS: The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small-scale mutations but also for estimating cellularity and inferring DNA copy number aberrations.
© The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology.


Keywords:  cancer genomics; copy number alterations; mutations; next-generation sequencing; software


Year:  2014        PMID: 25319062      PMCID: PMC4269342          DOI: 10.1093/annonc/mdu479

Source DB:  PubMed          Journal:  Ann Oncol        ISSN: 0923-7534            Impact factor:   32.976


introduction

Cancer is a genetic disease in which specific mutations or genomic aberrations can enable tumor initiation or progression and, in certain cases, can determine the effectiveness of specific anticancer therapies. Several tumor resequencing projects have collected and analyzed genetic material from large cohorts of patients in an effort to identify important somatic events that may represent drug targets or predictive biomarkers [1]. In such projects, nonsynonymous substitutions and short indels in coding regions are typically detected by analyzing exome sequencing data derived from matched pairs of tumor and normal tissues of cancer patients, whereas larger aberrations such as copy number alterations or loss of heterozygosity (LOH) are typically detected using genome-wide single nucleotide polymorphism (SNP) arrays, which remain the current state of the art. Tumor tissue specimens comprise a mixture of cancer cells and normal cells; therefore, analysis of tumor data must take the specimen cellularity into consideration [2-5]. However, it is currently not possible to make a histological estimate of tumor cellularity and extract high-quality DNA from the very same specimen; cellularity estimates based on histology are therefore commonly made from an adjacent tumor section, which often does not reflect the cellularity of the section used for DNA sequencing. Thus, using the DNA itself to estimate cellularity is an appealing approach. Several methods have been described that estimate, and then correct for, tumor cellularity in SNP array data in order to improve copy number profiles [2-5] or in DNA sequencing data for mutation calling [6]. Copy number profiles can be inferred from sequencing data of sufficient depth and coverage by using the relative number of reads mapped to a given genomic position (depth ratio) as an indicator of copy number.
This approach has recently been demonstrated in algorithms such as VarScan 2 [7] and APOLLOH [8], in which copy number profiles inferred from whole-exome sequencing alone (WES profiles) are largely concordant with profiles inferred from SNP array data (SNP profiles). APOLLOH estimates the tumor cellularity, whereas VarScan 2 does not. In addition, algorithms such as PurityEst [9] and PurBayes [10] are specialized to estimate tumor cellularity directly from paired tumor-normal sequence data. Only recently have tools such as absCN-seq [11] and newer versions of ABSOLUTE [4] provided methods to estimate cellularity and ploidy and to calculate copy number profiles directly from exome sequencing data. In such algorithms, accurate cellularity and ploidy estimation is essential for generating correct copy number profiles. Here we describe Sequenza, a software package that uses paired tumor-normal exome or whole-genome sequencing data to estimate tumor cellularity and ploidy and to infer allele-specific tumor copy number profiles. Using publicly available matched tumor-normal data, we compare the results of exome sequence data analyzed by Sequenza with SNP array data from the same tumors analyzed by allele-specific copy number analysis of tumors (ASCAT). For comparison, we also assess the performance of the previously described algorithms absCN-seq and ABSOLUTE.

materials and methods

algorithm

Sequenza is based on a probabilistic model applied to segmented data. The observations are the average depth ratio (tumor versus normal) and the B allele frequency (the lesser of the two allelic fractions, as measured at germline heterozygous positions) for each segment. The model parameters are the overall tumor ploidy and cellularity, together with the segment-specific copy number and minor allele copy number. The location of the segments and the segment-level dispersion are treated as known constants. We estimate the model parameters using a maximum a posteriori approach in which prior probabilities on copy number are defined such that two copies (by default) are preferred over other values. Under this model, given values for cellularity and ploidy, the segment-level parameters can be estimated quickly. We therefore solve the overall estimation problem with a grid-based search over a plausible range of cellularity and ploidy values (see supplementary Methods).
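The grid-based maximum a posteriori search can be illustrated with a minimal Python sketch. It is not Sequenza's implementation: it assumes a standard two-component mixture in which normal cells are diploid, `ploidy` is the tumor's mean copy number, the likelihoods are Gaussian with fixed, hypothetical standard deviations (`sd_dr`, `sd_baf`), and the prior simply up-weights copy number 2 by a hypothetical factor `cn2_weight`. All function names are illustrative; the actual likelihoods and priors are given in the supplementary Methods.

```python
import math
from itertools import product


def expected_depth_ratio(cn, cellularity, ploidy):
    """Expected tumor/normal depth ratio of a segment with total copy number `cn`.

    Assumes diploid normal cells; `ploidy` is the tumor's mean copy number."""
    normal = 2.0 * (1.0 - cellularity)
    return (cellularity * cn + normal) / (cellularity * ploidy + normal)


def expected_baf(cn, minor_cn, cellularity):
    """Expected B allele frequency at germline heterozygous positions."""
    return (cellularity * minor_cn + (1.0 - cellularity)) / (
        cellularity * cn + 2.0 * (1.0 - cellularity))


def log_normal(x, mu, sd):
    """Log density of a normal distribution (a hypothetical likelihood choice)."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def best_segment_state(dr, baf, cellularity, ploidy, max_cn=6,
                       sd_dr=0.05, sd_baf=0.02, cn2_weight=2.0):
    """Return (log posterior, cn, minor_cn) maximizing one segment's posterior."""
    best = None
    for cn in range(max_cn + 1):
        if cellularity * cn + 2.0 * (1.0 - cellularity) <= 0:
            continue  # no reads expected (cn = 0 in a pure tumor)
        # Unnormalized prior favoring two copies; the omitted normalizing
        # constant is identical everywhere, so the argmax is unaffected.
        log_prior = math.log(cn2_weight if cn == 2 else 1.0)
        for minor in range(cn // 2 + 1):  # minor allele: at most half the copies
            lp = (log_prior
                  + log_normal(dr, expected_depth_ratio(cn, cellularity, ploidy), sd_dr)
                  + log_normal(baf, expected_baf(cn, minor, cellularity), sd_baf))
            if best is None or lp > best[0]:
                best = (lp, cn, minor)
    return best


def grid_search(segments, cellularities, ploidies):
    """MAP cellularity and ploidy for segments given as (depth_ratio, baf) pairs."""
    best = None
    for rho, psi in product(cellularities, ploidies):
        fits = [best_segment_state(dr, baf, rho, psi) for dr, baf in segments]
        total = sum(f[0] for f in fits)
        if best is None or total > best[0]:
            best = (total, rho, psi, [(f[1], f[2]) for f in fits])
    return best
```

Because the segment-level states are re-optimized at each grid point, the cost of the search grows with the number of grid points times the number of segments, which matches the observation above that segment-level parameters are cheap to estimate once cellularity and ploidy are fixed.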

implementation

The Sequenza software consists of two distinct parts: a Python-based preprocessing tool and an R package implementing the model fitting and visualization functions (supplementary Figure S1). The Python script ‘sequenza-utils’ has two roles. First, it calculates the GC content in sliding windows from a genome reference file in FASTA format. Second, it processes the sequencing data from the tumor and normal specimens, which must be in the Pileup format output by SAMtools [12]. For genomic positions with sufficient sequencing depth (by default, >20 reads in total from the tumor and normal specimens), the script extracts the sequencing depth, determines homozygous and heterozygous positions in the normal specimen, and calculates the variant alleles and their allelic frequencies in the tumor specimen. The output is a tab-delimited text file suitable for import into R. Additionally, ‘sequenza-utils’ is compatible with the PyPy implementation of Python [13], which runs about six times faster than the standard Python implementation. The ‘sequenza’ R package performs the analysis on the output of sequenza-utils and is built around three high-level functions (supplementary Figure S1B). First, sequenza.extract efficiently reads the input file into R, performs GC-content normalization of the tumor versus normal depth ratio, and performs allele-specific segmentation using the ‘copynumber’ package [14]. Second, sequenza.fit applies the model described in the supplementary Material to infer the cellularity and ploidy parameters and the copy number profiles; alternative solutions are also provided, based on local maxima of the posterior probability. Finally, sequenza.results returns the estimation results together with the alternative solutions, along with visualizations of the data and the model along the genome and for individual chromosomes. Detailed methods are available in supplementary Methods.
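The per-position preprocessing performed by ‘sequenza-utils’ can be sketched as follows. This is a simplified illustration, not the tool's code: the function names are hypothetical, the homozygosity threshold `hom_threshold=0.9` is an assumption, and only the default depth cutoff (>20 reads in total) comes from the text. The parser follows the SAMtools pileup base-string conventions (`.`/`,` for reference matches, `^` plus a mapping-quality character at read starts, `+`/`-` followed by a length for indels).

```python
import re
from collections import Counter


def count_bases(ref_base, bases):
    """Count A/C/G/T read bases in one SAMtools pileup base string."""
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":  # read start; the next character is the mapping quality
            i += 2
        elif ch in "+-":  # indel: sign, length digits, then the indel bases
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            if ch in ".,":  # matches the reference on the forward/reverse strand
                counts[ref] += 1
            elif ch.upper() in "ACGT":  # mismatch on either strand
                counts[ch.upper()] += 1
            i += 1  # '$', '*', '#', '<', '>' and 'N' are skipped
    return counts


def process_position(ref_base, normal_bases, tumor_bases,
                     min_depth=20, hom_threshold=0.9):
    """Summarize one position; returns None below the depth cutoff."""
    normal = count_bases(ref_base, normal_bases)
    tumor = count_bases(ref_base, tumor_bases)
    n_depth, t_depth = sum(normal.values()), sum(tumor.values())
    if n_depth == 0 or n_depth + t_depth <= min_depth:
        return None
    ranked = normal.most_common(2)
    top_allele, top_count = ranked[0]
    result = {"depth_ratio": t_depth / n_depth}
    if top_count / n_depth >= hom_threshold:
        # Homozygous in the normal: tumor alleles differing from it are candidate mutations.
        result["genotype"] = "hom"
        result["variants"] = {a: c / t_depth for a, c in tumor.items() if a != top_allele}
    else:
        # Heterozygous in the normal: record the tumor B allele frequency.
        second_allele = ranked[1][0]
        pair = tumor[top_allele] + tumor[second_allele]
        result["genotype"] = "het"
        result["baf"] = min(tumor[top_allele], tumor[second_allele]) / pair if pair else None
    return result
```

The raw depth ratio returned here is unnormalized; in the described workflow, GC-content correction is applied later in R by sequenza.extract.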
The software has a web page at http://www.cbs.dtu.dk/biotools/sequenza and is freely available from CRAN.
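The two GC-related steps of the pipeline (computing GC content per window, then normalizing the depth ratio) can be illustrated with a minimal sketch. It assumes a simple median-based correction: each window's GC fraction is rounded to one of `n_bins` GC bins, and each depth ratio is divided by the median ratio of its bin. The non-overlapping window layout, the binning into whole percentages, and the function names are illustrative assumptions, not the defaults or API of sequenza-utils or the R package.

```python
from collections import defaultdict
from statistics import median


def gc_windows(sequence, window=50):
    """GC fraction of consecutive fixed-size windows; N bases are ignored."""
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / acgt if acgt else None)  # None: window is all N
    return fractions


def gc_normalize(depth_ratios, gc_fractions, n_bins=100):
    """Divide each depth ratio by the median ratio of windows with similar GC content."""
    keys = [None if gc is None else round(gc * n_bins) for gc in gc_fractions]
    by_bin = defaultdict(list)
    for ratio, key in zip(depth_ratios, keys):
        if key is not None:
            by_bin[key].append(ratio)
    bin_median = {key: median(values) for key, values in by_bin.items()}
    normalized = []
    for ratio, key in zip(depth_ratios, keys):
        m = bin_median.get(key)
        normalized.append(ratio / m if m else None)  # None: no GC or zero median
    return normalized
```

A median per GC bin is used here because it is robust to windows with true copy number changes, which would otherwise pull a mean-based correction.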

data and analysis

Thousands of specimens are available from TCGA; we arbitrarily selected the first 10 ovarian serous carcinoma (OVCA) and 20 clear-cell renal cell carcinoma (KIRC) sample IDs, sorted alphabetically, as of May 2013. The SNP arrays for the ovarian serous carcinomas and the renal clear-cell carcinomas were obtained on 22 January 2010 and 17 November 2011, respectively. Exome sequence data, previously aligned to human genome version hg19, were obtained in BAM format in May 2013. The SNP array files were preprocessed using the aroma.affymetrix package [15] as described [16], and copy number variations were determined using ASCAT version 2.1 [3]; sex chromosomes were excluded from the analysis. The Sequenza results were obtained using version 2.1.0 with default parameters; the input was generated by the Python script sequenza-utils.py version 2.1.0 with the default binning size of 50 bases for exome sequencing or 200 bases for whole-genome sequencing. The absCN-seq results were obtained using version 1.0 with default parameters; the inputs were the same genomic segments used by Sequenza together with high-quality somatic mutation calls detected by VarScan2, as described in the software documentation. The ABSOLUTE results were obtained using software version 1.0.6 with default parameters, except that the platform was specified as ‘Illumina_WES’; the input was the same genomic segments used with Sequenza and absCN-seq. Exome sequencing data from 31 of the NCI-60 tumor cell lines, aligned to genome version hg19, were downloaded in BAM format in May 2014 [17]. Whole-genome sequencing data (aligned to hg19, BAM format, ×30 coverage) for two cell lines, HCC1143 and HCC1954, their matching normal blood, and simulated tumor-normal admixtures at 20%, 40%, 60%, and 80% tumor proportion were obtained in March 2014 from the TCGA benchmark 4 cohort (https://cghub.ucsc.edu/datasets/benchmark_download.html).
All BAM files were processed with Picard to remove PCR duplicates and low-quality mappings and then converted to pileup format using SAMtools [12].
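The binning size mentioned above can be pictured with a short sketch. Its semantics are an assumption made for illustration: depth ratios of non-heterozygous positions are averaged within non-overlapping windows (50 bases here, matching the exome default), while heterozygous positions are passed through individually so that B allele frequencies are not diluted. The record layout and the function name are hypothetical and do not describe the actual sequenza-utils.py output format.

```python
from collections import defaultdict


def bin_positions(records, window=50):
    """Average depth ratios of non-heterozygous positions in fixed windows.

    records: iterable of (chrom, pos, depth_ratio, genotype), 1-based positions.
    Heterozygous positions are passed through unchanged (an assumption)."""
    sums = defaultdict(lambda: [0.0, 0])
    kept = []
    for chrom, pos, ratio, genotype in records:
        if genotype == "het":
            kept.append((chrom, pos, ratio, genotype))
        else:
            key = (chrom, (pos - 1) // window)
            sums[key][0] += ratio
            sums[key][1] += 1
    binned = [(chrom, index * window + 1, total / count, "hom")
              for (chrom, index), (total, count) in sums.items()]
    return sorted(binned + kept, key=lambda rec: (rec[0], rec[1]))
```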

results

application of sequenza to tumor exome sequencing data

To compare Sequenza WES profiles with the current state of the art, SNP profiles, we obtained paired tumor-normal exome sequencing data and Affymetrix SNP6 array data from 10 OVCA patients [18] and 20 KIRC patients [19]. We chose renal and ovarian cancer because they represent two widely different cancer types: clear-cell renal cancer has low cellularity and few copy number variations, whereas ovarian cancer typically shows extensive copy number alterations and high tumor cellularity. The exome data were processed with Sequenza using default settings. On a single CPU core, the average running time per specimen was 4 h for preprocessing, 30 min for segmentation, and 4 min for model fitting and parameter estimation. Results from a representative sample are shown in Figure 1. Of the 20 renal cancer copy number profiles, 17 exhibited 3p loss (supplementary Figure S5), consistent with previous observations in renal cancer [19].
Figure 1.

Representative output of the Sequenza algorithm. Exome sequencing data from an ovarian tumor (TCGA-42-2591-01A) and matched normal (TCGA-42-2591-10A) specimen were analyzed with Sequenza. (A) The log posterior probability (LPP) of the observed data was calculated for a range of candidate ploidy and cellularity values. The point estimate is the ploidy and cellularity with maximum LPP. The 95% confidence region is the smallest (not necessarily contiguous) set of points with a total posterior probability >0.95. The background color indicates the rank of the LPP (blue = most likely, white = least likely), shown to contrast other parameter values that are very unlikely under our model but might still be of interest. Local maxima are marked with a ‘+’ and indicate possible alternative solutions. (B) Observed depth ratio (DR) and B allele frequency (BAF) values for each genomic segment (black circles and dots) along with the representative joint LPP density (colors). The representative joint LPP density is calculated for the cellularity and ploidy estimates identified in (A) and for a hypothetical representative 10 Mb segment. The actual joint LPP density depends on segment size and variability and thus varies quantitatively but not qualitatively between segments. Observed segments with highly unlikely DR and BAF values may indicate subclonality, measurement errors, or incorrect model parameters. (C) Chromosome plot showing mutant allele frequency (top panel), B allele frequency (middle panel), and depth ratio (bottom panel) according to genomic position; chromosome 1 is shown. The mutant allele frequency at a given position is the fraction of reads carrying a mutation and is displayed, if >0.1, for each genomic position with sufficient sequencing depth. For visualization, the B allele frequency and depth ratio are summarized within 1 Mb windows staggered every 0.5 Mb.
Within each window, a thick black line indicates the median value, and a blue bar indicates the interquartile range. Red lines indicate segmented values. The thin dotted lines indicate the expectation values under the fitted model; their placement is based on the estimated cellularity, ploidy, and copy number profile. In the top panel, the dotted lines indicate the number of alleles with mutation, with the lowest line starting at one. In the middle panel, the dotted lines indicate the minor allele copy number, with the lowest line starting at zero. In the lower panel, the dotted lines indicate the copy number.
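The 95% confidence region and the local maxima described in panel (A) can be computed from a grid of log posterior values. The sketch below is an illustration under stated assumptions, not Sequenza's code: the grid is represented as a hypothetical dictionary keyed by (cellularity, ploidy), the region is built by adding grid points in order of decreasing posterior until the accumulated mass exceeds the requested level (which yields the smallest, not necessarily contiguous, such set), and a local maximum is taken to be a point strictly higher than all of its up to eight grid neighbours.

```python
import math


def credible_region(log_post, level=0.95):
    """Smallest set of grid points whose total posterior probability exceeds `level`.

    log_post maps (cellularity, ploidy) to an unnormalized log posterior."""
    peak = max(log_post.values())
    weights = {key: math.exp(lp - peak) for key, lp in log_post.items()}  # stable exp
    total = sum(weights.values())
    region, mass = [], 0.0
    for key, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        region.append(key)
        mass += weight / total
        if mass > level:
            break
    return region


def local_maxima(log_post, cellularities, ploidies):
    """Grid points whose log posterior exceeds that of every adjacent grid point."""
    peaks = []
    for i, rho in enumerate(cellularities):
        for j, psi in enumerate(ploidies):
            here = log_post[(rho, psi)]
            neighbours = [log_post[(cellularities[a], ploidies[b])]
                          for a in range(max(i - 1, 0), min(i + 2, len(cellularities)))
                          for b in range(max(j - 1, 0), min(j + 2, len(ploidies)))
                          if (a, b) != (i, j)]
            if all(here > other for other in neighbours):
                peaks.append((rho, psi))
    return peaks
```

Subtracting the maximum before exponentiating avoids underflow when log posteriors are large negative numbers, which is typical when many segments contribute to the likelihood.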


comparison between exome/Sequenza and SNP array/ASCAT profiles

There is no gold standard for tumor copy number that could be used to validate the performance of Sequenza. However, SNP arrays processed by ASCAT are an established approach for determining copy number profiles; therefore, agreement between these two platforms would support the performance of Sequenza. Hereafter, for simplicity, we use the terms ‘Sequenza’ and ‘ASCAT’ with the understanding that it is actually the combined measurement platform and software that is being considered. Sequenza and ASCAT both provide estimates of cellularity and ploidy; we found a strong correlation for cellularity (r = 0.90) but only a moderate correlation for ploidy (r = 0.42) (Figure 2A and B, Table 1). Interestingly, the ploidy comparison seems to be characterized by a few large outliers, many of which have low cellularity. Details of three highly discordant samples are shown in supplementary Figures S6–S8.
Figure 2.

Comparison of cellularity and ploidy estimates and copy number profiles derived from exome sequencing with those derived from SNP arrays, and testing on simulated data. (A–C) Matched tumor-normal exome sequencing and SNP array data from 10 ovarian cancer patients and 20 renal cell carcinoma patients were obtained from TCGA. Exome data were analyzed with Sequenza, and SNP array data were analyzed with ASCAT. (A) Ploidy and (B) cellularity estimates were compared between the two platforms. (C) Copy number profiles were compared by calculating the absolute difference in estimated copy number for each genomic position (ΔCN). The figure indicates the fraction of the covered genome at each level of ΔCN. Asterisks indicate tumors for which the Sequenza cellularity estimate is lower than 0.4. (D and E) Sequenza (D) ploidy and (E) cellularity estimates from simulated whole-genome sequencing with varying cellularity for cell lines HCC1954 and HCC1143. Vertical lines indicate 95% confidence intervals on the estimates. Dashed horizontal lines indicate ploidy estimates of the same cell lines by SNP array in an independent study [4].

Table 1.

Performance of various algorithms on TCGA exome data

Algorithm    rρ            rψ             FΔCN=0   RMSEρ           RMSEψ
Sequenza     0.90 (0.91)   0.42 (0.94)    0.69     0.095 (0.087)   0.95 (0.25)
ABSOLUTE     0.19 (0.61)   0.13 (0.50)    0.08     0.35 (0.19)     1.81 (1.08)
absCN-seq    0.46 (0.65)   −0.26 (0.46)   0.02     0.16 (0.13)     1.91 (0.76)

rρ, rψ = Pearson correlation of cellularity or ploidy estimates (respectively) with those of ASCAT. FΔCN=0 = median (over all samples) fraction of the genome with copy number estimate equal to that of ASCAT. = median (over all samples) Pearson correlation of copy number profile with that of ASCAT. The numbers in parentheses indicate the result when the set of alternative solutions is visually inspected.
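The agreement metrics in Table 1 can be sketched in a few lines of Python. This is an illustration with hypothetical function names, under two assumptions: copy number segments are half-open (start, end, copy number) intervals on a single chromosome, and the RMSE columns are computed between paired per-sample estimates from an algorithm and from ASCAT. The per-sample genome fractions would then be summarized by their median across samples, as stated in the table legend.

```python
import math


def pearson(x, y):
    """Pearson correlation between paired estimates (e.g. cellularities)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between paired estimates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def fraction_equal_cn(segments_a, segments_b):
    """Fraction of commonly covered bases where copy numbers agree (ΔCN = 0).

    Only positions covered by both segmentations are counted."""
    covered = agree = 0
    for s1, e1, cn1 in segments_a:
        for s2, e2, cn2 in segments_b:
            overlap = min(e1, e2) - max(s1, s2)
            if overlap > 0:
                covered += overlap
                if cn1 == cn2:
                    agree += overlap
    return agree / covered if covered else float("nan")
```

Weighting by overlap length, rather than counting segments, makes the fraction independent of how finely each platform's segmentation splits the genome.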

Both Sequenza and ASCAT return a list of genomic segments, each with an estimated copy number state. However, the breakpoints between segments differ, and the two platforms do not cover the same genomic regions. We therefore compared only the positions covered by the segmentation of both platforms (Figure 2C).
Aside from samples where the Sequenza and ASCAT ploidy estimates disagree, the genome fraction with perfect agreement (ΔCN equal to zero) was generally high, with a median value of 69%. However, as expected, samples with ploidy disagreement were also discordant in their copy number profiles. To assess copy number profile agreement between the two platforms in a ploidy-independent manner, we carried out hierarchical clustering of the sample profiles using a Pearson correlation distance metric. In all but one case, profiles from the same patient derived from different platforms clustered together (data not shown) and were thus more similar to each other than to other profiles, even when the ploidy call differed substantially between the two algorithms.
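Why a correlation distance is ploidy-independent can be shown with a simplified sketch. Rather than reproducing full hierarchical clustering, the code below (a deliberate simplification with hypothetical names) computes the Pearson correlation distance 1 − r between copy number profiles sampled at shared positions and reports the patients whose two platform-specific profiles are each other's nearest neighbours. Because Pearson correlation is unchanged by a positive linear rescaling, a profile whose copy numbers are all doubled by a ploidy disagreement still has distance close to zero from its counterpart.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """1 - Pearson correlation; unchanged by a positive linear rescaling of either profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - sxy / norm if norm else 1.0  # flat profile: treated as uncorrelated


def mutual_nearest_pairs(profiles):
    """Patients whose two profiles are each other's nearest neighbours.

    profiles maps (patient, platform) -> copy numbers at shared positions."""
    keys = list(profiles)
    dist = {}
    for a, b in combinations(keys, 2):
        dist[(a, b)] = dist[(b, a)] = pearson_distance(profiles[a], profiles[b])
    nearest = {k: min((o for o in keys if o != k), key=lambda o: dist[(k, o)])
               for k in keys}
    return {k[0] for k, nn in nearest.items() if nn[0] == k[0] and nearest[nn] == k}
```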

comparison with other methods

We are aware of two previously published methods that generate copy number profiles from exome sequencing data while accounting for tumor cellularity: ABSOLUTE [4] and absCN-seq [11]. We assessed the performance of these methods using the same criteria we used to evaluate Sequenza. As above, we use the terms ‘ABSOLUTE’ and ‘absCN-seq’ to denote the results obtained by applying each method to exome sequencing data, and we compared the results of each algorithm with those of ASCAT applied to SNP array data. To focus the comparison on ploidy and copy number estimation rather than segmentation, we supplied the same segmented data (processed by copynumber [14]) as input to each algorithm. The comparison results for each algorithm are summarized in Table 1. First, we compared cellularity and ploidy estimates. The ABSOLUTE estimates of cellularity and ploidy were weakly correlated with the ASCAT estimates (r = 0.19 and r = 0.13, supplementary Figure S2A and B). The absCN-seq estimates were moderately correlated with the ASCAT cellularity estimates (r = 0.46) but negatively correlated with the ASCAT ploidy estimates (r = −0.26, supplementary Figure S3A and B). Next, we compared segment-wise copy number estimates. As expected from the low agreement in ploidy estimates, the majority of samples showed substantial disagreement with the ASCAT copy number estimates (supplementary Figures S2C and S3C). Previous publications of copy number inference algorithms have reported the performance obtained after manually selecting a single solution from the set of multiple solutions proposed by the algorithm. We therefore manually inspected the list of possible solutions from each of the three algorithms and selected the solution with the best agreement with the SNP array solution. As expected, this increased the accuracy of all three algorithms, with Sequenza obtaining the highest agreement (Table 1).

application to cell line data

We applied Sequenza to exome sequencing data from 31 cell lines of the NCI-60 panel [17] and compared the estimated ploidy with previously published modal chromosome numbers derived from spectral karyotyping [20]. These samples were selected to allow comparison of Sequenza's performance with previously published results [11]. Because this dataset lacks matched normals, we modified our algorithm to calculate the depth ratio and identify heterozygous positions from two different sources: we used the near-diploid hematopoietic cell line SR as the normal genome for the depth ratio calculation, and the selected cell line itself to determine heterozygous positions. With this approach, however, LOH regions in the cell line yield no identified heterozygous positions; we therefore set to zero the B allele frequency of segments with fewer than three heterozygous positions per megabase. Despite the suboptimal input data, we obtained a root mean square error (RMSE) of 1.2 between the karyotype-derived ploidy and the Sequenza-estimated ploidy (supplementary Figure S4A), compared with an RMSE of 0.55 for absCN-seq applied to the same data [11]. For comparison with previously published results in which solutions were inspected manually, we carried out a similar analysis in which we visually inspected two to four alternative solutions and, for eight of the samples, selected a solution different from the point estimate, resulting in an RMSE of 0.44 (supplementary Figure S4B). This can be compared with previously published results in which absCN-seq obtained an RMSE of 0.34 on the same data [11], or with results obtained from SNP arrays of the NCI-60 cohort, with an RMSE of 0.54 using ABSOLUTE and 0.85 using ASCAT [4].
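The B allele frequency adjustment used for cell lines without a matched normal amounts to a density rule per segment. A minimal sketch follows; only the threshold of three heterozygous positions per megabase comes from the text, while the dictionary-based segment representation and the function name are hypothetical.

```python
def adjust_baf(segments, min_het_per_mb=3.0):
    """Set BAF to 0 in segments with too few heterozygous positions per Mb.

    segments: list of dicts with 'start', 'end' (bp), 'n_het' and 'baf'.
    Returns new dicts; the input segments are not modified."""
    adjusted = []
    for seg in segments:
        length_mb = (seg["end"] - seg["start"]) / 1e6
        density = seg["n_het"] / length_mb if length_mb > 0 else 0.0
        new = dict(seg)
        if density < min_het_per_mb:
            new["baf"] = 0.0  # sparse heterozygous sites interpreted as LOH
        adjusted.append(new)
    return adjusted
```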
To assess how ploidy estimation accuracy is affected by low cellularity, we analyzed simulated tumor-normal admixtures at tumor proportions of 100%, 80%, 60%, 40%, and 20%, provided by the ‘TCGA benchmark 4’ whole-genome sequencing of the HCC1143 and HCC1954 cell lines [21]. Converting the admixture percentage of tumor reads into tumor cellularity must take into account the tetraploid genomes of the cell lines. The simulations show that the algorithm estimates the correct ploidy until cellularity decreases below 0.3 (Figure 2D and E).
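The conversion from read admixture to cellularity can be made explicit. Assuming that reads are sampled in proportion to DNA content (number of cells times ploidy), a tumor read fraction f satisfies f = ρ·Pt / (ρ·Pt + (1 − ρ)·Pn), where ρ is cellularity and Pt and Pn are the tumor and normal ploidies. The sketch below solves this for ρ; the tumor ploidy of 4.0 is an illustrative stand-in for "tetraploid", not a measured value for these cell lines, and the function name is hypothetical.

```python
def read_fraction_to_cellularity(tumor_read_fraction, tumor_ploidy=4.0, normal_ploidy=2.0):
    """Fraction of tumor cells implied by a fraction of tumor-derived reads.

    Assumes reads are sampled in proportion to DNA content (cells x ploidy):
    f = rho*Pt / (rho*Pt + (1 - rho)*Pn), solved here for rho."""
    f = tumor_read_fraction
    return f * normal_ploidy / (f * normal_ploidy + (1.0 - f) * tumor_ploidy)
```

Under these assumptions, admixtures with 20%, 40%, 60%, and 80% tumor reads correspond to cellularities of roughly 0.11, 0.25, 0.43, and 0.67, because each tetraploid tumor cell contributes twice as much DNA as a diploid normal cell.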

discussion

We have described a simple model to infer accurate copy number profiles from next-generation sequencing data and its implementation in the software package Sequenza. For the majority of specimens we analyzed, we observed strong agreement between the output of Sequenza and the output of ASCAT applied to matched SNP array data. The few cases with substantial disagreement in copy number profile seem to stem from disagreement in ploidy and were more common in specimens with low cellularity. Ploidy can be determined experimentally using flow cytometry [22], but this was not carried out on the TCGA specimens. When experimentally derived ploidy data are available, Sequenza allows the ploidy to be specified explicitly rather than determined by model fitting. One advantage of SNP arrays over exome sequencing is genomic coverage. SNP arrays are often designed both to determine SNP genotypes and to detect copy number changes. In particular, the Affymetrix SNP6.0 platform used for the samples tested in this manuscript covers more than 900 000 positions evenly distributed across the genome for copy number detection, plus another 900 000 SNP positions, of which on average ∼26% are heterozygous in a given individual. This design allows highly accurate allele-specific determination of copy number profiles. In contrast, exome enrichment kits are generally not designed for determining copy number states. Covered genomic regions are based on known exons, which are on average ∼150 bases in size and are not evenly distributed throughout the genome. Inference of allele-specific copy number requires heterozygous positions and can therefore only be achieved for exons that include SNPs. In the exome sequencing data, we recorded an average of 45 000 heterozygous positions per patient, corresponding to ∼1/5 of the number identified by the SNP arrays.
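The ∼1/5 ratio follows from the figures above, assuming the ∼26% heterozygosity rate applies to the 900 000 SNP positions of the array:

```latex
\[
900\,000 \times 0.26 \approx 234\,000 \ \text{heterozygous SNP6.0 positions},
\qquad
\frac{45\,000}{234\,000} \approx 0.19 \approx \tfrac{1}{5}.
\]
```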
However, it seems likely that whole-genome sequencing will eventually become more cost-efficient and more widely used than exome sequencing. Sequenza is compatible with whole-genome data, and we expect this to increase accuracy owing to better genomic coverage and a larger number of heterozygous positions. Indeed, when processing available whole-genome sequencing data (data not shown), we identified an average of 1.7 × 10⁶ heterozygous SNPs, and genotyping and depth information for 2.6 × 10⁹ positions. We are aware of four other methods also designed to estimate copy number profiles in tumor samples of unknown cellularity, but only two of these, ABSOLUTE and absCN-seq, are designed to work on exome sequencing data. These two algorithms and Sequenza have many elements of their models in common, but also several important differences. AbsCN-seq uses a least squares method to estimate the most likely model, which gives a fast running time, whereas ABSOLUTE and Sequenza use the likelihood or posterior probability to estimate the best solution. ABSOLUTE incorporates prior probabilities from previous karyotype analyses, whereas Sequenza uses a much simpler prior probability on copy number that is the same for every segment. AbsCN-seq does not incorporate prior probabilities. Additionally, Sequenza and ABSOLUTE provide graphical reports to further inspect the alternative solutions, whereas absCN-seq reports only the numerical alternative cellularity and ploidy values. One possible advantage of Sequenza over the other two algorithms is its use of the B allele frequency, which not only provides information beyond the depth ratio but also enables calculation of allele-specific copy number, whereas the other algorithms provide only absolute copy number profiles. However, the requirement for the B allele frequency is a drawback when the heterozygous positions cannot be determined accurately, for example in cell lines for which no normal sample is available.
In our comparison with previously published methods ABSOLUTE and absCN-seq, we found that Sequenza shows substantially stronger agreement with SNP array-based cellularity, ploidy, and copy number estimates. However, in the analysis of cell line exome data, absCN-seq performed better than Sequenza, likely because Sequenza relies on identification of heterozygous positions from a matched normal sample that was not available for these cell lines. One limitation of Sequenza as well as its competing algorithms is that the segmentation is taken as a given; a more sophisticated analysis would consider uncertainty in the assignment of segment boundaries. Also, Sequenza does not account for possible heterogeneity of mutations within a tumor specimen, which has important consequences for patient diagnosis and for identification of driver mutations [23, 24]. However, it is possible to use the variant allele frequency and corresponding copy number states from Sequenza as input for external software such as PyClone [25] in order to resolve subclonal structures.

Funding

This work was supported by the European Commission 7th Framework Programme (HEALTH-2010-F2-259303); the Danish Council for Independent Research (09-073053/FSS); and the Breast Cancer Research Foundation (to ZS). Funding for open access charge: the Danish Council for Independent Research (09-073053/FSS).

Disclosure

The authors have declared no conflicts of interest.
References (23 in total)

1.  The clonal and mutational evolution spectrum of primary triple-negative breast cancers.

Authors:  Sohrab P Shah; Andrew Roth; Rodrigo Goya; Arusha Oloumi; Gavin Ha; Yongjun Zhao; Gulisa Turashvili; Jiarui Ding; Kane Tse; Gholamreza Haffari; Ali Bashashati; Leah M Prentice; Jaswinder Khattra; Angela Burleigh; Damian Yap; Virginie Bernard; Andrew McPherson; Karey Shumansky; Anamaria Crisan; Ryan Giuliany; Alireza Heravi-Moussavi; Jamie Rosner; Daniel Lai; Inanc Birol; Richard Varhol; Angela Tam; Noreen Dhalla; Thomas Zeng; Kevin Ma; Simon K Chan; Malachi Griffith; Annie Moradian; S-W Grace Cheng; Gregg B Morin; Peter Watson; Karen Gelmon; Stephen Chia; Suet-Feung Chin; Christina Curtis; Oscar M Rueda; Paul D Pharoah; Sambasivarao Damaraju; John Mackey; Kelly Hoon; Timothy Harkins; Vasisht Tadigotla; Mahvash Sigaroudinia; Philippe Gascard; Thea Tlsty; Joseph F Costello; Irmtraud M Meyer; Connie J Eaves; Wyeth W Wasserman; Steven Jones; David Huntsman; Martin Hirst; Carlos Caldas; Marco A Marra; Samuel Aparicio
Journal: Nature; Date: 2012-04-04

2.  International network of cancer genome projects.

Authors:  Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; et al.
Journal: Nature; Date: 2010-04-15

3.  Advances for studying clonal evolution in cancer. (Review)

Authors:  Li Ding; Benjamin J Raphael; Feng Chen; Michael C Wendl
Journal: Cancer Lett; Date: 2013-01-23

4.  Karyotypic complexity of the NCI-60 drug-screening panel.

Authors:  Anna V Roschke; Giovanni Tonon; Kristen S Gehlhaus; Nicolas McTyre; Kimberly J Bussey; Samir Lababidi; Dominic A Scudiero; John N Weinstein; Ilan R Kirsch
Journal: Cancer Res; Date: 2003-12-15

5.  Intratumor heterogeneity and branched evolution revealed by multiregion sequencing.

Authors:  Marco Gerlinger; Andrew J Rowan; Stuart Horswell; James Larkin; David Endesfelder; Eva Gronroos; Pierre Martinez; Nicholas Matthews; Aengus Stewart; Charles Swanton; M Math; Patrick Tarpey; Ignacio Varela; Benjamin Phillimore; Sharmin Begum; Neil Q McDonald; Adam Butler; David Jones; Keiran Raine; Calli Latimer; Claudio R Santos; Mahrokh Nohadani; Aron C Eklund; Bradley Spencer-Dene; Graham Clark; Lisa Pickering; Gordon Stamp; Martin Gore; Zoltan Szallasi; Julian Downward; P Andrew Futreal
Journal: N Engl J Med; Date: 2012-03-08

6.  Comprehensive molecular characterization of clear cell renal cell carcinoma.

Authors: 
Journal: Nature; Date: 2013-06-23

7.  Integrated genomic analyses of ovarian carcinoma.

Authors: 
Journal: Nature; Date: 2011-06-29

8.  Absolute quantification of somatic DNA alterations in human cancer.

Authors:  Scott L Carter; Kristian Cibulskis; Elena Helman; Aaron McKenna; Hui Shen; Travis Zack; Peter W Laird; Robert C Onofrio; Wendy Winckler; Barbara A Weir; Rameen Beroukhim; David Pellman; Douglas A Levine; Eric S Lander; Matthew Meyerson; Gad Getz
Journal: Nat Biotechnol; Date: 2012-05

9.  A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6.

Authors:  Henrik Bengtsson; Pratyaksha Wirapati; Terence P Speed
Journal: Bioinformatics; Date: 2009-06-17

10.  Copynumber: Efficient algorithms for single- and multi-track copy number segmentation.

Authors:  Gro Nilsen; Knut Liestøl; Peter Van Loo; Hans Kristian Moen Vollan; Marianne B Eide; Oscar M Rueda; Suet-Feung Chin; Roslin Russell; Lars O Baumbusch; Carlos Caldas; Anne-Lise Børresen-Dale; Ole Christian Lingjaerde
Journal: BMC Genomics; Date: 2012-11-04
