Christopher Yau, Dmitri Mouradov, Robert N Jorissen, Stefano Colella, Ghazala Mirza, Graham Steers, Adrian Harris, Jiannis Ragoussis, Oliver Sieber, Christopher C Holmes.
Abstract
We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours.
Year: 2010 PMID: 20858232 PMCID: PMC2965384 DOI: 10.1186/gb-2010-11-9-r92
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Example cancer SNP data. (Top panel) SNP data showing the distribution of Log R Ratio (LRR) and B allele frequency (BAF) values across chromosome 1 for a cancer cell line (HCC1395) and its matched normal (HCC1395BL). The normal sample is characterized by a typical diploid pattern of zero mean LRR (copy number 2) and BAF values distributed around 0, 0.5 and 1 (genotypes AA, AB and BB), with occasional aberrations due to germline copy number variants (CNV). The cancer cell line shows complex patterns of LRR and BAF values due to a variety of copy number alterations and loss-of-heterozygosity events. (Bottom panel) SNP data are shown for a single copy deletion and duplication on chromosome 21 for various normal-cancer cell line dilutions. In the presence of normal DNA contamination, the LRR signals for the deletion and duplication are diminished in magnitude and the distribution of the BAF values reflects the aggregated effect of mixed normal and cancer genotypes at each SNP. Note: the Log R Ratio values are smoothed and thinned for illustrative purposes.
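The dilution effect described in the bottom panel can be illustrated with a minimal sketch. It assumes an idealized, noise-free mixture in which each cell contributes DNA in proportion to its copy number, and it takes LRR to be log2 of the mean copy number over 2; real array LRR values are typically compressed relative to this. The function names `expected_baf` and `expected_lrr` are hypothetical and are not part of OncoSNP.

```python
import math

def expected_baf(normal_fraction, tumor_b, tumor_cn, normal_b, normal_cn=2):
    """Idealized B allele frequency of a tumor/normal mixture at one SNP.

    tumor_b / normal_b are the numbers of B alleles in each genotype,
    tumor_cn / normal_cn are the total copy numbers.
    """
    b_copies = normal_fraction * normal_b + (1 - normal_fraction) * tumor_b
    total = normal_fraction * normal_cn + (1 - normal_fraction) * tumor_cn
    if total == 0:
        return float("nan")  # no DNA at this locus, so BAF is undefined
    return b_copies / total

def expected_lrr(normal_fraction, tumor_cn, normal_cn=2):
    """Idealized Log R Ratio: log2 of mean copy number relative to diploid."""
    mean_cn = normal_fraction * normal_cn + (1 - normal_fraction) * tumor_cn
    if mean_cn == 0:
        return float("-inf")
    return math.log2(mean_cn / 2)

# Hemizygous deletion at a heterozygous SNP: tumor genotype A, normal genotype AB.
for pi in (0.0, 0.25, 0.5, 0.75):
    baf = expected_baf(pi, tumor_b=0, tumor_cn=1, normal_b=1)
    lrr = expected_lrr(pi, tumor_cn=1)
    print(f"normal fraction {pi:.2f}: BAF {baf:.3f}, LRR {lrr:.3f}")
```

As the normal fraction rises, the BAF moves from 0 toward 0.5 and the LRR moves toward 0, which is the attenuation the caption describes.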
Figure 2. Illustrating the statistical model. (a) The tumor sample consists of DNA contributions from an unknown number of clones (here, we illustrate three clones) and normal cells in different proportions. Each clone has its own set of tumor genotypes, which are derived from the normal genotypes by the loss or duplication of alleles. (b) Our statistical model assumes that, at each locus, there exists a normal and a common tumor genotype. OncoSNP estimates the normal and common tumor genotype and the proportion of the sample explained by each genotype from the SNP data. The situation depicted at SNP 5 involves clones with different tumor genotypes; this case is not considered under our model.
OncoSNP tumor states
| Tumor state | Copy number | Genotypes (tumor, normal) | Description |
|---|---|---|---|
| 1 | 0 | (-, AA), (-, AB), (-, BB) | Homozygous deletion |
| 2 | 1 | (A, AA), (A, AB), (B, AB), (B, BB) | Hemizygous deletion |
| 3 | 2 | (AA, AA), (AB, AB), (BB, BB) | Normal |
| 4 | 3 | (AAA, AA), (AAB, AB), (ABB, AB), (BBB, BB) | Single copy duplication |
| 5 | 4 | (AAAA, AA), (AAAB, AB), (ABBB, AB), (BBBB, BB) | 4n monoallelic amplification |
| 6 | 4 | (AAAA, AA), (AABB, AB), (BBBB, BB) | 4n balanced amplification |
| 7 | 5 | (AAAAA, AA), (AAAAB, AB), (ABBBB, AB), (BBBBB, BB) | 5n monoallelic amplification |
| 8 | 5 | (AAAAA, AA), (AAABB, AB), (AABBB, AB), (BBBBB, BB) | 5n unbalanced amplification |
| 9 | 6 | (AAAAAA, AA), (AAAAAB, AB), (ABBBBB, AB), (BBBBBB, BB) | 6n monoallelic amplification |
| 10 | 6 | (AAAAAA, AA), (AAAABB, AB), (AABBBB, AB), (BBBBBB, BB) | 6n unbalanced amplification |
| 11 | 6 | (AAAAAA, AA), (AAABBB, AB), (BBBBBB, BB) | 6n balanced amplification |
| 12 | 2 | (AA, AA), (AA, AB), (BB, AB), (BB, BB) | 2n somatic LOH |
| 13 | 3 | (AAA, AA), (AAA, AB), (BBB, AB), (BBB, BB) | 3n somatic LOH |
| 14 | 4 | (AAAA, AA), (AAAA, AB), (BBBB, AB), (BBBB, BB) | 4n somatic LOH |
| 15 | 5 | (AAAAA, AA), (AAAAA, AB), (BBBBB, AB), (BBBBB, BB) | 5n somatic LOH |
| 16 | 6 | (AAAAAA, AA), (AAAAAA, AB), (BBBBBB, AB), (BBBBBB, BB) | 6n somatic LOH |
| 17 | 2 | (AA, AA), (BB, BB) | 2n germline LOH |
| 18 | 3 | (AAA, AA), (BBB, BB) | 3n germline LOH |
| 19 | 4 | (AAAA, AA), (BBBB, BB) | 4n germline LOH |
| 20 | 5 | (AAAAA, AA), (BBBBB, BB) | 5n germline LOH |
| 21 | 6 | (AAAAAA, AA), (BBBBBB, BB) | 6n germline LOH |
Description of the 21 tumor states showing corresponding copy numbers and genotypes. OncoSNP assigns each SNP a score for being in each of the 21 tumor states.
Cancer cell lines
| Cell line | Chromosome number (range) | Source |
|---|---|---|
| HL60 | 46 (44-46) | Liang et al. (1999) |
| HT29 | 70 (69-73) | Abdel-Rahman et al. (2000) |
| SW1417 | 70 (66-71) | Abdel-Rahman et al. (2000) |
| SW403 | 64 (60-65) | Abdel-Rahman et al. (2000) |
| SW480 | 58 (52-59) | Abdel-Rahman et al. (2000) |
| SW620 | 48 (45-49) | Abdel-Rahman et al. (2000) |
| SW837 | 38 (38-40) | Abdel-Rahman et al. (2000) |
| LIM1863 | 80 (66-82) | Abdel-Rahman et al. (2000) |
| MDA-MB-175 | 84 (82-89) | ATCC |
| MDA-MB-468 | 64 (60-67) | ATCC |
A list of cancer cell lines analyzed and estimates of their chromosome number retrieved from the literature.
Figure 3. Estimating baseline Log R Ratio adjustments due to ploidy. OncoSNP Log R Ratio baseline adjustments (red) for cancer cell lines (a) HL60 (Chr10), (b) HT29 (Chr3) and (c) SW1417 (Chr8). HL60 has a near-diploid karyotype and OncoSNP has correctly identified that no Log R Ratio baseline adjustment is required. HT29 and SW1417 have complex polyploid karyotypes, and transformation of the SNP data to a virtual diploid state leads to baseline ambiguity for the Log R Ratio. For example, in (b) and (c), regions of allelic balance with negative Log R Ratios are identified. OncoSNP correctly locates the true baseline level for the Log R Ratio. In (d), the estimated Log R Ratio baseline adjustment for the ten cancer cell lines analyzed shows a strong linear correlation with the modal chromosome number of each cell line. Because the SNP data were acquired from different versions of the Illumina SNP array, baseline adjustments are standardized against the Log R Ratio level associated with copy number 3 for comparison.
Figure 4. Example analysis of the normal-cancer cell line (SW837) mixture series. Copy number and LOH state classifications for chromosome 1 of the colon cancer cell line SW837.
Figure 5. OncoSNP analysis of three normal-cancer cell line mixture series. Chromosome number estimates and copy number and LOH state misclassification rates for three normal-cancer cell line mixture series. OncoSNP produces the greatest self-consistency of the three methods tested. Red: OncoSNP; green: GenoCN; blue: GAP.
Figure 6. A comparison of genome-wide copy number estimates using four variants of the OncoSNP model. Heatmaps are shown for genome-wide copy numbers from four variants of our model: (i) Germline model: no Log R Ratio baseline correction or normal contamination estimation; (ii) Ploidy-only model: baseline correction is estimated; (iii) Normal-only model: normal DNA contamination is estimated; and (iv) Full model: the complete OncoSNP model, incorporating both baseline and normal DNA contamination estimation. The full model is able to accurately reproduce the same copy number profile for both cell lines (SW837/SW403) even in the presence of increasing levels of normal DNA contamination. If normal contamination or baseline correction estimation is not used, incorrect copy number profiles may be given.
Figure 7. Genome-wide copy number profiles of primary breast tumors. Genome-wide copy number profiles for three primary breast tumors (non-dissected and microdissected) using OncoSNP, GenoCN and Genome Alteration Print (GAP).
Figure 8. Analysis of a tumor sample with an unknown ploidy status and normal DNA contamination. A likelihood contour plot shows three modes, each corresponding to an alternative explanation of the SNP data: (a) the tumor has a near-diploid karyotype and is contaminated with 50% normal DNA, (b) the tumor has a tetraploid karyotype with 60% normal DNA content and (c) the tumor has a near-triploid karyotype with negligible normal DNA content. The maximum log-likelihood at each mode is very similar.
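Why such competing explanations can fit equally well is easiest to see at a single SNP. The toy sketch below uses the same idealized mixture assumption as standard BAF calculations (signal proportional to allele copies) and compares two hypothetical configurations; it is not a reconstruction of the three modes in the figure, and `mixture_baf` and `mean_copy_number` are illustrative names only.

```python
from fractions import Fraction

def mixture_baf(normal_fraction, tumor_genotype, normal_genotype):
    """Exact idealized BAF for a tumor/normal mixture, genotypes given as strings of A/B."""
    pi = Fraction(normal_fraction)
    b = pi * normal_genotype.count("B") + (1 - pi) * tumor_genotype.count("B")
    n = pi * len(normal_genotype) + (1 - pi) * len(tumor_genotype)
    return b / n

def mean_copy_number(normal_fraction, tumor_genotype, normal_genotype):
    """Average number of allele copies per cell in the mixture."""
    pi = Fraction(normal_fraction)
    return pi * len(normal_genotype) + (1 - pi) * len(tumor_genotype)

# Explanation 1: hemizygous deletion (tumor A) diluted with 50% normal DNA.
baf_deletion = mixture_baf(Fraction(1, 2), "A", "AB")
# Explanation 2: three-copy region (tumor AAB) with no normal DNA.
baf_triploid = mixture_baf(0, "AAB", "AB")

print(baf_deletion, baf_triploid, baf_deletion == baf_triploid)
print(mean_copy_number(Fraction(1, 2), "A", "AB"), mean_copy_number(0, "AAB", "AB"))
```

Both configurations give the same BAF, while their mean copy numbers differ by a constant factor. On the log scale that factor is a constant offset, so when the Log R Ratio baseline is itself unknown the two explanations cannot be separated from this SNP alone.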