Chris D Greenman, Graham Bignell, Adam Butler, Sarah Edkins, Jon Hinton, Dave Beare, Sajani Swamy, Thomas Santarius, Lina Chen, Sara Widaa, P Andy Futreal, Michael R Stratton.
Abstract
High-throughput oligonucleotide microarrays are commonly employed to investigate genetic disease, including cancer. The algorithms used to extract genotypes and copy number variation function optimally for diploid genomes, which are usually associated with inherited disease. However, cancer genomes are aneuploid in nature, leading to systematic errors when these techniques are applied. We introduce a preprocessing transformation and a hidden Markov model algorithm bespoke to cancer. This produces genotype classification, specification of regions of loss of heterozygosity, and absolute allelic copy number segmentation. Accurate prediction is demonstrated with a combination of independent experimental techniques. These methods are exemplified with Affymetrix genome-wide SNP6.0 data from 755 cancer cell lines, enabling inference upon a number of features of biological interest. These data and the coded algorithm are freely available for download.
Year: 2009 PMID: 19837654 PMCID: PMC2800165 DOI: 10.1093/biostatistics/kxp045
Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899
Fig. 1. Allelic intensities for a single SNP across multiple samples. (A) The A allele intensity is plotted against the B allele intensity for each wild-type training sample at a single polymorphic probe. The MAP estimates of the linearly separated mean allelic intensities for genotypes AA, AB, and BB are indicated in red. (B) The same allelic intensities are plotted using the cancer samples. The significant reduction in clustering is evident.
Fig. 2. Genome-wide copy number estimates of diploid, triploid, and quadraploid samples HCC1806, HCC1187, and ZR-75-30, respectively. Copy number estimates are obtained using SKY (dashed), Birdsuite (red), and PICNIC (green).
Fig. 3. Absolute copy number, genotype intensity, and break-point likelihoods for cancer cell line HCC1187. Each plot contains 3 sections. First are copy number intensities, followed by genotype intensities. Associated genotypes are indicated. Green and blue lines indicate total and minor estimated copy number. Black and red lines represent heterozygous and homozygous segments. Finally, the likelihoods of state change are plotted. The horizontal scale is genomic position in megabases. Vertical scales represent chromosomal copy number. Panels (A) and (B) derive from chromosomes 14 and 19, respectively.
Genotype classes by copy number states. A description of possible genotypes for the first few minor and major copy numbers.
| Total copy number | Number of genotype classes | Number of minor alleles: 0 | Number of minor alleles: 1 | Number of minor alleles: 2 |
| 0 | 1 | DEL | – | – |
| 1 | 1 | {A,B} | – | – |
| 2 | 2 | {AA,BB} | {AA,AB,BB} | – |
| 3 | 2 | {AAA,BBB} | {AAA,AAB,ABB,BBB} | – |
| 4 | 3 | {AAAA,BBBB} | {AAAA,AAAB,ABBB,BBBB} | {AAAA,AABB,BBBB} |
| 5 | 3 | {AAAAA,BBBBB} | {AAAAA,AAAAB,ABBBB,BBBBB} | {AAAAA,AAABB,AABBB,BBBBB} |
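The genotype sets tabulated above follow a simple pattern, which the sketch below reproduces. It assumes that, for a segment with a given total and minor copy number, a SNP is either homozygous (all A or all B) or heterozygous with the minor allele present in exactly the minor number of copies, and that the number of genotype classes equals the number of admissible minor copy numbers, floor(total / 2) + 1, which matches the six rows shown. The function names `genotype_set` and `n_genotype_classes` are illustrative and are not taken from the published software.

```python
def genotype_set(total: int, minor: int) -> list[str]:
    """List the genotypes possible at a SNP in a segment with the given
    total copy number and minor (less abundant) allelic copy number."""
    if total < 0 or not 0 <= minor <= total // 2:
        raise ValueError("need total >= 0 and 0 <= minor <= total // 2")
    if total == 0:
        return ["DEL"]  # homozygous deletion: no allele is present
    # SNPs that are homozygous keep a single allele at every copy.
    genotypes = {"A" * total, "B" * total}
    if minor > 0:
        # Heterozygous SNPs: either allele may be the minor one.
        genotypes.add("A" * (total - minor) + "B" * minor)
        genotypes.add("A" * minor + "B" * (total - minor))
    return sorted(genotypes)


def n_genotype_classes(total: int) -> int:
    """Count the admissible minor copy numbers 0..floor(total / 2)."""
    return total // 2 + 1


# Print one line per total copy number, mirroring the table layout.
for total in range(6):
    sets = [genotype_set(total, m) for m in range(n_genotype_classes(total))]
    print(total, n_genotype_classes(total), sets)
```

When the minor copy number is zero the segment shows loss of heterozygosity, so only the two homozygous genotypes remain.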
Validation methods. Results are summarized for validation of homozygous deletions, genotypes, LOH, copy number, break points, and amplifications. Statistics used include true positive and false positive rates (TPR, FPR), the percentage of correct calls, and the mean error.
| Data type | Validation set | Test set | Statistic | PICNIC | Birdsuite |
| Copy number | SKY | HCC1806 (diploid) | % Correct | 65.35% | 59.83% |
| Copy number | SKY | HCC1187 (triploid) | % Correct | 80.56% | 52.55% |
| Copy number | SKY | ZR-75-30 (quadraploid) | % Correct | 77.67% | 6.43% |
| Homozygous deletions | confirmatory PCR for 7 known TSGs | 102 cell lines | TPR (FPR) | 77.55% (0.15%) | 59.18% (0.15%) |
| Genotypes | cDNA hom genotyping | 108 cell lines | % Correct | 96.45% | 70.13% |
| LOH | 400 microsatellite markers | 755 cell lines | TPR (FPR) | 58.20% (5.34%) | NA |
| Break points | SKY | HCC1806 (diploid) | TPR | 55.41% | 56.76% |
| Break points | SKY | HCC1187 (triploid) | TPR | 46.81% | 48.94% |
| Break points | SKY | ZR-75-30 (quadraploid) | TPR | 75.51% | 63.27% |
| Amplicons | qPCR of GLO1 amplified cluster | 58 cell lines | Mean error | 5.44% | 11.51% |
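For readers unfamiliar with the TPR and FPR statistics reported above, the following minimal sketch shows how they are conventionally computed from paired validation results and algorithm calls. The function name `tpr_fpr` and the example vectors are hypothetical and are not data from the study; the record does not describe how the authors scored individual calls.

```python
def tpr_fpr(truth: list[bool], calls: list[bool]) -> tuple[float, float]:
    """Return (TPR, FPR) given validated truth labels and algorithm calls.

    TPR = true positives / all validated positives
    FPR = false positives / all validated negatives
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    # Report NaN rather than dividing by zero when a class is absent.
    tpr = tp / positives if positives else float("nan")
    fpr = fp / negatives if negatives else float("nan")
    return tpr, fpr


# Hypothetical example: 3 validated events and 5 validated non-events.
truth = [True, True, True, False, False, False, False, False]
calls = [True, True, False, False, False, False, True, False]
tpr, fpr = tpr_fpr(truth, calls)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```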