Wei Sun, Fred A Wright, Zhengzheng Tang, Silje H Nordgard, Peter Van Loo, Tianwei Yu, Vessela N Kristensen, Charles M Perou.
Abstract
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
Year: 2009 PMID: 19581427 PMCID: PMC2935461 DOI: 10.1093/nar/gkp493
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Six states of genoCNV
| State | Copy number | Genotype |
|---|---|---|
| 1 | 2 | AA, AB, BB |
| 2 | 2 | AA, BB |
| 3 | 0 | Null |
| 4 | 1 | A, B |
| 5 | 3 | AAA, AAB, ABB, BBB |
| 6 | 4 | AAAA, AAAB, AABB, ABBB, BBBB |
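
The genotype column of this table follows a simple pattern: for copy number n, the genotype classes are all combinations of n copies of alleles A and B, and State 2 keeps only the homozygous classes at copy number 2. A minimal Python sketch of that pattern is given here; the names `genotype_classes`, `GENOCNV_STATES` and `STATE_TABLE` are hypothetical and are not part of the genoCN software.

```python
def genotype_classes(copy_number, allow_het=True):
    """List genotype classes for a copy number, e.g. 3 -> AAA, AAB, ABB, BBB."""
    if copy_number == 0:
        return ["Null"]
    # k counts the B alleles, from 0 (all A) to copy_number (all B).
    classes = ["A" * (copy_number - k) + "B" * k for k in range(copy_number + 1)]
    if not allow_het:
        # Keep only classes made of a single allele (homozygous).
        classes = [g for g in classes if len(set(g)) == 1]
    return classes


# state -> (copy number, heterozygous classes allowed), transcribed from the table.
GENOCNV_STATES = {
    1: (2, True),
    2: (2, False),
    3: (0, True),
    4: (1, True),
    5: (3, True),
    6: (4, True),
}

STATE_TABLE = {
    state: genotype_classes(cn, het) for state, (cn, het) in GENOCNV_STATES.items()
}

for state, classes in STATE_TABLE.items():
    print(state, GENOCNV_STATES[state][0], ", ".join(classes))
```

Running the loop prints one line per state with its copy number and genotype classes, which can be compared row by row against the table.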
Figure 1. BAF and LRR of chromosome 5 of TCGA sample 02_0099, as well as the results of genoCNA and PennCNV. The y-axis of the results of genoCNA/PennCNV corresponds to copy number. For a certain copy number, there may be different states, which are distinguished by different colors. In this example, for copy number 2, ‘light blue’ and ‘dark blue’ indicate States 1 and 2 of genoCNA, respectively. When copy number is 3, ‘orange’ and ‘dark red’ indicate States 5 and 6 of genoCNA, respectively.
Nine states of genoCNA
| State | Copy number | Genotype |
|---|---|---|
| 1 | 2 | AA, AB, BB |
| 2 | 2 | AA, (AA, |
| 3 | 0 | Null |
| 4 | 1 | (A, |
| 5 | 3 | (AAA, |
| 6 | 3 | (AAA, |
| 7 | 4 | (AAAA, |
| 8 | 4 | (AAAA, |
| 9 | 4 | (AAAA, |
Genotype classes in parentheses, such as (A, AB), arise from normal tissue contamination: here genotype A comes from tumor tissue and genotype AB from normal tissue. We use an underscore to indicate that a genotype is from normal tissue contamination.
Correspondence between genotypes in normal tissue and tumor tissue
| Normal tissue genotype | Tumor state 1 | Tumor state 2 | Tumor state 3 | Tumor state 4 | Tumor state 5 | Tumor state 6 | Tumor state 7 | Tumor state 8 | Tumor state 9 |
|---|---|---|---|---|---|---|---|---|---|
| AA | AA | AA | Null | A | AAA | AAA | AAAA | AAAA | AAAA |
| BB | BB | BB | Null | B | BBB | BBB | BBBB | BBBB | BBBB |
| AB | AB | AA, BB | Null | A, B | AAB, ABB | AAA, BBB | AABB | AAAA, BBBB | AAAB, ABBB |
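
The correspondence table works as a lookup: given the genotype called in normal tissue and an HMM state, it returns the genotypes still possible in tumor tissue. A minimal Python sketch of that lookup follows, assuming the table's values are transcribed directly; `COPY_NUMBER`, `HET_NORMAL` and `tumor_genotypes` are hypothetical names, not part of genoCN.

```python
# Copy number of each genoCNA state, transcribed from the nine-state table.
COPY_NUMBER = {1: 2, 2: 2, 3: 0, 4: 1, 5: 3, 6: 3, 7: 4, 8: 4, 9: 4}

# Tumor genotypes allowed per state when the normal genotype is AB
# (the AB row of the correspondence table).
HET_NORMAL = {
    1: ["AB"],
    2: ["AA", "BB"],
    3: ["Null"],
    4: ["A", "B"],
    5: ["AAB", "ABB"],
    6: ["AAA", "BBB"],
    7: ["AABB"],
    8: ["AAAA", "BBBB"],
    9: ["AAAB", "ABBB"],
}


def tumor_genotypes(normal_genotype, state):
    """Return the tumor genotypes compatible with a normal-tissue genotype."""
    copy_number = COPY_NUMBER[state]
    if copy_number == 0:
        return ["Null"]
    if normal_genotype in ("AA", "BB"):
        # A homozygous normal genotype leaves a single allele to copy.
        return [normal_genotype[0] * copy_number]
    if normal_genotype == "AB":
        return list(HET_NORMAL[state])
    raise ValueError(f"unexpected normal genotype: {normal_genotype!r}")


print(tumor_genotypes("AA", 5))
print(tumor_genotypes("AB", 6))
```

In this sketch a homozygous normal genotype fixes the tumor genotype for every state, whereas a heterozygous one narrows each state to one or two classes.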
Comparison of PennCNV (P) and genoCNV (X) by the number/proportion of CNVs that match the common CNVs reported by McCarroll et al. (8)
| | CEU (36 samples), P | CEU (36 samples), X | YRI (51 samples), P | YRI (51 samples), X | CHB+JPT (75 samples), P | CHB+JPT (75 samples), X |
|---|---|---|---|---|---|---|
| Total | 1483 | 1444 | 2113 | 2289 | 2550 | 2440 |
| Match (%) | 479 (32) | 478 (33) | 889 (42) | 886 (39) | 997 (39) | 961 (39) |
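
The percentages in the Match row are the matched count divided by the total count in the same column. A short Python check of that arithmetic, using only the counts in the table; the variable names are illustrative.

```python
# (population, method) -> (total CNV calls, calls matching common CNVs),
# transcribed from the comparison table. P = PennCNV, X = genoCNV.
COUNTS = {
    ("CEU", "P"): (1483, 479),
    ("CEU", "X"): (1444, 478),
    ("YRI", "P"): (2113, 889),
    ("YRI", "X"): (2289, 886),
    ("CHB+JPT", "P"): (2550, 997),
    ("CHB+JPT", "X"): (2440, 961),
}


def match_percent(total, matched):
    """Matched calls as a whole-number percentage of all calls."""
    return round(100 * matched / total)


PERCENTS = {key: match_percent(*pair) for key, pair in COUNTS.items()}

for (population, method), pct in PERCENTS.items():
    print(population, method, pct)
```

The recomputed values agree with the percentages reported in parentheses in the table.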
Figure 2. BAF and LRR of chromosome 17 of TCGA sample 02_0003, as well as the results of genoCNA and PennCNV.
Figure 3. BAF and LRR of chromosome 13 of TCGA sample 02_0114, as well as the results of genoCNA with three different setups, which differ in whether tissue contamination is assumed and whether the genotype from normal tissue is used. When copy number is 3, ‘orange’ and ‘dark red’ indicate States 5 and 6 of genoCNA, respectively.
Proportion (%) of the SNPs of TCGA sample 02_0114 that have high posterior probabilities of copy number/genotype calls
| Tissue contamination | Genotype from normal tissue | Posterior probability ≥0.8 | Posterior probability ≥0.9 | Posterior probability ≥0.95 | Posterior probability ≥0.99 |
|---|---|---|---|---|---|
| No | No | 99.8/97.1 | 99.7/95.8 | 99.6/93.6 | 99.3/84.7 |
| Yes | No | 99.9/97.9 | 99.8/97.0 | 99.7/95.3 | 99.5/87.8 |
| Yes | Yes | 100.0/98.9 | 99.9/98.6 | 99.9/98.3 | 99.8/97.0 |
There are two numbers in each cell: a/b, where a is the proportion for copy number states and b is the proportion for genotype states.
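
Each entry in this table can be read as the share of SNPs whose most likely call has a posterior probability at or above the given cutoff. A minimal Python sketch of that computation follows, assuming each SNP comes with a vector of posterior probabilities over states; the function name and the toy values are hypothetical and are not genoCN output.

```python
def proportion_confident(posteriors, threshold):
    """Percentage of SNPs whose best call has posterior >= threshold.

    posteriors: one list of state probabilities per SNP.
    """
    confident = sum(1 for probs in posteriors if max(probs) >= threshold)
    return 100.0 * confident / len(posteriors)


# Toy posteriors for four SNPs over three states (made-up numbers).
toy_posteriors = [
    [0.97, 0.02, 0.01],
    [0.85, 0.10, 0.05],
    [0.50, 0.30, 0.20],
    [0.995, 0.004, 0.001],
]

for cutoff in (0.8, 0.9, 0.95, 0.99):
    print(cutoff, proportion_confident(toy_posteriors, cutoff))
```

Applying the same function separately to copy number posteriors and to genotype posteriors would give the two numbers a/b reported in each cell.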
Figure 4. Scatter plot of LRR and BAF for 547 458 SNPs of TCGA sample 02_0114. These SNPs are from CNA regions (including copy number neutral LOH) and the posterior probabilities of the most likely genotype class are >0.95.
Figure 5. (a) Comparison of the proportion of tumor sample (pT) estimated using mean BAF values when copy number is 1 [genotype (A, AB)] and when copy number is 2 [genotype (AA, AB)]. Each point corresponds to one sample. The diagonal line is y = x. (b) The distribution of pT, estimated using mean BAF values when copy number is 1.
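
To illustrate how a mean BAF value can translate into a tumor proportion pT, here is a minimal Python sketch. It rests on a simplifying assumption that is not taken from the paper: the BAF of a contaminated sample equals the fraction of B alleles among all allele copies in the tumor/normal mixture. Under that assumption, genotype (A, AB) gives BAF = (1 - pT)/(2 - pT) and genotype (AA, AB) gives BAF = (1 - pT)/2. All function names are hypothetical, and the estimator actually used by genoCNA may differ.

```python
def expected_baf(p_tumor, tumor_b, tumor_cn, normal_b=1, normal_cn=2):
    """Mixture BAF under the assumed model: B copies divided by all copies."""
    b_copies = p_tumor * tumor_b + (1 - p_tumor) * normal_b
    all_copies = p_tumor * tumor_cn + (1 - p_tumor) * normal_cn
    return b_copies / all_copies


def pt_from_baf_cn1(mean_baf):
    """Invert BAF = (1 - pT) / (2 - pT) for tumor A mixed with normal AB."""
    return (1 - 2 * mean_baf) / (1 - mean_baf)


def pt_from_baf_cn2(mean_baf):
    """Invert BAF = (1 - pT) / 2 for tumor AA mixed with normal AB."""
    return 1 - 2 * mean_baf


# Round trip with an assumed tumor proportion of 0.7 (illustrative value).
baf_cn1 = expected_baf(0.7, tumor_b=0, tumor_cn=1)
baf_cn2 = expected_baf(0.7, tumor_b=0, tumor_cn=2)
print(pt_from_baf_cn1(baf_cn1), pt_from_baf_cn2(baf_cn2))
```

Both inversions take the mean BAF of the cluster below 0.5; the mirrored clusters (B, AB) and (BB, AB) would use 1 minus the mean BAF instead. If the assumed model held exactly, the two estimates would coincide, which corresponds to points lying on the y = x line in panel (a).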
Figure 6. Parental-specific deletion at chromosome 17 of TCGA sample 02_0007. (a) The BAF, LRR and copy number calls by genoCN around a CNA (deletion) region. (b) The BAF of the SNPs in this CNA region whose genotypes are heterozygous in normal tissue. (c) The haplotype to which the remaining allele belongs at each SNP.