Susann Stjernqvist, Tobias Rydén, Chris D Greenman.
Abstract
SNP allelic copy number data provide intensity measurements for the two alleles separately. We present a method that estimates the number of copies of each allele at each SNP position, using a continuous-index hidden Markov model. The method is especially suited to cancer data, since the model includes the fraction of normal tissue contamination that is often present in tumor samples. The continuous-index structure takes into account the distances between the SNPs, and is therefore also appropriate when SNPs are unequally spaced. In a simulation study we show that the method performs favorably compared to previous methods even with as much as 70% normal contamination. We also provide results from applications to clinical data produced using the Affymetrix genome-wide SNP 6.0 platform.
Keywords: allelic copy number; cancer; hidden Markov model; normal cell contamination
Year: 2011 PMID: 21695067 PMCID: PMC3118450 DOI: 10.4137/CIN.S6873
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
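The abstract notes that the continuous-index structure accounts for the distances between SNPs. One way to see why this matters is the sketch below, which assumes a single state-leaving rate per base pair and an exponential holding time, as in a generic continuous-time Markov chain. The name `leave_rate` and the SNP positions are hypothetical; this record does not specify the model's actual parametrization, so this is an illustration rather than the authors' implementation.

```python
import math


def stay_probability(distance_bp, leave_rate):
    """Probability of no state change over distance_bp base pairs,
    assuming an exponential holding time with rate leave_rate (per bp)."""
    return math.exp(-leave_rate * distance_bp)


# Hypothetical, unequally spaced SNP positions (bp); not data from the paper.
positions = [1_000, 1_500, 52_000, 2_052_000]
gaps = [b - a for a, b in zip(positions, positions[1:])]

# Larger gaps give the chain more room to change state between adjacent SNPs.
stay = [stay_probability(gap, leave_rate=1e-7) for gap in gaps]
```

With equally spaced SNPs every gap would give the same probability; with unequal spacing the probability shrinks as the gap grows, which a discrete-index model with one fixed transition matrix cannot express.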
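Table 1. Genotype sets for the different states of the Markov chain, sorted in the order given by the total copy number and copy number of the minor allele.
| State | (Total copy number, minor allele copy number) | Genotype set |
| --- | --- | --- |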
| 1 | (0,0) | { } |
| 2 | (1,0) | {A, B} |
| 3 | (2,0) | {AA, BB} |
| 4 | (2,1) | {AA, AB, BB} |
| 5 | (3,0) | {AAA, BBB} |
| 6 | (3,1) | {AAA, AAB, ABB, BBB} |
| 7 | (4,0) | {4A, 4B} |
| 8 | (4,1) | {4A, 3AB, A3B, 4B} |
| 9 | (4,2) | {4A, 2A2B, 4B} |
| 10 | (5,0) | {5A, 5B} |
| 11 | (5,1) | {5A, 4AB, A4B, 5B} |
| 12 | (5,2) | {5A, 3A2B, 2A3B, 5B} |
| 13 | (6,0) | {6A, 6B} |
| 14 | (6,1) | {6A, 5AB, A5B, 6B} |
| 15 | (6,2) | {6A, 4A2B, 2A4B, 6B} |
| 16 | (6,3) | {6A, 3A3B, 6B} |
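The pattern in the table above can be reproduced programmatically. The sketch below assumes, as read off the rows, that each state pairs a total copy number from 0 to 6 with a minor copy number up to half the total, and that each genotype set contains both homozygous genotypes plus the heterozygous splits. The function names are hypothetical, and the labels are written out in full (`AAAA` rather than `4A`).

```python
def genotype_label(n_a, n_b):
    """Label a genotype with n_a copies of A and n_b copies of B, e.g. 'AAB'."""
    return "A" * n_a + "B" * n_b


def markov_states(max_total=6):
    """Return (state index, (total, minor), genotype list) in the table's order."""
    states = []
    index = 1
    for total in range(max_total + 1):
        for minor in range(total // 2 + 1):
            major = total - minor
            # Candidate (A copies, B copies) pairs: both homozygous genotypes
            # plus the two ways of assigning major and minor counts to A and B.
            candidates = [(total, 0), (major, minor), (minor, major), (0, total)]
            genotypes = []
            for n_a, n_b in candidates:
                label = genotype_label(n_a, n_b)
                # Drop the empty genotype of state (0,0) and any duplicates.
                if label and label not in genotypes:
                    genotypes.append(label)
            states.append((index, (total, minor), genotypes))
            index += 1
    return states


for index, copy_numbers, genotypes in markov_states():
    print(index, copy_numbers, genotypes)
```

Calling `markov_states()` with the default `max_total=6` yields 16 states, matching the number of rows in the table.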
Figure 1. Proportions of probes at which the Markov state was incorrectly reconstructed by the Viterbi algorithm with MAP parameter estimates computed by the EM algorithm. Markov transition rates were λ = η = 10⁻⁷ (top left), λ = 10⁻⁷, η = 10⁻⁹ (top right), λ = 10⁻⁹, η = 10⁻⁷ (bottom left), λ = η = 10⁻⁹ (bottom right) (unit: bp⁻¹). Confidence intervals were obtained by exponentiating two-sided 95% Student's t confidence limits based on the log-proportions for 10 genome replicates.
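The confidence interval construction described in the caption can be sketched as follows: take logs of the per-replicate proportions, form an ordinary two-sided t interval for their mean, and exponentiate the limits. The example proportions are hypothetical, the function name is an assumption, and reporting the exponentiated mean as the centre is a choice made here, not something the caption states. The critical value 2.262 is the standard 97.5% quantile of Student's t with 9 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 replicates); a standard table value, hard-coded to avoid dependencies.
T_CRIT_DF9 = 2.262


def log_scale_ci(proportions, t_crit=T_CRIT_DF9):
    """Return (lower, centre, upper) from a t interval on the log-proportions."""
    logs = [math.log(p) for p in proportions]  # requires every proportion > 0
    n = len(logs)
    mean = statistics.mean(logs)
    sem = statistics.stdev(logs) / math.sqrt(n)  # standard error of the mean
    return (
        math.exp(mean - t_crit * sem),
        math.exp(mean),
        math.exp(mean + t_crit * sem),
    )


# Hypothetical error proportions for 10 replicates (not data from the paper).
example = [0.021, 0.018, 0.025, 0.030, 0.017, 0.022, 0.027, 0.019, 0.024, 0.020]
low, centre, high = log_scale_ci(example)
```

Working on the log scale keeps both limits positive and makes the interval symmetric in ratio terms around the centre, which suits proportions that vary over orders of magnitude.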
Combined genotype sets for the different states of the Markov chain, in a model with normal contamination γ. The weights for the respective combined genotypes are the Hardy-Weinberg weights as in the model without normal tissue contamination, and the total and minor copy numbers for the aberrant components are as in Table 1.
| State | Combined genotype set |
| --- | --- |
| 1 | {2γA, γAγB, 2γB} |
| 2 | {(1 + γ)A, AγB, γAB, (1 + γ)B} |
| 3 | {2A, (2 − γ)AγB, γA(2 − γ)B, 2B} |
| 4 | {AA, AB, BB} |
| 5 | {(3 − γ)A, (3 − 2γ)AγB, γA(3 − 2γ)B, (3 − γ)B} |
| 6 | {(3 − γ)A, (2 − γ)AB, A(2 − γ)B, (3 − γ)B} |
| 7 | {(4 − 2γ)A, (4 − 3γ)AγB, γA(4 − 3γ)B, (4 − 2γ)B} |
| 8 | {(4 − 2γ)A, (3 − 2γ)AB, A(3 − 2γ)B, (4 − 2γ)B} |
| 9 | {(4 − 2γ)A, (2 − γ)A(2 − γ)B, (4 − 2γ)B} |
| 10 | {(5 − 3γ)A, (5 − 4γ)AγB, γA(5 − 4γ)B, (5 − 3γ)B} |
| 11 | {(5 − 3γ)A, (4 − 3γ)AB, A(4 − 3γ)B, (5 − 3γ)B} |
| 12 | {(5 − 3γ)A, (3 − 2γ)A(2 − γ)B, (2 − γ)A(3 − 2γ)B, (5 − 3γ)B} |
| 13 | {(6 − 4γ)A, (6 − 5γ)AγB, γA(6 − 5γ)B, (6 − 4γ)B} |
| 14 | {(6 − 4γ)A, (5 − 4γ)AB, A(5 − 4γ)B, (6 − 4γ)B} |
| 15 | {(6 − 4γ)A, (4 − 3γ)A(2 − γ)B, (2 − γ)A(4 − 3γ)B, (6 − 4γ)B} |
| 16 | {(6 − 4γ)A, (3 − 2γ)A(3 − 2γ)B, (6 − 4γ)B} |
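The entries in the table above are consistent with a simple mixture: each allele amount is (1 − γ) times the tumor copy number plus γ times the normal copy number, where the normal genotype is AA, AB, or BB. The sketch below computes the combined amounts under that reading. The function name and the pairing of tumor and normal genotypes are assumptions inferred from the table rows, and the Hardy-Weinberg weights are not modeled.

```python
def combined_genotypes(total, minor, gamma):
    """Return (A amount, B amount) pairs for a state mixed with normal fraction gamma."""
    major = total - minor
    # (tumor A, tumor B, normal A, normal B) pairings read off the table rows:
    # a homozygous normal SNP stays homozygous in the tumor, and a heterozygous
    # normal SNP pairs with either assignment of major and minor counts to A and B.
    pairings = [
        (total, 0, 2, 0),
        (major, minor, 1, 1),
        (minor, major, 1, 1),
        (0, total, 0, 2),
    ]
    result = []
    for t_a, t_b, n_a, n_b in pairings:
        mix = ((1 - gamma) * t_a + gamma * n_a, (1 - gamma) * t_b + gamma * n_b)
        if mix not in result:  # states with minor == major give a repeated pair
            result.append(mix)
    return result


# State 2, copy numbers (1,0), with half the sample being normal tissue.
state_2 = combined_genotypes(1, 0, gamma=0.5)
```

For state 2 with γ = 0.5 this gives amounts (1.5, 0), (1, 0.5), (0.5, 1), and (0, 1.5), that is, (1 + γ)A, AγB, γAB, and (1 + γ)B evaluated at γ = 0.5.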
Figure 2. Estimates of normal contamination γ for iterations 1–10 of the EM algorithm and three simulated replicates with different values of γ: γ = 0.3 (top), γ = 0.5 (middle), and γ = 0.7 (bottom). The initial value for γ was 0.5 in all simulations.
Figure 3. Top: Viterbi reconstruction of the Markov path for chromosome 3 in PD1753. Bottom: sum of (standardized) allele intensities for probes within the same chromosome (grey dots), and the copy number of the corresponding state (black solid line).
Figure 4. Scatter plot of standardized measured allele intensities in the segment reconstructed to Markov state 2 in Figure 3. The fraction of normal contamination was estimated at 0.53.
Figure 5. Scatter plot of standardized measured allele intensities in the segment reconstructed to Markov state 4 in Figure 3.