Yen-Jen Lin, Yu-Tin Chen, Shu-Ni Hsu, Chien-Hua Peng, Chuan-Yi Tang, Tzu-Chen Yen, Wen-Ping Hsieh.
Abstract
Copy number variation (CNV) has been reported to be associated with disease and various cancers. Hence, accurately identifying the position and the type of CNV is currently a critical issue. Many tools target detecting CNV regions, constructing haplotype phases in CNV regions, or estimating numerical copy numbers. However, none of them can perform all three tasks at the same time. This paper presents a method based on a Hidden Markov Model to detect parent-specific copy number changes on both chromosomes from SNP array signals. A haplotype tree is constructed with dynamic branch merging to model the transition of the copy number status of the two alleles assessed at each SNP locus. The emission models are constructed for the genotypes formed by the two haplotypes. The proposed method provides the segmentation points of the CNV regions as well as the haplotype phasing for the allelic status on each chromosome. The estimated copy numbers are reported as fractional numbers, which can accommodate somatic mutations in cancer specimens that usually consist of heterogeneous cell populations. The algorithm is evaluated on simulated data and on the previously published CNV regions of the 270 HapMap individuals. The results are compared with five popular methods: PennCNV, genoCN, COKGEN, QuantiSNP and cnvHap. An application to oral cancer samples demonstrates how the proposed method can facilitate clinical association studies. The proposed algorithm exhibits sensitivity for CNV regions comparable to that of the best algorithm in our genome-wide study and demonstrates the highest detection rate in SNP-dense regions. In addition, it provides better haplotype phasing accuracy than similar approaches. The clinical association analysis carried out with our fractional copy number estimates in the cancer samples provides better detection power than that with integer copy number states.
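The reason a fractional estimate suits heterogeneous specimens can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a two-population mixture in which only a fraction of the cells carry the altered copy number; the function name and the `altered_fraction` parameter are hypothetical and are not taken from the paper.

```python
def expected_copy_number(altered_cn: float, altered_fraction: float,
                         normal_cn: float = 1.0) -> float:
    """Average per-chromosome copy number observed in a mixed specimen.

    Assumption (not from the paper): the specimen is a mixture of two cell
    populations, one carrying `altered_cn` copies on this chromosome and
    the other carrying `normal_cn` copies.
    """
    if not 0.0 <= altered_fraction <= 1.0:
        raise ValueError("altered_fraction must lie in [0, 1]")
    return altered_fraction * altered_cn + (1.0 - altered_fraction) * normal_cn


# A one-copy gain present in 40% of cells gives a non-integer average.
mixed_gain = expected_copy_number(altered_cn=2.0, altered_fraction=0.4)
```

Under this assumption the averaged signal falls between integer states, which an integer-only caller would have to round in one direction or the other.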
Year: 2014 PMID: 24849202 PMCID: PMC4029584 DOI: 10.1371/journal.pone.0096841
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A haplotype tree with only two loci.
Figure 2. Tree merging example.
Each branch is labeled with a conditional probability C from the left node to the right node and a joint probability J of the haplotype between the left node and the right node. Panel (a) presents a tree before merging Node 2.1 and Node 2.3, and panel (b) presents the tree after merging.
Figure 3. Branching to derive the amplification or deletion state.
The thick lines are marked with the formulas for their transition probabilities, while the probabilities for the thin lines can be calculated accordingly.
Groups of genotypic states with respective characteristics.
| Emission probability model | Description | Genotypic States |
| M0 | Allele A is present in both chromosomes with total copy number greater than two, and Allele B is not present in either chromosome. | A+A+, A+A, AA+ |
| M1 | Allele A is present in at least one chromosome with total copy number close to two, and Allele B is barely present in either chromosome. | AA, A−A+, A+A−, A+B−, B−A+ |
| M2 | Allele A is present in one chromosome with copy number greater than one, and Allele B is present in the other chromosome with one normal copy. | A+B, BA+ |
| M3 | Both chromosomes present events of gain. | A+B+, B+A+ |
| M4 | Allele A is present in one chromosome with one normal copy, and Allele A or B is present in the other chromosome with less than one copy. | AB−, B−A, AA−, A−A |
| M5 | Allele A is present in one chromosome with one normal copy, and Allele B is present in the other chromosome with one normal copy. It is a normal heterozygote. | AB, BA |
| M6 | Allele B is present in one chromosome with copy number greater than one, and Allele A is present in the other chromosome with one normal copy. This is a symmetric model of M2 with the alleles flipped. | AB+, B+A |
| M7 | Both chromosomes present events of loss. | B−B−, A−A−, A−B−, B−A− |
| M8 | Allele B is present in one chromosome with one normal copy, and Allele A or B is present in the other chromosome with less than one copy. This is a symmetric model of M4. | A−B, BA−, BB−, B−B |
| M9 | Allele B is present in at least one chromosome with total copy number close to two, and Allele A is barely present in either chromosome. This is a symmetric model of M1. | BB, B−B+, B+B−, B+A−, A−B+ |
| M10 | Allele B is present in both chromosomes with total copy number greater than two, and Allele A is not present in either chromosome. This is a symmetric model of M0. | B+B+, B+B, BB+ |
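The grouping in the table can be read as a lookup from an ordered pair of per-chromosome allelic states to one of the eleven emission models. The sketch below encodes the table as a dictionary and checks that every ordered pair of the six per-chromosome states is assigned to exactly one model. The names `EMISSION_GROUPS`, `ALLELE_STATES` and `emission_model` are illustrative, and an ASCII `-` stands in for the minus sign used in the table.

```python
from itertools import product

# Genotypic states per emission model, transcribed from the table above.
# ASCII "-" replaces the typographic minus sign for convenience.
EMISSION_GROUPS = {
    "M0": ["A+A+", "A+A", "AA+"],
    "M1": ["AA", "A-A+", "A+A-", "A+B-", "B-A+"],
    "M2": ["A+B", "BA+"],
    "M3": ["A+B+", "B+A+"],
    "M4": ["AB-", "B-A", "AA-", "A-A"],
    "M5": ["AB", "BA"],
    "M6": ["AB+", "B+A"],
    "M7": ["B-B-", "A-A-", "A-B-", "B-A-"],
    "M8": ["A-B", "BA-", "BB-", "B-B"],
    "M9": ["BB", "B-B+", "B+B-", "B+A-", "A-B+"],
    "M10": ["B+B+", "B+B", "BB+"],
}

# Invert the table: genotypic state -> emission model.
STATE_TO_MODEL = {
    state: model
    for model, states in EMISSION_GROUPS.items()
    for state in states
}

# Loss, normal copy, and gain of each allele on a single chromosome.
ALLELE_STATES = ["A-", "A", "A+", "B-", "B", "B+"]


def emission_model(first_chrom: str, second_chrom: str) -> str:
    """Return the emission model for an ordered pair of allelic states."""
    return STATE_TO_MODEL[first_chrom + second_chrom]


# Every ordered pair of per-chromosome states should appear in the table.
missing = [a + b for a, b in product(ALLELE_STATES, repeat=2)
           if a + b not in STATE_TO_MODEL]
```

With the table as transcribed, `missing` is empty: the 36 ordered pairs collapse into the eleven models, which is why phase-flipped genotypes such as AB and BA share one emission distribution.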
Figure 4. Centers of the eleven emission probability models.
The green, blue and red circles represent the observed samples with genotypes AA, AB and BB, as identified by Birdseed. The stars are the centers of the three initial groups. All the other centers are assigned according to the grid lines passing through the stars. M0 and M10 are extended from M1 and M9 so that the distance between the centers of M1 and M0 is equal to the distance between the centers of M1 and M4, and the distance between the centers of M9 and M10 is equal to the distance between the centers of M9 and M8.
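The extension of M0 from M1 and M4 (and of M10 from M9 and M8) can be sketched as a point reflection. The caption only states that the two distances are equal; the sketch additionally assumes that the new center lies on the line through the two known centers, on the far side of the anchor. The function name and the coordinates are hypothetical and serve only as an illustration.

```python
import math


def extend_center(anchor, neighbor):
    """Reflect `neighbor` through `anchor` in the (A signal, B signal) plane.

    The returned point is exactly as far from `anchor` as `neighbor` is.
    Assumption (not stated in the caption): the three centers are collinear.
    """
    ax, ay = anchor
    nx, ny = neighbor
    return (2 * ax - nx, 2 * ay - ny)


# Made-up centers, for illustration only.
m1_center = (2.0, 0.2)   # AA-like cluster
m4_center = (1.5, 0.2)   # one normal copy of A plus a partial loss
m0_center = extend_center(m1_center, m4_center)

# The construction keeps the two distances equal.
same_distance = math.isclose(
    math.dist(m1_center, m0_center),
    math.dist(m1_center, m4_center),
)
```

The same call with the M9 and M8 centers would give the M10 center under the same collinearity assumption.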
The mean vector and variance component assignment of the emission probability models at the initiation step.
| Emission probability model (M) | Mean (Mean of A signal, Mean of B signal) | Covariance (Variance of A signal, Covariance between A and B signal, Variance of B signal) |
The assignments of M1, M5 and M9 are extended to all the other models.
Figure 5. The shaded region applied to filter unreliable CNV predictions.
Figure 6. A toy example of the simulation.
The two male samples in the figure both provide their X chromosomes with their original amplifications or deletions. The chromosome from sample 1 carries amplifications from SNP1 to SNP3 and deletions at SNP6 and SNP7. The chromosome from sample 2 carries amplification at SNP3 and deletions at SNP5 and SNP6.
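The toy example can be written out as data. The sketch below assumes that the two X chromosomes are paired locus by locus into one artificial two-chromosome sample with known events, which is our reading of the caption, and that the toy region has seven SNPs (the caption names SNP1 through SNP7). The helper name and the event labels are hypothetical.

```python
def chromosome_events(n_snps, amplified, deleted):
    """Per-SNP event labels for one chromosome (SNP numbers are 1-based)."""
    events = ["normal"] * n_snps
    for snp in amplified:
        events[snp - 1] = "gain"
    for snp in deleted:
        events[snp - 1] = "loss"
    return events


N_SNPS = 7  # assumption: the caption mentions SNP1 through SNP7

# Events taken from the caption.
sample1 = chromosome_events(N_SNPS, amplified=[1, 2, 3], deleted=[6, 7])
sample2 = chromosome_events(N_SNPS, amplified=[3], deleted=[5, 6])

# Pair the two chromosomes locus by locus to form the simulated sample.
simulated = list(zip(sample1, sample2))

for snp, (first, second) in enumerate(simulated, start=1):
    print(f"SNP{snp}: chromosome 1 = {first}, chromosome 2 = {second}")
```

Because each chromosome comes from a male sample, the event on each chromosome is known separately, so the paired sample carries a phased ground truth: SNP3 is a gain on both chromosomes, SNP6 a loss on both, and SNP5 a loss on the second chromosome only.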
Comparison of concordant events across the four algorithms using the benchmark events published by Kidd et al. (2008).
| Method | Concordance Rate | Sensitivity |
| PennCNV | | 212/1989 (10.65%) |
| COKGEN | 151/440 (34.3%) | 151/1989 (7.59%) |
| GenoCNV | 550/6002 (9.2%) | 550/1989 (27.65%) |
| QuantiSNP | 373/1617 (23.1%) | 373/1989 (18.75%) |
Concordance Rate = number of concordant events/number of predicted events. The concordant events refer to the predicted segments that overlap with the benchmark events for at least one SNP locus.
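The two columns can be computed from lists of segments as sketched below. Segments are represented as inclusive ranges of SNP indices, which is an assumption made for brevity. The sensitivity denominator (the number of benchmark events) is inferred from the table, where it is constant across methods, rather than stated in the footnote. All function names are hypothetical.

```python
def share_a_snp(seg_a, seg_b):
    """True if two inclusive SNP-index ranges have at least one SNP in common."""
    return max(seg_a[0], seg_b[0]) <= min(seg_a[1], seg_b[1])


def concordance_and_sensitivity(predicted, benchmark):
    """Return (concordance rate, sensitivity) for two lists of segments.

    A predicted segment is concordant if it overlaps any benchmark event
    at one or more SNP loci. Both lists are assumed to be non-empty.
    """
    concordant = sum(
        any(share_a_snp(pred, bench) for bench in benchmark)
        for pred in predicted
    )
    return concordant / len(predicted), concordant / len(benchmark)


# Made-up segments, for illustration only.
predicted = [(0, 4), (10, 12), (20, 25)]
benchmark = [(3, 6), (12, 12), (30, 40), (50, 60)]
rate, sensitivity = concordance_and_sensitivity(predicted, benchmark)
```

Both measures share the same numerator, so a method that predicts many segments can reach a high sensitivity with a low concordance rate, as the GenoCNV row shows.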
Comparison of concordant events across the three algorithms using the benchmark events published by Kidd et al. (2008).
| Method | Concordance Rate | Sensitivity |
| HaplotypeCN | | 24/1746 (1.37%) |
| PennCNV-SNP | 61/170 (35.9%) | 61/1746 (3.49%) |
| cnvHap | 89/365 (24.4%) | 89/1746 (5.10%) |
Concordance Rate = number of concordant events/number of predicted events. The concordant events refer to the predicted segments that overlap with the benchmark events for at least one SNP locus.
Comparison of concordant events across the four algorithms using the benchmark events published by McCarroll et al. (2008).
| Method | Concordance Rate | Sensitivity |
| PennCNV | 12166/21069 (57.7%) | 12166/42017 (28.83%) |
| COKGEN | 8930/14624 (61.1%) | 8930/42017 (31.65%) |
| GenoCNV | 29866/212398 (14.1%) | 29866/42017 (71.08%) |
| QuantiSNP | 19665/53800 (36.6%) | 19665/42017 (46.80%) |
Concordance Rate = number of concordant events/number of predicted events. The concordant events refer to the predicted segments that overlap with the benchmark events for at least one SNP locus.
Comparison of concordant events across the three algorithms using only the SNP-probe events published by McCarroll et al. (2008).
| Method | Concordance Rate | Sensitivity |
| HaplotypeCN | | 1235/22844 (5.41%) |
| PennCNV-SNP | 3206/6759 (47.43%) | 3206/22844 (14.03%) |
| cnvHap | 4087/22702 (18.0%) | 4087/22844 (17.89%) |
Concordance Rate = number of concordant events/number of predicted events. The concordant events refer to the predicted segments that overlap with the benchmark events for at least one SNP locus.
Comparison across the three algorithms in terms of overlapping regions, total copy numbers and allele specific copy numbers.
| Method | R_region | R_total | R_specific |
| HaplotypeCN | | | |
| PennCNV-SNP | 91.163% (196/215) | 92.85% (182/196) | NA |
| cnvHap | 64.305% (236/367) | 85.16% (201/236) | 72.25% (2382/3297) |