Abstract
BACKGROUND: Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of the data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate data that contain errors, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of existing haplotype assembly techniques assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly.
Year: 2015 PMID: 26178880 PMCID: PMC4503296 DOI: 10.1186/s12859-015-0651-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Procedure of propagating particles in deterministic sequential Monte Carlo. Each of K particles at step t−1 is propagated to L possible states at step t. Among the K×L possible particles, only the K particles with the highest weights are selected
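The propagation step described in the Fig. 1 caption can be sketched in a few lines of Python. This is a minimal illustration and not the ParticleHap implementation: the particle representation (a state path paired with a weight), the function name `propagate`, and the user-supplied `weight_fn` are all assumptions made for the example.

```python
from heapq import nlargest
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a particle is (path of states so far, weight).
Particle = Tuple[Tuple[int, ...], float]


def propagate(
    particles: Sequence[Particle],
    states: Sequence[int],
    weight_fn: Callable[[Tuple[int, ...], int], float],
    k: int,
) -> List[Particle]:
    """One deterministic propagation step.

    Every particle is extended to each of the L candidate states, giving
    K x L candidates; only the k candidates with the highest weight survive.
    """
    candidates = [
        (path + (s,), w * weight_fn(path, s))  # incremental weight update
        for path, w in particles
        for s in states
    ]
    return nlargest(k, candidates, key=lambda c: c[1])


# Toy usage: three possible states with fixed (assumed) incremental weights.
toy_weights = [0.5, 0.3, 0.2]
step1 = propagate([((), 1.0)], (0, 1, 2), lambda path, s: toy_weights[s], k=2)
step2 = propagate(step1, (0, 1, 2), lambda path, s: toy_weights[s], k=2)
```

Unlike a standard particle filter, no random sampling is involved here: all L extensions of each particle are enumerated and the selection is a plain top-K cut, which is what makes the procedure deterministic.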
Fig. 2 Information about heterozygous sites provided by paired-end reads and organized in the observation matrix X. Erroneous base characters are highlighted in red font
The performance comparison on a CEU NA12878 data set sequenced using the 454 platform in the 1000 Genomes Project
| chr | ParticleHap nPhased | ParticleHap MEC | ParticleHap Time (s) | HapCUT nPhased | HapCUT MEC | HapCUT Time (s) | ReFHap nPhased | ReFHap MEC | ReFHap Time (s) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 66661 | 2045 | 1.07 | 66616 | 2293 | 28.03 | 66490 | 2111 | 5.79 |
| 2 | 78002 | 2742 | 1.11 | 77970 | 2857 | 35.71 | 77853 | 2698 | 7.78 |
| 3 | 66217 | 2111 | 3.96 | 66178 | 2349 | 29.50 | 66071 | 2203 | 6.20 |
| 4 | 69939 | 2386 | 3.94 | 69901 | 2591 | 37.16 | 69786 | 2410 | 9.01 |
| 5 | 63723 | 1971 | 4.75 | 63693 | 2156 | 28.06 | 63605 | 2044 | 6.00 |
| 6 | 69750 | 3312 | 5.25 | 69706 | 3544 | 58.60 | 69580 | 3318 | 13.39 |
| 7 | 54330 | 1867 | 3.31 | 54302 | 2059 | 27.23 | 54202 | 1908 | 7.19 |
| 8 | 56406 | 1700 | 3.89 | 56382 | 1828 | 25.87 | 56281 | 1690 | 5.60 |
| 9 | 42244 | 1335 | 2.15 | 42230 | 1472 | 20.02 | 42157 | 1365 | 4.52 |
| 10 | 50022 | 1618 | 2.73 | 49998 | 1814 | 23.44 | 49900 | 1662 | 5.22 |
| 11 | 46141 | 1411 | 2.72 | 46124 | 1569 | 21.66 | 46051 | 1467 | 4.89 |
| 12 | 43333 | 1467 | 2.32 | 43315 | 1581 | 20.00 | 43251 | 1495 | 3.92 |
| 13 | 36952 | 1286 | 0.68 | 36937 | 1398 | 18.80 | 36872 | 1311 | 7.10 |
| 14 | 30349 | 887 | 0.38 | 30334 | 982 | 13.25 | 30293 | 916 | 2.90 |
| 15 | 26626 | 975 | 0.88 | 26614 | 1055 | 11.46 | 26567 | 968 | 2.60 |
| 16 | 31675 | 1156 | 0.87 | 31662 | 1257 | 14.84 | 31612 | 1185 | 3.98 |
| 17 | 21054 | 1206 | 0.59 | 21048 | 1223 | 11.19 | 21010 | 1172 | 5.35 |
| 18 | 28784 | 851 | 0.37 | 28769 | 936 | 11.94 | 28717 | 855 | 2.67 |
| 19 | 17018 | 653 | 0.25 | 17006 | 761 | 8.35 | 16961 | 687 | 4.25 |
| 20 | 21679 | 737 | 0.43 | 21673 | 790 | 9.50 | 21635 | 735 | 2.84 |
| 21 | 14737 | 485 | 0.41 | 14736 | 525 | 6.82 | 14714 | 500 | 2.09 |
| 22 | 12929 | 388 | 0.28 | 12925 | 433 | 5.38 | 12891 | 395 | 1.74 |
A comparison of the number of phased SNPs (nPhased), the MEC scores (MEC) and the running time (Time) for different haplotype assembly algorithms, ParticleHap, HapCUT and ReFHap, on all 22 chromosomes
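For readers unfamiliar with the MEC column, the sketch below computes a minimum error correction score in its usual sense: the number of read entries that must be flipped so that every read agrees with one of the two haplotypes. It is an illustration only. The binary allele encoding, the use of `None` for sites a read does not cover, the assumption that the two haplotypes are complementary at heterozygous sites, and the function name `mec_score` are choices made for this example and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def mec_score(reads: Sequence[Sequence[Optional[int]]], hap: Sequence[int]) -> int:
    """MEC score of a haplotype pair (hap, complement of hap).

    reads: one row per read, entries 0/1 for observed alleles and None
           where the read does not cover the site (assumed encoding).
    hap:   binary haplotype over the same heterozygous sites.
    """
    comp: List[int] = [1 - a for a in hap]  # second haplotype, assumed complementary
    total = 0
    for read in reads:
        # Mismatches against each haplotype, ignoring uncovered sites.
        d_hap = sum(1 for r, a in zip(read, hap) if r is not None and r != a)
        d_comp = sum(1 for r, a in zip(read, comp) if r is not None and r != a)
        # Each read is assigned to whichever haplotype it fits better.
        total += min(d_hap, d_comp)
    return total


# Toy usage: four reads over four sites; the last read carries one error.
toy_reads = [
    [0, 1, None, None],
    [None, 1, 1, 0],
    [1, 0, 0, None],
    [0, 1, 0, None],
]
toy_mec = mec_score(toy_reads, [0, 1, 1, 0])
```

A lower score means fewer read entries conflict with the assembled haplotypes, which is how the MEC columns in the table above are read.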
The performance comparison on a simulated data set for g_e = 0.04
| n | c | ParticleHap ImpGeAc | ParticleHap ReconRate | ParticleHap Time (s) | HapCUT ReconRate | HapCUT Time (s) | ReFHap ReconRate | ReFHap Time (s) |
|---|---|---|---|---|---|---|---|---|
| 100 | 4 | 0.6254 | 0.9785 | 0.02 | 0.9598 | 0.66 | 0.9497 | 0.11 |
| 100 | 6 | 0.6252 | 0.9794 | 0.02 | 0.9570 | 0.84 | 0.9481 | 0.25 |
| 100 | 8 | 0.5977 | 0.9792 | 0.03 | 0.9590 | 1.01 | 0.9524 | 0.56 |
| 100 | 10 | 0.5737 | 0.9780 | 0.03 | 0.9582 | 1.17 | 0.9517 | 1.23 |
| 200 | 4 | 0.5935 | 0.9779 | 0.07 | 0.9594 | 1.78 | 0.9499 | 0.26 |
| 200 | 6 | 0.5977 | 0.9783 | 0.08 | 0.9597 | 2.26 | 0.9518 | 0.88 |
| 200 | 8 | 0.5840 | 0.9757 | 0.09 | 0.9596 | 2.71 | 0.9524 | 2.56 |
| 200 | 10 | 0.5758 | 0.9777 | 0.11 | 0.9593 | 3.13 | 0.9528 | 5.80 |
| 300 | 4 | 0.6013 | 0.9715 | 0.17 | 0.9591 | 3.39 | 0.9493 | 0.53 |
| 300 | 6 | 0.5848 | 0.9720 | 0.20 | 0.9596 | 4.34 | 0.9511 | 2.12 |
| 300 | 8 | 0.5842 | 0.9695 | 0.22 | 0.9598 | 5.14 | 0.9525 | 6.35 |
| 300 | 10 | 0.5671 | 0.9703 | 0.24 | 0.9599 | 5.90 | 0.9537 | 14.68 |
A comparison of the reconstruction rate (ReconRate) and the running time (Time) for different haplotype assembly algorithms, ParticleHap, HapCUT and ReFHap, on the simulated data for g_e = 0.04. For ParticleHap, the improvement rates of genotyping accuracy (ImpGeAc) are also reported
The performance comparison on a simulated data set for g_e = 0.08
| n | c | ParticleHap ImpGeAc | ParticleHap ReconRate | ParticleHap Time (s) | HapCUT ReconRate | HapCUT Time (s) | ReFHap ReconRate | ReFHap Time (s) |
|---|---|---|---|---|---|---|---|---|
| 100 | 4 | 0.6211 | 0.9618 | 0.02 | 0.9193 | 0.66 | 0.9009 | 0.11 |
| 100 | 6 | 0.5941 | 0.9610 | 0.02 | 0.9184 | 0.86 | 0.8997 | 0.25 |
| 100 | 8 | 0.5970 | 0.9585 | 0.03 | 0.9184 | 1.04 | 0.9017 | 0.56 |
| 100 | 10 | 0.5845 | 0.9572 | 0.03 | 0.9193 | 1.19 | 0.9041 | 1.26 |
| 200 | 4 | 0.6262 | 0.9615 | 0.08 | 0.9186 | 1.80 | 0.8979 | 0.27 |
| 200 | 6 | 0.5938 | 0.9418 | 0.09 | 0.9198 | 2.30 | 0.9021 | 0.89 |
| 200 | 8 | 0.6050 | 0.9389 | 0.10 | 0.9193 | 2.75 | 0.9019 | 2.53 |
| 200 | 10 | 0.5997 | 0.9438 | 0.12 | 0.9193 | 3.16 | 0.9039 | 5.84 |
| 300 | 4 | 0.6245 | 0.9432 | 0.18 | 0.9197 | 3.43 | 0.8995 | 0.53 |
| 300 | 6 | 0.6069 | 0.9397 | 0.21 | 0.9198 | 4.49 | 0.9009 | 2.11 |
| 300 | 8 | 0.6058 | 0.9315 | 0.23 | 0.9186 | 5.54 | 0.9009 | 6.29 |
| 300 | 10 | 0.5792 | 0.9192 | 0.27 | 0.9187 | 6.37 | 0.9029 | 15.00 |
A comparison of the reconstruction rate (ReconRate) and the running time (Time) for different haplotype assembly algorithms, ParticleHap, HapCUT and ReFHap, on the simulated data for g_e = 0.08. For ParticleHap, the improvement rates of genotyping accuracy (ImpGeAc) are also reported