Daniele Catanzaro, Martine Labbé, Luciano Porretta.
Abstract
The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. Its applications range from mapping complex disease genes to inferring population histories, and include drug design, functional genomics, and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotype problem under Uncertain Data (PPH-UD). This version arises mainly when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms (SNPs) are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model (CRM) for PPH by Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any solver for mixed integer programming, and usable in all cases for which the parsimony criterion is well suited to haplotype estimation.
Year: 2011 PMID: 21464966 PMCID: PMC3064666 DOI: 10.1371/journal.pone.0017937
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
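The uncertain-data setting described in the abstract can be illustrated with a small sketch. It is not the authors' integer programming model: it only enumerates the haplotype pairs that remain consistent with a genotype when some SNPs are unread. The encoding (0 and 1 for the two homozygous states, 2 for heterozygous), the `MISSING` marker, and the function name `compatible_pairs` are assumptions made for illustration, and erroneous (as opposed to missing) SNPs are not modeled.

```python
from itertools import product

MISSING = None  # hypothetical marker for a SNP that could not be read


def compatible_pairs(genotype):
    """Enumerate the unordered haplotype pairs consistent with a genotype.

    Assumed encoding: 0 or 1 = homozygous, 2 = heterozygous,
    MISSING = unknown, so any combination of alleles is allowed there.
    """
    per_site = []
    for s in genotype:
        if s == 0:
            per_site.append([(0, 0)])
        elif s == 1:
            per_site.append([(1, 1)])
        elif s == 2:
            per_site.append([(0, 1), (1, 0)])
        else:  # missing SNP: every allele combination stays possible
            per_site.append([(0, 0), (1, 1), (0, 1), (1, 0)])

    pairs = set()
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        # frozenset makes the pair unordered and merges (h, h) into {h}
        pairs.add(frozenset((h1, h2)))
    return pairs
```

Each missing SNP multiplies the number of candidate resolutions, which gives an intuition for why uncertain input enlarges the search space relative to plain PPH.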
Graphical representation of an instance of PPH and the corresponding solution.
Instance of PPH:
| Genotype | SNP 1 | SNP 2 | SNP 3 | SNP 4 |
| Genotype 1 | 2 | 1 | 1 | 2 |
| Genotype 2 | 1 | 0 | 1 | 1 |
| Genotype 3 | 1 | 0 | 2 | 2 |
| Genotype 4 | 2 | 0 | 1 | 1 |
| Genotype 5 | 2 | 1 | 0 | 1 |
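To make the problem statement concrete, the following sketch solves the five-genotype instance above by exhaustive search. It assumes the common encoding in which 0 and 1 denote the two homozygous states and 2 denotes a heterozygous SNP; the function names `explains` and `min_haplotypes` are hypothetical. Brute force is only practical for toy instances like this one and is unrelated to the integer programming model evaluated in the tables.

```python
from itertools import combinations, combinations_with_replacement, product

# Genotypes of the instance above (assumed encoding: 0/1 homozygous, 2 heterozygous).
GENOTYPES = [
    (2, 1, 1, 2),
    (1, 0, 1, 1),
    (1, 0, 2, 2),
    (2, 0, 1, 1),
    (2, 1, 0, 1),
]


def explains(h1, h2, genotype):
    """Return True if the haplotype pair (h1, h2) resolves the genotype."""
    for a, b, s in zip(h1, h2, genotype):
        if s == 2:
            if a == b:  # heterozygous site needs two different alleles
                return False
        elif a != s or b != s:  # homozygous site needs both alleles equal to s
            return False
    return True


def min_haplotypes(genotypes):
    """Return a smallest set of haplotypes explaining every genotype.

    Exhaustive search over all binary haplotypes, by increasing set size.
    """
    n_snps = len(genotypes[0])
    candidates = list(product((0, 1), repeat=n_snps))
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            # a haplotype may be paired with itself (fully homozygous genotype)
            pairs = list(combinations_with_replacement(subset, 2))
            if all(any(explains(h1, h2, g) for h1, h2 in pairs) for g in genotypes):
                return subset
    return None
```

Under the assumed encoding, `min_haplotypes(GENOTYPES)` returns a set of seven haplotypes. That count comes from this sketch and is not taken from the source.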
Performance of the CRM for PPH-UD on input data with an error ratio of 1%.
| Dataset | Time (sec.) Average | Time (sec.) Max | Time (sec.) Min | Gap (%) Average | Gap (%) Max | Gap (%) Min | Nodes Average | Nodes Max | Nodes Min |
|
| |||||||||
| 50×10 | 9.919 | 34.771 | 2.381 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r4 | 11.446 | 37.582 | 2.899 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r16 | 12.247 | 31.963 | 3.144 | 0.556 | 8.333 | 0.000 | 1.467 | 7 | 1 |
| 50×30 | 43.089 | 94.633 | 11.440 | 0.743 | 5.882 | 0.000 | 5.867 | 42 | 1 |
| 30×50 | 35.471 | 130.316 | 4.053 | 1.569 | 9.091 | 0.000 | 34.490 | 531 | 1 |
| 30×75 | 57.063 | 211.341 | 9.842 | 1.475 | 6.250 | 0.000 | 42.000 | 159 | 1 |
| 30×100 | 85.675 | 254.772 | 21.987 | 0.285 | 2.778 | 0.000 | 30.222 | 186 | 3 |
|
| |||||||||
| 50×10 | 10.601 | 88.367 | 1.857 | 8.333 | 0.000 | 0.000 | 1.000 | 1 | 0 |
| 50×30 | 41.606 | 134.018 | 4.833 | 0.401 | 5.882 | 0.000 | 18.200 | 81 | 1 |
| 30×50 | 30.277 | 89.376 | 1.531 | 1.144 | 4.785 | 0.000 | 60.400 | 231 | 1 |
| 30×75 | 98.110 | 235.745 | 25.101 | 0.410 | 3.947 | 0.000 | 175.000 | 1039 | 1 |
| 30×100 | 624.549 | 7209.090 | 44.103 | 0.866 | 4.545 | 0.000 | 2290.667 | 31284 | 17 |
|
| |||||||||
| CHR10-CEU | 35.717 | 81.623 | 0.465 | 1.667 | 5.000 | 0.000 | 124.000 | 341 | 1 |
| CHR21-CEU | 8.811 | 23.387 | 0.106 | 0.000 | 0.000 | 0.000 | 2.000 | 3 | 1 |
| CHR10-HBC | 111.930 | 308.471 | 0.000 | 2.381 | 7.143 | 0.000 | 800.667 | 1750 | 0 |
| CHR21-HBC | 33.311 | 99.340 | 0.069 | 5.556 | 16.667 | 0.000 | 29.667 | 83 | 1 |
| CHR10-JPT | 16.022 | 46.252 | 0.758 | 1.458 | 4.375 | 0.000 | 41.667 | 121 | 1 |
| CHR21-JPT | 14.004 | 19.798 | 8.209 | 0.018 | 0.035 | 0.000 | 39.500 | 76 | 3 |
| CHR10-YRI | 2491.275 | 7245.020 | 47.468 | 1.562 | 2.652 | 0.034 | 5477.333 | 16323 | 53 |
| CHR21-YRI | 2548.248 | 7641.350 | 0.459 | 2.667 | 8.000 | 0.000 | 143.667 | 429 | 1 |
Performance of the CRM for PPH-UD on input data with an error ratio of 5%.
| Dataset | Time (sec.) Average | Time (sec.) Max | Time (sec.) Min | Gap (%) Average | Gap (%) Max | Gap (%) Min | Nodes Average | Nodes Max | Nodes Min |
|
| |||||||||
| 50×10 | 9.775 | 35.814 | 2.030 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r4 | 8.699 | 62.160 | 2.642 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r16 | 11.055 | 44.405 | 2.635 | 0.000 | 0.000 | 0.000 | 2.067 | 7 | 1 |
| 50×30 | 43.273 | 122.443 | 7.878 | 1.338 | 5.882 | 0.000 | 5.533 | 49 | 1 |
| 30×50 | 36.903 | 178.408 | 4.270 | 1.620 | 9.091 | 0.000 | 34.760 | 827 | 1 |
| 30×75 | 40.422 | 115.258 | 10.835 | 1.383 | 6.250 | 0.000 | 9.900 | 37 | 1 |
| 30×100 | 98.348 | 354.685 | 7.330 | 0.484 | 2.590 | 0.000 | 43.700 | 244 | 1 |
|
| |||||||||
| 50×10 | 12.878 | 135.199 | 1.576 | 1.000 | 8.333 | 0.000 | 1.000 | 1 | 1 |
| 50×30 | 42.623 | 193.406 | 3.422 | 0.498 | 4.762 | 0.000 | 14.667 | 111 | 1 |
| 30×50 | 32.124 | 120.886 | 1.576 | 0.837 | 4.762 | 0.000 | 62.467 | 562 | 1 |
| 30×75 | 110.105 | 323.692 | 29.245 | 0.644 | 3.819 | 0.000 | 203.200 | 1317 | 3 |
| 30×100 | 639.736 | 7210.800 | 45.658 | 0.642 | 4.000 | 0.000 | 2527.867 | 34214 | 6 |
|
| |||||||||
| CHR10-CEU | 25.568 | 76.186 | 0.033 | 5.000 | 5.000 | 0.000 | 83.667 | 249 | 1 |
| CHR21-CEU | 12.137 | 32.109 | 0.593 | 0.000 | 0.000 | 0.000 | 16.333 | 27 | 1 |
| CHR10-HBC | 54.896 | 150.916 | 1.570 | 2.381 | 7.143 | 0.000 | 121.000 | 321 | 1 |
| CHR21-HBC | 33.719 | 100.581 | 0.071 | 10.317 | 16.667 | 0.000 | 9.667 | 23 | 1 |
| CHR10-JPT | 5.204 | 14.500 | 0.003 | 1.668 | 5.000 | 0.000 | 5.000 | 13 | 1 |
| CHR21-JPT | 9.691 | 19.591 | 0.021 | 1.830 | 5.490 | 0.000 | 43.667 | 123 | 1 |
| CHR10-YRI | 2864.727 | 7254.880 | 49.311 | 0.889 | 2.667 | 0.000 | 5464.000 | 13956 | 77 |
| CHR21-YRI | 2551.640 | 7651.670 | 0.165 | 8.684 | 26.051 | 0.000 | 128.000 | 382 | 1 |
Performance of the CRM for PPH-UD on input data with an error ratio of 10%.
| Dataset | Time (sec.) Average | Time (sec.) Max | Time (sec.) Min | Gap (%) Average | Gap (%) Max | Gap (%) Min | Nodes Average | Nodes Max | Nodes Min |
|
| |||||||||
| 50×10 | 9.641 | 47.717 | 2.461 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r4 | 16.057 | 62.160 | 2.663 | 0.000 | 0.000 | 0.000 | 1.067 | 2 | 1 |
| 50×10r16 | 13.838 | 44.178 | 2.678 | 0.000 | 0.000 | 0.000 | 2.667 | 9 | 1 |
| 50×30 | 38.338 | 85.872 | 9.874 | 1.445 | 8.889 | 0.000 | 6.733 | 35 | 1 |
| 30×50 | 39.799 | 222.460 | 3.249 | 1.192 | 8.333 | 0.000 | 40.020 | 1113 | 1 |
| 30×75 | 43.441 | 138.641 | 10.366 | 1.579 | 6.250 | 0.000 | 22.800 | 58 | 1 |
| 30×100 | 120.666 | 323.663 | 17.943 | 0.303 | 2.778 | 0.000 | 78.800 | 538 | 1 |
|
| |||||||||
| 50×10 | 13.174 | 126.922 | 1.765 | 8.333 | 0.000 | 0.000 | 1.000 | 1 | 0 |
| 50×30 | 40.194 | 84.860 | 2.418 | 0.919 | 5.882 | 0.000 | 23.933 | 265 | 1 |
| 30×50 | 27.455 | 73.425 | 1.488 | 0.922 | 4.737 | 0.000 | 18.467 | 66 | 1 |
| 30×75 | 108.737 | 325.864 | 33.529 | 0.814 | 4.348 | 0.000 | 250.733 | 1539 | 3 |
| 30×100 | 1563.970 | 7208.110 | 37.634 | 0.773 | 4.000 | 0.000 | 11673.800 | 74593 | 2 |
|
| |||||||||
| CHR10-CEU | 32.592 | 95.712 | 0.209 | 0.000 | 0.000 | 0.000 | 144.667 | 431 | 1 |
| CHR21-CEU | 4.935 | 11.259 | 0.544 | 1.852 | 5.556 | 0.000 | 2.000 | 4 | 1 |
| CHR10-HBC | 185.619 | 529.879 | 2.228 | 2.381 | 7.143 | 0.015 | 1739.333 | 4926 | 1 |
| CHR21-HBC | 42.578 | 127.162 | 0.074 | 10.317 | 16.667 | 0.000 | 33.000 | 93 | 1 |
| CHR10-JPT | 19.037 | 56.795 | 0.003 | 1.667 | 5.000 | 0.000 | 33.667 | 99 | 1 |
| CHR21-JPT | 8.433 | 19.635 | 2.716 | 2.225 | 6.667 | 0.000 | 33.667 | 99 | 1 |
| CHR10-YRI | 2992.214 | 7231.800 | 30.751 | 0.877 | 2.632 | 0.000 | 6538.000 | 14597 | 9 |
| CHR21-YRI | 2534.816 | 7600.780 | 1.083 | 4.119 | 12.356 | 0.000 | 117.000 | 349 | 1 |
Performance of the CRM for PPH-UD on input data with an error ratio of 15%.
| Dataset | Time (sec.) Average | Time (sec.) Max | Time (sec.) Min | Gap (%) Average | Gap (%) Max | Gap (%) Min | Nodes Average | Nodes Max | Nodes Min |
|
| |||||||||
| 50×10 | 10.092 | 57.006 | 1.560 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r4 | 11.570 | 33.895 | 3.275 | 0.000 | 0.000 | 0.000 | 1.000 | 1 | 1 |
| 50×10r16 | 9.209 | 18.627 | 2.790 | 1.068 | 8.333 | 0.000 | 1.467 | 5 | 1 |
| 50×30 | 44.276 | 157.222 | 7.467 | 1.123 | 5.882 | 0.000 | 4.133 | 15 | 1 |
| 30×50 | 39.772 | 222.039 | 6.187 | 1.006 | 8.333 | 0.000 | 42.700 | 1149 | 1 |
| 30×75 | 49.558 | 160.645 | 8.048 | 1.409 | 6.250 | 0.000 | 19.500 | 67 | 1 |
| 30×100 | 97.879 | 262.453 | 23.986 | 0.664 | 3.836 | 0.000 | 32.800 | 98 | 1 |
|
| |||||||||
| 50×10 | 9.925 | 126.922 | 2.113 | 1.000 | 8.333 | 0.000 | 1.133 | 1 | 1 |
| 50×30 | 38.689 | 84.860 | 4.000 | 0.400 | 5.882 | 0.000 | 11.267 | 265 | 1 |
| 30×50 | 33.272 | 73.425 | 1.793 | 0.614 | 4.737 | 0.000 | 54.400 | 66 | 1 |
| 30×75 | 88.030 | 325.864 | 32.260 | 0.800 | 4.348 | 0.000 | 95.400 | 1539 | 3 |
| 30×100 | 631.495 | 7207.900 | 50.928 | 1.157 | 4.270 | 0.000 | 2371.200 | 32622 | 12 |
|
| |||||||||
| CHR10-CEU | 27.340 | 68.299 | 1.669 | 0.000 | 0.000 | 0.000 | 53.667 | 155 | 3 |
| CHR21-CEU | 13.455 | 36.601 | 0.522 | 0.000 | 0.000 | 0.000 | 6.667 | 12 | 1 |
| CHR10-HBC | 87.770 | 248.825 | 2.228 | 4.347 | 7.143 | 0.015 | 384.333 | 1127 | 1 |
| CHR21-HBC | 39.878 | 118.866 | 0.075 | 10.317 | 16.667 | 0.000 | 27.000 | 69 | 1 |
| CHR10-JPT | 23.436 | 69.457 | 0.002 | 1.667 | 5.000 | 0.000 | 72.667 | 213 | 1 |
| CHR21-JPT | 2403.091 | 7188.610 | 2.781 | 2.222 | 6.667 | 0.000 | 4406.333 | 13125 | 1 |
| CHR10-YRI | 702.533 | 1777.450 | 62.414 | 0.000 | 0.000 | 0.000 | 583.000 | 1635 | 48 |
| CHR21-YRI | 2545.198 | 7630.880 | 1.500 | 5.449 | 16.346 | 0.000 | 123.333 | 368 | 1 |
Performance of the CRM for PPH (RM version) on Brown and Harrower's datasets [30].
| Dataset | Time (sec.) Average | Time (sec.) Max | Time (sec.) Min | Gap (%) Average | Gap (%) Max | Gap (%) Min | Nodes Average | Nodes Max | Nodes Min |
|
| |||||||||
| 50×10 | 1.143 | 2.404 | 0.102 | 0.000 | 0 | 0 | 1.000 | 1 | 1 |
| 50×10r4 | 1.730 | 6.104 | 0.043 | 1.179 | 10 | 0 | 1.000 | 1 | 1 |
| 50×10r16 | 8.092 | 30.623 | 2.011 | 1.644 | 10.7692 | 0 | 1.533 | 9 | 1 |
| 50×30 | 11.772 | 53.42 | 2.732 | 2.440 | 7.14286 | 0 | 2.000 | 15 | 1 |
| 30×50 | 8.922 | 47.467 | 0.73 | 1.694 | 7.69231 | 0 | 10.260 | 75 | 1 |
| 30×75 | 15.624 | 35.693 | 1.358 | 1.649 | 6.66667 | 0 | 24.300 | 92 | 1 |
| 30×100 | 10.142 | 31.994 | 2.593 | 1.402 | 7.35294 | 0 | 8.500 | 25 | 1 |
|
| |||||||||
| 50×10 | 0.634 | 1.726 | 0.127 | 0.513 | 7.69231 | 0 | 2.400 | 11 | 1 |
| 50×30 | 11.882 | 30.411 | 1.59 | 1.164 | 6.25 | 0 | 11.867 | 35 | 1 |
| 30×50 | 10.764 | 24.108 | 0.815 | 0.890 | 4.09091 | 0 | 20.533 | 61 | 1 |
| 30×75 | 22.389 | 61.869 | 3.537 | 1.038 | 5.55556 | 0 | 62.286 | 387 | 1 |
| 30×100 | 74.925 | 462.791 | 12.953 | 1.521 | 4.7619 | 0 | 216.071 | 1679 | 8 |
|
| |||||||||
| CHR10-CEU | 102.792 | 305.103 | 0.774 | 0.000 | 0 | 0 | 270.333 | 807 | 1 |
| CHR21-CEU | 18.868 | 54.562 | 0.428 | 1.515 | 4.54545 | 0 | 49.667 | 145 | 1 |
| CHR10-HBC | 38.058 | 96.324 | 8.746 | 2.593 | 7.77778 | 0 | 67.000 | 151 | 1 |
| CHR21-HBC | 0.182 | 0.456 | 0.017 | 0.000 | 0 | 0 | 8.000 | 19 | 1 |
| CHR10-JPT | 0.895 | 1.583 | 0.368 | 1.515 | 4.54545 | 0 | 7.000 | 11 | 1 |
| CHR21-JPT | 1.781 | 2.87 | 0.967 | 0.833 | 2.5 | 0 | 15.667 | 29 | 1 |
| CHR10-YRI | 73.723 | 116.127 | 31.353 | 1.111 | 3.33333 | 0 | 89.667 | 123 | 63 |
| CHR21-YRI | 2349.331 | 6819.2 | 50.012 | 0.000 | 0 | 0 | 3815.667 | 11199 | 123 |
Accuracy of the CRM for PPH-UD under different error ratios.
| Dataset | 1% Average | 1% Max | 1% Min | 5% Average | 5% Max | 5% Min | 10% Average | 10% Max | 10% Min | 15% Average | 15% Max | 15% Min |
|
| ||||||||||||
| 50×10 | 100.00 | 100.00 | 100.00 | 99.01 | 100.00 | 92.31 | 100.00 | 100.00 | 100.00 | 98.02 | 100.00 | 90.91 |
| 50×10r4 | 100.00 | 100.00 | 100.00 | 99.51 | 100.00 | 88.89 | 99.02 | 100.00 | 88.89 | 99.51 | 100.00 | 88.89 |
| 50×10r16 | 99.11 | 100.00 | 92.31 | 98.22 | 100.00 | 84.62 | 98.22 | 100.00 | 92.31 | 96.00 | 100.00 | 75.00 |
| 50×30 | 98.82 | 100.00 | 92.86 | 96.85 | 100.00 | 78.57 | 92.91 | 100.00 | 70.59 | 94.88 | 100.00 | 71.43 |
| 30×50 | 95.68 | 100.00 | 57.14 | 90.22 | 100.00 | 57.14 | 89.83 | 100.00 | 50.00 | 84.24 | 100.00 | 25.00 |
| 30×75 | 96.59 | 100.00 | 81.25 | 88.07 | 100.00 | 62.50 | 86.36 | 100.00 | 62.50 | 77.84 | 100.00 | 6.25 |
| 30×100 | 96.59 | 100.00 | 81.25 | 94.89 | 100.00 | 82.35 | 90.91 | 100.00 | 75.00 | 90.34 | 100.00 | 64.71 |
|
| ||||||||||||
| 50×10 | 95.07 | 100.00 | 81.25 | 92.61 | 100.00 | 78.57 | 96.55 | 100.00 | 86.67 | 94.58 | 100.00 | 81.25 |
| 50×30 | 92.38 | 100.00 | 82.35 | 88.41 | 100.00 | 64.71 | 89.40 | 100.00 | 64.71 | 86.42 | 100.00 | 64.71 |
| 30×50 | 88.55 | 100.00 | 63.16 | 86.87 | 100.00 | 68.42 | 85.52 | 100.00 | 68.75 | 81.14 | 100.00 | 59.09 |
| 30×75 | 85.50 | 100.00 | 60.00 | 80.97 | 100.00 | 56.52 | 79.46 | 100.00 | 66.67 | 80.97 | 100.00 | 56.52 |
| 30×100 | 76.72 | 100.00 | 68.00 | 82.18 | 95.00 | 72.00 | 75.00 | 95.71 | 65.00 | 78.74 | 95.71 | 60.00 |
|
| ||||||||||||
| CHR10-CEU | 89.39 | 100.00 | 80.00 | 83.33 | 100.00 | 70.83 | 78.79 | 86.36 | 70.83 | 68.18 | 80.00 | 54.17 |
| CHR21-CEU | 93.75 | 100.00 | 83.33 | 91.67 | 100.00 | 83.33 | 58.33 | 75.00 | 33.33 | 56.25 | 83.33 | 27.78 |
| CHR10-HBC | 78.05 | 90.00 | 64.71 | 68.29 | 92.86 | 23.53 | 53.66 | 85.71 | 23.53 | 51.22 | 80.00 | 17.65 |
| CHR21-HBC | 99.84 | 100.00 | 73.68 | 99.81 | 100.00 | 68.42 | 65.63 | 100.00 | 47.37 | 71.88 | 100.00 | 52.63 |
| CHR10-JPT | 80.95 | 100.00 | 72.73 | 71.43 | 100.00 | 55.00 | 66.67 | 90.91 | 45.00 | 59.52 | 100.00 | 30.00 |
| CHR21-JPT | 84.91 | 94.12 | 66.67 | 69.81 | 80.95 | 60.00 | 50.94 | 57.14 | 41.18 | 43.40 | 52.94 | 33.33 |
| CHR10-YRI | 73.26 | 76.00 | 69.44 | 66.28 | 80.00 | 52.78 | 53.49 | 84.00 | 30.56 | 48.84 | 68.00 | 25.00 |
| CHR21-YRI | 63.29 | 100.00 | 50.94 | 50.63 | 100.00 | 35.85 | 41.77 | 100.00 | 20.75 | 40.51 | 100.00 | 26.42 |