Stefano Beretta, Murray D Patterson, Simone Zaccaria, Gianluca Della Vedova, Paola Bonizzoni.
Abstract
BACKGROUND: Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes, owing to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue that can be mitigated with larger sets of reads when the error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads deal only with limited coverages.
Keywords: Haplotype assembly; High coverage; Long reads; Minimum error correction; Single individual haplotyping
Year: 2018 PMID: 29970002 PMCID: PMC6029272 DOI: 10.1186/s12859-018-2253-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Switch error percentage on the real Ashkenazim dataset, Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | | 0.662 | 0.342 | 0.342 | 0.342 | 2.813 | 3.303 | 3.547 |
| 30 | 0.324 | 0.623 | 0.337 | 0.333 | | 2.420 | 2.980 | 3.133 |
| 35 | | 0.601 | 0.324 | 0.332 | 0.333 | 2.221 | - | 2.933 |
| 40 | | 0.575 | 0.336 | 0.332 | 0.332 | 2.027 | - | 2.691 |
| 45 | | 0.533 | 0.348 | 0.336 | 0.328 | 1.932 | - | 2.522 |
| 50 |
| | 0.490 | 0.340 | | 0.327 | 1.864 | - | 2.303 |
| 55 | | 0.452 | 0.327 | 0.331 | | 1.774 | - | 2.268 |
| 60 | 0.327 | 0.452 | 0.326 | | | 1.740 | - | 2.123 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced.
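The switch error percentage reported above can be illustrated with a minimal Python sketch. It assumes the common definition (the fraction of adjacent variant pairs whose relative phase differs from the ground truth) and a single phased block encoded as a string of `0`/`1` alleles; this record does not spell out the exact definition or block handling used in the paper, so the function name and encoding are illustrative assumptions.

```python
def switch_error_percentage(predicted: str, truth: str) -> float:
    """Percentage of adjacent variant pairs whose phase flips relative to the truth.

    Assumption: both haplotypes cover the same heterozygous variants in the
    same order and form one phased block (no unphased positions).
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same variants")
    if len(truth) < 2:
        return 0.0
    # agree[i] is True when the predicted allele matches the true allele at variant i
    agree = [p == t for p, t in zip(predicted, truth)]
    # a switch occurs wherever agreement changes between consecutive variants
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return 100.0 * switches / (len(truth) - 1)


# One switch among four adjacent pairs
print(switch_error_percentage("00111", "00000"))  # 25.0
```

Under this definition a prediction that is the exact complement of the truth has no switch errors, because the labelling of the two haplotypes is arbitrary.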
Hamming distance on the real Ashkenazim dataset, Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 0.54 | 2.41 | 0.64 | 0.84 | | 3.96 | 3.42 | 5.53 |
| 30 | 0.35 | 2.18 | 0.64 | 0.60 | | 3.46 | 3.41 | 5.38 |
| 35 | | 2.02 | 0.37 | 0.42 | 0.37 | 3.99 | - | 5.62 |
| 40 | | 1.66 | 0.45 | 0.44 | | 3.10 | - | 5.08 |
| 45 | 0.38 | 1.80 | 0.43 | 0.42 | | 3.02 | - | 4.49 |
| 50 | 0.41 | 1.47 | 0.41 | 0.38 | | 2.84 | - | 4.32 |
| 55 | 0.40 | 0.87 | | 0.41 | 0.37 | 3.28 | - | 4.67 |
| 60 | 0.39 | 1.25 | | 0.36 | 0.35 | 3.60 | - | 5.06 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced.
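As a rough illustration of the Hamming distance measure in the table above, the following Python sketch counts mismatching positions between a predicted and a true haplotype and takes the better of the two possible labellings. The `0`/`1` string encoding, the function name, and the normalisation by the number of variants are assumptions made for the example; the paper's exact definition is not given in this record.

```python
def hamming_distance_percentage(predicted: str, truth: str) -> float:
    """Percentage of variants phased differently from the truth.

    Assumption: at heterozygous sites the two haplotypes are complementary,
    so comparing against the truth or its complement is equally valid and
    the smaller mismatch count is used.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same variants")
    n = len(truth)
    if n == 0:
        return 0.0
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return 100.0 * min(mismatches, n - mismatches) / n


# Three mismatches out of five, i.e. two against the complementary labelling
print(hamming_distance_percentage("00111", "00000"))  # 40.0
```

Unlike the switch error, a single early switch changes every later position, so this measure penalises the location of an error as well as its occurrence.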
QAN50 on the real Ashkenazim dataset, Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 79452 | 76856 | | | 78192 | 48097 | 45492 | 45445 |
| 30 | | 80150 | 80426 | 80426 | 80150 | 52713 | 50806 | 49308 |
| 35 | | 81464 | 81757 | 81757 | 81464 | 54182 | - | 51766 |
| 40 | | 82758 | 83802 | 83802 | 83263 | 57589 | - | 55014 |
| 45 | | 86001 | | | 86001 | 59161 | - | 57008 |
| 50 | 89669 | 89738 | | | 89306 | 60380 | - | 59447 |
| 55 | | | 91224 | 91224 | 90718 | 62652 | - | 59582 |
| 60 | 94913 | 92938 | | | 92565 | 64710 | - | 62655 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (highest value) for each dataset is boldfaced.
Time in seconds of the tools on real Ashkenazim datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 591 | 39456 | 1115 | 9278 | 1563 | 80 | 43573 | 3 |
| 30 | 1292 | 46564 | 1031 | 10753 | 1596 | 196 | 72696 | 4 |
| 35 | 2193 | 50071 | 1122 | 11959 | 1888 | 308 | - | 4 |
| 40 | 3095 | 50301 | 1247 | 12570 | 2160 | 499 | - | 5 |
| 45 | 3888 | 51570 | 1308 | 12735 | 2388 | 822 | - | 6 |
| 50 | 4579 | 53030 | 1395 | 12996 | 2731 | 1192 | - | 8 |
| 55 | 5103 | 54012 | 1534 | 13252 | 2983 | 1777 | - | 9 |
| 60 | 5550 | 53496 | 1605 | 13469 | 3216 | 2493 | - | 13 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare.
Peak of RAM usage in Megabytes of the tools on real Ashkenazim datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 1370 | 2263 | 930 | 5510 | 3266 | 3005 | 4693 | 3005 |
| 30 | 1661 | 2562 | 931 | 6195 | 3270 | 3005 | 5355 | 3005 |
| 35 | 1966 | 2908 | 931 | 6513 | 3276 | 3005 | - | 3005 |
| 40 | 2291 | 3231 | 931 | 6483 | 3279 | 3005 | - | 3005 |
| 45 | 2636 | 3190 | 952 | 6937 | 3283 | 3005 | - | 3005 |
| 50 | 3158 | 3286 | 1007 | 7144 | 3287 | 3005 | - | 3005 |
| 55 | 3549 | 3479 | 1042 | 7229 | 3292 | 3005 | - | 3005 |
| 60 | 3968 | 5412 | 1073 | 7430 | 3296 | 3005 | - | 3005 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare.
Switch error percentage on simulated datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | | 0.218 | | 0.039 | 0.037 | 1.081 | 1.487 | 2.112 |
| 30 | | 0.181 | 0.035 | 0.031 | 0.037 | 0.725 | 1.166 | 1.430 |
| 35 | | 0.161 | 0.033 | 0.037 | 0.037 | 0.537 | 0.879 | 1.086 |
| 40 | | 0.148 | | 0.030 | 0.037 | 0.425 | - | 0.901 |
| 45 | | 0.139 | 0.024 | 0.024 | | 0.404 | - | 0.781 |
| 50 | 0.020 | 0.134 | 0.020 | 0.024 | | 0.324 | - | 0.586 |
| 55 | 0.022 | 0.126 | 0.024 | 0.022 | | 0.273 | - | 0.565 |
| 60 | | 0.108 | | 0.024 | 0.022 | 0.248 | - | 0.470 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced.
Hamming Distance percentage on simulated datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 0.35 | 1.30 | 0.42 | 0.42 | | 3.43 | 2.66 | 5.95 |
| 30 | | 1.92 | 0.43 | 0.42 | 0.38 | 2.52 | 2.43 | 4.55 |
| 35 | | 1.37 | 0.34 | 0.55 | 0.37 | 1.92 | 2.09 | 3.93 |
| 40 | 0.26 | 1.18 | | 0.41 | 0.27 | 1.86 | - | 3.31 |
| 45 | 0.34 | 1.02 | 0.32 | 0.34 | | 1.95 | - | 3.12 |
| 50 | | 1.18 | 0.78 | 0.81 | 0.73 | 1.42 | - | 3.14 |
| 55 | 0.28 | 1.13 | 0.76 | 0.76 | | 1.48 | - | 3.19 |
| 60 | | 1.26 | 0.17 | 0.16 | 0.57 | 1.49 | - | 3.49 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced.
QAN50 results of the tools on simulated datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | | 85002 | 87581 | 87581 | 51183 | 50325 | 49121 | 45846 |
| 30 | | 87599 | 92831 | 92565 | 57323 | 56745 | 54412 | 52138 |
| 35 | | 92483 | 96167 | 95611 | 61204 | 60612 | 59047 | 56881 |
| 40 | | 95818 | | 97270 | 64979 | 64535 | - | 60748 |
| 45 | | 98674 | | | 68274 | 66973 | - | 64003 |
| 50 | | 100826 | | | 73159 | 73256 | - | 69457 |
| 55 | 105243 | 103348 | | | 74273 | 74402 | - | 71058 |
| 60 | 107121 | 105243 | | | 76497 | 76497 | - | 73256 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (highest value) for each dataset is boldfaced.
Time in seconds of the tools on simulated datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 572 | 38863 | 1027 | 9686 | 205 | 44 | 33988 | 3 |
| 30 | 1317 | 47367 | 883 | 11095 | 238 | 91 | 56165 | 3 |
| 35 | 2167 | 18813 | 954 | 11650 | 286 | 167 | 80061 | 4 |
| 40 | 3052 | 20007 | 1048 | 12760 | 323 | 269 | - | 5 |
| 45 | 3754 | 56403 | 1161 | 12678 | 367 | 423 | - | 6 |
| 50 | 4399 | 57135 | 1170 | 12860 | 412 | 672 | - | 6 |
| 55 | 4882 | 56745 | 1287 | 13174 | 467 | 1019 | - | 7 |
| 60 | 5277 | 21070 | 1336 | 13407 | 496 | 1536 | - | 9 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare.
Peak of RAM usage in Megabytes of the tools on simulated datasets of Chromosome 1
| Avg. Cov. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 25 | 1378 | 2180 | 930 | 5161 | 3262 | 3007 | 4284 | 3007 |
| 30 | 1667 | 4187 | 930 | 6117 | 3266 | 3008 | 5320 | 3008 |
| 35 | 1984 | 2134 | 931 | 6558 | 3270 | 3008 | 5709 | 3008 |
| 40 | 2315 | 2186 | 932 | 6780 | 3272 | 3009 | - | 3009 |
| 45 | 2665 | 5037 | 932 | 7043 | 3276 | 3010 | - | 3010 |
| 50 | 3180 | 5223 | 932 | 7058 | 3279 | 3010 | - | 3010 |
| 55 | 3591 | 5483 | 996 | 7212 | 3282 | 3011 | - | 3011 |
| 60 | 4009 | 2374 | 1039 | 7294 | 3286 | 3011 | - | 3011 |
For each dataset, its row is identified by its average coverage (Avg. Cov.). We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare.
Switch error percentage on datasets of NA12878
| Chrom. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.929 | - | 1.926 | 1.924 | | - | - | 2.191 |
| 2 | 0.038 | - | 0.050 | 0.035 | | - | - | 0.374 |
| 3 | 0.044 | - | 0.045 | 0.039 | | - | - | 0.381 |
| 4 | 2.042 | - | 2.052 | 2.048 | | - | - | 2.237 |
| 5 | 1.829 | - | 1.828 | | 1.825 | - | - | 1.998 |
| 6 | 1.991 | - | 1.990 | 1.991 | | - | - | 2.205 |
| 7 | | - | 0.669 | 0.666 | 0.660 | - | - | 0.924 |
| 8 | | - | 1.746 | 1.748 | 1.749 | - | - | 1.992 |
| 9 | 1.966 | - | 1.965 | 1.966 | | 2.140 | - | 2.187 |
| 10 | 0.949 | - | 0.949 | 0.948 | | 1.171 | - | 1.232 |
| 11 | 2.092 | - | 2.101 | 2.101 | | 2.282 | - | 2.325 |
| 12 | | - | 0.055 | 0.048 | 0.043 | 0.319 | - | 0.405 |
| 13 | 0.051 | - | 0.036 | 0.049 | | 0.285 | - | 0.349 |
| 14 | 0.034 | - | 0.042 | 0.039 | | 0.347 | - | 0.421 |
| 15 | 0.055 | 0.331 | 0.069 | 0.065 | | 0.358 | - | 0.427 |
| 16 | | 0.289 | | 0.029 | 0.027 | 0.322 | - | 0.420 |
| 17 | 0.055 | 0.277 | 0.071 | 0.067 | | 0.337 | - | 0.426 |
| 18 | 1.895 | - | 1.879 | | 1.889 | 2.072 | - | 2.122 |
| 19 | 2.629 | - | 2.642 | 2.644 | | 2.807 | - | 2.914 |
| 20 | | 0.277 | 0.046 | | | 0.412 | - | 0.451 |
| 21 | 0.033 | - | 0.044 | 0.041 | | 0.364 | - | 0.408 |
| 22 | 2.102 | 2.323 | 2.106 | 2.114 | | 2.378 | - | 2.452 |
Each row corresponds to a chromosome. The dataset consists of all reads aligned to the chromosome. We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced
Hamming Distance percentage on datasets of NA12878
| Chrom. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.12 | - | | 2.10 | 2.11 | - | - | 6.16 |
| 2 | 0.51 | - | 0.49 | | 0.77 | - | - | 4.91 |
| 3 | | - | 0.42 | 0.42 | 0.48 | - | - | 4.74 |
| 4 | 2.47 | - | 2.15 | 2.18 | | - | - | 6.44 |
| 5 | 2.33 | - | 2.53 | 2.22 | | - | - | 6.56 |
| 6 | 3.39 | - | 3.02 | 3.20 | | - | - | 7.15 |
| 7 | 1.16 | - | | | 1.36 | - | - | 5.05 |
| 8 | 2.44 | - | 2.46 | 2.54 | | - | - | 6.14 |
| 9 | 2.45 | - | 2.31 | 2.49 | | 5.68 | - | 6.23 |
| 10 | 1.19 | - | 1.16 | 1.18 | | 3.89 | - | 5.29 |
| 11 | 2.08 | - | 2.06 | 2.06 | | 4.25 | - | 5.08 |
| 12 | 0.43 | - | 0.48 | | 0.51 | 2.92 | - | 5.54 |
| 13 | 0.41 | - | 0.63 | 0.57 | | 4.01 | - | 4.84 |
| 14 | 0.21 | - | 0.48 | 0.58 | | 3.01 | - | 3.24 |
| 15 | | 3.39 | 0.24 | 0.34 | 0.34 | 4.18 | - | 5.49 |
| 16 | | 2.09 | 0.45 | 0.88 | 0.28 | 1.65 | - | 2.87 |
| 17 | 0.50 | 2.84 | 0.38 | 0.79 | | 2.89 | - | 4.61 |
| 18 | 1.80 | - | 1.67 | | 1.68 | 4.77 | - | 8.10 |
| 19 | 3.19 | - | 3.14 | 3.40 | | 4.37 | - | 7.32 |
| 20 | 1.37 | 3.47 | 0.16 | | | 2.99 | - | 4.07 |
| 21 | | - | | | 1.95 | 5.37 | - | 4.22 |
| 22 | | 4.92 | 1.84 | 1.83 | | 4.83 | - | 6.52 |
Each row corresponds to a chromosome. The dataset consists of all reads aligned to the chromosome. We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (lowest value) for each dataset is boldfaced
QAN50 results of the tools on datasets of NA12878
| Chrom. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 1 | 91098 | - | 91668 | | 89249 | - | - | 84863 |
| 2 | 210603 | - | 210098 | | | - | - | 177388 |
| 3 | | - | 227732 | | 229655 | - | - | 170494 |
| 4 | 90639 | - | | 90639 | 89868 | - | - | 84861 |
| 5 | 99011 | - | 99012 | | 98900 | - | - | 91745 |
| 6 | | - | 94200 | | 93894 | - | - | 85483 |
| 7 | | - | 155773 | 155773 | 155209 | - | - | 135095 |
| 8 | 90928 | - | | 90836 | 90661 | - | - | 84076 |
| 9 | 85172 | - | | 85469 | | 82917 | - | 80957 |
| 10 | 123171 | - | 123171 | | 122317 | 114172 | - | 112861 |
| 11 | 84153 | - | 84108 | 84108 | | 81526 | - | 79057 |
| 12 | 224308 | - | 224308 | | 224308 | 190161 | - | 174540 |
| 13 | | - | 228310 | 228310 | 227286 | 178173 | - | 175124 |
| 14 | | - | | 227040 | 220294 | 186476 | - | 181826 |
| 15 | | 153527 | 173950 | 176529 | 176529 | 147339 | - | 138185 |
| 16 | | 160049 | | 190884 | 189342 | 158848 | - | 152960 |
| 17 | 162690 | 151262 | | | 162328 | 140216 | - | 133887 |
| 18 | 93705 | - | | 93705 | | 87076 | - | 83383 |
| 19 | | - | | 62568 | 62233 | 59716 | - | 58694 |
| 20 | 165921 | 163062 | 165921 | | 165921 | 140498 | - | 140034 |
| 21 | 222171 | - | | 221786 | 222171 | 149165 | - | 146675 |
| 22 | 82618 | 73223 | | 82618 | | 72117 | - | 70718 |
Each row corresponds to a chromosome. The dataset consists of all reads aligned to the chromosome. We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, and 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare. The best result (highest value) for each dataset is boldfaced.
Time in seconds on datasets of NA12878
| Chrom. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 1 | 20183 | - | 3300 | 41626 | 6301 | - | - | 980 |
| 2 | 21913 | - | 3686 | 46937 | 6758 | - | - | 1075 |
| 3 | 19325 | - | 2994 | 38040 | 5536 | - | - | 776 |
| 4 | 21744 | - | 3031 | 40083 | 5998 | - | - | 862 |
| 5 | 18416 | - | 2943 | 36674 | 5169 | - | - | 790 |
| 6 | 17792 | - | 2658 | 35189 | 5640 | - | - | 759 |
| 7 | 14321 | - | 2409 | 32550 | 4429 | - | - | 744 |
| 8 | 15930 | - | 2421 | 29902 | 4578 | - | - | 669 |
| 9 | 11307 | - | 1886 | 23586 | 3369 | 86913 | - | 635 |
| 10 | 13943 | - | 2244 | 27638 | 3914 | 86941 | - | 670 |
| 11 | 13291 | - | 1983 | 25419 | 3916 | 86833 | - | 567 |
| 12 | 12684 | - | 1916 | 25865 | 4054 | 86814 | - | 554 |
| 13 | 11100 | - | 1474 | 20288 | 2952 | 86686 | - | 406 |
| 14 | 9017 | - | 1265 | 17658 | 2644 | 86684 | - | 384 |
| 15 | 6934 | 63221 | 1114 | 14218 | 2102 | 86700 | - | 368 |
| 16 | 7426 | 69771 | 1265 | 16323 | 2589 | 86783 | - | 461 |
| 17 | 6460 | 54037 | 956 | 12312 | 1832 | 86669 | - | 312 |
| 18 | 8440 | - | 1152 | 15794 | 2497 | 86671 | - | 353 |
| 19 | 3625 | - | 826 | 10368 | 1617 | 86668 | - | 296 |
| 20 | 5878 | 55032 | 827 | 11815 | 1594 | 86600 | - | 243 |
| 21 | 3561 | - | 560 | 7508 | 1308 | 86585 | - | 226 |
| 22 | 2835 | 31617 | 505 | 7059 | 1002 | 86568 | - | 195 |
Each row corresponds to a chromosome. The dataset consists of all reads aligned to the chromosome. We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare
Peak of RAM usage in Megabytes of the tools on datasets of NA12878
| Chrom. | HapCHAT | HapCol | WhatsHap (cov. 15x) | WhatsHap (cov. 20x) | HapCUT2 | ReFHap | ProbHap | FastHare |
|---|---|---|---|---|---|---|---|---|
| 1 | 6361 | - | 2983 | 13259 | 3351 | - | - | 3050 |
| 2 | 7082 | - | 3173 | 13938 | 3362 | - | - | 3056 |
| 3 | 6180 | - | 2669 | 12672 | 3329 | - | - | 3041 |
| 4 | 7531 | - | 2685 | 12959 | 3334 | - | - | 3046 |
| 5 | 5882 | - | 2551 | 12364 | 3320 | - | - | 3033 |
| 6 | 5649 | - | 2325 | 12120 | 3312 | - | - | 3031 |
| 7 | 4597 | - | 2080 | 11167 | 3309 | - | - | 3022 |
| 8 | 5075 | - | 2091 | 11164 | 3302 | - | - | 3023 |
| 9 | 3583 | - | 1639 | 9345 | 3285 | 17915 | - | 3009 |
| 10 | 4059 | - | 1819 | 10164 | 3303 | 9766 | - | 3017 |
| 11 | 3965 | - | 1814 | 10135 | 3290 | 9632 | - | 3018 |
| 12 | 4011 | - | 1787 | 10229 | 3288 | 13984 | - | 3016 |
| 13 | 7950 | - | 1449 | 8982 | 3267 | 8371 | - | 3006 |
| 14 | 2857 | - | 1281 | 8198 | 3261 | 10024 | - | 2998 |
| 15 | 2232 | 8437 | 1077 | 7370 | 3257 | 8302 | - | 2993 |
| 16 | 3116 | 19698 | 1128 | 7703 | 3263 | 10328 | - | 2995 |
| 17 | 7844 | 7845 | 962 | 6737 | 3253 | 5941 | - | 2990 |
| 18 | 3542 | - | 1152 | 7810 | 3254 | 15868 | - | 2995 |
| 19 | 1721 | - | 793 | 6055 | 3244 | 8808 | - | 2983 |
| 20 | 8496 | 9966 | 865 | 6612 | 3242 | 7973 | - | 2985 |
| 21 | 1329 | - | 611 | 5211 | 3229 | 7852 | - | 2977 |
| 22 | 3324 | 7782 | 542 | 4912 | 3225 | 7904 | - | 2975 |
Each row corresponds to a chromosome. The dataset consists of all reads aligned to the chromosome. We report the results obtained by running the tools with maximum coverage 30 × for HapCHAT, 25 × for HapCol, 15 × and 20 × for WhatsHap. No maximum coverage was set for HapCUT2, ReFHap, ProbHap, and FastHare
List of SNV positions at which the adaptive procedure of subsection "Adaptive k-cMEC" was activated for the real Ashkenazim and simulated datasets of Chromosome 1
| Data (Chr. 1) | Avg. Cov. | 4 to 7 | 5 to 8 |
|---|---|---|---|
| Ashkenazim | Cov. 25 | | |
| Ashkenazim | Cov. 30 | | |
| Ashkenazim | Cov. 35 | 35556 | |
| Ashkenazim | Cov. 40 | | |
| Ashkenazim | Cov. 45 | 35581 | |
| Ashkenazim | Cov. 50 | 35593, 42897 | |
| Ashkenazim | Cov. 55 | 3528 | |
| Ashkenazim | Cov. 60 | 46338, 46339 | |
| Simulated | Cov. 25 | 35569, 38788 | 26778 |
| Simulated | Cov. 30 | 35594, 38815, 38817 | 26800 |
| Simulated | Cov. 35 | 38827 | 26811 |
| Simulated | Cov. 40 | 38837 | 38834, 38835, 38836 |
| Simulated | Cov. 45 | 38844 | 38842 |
| Simulated | Cov. 50 | 38849 | |
| Simulated | Cov. 55 | | |
| Simulated | Cov. 60 | | |
For each dataset, its row is identified by its average coverage (Avg. Cov.). The positions in column ‘4 to 7’ are those for which the number of corrections was increased from 4 to 7, and similarly for the column ‘5 to 8’
Comparison of the switch error positions on the Ashkenazim datasets of Chromosome 1 obtained with HapCHAT
| Ashkenazim | Cov. 25 | Cov. 30 | Cov. 35 | Cov. 40 | Cov. 45 | Cov. 50 | Cov. 55 |
|---|---|---|---|---|---|---|---|
| Cov. 30 | 75/2/4 | | | | | | |
| Cov. 35 | 72/4/7 | 74/2/3 | | | | | |
| Cov. 40 | 71/6/8 | 74/4/4 | 75/2/1 | | | | |
| Cov. 45 | 71/6/8 | 73/4/4 | 75/2/1 | 77/0/0 | | | |
| Cov. 50 | 70/7/9 | 72/5/5 | 73/4/3 | 75/2/2 | 75/2/2 | | |
| Cov. 55 | 71/6/8 | 73/4/4 | 73/4/3 | 75/2/2 | 75/2/2 | 75/2/2 | |
| Cov. 60 | 71/7/8 | 73/5/4 | 73/5/3 | 75/3/2 | 75/3/2 | 75/3/2 | 76/2/1 |
For each pair of datasets having different coverages, we report the number of positions at which a switch error occurred, in the following order: those in common between the two datasets, those found only in the dataset of the row, and those found only in the dataset of the column.
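The three counts in each cell of the table above amount to a set intersection and two set differences. A minimal Python sketch follows; the function names and the example positions are hypothetical and are not taken from the paper's data.

```python
from typing import Iterable, Tuple


def compare_switch_positions(row_positions: Iterable[int],
                             column_positions: Iterable[int]) -> Tuple[int, int, int]:
    """Return (common, only in row dataset, only in column dataset)."""
    row, column = set(row_positions), set(column_positions)
    common = len(row & column)        # switch errors shared by both datasets
    only_row = len(row - column)      # found only in the dataset of the row
    only_column = len(column - row)   # found only in the dataset of the column
    return common, only_row, only_column


def format_cell(counts: Tuple[int, int, int]) -> str:
    """Render the counts in the common/row-only/column-only style of the table."""
    return "/".join(str(c) for c in counts)


# Hypothetical switch error positions for two coverages
cov30 = [10, 25, 40, 77]
cov25 = [10, 25, 90]
print(format_cell(compare_switch_positions(cov30, cov25)))  # 2/2/1
```

A large first number relative to the other two indicates that most switch errors recur at the same positions regardless of coverage.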
Fig. 1 Switch error rate and Hamming distance as a function of running time, as achieved by HapCHAT and WhatsHap at different maximum coverages on the real Ashkenazim Chromosome 1 dataset. For each tool and each maximum coverage, one point is plotted for each of the 8 possible values of the average coverage.
Fig. 2 Quality measures on the real Ashkenazim Chromosome 1 dataset. Bar plots of the switch error percentage and QAN50 achieved by HapCHAT, WhatsHap, and HapCUT2 on the Ashkenazim Chromosome 1 dataset at different coverage values.
Fig. 3 Quality measures on the real NA12878 dataset. Bar plots of the switch error percentage and QAN50 achieved by HapCHAT, WhatsHap, and HapCUT2 on the different chromosome datasets of NA12878.