Robert M Nowak, Rafał Płoski.
Abstract
BACKGROUND: Laboratory techniques used to determine haplotypes are often too expensive for large-scale studies, and the lack of phase information is commonly overcome using likelihood-based calculations. Although a number of programs are available for that purpose, none of them can handle loci with both multiple and null alleles.
Year: 2008 PMID: 18681957 PMCID: PMC2526998 DOI: 10.1186/1471-2105-9-330
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Assumed and estimated haplotype frequencies
| haplotype | assumed | Arlequin | PHASE | Haplo-IHP | NullHap |
| | 0.2 | 0.068 | 0.068 | 0.294 | 0.20 |
| | 0.2 | 0.172 | 0.172 | 0.294 | 0.20 |
| | 0.1 | 0.034 | 0.034 | 0.147 | 0.10 |
| | 0.02 | 0.038 | 0.038 | 0.029 | 0.02 |
| | 0.02 | 0.007 | 0.007 | 0.0 | 0.02 |
| | 0.02 | 0.017 | 0.017 | 0.0 | 0.02 |
| | 0.02 | 0.007 | 0.007 | 0.0 | 0.02 |
| | 0.02 | 0.017 | 0.017 | 0.0 | 0.02 |
| | 0.1 | 0.089 | 0.089 | 0.147 | 0.10 |
| | 0.02 | 0.125 | 0.125 | 0.029 | 0.02 |
| | 0.02 | 0.028 | 0.028 | 0.029 | 0.02 |
| | 0.02 | 0.042 | 0.042 | 0.029 | 0.02 |
| | 0.02 | 0.015 | 0.015 | 0.0 | 0.02 |
| | 0.02 | 0.035 | 0.035 | 0.0 | 0.02 |
| | 0.02 | 0.015 | 0.015 | 0.0 | 0.02 |
| | 0.02 | 0.035 | 0.035 | 0.0 | 0.02 |
| | 0.02 | 0.028 | 0.029 | 0.0 | 0.02 |
| | 0.02 | 0.078 | 0.078 | 0.0 | 0.02 |
| | 0.02 | 0.019 | 0.019 | 0.0 | 0.02 |
| | 0.02 | 0.039 | 0.039 | 0.0 | 0.02 |
| | 0.02 | 0.013 | 0.013 | 0.0 | 0.02 |
| | 0.02 | 0.033 | 0.033 | 0.0 | 0.02 |
| | 0.02 | 0.013 | 0.013 | 0.0 | 0.02 |
| | 0.02 | 0.033 | 0.033 | 0.0 | 0.02 |
| error | - | 79% | 79% | 82% | 0% |
The assumed and estimated haplotype frequencies for a polymorphism with 3 loci: A (multiallelic with null variant), B (multiallelic), C (biallelic with null variant).
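The error measure in the last row is not defined in this excerpt. As a rough check, the sketch below computes the mean relative error over the 24 haplotypes from the frequencies listed in the table. This choice of measure is an assumption: it gives 82% for Haplo-IHP and 0% for NullHap, matching the table, and comes within about a percentage point of the value reported for Arlequin. The function name `mean_relative_error` is illustrative only.

```python
# Frequencies transcribed from the table, in row order (24 haplotypes).
assumed = [0.2, 0.2, 0.1] + [0.02] * 5 + [0.1] + [0.02] * 15
arlequin = [
    0.068, 0.172, 0.034, 0.038, 0.007, 0.017, 0.007, 0.017,
    0.089, 0.125, 0.028, 0.042, 0.015, 0.035, 0.015, 0.035,
    0.028, 0.078, 0.019, 0.039, 0.013, 0.033, 0.013, 0.033,
]
haplo_ihp = (
    [0.294, 0.294, 0.147, 0.029] + [0.0] * 4
    + [0.147, 0.029, 0.029, 0.029] + [0.0] * 12
)
nullhap = list(assumed)  # the NullHap column equals the assumed column


def mean_relative_error(true_freqs, est_freqs):
    """Average of |estimate - true| / true over all haplotypes.

    Assumed error measure; the excerpt does not state the definition.
    """
    if len(true_freqs) != len(est_freqs):
        raise ValueError("frequency lists must have the same length")
    total = sum(abs(e - t) / t for t, e in zip(true_freqs, est_freqs))
    return total / len(true_freqs)


for name, est in [("Arlequin", arlequin), ("Haplo-IHP", haplo_ihp), ("NullHap", nullhap)]:
    print(f"{name}: {100 * mean_relative_error(assumed, est):.1f}%")
```

Because each term is divided by the true frequency, rare haplotypes (0.02) dominate this measure, which is why an estimate of 0.0 for a rare haplotype contributes a full 100% to the average.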
Haplotype estimation frequency error
| No | example description | Arlequin error | PHASE error | Haplo-IHP error | NullHap error |
| 1 | biallelic loci: A( | 0% | 0% | 0% | 0% |
| 2 | biallelic loci: A( | 61% | 50% | 1% | 0% |
| 3 | multiallelic loci: A( | 0% | 1% | 78% | 0% |
| 4 | multiallelic loci: A( | 62% | 62% | 100% | 0% |
| 5 | multiallelic and biallelic loci with null variants: A( | 62% | 48% | 64% | 0% |
| 6 | details in Table 2, A( | 79% | 79% | 82% | 0% |
Haplotype estimation frequency error for six polymorphisms with varying locus characteristics.
Figure 1. Effect of sample size on the accuracy of haplotype frequency estimation. Ten samples of 25, 50, 100, 200, 500, and 1000 individuals were generated from a population in HWE. The error is shown as a function of sample size. The haplotype distributions are those of example 5 (red) and example 6 (blue) in Table 2.
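The excerpt does not describe how the samples were generated. A minimal sketch of one common approach, assuming that under HWE each individual carries two haplotypes drawn independently from the population frequencies, is given below. The haplotype labels, the frequencies, and the helper names `sample_individuals` and `counted_frequencies` are hypothetical.

```python
import random
from collections import Counter


def sample_individuals(hap_freqs, n_individuals, rng):
    """Draw n individuals under HWE: two independent haplotype draws each."""
    haplotypes = list(hap_freqs)
    weights = [hap_freqs[h] for h in haplotypes]
    return [
        tuple(rng.choices(haplotypes, weights=weights, k=2))
        for _ in range(n_individuals)
    ]


def counted_frequencies(individuals):
    """Haplotype frequencies by direct counting (phase is known in a simulation)."""
    counts = Counter(h for pair in individuals for h in pair)
    total = 2 * len(individuals)
    return {h: c / total for h, c in counts.items()}


# Hypothetical population frequencies, for illustration only.
population = {"A0B0": 0.4, "A1B0": 0.3, "A1B1": 0.2, "A2B1": 0.1}
rng = random.Random(0)

for size in (25, 50, 100, 200, 500, 1000):
    sample = sample_individuals(population, size, rng)
    freqs = counted_frequencies(sample)
    print(size, {h: round(freqs.get(h, 0.0), 3) for h in population})
```

The counted frequencies here use the known phase of the simulated haplotypes, so they show only sampling noise. An estimation program sees unphased genotypes and has to infer the haplotypes, which adds error on top of that noise.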
Figure 2. Effect of HWE violation on the accuracy of the algorithm. The figure shows the error as a function of the inbreeding coefficient f for two polymorphisms characterized in Table 2 (example 5, red line; example 6, blue line).
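The excerpt does not say how the inbreeding coefficient enters the simulated genotype frequencies. Under the standard population-genetics parameterization, assumed here, a haplotype pair with population frequencies $p_i$ and $p_j$ occurs with the probabilities below, and setting $f = 0$ recovers HWE:

```latex
\begin{aligned}
P(h_i h_i) &= p_i^2 + f\,p_i\,(1 - p_i), \\
P(h_i h_j) &= 2\,p_i\,p_j\,(1 - f), \qquad i \neq j .
\end{aligned}
```

Larger values of $f$ shift probability from heterozygous to homozygous pairs while leaving the haplotype frequencies $p_i$ unchanged.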
Figure 3. Effect of haplotype frequency on the estimation error. Ten samples of 1000 individuals were generated for a population in HWE, for a 2-locus polymorphism: A with variants A0, A1, A2 and B with variants B0, B1. The graph shows the error of the haplotype frequency estimate as a function of the assumed frequency of that haplotype.
Computational time comparison
| loci | number of haplotypes | observations | null alleles | Arlequin time | PHASE time | Haplo-IHP time | NullHap time |
| 2 | 6 | 100 | no | 0.13 s | 46 s | 0.5 s | 0.07 s |
| 2 | 9 | 100 | no | 0.06 s | 47 s | 0.15 s | 0.04 s |
| 3 | 8 | 100 | no | 0.04 s | 69 s | 0.58 s | 0.02 s |
| 2 | 504 | 99 | no | 0.22 s | 53 s | - | 37 s |
| 2 | 540 | 99 | no | 0.34 s | 58 s | - | 39 s |
| 5 | 32 | 200 | yes | - | - | 14 s | 0.78 s |
| 7 | 128 | 200 | yes | - | - | 145 s | 13 s |
| 8 | 256 | 200 | yes | - | - | 450 s | 61 s |
| 9 | 512 | 200 | yes | - | - | 1300 s (8 s) | 209 s |
| 10 | 1024 | 200 | yes | - | - | 3 h (8 s) | 2300 s |
| 11 | 2048 | 200 | yes | - | - | 24 h (10 s) | 3 h |
| 15 | 32768 | 100 | yes | - | - | - | 48 h |
Computational time for the considered applications (Haplo-IHP times with the greedy algorithm are given in parentheses). Results are presented only for applications able to handle the given polymorphism; otherwise '-' is shown.
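In the rows with null alleles, the number of haplotypes equals 2 raised to the number of loci, which is what two variants per locus would give. That reading is an inference from the table, not something this excerpt states. The sketch below computes the number of possible haplotypes as the product of the variant counts per locus; the function name `haplotype_count` is illustrative.

```python
import math


def haplotype_count(variants_per_locus):
    """Number of possible haplotypes: product of variant counts over loci."""
    return math.prod(variants_per_locus)


# Assumed reading of the null-allele rows: two variants at every locus.
for n_loci in (5, 7, 8, 9, 10, 11, 15):
    print(n_loci, haplotype_count([2] * n_loci))
```

The count doubles with every added two-variant locus, which is consistent with the steep growth in running time across those rows.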
The distribution of KIR haplotypes
| Haplotype # | DS2 | DL2/3 | DS3/5 | DL1 | DS1 | DS4/1D | Psoriasis N = 116 (%) | Controls N = 123 (%) | OR | P value* |
| 1 | null | 3 | null | 1 | 1 | 1D | 20 (17) | 0 | 52.5 | 0.00018 |
| 2 | 1 | 2 | 3 | null | 1 | 1D | 11 (9.6) | 0 | 27 | 0.0058 |
| 3 | null | 3 | 3 | null | null | 1D | 6 (5.3) | 2 (1.5) | 2.9 | NS |
| 4 | null | 3 | 5 | 1 | 1 | null | 6 (5.2) | 7 (5.6) | 0.9 | NS |
| 5 | null | 3 | 3 | 1 | null | 1D | 15 (13) | 30 (24) | 0.5 | NS |
| 6 | 1 | 2 | 5 | null | 1 | null | 3 (2.5) | 7 (6.4) | 0.5 | NS |
| 7 | null | 3 | null | 1 | null | 1D | 0 | 16 (13) | 0.03 | 0.00018 |
| 8 | 1 | 2 | - | null | 1 | 1D | 17 (15) | 0 | 43.4 | 0.00018 |
| 9 | null | 3 | - | 1 | 1 | 1D | 16 (14) | 0 | 40.6 | 0.00018 |
| 10 | null | 3 | - | 1 | 1 | DS4 | 6 (5.3) | 0 | 14.5 | NS |
| 11 | 1 | 2 | - | 1 | null | 1D | 7 (6) | 3 (2.3) | 2.4 | NS |
| 12 | null | 3 | - | 1 | 1 | null | 19 (16) | 14 (11) | 1.6 | NS |
| 13 | null | 3 | - | null | null | 1D | 6 (5) | 7 (5.7) | 0.9 | NS |
| 14 | null | 3 | - | 1 | null | 1D | 19 (16) | 44 (36) | 0.4 | NS |
| 15 | 1 | 2 | - | null | null | DS4 | 3 (2.4) | 8 (6.6) | 0.4 | NS |
| 16 | 1 | 2 | - | null | null | 1D | 0 | 7 (5.6) | 0.07 | NS |
| 17 | 1 | 2 | - | 1 | 1 | null | 0 | 7 (5.6) | 0.07 | NS |
| 18 | null | 3 | - | 1 | null | DS4 | 0 | 7 (5.6) | 0.07 | NS |
*with Bonferroni correction (correction factor = 18)
The distribution of KIR haplotypes among psoriasis patients and controls obtained with NullHap based on genotypes reported by Luszczek et al. [16]. Only haplotypes with frequency > 5% in either group are shown. The odds ratio (OR) was calculated according to Haldane [22], and the P value by the Fisher exact test. NS: not significant.
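The Haldane correction is commonly understood as adding 0.5 to every cell of the 2×2 table, which keeps the odds ratio finite when a cell is zero. The sketch below applies that form, taking the column totals N = 116 and N = 123 as the group sizes. This setup is an assumption, but it reproduces the OR values shown for haplotypes 1, 7, and 8. The function name `haldane_odds_ratio` is illustrative.

```python
def haldane_odds_ratio(cases_with, cases_total, controls_with, controls_total):
    """Odds ratio with 0.5 added to each cell of the 2x2 table."""
    a = cases_with + 0.5                      # cases carrying the haplotype
    b = cases_total - cases_with + 0.5        # cases without it
    c = controls_with + 0.5                   # controls carrying the haplotype
    d = controls_total - controls_with + 0.5  # controls without it
    return (a * d) / (b * c)


# Counts taken from the table: (haplotype #, psoriasis count, control count).
rows = [(1, 20, 0), (7, 0, 16), (8, 17, 0)]
for number, n_cases, n_controls in rows:
    odds_ratio = haldane_odds_ratio(n_cases, 116, n_controls, 123)
    print(f"haplotype {number}: OR = {odds_ratio:.2f}")
```

Without the correction, haplotypes absent from one group would give an odds ratio of zero or infinity, so the adjusted values are best read as finite approximations, not exact effect sizes.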