Jihua Wu, Guo-Bo Chen, Degui Zhi, Nianjun Liu, Kui Zhang.
Abstract
The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises when a specific KIR gene is present, since the exact copy number (one or two) of that gene is unknown. Haplotype inference for these genes is therefore more challenging because such a large portion of the information is missing. Meanwhile, many haplotypes and partial haplotype patterns have been previously identified owing to the tight linkage disequilibrium (LD) among these clustered genes, and they can be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM) based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). Through extensive simulations for KIR genes, we compared its performance with that of a previously developed expectation maximization (EM) based method in terms of haplotype assignments and haplotype frequency estimation. The simulation results showed that the new HMM based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns: an EM based method, HAPLORE, and an HMM based method, MaCH. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve the accuracy of haplotype inference. The new software package HaploHMM is available and can be downloaded at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html.
Keywords: Hidden Markov model; KIR genes; haplotype; haplotype inference; haplotype patterns
Year: 2014 PMID: 25161663 PMCID: PMC4129397 DOI: 10.3389/fgene.2014.00267
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
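Table 1. The 17 KIR gene haplotypes and their frequencies used in simulations.
| Haplotype | Presence (1) or absence (0) of each of the 14 KIR genes | Frequency |
|---|---|---|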
| 1 | 1 0 0 1 0 1 1 0 1 0 0 0 0 1 | 0.552 |
| 2 | 1 1 1 0 0 0 1 0 1 1 0 0 0 1 | 0.003 |
| 3 | 1 1 1 0 0 1 1 0 1 1 0 0 0 1 | 0.015 |
| 4 | 1 0 0 1 0 1 1 1 0 0 0 0 0 1 | 0.006 |
| 5 | 1 1 1 0 0 0 1 0 1 0 0 0 0 1 | 0.101 |
| 6 | 1 1 1 0 1 1 1 1 0 1 1 0 1 1 | 0.037 |
| 7 | 1 1 1 0 0 0 1 1 0 1 0 1 1 1 | 0.028 |
| 8 | 1 1 1 0 1 1 1 0 1 0 1 0 0 1 | 0.064 |
| 9 | 1 0 0 1 0 1 1 1 0 1 0 1 1 1 | 0.107 |
| 10 | 1 0 0 1 1 0 1 1 0 1 1 1 1 1 | 0.003 |
| 11 | 1 1 1 0 1 1 1 1 0 0 1 0 0 1 | 0.015 |
| 12 | 1 0 0 1 1 1 1 1 0 0 1 0 1 1 | 0.018 |
| 13 | 1 0 0 1 0 1 1 0 1 0 0 0 1 1 | 0.003 |
| 14 | 1 1 1 0 0 1 1 0 1 1 1 0 0 1 | 0.006 |
| 15 | 1 1 1 0 0 0 1 0 1 1 0 1 1 1 | 0.006 |
| 16 | 1 1 1 0 0 1 1 0 1 0 0 0 0 1 | 0.022 |
| 17 | 1 1 1 0 1 0 1 1 0 1 1 1 1 1 | 0.012 |
The 14 KIR genes, in column order, are 3DL3, 2DS2, 2DL2, 2DL3, 2DL5B, 2DL1, 2DL4, 3DS1, 3DL1, 2DL5A, 2DS3, 2DS5, 2DS1, and 3DL2. Among these haplotypes, the 10 most frequent (haplotypes 1, 3, 5, 6, 7, 8, 9, 11, 12, and 16) were selected as identified haplotypes in HaploHMM and HaploIHP.
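The ambiguity described in the abstract can be illustrated with a minimal sketch, assuming that a gene is scored as present whenever at least one of the two haplotypes carries it. The function names `observed_genotype` and `compatible_pairs` are hypothetical helpers introduced here for illustration and are not part of HaploHMM; only three of the haplotypes listed above are used.

```python
from itertools import combinations_with_replacement

# Three haplotypes copied from the table above (haplotype ID -> 14-gene 0/1 pattern).
HAPLOTYPES = {
    1:  (1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1),
    5:  (1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1),
    16: (1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1),
}

def observed_genotype(hap_a, hap_b):
    """Assumption: a gene is observed as present (1) if either haplotype carries it."""
    return tuple(a | b for a, b in zip(hap_a, hap_b))

def compatible_pairs(genotype, haplotypes):
    """Return unordered haplotype-ID pairs whose present/absent pattern equals genotype."""
    ids = sorted(haplotypes)
    return [
        (i, j)
        for i, j in combinations_with_replacement(ids, 2)
        if observed_genotype(haplotypes[i], haplotypes[j]) == genotype
    ]

# Present/absent data for an individual carrying haplotypes 1 and 5.
genotype = observed_genotype(HAPLOTYPES[1], HAPLOTYPES[5])
pairs = compatible_pairs(genotype, HAPLOTYPES)
print(pairs)
```

In this sketch the pairs (1, 5) and (1, 16) produce the same present/absent pattern, because haplotype 1 already carries 2DL1 and so hides whether the second haplotype carries it. This is the kind of unresolved copy number that haplotype inference has to handle.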
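Table 2. The different haplotype frequency distributions used in the simulation.
| Frequency of the most frequent haplotype | Standard deviation | Frequency accounted for by the identified haplotypes (%) | Frequency accounted for by the identified haplotypes when incorrect haplotypes are included (%) |
|---|---|---|---|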
| 0.552 | 0.131 | 96 | 66 |
| 0.502 | 0.119 | 93 | 61 |
| 0.452 | 0.105 | 89 | 56 |
| 0.402 | 0.093 | 86 | 51 |
| 0.352 | 0.080 | 82 | 46 |
| 0.302 | 0.068 | 79 | 41 |
| 0.252 | 0.055 | 75 | 36 |
| 0.202 | 0.044 | 72 | 31 |
| 0.152 | 0.033 | 68 | 26 |
| 0.102 | 0.024 | 65 | 21 |
We started with the original frequencies, in which the most frequent haplotype has a frequency of 55.2%, then gradually decreased the frequency of this major haplotype to 10.2% in steps of 5%, while increasing the frequency of each of the ten lowest-frequency haplotypes by 0.5% per step. For each distribution, the frequency of the most frequent haplotype (haplotype 1 in Table 1), the standard deviation, and the haplotype frequency accounted for by the identified haplotypes and by the incorrect haplotypes are listed. Because a total of 17 haplotypes were used in the simulation and their frequencies sum to 1, the mean frequency is 1/17, so the standard deviation multiplied by 17 equals the coefficient of variation (CV).
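The construction of these distributions can be sketched as follows, under a few assumptions that the source does not state explicitly: the "ten lowest-frequency haplotypes" are chosen once from the original Table 1 frequencies, and the standard deviation is the sample standard deviation (n - 1 denominator), which is the choice that reproduces the first listed value of 0.131. The names `shifted_frequencies` and `summarize` are hypothetical helpers.

```python
from statistics import mean, stdev

# Haplotype frequencies copied from Table 1 (haplotype ID -> frequency).
FREQS = {
    1: 0.552, 2: 0.003, 3: 0.015, 4: 0.006, 5: 0.101, 6: 0.037,
    7: 0.028, 8: 0.064, 9: 0.107, 10: 0.003, 11: 0.015, 12: 0.018,
    13: 0.003, 14: 0.006, 15: 0.006, 16: 0.022, 17: 0.012,
}
IDENTIFIED = (1, 3, 5, 6, 7, 8, 9, 11, 12, 16)  # identified haplotypes per the Table 1 note
MAJOR = max(FREQS, key=FREQS.get)               # haplotype 1
RAREST = sorted(FREQS, key=FREQS.get)[:10]      # assumption: fixed from the original frequencies

def shifted_frequencies(step):
    """Apply `step` rounds: major haplotype -5%, each of the ten rarest +0.5%."""
    shifted = dict(FREQS)
    shifted[MAJOR] -= 0.05 * step
    for hap in RAREST:
        shifted[hap] += 0.005 * step
    return shifted

def summarize(freqs):
    """Summary statistics for one frequency distribution."""
    values = list(freqs.values())
    sd = stdev(values)  # assumption: sample standard deviation
    return {
        "major": freqs[MAJOR],
        "sd": sd,
        "cv": sd / mean(values),  # equals 17 * sd when the frequencies sum to exactly 1
        "identified_pct": 100 * sum(freqs[h] for h in IDENTIFIED),
    }

rows = [summarize(shifted_frequencies(step)) for step in range(10)]
for row in rows:
    print(f"{row['major']:.3f}  {row['sd']:.3f}  {row['cv']:.2f}  {row['identified_pct']:.0f}")
```

The frequencies as printed in Table 1 sum to 0.998 rather than exactly 1, presumably because of rounding, so values computed this way may differ from the listed ones in the last digit, and `cv` is only approximately 17 times `sd`.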
Figure 1. Average values of six measures. The x-axis represents the standard deviation of haplotype frequencies used in simulations. Results were obtained when no incorrect haplotypes were included as identified haplotypes.
Figure 2. Average values of six measures. The x-axis represents the standard deviation of haplotype frequencies used in simulations. Results were obtained when two incorrect haplotypes were included as identified haplotypes.
Figure 3. Average values of SE with sample sizes of 50, 100, and 200 under the assumption of HWE, for different haplotype frequency distributions. The x-axis represents the standard deviation of haplotype frequencies used in simulations. (A–C) represent the results when no incorrect haplotypes were included as identified haplotypes, while (D–F) represent the results when some incorrect haplotypes were included as identified haplotypes.
Figure 4. Average values of IE with a sample size of 100 under HWE, excessive heterozygosity, and excessive homozygosity. The x-axis represents the standard deviation of haplotype frequencies used in simulations. (A–C) represent the results when no incorrect haplotypes were included as identified haplotypes, while (D–F) represent the results when some incorrect haplotypes were included as identified haplotypes.
Figure 5. Average values of six measures. The x-axis represents the standard deviation of haplotype frequencies used in simulations. Results were obtained when there was no missing data and no incorrect haplotypes were included as identified haplotypes.
Figure 6. Average values of six measures. The x-axis represents the standard deviation of haplotype frequencies used in simulations. Results were obtained when there was no missing data and two incorrect haplotypes were included as identified haplotypes.