Bruno F Bettencourt, Margarida R Santos, Raquel N Fialho, Ana R Couto, Maria J Peixoto, João P Pinheiro, Hélder Spínola, Marian G Mora, Cristina Santos, António Brehm, Jácome Bruges-Armas.
Abstract
BACKGROUND: HLA haplotype analysis has been used in population genetics and in the investigation of disease-susceptibility loci, because HLA is highly polymorphic. Several methods for inferring haplotypes from genotypic data have been proposed, but it is unclear how accurate each method is or which one is superior. The accuracy of two of the leading methods of computational haplotype inference, one based on the Expectation-Maximization algorithm (implemented in Arlequin V3.0) and one based on a Bayesian algorithm (implemented in PHASE V2.1.1), was compared using a set of 122 HLA haplotypes (A-B-Cw-DQB1-DRB1) determined through direct counting. Accuracy was measured with the Mean Squared Error (MSE), the Similarity Index (IF) and the Haplotype Identification Index (IH).
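To make the Expectation-Maximization idea concrete, the following is a minimal two-locus sketch in Python. It is not Arlequin's implementation: the function name `em_two_locus`, the genotype encoding, and the three-individual toy sample are all illustrative assumptions, and the real software handles more loci, missing data and convergence checks.

```python
from collections import defaultdict


def resolutions(genotype):
    """Return the distinct haplotype pairs compatible with an unphased
    two-locus genotype ((a1, a2), (b1, b2))."""
    (a1, a2), (b1, b2) = genotype
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(pairs)


def em_two_locus(genotypes, n_iter=100):
    """Estimate two-locus haplotype frequencies by EM (illustrative sketch)."""
    # Start from uniform frequencies over every haplotype that could occur.
    haps = {h for g in genotypes for pair in resolutions(g) for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}

    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            res = resolutions(g)
            # E-step: weight each phase resolution by the product of its
            # current haplotype frequencies.
            weights = [freq[h1] * freq[h2] for h1, h2 in res]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected counts divided by the number of chromosomes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return freq


# Hypothetical sample: two homozygotes fix the phase, one double
# heterozygote is ambiguous.
sample = [
    (("A1", "A1"), ("B1", "B1")),
    (("A2", "A2"), ("B2", "B2")),
    (("A1", "A2"), ("B1", "B2")),
]
freqs = em_two_locus(sample)
```

In this toy sample the unambiguous homozygotes pull the ambiguous individual towards the A1-B1 / A2-B2 resolution, which is the behaviour that makes EM estimates sensitive to sample size and to the number of loci.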
Year: 2008 PMID: 18230173 PMCID: PMC2268655 DOI: 10.1186/1471-2105-9-68
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Pairs of alleles at different HLA loci in complete linkage disequilibrium (|D'| = 1) (n = 61 individuals)
| A*01-B*37 | DR*01-DQB*04 | A*31-DQB*05 |
| A*02-B*41 | DR*03-DQB*02 | A*31-DR*01 |
| A*02-B*58 | DR*04-DQB*03 | A*33-DQB*03 |
| A*03-B*49 | DR*07-DQB*02 | A*33-DR*11 |
| A*03-B*56 | DR*09-DQB*03 | B*40-DQB*06 |
| A*11-B*40 | DR*10-DQB*05 | B*40-DR*13 |
| A*23-B*45 | DR*11-DQB*03 | B*41-DQB*03 |
| A*26-B*52 | DR*12-DQB*03 | B*41-DR*13 |
| A*29-Cw*16 | B*45-DQB*02 | |
| A*33-B*18 | B*45-DR*07 | |
| A*33-Cw*05 | B*49-DQB*03 | |
| B*07-Cw*07 | B*49-DR*13 | |
| B*14-Cw*08 | B*51-DR*09 | |
| B*15-Cw*03 | B*52-DQB*06 | |
| B*37-Cw*06 | B*52-DR*15 | |
| B*40-Cw*02 | B*56-DQB*05 | |
| B*41-Cw*07 | B*56-DR*01 | |
| B*45-Cw*06 | B*57-DQB*03 | |
| B*49-Cw*07 | B*57-DR*04 | |
| B*51-Cw*14 | B*58-DQB*03 | |
| B*52-Cw*12 | B*58-DR*04 | |
| B*56-Cw*01 | Cw*12-DQB*06 | |
| B*57-Cw*06 | | |
| B*58-Cw*07 | | |
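For readers unfamiliar with the |D'| = 1 criterion used in this table, the sketch below computes the standard normalized disequilibrium coefficient D' for one allele pair. The function name `d_prime` and the counts in the example are hypothetical and are not taken from the study data.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' for alleles a and b.

    p_ab: frequency of the haplotype carrying both alleles
    p_a, p_b: frequencies of each allele at its own locus
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Guard against monomorphic loci, where D' is undefined.
    return d / d_max if d_max > 0 else 0.0


# Hypothetical counts among 122 haplotypes: allele a occurs twice,
# allele b five times, and every copy of a sits on a haplotype with b.
n = 122
complete_ld = d_prime(p_ab=2 / n, p_a=2 / n, p_b=5 / n)

# Hypothetical independent alleles: the haplotype frequency equals
# the product of the allele frequencies.
no_ld = d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)
```

With a sample of 61 individuals, a rare allele seen only a couple of times can reach |D'| = 1 with another allele simply because it was never observed on a different background, so pairs in this list are not necessarily strongly associated in the wider population.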
HLA Class II haplotype (2 loci) frequencies above 0.01 determined by direct counting and computational methods (n = 61 individuals)
| No. | Haplotype | Direct counting | Arlequin V3.0 | PHASE v2.1.1 |
|---|---|---|---|---|
| 1 | DQB1*03-DRB1*04 | 0.164 | 0.164 | 0.163 |
| 2 | DQB1*02-DRB1*07 | 0.148 | 0.148 | 0.146 |
| 3 | DQB1*05-DRB1*01 | 0.139 | 0.139 | 0.139 |
| 4 | DQB1*03-DRB1*11 | 0.123 | 0.123 | 0.122 |
| 5 | DQB1*06-DRB1*15 | 0.074 | 0.074 | 0.074 |
| 6 | DQB1*06-DRB1*13 | 0.074 | 0.073 | 0.072 |
| 7 | DQB1*02-DRB1*03 | 0.074 | 0.074 | 0.073 |
| 8 | DQB1*04-DRB1*08 | 0.041 | 0.041 | 0.041 |
| 9 | DQB1*05-DRB1*15 | 0.025 | 0.025 | 0.025 |
| 10 | DQB1*03-DRB1*13 | 0.025 | 0.033 | 0.031 |
| 11 | DQB1*05-DRB1*10 | 0.025 | 0.025 | 0.024 |
| 12 | DQB1*05-DRB1*14 | 0.016 | 0.016 | 0.014 |
| 13 | DQB1*02-DRB1*13 | 0.016 | 0.008 | 0.010 |
| 14 | DQB1*03-DRB1*12 | 0.016 | 0.016 | 0.016 |
| 15 | DQB1*03-DRB1*09 | 0.016 | 0.016 | 0.016 |
| | IF | | 0.992 | 0.989 |
| | IH | | 1.000 | 1.000 |
| | MSE | | 7.3E-06 | 5.2E-06 |
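The similarity index and mean squared error can be checked against the Class II rows listed above. The sketch below assumes the commonly used definitions IF = 1 − ½ Σ|estimated − true| and MSE = mean of (estimated − true)², which the source does not spell out. It uses only the 15 haplotypes with a direct-counting frequency above 0.01, and the names `est_first` and `est_second` simply refer to the two computational columns in the order they appear.

```python
# Frequencies copied from the 15 Class II rows of the table.
direct = [0.164, 0.148, 0.139, 0.123, 0.074, 0.074, 0.074, 0.041,
          0.025, 0.025, 0.025, 0.016, 0.016, 0.016, 0.016]
est_first = [0.164, 0.148, 0.139, 0.123, 0.074, 0.073, 0.074, 0.041,
             0.025, 0.033, 0.025, 0.016, 0.008, 0.016, 0.016]
est_second = [0.163, 0.146, 0.139, 0.122, 0.074, 0.072, 0.073, 0.041,
              0.025, 0.031, 0.024, 0.014, 0.010, 0.016, 0.016]


def similarity_index(true, est):
    """Assumed definition: 1 minus half the summed absolute differences."""
    return 1.0 - 0.5 * sum(abs(e - t) for t, e in zip(true, est))


def mean_squared_error(true, est):
    """Assumed definition: average squared difference over listed haplotypes."""
    return sum((e - t) ** 2 for t, e in zip(true, est)) / len(true)


if_first = similarity_index(direct, est_first)
if_second = similarity_index(direct, est_second)
mse_first = mean_squared_error(direct, est_first)
mse_second = mean_squared_error(direct, est_second)
```

Under these assumptions the similarity index comes out at about 0.9915 and 0.989, in line with the reported 0.992 and 0.989. The MSE values (about 8.6E-06 and 5.9E-06) are somewhat higher than the reported ones, which would be expected if the published figures average over all observed haplotypes rather than only those listed here.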
HLA Class I haplotype (3 loci) frequencies above 0.01 determined by direct counting and computational methods (n = 61 individuals)
| No. | Haplotype | Direct counting | Arlequin V3.0 | PHASE v2.1.1 |
|---|---|---|---|---|
| 1 | A*24-B*27-Cw*02 | 0.074 | 0.096 | 0.053 |
| 2 | A*02-B*27-Cw*01 | 0.049 | 0.074 | 0.039 |
| 3 | A*02-B*27-Cw*02 | 0.033 | 0.034 | 0.031 |
| 4 | A*03-B*27-Cw*02 | 0.033 | 0.041 | 0.040 |
| 5 | A*03-B*35-Cw*04 | 0.033 | 0.041 | 0.033 |
| 6 | A*02-B*15-Cw*02 | 0.025 | 0.010 | 0.024 |
| 7 | A*11-B*35-Cw*04 | 0.025 | 0.033 | - |
| 8 | A*24-B*27-Cw*01 | 0.025 | 0.018 | 0.018 |
| 9 | A*26-B*27-Cw*02 | 0.025 | 0.016 | 0.024 |
| 10 | A*29-B*44-Cw*16 | 0.025 | 0.025 | 0.024 |
| 11 | A*01-B*37-Cw*06 | 0.016 | 0.016 | 0.016 |
| 12 | A*02-B*07-Cw*07 | 0.016 | 0.032 | 0.025 |
| 13 | A*02-B*27-Cw*06 | 0.016 | 0.016 | 0.016 |
| 14 | A*02-B*44-Cw*05 | 0.016 | 0.008 | 0.010 |
| 15 | A*03-B*07-Cw*07 | 0.016 | - | 0.016 |
| 16 | A*11-B*27-Cw*01 | 0.016 | - | 0.007 |
| 17 | A*23-B*44-Cw*04 | 0.016 | 0.016 | 0.016 |
| 18 | A*24-B*07-Cw*07 | 0.016 | 0.009 | - |
| 19 | A*26-B*52-Cw*12 | 0.016 | 0.016 | 0.016 |
| 20 | A*30-B*35-Cw*06 | 0.016 | 0.016 | 0.016 |
| 21 | A*32-B*27-Cw*01 | 0.016 | - | 0.016 |
| | IF | | 0.906 | 0.944 |
| | IH | | 0.923 | 0.950 |
| | MSE | | 3.7E-05 | 2.2E-05 |
| | Haplotypes above 0.01 in computational estimates only | | | |
| 1 | A*11-B*14-Cw*08 | | 0.008 | 0.015 |
| 2 | A*32-B*27-Cw*07 | | 0.016 | - |
| 3 | A*02-B*18-Cw*01 | | - | 0.011 |
| 4 | A*29-B*27-Cw*02 | | - | 0.017 |
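The source does not state the formula for the haplotype identification index. One form that reproduces the tabulated IH values is 2·k_found / (k_true + k_found), where k_true is the number of haplotypes listed by direct counting and k_found is the number of those for which a method reports an estimate. The sketch below applies that assumed form to the 21 Class I rows; `None` stands for a "-" entry, and `est_first` and `est_second` are the two computational columns in the order they appear.

```python
# Estimates for the 21 Class I haplotypes found by direct counting.
# None marks a haplotype the method did not report ("-" in the table).
est_first = [0.096, 0.074, 0.034, 0.041, 0.041, 0.010, 0.033, 0.018,
             0.016, 0.025, 0.016, 0.032, 0.016, 0.008, None, None,
             0.016, 0.009, 0.016, 0.016, None]
est_second = [0.053, 0.039, 0.031, 0.040, 0.033, 0.024, None, 0.018,
              0.024, 0.024, 0.016, 0.025, 0.016, 0.010, 0.016, 0.007,
              0.016, None, 0.016, 0.016, 0.016]


def identification_index(estimates):
    """Assumed form of IH: 2 * k_found / (k_true + k_found)."""
    k_true = len(estimates)
    k_found = sum(e is not None for e in estimates)
    return 2 * k_found / (k_true + k_found)


ih_first = identification_index(est_first)
ih_second = identification_index(est_second)
```

For the Class I rows this gives 36/39 (about 0.923) and 38/40 (0.950), matching the reported values. The same form also matches the Class II and extended-haplotype tables, but it remains an inference from the numbers rather than a definition given in the source.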
Extended HLA haplotype (5 loci) frequencies above 0.01 determined by direct counting and computational methods (n = 61 individuals)
| No. | Haplotype | Direct counting | Arlequin V3.0 | PHASE v2.1.1 |
|---|---|---|---|---|
| 1 | A*24-B*27-Cw*02-DQB1*03-DRB1*04 | 0.033 | 0.041 | 0.036 |
| 2 | A*02-B*27-Cw*01-DQB1*05-DRB1*01 | 0.033 | 0.049 | 0.040 |
| 3 | A*24-B*27-Cw*01-DQB1*05-DRB1*15 | 0.025 | 0.025 | 0.020 |
| 4 | A*24-B*27-Cw*02-DQB1*02-DRB1*03 | 0.025 | 0.033 | 0.013 |
| 5 | A*03-B*27-Cw*02-DQB1*03-DRB1*04 | 0.025 | 0.025 | 0.018 |
| 6 | A*26-B*27-Cw*02-DQB1*03-DRB1*11 | 0.025 | 0.025 | 0.021 |
| 7 | A*26-B*52-Cw*12-DQB1*06-DRB1*15 | 0.016 | 0.016 | 0.015 |
| 8 | A*29-B*44-Cw*16-DQB1*02-DRB1*07 | 0.016 | 0.008 | 0.006 |
| 9 | A*30-B*35-Cw*06-DQB1*04-DRB1*08 | 0.016 | 0.016 | - |
| 10 | A*24-B*07-Cw*07-DQB1*06-DRB1*15 | 0.016 | - | - |
| 11 | A*23-B*44-Cw*04-DQB1*02-DRB1*07 | 0.016 | 0.016 | 0.016 |
| 12 | A*11-B*27-Cw*01-DQB1*05-DRB1*01 | 0.016 | - | 0.005 |
| 13 | A*02-B*27-Cw*02-DQB1*03-DRB1*04 | 0.016 | - | 0.014 |
| 14 | A*02-B*15-Cw*02-DQB1*05-DRB1*10 | 0.016 | 0.016 | 0.016 |
| | IF | | 0.955 | 0.952 |
| | IH | | 0.880 | 0.923 |
| | MSE | | 1.3E-05 | 1.1E-05 |
| | Haplotypes above 0.01 in computational estimates only | | | |
| 1 | A*01-B*08-Cw*07-DQB1*02-DRB1*03 | | - | 0.015 |
| 2 | A*02-B*07-Cw*02-DQB1*03-DRB1*04 | | 0.016 | - |
| 3 | A*02-B*15-Cw*03-DQB1*03-DRB1*01 | | 0.016 | - |
| 4 | A*02-B*27-Cw*01-DQB1*02-DRB1*07 | | - | 0.011 |
| 5 | A*02-B*27-Cw*07-DQB1*02-DRB1*03 | | 0.016 | - |
| 6 | A*03-B*27-Cw*01-DQB1*05-DRB1*01 | | - | 0.015 |
| 7 | A*03-B*27-Cw*02-DQB1*02-DRB1*07 | | 0.016 | 0.013 |
| 8 | A*11-B*14-Cw*08-DQB1*02-DRB1*07 | | 0.016 | - |
| 9 | A*24-B*35-Cw*04-DQB1*05-DRB1*01 | | - | 0.020 |
| 10 | A*29-B*27-Cw*02-DQB1*05-DRB1*01 | | 0.016 | - |
| 11 | A*29-B*27-Cw*16-DQB1*02-DRB1*07 | | 0.016 | 0.009 |
Figure 1. Influence of the number of loci on haplotype frequency estimation. A: Similarity index between the frequencies obtained by the computer packages and the real haplotype frequencies. B: Comparison between the number of different haplotypes identified by the computer packages and the number of different haplotypes obtained by segregation study. C: Overall difference in haplotype frequencies between estimated and true values. The two-locus haplotypes were composed of Class II alleles (DQB1-DRB1), the three-locus haplotypes of Class I alleles (A-B-Cw), and the five-locus haplotypes were the extended haplotypes (A-B-Cw-DQB1-DRB1), all with a sample size of n = 61 individuals. Unbroken line: Arlequin V3.0 compared with real data; dotted line: PHASE v2.1.1 compared with real data.
Figure 2. Influence of sample size on haplotype frequency estimation. A: Similarity index between the frequencies estimated by the computer packages and the real haplotype frequencies. B: Comparison between the number of different haplotypes estimated by the computer packages and the number of haplotypes obtained by segregation study. C: Overall difference in haplotype frequencies between estimated and true values. Unbroken line: Arlequin V3.0 compared with real data; dotted line: PHASE v2.1.1 compared with real data.