S-S Khor, W Yang, M Kawashima, S Kamitsuji, X Zheng, N Nishida, H Sawai, H Toyoda, T Miyagawa, M Honda, N Kamatani, K Tokunaga.
Abstract
Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG enables individual researchers to build custom classifiers. Here, two data sets of different sample sizes, each comprising data from healthy Japanese individuals, were used to build custom classifiers. HLA imputation accuracy at five HLA loci (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets.
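The trade-off behind a call threshold can be illustrated with a small sketch. It assumes that each imputed genotype carries a posterior probability, that predictions below the threshold are left uncalled, and that accuracy is counted per allele among called samples only; these conventions, the names `Prediction`, `correct_alleles` and `evaluate`, and the toy genotypes are illustrative assumptions, not taken from the paper or from the HIBAG package.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    predicted: Tuple[str, str]  # imputed allele pair
    truth: Tuple[str, str]      # laboratory-typed allele pair
    prob: float                 # posterior probability of the imputed pair


def correct_alleles(predicted, truth):
    """Count alleles (0, 1 or 2) that match the typed genotype, ignoring order."""
    remaining = list(truth)
    hits = 0
    for allele in predicted:
        if allele in remaining:
            remaining.remove(allele)
            hits += 1
    return hits


def evaluate(predictions: List[Prediction], call_threshold: float):
    """Return (call_rate, accuracy) for one call threshold.

    Assumption: a prediction is "called" when prob >= call_threshold,
    and accuracy is the fraction of correct alleles among called samples.
    """
    called = [p for p in predictions if p.prob >= call_threshold]
    call_rate = len(called) / len(predictions)
    if not called:
        return call_rate, float("nan")
    n_correct = sum(correct_alleles(p.predicted, p.truth) for p in called)
    return call_rate, n_correct / (2 * len(called))


# Toy genotypes and probabilities, invented for illustration only.
toy = [
    Prediction(("DRB1*15:01", "DRB1*04:05"), ("DRB1*15:01", "DRB1*04:05"), 0.95),
    Prediction(("DRB1*09:01", "DRB1*09:01"), ("DRB1*09:01", "DRB1*09:01"), 0.80),
    Prediction(("DRB1*15:01", "DRB1*08:03"), ("DRB1*15:02", "DRB1*08:03"), 0.35),
    Prediction(("DRB1*04:05", "DRB1*13:02"), ("DRB1*04:05", "DRB1*13:02"), 0.45),
]

for ct in (0.0, 0.4, 0.5):
    rate, acc = evaluate(toy, ct)
    print(f"CT={ct}: call rate {rate:.2f}, accuracy {acc:.3f}")
```

On this toy input, raising the threshold from 0 to 0.4 lowers the call rate from 1.00 to 0.75 while accuracy rises from 0.875 to 1.000; moving to 0.5 discards one more correct call without any further gain in accuracy.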
Year: 2015 PMID: 25707395 PMCID: PMC4762906 DOI: 10.1038/tpj.2015.4
Source DB: PubMed Journal: Pharmacogenomics J ISSN: 1470-269X Impact factor: 3.550
Numbers of individuals and numbers of unique HLA alleles represented in each data set
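| Data set | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 |
|---|---|---|---|---|---|---|
| No. of individuals | | | | | | |
| THC (platform 1) | 415 | 415 | NA | 415 | 415 | 415 |
| THC (platform 2) | 416 | 416 | NA | 416 | 416 | 416 |
| THC (platform 3) | 418 | 418 | NA | 418 | 418 | 418 |
| JPDSC | 2994 | 2994 | 2994 | 2994 | NA | 2994 |
| Narcolepsy with cataplexy | NA | 398 | NA | 86 | 398 | 398 |
| HIBAG Asian | 606 | 713 | 611 | 696 | 612 | 527 |
| HIBAG multi-ethnic | 2901 | 3886 | 2916 | 3713 | 2985 | 2489 |
| No. of unique HLA alleles | | | | | | |
| THC | 17 | 33 | NA | 27 | 14 | 12 |
| JPDSC | 23 | 50 | 19 | 34 | NA | 18 |
| Narcolepsy with cataplexy | NA | 29 | NA | 20 | 13 | 15 |
| HIBAG Asian | 43 | 72 | 34 | 49 | 19 | 29 |
| HIBAG multi-ethnic | 83 | 142 | 49 | 79 | 27 | 49 |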
Abbreviations: HLA, human leukocyte antigen; JPDSC, Japan PGx Data Science Consortium; THC, Tokyo Healthy Control.
Figure 1. Schematic workflow for evaluation of the HIBAG R package.
Summary of two-field prediction accuracies (call rates) based on published HIBAG Asian or HIBAG multi-ethnic parameters as references when THC data were used as the independent Japanese validation set
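| | HLA-A | HLA-B | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 |
|---|---|---|---|---|---|
| HIBAG Asian reference | | | | | |
| No. of SNPs | 319 | 321 | 256 | 276 | 226 |
| No. of training samples | 606 | 713 | 696 | 612 | 527 |
| No. of missing SNPs (%) | 80 (25.1) | 74 (23.1) | 61 (23.8) | 66 (23.9) | 56 (24.8) |
| Accuracy (call rate) | 91.3 (100) | 90.5 (100) | 91.7 (100) | 96.7 (100) | 96.9 (100) |
| HIBAG Asian reference | | | | | |
| No. of SNPs | 710 | 531 | 714 | 805 | 526 |
| No. of training samples | 606 | 713 | 696 | 612 | 527 |
| No. of missing SNPs (%) | 162 (22.8) | 105 (19.8) | 144 (20.2) | 152 (18.9) | 109 (20.7) |
| Accuracy (call rate) | 93.4 (100) | 91.1 (100) | 92.2 (100) | 98.6 (100) | 96.8 (100) |
| HIBAG Asian reference | | | | | |
| No. of SNPs | 983 | 986 | 922 | 1041 | 707 |
| No. of training samples | 606 | 713 | 696 | 612 | 527 |
| No. of missing SNPs (%) | 176 (17.9) | 200 (20.3) | 176 (19.1) | 178 (17.1) | 169 (23.9) |
| Accuracy (call rate) | 93.3 (100) | 91.4 (100) | 96.2 (100) | 98.7 (100) | 92.3 (100) |
| HIBAG multi-ethnic reference | | | | | |
| No. of SNPs | 297 | 293 | 219 | 249 | 207 |
| No. of training samples | 2901 | 3886 | 3713 | 2985 | 2489 |
| No. of missing SNPs (%) | 84 (28.3) | 69 (23.5) | 60 (27.4) | 67 (26.9) | 56 (27.1) |
| Accuracy (call rate) | 89.8 (100) | 88.9 (100) | 82.5 (100) | 92.5 (100) | 96.7 (100) |
| HIBAG multi-ethnic reference | | | | | |
| No. of SNPs | 649 | 489 | 631 | 692 | 435 |
| No. of training samples | 2901 | 3886 | 3713 | 2985 | 2489 |
| No. of missing SNPs (%) | 154 (23.7) | 100 (20.4) | 136 (21.6) | 138 (19.9) | 99 (22.8) |
| Accuracy (call rate) | 93.3 (100) | 90.0 (100) | 88.6 (100) | 98.4 (100) | 96.4 (100) |
| HIBAG multi-ethnic reference | | | | | |
| No. of SNPs | 910 | 915 | 861 | 906 | 573 |
| No. of training samples | 2901 | 3886 | 3713 | 2985 | 2489 |
| No. of missing SNPs (%) | 178 (19.6) | 179 (19.6) | 186 (21.6) | 178 (19.6) | 166 (29.0) |
| Accuracy (call rate) | 93.2 (100) | 90.1 (100) | 88.9 (100) | 98.8 (100) | 97.3 (100) |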
Abbreviation: SNP, single nucleotide polymorphism.
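The parenthesised percentages next to the missing-SNP counts can be checked with simple arithmetic. The sketch below assumes that each percentage is the missing-SNP count divided by the number of SNPs in the classifier, rounded to one decimal; the function name `missing_snp_percent` is hypothetical, and the numbers are the first block of values in the table above.

```python
def missing_snp_percent(n_missing: int, n_classifier_snps: int) -> float:
    """Percentage of classifier SNPs that are missing, to one decimal place."""
    return round(100.0 * n_missing / n_classifier_snps, 1)


# First block of the table above: SNP counts, missing counts, reported percentages.
snps = [319, 321, 256, 276, 226]
missing = [80, 74, 61, 66, 56]
reported = [25.1, 23.1, 23.8, 23.9, 24.8]

computed = [missing_snp_percent(m, n) for m, n in zip(missing, snps)]
for n, m, pct in zip(snps, missing, computed):
    print(f"{m} of {n} SNPs missing: {pct}%")
```

Under that assumption the computed values agree with the reported ones for this block, which supports reading the first row of each block as the SNP count of the classifier.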
Summary of two-field prediction accuracies (call rate) based on published HIBAG Asian or HIBAG multi-ethnic parameters as references when JPDSC data were used as the independent Japanese validation set
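| | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DPB1 |
|---|---|---|---|---|---|
| HIBAG Asian reference | | | | | |
| No. of SNPs | 260 | 336 | 347 | 319 | 273 |
| No. of training samples | 606 | 713 | 611 | 696 | 527 |
| No. of missing SNPs (%) | 40 (15.4) | 55 (16.4) | 55 (15.9) | 51 (16.0) | 46 (16.8) |
| Accuracy (call rate) | 92.8 (100) | 92.2 (100) | 96.6 (100) | 91.9 (100) | 96.1 (100) |
| HIBAG multi-ethnic reference | | | | | |
| No. of SNPs | 910 | 915 | 945 | 861 | 573 |
| No. of training samples | 2901 | 3886 | 2916 | 3713 | 2489 |
| No. of missing SNPs (%) | 186 (20.4) | 198 (21.6) | 208 (22.0) | 188 (21.8) | 158 (27.6) |
| Accuracy (call rate) | 92.0 (100) | 92.0 (100) | 96.5 (100) | 88.5 (100) | 97.0 (100) |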
Abbreviation: SNP, single nucleotide polymorphism.
Accuracies of two-field HLA predictions in internal validation test with THC or JPDSC data and different genotyping platforms
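| THC | HLA-A | HLA-B | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 |
|---|---|---|---|---|---|
| THC (415 individuals) | | | | | |
| No. of SNPs | 371 | 381 | 379 | 410 | 305 |
| No. of training samples | 212 | 216 | 213 | 210 | 210 |
| No. of validation samples | 203 | 199 | 202 | 205 | 205 |
| Call threshold=0 | 94.6 (100) | 93.0 (100) | 95.5 (100) | 98.3 (100) | 97.8 (100) |
| Call threshold=0.5 | 95.3 (94.6) | 95.9 (84.9) | 97.3 (91.6) | 99.0 (96.6) | 98.2 (94.1) |
| THC (416 individuals) | | | | | |
| No. of SNPs | 1175 | 1044 | 1806 | 1962 | 1372 |
| No. of training samples | 212 | 218 | 215 | 212 | 211 |
| No. of validation samples | 204 | 198 | 201 | 204 | 205 |
| Call threshold=0 | 95.1 (100) | 91.7 (100) | 95.8 (100) | 98.3 (100) | 96.6 (100) |
| Call threshold=0.5 | 95.5 (98.0) | 95.1 (88.4) | 97.9 (94.0) | 98.3 (100) | 97.4 (95.1) |
| THC (418 individuals) | | | | | |
| No. of SNPs | 1133 | 1437 | 1350 | 1436 | 1051 |
| No. of training samples | 213 | 219 | 215 | 213 | 212 |
| No. of validation samples | 205 | 199 | 203 | 205 | 206 |
| Call threshold=0 | 93.9 (100) | 92.2 (100) | 95.8 (100) | 98.3 (100) | 97.8 (100) |
| Call threshold=0.5 | 95.2 (95.6) | 95.6 (85.4) | 97.6 (91.6) | 99.5 (96.6) | 98.2 (97.1) |

| JPDSC | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DPB1 |
|---|---|---|---|---|---|
| JPDSC (2994 individuals) | | | | | |
| No. of SNPs | 1555 | 1668 | 1772 | 1778 | 1381 |
| No. of training samples | 1502 | 1513 | 1502 | 1508 | 1501 |
| No. of validation samples | 1492 | 1481 | 1492 | 1486 | 1493 |
| Call threshold=0 | 95.7 (100) | 97.1 (100) | 98.9 (100) | 98.4 (100) | 98.8 (100) |
| Call threshold=0.5 | 96.7 (96.7) | 97.6 (96.4) | 99.0 (99.7) | 98.7 (98.7) | 99.0 (99.3) |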
Abbreviations: HLA, human leukocyte antigen; JPDSC, Japan PGx Data Science Consortium; SNP, single nucleotide polymorphism; THC, Tokyo Healthy Control.
Summary of accuracies for two-field HLA predictions from cross-validation tests with the THC and JPDSC data sets
| | HLA-A | HLA-B | HLA-DRB1 | HLA-DPB1 |
|---|---|---|---|---|
| THC reference | | | | |
| No. of SNPs | 1133 | 1437 | 1350 | 1051 |
| No. of training samples | 418 | 418 | 418 | 418 |
| No. of missing SNPs (%) | 35 (3.1) | 58 (4.0) | 19 (1.4) | 28 (2.7) |
| Call threshold=0 | 93.5 (100) | 95.5 (100) | 96.8 (100) | 97.8 (100) |
| Call threshold=0.5 | 94.7 (94.8) | 97.4 (92.6) | 98.0 (95.0) | 98.2 (98.2) |
| JPDSC reference | | | | |
| No. of SNPs | 1623 | 1750 | 1807 | 1301 |
| No. of training samples | 2994 | 2994 | 2994 | 2994 |
| No. of missing SNPs (%) | 733 (45.2) | 639 (36.5) | 868 (48.0) | 590 (45.3) |
| Call threshold=0 | 95.3 (100) | 96.9 (100) | 98.1 (100) | 98.7 (100) |
| Call threshold=0.5 | 96.2 (96.7) | 97.5 (95.7) | 99.1 (96.7) | 98.8 (99.5) |
Abbreviations: HLA, human leukocyte antigen; JPDSC, Japan PGx Data Science Consortium; SNP, single nucleotide polymorphism; THC, Tokyo Healthy Control.
Summary of accuracies for two-field HLA predictions for the Japanese narcolepsy with cataplexy data set based on THC references
| | HLA-B | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 |
|---|---|---|---|---|
| No. of SNPs | 381 | 379 | 410 | 305 |
| No. of training samples | 418 | 418 | 418 | 418 |
| No. of validation samples | 398 | 86 | 398 | 398 |
| Call threshold=0 | 94.5 (100) | 92.4 (100) | 98.6 (100) | 97.2 (100) |
| Call threshold=0.5 | 97.0 (79.1) | 95.1 (83.7) | 99.6 (69.6) | 97.7 (97.7) |
Only 86 individuals are typed at HLA-DRB1.
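To get a feel for what the call rates at a call threshold of 0.5 mean in absolute terms, the rough sketch below converts them into approximate numbers of retained samples. It assumes that the call rate is the percentage of validation samples that receive a call; the function name `approx_called` is hypothetical, and the results are estimates derived from rounded percentages, not counts reported in the table.

```python
def approx_called(n_validation: int, call_rate_percent: float) -> int:
    """Approximate number of samples still called, from a rounded call rate."""
    return round(n_validation * call_rate_percent / 100)


# (validation samples, call rate in % at call threshold 0.5), column by column.
columns = [(398, 79.1), (86, 83.7), (398, 69.6), (398, 97.7)]

estimates = [approx_called(n, rate) for n, rate in columns]
for (n, rate), kept in zip(columns, estimates):
    print(f"{rate}% of {n} samples: about {kept} called, {n - kept} left uncalled")
```

Read this way, the accuracy gained at the stricter threshold comes at the cost of leaving a sizeable share of samples without a call in some columns, which is the trade-off that motivates considering a lower threshold.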