Marc Jeanmougin, Josselin Noirel, Cédric Coulonges, Jean-François Zagury.
Abstract
BACKGROUND: The major histocompatibility complex (MHC) region of the human genome, and specifically the human leukocyte antigen (HLA) genes, plays a major role in numerous human diseases. With the recent progress of sequencing methods (e.g., next-generation sequencing, NGS), the accurate genotyping of this region has become possible but remains relatively costly. To obtain HLA information for the millions of samples already genotyped on chips over the past ten years, efficient bioinformatics tools such as SNP2HLA and HIBAG have been developed; they infer HLA information from the linkage disequilibrium between HLA alleles and SNP markers in the MHC region.
Keywords: Human leukocyte antigen; Imputation; Major histocompatibility complex
Year: 2017 PMID: 28697761 PMCID: PMC5504728 DOI: 10.1186/s12859-017-1746-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of imputation scores using vanilla SNP2HLA vs our method: the results are quite similar
| HLA (two-field) | Test population | SNP2HLA | Our method |
|---|---|---|---|
| | 865 | 97.6 | 97.1 |
| | 1495 | 96 | 94.9 |
| | 813 | 95.8 | 95.9 |
| | 800 | 87.9 | 87.8 |
| | 974 | 94.8 | 96.4 |
Fig. 1 HLA-check pipeline: we start by augmenting the test panel with additional SNPs through an imputation phase using the 1000 Genomes data, then we compare those SNP alleles with their theoretical values for all possible HLA allele pairs
Fig. 2 Histogram of the HLA-check distance (D) distribution obtained for each HLA locus (A, B, C, DRB, DPA, DPB, DQA, DQB), comparing the actual T1DGC HLA types (blue) with a randomized set of HLA types from T1DGC (red). The x-axis shows the D value and the y-axis the number of subjects (genotypes) obtaining this value. For each SNP genotype, D was computed as described in the Material and Methods. There is a clear-cut difference between the two distributions for class I HLA genes
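The comparison described in the two captions above can be illustrated with a toy sketch. It is not the paper's definition of D (which is given in the Material and Methods and not reproduced here): the mismatch count below, the `ALLELE_SNPS` lookup, its four-position sequences, and the function names are all assumptions made for illustration only.

```python
import random

# Hypothetical lookup: the nucleotide each HLA allele carries at four
# exonic SNP positions. The sequences are invented toy data.
ALLELE_SNPS = {
    "A*01:01": "ACGT",
    "A*02:01": "ACCT",
    "A*03:01": "TCGA",
}

def expected_genotype(allele_pair):
    """Unordered SNP genotype implied by a pair of HLA alleles."""
    first, second = (ALLELE_SNPS[name] for name in allele_pair)
    return [frozenset(pair) for pair in zip(first, second)]

def mismatch_distance(observed, allele_pair):
    """Toy stand-in for D: number of SNP positions where the observed
    genotype differs from the one implied by the HLA allele pair."""
    expected = expected_genotype(allele_pair)
    return sum(obs != exp for obs, exp in zip(observed, expected))

# Three toy subjects: (typed HLA pair, observed SNP genotype).
typed = [
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*03:01"),
    ("A*02:01", "A*02:01"),
]
observed = [expected_genotype(pair) for pair in typed]

# Distances for the actual HLA types versus a randomized reassignment
# of the same HLA types across subjects (the control distribution).
actual = [mismatch_distance(o, p) for o, p in zip(observed, typed)]
shuffled = random.Random(0).sample(typed, k=len(typed))
control = [mismatch_distance(o, p) for o, p in zip(observed, shuffled)]
print("actual:", actual)
print("randomized:", control)
```

In this toy setting the actual types give a distance of zero by construction, while reassigned types can produce mismatches; real data would contain genotyping and imputation noise, so both distributions would be spread out as in the histograms.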
Number of SNP markers used for each HLA gene (exonic SNPs that can be imputed from 1000genomes)
| HLA | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| #snps | 118 | 118 | 110 | 41 | 46 | 42 | 71 | 74 |
Test subjects eliminated a priori from the 1958BC test dataset
| HLA | | | | |
|---|---|---|---|---|
| 1958BC (two-field) | 18/865 | 60/1495 | 77/813 | 28/974 |
| 1958BC (one-field) | 35/1669 | 61/1562 | 99/1291 | 103/1701 |
Imputation accuracy without any processing, then with the filtering applied with our scoring method
| Gene | Test population | Base imputation | People removed | Sub-population imputation |
|---|---|---|---|---|
| HLA (two-field) | | | | |
| | 865 | 97.6 | 30 (40% correct) | 99.6 |
| | 1495 | 96 | 112 (65% correct) | 98.5 |
| | 813 | 95.8 | 81 (60% correct) | 99.5 |
| | 974 | 94.8 | 40 (60% correct) | 96.1 |
| HLA (one-field) | | | | |
| | 1669 | 97.8 | 62 (40% correct) | 99.9 |
| | 1562 | 97.1 | 119 (70% correct) | 99.5 |
| | 1291 | 95.8 | 125 (60% correct) | 99.9 |
| | 1701 | 98.2 | 175 (90% correct) | 99.2 |
T1DGC was used as our reference panel and 1958BC as our test panel. For the removed subjects, we also report the percentage whose imputation was in fact correct
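As a rough consistency check on the two-field rows of the table above, the accuracy after filtering can be recomputed from the other three columns. This is only a sketch under stated assumptions: accuracy is treated as a per-subject fraction, "% correct" is read as the share of removed subjects whose imputation was correct, and the function name is hypothetical. Because the "% correct" figures appear to be rounded, the recomputed values agree with the table only approximately (to within about half a percentage point).

```python
def expected_filtered_accuracy(n, base_acc_pct, removed, removed_correct_pct):
    """Accuracy (%) among the subjects kept after filtering.

    n                   -- size of the test population
    base_acc_pct        -- accuracy (%) before filtering
    removed             -- number of subjects removed by the filter
    removed_correct_pct -- share (%) of removed subjects whose
                           imputation was actually correct
    """
    correct_before = n * base_acc_pct / 100
    correct_removed = removed * removed_correct_pct / 100
    kept = n - removed
    return 100 * (correct_before - correct_removed) / kept

# Two-field rows: (test population, base %, removed, % correct, reported %).
rows = [
    (865, 97.6, 30, 40, 99.6),
    (1495, 96.0, 112, 65, 98.5),
    (813, 95.8, 81, 60, 99.5),
    (974, 94.8, 40, 60, 96.1),
]
for n, base, removed, pct_correct, reported in rows:
    recomputed = expected_filtered_accuracy(n, base, removed, pct_correct)
    print(f"n={n}: recomputed {recomputed:.1f}%, reported {reported}%")
```

The calculation makes the trade-off explicit: the filter raises accuracy in the retained sub-population, at the cost of discarding some subjects whose imputation was already correct.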
Imputation accuracy for HIBAG. Unlike with SNP2HLA, we did not use T1DGC as a reference panel but a precomputed model, because of the way HIBAG works. Nevertheless, the results are very similar to those obtained with SNP2HLA
| HLA (two-field) | Test population | Base imputation | People removed | Sub-population imputation |
|---|---|---|---|---|
| | 865 | 97 | 36 | 99.5 |
| | 1495 | 95 | 84 | 97.3 |
| | 813 | 95 | 83 | 99.7 |