Xinyi Cindy Zhang, Shuying Sue Li, Hongwei Wang, John A Hansen, Lue Ping Zhao.
Abstract
BACKGROUND: Numerous immune-mediated diseases have been associated with the class I and II HLA genes located within the major histocompatibility complex (MHC) consisting of highly polymorphic alleles encoded by the HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 loci. Genotyping for HLA alleles is complex and relatively expensive. Recent studies have demonstrated the feasibility of predicting HLA alleles using MHC SNPs inside and outside of HLA that are typically included in SNP arrays and are commonly available in genome-wide association studies (GWAS). We have recently described a novel method, complementary to the previous methods, for accurately predicting HLA alleles using unphased flanking SNP genotypes. In this manuscript, we address several practical issues relevant to the application of this methodology.
Year: 2011 PMID: 21518453 PMCID: PMC3111398 DOI: 10.1186/1471-2156-12-39
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Characteristics of the three study cohorts
| Characteristic | Category | FHCRC | WTCCC* | STEP |
|---|---|---|---|---|
| Ethnicity | Caucasian | 1280 | 2730 | 411 |
| | Mestizo/Mestiza | | | 212 |
| | Others (African, Asian and other ethnic groups) | 226 | | 209 |
| HLA loci | HLA-A, -B, and -C | 1483 | 1929, 1812, 1558 | 832 |
| | HLA-DRB1 | 1481 | 1969 | 832 |
| | HLA-DQB1 | 1434 | 1965 | |
| | HLA-DPB1 | | | 832 |
| SNP genotyping platforms | Affy 5.0 array | 1506 | | |
| | Affy 500K | | 1480 | |
| | Affy 6.0 array | | 2706 | |
| | Illumina 550K | | 1438 | |
| | Illumina 1.2M | | 2692 | 832 |
* In the WTCCC cohort, dubious samples were excluded according to WTCCC suggestions.
Number of samples in each study cohort that were of each ethnicity, genotyped for each HLA locus and genotyped on each SNP genotyping platform.
Numbers of overlapping SNPs between HapMap project and four genotyping platforms in the extended MHC region 1
| | HapMap | Affy 500K | Affy 6.0 | Illumina 550K | Illumina 1.2M |
|---|---|---|---|---|---|
| HapMap | 10600 | 1307 | 1979 | 1762 | 4382 |
| Affy 500K | | 1422 | 1363 | 272 | 610 |
| Affy 6.0 | | | 2325 | 459 | 988 |
| Illumina 550K | | | | 1944 | 1885 |
| Illumina 1.2M | | | | | 6303 |
1 The range of extended MHC region is chr6: 28,799,220 - 34,204,868 [12].
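Counts of this kind are pairwise set intersections restricted to the region. The following is a minimal sketch, assuming each platform is given as a mapping from SNP identifier to chr6 position on a common genome build, that SNPs are matched by identifier, and that the region bounds are inclusive; the function name, the input layout, and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations_with_replacement
from typing import Dict, Tuple

# chr6 bounds of the extended MHC region, as quoted in the footnote above.
MHC_START, MHC_END = 28_799_220, 34_204_868


def overlap_counts(platforms: Dict[str, Dict[str, int]]) -> Dict[Tuple[str, str], int]:
    """Count shared SNPs for every platform pair (upper triangle plus diagonal).

    platforms maps a platform name to {snp_id: chr6 position}.
    Assumption: a SNP is "in the region" if MHC_START <= position <= MHC_END.
    """
    in_region = {
        name: {snp for snp, pos in snps.items() if MHC_START <= pos <= MHC_END}
        for name, snps in platforms.items()
    }
    return {
        (a, b): len(in_region[a] & in_region[b])
        for a, b in combinations_with_replacement(in_region, 2)
    }


# Hypothetical toy input: snp3 lies outside the region and is ignored.
toy_platforms = {
    "array_x": {"snp1": 29_000_000, "snp2": 30_000_000, "snp3": 40_000_000},
    "array_y": {"snp2": 30_000_000, "snp4": 31_000_000},
}
counts = overlap_counts(toy_platforms)
```

The diagonal entries give the number of SNPs each platform has in the region, and the off-diagonal entries give the shared SNPs, mirroring the triangular layout of the table.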
Comparison of prediction accuracies for HLA predictive models built with and without imputed SNPs
| Resolution | HLA- | Without imputed SNPs: CT = 0 | CT = 0.5 | CT = 0.9 | With imputed SNPs: CT = 0 | CT = 0.5 | CT = 0.9 |
|---|---|---|---|---|---|---|---|
| Intermediate | A | 97 | 97(100)* | 99(87) | 98 | 98(100) | 99(89) |
| | B | 95 | 95(99) | 98(86) | 96 | 96(100) | 98(88) |
| | C | 98 | 98(100) | 98(97) | 99 | 99(99) | 99(97) |
| | DRB1 | 93 | 93(99) | 97(78) | 93 | 93(100) | 98(78) |
| | DQB1 | 96 | 97(99) | 98(95) | 97 | 97(100) | 97(94) |
| High | A | 95 | 95(100) | 97(86) | 97 | 97(100) | 98(86) |
| | B | 93 | 93(98) | 96(78) | 93 | 94(97) | 96(72) |
| | C | 97 | 97(99) | 98(95) | 98 | 98(100) | 98(93) |
| | DRB1 | 83 | 87(85) | 94(48) | 87 | 88(95) | 95(59) |
| | DQB1 | 94 | 95(100) | 95(94) | 95 | 95(100) | 96(96) |
* Prediction accuracy % (call rate %)
Using the Caucasian samples genotyped on the Affy 5.0 array in the FHCRC cohort, the accuracies of the predictive models built with the training set (N = 633) were evaluated on the validation set (N = 627) for HLA-A, -B, -C, -DRB1 and -DQB1 at intermediate and high resolution, with CT = 0, 0.5 and 0.9.
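The accuracy (call rate) pairs reported at each confidence threshold can be computed along the following lines. This is a minimal sketch, assuming each predicted allele carries a confidence score between 0 and 1 and that a prediction counts as called only when its score is at least CT; the function name, the input layout, the `>=` rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional, Tuple


def accuracy_and_call_rate(
    predictions: List[Tuple[str, str, float]],
    ct: float,
) -> Tuple[Optional[float], float]:
    """Return (accuracy %, call rate %) at confidence threshold ct.

    predictions holds (predicted_allele, true_allele, confidence) triples.
    Assumption: a prediction is "called" only if confidence >= ct.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    called = [(pred, true) for pred, true, conf in predictions if conf >= ct]
    call_rate = 100.0 * len(called) / len(predictions)
    if not called:
        return None, call_rate  # accuracy is undefined when nothing is called
    correct = sum(1 for pred, true in called if pred == true)
    return 100.0 * correct / len(called), call_rate


# Hypothetical toy data, not taken from the study.
toy_predictions = [
    ("A*02:01", "A*02:01", 0.99),
    ("A*01:01", "A*01:01", 0.95),
    ("A*03:01", "A*11:01", 0.60),
    ("A*24:02", "A*24:02", 0.40),
]
```

With this definition, raising CT can only lower the call rate, while accuracy is measured on the remaining, more confident calls, which matches the pattern in the table where higher thresholds trade call rate for accuracy.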
Figure 1. Comparison of prediction accuracies between models built with and without imputed SNPs from four arrays. Half of the common set of samples genotyped on the Affy 500K, Affy 6.0, Illumina 550K and Illumina 1.2M arrays in the WTCCC cohort were used as the training set (N = 501) and the other half were used as the validation set (N = 500). Each panel shows a comparison of prediction accuracies for the validation set, with models built using only SNPs observed from the array or using HapMap SNPs observed and imputed from the array. The confidence threshold was set at CT = 0.
Kappa coefficients of observed or imputed SNP genotypes between genotyping platforms using the WTCCC cohort data
| Platform Pair | O*-O | O-I+ | I-O | I-I | Overall# |
|---|---|---|---|---|---|
| Affy 500K & Affy 6.0 | 0.9969 | 0.9901 | 0.9860 | 0.9978 | 0.9968 |
| Illumina 550K & Illumina 1.2M | 0.9996 | 0.9945 | 0.9943 | 0.9988 | 0.9977 |
| Affy 6.0 & Illumina 1.2M | 0.9970 | 0.9947 | 0.9913 | 0.9976 | 0.9955 |
| Affy 500K & Illumina 550K | 0.9976 | 0.9942 | 0.9847 | 0.9959 | 0.9942 |
| Affy 500K & Illumina 1.2M | 0.9962 | 0.9956 | 0.9876 | 0.9954 | 0.9931 |
| Illumina 550K & Affy 6.0 | 0.9979 | 0.9897 | 0.9934 | 0.9978 | 0.9962 |
* O stands for observed genotypes of HapMap SNPs.
+ I stands for genotypes imputed using IMPUTE v2 with the HapMap reference panels of Caucasian, African and Asian populations.
# Overall stands for observed or imputed genotypes of HapMap SNPs.
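For reference, Cohen's kappa compares the observed agreement between two sets of calls with the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch, assuming genotypes are coded as 0/1/2 allele counts and that both platforms report the same SNP and sample entries in the same order; the function name, the coding, and the toy data are illustrative, and the source does not state how the authors computed their coefficients.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(calls_a: Sequence[Hashable], calls_b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equally long sequences of categorical calls."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of positions where both calls match.
    p_observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of marginal frequencies, summed over categories.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sequences consist of one identical category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical genotype calls (0/1/2) for the same eight entries on two platforms.
platform_a = [0, 0, 1, 1, 2, 2, 0, 1]
platform_b = [0, 0, 1, 1, 2, 2, 0, 2]
kappa = cohen_kappa(platform_a, platform_b)
```

In this toy case seven of eight calls agree, and kappa is lower than the raw agreement because part of that agreement would be expected by chance.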
Figure 2. Comparison of genotyping arrays, with respect to their prediction accuracies. The training and validation sets were the same as those in Figure 1. Each panel shows a comparison of prediction accuracies of models built with and without imputed SNPs from four genotyping arrays at intermediate and high HLA resolution. The confidence threshold was set at CT = 0.
Figure 3. Comparison of cross-platform prediction accuracies among four genotyping arrays. Each square panel shows the accuracies of HLA predictive models built using SNPs observed and imputed from one genotyping array and validated using SNPs observed and imputed from another array. The confidence threshold was set at CT = 0.
Figure 4. Comparison of prediction accuracies across populations of the same ethnic group. Shown are the accuracies of HLA predictive models trained on samples from the WTCCC cohort (genotyped on Affy 500K) and validated on Caucasian samples in the FHCRC cohort (genotyped on Affy 5.0), and vice versa. Both observed and imputed HapMap SNPs were used. The confidence threshold was set at CT = 0.
Figure 5. Comparison of prediction accuracies between ethnic-specific and multi-ethnic models. Predictive models were built using both observed and imputed HapMap SNPs from either multi-ethnic or Caucasian-only samples in the STEP cohort; each panel shows the prediction accuracies for Caucasian, Mestizo or other ethnicities. The size of the validation set is shown below each panel. Both the STEP and WTCCC cohorts were genotyped on Illumina 1.2M. The confidence threshold was set at CT = 0.
Characteristics of samples in the training set of the final HLA predictive model
| Ethnicity | Freq | Array | Freq | HLA- | Intermediate | High |
|---|---|---|---|---|---|---|
| Caucasian | 4119 | Affy 5.0 | 1483 | A | 4027 | 4025 |
| Mestizo | 212 | Illumina 1.2M | 3279 | B | 3919 | 3897 |
| Hispanic | 137 | | | C | 3674 | 3648 |
| Black | 106 | | | DRB1 | 4063 | 4032 |
| Other | 188 | | | DQB1 | 3156 | 3066 |
| | | | | DPB1 | 832 | 832 |
Selected SNPs in the final HLA predictive models and their prediction accuracies
| Resolution | HLA- | Selected SNPs: Count | Start | Stop | Length | Accuracy % (Call Rate %): CT = 0 | CT = 0.5 | CT = 0.9 |
|---|---|---|---|---|---|---|---|---|
| Intermediate | A | 33 | 29,850,894 | 30,111,284 | 260,390 | 98 | 99(97) | 99(89) |
| | B | 57 | 31,267,324 | 31,554,345 | 287,021 | 95 | 95(99) | 96(90) |
| | C | 36 | 31,280,634 | 31,458,359 | 177,725 | 99 | 99(100) | 99(93) |
| | DRB1 | 33 | 32,506,503 | 32,793,404 | 286,901 | 98 | 98(100) | 99(81) |
| | DQB1 | 20 | 32,444,139 | 32,778,222 | 334,083 | 98 | 98(100) | 99(94) |
| | DPB1 | 64 | 33,106,707 | 33,249,258 | 142,551 | - | - | - |
| High | A | 59 | 29,918,924 | 30,171,347 | 252,423 | 96 | 97(95) | 97(79) |
| | B | 85 | 31,370,902 | 31,561,526 | 190,624 | 94 | 95(95) | 96(81) |
| | C | 50 | 31,187,623 | 31,444,736 | 257,113 | 98 | 98(99) | 98(86) |
| | DRB1 | 41 | 32,477,466 | 32,811,823 | 334,357 | 91 | 94(92) | 98(49) |
| | DQB1 | 28 | 32,517,462 | 32,870,439 | 352,977 | 98 | 98(98) | 98(92) |
| | DPB1 | 69 | 33,093,572 | 33,290,873 | 197,301 | - | - | - |
Prediction accuracies of HLA alleles with frequency exceeding a threshold
| Training set | HLA- | Intermediate Resolution: 0.05* | 0.03 | 0 | High Resolution: 0.05 | 0.03 | 0 |
|---|---|---|---|---|---|---|---|
| Caucasians in the FHCRC cohort (N = 1280) | A | 99 | 99 | 98 | 99 | 99 | 97 |
| | B | 98 | 97 | 96 | 97 | 97 | 94 |
| | C | 97 | 96 | 96 | 97 | 97 | 96 |
| | DRB1 | 97 | 97 | 97 | 91 | 91 | 88 |
| | DQB1 | 99 | 99 | 99 | 98 | 98 | 98 |
| Caucasians in the STEP cohort (N = 832) | A | 99 | 99 | 95 | 99 | 98 | 95 |
| | B | 97 | 97 | 95 | 97 | 97 | 93 |
| | C | 96 | 94 | 94 | 95 | 95 | 94 |
| | DRB1 | 98 | 98 | 98 | 98 | 95 | 91 |
* Threshold for HLA allele frequency
The accuracies of HLA predictive models, built with a training set of Caucasian samples from the FHCRC cohort (genotyped on Affy 5.0) or the STEP cohort (genotyped on Illumina 1.2M), were evaluated using the WTCCC samples (genotyped on Affy 500K) as the validation set (N = 1480). Only HLA alleles with a frequency above the threshold (0.05, 0.03 or 0) in the validation set were evaluated for their prediction accuracies. The confidence threshold was set at CT = 0.
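The frequency filter described above can be sketched as follows. This is a minimal illustration, assuming accuracy is scored per allele copy, that allele frequencies are estimated from the true (typed) alleles in the validation set, and that an allele is kept only when its frequency is strictly greater than the threshold; the function name, the input layout, and the toy data are hypothetical and do not reproduce the authors' code.

```python
from collections import Counter
from typing import List, Optional, Tuple


def accuracy_above_frequency(
    pairs: List[Tuple[str, str]],
    min_freq: float,
) -> Optional[float]:
    """Accuracy % over allele copies whose true allele has frequency > min_freq.

    pairs holds (true_allele, predicted_allele), one entry per allele copy.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    counts = Counter(true for true, _ in pairs)
    total = len(pairs)
    kept = [(t, p) for t, p in pairs if counts[t] / total > min_freq]
    if not kept:
        return None  # no allele passes the frequency threshold
    return 100.0 * sum(t == p for t, p in kept) / len(kept)


# Hypothetical validation set of 50 allele copies (not study data):
# frequencies are 0.60, 0.34, 0.04 and 0.02 for the four true alleles.
toy_pairs = (
    [("A*02:01", "A*02:01")] * 30
    + [("A*01:01", "A*01:01")] * 16
    + [("A*01:01", "A*03:01")]
    + [("A*68:01", "A*68:01"), ("A*68:01", "A*68:02")]
    + [("A*29:02", "A*29:01")]
)
```

In the toy data the rare alleles are predicted less reliably, so raising the threshold from 0 to 0.03 to 0.05 removes them from scoring and the reported accuracy rises, which is the same direction as the trend in the table.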