Yazhi Huang1, Jing Yang1, Dingge Ying1, Yan Zhang1, Vorasuk Shotelersuk2, Nattiya Hirankarn3, Pak Chung Sham4, Yu Lung Lau1, Wanling Yang1.
Abstract
Human leukocyte antigen (HLA) typing from next-generation sequencing (NGS) data has the potential for widespread applications. Here we introduce a novel tool, HLAreporter, for HLA typing from NGS data. It maps reads against a comprehensive reference panel containing all known HLA alleles and then performs de novo assembly of the gene-specific short reads. When tested on publicly available NGS data, HLAreporter achieved accurate HLA typing at high-digit resolution and outperformed other recently developed tools such as HLAminer and PHLAT. HLAreporter can be downloaded from http://paed.hku.hk/genome/.
Year: 2015 PMID: 25908942 PMCID: PMC4407542 DOI: 10.1186/s13073-015-0145-3
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. HLAreporter detection flow, using the HLA-DRB1 gene as an example. Stages 1 and 2 classify reads to a specific gene by mapping against the comprehensive reference panel (CRP). Stages 3, 4, and 5 cover assembly and contig-HLA matching.
Figure 2. Mapping efficiency for HLA-DRB1 genes. Sample SRR360148 is heterozygous for alleles DRB1*01:01 and DRB1*15:01; sample SRR359103 is heterozygous for alleles DRB1*03:01 and DRB1*07:01. (a) Difference in the number of reads captured on the exon 2 region for sample SRR360148. A multiple allele-based mapping panel containing eight different HLA-DRB1 alleles outperforms the single reference hg19*15:01:01:01 (that is, PGF). (b) Total number of reads captured for samples SRR360148 and SRR359103 using hg19*15:01:01:01 as the only reference versus a multiple allele-based reference panel. Mapping against a single reference loses a substantial number of short reads.
HLA predictions of class I and class II genes
| Sample | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 359102 | *30:01; *30:02; *30:04; *66:01 | *30:02:01G; *66:01:01G | *15:83; *18:01; *18:26; *41:01; *45:01; *50:01 | *18:01:01G; *41:02:01 | *05:01; *17:01 | *05:01:01G; *17:01:01G | *03:01; *07:01 | *03:01:01G | *03:01:01 | *02:01 | *02:01:01G | *02:01:01 |
| 359103 | *01:01; *01:03; *02:01; *02:03; *11:02; *68:08 | *01:01:01G; *02:01:01G | *18:01; *18:03; *57:01 | *18:01:01G; *57:01:01G | *07:01 | *07:01:01G | *03:01; *07:01 | *03:01:01G; *07:01:01G | *03:01:01; *07:01:01 | *02:01; *03:03 | *02:01:01G; *03:03:02G | *02:01:01; *03:03:02 |
| 359108 | - | *03:01:01G; *68:02:01Gp | - | *35:01:01G; *53:01:01 | *04:01 | *04:01:01G | - | *04:05:01; *08:04:01 | *04:05:01; *08:04:01 | - | *03:01:01G; *03:02:01G | *03:01:04; *03:02:01 |
| 359298 | *11:01; *11:02; *11:50; *24:02; *24:07; *24:20c | *11:02:01G; *24:07p | *27:04; *27:25; *39:34; *40:02; *40:06 | *27:04:01G; *39:05:01 | *08:01; *08:21; *12:02; *12:03 | *08:01:01G; *12:02:01Gp | *04:03; *08:03; *12:01; *14:54c | *08:03:02; *12:02:01 | *08:03:02; *12:02:01 | *03:01; *06:01 | *03:01:01G; *06:01:01G | *03:01:01; *06:01:01 |
| 359295 | *02:03; *03:01 | *02:03:01G; *03:01:01Gp | *35:01; *35:03; *37:01; *55:02; *55:48; *56:01 | *35:03:01G; *55:02:01Gp | *01:02; *04:01; *04:03; *12:03; *15:02; *15:16c | *04:01:01G; *12:03:01G | *04:03; *07:01; *08:03; *14:05 | *08:02:01; *14:05:01 | *08:02:01; *14:05:01 | *03:02; *03:03; *03:05; *05:03 | *04:02:01; *05:03:01G | *04:02:01; *05:03:01 |
| 360655 | *30:01; *30:02; *30:04; *32:01; *74:01; *74:11 | *30:02:01G; *74:01:01G | *15:03; *57:01; *57:06; *57:11 | *15:03:01G; *57:03:01p | *02:02; *02:11; *07:01 | *02:10; *07:01:01Gp | *07:01; *08:03; *11:01; *13:02 | *11:01:02; *13:02:01 | *11:01:02; *13:02:01 | *05:01; *05:03; *06:09 | *05:02:01G; *06:09:01 | *05:02:01; *06:09:01 |
| 360288 | *02:01 | *02:01:01G; *02:11:01G | *15:01; *15:07; *15:32; *35:14; *58:01 | *15:04; *35:05:01 | *01:02; *04:01; *04:03; *04:06 | *01:02:01G; *04:01:01G | *04:03; *07:01 | *04:11:01; *09:01:02 | *04:11:01; *09:01:02 | *03:02 | *03:02:01G; *03:03:02G | *03:02:01; *03:03:02 |
| 360391 | *02:01; *02:48; *68:01 | *02:01:01G; *68:01:02Gp | *07:02; *40:02; *40:06 | *07:02:01G; *40:02:01Gp | *03:03; *03:04; *07:02 | *03:04:01G; *07:02:01Gp | *01:01; *07:01 | *01:03; *09:01:02 | *01:03; *09:01:02 | *03:03; *05:01 | *03:03:02G; *05:01:01G | *03:03:02; *05:01:01 |
| 360148 | *01:01; *02:01; *36:01 | *02:01:01G; *36:01 | *07:02; *35:01; *35:41; *40:01; *40:79; *53:01c | *35:01:01G; *40:01:01Gp | *03:02; *03:04; *04:01; *04:03; *15:02; *15:17c | *03:04:01G; *04:01:01G | *01:01; *01:02; *07:01; *15:01 | *01:01:01G; *15:01:01G | *01:01:01; *15:01:01 | *05:01; *06:02 | *06:02:01G; *05:01:01G | *06:02:01; *05:01:01 |
| 359301 | *02:03; *11:02; *31:01; *32:01; *74:01; *74:11 | *02:03:01G; *31:01:02Gp | *13:01; *48:01 | *13:01:01G; *48:01:01G | *03:03; *03:04 | *03:03:01G; *03:04:01G | *07:01; *08:03; *11:01 | *11:01:01G; *13:12:01 | *11:01:01; *13:12:01 | *03:01 | *03:01:01G | *03:01:01 |
| 359098 | - | *03:01:01G; *68:02:01Gp | - | *35:01:01G; *53:01:01 | *04:01 | *04:01:01G | - | *04:05:01; *08:04:01 | *04:05:01; *08:04:01 | - | *03:01:01G; *03:02:01G | *03:01:04; *03:02:01 |
a: ‘SRR’ is the prefix of each sample name; it is omitted from the table because of space limitations.
b: All alleles with identical exon 2 and 3 sequences are reported. For example, after examining exons 2 and 3 of allele DQB1*03:03:02G, HLAreporter reports alleles 03:03:02:01/03:03:02:02/03:03:02:03. Because the last two digits of the eight-digit HLA nomenclature are determined by intronic sequences, only the first six digits (*03:03:02) are presented in the table after examining the minor exon.
c: Additional ambiguity at four-digit resolution is not shown.
p: Phase was reported.
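The reduction described in the table footnote, from eight-digit allele names to the six-digit form shown in the table, can be sketched in Python. This is an illustrative sketch only and not HLAreporter's own code: the function names `truncate_allele` and `collapse` are hypothetical, and the sketch assumes plain colon-separated numeric fields with no G-group or expression suffixes.

```python
def truncate_allele(name: str, fields: int) -> str:
    """Keep only the first `fields` colon-separated fields of an HLA allele name.

    Assumption: names look like 'DQB1*03:03:02:01' (gene, '*', numeric fields).
    Two digits per field, so fields=3 gives six-digit resolution.
    """
    gene, sep, rest = name.partition("*")
    parts = rest.split(":")
    return gene + sep + ":".join(parts[:fields])


def collapse(names, fields):
    """Collapse a list of allele names to the unique names at a lower resolution."""
    return sorted({truncate_allele(n, fields) for n in names})


# The three eight-digit alleles named in the footnote differ only in the
# fourth field, so they collapse to one six-digit name.
alleles = ["DQB1*03:03:02:01", "DQB1*03:03:02:02", "DQB1*03:03:02:03"]
six_digit = collapse(alleles, 3)   # ['DQB1*03:03:02']
four_digit = collapse(alleles, 2)  # ['DQB1*03:03']
```

Using a set before sorting makes the result independent of input order and removes duplicates, which is the behaviour wanted when several ambiguous calls share the same lower-resolution name.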
Statistics of HLA typing results from 51 HapMap samples
| Quality standard (QS) | Gene | | | | Four-digit percentage | Two-digit percentage |
|---|---|---|---|---|---|---|
| 10× = 100% and 20× ≥ 98% | HLA-A | 51 | 3 | 6% | 100% | 100% |
| | HLA-B | 51 | 7 | 14% | 100% | 100% |
| | HLA-C | 51 | 4 | 8% | 100% | 100% |
| | HLA-DRB1 | 46 | 28 | 61% | 100% | 100% |
| | HLA-DQB1 | 51 | 20 | 39% | 100% | 100% |
| | HLA-DQA1 | 51 | 20 | 39% | 100% | 100% |
| 10× = 100% and 20× ≥ 90% | HLA-A | 51 | 18 | 35% | 81% | 100% |
| | HLA-B | 51 | 18 | 35% | 83% | 100% |
| | HLA-C | 51 | 11 | 22% | 100% | 100% |
| | HLA-DRB1 | 46 | 40 | 87% | 99% | 100% |
| | HLA-DQB1 | 51 | 27 | 53% | 100% | 100% |
| | HLA-DQA1 | 51 | 35 | 69% | 100% | 100% |
| 10× ≥ 95% | HLA-DRB1 | 46 | 45 | 98% | 96% | 98% |
| | HLA-DQB1 | 51 | 46 | 90% | 91% | 99% |
| | HLA-DQA1 | 51 | 43 | 84% | 100% | 100% |
a: Quality standard (QS) ‘10×’ represents the percentage of locations with coverage depth greater than 10-fold on the targeted exon. Accordingly, ‘10× ≥ 95%’ means that this percentage is 95% or above (‘20×’ is defined analogously).
b: The four-digit (two-digit) percentage is equal to the number of HLA calls at four-digit (two-digit) resolution without ambiguity divided by the total number of alleles.
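The quality standard defined in the footnote can be made concrete with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the per-position depth list is made-up example data, and the names `fraction_covered` and `meets_qs` are hypothetical.

```python
def fraction_covered(depths, min_depth):
    """Fraction of exon positions whose coverage depth is greater than `min_depth`."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    return sum(d > min_depth for d in depths) / len(depths)


def meets_qs(depths, thresholds):
    """Check a quality standard given as {min_depth: required_fraction}.

    For example, '10x = 100% and 20x >= 98%' becomes {10: 1.0, 20: 0.98}.
    """
    return all(
        fraction_covered(depths, min_depth) >= required
        for min_depth, required in thresholds.items()
    )


# Made-up per-position depths for one targeted exon (illustration only).
depths = [25, 30, 12, 22, 40, 18, 21, 35, 27, 11]

cov10 = fraction_covered(depths, 10)               # 1.0: every position exceeds 10-fold
cov20 = fraction_covered(depths, 20)               # 0.7: 7 of 10 positions exceed 20-fold
strict = meets_qs(depths, {10: 1.0, 20: 0.98})     # False
relaxed = meets_qs(depths, {10: 0.95})             # True
```

The comparison is strict (`>`), following the footnote's wording "greater than 10-fold", while the required fraction is inclusive, matching "95% or above".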
Figure 3. HLA distribution profiles. (a) Allele frequency distribution of HLA-DRB1 in the Hong Kong (HK) Chinese and China Canton Han populations. (b) Haplotype frequency distribution of DRB1-DQB1 in the Hong Kong Chinese and China Canton Han populations.
HLA long haplotypes observed in our data and their population distribution records in a public database
| Haplotype | | Population | |
|---|---|---|---|
| A*33:03-B*58:01-C*03:02-DRB1*03:01-DQB1*02:01 | 3.36 | USA Asian; V; SK; G | 2.21; 3.50; 1.90; 0.25 |
| A*02:07-B*46:01-C*01:02-DRB1*14:01-DQB1*05:02 | 1.34 | USA Asianb | 0.13 |
| A*02:07-B*46:01-C*01:02-DRB1*04:05-DQB1*04:01 | 1.01 | USA Asian | 0.11 |
| A*02:07-B*46:01-C*01:02-DRB1*09:01-DQB1*03:03 | 1.68 | USA Asian | 1.54; 2.00 |
| A*11:01-B*13:01-C*03:04-DRB1*16:02-DQB1*05:02 | 0.67 | USA Asian; Hispanic | 0.18; 0.05 |
| A*11:01-B*15:02-C*08:01-DRB1*15:01-DQB1*06:01 | 1.68 | USA Asian | 0.31 |
| A*11:01-B*15:02-C*08:01-DRB1*12:02-DQB1*03:01 | 3.02 | USA Asian; Yunnanc | 1.41; 1.70-3.40 |
a: Full names of the populations in the database are USA Asian pop 2 (USA Asian), Vietnam Hanoi Kinh pop 2 (V), South Korea pop 3 (SK), Germany DKMS-Turkey minority (G), USA Hispanic pop 2 (Hispanic), and China Yunnan (Yunnan).
b: This population has a relatively large sample size of 1,772 in the database.
c: Several populations in this Yunnan group with small sample sizes are not shown.