Sojeong Ka1, Sunho Lee2, Jonghee Hong1, Yangrae Cho2, Joohon Sung3, Han-Na Kim4, Hyung-Lae Kim5, Jongsun Jung6.
Abstract
BACKGROUND: Several recent studies showed that next-generation sequencing (NGS)-based human leukocyte antigen (HLA) typing is a feasible and promising technique for variant calling of highly polymorphic regions. To date, however, no method with sufficient read depth has completely solved the allele phasing issue. In this study, we developed a new method (HLAscan) for HLA genotyping using NGS data.
Keywords: HLA typing; HLAscan; Next-generation sequencing; Phasing issue
Year: 2017 PMID: 28499414 PMCID: PMC5427585 DOI: 10.1186/s12859-017-1671-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 HLAscan workflow. The algorithm of HLAscan is explained schematically in five main steps. Step 1 depicts collection of read sequences of HLA genes produced from a sample. Step 2 demonstrates alignment of HLA-A gene read sequences to the human reference genome sequence. In step 3, HLA-A read sequences are aligned to specific allele types. From the candidate alleles, true allele types are determined by applying a score function (step 3 to step 4) and resolving phasing issues (step 4 to step 5). Gray vertical lines under reference sequences represent positions with sequence variance. Black arrows in alleles A*02, A*03, and A*05 of step 3 indicate genetic positions with no sequence reads aligned. Circled bases in step 4 (A and T in A*01, and T in A*04) represent unique sequences that are not redundant with base sequences in any other ranked alleles.
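The candidate-narrowing idea in steps 3 and 4 can be illustrated with a small Python sketch. It is not HLAscan's implementation: the caption does not specify the score function, so the two rules used here (drop alleles that have a variant position with no aligned reads, then count bases that no other remaining allele shares) are simplifying assumptions, and the function names, allele strings, and depths are hypothetical toy data.

```python
from typing import Dict, List

# Toy model (hypothetical): each allele is a string of bases at the variant
# positions, and depth[allele][i] is the number of reads aligned at position i.
Alleles = Dict[str, str]
Depth = Dict[str, List[int]]


def filter_covered(alleles: Alleles, depth: Depth) -> Alleles:
    """Keep only alleles with at least one read at every variant position."""
    return {
        name: seq
        for name, seq in alleles.items()
        if all(d > 0 for d in depth[name])
    }


def count_unique_bases(alleles: Alleles) -> Dict[str, int]:
    """For each allele, count positions whose base occurs in no other allele."""
    counts = {}
    for name, seq in alleles.items():
        others = [s for n, s in alleles.items() if n != name]
        counts[name] = sum(
            all(o[i] != base for o in others) for i, base in enumerate(seq)
        )
    return counts


alleles = {
    "A*01": "ATGC",
    "A*02": "GCGC",
    "A*03": "GCGA",
    "A*04": "GCTC",
    "A*05": "GAGC",
}
depth = {
    "A*01": [12, 9, 15, 11],
    "A*02": [10, 0, 14, 12],  # one position without reads
    "A*03": [11, 8, 13, 0],   # one position without reads
    "A*04": [9, 10, 12, 14],
    "A*05": [0, 7, 13, 10],   # one position without reads
}

candidates = filter_covered(alleles, depth)
unique = count_unique_bases(candidates)
print(candidates)
print(unique)
```

In this toy input the three alleles with an uncovered position are discarded, mirroring the black arrows in step 3, and the remaining alleles are compared by their unshared bases, mirroring the circled bases in step 4.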
Comparison of the performance of three methods using 1000 Genomes Project data
| Methods | No. of examined alleles | Phase* | Wrong | Wrong | Accuracy | Accuracy |
|---|---|---|---|---|---|---|
| HLAreporter1 | 110 | 13 | 2 | 2 | 98% | 98% |
| PHLAT2 | 100 | - | 3 | 5 | 97% | 95% |
| HLAscan3 | 110 | - | 0 | 0 | 100% | 100% |
(1 Published [17]; 2 Published [15]; 3 In this study). * Multiple alleles were predicted due to ambiguous localization of sequence variants or unsolved phasing issues of various sequences
Comparison of HLA typing accuracies using HapMap data
| Gene | A | A | B | B | C | C | DQB1 | DQB1 | DRB1 | DRB1 |
|---|---|---|---|---|---|---|---|---|---|---|
| # alleles | 36 | 36 | 36 | 36 | 22 | 22 | 92 | 92 | 90 | 90 |
| Method | HLAreporter | HLAscan | HLAreporter | HLAscan | HLAreporter | HLAscan | HLAreporter | HLAscan | HLAreporter | HLAscan |
| Phase | 5 | - | 6 | - | 4 | - | 0 | - | 2 | - |
| Inaccurate | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 3 |
| Inaccurate* | 7 | 0 | 6 | 0 | 1 | 2 | 10 | 8 | 4 | 4 |
| Accuracy (from Inaccurate) | 100% | 100% | 100% | 100% | 100% | 100% | 98.9% | 100% | 97.8% | 96.7% |
| Accuracy (from Inaccurate*) | 80.5% | 100% | 83.3% | 100% | 95.5% | 90.9% | 89.1% | 91.3% | 95.6% | 95.6% |
Comparison of typing results obtained using HLAreporter and HLAscan for HLA-A, −B, and -C (class I) and HLA-DRB1 and -DQB1 (class II). Verified HLA typing results were reported elsewhere [12]. * Inaccurate typing includes both mistyped and ambiguous cases
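The two accuracy rows in the HapMap comparison can be checked from the counts, taking accuracy as (alleles − inaccurate) / alleles. The Python sketch below does this for a few columns, with counts copied from the table; it assumes that the unstarred "Inaccurate" row counts mistyped alleles only, and the function name `accuracy_pct` is illustrative.

```python
def accuracy_pct(n_alleles: int, n_inaccurate: int) -> float:
    """Percentage of alleles that were not counted as inaccurate."""
    return 100.0 * (n_alleles - n_inaccurate) / n_alleles


# (gene, method) -> (alleles, Inaccurate, Inaccurate*), copied from the table
counts = {
    ("A", "HLAreporter"): (36, 0, 7),
    ("C", "HLAscan"): (22, 0, 2),
    ("DQB1", "HLAreporter"): (92, 1, 10),
    ("DQB1", "HLAscan"): (92, 0, 8),
    ("DRB1", "HLAscan"): (90, 3, 4),
}

for (gene, method), (n, inaccurate, inaccurate_star) in counts.items():
    print(
        gene,
        method,
        round(accuracy_pct(n, inaccurate), 1),
        round(accuracy_pct(n, inaccurate_star), 1),
    )
```

For HLA-A with HLAreporter, 29 of 36 is 80.56%, which the table lists as 80.5%; the other values match the table after rounding to one decimal place.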
Differences in typing results of HapMap data. Known HLA typing results were reported elsewhere [12]
| Genes | Known HLA type: Allele1 | Known HLA type: Allele2 | Predictions of HLAscan: Allele1 | Predictions of HLAscan: Allele2 | # of cases |
|---|---|---|---|---|---|
| DQB1 | | xx:yy* | 02:01 | xx:yy* | 6 |
| DQB1 | | pp:qq* | 06:05 | pp:qq* | 2 |
| DRB1 | | 15:01 | 15:01 | 15:01 | 3 |
| DRB1 | | 11:04 | 14:01 | 11:04 | 1 |
Asterisks (*) indicate alleles with multiple types
Fig. 2 An example of mistyping DQB1*02:02:01:01 as DQB1*02:01:01:01. Sequence view showing actual alignment of sequence reads at exon 3 of DQB1*02:02:01:01 (a) and DQB1*02:02:01:01 (b). Consecutive dots under base calls represent sequence reads, and spaces without dots indicate that no sequence reads are aligned to the corresponding sequences. Pink spaces at position 161 show the status of sequence alignment over the SNP position that differs between DQB1*02:02:01:01 and DQB1*02:01:01:01. The actual mapping view of the sequence reads from the NA11830 sample was generated in SAMtools tview.
Accuracy prediction of PCR-SBT, HLAreporter, and HLAscan using samples from five Korean subjects
| Samples | Method | | | | | | |
|---|---|---|---|---|---|---|---|
| 77072421 NS1512240004 | PCR-SBT | 02:06 | 02:10 | 40:02 | 55:02 | 04:05 | 11:01 |
| | HLAreporter | 02:10 | 02:10 | 40:02:01 | 55:02:01 | 04:05:01 | 11:01:01 |
| | HLAscan | 02:06:01 | 02:10 | 40:02:01 | 55:02:01 | 04:05:01 | 11:01:01 |
| 77072412 NS1512240008 | PCR-SBT | 24:02 | 31:01 | 35:01 | 51:02 | 09:01 | 09:01 |
| | HLAreporter | 24:82 | 31:01:02 | 35:42:02 | 51:02:02 | 09:01:02 | 09:01:02 |
| | HLAscan | 24:02:01 | 31:01:13 | 35:01:01 | 51:02:01 | 09:01:02 | 09:01:02 |
| 77072374 NS1512240012 | PCR-SBT | 02:01 | 33:03 | 15:01 | 44:03 | 09:01 | 13:02 |
| | HLAreporter | 02:01:01 | 33:03:01 | 15:01:01 | 44:03:11 | 09:01:02 | 13:02:01 |
| | HLAscan | 02:01:01 | 33:03:23 | 15:01:01 | 44:03:01 | 09:01:02 | 13:02:01 |
| 77072406 NS1512240016 | PCR-SBT | 11:01 | 26:01 | 44:02 | 46:01 | 09:01 | 13:01 |
| | HLAreporter | 11:01:01 | 26:01:01 | 44:02:01 | 46:01:01 | 09:01:02 | 13:01:01 |
| | HLAscan | 11:01:01:01 | 26:01:01:01 | 44:02:01 | 46:01:01 | 09:01:02 | 13:01:01 |
| 77072287 NS1512240020 | PCR-SBT | 02:01 | 02:06 | 13:01 | 40:02 | 08:02 | 12:02 |
| | HLAreporter | 02:01:01 | 02:01:01 | 13:01:01 | 40:02:01 | 08:02:01 | 12:02:01 |
| | HLAscan | 02:01:01 | 02:06:01 | 13:01:01 | 40:02:01 | 08:02:01 | 12:02:01 |
Typing results different from those obtained by SBT methods are marked in red
Accuracy of HLA typing using data from nine families. Results obtained at the four-digit level are summarized in this table. A total of 520 alleles were examined with 94% accuracy (489 correct), 2.3% (12 cases) missed, and 3.7% (19 cases) mistyped
| 9 families | 90× (10 individuals): # alleles | 90×: correct | 90×: missing | 90×: wrong | 60× (17 individuals): # alleles | 60×: correct | 60×: missing | 60×: wrong | 30× (25 individuals): # alleles | 30×: correct | 30×: missing | 30×: wrong |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HLA-A | 20 | 20 | 0 | 0 | 34 | 32 | 0 | 2 | 50 | 47 | 2 | 1 |
| HLA-B | 20 | 20 | 0 | 0 | 34 | 33 | 0 | 1 | 50 | 45 | 1 | 4 |
| HLA-C | 20 | 20 | 0 | 0 | 34 | 33 | 0 | 1 | 50 | 49 | 1 | 0 |
| HLA-DQB1 | 20 | 20 | 0 | 0 | 34 | 33 | 0 | 1 | 50 | 50 | 0 | 0 |
| HLA-DRB1 | 20 | 20 | 0 | 0 | 34 | 33 | 0 | 1 | 50 | 49 | 1 | 0 |
| All | 100 | 100 | 0 | 0 | 170 | 164 | 0 | 6 | 250 | 240 | 5 | 5 |
| Percentage | | 100 | 0 | 0 | | 96.5 | 0 | 3.5 | | 96 | 2 | 2 |
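The "Percentage" row can be reproduced from the "All" row, and the same counts can be summed across the three coverage tiers. The Python sketch below uses only numbers copied from the table; the names `all_row` and `percentages` are illustrative.

```python
# Counts copied from the "All" row: coverage -> (alleles, correct, missing, wrong)
all_row = {
    "90x": (100, 100, 0, 0),
    "60x": (170, 164, 0, 6),
    "30x": (250, 240, 5, 5),
}


def percentages(n, correct, missing, wrong):
    """Return (correct, missing, wrong) as percentages of n, to one decimal."""
    if correct + missing + wrong != n:
        raise ValueError("counts do not add up to the number of alleles")
    return tuple(round(100.0 * x / n, 1) for x in (correct, missing, wrong))


per_depth = {depth: percentages(*row) for depth, row in all_row.items()}

# Column-wise sums over the three coverage tiers
totals = tuple(sum(col) for col in zip(*all_row.values()))
overall = percentages(*totals)

print(per_depth)
print(totals, overall)
```

Summed over the three tiers, the rows shown give 504 correct, 5 missing, and 11 wrong out of 520 alleles, which is not the 489/12/19 split quoted in the caption; the sketch only reproduces the table arithmetic and does not explain that difference.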
Fig. 3 Analysis of typing accuracy as a function of coverage depth. ROC curve depicting sensitivity and specificity of HLA gene prediction by HLAscan depending on depth coverage. Sensitivity and (1-specificity) were calculated by the ROC Analysis software [24], and curves in different colors were plotted for accumulated datasets at different coverage depth cutoffs.
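How sensitivity and (1-specificity) pairs are obtained for datasets accumulated at different depth cutoffs can be sketched in Python as follows. This is a generic ROC computation, not the ROC Analysis software cited in the caption; the per-call score, the correctness labels, and the depths are hypothetical toy values, and the caption does not say which quantity was used as the score.

```python
from typing import Dict, List, Tuple


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one correct and one incorrect call")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # A call is predicted "correct" when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical calls: score, whether the typing was correct, and read depth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
depths = [90, 60, 90, 30, 60, 30]

# One curve per accumulated dataset: all calls with depth >= cutoff
curves: Dict[int, List[Tuple[float, float]]] = {}
for cutoff in (30, 60, 90):
    keep = [i for i, d in enumerate(depths) if d >= cutoff]
    curves[cutoff] = roc_points([scores[i] for i in keep], [labels[i] for i in keep])

for cutoff, pts in curves.items():
    print(cutoff, pts)
```

Each entry of `curves` corresponds to one colored curve in the figure's sense: the same threshold sweep applied to the subset of calls that meets a given depth cutoff.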