Denis C Bauer, Armella Zadoorian, Laurence O W Wilson, Natalie P Thorne.
Abstract
Motivation: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative.
Year: 2018 PMID: 27802932 PMCID: PMC6019030 DOI: 10.1093/bib/bbw097
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Genomic location and nomenclature. (A) Genomic location of the HLA Class I and II genes. (B) Nomenclature of HLA alleles. (C) Tools grouped by category as discussed in the text. Underlined names denote tools designed for RNA and DNA data.
PCR-based methods used in the gold standard data sets
| Data set | PCR method | Sample numbers |
|---|---|---|
| De Bakker | PCR-SSOP amplification followed by visualization of hybridization patterns via autoradiography | 229 |
| Erlich | SO hybridization and exon sequencing using the Roche 454 GS FLX Titanium platform | 12 |
| Warren | PCR amplicons cloned and sequenced using an ABI 3730XL instrument (Class I only) | 16 |
| Liu | PCR amplification followed by Sanger sequencing of the exons (SBT) | 13 |
| Bai | PCR amplification followed by Sanger sequencing of the exons (SBT) (HLA-A and -B loci only) | 5 |
| Gourraud | PCR amplification followed by Sanger sequencing of the exons (SBT) | 1233 |
Figure 2. Gold standard. (A) Samples in common between different studies. The image does not show the 33 samples from Bai et al. (5), Liu et al. (13) and Warren et al. (16), as only three samples from Liu et al. intersect with the other studies. (B) Agreement in HLA typing of 42 samples where there were discordant results between at least two studies. (C) Total number of samples with HLA typing information and tested in this study, as well as the number of these samples used by the different prediction tools for development. Y-axis on square root scale.
Overview of the computational HLA typing methods published to date
| Tool name | Class | Resolution | Chrom-specific | Input | Method | Approach | Maintained | Tested | Data set | Self-reported two-digit accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| P | I+II | 4 digits | Y | DNA | Alignment | Two-step Bayesian classification approach involving alignment by reads to IMGT reference; keeping best scoring alignment | Y | N | 253 HapMap WES | 97 |
| HLA | I+II | 4 digits | Y | DNA | Alignment+ Assembly | Identifies HLA allele matches based on scoring system for assembled contigs. | Y | N | 82 WES from 1KG and HapMap | 100 |
| HLA | I+II | 6 digits | Y | DNA | Alignment | Coverage-based genotype of exclusively mapped read intended for use on in-solution targeted capturing | N | N | 357 cell lines | 99 |
| HLA | I+II | 8 digits | N | DNA | Alignment | Variational Bayesian and posterior distribution to optimize read alignments and scoring system for typing. | Y | Y | simulated data, CEU trio | 100 |
| | I | 4 digits | N | DNA/RNA | Alignment | Constructed hit matrix and used integer linear programming for the optimization. | Y | Y | 253 1KG exome | 97 |
| PHLAT [1] | I+II | 6 digits | Y | DNA/RNA | Alignment | Gaussian distribution for testing statistical significance of selected candidate alleles. | Y | Y | 50 HapMap RNA, 10 1KG exome, 15 Hapmap exome | 99 |
| O | I+II | 6 digits | Y | DNA | Alignment | Uses own formula for allowing mismatches, then successively discards alleles until it reports allele pairs containing high number of mapped reads and adequate exon coverage. | Y | N | 447 1KG genome, 217 1KG exome | 90 |
| HLA | I+II | 4 digits | Y | RNA | Alignment | Builds weighted alignment tree (generates own alignment probability). | N | N | simulated data, own, 50 HapMap RNAseq, 16 CRC RNAseq | 99 |
| ATHLATES [18] | I+II | 4 digits | Y | DNA | Assembly | Identifies candidate alleles based on their Hamming distance. | N | N | 16 1KG WES, 13 own | 99 |
| | I+II | 4 digits | Y | RNA | Alignment | Calculated variability at positions across exons 2 and 3 using Shannon’s entropy, and information content using binary logarithm formulation. | Y | Y | 50 HapMap RNAseq and 37 own RNAseq | 96 |
| HLA | I+II | 4 digits | Y | DNA/RNA | Alignment or Assembly | Putative HLA alleles are characterized based on scoring system of assembled contigs. | Y | Y | simulated data; 16 own RNAseq; 20 HapMap | NA |
| [24] | I+II | 4 digits | NA | DNA | Alignment + Assembly | HLA typing based on coverage information of aligned reads supplemented by contig matching for unseen mutations | N | N | 40 cell lines, 59 WGS | 99 |
Chrom-specific refers to the ability of the tool to predict the allele on each chromosome separately rather than returning the two most likely genotypes overall. 1000 Genomes is abbreviated as 1KG.
Requires the commercial aligner Novoalign.
Code not executable (conversation with developer).
Code not executable (no reply from developer).
Limit on samples for free version.
Communication with developer: discontinued.
Reports sensitivity and specificity.
No code available.
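One of the approaches listed in the overview table scores the variability at each position across exons 2 and 3 with Shannon's entropy, using the binary logarithm. A minimal sketch of per-position entropy over a set of aligned sequences is given below. It is not the tool's actual implementation: the function name `column_entropy` and the toy sequences are assumptions made purely for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(sequences: Sequence[str]) -> List[float]:
    """Shannon entropy (in bits) of each column of equal-length aligned sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")

    entropies = []
    for column in zip(*sequences):
        counts = Counter(column)
        n = len(column)
        # H = sum over symbols of p * log2(1 / p)
        h = sum((c / n) * math.log2(n / c) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical toy alignment (not real HLA exon sequence).
toy_alignment = ["ACGT", "ACGA", "ACCA", "ATCA"]
toy_entropy = column_entropy(toy_alignment)
```

Positions where all sequences agree score 0 bits, while a position split evenly between two bases scores 1 bit, so higher values flag the more polymorphic positions.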
Accuracy table for NGS data, Class I + II
| Data set (Samples) | Tool | Accuracy (Success) | | |
|---|---|---|---|---|
| WGS | optitype | 35% (71%) | | 6 |
| (993) | hlavbseq | | | 0 |
| | hlaminer assembly | 17% (36%) | 23% (49%) | 19 |
| | hlaminer alignment | 15% (26%) | 20% (35%) | 0 |
| | phlat | 38% (46%) | | 0 |
| | seq2hla* | 7% (12%) | 9% (32%) | 0 |
| WES | optitype | 49% (98%) | | 1 |
| (992) | hlavbseq | 68% (68%) | | 0 |
| | hlaminer assembly | 43% (49%) | 53% (61%) | 0 |
| | hlaminer alignment | 26% (27%) | 42% (43%) | 0 |
| | phlat | | | 0 |
| | seq2hla* | 60% (61%) | 71% (71%) | 0 |
| RNA | optitype | 50% (99%) | | 0 |
| (373) | hlavbseq* | 67% (67%) | 80% (80%) | 0 |
| | hlaminer assembly | 52% (61%) | 61% (71%) | 0 |
| | hlaminer alignment | 20% (20%) | 30% (30%) | 0 |
| | phlat | | | 0 |
| | seq2hla | 79% (79%) | | 0 |
HLA typing results for four-digit resolution on 1000 Genomes Project samples. Bold highlights the best performance in the category.
‘*’ labels tools that were not designed to handle DNA or RNA data, respectively.
Predicts Class I only and can therefore achieve an accuracy of at most 50%.
Please see Supplementary Tables S1–S3 for Class I comparison.
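The 50% ceiling for a Class I-only predictor can be illustrated with a small worked example. The sketch below assumes that accuracy is the fraction of gold-standard alleles matched by the prediction and that three Class I and three Class II loci are typed. Both the definition and the locus set are assumptions for illustration, as are the function name `allele_accuracy` and the example genotypes.

```python
from typing import Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]


def allele_accuracy(gold: Genotype, predicted: Genotype) -> float:
    """Fraction of gold-standard alleles matched by the prediction (assumed definition)."""
    total = 0
    correct = 0
    for locus, gold_pair in gold.items():
        total += len(gold_pair)
        # Copy so that each predicted allele can match at most one gold allele.
        remaining = list(predicted.get(locus, ()))
        for allele in gold_pair:
            if allele in remaining:
                remaining.remove(allele)
                correct += 1
    return correct / total if total else 0.0


# Hypothetical gold-standard genotype with Class I and Class II loci.
gold_full: Genotype = {
    "A": ("A*01:01", "A*02:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*03:01", "DRB1*15:01"),
    "DQA1": ("DQA1*01:02", "DQA1*05:01"),
    "DQB1": ("DQB1*02:01", "DQB1*06:02"),
}

# A perfect Class I-only prediction: Class II loci are simply absent.
class1_prediction: Genotype = {k: gold_full[k] for k in ("A", "B", "C")}

# Gold standard missing part of the Class II typing for this sample.
gold_partial: Genotype = {k: gold_full[k] for k in ("A", "B", "C", "DRB1")}

accuracy_full = allele_accuracy(gold_full, class1_prediction)        # 6 of 12 alleles
accuracy_partial = allele_accuracy(gold_partial, class1_prediction)  # 6 of 8 alleles
```

Under these assumptions a flawless Class I prediction scores 0.5 against a complete gold standard, and can exceed 0.5 only when Class II gold-standard calls are missing for the sample.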
Figure 3. Accuracy and success rate for each tool for the three different data sets and two different resolutions (two and four digits).
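Comparing calls at two- and four-digit resolution requires truncating allele names to the matching number of colon-separated fields. The following is a minimal sketch under the standard colon-delimited nomenclature (for example `A*02:01:01:01`); the helper name `truncate_allele` is hypothetical, and expression suffixes or an `HLA-` prefix are deliberately not handled.

```python
def truncate_allele(allele: str, digits: int) -> str:
    """Truncate an allele name such as 'A*02:01:01:01' to 2, 4, 6 or 8 digits."""
    if digits not in (2, 4, 6, 8):
        raise ValueError("digits must be one of 2, 4, 6 or 8")
    gene, sep, fields = allele.partition("*")
    if not sep or not fields:
        raise ValueError(f"not a valid allele name: {allele!r}")
    # Each colon-separated field corresponds to two digits of resolution.
    kept = fields.split(":")[: digits // 2]
    return f"{gene}*{':'.join(kept)}"


two_digit = truncate_allele("A*02:01:01:01", 2)
four_digit = truncate_allele("A*02:01:01:01", 4)
```

Two predictions that differ at four-digit resolution, such as `A*02:01` and `A*02:06`, become identical once both are truncated to two digits, which is why two-digit accuracy is never lower than four-digit accuracy for the same calls.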
Figure 4. Association between coverage and accuracy. (A) Class I + II accuracy versus the average coverage over the HLA region (6:29677984-33485635) as mapped by Razers3 [35]. Note that although the Class I-only tool predicts only Class I loci, the plot shows some samples reaching >50% because these samples lack a PCR-determined Class II genotype. (B) Correlation of the prediction accuracy for each sample between the different tools as well as the read coverage in this sample.
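How the average coverage over the HLA region was computed is not spelled out here. As a rough sketch, it can be taken as the number of aligned bases falling inside the region divided by the region length. The code below assumes 1-based inclusive coordinates and reads given as `(start, end)` intervals on chromosome 6; the names `HLA_REGION`, `region_length` and `mean_coverage` are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]

# Region quoted in the caption, assumed to be 1-based and inclusive.
HLA_REGION: Interval = (29_677_984, 33_485_635)


def region_length(region: Interval) -> int:
    """Number of positions in an inclusive interval."""
    start, end = region
    return end - start + 1


def mean_coverage(read_intervals: Iterable[Interval],
                  region: Interval = HLA_REGION) -> float:
    """Aligned bases overlapping the region divided by the region length."""
    region_start, region_end = region
    covered_bases = 0
    for read_start, read_end in read_intervals:
        overlap = min(read_end, region_end) - max(read_start, region_start) + 1
        if overlap > 0:  # reads entirely outside the region contribute nothing
            covered_bases += overlap
    return covered_bases / region_length(region)


hla_region_size = region_length(HLA_REGION)
```

In practice the intervals would come from an alignment file rather than a hand-written list, and gapped or clipped alignments would need their aligned blocks counted separately.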
Figure 5. Runtime of the different tools showing the breakdown of different tasks. Y-axis on square root scale.
Figure 6. Memory consumption of the different tools. Y-axis on square root scale.