V Albrecht1, C Zweiniger1, V Surendranath1, K Lang1, G Schöfl1, A Dahl2, S Winkler3, V Lange1, I Böhme1, A H Schmidt1,4.
Abstract
The high-throughput department of DKMS Life Science Lab encounters novel human leukocyte antigen (HLA) alleles on a daily basis. To characterise these alleles, we have developed a system to sequence the whole gene from 5'- to 3'-UTR for the HLA loci A, B, C, DQB1 and DPB1 for submission to the European Molecular Biology Laboratory - European Nucleotide Archive (EMBL-ENA) and the IPD-IMGT/HLA Database. Our workflow is based on a dual redundant sequencing strategy. Using shotgun sequencing on an Illumina MiSeq instrument and single molecule real-time (SMRT) sequencing on a PacBio RS II instrument, we are able to achieve highly accurate HLA full-length consensus sequences. Remaining conflicts are resolved using the R package DR2S (Dual Redundant Reference Sequencing). Given the relatively high throughput of this strategy, we have developed the semi-automated web service TypeLoader to aid in the submission of sequences to the EMBL-ENA and the IPD-IMGT/HLA Database. In the IPD-IMGT/HLA Database release 3.24.0 (April 2016; prior to the submission of the sequences described here), only 5.2% of all known HLA alleles had been fully characterised together with intronic and UTR sequences. So far, we have applied our strategy to characterise and submit 1056 HLA alleles, thereby more than doubling the number of fully characterised alleles. Given the increasing application of next generation sequencing (NGS) for full gene characterisation in clinical practice, extending the HLA database concomitantly is highly desirable. Therefore, we propose this dual redundant sequencing strategy as a workflow for submission of novel full-length alleles and characterisation of sequences that are as yet incomplete. This would help to mitigate the predominance of partially known alleles in the database.
Keywords: HLA typing; NGS; PacBio; full-length gene sequencing; novel HLA alleles
Year: 2017 PMID: 28547825 PMCID: PMC6084308 DOI: 10.1111/tan.13057
Source DB: PubMed Journal: HLA ISSN: 2059-2302 Impact factor: 4.513
Tools and databases
| Tools/databases | Description |
|---|---|
| NGSengine | Platform‐independent software for the high‐resolution identification of HLA alleles by NGS |
| DR2S | Dual Redundant Reference Sequencing (DR2S) is an R package designed to facilitate the generation of reliable, full‐length phase‐defined reference sequences for novel HLA alleles |
| TypeLoader | Automatic EMBL‐ENA and IPD‐IMGT/HLA Database Upload Generator. This web service takes FASTA and GenDx generated XML files, processes them and automatically generates data for direct submission to the EMBL‐ENA and the IPD‐IMGT/HLA Database |
| ENA | The EMBL‐ENA provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation |
| IPD‐IMGT/HLA Database | The IPD‐IMGT/HLA Database provides a specialist database for sequences of the MHC and includes the official sequences named by the WHO Nomenclature Committee for factors of the HLA System. The IPD‐IMGT/HLA Database is part of the international ImMunoGeneTics project (IMGT) |
EMBL‐ENA, European Molecular Biology Laboratory – European Nucleotide Archive; HLA, human leukocyte antigen; NGS, next generation sequencing; WHO, World Health Organization.
Breakdown of whole‐gene sequence submissions from DKMS Life Science Lab to IPD‐IMGT/HLA release 3.27.0
| HLA locus | Distinct: novel alleles | Distinct: extended sequences | Distinct: total | Confirmatory | Total whole‐gene sequence submissions |
|---|---|---|---|---|---|
| A | 163 | 72 | 235 | 41 | 276 |
| B | 98 | 54 | 152 | 10 | 162 |
| C | 282 | 120 | 402 | 58 | 460 |
| DQB1 | 59 | 43 | 102 | 49 | 151 |
| DPB1 | 4 | 3 | 7 | 0 | 7 |
| Total | 606 | 292 | 898 | 158 | 1,056 |
HLA, human leukocyte antigen.
Figure 1. Genomic organisation of the human leukocyte antigen (HLA) loci. The class II alleles are about 2 to 3 times the length of the class I alleles. All primers applied during this project are located outside the UTRs.
Figure 2. Workflow for full‐length HLA gene characterisation showing the dual redundant sequencing strategy using the MiSeq and PacBio RS II platforms. A, MiSeq requires a fragmentation step owing to its inability to completely sequence molecule fragments longer than 600 bp; barcodes are attached during library preparation. B, Barcoding is carried out as a part of the polymerase chain reaction (PCR).
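The dual redundant idea behind this workflow can be illustrated with a minimal sketch: two independently derived consensus sequences for the same allele are compared position by position, and every disagreement is reported for closer inspection. This is not the DR2S algorithm; the function name `consensus_conflicts`, the example sequences, and the assumption that both consensus sequences are already aligned to the same length are hypothetical.

```python
def consensus_conflicts(short_read_consensus, long_read_consensus):
    """List positions (1-based) where two pre-aligned consensus sequences disagree.

    Assumption: both inputs are already aligned and have equal length.
    Real pipelines would align the sequences first.
    """
    if len(short_read_consensus) != len(long_read_consensus):
        raise ValueError("consensus sequences must be aligned to equal length")
    conflicts = []
    for index, (short_base, long_base) in enumerate(
        zip(short_read_consensus, long_read_consensus), start=1
    ):
        if short_base != long_base:
            conflicts.append((index, short_base, long_base))
    return conflicts


# Hypothetical toy sequences, not real HLA data.
miseq_consensus = "ACGTACGT"
pacbio_consensus = "ACGAACGT"
print(consensus_conflicts(miseq_consensus, pacbio_consensus))
```

An empty list means the two platforms agree at every position; any reported position is a candidate for the conflict resolution step described in the abstract.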
Figure 3. Limitations of each sequencing method. A, Illustration of the inability to phase accurately using short sequencing‐by‐synthesis (SBS) reads ( represents a yet unnamed novel allele). B, Inability to call homopolymer consensus sequences accurately due to a high insertion/deletion sequencing error rate with long single molecule real‐time (SMRT) reads.
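Both limitations can be made concrete with a small sketch. The first function checks whether neighbouring heterozygous positions lie close enough together to be covered by a single fragment. The second lists homopolymer runs, the regions where insertion/deletion errors make a long-read consensus less reliable. The function names, the example positions, the toy sequence and the `min_len=5` threshold are all hypothetical. The 600 bp span is taken from the fragment length limit mentioned for the MiSeq in Figure 2 and is used here only as an illustrative cut-off.

```python
from itertools import groupby


def linkable_pairs(het_positions, max_span):
    """For each pair of neighbouring heterozygous positions, report whether
    one fragment of at most max_span bases could cover both."""
    positions = sorted(het_positions)
    pairs = []
    for left, right in zip(positions, positions[1:]):
        covered = right - left + 1  # bases needed to span both positions
        pairs.append((left, right, covered <= max_span))
    return pairs


def is_fully_phasable(het_positions, max_span):
    """True if every neighbouring pair can be linked, so phase can be chained."""
    return all(ok for _, _, ok in linkable_pairs(het_positions, max_span))


def homopolymer_runs(sequence, min_len=5):
    """Return (0-based start, base, length) for runs of identical bases."""
    runs = []
    start = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_len:
            runs.append((start, base, length))
        start += length
    return runs


# Hypothetical heterozygous positions within one amplicon.
het_sites = [120, 410, 1950, 2100]
print(linkable_pairs(het_sites, max_span=600))
print(is_fully_phasable(het_sites, max_span=600))
print(homopolymer_runs("ACGTTTTTTAGGGGGC"))
```

In this toy case the gap between positions 410 and 1950 cannot be bridged by a 600 bp fragment, so short reads alone leave the phase ambiguous, whereas a read spanning the whole amplicon would link all four sites.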
Proportion of null and Q alleles among the submitted sequences
| HLA locus | Submitted unique sequences | Null alleles | Q alleles | Total null and Q alleles |
|---|---|---|---|---|
| A | 235 | 14 | 4 | 18 (8%) |
| B | 152 | 5 | 0 | 5 (3%) |
| C | 402 | 14 | 6 | 20 (5%) |
| DQB1 | 102 | 0 | 2 | 2 (2%) |
| DPB1 | 7 | 0 | 0 | 0 (0%) |
| Total | 898 | 33 | 12 | 45 (5%) |
HLA, human leukocyte antigen.
Effect of the submitted dataset on the number of fully characterised alleles in the IPD‐IMGT/HLA Database release 3.27.0
| HLA locus | All described alleles (release 3.27.0) | All fully characterised alleles (release 3.27.0) | Submitted fully characterised alleles |
|---|---|---|---|
| A | 3830 | 469 (12.2%) | 235 (6.1%) |
| B | 4646 | 461 (9.9%) | 152 (3.3%) |
| C | 3382 | 558 (16.5%) | 402 (11.9%) |
| DRB1 | 2010 | 41 (2.0%) | 0 (0%) |
| DQB1 | 1054 | 122 (11.6%) | 102 (9.7%) |
| DPB1 | 740 | 46 (6.2%) | 7 (0.9%) |
| Total | 15 662 | 1697 (10.8%) | 898 (5.7%) |
HLA, human leukocyte antigen.
Fully characterised alleles include the 5′‐ and 3′‐UTR.
Figure 4. Effects on the IPD‐IMGT/HLA Database. The number of fully characterised human leukocyte antigen (HLA) alleles including the 5′‐ and 3′‐UTR in the IPD‐IMGT/HLA Database release 3.27.0 was more than doubled after the submission of 898 unique full‐length sequences (submitted novel alleles [red]; genomic extension of extant allele sequences [green]; extant fully characterised HLA alleles in IPD‐IMGT/HLA Database release 3.27.0 [blue]).
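The "more than doubled" statement can be checked directly from the totals in the table for release 3.27.0. The short calculation below is a sketch that assumes all 898 submitted distinct sequences are counted among the 1697 fully characterised alleles of that release, as the table and caption suggest; the function and variable names are illustrative.

```python
def growth_summary(fully_characterised_after, submitted, all_described):
    """Derive the pre-submission count and the resulting fold increase."""
    before = fully_characterised_after - submitted
    return {
        "before": before,
        "fold_increase": fully_characterised_after / before,
        "share_before_pct": 100 * before / all_described,
        "share_after_pct": 100 * fully_characterised_after / all_described,
    }


# Totals from the table for IPD-IMGT/HLA Database release 3.27.0.
summary = growth_summary(
    fully_characterised_after=1697,
    submitted=898,
    all_described=15662,
)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under that assumption, 799 fully characterised alleles existed without the submitted set, and the submission raises the count by a factor of about 2.1.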
Frequency of novel alleles submitted during this project among DKMS samples
| HLA allele | Observations | Total samples | Frequency (%) |
|---|---|---|---|
|
| 82 | 0.6 | 6.4 × 10−5 |
|
| 62 | 0.6 | 4.8 × 10−5 |
|
| 11 | 1.3 | 4.1 × 10−6 |
|
| 6 | 1.4 | 2.2 × 10−6 |
|
| 4 | 1.3 | 1.5 × 10−6 |
|
| 3 | 1.3 | 1.1 × 10−6 |
|
| 3 | 1.4 | 1.1 × 10−6 |
|
| 3 | 1.4 | 1.1 × 10−6 |
|
| 3 | 2.0 | 7.4 × 10−7 |
|
| 2 | 1.4 | 7.4 × 10−7 |
|
| 1 | 0.7 | 7.4 × 10−7 |
|
| 1 | 1.3 | 3.7 × 10−7 |
|
| 1 | 1.3 | 3.7 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 2.0 | 2.5 × 10−7 |
|
| 1 | 2.0 | 2.5 × 10−7 |
|
| 1 | 1.4 | 3.7 × 10−7 |
|
| 1 | 2.0 | 2.5 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 2.0 | 2.5 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 2.0 | 2.5 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 1.4 | 3.7 × 10−7 |
|
| 1 | 0.7 | 7.2 × 10−7 |
|
| 1 | 1.4 | 3.7 × 10−7 |
Enquiry date: August 30, 2016.
Depending on the time of submission and incorporation into the IPD‐IMGT/HLA Database, the total number of samples differs.