Shingo Suzuki1, Swati Ranade2, Ken Osaki3, Sayaka Ito1, Atsuko Shigenari1, Yuko Ohnuki1, Akira Oka4, Anri Masuya5, John Harting2, Primo Baybayan2, Miwako Kitazume3, Junichi Sunaga3, Satoko Morishima6, Yasuo Morishima7, Hidetoshi Inoko5, Jerzy K Kulski1,8, Takashi Shiina1.
Abstract
Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short-read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference-grade, high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA locus-specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2.
We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
Keywords: HLA; Ion PGM; NGS; PacBio RS II; SMRT sequencing; genotyping; human leukocyte antigen; next-generation sequencing
Year: 2018 PMID: 30337930 PMCID: PMC6180199 DOI: 10.3389/fimmu.2018.02294
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. DNA-based HLA genotyping scheme for sequencing platforms representing first-, second-, and third-generation technologies. The phase ambiguities of the G/A SNVs at loci SNV1, SNV2, and SNV3 are easier to resolve using second- and/or third-generation sequencing than first-generation Sanger sequencing.
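The phase ambiguity mentioned in the Figure 1 caption can be illustrated with a small enumeration. The sketch below is not the authors' method; it only assumes that each of the three sites is observed as an unphased G/A heterozygote and lists every haplotype pair consistent with that observation. The function name `possible_diplotypes` is hypothetical.

```python
from itertools import product


def possible_diplotypes(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of 2-tuples holding the two alleles seen at each site,
    e.g. ("G", "A") for a heterozygous G/A call.
    """
    pairs = set()
    # For each site, choose which of the two alleles sits on haplotype 1;
    # haplotype 2 receives the other allele.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort the pair so that (hap1, hap2) and (hap2, hap1) count once.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Three heterozygous G/A sites, as in the SNV1-SNV3 example of the caption.
sites = [("G", "A")] * 3
diplotypes = possible_diplotypes(sites)
for hap1, hap2 in diplotypes:
    print(hap1, "/", hap2)
```

With n heterozygous sites there are 2^(n-1) candidate haplotype pairs, so three sites leave four possibilities when phase is unknown. Reads that span all three sites on a single molecule observe one haplotype directly, which is why longer reads reduce this ambiguity.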
Characterization of 253 distinct full-length HLA alleles.
| A | 20 | 92 | 0 | 1 | 0 | 1.1% | 2 | 3.3% |
| B | 46 | 92 | 2 | 5 | 7 | 15.2% | 8 | 23.9% |
| C | 26 | 92 | 0 | 2 | 9 | 12.0% | 29 | 43.5% |
| DRB1 | 41 | 92 | 0 | 45 | 0 | 48.9% | 37 | 89.1% |
| DQA1 | 36 | 92 | 0 | 28 | 1 | 31.5% | 21 | 54.3% |
| DQB1 | 28 | 92 | 0 | 32 | 5 | 40.2% | 5 | 45.7% |
| DPA1 | 16 | 92 | 3 | 10 | 1 | 15.2% | 69 | 90.2% |
| DPB1 | 40 | 92 | 0 | 30 | 0 | 32.6% | 22 | 56.5% |
| Total | 253 | 736 | 5 | 153 | 23 | 24.6% | 193 | 50.8% |
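The header row of the table above did not survive extraction, so the meaning of the individual count columns is not stated here. The numbers themselves suggest one reading, which the following sketch checks: the first percentage equals the sum of the three preceding count columns divided by the third column (92 per locus, 736 in total), and the second percentage adds the following count column to that sum. This is an inference from the arithmetic, not a statement from the source, and the names `rows`, `pct`, and `check_row` are hypothetical.

```python
# Each entry: (three count columns, first percentage, next count column,
#              second percentage, denominator taken from the third table column).
rows = {
    "A":     ((0, 1, 0), 1.1, 2, 3.3, 92),
    "B":     ((2, 5, 7), 15.2, 8, 23.9, 92),
    "C":     ((0, 2, 9), 12.0, 29, 43.5, 92),
    "DRB1":  ((0, 45, 0), 48.9, 37, 89.1, 92),
    "DQA1":  ((0, 28, 1), 31.5, 21, 54.3, 92),
    "DQB1":  ((0, 32, 5), 40.2, 5, 45.7, 92),
    "DPA1":  ((3, 10, 1), 15.2, 69, 90.2, 92),
    "DPB1":  ((0, 30, 0), 32.6, 22, 56.5, 92),
    "Total": ((5, 153, 23), 24.6, 193, 50.8, 736),
}


def pct(count, total):
    """Percentage rounded to one decimal place, as printed in the table."""
    return round(100 * count / total, 1)


def check_row(counts, first_pct, extra, second_pct, total):
    """Return True if both printed percentages match the assumed formula."""
    subtotal = sum(counts)
    return (pct(subtotal, total) == first_pct
            and pct(subtotal + extra, total) == second_pct)


results = {locus: check_row(*row) for locus, row in rows.items()}
print(results)
```

Under this reading, the denominator of 92 per locus corresponds to two chromosomes for each of the 46 subjects.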
Identification and classification of novel variants in exonic regions.
| Locus | HLA-B | HLA-B | HLA-DPA1 | HLA-DPA1 | ||
| Novel allele name | B*15:428 | B*39:02:03 | DPA1*02:07:01:01 | DPA1*02:08 | ||
| Reference | B*15:28 | B*39:02:02 | DPA1*02:02:01 | DPA1*02:01:01:02 | ||
| IMGT num. of reference | HLA00191 | HLA00275 | HLA00508 | HLA14197 | ||
| Reference position | 25 | 1,008 | 251 | 361 | 442 | 4,893 |
| Location | Exon 1 | Exon 5 | Exon 3 | Exon 3 | Exon 3 | Exon 4 |
| Nucleotide Reference | G | C | G | A | A | G |
| Nucleotide Variant | C | T | A | G | G | A |
| Amino acid substitution | V9L | - | V122M | - | - | R249H |
A hyphen indicates a synonymous substitution.
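The allele names in the table above follow the standard HLA nomenclature, in which colon-separated fields after the asterisk denote, in order, the allele group, the specific protein, synonymous coding changes, and non-coding differences. The sketch below splits a name into those fields and reports its resolution. The helper names `parse_hla` and `resolution` are hypothetical, and expression suffixes such as a trailing N are deliberately not handled.

```python
import re
from typing import NamedTuple, Tuple


class HLAName(NamedTuple):
    gene: str                 # e.g. "DPA1"
    fields: Tuple[str, ...]   # e.g. ("02", "07", "01", "01")


# Optional "HLA-" prefix, gene name, asterisk, then one to four numeric fields.
_NAME = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})$")


def parse_hla(name: str) -> HLAName:
    """Split an allele name such as 'DPA1*02:07:01:01' into gene and fields."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    gene, digits = match.groups()
    return HLAName(gene, tuple(digits.split(":")))


def resolution(name: str) -> int:
    """Number of fields present (1 to 4)."""
    return len(parse_hla(name).fields)


for allele in ("B*15:428", "B*39:02:03", "DPA1*02:07:01:01", "DPA1*02:08"):
    parsed = parse_hla(allele)
    print(allele, "->", parsed.gene, parsed.fields, "fields:", resolution(allele))
```

This matches the pattern visible in the table: B*39:02:03 differs from its reference B*39:02:02 only in the third field and carries a synonymous change, whereas B*15:428 and DPA1*02:08 carry amino acid substitutions and receive a new second field.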
The number of indels and SNVs among the alleles at each HLA locus.
| A | 20 | 5,436 | 13 | 5,396 | 336 | 3.1 |
| B | 46 | 4,575 | 11 | 4,541 | 285 | 1.4 |
| C | 26 | 4,760 | 12 | 4,733 | 246 | 2.0 |
| DRB1 | 41 | 18,740 | 144 | 10,262 | 1,727 | 2.2 |
| DQA1 | 36 | 7,801 | 77 | 7,240 | 957 | 3.4 |
| DQB1 | 28 | 8,393 | 107 | 7,551 | 1,247 | 5.3 |
| DPA1 | 16 | 9,766 | 28 | 9,662 | 348 | 2.2 |
| DPB1 | 40 | 12,306 | 22 | 12,147 | 343 | 0.7 |
Figure 2. Gene structure, DR supratypes, and phylogenetic relationships of the HLA-DRB1 allele sequences. Three of the DR supratypes are labeled DR53, DR51, and DR52, and their component alleles are listed underneath the horizontal lines. The phylogenetic tree of the HLA-DRB1 alleles, ranging from DRB1*07 on the left of the figure to DRB1*03 on the right, was constructed using the Neighbor-Joining method and a 1,962 bp nucleotide alignment that included 261 bp of AluJb, 492 bp of L2a, 281 bp of AluSc8, 655 bp of L2a, and 65 bp of MER53 in intron 1, 145 bp of MIR in intron 2, and 63 bp of MIR in intron 5. Numbers on the branches are bootstrap support values. Red letters indicate Alu and LINE sequences that may have been inserted comparatively recently, within the last 10 million years. Gray bars indicate the exonic regions, and the white open regions between the gray bars, shown with the retroelement lists, represent introns 1 to 5 between exons 1 and 6.
Figure 3. Nucleotide diversity and phylogenetic analyses using the HLA-DPB1 allele sequences. (A) Nucleotide diversity profiles were constructed using three nucleotide alignments: 12,306 bp (whole gene region), 5,551 bp (segment 1), and 6,755 bp (segment 2). The nucleotide lengths (bp) are shown with indels (no parentheses) and without indels (in parentheses). The A1, A2, and A3 matrix windows show the diversity profiles using 38 HLA-DPB1 alleles (18 DP2 group alleles and 20 DP5 group alleles), excluding two recombinant alleles, DPB1*17:01:01:01 and DPB1*19:01:01:01. The red peak and valley profiles within the matrix windows indicate the changes in SNV/Kb across the aligned sequences, and the black bars indicate indel numbers among the alleles. The average SNV/Kb for the sequence alignment is shown on the top line of each matrix window. (B) Phylogenetic trees using 19 representative HLA-DPB1 alleles were constructed by the Neighbor-Joining method. The three phylogenetic trees (B1–B3) represent the 12,147 bp nucleotide alignment of the HLA-DPB1 whole gene region (B1), the 5,542 bp nucleotide alignment of the enhancer-promoter region to exon 2 (B2), and the 6,605 bp nucleotide alignment of the intron 2–3′UTR region (B3). Red letters and backgrounds indicate the DP2 group (rs9277534: A) alleles, and blue letters and backgrounds indicate the DP5 group (rs9277534: G) alleles. Numbers on the branches are bootstrap support values.
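A windowed SNV/Kb profile of the kind described in panel A can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: a column counts as an SNV when the aligned sequences show more than one base and no gap, a column containing any gap (`-`) counts as an indel column, and density is expressed per 1,000 ungapped columns. The window and step sizes, the toy alignment, and the function names are all hypothetical.

```python
def column_kinds(alignment):
    """Classify each alignment column as 'indel', 'snv', or 'same'."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    kinds = []
    for column in zip(*alignment):
        bases = set(column)
        if "-" in bases:
            kinds.append("indel")   # at least one sequence has a gap here
        elif len(bases) > 1:
            kinds.append("snv")     # ungapped column with more than one base
        else:
            kinds.append("same")
    return kinds


def snv_per_kb(alignment, window, step):
    """Return (window start, SNVs per 1,000 ungapped columns) for each window."""
    kinds = column_kinds(alignment)
    profile = []
    for start in range(0, len(kinds) - window + 1, step):
        chunk = kinds[start:start + window]
        ungapped = sum(kind != "indel" for kind in chunk)
        snvs = chunk.count("snv")
        density = 1000 * snvs / ungapped if ungapped else 0.0
        profile.append((start, density))
    return profile


# Toy alignment: one gapped column (index 3) and one SNV column (index 7).
toy = ["ACGTACGTAC",
       "ACGTACGAAC",
       "ACG-ACGTAC"]
profile = snv_per_kb(toy, window=5, step=5)
print(profile)
```

Counting gapped columns is only a rough proxy for indel numbers, since one indel event can span several columns. For real data the sequences would come from a multiple sequence alignment of the allele sequences, with a window size chosen to suit the gene length.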
Comparison of rheumatoid arthritis (RA)-susceptible, resistant and non-association alleles in the DRB1*04 group sequences.
| DRB1*04:05 | 1.4 × | 3.3 | DRB1*04:05:01:01 | L | L | E | Q | R | R | A | (+) | (+) | |||
| DRB1*04:05:01:02 | (+) | (+) | |||||||||||||
| DRB1*04:05:01:03 | (+) | (+) | |||||||||||||
| DRB1*04:01 | 4.3 × | 2.8 | DRB1*04:01:01:03 | L | L | E | Q | K | R | A | (+) | (+) | |||
| DRB1*04:10 | 0.01 | 1.8 | DRB1*04:10:03 | L | L | E | Q | R | R | A | (+) | (+) | |||
| DRB1*04:06 | 0.0005 | 0.5 | DRB1*04:06:01 | L | L | E | Q | R | R | A | (+) | (+) | |||
| DRB1*04:03 | 0.0012 | 0.5 | DRB1*04:03:01:02 | L | L | E | Q | R | R | A | (+) | (+) | |||
| DRB1*04:07 | 0.0001 | 0.1 | DRB1*04:07:01:02 | L | L | E | Q | R | R | A | (+) | (+) | |||
| Other alleles | LIF | L | E | QRD | RKEA | R | AG | AEQLR | – | – | – | ||||
RA-susceptible (red) and resistant (blue) alleles are shown in this table. Amino acid and nucleotide positions are based on the sequence alignment tool of the IPD-IMGT/HLA database.