Takahiro Mimori, Jun Yasuda, Yoko Kuroki, Tomoko F Shibata, Fumiki Katsuoka, Sakae Saito, Naoki Nariai, Akira Ono, Naomi Nakai-Inagaki, Kazuharu Misawa, Keiko Tateno, Yosuke Kawai, Nobuo Fuse, Atsushi Hozawa, Shinichi Kuriyama, Junichi Sugawara, Naoko Minegishi, Kichiya Suzuki, Kengo Kinoshita, Masao Nagasaki, Masayuki Yamamoto.
Abstract
Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, its importance in organ and blood stem cell transplantation, and the associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, the Primer-Separation Assembly and Refinement Pipeline (PSARP), which uses the SMRT sequencing data together with additional short-read data. The panel consisted of 139 alleles, all of which extended known IPD-IMGT/HLA sequences and 40 of which contained novel variants; it captured more than 96.5% of the allelic diversity in 1KJPN. These newly available sequences are expected to be important resources for research and clinical applications, including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.
Year: 2018 PMID: 29352165 PMCID: PMC6462828 DOI: 10.1038/s41397-017-0010-4
Source DB: PubMed Journal: Pharmacogenomics J ISSN: 1470-269X Impact factor: 3.550
Fig. 1 ToMMo HLA panel construction with PSARP. a Overview of ToMMo HLA panel construction: the workflow for constructing the ToMMo HLA panel with the Primer-Separation Assembly and Refinement Pipeline (PSARP). PSARP consists of three parts: assembly, refinement, and filtering, each described in the Materials and Methods. b Illustration of refinement in PSARP. The left panel shows a multiple sequence alignment (MSA) of the HLA-A gene in the Draft HLA panel, which is used for variant identification. The middle panel illustrates the variant validation process, in which Sample 1 has heterozygous draft alleles for HLA-A (A-1 and A-3) and Sample 2 is homozygous for A-2. For a variant in Sample 1, “AGTT” in A-1 is supported by WGS reads, whereas “GC--” in A-3 is not, because the WGS reads show “GCT-”. For Sample 2, the variant in the A-2 allele is supported. After correction, A-3 is merged into A-2. The right panel shows the Refined HLA panel derived from the correction process.
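The validation-and-merge step in panel b can be illustrated with a toy sketch. It assumes that each draft allele is reduced to its aligned segment at a single MSA variant site and that each sample's WGS reads have already been summarized as the set of segments they support; the functions `refine_site` and `merge_identical` and this data layout are hypothetical and are not taken from PSARP. In the example, the A-2 segment "GCT-" is also an assumption, chosen so that corrected A-3 matches A-2 as the caption describes.

```python
from typing import Dict, List, Set


def refine_site(draft: Dict[str, str],
                carriers: Dict[str, List[str]],
                wgs_support: Dict[str, Set[str]]) -> Dict[str, str]:
    """Hypothetical refinement at one MSA variant site.

    draft: allele name -> aligned segment at the site ('-' marks a gap).
    carriers: sample name -> draft alleles assigned to that sample.
    wgs_support: sample name -> segments observed in that sample's WGS reads.
    """
    refined = dict(draft)
    for sample, alleles in carriers.items():
        observed = wgs_support[sample]
        unsupported = [a for a in alleles if draft[a] not in observed]
        explained = {draft[a] for a in alleles if draft[a] in observed}
        # Read-supported segments not explained by any supported allele of this sample.
        leftovers = sorted(observed - explained)
        # Correct only the unambiguous case: one unsupported allele, one unexplained segment.
        if len(unsupported) == 1 and len(leftovers) == 1:
            refined[unsupported[0]] = leftovers[0]
    return refined


def merge_identical(refined: Dict[str, str]) -> Dict[str, str]:
    """Map each allele to the first allele (by name) carrying an identical sequence."""
    first_seen: Dict[str, str] = {}
    mapping: Dict[str, str] = {}
    for name in sorted(refined):
        first_seen.setdefault(refined[name], name)
        mapping[name] = first_seen[refined[name]]
    return mapping


draft = {"A-1": "AGTT", "A-2": "GCT-", "A-3": "GC--"}
carriers = {"Sample 1": ["A-1", "A-3"], "Sample 2": ["A-2", "A-2"]}
wgs = {"Sample 1": {"AGTT", "GCT-"}, "Sample 2": {"GCT-"}}
refined = refine_site(draft, carriers, wgs)
print(merge_identical(refined))  # {'A-1': 'A-1', 'A-2': 'A-2', 'A-3': 'A-2'}
```

A real pipeline would aggregate evidence over all carriers and all variant sites of an allele; the sketch handles a single site and only the unambiguous correction case.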
Fig. 2 Overview of PCR products of the designed primers. Locations of the 5′-end and 3′-end primer sequences for the HLA-A, HLA-B, and HLA-C genes are shown in hg19 coordinates, together with the target and co-amplified PCR products. Primer names “A–F” and “A–R” denote the forward and reverse primers, respectively, for the HLA-A gene. For each of “A–F(1)”, “C–F(1)”, and “B–R(1)”, the edit distance of the primer sequence from the corresponding sequence in the hg19 reference is shown in parentheses after the primer name.
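The caption does not define the edit distance further; assuming it is the usual Levenshtein distance (the minimum number of single-base substitutions, insertions, and deletions), it can be computed with the standard dynamic program sketched below. The primer and reference strings are hypothetical toy sequences, not the actual primers.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions, and deletions turning a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # delete ca
                          curr[j - 1] + 1,           # insert cb
                          prev[j - 1] + (ca != cb))  # substitute, free when bases match
        prev = curr
    return prev[len(b)]


primer = "ACGTTGCA"     # hypothetical primer sequence
reference = "ACGATGCA"  # hypothetical hg19 segment at the primer site
print(edit_distance(primer, reference))  # 1
```

Only two rows of the table are kept at a time, so memory grows with the shorter dimension rather than with the product of the lengths.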
Summary of sequence extensions in the ToMMo HLA panel
| Region | Measure | HLA-A | HLA-B | HLA-C | HLA-H | Total |
|---|---|---|---|---|---|---|
| | Number of alleles | 26 | 67 | 37 | 9 | 139 |
| | Mean length (bp) | 5171 (3470) | 4098 (3239) | 4465 (3350) | 5155 (3457) | – |
| Upstream region | Mean length (bp) | 833 (284) | 338 (250) | 546 (282) | 820 (299) | – |
| Upstream region | Novel variants | 27 | 2 | 9 | 9 | 47 |
| Downstream region | Mean length (bp) | 1425 (273) | 1077 (307) | 1022 (171) | 1437 (261) | – |
| Downstream region | Novel variants | 46 | 37 | 35 | 27 | 145 |
HLA human leukocyte antigen, ToMMo Tohoku Medical Megabank Organization
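The totals in the table above can be checked directly. In addition, if the values in parentheses are read as the mean lengths of the corresponding IPD-IMGT/HLA sequences (an interpretation the table itself does not state), the gain in mean total length should equal the upstream gain plus the downstream gain. The sketch below transcribes the table and computes that residual; since every mean is rounded, a residual of about 1 bp is expected.

```python
# Values transcribed from the table: (panel mean length, value in parentheses), in bp.
total = {"A": (5171, 3470), "B": (4098, 3239), "C": (4465, 3350), "H": (5155, 3457)}
upstream = {"A": (833, 284), "B": (338, 250), "C": (546, 282), "H": (820, 299)}
downstream = {"A": (1425, 273), "B": (1077, 307), "C": (1022, 171), "H": (1437, 261)}
alleles = {"A": 26, "B": 67, "C": 37, "H": 9}
novel_up = {"A": 27, "B": 2, "C": 9, "H": 9}
novel_down = {"A": 46, "B": 37, "C": 35, "H": 27}

# The "Total" column should equal the per-gene sums.
assert sum(alleles.values()) == 139
assert sum(novel_up.values()) == 47
assert sum(novel_down.values()) == 145


def gain(pair):
    """Panel value minus the parenthesized value."""
    panel, paren = pair
    return panel - paren


def residuals():
    """Total gain minus (upstream gain + downstream gain), per gene, in bp."""
    return {g: gain(total[g]) - gain(upstream[g]) - gain(downstream[g]) for g in total}


for gene, r in residuals().items():
    print(gene, gain(total[gene]), r)
```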
Novel alleles in the ToMMo HLA panel compared with the closest subtypes in the IPD-IMGT/HLA database
| | HLA-A | HLA-B | HLA-C | HLA-H | Total |
|---|---|---|---|---|---|
| Novel up to 8-digit | 8 | 21 | 5 | 6 | 40 |
| Added intron sequences | 1 | 3 | 1 | 3 | 8 |
| Novel up to 6-digit | 2 | 4 | 1 | 5 | 12 |
| Novel up to 4-digit | 2 | 2 | 1 | 5 | 10 |
IPD immuno polymorphism database, IMGT international ImMunoGeneTics information system, HLA human leukocyte antigen, ToMMo Tohoku Medical Megabank Organization
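In IPD-IMGT/HLA nomenclature, the second field of a name distinguishes protein sequences, the third field synonymous differences in the coding region, and the fourth field differences in non-coding regions; a 4-digit, 6-digit, or 8-digit name keeps the first two, three, or four fields. The counts above are therefore nested. The sketch below shows how such nested tallies could be produced, assuming each novel allele is summarized by the categories of its differences from the closest known subtype; the category labels and the toy input are hypothetical.

```python
from typing import Dict, Iterable, List

# Coarsest name resolution (in digits) at which each kind of difference becomes visible.
RESOLUTION_OF = {"nonsynonymous": 4, "synonymous": 6, "noncoding": 8}


def novel_resolutions(differences: Iterable[str]) -> List[int]:
    """Resolutions (4, 6, 8 digits) at which an allele differs from its closest subtype.

    A difference visible at a coarse resolution is also visible at every finer one.
    """
    coarsest = min((RESOLUTION_OF[d] for d in differences), default=None)
    if coarsest is None:
        return []
    return [r for r in (4, 6, 8) if r >= coarsest]


def tally(alleles: Iterable[Iterable[str]]) -> Dict[int, int]:
    """Count alleles that are novel at each resolution."""
    counts = {4: 0, 6: 0, 8: 0}
    for diffs in alleles:
        for r in novel_resolutions(diffs):
            counts[r] += 1
    return counts


# Hypothetical toy input: one list of difference categories per allele.
toy = [["nonsynonymous"], ["synonymous", "noncoding"], ["noncoding"], []]
print(tally(toy))  # {4: 1, 6: 2, 8: 3}
```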
Variants in the coding regions of novel alleles
| ToMMo HLA | Freq. | IMGT/HLA | Exon | Pos. | Var. | AA pos. | AA alt. |
|---|---|---|---|---|---|---|---|
| A_00012 | 1 | A*11:01:01 | 2 | 53 | A > G | 43 | K > E |
| | | A*11:77 | 4 | 256 | A > G | 292 | K > E |
| A_00021 | 1 | A*26:03:01 | 3 | 73 | A > G | 139 | Q > R |
| B_00021 | 2 | B*39:02:01 | 2 | 173 | A > G | – | Synonymous |
| | | B*39:02:02 | 5 | 113 | T > C | – | Synonymous |
| B_00028 | 1 | B*40:01:02 | 3 | 254 | G > C | – | Synonymous |
| B_00053 | 1 | B*54:01:01 | 2 | 69 | G > T | 48 | A > S |
| B_00063 | 2 | B*56:04 | 3 | 20 | G > C | 121 | R > S |
| | | | 3 | 76 | TA > AC | 140 | L > Y |
| | | | 3 | 120 | A > C | 155 | S > R |
| | | | 3 | 134 | G > C | – | Synonymous |
| | | | 3 | 216 | CT > AC | 187 | L > T |
| C_00014 | 1 | C*03:04:01 | 3 | 63 | G > A | 136 | G > R |
IMGT international ImMunoGeneTics information system, HLA human leukocyte antigen, ToMMo Tohoku Medical Megabank Organization, Freq. frequency, Pos. position, Var. variant, AA amino acid, alt. alternative
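Whether a coding substitution is synonymous or changes an amino acid, as recorded in the last two columns above, follows from translating the affected codons before and after the change. The sketch below uses the standard genetic code; the function `coding_changes` and the toy coding sequences are hypothetical, and its amino-acid positions count from the first codon of the toy sequence, not in the HLA numbering used in the table.

```python
from typing import List, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate the complete codons of an uppercase coding sequence ('*' = stop)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def coding_changes(cds: str, pos: int, ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Amino-acid changes from replacing ref with alt at 1-based position pos.

    Only equal-length substitutions are handled; an empty list means synonymous.
    """
    if len(ref) != len(alt):
        raise ValueError("only substitutions of equal length are supported")
    start = pos - 1
    if cds[start:start + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    mutated = cds[:start] + alt + cds[start + len(ref):]
    before, after = translate(cds), translate(mutated)
    return [(n, a, b) for n, (a, b) in enumerate(zip(before, after), start=1) if a != b]


for pos, ref, alt in [(4, "A", "G"), (6, "A", "G")]:
    changes = coding_changes("ATGAAAGGG", pos, ref, alt)
    print(pos, f"{ref} > {alt}", changes or "Synonymous")
# 4 A > G [(2, 'K', 'E')]
# 6 A > G Synonymous
```

Translating the whole sequence before and after the change also handles multi-base substitutions such as "TA > AC", which may touch one or two codons.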
Fig. 3 Allele distribution of the 208 samples in the ToMMo HLA panel. The alleles in the ToMMo HLA panel are shown for each HLA gene; each row corresponds to a unique allele, and its width is proportional to the allele frequency within the 208 samples. The four inner columns of each row correspond, from left to right, to the 4-digit, 6-digit, and 8-digit IMGT/HLA names of the allele and to the allele itself. A color-filled rectangle indicates a novel sequence identified in the panel, except that gray marks untyped alleles. The rightmost column is entirely filled with color because every allele in the panel has novel extended sequence that is not found in the database.
Fig. 4 Coverage of the 1KJPN allele distribution by the ToMMo HLA panel. For each of HLA-A, HLA-B, and HLA-C at 4-digit, 6-digit, and 8-digit resolution, the distribution of 1KJPN alleles covered by the ToMMo HLA panel is shown. The numbers above the stacked bars are the overall fractions of 1KJPN alleles covered by the panel. Allele names are shown for alleles with frequencies >2% in the 1KJPN population.
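Coverage at a given resolution can be sketched by truncating every allele name to its first two, three, or four colon-separated fields (4-, 6-, or 8-digit) and counting the fraction of population allele copies whose truncated name also occurs in the panel. This is an assumed definition for illustration; the caption does not spell out the exact calculation, and the allele lists below are hypothetical toy data, not 1KJPN counts.

```python
from collections import Counter
from typing import Iterable

FIELDS_FOR_DIGITS = {4: 2, 6: 3, 8: 4}


def truncate(name: str, digits: int) -> str:
    """Keep the first 2, 3, or 4 fields of a name such as 'A*24:02:01:01'."""
    gene, fields = name.split("*", 1)
    return gene + "*" + ":".join(fields.split(":")[:FIELDS_FOR_DIGITS[digits]])


def coverage(population: Iterable[str], panel: Iterable[str], digits: int) -> float:
    """Fraction of population allele copies whose truncated name occurs in the panel."""
    panel_names = {truncate(p, digits) for p in panel}
    counts = Counter(truncate(a, digits) for a in population)
    total = sum(counts.values())
    covered = sum(n for name, n in counts.items() if name in panel_names)
    return covered / total if total else 0.0


# Hypothetical toy data: four allele copies in the population, one allele in the panel.
population = ["A*24:02:01:01", "A*24:02:01:01", "A*24:02:01:02", "A*02:01:01:01"]
panel = ["A*24:02:01:01"]
for digits in (4, 6, 8):
    print(digits, coverage(population, panel, digits))
# 4 0.75
# 6 0.75
# 8 0.5
```

Coverage can only fall as resolution increases, because names that match at 8 digits also match at 6 and 4 digits.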