Kazuyoshi Hosomichi, Timothy A Jinam, Shigeki Mitsunaga, Hirofumi Nakaoka, Ituro Inoue.
Abstract
BACKGROUND: The human leukocyte antigen (HLA) region, a 3.8-Mb segment of the human genome at 6p21, has been associated with more than 100 different diseases, most of them autoimmune. Because HLA genes are complex, conventional sequencing methods have difficulty elucidating complete HLA gene sequences, especially HLA gene haplotype structures. We propose a novel, accurate, and cost-effective method for generating phase-defined complete sequences of HLA genes using indexed multiplex next-generation sequencing.
Year: 2013 PMID: 23714642 PMCID: PMC3671147 DOI: 10.1186/1471-2164-14-355
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Size selection of the Nextera DNA libraries on an agarose gel. (A) Electropherogram of the DNA library analyzed with the 2100 Bioanalyzer. Fragments in the library prepared with the Nextera DNA Sample Prep Kit ranged from 150 bp to more than 10 kb (mean size: 902 bp). (B) Bioanalyzer electropherogram of the DNA library after excision from the agarose gel. We selected large fragments of 500 to 2,000 bp, removing short DNA fragments, for effective HLA gene haplotype phasing. Size selection also defines the actual molar concentration used for bridge PCR to generate clusters on the flowcell, because DNA fragments longer than 1.5 kb are not amplified efficiently. The mean size of the selected fragments was 1,561 bp.
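The dependence of molar concentration on fragment size can be illustrated with the standard conversion for double-stranded DNA, which assumes an average mass of about 660 g/mol per base pair. This is a minimal sketch and not part of the published protocol: the function name and the 2 ng/µL input are hypothetical, and only the two mean fragment sizes (902 bp and 1,561 bp) come from the caption.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def library_molarity_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    ng/uL equals 1e-3 g/L; dividing by the fragment molar mass
    (660 g/mol x length in bp) gives mol/L, and 1e9 converts to nM.
    """
    if mean_size_bp <= 0:
        raise ValueError("mean_size_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_size_bp)


if __name__ == "__main__":
    # Hypothetical measurement: the same 2 ng/uL before and after selection.
    for label, size in (("unselected", 902), ("size-selected", 1561)):
        print(f"{label}: {library_molarity_nM(2.0, size):.2f} nM")
```

At a fixed mass concentration, the longer size-selected library contains fewer molecules per unit volume, so the mean size has to be known before a library is diluted to a target molarity.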
Figure 2. Schematic workflow of the phase-defined HLA gene sequencing. (A) The individual tagging method for HLA homozygous samples. The 2 × 250 bp paired-end reads of the pooled amplicons were aligned to six HLA gene sequences from hg19, and the consensus sequences were determined. Most of the analytical tools shown here are in standard use for genome sequence alignment and variant detection. To generate the consensus sequences, we used original Perl scripts to list variants and to construct HLA gene sequences. (B) The gene-tagging method for HLA heterozygous samples. The 2 × 250 bp paired-end reads of each amplicon were aligned to the corresponding gene, and the six genes were analyzed separately to avoid mismapping. In the alignment step, the 2 × 250 bp paired-end reads were aligned to the reference sequence using BWA and SAMtools. SNVs were detected by UnifiedGenotyper in GATK. Paired-end reads harboring SNVs in both the forward and reverse reads were extracted to construct two phased HLA gene haplotype sequences using our original Perl script. Finally, two HLA gene haplotype sequences per individual were generated, with phase-defined SNVs and indels.
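The phasing step in panel (B) relies on read pairs that cover two or more heterozygous SNVs. The following is a minimal sketch of that idea and is not the authors' Perl script. It assumes the SNV calls and per-pair base observations have already been extracted from the alignment, and all names (`phase_snvs`, `het_sites`, `fragments`) are hypothetical. For simplicity it links only adjacent SNV positions and decides each link by majority vote.

```python
from collections import Counter


def phase_snvs(het_sites, fragments):
    """Phase heterozygous SNVs using read pairs that span adjacent sites.

    het_sites: {position: (allele_a, allele_b)} for heterozygous SNVs.
    fragments: list of {position: base}, one dict per read pair, giving
               the bases that pair shows at heterozygous positions.
    Returns (hap1, hap2), each a {position: base} dict.
    """
    positions = sorted(het_sites)
    if not positions:
        return {}, {}
    first = positions[0]
    hap1 = {first: het_sites[first][0]}
    hap2 = {first: het_sites[first][1]}
    for prev, cur in zip(positions, positions[1:]):
        # Count the allele combinations seen on pairs covering both sites.
        links = Counter(
            (frag[prev], frag[cur])
            for frag in fragments
            if prev in frag and cur in frag
        )
        a, b = het_sites[cur]
        # "cis": allele a sits on the same haplotype as hap1's previous allele.
        cis = links[(hap1[prev], a)] + links[(hap2[prev], b)]
        trans = links[(hap1[prev], b)] + links[(hap2[prev], a)]
        if cis == trans:
            raise ValueError(f"no decisive read-pair link between {prev} and {cur}")
        hap1[cur], hap2[cur] = (a, b) if cis > trans else (b, a)
    return hap1, hap2


if __name__ == "__main__":
    # Toy data: three heterozygous sites and five informative read pairs.
    het_sites = {100: ("A", "G"), 450: ("C", "T"), 1200: ("G", "T")}
    fragments = [
        {100: "A", 450: "T"},
        {100: "G", 450: "C"},
        {100: "A", 450: "T"},
        {450: "T", 1200: "G"},
        {450: "C", 1200: "T"},
    ]
    hap1, hap2 = phase_snvs(het_sites, fragments)
    print(hap1)
    print(hap2)
```

Under this scheme, SNVs separated by more than the fragment length cannot be linked, which is consistent with the selection of large fragments described for Figure 1.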
Alignment result of the HLA gene sequence and genotype in the HLA homozygous cell lines
| AKIBA | 219.5 | 0 | 100 | | |||
| | 655.5 | 0 | 100 | | |||
| | 74.0 | 0 | 100 | | |||
| | 346.9 | 0 | 100 | | |||
| | 33.2 | 0 | 100 | | |||
| | 604.9 | 0 | 100 | | |||
| AMAI | 35.2 | 0 | 100 | | |||
| | 159.3 | 0 | 100 | | |||
| | 52.2 | 0 | 100 | | |||
| | 92.6 | 0 | 100 | 2bp (TG) deletion | |||
| | 133.6 | 0 | 100 | | |||
| | 94.9 | 0 | 100 | different from ECACC | |||
| ARBO | 133.5 | 0 | 100 | | |||
| | 208.8 | 0 | 100 | | |||
| | 108.2 | 0 | 100 | | |||
| | 43.4 | - | 86.6 | not determined | | ||
| | 40.0 | - | 71.5 | not determined | | ||
| | 100.6 | 0 | 99.2 | not determined | | ||
| BM15 | 79.3 | 0 | 100 | | |||
| | 198.7 | 0 | 100 | | |||
| | 34.0 | 0 | 100 | | |||
| | 90.9 | 0 | 98.3 | not determined | | ||
| | 14.2 | 0 | 100 | 2 intronic and 1 nonsynonymous novel variants | |||
| | 132.0 | 0 | 100 | 24 intronic novel variants | |||
| BM16 | 28.6 | 0 | 100 | | |||
| | 249.0 | 0 | 100 | | |||
| | 183.3 | 0 | 100 | 3 intronic novel variants | |||
| | 55.4 | 0 | 100 | 43 intronic novel variants | |||
| | 36.3 | 0 | 100 | 1 intronic novel variant | |||
| | 58.3 | 0 | 100 | 4bp (GGAA) insertion | |||
| BM21 | 61.0 | 0 | 100 | | |||
| | 233.0 | 0 | 100 | | |||
| | 176.3 | 0 | 100 | | |||
| | 80.0 | 0 | 100 | | |||
| | 17.8 | 9† | 99.7 | | |||
| | 103.8 | 0 | 100 | | |||
| BM92 | 24.1 | 0 | 100 | | |||
| | 458.0 | 0 | 100 | | |||
| | 218.0 | 0 | 100 | | |||
| | 2.9 | - | 32.9 | not determined | | ||
| | 1.1 | - | 0 | not determined | | ||
| | 115.6 | 0 | 100 | | |||
| Boleth | 25.7 | 1† | 100 | | |||
| | 273.3 | 0 | 100 | | |||
| | 260.2 | 0 | 100 | | |||
| | 27.1 | 0 | 98.4 | not determined | | ||
| | 67.2 | 0 | 100 | | |||
| | 99.1 | 0 | 100 | | |||
| BSM | 56.9 | 0 | 100 | | |||
| | 238.6 | 0 | 100 | | |||
| | 204.3 | 0 | 100 | | |||
| | 18.3 | 0 | 98.4 | not determined | | ||
| | 91.1 | 0 | 100 | | |||
| | 57.6 | 0 | 100 | | |||
| Carogero | 28.2 | 0 | 100 | | |||
| | 22.8 | 0 | 100 | | |||
| | 6.1 | - | 91 | not determined | | ||
| | 16.8 | 0 | 100 | | |||
| | 27.5 | 0 | 100 | | |||
| | 18.8 | 0 | 99.7 | not determined | | ||
| COX | 99.1 | 0 | 100 | | |||
| | 322.2 | 0 | 100 | | |||
| | 123.5 | 0 | 100 | | |||
| | 284.9 | 0 | 100 | | |||
| | 367.3 | 0 | 100 | | |||
| | 270.5 | 0 | 100 | | |||
| DBB | 454.2 | 0 | 100 | | |||
| | 636.8 | 0 | 100 | | |||
| | 142.5 | 0 | 100 | | |||
| | 14.3 | - | 90.7 | not determined | | ||
| | 1.9 | - | 70.8 | not determined | | ||
| | 568.0 | 0 | 100 | | |||
| DHI | 2.2 | - | 12.9 | not determined | | ||
| | 292.9 | 0 | 100 | | |||
| | 250.1 | 0 | 100 | | |||
| | 57.2 | 0 | 100 | | |||
| | 140.1 | 0 | 100 | | |||
| | 302.2 | 0 | 100 | | |||
| DKB | 358.6 | 0 | 100 | 1 intronic novel variant | |||
| | 502.8 | 0 | 100 | | |||
| | 634.1 | 0 | 100 | | |||
| | 61.4 | - | 65 | not determined | | ||
| | 310.1 | 0 | 100 | | |||
| | 259.2 | 0 | 100 | | |||
| HARA | 122.6 | 0 | 100 | different from ECACC | |||
| | 214.3 | 0 | 100 | | |||
| | 189.1 | 0 | 100 | | |||
| | 4.5 | 0 | 97.5 | not determined | | ||
| | 124.6 | 0 | 100 | | |||
| | 52.9 | - | 97.6 | not determined | | ||
| HHK | 265.9 | 0 | 100 | | |||
| | 84.6 | 0 | 100 | | |||
| | 11.5 | 0 | 100 | | |||
| | 47.7 | 0 | 100 | | |||
| | 103.3 | 0 | 100 | | |||
| | 65.1 | 0 | 100 | | |||
| HOKKAIDO | 8.0 | 0 | 100 | | |||
| | 42.8 | 0 | 100 | | |||
| | 99.0 | 0 | 100 | | |||
| | 8.2 | - | 11.6 | not determined | | ||
| | 48.4 | - | 57.4 | not determined | | ||
| | 70.4 | 0 | 100 | | |||
| JBUSH | 10.9 | 1† | 100 | | |||
| | 128.8 | 0 | 100 | | |||
| | 298.3 | 0 | 100 | | |||
| | 23.7 | 0 | 100 | | |||
| | 80.1 | 0 | 100 | | |||
| | 86.7 | 0 | 100 | | |||
| JEST | 11.3 | 0 | 100 | | |||
| | 80.3 | 0 | 100 | | |||
| | 76.8 | 0 | 100 | 1 intronic novel variant | |||
| | 16.3 | 0 | 100 | | |||
| | 87.5 | 0 | 100 | 1 intronic novel variant | |||
| | 69.1 | 0 | 100 | | |||
| K265 | 127.1 | 0 | 100 | | |||
| | 163.7 | 0 | 100 | | |||
| | 54.3 | 0 | 100 | | |||
| | 92.7 | 0 | 100 | | |||
| | 257.5 | 0 | 100 | | |||
| | 107.8 | 0 | 100 | | |||
| LBUF | 26.9 | 0 | 100 | | |||
| | 326.3 | 0 | 100 | | |||
| | 249.1 | 0 | 100 | | |||
| | 16.6 | 0 | 100 | | |||
| | 2.9 | - | 18.9 | not determined | | ||
| | 177.9 | 0 | 100 | | |||
| LKT3 | 86.9 | 0 | 100 | | |||
| | 213.3 | 0 | 100 | | |||
| | 48.7 | 0 | 100 | | |||
| | 23.4 | - | 55.4 | not determined | | ||
| | 1.1 | - | 0 | not determined | | ||
| | 227.8 | 0 | 100 | | |||
| MADULA | 86.8 | 0 | 100 | | |||
| | 206.9 | 0 | 100 | | |||
| | 223.8 | 0 | 100 | | |||
| | 42.6 | 0 | 100 | | |||
| | 123.9 | - | 86.3 | not determined | | ||
| | 215.6 | 0 | 100 | | |||
| PITOUT | 91.5 | 0 | 100 | | |||
| | 51.8 | 0 | 100 | | |||
| | 74.6 | 0 | 99.4 | not determined | | ||
| | 3.2 | | 40.4 | not determined | | ||
| | 38.3 | 0 | 100 | different from ECACC | |||
| | 340.7 | 0 | 100 | | |||
| RMAL | 75.8 | 0 | 100 | | |||
| | 68.6 | 0 | 100 | | |||
| | 21.9 | 0 | 100 | | |||
| | 63.5 | 0 | 100 | 3 intronic novel variants | |||
| | 16.2 | 0 | 100 | | |||
| | 66.8 | 0 | 100 | | |||
| SAVC | 99.4 | 0 | 100 | | |||
| | 140.3 | 0 | 100 | | |||
| | 45.3 | 0 | 100 | | |||
| | 2.4 | - | 50.3 | not determined | | ||
| | 1.2 | - | 12.9 | not determined | | ||
| | 129.2 | 0 | 100 | | |||
| SRACH | 14.5 | - | 1.5 | not determined | | ||
| | 388.3 | 0 | 100 | | |||
| | 688.5 | 0 | 100 | | |||
| | 61.5 | 0 | 100 | | |||
| | 174.0 | - | 91 | not determined | | ||
| | 207.0 | 0 | 100 | | |||
| T182 | 12.4 | 0 | 100 | | |||
| | 113.5 | 0 | 100 | | |||
| | 106.0 | 0 | 100 | | |||
| | 4.9 | - | 0 | not determined | | ||
| | 72.2 | 0 | 100 | | |||
| | 69.2 | 0 | 98.3 | not determined | | ||
| TAB089 | 13.3 | 0 | 98.3 | not determined | | ||
| | 101.9 | 0 | 100 | | |||
| | 36.9 | 0 | 100 | | |||
| | 2.7 | - | 91.1 | not determined | | ||
| | 49.1 | 1† | 100 | | |||
| | 39.2 | 0 | 100 | 9 intronic novel variants | |||
| TOK | 286.2 | 0 | 100 | | |||
| | 241.5 | 0 | 100 | | |||
| | 127.9 | 0 | 100 | | |||
| | 163.7 | 0 | 100 | | |||
| | 243.1 | 0 | 100 | | |||
| | 175.2 | 0 | 100 | | |||
| VAVY | 36.1 | 0 | 100 | | |||
| | 141.6 | 0 | 100 | | |||
| | 535.1 | 0 | 100 | | |||
| | 43.4 | 0 | 100 | | |||
| | 126.4 | 0 | 100 | | |||
| | 57.0 | 0 | 100 | | |||
| WT100 | 66.4 | 0 | 100 | | |||
| | 112.9 | 0 | 100 | | |||
| | 15.9 | 0 | 100 | | |||
| | 01:01 | 143.5 | 0 | 100 | | ||
| | 136.4 | 0 | 100 | 1 intronic novel variant | |||
| | 122.0 | 0 | 100 | | |||
| WT47 | 56.5 | 0 | 100 | | |||
| | 192.5 | 0 | 100 | | |||
| | 47.8 | 0 | 100 | | |||
| | 116.1 | 0 | 100 | 4 intronic novel variants | |||
| | 807.8 | 0 | 100 | | |||
| 172.2 | 0 | 100 |
* Percentage of the HLA gene covered by reads that passed the Q20 threshold for nucleotide quality.
† Confirmed by Sanger sequencing.
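The coverage percentage defined in the first footnote can be sketched as follows. The source does not state the exact counting rule, so this example assumes that a position counts as covered when at least one aligned base at that position has a Phred quality of 20 or higher. The function name and the input layout (a mapping from 0-based position to the base qualities piled up there) are hypothetical.

```python
def q20_coverage_percent(pileup_quals, gene_length, min_q=20):
    """Percentage of gene positions covered by at least one base with Q >= min_q.

    pileup_quals: {0-based position: list of Phred qualities of aligned bases}.
    gene_length:  length of the reference gene sequence in bp.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = sum(
        1
        for pos in range(gene_length)
        if any(q >= min_q for q in pileup_quals.get(pos, ()))
    )
    return 100.0 * covered / gene_length


if __name__ == "__main__":
    # Toy 10-bp gene: positions 0-7 have high-quality bases,
    # position 8 has only a Q12 base, and position 9 has no reads.
    pileup = {pos: [30, 35] for pos in range(8)}
    pileup[8] = [12]
    print(q20_coverage_percent(pileup, gene_length=10))
```

A stricter rule, such as requiring a minimum number of Q20 bases per position, would only change the condition inside `any(...)`.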
Sequence quality and closest HLA allele designation for heterozygous samples
| E1 | 4306.6 | 97.1/100 | |||
| | 6668.5 | 100/99.6 | |||
| | 5250.9 | 100/100 | |||
| | 272.2 | 99.7/95.6 | |||
| | 3322.3 | 100/99.5 | |||
| | 780.8 | 95.5/100 | |||
| E8 | 4303.7 | 100† | |||
| | 3904.7 | 100/100 | |||
| | 2829.4 | 99.7/100 | |||
| | 612.3 | 100/100 | |||
| | 3132.6 | 100/100 | |||
| | 588.7 | 100/100 | |||
| E11 | 3880.2 | 100/100 | |||
| | 3695.0 | 100/100 | |||
| | 4490.7 | 100/100 | |||
| | 501.8 | 78.3/99.2 | |||
| | 3751.8 | 100/100 | |||
| | 748.2 | 100/100 | |||
| E17 | 808.4 | 100/100 | |||
| | 1708.7 | 100/100 | |||
| | 2507.7 | 100/100 | |||
| | 2080.3 | 100/100 | |||
| | 1638.1 | 100† | |||
| | 1649.6 | 97.9/98.4 | |||
| E25 | 1042.7 | 100/100 | |||
| | 2150.0 | 100/100 | |||
| | 4871.7 | 100/100 | |||
| | 460.3 | 100/32.8 | |||
| | 1396.3 | 100/100 | |||
| | 2289.0 | 99.6/100 | |||
| E28 | 1645.3 | 97.2/100 | |||
| | 824.8 | 99.7† | |||
| | 2799.6 | 100/100 | |||
| | 236.5 | 100† | |||
| | 817.5 | 100† | |||
| | 343.0 | 100† | |||
| E30 | 1711.4 | 100/100 | |||
| | 2670.8 | 100/100 | |||
| | 1899.1 | 100/100 | |||
| | 399.4 | 17.1/99.8 | |||
| | 773.1 | 100/97.8 | |||
| | 460.4 | 100/100 | |||
| M13 | 3689.7 | 97.2/100 | |||
| | 3386.3 | 100/100 | |||
| | 3508.5 | 100/100 | |||
| | 1029.3 | 100/99.5 | |||
| | 577.6 | 100/100 | |||
| | 1981.8 | 100/100 | |||
| M14 | 1690.3 | 100/100 | |||
| | 4353.2 | 100/100 | |||
| | 4733.5 | 100/100 | |||
| | 446.2 | 100/99.5 | |||
| | 763.0 | 98.8/98.9 | |||
| | 1209.5 | 99.8/100 | |||
| M15 | 1153.2 | 100/100 | |||
| | 1448.1 | 99.7/99.8 | |||
| | 3312.9 | 100/100 | |||
| | 463.4 | 100† | |||
| | 280.9 | 100† | |||
| | 2932.9 | 99.5/100 | |||
| M20 | 7631.1 | 100/100 | |||
| | 7675.7 | 100/100 | |||
| | 7566.7 | 100/100 | |||
| | 4746.2 | 100† | |||
| | 6952.9 | 100/100 | |||
| 4631.3 | 100/99.6 |
¶ Percentage of the HLA gene covered by reads that passed the Q20 threshold for nucleotide quality.
* Bases mismatching the sequences recorded in the IMGT/HLA database were observed.
† Homozygous HLA allele.
Figure 3. HLA alleles and HLA haplotypes in two trio families (A and C) and one quartet family (B). Each individual in the parent-child families was sequenced as described. Each HLA gene call was consistent with the hereditary pattern. HLA alleles were inferred from the IMGT/HLA database and were shared between parents and child(ren) in a consistent pattern, without recombination.