
The impact of next-generation sequencing technologies on HLA research.

Kazuyoshi Hosomichi, Takashi Shiina, Atsushi Tajima, Ituro Inoue.

Abstract

In the past decade, the development of next-generation sequencing (NGS) has paved the way for whole-genome analysis in individuals. Research on the human leukocyte antigen (HLA), an extensively studied molecule involved in immunity, has benefitted from NGS technologies. The HLA region, a 3.6-Mb segment of the human genome at 6p21, has been associated with more than 100 different diseases, primarily autoimmune diseases. Recently, the HLA region has received much attention because severe adverse effects of various drugs are associated with particular HLA alleles. Owing to the complex nature of the HLA genes, classical direct sequencing methods cannot comprehensively elucidate the genomic makeup of HLA genes. Thus far, several high-throughput HLA-typing methods using NGS have been developed. In HLA research, NGS facilitates complete HLA sequencing and is expected to improve our understanding of the mechanisms through which HLA genes are modulated, including transcription, regulation of gene expression and epigenetics. Most importantly, NGS may also permit the analysis of HLA-omics. In this review, we summarize the impact of NGS on HLA research, with a focus on the potential for clinical applications.

Year:  2015        PMID: 26311539      PMCID: PMC4660052          DOI: 10.1038/jhg.2015.102

Source DB:  PubMed          Journal:  J Hum Genet        ISSN: 1434-5161            Impact factor:   3.172


Introduction

The sequencing of the entire human genome in 2007, after a 4-year process, provided important insights into our complete genomic makeup.[1] Subsequently, the genome sequences of J. Watson and of African, Chinese, Korean and Japanese individuals were reported.[2, 3, 4, 5, 6] Analyses of the personal genomes of individuals have provided information on human genetic variation and complexity. Additionally, rapid progress in next-generation sequencing (NGS) technology has led to revolutionary changes in medical genomics, supplying massive sequencing data for human samples. Indeed, the 1000 Genomes Project has already reported novel variants, both rare and common, from population-scale sequencing.[7] Various study designs have been applied to NGS, including DNA target resequencing, RNA sequencing for transcriptome analysis, chromatin immunoprecipitation sequencing, bisulfite sequencing for methylome analysis and others. The Encyclopedia of DNA Elements (ENCODE) project has examined the role of 99% of non-protein-coding DNA,[8] revealing substantial interactions between proteins and DNA and the transcription of functional elements other than mRNA encoding proteins. Moreover, various types of NGS technologies have been developed, including smaller-scale benchtop and long-read NGS systems. Benchtop NGS systems, such as GS Jr, ionPGM and MiSeq, allow researchers to make fine adjustments for various smaller-scale studies. For example, some panels of focused target genes, such as genes related to cancer and inherited diseases, are now available for sequencing. Human leukocyte antigen (HLA) genes have a long research history as important targets in biomedical science and treatment. The HLA region on chromosome 6p21 comprises six classical HLA genes and at least 132 protein-coding genes.
This region has important roles in regulation of the immune system as well as fundamental molecular and cellular processes.[9] The sequencing of a continuous 3.6-Mb HLA genomic region with annotation of 224 genes was reported by the MHC Sequencing Consortium in 1999.[10] In addition, the MHC Haplotype Project was carried out between 2000 and 2006 by the Sanger Institute, providing genomic sequences and gene annotations of eight different HLA-homozygous haplotypes to build a framework and resource for association studies of all HLA-linked diseases; these haplotypes were registered as UCSC hg19 or NCBI GRCh37 reference assemblies.[11, 12, 13] This small segment of 3.6 Mb occupies only 0.13% of the human genome but is associated with more than 100 diseases, mostly autoimmune diseases such as diabetes, rheumatoid arthritis, psoriasis and asthma. Furthermore, specific alleles of the HLA genes are strongly associated with hypersensitivities to specific drugs. For example, strong associations between carbamazepine-induced Stevens-Johnson syndrome or toxic epidermal necrolysis and HLA-B*15:02,[14, 15] abacavir-induced hypersensitivity and HLA-B*57:01,[16, 17, 18, 19] and allopurinol-induced Stevens-Johnson syndrome or toxic epidermal necrolysis and HLA-B*58:01[20] have been reported in various populations. For a better understanding of disease causality and the adverse effects of drugs, the haplotype structure of the HLA region should be extensively and unambiguously determined. Therefore, a specific analytical procedure should be developed for completion of HLA sequencing and haplotype determination. NGS technologies have potential advantages over the Sanger method in the sequencing of HLA genes, in that haplotype-resolved sequences can be obtained at high throughput.
To date, several high-throughput HLA-typing methods using NGS have been developed.[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] Importantly, HLA typing using NGS provides both high-throughput and high-resolution capabilities (Figure 1). Additionally, as reported by the ENCODE Project, HLA gene sequencing alone is not sufficient for developing a complete understanding of the genetic makeup of the HLA locus. The expression levels of HLA genes can have crucial roles in the pathogenesis of diseases; thus, detection of regulatory single-nucleotide variants (SNVs) and insertions and deletions (Indels) located outside of exons is necessary. If phase-defined complete sequencing of HLA genes, including functional regulatory regions, is performed, novel alleles associated with disease risks and adverse effects of drugs could be obtained, and the expression levels of genes that affect biological processes could be clarified.
Figure 1

HLA typing to provide sequencing data for the HLA gene(s) and regions. The HLA sequencing data from NGS can be analyzed from various points of view. The minimum scope of polymorphisms is the genotype of an SNV, and the maximum scope is the HLA haplotype sequence as a set of alleles from each HLA gene. The phase-determined sequence of the HLA allele can be applied for HLA typing as a reference. The resolution of HLA typing is classified into the following four categories: two-digit for allele groups, four-digit for specific HLA proteins, six-digit for specific HLA coding sequences (CDS) and eight-digit for specific HLA genome sequences including untranslated regions and introns.
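
The four resolution levels map directly onto the colon-separated fields of an allele name, so a name can be trimmed to a coarser resolution by dropping trailing fields. The following is a minimal Python sketch of that idea, not part of any typing software; the helper names `resolution` and `truncate_allele` are hypothetical, the eight-digit name in the usage note is for illustration only, and expression suffixes (such as a trailing N) are not handled.

```python
def resolution(name: str) -> int:
    """Return the digit resolution (2, 4, 6 or 8) implied by an allele name."""
    gene, sep, fields = name.partition("*")
    if not sep or not fields:
        raise ValueError(f"not an allele name: {name!r}")
    # Each colon-separated field contributes two "digits" of resolution.
    return 2 * len(fields.split(":"))


def truncate_allele(name: str, digits: int) -> str:
    """Trim an allele name to 2-, 4-, 6- or 8-digit resolution."""
    if digits not in (2, 4, 6, 8):
        raise ValueError("digits must be 2, 4, 6 or 8")
    if resolution(name) < digits:
        raise ValueError(f"{name!r} is typed below {digits}-digit resolution")
    gene, _, fields = name.partition("*")
    # Keep only the leading fields that the requested resolution needs.
    kept = fields.split(":")[: digits // 2]
    return gene + "*" + ":".join(kept)
```

For example, `truncate_allele("HLA-DRA*01:01:01:01", 4)` gives `"HLA-DRA*01:01"`, the protein-level type, while `resolution("HLA-B*15:02")` gives 4.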

PCR-based HLA sequencing using NGS

Two approaches are commonly used for library preparation: PCR-based amplicon sequencing and sequence capture. Most of the NGS-based HLA-typing methods have been developed using PCR-based techniques. In 2009, two HLA-typing methods using a Roche GS FLX system were reported (Table 1).[21, 22] These first NGS-based HLA-typing methods focused only on key exons, which have commonly been analyzed using sequence-specific oligonucleotide-primed PCR (PCR-SSO) with fluorescent beads and sequencing-based typing (PCR-sequencing-based typing) using direct sequencing. Additionally, various PCR designs, such as long PCR and reverse transcription-PCR, have been applied for NGS HLA typing.[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] These PCR-based HLA-typing methods differ primarily in primer design and the type of sequencer (Figure 2a). In particular, the long PCR method enables sequencing of the entire HLA gene, including the introns, untranslated regions, and upstream and downstream regions, thus realizing high-resolution and high-throughput HLA typing. Importantly, HLA typing should be carried out by determining complete HLA gene sequences based on the physical determination of DNA sequences, rather than by HLA-type imputation or estimation based on the IMGT/HLA database. Indeed, the phase-defined sequencing method includes an HLA-typing method as a part of the pipeline for determination of complete HLA gene sequences.[33] Moreover, some studies have shown that PCR dropout or allelic imbalance may occur during the PCR step; these issues are unpredictable and tedious to resolve. Several companies have recently released NGS HLA-typing kits based on long PCR products for library preparation; these kits include Illumina TruSight HLA, One Lambda NXType, GenDX NGS-go AmpX and Omixon Holotype HLA.
Using these kits, 11 (HLA-A, -C, -B, -DRB3, -DRB4, -DRB5, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1), 8 (HLA-A, -C, -B, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1), 5 (HLA-A, -C, -B, -DRB1 and -DQB1) and 5 (HLA-A, -C, -B, -DRB1 and -DQB1) genes have been amplified, respectively.
Table 1

PCR-based HLA typing using NGS

Year | Author | Journal | Locus | PCR method | Target region | Sequencer | Data analysis
2009 | Gabriel C et al.[21] | Human Immunology | HLA-A, -B | PCR for each exon | Exons 1, 2, 3 and 4 | GS FLX (Roche) | AVA (Roche); Assign SBT (Conexio Genomics)
2009 | Bentley G et al.[22] | Tissue Antigens | HLA-A, -C, -B, -DRB1, -DQA1, -DQB1, -DPB1 | PCR for each exon | Exons 2, 3 and 4 of A, B and C; exon 2 of DRB1, DPB1, DQA1; exons 2 and 3 of DQB1 | GS FLX (Roche) | HLA typing software (Conexio Genomics)
2010 | Lind C et al.[23] | Human Immunology | HLA-A, -C, -B, -DRB1, -DQB1 | Long PCR | Entire gene of A, B and C; exons 2–3 of DRB1 and DQB1 | GS FLX (Roche) | Assign MPS software (Conexio Genomics)
2010 | Lank SM et al.[24] | Human Immunology | HLA-A, -C, -B | RT-PCR | Exons 2, 3 and 4 of A, B and C | GS FLX (Roche) | BLAT
2011 | Erlich et al.[25] | BMC Genomics | HLA-A, -C, -B | PCR | Exons 2, 3 and 4 of A, B and C | GS FLX (Roche) | GATK
2011 | Holcomb CL et al.[26] | Tissue Antigens | HLA-A, -C, -B, -DPB1, -DQA1, -DQB1, -DRB1, -DRB3/4/5 | PCR for each exon | Exons 2, 3 and 4 of A, B and C; exon 2 of DRB1, DRB3/4/5, DPB1, DQA1; exons 2 and 3 of DQB1 | GS FLX (Roche) | Assign ATF (Conexio Genomics)
2012 | Wang C et al.[27] | Proc Natl Acad Sci U S A. | HLA-A, -C, -B, -DRB1 | Long PCR | Exons 1–7 of A, B and C; exons 2–5 of DRB1 | GAIIx (Illumina); HiSeq2000 (Illumina); MiSeq (Illumina) | BLASTN
2012 | Shiina T et al.[28] | Tissue Antigens | HLA-A, -C, -B, -DRB1, -DQA1, -DQB1, -DPA1, -DPB1 | Long PCR | Entire gene (2 amplicons for DRB1 and DPB1) | GS Junior (Roche); ionPGM (Thermo) | BLAT; Sequencher (GeneCodes)
2012 | Lank SM et al.[29] | BMC Genomics | HLA class I, all HLA class II loci | RT-PCR | Exons 1–7 of HLA class I (two amplicons); exons 1–4 of HLA class II | GS Junior (Roche) | BLAT
2013 | Moonsamy PV et al.[30] | Tissue Antigens | HLA-A, -C, -B, -DRB1, -DRB3/4/5 | PCR | Exons 2 and 3 of A, B and C; exon 2 of DRB1, DRB3/4/5, DQB1 | GS FLX (Roche) | Assign ATF 454 (Conexio Genomics)
2013 | Ringquist S et al.[31] | PLoS One | HLA-DRB1 | PCR | Exon 2 | GS FLX (Roche) | CAPSeq (Original)
2013 | Danzer M et al.[32] | BMC Genomics | HLA-A, -C, -B, -DRB1, -DRB3/4/5, -DQB1, -DPB1 | PCR for each exon | Exons 2, 3 and 4 of A; exons 1, 2, 3 and 4 of B; exons 1, 2, 3, 4 and 7 of C; exons 2 and 3 of DRB1, DRB3/4/5, DQB1; exon 2 of DPB1 | GS Junior (Roche) | Assign ATF (Conexio Genomics)
2013 | Hosomichi K et al.[33] | BMC Genomics | HLA-A, -C, -B, -DRB1, -DQB, -DPB1 | Long PCR | Entire gene | MiSeq (Illumina) | Phase-defined sequencing (Original)
2013 | Trachtenberg EA et al.[34] | Methods Mol Biol | HLA-A, -C, -B, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPB1 | PCR for each exon | Exons 2, 3 and 4 of A, B and C; exon 2 of DRB1, DRB3/4/5, DPB1, DQA1; exons 2 and 3 of DQB1 | GS FLX (Roche) | Assign ATF (Conexio Genomics)
2014 | Ozaki Y et al.[35] | Tissue Antigens | HLA-DRB1, -DRB3/4/5 | Long PCR | Exons 2–6 of DRB1, DRB3/4/5 | GS Junior (Roche) | BLAT; Sequencher (GeneCodes)
2014 | Hajeer AH et al.[36] | Tissue Antigens | HLA-A, -C, -B, -DRB1, -DQB1 | PCR | Exons 2 and 3 of A, B and C; exon 2 of DRB1; exons 2 and 3 of DQB1 | GS FLX (Roche); GS Junior (Roche) | Assign ATF 454 (Conexio Genomics)
2014 | Hosomichi K et al.[37] | BMC Genomics | HLA-B | Long PCR | Entire gene | MiSeq (Illumina) | Phase-defined sequencing (Original)
2014 | Smith AG et al.[38] | Human Immunology | HLA-DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPB, -DPA1 | PCR for each exon | Exons 2 and 3 of DQA1, DQB1, DRB1, DRB3/4/5; exon 2 of DPA1 and DPB1 | MiSeq (Illumina) | GeMS (Scisco Genetics)
2014 | Ehrenberg PK et al.[39] | BMC Genomics | HLA-A, -C, -B, -DRB1 | Long PCR | Entire gene | MiSeq (Illumina) | Omixon target (Omixon)
2014 | Zhou M et al.[40] | Tissue Antigens | HLA-A, -C, -B, -DRB1, -DQB1 | PCR | Exons 1–7 of A, B and C (4 amplicons); exon 2 of DRB1; exons 2 and 3 of DQB1 | HiSeq2000 (Illumina) | BGI computing procedure (Original)
2015 | Lan JH et al.[41] | Human Immunology | HLA-A, -C, -B, -DRB1, -DQB1 | Long PCR | Entire gene (2 amplicons for DRB1) | MiSeq (Illumina) | NGSengine (Gen Dx)
2015 | Ozaki Y et al.[42] | BMC Genomics | HLA-A, -C, -B, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPB1 | Long PCR | Entire genes of A, B and C; exons 2–4 of DRB1, DRB3/4/5, DQA1, DQB1; exons 2–6 of DPB1 | ionPGM (Thermo) | SeaBass (Original, In-house)
Figure 2

Preparation of HLA gene fragments for the DNA library. DNA fragments of the HLA genes are prepared by PCR-based (a) or hybridization-based (b) methods. (a) Many publications describing PCR-based methods have used different PCR designs such as short PCR for target exons (blue bar) or long PCR for entire genes (red bar). After PCR amplification, each of the pooled PCR products is applied for library preparation with/without fragmentation to add adapters with/without indexes for each sequencer. In the PCR-based method, the first step is PCR for HLA genes and the second step is library preparation. (b) The sequence capture method based on hybridization is also commonly used to enrich HLA gene fragments. DNA/RNA probes with the HLA gene sequence are hybridized to the DNA library, which includes the HLA gene sequence. The biotinylated probes-bound DNA libraries are collected using a magnet and streptavidin magnetic beads. In the sequence capture method, the first step is library preparation and the second step is enrichment for HLA genes. (c) After sequencing, HLA gene sequences of each individual are reconstructed by alignment to reference HLA gene sequences. The consensus sequences constructed by the aligned reads are searched for specific HLA alleles in the IMGT/HLA database. In the NGS-based HLA-typing method, the basic data analysis approach is similar between PCR-based and sequence capture methods.

The capture method for HLA sequencing

Target resequencing of the HLA genes using the sequence capture method has not been well developed compared with PCR-based HLA typing. The sequence capture method is based on hybridization between DNA of an adapter-ligated library and a biotinylated DNA/RNA probe designed based on target sequences of genes or the genomic region (Figure 2b).[43] Hybridized DNA fragments are enriched for the target region using streptavidin magnetic beads. Wittig and colleagues[44] reported the first automated HLA-typing method based on the sequence capture technology. This method uses targeted capturing of the classical class I (HLA-A, HLA-C and HLA-B) and class II (HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1) HLA genes. The DNA fragments from these eight HLA genes can be simultaneously enriched by a hybridization reaction in a single tube, without allele dropout, which is frequently observed in PCR-based methods. The results showed high accuracy for allele call (99%) and identified errors in the IMGT/HLA reference database. It is also notable that the sequence capture method is generally applicable for NGS-based target resequencing of larger genomic regions and a larger number of genes than the PCR-based methods. On the basis of these features and the findings from the automated NGS-typing method, the sequence capture method has major advantages over PCR-based methods and is a promising method for HLA sequencing.

Sequencers for HLA gene sequencing

Various sequencing machines have been used for NGS HLA typing. The majority of published methods have been established using Roche sequencers. However, for the last few years, the Illumina MiSeq instrument has also been used for HLA typing (Table 1). The types of NGS instruments used for HLA typing are likely to keep changing along with improvements in NGS technologies. The Pacific Biosciences PacBio RS II sequencer, which is capable of generating enormously long reads from single molecules using single-molecule real-time (SMRT) sequencing, has recently been applied to HLA typing. SMRT sequencing is highly effective in generating accurate, phased sequences of full-length alleles of HLA genes. Complete phasing of the HLA genes by SMRT sequencing may resolve phase ambiguity, which is a fundamental problem of conventional HLA-typing methods.
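
Phase ambiguity arises because an unphased genotype only records which two bases are present at each heterozygous position, not which bases sit together on the same chromosome, so more than one pair of alleles can explain the same observation. The Python sketch below illustrates this with a toy allele set; the allele names `x1` to `x4` and their two-base sequences are hypothetical, and all sequences are assumed to be aligned and of equal length.

```python
from collections import defaultdict
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple


def unphased_genotype(a: str, b: str) -> Tuple[Tuple[str, ...], ...]:
    """Collapse two aligned allele sequences into per-position base pairs.

    Sorting each pair discards the information about which allele
    contributed which base, mimicking an unphased diploid readout.
    """
    return tuple(tuple(sorted(pair)) for pair in zip(a, b))


def ambiguous_pairs(alleles: Dict[str, str]) -> List[List[Tuple[str, str]]]:
    """Group allele pairs that produce an identical unphased genotype."""
    by_genotype = defaultdict(list)
    for (n1, s1), (n2, s2) in combinations_with_replacement(sorted(alleles.items()), 2):
        by_genotype[unphased_genotype(s1, s2)].append((n1, n2))
    # Only genotypes explained by more than one allele pair are ambiguous.
    return [pairs for pairs in by_genotype.values() if len(pairs) > 1]


# Toy alleles that differ at two variable sites (hypothetical sequences).
toy_alleles = {"x1": "AC", "x2": "GT", "x3": "AT", "x4": "GC"}
print(ambiguous_pairs(toy_alleles))
```

In this toy set the pairs (x1, x2) and (x3, x4) both yield A/G at the first site and C/T at the second, so an unphased method cannot tell them apart, whereas a read spanning both sites on one molecule can.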

HLA typing from WGS and WES data

During the past few years, whole exome sequencing (WES) has identified the causalities of a large number of Mendelian diseases by analyzing familial samples and/or sporadic patients.[45] In addition, WES has facilitated the acquisition of massive amounts of data in various genome sequencing projects, such as the 1000 Genomes Project,[7] NHLBI GO Exome Sequencing Project (https://esp.gs.washington.edu/drupal/) and UK10K project (http://www.uk10k.org), which are expected to improve our understanding of variations in the human genome. The sequence capture method has also been applied for WES using various kits, such as the Agilent Human All Exon kit, Roche SeqCap EZ Human Exome kit and Illumina TruSeq Exome Enrichment kit. The respective libraries of capture oligo-probes, which cover all human exons, are designed to target all exons of all HLA genes. For example, 820 exons of 182 genes in the HLA region are found in the Agilent Human All Exon kit design. Because DNA sequence reads for HLA genes are included in the whole genome sequencing (WGS) and WES datasets, HLA typing could be carried out using both the datasets. Within WGS and WES datasets, HLA gene sequences represent only a small portion of the data, but these sequences are phased HLA gene sequences as HLA alleles. Therefore, HLA typing from WGS or WES datasets should be the key analysis method used to promote higher accuracy rates compared with those of the existing PCR-SSO or PCR-sequencing-based typing results.

NGS HLA-typing software

As listed in Table 2, various academic and commercial HLA-typing software programs, including Omixon Target HLA and HLAminer, have been developed for HLA typing from various types of data, including WGS, WES, RNA sequencing and amplicon datasets.[46, 47, 48, 49, 50, 51, 52, 53, 54] An example of HLA-typing software for WGS or WES, including a brief overview of the Omixon Target HLA typing system, is described in Figure 3. In statistical methods, sequence reads are first aligned to the whole IMGT/HLA database (all known HLA alleles). Then, the best matching alleles are selected based on various alignment statistics, such as the number of reads covering exons and the extent of exons covered. During statistical analysis, only reads that are mappable as homologous to any allele in the IMGT/HLA database with a low number of mismatches should be stored. In the Omixon publication, which used data from the 1000 Genomes Project, the concordance rate between the NGS-based method and PCR-SSO was around 90%, which was not considered high.[54] For the analysis, sequence reads from all exons of HLA genes were used. At least 10 reads are required to counterbalance random noise. However, the sequence reads were not evenly distributed for each gene region, and the average depth implied that there may be holes in coverage. Another publication, in which the authors utilized the HLAminer software, also reported a 92.8% concordance rate between these two methods for allele group prediction.[46] NGS HLA typing can call alleles for all the HLA genes recorded in the IMGT/HLA database and can also detect novel HLA alleles. On the other hand, it is not currently possible to detect rare HLA alleles by PCR-SSO. Therefore, if sequence reads of rare HLA alleles are present in WGS or WES data, the HLA-typing results from NGS would be expected to be discordant with those from PCR-SSO.
In the near future, the reliability of these HLA-typing methods from WGS and WES data may be improved. These programs are the next step in developing methods with greater specificity and sensitivity of HLA-typing results. In particular, the specificity is dependent on the HLA-typing resolution, that is, two-, four-, six- and eight-digit, which correspond to the allele group, the specific allele protein, the specific DNA sequence with synonymous substitutions in the coding region and the specific DNA sequence of the entire gene, respectively. The high-resolution HLA typing of NGS is advantageous compared with the existing PCR-SSO and PCR-sequencing-based typing methods. In practice, it is not possible to execute complete eight-digit HLA typing because of limitations in the number of known HLA allelic sequences deposited in the IMGT/HLA database, where most HLA allelic sequences have been recorded as coding sequences or partial exons. Only HLA alleles recorded as full-length HLA gene sequences can be used for allele calling with eight-digit resolution. To put eight-digit resolution typing into practice, the NGS-based phase-defined complete sequencing methods for the HLA genes will be applicable as a high-resolution tool for the detection of novel alleles, and will facilitate the development of expanded databases with full-length HLA allelic sequences for eight-digit HLA typing.[33] The success of complete HLA gene sequencing with high accuracy depends on high sequence read depth. In the case of HLA-B sequencing, the minimum depth for complete phasing was an average depth of approximately 800-fold.[37]
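
As a rough planning aid, average depth relates to read count through the standard relation depth = (number of reads × read length) / target length. The Python sketch below applies it to the depth figure quoted above; the function names are hypothetical, and the 5-kb target length and 250-bp read length in the usage note are illustrative assumptions, not values taken from the cited study.

```python
import math


def average_depth(n_reads: int, read_length_bp: int, target_length_bp: int) -> float:
    """Average depth over a target, assuming all reads map inside it."""
    if target_length_bp <= 0:
        raise ValueError("target_length_bp must be positive")
    return n_reads * read_length_bp / target_length_bp


def reads_for_depth(target_depth: float, target_length_bp: int, read_length_bp: int) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    if read_length_bp <= 0 or target_length_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(target_depth * target_length_bp / read_length_bp)
```

Under these assumptions, `reads_for_depth(800, 5000, 250)` gives 16000 reads per sample for one gene. Because this is an average, it does not guarantee even coverage across the gene, which is the concern raised above about holes in coverage.
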
Table 2

HLA-typing software and category of acceptable reads

HLA-typing software | URL | Read type | Reference
HLAminer | http://www.bcgsc.ca/platform/bioinfo/software/hlaminer | WGS/WES/RNA-seq/amplicon | Warren RL et al.[46]
seq2HLA | http://tron-mainz.de/tron-facilities/computational-medicine/seq2hla/ | RNA-seq | Boegel S et al.[47]
ATHLATES | https://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/athlates | WGS/WES/amplicon | Liu C et al.[48]
OptiType | http://omictools.com/optitype-s6206.html | WGS/WES/RNA-seq | Szolek A et al.[49]
HLAforest | http://code.google.com/p/hlaforest | RNA-seq | Kim HJ et al.[50]
PHLAT | https://sites.google.com/site/projectphlat/ | WGS/WES/RNA-seq | Bai Y et al.[51]
Phase-defined HLA sequencing | https://p-galaxy.ddbj.nig.ac.jp | Amplicon | Hosomichi K et al.[33]
HLAreporter | http://paed.hku.hk/genome/software.html | WGS/WES | Huang Y et al.[52]
HLA-VBSeq | http://nagasakilab.csml.org/hla/ | WGS | Nariai N et al.[53]
HLAssign | http://www.ikmb.uni-kiel.de/resources/download-tools/software/hlassign | Sequence capture | Wittig M et al.[44]
Assign ATF (Conexio Genomics) | http://www.conexio-genomics.com | Amplicon |
Omixon Target HLA (Omixon) | http://www.omixon.com/hla/ | WGS/WES/amplicon | Major E et al.[54]
NGSengine (Gen Dx) | http://www.gendx.com/products/ngsengine | Amplicon |
GeMS (Scisco Genetics) | http://sciscogenetics.com/services/integrated-genotyping-system/ | Amplicon |
Figure 3

Overview of data analysis for HLA typing using WGS/WES. Massive sequence reads from WGS/WES are aligned to the whole IMGT/HLA database (all known HLA alleles) to search for best matching alleles based on alignment statistics, number of reads covering exons and the extent of exon coverage. The HLA allele can be identified by only storing reads that are mappable as homologous to any allele in the IMGT/HLA database with a low number of mismatches by statistical analysis.
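
The selection step in this overview can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of Omixon Target HLA, HLAminer or any other listed program: reads are placed by a naive ungapped scan rather than a real aligner, the function names and the toy allele sequences are hypothetical, and the ranking uses only two statistics, the fraction of the reference covered and the number of mapped reads.

```python
from typing import Dict, List, Optional, Tuple


def best_hit(read: str, ref: str, max_mismatches: int) -> Optional[int]:
    """Return the first offset with the fewest mismatches, or None if all exceed the limit."""
    best, best_mm = None, max_mismatches + 1
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best, best_mm = start, mm
    return best


def score_alleles(reads: List[str], alleles: Dict[str, str],
                  max_mismatches: int = 0,
                  min_reads: int = 10) -> List[Tuple[str, int, float]]:
    """Rank alleles by covered fraction, then by number of mapped reads."""
    results = []
    for name, seq in alleles.items():
        covered = [False] * len(seq)
        n_mapped = 0
        for read in reads:
            start = best_hit(read, seq, max_mismatches)
            if start is None:
                continue  # read is not stored for this allele
            n_mapped += 1
            for i in range(start, start + len(read)):
                covered[i] = True
        # Alleles supported by too few reads are dropped as likely noise.
        if n_mapped >= min_reads:
            results.append((name, n_mapped, sum(covered) / len(seq)))
    results.sort(key=lambda r: (r[2], r[1]), reverse=True)
    return results


# Toy data: two hypothetical alleles differing at one base; reads tile allele_1.
allele_1 = "ACGTACGTTAGC"
allele_2 = "ACGTACCTTAGC"
reads = [allele_1[i:i + 6] for i in range(7)]
print(score_alleles(reads, {"allele_1": allele_1, "allele_2": allele_2}, min_reads=1))
```

With exact matching, every read maps to `allele_1` and covers it completely, while only the single read that avoids the variable base maps to `allele_2`, so `allele_1` ranks first. The default `min_reads=10` mirrors the read threshold mentioned in the text; it is lowered here only because the toy data set is tiny.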

HLA in the ENCODE project data

In HLA research, NGS technologies have influenced HLA typing as well as our understanding of the functional regulatory regions within the HLA region, which could affect the expression of HLA genes. Thus far, HLA-associated diseases have been understood on the hypothesis that antigen presentation by HLA molecules affects the immune system. Four-digit HLA typing is sufficient to explain the importance of antigen presentation in disease causality. On the other hand, the HLA-associated phenotypes could also be affected by the expression levels of HLA genes or by allelic imbalance. In 2012, the ENCODE project succeeded in the systematic arrangement of transcript regions and transcription factor (TF)-binding sites in the genome, and showed the genomic patterns of chromatin structure and histone modifications.[8] The achievements of the project also include the discovery of putative functional elements and domains within the HLA region. The knowledge obtained in the ENCODE project could be extended to understand HLA-associated diseases and phenotypes. Certain diseases are associated with specific HLA alleles, and many variants within the HLA genes are in linkage disequilibrium with those HLA alleles; therefore, it is quite difficult to genetically determine which variant is associated with the disease, because the disease is associated with a haplotype carrying the HLA alleles and many variants. Furthermore, there are limitations to genetic analyses with limited numbers of samples and minor genetic effects; however, the ENCODE project highlights the functional regions of the entire human genome including the HLA region. Two examples of HLA genes and the associated diseases are described in Figure 4, Table 3 and Table 4. HLA-DRA is less polymorphic than HLA-A, -C, -B or -DRB1. However, many variants have been observed in the upstream region among HLA-DRA alleles, even among alleles sharing the same six-digit type, HLA-DRA*01:01:01 (Figure 4, unpublished data).
A deletion of about 2 kb was also detected in the upstream region of HLA-DRA*01:02:02 in HLA target resequencing data. Before the completion of the ENCODE project, it was difficult to understand the effects of deletions. Now, we can see the possibility of a functional regulatory region around the deletion. Interestingly, two haploid genome sequences of HLA-DRA*01:01:01 had different sequences within the intron and upstream regions. Some of the variants may also affect the expression levels of HLA-DRA by altering TF binding. The haplotype of the variants in the upstream region could be significantly different, even though the HLA allele was found to be the same as the HLA-DRA*01:01:01 sequence with six-digit resolution. The ENCODE project stressed the importance of complete HLA gene sequencing, including the upstream regulatory region, to determine the haplotype. In total, 3619 SNVs in the HLA region were selected as expression Quantitative Trait Loci (eQTL) SNVs for HLA gene expression (Table 3).[55] These eQTL SNVs were identified in the RegulomeDB database (http://www.regulomedb.org), which provides annotations of SNPs with known and predicted regulatory elements in the intergenic regions of the human genome. The database includes public datasets from the ENCODE project, in addition to GEO and publications. Known and predicted regulatory DNA elements from DNase hypersensitivity, TF-binding sites and promoter regions that have been biochemically characterized to regulate transcription are also included. Recorded variants have been classified into various categories according to TF binding and target gene expression. The 3619 HLA eQTL SNVs are likely to affect the binding of TFs that mediate expression of the HLA genes. For variants and deletions near HLA-DRA, new hypotheses concerning the biological functions of the gene could be generated to improve our understanding of HLA-DRA-associated phenotypes.
Figure 4

Example of target resequencing to detect variants and functional prediction of the regulatory region. Target resequencing of the HLA region clarifies all variants in the target region. For example, several variants and a deletion of approximately 2 kb have been detected in the upstream region of HLA-DRA. (a) Alignment view of mapped reads (pink: forward strand read, purple: reverse strand read) in the alignment track for detection of SNVs (A: green, C: blue, G: yellow, T: red) and the deletion as displayed in the coverage track. (b) The region was located in cis-regulatory elements, namely active (H3K27ac-marked) enhancers and a DNase I-hypersensitive site, defined by ENCODE chromatin immunoprecipitation sequencing and DNaseI-seq peaks. The deletion and SNVs may affect the expression level of HLA-DRA by influencing the binding of TFs.

Table 3

Number of eQTL SNVs linking expression level of HLA genes

HLA gene | Number of eQTL SNVs
HLA-A | 821
HLA-C | 12
HLA-B | 773
HLA-DRA | 288
HLA-DRB1 | 580
HLA-DQA1 | 544
HLA-DQB1 | 473
HLA-DPA1 | 2
HLA-DPB1 | 126
Total | 3619

Abbreviations: eQTL, expression Quantitative Trait Loci; HLA, human leukocyte antigen; SNV, single-nucleotide variant.

Table 4

Lead SNVs linking rheumatoid arthritis association with regulatory information in the human genome

Chr | Position | Lead SNP | Distance to nearest TSS | GENCODE v7 location | RegulomeDB Score
6 | 29 789 171 | rs1610677 | 23 582 bp | Intergenic region | No regulatory annotation
6 | 31 379 931 | rs1063635 | 9903 bp | Coding region | 6, Motif
6 | 32 218 989 | rs9296015 | 27 283 bp | Intergenic region | 6, Motif
6 | 32 282 854 | rs6910071 | 49 238 bp | Intron | 6, Motif
6 | 32 429 643 | rs9268853 | 68 359 bp | Intergenic region | 6, Motif
6 | 32 574 171 | rs615672 | 35 847 bp | Intergenic region | 6, Motif
6 | 32 577 380 | rs660895 | 32 638 bp | Intergenic region | 4, ChIP-seq peak + DNaseI-seq peak
6 | 32 602 269 | rs9272219 | 44 749 bp | Intergenic region | 6, Motif
6 | 32 663 851 | rs6457617 | 27 409 bp | Intergenic region | 6, Motif
6 | 32 663 999 | rs6457620 | 27 557 bp | Intergenic region | 6, Motif
6 | 32 671 103 | rs13192471 | 34 661 bp | Intergenic region | No regulatory annotation
6 | 32 680 928 | rs7765379 | 44 486 bp | Intergenic region | No regulatory annotation

Abbreviations: SNP, single nucleotide polymorphism; SNV, single-nucleotide variant; TSS, transcription start site.

In another example, 12 lead SNVs associated with rheumatoid arthritis were examined for regulatory annotation (Table 4). Of the lead SNVs, rs660895 is located 32.6 kb upstream of HLA-DRB1 (from the nearest transcription start site) and has been described as a tag SNP for the HLA-DRB1*04:01 allele. The HLA-DRB1*04:01 allele has been shown to be associated with a higher risk of rheumatoid arthritis (OR: 6.2).[56] Chromatin immunoprecipitation sequencing and DNaseI-seq peak data from the ENCODE project showed that this SNP is located within a regulatory region. Eight other SNVs are located within TF-binding sites predicted by in silico motif discovery. The information from ENCODE data will help decision-making for additional and follow-up experiments to obtain reliable evidence for the mechanism through which SNVs in the HLA region contribute to the development of rheumatoid arthritis.

Future directions

In 2005, Roche launched the first NGS instrument, the Genome Sequencer 20. The Genome Sequencer 20 was able to achieve a read length of about 100 bp and could sequence 20 Mbp per run. Within the last decade, rapid progress in NGS technology has resulted in revolutionary changes in medical genomics for applications in genetic diagnosis, called clinical sequencing or medical exome. However, the two commonly utilized methods for HLA typing, PCR-SSO and PCR-sequencing-based typing, have remained the first-line methods in HLA research and diagnosis for more than 10 years. Recently, several manufacturers have begun to develop HLA-typing kits for NGS; thus, elucidation of the complete HLA gene sequence will soon provide new knowledge that will be useful for medical science. However, the gene sequences of the HLA region alone will be insufficient for a complete understanding of HLA and all of the HLA-associated phenomena. For this purpose, phase-defined sequencing and haplotype determination of all regions including the HLA genes and regulatory sequences in the HLA region are essential. Further analyses will be required to determine the transcription of the fundamental ‘HLA’ unit, including the HLA genes and all associated targets involved in the HLA functional pathway, along with physically interacting targets and regulatory regions containing TF-binding sites. These must all be considered carefully to develop a complete understanding of ‘HLA’, that is, HLA-omics analysis. Finally, the goal of HLA typing as complete gene sequencing should be clinical applications that will benefit patients. Future HLA-typing methods will help realize the goal of ‘precision medicine’ by determining biologically distinct subgroups for precisely targeted treatments.[57]
  57 in total

1.  Particular HLA-DRB1 shared epitope genotypes are strongly associated with rheumatoid vasculitis.

Authors:  Jennifer D Gorman; Eve David-Vaudey; Madhukar Pai; Raymond F Lum; Lindsey A Criswell
Journal:  Arthritis Rheum       Date:  2004-11

2.  Impact of three Illumina library construction methods on GC bias and HLA genotype calling.

Authors:  James H Lan; Yuxin Yin; Elaine F Reed; Kevin Moua; Kimberly Thomas; Qiuheng Zhang
Journal:  Hum Immunol       Date:  2014-12-25       Impact factor: 2.850

3.  Genetic variations in HLA-B region and hypersensitivity reactions to abacavir.

Authors:  Seth Hetherington; Arlene R Hughes; Michael Mosteller; Denise Shortino; Katherine L Baker; William Spreen; Eric Lai; Kirstie Davies; Abigail Handley; David J Dow; Mary E Fling; Michael Stocum; Clive Bowman; Linda M Thurmond; Allen D Roses
Journal:  Lancet       Date:  2002-03-30       Impact factor: 79.321

4.  Application of high-throughput, high-resolution and cost-effective next generation sequencing-based large-scale HLA typing in donor registry.

Authors:  M Zhou; D Gao; X Chai; J Liu; Z Lan; Q Liu; F Yang; Y Guo; J Fang; L Yang; D Du; L Chen; X Yang; M Zhang; H Zeng; J Lu; H Chen; X Zhang; S Wu; Y Han; J Tan; Z Cheng; C Huang; W Wang
Journal:  Tissue Antigens       Date:  2014-11-24

5.  A highly annotated whole-genome sequence of a Korean individual.

Authors:  Jong-Il Kim; Young Seok Ju; Hansoo Park; Sheehyun Kim; Seonwook Lee; Jae-Hyuk Yi; Joann Mudge; Neil A Miller; Dongwan Hong; Callum J Bell; Hye-Sun Kim; In-Soon Chung; Woo-Chung Lee; Ji-Sun Lee; Seung-Hyun Seo; Ji-Young Yun; Hyun Nyun Woo; Heewook Lee; Dongwhan Suh; Seungbok Lee; Hyun-Jin Kim; Maryam Yavartanoo; Minhye Kwak; Ying Zheng; Mi Kyeong Lee; Hyunjun Park; Jeong Yeon Kim; Omer Gokcumen; Ryan E Mills; Alexander Wait Zaranek; Joseph Thakuria; Xiaodi Wu; Ryan W Kim; Jim J Huntley; Shujun Luo; Gary P Schroth; Thomas D Wu; HyeRan Kim; Kap-Seok Yang; Woong-Yang Park; Hyungtae Kim; George M Church; Charles Lee; Stephen F Kingsmore; Jeong-Sun Seo
Journal:  Nature       Date:  2009-07-08       Impact factor: 49.962

6.  Complete MHC haplotype sequencing for common disease gene mapping.

Authors:  C Andrew Stewart; Roger Horton; Richard J N Allcock; Jennifer L Ashurst; Alexey M Atrazhev; Penny Coggill; Ian Dunham; Simon Forbes; Karen Halls; Joanna M M Howson; Sean J Humphray; Sarah Hunt; Andrew J Mungall; Kazutoyo Osoegawa; Sophie Palmer; Anne N Roberts; Jane Rogers; Sarah Sims; Yu Wang; Laurens G Wilming; John F Elliott; Pieter J de Jong; Stephen Sawcer; John A Todd; John Trowsdale; Stephan Beck
Journal:  Genome Res       Date:  2004-05-12       Impact factor: 9.043

7.  OptiType: precision HLA typing from next-generation sequencing data.

Authors:  András Szolek; Benjamin Schubert; Christopher Mohr; Marc Sturm; Magdalena Feldhahn; Oliver Kohlbacher
Journal:  Bioinformatics       Date:  2014-08-20       Impact factor: 6.937

8.  Cost-efficient multiplex PCR for routine genotyping of up to nine classical HLA loci in a single analytical run of multiple samples by next generation sequencing.

Authors:  Yuki Ozaki; Shingo Suzuki; Koichi Kashiwase; Atsuko Shigenari; Yuko Okudaira; Sayaka Ito; Anri Masuya; Fumihiro Azuma; Toshio Yabe; Satoko Morishima; Shigeki Mitsunaga; Masahiro Satake; Masao Ota; Yasuo Morishima; Jerzy K Kulski; Katsuyuki Saito; Hidetoshi Inoko; Takashi Shiina
Journal:  BMC Genomics       Date:  2015-04-18       Impact factor: 3.969

9.  HLA typing from RNA-seq data using hierarchical read weighting [corrected].

Authors:  Hyunsung John Kim; Nader Pourmand
Journal:  PLoS One       Date:  2013-06-28       Impact factor: 3.240

10.  High-throughput multiplex HLA genotyping by next-generation sequencing using multi-locus individual tagging.

Authors:  Philip K Ehrenberg; Aviva Geretz; Karen M Baldwin; Richard Apps; Victoria R Polonis; Merlin L Robb; Jerome H Kim; Nelson L Michael; Rasmi Thomas
Journal:  BMC Genomics       Date:  2014-10-06       Impact factor: 3.969
