
Insights into HLA-G Genetics Provided by Worldwide Haplotype Diversity.

Erick C Castelli1, Jaqueline Ramalho1, Iane O P Porto1, Thálitta H A Lima1, Leandro P Felício2, Audrey Sabbagh3, Eduardo A Donadi4, Celso T Mendes-Junior5.   

Abstract

Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC). HLA-G has been the target of most recent research regarding the function of class I non-classical genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane bound and soluble isoforms, (c) short cytoplasmic tail, (d) modulation of immune response (immune tolerance), and (e) restricted expression to certain tissues. In the present work, we describe the HLA-G gene structure and address the HLA-G variability and haplotype diversity among several populations around the world, considering each of its major segments [promoter, coding, and 3' untranslated region (UTR)]. For this purpose, we developed a pipeline to reevaluate the 1000Genomes data and recover miscalled or missing genotypes and haplotypes. It became clear that the overall structure of the HLA-G molecule has been maintained during the evolutionary process and that most of the variation sites found in the HLA-G coding region are either coding synonymous or intronic mutations. In addition, only a few frequent and divergent extended haplotypes are found when the promoter, coding, and 3'UTRs are evaluated together. The divergence is particularly evident for the regulatory regions. The population comparisons confirmed that most of the HLA-G variability has originated before human dispersion from Africa and that the allele and haplotype frequencies have probably been shaped by strong selective pressures.


Keywords:  1000Genomes Project; HLA-G; gene structure and diversity; haplotypes; non-classical HLA; polymorphisms; selective pressure; variability

Year:  2014        PMID: 25339953      PMCID: PMC4186343          DOI: 10.3389/fimmu.2014.00476

Source DB:  PubMed          Journal:  Front Immunol        ISSN: 1664-3224            Impact factor:   7.561


Introduction

Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC) at chromosomal region 6p21.3. The MHC is considered the most polymorphic region of the vertebrate genome (1). Although the HLA-G product has the same structure as classical class I molecules, its main function is not antigen presentation. The role of HLA-G in regulating immune responses has been studied extensively since its discovery by Geraghty and colleagues in 1987 (2). The HLA-G gene has been the target of most recent research regarding the function of non-classical class I genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane-bound and soluble isoforms, (c) a short cytoplasmic tail, (d) modulation of the immune response (immune tolerance), and (e) expression restricted to certain tissues (3). The HLA-G molecule does not seem to stimulate immune responses; instead, it exerts inhibitory functions against natural killer (NK) cells (4), T lymphocytes (4), and antigen-presenting cells (APC) (5) through direct interaction with multiple inhibitory receptors: ILT2/CD85j/LILRB1 (ILT2), expressed by all monocytes, B cells, some T cell lineages, and NK cells (6); ILT4/CD85d/LILRB2 (ILT4), expressed only by monocytes and dendritic cells (7); and KIR2DL4/CD158d (KIR2DL4), whose expression is restricted to CD56 NK cells (8). The role of HLA-G in immune tolerance was first studied in trophoblast cells at the maternal–fetal interface (9). Several studies have reported aberrant or reduced HLA-G expression, at both the mRNA and protein levels, in pathological conditions such as preeclampsia (10) and recurrent spontaneous abortion (11) compared with normal placentas. Beyond trophoblast expression, HLA-G is related to a variety of physiological and pathological conditions. 
In physiological conditions, HLA-G expression has been documented in cornea (12), thymus (13), and erythroid and endothelial precursors (14). On the other hand, HLA-G variation sites and/or expression levels are associated with pathological conditions such as viral infections (15–20), cancer (21–27), recurrent miscarriage (28–37), pregnancy outcome and pregnancy complications (37–45), autoimmune diseases (46–54), transplantation outcome (55–57), and inflammatory diseases (58–61), indicating that HLA-G encodes a critical molecule for the immune system.

HLA-G Genetic Structure

The HLA-G gene has a structure resembling that of the classical class I genes HLA-A, HLA-B, and HLA-C. HLA-G encodes a membrane-bound molecule with the same extracellular domains as other class I molecules, including association with β2-microglobulin; however, its main function is not antigen presentation. The HLA-G exon/intron structure and splicing patterns are well defined, but the National Center for Biotechnology Information (NCBI), the International Immunogenetics Database (IMGT/HLA), and the Ensembl database annotate its structure inconsistently, mainly because IMGT/HLA only presents sequences up to 300 bases upstream of the coding sequence (CDS) and does not include most of the 3′ untranslated region (UTR). Therefore, the structure defined by NCBI/Ensembl is used throughout this text. According to the NCBI reference sequence NC_000006.12 (GRCh38) and transcripts such as NM_002127.5 (NCBI), ENST00000428701, and ENST00000376828 (Ensembl), the HLA-G gene (NCBI Gene ID: 3135) has eight exons and seven introns, consistent with a classical class I gene structure, and spans 4144 nucleotides between positions 29826979 and 29831122 at 6p21.3 (GRCh38). The gene is surrounded by some of the most polymorphic genes in the human genome (Figure 1), such as HLA-A (115 kb downstream), HLA-B (1526 kb downstream), and HLA-C (1441 kb downstream), and by other non-classical HLA loci such as HLA-E (662 kb downstream) and HLA-F (103 kb upstream). According to the NCBI annotation and hg19, the HLA-G DNA segment encodes a full-length mRNA of 1578 nucleotides, as well as smaller alternative ones discussed later. In the full-length mRNA, 1017 nucleotides form the CDS encoding a full-length protein of 338 amino acids, 178 nucleotides form the 5′UTR, and 383 nucleotides form the 3′UTR.
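The sizes quoted above are internally consistent. A short arithmetic check, assuming closed (inclusive, 1-based) genomic intervals and that the 1017-nucleotide CDS includes the stop codon:

```python
# Figures quoted in the text (GRCh38 coordinates, full-length NCBI transcript).
GENE_START = 29826979
GENE_END = 29831122
UTR5_NT, CDS_NT, UTR3_NT = 178, 1017, 383

def span_length(start: int, end: int) -> int:
    """Length of a closed genomic interval [start, end]."""
    return end - start + 1

gene_span = span_length(GENE_START, GENE_END)
mrna_length = UTR5_NT + CDS_NT + UTR3_NT
codons, remainder = divmod(CDS_NT, 3)
residues = codons - 1  # the final codon of the CDS is the stop codon

print(gene_span, mrna_length, codons, remainder, residues)
# → 4144 1578 339 0 338
```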
Figure 1


There is no consensus regarding the exact location where HLA-G transcription starts. According to the NCBI and Ensembl annotations, and the transcripts NM_002127.5 (NCBI) and ENST00000428701 (Ensembl), HLA-G transcription starts 866 nucleotides upstream of the main translated ATG (third asterisk in Figure 1). Other transcripts suggest otherwise: ENST00000376828 indicates that transcription might start even further upstream, while ENST00000360323 indicates that it starts 24 nucleotides upstream of the main translated ATG. Given this contradictory information, HLA-G may have multiple transcription start points depending on the presence of specific transcription factors or other expression-inducing mechanisms, but it probably has only one translation start point, as described below. Since there is no consensus, in the present work we adopt the annotation shared by NCBI and Ensembl, using NM_002127.5 and ENST00000428701 as references. Considering the transcription start site indicated by NM_002127.5/ENST00000428701 or ENST00000360323, HLA-G has a large 5′UTR segment. Within this segment there is an intron (intron 1) of about 688 nucleotides that is spliced out, giving rise to a 5′UTR of about 178 nucleotides composed of segments of two adjacent exons. With this transcription start point, the HLA-G 5′ sequence has at least three potential translation start points: two in the 5′UTR and a third defining the beginning of the CDS. In the present work, the adenine of this third ATG, i.e., the first base of the CDS, is numbered +1. Although conventional nomenclature would number the first transcribed base as +1, this choice avoids unnecessary confusion regarding the positions of many well-established HLA-G variation sites. 
All nucleotides upstream of the CDS are given negative numbers and nucleotides within the CDS positive numbers, using as reference the sequence available in the official human genome hg19 or NC_000006.12. The first ATG lies between nucleotides −154 and −152 (mRNA) or −842 and −840 (DNA). The second lies between nucleotides −118 and −116 (mRNA) or −806 and −804 (DNA). These two translation start points are in the same frame and lie within a sequence that does not resemble the preferred translation initiation sequence (Kozak consensus sequence), so they might not initiate translation (62). Even if the first ATG were used, it would produce a peptide of only eight residues because of a stop codon downstream in the same reading frame. Alternatively, if the second ATG were used, a protein of about 136 amino acid residues would be produced. Although in a different frame from the main translation start point (the third one), this 136-residue molecule is quite similar to the alpha-1 domains of other human and primate class I molecules. The third and main ATG is compatible with the preferred Kozak sequence (62); it initiates translation of the full-length 338-residue protein and defines the beginning of the CDS. The HLA-G CDS is formed by segments of six exons, the first containing the translation start point and the last containing the stop codon (Table 1, Figure 1). Note that NCBI/Ensembl and IMGT/HLA use different exon and intron nomenclature: IMGT/HLA calls exon 1 the first mRNA segment that is translated, i.e., exon 2 for NCBI/Ensembl (Figure 1). In NCBI/Ensembl numbering, exon 2 encodes the final portion of the 5′UTR, contains the main translation start point, and encodes the HLA-G leader peptide (Figure 1). 
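The numbering convention above can be made concrete with a small sketch. It assumes the NCBI/Ensembl layout given in the text and Table 1 (exon 1 of 66 nt, intron 1 of 688 nt, 178 nt of 5′UTR in the mature mRNA) and a numbering that skips zero (..., −2, −1, +1, +2, ...). The helper names are illustrative only, not part of any published tool.

```python
# Assumed layout (NCBI/Ensembl, Table 1): exon 1 = 66 nt, intron 1 = 688 nt,
# 178 nt of 5'UTR in the mature mRNA. Numbering skips 0: ..., -2, -1, +1, +2, ...
EXON1_NT = 66
INTRON1_NT = 688
UTR5_NT = 178
EXON1_LAST = -UTR5_NT + EXON1_NT - 1  # -113: last mRNA position in exon 1

def utr5_mrna_to_dna(pos: int) -> int:
    """Convert a 5'UTR mRNA position to DNA numbering (both relative to the main ATG)."""
    if not -UTR5_NT <= pos <= -1:
        raise ValueError("position is outside the 5'UTR")
    # Exon 1 lies upstream of intron 1, so its bases shift by the intron length.
    return pos - INTRON1_NT if pos <= EXON1_LAST else pos

def distance(a: int, b: int) -> int:
    """Number of nucleotides from a to b (a < b) when position 0 does not exist."""
    d = b - a
    return d - 1 if a < 0 < b else d

def same_frame(a: int, b: int) -> bool:
    """True if two start positions share a reading frame."""
    return distance(a, b) % 3 == 0

ATG1, ATG2, ATG3 = -154, -118, 1  # first base of each ATG, mRNA numbering
print(utr5_mrna_to_dna(ATG1), utr5_mrna_to_dna(ATG2))  # → -842 -806
print(same_frame(ATG1, ATG2), same_frame(ATG2, ATG3))  # → True False
```

Under these assumptions the first two ATGs are 36 nt apart (same frame), while the second and third are 118 nt apart (118 mod 3 = 1, a different frame), matching the relationships described above.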
In addition, exons 3, 4, and 5 encode the alpha-1, alpha-2, and alpha-3 domains, respectively; exon 6 encodes the transmembrane domain; and exon 7 encodes the cytoplasmic tail. A premature stop codon in exon 7 leads to a shorter cytoplasmic tail than in other class I molecules (Figure 1, Table 1). The segment downstream of the stop codon in exon 7, extending through exon 8, forms the HLA-G 3′UTR, which is short compared with those of other class I genes. This description highlights a widespread misconception regarding HLA-G gene structure: in 1987, Geraghty and colleagues proposed the existence of an exon 7 based on homology with classical class I genes (2). This “exon 7” is in fact part of intron 7 (NCBI) and is absent from most HLA-G transcripts. Although this “exon 7” segment has been found in alternative transcripts (e.g., ENST00000478519), other intron segments are also sometimes retained in rare alternative transcripts (e.g., ENST00000478355), since alternative splicing is an important characteristic of the HLA-G gene, as described below.
Table 1

According to NC_000006.12 (hg19) | According to IMGT/HLA | Size (nt) | Function considering the full-length mRNA
Exon 1 | - | 66 | 5′UTR
Intron 1 | - | 688 | Spliced out
Exon 2 | Exon 1 | 185 | 5′UTR/Leader peptide
Intron 2 | Intron 1 | 129 | Spliced out
Exon 3 | Exon 2 | 270 | Alpha-1 domain
Intron 3 | Intron 2 | 226 | Spliced out
Exon 4 | Exon 3 | 276 | Alpha-2 domain
Intron 4 | Intron 3 | 599 | Spliced out
Exon 5 | Exon 4 | 276 | Alpha-3 domain
Intron 5 | Intron 4 | 122 | Spliced out
Exon 6 | Exon 5 | 117 | Transmembrane domain/cytoplasmic tail
Intron 6 | Intron 5 | 445 | Spliced out
Exon 7 | Exon 6 | 33 | Cytoplasmic tail/stop codon/3′UTR
Intron 7 | - | 357 | Spliced out
Exon 8 | - | 355 | 3′UTR
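As a consistency check, the segment sizes in Table 1 can be summed; a minimal sketch, with the sizes transcribed by hand from the table:

```python
# Segment sizes (nt) transcribed from Table 1, NCBI/Ensembl numbering.
SEGMENTS = [
    ("Exon 1", 66), ("Intron 1", 688), ("Exon 2", 185), ("Intron 2", 129),
    ("Exon 3", 270), ("Intron 3", 226), ("Exon 4", 276), ("Intron 4", 599),
    ("Exon 5", 276), ("Intron 5", 122), ("Exon 6", 117), ("Intron 6", 445),
    ("Exon 7", 33), ("Intron 7", 357), ("Exon 8", 355),
]

def total(prefix: str) -> int:
    """Sum the sizes of all segments whose label starts with `prefix`."""
    return sum(size for label, size in SEGMENTS if label.startswith(prefix))

gene_nt = total("")          # every segment
exon_nt = total("Exon")      # full-length mRNA
intron_nt = total("Intron")  # spliced out
print(gene_nt, exon_nt, intron_nt)  # → 4144 1578 2566
```

The exon total matches the 1578-nucleotide full-length mRNA, and the overall sum matches the 4144-nucleotide gene span given in the text.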
The HLA-G gene may produce at least seven protein isoforms by alternative splicing of the primary transcript (Figure 1). Four isoforms are membrane bound, carrying the transmembrane domain and the short cytoplasmic tail. HLA-G1 is the full-length membrane-bound isoform, with a structure resembling classical class I molecules; HLA-G2 lacks the alpha-2 domain, HLA-G3 lacks the alpha-2 and alpha-3 domains, and HLA-G4 lacks the alpha-3 domain. Three isoforms are soluble because they lack the transmembrane domain. The soluble HLA-G5 and HLA-G6 isoforms have the same extracellular domains as HLA-G1 and HLA-G2, respectively; however, both transcript variants retain intron 5, which introduces a stop codon before the transmembrane domain is translated and adds a tail of 21 amino acids implicated in their solubility. The HLA-G7 transcript variant retains intron 3, leading to a premature stop codon; therefore, the HLA-G7 isoform presents only the alpha-1 domain linked to two amino acids encoded by intron 3 (intron 2 in IMGT/HLA nomenclature) (Figure 1) (63–65). In the next sections, we address HLA-G variability and haplotype diversity among several populations around the world.
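The isoform descriptions above can be summarized as a small lookup structure. This is an illustrative sketch; the `Isoform` class and its field names are hypothetical, and only the domain composition and membrane attachment stated in the text are encoded.

```python
from dataclasses import dataclass
from typing import Tuple

ALPHA_DOMAINS = ("alpha-1", "alpha-2", "alpha-3")

@dataclass(frozen=True)
class Isoform:
    name: str
    domains: Tuple[str, ...]  # extracellular alpha domains present
    membrane_bound: bool      # False for the soluble isoforms

# Domain composition as described in the text.
ISOFORMS = [
    Isoform("HLA-G1", ("alpha-1", "alpha-2", "alpha-3"), True),
    Isoform("HLA-G2", ("alpha-1", "alpha-3"), True),
    Isoform("HLA-G3", ("alpha-1",), True),
    Isoform("HLA-G4", ("alpha-1", "alpha-2"), True),
    Isoform("HLA-G5", ("alpha-1", "alpha-2", "alpha-3"), False),
    Isoform("HLA-G6", ("alpha-1", "alpha-3"), False),
    Isoform("HLA-G7", ("alpha-1",), False),
]

def missing_domains(iso: Isoform) -> Tuple[str, ...]:
    """Alpha domains absent from an isoform, relative to full-length HLA-G1."""
    return tuple(d for d in ALPHA_DOMAINS if d not in iso.domains)

soluble = [iso.name for iso in ISOFORMS if not iso.membrane_bound]
print(soluble)                       # → ['HLA-G5', 'HLA-G6', 'HLA-G7']
print(missing_domains(ISOFORMS[2]))  # → ('alpha-2', 'alpha-3')
```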

HLA-G Variability as Described in the 1000Genomes Project

The 1000Genomes Project is a large survey aiming to sequence the entire genomes of thousands of individuals from several populations around the world (66). In the initial data release, phased genotypes of 1092 individuals from 14 populations were available. These data have driven several studies of HLA-G variability and evolution (67–69). The initial genotypes published by the 1000Genomes Project were based on exome sequencing or low-coverage whole-genome sequencing and lacked several known HLA-G polymorphisms because of limitations of the genotype detection procedures available at that time. The missing polymorphic sites include some known indels, such as the traditionally studied 14-bp insertion/deletion in the HLA-G 3′UTR. In addition, the method used to infer genotypes and haplotypes failed to distinguish triallelic SNPs, reporting them as biallelic (e.g., the HLA-G promoter SNP at position −725C/T/G, rs1233334). Given these technical limitations, and given that most of the bioinformatics tools used in the initial survey have since improved, we reevaluated the 1000Genomes raw sequencing data for the HLA-G gene with a locally developed pipeline to obtain genotypes and haplotypes, to better understand HLA-G variability around the world, and to recover data on some missed HLA-G polymorphic sites. First, using the Samtools (70) view subroutine, we downloaded the BAM (binary alignment map) files containing the official 1000Genomes alignments for the HLA-G gene region (positions 29793317 to 29799834 on chromosome 6) directly from the 1000Genomes server (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/). The downloaded reads had already been trimmed of primer sequences at both ends. The download was performed for each of the initial 1092 samples and included both low-coverage whole-genome and exome data when available. 
Note that the reads were retrieved from BAM files covering the HLA-G region; thus, the next step of our pipeline used only reads previously mapped to the HLA-G region by the 1000Genomes Consortium. Each BAM file was converted into a Fastq file containing all reads previously mapped to the HLA-G region. The BAM-to-Fastq conversion used Bamtools (https://github.com/pezmaster31/bamtools/) and locally developed Perl scripts to filter out duplicated reads and to classify reads as paired or unpaired. Both paired and unpaired Fastq files were mapped to a masked chromosome 6 (hg19) in which only the HLA-G region was available and the rest of the chromosome was masked with “N” to preserve hg19 nucleotide positions. The hg19 HLA-G coding region sequence is compatible with the widespread HLA-G allele G*01:01:01:05. Mapping was performed with BWA, subroutine ALN (71), configured to allow the extension of a deletion up to 20 nucleotides in order to evaluate the 14-bp polymorphism. The BAM files resulting from the newly mapped paired-end and unpaired reads were merged using Picard-tools (http://picard.sourceforge.net/index.shtml). Regions containing indels were locally realigned with the GATK (72) routines RealignerTargetCreator and IndelRealigner, using a file of known HLA-G indels as reference. Bamtools was also used to remove reads with low mapping quality (MQ) scores (MQ < 40). After this procedure, 16 samples were discarded because all (or most) of their mapped reads were removed due to poor MQ scores. The GATK routine UnifiedGenotyper was then used to infer genotypes, generating a VCF (variant call format) file. 
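The read-level filters described above can be sketched on a toy, in-memory read record. This is not the authors' Perl/Bamtools code: the `Read` type is hypothetical, the remapping step is omitted (the `mapq` field stands for the score after remapping), and treating reads with the same name and mate number as duplicates is an assumption, since the text does not state the duplicate criterion. Only the MQ < 40 cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_MAPQ = 40  # reads with MQ < 40 are removed, as in the text

@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (real BAM records carry many more fields)."""
    name: str
    mate: int      # 1 or 2
    sequence: str
    mapq: int

def deduplicate(reads: List[Read]) -> List[Read]:
    """Keep the first occurrence of each (name, mate); an assumed duplicate rule."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.name, read.mate)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

def split_pairs(reads: List[Read]) -> Tuple[List[Read], List[Read]]:
    """Reads whose two mates are both present are 'paired'; the rest are 'unpaired'."""
    by_name: Dict[str, List[Read]] = {}
    for read in reads:
        by_name.setdefault(read.name, []).append(read)
    paired = [r for group in by_name.values() if len(group) == 2 for r in group]
    unpaired = [r for group in by_name.values() if len(group) != 2 for r in group]
    return paired, unpaired

def passes_mapq(read: Read) -> bool:
    return read.mapq >= MIN_MAPQ

reads = [Read("r1", 1, "ACGT", 60), Read("r1", 2, "TTGA", 55),
         Read("r1", 1, "ACGT", 60), Read("r2", 1, "GGCC", 12)]
unique = deduplicate(reads)
paired, unpaired = split_pairs(unique)
kept = [r for r in unique if passes_mapq(r)]
print(len(unique), len(paired), len(unpaired), len(kept))  # → 3 2 1 2
```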
Given the low-coverage nature of the 1000Genomes data, some genotypes called by GATK are highly uncertain, mainly when a homozygous genotype is inferred at a position with low depth of coverage. In addition, given the polymorphic nature and high sequence similarity of HLA genes, some mis-mapped reads are expected and might bias genotype inference. To circumvent these issues, the VCF file generated by GATK was processed with a locally developed Perl script that applied the rules below, based on the number of reads supporting each allele at a given position (reported by GATK in the VCF file):
– Homozygosity was inferred only when a minimum coverage of seven reads was achieved; otherwise, a missing allele was introduced in the genotype. This ensures (p > 0.99) that a homozygous genotype is called because of a lack of variation at that position and not because the second allele was not sampled.
– Genotypes in which one allele was extremely underrepresented (under 5% of reads) were considered homozygous for the most represented allele. This minimizes the influence of reads mis-mapped to the HLA-G region and of the high sequencing error rate characteristic of next-generation sequencing data; this correction was applied only at high depth of coverage (20 or more reads at the evaluated position).
– For genotypes in which one allele was mildly underrepresented (between 5 and 20% of reads), a missing allele was introduced in place of the underrepresented allele. This is particularly helpful at low depth of coverage (fewer than 20 reads at the evaluated position), where a single read may indicate an alternative allele; such a read may be mis-mapped (a false-positive variant) or may represent a true unbalanced heterozygous genotype (a true-positive variant). The definitive status of such genotypes (homozygous or heterozygous) was therefore inferred during a final imputation step.
– Genotypes in which the less represented allele accounted for more than 20% of reads were considered heterozygous. This ensures that only high-quality heterozygous genotypes are passed to the imputation procedure.
After applying these rules, 8.42% of the alleles in the HLA-G database were missing, i.e., considered uncertain because of low coverage or unbalanced read proportions. Some single nucleotide variations (SNVs) previously detected with low quality became monomorphic because the alternative allele was removed or coded as missing, and they were not considered further. Using the VCFtools package (73), we removed SNVs that were no longer variable or that were observed only once in the dataset (singletons). In addition, we predicted the functional effect of each SNV (coding synonymous, coding non-synonymous, splice site acceptor, stop-codon generation, and others) using SnpEff (74). Missing alleles were imputed and HLA-G haplotypes were inferred using the PHASE algorithm (75), as previously described (76, 77). For this purpose, a database containing high-quality genotype information for 133 SNVs in each of the 1076 remaining samples was used. 
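The four genotype-refinement rules can be expressed as a single function. This is a sketch assuming a biallelic site with read counts per allele and using `None` for a missing allele; it is not the locally developed Perl script itself. The constant `P_UNSAMPLED` shows one way to read the quoted p > 0.99, assuming each read samples either allele of a heterozygote with probability 0.5, so seven reads all miss a given allele with probability 0.5^7 ≈ 0.0078.

```python
from typing import Optional, Tuple

MIN_HOMOZYGOUS_DEPTH = 7  # rule 1: minimum reads to call a homozygote
HIGH_DEPTH = 20           # rule 2 is applied only at >= 20 reads
LOW_FRACTION = 0.05       # minor allele below 5%: treated as noise (rule 2)
HET_FRACTION = 0.20       # minor allele above 20%: heterozygote (rule 4)

# Chance that 7 reads from a heterozygote never sample one given allele,
# assuming each read picks either allele with probability 0.5.
P_UNSAMPLED = 0.5 ** MIN_HOMOZYGOUS_DEPTH  # 0.0078125, i.e. p > 0.99

Genotype = Tuple[Optional[str], Optional[str]]  # None = missing allele

def refine_genotype(allele_a: str, reads_a: int,
                    allele_b: str, reads_b: int) -> Genotype:
    depth = reads_a + reads_b
    if depth == 0:
        return (None, None)
    # Order alleles by read support (stable sort keeps input order on ties).
    (major, _), (minor, n_minor) = sorted(
        [(allele_a, reads_a), (allele_b, reads_b)],
        key=lambda pair: pair[1], reverse=True)
    fraction = n_minor / depth
    if fraction > HET_FRACTION:
        return (major, minor)                # rule 4: heterozygote
    # Defensive depth check: below 20 reads, any minor read already exceeds 5%.
    if fraction >= LOW_FRACTION or (n_minor > 0 and depth < HIGH_DEPTH):
        return (major, None)                 # rule 3: left for imputation
    # Minor allele absent, or below 5% at depth >= 20 (rule 2).
    if depth >= MIN_HOMOZYGOUS_DEPTH:
        return (major, major)                # homozygote (rules 1 and 2)
    return (major, None)                     # rule 1: too shallow to call

print(refine_genotype("G", 10, "A", 0))  # → ('G', 'G')
print(refine_genotype("G", 12, "A", 2))  # → ('G', None)
```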
The haplotyping procedure generated 200 haplotypes, with a mean haplotype-pair probability of 0.7965; 524 samples (48.70%) had a haplotype pair with a probability higher than 0.9. The results are presented separately for each HLA-G region (coding, 3′UTR, and promoter) and, finally, as fully characterized extended haplotypes. To characterize global patterns of HLA-G diversity, a population genetics approach was applied using ARLEQUIN 3.5.1.3 (78, 79). HLA-G haplotype frequencies were computed by direct counting, and the adherence of diplotype proportions to Hardy–Weinberg expectations was tested with the exact test of Guo and Thompson (80). Intrapopulation genetic diversity was assessed in each population by computing gene diversity (average expected heterozygosity across variation sites), haplotype diversity, nucleotide diversity, and the number of private haplotypes. Interpopulation genetic diversity was explored by means of pairwise FST estimates (81), the exact test of population differentiation (82), and the analysis of molecular variance (AMOVA) (83), all based on haplotype frequencies. Since the pairwise FST estimates and the exact tests of population differentiation each involve 91 comparisons between pairs of populations, the Bonferroni correction was used to adjust the significance level for multiple testing, giving α ≈ 0.0005 (i.e., 0.05/91). Reynolds' genetic distance was also estimated for each pair of population samples with ARLEQUIN 3.5.1.3 (78, 79, 84), and the resulting matrix was used for multidimensional scaling (MDS) in PASW Statistics 17.0.2 (SPSS Inc.).
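Several figures above follow from simple arithmetic, and gene diversity has a closed form. This sketch assumes Nei's unbiased estimator, H = n/(n − 1)·(1 − Σ p_i²), the usual definition of gene (or haplotype) diversity for a single site or haplotype set; averaging it over sites gives the per-site mean. ARLEQUIN's own implementation is not reproduced here.

```python
from math import comb

N_POPULATIONS = 14
pairs = comb(N_POPULATIONS, 2)  # pairwise population comparisons
alpha = 0.05 / pairs            # Bonferroni-adjusted significance level

def gene_diversity(counts):
    """Nei's unbiased gene diversity from allele or haplotype counts."""
    n = sum(counts)
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_p2)

retained = 1092 - 16             # samples left after the MQ filtering step
high_conf_share = 524 / retained  # haplotype pairs with probability > 0.9
print(pairs, round(alpha, 6), retained, round(100 * high_conf_share, 2))
# → 91 0.000549 1076 48.7
```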

HLA-G Coding Region Variability and Haplotypes

In contrast to classical HLA class I genes, HLA-G has low variability in its coding region. To date, only 50 coding alleles or haplotypes are officially recognized by the IMGT/HLA database (version 3.17.0.1). Most SNVs in the HLA-G coding region are either coding synonymous or intronic variants; consequently, these 50 officially recognized alleles encode only 16 different full-length proteins and two truncated molecules (null alleles). This is a distinctive feature of HLA-G and of other non-classical class I genes: only 36% of the known HLA-G alleles encode distinct molecules, compared with 75.4% of HLA-A, 77.8% of HLA-B, and 73.5% of HLA-C alleles (IMGT/HLA). The limited HLA-G coding polymorphism is distributed among the alpha-1, alpha-2, and alpha-3 domains, whereas in classical class I genes polymorphisms cluster in the region encoding the peptide-binding groove, i.e., the alpha-1 and alpha-2 domains (1). This is particularly evident for HLA-B, for which, with few exceptions, every nucleotide of exons 2 and 3 is mutated in at least one recognized allele. A SNV is generally considered polymorphic if its minor allele has a frequency of at least 1%. In this regard, some HLA-G variable sites may not be true polymorphisms because they are rarely observed. Considering the 50 HLA-G alleles officially recognized by IMGT/HLA, and the many studies evaluating HLA-G coding region polymorphisms in normal or pathological conditions, only 13 alleles, encoding four different full-length HLA-G molecules and a truncated one, are frequently observed in worldwide populations (3, 19, 23, 34, 36, 37, 68, 69, 76, 85–104). 
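The 1% criterion and the 36% figure can be illustrated directly; a minimal sketch, using a handful of allele frequencies taken from Table 2 (keyed by HLA-G position):

```python
POLYMORPHIC_MAF = 0.01  # minimum minor allele frequency for a polymorphism

def is_polymorphic(freq_1: float, freq_2: float) -> bool:
    """Apply the 1% minor-allele-frequency rule to a biallelic site."""
    return min(freq_1, freq_2) >= POLYMORPHIC_MAF

# (allele 1 frequency, allele 2 frequency) for selected sites in Table 2.
SITES = {
    292: (0.9503, 0.0497),
    755: (0.8053, 0.1947),
    1759: (0.9991, 0.0009),
    2412: (0.9707, 0.0293),
}
polymorphic = sorted(pos for pos, f in SITES.items() if is_polymorphic(*f))
print(polymorphic)  # → [292, 755, 2412]

# 16 full-length proteins + 2 null molecules, out of 50 recognized alleles.
print((16 + 2) / 50)  # → 0.36
```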
The high-frequency HLA-G coding alleles include G*01:01:01:01, G*01:01:01:04, G*01:01:01:05 (present in hg19), G*01:01:02:01, G*01:01:03:01, G*01:01:05, and G*01:01:07, all carrying intronic or synonymous mutations and encoding the same full-length HLA-G molecule, known as G*01:01. HLA-G*01:01:01:01 is the reference allele used by IMGT/HLA; it was the first one described (2) and is usually the most common allele in all populations studied so far. The frequent alleles also include G*01:03:01:01, characterized by a non-synonymous mutation at position 292 (codon 31) that replaces a threonine with a serine, encoding the full-length molecule known as G*01:03. Another group is represented by G*01:04:01, G*01:04:03, and G*01:04:04, all encoding the same molecule, known as G*01:04; they are characterized by a non-synonymous mutation at position 755 (codon 110) that replaces a leucine with an isoleucine, and by other synonymous mutations. The null allele G*01:05N, associated with a truncated HLA-G molecule due to a cytosine deletion around codon 130 that shifts the reading frame, is also very frequent in some African, Asian, and admixed populations. Finally, the last frequent allele is G*01:06, characterized by a non-synonymous mutation at position 1799 (codon 258) that replaces a threonine with a methionine, encoding the molecule known as G*01:06. Other HLA-G alleles are found sporadically around the world, but only those listed above have been described at polymorphic frequencies. However, the variability of the HLA-G coding region may be greater than that presented by IMGT/HLA, because IMGT/HLA only lists alleles that were cloned, sequenced, and properly characterized, and most known alleles are not fully characterized, with only some exons sequenced. 
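The position-to-codon correspondences quoted above (292 to codon 31, 755 to codon 110, 1799 to codon 258) can be reproduced from the exon sizes in Table 1. This sketch rests on two assumptions: exon 2 contributes 73 coding nucleotides (185 nt minus the 112 nt of 5′UTR it carries) and exon 7 the final 5 coding nucleotides, so the CDS totals 1017 nt; and the leader peptide is 24 residues long, an offset inferred here from the examples in the text rather than stated by it.

```python
# Layout downstream of the main ATG (+1), from Table 1 sizes (NCBI numbering).
# Assumed: exon 2 contributes 73 coding nt and exon 7 the final 5 coding nt.
LAYOUT = [("exon", 73), ("intron", 129), ("exon", 270), ("intron", 226),
          ("exon", 276), ("intron", 599), ("exon", 276), ("intron", 122),
          ("exon", 117), ("intron", 445), ("exon", 5)]
LEADER_RESIDUES = 24  # assumed signal-peptide length

def cds_position(pos: int):
    """Map an HLA-G position (+1 = A of the main ATG) to a CDS position; None if intronic."""
    start, coding_before = 1, 0
    for kind, size in LAYOUT:
        end = start + size - 1
        if start <= pos <= end:
            return coding_before + (pos - start + 1) if kind == "exon" else None
        if kind == "exon":
            coding_before += size
        start = end + 1
    return None  # upstream of the ATG or beyond the CDS

def mature_codon(pos: int):
    """Codon number in the mature protein, or None for introns and the leader peptide."""
    cds = cds_position(pos)
    if cds is None:
        return None
    precursor_codon = (cds + 2) // 3
    if precursor_codon <= LEADER_RESIDUES:
        return None
    return precursor_codon - LEADER_RESIDUES

print([mature_codon(p) for p in (292, 755, 1799)])  # → [31, 110, 258]
print(mature_codon(813), mature_codon(482))          # → 129 None
```

Under these assumptions, position 813 (the G*01:05N cytosine deletion) falls in codon 129, consistent with "around codon 130" above.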
Our reevaluation of the HLA-G sequencing data from the 1000Genomes Project indicated that the HLA-G coding region is indeed highly conserved and that only a few new coding alleles are frequently found worldwide. The approach described earlier revealed 81 SNVs in the HLA-G coding region (Table 2). Some of these variation sites are truly polymorphic, while others are rare and might be better regarded as mutations. In addition, some of these sites are not represented in the IMGT/HLA database and might define new HLA-G alleles.
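Table 2

List of all variation sites found in the HLA-G coding region.

Genomic position (hg19) | SNP id | HLA-G position | IMGT/HLA recognized | Allele 1 (reference) | Allele 1 frequency | Allele 2 | Allele 2 frequency | Annotation
29795636 | rs1630223 | 15 | * | G | 0.4967 | A | 0.5033 | Synonymous
29795657 | rs1630185 | 36 | * | G | 0.4967 | A | 0.5033 | Synonymous
29795667 | . | 46 |  | G | 0.9991 | T | 0.0009 | Non-synonymous
29795720 | rs56388903 | 99 | * | A | 0.1120 | G | 0.8880 | Intronic
29795747 | rs6932888 | 126 | * | G | 0.7156 | C | 0.2844 | Intronic
29795751 | rs6932596 | 130 | * | C | 0.7161 | T | 0.2839 | Intronic
29795768 | rs1629329 | 147 | * | T | 0.4396 | C | 0.5604 | Intronic
29795809 | rs1628628 | 188 | * | C | 0.5669 | T | 0.4331 | Intronic
29795822 | . | 201 |  | A | 0.9963 | G | 0.0037 | Splice site acceptor
29795840 | . | 219 |  | G | 0.9967 | T | 0.0033 | Non-synonymous
29795913 | rs41551813 | 292 | * | A | 0.9503 | T | 0.0497 | Non-synonymous
29795914 | rs72558173 | 293 | * | C | 0.9986 | T | 0.0014 | Non-synonymous
29795918 | rs80153902 | 297 | * | G | 0.9958 | A | 0.0042 | Synonymous
29795927 | rs72558174 | 306 | * | G | 0.9972 | A | 0.0028 | Synonymous
29795945 | rs9258495 | 324 | * | G | 0.9991 | T | 0.0009 | Synonymous
29795987 | rs78627024 | 366 | * | G | 0.9972 | A | 0.0028 | Synonymous
29795993 | rs1130355 | 372 | * | G | 0.4967 | A | 0.5033 | Synonymous
29796103 | rs1626038 | 482 | * | T | 0.4340 | C | 0.5660 | Intronic
29796106 | rs17875399 | 485 | * | G | 0.9526 | T | 0.0474 | Intronic
29796114 | . | 493 |  | G | 0.9991 | A | 0.0009 | Intronic
29796115 | rs1736927 | 494 | * | A | 0.4336 | C | 0.5665 | Intronic
29796119 | rs201510147 | 498 |  | G | 0.9986 | A | 0.0014 | Intronic
29796126 | rs3215482 | 505 | * | A | 0.4828 | AC | 0.5172 | Intronic
29796128 | . | 507 | * | C | 0.9517 | A | 0.0483 | Intronic
29796149 | . | 528 |  | A | 0.9967 | C | 0.0033 | Intronic
29796152 | rs1625907 | 531 | * | G | 0.4819 | C | 0.5181 | Intronic
29796228 | . | 607 |  | G | 0.9981 | A | 0.0019 | Intronic
29796234 | rs375939243 | 613 | * | CA | 0.4991 | C | 0.5009 | Intronic
29796245 | . | 624 | * | T | 0.9991 | C | 0.0009 | Intronic
29796257 | rs1625035 | 636 | * | C | 0.4493 | T | 0.5507 | Intronic
29796265 | rs17875401 | 644 | * | G | 0.9493 | T | 0.0507 | Intronic
29796273 | . | 652 |  | C | 0.9981 | T | 0.0019 | Intronic
29796306 | rs1624337 | 685 | * | G | 0.4986 | A | 0.5014 | Intronic
29796327 | rs1130356 | 706 | * | C | 0.7621 | T | 0.2379 | Synonymous
29796348 | rs79303923 | 727 | * | C | 0.9981 | T | 0.0019 | Synonymous
29796362 | . | 741 | * | C | 0.9991 | G | 0.0009 | Non-synonymous
29796369 | rs3873252 | 748 | * | A | 0.9345 | T | 0.0655 | Synonymous
29796376 | rs12722477 | 755 | * | C | 0.8053 | A | 0.1947 | Non-synonymous
29796434 | rs41557518 | 813 | * | AC | 0.9642 | A | 0.0358 | Frameshift
29796492 | rs17875402 | 871 | * | G | 0.9944 | A | 0.0056 | Synonymous
29796637 | rs17875403 | 1016 | * | C | 0.9949 | T | 0.0051 | Intronic
29796640 | rs1632942 | 1019 | * | T | 0.4475 | C | 0.5525 | Intronic
29796675 | rs17875404 | 1054 | * | G | 0.9503 | T | 0.0497 | Intronic
29796685 | rs1632941 | 1064 | * | T | 0.4972 | C | 0.5028 | Intronic
29796700 | rs148061958 | 1079 |  | C | 0.9972 | T | 0.0028 | Intronic
29796725 | rs370704534 | 1104 |  | C | 0.9981 | G | 0.0019 | Intronic
29796749 | rs62391965 | 1128 | * | C | 0.9345 | A | 0.0655 | Intronic
29796752 | . | 1131 |  | A | 0.9991 | T | 0.0009 | Intronic
29796768 | rs1632940 | 1147 | * | T | 0.2040 | C | 0.7960 | Intronic
29796800 | rs140935623 | 1179 |  | A | 0.9981 | G | 0.0019 | Intronic
29796838 | rs1736923 | 1217 | * | A | 0.4963 | G | 0.5037 | Intronic
29796934 | rs114041958 | 1313 | * | G | 0.9507 | A | 0.0493 | Intronic
29796935 | rs1632939 | 1314 | * | G | 0.4972 | A | 0.5028 | Intronic
29796986 | rs1632938 | 1365 | * | G | 0.4972 | A | 0.5028 | Intronic
29797043 | rs145023077 | 1422 |  | C | 0.9912 | T | 0.0088 | Intronic
29797052 | rs116139267 | 1431 |  | C | 0.9967 | T | 0.0033 | Intronic
29797073 | rs188836562 | 1452 |  | G | 0.9991 | C | 0.0009 | Intronic
29797155 | rs17875405 | 1534 | * | G | 0.9503 | C | 0.0497 | Intronic
29797173 | rs1736920 | 1552 | * | A | 0.4470 | G | 0.5530 | Intronic
29797195 | . | 1574 |  | A | 0.9986 | AC | 0.0014 | Frameshift
29797211 | rs41562616 | 1590 | * | C | 0.9503 | T | 0.0497 | Synonymous
29797380 | rs200931762 | 1759 |  | G | 0.9991 | A | 0.0009 | Non-synonymous
29797420 | rs12722482 | 1799 | * | C | 0.9698 | T | 0.0302 | Non-synonymous
29797421 | rs76951509 | 1800 | * | G | 0.9963 | A | 0.0037 | Synonymous
29797448 | rs17875406 | 1827 | * | G | 0.9554 | A | 0.0446 | Synonymous
29797553 | rs1632937 | 1932 | * | G | 0.4972 | C | 0.5028 | Intronic
29797639 | rs1049033 | 2018 | * | C | 0.7742 | T | 0.2258 | Synonymous
29797696 | rs1130363 | 2075 | * | A | 0.4470 | G | 0.5530 | Synonymous
29797782 | rs1611627 | 2161 | * | T | 0.5627 | C | 0.4373 | Intronic
29797899 | rs1632934 | 2278 | * | T | 0.4972 | C | 0.5028 | Intronic
29797933 | rs1632933 | 2312 | * | C | 0.4972 | T | 0.5028 | Intronic
29797951 | rs1736912 | 2330 | * | A | 0.4972 | G | 0.5028 | Intronic
29798029 | . | 2408 |  | T | 0.9991 | A | 0.0009 | Intronic
29798033 | rs17179080 | 2412 |  | G | 0.9707 | A | 0.0293 | Intronic
29798039 | rs1632932 | 2418 | * | G | 0.4972 | A | 0.5028 | Intronic
29798083 | rs114038308 | 2462 | * | C | 0.9345 | T | 0.0655 | Intronic
29798140 | rs915667 | 2519 | * | A | 0.5084 | G | 0.4916 | Intronic
29798248 | rs186170315 | 2627 |  | G | 0.9991 | A | 0.0009 | Intronic
29798419 | rs915670 | 2798 | * | G | 0.7742 | A | 0.2258 | Intronic
29798425 | rs915669 | 2804 | * | G | 0.4480 | T | 0.5520 | Intronic
29798459 | rs915668 | 2838 | * | C | 0.4480 | G | 0.5520 | Intronic

*Denotes a variation site that is recognized by the IMGT/HLA database.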
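The genomic positions and HLA-G positions in Table 2 are related by a fixed offset. This sketch assumes the forward-strand orientation implied by the table (both coordinates increase together) and infers the anchor from its first row: HLA-G position +15 is hg19 29795636, so the A of the main ATG (+1) sits at 29795622. Because HLA-G numbering has no position 0, negative positions shift by one less than positive ones. The function names are illustrative.

```python
ATG_HG19 = 29795622  # inferred from Table 2: +15 -> 29795636

def hlag_to_hg19(pos: int) -> int:
    """HLA-G position (no zero; +1 = A of the main ATG) to hg19 chr6 coordinate."""
    if pos == 0:
        raise ValueError("HLA-G numbering has no position 0")
    return ATG_HG19 + pos - 1 if pos > 0 else ATG_HG19 + pos

def hg19_to_hlag(coord: int) -> int:
    """Inverse of hlag_to_hg19."""
    offset = coord - ATG_HG19
    return offset + 1 if offset >= 0 else offset

# Spot checks against rows of Table 2.
for pos, coord in [(15, 29795636), (292, 29795913), (1799, 29797420), (2838, 29798459)]:
    assert hlag_to_hg19(pos) == coord and hg19_to_hlag(coord) == pos
```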

As shown in Table 2, most of the 81 variation sites occur in introns (54 sites) or are synonymous changes in exons (16 sites). Thus, 86.4% of all variants are associated with the same full-length HLA-G molecule, unless they somehow influence the HLA-G splicing pattern. The variants that might give rise to different full-length HLA-G proteins include two frameshift mutations, the first associated with the G*01:05N null allele and the second a low-frequency variation site not recognized by IMGT/HLA (genomic position 29797195); one variation site at a splice acceptor site (genomic position 29795822, HLA-G position +201); and eight non-synonymous changes, most of them recognized by IMGT/HLA. Interestingly, one intronic variant was found at a relatively high frequency (2.93%) and is not associated with any HLA-G allele described so far (HLA-G position +2412, rs17179080, Table 2). Although a triallelic SNV associated with the G*01:04:02 allele is described in exon 3 (exon 2 in IMGT/HLA nomenclature; HLA-G position +372), we did not find the third allele in the present data. As described earlier, haplotypes were inferred considering all variation sites found in the HLA-G region. When the coding region is isolated from these haplotypes, we find 93 different HLA-G coding haplotypes, far more than the number of officially recognized HLA-G alleles. The complete table of haplotypes is available upon request. Table 3 lists all coding haplotypes with a global frequency of at least 1%, together with the closest known HLA-G allele in terms of sequence similarity; positions that are non-variable among these haplotypes were removed. Although 93 different haplotypes were inferred, only 11 have a frequency higher than 1%. 
Of those, 10 correspond to specific alleles described in the IMGT/HLA database and mentioned earlier as high-frequency alleles that usually occur in any population, and 1 is a new allele that is close to G*01:01:01:01 but carries the frequent nucleotide change at position +2412, not recognized by IMGT/HLA. As previously observed in other studies, the most frequent HLA-G allele is G*01:01:01:01, followed by G*01:01:02:01 and G*01:04:01. These 11 haplotypes (coding alleles) together represent 88.8% of all HLA-G coding haplotypes and are associated with only four different HLA-G full-length molecules and a truncated one. Moreover, considering these 11 haplotypes, at least 60.87% of all HLA-G full-length molecules would be the same G*01:01 molecule (encoded by G*01:01:01:01, G*01:01:02:01, G*01:01:03:03, G*01:01:01:04, G*01:01:01:05, and G*01:01:01:01new), and a higher proportion is expected if other rare haplotypes are considered.
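The percentages quoted in the two paragraphs above can be re-derived from the global frequencies in Table 3. The minimal Python sketch below assumes the IMGT/HLA convention that the first two fields of an allele name identify the encoded protein, so every allele sharing the prefix G*01:01 encodes the same full-length molecule. The dictionary is transcribed from Table 3; the helper names `protein_group` and `share_by_protein` are illustrative, not part of the authors' pipeline.

```python
# Global frequencies of the 11 coding haplotypes above 1%, transcribed from Table 3.
TABLE3_FREQ = {
    "G*01:01:01:01": 0.2528,
    "G*01:01:01:01new": 0.0200,
    "G*01:01:01:04": 0.0376,
    "G*01:01:01:05": 0.0911,
    "G*01:01:02:01": 0.1445,
    "G*01:01:03:03": 0.0627,
    "G*01:03:01:02": 0.0446,
    "G*01:04:01": 0.1329,
    "G*01:04:04": 0.0404,
    "G*01:05N": 0.0330,
    "G*01:06": 0.0283,
}


def protein_group(allele: str) -> str:
    """Return the first two fields of an allele name (e.g. 'G*01:01').

    Assumes IMGT/HLA naming, where fields 3 and 4 only record synonymous
    or non-coding differences, so the first two fields define the protein.
    """
    fields = allele.split(":")
    return ":".join(fields[:2])


def share_by_protein(freqs: dict) -> dict:
    """Sum haplotype frequencies per encoded protein."""
    totals = {}
    for allele, freq in freqs.items():
        group = protein_group(allele)
        totals[group] = totals.get(group, 0.0) + freq
    return totals


if __name__ == "__main__":
    # 70 of the 81 coding variants are intronic or synonymous (Table 2).
    print(f"unchanged-protein variants: {70 / 81:.1%}")
    print(f"11 haplotypes combined: {sum(TABLE3_FREQ.values()):.2%}")
    for group, total in sorted(share_by_protein(TABLE3_FREQ).items()):
        print(group, round(total, 4))
```

Run as a script, it reports 86.4%, 88.79%, and five protein groups (four full-length molecules plus the truncated G*01:05N), with G*01:01 at 0.6087; that last figure is only reached when G*01:01:01:05 is counted with the other G*01:01 alleles.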
Table 3

List of HLA-G coding haplotypes presenting a global frequency of at least 1%.

HLA-G position | Genomic position on chromosome 6 (hg19) | SNPid | Nucleotides in column order: G*01:01:01:01, G*01:01:01:01new, G*01:01:01:04, G*01:01:01:05, G*01:01:02:01, G*01:01:03:03, G*01:03:01:02, G*01:04:01, G*01:04:04, G*01:05N, G*01:06
15 | 29795636 | rs1630223 | GGGGAAGAAAA
36 | 29795657 | rs1630185 | GGGGAAGAAAA
99 | 29795720 | rs56388903 | GGGAGGGGGGG
126 | 29795747 | rs6932888 | CCGGGGGGGGG
130 | 29795751 | rs6932596 | TTCCCCCCCCC
147 | 29795768 | rs1629329 | TTTTCCCCCCC
188 | 29795809 | rs1628628 | CCCCTCCTTTT
292 | 29795913 | rs41551813 | AAAAAATAAAA
372 | 29795993 | rs1130355 | GGGGAAGAAAA
482 | 29796103 | rs1626038 | TTTTCCCCCCC
485 | 29796106 | rs17875399 | GGGGGGTGGGG
494 | 29796115 | rs1736927 | AAAACCCCCCC
505 | 29796126 | rs3215482 | CCCCCC
507 | 29796128 | . | CCCCCCACCCC
531 | 29796152 | rs1625907 | GGGGCCGCCCC
613 | 29796234 | rs375939243 | AAAAA
636 | 29796257 | rs1625035 | CCCCTTTTTTT
644 | 29796265 | rs17875401 | GGGGGGTGGGG
685 | 29796306 | rs1624337 | GGGGAAGAAAA
706 | 29796327 | rs1130356 | CCCCTCCCCTT
748 | 29796369 | rs3873252 | AAAAATAAAAA
755 | 29796376 | rs12722477 | CCCCCCCAACC
813 | 29796434 | rs41557518 | CCCCCCCCCC
1019 | 29796640 | rs1632942 | TTTTCCCCCCC
1054 | 29796675 | rs17875404 | GGGGGGTGGGG
1064 | 29796685 | rs1632941 | TTTTCCTCCCC
1128 | 29796749 | rs62391965 | CCCCCACCCCC
1147 | 29796768 | rs1632940 | CCTTCCTCCCC
1217 | 29796838 | rs1736923 | AAAAGGAGGGG
1313 | 29796934 | rs114041958 | GGGGGGAGGGG
1314 | 29796935 | rs1632939 | GGGGAAGAAAA
1365 | 29796986 | rs1632938 | GGGGAAGAAAA
1534 | 29797155 | rs17875405 | GGGGGGCGGGG
1552 | 29797173 | rs1736920 | AAAAGGGGGGG
1590 | 29797211 | rs41562616 | CCCCCCTCCCC
1799 | 29797420 | rs12722482 | CCCCCCCCCCT
1827 | 29797448 | rs17875406 | GGGGGGGGAGG
1932 | 29797553 | rs1632937 | GGGGCCGCCCC
2018 | 29797639 | rs1049033 | CCCCTCCCCTT
2075 | 29797696 | rs1130363 | AAAAGGGGGGG
2161 | 29797782 | rs1611627 | TTTTCTTCCCC
2278 | 29797899 | rs1632934 | TTTTCCTCCCC
2312 | 29797933 | rs1632933 | CCCCTTCTTTT
2330 | 29797951 | rs1736912 | AAAAGGAGGGG
2412 | 29798033 | rs17179080 | GAGGGGGGGGG
2418 | 29798039 | rs1632932 | GGGGAAGAAAA
2462 | 29798083 | rs114038308 | CCCCCTCCCCC
2519 | 29798140 | rs915667 | AAAAGGAGGGG
2798 | 29798419 | rs915670 | GGGGAGGGGAA
2804 | 29798425 | rs915669 | GGGGTTTTTTT
2838 | 29798459 | rs915668 | CCCCGGGGGGG

Global haplotype frequency (2n = 2152) | 0.2528 | 0.0200 | 0.0376 | 0.0911 | 0.1445 | 0.0627 | 0.0446 | 0.1329 | 0.0404 | 0.0330 | 0.0283

HLA-G coding haplotypes were converted into coding alleles based on the International Immunogenetics Database (IMGT/HLA). The new HLA-G allele presenting a frequency above 1% is defined with the suffix “new.”

The haplotypes listed in Table 3 show heterogeneous frequencies among the 1000Genomes populations (Table 4). The G*01:01:01:01 allele, for example, is very frequent among Europeans and Asians, presents intermediate frequencies among admixed populations, and lower frequencies in African populations, while the opposite pattern is observed for the G*01:05N null allele. In addition, allele G*01:01:03:03 is absent or very rare in African populations, and the G*01:04:04, G*01:01:01:04, and G*01:01:01:01new alleles are absent from the Asian samples.
Table 4

The most frequent HLA-G coding alleles in the 1000Genomes populations.

Populations by region: Europe (CEU, TSI, GBR, FIN, IBS); Asia (CHB, CHS, JPT); Africa (YRI, LWK); Admixed (ASW, MXL, PUR, CLM)
HLA-G coding alleles according to IMGT/HLAa | CEU | TSI | GBR | FIN | IBS | CHB | CHS | JPT | YRI | LWK | ASW | MXL | PUR | CLM
2n | 170 | 196 | 174 | 184 | 28 | 192 | 200 | 178 | 174 | 188 | 118 | 124 | 110 | 116
G*01:01:01:01 | 0.3824 | 0.2755 | 0.2989 | 0.3370 | 0.2857 | 0.2813 | 0.3900 | 0.2360 | 0.0690 | 0.1489 | 0.1271 | 0.2339 | 0.2182 | 0.1810
G*01:01:02:01 | 0.1824 | 0.1735 | 0.1954 | 0.1196 | 0.2500 | 0.0938 | 0.0350 | 0.1742 | 0.1379 | 0.1436 | 0.1780 | 0.2097 | 0.1000 | 0.1552
G*01:04:01 | 0.0647 | 0.1020 | 0.0517 | 0.0543 | 0.0714 | 0.2656 | 0.2400 | 0.3764 | 0.0402 | 0.0106 | 0.0339 | 0.1532 | 0.1364 | 0.1810
G*01:01:01:05 | 0.1529 | 0.1429 | 0.1092 | 0.2609 | 0.1071 | 0.0469 | 0.0150 | 0.0056 | 0.0632 | 0.0319 | 0.0339 | 0.0806 | 0.1182 | 0.1293
G*01:01:03:03 | 0.0529 | 0.0408 | 0.0920 | 0.0435 | 0.0357 | 0.1719 | 0.2050 | 0.0337 | 0.0000 | 0.0000 | 0.0085 | 0.0484 | 0.0455 | 0.0086
G*01:03:01:02 | 0.0353 | 0.0306 | 0.0230 | 0.0163 | 0.0000 | 0.0260 | 0.0000 | 0.0169 | 0.0690 | 0.0798 | 0.1186 | 0.0968 | 0.0818 | 0.0603
G*01:04:04 | 0.0235 | 0.0306 | 0.0115 | 0.0054 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2299 | 0.0745 | 0.1102 | 0.0081 | 0.0273 | 0.0259
G*01:01:01:04 | 0.0118 | 0.0153 | 0.0632 | 0.0109 | 0.0714 | 0.0000 | 0.0000 | 0.0000 | 0.0747 | 0.1011 | 0.0763 | 0.0403 | 0.0727 | 0.0603
G*01:05N | 0.0059 | 0.0408 | 0.0000 | 0.0109 | 0.0000 | 0.0417 | 0.0150 | 0.0056 | 0.1207 | 0.0638 | 0.0847 | 0.0242 | 0.0000 | 0.0172
G*01:06 | 0.0412 | 0.0714 | 0.0632 | 0.0272 | 0.1071 | 0.0260 | 0.0100 | 0.0056 | 0.0000 | 0.0053 | 0.0085 | 0.0242 | 0.0273 | 0.0431
G*01:01:01:01new | 0.0059 | 0.0153 | 0.0115 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0460 | 0.0585 | 0.0593 | 0.0242 | 0.0364 | 0.0345

aHLA-G coding haplotypes were converted into coding alleles based on the International Immunogenetics Database (IMGT/HLA). The new HLA-G allele presenting high frequencies is defined with the suffix “new.”

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Haplotypes are ordered according to their global frequency.

HLA-G 3′ Untranslated Region Variability and Haplotypes

The reevaluation of the HLA-G sequencing data indicated that its 3′UTR presents several high-frequency variation sites within a short segment. The approach described earlier revealed 17 variation sites in this short region, as described in Table 5. Some of these variation sites are polymorphic and have been previously described in several studies that evaluated the HLA-G 3′UTR (38, 69, 76, 88, 105–117), while others are rare and might be considered mutations. Overall, nine variation sites can be considered true polymorphisms (minor allele frequency above 1%). It should be noted that the nomenclature used for HLA-G 3′UTR variation sites and haplotypes is based on our previous reports, with haplotypes designated UTR-1, UTR-2, and so forth (88). Accordingly, the 14-bp insertion (rs371194629), although less frequent and not represented in the hg19 human genome, is considered to be the ancestral allele and is counted when numbering HLA-G 3′UTR positions.
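The HLA-G positions listed in Tables 3, 5, and 8 appear consistent with a simple offset from hg19 coordinates: the A of the ATG seems to sit at chr6:29795622 (position +1), upstream positions skip zero, and every 3′UTR position downstream of the 14-bp indel at chr6:29798581 is shifted by 14 because the insertion allele is counted. The Python sketch below reconstructs that mapping. The anchor coordinate, the constant names, and `hla_g_position` are inferred from the table values rather than stated in the text, so treat this as an illustration under those assumptions, not as the authors' implementation.

```python
# Hypothetical reconstruction of the HLA-G position numbering used in the
# tables; the constants are inferred from Tables 3, 5 and 8, not from the text.
ATG_START = 29795622       # hg19 chr6 coordinate assumed to be HLA-G position +1
INDEL_14BP = 29798581      # rs371194629, HLA-G position 2960 (Table 5)
INSERTION_LENGTH = 14      # the ancestral insertion allele is counted


def hla_g_position(genomic: int) -> int:
    """Convert an hg19 chr6 coordinate to an HLA-G position.

    Upstream of the ATG the numbering runs -1, -2, ... with no position 0.
    Downstream of the 14-bp indel, positions are shifted by 14 bp because
    the insertion (absent from hg19) is included in the numbering.
    """
    offset = genomic - ATG_START
    if offset < 0:
        return offset                  # promoter: -1 is the base before the ATG
    position = offset + 1              # coding region and 3'UTR
    if genomic > INDEL_14BP:
        position += INSERTION_LENGTH
    return position


if __name__ == "__main__":
    for coord in (29794317, 29795566, 29798033, 29798581, 29798749, 29798834):
        print(coord, hla_g_position(coord))
```

Checking the sketch against a few rows (for example chr6:29798608 mapping to +3001 rather than +2987) is a quick way to see why the 14-bp insertion has to be counted for the published 3′UTR numbering to line up.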
Table 5

List of all variation sites found in the HLA-G 3′UTR.

Genomic position hg19 (Chr6) | SNPid | HLA-G position | Allele 1 (reference) | Allele 1 frequency | Allele 2 | Allele 2 frequency
29798563 | . | 2942 | T | 0.9986 | C | 0.0014
29798581 | rs371194629 | 2960 | G | 0.7068 | GATTTGTTCATGCCT | 0.2932
29798608 | . | 3001 | C | 0.9986 | T | 0.0014
29798610 | rs1707 | 3003 | C | 0.1152 | T | 0.8848
29798617 | rs1710 | 3010 | G | 0.4610 | C | 0.5390
29798634 | rs17179101 | 3027 | C | 0.9359 | A | 0.0641
29798639 | rs146339774 | 3032 | G | 0.9967 | C | 0.0033
29798642 | rs17179108 | 3035 | C | 0.8829 | T | 0.1171
29798659 | . | 3052 | C | 0.9991 | T | 0.0009
29798699 | rs180827037 | 3092 | G | 0.9986 | T | 0.0014
29798728 | rs138249160 | 3121 | T | 0.9967 | C | 0.0033
29798749 | rs1063320 | 3142 | C | 0.4484 | G | 0.5516
29798784 | . | 3177 | G | 0.9991 | T | 0.0009
29798790 | rs187320344 | 3183 | G | 0.9991 | A | 0.0009
29798794 | rs9380142 | 3187 | A | 0.7045 | G | 0.2955
29798803 | rs1610696 | 3196 | C | 0.7625 | G | 0.2375
29798834 | rs1233331 | 3227 | G | 0.9707 | A | 0.0293
When the 3′UTR segment is isolated from the 200 extended haplotypes found, we observe 41 different haplotypes for this region. Table 6 presents all haplotypes that reached a global frequency higher than 1%; the complete table of haplotypes is available upon request. Positions that are monomorphic among these high-frequency haplotypes were removed from Table 6. Considering the global frequency of each haplotype, only nine haplotypes account for more than 95% of all haplotypes found. These haplotypes were named according to previous studies addressing HLA-G 3′UTR variability (38, 69, 76, 88, 105–117).
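The statement that nine 3′UTR haplotypes cover more than 95% of all chromosomes is a cumulative-frequency check: sort the haplotypes by frequency and count how many are needed before the running total passes the target. A minimal Python sketch, using the global frequencies transcribed from Table 6 (the function name `haplotypes_needed` is illustrative):

```python
from itertools import accumulate
from typing import Optional

# Global 3'UTR haplotype frequencies (2n = 2152), transcribed from Table 6.
UTR_FREQ = {
    "UTR-1": 0.2904, "UTR-2": 0.1938, "UTR-3": 0.1938,
    "UTR-4": 0.1083, "UTR-7": 0.0558, "UTR-10": 0.0367,
    "UTR-5": 0.0358, "UTR-18": 0.0283, "UTR-6": 0.0125,
}


def haplotypes_needed(freqs: dict, target: float) -> Optional[int]:
    """Return the smallest number of most frequent haplotypes whose summed
    frequency exceeds `target`, or None if the listed ones never reach it."""
    ordered = sorted(freqs.values(), reverse=True)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running > target:
            return k
    return None


if __name__ == "__main__":
    print(haplotypes_needed(UTR_FREQ, 0.95))    # 9
    print(round(sum(UTR_FREQ.values()), 4))     # 0.9554
```

The same check is useful for the other regions: fed the ten promoter frequencies of Table 9, which sum to 0.9391, the function returns None for a 0.95 target.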
Table 6

The most frequent HLA-G 3′UTR haplotypes.

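dbSNP | rs371194629 | rs1707 | rs1710 | rs17179101 | rs17179108 | rs1063320 | rs9380142 | rs1610696 | rs1233331 | Global frequency, 2n = 2152
HLA-G position | 2960 (14 bp) | 3003 | 3010 | 3027 | 3035 | 3142 | 3187 | 3196 | 3227 |
HG19 (Chr6) | 29798581 | 29798610 | 29798617 | 29798634 | 29798642 | 29798749 | 29798794 | 29798803 | 29798834 |
UTR-1 | Del | T | G | C | C | C | G | C | G | 0.2904
UTR-2 | Ins | T | C | C | C | G | A | G | G | 0.1938
UTR-3 | Del | T | C | C | C | G | A | C | G | 0.1938
UTR-4 | Del | C | G | C | C | C | A | C | G | 0.1083
UTR-7 | Ins | T | C | A | T | G | A | C | G | 0.0558
UTR-10 | Del | T | C | C | C | G | A | G | G | 0.0367
UTR-5 | Ins | T | C | C | T | G | A | C | G | 0.0358
UTR-18 | Del | T | G | C | C | C | A | C | A | 0.0283
UTR-6 | Del | T | G | C | C | C | A | C | G | 0.0125
Major allele | Del | T | C | C | C | G | A | C | G |
Frequency | 0.7068 | 0.8848 | 0.5390 | 0.9359 | 0.8829 | 0.5516 | 0.7045 | 0.7625 | 0.9707 |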

Haplotypes are ordered according to their global frequency.

The haplotypes found in this reevaluation of the 1000Genomes data are consistent with those found in several other populations, and some haplotypes previously considered rare (such as UTR-10 and UTR-18) are actually more frequent than previously thought when all populations are pooled together (global frequency). Some rare SNVs previously described using Sanger sequencing, such as the one at position +3001 (69, 110, 111), and others described in studies evaluating the 1000Genomes data, such as +3032, +3052, +3092, +3121, and +3227, were also detected in this reevaluation (Table 5). In addition, the 14-bp polymorphism, which is absent from the initially released 1000Genomes VCF files, was retrieved from the raw sequence data, and its genotypes were inferred for most of the samples. As for the HLA-G coding region, these nine 3′UTR haplotypes are heterogeneously distributed among the 1000Genomes populations (Table 7). The UTR-1 haplotype, for example, is very common in European populations but less frequent in populations from Africa. The UTR-7 haplotype is absent or rare in populations of African ancestry, and haplotypes UTR-6 and UTR-18 are absent or rare in Asia. The 3′UTR haplotype frequencies in the admixed populations are close to those reported for other admixed populations such as Brazilians (76, 88, 110, 111). Likewise, the frequencies observed for the 1000Genomes African populations are close to those reported for other African populations in isolated reports (108, 116, 117). Moreover, the frequencies reported here are close to those presented for the same data in another manuscript (69), with minor differences because that manuscript only imputed the 14-bp polymorphism and used the original 1000Genomes VCF data.
Table 7

The most frequent HLA-G 3′UTR haplotypes in the 1000Genomes populations.

Populations by region: Europe (CEU, TSI, GBR, FIN, IBS); Asia (CHB, CHS, JPT); Africa (YRI, LWK); Admixed (ASW, MXL, PUR, CLM)
HLA-G 3′UTR haplotypesa | CEU | TSI | GBR | FIN | IBS | CHB | CHS | JPT | YRI | LWK | ASW | MXL | PUR | CLM
2n | 170 | 196 | 174 | 184 | 28 | 192 | 200 | 178 | 174 | 188 | 118 | 124 | 110 | 116
UTR-1 | 0.3882 | 0.2959 | 0.3333 | 0.3533 | 0.3214 | 0.2865 | 0.4200 | 0.2472 | 0.1322 | 0.2287 | 0.2288 | 0.2823 | 0.2909 | 0.2241
UTR-3 | 0.0882 | 0.1276 | 0.0575 | 0.0652 | 0.0714 | 0.2813 | 0.2600 | 0.4944 | 0.2989 | 0.1170 | 0.1610 | 0.1532 | 0.1818 | 0.2328
UTR-2 | 0.2471 | 0.2398 | 0.2644 | 0.1739 | 0.3929 | 0.1510 | 0.0500 | 0.1685 | 0.1667 | 0.2340 | 0.2627 | 0.2419 | 0.1000 | 0.2155
UTR-4 | 0.1529 | 0.1378 | 0.1092 | 0.2826 | 0.1071 | 0.0469 | 0.0200 | 0.0056 | 0.1322 | 0.1117 | 0.0508 | 0.0887 | 0.1273 | 0.1466
UTR-7 | 0.0471 | 0.0408 | 0.0747 | 0.0435 | 0.0357 | 0.1563 | 0.1800 | 0.0281 | 0.0000 | 0.0000 | 0.0085 | 0.0403 | 0.0455 | 0.0000
UTR-10 | 0.0000 | 0.0714 | 0.0230 | 0.0380 | 0.0000 | 0.0313 | 0.0100 | 0.0225 | 0.0977 | 0.0585 | 0.0339 | 0.0161 | 0.0364 | 0.0345
UTR-5 | 0.0353 | 0.0255 | 0.0172 | 0.0163 | 0.0000 | 0.0156 | 0.0000 | 0.0169 | 0.0460 | 0.0479 | 0.1017 | 0.0806 | 0.0909 | 0.0431
UTR-18 | 0.0118 | 0.0153 | 0.0517 | 0.0109 | 0.0714 | 0.0000 | 0.0000 | 0.0000 | 0.0172 | 0.0798 | 0.0508 | 0.0323 | 0.0727 | 0.0603
UTR-6 | 0.0059 | 0.0153 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0747 | 0.0266 | 0.0254 | 0.0081 | 0.0091 | 0.0000
others | 0.0235 | 0.0306 | 0.0690 | 0.0163 | 0.0000 | 0.0313 | 0.0600 | 0.0169 | 0.0345 | 0.0957 | 0.0763 | 0.0565 | 0.0455 | 0.0431

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Haplotypes are ordered according to their global frequency.

HLA-G 5′ Promoter Region Variability and Haplotypes

As previously discussed, there is no consensus regarding where HLA-G transcription starts. According to the NCBI reference sequence NM_002127.5, HLA-G transcription starts 866 nucleotides upstream of the initiation codon ATG. However, most studies performed so far on the HLA-G promoter structure considered the 1500 nucleotides upstream of the main initiation codon ATG as the HLA-G promoter region. Under the NCBI definition, only SNVs upstream of position −866 should be considered promoter SNVs (or SNVs from the upstream regulatory region), and those between −866 and −1 should be considered 5′UTR SNVs. Nevertheless, given this inconsistency and the lack of consensus regarding the HLA-G transcription start site, in the present work we considered all SNVs upstream of the main translation start site as promoter (5′ upstream regulatory region) SNVs. The approach described earlier revealed 35 SNVs in the HLA-G promoter region, as described in Table 8. Among them, 26 variable sites (74.3%) can be considered true polymorphisms (minor allele frequency above 1%), and at least 11 present frequencies around 50%. In addition, the triallelic SNP at position −725, as well as other known indels in the promoter region, were properly recovered.
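The classification above (a site counts as a true polymorphism when its minor allele frequency exceeds 1%) can be applied mechanically to Table 8. The Python sketch below transcribes the Table 8 allele frequencies and, as an assumption, takes the minor allele frequency to be one minus the frequency of the most common allele; this only matters for the triallelic site at −725. The 0.44 cut-off used to count sites "around 50%" is an arbitrary choice for illustration, and all names are hypothetical.

```python
# Allele frequencies per promoter site, transcribed from Table 8
# (HLA-G position -> frequencies of all observed alleles).
PROMOTER_SITES = {
    "-1305": (0.4995, 0.5005), "-1179": (0.4466, 0.5534),
    "-1155": (0.8020, 0.1980), "-1140": (0.6952, 0.3048),
    "-1138": (0.9493, 0.0507), "-1121": (0.0428, 0.9572),
    "-1098": (0.9972, 0.0028), "-964": (0.4986, 0.5014),
    "-922": (0.9981, 0.0019), "-810": (0.9986, 0.0014),
    "-762": (0.4972, 0.5028), "-725": (0.0953, 0.8550, 0.0497),
    "-716": (0.4963, 0.5037), "-689": (0.4963, 0.5037),
    "-666": (0.4981, 0.5019), "-646": (0.9749, 0.0251),
    "-633": (0.4995, 0.5005), "-546/-540": (0.9744, 0.0256),
    "-541/-533": (0.9545, 0.0455), "-539": (0.9967, 0.0033),
    "-521": (0.9986, 0.0014), "-509": (0.9559, 0.0441),
    "-486": (0.4991, 0.5009), "-483": (0.9717, 0.0283),
    "-477": (0.4461, 0.5539), "-443": (0.9638, 0.0362),
    "-400": (0.9559, 0.0441), "-391": (0.9559, 0.0441),
    "-369": (0.4480, 0.5520), "-355": (0.9967, 0.0033),
    "-284": (0.9991, 0.0009), "-256": (0.9958, 0.0042),
    "-201": (0.4967, 0.5033), "-150": (0.9977, 0.0023),
    "-56": (0.9503, 0.0497),
}


def minor_allele_frequency(freqs: tuple) -> float:
    """One minus the frequency of the most common allele (assumption for
    multiallelic sites; equals the usual MAF for biallelic sites)."""
    return 1.0 - max(freqs)


def summarize(sites: dict, threshold: float = 0.01, balanced: float = 0.44):
    """Return (number of sites, sites with MAF > threshold, sites with MAF >= balanced)."""
    maf = {pos: minor_allele_frequency(f) for pos, f in sites.items()}
    polymorphic = [p for p, m in maf.items() if m > threshold]
    near_half = [p for p, m in maf.items() if m >= balanced]
    return len(sites), len(polymorphic), len(near_half)


if __name__ == "__main__":
    total, poly, half = summarize(PROMOTER_SITES)
    print(total, poly, f"{poly / total:.1%}", half)   # 35 26 74.3% 12
```

With these inputs the sketch recovers the 26 of 35 polymorphic sites (74.3%) stated in the text, and 12 sites with a minor allele frequency of at least 0.44, consistent with "at least 11" sites near 50%.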
Table 8

List of all variation sites found at the HLA-G promoter region.

Genomic position hg19 (Chr6) | SNPid | HLA-G position | Allele 1 (reference) | Allele 1 frequency | Allele 2 | Allele 2 frequency | Allele 3 | Allele 3 frequency
29794317 | rs1736936 | −1305 | G | 0.4995 | A | 0.5005
29794443 | rs1736935 | −1179 | A | 0.4466 | G | 0.5534
29794467 | rs3823321 | −1155 | G | 0.8020 | A | 0.1980
29794482 | rs1736934 | −1140 | A | 0.6952 | T | 0.3048
29794484 | rs17875389 | −1138 | A | 0.9493 | G | 0.0507
29794501 | rs3115630 | −1121 | T | 0.0428 | C | 0.9572
29794524 | rs146374870 | −1098 | G | 0.9972 | A | 0.0028
29794658 | rs1632947 | −964 | G | 0.4986 | A | 0.5014
29794700 | rs370338057 | −922 | C | 0.9981 | A | 0.0019
29794812 | rs182801644 | −810 | C | 0.9986 | T | 0.0014
29794860 | rs1632946 | −762 | C | 0.4972 | T | 0.5028
29794897 | rs1233334 | −725 | G | 0.0953 | C | 0.8550 | T | 0.0497
29794906 | rs2249863 | −716 | T | 0.4963 | G | 0.5037
29794933 | rs2735022 | −689 | A | 0.4963 | G | 0.5037
29794956 | rs35674592 | −666 | G | 0.4981 | T | 0.5019
29794976 | rs17875391 | −646 | A | 0.9749 | G | 0.0251
29794989 | rs1632944 | −633 | G | 0.4995 | A | 0.5005
29795076 | rs201221694 | −546/−540 | A | 0.9744 | AG | 0.0256
29795081 | rs368205133 | −541/−533 | GA | 0.9545 | G | 0.0455
29795083 | rs112940953 | −539 | A | 0.9967 | G | 0.0033
29795101 | rs138987412 | −521 | C | 0.9986 | A | 0.0014
29795113 | rs17875393 | −509 | C | 0.9559 | G | 0.0441
29795136 | rs1736933 | −486 | A | 0.4991 | C | 0.5009
29795139 | rs149890776 | −483 | A | 0.9717 | G | 0.0283
29795145 | rs1736932 | −477 | C | 0.4461 | G | 0.5539
29795179 | rs17875394 | −443 | G | 0.9638 | A | 0.0362
29795222 | rs17875395 | −400 | G | 0.9559 | A | 0.0441
29795231 | rs17875396 | −391 | G | 0.9559 | A | 0.0441
29795253 | rs1632943 | −369 | C | 0.4480 | A | 0.5520
29795267 | rs191630481 | −355 | G | 0.9967 | A | 0.0033
29795338 | . | −284 | G | 0.9991 | A | 0.0009
29795366 | . | −256 | TC | 0.9958 | T | 0.0042
29795421 | rs1233333 | −201 | G | 0.4967 | A | 0.5033
29795472 | . | −150 | C | 0.9977 | T | 0.0023
29795566 | rs17875397 | −56 | C | 0.9503 | T | 0.0497
When the promoter region is isolated from the 200 extended haplotypes found, we observe 64 haplotypes for this region. Table 9 presents all haplotypes that reached a frequency higher than 1%; the complete table of haplotypes is available upon request. Positions that are monomorphic among these frequent haplotypes were removed from Table 9. Considering the global frequency of each haplotype, these ten haplotypes together account for about 94% of all promoter haplotypes found. These haplotypes were named according to previously published works addressing HLA-G promoter region variability (76, 118–120). As previously observed for both the coding and 3′UTR regions, promoter haplotype frequencies vary greatly among populations (Table 10).
Table 9

The most frequent HLA-G promoter haplotypes.

HG19 (Chr6) | SNPid | HLA-G position | Nucleotides in promoter haplotype column order: 010102a, 010101a, 010104a, 010101b, 010101f, 010101c, 010104b, 010101d, 0103a, 0103e
29794317 | rs1736936 | −1305 | AGAGGGAGGG
29794443 | rs1736935 | −1179 | GAGAAAGAGG
29794467 | rs3823321 | −1155 | GGAGGGAGGG
29794482 | rs1736934 | −1140 | TAAAAAAAAA
29794484 | rs17875389 | −1138 | AAAAAAAAGG
29794501 | rs3115630 | −1121 | CCCCCTCCCC
29794658 | rs1632947 | −964 | AGAGGGAGGG
29794860 | rs1632946 | −762 | TCTCCCTCCC
29794897 | rs1233334 | −725 | CCCGCGCCTT
29794906 | rs2249863 | −716 | GTGTTTGTTT
29794933 | rs2735022 | −689 | GAGAAAGAAA
29794956 | rs35674592 | −666 | TGTGGGTGGG
29794976 | rs17875391 | −646 | AAAAAAAAAG
29794989 | rs1632944 | −633 | AGAGGGAGGG
29795076 | rs201221694 | −546 | G
29795081 | rs368205133 | −541 | AAAAAAAAA
29795113 | rs17875393 | −509 | CCCCCCCCGG
29795136 | rs1736933 | −486 | CACAAACAAA
29795139 | rs149890776 | −483 | AAAAAAAGAA
29795145 | rs1736932 | −477 | GCGCCCGCGG
29795179 | rs17875394 | −443 | GGGGGGAGGG
29795222 | rs17875395 | −400 | GGGGGGGGAA
29795231 | rs17875396 | −391 | GGGGGGGGAA
29795253 | rs1632943 | −369 | ACACCCACAA
29795421 | rs1233333 | −201 | AGAGGGAGGG
29795566 | rs17875397 | −56 | CCCCCCCCTT
29795636 | rs1630223 | 15 | AGAGGGAGGG

Global frequency (2n = 2152) | 0.2825 | 0.2728 | 0.1501 | 0.0520 | 0.0446 | 0.0418 | 0.0353 | 0.0260 | 0.0191 | 0.0149

Haplotypes are ordered according to their global frequency.

Table 10

The most frequent HLA-G promoter haplotypes in the 1000Genomes populations.

Populations by region: Europe (CEU, TSI, GBR, FIN, IBS); Asia (CHB, CHS, JPT); Africa (YRI, LWK); Admixed (ASW, MXL, PUR, CLM)
Promoter haplotypesa | CEU | TSI | GBR | FIN | IBS | CHB | CHS | JPT | YRI | LWK | ASW | MXL | PUR | CLM
2n | 170 | 196 | 174 | 184 | 28 | 192 | 200 | 178 | 174 | 188 | 118 | 124 | 110 | 116
010102a | 0.2824 | 0.3418 | 0.3908 | 0.2283 | 0.4286 | 0.3385 | 0.2750 | 0.2360 | 0.2586 | 0.2713 | 0.2881 | 0.2742 | 0.1636 | 0.2328
010101a | 0.3941 | 0.2704 | 0.3103 | 0.3370 | 0.3214 | 0.2813 | 0.4150 | 0.2303 | 0.1379 | 0.2394 | 0.1695 | 0.2419 | 0.2182 | 0.1810
010104a | 0.0882 | 0.1327 | 0.0575 | 0.0652 | 0.0714 | 0.1979 | 0.1800 | 0.3820 | 0.2701 | 0.0904 | 0.1356 | 0.0806 | 0.1455 | 0.0862
010101b | 0.0471 | 0.0510 | 0.0230 | 0.1902 | 0.0000 | 0.0417 | 0.0100 | 0.0056 | 0.0805 | 0.0266 | 0.0339 | 0.0645 | 0.0455 | 0.0690
010101f | 0.0118 | 0.0255 | 0.0747 | 0.0109 | 0.0714 | 0.0000 | 0.0050 | 0.0056 | 0.0747 | 0.1277 | 0.0847 | 0.0484 | 0.0909 | 0.0603
010101c | 0.1059 | 0.0867 | 0.0862 | 0.0870 | 0.1071 | 0.0052 | 0.0050 | 0.0000 | 0.0000 | 0.0053 | 0.0085 | 0.0161 | 0.0727 | 0.0603
010104b | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0781 | 0.0800 | 0.0899 | 0.0000 | 0.0000 | 0.0085 | 0.0726 | 0.0273 | 0.1379
010101d | 0.0059 | 0.0153 | 0.0115 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0632 | 0.0691 | 0.0763 | 0.0403 | 0.0636 | 0.0431
0103a | 0.0235 | 0.0153 | 0.0115 | 0.0163 | 0.0000 | 0.0156 | 0.0000 | 0.0169 | 0.0000 | 0.0000 | 0.0339 | 0.0887 | 0.0364 | 0.0345
0103e | 0.0059 | 0.0051 | 0.0057 | 0.0000 | 0.0000 | 0.0104 | 0.0000 | 0.0000 | 0.0402 | 0.0479 | 0.0339 | 0.0081 | 0.0273 | 0.0259

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Haplotypes are ordered according to their global frequency.

HLA-G Extended Haplotypes

As described earlier, 200 extended haplotypes were inferred considering the whole HLA-G sequence encompassing the promoter, coding, and 3′UTR segments. Since there is no official nomenclature for haplotypes spanning entire MHC genes, the HLA-G extended haplotypes were named by combining the names adopted for each HLA-G segment. As already observed for some populations (76, 88, 118–120), a given promoter haplotype is usually associated with the same coding and 3′UTR haplotypes (Table 11). For example, promoter haplotype 010101a is usually associated with the coding allele G*01:01:01:01 and the 3′UTR haplotype UTR-1. The same phenomenon is observed for each of the main HLA-G promoter, coding, and 3′UTR haplotypes. Indeed, only 24 extended HLA-G haplotypes present a frequency of at least 0.5%, together representing more than 85% of all haplotypes, and only 15 present frequencies higher than 1%.
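The extended-haplotype names in Table 11 follow a visible pattern (promoter, coding allele, and 3′UTR names joined with slashes), and the frequency cut-offs quoted above are simple counts over that table. A minimal Python sketch, with rows transcribed from Table 11 and a hypothetical `extended_name` helper that mimics the naming shown in its last column:

```python
# (promoter, coding allele, 3'UTR, global frequency), transcribed from Table 11.
EXTENDED = [
    ("010101a", "G*01:01:01:01", "UTR-1", 0.24257),
    ("010102a", "G*01:01:02:01", "UTR-2", 0.11803),
    ("0104a", "G*01:04:01", "UTR-3", 0.09108),
    ("010102a", "G*01:01:03:03", "UTR-7", 0.05112),
    ("010101b", "G*01:01:01:05", "UTR-4", 0.04786),
    ("010101c", "G*01:01:01:05", "UTR-4", 0.04136),
    ("0104a", "G*01:04:04", "UTR-3", 0.03810),
    ("0104b", "G*01:04:01", "UTR-3", 0.03392),
    ("010101f", "G*01:01:01:04", "UTR-18", 0.02835),
    ("010102a", "G*01:06", "UTR-2", 0.02556),
    ("010101d", "G*01:01:01:01new", "UTR-1", 0.01859),
    ("010102a", "G*01:05N", "UTR-10", 0.01812),
    ("0103a", "G*01:03:01:02", "UTR-5", 0.01766),
    ("010102a", "G*01:05N", "UTR-2", 0.01255),
    ("010102a", "G*01:01:02:01", "UTR-10", 0.01115),
    ("0104a", "G*01:04:01-Like", "UTR-3", 0.00883),
    ("010101d", "G*01:01:01:04-Like", "UTR-1", 0.00651),
    ("0103c", "G*01:03:01:02", "UTR-5", 0.00651),
    ("010101f", "G*01:01:01:04", "UTR-6", 0.00604),
    ("010101a", "G*01:01:01:06", "UTR-4", 0.00604),
    ("010102a", "G*01:01:03:03", "UTR-7-Like", 0.00604),
    ("0103e", "G*01:03:01:02", "UTR-13", 0.00558),
    ("010102a", "unknown", "UTR-2", 0.00558),
    ("010101a", "G*01:01:09", "UTR-4", 0.00558),
]

SAMPLE_SIZE = 2152  # number of chromosomes (2n)


def extended_name(promoter: str, coding: str, utr: str) -> str:
    """Join segment names in the 'G<promoter>/<coding>/<UTR>' pattern."""
    return f"G{promoter}/{coding}/{utr}"


if __name__ == "__main__":
    at_least_half_pct = [h for h in EXTENDED if h[3] >= 0.005]
    above_one_pct = [h for h in EXTENDED if h[3] > 0.01]
    print(len(at_least_half_pct), len(above_one_pct))   # 24 15
    print(round(sum(h[3] for h in EXTENDED), 4))          # 0.8527
    top = EXTENDED[0]
    # Approximate number of chromosomes behind the most frequent haplotype.
    print(extended_name(*top[:3]), round(top[3] * SAMPLE_SIZE))
```

The last line prints G010101a/G*01:01:01:01/UTR-1 with about 522 chromosomes; multiplying by 2n is a handy sanity check that a reported frequency corresponds to a whole number of chromosomes.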
Table 11

The most frequent HLA-G extended haplotypes.

Promoter haplotypea | Coding alleleb | 3′UTR haplotypec | HLA-G lineaged | Global frequency | Extended haplotypee
010101a | G*01:01:01:01 | UTR-1 | HG010101a | 0.24257 | G010101a/G*01:01:01:01/UTR-1
010102a | G*01:01:02:01 | UTR-2 | HG010102 | 0.11803 | G010102a/G*01:01:02:01/UTR-2
0104a | G*01:04:01 | UTR-3 | HG0104 | 0.09108 | G0104a/G*01:04:01/UTR-3
010102a | G*01:01:03:03 | UTR-7 | HG010103 | 0.05112 | G010102a/G*01:01:03:03/UTR-7
010101b | G*01:01:01:05 | UTR-4 | HG010101c | 0.04786 | G010101b/G*01:01:01:05/UTR-4
010101c | G*01:01:01:05 | UTR-4 | HG010101c | 0.04136 | G010101c/G*01:01:01:05/UTR-4
0104a | G*01:04:04 | UTR-3 | HG0104 | 0.03810 | G0104a/G*01:04:04/UTR-3
0104b | G*01:04:01 | UTR-3 | HG0104 | 0.03392 | G0104b/G*01:04:01/UTR-3
010101f | G*01:01:01:04 | UTR-18 | HG010101b | 0.02835 | G010101f/G*01:01:01:04/UTR-18
010102a | G*01:06 | UTR-2 | HG010102 | 0.02556 | G010102a/G*01:06/UTR-2
010101d | G*01:01:01:01new | UTR-1 | HG010101a | 0.01859 | G010101d/G*01:01:01:01new/UTR-1
010102a | G*01:05N | UTR-10 | HG010102 | 0.01812 | G010102a/G*01:05N/UTR-10
0103a | G*01:03:01:02 | UTR-5 | HG0103 | 0.01766 | G0103a/G*01:03:01:02/UTR-5
010102a | G*01:05N | UTR-2 | HG010102 | 0.01255 | G010102a/G*01:05N/UTR-2
010102a | G*01:01:02:01 | UTR-10 | HG010102 | 0.01115 | G010102a/G*01:01:02:01/UTR-10
0104a | G*01:04:01-Like | UTR-3 | HG0104 | 0.00883 | G0104a/G*01:04:01-Like/UTR-3
010101d | G*01:01:01:04-Like | UTR-1 | HG010101a | 0.00651 | G010101d/G*01:01:01:04-Like/UTR-1
0103c | G*01:03:01:02 | UTR-5 | HG0103 | 0.00651 | G0103c/G*01:03:01:02/UTR-5
010101f | G*01:01:01:04 | UTR-6 | HG010101b | 0.00604 | G010101f/G*01:01:01:04/UTR-6
010101a | G*01:01:01:06 | UTR-4 | HG010101* | 0.00604 | G010101a/G*01:01:01:06/UTR-4
010102a | G*01:01:03:03 | UTR-7-Like | HG010103 | 0.00604 | G010102a/G*01:01:03:03/UTR-7-Like
0103e | G*01:03:01:02 | UTR-13 | HG0103 | 0.00558 | G0103e/G*01:03:01:02/UTR-13
010102a | Unknown/new | UTR-2 | HG010102 | 0.00558 | G010102a/unknown/UTR-2
010101a | G*01:01:09 | UTR-4 | HG010101* | 0.00558 | G010101a/G*01:01:09/UTR-4

bHLA-G coding haplotypes were converted into coding alleles based on the International Immunogenetics Database (IMGT/HLA). When a haplotype differs from a known allele by a single nucleotide, the suffix “-Like” was added. The new HLA-G allele is defined with the suffix “new.”

*Denotes possible crossovers between known lineages.

Haplotypes are ordered according to their global frequency.

The extended haplotypes shown in Table 11 were classified according to previously defined HLA-G lineages (76, 118). Most of the extended haplotypes encode the same full-length molecule, and functional polymorphisms are mainly present in the regulatory regions. In fact, many polymorphisms in the regulatory regions present high frequencies (around 50%), which is compatible with the evidence of balancing selection acting on the HLA-G regulatory regions (3, 69, 76, 88, 115, 118, 121). For example, lineages HG010101 (a, b, or c) and HG010102 are associated with HLA-G coding alleles that usually encode the same HLA-G molecule (except for the G*01:06 and G*01:05N alleles), yet their promoter and 3′UTR haplotypes are the most divergent from each other. Recently, the genome sequence of a Neanderthal sample dated to 40,000 years ago was published (122). Applying the same pipeline to this Neanderthal genome, we found that this single sample carries one HLA-G haplotype found among modern humans at a frequency of 0.00604 (G010101f/G*01:01:01:04/UTR-6) and another haplotype not found in the present series, composed of a recombined promoter, an unknown HLA-G coding allele close to G*01:01:02:01, and UTR-2.

HLA-G Worldwide Diversity

Worldwide intrapopulation genetic diversity of HLA-G was evaluated by means of several population genetics parameters (Table 12). Except for the number of private haplotypes, which is strongly influenced by sample size and by the number of samples from the same geographic area (group), African populations exhibited higher levels of genetic diversity than Europeans and Asians. Admixed populations sampled in the Americas also revealed high levels of diversity. These findings are consistent with the current knowledge that older and admixed populations tend to exhibit greater diversity than younger, non-admixed populations. Similar observations are made when the promoter (Table 13) and coding (Table 14) regions are considered separately. Since these differences between Africans and non-Africans are not as substantial as those observed for neutral markers (123), such similar levels of diversity may reflect both demographic events and the action of balancing selection. However, when the 3′UTR is considered (Table 15), a different pattern arises regarding gene and nucleotide diversity: Europeans present the highest levels, while Africans present the lowest. This finding has no straightforward explanation, although a stronger signature of balancing selection on the HLA-G 3′UTR may have distorted demographic signatures, resulting in higher diversity in Eurasia. As previously reported for a Brazilian population sample (76) and for the populations of the 1000Genomes Project (69), both the promoter and 3′UTR diversity have been shaped by strong balancing selection.
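The haplotype and nucleotide diversity columns of Tables 12 to 14 correspond to standard population genetics estimators. The Python sketch below assumes Nei's unbiased haplotype diversity, H = n/(n − 1)(1 − Σ p_i²), and nucleotide diversity computed as the mean proportion of differing sites over all pairs of aligned sequences. The text does not state which software or exact estimator was used, and the four-sequence alignment is a made-up toy example, so this sketch does not reproduce the published values (doing so would also require all 200 haplotypes, not only the frequent ones).

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes: list) -> float:
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(haplotypes: list) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences.

    The denominator is the length of the sequences passed in; for real data
    this should be the full sequence length, not only the variable sites.
    """
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy four-site alignment of four chromosomes (illustrative, not real data).
    sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(round(haplotype_diversity(sample), 4))          # 0.8333
    print(round(100 * nucleotide_diversity(sample), 2))   # 29.17 (%)
```

Multiplying the nucleotide diversity by 100 expresses it as a percentage, matching the "Nucleotide diversity (%)" column heading used in the tables.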
Table 12

Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE) for the complete HLA-G region (extended haplotypes).

Population sample | Gene diversity | Private haplotypes | Haplotype diversity | Nucleotide diversity (%) | pHWE
Africa (2n = 362) | 0.2913 ± 0.1949 | 36 | 0.9417 ± 0.0054 | 0.7643 ± 0.3690 | 0.6582 ± 0.0137
 LWK (2n = 188) | 0.3108 ± 0.1888 | 24 | 0.9497 ± 0.0075 | 0.7815 ± 0.3781 | 0.7200 ± 0.0130
 YRI (2n = 174) | 0.3175 ± 0.1722 | 10 | 0.9118 ± 0.0121 | 0.7283 ± 0.3531 | 0.5892 ± 0.0134
Europe (2n = 752) | 0.2663 ± 0.2162 | 33 | 0.8622 ± 0.0088 | 0.7399 ± 0.3570 | 0.8219 ± 0.0113
 CEU (2n = 170) | 0.3315 ± 0.1902 | 6 | 0.8210 ± 0.0231 | 0.7384 ± 0.3579 | 0.5821 ± 0.0133
 FIN (2n = 184) | 0.2940 ± 0.1828 | 17 | 0.8501 ± 0.0187 | 0.6679 ± 0.3243 | 0.4973 ± 0.0142
 GBR (2n = 174) | 0.3234 ± 0.2036 | 8 | 0.8679 ± 0.0168 | 0.7632 ± 0.3696 | 0.3129 ± 0.0126
 IBS (2n = 28) | 0.4330 ± 0.1566 | 0 | 0.8492 ± 0.0412 | 0.7737 ± 0.3867 | 0.6021 ± 0.0065
 TSI (2n = 196) | 0.3055 ± 0.2078 | 9 | 0.8883 ± 0.0141 | 0.7546 ± 0.3653 | 0.7044 ± 0.0125
Asia (2n = 570) | 0.2675 ± 0.2013 | 41 | 0.8503 ± 0.0090 | 0.6782 ± 0.3280 | 0.6628 ± 0.0137
 CHB (2n = 192) | 0.3185 ± 0.1816 | 5 | 0.8560 ± 0.0141 | 0.7093 ± 0.3439 | 0.3700 ± 0.0131
 CHS (2n = 200) | 0.3362 ± 0.1953 | 19 | 0.8141 ± 0.0204 | 0.6898 ± 0.3345 | 0.6625 ± 0.0134
 JPT (2n = 178) | 0.2710 ± 0.1617 | 4 | 0.8468 ± 0.0141 | 0.5857 ± 0.2854 | 0.5297 ± 0.0136
Admixed (2n = 468) | 0.2908 ± 0.1999 | 26 | 0.9332 ± 0.0059 | 0.7890 ± 0.3805 | 0.6699 ± 0.0136
 ASW (2n = 118) | 0.3253 ± 0.1908 | 6 | 0.9483 ± 0.0092 | 0.8108 ± 0.3933 | 0.7233 ± 0.0130
 CLM (2n = 116) | 0.3337 ± 0.1786 | 8 | 0.9237 ± 0.0113 | 0.7655 ± 0.3718 | 0.3765 ± 0.0131
 MXL (2n = 124) | 0.3508 ± 0.1774 | 3 | 0.9110 ± 0.0146 | 0.8045 ± 0.3902 | 0.6571 ± 0.0129
 PUR (2n = 110) | 0.3220 ± 0.1687 | 7 | 0.9296 ± 0.0140 | 0.7599 ± 0.3693 | 0.3774 ± 0.0134
Total (2n = 2152) | 0.2345 ± 0.2149 | - | 0.9068 ± 0.0040 | 0.7548 ± 0.3637 | 0.9025 ± 0.0089

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Table 13

Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE) for the HLA-G promoter region.

Population sample | Gene diversity | Private haplotypes | Haplotype diversity | Nucleotide diversity (%) | pHWE
Africa (2n = 362) | 0.2908 ± 0.2034 | 7 | 0.8438 ± 0.0092 | 0.6604 ± 0.3380 | 0.4466 ± 0.0127
 LWK (2n = 188) | 0.3000 ± 0.1941 | 5 | 0.8397 ± 0.0147 | 0.6590 ± 0.3382 | 0.7370 ± 0.0110
 YRI (2n = 174) | 0.3154 ± 0.1907 | 1 | 0.8269 ± 0.0149 | 0.6447 ± 0.3315 | 0.0849 ± 0.0051
Europe (2n = 752) | 0.2401 ± 0.2252 | 14 | 0.7725 ± 0.0091 | 0.5998 ± 0.3088 | 0.5186 ± 0.0138
 CEU (2n = 170) | 0.2818 ± 0.2120 | 1 | 0.7471 ± 0.0217 | 0.5972 ± 0.3090 | 0.9768 ± 0.0026
 FIN (2n = 184) | 0.2584 ± 0.2054 | 7 | 0.7899 ± 0.0164 | 0.5476 ± 0.2852 | 0.2223 ± 0.0107
 GBR (2n = 174) | 0.2970 ± 0.2193 | 1 | 0.7379 ± 0.0216 | 0.6069 ± 0.3135 | 0.0324 ± 0.0036
 IBS (2n = 28) | 0.4400 ± 0.1504 | 0 | 0.7169 ± 0.0559 | 0.6000 ± 0.3202 | 0.6445 ± 0.0027
 TSI (2n = 196) | 0.2723 ± 0.2249 | 4 | 0.7848 ± 0.0176 | 0.6183 ± 0.3188 | 0.3980 ± 0.0125
Asia (2n = 570) | 0.2517 ± 0.2189 | 8 | 0.7536 ± 0.0076 | 0.5524 ± 0.2864 | 0.5938 ± 0.0129
 CHB (2n = 192) | 0.2878 ± 0.2038 | 1 | 0.7627 ± 0.0155 | 0.5664 ± 0.2941 | 0.6127 ± 0.0108
 CHS (2n = 200) | 0.3403 ± 0.2187 | 3 | 0.7166 ± 0.0183 | 0.5672 ± 0.2944 | 0.5743 ± 0.0112
 JPT (2n = 178) | 0.2574 ± 0.1806 | 1 | 0.7409 ± 0.0171 | 0.4871 ± 0.2564 | 0.3093 ± 0.0104
Admixed (2n = 468) | 0.2927 ± 0.1958 | 9 | 0.8700 ± 0.0081 | 0.6868 ± 0.3502 | 0.3354 ± 0.0122
 ASW (2n = 118) | 0.3128 ± 0.1907 | 1 | 0.8573 ± 0.0189 | 0.6867 ± 0.3525 | 0.3945 ± 0.0122
 CLM (2n = 116) | 0.3136 ± 0.1923 | 2 | 0.8777 ± 0.0147 | 0.6884 ± 0.3533 | 0.3855 ± 0.0108
 MXL (2n = 124) | 0.3241 ± 0.1851 | 0 | 0.8432 ± 0.0185 | 0.6870 ± 0.3525 | 0.5318 ± 0.0100
 PUR (2n = 110) | 0.3097 ± 0.1790 | 4 | 0.8881 ± 0.0142 | 0.6798 ± 0.3494 | 0.7863 ± 0.0092
Total (2n = 2152) | 0.2323 ± 0.2208 | - | 0.8145 ± 0.0047 | 0.6331 ± 0.3243 | 0.4803 ± 0.0142

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Table 14

Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE) for the HLA-G coding region.

| Population sample | Gene diversity | Private haplotypes | Haplotype diversity | Nucleotide diversity (%) | pHWE |
|---|---|---|---|---|---|
| Africa (2n = 362) | 0.2983 ± 0.2036 | 14 | 0.9177 ± 0.0053 | 0.6649 ± 0.3266 | 0.6983 ± 0.0122 |
|  LWK (2n = 188) | 0.3100 ± 0.1981 | 9 | 0.9255 ± 0.0077 | 0.6691 ± 0.3295 | 0.6843 ± 0.0121 |
|  YRI (2n = 174) | 0.3306 ± 0.1808 | 4 | 0.8934 ± 0.0116 | 0.6436 ± 0.3175 | 0.6841 ± 0.0110 |
| Europe (2n = 752) | 0.2588 ± 0.2233 | 15 | 0.8292 ± 0.0085 | 0.6229 ± 0.3063 | 0.6674 ± 0.0132 |
|  CEU (2n = 170) | 0.3348 ± 0.1930 | 2 | 0.7908 ± 0.0221 | 0.6162 ± 0.3045 | 0.5567 ± 0.0117 |
|  FIN (2n = 184) | 0.3019 ± 0.1893 | 8 | 0.8011 ± 0.0192 | 0.5665 ± 0.2808 | 0.5260 ± 0.0133 |
|  GBR (2n = 174) | 0.3151 ± 0.2112 | 4 | 0.8449 ± 0.0163 | 0.6358 ± 0.3138 | 0.1818 ± 0.0096 |
|  IBS (2n = 28) | 0.4308 ± 0.1625 | 0 | 0.8492 ± 0.0412 | 0.6405 ± 0.3262 | 0.5893 ± 0.0067 |
|  TSI (2n = 196) | 0.3070 ± 0.2151 | 0 | 0.8563 ± 0.0136 | 0.6411 ± 0.3161 | 0.9138 ± 0.0062 |
| Asia (2n = 570) | 0.2631 ± 0.2097 | 13 | 0.7914 ± 0.0095 | 0.5772 ± 0.2848 | 0.4079 ± 0.0135 |
|  CHB (2n = 192) | 0.3089 ± 0.1866 | 2 | 0.8106 ± 0.0144 | 0.5903 ± 0.2920 | 0.3012 ± 0.0107 |
|  CHS (2n = 200) | 0.3567 ± 0.2013 | 8 | 0.7495 ± 0.0187 | 0.5934 ± 0.2934 | 0.4342 ± 0.0131 |
|  JPT (2n = 178) | 0.2649 ± 0.1712 | 1 | 0.7645 ± 0.0188 | 0.4969 ± 0.2478 | 0.3456 ± 0.0110 |
| Admixed (2n = 468) | 0.2834 ± 0.2095 | 14 | 0.8970 ± 0.0060 | 0.6621 ± 0.3251 | 0.4418 ± 0.0136 |
|  ASW (2n = 118) | 0.3200 ± 0.1953 | 3 | 0.9126 ± 0.0107 | 0.6796 ± 0.3355 | 0.4556 ± 0.0131 |
|  CLM (2n = 116) | 0.3335 ± 0.1958 | 5 | 0.8888 ± 0.0127 | 0.6494 ± 0.3212 | 0.2857 ± 0.0113 |
|  MXL (2n = 124) | 0.3482 ± 0.1815 | 2 | 0.8624 ± 0.0149 | 0.6655 ± 0.3287 | 0.9311 ± 0.0048 |
|  PUR (2n = 110) | 0.3264 ± 0.1823 | 3 | 0.8992 ± 0.0138 | 0.6471 ± 0.3202 | 0.5820 ± 0.0123 |
| Total (2n = 2152) | 0.2244 ± 0.2219 | – | 0.8780 ± 0.0038 | 0.6432 ± 0.3156 | 0.5692 ± 0.0143 |

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Table 15

Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE) for the HLA-G 3′UTR.

| Population sample | Gene diversity | Private haplotypes | Haplotype diversity | Nucleotide diversity (%) | pHWE |
|---|---|---|---|---|---|
| Africa (2n = 362) | 0.2833 ± 0.1700 | 8 | 0.8583 ± 0.0073 | 2.6744 ± 1.3827 | 0.1986 ± 0.0098 |
|  LWK (2n = 188) | 0.3326 ± 0.1626 | 5 | 0.8573 ± 0.0124 | 2.9077 ± 1.4972 | 0.5067 ± 0.0116 |
|  YRI (2n = 174) | 0.2965 ± 0.1268 | 3 | 0.8350 ± 0.0143 | 2.3841 ± 1.2486 | 0.6058 ± 0.0091 |
| Europe (2n = 752) | 0.3276 ± 0.1795 | 5 | 0.7885 ± 0.0084 | 2.9784 ± 1.5247 | 0.5801 ± 0.0127 |
|  CEU (2n = 170) | 0.3938 ± 0.1332 | 0 | 0.7577 ± 0.0203 | 3.0292 ± 1.5558 | 0.8857 ± 0.0057 |
|  FIN (2n = 184) | 0.3258 ± 0.1294 | 1 | 0.7612 ± 0.0173 | 2.6197 ± 1.3603 | 0.9146 ± 0.0043 |
|  GBR (2n = 174) | 0.3802 ± 0.1585 | 1 | 0.7986 ± 0.0189 | 3.1900 ± 1.6321 | 0.0704 ± 0.0059 |
|  IBS (2n = 28) | 0.4352 ± 0.1545 | 0 | 0.7460 ± 0.0537 | 3.3476 ± 1.7617 | 0.8526 ± 0.0025 |
|  TSI (2n = 196) | 0.3515 ± 0.1613 | 1 | 0.8158 ± 0.0141 | 2.9499 ± 1.5169 | 0.5941 ± 0.0105 |
| Asia (2n = 570) | 0.3045 ± 0.1569 | 5 | 0.7507 ± 0.0098 | 2.6613 ± 1.3750 | 0.1824 ± 0.0093 |
|  CHB (2n = 192) | 0.3849 ± 0.1194 | 0 | 0.7920 ± 0.0133 | 2.9605 ± 1.5222 | 0.3045 ± 0.0084 |
|  CHS (2n = 200) | 0.3006 ± 0.1598 | 5 | 0.7234 ± 0.0198 | 2.6274 ± 1.3634 | 0.3031 ± 0.0104 |
|  JPT (2n = 178) | 0.3086 ± 0.1024 | 0 | 0.6681 ± 0.0253 | 2.2658 ± 1.1920 | 0.6259 ± 0.0076 |
| Admixed (2n = 468) | 0.3147 ± 0.1855 | 1 | 0.8385 ± 0.0077 | 2.9705 ± 1.5222 | 0.3325 ± 0.0117 |
|  ASW (2n = 118) | 0.3598 ± 0.1835 | 0 | 0.8415 ± 0.0172 | 3.1446 ± 1.6150 | 0.2936 ± 0.0101 |
|  CLM (2n = 116) | 0.3702 ± 0.0917 | 0 | 0.8273 ± 0.0139 | 2.7185 ± 1.4119 | 0.9862 ± 0.0011 |
|  MXL (2n = 124) | 0.3958 ± 0.1545 | 1 | 0.8270 ± 0.0178 | 3.1832 ± 1.6327 | 0.9469 ± 0.0039 |
|  PUR (2n = 110) | 0.3338 ± 0.1180 | 0 | 0.8459 ± 0.0184 | 2.6841 ± 1.3962 | 0.0933 ± 0.0045 |
| Total (2n = 2152) | 0.2730 ± 0.1921 | – | 0.8223 ± 0.0041 | 2.8640 ± 1.4692 | 0.2546 ± 0.0118 |

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

The comparison of the three HLA-G regions (Tables 13–15) also reveals interesting aspects. The average expected heterozygosity (gene diversity) for variation sites in the 3′UTR (0.2730) is about 20% higher than that estimated for the promoter (0.2323) and coding (0.2244) regions. Nucleotide diversity, in turn, is about 4.5 times higher for the 3′UTR (2.8640%) than for the promoter (0.6331%) and coding (0.6432%) regions. Nucleotide diversity at the HLA-G 3′UTR is almost 40 times higher than the human genome average (0.075%) (118, 124), which translates into a remarkable average of 8.19 differences when two randomly chosen 3′UTR haplotypes (286 bp long) are compared.
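The arithmetic behind these comparisons can be reproduced from the Total rows of Tables 13–15. The sketch below is a minimal illustration, assuming that nucleotide diversity (a per-site quantity) multiplied by segment length gives the mean number of pair-wise differences; the 286-bp length and the 0.075% genome-wide average are the values quoted in the text, and the helper names are hypothetical.

```python
# Total-sample values taken from Tables 13-15.
GENE_DIVERSITY = {"promoter": 0.2323, "coding": 0.2244, "utr3": 0.2730}
NUC_DIVERSITY_PCT = {"promoter": 0.6331, "coding": 0.6432, "utr3": 2.8640}
GENOME_AVG_PCT = 0.075  # human genome average quoted in the text
UTR3_LENGTH_BP = 286    # length of the 3'UTR segment analysed


def fold(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


def mean_pairwise_differences(pi_pct: float, length_bp: int) -> float:
    """Expected number of differences between two random haplotypes.

    Nucleotide diversity is per site, so converting it from a percentage to a
    fraction and multiplying by the segment length gives the mean count.
    """
    return pi_pct / 100 * length_bp


for region in ("promoter", "coding"):
    gd = fold(GENE_DIVERSITY["utr3"], GENE_DIVERSITY[region])
    nd = fold(NUC_DIVERSITY_PCT["utr3"], NUC_DIVERSITY_PCT[region])
    print(f"3'UTR vs {region}: gene diversity x{gd:.2f}, nucleotide diversity x{nd:.2f}")

print(f"3'UTR vs genome average: x{fold(NUC_DIVERSITY_PCT['utr3'], GENOME_AVG_PCT):.1f}")
print(f"mean pairwise differences: {mean_pairwise_differences(NUC_DIVERSITY_PCT['utr3'], UTR3_LENGTH_BP):.2f}")
```

Running it gives gene-diversity ratios of about 1.18 and 1.22, nucleotide-diversity ratios of about 4.52 and 4.45, a ratio of about 38.2 relative to the genome average, and 8.19 expected differences, consistent with the rounded figures above.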
Balancing selection favors the maintenance of different alleles in a population, resulting in a proportionally higher average pair-wise difference compared with measures of diversity based on the number of polymorphic sites. As expected, the worldwide nucleotide diversity at the whole HLA-G locus (0.7548%) is slightly higher than that observed for a Brazilian population sample (0.643%) (76). A direct comparison of haplotype diversity between the three regions was not performed, since their very different lengths and numbers of variation sites (Tables 2, 5, and 8) could bias any conclusion.

Two independent approaches were used to evaluate the extent of differentiation between pairs of populations (interpopulation diversity): FST and the exact test of population differentiation based on haplotype frequencies. Although these analyses have the same purpose and may provide similar results, both were performed to obtain more reliable and robust conclusions. The pair-wise FST matrix revealed a wide range of values: from −0.0150, between British from England and Scotland (GBR) and Iberian populations from Spain (IBS), to 0.2037, between Finnish (FIN) and Japanese (JPT) (Table 16). While only 1 of the 6 (16.7%) pairs of admixed populations and 4 of the 10 (40%) pairs of European populations differed significantly at the 5% unadjusted significance level, it is noteworthy that the two African populations, as well as all three Asian populations, differed from each other. IBS presented the lowest number of significant comparisons (2 out of 13), which most likely reflects the lack of statistical power due to its small sample size (2n = 28). In contrast, JPT (all comparisons), CHB (12 out of 13), CHS (12 out of 13), FIN (12 out of 13), and YRI (11 out of 13) presented the largest numbers of significant comparisons.
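The contrast between average pair-wise differences and diversity inferred from the number of polymorphic sites is what Tajima's D summarizes: it compares θπ (mean pair-wise differences) with Watterson's estimator θW = S/a1, where a1 = Σ 1/i for i = 1 … n − 1, and an excess of intermediate-frequency variants, as expected under balancing selection, pushes D above zero. The sketch below is a minimal illustration on hypothetical toy haplotypes, not HLA-G data, and is not presented as an analysis performed in this study.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(haplotypes):
    """theta_pi: mean number of differing sites over all haplotype pairs."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


def segregating_sites(haplotypes):
    """S: number of aligned positions carrying more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


def tajimas_d(haplotypes):
    """Tajima's D from aligned, equal-length haplotype strings."""
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if s == 0:
        return 0.0  # no polymorphism: D is undefined, reported as 0 here
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator, based on segregating sites only
    return (pairwise_diversity(haplotypes) - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: two divergent classes at intermediate frequency (balancing-like).
BALANCED = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]
# The same number of polymorphic sites, each carried by a single haplotype.
RARE = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]

print(f"balanced: D = {tajimas_d(BALANCED):.2f}")   # positive
print(f"rare variants: D = {tajimas_d(RARE):.2f}")  # negative
```

Both toy samples have four segregating sites, so θW is identical; only the allele-frequency spectrum, and therefore θπ, differs, which is the signal the paragraph above refers to.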
An overall stronger differentiation was revealed by the matrix of non-differentiation probabilities obtained through the exact test of population differentiation (Table 17). While only 3 of the 10 (30%) pairs of European populations differed significantly at the 5% significance level, it is noteworthy that the two African populations and the three Asian populations differed from each other, as did five of the six pairs of admixed populations. IBS presented the lowest number of significant comparisons (4 out of 13), while JPT, CHB, CHS, and YRI differed in all pair-wise comparisons including them. In summary, both the exact test of population differentiation based on haplotype frequencies and the FST estimates revealed highly significant differences among the 14 populations. Since the more frequent HLA-G haplotypes are shared by most populations, these pair-wise differences may be due to the many low-frequency haplotypes that are either restricted to two or three populations (22.5% of the 200 identified haplotypes) or private to a single population (63% of the 200 haplotypes).
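To illustrate the logic of testing differentiation from haplotype frequencies, the sketch below runs a Monte Carlo permutation test on the 2 × k table of haplotype counts of two samples, using the chi-square statistic. This is a simpler technique swapped in for illustration; the exact test used to build Table 17 is not reproduced here, and the haplotype labels, counts, and function names are hypothetical.

```python
import random
from collections import Counter


def chi_square(pop_a, pop_b):
    """Chi-square statistic for the 2 x k table of haplotype counts."""
    counts_a, counts_b = Counter(pop_a), Counter(pop_b)
    n_a, n_b = len(pop_a), len(pop_b)
    total = n_a + n_b
    stat = 0.0
    for hap in set(counts_a) | set(counts_b):
        column = counts_a[hap] + counts_b[hap]
        for observed, n in ((counts_a[hap], n_a), (counts_b[hap], n_b)):
            expected = n * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_test(pop_a, pop_b, n_perm=2000, seed=1):
    """P-value: share of label shuffles at least as extreme as the observed table."""
    rng = random.Random(seed)
    observed = chi_square(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    k = len(pop_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign haplotypes to populations at random
        if chi_square(pooled[:k], pooled[k:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical haplotype samples (labels and counts are illustrative only).
POP_X = ["H1"] * 40 + ["H2"] * 10
POP_Y = ["H1"] * 10 + ["H2"] * 40
POP_Z = ["H1"] * 25 + ["H2"] * 25

print(permutation_test(POP_X, POP_Y))  # small: frequencies clearly differ
print(permutation_test(POP_Z, POP_Z))  # 1.0: identical samples
```

Like the exact test, a permutation test conditions on the pooled haplotype counts, so rare haplotypes confined to one sample contribute to the statistic without requiring large-sample approximations.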
Table 16

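Matrix of pair-wise FST values (below the diagonal) and the corresponding p-values (above the diagonal) for the 14 populations analyzed in the present study.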

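| | CEU | TSI | GBR | FIN | IBS | CHB | JPT | CHS | YRI | LWK | ASW | MXL | PUR | CLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CEU |  | 0.0360 | 0.3423 | 0.1081 | 0.3604 | 0.0000* | 0.0000* | 0.0090 | 0.0000* | 0.0901 | 0.0180 | 0.0541 | 0.1892 | 0.0451 |
| TSI | 0.0086 |  | 0.3694 | 0.0000* | 0.6396 | 0.0180 | 0.0000* | 0.0180 | 0.0180 | 0.2342 | 0.1532 | 0.3063 | 0.0451 | 0.4775 |
| GBR | 0.0005 | −0.0012 |  | 0.0090 | 0.8288 | 0.0000* | 0.0000* | 0.0090 | 0.0000* | 0.1441 | 0.0360 | 0.2342 | 0.0541 | 0.1171 |
| FIN | 0.0083 | 0.0391* | 0.0288 |  | 0.0270 | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0180 | 0.0000* |
| IBS | −0.0018 | −0.0123 | −0.0150 | 0.0411 |  | 0.1261 | 0.0090 | 0.0991 | 0.0721 | 0.3514 | 0.5135 | 0.6577 | 0.1441 | 0.3694 |
| CHB | 0.0679* | 0.0251 | 0.0385* | 0.1219* | 0.0246 |  | 0.0270 | 0.0180 | 0.0090 | 0.0000* | 0.0270 | 0.0000* | 0.0000* | 0.0270 |
| JPT | 0.1434* | 0.0772* | 0.1067* | 0.2037* | 0.0981 | 0.0203 |  | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* |
| CHS | 0.0366 | 0.0179 | 0.0233 | 0.0707* | 0.0249 | 0.0152 | 0.0610* |  | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0180 |
| YRI | 0.0562* | 0.0174 | 0.0365* | 0.0940* | 0.0270 | 0.0182 | 0.0362* | 0.0317* |  | 0.0000* | 0.0360 | 0.0090 | 0.0000* | 0.1712 |
| LWK | 0.0070 | 0.0028 | 0.0037 | 0.0294* | −0.0020 | 0.0469* | 0.1041* | 0.0331* | 0.0221* |  | 0.1532 | 0.1622 | 0.2883 | 0.3153 |
| ASW | 0.0237 | 0.0044 | 0.0087 | 0.0659* | −0.0056 | 0.0252 | 0.0767* | 0.0344* | 0.0130 | 0.0035 |  | 0.7748 | 0.0270 | 0.2883 |
| MXL | 0.0142 | 0.0006 | 0.0021 | 0.0535* | −0.0101 | 0.0236* | 0.0810* | 0.0256* | 0.0191 | 0.0029 | −0.0057 |  | 0.0541 | 0.3423 |
| PUR | 0.0053 | 0.0128 | 0.0111 | 0.0151 | 0.0178 | 0.0625* | 0.1287* | 0.0311* | 0.0369* | 0.0027 | 0.0183 | 0.0128 |  | 0.1982 |
| CLM | 0.0164 | −0.0011 | 0.0074 | 0.0450* | 0.0005 | 0.0235 | 0.0671* | 0.0180 | 0.0054 | 0.0009 | 0.0015 | 0.0000 | 0.0055 |  |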

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Statistically significant values at a 5% significance level after Bonferroni correction are marked with an asterisk (p < 0.0005).
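The counts quoted in the text for Table 16 can be checked against its upper triangle. The sketch below, with hypothetical helper names, stores those p-values (entries printed as 0.0000 are stored as 0.0), derives the Bonferroni threshold for the 91 pair-wise comparisons among 14 populations (0.05/91 ≈ 0.00055, reported as p < 0.0005), and counts significant comparisons at the unadjusted 5% level.

```python
from itertools import combinations

POPS = ["CEU", "TSI", "GBR", "FIN", "IBS", "CHB", "JPT",
        "CHS", "YRI", "LWK", "ASW", "MXL", "PUR", "CLM"]

# Upper triangle of Table 16: row i lists p-values of POPS[i] vs POPS[i+1:].
UPPER_P = [
    [0.0360, 0.3423, 0.1081, 0.3604, 0.0, 0.0, 0.0090, 0.0, 0.0901, 0.0180, 0.0541, 0.1892, 0.0451],
    [0.3694, 0.0, 0.6396, 0.0180, 0.0, 0.0180, 0.0180, 0.2342, 0.1532, 0.3063, 0.0451, 0.4775],
    [0.0090, 0.8288, 0.0, 0.0, 0.0090, 0.0, 0.1441, 0.0360, 0.2342, 0.0541, 0.1171],
    [0.0270, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0180, 0.0],
    [0.1261, 0.0090, 0.0991, 0.0721, 0.3514, 0.5135, 0.6577, 0.1441, 0.3694],
    [0.0270, 0.0180, 0.0090, 0.0, 0.0270, 0.0, 0.0, 0.0270],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0180],
    [0.0, 0.0360, 0.0090, 0.0, 0.1712],
    [0.1532, 0.1622, 0.2883, 0.3153],
    [0.7748, 0.0270, 0.2883],
    [0.0541, 0.3423],
    [0.1982],
]


def p_value(a, b):
    """Look up the p-value for a population pair, in either order."""
    i, j = sorted((POPS.index(a), POPS.index(b)))
    return UPPER_P[i][j - i - 1]


def significant_count(pop, alpha=0.05):
    """Number of comparisons involving pop with p < alpha."""
    return sum(p_value(pop, other) < alpha for other in POPS if other != pop)


def within_group(group, alpha=0.05):
    """(significant pairs, total pairs) among the populations of one group."""
    pairs = list(combinations(group, 2))
    return sum(p_value(a, b) < alpha for a, b in pairs), len(pairs)


n_pairs = len(list(combinations(POPS, 2)))  # 91 pairs for 14 populations
bonferroni_alpha = 0.05 / n_pairs           # about 0.00055
print(n_pairs, round(bonferroni_alpha, 5))
for pop in ("IBS", "JPT", "CHB", "CHS", "FIN", "YRI"):
    print(pop, significant_count(pop), "of", len(POPS) - 1)
print("Europe", within_group(["CEU", "TSI", "GBR", "FIN", "IBS"]))
print("Admixed", within_group(["ASW", "CLM", "MXL", "PUR"]))
```

The per-population counts it prints (IBS 2, JPT 13, CHB 12, CHS 12, FIN 12, YRI 11) and the within-group counts (4 of 10 European pairs, 1 of 6 admixed pairs) match those stated in the text.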

Table 17

Matrix of non-differentiation probabilities obtained by means of exact tests of population differentiation based on haplotype frequencies for the 14 populations analyzed in the present study.

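| | CEU | TSI | GBR | FIN | IBS | CHB | JPT | CHS | YRI | LWK | ASW | MXL | PUR | CLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CEU |
| TSI | 0.2109 |
| GBR | 0.1051 | 0.0765 |
| FIN | 0.0062 | 0.0004* | 0.0000* |
| IBS | 0.6345 | 0.9226 | 0.9772 | 0.2932 |
| CHB | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0057 |
| JPT | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0002* | 0.0000* |
| CHS | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0001* | 0.0105 | 0.0000* |
| YRI | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* |
| LWK | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.3488 | 0.0000* | 0.0000* | 0.0000* | 0.0000* |
| ASW | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.3020 | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.1072 |
| MXL | 0.0000* | 0.0004* | 0.0000* | 0.0000* | 0.4085 | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0004* |
| PUR | 0.0001* | 0.0048 | 0.0006 | 0.0000* | 0.7816 | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0677 |
| CLM | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.5290 | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0000* | 0.0001* | 0.0437 | 0.0117 |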

CEU, Utah residents with Northern and Western European ancestry; TSI, Toscani from Italy; GBR, British from England and Scotland; FIN, Finnish from Finland; IBS, Iberian populations from Spain; CHB, Han Chinese from Beijing; CHS, Han Chinese from South China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba from Ibadan, Nigeria; LWK, Luhya from Webuye, Kenya; ASW, people of African ancestry from the southwestern United States; MXL, people of Mexican ancestry from Los Angeles, California; PUR, Puerto Ricans from Puerto Rico; CLM, Colombians from Medellin, Colombia.

Statistically significant values at a 5% significance level after Bonferroni correction are marked with an asterisk (p < 0.0005).

To further explore the genetic relationships between populations, an analysis of molecular variance (AMOVA) was performed assuming a hierarchical structure in which the 14 populations were divided into four groups: African, Asian, European, and admixed populations (Table 18).
Considering the whole HLA-G gene, differences between the four groups account for only 2.45% of the variance, whereas 1.64% of the variance is due to differences between populations belonging to the same group. Almost all the variance (95.91%) is observed within populations. The same pattern is observed when each HLA-G region (promoter, coding, and 3′UTR) is considered separately, except for the 3′UTR, where the variance among groups (0.65%) is even lower than the variance among populations within the same group (1.32%) and is not statistically significant.
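Because the three variance components in Table 18 sum to 100%, the hierarchical fixation indices can be recovered directly from them. The sketch below uses the standard AMOVA relations FCT = Va/VT, FSC = Vb/(Vb + Vc), and FST = (Va + Vb)/VT, where Va, Vb, and Vc are the among-group, among-population-within-group, and within-population components and VT is their sum; the function name and dictionary layout are illustrative, not output of the software used in the study.

```python
def fixation_indices(among_groups, among_pops, within_pops):
    """Return (F_CT, F_SC, F_ST) from AMOVA variance components.

    Components may be absolute variances or percentages of the total.
    """
    total = among_groups + among_pops + within_pops
    f_ct = among_groups / total                     # among groups
    f_sc = among_pops / (among_pops + within_pops)  # among populations within groups
    f_st = (among_groups + among_pops) / total      # among all populations
    return f_ct, f_sc, f_st


# Variance percentages from Table 18.
TABLE_18 = {
    ("four groups", "whole gene"): (2.45, 1.64, 95.91),
    ("four groups", "3'UTR"): (0.65, 1.32, 98.02),
    ("three groups", "whole gene"): (3.42, 1.99, 94.59),
}

for (design, region), components in TABLE_18.items():
    f_ct, f_sc, f_st = fixation_indices(*components)
    print(f"{design:12s} {region:10s} FCT={f_ct:.4f} FSC={f_sc:.4f} FST={f_st:.4f}")
```

The three indices always satisfy (1 − FST) = (1 − FCT)(1 − FSC); for the 3′UTR, FCT falls below FSC, mirroring the pattern described above.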
Table 18

Analysis of molecular variance (AMOVA) for HLA-G.

| Groups composing the hierarchical structureᵃ | HLA-G data type | Among groups (FCT) | Among populations within groups (FSC) | Within populations (FST) |
|---|---|---|---|---|
| Africa: LWK, YRI; Asia: CHB, CHS, JPT; Europe: CEU, FIN, GBR, IBS, TSI; Admixed: ASW, CLM, MXL, PUR | Promoter | 3.09% (p = 0.0098 ± 0.0033) | 1.57% (p = 0.0000 ± 0.0000) | 95.34% (p = 0.0000 ± 0.0000) |
| | Coding region | 2.99% (p = 0.0049 ± 0.0020) | 1.81% (p = 0.0000 ± 0.0000) | 95.20% (p = 0.0000 ± 0.0000) |
| | 3′UTR | 0.65% (p = 0.0665 ± 0.0000) | 1.32% (p = 0.0000 ± 0.0000) | 98.02% (p = 0.0000 ± 0.0000) |
| | Whole gene | 2.45% (p = 0.0029 ± 0.0016) | 1.64% (p = 0.0000 ± 0.0000) | 95.91% (p = 0.0000 ± 0.0000) |
| Africa: LWK, YRI; Asia: CHB, CHS, JPT; Europe: CEU, FIN, GBR, IBS, TSI | Promoter | 4.28% (p = 0.0156 ± 0.0039) | 2.01% (p = 0.0000 ± 0.0000) | 93.71% (p = 0.0000 ± 0.0000) |
| | Coding region | 4.14% (p = 0.0147 ± 0.0042) | 2.28% (p = 0.0000 ± 0.0000) | 93.58% (p = 0.0000 ± 0.0000) |
| | 3′UTR | 1.00% (p = 0.0332 ± 0.0065) | 1.32% (p = 0.0010 ± 0.0010) | 97.68% (p = 0.0000 ± 0.0000) |
| | Whole gene | 3.42% (p = 0.0166 ± 0.0000) | 1.99% (p = 0.0000 ± 0.0000) | 94.59% (p = 0.0000 ± 0.0000) |
Because the admixed group is an assembly of populations whose individuals show varying levels of African, Amerindian/Asian, and European ancestry, this group was removed in a second round of analysis (Table 18). As a result, the variance among groups increased, although it remained lower than expected for neutrally evolving sequences (123). These results suggest that most of the HLA-G diversity, particularly in the 3′UTR, (a) originated in Africa before the dispersion of Homo sapiens to other continents and (b) has been maintained in worldwide populations by non-neutral evolutionary forces, particularly balancing selection. These conclusions are corroborated by previous data on HLA-G (68, 69, 76, 89, 121). Moreover, many different low-frequency haplotypes are being generated within populations by mutation and recombination. These features account for the relatively poor resolution of the multidimensional scaling (MDS) plot (Figure 2) obtained from the matrix of Reynolds' genetic distances based on the whole HLA-G gene. Unexpectedly, (a) populations from the same geographic group, for example Asians (CHB, CHS, and JPT), are spread across large distances in the plot, and (b) admixed populations (CLM, MXL, and PUR), which present major European, intermediate Amerindian, and minor African ancestry contributions (66), as revealed by the analysis of ancestry informative markers (data not shown), cluster together with African populations. These unexpected findings support the hypothesis that a strong signature of balancing selection on HLA-G may have distorted the expected demographic signatures.
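The unexpected MDS pattern is already visible in the pair-wise FST values of Table 16. The sketch below converts a few of them to Reynolds' distance, assuming the linearized form D = −ln(1 − FST) reported by common population-genetics software; clipping negative FST estimates to zero is an additional assumption made here, and the exact distance computation used for Figure 2 is not stated in this section.

```python
from math import log

# Selected pair-wise FST values from the lower triangle of Table 16.
FST = {
    ("GBR", "IBS"): -0.0150,
    ("CHB", "JPT"): 0.0203,
    ("CHS", "JPT"): 0.0610,
    ("FIN", "JPT"): 0.2037,
    ("MXL", "YRI"): 0.0191,
    ("CLM", "YRI"): 0.0054,
}


def reynolds_distance(fst, clip_negative=True):
    """Reynolds' distance D = -ln(1 - FST).

    Negative FST estimates arise from sampling noise; clipping them to zero
    (an assumption made here) keeps every entry a valid distance.
    """
    if clip_negative and fst < 0:
        return 0.0
    return -log(1.0 - fst)


for (a, b), fst in FST.items():
    print(f"{a}-{b}: FST = {fst:+.4f}, D = {reynolds_distance(fst):.4f}")
```

For example, the CLM–YRI distance is smaller than the CHS–JPT distance, which is consistent with the admixed populations clustering near African populations while the Asian populations spread apart in Figure 2.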
Figure 2

Multidimensional scaling (MDS) plot revealing the genetic relationships between the 14 populations of the 1000Genomes Project (Phase 1).

HLA-G Evolution Aspects

The MHC class I molecules evolved through a series of events that include chromosomal duplication, gene recombination, and selection probably driven by pathogens (125–127). MHC-G, the HLA-G homologous sequence in non-human primates, apparently is the oldest class I gene and may have given rise to the whole set of class I loci (127). In fact, MHC class I genes from New World primates, such as the cotton-top tamarin (Saguinus oedipus), are much closer to human HLA-G than to the other human classical class I genes (127). This primate lineage separated from the one that gave rise to the Old World monkeys and apes (catarrhines) about 38 million years ago. It is noteworthy that the HLA-G and MHC-G molecules are functionally different despite the high identity between their exonic sequences (128). New World primate MHC-G plays a role in antigen presentation that is uncommon for human HLA-G, suggesting that these genes are not orthologous, as was theorized in the past (129, 130). In contrast, the cotton-top tamarin presents two MHC-C molecules with inhibitory properties that interact with KIR receptors (131). The regulation of MHC levels (in this case, MHC-C) in these non-human primates seems to be one of the mechanisms responsible for fetal acceptance as well as for the shorter pregnancy period (132). Old World monkeys have a peculiar MHC-G molecule: it presents only the α1 domain due to a stop codon at codon 164 (133), which may hinder fetal protection against maternal NK cells, unless a mechanism exists by which the stop codon is ignored, allowing translation to continue (a possibility that has not been ruled out). In addition, gorillas and chimpanzees present a conserved MHC-G coding segment with few variations (3, 128, 129). Although their pregnancy period is shorter than that of humans, these species are polygamous, which exposes females to different allogeneic fetuses during their fertile years.
Orangutans, on the other hand, have long-lasting relationships, and five MHC-G variants have been found so far; their polymorphism level is low but closer to that of humans (3). Orangutans and humans are separated by about 15 million years of evolution. Possibly, differences in maternal–fetal relationships among species are responsible for the peculiarities of each MHC-G gene, including its function and level of variation. In addition to alignments of human and non-human primate MHC-G coding sequences, analyses of HLA-G non-coding regions have proved to be highly informative about the evolutionary history of this gene. For example, the 14-bp insertion/deletion polymorphism located in HLA-G exon 8 (3′UTR) is found exclusively in the human lineage, suggesting that UTR haplotypes bearing the deletion, such as UTR-1, are more recent than those that present the 14-bp fragment (134). An interesting finding confirmed recently is that G*01:01:01:01, one of the most frequent HLA-G coding alleles (global frequency of 0.24257), which is usually associated with UTR-1 and the promoter haplotype G010101a [described in Ref. (76) and Table 11], is probably the most recent haplotype. This conclusion was based on the association of G*01:01:01:01/UTR-1 with an Alu insertion (AluyHG), located about 20 kb downstream of the HLA-G 3′UTR, which occurred before human dispersion from Africa. The frequency of this Alu element increases with distance from Africa (68). Given the immunomodulatory properties of HLA-G and its unique tissue expression pattern, HLA-G expression levels must be maintained under fine regulatory control. In addition, the low variability found in its coding region and the limited number of proteins encoded by this gene suggest that this region is under strong evolutionary constraints that limit variation. Differences in mammalian pregnancy and species-specific pathogens must be considered when studying the evolution of immune system molecules.

HLA-G Transcription Regulation

Most studies on HLA-G regulation have considered as the HLA-G promoter either the 200 nucleotides upstream of the first translated ATG or a segment within 1.5 kb upstream of the CDS. HLA-G regulation is unique among class I genes [reviewed in Ref. (67)]. Generally, HLA class I genes present the following main regulatory modules in the proximal promoter region (within 200 bases upstream of the CDS) [reviewed in Ref. (67)]: (a) Enhancer-A (EnhA), which interacts with the NF-κB family of transcription factors, important inducers of HLA class I gene expression (135); (b) the interferon-stimulated response element (ISRE), a target site for interferon regulatory factors (IRF) that might act as class I activators (IRF-1) or inhibitors (IRF-2 and IRF-8) (135); the ISRE is located adjacent to EnhA, and both work cooperatively to control HLA class I gene expression; and (c) the SXY module, on which the transcription apparatus is assembled. However, the HLA-G gene presents regulatory peculiarities that differ from other class I genes [reviewed in Ref. (67)]. First, the HLA-G EnhA is the most divergent among the class I genes; it is unresponsive to NF-κB (136) and might interact only with p50 homodimers, which are not potent transactivators of HLA class I genes (137). In addition, the HLA-G ISRE is unresponsive to IFN-γ (138). In fact, HLA-G presents the most divergent ISRE sequence among the class I genes (135, 136), which could explain the absence of IFN-γ-induced transactivation. The ISRE is also a target for other protein complexes that may mediate HLA class I transactivation. However, both the HLA-G EnhA and ISRE seem to bind only the ubiquitously expressed factor Sp1, which apparently does not modulate the constitutive or IFN-induced transactivation of HLA-G (136). Some polymorphisms in the promoter region, such as −725 C > G/T, are close to known regulatory elements.
In this regard, the −725 G allele has been associated with higher HLA-G expression levels (120). The SXY module comprises the S, X1, X2, and Y boxes and is an important target for regulatory binding elements and HLA class I gene transactivation. Box X1 is a target for the multiprotein regulatory factor X (RFX) complex, including RFX5, RFX-associated protein, and RFXANK (137, 139–141). RFX members interact with the class II transactivator (CIITA), an important element for HLA class II transactivation that is also important for HLA class I gene transactivation (139). The X2 box is a binding target for the activating transcription factor/cAMP response element binding protein (ATF/CREB) family of transcription factors (142), and the Y box is a binding target for nuclear factor Y (NFY), which includes the subunits alpha, beta, and gamma (NFYA, NFYB, and NFYC) (67, 139). For HLA-G, the SXY module presents sequences compatible only with the S and X1 elements and is divergent at X2 and Y. Because CIITA depends on a functional SXY module that includes the X2 and Y elements, CIITA does not transactivate the HLA-G gene through this module (139, 143–146). Other regulatory elements within the HLA-G promoter have been described, such as a heat shock element at position −469/−454 that binds heat shock factor-1 (HSF-1), involved in the modulation of immune responses (147), and elements responsive to progesterone, a steroid hormone secreted by the corpus luteum and placenta that is involved in endometrium maintenance and embryo implantation [reviewed in Ref. (67)]. Progesterone-induced HLA-G expression is primarily mediated by activation of the progesterone receptor and its subsequent binding to a progesterone response element in the promoter region (148). Transactivation of HLA-G transcription has also been demonstrated after exposure of cells to leukemia inhibitory factor (LIF) (149) and methotrexate (150).
In addition, increased HLA-G transcription was demonstrated in the choriocarcinoma cell line JEG3 after treatment with LIF. Furthermore, LIF induces HLA-G expression in the presence of endoplasmic reticulum aminopeptidase-1 (ERAP1), and repression of ERAP1 results in HLA-G downregulation, indicating that ERAP1 plays an important role in HLA-G regulation (151). Finally, the methylation status of the HLA-G promoter appears to be very important for HLA-G transcription (152, 153). Although some HLA-G regulatory elements are known, it is not clear why balancing selection is maintaining divergent promoter lineages, since most of the polymorphisms would not, in theory, influence HLA-G transcription through the known mechanisms, mainly because they do not coincide with known regulatory elements [reviewed in Ref. (67)]. It should be noted that the SNVs described for the HLA-G promoter in other studies are also found in the present analysis.

HLA-G Post-Transcriptional Regulation

HLA-G might also be regulated by post-transcriptional mechanisms such as alternative splicing and microRNAs. Several studies have reported polymorphisms influencing splicing, mRNA stability, and the ability of some microRNAs to bind to the HLA-G mRNA. The HLA-G 3′UTR is a key segment for this regulation, mainly through microRNA binding and effects on mRNA stability, and it presents several polymorphic sites that influence gene expression [reviewed in Ref. (67)]. The 14-bp insertion/deletion (presence or absence) polymorphism has been implicated in HLA-G transcript levels and mRNA stability. The presence of the 14-bp segment in trophoblast samples has been associated with lower mRNA production for most membrane-bound and soluble isoforms (98, 154), and the absence of this segment seems to stabilize the mRNA, with consequently higher HLA-G expression (98, 155, 156). In addition, HLA-G transcripts presenting the 14-bp segment can be further processed by the removal of 92 bases from the complete mRNA (98), giving rise to a shorter HLA-G transcript reported to be more stable than the complete isoform (157). The alternative splicing associated with the presence of the 14-bp segment is probably driven by other polymorphic sites in linkage disequilibrium with it (3). The SNP at position +3142 has been associated with differential HLA-G expression, because it might influence microRNA binding (158). The presence of a guanine at position +3142 is associated with stronger binding of specific microRNAs, such as miR-148a, miR-148b, and miR-152, decreasing HLA-G expression by mRNA degradation and translation suppression (3, 158, 159). In addition, the 14-bp region might also be a target for specific microRNAs, and other 3′UTR polymorphisms might also influence microRNA binding (159). Another polymorphic site that might influence HLA-G expression is located at +3187.
The allele +3187A is associated with decreased HLA-G expression because it extends an AU-rich motif that mediates mRNA degradation (106). UTR-1 (Table 6) is the only frequent 3′UTR haplotype that do not carry the 14-bp sequence, and both the high expression alleles +3142G and +3187A. Therefore, it was postulated that this haplotype would be associated with high HLA-G expression; this was confirmed by another study evaluating soluble HLA-G levels and 3′UTR haplotypes (109). In addition, as already introduced, this haplotype (together with the coding allele G*01:01:01:01) is probably the most recent one (109) and its frequency might be increased worldwide due to its high-expressing feature.
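The reasoning that singles out UTR-1 reduces to checking each 3′UTR haplotype for the three low-expression features discussed above (14-bp insertion, +3142G, +3187A). The following is a minimal Python sketch of that check, not the authors' pipeline: the function names are hypothetical, it assumes the three sites are biallelic (14-bp Ins/Del, +3142 C/G, +3187 A/G), and it deliberately ignores every other 3′UTR site and any interaction between sites.

```python
from typing import Dict, List

# Allele states at three HLA-G 3'UTR sites that the text links to LOWER expression:
# presence of the 14-bp segment (insertion), a guanine at +3142 (stronger binding of
# miR-148a, miR-148b and miR-152), and an adenine at +3187 (extended AU-rich motif).
LOW_EXPRESSION_ALLELES: Dict[str, str] = {
    "14bp": "Ins",
    "+3142": "G",
    "+3187": "A",
}


def low_expression_features(haplotype: Dict[str, str]) -> List[str]:
    """Return the sites at which a haplotype carries a low-expression allele."""
    return [
        site
        for site, allele in LOW_EXPRESSION_ALLELES.items()
        if haplotype.get(site) == allele
    ]


def predicted_high_expression(haplotype: Dict[str, str]) -> bool:
    """Heuristic (hypothetical): a haplotype with none of the low-expression
    alleles is predicted to be high-expressing. Other 3'UTR sites are ignored."""
    return not low_expression_features(haplotype)


# UTR-1 as described in the text: no 14-bp insertion, and neither +3142G nor +3187A
# (assuming the biallelic states Ins/Del, C/G and A/G at these sites).
utr1 = {"14bp": "Del", "+3142": "C", "+3187": "G"}
# A HYPOTHETICAL haplotype carrying all three low-expression alleles, for contrast.
hypothetical = {"14bp": "Ins", "+3142": "G", "+3187": "A"}

print(low_expression_features(utr1), predicted_high_expression(utr1))
print(low_expression_features(hypothetical), predicted_high_expression(hypothetical))
```

Representing each haplotype as a site-to-allele mapping keeps the rule explicit and easy to extend with further sites, but the prediction is only as good as the three-site rule it encodes; measured soluble HLA-G levels, as in Ref. (109), remain the actual evidence.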

Concluding Remarks

Because of the key role of HLA-G in regulating and modulating the immune response, particularly during pregnancy, the overall structure of the HLA-G molecule has been maintained during evolution. This is evident when the variability of more than a thousand individuals is taken into account: only a few distinct encoded molecules are frequently found, and most of the variation sites in the HLA-G coding region are either synonymous or intronic. The HLA-G promoter region presents numerous polymorphic sites, several of which have both alleles equally represented. Although the mechanisms by which some divergent promoter haplotypes are preferentially selected remain unclear, only a few divergent and frequent promoter haplotypes are found worldwide. The HLA-G 3′UTR variability is remarkable given the short size of this segment, since most of its SNVs are true polymorphisms with both alleles equally represented. These observations, for both the promoter and the 3′UTR, are compatible with the evidence of balancing selection acting on these regions. Finally, the population comparisons confirmed that most of the HLA-G variability arose before human dispersion from Africa and that allele and haplotype frequencies might have been shaped by strong selective pressures.
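The phrases "true polymorphisms" and "both alleles equally represented" can be made concrete with allele frequencies. The sketch below is a minimal illustration, assuming diploid genotype counts at a biallelic site and the conventional 1% minor-allele-frequency (MAF) cutoff for calling a variant a polymorphism; the function names and genotype counts are hypothetical and are not taken from the 1000 Genomes data analyzed here. Expected heterozygosity, 2pq, peaks at 0.5 when p = q = 0.5, so a site with both alleles equally represented shows the highest possible diversity, a pattern often read as consistent with balancing selection.

```python
from typing import Dict, Tuple

# Conventional cutoff (an assumption here): a variant whose minor allele
# frequency (MAF) is at least 1% is called a polymorphism.
POLYMORPHISM_MAF_CUTOFF = 0.01


def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (p, q), the frequencies of alleles A and B, from diploid genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / total_alleles  # each AA carries two A alleles, each AB one
    return p, 1.0 - p


def summarize_site(n_aa: int, n_ab: int, n_bb: int) -> Dict[str, object]:
    """Summarize one biallelic site: MAF, expected heterozygosity and polymorphism status."""
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    maf = min(p, q)
    return {
        "maf": maf,
        # 2pq under Hardy-Weinberg proportions; maximal (0.5) when p == q == 0.5
        "expected_heterozygosity": 2 * p * q,
        "true_polymorphism": maf >= POLYMORPHISM_MAF_CUTOFF,
    }


# HYPOTHETICAL genotype counts for 1,000 individuals, for illustration only.
balanced_site = summarize_site(250, 500, 250)  # both alleles equally represented
rare_variant = summarize_site(990, 10, 0)      # minor allele below the 1% cutoff

print(balanced_site)
print(rare_variant)
```

A high 2pq at a single site is not by itself proof of balancing selection; formal tests compare the observed frequency spectrum with neutral expectations across many sites.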

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (157 in total)

1. HLA-G allelic variants are associated with differences in the HLA-G mRNA isoform profile and HLA-G mRNA levels.

Authors: Thomas Vauvert F Hviid; Sine Hylenius; Christina Rørbye; Lone G Nielsen
Journal: Immunogenetics; Date: 2003-04-24

2. The association between human leukocyte antigen (HLA)-G polymorphisms and human papillomavirus (HPV) infection in Inuit women of northern Quebec.

Authors: Stephanie Metcalfe; Michel Roger; Marie-Claude Faucher; François Coutlée; Eduardo L Franco; Paul Brassard
Journal: Hum Immunol; Date: 2013-08-29

3. The role of enhancer A in the locus-specific transactivation of classical and nonclassical HLA class I genes by nuclear factor kappa B.

Authors: S J Gobin; V Keijsers; M van Zutphen; P J van den Elsen
Journal: J Immunol; Date: 1998-09-01

4. Plasma soluble HLA-G is a potential biomarker for diagnosis of colorectal, gastric, esophageal and lung cancer.

Authors: M Cao; S-M Yie; J Liu; S R Ye; D Xia; E Gao
Journal: Tissue Antigens; Date: 2011-08

5. Evolution of MHC-G in humans and primates based on three new 3'UT polymorphisms.

Authors: M J Castro; P Morales; J Martínez-Laso; L Allende; R Rojo-Amigo; M González-Hevilla; P Varela; A Moreno; M García-Berciano; A Arnaiz-Villena
Journal: Hum Immunol; Date: 2000-11

6. Association of HLA-G 3' untranslated region polymorphisms with antibody response against Plasmodium falciparum antigens: preliminary results.

Authors: A Sabbagh; D Courtin; J Milet; J D Massaro; E C Castelli; F Migot-Nabias; B Favier; N Rouas-Freiss; P Moreau; A Garcia; E A Donadi
Journal: Tissue Antigens; Date: 2013-07

7. Heat shock and arsenite induce expression of the nonclassical class I histocompatibility HLA-G gene in tumor cell lines.

Authors: E C Ibrahim; M Morange; J Dausset; E D Carosella; P Paul
Journal: Cell Stress Chaperones; Date: 2000-07

8. HLA-G +3142 polymorphism as a susceptibility marker in two rheumatoid arthritis populations in Brazil.

Authors: T D Veit; C P S de Lima; L C Cavalheiro; S M Callegari-Jacques; C V Brenol; J C T Brenol; R M Xavier; M F L da Cunha Sauma; E J M dos Santos; J A B Chies
Journal: Tissue Antigens; Date: 2014-02-28

9. HLA-G genotype and HLA-G expression in systemic lupus erythematosus: HLA-G as a putative susceptibility gene in systemic lupus erythematosus.

Authors: R Rizzo; T V F Hviid; M Govoni; M Padovan; M Rubini; L Melchiorri; M Stignani; S Carturan; M T Grappa; M Fotinidi; S Ferretti; A Voss; H Laustrup; P Junker; F Trotta; O R Baricordi
Journal: Tissue Antigens; Date: 2008-03-29

10. A soluble form of the HLA-G antigen is encoded by a messenger ribonucleic acid containing intron 4.

Authors: T Fujii; A Ishitani; D E Geraghty
Journal: J Immunol; Date: 1994-12-15
