Erick C Castelli, Bibiana S de Almeida, Yara C N Muniz, Nayane S B Silva, Marília R S Passos, Andreia S Souza, Abigail E Page, Mark Dyble, Daniel Smith, Gabriela Aguileta, Jaume Bertranpetit, Andrea B Migliano, Yeda A O Duarte, Marília O Scliar, Jaqueline Wang, Maria Rita Passos-Bueno, Michel S Naslavsky, Mayana Zatz, Celso Teixeira Mendes-Junior, Eduardo A Donadi.
Abstract
HLA-G is a promiscuous immune checkpoint molecule. The HLA-G gene presents substantial nucleotide variability in its regulatory regions, yet it encodes a limited number of proteins compared to classical HLA class I genes. We characterized HLA-G genetic variability in 4640 individuals from 88 population samples across the globe, using a state-of-the-art method to characterize polymorphisms and haplotypes from high-coverage next-generation sequencing data. We also provide insights into HLA-G genetic diversity and a resource for future studies evaluating HLA-G polymorphisms in different populations and in association studies. Despite the great haplotype variability, we demonstrated that: (1) most HLA-G polymorphisms are in introns and regulatory sequences, and these are the sites with evidence of balancing selection; (2) linkage disequilibrium is high throughout the gene, extending up to HLA-A; and (3) few proteins are frequently observed in worldwide populations, with no variation in the residues associated with major HLA-G biological properties (dimer formation, interaction with leukocyte receptors). These observations corroborate the role of HLA-G as an immune checkpoint molecule rather than as an antigen-presenting molecule. Understanding HLA-G variability across populations is relevant for disease association and functional studies.
Year: 2021 PMID: 34845256 PMCID: PMC8629979 DOI: 10.1038/s41598-021-02106-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Resources available for download or listed as supplementary material regarding the HLA-G gene and its polymorphisms.
| Resource | Description | Availability |
|---|---|---|
| VCF listing all variants | A VCF file containing all variants, the reference and alternative alleles, and global counts, to be used as support for variant refinement when genotyping | For download [a] |
| Phased VCF | A VCF file with the phased genotypes for each sample | Upon request |
| FASTA file with full sequences | A copy of every full-length sequence observed in this survey (approximately 12 kb each), and their global counts | For download [a] |
| FASTA file with gene sequences | The full sequence of each | For download [a] |
| FASTA file with CDS sequences | We extracted all different exonic sequences (CDS). This file contains a copy of each different sequence, with their global counts. Each sequence is identified with its official name according to the IPD-IMGT/HLA database, or as a novel sequence | For download [a] |
| FASTA file with protein sequences | We translated all exonic sequences into proteins (allotypes). This file contains a copy of each different sequence, with their global counts. Each sequence is identified with its official name according to the IPD-IMGT/HLA database, or as a novel sequence | For download [a] |
| FASTA file with 3’UTR sequences | The sequence of each 3’UTR haplotype we have detected, with their names and global counts | For download [a] |
| SNP frequencies | This table provides the frequency for the reference and alternative alleles in the global population, in XLSX format | Supplementary material (Table |
| Genomic alleles (4-digit resolution) frequencies | This table provides the frequency for each genomic allele (4-digit resolution) in each population we have studied, in XLSX format. There are 3 sheets, one for biogeographic regions, one for countries, and one for specific population samples | Supplementary material (Table |
| Allotypes frequencies (2-digit resolution) | This table provides the frequency for each allotype (2-digit resolution, full-length protein) in each population we have studied, in XLSX format. There are 3 sheets, one for biogeographic regions, one for countries, and one for specific population samples | Supplementary material (Table |
| 3’UTR frequencies | This table provides the frequency for 3’UTR haplotype in each population we have studied, in XLSX format. There are 4 sheets, one for biogeographic regions, one for countries, one for specific population samples, and one describing the combination between genomic alleles and 3’UTR haplotypes in the global sample | Supplementary material (Table |
[a] Available for download at www.castelli-lab.net/HLA-G.
Figure 1. Linkage disequilibrium between pairs of 106 bi-allelic SNPs in the HLA-G gene region, from 4 kb upstream of the gene to 100 nucleotides downstream, all with a global minor allele frequency of at least 1%. The image was generated in Haploview and further edited using Inkscape. Areas in black indicate strong LD (r² > 0.8), shades of gray indicate moderate LD, and white indicates low LD.
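To make the r² scale in this figure concrete, the following is a minimal sketch of how r² can be computed for one pair of bi-allelic SNPs. It assumes phased haplotypes coded 0/1, which is a simplification: Haploview estimates haplotype frequencies from genotype data, and the function name `ld_r2` and the input layout are hypothetical, not taken from the study.

```python
def ld_r2(haplotypes):
    """Return r^2 between two bi-allelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) tuples,
    each allele coded 0 (reference) or 1 (alternative). Assumes the
    haplotypes are already phased (an assumption of this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Two SNPs whose alleles always travel together, and two that are independent.
linked = [(0, 0)] * 6 + [(1, 1)] * 4
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
r2_linked = ld_r2(linked)
r2_unlinked = ld_r2(unlinked)
```

Under the caption's thresholds, a pair like `linked` would be drawn in black (r² > 0.8) and a pair like `unlinked` in white.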
Figure 2. Frequency of each variant (panel A), nucleotide diversity (panel B), number of segregating sites (panel C), and Tajima’s D (panel D) across HLA-G, considering all samples from the 1000 Genomes Project pooled together, from approximately 4 kb upstream of the gene (the promoter region) to 100 nucleotides downstream of HLA-G. Panels B, C, and D were computed in sliding windows of 500 nucleotides with a step size of 1. The HLA-G exon/intron structure is indicated in the bottom panel, with fine lines indicating introns and thick lines indicating exons. To evaluate the significance of the parameters estimated for HLA-G, we built a null distribution from the patterns observed in chromosome 6, computing these statistics in 10,000 random windows of 500 nucleotides from chromosome 6. Values above the blue and red horizontal lines are higher than 99.9% and 99% of those observed for chromosome 6, respectively. The orange line represents the average observed in chromosome 6 for each statistic. The green and yellow dots represent polymorphisms −3614 (rs1611163) and −725 (rs1233334), respectively.
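The sliding-window statistics described in this caption can be illustrated with a short sketch. It uses the standard Tajima (1989) formulas for bi-allelic sites and takes alternative-allele counts per site as input. The function names, the input layout, and the requirement of at least four chromosomes are assumptions of this example, not a description of the software used in the study.

```python
import math


def tajimas_d(alt_counts, n):
    """Tajima's D for one window.

    `alt_counts` holds the alternative-allele count at each bi-allelic site
    in the window; `n` is the number of sampled chromosomes.
    Returns None when the window has no segregating sites.
    """
    if n < 4:
        raise ValueError("need at least 4 chromosomes for a defined variance")
    seg = [k for k in alt_counts if 0 < k < n]       # keep segregating sites only
    s = len(seg)
    if s == 0:
        return None
    # Nucleotide diversity (pi): mean pairwise differences summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def sliding_tajimas_d(positions, alt_counts, n, start, end, window=500, step=1):
    """Tajima's D in windows of `window` nucleotides moved by `step`."""
    results = []
    for left in range(start, end - window + 2, step):
        right = left + window - 1
        in_window = [k for pos, k in zip(positions, alt_counts) if left <= pos <= right]
        results.append((left, tajimas_d(in_window, n)))
    return results
```

An excess of intermediate-frequency variants yields a positive D, the pattern usually read as compatible with balancing selection, while an excess of rare variants yields a negative D. Comparing each window against the 99% or 99.9% quantile of the same statistic in random chromosome 6 windows would then follow the caption's null-distribution approach.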
The number of segregating sites, nucleotide diversity, and Tajima’s D across different HLA-G regions in 4640 samples from 88 different populations.
*The indexes are scaled in shades of green, from white (the lowest value) to dark green (the highest value).
Synonymous and nonsynonymous nucleotide substitution test of neutrality, positive, and purifying selection for analysis over 77 HLA-G sequences defined by exon mutations, in 4640 samples from 88 different populations.
| Region | Number of sequences | Number of codons | HA = neutrality ( | HA = positive selection ( | HA = purifying selection ( |
|---|---|---|---|---|---|
| Exon 1 | 11 | 24 | −0.8576, P = 0.3928 | −0.9059, P = 1.0000 | 0.8804, P = 0.1902 |
| Exon 2 | 19 | 90 | −1.1546, P = 0.2505 | −1.1160, P = 1.0000 | 1.1258, P = 0.1312 |
| Exon 3 | 20 | 92 | −0.6806, P = 0.4973 | −0.6930, P = 1.0000 | 0.6790, P = 0.2491 |
| Exon 4 | 17 | 92 | −1.0288, P = 0.3056 | −1.007, P = 1.0000 | 0.9609, P = 0.1692 |
| Exon 5 | 10 | 39 | −1.5091, P = 0.1338 | −1.3480, P = 1.0000 | 1.4232, P = 0.0786 |
| All exons | 77 | 339 | − | −2.5069, P = 1.0000 | |
| Exons 2 and 3 | 39 | 182 | −1.3036, P = 0.1948 | −1.2308, P = 1.0000 | 1.2635, P = 0.1044 |
| Exons 2, 3, and 4 | 55 | 274 | −1.7453, P = 0.0834 | −1.7482, P = 1.0000 | |
*Significant P-values are marked in boldface.
Figure 3. Multidimensional scaling (MDS) illustrating the distance (measured by FST estimated from HLA-G SNPs) among population samples with at least 10 individuals. The name of each population is available in Table S1. We have indicated the names of some outliers.
Figure 4. Frequencies of the most common HLA-G alleles, allotypes, and 3’UTR haplotypes in different population samples across the world. Tables S3, S4, and S5 present all frequency values.
Figure 5. Multidimensional scaling (MDS) illustrating the distance (measured by FST estimated from HLA-G alleles, i.e., coding haplotypes) among population samples with at least 10 individuals. The name of each population is available in Table S1. We have indicated the names of some outliers.
Figure 6. HLA-G residues that are polymorphic or may influence HLA-G function, considering known allotypes detected in 4642 samples from across the world and new allotypes that have occurred at least twice. Polymorphic residues are marked in shades of gray. Residues important for HLA-G function are marked in other colors. For allotypes with premature stop codons, no amino acid is indicated after the frameshift mutation. In red, the Cysteine responsible for dimer formation. In green, the Methionine and Glutamine that interact with KIR2DL4. In purple and yellow, the residues important for ILT-2 and ILT-4 interaction. In blue, the motif DQTQDVE, which interacts with the TCD8 receptor.
Figure 7. Linkage disequilibrium encompassing the HLA-G and HLA-A region, considering 5347 individuals from worldwide population samples and SNPs with a minor allele frequency higher than 1%. We removed variants that coincide with known structural variants, producing three continuous segments: chr6:29,823,675–29,874,064, from the HLA-G promoter to 43 kb downstream of HLA-G; chr6:29,881,527–29,883,079, a 1.5 kb region between HLA-A and HLA-G; and chr6:29,938,412–29,945,862, from 4 kb upstream of HLA-A to the end of the HLA-A 3’UTR.
The relationship between HLA-G alleles and 3’UTR haplotypes, for combinations that have occurred at least twice in 4640 individuals across the globe.
| HLA-G allele | 3'UTR haplotype | Global frequency [a] | Internal frequency [b] |
|---|---|---|---|
| G*01:01:01:01 | UTR-06 | 0.0027 | 0.0116 |
| G*01:01:01:01 | UTR-60 | 0.0002 | 0.0009 |
| G*01:01:01:01 | UTR-01 | 0.2277 | 0.9846 |
| G*01:01:01:04 | UTR-20 | 0.0036 | 0.1000 |
| G*01:01:01:04 | UTR-18 | 0.0198 | 0.5576 |
| G*01:01:01:04 | UTR-06 | 0.0121 | 0.3394 |
| G*01:01:01:05 | UTR-04 | 0.0704 | 0.9985 |
| G*01:01:01:06 | UTR-27 | 0.0002 | 0.0161 |
| G*01:01:01:06 | UTR-04 | 0.0131 | 0.9839 |
| G*01:01:01:08 | UTR-01 | 0.0204 | 1.0000 |
| G*01:01:01:09 | UTR-01 | 0.0033 | 1.0000 |
| G*01:01:01:13 | UTR-06 | 0.0066 | 1.0000 |
| G*01:01:01:14Q | UTR-01 | 0.0004 | 1.0000 |
| G*01:01:02:01 | UTR-02 | 0.1470 | 0.9956 |
| G*01:01:02:01 | UTR-10 | 0.0002 | 0.0015 |
| G*01:01:02:02 | UTR-02 | 0.0045 | 1.0000 |
| G*01:01:02:04 | UTR-02 | 0.0015 | 1.0000 |
| G*01:01:03:03 | UTR-07 | 0.0744 | 0.9957 |
| G*01:01:03:03 | UTR-31 | 0.0002 | 0.0029 |
| G*01:01:03:04 | UTR-07 | 0.0002 | 1.0000 |
| G*01:01:12 | UTR-02 | 0.0022 | 1.0000 |
| G*01:01:14 | UTR-02 | 0.0025 | 1.0000 |
| G*01:01:15 | UTR-06 | 0.0039 | 1.0000 |
| G*01:01:17 | UTR-02 | 0.0026 | 1.0000 |
| G*01:01:19 | UTR-02 | 0.0017 | 1.0000 |
| G*01:01:22:01 | UTR-02 | 0.0128 | 1.0000 |
| G*01:01:22:04 | UTR-02 | 0.0004 | 1.0000 |
| G*01:03:01:02 | UTR-56 | 0.0029 | 0.0510 |
| G*01:03:01:02 | UTR-17 | 0.0038 | 0.0662 |
| G*01:03:01:02 | UTR-48 | 0.0004 | 0.0076 |
| G*01:03:01:02 | UTR-05 | 0.0499 | 0.8752 |
| G*01:04:01:01 | UTR-03 | 0.1211 | 0.9799 |
| G*01:04:01:01 | UTR-02 | 0.0004 | 0.0035 |
| G*01:04:01:01 | UTR-13 | 0.0018 | 0.0148 |
| G*01:04:01:02 | UTR-53 | 0.0002 | 0.0227 |
| G*01:04:01:02 | UTR-03 | 0.0093 | 0.9773 |
| G*01:04:04 | UTR-23 | 0.0009 | 0.0171 |
| G*01:04:04 | UTR-03 | 0.0495 | 0.9829 |
| G*01:04:05 | UTR-03 | 0.0027 | 1.0000 |
| G*01:05N | UTR-02 | 0.0243 | 1.0000 |
| G*01:06:01:01 | UTR-02 | 0.0484 | 1.0000 |
| G*01:06:01:02 | UTR-02 | 0.0009 | 1.0000 |
| G*01:08:02 | UTR-02 | 0.0003 | 1.0000 |
| G*01:11 | UTR-03 | 0.0009 | 1.0000 |
| G*01:14 | UTR-05 | 0.0006 | 1.0000 |
| G*01:21N | UTR-03 | 0.0002 | 1.0000 |
| G*01:26 | UTR-02 | 0.0002 | 1.0000 |
[a] The global frequency of this haplotype.
[b] Considering all the 3’UTR haplotypes associated with a given genomic allele, this is the frequency with which this specific 3’UTR haplotype accompanies the given genomic allele. An internal frequency of 1.0000 indicates that the given genomic allele always presents the same 3’UTR sequence.
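The internal frequency defined above can be approximated from the global frequencies in the table, as in the following sketch. It divides each allele/3’UTR combination by the summed global frequency of all listed combinations for that allele. This is an illustration under assumptions: the published values were presumably derived from haplotype counts and may include combinations seen only once, so results recomputed from rounded frequencies will be close to, but not identical with, the table. The function name `internal_frequencies` is hypothetical.

```python
from collections import defaultdict


def internal_frequencies(global_freqs):
    """Map (allele, utr) -> share of that allele carried with that 3'UTR.

    `global_freqs` maps (allele, utr) pairs to their global frequency.
    """
    totals = defaultdict(float)
    for (allele, _utr), freq in global_freqs.items():
        totals[allele] += freq                       # total frequency per allele
    return {
        (allele, utr): freq / totals[allele]
        for (allele, utr), freq in global_freqs.items()
    }


# Global frequencies copied from the table above (rounded to 4 decimals).
pairs = {
    ("G*01:01:01:04", "UTR-20"): 0.0036,
    ("G*01:01:01:04", "UTR-18"): 0.0198,
    ("G*01:01:01:04", "UTR-06"): 0.0121,
    ("G*01:01:01:08", "UTR-01"): 0.0204,
}
internal = internal_frequencies(pairs)
```

An allele listed with a single 3’UTR haplotype, such as G*01:01:01:08, gets an internal frequency of 1, matching the table's convention.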