
Unveiling the Diversity of Immunoglobulin Heavy Constant Gamma (IGHG) Gene Segments in Brazilian Populations Reveals 28 Novel Alleles and Evidence of Gene Conversion and Natural Selection.

Verónica Calonga-Solís1, Danielle Malheiros1, Marcia Holsbach Beltrame1, Luciana de Brito Vargas1, Renata Montoro Dourado1, Hellen Caroline Issler1, Roseli Wassem2, Maria Luiza Petzl-Erler1, Danillo G Augusto1.   

Abstract

Even though immunoglobulins are critical for immune responses and human survival, the diversity of the immunoglobulin heavy chain gene (IGH) is poorly known and mostly characterized only by serological methods. Moreover, this genomic region is not well-covered in genomic databases and genome-wide association studies due to particularities that impose technical difficulties for its analysis. Therefore, the IGH gene has never been systematically sequenced across populations. Here, we deliver an unprecedented and comprehensive characterization of the diversity of the IGHG1, IGHG2, and IGHG3 gene segments, which encode the constant region of the most abundant circulating immunoglobulins: IgG1, IgG2, and IgG3, respectively. We used Sanger sequencing to analyze 357 individuals from seven different Brazilian populations, including five Amerindian, one Japanese-descendant and one Euro-descendant population samples. We discovered 28 novel IGHG alleles and provided evidence that some of them may have been originated by gene conversion between common alleles of different gene segments. The rate of synonymous substitutions was significantly higher than the rate of the non-synonymous substitutions for IGHG1 and IGHG2 (p = 0.01 and 0.03, respectively), consistent with purifying selection. Fay and Wu's test showed significant negative values for most populations (p < 0.001), which indicates that positive selection in an adjacent position may be shaping IGHG variation by hitchhiking of variants in the vicinity, possibly the regions that encode the Ig variable regions. This study shows that the variation in the IGH gene is largely underestimated. Therefore, exploring its nucleotide diversity in populations may provide valuable information for comprehension of its evolution, its impact on diseases and vaccine research.


Keywords:  DNA sequencing; IGHG genes; genetic diversity; immunoglobulin heavy chain; molecular characterization; populations

Year:  2019        PMID: 31214166      PMCID: PMC6558194          DOI: 10.3389/fimmu.2019.01161

Source DB:  PubMed          Journal:  Front Immunol        ISSN: 1664-3224            Impact factor:   7.561


Introduction

Immunoglobulins (Ig) are glycoproteins produced exclusively by activated B-lymphocytes and plasma cells that mediate the humoral response against pathogens. Each B-cell clone presents an antigen-specific membrane-bound immunoglobulin that, together with the CD79A and CD79B molecules, forms the B-cell receptor (BCR). After stimulation by antigens, B-cells secrete immunoglobulins (antibodies) with the same antigen-binding sites as the membrane-bound molecules. All Ig share a similar basic structure composed of four polypeptide chains: two identical heavy chains and two identical light chains. The heavy chain has one variable domain (VH) and three or four constant domains (CH1, CH2, CH3, and CH4). Each light chain exhibits one variable domain (VL) and one constant domain (CL). The variable region (VH and VL) is responsible for antigen recognition and binding while the constant regions (CH and CL) primarily mediate the Ig effector functions, which include complement activation and Fc receptor binding (1, 2). In humans, the immunoglobulin heavy chain gene (IGH) is located in chromosome region 14q32 and consists of four groups of gene segments: the variable heavy (IGHV), diversity heavy (IGHD), joining heavy (IGHJ), and constant heavy (IGHC). The IGHC group includes the IGHM, IGHD, IGHG3, IGHG1, IGHEP1, IGHA1, IGHGP, IGHG2, IGHG4, IGHE, and IGHA2 gene segments and pseudogenes. Immunoglobulin light chains are encoded by two different genes: lambda (IGL) at 22q11 and kappa (IGK) at 2p11.2 (3, 4). During B-cell development, the IGH gene undergoes a somatic rearrangement, in which only one IGHV, one IGHD, and one IGHJ gene segment are combined to form the Ig variable region VH. In contrast, during clonal expansion after activation of the B-cell, IGHC gene segments go through a process called class-switch recombination, which determines the Ig class and subclass: IgM, IgD, IgG (IgG1, IgG2, IgG3, and IgG4), IgA (IgA1 and IgA2), and IgE.
The human humoral immune response is mainly mediated by Ig gamma (IgG), which is subdivided into four subclasses, IgG1, IgG2, IgG3, and IgG4, ordered by decreasing abundance in peripheral blood (5). The constant regions of these four subclasses are encoded by the gene segments IGHG1, IGHG2, IGHG3, and IGHG4, respectively, the first three being the focus of this study. Each IGHG gene segment consists of three exons that encode the constant heavy domains (CH1, CH2, and CH3) and exon H, which encodes the hinge between the CH1 and CH2 domains (5). Most of the human IgG diversity in populations has only been characterized by serological methods, which defined the immunoglobulin allotypes at the protein level. Ig allotypes are polymorphic epitopes (resulting from nucleotide variation) on the Ig constant domain that provide binding sites for antibodies (6). Certain IgG allotypes have been associated with susceptibility to cancer, autoimmune and infectious diseases (7–9). Although the genetic variability of some IGHG gene segments has been characterized (10, 11), these segments have never been systematically sequenced at the nucleotide level across populations. Thus, their diversity is probably underestimated. Additionally, this genomic region is not well-covered in genome-wide studies and genomic databases for two reasons: first, the DNA samples used are often extracted from B-cell lines, which are not suitable for analyzing this region due to the somatic rearrangement within this locus; second, the high sequence similarity of these segments imposes technical difficulties for sequencing and genotyping (12). Here, we analyzed the diversity of IGHG1, IGHG2, and IGHG3 in seven Brazilian populations: five Amerindian populations that have been genetically isolated for centuries and two urban populations.
By analyzing deep sequencing data, we found 28 novel IGHG alleles, characterized the linkage disequilibrium of variants within these segments and analyzed the relationship among alleles. Additionally, we provided compelling evidence of the occurrence of gene conversion between different gene segments and evidence of purifying selection shaping IGHG diversity.

Methods

Characterization of the Study Populations

This study was approved by the Brazilian National Human Research Ethics Committee (CONEP), protocol number CAAE 02727412.4.0000.0096, in accordance with Brazilian federal laws. We analyzed a total of 357 individuals from seven Brazilian populations, of which five are Amerindian: Guarani Kaiowa (GKW, n = 46), Guarani Ñandeva (GND, n = 48), Guarani Mbya (GRC, n = 51), Kaingang from Ivaí (KIV, n = 52), and Kaingang from Rio das Cobras (KRC, n = 52); and two are urban populations: Japanese-descendants (BrJAP, n = 57) and Euro-descendants from Curitiba (CTBA, n = 51). Their detailed geographic locations and sample sizes are given in Figure 1 and Table S1.
Figure 1

Location of the study populations. KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendant from Curitiba; CTBA, Euro-descendants from Curitiba.

The Amerindian samples were collected between the late 1980s and early 1990s. According to public data from the Brazilian Institute of Geography and Statistics (IBGE), there are approximately 900,000 Amerindian individuals in Brazil, distributed across 693 official indigenous lands (https://www.ibge.gov.br). The Guarani speak a Tupi-Guarani language, which belongs to the Tupi language family. The Kaingang speak Jê, which belongs to the Macro-Jê language stock. Analyzing mtDNA segments and the proposed time of origin of the Tupi-Guarani and Jê linguistic families, Marrero and colleagues (13) estimated that the Guarani population split into three partialities (Guarani Kaiowa, Guarani Ñandeva and Guarani Mbya) 1,800 years ago, while the different Kaingang populations would have split more recently, around 200 years ago. Since then, they are believed to have remained isolated from each other and from urban populations due to strong cultural and language barriers (14). A former study from our group estimated that the gene flow between these Amerindian populations and non-Amerindians was low, being 0% in Guarani Kaiowa, 4% in Guarani Mbya, 14% in Guarani Ñandeva and 7% in Kaingang (15). The two urban samples were from Curitiba, the capital of Paraná State and the 5th largest city in Brazil. As a result of the Brazilian history of European colonization 500 years ago, and especially the more recent European migrations since the 19th century, the population of Curitiba is of predominantly European ancestry. According to public data from IBGE, 78.7% of the inhabitants of Curitiba self-declared as white, 16.7% as admixed, 3% as black, 1.4% as Asian, and 0.2% as Amerindian (https://www.ibge.gov.br).
The population referred to here as CTBA only included Euro-descendant individuals. Therefore, we excluded all individuals with known admixture with Amerindian and/or other non-European ancestries. Paraná State also hosts the second largest Japanese community in Brazil, one of the largest outside Japan. The Japanese migration started in the twentieth century with the Treaty of Friendship, Commerce and Navigation between Brazil and Japan. All Japanese-descendant individuals of this study (BrJAP) were born in Brazil, with either both parents or all four grandparents born in Japan. They reported no history of admixture with non-Japanese ethnicities.

DNA Extraction

Genomic DNA was extracted from peripheral blood samples by standard salting-out (16) or by the phenol-chloroform-isoamyl method (17). High-quality DNA has been stored at −80°C since the extraction. DNA integrity was evaluated by 1% agarose gel electrophoresis and purity was assessed by spectrophotometry.

Sequencing and Allele Identification

We aligned all previously known IGHG alleles and designed primers to amplify each segment specifically. To define the best set of primers we used the following approaches: (i) we ruled out nonspecific amplification by verifying that no amplicon exhibited any variant specific to other segments; (ii) we verified that the genotype distribution of every single nucleotide variable site, in each amplicon, was in accordance with Hardy-Weinberg equilibrium (p > 0.05). Polymerase chain reaction (PCR) was performed for IGHG1, IGHG2, and IGHG3 as follows: 1X Buffer (Invitrogen); 0.2 mM dNTP (Life Technologies); 1.5 mM MgCl2 (Invitrogen, Carlsbad, CA, USA); 0.3 μM of each primer; 0.05 U/μL Taq polymerase Platinum® (Invitrogen, Carlsbad, CA, USA); and 2 ng/μL genomic DNA. The segments were amplified in a Mastercycler ep Gradient S thermocycler (Eppendorf, Hamburg, Germany), with a first step at 94°C for 2 min and 10 cycles of 94°C for 15 s, TmA °C for 15 s and 72°C for 60 s, followed by 25 cycles of 94°C for 15 s, TmB °C for 15 s and 72°C for 60 s, with a final extension step of 72°C for 60 s (primer sequences, location, and amplification temperatures are available in Table S2 and Figure S1). Amplicons were visualized by 1% agarose gel electrophoresis with 1% UniSafe Dye® (Uniscience, Sao Paulo, Brazil). Afterwards, PCR products were purified with 0.8 U/μL of exonuclease I enzyme (Fermentas, Waltham, MA, USA) and 0.14 U/μL of alkaline phosphatase (ThermoFisher Scientific, Waltham, MA, USA). Sequencing was performed using Big Dye® Terminator Cycle Sequencing Standard v3.1 (Life Technologies, Carlsbad, CA, USA), according to the manufacturer's instructions.
The sequencing reactions were performed in a Mastercycler ep Gradient S thermocycler (Eppendorf, Hamburg, Germany) with a first step at 95°C for 60 s and 25 cycles of 95°C for 10 s, 50°C for 5 s, and 60°C for 4 min, followed by capillary electrophoresis in a 3500xl Genetic Analyzer Sequencer (Life Technologies, Carlsbad, CA, USA). After sequencing, the alleles were identified according to the known alleles described in the IMGT database (International ImMunoGeneTics Information System) (18). The IMGT database provides public access to an integrated information system specialized in immunoglobulins (Ig), T cell receptors (TCR), and major histocompatibility complex (MHC) genes and molecules. All data submitted to the IMGT database are manually checked by experts in the field, which ensures the deposit of high-quality data. The nucleotide sequence of each individual was aligned with consensus sequences with Mutation Surveyor® DNA Variant Analysis Software v5.0.1 (Softgenetics), and their variable sites were annotated. Alleles that were different from the ones listed in the IMGT database were considered novel and were subsequently confirmed by sequencing and/or molecular cloning as described below. The novel alleles that were observed in homozygosis (IGHG1*07, IGHG1*08, IGHG2*09, IGHG2*13, IGHG3*21, IGHG3*22, IGHG3*26) were confirmed by direct re-sequencing from a different PCR product. Novel alleles observed in heterozygosis without phasing ambiguities due to the presence of only one heterozygous position (IGHG1*06, IGHG1*09, IGHG1*10, IGHG1*12, IGHG1*13, IGHG1*14, IGHG2*07, IGHG2*10, IGHG2*12, IGHG3*20, IGHG3*27, IGHG3*28) were also confirmed by re-sequencing. The new variants with ambiguous phasing (IGHG1*11, IGHG2*08, IGHG2*11, IGHG2*14, IGHG2*15, IGHG3*23, IGHG3*24, IGHG3*25, IGHG3*29) were confirmed by molecular cloning.
In this case, the segments were re-amplified and ligated into a PTZ57R/T vector (Fermentas, Waltham, MA, USA) with terminal deoxynucleotidyl transferase (TdT) enzyme. Afterwards, recombinant plasmids were obtained and purified from multiple transformed colonies and sequenced as described above. Novel alleles were verified based on sequences from at least two independent colonies containing each allele.

Data Analysis

Allelic frequencies were obtained by direct counting using GenAlEx v6.502 software (19). Hardy-Weinberg equilibrium was tested for each gene segment in all populations by Guo and Thompson's method (20), performed in Arlequin v3.5.2 software (21). IGHG haplotypes from different gene segments were estimated via ELB algorithm and this information was used for Gm allotype haplotype inference, according to the correspondence between nucleotide variants and allotypes described by Lefranc et al. (6). Linkage disequilibrium (LD) between single nucleotide variants of each gene segment was estimated with Haploview software (22). Allele networks were performed with variants from each gene segment through the median-joining (MJ) algorithm (23) with Network v5.0 software. Allele frequencies were compared using the exact test of population differentiation (24) and population-pairwise FST (25, 26) with Arlequin v3.5.2 software (21). Principal component analysis (PCA) was performed using the Minitab 17 Statistics Software (27) for graphical representation of the genetic differences and similarities in the major components of variation among populations. The PCA was performed using inferred allotype haplotype frequencies to compare the frequencies from the study population with others that were previously described serologically. These haplotypes were classified according to Lefranc et al. (6), and detailed information is available in Table S3. Neutrality tests were performed using the Tajima's D (28), Fu and Li's D*, F*, D, and F (29) and Fay and Wu's H (30) in DnaSP software (31). Homologous gene segments from rhesus monkey were used as outgroup (Macaca mulatta; accession number: NW_001121238, AY292519, AY292512).
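As an illustration of allele frequency estimation by direct counting, the following minimal Python sketch counts alleles across diploid genotypes. It is not the GenAlEx implementation used in the study; the function name `allele_frequencies` and the four example genotypes are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by direct counting.

    genotypes: iterable of (allele_1, allele_2) tuples, one per diploid individual.
    Returns a dict mapping each allele to its relative frequency.
    """
    # Each individual contributes two allele copies to the count.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes for four individuals (not data from the study).
genotypes = [
    ("IGHG1*02", "IGHG1*02"),
    ("IGHG1*02", "IGHG1*07"),
    ("IGHG1*07", "IGHG1*07"),
    ("IGHG1*02", "IGHG1*03"),
]
freqs = allele_frequencies(genotypes)
for allele, f in sorted(freqs.items()):
    print(f"{allele}: {f:.3f}")
```

With n individuals the denominator is 2n allele copies, which is why a single observed copy in a sample of about 50 individuals corresponds to a frequency near 0.010.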

Results

One Novel Single Nucleotide Variant and 28 Novel IGHG Alleles Have Been Discovered

Within all three gene segments in the seven populations analyzed, we found a total of 49 exonic variable sites, of which 26 were non-synonymous substitutions. Based on the Grantham scale (32), which ranges from 5 to 215 according to the physicochemical distance between amino acid pairs, amino acid replacements were from low to moderate (15 < D < 103) (Table 1). Of the single variable sites, 21 have not been reported in any of the previously described alleles at the IMGT database (Table 1, in bold). We also found a novel synonymous IGHG3 single nucleotide variant at the position chr14:106235856 (GRCh37.p13 primary assembly) in the CTBA population. This new variant was submitted to the dbSNP database (34) under reference SNP ID number rs155533833 (NC_000014.8:g.106235856G>A).
Table 1

Variable sites found in IGHG1, IGHG2, and IGHG3 gene segments.

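| Gene segment | Exon | rsID | Location (a) | IMGT numbering (b) | Eu numbering (c) | Nucleic acid substitution | Amino acid substitution | Allotype | Grantham's D (d) | Frequency (e) |
|---|---|---|---|---|---|---|---|---|---|---|
| IGHG1 | CH1 | rs11552998 | 106209340 | 19 | 140 | G>A | | | | 0.006 |
| IGHG1 | CH1 | rs17850096 | 106209289 | 40 | 157 | G>C | | | | 0.001 |
| IGHG1 | CH1 | rs1071803 | 106209119 | 120 | 214 | A>G | K>R | Gm17>Gm3 | 26 | 0.143 |
| IGHG1 | CH2 | rs587690960 | 106208471 | 22 | 260 | A>G | | | | 0.003 |
| IGHG1 | CH2 | rs377538050 | 106208364 | 84.3 | 296 | A>T | Y>F | | 22 | 0.001 |
| IGHG1 | CH2 | rs193160354 | 106208327 | 91 | 308 | C>T | | | | 0.011 |
| IGHG1 | CH2 | rs1043109 | 106208326 | 92 | 309 | C>G | L>V | | 32 | 0.011 |
| IGHG1 | CH2 | rs1043249 | 106208306 | 98 | 315 | T>C | | | | 0.011 |
| IGHG1 | CH3 | rs11557940 | 106208107 | 5 | 349 | C>T | | | | 0.001 |
| IGHG1 | CH3 | rs1045853 | 106208086 | 12 | 356 | T>G | D>E | Gm1>nGm1 | 45 | 0.118 |
| IGHG1 | CH3 | rs11621259 | 106208082 | 14 | 358 | C>A | L>M | Gm1>nGm1 | 15 | 0.118 |
| IGHG1 | CH3 | rs17841087 | 106207933 | 86 | 407 | C>T | | | | 0.114 |
| IGHG1 | CH3 | rs113804727 | 106207862 | 110 | 431 | C>G | A>G | nGm2>Gm2 | 60 | 0.270 |
| IGHG1 | CH3 | rs370028332 | 106207858 | 112 | 432 | G>C | | | | 0.001 |
| IGHG1 | CH3 | rs8011686 | 106207843 | 117 | 437 | G>A | | | | 0.003 |
| IGHG1 | CH3 | rs12879979 | 106207822 | 124 | 444 | T>C | | | | 0.106 |
| IGHG2 | CH1 | rs189328740 | 106111071 | 15 | 136 | C>T | | | | 0.023 |
| IGHG2 | CH1 | rs587648672 | 106111069 | 16 | 137 | A>G | E>G | | 98 | 0.023 |
| IGHG2 | CH1 | rs773818177 | 106111067 | 17 | 138 | A>G | S>G | | 56 | 0.023 |
| IGHG2 | CH1 | rs11557955 | 106110966 | 82 | 171 | A>G | | | | 0.157 |
| IGHG2 | CH1 | rs11627594 | 106110914 | 92 | 189 | C>A | P>T | | 38 | 0.105 |
| IGHG2 | CH2 | rs8009156 | 106110137 | 45.1 | 282 | G>A | V>M | Gm(.)>Gm23 | 21 | 0.103 |
| IGHG2 | CH2 | rs11160859 | 106110057 | 91 | 308 | T>C | | | | 0.163 |
| IGHG2 | CH2 | rs113678609 | 106110056 | 92 | 309 | G>C | V>L | | 32 | 0.003 |
| IGHG2 | CH3 | rs587682450 | 106109825 | 9 | 353 | A>C | | | | 0.017 |
| IGHG2 | CH3 | rs4983499 | 106109752 | 38 | 378 | G>T | A>S | | 99 | 0.003 |
| IGHG2 | CH3 | rs368359789 | 106109708 | 79 | 392 | G>C | K>N | | 94 | 0.001 |
| IGHG2 | CH3 | rs1049810 | 106109702 | 81 | 394 | A>G | | | | 0.054 |
| IGHG2 | CH3 | rs28371022 | 106109573 | 117 | 437 | G>A | | | | 0.106 |
| IGHG3 | CH1 | rs2983777 | 106237642 | 30 | 151 | C>A | | | | 0.001 |
| IGHG3 | CH1 | rs12050095 | 106237624 | 40 | 157 | G>A | | | | 0.025 |
| IGHG3 | CH2 | rs138869693 | 106236202 | 35 | 271 | C>T | | | | 0.006 |
| IGHG3 | CH2 | rs145035200 | 106236195 | 38 | 274 | C>A | Q>K | | 53 | 0.006 |
| IGHG3 | CH2 | rs74093865 | 106236143 | 82 | 291 | C>T | P>L | nGm21>Gm21 | 98 | 0.797 |
| IGHG3 | CH2 | rs60746425 | 106236141 | 83 | 292 | C>T | R>W | nGm16>Gm16 | 101 | 0.048 |
| IGHG3 | CH2 | rs12890621 | 106236128 | 84.3 | 296 | A>T | Y>F | | 22 | 0.123 |
| IGHG3 | CH2 | rs201027762 | 106236035 | 110 | 327 | C>G | A>G | | 60 | 0.006 |
| IGHG3 | CH2 | rs141959627 | 106236000 | 124 | 339 | A>G | T>A | | 58 | 0.006 |
| IGHG3 | CH3 | rs189025987 | 106235895 | 1.4 | 341 | A>G | | | | 0.001 |
| IGHG3 | CH3 | rs147594653 | 106235874 | 4 | 348 | G>A | | | | 0.001 |
| IGHG3 | CH3 | *rs155533833 | 106235856 | 10 | 354 | C>T | | | | 0.001 |
| IGHG3 | CH3 | rs113169458 | 106235783 | 39 | 379 | G>A | V>M | nGm15>Gm15 | 21 | 0.047 |
| IGHG3 | CH3 | rs77307099 | 106235767 | 44 | 384 | G>A | S>N | Gm11>nGm11 | 46 | 0.799 |
| IGHG3 | CH3 | rs78376194 | 106235766 | 44 | 384 | C>T | | Gm11>nGm11 | | 0.799 |
| IGHG3 | CH3 | rs587739524 | 106235758 | 45.2 | 387 | C>G | P>R | | 103 | 0.003 |
| IGHG3 | CH3 | rs149653267 | 106235742 | 79 | 392 | C>G | N>K | | 94 | 0.052 |
| IGHG3 | CH3 | rs139413052 | 106235729 | 84 | 397 | A>G | M>V | Gm14>nGm14 | 21 | 0.048 |
| IGHG3 | CH3 | rs4042056 | 106235614 | 115 | 435 | G>A | R>H | Gm5>nGm5 | 29 | 0.085 |
| IGHG3 | CH3 | rs1051112 | 106235611 | 116 | 436 | T>A | F>Y | Gm5>nGm5 | 22 | 0.847 |

In bold, variant sites that have not been observed in any allele listed in the IMGT database. *Novel single nucleotide variant identified in this study.

(a) Coordinate on chromosome 14 (GRCh37.p13 primary assembly).

(b) Amino acid position according to the IMGT database (International ImMunoGeneTics Information System) (18).

(c) According to Edelman et al. (33).

(d) Physicochemical distance between the amino acids involved in the substitution, according to Grantham (32). The higher the value, the greater the difference, ranging from 5 to 215.

(e) Frequency of the alternative allele, merging all the samples of this study.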

A total of 28 novel IGHG alleles have been found in our study: nine in IGHG1 (Table 2), nine in IGHG2 (Table 3), and ten in IGHG3 (Table 4). All novel alleles have been confirmed either by sequencing or by molecular cloning followed by sequencing. Novel alleles have been submitted to the IMGT Nomenclature Committee (18), which verified the accuracy of our data and assigned official names (reports #2018-2-0824 and #2018-5-1113).
Table 2

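IGHG1 alleles previously described and the 9 novel IGHG1 alleles identified in this study.

| Exon | | | CH1 | CH1 | CH1 | CH2 | CH2 | CH2 | CH2 | CH2 | CH2 | CH3 | CH3 | CH3 | CH3 | CH3 | CH3 | CH3 | CH3 | CH3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMGT unique numbering | | | 19 | 40 | 120 | 22 | 84.3 | 85.1 | 91 | 92 | 98 | 5 | 12 | 14 | 86 | 101 | 110 | 112 | 117 | 124 | |
| Eu numbering (a) | | | 140 | 157 | 214 | 260 | 296 | 301 | 308 | 309 | 315 | 349 | 356 | 358 | 407 | 422 | 431 | 432 | 437 | 444 | |
| Amino acid change | | | A | S | K>R | T | Y>F | R | V | L>V | N | Y | D>E | L>M | Y | C>I | A>G | L | T | S | |
| Exonic position | | | 68 | 119 | 289 | 89 | 196 | 212 | 233 | 234 | 254 | 26 | 47 | 51 | 200 | 243 | 271 | 275 | 290 | 311 | |
| Consensus nucleotide | | | G | G | A | A | A | C | C | C | T | C | T | C | C | G | C | G | G | T | |
| Allele name | GenBank accession number | Allotype (b) | | | | | | | | | | | | | | | | | | | # (c) |
| IGHG1*01 | | 17,1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 0 |
| IGHG1*02 | | 17,1 | . | . | . | . | . | T | . | . | . | . | . | . | . | . | . | . | . | . | 391 |
| IGHG1*03 | | 3 | . | . | G | . | . | T | . | . | . | . | G | A | T | . | . | . | . | C | 81 |
| IGHG1*04 | | 17,1,27 | . | . | . | . | . | T | . | . | . | . | . | . | . | A | . | . | . | . | 0 |
| IGHG1*05 | | 17,1 | A | . | . | . | . | T | . | . | . | . | . | . | . | . | . | . | A | . | 2 |
| IGHG1*06 | MG920252 | 3 | . | . | G | . | . | T | . | . | . | . | G | A | T | . | . | C | . | C | 1 |
| IGHG1*07 | MG920245 | 17,1,2 | . | . | . | . | . | T | . | . | . | . | . | . | . | . | G | . | . | . | 189 |
| IGHG1*08 | MG920246 | 3,1 | . | . | G | . | . | T | . | . | . | . | . | . | . | . | . | . | . | . | 18 |
| IGHG1*09 | MG920247 | 17,1 | A | . | . | . | . | T | . | . | . | . | . | . | . | . | . | . | . | . | 2 |
| IGHG1*10 | MG920248 | 17,1 | . | . | . | G | . | T | . | . | . | . | . | . | . | . | . | . | . | . | 2 |
| IGHG1*11 | MG920249 | 17,1 | . | . | . | . | . | T | T | G | C | . | . | . | . | . | . | . | . | . | 7 |
| IGHG1*12 | MG920250 | 17,1 | . | . | . | . | . | T | . | . | . | T | . | . | . | . | . | . | . | . | 1 |
| IGHG1*13 | MG920251 | 17,1 | . | . | . | . | T | T | . | . | . | . | . | . | . | . | . | . | . | . | 1 |
| IGHG1*14 | MG920253 | 17,1 | . | C | . | . | . | T | . | . | . | . | . | . | . | . | . | . | . | . | 1 |

Novel alleles (in bold) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by the IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18). Dots represent the consensus nucleotide.

(a) According to Edelman et al. (33).

(b) Allotypes were inferred according to Lefranc et al. (6).

(c) Number of copies observed in this study.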

Table 3

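IGHG2 alleles previously described and the 9 novel IGHG2 alleles identified in this study.

| Exon | | | CH1 | CH1 | CH1 | CH1 | CH1 | CH1 | CH1 | CH1 | CH2 | CH2 | CH2 | CH3 | CH3 | CH3 | CH3 | CH3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMGT unique numbering | | | 15 | 16 | 17 | 19 | 82 | 92 | 95 | 96 | 45.1 | 91 | 92 | 9 | 38 | 79 | 81 | 117 | |
| Eu numbering (a) | | | 136 | 137 | 138 | 140 | 171 | 189 | 192 | 193 | 282 | 308 | 309 | 353 | 378 | 392 | 394 | 437 | |
| Amino acid change | | | S | E>G | S>G | A | P | P>T | N>S | F>L | V>M | V | V>L | P | A>S | K>N | T | T | |
| Exonic position | | | 56 | 58 | 60 | 68 | 161 | 213 | 223 | 227 | 150 | 230 | 231 | 38 | 111 | 155 | 161 | 290 | |
| Consensus nucleotide | | | C | A | A | C | A | C | A | C | G | T | G | A | G | G | A | G | |
| Allele name | GenBank accession number | Allotype (b) | | | | | | | | | | | | | | | | | # (c) |
| IGHG2*01 | | (..) | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | 0 |
| IGHG2*02 | | 23 | . | . | . | G | G | A | . | . | A | C | . | . | . | . | . | A | 71 |
| IGHG2*03 | | (..) | . | . | . | G | . | . | . | . | . | . | . | . | . | . | . | . | 557 |
| IGHG2*04 | | (..) | . | . | . | G | . | . | G | G | . | . | . | . | . | . | . | . | 0 |
| IGHG2*05 | | (..) | . | . | . | G | . | . | . | . | . | . | . | . | . | . | G | . | 0 |
| IGHG2*06 | | (..) | . | . | . | G | G | . | . | . | . | C | . | . | T | . | . | A | 2 |
| IGHG2*07 | MH025828 | (..) | . | . | . | G | . | . | . | . | . | C | . | . | . | . | . | . | 4 |
| IGHG2*08 | MH025829 | (..) | . | . | . | G | G | . | . | . | . | C | . | . | . | . | G | . | 30 |
| IGHG2*09 | MH025830 | (..) | T | G | G | G | . | . | . | . | . | . | . | . | . | . | . | . | 16 |
| IGHG2*10 | MH025831 | (..) | . | . | . | G | . | . | . | . | . | . | . | C | . | . | . | . | 12 |
| IGHG2*11 | MH025832 | (..) | . | . | . | G | G | . | . | . | . | C | C | . | . | . | G | . | 1 |
| IGHG2*12 | MH025833 | (..) | . | . | . | G | G | . | . | . | . | . | . | . | . | . | . | . | 3 |
| IGHG2*13 | MH025834 | (..) | . | . | . | G | . | . | . | . | . | C | . | . | . | . | G | . | 8 |
| IGHG2*14 | MH025835 | (..) | . | . | . | G | G | . | . | . | . | C | C | . | . | . | . | . | 1 |
| IGHG2*15 | MH025836 | (..) | . | . | . | G | G | . | . | . | . | C | . | . | . | C | G | . | 1 |

Novel alleles (in bold) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by the IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18). Dots represent the consensus nucleotide.

(a) According to Edelman et al. (33).

(b) Allotypes were inferred according to Lefranc et al. (6).

(c) Number of copies observed in this study.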

Table 4

IGHG3 alleles previously described and the 10 novel IGHG3 alleles identified in this study. Novel alleles (in bold) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by the IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18). Dots represent the consensus nucleotide; abs, absent. According to Edelman et al. (33). Allotypes were inferred according to Lefranc et al. (6). Number of copies observed in this study.

Interestingly, some new alleles of all gene segments were observed at high frequency (f > 0.10; Table 5). The highest frequencies for novel alleles were observed for IGHG1*07 in GKW (f = 0.478; 34 individuals), IGHG1*08 in BrJAP (f = 0.155; 15 individuals), IGHG2*08 in BrJAP (f = 0.202; 23 individuals), IGHG2*09 in GRC (f = 0.137; 9 individuals), IGHG3*21 in BrJAP (f = 0.158; 16 individuals), and IGHG3*22 in GRC (f = 0.157; 15 individuals).
Table 5

One third of the novel IGHG alleles were observed in high frequencies (0.05 < f < 0.48).

GKWGNDGRCKIVKRCBrJAPCTBA
Sample size46485150525556
HW p-value10.0860.9120.8360.8890.5300.530
IGHG1*020.5220.4380.7250.7700.6060.6000.228
IGHG1*030.0940.0200.0200.0290.707
IGHG1*050.022
IGHG1*060.011
IGHG1*070.4780.4690.2350.2000.3650.1360.033
IGHG1*080.0100.155
IGHG1*090.018
IGHG1*100.018
IGHG1*110.064
IGHG1*120.010
IGHG1*130.009
IGHG1*140.010
Sample size46485152525747
HW p-value111110.2320.146
IGHG2*020.0830.0200.0380.0290.1140.436
IGHG2*030.9570.8850.7350.9420.9520.5790.489
IGHG2*060.021
IGHG2*070.0330.010
IGHG2*080.0100.0190.0100.2020.032
IGHG2*090.0110.0100.137
IGHG2*100.0210.098
IGHG2*110.009
IGHG2*120.0180.011
IGHG2*130.0610.011
IGHG2*140.009
IGHG2*150.009
Sample size46485152515751
AlleleHW p-value-11110.9510.519
IGHG3*010.020
IGHG3*100.010
IGHG3*110.0940.0100.0380.0200.588
IGHG3*120.069
IGHG3*141.0000.8440.8330.9520.9710.4740.186
IGHG3*150.009
IGHG3*160.0100.039
IGHG3*190.272
IGHG3*200.018
IGHG3*210.158
IGHG3*220.0630.1570.0100.029
IGHG3*230.009
IGHG3*240.018
IGHG3*250.035
IGHG3*260.039
IGHG3*270.010
IGHG3*280.010
IGHG3*290.009

Novel alleles (in bold) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by the IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18).

Because most of the previous studies only described the immunoglobulin heavy chain diversity serologically, we inferred the serological Gm allotypes from our nucleotide sequence data, based on the nucleotide sequence description for each previously reported allotype (6), to allow comparison with previously reported variants. For example, the most frequent allele haplotype (alleles that are on the same chromosome and inherited together in a block) was the one comprising the gene segments IGHG1*02, IGHG2*03, IGHG3*14 (f = 0.182 to 0.740), which encodes the Gm haplotype “C” Gm21,26,27,28;17,1;(.), the most frequent IgG allotype haplotype in our populations (f = 0.21 to 0.77; Table 6). The correspondence between allele haplotypes and allotype haplotypes is given in Table S4. More than one IGHG allele haplotype can define a single Gm allotype haplotype, as is the case of the Gm haplotype “B” Gm5,10,11,13,14,26,27;3;(.), which is encoded in our data by the allele haplotype IGHG3*11,IGHG1*03,IGHG2*03, by IGHG3*11,IGHG1*14,IGHG2*03, or by IGHG3*11,IGHG1*03,IGHG2*08. In order to simplify the interpretation of the data, Gm haplotype identifiers (from A to M) were used as suggested by Lefranc et al. (6).
Table 6

Gm allotype haplotypes frequencies inferred from nucleotide sequencing.

GKWGNDGRCKIVKRCBrJAPCTBA
Sample size46485150515544
IDaGm haplotypes
A5,10,11,13,14,26,27;3;230.0830.010.020.020.444
B5,10,11,13,14,26,27;3;(.)0.010.277
C21,26,27,28;17,1;(.)0.5220.3750.5880.770.6080.3840.211
D21,26,27,28;17,1,2;(.)0.4780.4680.2350.20.3630.1340.011
I10,11,13,15,16,27;17,1;(.)0.286
J5,10,11,13,14,26,27;3,1;230.010.107
K5,10,11,13,14,26,27;3,1;(.)0.045
21,26,27,28;3;230.01
5,10,11,13,14,26,27;17,1;230.009
10,11,13,16,27;17,1;(.)0.009
21,27;17,1,2;(.)0.022
21,27;17,1;(.)0.0630.1570.010.011
5,10,11,13,14,26,27;17,1;(.)0.022

Allotype haplotype ID are as described by Lefranc et al. (6).

KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendants; CTBA, Euro-descendants from Curitiba.


Lower IGHG Diversity Was Observed in Amerindians and Frequencies Differed Significantly Among Populations

IGHG allelic frequencies varied across populations (Table 5). A small number of highly frequent alleles were observed for all gene segments in Amerindian populations. Even though Guarani populations share a more recent common ancestor, allelic frequencies significantly differed among them (p < 0.01), with low to moderate FST values (0.02–0.10) (Table 7). Allelic frequencies did not differ between the two Kaingang populations (p = 0.065; FST = 0.03). More conspicuous differences were found between the Japanese-descendant and Euro-descendant populations compared to each other, and between each of these two populations compared to the Amerindian populations, with FST values ranging from 0.11 to 0.52, indicating moderate to high genetic differentiation.
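To make the FST scale concrete, the sketch below computes a simple heterozygosity-based fixation index, FST = (HT − HS) / HT, for one multi-allelic locus in two equally weighted populations. This is a simplified estimator chosen for illustration and is not the pairwise FST procedure implemented in Arlequin; the function names and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_a, pop_b):
    """Heterozygosity-based FST for two populations with equal weights.

    pop_a, pop_b: dicts mapping allele name to frequency (each summing to 1).
    """
    alleles = set(pop_a) | set(pop_b)
    # Allele frequencies in the pooled (total) population.
    pooled = {a: (pop_a.get(a, 0.0) + pop_b.get(a, 0.0)) / 2 for a in alleles}
    h_t = expected_heterozygosity(pooled)
    # Mean within-population expected heterozygosity.
    h_s = (expected_heterozygosity(pop_a) + expected_heterozygosity(pop_b)) / 2
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical frequencies (not data from the study).
pop_a = {"A": 0.9, "B": 0.1}
pop_b = {"A": 0.1, "B": 0.9}
print(round(fst(pop_a, pop_b), 2))
print(fst(pop_a, pop_a))
```

Identical frequency profiles give a value of zero, while populations dominated by different alleles approach one, which is the sense in which the values in Table 7 are read as low, moderate, or high differentiation.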
Table 7

Genetic differentiation for IGHG1, IGHG2, and IGHG3 among populations.

GKWGNDGRCKIVKRCBrJAPCTBA
GKW********ns******
GND0.02828**************
GRC0.107380.07720************
KIV0.104940.111200.05722ns******
KRC0.010420.038160.068890.03220******
BrJAP0.214960.151440.114370.184920.19168***
CTBA0.515770.381420.410230.512700.501340.28576

Upper diagonal: the statistical significance of the exact test of population differentiation between pairs of populations. **p < 0.001–0.01; ***p < 0.001; ns p > 0.05. Lower diagonal: FST.

The principal component analysis (PCA) grouping was consistent with ancestry and geography (Figure 2). Amerindians and Asians formed two separate groups close to each other. Europeans and admixed populations of predominantly European ancestry grouped together, while Africans were more distant.
Figure 2

Principal component analysis using Gm allotype haplotype frequencies was consistent with geography and ancestry. For comparisons with previously described populations, we inferred the Gm allotype frequencies based on the observed nucleotide sequences, according to Lefranc et al. (6). Circles represent population data from the literature and squares represent populations from the present study. All frequencies reported in the literature are listed in Table S3. AFR, African populations; AMER, Amerindian populations; ASIA, Asian populations; EUR, European populations; EUR-BR, Euro-descendant populations from Brazil; ADM-BR, Admixed population from Brazil; KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendants; CTBA, Euro-descendants from Curitiba.

Genotypic distributions for all gene segments were in accordance with Hardy-Weinberg equilibrium in all population samples (0.08 < p < 1).
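The Hardy-Weinberg check can be illustrated with a minimal sketch. It assumes a single biallelic site and a chi-square goodness-of-fit test with one degree of freedom; the test actually applied to the multi-allelic IGHG segments is not specified in this section, and the function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic site.

    Returns (chi2, p_value); the p-value uses the chi-square distribution
    with 1 degree of freedom, whose survival function is erfc(sqrt(x / 2)).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic site)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts in exact Hardy-Weinberg proportions
chi2_fit, p_fit = hwe_chi_square(25, 50, 25)
# Hypothetical sample with no heterozygotes at all
chi2_dev, p_dev = hwe_chi_square(50, 0, 50)
```

A large p-value, as in the first call, means the genotype counts give no evidence of departure from equilibrium, which is the situation reported for all population samples.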

Distinct Linkage Disequilibrium Patterns Among Populations

Linkage disequilibrium (LD) patterns differed among populations (Figure S2). Interestingly, each Guarani population exhibited a distinct LD pattern despite their close relationship. In GKW, only five variable sites were observed across all three gene segments, of which three were in absolute LD (D′ = 1, r² = 1). In contrast, more variable sites (21 and 24) were observed for the other two Guarani populations. In addition, many variants that were in LD in GND were not in LD in GRC. The G1m3 allotype (rs1071803) and the G2m23 allotype (rs8009156) were in strong LD in all Amerindian populations (D′ = 1; r² > 0.87), as well as in CTBA (D′ = 1; r² = 0.43) and BrJAP (D′ = 0.92; r² = 0.73), in which fewer SNPs were observed in strong LD.
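The two LD measures quoted above can be sketched as follows, assuming two biallelic sites and known haplotype frequencies. The function name `ld_stats` and the input frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical case 1: the two sites always co-occur (absolute LD)
absolute = ld_stats(0.5, 0.5, 0.5)
# Hypothetical case 2: allele A is found only with B, but B is much more common
complete_only = ld_stats(0.2, 0.2, 0.6)
```

The second case shows why D′ = 1 can coincide with a modest r²: D′ reaches 1 whenever one of the four haplotypes is missing, whereas r² reaches 1 only when the allele frequencies at the two sites also match. For the same reason, r² can never exceed D′².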

Sequence Analysis Suggests That Gene Conversion Between Frequent Alleles of Different Gene Segments Generated Novel Alleles

The median-joining network (Figure 3) shows that the most frequent alleles, IGHG1*02, IGHG2*03, and IGHG3*14, were central nodes in the network, differing from the other alleles by few nucleotides. The loops indicate possible recombination sites.
Figure 3

Relationship of IGHG alleles. Median-joining network of all IGHG1 (A), IGHG2 (B) and IGHG3 (C) alleles. Each circle (node) represents an allele and the size of each circle is proportional to the allele frequency. Numbers in the branches indicate the exon and the exonic position of nucleotide differences between alleles. The mv nodes (median vectors) are possible unsampled or extinct ancestral sequences generated by the MJ algorithm to connect the alleles. Alleles IGHG3*11 and IGHG3*12 (C) were grouped because they do not differ in nucleotide sequence, except for the hinge size. KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendants; CTBA, Euro-descendants from Curitiba; NS, not sampled (alleles not observed in this study). The occurrence of multiple mutations in the same positions in different gene segments is extremely unlikely. In addition, sequence homology and tandem positioning favor unequal crossing over between highly frequent alleles. Therefore, based on the multiple alignments, we suggest that (D) the IGHG2*09 allele could be a product of gene conversion between IGHG2*03 and IGHG3*14 at positions 56 (T), 58 (G), and 60 (G) of the CH1 exon; and (E) IGHG1*11 could be a product of gene conversion between IGHG1*02 and IGHG2*03 at positions 233 (T), 234 (G), and 254 (C) of the CH2 exon.

Alignment of all the known alleles of the IGHG1, IGHG2, and IGHG3 gene segments suggests that some novel alleles discovered in this study could have been generated by gene conversion between alleles of different gene segments (Figure 3). For example, the novel allele IGHG1*11, present in BrJAP (f = 0.064), could have been generated by gene conversion between the most frequent IGHG2 allele (IGHG2*03; f = 0.579) and the most frequent IGHG1 allele (IGHG1*02; f = 0.60). In addition, gene conversion between the frequent IGHG2*03 and IGHG3*14 alleles (f = 0.735 and f = 0.833, respectively) could explain the origin of allele IGHG2*09 (f = 0.14).
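The alignment-based reasoning can be sketched as a simple comparison of three aligned sequences. This is only an illustration under stated assumptions: the sequences are short hypothetical strings, not IGHG sequences, and the helper names `conversion_candidates` and `unexplained_sites` are invented here. A cluster of sites at which the novel allele departs from its presumed parent but matches the paralogous donor is the pattern that a conversion tract would leave.

```python
def conversion_candidates(parent, donor, novel):
    """1-based positions where `novel` differs from `parent` but matches `donor`."""
    if not (len(parent) == len(donor) == len(novel)):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (p, d, n) in enumerate(zip(parent, donor, novel))
            if n != p and n == d]


def unexplained_sites(parent, donor, novel):
    """1-based positions where `novel` matches neither sequence (would need new mutations)."""
    if not (len(parent) == len(donor) == len(novel)):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (p, d, n) in enumerate(zip(parent, donor, novel))
            if n != p and n != d]


# Hypothetical aligned sequences (not real IGHG alleles)
parent_allele = "AACCGGTTAA"   # frequent allele of the same gene segment
donor_allele = "GACTGATTAA"    # frequent allele of a paralogous gene segment
novel_allele = "AACTGATTAA"    # novel allele under investigation

tract = conversion_candidates(parent_allele, donor_allele, novel_allele)
leftover = unexplained_sites(parent_allele, donor_allele, novel_allele)
```

When every difference from the parent is explained by the donor and the sites are close together, a single conversion event is a more parsimonious explanation than several independent point mutations.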

Neutrality Tests Suggest Evidence of Natural Selection Shaping IGHG Polymorphism

Neutrality tests performed by Tajima's D, Fu and Li's D and F were non-significant for most populations. However, Fay and Wu's test resulted in significant negative values for most populations, which may indicate positive selection at an adjacent site (Table 8).
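For readers unfamiliar with Fay and Wu's statistic, the sketch below computes H from the number of chromosomes carrying the derived allele at each segregating site, using the definitions H = θπ − θH, θπ = Σ 2·i·(n − i)/(n(n − 1)) and θH = Σ 2·i²/(n(n − 1)). It assumes that ancestral states are known, and it does not reproduce the coalescent simulations used to assess significance. The function name `fay_wu_h` and the example counts are hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for a sample of n chromosomes.

    derived_counts: for each segregating site, the number of chromosomes
    (1 .. n-1) that carry the derived allele.
    """
    if any(i <= 0 or i >= n for i in derived_counts):
        raise ValueError("each derived count must be between 1 and n - 1")
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h


# Hypothetical sample of 10 chromosomes with three segregating sites
h_high = fay_wu_h([9, 9, 8], 10)   # derived alleles at high frequency
h_low = fay_wu_h([1, 1, 2], 10)    # derived alleles at low frequency
```

Derived alleles at high frequency weigh heavily on θH and drive H negative, which is the direction of the significant values reported here and the pattern expected under hitchhiking.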
Table 8

Fay and Wu's test was significant in the majority of the study populations.

Population   GKW   GND   GRC   KIV   KRC   BrJAP   CTBA
2n           92    96    102   100   102   116     98

Gene segment   Test             Significant values
IGHG1          Tajima's D       –
               Fu and Li's D    –
               Fu and Li's F    –
               Fay and Wu's H   −2.455*, −3.400**, −3.384**, −3.490**, −2.36*
IGHG2          Tajima's D       −1.680***, −1.716***
               Fu and Li's D    −2.696**
               Fu and Li's F    −2.793**
               Fay and Wu's H   −1.768**, −2.745*, −4.971**, −5.241***, −5.518***
IGHG3          Tajima's D       −1.825***
               Fu and Li's D    −3.615**
               Fu and Li's F    −3.309**
               Fay and Wu's H   −4.310**, −7.253***, −5.670***, −6.307*, −7.372**

Significant values are listed in population order (GKW to CTBA); non-significant results (–, p > 0.05) are not listed individually. Statistical significance was tested by coalescent simulations with 10,000 repetitions: *0.01 < p < 0.05; **0.001 < p < 0.01; ***p < 0.001. KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendants; CTBA, Euro-descendants from Curitiba.

Deviation from neutrality was also tested by analyzing synonymous and non-synonymous substitution rates across all the known and novel alleles of all gene segments (Tables S5–S7). Overall, the rate of synonymous substitutions (dS) was significantly higher than the rate of non-synonymous substitutions (dN) (dN/dS < 1) for IGHG1 and IGHG2 (p = 0.01 and 0.032, respectively) (Table 9), consistent with purifying selection.
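The distinction between synonymous and non-synonymous changes can be made concrete with a minimal sketch that classifies codon differences between two aligned coding sequences using the standard genetic code. It is a deliberate simplification: it counts raw differences and does not normalize by the numbers of synonymous and non-synonymous sites, as a full codon-based dN/dS test would. The function name `classify_codon_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq_a, seq_b):
    """Count single-nucleotide codon differences as (synonymous, non_synonymous)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue  # identical codons, or multi-hit codons that need pathway averaging
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical in-frame sequences: Gln>Lys, Pro>Pro (silent) and Tyr>Phe
syn_count, nonsyn_count = classify_codon_changes("CAGCCTTAC", "AAGCCCTTC")
```

Under purifying selection, amino acid replacements are removed more often than silent changes, so after correcting for the number of available sites the non-synonymous rate falls below the synonymous rate.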
Table 9

Codon-based test indicates purifying selection shaping IGHG1 and IGHG2 variation.

                                IGHG1       IGHG2       IGHG3
Number of alleles               15          14          29
Purifying selection (dN < dS)   p = 0.010   p = 0.032   ns
Positive selection (dN > dS)    ns          ns          ns

ns, p > 0.05.

Discussion

Our main goal was to deliver an unprecedented and comprehensive nucleotide sequencing-based characterization of the IGHG gene segments in populations of different ancestries. Before this study, only 30 IGHG alleles had been described for IGHG1, IGHG2, and IGHG3 together (18). Here, we report the discovery of 28 novel alleles, of which 16 were in a single population sample of Japanese descendants (n = 57) and seven in one population sample of Euro-descendants (n = 51). It is interesting that even in Amerindian populations, which exhibited a limited diversity, seven new alleles were found. This is clear evidence that the diversity of IGHG is far from being fully described and that possibly a much larger number of novel alleles will be discovered as more populations are interrogated.

We focused on the segments that code for the most abundant Ig in serum. Considering the homology and high sequence similarity, a different strategy would be needed for the precise characterization of IGHG4 due to the high frequency of duplications observed for this gene segment (35).

Some of the new alleles were highly frequent. The novel allele IGHG3*22, frequent in Guarani Mbya (GRC, f = 0.157), exhibited a lower frequency in Guarani Ñandeva (GND, f = 0.063), and was absent in Guarani Kaiowa (GKW). These three populations share a more recent common ancestor and the differences observed can be explained by their demographic history and genetic drift. Demographic factors played a major role in shaping the diversity of other genes important for immune responses in these same Amerindian populations (36). Genetic drift, particularly founder effect and bottleneck, may explain the lower diversity of IGHG in Amerindians and the fluctuation of their allelic frequencies. On the other hand, the IGHG3*22 allele was observed in only one Kaingang individual. This fact suggests gene flow from Guarani to Kaingang.
Although GRC and KRC remain isolated due to strong cultural barriers, their immediate vicinity did result in a low degree of admixture (14).

IGHG3*11 is the most common IGHG3 allele in Euro-descendants (CTBA, f = 0.588) and was observed at lower frequency in Amerindians: GND (f = 0.094), GRC (f = 0.010), KIV (f = 0.038), KRC (f = 0.020), being absent in GKW. This allele corresponds to the allotype G3m5,10,11,13,14,26,27, which has been previously shown to be highly frequent in Europeans but absent in non-admixed Amerindians (37–42). A similar allele distribution was also observed for IGHG1*03 in the study populations. These observations are consistent with previous studies from our group, which estimated the admixture rate of Guarani and Kaingang by analyzing HLA class II genes. In that study, the estimated admixture rate with non-Amerindians was 14.3% for GND, 3.7% for GRC, 7.2% for Kaingang, and no admixture for GKW (15). The Gm allotype haplotype frequencies inferred from DNA sequencing in our study (in which the most common haplotypes were C and D) were similar to those found in former reports that serologically characterized the Guarani and Kaingang populations from Santa Catarina State, Brazil (42), and other native American populations (41, 43, 44).

The new allele IGHG3*21 was frequent in BrJAP (f = 0.158), but absent in the other populations. According to the nucleotide sequence, it encodes the haplotype Gm5,10,11,13,14,26,27, whose frequency was previously reported as 15.2% in a study with Japanese families (45). In that same study, the haplotypes C (Gm21;17,1;(.) – 40.7%), D (Gm21;17,1,2;(.) – 16.4%), I (Gm11,13,15,16;17,1;(.) – 27.7%), and J (Gm5,11,13;3,1;23 – 15.2%) exhibited similar frequencies to the ones inferred from DNA sequencing in BrJAP, which were 38.4%, 13.4%, 28.6%, and 10.7%, respectively (Table 6).
The novelty of our results lies in characterizing, for the first time, the variants at the DNA level that are responsible for the occurrence of these Gm haplotypes in Japanese populations.

Strong linkage disequilibrium (LD) (Figure S2) was observed in most Amerindian populations, as expected for these historically small populations that suffered strong genetic drift and multiple founder effects since the arrival of the first Americans to the continent and during their migration from the North to the South of the American continent. Interestingly, the patterns of LD differed among the Guarani populations, despite their shared ancestry. GKW exhibited a reduced number of variable sites, while GRC exhibited reduced LD in comparison to GND. These differences could also be explained by genetic drift, as certain haplotypes that stochastically increased their frequencies in a population after their divergence may not have increased in the others. In contrast, the Japanese-descendant and Euro-descendant populations have higher nucleotide and allele diversity and fewer SNPs in LD. Even so, SNPs from different gene segments are in LD in these urban populations. In BrJAP, SNPs of allotypes G1m17 (rs1071803) and G2m(.) (rs8009156) are in LD (D′ = 0.92; r² = 0.73) and are present in the allotype haplotypes C and D, reported as the most common in Japanese populations (45).

In the MJ networks (Figure 3), IGHG1*02, IGHG2*03 and IGHG3*14 were connected with most alleles and were present at high frequencies in all populations. This pattern suggests that most of the other known alleles could have originated from them. In the IGHG2 MJ network, one loop shows two paths where substitutions at position 161 of exon CH3 and 230 of CH2 occurred to generate the IGHG2*05, *07, and *13 alleles.
It can be hypothesized that a mutation in one of them, for example in IGHG2*03 at position 161 of exon CH3, generated IGHG2*05, and that this allele in turn mutated at position 230 of exon CH2, originating allele IGHG2*13. As independent mutations in the same positions are extremely unlikely, the fact that the IGHG2*07 allele has a variant in the same position (230 of the CH2 exon) indicates that gene conversion between alleles IGHG2*13 and IGHG2*03 originated the IGHG2*07 allele. Moreover, we suggest that the novel alleles IGHG1*11 and IGHG2*09 resulted from gene conversion between two frequent alleles of different gene segments. Overall, our data point to a major role of recombination and gene conversion in originating new IGHG alleles, which is consistent with the tandem positioning and high sequence similarity of these segments, both of which favor unequal crossing-over (46).

Kaingang from Ivaí and Kaingang from Rio das Cobras presented low genetic differentiation (FST = 0.032) and similar allele frequencies (Table 7), most probably because of their recent common origin and gene flow due to the absence of cultural barriers, in addition to their geographical proximity. The FST values between the Guarani populations were low to moderate, which is evidence of genetic drift affecting the IGHG diversity in these populations. These results are compatible with previous reports for mtDNA in the same populations, which indicated that divergence of the three Guarani populations occurred at around 1,800 years before present (ybp), much earlier than the separation of the Kaingang populations, which was estimated at 207 ybp (13).

The PCA results (Figure 2) were consistent with geography and ancestry and showed that our data are consistent with data obtained by serologic methods, previously reported in the literature. The exception was India, which grouped with Europeans and Euro-descendants.
In fact, PCA grouping does not necessarily mean common ancestry, as it can also result from migration, stochastic factors, or convergent evolution by natural selection. The grouping solely reflects the similarities of the IGHG allelic frequencies in these populations.

The results of most neutrality tests suggested that natural selection is not the major factor responsible for shaping IGHG diversity in the study populations. In other words, for IGHG the impact of genetic drift due to demographic processes is possibly stronger than the signal left by natural selection. As is known, Amerindians have a long history of migrations and isolation, and went through severe bottlenecks after the European colonization (14). Still, in GKW and KRC for IGHG2 and in KRC and CTBA for IGHG3, the results of Tajima's D and Fu and Li's D and F tests indicated diversity sweeps due to bottlenecks or purifying selection.

Analyzing all the currently known IGHG alleles, including the 28 novel alleles described here, we found that the codon-based dN/dS test showed significant results for purifying selection (Table 9) for IGHG1 (p = 0.01) and IGHG2 (p = 0.03). We observed that synonymous (dS) substitution rates were higher than non-synonymous (dN) substitution rates. It was previously demonstrated that G1m allotypes have a different impact on the ability of IgG1 to bind the Fc gamma receptor (FcγR)-like proteins from viruses. Antibodies with the G1m1,2,17 allotype exhibit lower affinity to the viral FcγR-like protein of the human cytomegalovirus (HCMV), which decreases susceptibility to this infection (47). Similarly, the FcγR-like protein from herpes simplex virus (HSV) binds with lower affinity to antibodies carrying the G1m3 allotype due to certain residues in the CH1 and CH3 domains (9).
In the light of our results, it is plausible to suggest that emerging amino acid replacements that favored binding to viral proteins were negatively selected as a result of their deleterious effect on the individuals carrying the mutations. Higher binding to these viral proteins would favor viral evasion from immune responses and increase the susceptibility to certain viral infections. Moreover, purifying selection against non-synonymous changes could have limited the diversification of IGHG1 and IGHG2.

The Fay and Wu H test was significant with negative values for almost every population and gene segment analyzed. This could be interpreted as a result of an excess of derived variants at high frequencies in the gene genealogies. Fay and Wu (30) suggested that this may be a unique pattern produced by hitchhiking of variants in the vicinity that are being favored by positive selection. IGHG gene segments are located downstream of the IGHV, IGHD, and IGHJ gene segments that encode the immunoglobulin variable regions, which specifically bind to antigens (2, 4). Therefore, we suggest that selection for variants in the variable region may be impacting the diversity of the constant region by hitchhiking mutations in the IGHG gene segments. This hypothesis is corroborated by the findings of Tanaka and Nei (48), who demonstrated that the non-synonymous mutation rate was higher than the synonymous rate in the gene segments that code for the Ig variable region. Their results were consistent with diversity-enhancing selection or overdominant selection driving the nucleotide diversity in the variable region.

Conclusion

Antibodies are pivotal for human survival, at both the individual and the population levels. It is surprising that, despite decades of compelling evidence about the importance of immunoglobulin gene variation for human immunity and the not so recent advent of sequencing technologies, most of the knowledge about IGHG is still based on serologic typing. As we see here, the fact that the regions encoded by IGHG are called “constant” does not mean these segments are not highly polymorphic. In fact, we found 16 novel alleles in a population sample of only 57 Japanese descendants. The IGHG genomic region is not well-covered in genome-wide association studies and whole genome sequencing databases. The homology and high sequence similarity of IGHG segments impose technical difficulties for sequencing, particularly at large scale. Besides, the somatic recombination events characteristic of the IGH locus make DNA from B-cell lines, used in so many studies, unsuitable for IGHG sequencing. Our study is the first to systematically sequence these segments at the nucleotide level in populations. We here present a full characterization of IGHG1-3 diversity in seven Brazilian populations, including linkage disequilibrium, haplotypes, and evidence of purifying selection and genetic drift. Understanding the normal variation of IGHG in populations and its evolution may be the key to better comprehending how the immune system fights invading organisms and non-self antigens, and may also contribute to the development of new vaccines.

Ethics Statement

This study was carried out in accordance with the recommendations of the Brazilian National Human Research Ethics Committee (CONEP) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Brazilian National Human Research Ethics Committee (CONEP).

Author Contributions

DA designed the study. VC-S, DA, LV, RD, and HI performed DNA sequencing and genotyping. VC-S analyzed the data. RW, VC-S, RD, HI, and LV performed molecular cloning and validation of novel alleles. MP-E, DA, DM, and RW contributed reagents. VC-S, DA, MP-E, DM, and MB drafted the manuscript. All authors contributed significantly with ideas and critically reviewed this manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Table 4

IGHG3 alleles previously described and the 10 novel IGHG3 alleles identified in this study.

Exon                                      CH1  CH1  CH1  CH1  CH1  CH1  H1   H2   H3   H4   H4   CH2  CH2  CH2  CH2  CH2  CH2  CH2  CH2
Eu numbering (a)                          118  151  157  176  192  193            -    -    -    271  274  291  292  296  309  327  339
IMGT unique numbering                     1.4  30   40   84.3 95   96             10   10   13   35   38   82   83   84.3 92   110  124
Amino acid change                         S    S    S    S>Y  S>N  L>F            P    P    R    P    Q>K  P>L  R>W  Y>F  L>V  A>G  T>A
Exonic position                           2    101  119  175  223  227            29   29   36   122  129  181  183  196  234  289  324
Consensus nucleotide                      T    C    G    C    G    G              A    G    A    C    C    C    C    A    C    C    A
Allele    GenBank   Allotype (b)                                                                                                         #
IGHG3*01            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    2
IGHG3*03            5,6,11,24,26          .    .    .    .    .    .    .    abs  .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*04            5,10,11,13,14,26,27   C    .    .    .    .    .    .    abs  abs  .    .    .    .    .    .    .    .    .    .    0
IGHG3*05            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*06            5,10,11,13,14,26,27   .    A    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*07            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*08            5,14,26,27            .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*09            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    G    .    .    0
IGHG3*10            5,10,11,13,14,26,27   .    A    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1
IGHG3*11            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    A    C    .    .    .    .    T    .    .    .    76
IGHG3*12            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    abs  A    C    .    .    .    .    T    .    .    .    7
IGHG3*13            5,6,10,11,14,26,27    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*14            21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    529
IGHG3*15            21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    1
IGHG3*16            21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    G    5
IGHG3*17            10,11,13,15,27        .    .    .    .    A    C    .    abs  G    .    .    .    .    .    .    .    .    .    .    0
IGHG3*18            10,11,13,15,16,27     .    .    .    A    .    .    .    abs  G    .    .    .    .    .    T    .    .    .    .    0
IGHG3*19            10,11,13,15,16,27     .    .    .    .    .    .    .    abs  G    .    .    .    .    .    T    .    .    .    .    31
IGHG3*20  MG920256  21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    2
IGHG3*21  MG920255  5,10,11,13,14,26,27   .    .    A    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    18
IGHG3*22  MG920254  21,27                 .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    26
IGHG3*23  MH025837  10,11,13,16,27        .    .    .    .    .    .    .    abs  G    .    .    .    .    .    T    .    .    .    .    1
IGHG3*24  MG920257  26,27,28              .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    2
IGHG3*25  MG920258  21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    T    A    T    .    .    .    .    .    4
IGHG3*26  MG920259  5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    G    .    4
IGHG3*27  MG920260  26,27,28              .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    1
IGHG3*28  MG786813  5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    A    C    .    .    .    .    T    .    .    .    1
IGHG3*29  MG920261  21,26,27,28           .    .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    1

Exon                                      CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3  CH3
IMGT unique numbering                     1.4  4    10   39   44   44   45.2 79   81   84   88   89   90   98   100  101  115  116
Eu numbering (a)                          341  348  354  379  384  384  387  392  394  397  409  410  411  419  421  422  435  436
Amino acid change                         S    V    S    V>M  S>N  S>N  P>R  N>K  SIN  M>V  K>R  SIN  SIN  Q>E  SIN  I>V  R>H  F>Y
Exonic position                           2    23   41   114  130  131  139  155  161  168  205  209  212  234  242  243  283  286
Consensus nucleotide                      A    G    C    G    G    C    C    C    G    A    A    C    C    C    C    A    G    T
Allele    GenBank   Allotype (b)                                                                                                    #
IGHG3*01            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    2
IGHG3*03            5,6,11,24,26          .    .    .    .    .    .    .    .    .    G    G    A    .    G    T    G    .    .    0
IGHG3*04            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*05            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*06            5,10,11,13,14,26,27   G    .    .    .    .    .    .    G    .    .    .    .    .    .    .    .    .    .    0
IGHG3*07            5,10,11,13,14,26,27   G    .    .    .    .    .    .    G    .    .    .    .    .    .    .    .    .    .    0
IGHG3*08            5,14,26,27            .    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    .    0
IGHG3*09            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    T    .    .    .    .    .    0
IGHG3*10            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1
IGHG3*11            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    76
IGHG3*12            5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    7
IGHG3*13            5,6,10,11,14,26,27    .    .    .    .    .    .    .    G    A    .    .    .    .    G    .    .    .    .    0
IGHG3*14            21,26,27,28           .    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    A    529
IGHG3*15            21,26,27,28           .    .    .    .    A    T    .    G    .    .    .    .    .    .    .    .    .    A    1
IGHG3*16            21,26,27,28           .    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    A    5
IGHG3*17            10,11,13,15,27        .    .    .    A    .    .    .    G    .    G    .    .    .    .    .    .    A    A    0
IGHG3*18            10,11,13,15,16,27     .    .    .    A    .    .    .    G    .    G    .    .    .    .    .    .    A    A    0
IGHG3*19            10,11,13,15,16,27     .    .    .    A    .    .    .    G    .    G    .    .    .    .    .    .    A    A    31
IGHG3*20  MG920256  21,26,27,28           .    .    .    .    A    T    G    .    .    .    .    .    .    .    .    .    .    A    2
IGHG3*21  MG920255  5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    18
IGHG3*22  MG920254  21,27                 .    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    A    A    26
IGHG3*23  MH025837  10,11,13,16,27        .    .    .    .    .    .    .    G    .    G    .    .    .    .    .    .    A    A    1
IGHG3*24  MG920257  26,27,28              .    .    .    .    A    T    .    G    .    .    .    .    .    .    .    .    .    A    2
IGHG3*25  MG920258  21,26,27,28           .    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    A    4
IGHG3*26  MG920259  5,10,11,13,14,26,27   .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    4
IGHG3*27  MG920260  26,27,28              .    A    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    A    1
IGHG3*28  MG786813  5,10,11,13,14,26,27   .    .    T    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1
IGHG3*29  MG920261  21,26,27,28           G    .    .    .    A    T    .    .    .    .    .    .    .    .    .    .    .    A    1

Novel alleles (IGHG3*20 to IGHG3*29, listed with their GenBank accession numbers) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by the IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18). Dots represent the consensus nucleotide; abs, absent.

(a) According to Edelman et al.

(b) Allotypes were inferred according to Lefranc et al.

#, Number of copies observed in this study.

  43 in total

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Hitchhiking under positive Darwinian selection.

Authors:  J C Fay; C I Wu
Journal:  Genetics       Date:  2000-07       Impact factor: 4.562

3.  DNA sequence variability of IGHG3 alleles associated to the main G3m haplotypes in human populations.

Authors:  P Dard; M P Lefranc; L Osipova; A Sanchez-Mazas
Journal:  Eur J Hum Genet       Date:  2001-10       Impact factor: 4.246

4.  Median-joining networks for inferring intraspecific phylogenies.

Authors:  H J Bandelt; P Forster; A Röhl
Journal:  Mol Biol Evol       Date:  1999-01       Impact factor: 16.240

5.  THE GM AND INV GROUPS OF INDIANS FROM SANTA CATARINA, BRAZIL.

Authors:  F M SALZANO; A G STEINBERG
Journal:  Am J Hum Genet       Date:  1965-05       Impact factor: 11.025

6.  Immunoglobulin allotypes in Sardinia.

Authors:  A Piazza; E van Loghem; G de Lange; E S Curtoni; L Ulizzi; L Terrenato
Journal:  Am J Hum Genet       Date:  1976-01       Impact factor: 11.025

7.  Immunoglobulin Allotypes of European Populations. I. Gm and Km(Inv) allotypic markers in Hungarians.

Authors:  M S Schanfield; J Gergely; H H Fudenberg
Journal:  Hum Hered       Date:  1975       Impact factor: 0.444

8.  Chromosomal location of the genes for human immunoglobulin heavy chains.

Authors:  C M Croce; M Shander; J Martinis; L Cicurel; G G D'Ancona; T W Dolby; H Koprowski
Journal:  Proc Natl Acad Sci U S A       Date:  1979-07       Impact factor: 11.205

9.  The herpes simplex virus type 1 Fc receptor discriminates between IgG1 allotypes.

Authors:  A Atherton; K L Armour; S Bell; A C Minson; M R Clark
Journal:  Eur J Immunol       Date:  2000-09       Impact factor: 5.532

10.  HLA class II diversity in seven Amerindian populations. Clues about the origins of the Aché.

Authors:  L T Tsuneto; C M Probst; M H Hutz; F M Salzano; L A Rodriguez-Delfin; M A Zago; K Hill; A M Hurtado; A K C Ribeiro-dos-Santos; M L Petzl-Erler
Journal:  Tissue Antigens       Date:  2003-12