
Current perspectives on the intensity of natural selection of MHC loci.

Yoshiki Yasukochi, Yoko Satta

Abstract

Polymorphism of genes in the major histocompatibility complex (MHC) is believed to be maintained by balancing selection. However, direct evidence of selection has proven difficult to obtain. In 1994, Satta and colleagues estimated the selection intensity at the human MHC (human leukocyte antigen (HLA)) loci; however, at that time the number of HLA sequences was limited. By comparing five different methods in a computer simulation study, that work identified the best way to calculate the selection coefficient. Since then, many more HLA nucleotide sequences have become available. Our new analysis takes advantage of these sequences and compares the new estimates with those of the previous study. In general, our new results are consistent with those of the 1994 study. They show that, even after 20 years of exhaustive sequencing of human HLA, the number of dominant HLA alleles, on which our original estimate of selection intensity depended, appears to be conserved. Indeed, according to the frequency distribution for each HLA allele, most sequences in the database were minor or private alleles; therefore, we conclude that the selection intensities of HLA loci are at most 4.4 %, even though HLA is a prominent example of loci on which natural selection has been operating.

Year:  2013        PMID: 23549729      PMCID: PMC3651823          DOI: 10.1007/s00251-013-0693-x

Source DB:  PubMed          Journal:  Immunogenetics        ISSN: 0093-7711            Impact factor:   2.846


The extensive polymorphism of major histocompatibility complex (MHC) genes is believed to be maintained by balancing selection for the extent of the peptide-binding repertoire between individuals (Hughes and Nei 1988, 1989; Takahata and Nei 1990; Hughes and Yeager 1998). A unique effect of balancing selection is the long persistence time of alleles in populations and, consequently, trans-species polymorphism (Klein 1987; Takahata 1990; Takahata et al. 1992; Klein et al. 1998, 2007). However, it is difficult to show direct evidence of such selection by experiments and to measure selection intensity directly. Satta et al. (1994) estimated the intensity of selection at the human MHC (human leukocyte antigen (HLA)) loci by using the available collection of allelic sequences and a simple model based on symmetric overdominant selection and the theory of allelic genealogy (Kimura and Crow 1964; Takahata 1990; Takahata and Nei 1990; Takahata et al. 1992). In recent years, a large number of HLA allelic nucleotide sequences have become available through the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla/, Robinson et al. 2011). Currently (2012), the database contains 7,670 alleles. This large dataset provides an opportunity to estimate evolutionary parameters, such as the intensity of natural selection, more reliably. Hence, we re-estimated the selection coefficient and compared the estimates with those of the previous study, which was based on a limited number of sequences (Satta et al. 1994). A large number of nucleotide sequences at the six functional HLA loci (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, and HLA-DPB1), which play important roles in peptide presentation, were obtained from the IMGT/HLA database. In addition, nucleotide sequences of alleles at the HLA class II A (DQA1 and DPA1) and class II B (DRB3 and DRB5) loci were also used in this analysis.
Because the inclusion of recombinants leads to a biased estimate of the selection intensity, possible recombinant alleles were excluded by using the method described by Satta (1992). This method assumes that the relationship between the number of substitutions in a particular region and the number of substitutions in the entire region is binomially distributed. At the HLA-B locus, the exceptionally divergent HLA-B*73:01 allele (Abi-Rached et al. 2011), which might have been transmitted to extant humans from a distinct Homo lineage by interbreeding, was also excluded from this analysis. In applying the theory of allelic genealogy under symmetric overdominant selection, we used only dominant alleles, that is, alleles with a frequency >1 % throughout various human populations (the NCBI dbMHC database, http://www.ncbi.nlm.nih.gov/gv/mhc, Meyer et al. 2007). We also excluded nucleotide sequences with long stretches of undetermined nucleotides (Table 1). Therefore, the number of alleles used in this analysis was limited to 9 HLA-A alleles, 19 HLA-B, 20 HLA-C, 25 HLA-DRB1, 13 HLA-DQB1, 10 HLA-DPB1, 6 HLA-DQA1, 3 HLA-DPA1, 13 HLA-DRB3, and 5 HLA-DRB5. These HLA alleles are listed in Online Resource 1. Interestingly, most of the very large number of nucleotide sequences in the current database are minor or private alleles.
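The dominant-allele criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only: the record fields (`frequency`, `chromosomes`, `populations`) and the function names are hypothetical and do not correspond to the dbMHC schema. The thresholds are the >1 % worldwide frequency stated above and the rule of >100 chromosomes from >25 populations given for class I loci in footnote c of Table 1.

```python
from dataclasses import dataclass


@dataclass
class AlleleRecord:
    name: str
    frequency: float   # worldwide allele frequency as a fraction (hypothetical field)
    chromosomes: int   # chromosomes on which the allele was observed (hypothetical field)
    populations: int   # populations in which the allele was observed (hypothetical field)


def is_dominant_by_frequency(rec, min_frequency=0.01):
    """Frequency rule: the allele exceeds 1 % across the populations examined."""
    return rec.frequency > min_frequency


def is_dominant_class_i(rec, min_chromosomes=100, min_populations=25):
    """Class I rule: seen on >100 chromosomes and in >25 populations."""
    return rec.chromosomes > min_chromosomes and rec.populations > min_populations


# Made-up records for illustration; these are not real dbMHC entries.
examples = [
    AlleleRecord("allele_1", 0.020, 450, 40),
    AlleleRecord("allele_2", 0.004, 130, 28),
    AlleleRecord("allele_3", 0.001, 12, 3),
]

for rec in examples:
    print(rec.name, is_dominant_by_frequency(rec), is_dominant_class_i(rec))
```

Both rules use strict inequalities, so an allele at exactly 1 % or on exactly 100 chromosomes is not counted as dominant in this sketch.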
Table 1

The number of alleles and dominant alleles in the database

HLA locus | No. of alleles: in the database | No. of alleles: whole | No. of alleles: nonrecombinant | No. of alleles: dominant[a,b] | No. of PBR in different alleles: nonrecombinant[b] | No. of PBR in different alleles: dominant[b]
A | 1,594 | 156 | 50 | 27[c] (18)[d] | 32 | 26
B | 2,123 | 235 | 143 | 40[c] (21)[d] | 113 | 39
C | 1,102 | 143 | 129 | 20[c] | 60 | 19
DRB1 | 975 | 64 | 56 | 26 (1)[d] | 37 | 23
DQB1 | 144 | (61)[e] | 55 | 13 | 13 | 10
DPB1 | 145 | (44)[e] | 38 | 11 (1)[d] | 23 | 11
DRB3 | 58 | (13)[e] | 13 | - | 6 | -
DRB5 | 20 | (5)[e] | 5 | - | 4 | -
DQA1 | 47 | 34 | 31 | 8 (2)[d] | 9 | 5
DPA1 | 34 | (11)[e] | 9 | 5 (2)[d] | 3 | 3

[a] The number of dominant alleles that have a high frequency (>1 %) throughout human populations worldwide (including possible recombinants)

[b] The number of amino acid sequences

[c] The number of dominant alleles that are detected in >100 chromosomes from >25 human populations

[d] The number of dominant alleles that are excluded due to a possible recombinant or short sequence

[e] Not whole coding sequence (see text)

According to the theory described in Takahata (1990) and Takahata et al. (1992), to estimate the selection coefficient s, two estimators, γ and K_B, must be calculated. The estimator γ is the ratio of the number of nonsynonymous substitutions per peptide-binding region (PBR) site to that of synonymous substitutions per site among given pairs of alleles, whereas K_B is the mean number of pairwise nonsynonymous substitutions in the PBR. The numbers of synonymous and nonsynonymous sites were estimated using the modified Nei–Gojobori method (Zhang et al. 1998) with the Jukes–Cantor correction (Jukes and Cantor 1969). Because the number of nonsynonymous substitutions in the PBR reaches a ceiling relatively early, owing to acceleration of the nucleotide substitution rate by balancing selection, Satta et al. (1994) developed five methods for estimating K_B, and these methods were evaluated by computer simulations. Here, we used method II because this method minimized errors in the multiple-hit correction (Satta et al. 1994). In this method, selection coefficients can be adequately estimated by using only sets of sequences that are relatively closely related. The estimated values of K_B and γ at the six major HLA loci described above are provided in Table 2. Using these values, we obtained other estimators, M and S, which were also necessary for estimating the selection coefficient, s (see Satta et al. 1994).
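As a rough illustration of how γ could be computed for one pair of alleles, the sketch below applies the Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3), to the observed proportions of nonsynonymous (PBR) and synonymous differences and takes their ratio. It assumes that these proportions have already been obtained, for example by the modified Nei–Gojobori site counting, which is not reproduced here. The function names and input values are hypothetical, and the sketch is not the authors' method II.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site for an observed
    proportion p of differing sites (valid only for 0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def gamma_ratio(p_nonsyn_pbr, p_syn):
    """Ratio of the corrected nonsynonymous distance in the PBR to the
    corrected synonymous distance for one pair of alleles."""
    d_b = jukes_cantor(p_nonsyn_pbr)
    d_s = jukes_cantor(p_syn)
    if d_s == 0.0:
        raise ValueError("synonymous distance is zero; the ratio is undefined")
    return d_b / d_s


# Made-up proportions for illustration only.
print(gamma_ratio(p_nonsyn_pbr=0.20, p_syn=0.03))
```

The correction grows without bound as p approaches 0.75, which is one way to see why heavily saturated PBR comparisons are unreliable and why closely related sequence pairs are preferred.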
Assuming that the long-term effective population size of humans is 10^5, the s values of the HLA-B and HLA-DRB1 loci (s = 4.4 and 1.9 %, respectively) in the present study were the highest for the class I and class II loci, respectively. This result was consistent with that of the previous study (Satta et al. 1994). All s values were broadly similar to those of the previous study, with the exception of the DQB1 and DPB1 loci: the current estimate for DQB1 was lower than the previous estimate, and the value for DPB1 was much higher (Satta et al. 1994). One possible reason is that the set of nucleotide sequences used differs from that of the previous study. In fact, for both DQB1 and DPB1, the number of dominant alleles used in the present analysis is larger than in the previous one.
Table 2

Estimates of the mean number of nonsynonymous substitutions, the relative nonsynonymous substitution rate in the PBR, and the selection coefficient (s)

HLA locus | Length[a] | L_S[a] | L_B[a] | L_N[a] | No. of allele 1[b] | No. of allele 2[c] | K_B | γ | S | M | s
A | 1,095 bp | 295 | 123 | 674 | 27[d] | 9 | 28.9 (26.0) | 7.6 (6.3) | 4,500 (3,000) | 0.04 (0.09) | 2.25 % (1.50 %)
B | 1,086 bp | 300 | 122 | 643 | 40[d] | 19 | 35.9 (36.0) | 9.7 (9.0) | 8,825 (8,200) | 0.01 (0.02) | 4.41 % (4.20 %)
C | 1,093 bp | 301 | 125 | 665 | 20[d] | 20 | 17.3 (15.0) | 4.9 (3.4) | 1,030 (530) | 0.15 (0.29) | 0.52 % (0.26 %)
DRB1 | 795 bp | 223 | 53 | 521 | 26 | 25 | 23.2 (25.0) | 10.2 (9.3) | 3,890 (3,900) | 0.01 (0.01) | 1.94 % (1.90 %)
DQB1 | 687 bp | 148 | 51 | 347 | 13 | 13 | 12.4 (20.0) | 4.4 (6.0) | 479 (1,700) | 0.14 (0.08) | 0.24 % (0.85 %)
DPB1 | 543 bp | 146 | 53 | 344 | 11 | 10 | 11.9 (6.8) | 9.2 (4.3) | 918 (140) | 0.01 (0.08) | 0.46 % (0.07 %)
DRB3 | 549 bp | 148 | 54 | 347 | - | (13)[e] | 5.6 (-) | 5.4 (-) | 120 (-) | 0.04 (-) | 0.06 % (-)
DRB5 | 549 bp | 148 | 53 | 348 | - | (5)[e] | 8.0 (-) | 7.9 (-) | 360 (-) | 0.01 (-) | 0.18 % (-)
DQA1 | 765 bp | 211 | 47 | 504 | 8 | 6 | 5.9 (13.0) | 2.1 (4.5) | 53 (550) | 0.23 (0.14) | 0.03 % (0.28 %)
DPA1 | 663 bp | 190 | 42 | 428 | 5 | 3 | 4.8 (-) | 3.3 (-) | 54 (-) | 0.10 (-) | 0.03 % (-)

The numbers of sites of synonymous and nonsynonymous substitutions were estimated using the modified Nei–Gojobori model (R = 1.04 for class I, R = 1.14 for class II). The parameter values in parentheses were estimated on the basis of method II described in Satta (1992). The mutation rate per PBR per generation (u) = 1.7 × 10^-6 for class I loci and 7.5 × 10^-7 for class II loci; effective population size (N_e) = 10^5 (see Satta et al. 1994)

L_S, the number of synonymous sites across the entire region; L_B, the number of nonsynonymous sites at the PBR; L_N, the number of nonsynonymous sites at the non-PBR

[a] The length or the number of sites used in this study (not in the previous study)

[b] The number of dominant alleles that have a high frequency (>1 %) throughout human populations worldwide (shown as n_a in text)

[c] The number of dominant alleles excluding possible recombinants

[d] The number of dominant alleles that are detected in >100 chromosomes from >25 human populations

[e] The number of alleles not derived from the dominant allele because of lack of information about allele frequencies in the human populations
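The S and s columns of Table 2 appear to be related by S = 2 N_e s with N_e = 10^5. This relation is inferred here from the tabulated numbers and is not stated in the text, so it should be read as an assumption. Under that assumption, the short Python sketch below recovers s from the present-study S values; the function and variable names are illustrative.

```python
N_E = 1e5  # long-term effective population size assumed in the text

# Scaled selection parameter S for the present study, copied from Table 2.
S_PRESENT = {
    "A": 4500,
    "B": 8825,
    "C": 1030,
    "DRB1": 3890,
    "DQB1": 479,
    "DPB1": 918,
}


def selection_coefficient(S, n_e=N_E):
    """Return s under the assumed relation S = 2 * N_e * s."""
    return S / (2.0 * n_e)


for locus, S in S_PRESENT.items():
    s = selection_coefficient(S)
    print(f"HLA-{locus}: S = {S}, s = {100 * s:.2f} %")
```

Because s scales inversely with N_e in this relation, a different assumed effective population size would rescale every estimate by the same factor without changing the ranking of the loci.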

Allelic genealogy predicts that K_B is approximately equal to the number of dominant alleles (n_a) in a population. In fact, n_a showed good agreement with K_B at three class II B loci (Table 2). Among the class I loci, HLA-C showed relatively good agreement between n_a and K_B, whereas for the HLA-A and HLA-B loci, the observed number of dominant alleles was less than the expected number. This discrepancy might indicate that the definition of dominant alleles is inappropriate for class I loci. Originally, we regarded an allele with a frequency of more than 1 % over all populations examined as a dominant allele.
According to the dbMHC database, the number of chromosomes examined at all three class I loci was more than 10,000 in total, varying from allele to allele. We therefore took 1 % of 10,000 chromosomes (100 chromosomes) as the threshold for a class I dominant allele. In addition, the mean number of populations in which class II dominant alleles were observed was about 25. Therefore, for class I loci, we considered alleles detected on >100 chromosomes across >25 populations to be dominant alleles. Surprisingly, n_a of class I loci under this new definition showed good agreement with K_B (Table 2). This might imply that some dominant alleles, with <1 % allele frequency in the entire world population, were dominantly distributed throughout the human population until quite recently and have since decreased in frequency because they might have been replaced by other alleles that had an advantage in the modern environments of some populations. The number of different dominant alleles in the PBR also shows good agreement with expectations (Table 1). After the exclusion of possible recombinants, the numbers at each locus were 26 at HLA-A, 39 at HLA-B, and 19 at HLA-C. However, when we included rare alleles, these numbers increased to 32, 113, and 60, respectively. The number of rare alleles that have de novo PBR nonsynonymous mutations is large, and they may have emerged through a quite recent population expansion (Fu et al. 2013). In addition to the above estimates, we further estimated the selection coefficients for DRB3, DRB4, DRB5, DQA1, and DPA1 (Table 2). With the exception of DRB4 (see below), the selection coefficients (s) of the four HLA class II loci were all lower than those of the six major HLA loci (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1 and HLA-DPB1), indicating that the six major loci have been strongly affected by balancing selection. The present s estimate of DQA1 is lower than the previous one, but the present K_B value is similar to n_a.
We consider that the present estimate is close to the true value. For DRB4, 15 alleles were deposited in the database, and they are identical at the PBR sites and nearly identical at the neutral (synonymous and non-PBR nonsynonymous) sites. Thus, inference of the γ and K_B values is difficult. The relatively recent emergence of DRB4 (the per-site nucleotide divergence from DRB2 is 0.015–0.017: Satta et al. 1996) supports this observation. In addition, the small amount of nucleotide divergence at neutral sites for DRB4 indicates that the effective population size of DRB4 is relatively small. This suggests that the frequency of the DR53 haplotype, on which DRB4 resides, is lower than that of other HLA haplotypes. DRB3 and DRB5 also show a smaller effective size than other HLA loci (the estimated N_e values of DRB3 and DRB5 are considerably smaller than 10^5). This is likewise because DRB3 and DRB5 are located on a limited set of DR haplotypes, whereas other HLA loci exist in all humans. Our findings show that although the number of sequences in the database has greatly increased in the past 20 years, most of the accumulated sequences are minor or private alleles, and the number of dominant alleles has not changed greatly since the previous estimation. Therefore, most of the selection coefficients at the six major HLA loci estimated in the present study were similar to those of the previous study. One may consider the assumption of symmetrical overdominance too strict for the actual data. However, the simulation study by Takahata and Nei (1990) reveals that the asymmetrical overdominance model does not fit the mode of polymorphism for actual data: under a given selection coefficient, the number of alleles and the average heterozygosity become smaller under the asymmetrical model than under the symmetrical overdominance model.
In fact, the number of dominant alleles at all HLA loci was consistent with the K_B values under symmetrical overdominance, suggesting consistency between our assumed model and the actual data. Therefore, the overdominance model is appropriate for the present estimation. Through this analysis, we confirmed that the selection intensity (selection coefficient, s) of HLA loci in modern humans is at most 4.4 %, even though HLA is a prominent example of loci on which natural selection acts.
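The agreement between the number of dominant alleles and K_B for the six major loci can be checked directly from Table 2. The sketch below is only an arithmetic aid: the relative-gap measure is an illustrative choice rather than a statistic used in the paper, and the class I values of n_a follow the definition of >100 chromosomes from >25 populations.

```python
# (n_a, K_B) per locus for the present study, copied from Table 2.
# For class I loci (A, B, C), n_a follows the >100 chromosomes / >25 populations definition.
NA_KB = {
    "A": (27, 28.9),
    "B": (40, 35.9),
    "C": (20, 17.3),
    "DRB1": (26, 23.2),
    "DQB1": (13, 12.4),
    "DPB1": (11, 11.9),
}


def relative_gap(n_a, k_b):
    """Relative difference |n_a - K_B| / K_B (an illustrative measure only)."""
    return abs(n_a - k_b) / k_b


for locus, (n_a, k_b) in NA_KB.items():
    gap = relative_gap(n_a, k_b)
    print(f"HLA-{locus}: n_a = {n_a}, K_B = {k_b}, relative gap = {gap:.2f}")
```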
References (10 of 15 shown)

1. Kimura M, Crow JF. The number of alleles that can be maintained in a finite population. Genetics, 1964-04.

2. Klein J, Sato A, Nikolaidis N. MHC, TSP, and the origin of species: from immunogenetics to evolutionary genetics. Annu Rev Genet, 2007. (Review)

3. Hughes AL, Yeager M. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet, 1998. (Review)

4. Hughes AL, Nei M. Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc Natl Acad Sci U S A, 1989-02.

5. Takahata N, Satta Y, Klein J. Polymorphism and balancing selection at major histocompatibility complex loci. Genetics, 1992-04.

6. Satta Y, Mayer WE, Klein J. Evolutionary relationship of HLA-DRB genes inferred from intron sequences. J Mol Evol, 1996-06.

7. Zhang J, Rosenberg HF, Nei M. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci U S A, 1998-03-31.

8. Satta Y, O'hUigin C, Takahata N, Klein J. Intensity of natural selection at the major histocompatibility complex loci. Proc Natl Acad Sci U S A, 1994-07-19.

9. Takahata N. A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc Natl Acad Sci U S A, 1990-04.

10. Robinson J, Mistry K, McWilliam H, Lopez R, Parham P, Marsh SGE. The IMGT/HLA database. Nucleic Acids Res, 2010-11-11.
