Literature DB >> 29526279

Whole-Genome-Sequence-Based Haplotypes Reveal Single Origin of the Sickle Allele during the Holocene Wet Phase.

Daniel Shriner1, Charles N Rotimi2.   

Abstract

Five classical designations of sickle haplotypes are made on the basis of the presence or absence of restriction sites and are named after the ethno-linguistic groups or geographic regions from which the individuals with sickle cell anemia originated. Each haplotype is thought to represent an independent occurrence of the sickle mutation rs334 (c.20A>T [p.Glu7Val] in HBB). We investigated the origins of the sickle mutation by using whole-genome-sequence data. We identified 156 carriers from the 1000 Genomes Project, the African Genome Variation Project, and Qatar. We classified haplotypes by using 27 polymorphisms in linkage disequilibrium with rs334. Network analysis revealed a common haplotype that differed from the ancestral haplotype only by the derived sickle mutation at rs334 and correlated collectively with the Central African Republic (CAR), Cameroon, and Arabian/Indian haplotypes. Other haplotypes were derived from this haplotype and fell into two clusters, one composed of Senegal haplotypes and the other composed of Benin and Senegal haplotypes. The near-exclusive presence of the original sickle haplotype in the CAR, Kenya, Uganda, and South Africa is consistent with this haplotype predating the Bantu expansions. Modeling of balancing selection indicated that the heterozygote advantage was 15.2%, an equilibrium frequency of 12.0% was reached after 87 generations, and the selective environment predated the mutation. The posterior distribution of the ancestral recombination graph yielded a sickle mutation age of 259 generations, corresponding to 7,300 years ago during the Holocene Wet Phase. These results clarify the origin of the sickle allele and improve and simplify the classification of sickle haplotypes. Published by Elsevier Inc.

Entities:  

Keywords:  Green Sahara; ancestral recombination graph; balancing selection; haplotype; sickle

Mesh:

Substances:

Year:  2018        PMID: 29526279      PMCID: PMC5985360          DOI: 10.1016/j.ajhg.2018.02.003

Source DB:  PubMed          Journal:  Am J Hum Genet        ISSN: 0002-9297            Impact factor:   11.025


Introduction

Several hereditary variants in the hemoglobin genes afford protection against malaria (MIM: 611162). Many such variants are thought to have evolved in the last 10,000 years.1, 2 In the beta globin gene HBB (MIM: 141900), the sickle (MIM: 603903) allele βS is under balancing selection because of heterozygote advantage, which results from protection against malaria caused by Plasmodium falciparum and recessive lethality. The chromosomal background of the βS allele has been classified on the basis of the presence or absence of a set of seven canonical restriction sites, 5′ ε HincII → Gγ1 HindIII → Aγ1 HindIII → ψβ HincII → 3′ ψβ HincII → β AvaII → 3′ β BamHI, yielding five haplotypes, Arabian/Indian, Benin, Cameroon, CAR, and Senegal, named after ethno-linguistic groups or geographic regions, as well as a sixth category for “atypical” haplotypes.3, 4, 5, 6, 7, 8, 9 These designations do not necessarily indicate where the haplotypes originated. Whether the βS allele has a recent or old origin has been debated since the development of restriction fragment length polymorphism (RFLP) data.10, 11 According to the multicentric model, the origin of the βS allele is recent—within the last few thousand years—and each haplotype represents an independent occurrence of the same exact mutation in the corresponding geographic region.4, 12, 13, 14 In contrast, according to the unicentric model, the origin of the βS allele is anywhere from tens to hundreds of thousands of years old, and the mutation occurred once.15, 16, 17, 18 Suggested places of origin include equatorial Africa and the Middle East.20, 21 A 1.2 kb recombination hotspot exists 1 kb upstream of HBB. Consequently, recombination and gene conversion, rather than de novo mutation, have generated several haplotypes.3, 23, 24 We investigated the origins of the sickle allele by using whole-genome-sequence data from a total of 2,932 individuals from the 1000 Genomes Project, the African Genome Variation Project, and Qatar. We identified a total of 156 sickle carriers. We classified haplotypes by using phased sequence data comprising diallelic sites (both single-nucleotide polymorphisms and short insertions or deletions). We then used a combination of forward time simulation, phylogenetic network analysis, and coalescent analysis to infer a single origin of the sickle allele approximately 7,300 years ago, during the Holocene Wet Phase or Green Sahara.

Material and Methods

Ethics Statement

This project was excluded from institutional-review-board review by the Office of Human Subjects Research Protections, National Institutes of Health (OHSRP ID# 17-NHGRI-00282).

Sequence Data

We retrieved whole-genome-sequence data from 2,504 individuals in the 1000 Genomes Project, 320 individuals in the African Genome Variation Project, and 108 individuals from Qatar. As previously detailed by the 1000 Genomes Project, a three-stage approach was used to establish phased haplotypes: (1) given array-based genotype data and family information, SHAPEIT2 was used to estimate a scaffold of long-range phased haplotypes, (2) a combination of Beagle and SHAPEIT2 was used to jointly analyze the long-range phased haplotypes and di-allelic variants, and (3) MVNcall was used to place multi-allelic and structural variants onto the haplotype scaffold. The average switch error rate was 0.56%, and the average distance between phasing errors was 1,062 kb. For comparison, the length of the β-globin locus, including the locus control region, is 67 kb. All data were accessed with VCFtools version 0.1.14. We assessed linkage disequilibrium in the 1000 Genomes Project data by using the --hap-r2 function for phased haplotypes.

Measuring Multi-locus Association

We calculated Cramér’s for the association between two nominal variables. In our analyses, the number of levels equaled the number of haplotypes. is the square of the mean canonical correlation and is equivalent to Pearson’s if at least one variable has only two levels. We implemented a bias-corrected version of Cramér’s given by , , , , , , where is the sample size and is the chi-square statistic without a continuity correction for a contingency table with rows and columns.

African Ancestry

We used YFitter to call Y chromosome haplogroups and HaploFind to call mitochondrial DNA haplogroups. We used projection analysis in ADMIXTURE version 1.3 with a reference panel of 21 global ancestries to analyze autosomal ancestry. To determine standard errors for the proportions of ancestral components for each individual, we reran ADMIXTURE with the addition of 200 bootstrap replicates. Accounting for both within and between individual variances, we calculated the proportions for average ancestry by using inverse variance weights. We then calculated 95% confidence intervals for each ancestry and individual, zeroed out any average proportions for which the 95% confidence intervals included 0, and renormalized the remaining averages to sum to 1.

Balancing Selection

Let the genotype frequencies of the sickle homozygote, heterozygote, and wild-type homozygote be , , and , respectively. Let the corresponding relative fitnesses be 0, , and 1, respectively. Then, at equilibrium, . For each of the five continental African samples in the 1000 Genomes Project Phase 3 release version 5a, we estimated the effective population size on the basis of the heterozygosities of all single-nucleotide polymorphisms (i.e., diallelic, triallelic, and quadrallelic) by assuming a mutation rate of mutations per site per generation. We then took the harmonic mean of the five estimates. Assuming one initial copy of the mutant allele, we simulated 1,000 generations under a combination of random genetic drift and balancing selection. We repeated this process 1,000 times.

Phylogenetic Network Analysis

Phylogenetic trees are based on models of evolution that assume a purely bifurcating process. This assumption is violated by several evolutionary processes, including recombination, gene conversion, and recurrent mutation. A single phylogenetic tree is unable to represent incompatible signals across sites. In contrast, a phylogenetic network is a more general form of graph that relaxes the assumption of bifurcation. Split decomposition is an inferential method that computes a set of incompatible splits from a given distance matrix. Trivial splits separate a set of taxa into two sets, one set containing a single taxon and the other set containing all other taxa, thereby defining terminal branches. Non-trivial splits separate a set of taxa into two sets, both containing at least two taxa, thereby defining internal branches. A split network depicts incompatible splits as parallel branches. We used the Neighbor-Net method implemented in SplitsTree version 4.13.1 to perform split decomposition analysis of haplotypes.

Inferring the Ancestral Recombination Graph

We used ARGweaver to infer the ancestral recombination graph. ARGweaver is based on the standard coalescent model and is sensitive to balancing selection, such that regions under balancing selection have older times to the most recent common ancestor than comparable neutral regions. To account for uncertainty in both the mutation and recombination rates, we used a grid approach. We tested mutation rates of , , and mutations per generation per site. We tested recombination rates of , , (the default value for human data), , and recombinations per generation per site. We ran the sampler for 3,000 iterations and saved the ancestral recombination graph from every tenth iteration. We used the functions heidel.diag and geweke.diag in the coda library of R, version 3.2.3, to assess convergence diagnostics on the basis of the posterior distribution of the number of recombination events. To convert generations into years, we assumed a generation interval of 28 years.38, 39

Results

Molecular Mapping of Restriction Sites

We mapped 15 restriction sites, including the seven canonical sites, to the reference human genome sequence. We identified 12 known markers that predict the presence or absence of ten of these sites. Of the canonical sites, we predicted 5′ ε HincII with rs3834466, Gγ1 HindIII with rs2070972, Aγ1 HindIII with rs28440105, and 3′ ψβ HincII with rs968857 (Table 1); similar results were described in a previously reported analysis of rs3834466, rs28440105, rs10128556, and rs968857. Pairwise correlation (measured via ) between these variants and rs334 was weak to nonexistent (Table 1). On the basis of these four RFLP-predicting markers, we identified ten unique haplotypes. between the set of ten haplotypes and rs334 was 0.079 (Figure 1).
Table 1

Molecular Characterization of the Classical Sickle Haplotypes

Site
3′ ψβ HincIIAγ1 HindIIIGγ1 HindIII5′ ε HincII
SequenceaGTTGACAAGCTTAAGCTTGTTGAC
Range5,260,457–5,260,4625,269,799–5,269,8045,274,717–5,274,7225,291,563–5,291,567
rsIDrs968857rs28440105rs2070972rs3834466
Position (hg19)5,260,4585,269,7995,274,7175,291,563–5,291,564
Senegal++
Benin+
CAR+
Cameroon+++
Arabian/Indian+++
r2b0.0000.0160.0030.020
D′b−0.1040.9300.094−0.853
AncestralTCCG
Status+
DerivedCAAGT
Status+++

Underlining indicates the polymorphic position.

Pairwise linkage disequilibrium values are shown with respect to rs334.

Figure 1

Pairwise Association Plots

For each triangle, vertices indicate markers, and edges are labeled with association values. The associations on the bottom edges are conditional on the presence of the βS or βA allele.

(A) Associations between rs334, the RFLP-predicting markers, and the one marker with pairwise with rs334.

(B) Associations between rs334, the RFLP-predicting markers, and the set of three markers with pairwise with rs334.

(C) Associations between rs334, the RFLP-predicting markers, and the set of 27 markers with pairwise with rs334.

Pairwise Association Plots For each triangle, vertices indicate markers, and edges are labeled with association values. The associations on the bottom edges are conditional on the presence of the βS or βA allele. (A) Associations between rs334, the RFLP-predicting markers, and the one marker with pairwise with rs334. (B) Associations between rs334, the RFLP-predicting markers, and the set of three markers with pairwise with rs334. (C) Associations between rs334, the RFLP-predicting markers, and the set of 27 markers with pairwise with rs334. Molecular Characterization of the Classical Sickle Haplotypes Underlining indicates the polymorphic position. Pairwise linkage disequilibrium values are shown with respect to rs334.

Distributions of βS and the Classical Haplotypes

In the 1000 Genomes Project data, we identified 137 sickle carriers and 0 sickle homozygotes; we predicted the classical haplotypes for all 137 carriers (Table 2). The Benin haplotype was the predominant haplotype in the samples of Esan and Yoruba from Nigeria, the CAR haplotype was the predominant haplotype in the sample of Luhya from Kenya, and the Senegal haplotype was the predominant haplotype in the samples of Mende from Sierra Leone and Mandinka from the Gambia. There was limited representation of the Arabian/Indian (one) and Cameroon (two) haplotypes. The average sickle allele frequency was 12.0% and did not statistically differ among the five continental African samples (, ; Table S1). The distribution of matrilines comprised 1 A2, 2 B2, 1 J2, 8 L0, 24 L1, 43 L2, 49 L3, 3 L4, 2 L5, 1 T2, and 3 U6 haplogroups. The distribution of patrilines comprised 2 A1a, 5 E1a, 54 E1b1a, 1 E1b1b, 2 E2b, 1 G2a, 1 I2a, and 2 R1b haplogroups. Of the 54 E1b1a, 29 were E1b1a1a1f and 23 were E1b1a1a1g.
Table 2

Distribution of Classical Sickle Haplotypes

SampleaArabian/IndianBeninCameroonCARSenegalAtypical
ACB040230
ASW011000
CLM000101
ESN0181050
GWD0200240
LWK1001900
MSL0300171
PUR000120
YRI01900101
Baganda0001400
Zulu000100

The population codes are as follows: ACB, African Caribbean in Barbados; ASW, People with African Ancestry in Southwest USA; CLM, Colombians in Medellín, Colombia; ESN, Esan in Nigeria; GWD, Gambian in Western Division, Mandinka; LWK, Luhya in Webuye, Kenya; MSL, Mende in Sierra Leone; PUR, Puerto Ricans in Puerto Rico; and YRI, Yoruba in Ibadan, Nigeria.

Distribution of Classical Sickle Haplotypes The population codes are as follows: ACB, African Caribbean in Barbados; ASW, People with African Ancestry in Southwest USA; CLM, Colombians in Medellín, Colombia; ESN, Esan in Nigeria; GWD, Gambian in Western Division, Mandinka; LWK, Luhya in Webuye, Kenya; MSL, Mende in Sierra Leone; PUR, Puerto Ricans in Puerto Rico; and YRI, Yoruba in Ibadan, Nigeria. In the African Genome Variation Project data, we identified 14 sickle carriers in the Baganda and one sickle carrier in the Zulu. We predicted that all 15 of these individuals carried the CAR haplotype (Table 2). In the Qatar sample, we identified four sickle carriers, all with insufficient information to predict the haplotypes.

Haplotype Classification Based on Linkage Disequilibrium

We defined haplotypes centered on rs334 in the 504 continental Africans from the 1000 Genomes Project. We recorded pairwise linkage disequilibrium (LD) between rs334 and all phased, diallelic markers on chromosome 11 (Figure 2). The largest value of with rs334 was 0.407. There was only one marker, rs149481026, with . This one marker was more strongly associated with rs334 than the set of four RFLP-predicting markers was (Figure 1A).
Figure 2

Linkage Disequilibrium with rs334

We calculated pairwise with rs334 across chromosome 11 among 504 continental Africans from the 1000 Genomes Project. We plotted for all 4,024,958 phased diallelic markers.

Linkage Disequilibrium with rs334 We calculated pairwise with rs334 across chromosome 11 among 504 continental Africans from the 1000 Genomes Project. We plotted for all 4,024,958 phased diallelic markers. To strengthen the association with rs334, we investigated lower levels of . There were three markers with : rs183055323, rs149481026, and rs73404549. These three markers ranged across 132.6 kb, including the entire β-globin cluster. This interval was substantially smaller than the average distance between phasing errors. On the basis of these three markers, we identified five unique haplotypes (Table 3). One haplotype contained all occurrences of the Arabian/Indian, Cameroon, and CAR haplotypes. This haplotype contained the ancestral allele at all three markers. The Senegal haplotype was distributed across all five haplotypes, and the Benin haplotype was distributed across four of the five haplotypes.  between these three markers and rs334 was 0.753 (Figure 1B).
Table 3

Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using Three Markers

NameHaplotypeaArabian/IndianBeninCameroonCARSenegalAtypical
Ancestor0000NANANANANANA
HAPA00101423811
HAPB0011020010
HAPC0110010011
HAPD011104000150
HAPE10100000431

0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.

Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using Three Markers 0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined. To improve cross-classification with the Senegal and Benin haplotypes, we identified 27 markers with . These markers extended across 725.3 kb, which is still less than the average distance between phasing errors. On the basis of these markers, we identified 59 unique haplotypes, of which 18 carried the sickle allele at rs334. A 19th sickle haplotype was observed once in the ACB sample, and a 20th sickle haplotype was observed once in the Baganda sample (Table 4). between these 27 markers and rs334 was 0.728 (Figure 1C). The most common haplotype carried the ancestral allele at all 28 sites and accounted for 68.5% of all haplotypes in the continental Africans. Globally, the ancestral haplotype had a frequency of 91.9%, was the most frequent haplotype in all 26 samples in the 1000 Genomes Project, and was the only haplotype in 15 of those samples, including all ten samples from East Asia and Europe. 13 of the sickle haplotypes in the Baganda, the one in the Zulu, and all four in the Qatari were identical to HAP1, the haplotype most commonly designated CAR (Table 4). Additionally, the four Qatari carriers had (1) >7.8% autosomal African ancestry, (2) an African mitochondrial haplogroup, or (3) an African Y chromosome haplogroup. The three most common haplotypes (HAP1, HAP16, and HAP20) correlated primarily with the CAR, Benin, and Senegal haplotypes, respectively (Table 4).
Table 4

Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using 27 Markers

NameHaplotypeaArabian/IndianBeninCameroonCARSenegalAtypical
Ancestor0000000000000000000000000010NANANANANANA
HAP100000000000000000000100000101023701
HAP20000000000000000000010110010040000
HAP30000000000000000000010110101000010
HAP40000000000000000001110110101020000
HAP50000000000000000111110110101050000
HAP60000000000000001100011001010000030
HAP70000000100000100111110110101010000
HAP80000000100000100111110110110000010
HAP90000111011111011100011001010000010
HAP100001000000000000000010000010000100
HAP110001000100000100111110110101020010
HAP120011000100000100101110000010000010
HAP130011000100000100111110000010010001
HAP140011000100000100111110110010000010
HAP150011000100000100111110110100020010
HAP1600110001000001001111101101010230070
HAP170011000100000100111110110110060050
HAP180011000100000100111110110111010000
HAP191100111011111011100010000010000031
HAP2011001110111110111000110010100000360

0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.

Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using 27 Markers 0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined. Given that our sets of 1, 3, and 27 markers more strongly associated with rs334 than did the set of RFLP-predicting markers, we assessed association between our sets of markers and the set of RFLP-predicting markers. Our sets of markers were moderately associated with the set of RFLP-predicting markers when conditioned on the presence of βS and weakly associated when conditioned on the presence of βA (Figure 1).

Bioinformatic Annotation

Different haplotypes might be associated with different clinical phenotypes or disease severity. Using Ensembl and HaploReg version 4.1, we annotated each of the 27 markers in linkage disequilibrium at with rs334 (Table S1). Nine sites were marked on histones as promoters or enhancers, possibly implicating differential expression of HBB, HBD (MIM: 142000), HBE1 (MIM: 142100), or HBG2 (MIM: 142250). Three of these nine sites, rs334, rs73402608, and rs112336602, were also annotated as bound by proteins, possibly implicating the same four genes. In addition, rs1039215 is correlated with gene expression, most strongly with HBG2 (Table S2). We modeled balancing selection assuming that the relative fitness of the βA/βA homozygote was 1, the relative fitness of the βS/βS homozygote was 0, and the relative fitness of the βA/βS heterozygote was . On the basis of the 504 continental Africans from the 1000 Genomes Project, we estimated that , which was in agreement with previous estimates. Next, by assuming a single initial copy and an effective population size , we modeled random genetic drift plus balancing selection to estimate how many generations it would take for an equilibrium frequency of 12.0% to be reached. We found that the mutant allele was lost 74.6% of the time and, conditional on reaching equilibrium, reached a frequency of 12.0% after an average of 87 (95% confidence interval ) generations, or approximately 2,400 (95% confidence interval from 1,900 to 3,500) years. We stress that this value is not the age of the sickle mutation nor the age since the onset of balancing selection but rather the time to reach a frequency of 12.0%. To determine the fate of the mutant allele in the absence of heterozygote advantage, we repeated the simulation while assuming . We found that the mutant allele was lost after an average of 12 generations (95% confidence interval ), with a median of two generations. We used split decomposition analysis to infer the phylogeny of the 20 sickle haplotypes, whereby the phylogeny was rooted by the ancestral haplotype. (Figure 3). The network revealed that the sickle mutation occurred once in the background of the ancestral haplotype and gave rise to HAP1, which is associated predominantly with the CAR haplotype. One cluster containing HAP1 and HAP10 accounted for all occurrences of the Cameroon, CAR, and Arabian/Indian haplotypes (Table 4). Two additional clusters were derived from this haplotype and not from the ancestral haplotype. One cluster contained HAP6, HAP9, HAP19, and HAP20; all associated with the Senegal haplotype (Table 4). In this cluster, the modal haplotype, HAP20, differed from HAP1 by 15 derived mutations (Table 4 and Table S1). This cluster accounted for 70.5% of occurrences of the Senegal haplotype. The other cluster contained the remaining 14 haplotypes, accounting for all occurrences of the Benin haplotype and 29.5% of the occurrences of the Senegal haplotype. In this cluster, the modal haplotype, HAP16, differed from HAP1 by 13 derived mutations (Table 4; Table S1). Other than the sickle mutation, only one derived mutation, at rs10655224, was shared between HAP16 and HAP20. The remaining 14 derived mutations in HAP20 had an average frequency of 11.3% in the Gambian in Western Divisions in the Gambia (GWD) and Mende in Sierra Leone (MSL) samples and 0.1% in the Esan in Nigeria (ESN) and Yoruba in Ibadan, Nigeria (YRI) samples (Table S1). Conversely, the remaining 12 derived mutations in HAP16 had an average frequency of 13.4% in the ESN and YRI samples and 2.2% in the GWD and MSL samples (Table S1). These 26 derived alleles had an average frequency of 0.4% in the Luhya in Webuye, Kenya (LWK) sample (Table S1).
Figure 3

Split Decomposition Networks of Sickle-Carrying Haplotypes

(A) Network of 20 distinct sickle-carrying haplotypes, rooted by the ancestral haplotype. The haplotypes are defined in Table 4. The single branch leading from the ancestral root is the only branch to which the sickle mutation contributes, indicating a single origin of the sickle mutation. The scale bar represents 0.01 mutations/site.

(B) The subnetwork showing the ancestral root and the three modal haplotypes. This subnetwork emphasizes that HAP16 and HAP20 share a common ancestor and that this common ancestor is derived from HAP1. The scale bar represents 0.1 mutations/site.

Split Decomposition Networks of Sickle-Carrying Haplotypes (A) Network of 20 distinct sickle-carrying haplotypes, rooted by the ancestral haplotype. The haplotypes are defined in Table 4. The single branch leading from the ancestral root is the only branch to which the sickle mutation contributes, indicating a single origin of the sickle mutation. The scale bar represents 0.01 mutations/site. (B) The subnetwork showing the ancestral root and the three modal haplotypes. This subnetwork emphasizes that HAP16 and HAP20 share a common ancestor and that this common ancestor is derived from HAP1. The scale bar represents 0.1 mutations/site.

The Ancestral Recombination Graph

Using coalescent theory, we sampled the posterior distribution of the ancestral recombination graph with the 1,008 haplotypes, including 121 sickle haplotypes, from the 504 continental Africans from the 1000 Genomes Project. Over a grid of 15 pairs of mutation and recombination rates, a mutation rate of mutations per generation per site and a recombination rate of recombinations per generation per site yielded the best fit to the data. Given these two rates, the trace of the posterior number of recombination events visually indicated convergence to a stationary distribution (Figure S1). More formally, Geweke’s diagnostic indicated that the sampled values came from a stationary distribution . Heidelberger and Welch’s diagnostic also indicated that the sampled values came from a stationary distribution and further indicated that there was no need to discard initial iterations. We estimated the age of the sickle mutation as 259 (95% credible interval ) generations, or approximately 7,300 years (95% credible interval from 3,400 to 11,100 years).

Discussion

There are two models of the origins of the sickle allele. The multicentric model posits five independent occurrences of the same mutation within the last few thousand years. The unicentric model posits a single occurrence and an older age. We used whole-genome-sequence data to provide insight into this issue. Using haplotypic classification and phylogenetic network analysis, we found clear and consistent evidence for a single origin of the sickle mutation. After accounting for recombination, we estimated that the sickle mutation is 259 generations old. Starting from first principles of population genetics, we defined haplotypes on the basis of phased sequence data. In contrast, the classical haplotypes were based on patterns of restriction sites seen in individuals with sickle cell anemia.3, 4 By molecularly mapping restriction sites to the sequence data, we found that the restriction sites correlated poorly with rs334, such that none of the canonical sites were included in our sequence-based haplotypes. This discrepancy can be explained by the fact that the RFLP-defining markers circulate on both βS- and βA-carrying chromosomes, even though the restriction sites were originally used for defining types of βS-carrying chromosomes. In contrast, our haplotypes correlate strongly with rs334 and simultaneously correlate moderately with the restriction sites conditional on the presence of the βS allele, thereby capturing the classical patterns. There are four lines of evidence regarding the age of the sickle mutation: historical data, the simulations of balancing selection, the patriline data, and the analysis of the ancestral recombination graph. Although historical records are sparse, the earliest recorded cases of fevers that could have been caused by malaria were ∼5,000 years ago in China and were possibly due to Plasmodium vivax.2, 43, 44 The earliest recorded case of illness that could have been malaria specifically caused by Plasmodium falciparum could have been ∼4,000 years ago in Egypt and Sumer; however, Plasmodium falciparum could have been present in Africa several thousand years earlier. The first recorded cases of sickle cell anemia or, more broadly, sickle cell disease were in Egypt during the predynastic period (∼3200 BC), in the Persian Gulf during the Hellenistic period (2,130 years before present), and in Ghana in 1670 AD. The existence of the sickle allele in predynastic Egypt constrains the lower bound of the age of the sickle mutation to be 5,200 years. On the basis of these limited data, it is historically plausible that the selective environment preceded the sickle mutation, consistent with our balancing-selection simulations, which showed that the sickle allele would have been lost almost immediately without a heterozygote advantage (under the assumption of recessive lethality). The patriline data corroborate the age of the sickle mutation. Given a single origin, the sickle mutation most likely arose either in a population containing both E1b1a1a1f-L485 (also known as E1b1a1a1a1c-L485) and E1b1a1a1g-U175 (also known as E1b1a1a1a2a-U175) or in a population containing their common ancestor. Thus, the time of origin of the common ancestor of these two haplogroups is a plausible upper bound on the age of the sickle mutation. The common ancestor E1b1a1a1a-M4732 arose 10,500 (95% confidence interval between 9,200 and 12,000) years ago, consistent with the upper credible interval of 11,100 years ago that we estimated for the age of the sickle mutation. The Bantu expansions started ∼5,000 years ago and took ∼2,000 years to cross from present-day Cameroon to the Great Lakes region. The wide credible interval (3,400 to 11,100 years) of the age of the sickle mutation does not rule out the possibility that the mutation coincided with the onset of the Bantu expansions. However, our results support the hypothesis that the origin of the βS allele predated the onset of the Bantu expansions. First, the βS allele could have been at or near its equilibrium frequency when the Bantu expansions occurred ∼5,000 years ago, according to our estimation that the sickle mutation originated ∼7,300 years ago and that balancing selection ensued immediately and reached equilibrium in ∼2,400 years in an environment where malaria preexisted. Second, the likelihood that the βS allele is carried by migrating individuals increases with the frequency of that allele in the source population. Our results provide some suggestions as to where the sickle mutation might have originated. Descendants of the Y chromosome haplogroup E1b1a-V38 migrated across the Sahara from east to west, possibly around 19,000 years ago. E1b1a1-M2 most likely did not originate in eastern or northeastern Africa, but where it originated in either western or central Africa is unclear. Accordingly, the sickle mutation most likely did not occur in eastern or northeastern Africa. Our results indicate that the origin of the sickle mutation was in the middle of the Holocene Wet Phase, or Neolithic Subpluvial, which lasted from ∼7,500–7,000 BC to ∼3,500–3,000 BC. This time was the most recent of the Green Sahara periods, during which the Sahara experienced wet and rainy conditions. Our results thus support the Green Sahara as a possible place of origin of the sickle mutation. An alternative hypothesis is that the sickle allele arose in west-central Africa,13, 52 possibly in the northwestern portion of the equatorial rainforest. Our results also provide some suggestions as to where the three clusters might have originated. Two splits occurred early in the original βS-carrying population. The first split defined one cluster containing HAP1 and accounting for the Cameroon and CAR haplotypes. It is plausible that HAP1 was carried from an area in or around present-day Cameroon to the area that is presently the CAR, as well as to areas east and south, as part of the Bantu expansions. However, the Bantu expansions did not extend west and north. The second split subsequently separated the clusters containing HAP16 and HAP20, the modal haplotypes accounting for the Benin and Senegal haplotypes, respectively. HAP16 and HAP20 shared one derived mutation, consistent with an early split. Furthermore, given the subsequent accumulation of derived mutations not shared between HAP16 and HAP20, effectively no gene flow occurred between these two descendant populations, consistent with geographic separation. We therefore hypothesize that the common ancestor of these two clusters existed north of Cameroon among non-Bantu-speaking peoples in or around present-day Nigeria. From this common ancestral population, a group of migrants separated and traveled west and north to the area around present-day Senegal and the Gambia. These migrants could have taken a coastal or an inland route. The finding that the Senegal haplotype was the predominant haplotype in the sample of Mende from Sierra Leone is consistent with a coastal route. We do not have data to investigate an inland route, but we note that the frequency of the sickle allele is higher along the coast than inland. Classical haplotypes in eastern Arabia tend to have the Arabian/Indian designation, whereas those in western Arabia tend to have the Benin designation.18, 21 The Arabian/Indian haplotype has been hypothesized to have originated in either east Saudi Arabia or India. Although our samples include only one predicted instance of the Arabian/Indian haplotype, the occurrence of this haplotype in the Luhya in Kenya and its clustering with the predominant haplotype found in Kenya and Uganda are consistent with the hypothesis that the Arabian/Indian haplotype originated in Africa and had an overseas migration route from eastern Africa to eastern Arabia and India.13, 19 In contrast, the absence of the Benin haplotype in the Luhya in Kenya and the Baganda in Uganda provides evidence against an overseas migration route from eastern Africa to western Arabia. Instead, the presence of the Benin haplotype in western Arabia is consistent with an African origin and an overland migration route through northeast Africa. Together, our results suggest the following evolutionary history of the sickle mutation. The presence of African ancestry, an African patriline, and/or an African matriline in all sickle carriers, combined with the absence of Arabian or Indian ancestries in the five continental African samples in the 1000 Genomes Project, supports an African origin of the sickle mutation. The sickle mutation occurred once approximately 7,300 years ago either in the Sahara or in west-central Africa. In an environment where malaria preexisted, balancing selection took approximately 2,400 years to drive the βS allele to an equilibrium frequency of 12.0%. A population split occurred, possibly in the area around present-day Cameroon. Starting approximately 5,000 years ago, the Bantu expansions resulted in the spread of HAP1 to the CAR, the Great Lakes region, and South Africa. After acquiring one derived allele, the population split into one carrying a haplotype that evolved into HAP16 and another carrying a haplotype that evolved into HAP20. In both clusters, hitchhiking of the derived alleles with the βS allele resulted in similar derived allele frequencies. Our study includes sequence data from western, west-central, and eastern Africa but lacks comparable data from northern, central, and southern Africa. A haplotype has been identified among Sudanese individuals that was classified as atypical but might be a fifth African haplotype. It is possible that sampling additional populations could provide more evidence regarding when, where, and in whom the sickle mutation arose. Our findings support classification of sickle haplotypes based on three clusters. Notably, we found that the Senegal haplotype is substructured into two clusters, one containing only Senegal haplotypes and one containing Benin and Senegal haplotypes. This substructuring of haplotypes might have confounded previous assessments of clinical phenotype or disease severity. At the same time, the identification of markers linked to the sickle allele provides an opportunity to investigate possible cis-effects, such as differential expression of HBD, HBE1, and HBG2, on disease severity.
  48 in total

1.  Molecular analysis of the β-globin gene cluster haplotypes in a Sudanese population with sickle cell anaemia.

Authors:  A Y Elderdery; J Mills; B A Mohamed; A J Cooper; A O Mohammed; N Eltieb; J Old
Journal:  Int J Lab Hematol       Date:  2012-01-20       Impact factor: 2.877

2.  Application of phylogenetic networks in evolutionary studies.

Authors:  Daniel H Huson; David Bryant
Journal:  Mol Biol Evol       Date:  2005-10-12       Impact factor: 16.240

Review 3.  Sickle Cell Disease.

Authors:  Frédéric B Piel; Martin H Steinberg; David C Rees
Journal:  N Engl J Med       Date:  2017-04-20       Impact factor: 91.245

4.  Atypical haplotypes linked to the beta S gene in Africa are likely to be the product of recombination.

Authors:  R Srinivas; O Dunda; R Krishnamoorthy; M E Fabry; A Georges; D Labie; R L Nagel
Journal:  Am J Hematol       Date:  1988-09       Impact factor: 10.047

Review 5.  The origin of mutant beta-globin genes in human populations.

Authors:  J S Wainscoat
Journal:  Acta Haematol       Date:  1987       Impact factor: 2.195

6.  Sickle gene. Its origin and diffusion from West Africa.

Authors:  J G Mears; H M Lachman; R Cabannes; K P Amegnizin; D Labie; R L Nagel
Journal:  J Clin Invest       Date:  1981-09       Impact factor: 14.808

7.  Fetal Haemoglobin and β-globin Gene Cluster Haplotypes among Sickle Cell Patients in Chhattisgarh.

Authors:  Sanjana Bhagat; Pradeep Kumar Patra; Amar Singh Thakur
Journal:  J Clin Diagn Res       Date:  2013-02-01

8.  HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants.

Authors:  Lucas D Ward; Manolis Kellis
Journal:  Nucleic Acids Res       Date:  2011-11-07       Impact factor: 16.971

9.  A phased SNP-based classification of sickle cell anemia HBB haplotypes.

Authors:  Elmutaz M Shaikho; John J Farrell; Abdulrahman Alsultan; Hatem Qutub; Amein K Al-Ali; Maria Stella Figueiredo; David H K Chui; Lindsay A Farrer; George J Murphy; Gustavo Mostoslavsky; Paola Sebastiani; Martin H Steinberg
Journal:  BMC Genomics       Date:  2017-08-11       Impact factor: 3.969

10.  The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

Authors:  Swapan Mallick; Heng Li; Mark Lipson; Iain Mathieson; Melissa Gymrek; Fernando Racimo; Mengyao Zhao; Niru Chennagiri; Susanne Nordenfelt; Arti Tandon; Pontus Skoglund; Iosif Lazaridis; Sriram Sankararaman; Qiaomei Fu; Nadin Rohland; Gabriel Renaud; Yaniv Erlich; Thomas Willems; Carla Gallo; Jeffrey P Spence; Yun S Song; Giovanni Poletti; Francois Balloux; George van Driem; Peter de Knijff; Irene Gallego Romero; Aashish R Jha; Doron M Behar; Claudio M Bravi; Cristian Capelli; Tor Hervig; Andres Moreno-Estrada; Olga L Posukh; Elena Balanovska; Oleg Balanovsky; Sena Karachanak-Yankova; Hovhannes Sahakyan; Draga Toncheva; Levon Yepiskoposyan; Chris Tyler-Smith; Yali Xue; M Syafiq Abdullah; Andres Ruiz-Linares; Cynthia M Beall; Anna Di Rienzo; Choongwon Jeong; Elena B Starikovskaya; Ene Metspalu; Jüri Parik; Richard Villems; Brenna M Henn; Ugur Hodoglugil; Robert Mahley; Antti Sajantila; George Stamatoyannopoulos; Joseph T S Wee; Rita Khusainova; Elza Khusnutdinova; Sergey Litvinov; George Ayodo; David Comas; Michael F Hammer; Toomas Kivisild; William Klitz; Cheryl A Winkler; Damian Labuda; Michael Bamshad; Lynn B Jorde; Sarah A Tishkoff; W Scott Watkins; Mait Metspalu; Stanislav Dryomov; Rem Sukernik; Lalji Singh; Kumarasamy Thangaraj; Svante Pääbo; Janet Kelso; Nick Patterson; David Reich
Journal:  Nature       Date:  2016-09-21       Impact factor: 49.962

View more
  32 in total

1.  Co-inheritance of a-thalassemia dramatically decreases the risk of acute splenic sequestration in a large cohort of newborns with hemoglobin SC.

Authors:  Paulo V Rezende; André R Belisário; Érica L Oliveira; Jéssica A Almeida; Larissa M M Oliveira; Maristela B S R Muniz; Marcos B Viana
Journal:  Haematologica       Date:  2019-01-10       Impact factor: 9.941

2.  Genetic susceptibility to severe childhood asthma and rhinovirus-C maintained by balancing selection in humans for 150 000 years.

Authors:  Mary B O'Neill; Guillaume Laval; João C Teixeira; Ann C Palmenberg; Caitlin S Pepperell
Journal:  Hum Mol Genet       Date:  2020-03-27       Impact factor: 6.150

3.  Impact of Rare and Common Genetic Variants on Diabetes Diagnosis by Hemoglobin A1c in Multi-Ancestry Cohorts: The Trans-Omics for Precision Medicine Program.

Authors:  Chloé Sarnowski; Aaron Leong; Laura M Raffield; Peitao Wu; Paul S de Vries; Daniel DiCorpo; Xiuqing Guo; Huichun Xu; Yongmei Liu; Xiuwen Zheng; Yao Hu; Jennifer A Brody; Mark O Goodarzi; Bertha A Hidalgo; Heather M Highland; Deepti Jain; Ching-Ti Liu; Rakhi P Naik; Jeffrey R O'Connell; James A Perry; Bianca C Porneala; Elizabeth Selvin; Jennifer Wessel; Bruce M Psaty; Joanne E Curran; Juan M Peralta; John Blangero; Charles Kooperberg; Rasika Mathias; Andrew D Johnson; Alexander P Reiner; Braxton D Mitchell; L Adrienne Cupples; Ramachandran S Vasan; Adolfo Correa; Alanna C Morrison; Eric Boerwinkle; Jerome I Rotter; Stephen S Rich; Alisa K Manning; Josée Dupuis; James B Meigs
Journal:  Am J Hum Genet       Date:  2019-09-26       Impact factor: 11.025

Review 4.  Fetal hemoglobin in sickle cell anemia.

Authors:  Martin H Steinberg
Journal:  Blood       Date:  2020-11-19       Impact factor: 22.113

Review 5.  Genetic Modifiers of Fetal Haemoglobin in Sickle Cell Disease.

Authors:  Stephan Menzel; Swee Lay Thein
Journal:  Mol Diagn Ther       Date:  2019-04       Impact factor: 4.074

6.  Recent Adaptive Acquisition by African Rainforest Hunter-Gatherers of the Late Pleistocene Sickle-Cell Mutation Suggests Past Differences in Malaria Exposure.

Authors:  Guillaume Laval; Stéphane Peyrégne; Nora Zidane; Christine Harmant; François Renaud; Etienne Patin; Franck Prugnolle; Lluis Quintana-Murci
Journal:  Am J Hum Genet       Date:  2019-02-28       Impact factor: 11.025

Review 7.  Not all red cells sickle the same: Contributions of the reticulocyte to disease pathology in sickle cell anemia.

Authors:  Marcus A Carden; Ross M Fasano; Emily Riehm Meier
Journal:  Blood Rev       Date:  2019-11-05       Impact factor: 8.250

8.  The Emergence of Genomic Research in Africa and New Frameworks for Equity in Biomedical Research.

Authors:  Amy R Bentley; Shawneequa Callier; Charles Rotimi
Journal:  Ethn Dis       Date:  2019-02-21       Impact factor: 1.847

Review 9.  A Comprehensive Review of the Treatment and Management of Pain in Sickle Cell Disease.

Authors:  Jacob Fiocchi; Ivan Urits; Vwaire Orhurhu; Mariam Salisu Orhurhu; Stephen Giacomazzi; Briggs Hoyt; Alan D Kaye; Rachel J Kaye; Omar Viswanath
Journal:  Curr Pain Headache Rep       Date:  2020-03-21

Review 10.  Evolutionary history of sickle-cell mutation: implications for global genetic medicine.

Authors:  Kevin Esoh; Ambroise Wonkam
Journal:  Hum Mol Genet       Date:  2021-04-26       Impact factor: 6.150

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.