Daniel Shriner, Charles N Rotimi.
Abstract
Five classical designations of sickle haplotypes are made on the basis of the presence or absence of restriction sites and are named after the ethno-linguistic groups or geographic regions from which the individuals with sickle cell anemia originated. Each haplotype is thought to represent an independent occurrence of the sickle mutation rs334 (c.20A>T [p.Glu7Val] in HBB). We investigated the origins of the sickle mutation by using whole-genome-sequence data. We identified 156 carriers from the 1000 Genomes Project, the African Genome Variation Project, and Qatar. We classified haplotypes by using 27 polymorphisms in linkage disequilibrium with rs334. Network analysis revealed a common haplotype that differed from the ancestral haplotype only by the derived sickle mutation at rs334 and correlated collectively with the Central African Republic (CAR), Cameroon, and Arabian/Indian haplotypes. Other haplotypes were derived from this haplotype and fell into two clusters, one composed of Senegal haplotypes and the other composed of Benin and Senegal haplotypes. The near-exclusive presence of the original sickle haplotype in the CAR, Kenya, Uganda, and South Africa is consistent with this haplotype predating the Bantu expansions. Modeling of balancing selection indicated that the heterozygote advantage was 15.2%, an equilibrium frequency of 12.0% was reached after 87 generations, and the selective environment predated the mutation. The posterior distribution of the ancestral recombination graph yielded a sickle mutation age of 259 generations, corresponding to 7,300 years ago during the Holocene Wet Phase. These results clarify the origin of the sickle allele and improve and simplify the classification of sickle haplotypes. Published by Elsevier Inc.
Keywords: Green Sahara; ancestral recombination graph; balancing selection; haplotype; sickle
Year: 2018 PMID: 29526279 PMCID: PMC5985360 DOI: 10.1016/j.ajhg.2018.02.003
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
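The balancing-selection figures quoted in the abstract can be made concrete with a textbook overdominance recursion. The sketch below is illustrative only and is not the authors' model: it assumes relative fitnesses of 1 for βA/βA, 1 + s for βA/βS, and 1 − t for βS/βS, with `s`, `t`, the starting frequency, and all function names chosen here for illustration. It also computes the generation time implied by the reported age (259 generations, about 7,300 years).

```python
def next_freq(q, s, t):
    """Advance the sickle allele frequency q by one generation of selection.

    Assumed fitnesses (illustrative, not taken from the paper):
    AA = 1, AS = 1 + s, SS = 1 - t.
    """
    p = 1.0 - q
    w_bar = p * p + 2.0 * p * q * (1.0 + s) + q * q * (1.0 - t)
    return (p * q * (1.0 + s) + q * q * (1.0 - t)) / w_bar


def equilibrium(s, t):
    """Analytic equilibrium of the recursion above: q* = s / (2s + t)."""
    return s / (2.0 * s + t)


def generations_until(q0, s, t, tol=1e-3, max_gen=100_000):
    """Count generations until q is within tol of the equilibrium."""
    q_star = equilibrium(s, t)
    q = q0
    for gen in range(max_gen + 1):
        if abs(q - q_star) <= tol:
            return gen
        q = next_freq(q, s, t)
    raise RuntimeError("did not converge within max_gen generations")


# Hypothetical inputs: s from the abstract, t = 1 (lethal homozygote) assumed.
s, t = 0.152, 1.0
q_star = equilibrium(s, t)
gens = generations_until(1e-4, s, t)

# Generation time implied by the reported age of the mutation.
years_per_generation = 7300 / 259

print(f"equilibrium frequency: {q_star:.4f}")
print(f"generations to approach equilibrium: {gens}")
print(f"implied years per generation: {years_per_generation:.1f}")
```

Under these assumed fitnesses the equilibrium is about 0.117, close to but not identical with the reported 12.0%, and the generation count depends on the starting frequency and tolerance, so it should not be expected to reproduce the reported 87 generations. The implied generation time is roughly 28 years.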
Table 1. Molecular Characterization of the Classical Sickle Haplotypes
| Sequence | G | G | ||
| Range | 5,260,457–5,260,462 | 5,269,799–5,269,804 | 5,274,717–5,274,722 | 5,291,563–5,291,567 |
| rsID | rs968857 | rs28440105 | rs2070972 | rs3834466 |
| Position (hg19) | 5,260,458 | 5,269,799 | 5,274,717 | 5,291,563–5,291,564 |
| Senegal | + | − | + | − |
| Benin | + | − | − | − |
| CAR | − | − | + | − |
| Cameroon | + | + | + | − |
| Arabian/Indian | + | − | + | + |
| r² | 0.000 | 0.016 | 0.003 | 0.020 |
| D′ | −0.104 | 0.930 | 0.094 | −0.853 |
| Ancestral | T | C | C | G |
| Status | + | − | − | − |
| Derived | C | A | A | GT |
| Status | − | + | + | + |
Underlining indicates the polymorphic position.
Pairwise linkage disequilibrium values are shown with respect to rs334.
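The r² and D′ rows above are standard pairwise linkage disequilibrium statistics. As a minimal sketch of how such values can be computed from phased two-locus haplotypes, assuming 0/1 allele coding, the following uses made-up toy data (not the study's) and a hypothetical function name. It keeps the sign of D′, as the table does.

```python
def ld_stats(haplotypes):
    """Return (r2, d_prime) for a list of phased (a, b) allele pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, d / d_max


# Toy data: a rare allele at locus A that never occurs with allele 1 at locus B.
toy = [(1, 0)] * 5 + [(0, 1)] * 50 + [(0, 0)] * 45
r2, d_prime = ld_stats(toy)
print(f"r2 = {r2:.4f}, D' = {d_prime:.3f}")
```

The toy data give |D′| = 1 with a small r², the same qualitative pattern as the rs28440105 and rs3834466 columns: a rare allele can sit almost entirely on one background of a common marker while explaining little of that marker's variance.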
Figure 1. Pairwise Association Plots
For each triangle, vertices indicate markers, and edges are labeled with association values. The associations on the bottom edges are conditional on the presence of the βS or βA allele.
(A) Associations between rs334, the RFLP-predicting markers, and the one marker selected by pairwise linkage disequilibrium with rs334.
(B) Associations between rs334, the RFLP-predicting markers, and the set of three markers selected by pairwise linkage disequilibrium with rs334.
(C) Associations between rs334, the RFLP-predicting markers, and the set of 27 markers selected by pairwise linkage disequilibrium with rs334.
Table 2. Distribution of Classical Sickle Haplotypes
| ACB | 0 | 4 | 0 | 2 | 3 | 0 |
| ASW | 0 | 1 | 1 | 0 | 0 | 0 |
| CLM | 0 | 0 | 0 | 1 | 0 | 1 |
| ESN | 0 | 18 | 1 | 0 | 5 | 0 |
| GWD | 0 | 2 | 0 | 0 | 24 | 0 |
| LWK | 1 | 0 | 0 | 19 | 0 | 0 |
| MSL | 0 | 3 | 0 | 0 | 17 | 1 |
| PUR | 0 | 0 | 0 | 1 | 2 | 0 |
| YRI | 0 | 19 | 0 | 0 | 10 | 1 |
| Baganda | 0 | 0 | 0 | 14 | 0 | 0 |
| Zulu | 0 | 0 | 0 | 1 | 0 | 0 |
The population codes are as follows: ACB, African Caribbean in Barbados; ASW, People with African Ancestry in Southwest USA; CLM, Colombians in Medellín, Colombia; ESN, Esan in Nigeria; GWD, Gambian in Western Division, Mandinka; LWK, Luhya in Webuye, Kenya; MSL, Mende in Sierra Leone; PUR, Puerto Ricans in Puerto Rico; and YRI, Yoruba in Ibadan, Nigeria.
Figure 2. Linkage Disequilibrium with rs334
We calculated pairwise linkage disequilibrium with rs334 across chromosome 11 among 504 continental Africans from the 1000 Genomes Project and plotted the result for all 4,024,958 phased diallelic markers.
Table 3. Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using Three Markers
| Ancestor | 00 | NA | NA | NA | NA | NA | NA |
| HAPA | 00 | 1 | 4 | 2 | 38 | 1 | 1 |
| HAPB | 00 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAPC | 01 | 0 | 1 | 0 | 0 | 1 | 1 |
| HAPD | 01 | 0 | 40 | 0 | 0 | 15 | 0 |
| HAPE | 10 | 0 | 0 | 0 | 0 | 43 | 1 |
0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.
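The population table and the three-marker table cross-classify the same counts, once by population and once by sequence-based haplotype. The following sketch checks that arithmetic by comparing column totals. The counts are transcribed from the rows as they appear here; the six count columns are referred to by position only, since their headers are not shown, and the variable names are hypothetical.

```python
# Counts by population (rows as listed in the population table).
by_population = {
    "ACB": [0, 4, 0, 2, 3, 0],
    "ASW": [0, 1, 1, 0, 0, 0],
    "CLM": [0, 0, 0, 1, 0, 1],
    "ESN": [0, 18, 1, 0, 5, 0],
    "GWD": [0, 2, 0, 0, 24, 0],
    "LWK": [1, 0, 0, 19, 0, 0],
    "MSL": [0, 3, 0, 0, 17, 1],
    "PUR": [0, 0, 0, 1, 2, 0],
    "YRI": [0, 19, 0, 0, 10, 1],
    "Baganda": [0, 0, 0, 14, 0, 0],
    "Zulu": [0, 0, 0, 1, 0, 0],
}

# Counts by three-marker haplotype (the Ancestor row holds only NA values).
by_haplotype = {
    "HAPA": [1, 4, 2, 38, 1, 1],
    "HAPB": [0, 2, 0, 0, 1, 0],
    "HAPC": [0, 1, 0, 0, 1, 1],
    "HAPD": [0, 40, 0, 0, 15, 0],
    "HAPE": [0, 0, 0, 0, 43, 1],
}


def column_totals(table):
    """Sum each count column across all rows of a table."""
    return [sum(col) for col in zip(*table.values())]


pop_totals = column_totals(by_population)
hap_totals = column_totals(by_haplotype)
print(pop_totals, sum(pop_totals))
print(hap_totals, sum(hap_totals))
```

Both tables give the same column totals and the same grand total of 152, which is consistent with the two tables describing one set of sickle-carrying haplotypes under two classifications.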
Table 4. Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using 27 Markers
| Ancestor | 00000000000000000000 | NA | NA | NA | NA | NA | NA |
| HAP1 | 00000000000000000000 | 1 | 0 | 2 | 37 | 0 | 1 |
| HAP2 | 00000000000000000000 | 0 | 4 | 0 | 0 | 0 | 0 |
| HAP3 | 00000000000000000000 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP4 | 00000000000000000011 | 0 | 2 | 0 | 0 | 0 | 0 |
| HAP5 | 00000000000000001111 | 0 | 5 | 0 | 0 | 0 | 0 |
| HAP6 | 00000000000000011000 | 0 | 0 | 0 | 0 | 3 | 0 |
| HAP7 | 00000001000001001111 | 0 | 1 | 0 | 0 | 0 | 0 |
| HAP8 | 00000001000001001111 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP9 | 00001110111110111000 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP10 | 00010000000000000000 | 0 | 0 | 0 | 1 | 0 | 0 |
| HAP11 | 00010001000001001111 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAP12 | 00110001000001001011 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP13 | 00110001000001001111 | 0 | 1 | 0 | 0 | 0 | 1 |
| HAP14 | 00110001000001001111 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP15 | 00110001000001001111 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAP16 | 00110001000001001111 | 0 | 23 | 0 | 0 | 7 | 0 |
| HAP17 | 00110001000001001111 | 0 | 6 | 0 | 0 | 5 | 0 |
| HAP18 | 00110001000001001111 | 0 | 1 | 0 | 0 | 0 | 0 |
| HAP19 | 11001110111110111000 | 0 | 0 | 0 | 0 | 3 | 1 |
| HAP20 | 11001110111110111000 | 0 | 0 | 0 | 0 | 36 | 0 |
0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.
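One simple way to read the haplotype strings above is to count the positions at which two haplotypes differ (the Hamming distance). The sketch below does this for the three most frequent haplotypes, HAP1, HAP16, and HAP20. It is not the split decomposition method used for the network analysis, and it uses the strings exactly as they appear here, which show 20 positions rather than 27 markers plus rs334, so the distances are indicative only. Function and variable names are hypothetical.

```python
# Allele strings as listed above (0 = reference allele, 1 = alternate allele).
haps = {
    "HAP1": "00000000000000000000",
    "HAP16": "00110001000001001111",
    "HAP20": "11001110111110111000",
}


def hamming(x, y):
    """Number of positions at which two equal-length allele strings differ."""
    if len(x) != len(y):
        raise ValueError("haplotype strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def shared_alternate(x, y):
    """Zero-based positions where both haplotypes carry the alternate allele."""
    return [i for i, (a, b) in enumerate(zip(x, y)) if a == "1" and b == "1"]


d_1_16 = hamming(haps["HAP1"], haps["HAP16"])
d_1_20 = hamming(haps["HAP1"], haps["HAP20"])
d_16_20 = hamming(haps["HAP16"], haps["HAP20"])
shared = shared_alternate(haps["HAP16"], haps["HAP20"])

print(d_1_16, d_1_20, d_16_20, shared)
```

On the displayed strings, HAP16 and HAP20 differ from HAP1 at 8 and 13 positions respectively and from each other at 19, sharing the alternate allele at a single displayed position. That pattern fits a picture in which the two haplotypes diverge from HAP1 along largely separate paths, although the network itself is the authority on their relationship.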
Figure 3. Split Decomposition Networks of Sickle-Carrying Haplotypes
(A) Network of 20 distinct sickle-carrying haplotypes, rooted by the ancestral haplotype. The haplotypes are defined in Table 4. The single branch leading from the ancestral root is the only branch to which the sickle mutation contributes, indicating a single origin of the sickle mutation. The scale bar represents 0.01 mutations/site.
(B) The subnetwork showing the ancestral root and the three modal haplotypes. This subnetwork emphasizes that HAP16 and HAP20 share a common ancestor and that this common ancestor is derived from HAP1. The scale bar represents 0.1 mutations/site.