Khoe-San Genomes Reveal Unique Variation and Confirm the Deepest Population Divergence in Homo sapiens.

Carina M Schlebusch^1,2,3, Per Sjödin¹, Gwenna Breton¹, Torsten Günther¹, Thijessen Naidoo^1,2,3, Nina Hollfelder¹, Agnes E Sjöstrand¹, Jingzi Xu¹, Lucie M Gattepaille¹, Mário Vicente¹, Douglas G Scofield^4,5, Helena Malmström^1,2, Michael de Jongh⁶, Marlize Lombard², Himla Soodyall^7,8, Mattias Jakobsson^1,2,3.

Abstract

The southern African indigenous Khoe-San populations harbor the most divergent lineages of all living peoples. Exploring their genomes is key to understanding deep human history. We sequenced 25 full genomes from five Khoe-San populations, revealing many novel variants, that 25% of variants are unique to the Khoe-San, and that the Khoe-San group harbors the greatest level of diversity across the globe. In line with previous studies, we found several gene regions with extreme values in genome-wide scans for selection, potentially caused by natural selection in the lineage leading to Homo sapiens and more recent in time. These gene regions included immunity-, sperm-, brain-, diet-, and muscle-related genes. When accounting for recent admixture, all Khoe-San groups display genetic diversity approaching the levels in other African groups and a reduction in effective population size starting around 100,000 years ago. Hence, all human groups show a reduction in effective population size commencing around the time of the Out-of-Africa migrations, which coincides with changes in the paleoclimate records, changes that potentially impacted all humans at the time.

Keywords: Khoe-San; population structure; southern Africa

Year: 2020 PMID: 32697301 PMCID: PMC7530619 DOI: 10.1093/molbev/msaa140

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

Genetics has played an increasingly important role in revealing human evolutionary history, by demonstrating that Homo sapiens emerged from Africa (Cann et al. 1987; Ramachandran et al. 2005), with some groups outside Africa admixing with archaic humans (Meyer et al. 2012; Prüfer et al. 2014). Our deepest roots include indigenous groups of current-day southern Africa, with modern-day Khoe-San representing one branch in the earliest population divergence in Homo sapiens, and all other Africans and non-Africans representing the other branch (Gronau et al. 2011; Veeramah et al. 2012; Schlebusch et al. 2012, 2017; Schlebusch and Jakobsson 2018). Southern African hunter-gatherers (San) and herders (Khoekhoe) are collectively referred to as Khoe-San (Schlebusch 2010). Khoe-San people speak Khoisan languages, a group of languages that rely heavily on “click” sounds. Three out of the five major Khoisan language families are spoken in southern Africa, namely, Kx’a (formerly called Northern Khoisan), Tuu (formerly Southern Khoisan), and Khoe-Kwadi (formerly Central Khoisan). These three language families show no linguistic relatedness to each other (Güldemann 2014). A few complete genomes from Khoe-San individuals have been investigated with poor representation among the different groups (Meyer et al. 2012; Kim et al. 2014; Mallick et al. 2016). As the Khoe-San represents one of two branches of the deepest population divergence within Homo sapiens, it is crucial to reveal their evolutionary history and their genetic diversity in order to understand the early evolutionary history of our species. We sequenced and analyzed 25 complete high-coverage genomes from five different Khoe-San groups, representing the three main Khoisan linguistic phyla, across an extensive geographic area. 
These genomes were placed into a global context by jointly investigating 11 previously published genomes from the HGDP panel, sequenced on the same platform and subjected to similar single nucleotide polymorphism (SNP) calling procedures (Meyer et al. 2012; Raghavan et al. 2014), and another 67 genomes sequenced on the Complete Genomics platform (Drmanac et al. 2010; Lachance et al. 2012; 1000 Genomes Project Consortium 2015). Using these data sets, we characterized genome variation across the world and inferred past population history, where Khoe-San groups showed greater genetic diversity than any other group, but still revealed a reduction in effective population size coinciding with the Out-of-Africa migrations and bottleneck. We further discovered a number of selection targets in the Khoe-San and other groups, and within our common ancestors of >300,000 years ago. These results shed new light on Pleistocene human demographic history and evolution.

Results and Discussion

Among the genomes of 25 individuals (mean coverage 53.4× after mapping and quality filtering; supplementary sections 1–3, Supplementary Material online and table 1), we called 20,020,719 autosomal SNPs (table 1 and supplementary table S5.1, Supplementary Material online). After group-wide quality filtering (supplementary sections 1–3, Supplementary Material online), 18,637,959 autosomal biallelic SNPs remained (table 1), 1,960,665 (10.5%) of which were novel (compared with dbSNP build 151). The two southern Khoe-San groups (Nama and Karretjie People) presented the most novel variants (table 1 and supplementary fig. S5.4, Supplementary Material online). Although many novel variants were singletons (supplementary fig. S5.3, Supplementary Material online and table 1), 3.2% of all variants were both novel and present in more than one copy, demonstrating that many variants common among the Khoe-San have not yet been reported. Of the 5,101,560 variants present in all five Khoe-San groups, 24,517 were novel (supplementary fig. S5.4, Supplementary Material online). These variants, common among Khoe-San groups but absent in other populations, have not been previously characterized.
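As a quick arithmetic check, the percentages quoted for novel variants can be recomputed from the counts in the Total column of table 1. This is only a sketch: the constant and function names are illustrative, not taken from the study's pipeline.

```python
# Counts copied from the "Total" column of table 1.
TOTAL_BIALLELIC_SNPS = 18_637_959
NOVEL_VARIANTS = 1_960_665
NON_SINGLETON_NOVEL = 602_402


def percent_of_total(count, total=TOTAL_BIALLELIC_SNPS):
    """Return `count` as a percentage of `total`, rounded to one decimal."""
    return round(100.0 * count / total, 1)


novel_pct = percent_of_total(NOVEL_VARIANTS)
non_singleton_novel_pct = percent_of_total(NON_SINGLETON_NOVEL)
print(novel_pct, non_singleton_novel_pct)
```

Both values use the filtered biallelic SNP count as the denominator, which is how the percentages in table 1 appear to be defined.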

Table 1.

Summary of Genomic (autosomal) Variation in Five Individuals each from Five Khoe-San Groups.

Category	Total	Karretjie	Nama	|Gui ‖Gana	Ju|’hoansi	!Xun
Biallelic SNPs (filtered)	18,637,959	10,555,587	10,514,246	10,649,570	10,429,573	10,676,563
Exonic SNPs (%)	0.653	0.597	0.598	0.596	0.599	0.598
Novel variants versus dbSNP build 151 (% of variants)	1,960,665 (10.5%)	578,935 (5.5%)	632,492 (6.0%)	491,543 (4.6%)	548,528 (5.3%)	477,360 (4.5%)
Singletons (% of variants)	5,403,107 (29.0%)	4,547,752 (43.1%)	4,504,833 (42.8%)	4,639,215 (43.6%)	4,315,691 (41.4%)	4,660,181 (43.6%)
Non-singleton novel variants (% of variants)	602,402 (3.2%)	108,129 (1.0%)	95,736 (0.9%)	97,282 (0.9%)	106,546 (1.0%)	91,365 (0.9%)
Mean Depth per Individual, duplicates excluded (all positions in ref genome)	53.4 (45.1–59.8)	52.5 (45.1–56.9)	55.3 (53.7–56.4)	51.7 (48.2–54.8)	52.8 (48.0–59.8)	54.7 (51.7–57.1)
Mean heterozygosity (genomic)	0.001274	0.001273	0.001266	0.001275	0.001263	0.001291
Heterozygosity variable sites (Called+Filtered Variants)	0.183249	0.183205	0.182149	0.183438	0.181665	0.18579
Tajima’s D	−0.7827	−0.3349	−0.3200	−0.3609	−0.2830	−0.3639
Tajima’s D (exonic)	−1.1412	−0.5198	−0.4844	−0.5477	−0.4731	−0.5446
Mean DAF	0.1746052	0.2746496	0.2751872	0.2727002	0.2770476	0.2724151
Mean DAF exonic	0.1607157	0.2650433	0.2663403	0.2632778	0.2680222	0.2638948
Total indels (VQSRed)	2,176,524	1,441,604	1,458,457	1,433,307	1,439,949	1,461,609
Deletions	1,267,661	802,507	799,461	812,743	795,648	815,037
Insertions	908,863	634,150	634,609	640,933	631,300	642,247
Complex indels	527,796	513,400	512,696	514,178	512,684	514,099
Structural variants	4,452	1,979	2,030	2,419	2,139	2,362
Proportion of structural variants with genes	0.378	0.39	0.37	0.379	0.377	0.374

The Khoe-San exhibited the greatest genetic diversity (mean heterozygosity per individual: 1.154 × 10⁻³; fig. 1 and supplementary figs. S5.1 and S5.5, Supplementary Material online), compared with other African genomes (mean heterozygosity: 1.079 × 10⁻³, Mbuti, Mandenka, Yoruba, and Dinka). However, modern-day Khoe-San groups received 10–30% admixture from a mixed eastern African-Eurasian group ∼1,500 years ago (Schlebusch et al. 2017; Skoglund et al. 2017). When genomic material attributed to recent admixture was masked out, the genetic diversity of the Khoe-San (mean heterozygosity after masking: 1.106 × 10⁻³) decreased and approached that of other African groups (supplementary fig. S8.1, Supplementary Material online), but still remained significantly greater (P = 0.013, Wilcoxon test). Among the Khoe-San, the !Xun had the highest heterozygosity and allelic diversity (table 1 and supplementary figs. S5.1 and S5.5, Supplementary Material online), also sharing the most alleles with all other African groups (fig. 2), pointing to the highest amount of admixture into the !Xun from non-Khoe-San African groups among the Khoe-San.
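The heterozygosity values compared above follow the definition given in the legend of fig. 1: the number of variable positions divided by the number of sequenced positions, averaged across individuals. A minimal sketch of that calculation is shown below. The function names and the toy counts are hypothetical and serve only to illustrate the definition; they are not data from the study.

```python
from statistics import mean


def heterozygosity(n_variable_sites, n_sequenced_sites):
    """Per-individual heterozygosity: variable positions / sequenced positions."""
    if n_sequenced_sites <= 0:
        raise ValueError("number of sequenced positions must be positive")
    return n_variable_sites / n_sequenced_sites


def mean_heterozygosity(individuals):
    """Average per-individual heterozygosity over (variable, sequenced) pairs."""
    return mean(heterozygosity(v, s) for v, s in individuals)


# Hypothetical toy counts for three individuals (not values from the study).
toy_group = [(1_100, 1_000_000), (1_200, 1_000_000), (1_150, 1_000_000)]
toy_mean = mean_heterozygosity(toy_group)
print(f"{toy_mean:.6f}")
```

Averaging per-individual ratios, rather than pooling counts, keeps each genome's contribution equal even when the number of callable positions differs between individuals.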

Fig. 1.

Sample locations and genetic diversity in the Khoe-San. (A) Sample locations across the world. Colors depict the various data sets included in the study and sample sizes are indicated after the population code. CG, Complete Genomics diversity set (Drmanac et al. 2010); HGDP, HGDP data (Meyer et al. 2012); KGP, 1000 Genomes typed on Complete Genomics platform (1000 Genomes Project Consortium 2015); KSP, this study; LC, Lachance et al. (2012); SGDP, Simons Genome Diversity Project (Mallick et al. 2016); BBA, Ballito Bay A (Schlebusch et al. 2017). The locations chosen for the CEU, GIH, and MXL reflect the ancestry of the population (not the sampling location). (B) Sample locations across Africa. Populations in boldface display newly sequenced individuals. (C) Genetic (autosomal) variation for three population groups: Khoe-San, other sub-Saharan Africans, and non-Africans. The summary statistics were calculated on the joint KSP and HGDP group called data set to avoid biases. The KSP and HGDP data sets were both sequenced on Illumina platforms. Note that the HGDP San individual was not included in the metrics shown here. Heterozygosity was computed from the number of variable positions divided by number of sequenced positions, and averaged across individuals. Mean total runs of homozygosity (ROH) displays the sum over the lengths 0.2–1 Mb. Average (across the genome) number of distinct alleles (allelic richness) and average number of alleles unique to a single population (private allelic richness) in a sample of eight haploid genomes per variable site. Standard errors were calculated. For heterozygosity, it is the standard error of the mean per individual, averaged across individuals. For ROH, it is the standard error of the mean of individuals. Standard errors for heterozygosity and for allelic richness were very small (<0.08%, see supplementary sections 5.2, 5.3, and 5.5, Supplementary Material online, for details). (D) Private allelic richness (per variable site) of alleles shared by pairwise combinations of the five Khoe-San populations. We distinguish three groups: northern San (Ju|’hoansi and !Xun), central San (|Gui and ‖Gana), and southern San (Nama and Karretjie). (E) Venn diagram summarizing private and shared variants in the Khoe-San versus other Africans versus non-Africans.

Fig. 2.

Grouped bar-plots summarizing private allele sharing as a fraction of the total number of variant sites in the data set: (A) Privately shared alleles of various Khoe-San groups with comparative groups. (B) Privately shared alleles of comparative groups.

In a set of 99 sequenced individuals (from 31 populations), we inferred population stratification across the globe (∼27 million variants; supplementary figs. S6.1, S6.4, and S6.6, Supplementary Material online). The first two principal components (PCs) (supplementary fig. S6.1, Supplementary Material online) explained 7.5% of the global genetic variation and roughly divided it into three groups: non-Africans, Khoe-San, and other Africans. Subsequent PCs summarized variation in other African hunter-gatherer groups (eastern- and western-rainforest hunter-gatherers and Hadza), as well as variation within the Khoe-San (northern, southern, and central) (supplementary fig. S6.1, Supplementary Material online). Variation among non-Africans first became visible at PC20 (we note, however, that the African data set was larger than the non-African data set, 60 vs. 39 individuals). This PCA, based on a globally representative, whole-genome data set, illustrates the extent of African diversity and is a reflection of global genetic diversity, in contrast to inferences based on SNP genotypes, where non-African variation is magnified through ascertainment bias and sample bias (supplementary section 6, Supplementary Material online).
We found a distinct signal of eastern African/non-African affinity and shared private variants among the Khoe-San, particularly for the Nama (supplementary section 6.7, Supplementary Material online, fig. 2, and supplementary figs. S6.1–S6.7 and S6.16–S6.19, Supplementary Material online). This outcome is consistent with recent migration of mixed (eastern African-Eurasian) herding groups to southern Africa, and potentially long-term gene flow between eastern African hunter-gatherers (e.g., Hadza) and Khoe-San (Pickrell et al. 2012, 2014; Schlebusch et al. 2012, 2017; Breton et al. 2014; Macholdt et al. 2014; Skoglund et al. 2017). This pattern can also be seen in mtDNA and Y chromosome data (supplementary section 5.10, Supplementary Material online) (Naidoo et al. 2020), with haplogroup sharing detected between the Ju|’hoansi and Hadza. We estimated population divergence between the Khoe-San and various other groups using different and complementary approaches (Gronau et al. 2011; Schlebusch et al. 2017). We applied a mutation rate of 1.25 × 10⁻⁸ per base pair per generation and a generation time of 30 years to convert estimates to years ago in the past (unscaled estimates, means, medians, and standard deviations are available in supplementary tables S7.1 and S7.2, Supplementary Material online). Consistent with previous studies (Gronau et al. 2011; Veeramah et al. 2012; Schlebusch et al. 2012, 2017; Schlebusch and Jakobsson 2018), the deepest divergences included the Khoe-San populations (fig. 3 and supplementary tables S7.1 and S7.2 and figs. S7.1, S7.2, and S7.6, Supplementary Material online); a result probably not caused by “archaic admixture” into the Khoe-San (supplementary section 10 and fig. S10.1, Supplementary Material online). Modern-day Khoe-San have, however, >10% of their genetic material tracing to a recent admixture with external groups (Schlebusch et al. 2017; Skoglund et al. 2017).
By sequencing the genome of the Stone Age boy from Ballito Bay (BBA), South Africa, the deepest population divergence in Homo sapiens was estimated to 350,000–260,000 years ago (Schlebusch et al. 2017). Consistent with the recent admixture into all modern-day Khoe-San groups, which reduces population divergence time estimates (Schlebusch et al. 2017) (supplementary section 8 and figs. S7.2, S7.4, and S8.2, Supplementary Material online), we found the mean divergence time of all Khoe-San populations from all other groups to be within the 200–300 ka (kiloannum: thousand years ago) range (supplementary tables S7.1 and S7.2, Supplementary Material online, and fig. 3). These dates correlate well with previous estimates (Gronau et al. 2011; Veeramah et al. 2012) that also fall within the 200–300 ka range when applying the mutation rate used here. The Ju|’hoansi (with the lowest level of recent admixture) had a point estimate of ∼270 ka (∼9,000 generations), SD 20 ka (GPhoCS method; TT method: ∼260 ka, SD 12 ka), whereas the Nama (with the greatest level of recent admixture) had a point estimate of ∼210 ka, SD 30 ka (TT method: ∼210 ka, SD 30 ka; supplementary tables S7.1 and S7.2, Supplementary Material online). The Mbuti then diverged ∼220 ka, SD 10 ka (TT method: 215 ka, SD 9 ka), with the other population divergences occurring subsequently. We inferred a mean divergence time of ∼160 ka, SD 20 ka (TT method: ∼190 ka, SD 20 ka) among the different San groups, consistent with previous estimates (Schlebusch et al. 2017).
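The conversion from generations to calendar time used for these estimates can be illustrated with the two constants stated in the text (a mutation rate of 1.25 × 10⁻⁸ per base pair per generation and a generation time of 30 years). The sketch below assumes that an unscaled divergence estimate is expressed as expected mutations per base pair (mutation rate × generations). That scaling is an assumption made here for illustration, and the actual output units of GPhoCS or the TT method may differ. Function names are hypothetical.

```python
MUTATION_RATE = 1.25e-8       # per base pair per generation (stated in the text)
GENERATION_TIME_YEARS = 30    # years per generation (stated in the text)


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in generations to years."""
    return generations * generation_time


def scaled_divergence_to_years(tau, mutation_rate=MUTATION_RATE,
                               generation_time=GENERATION_TIME_YEARS):
    """Convert tau (assumed: expected mutations per bp) to years.

    Assumes tau = mutation_rate * generations; this scaling is an
    illustrative assumption, not the documented output of any tool.
    """
    return tau / mutation_rate * generation_time


# The Ju|'hoansi point estimate of ~9,000 generations quoted above.
juhoansi_years = generations_to_years(9_000)
print(juhoansi_years)
```

With 30-year generations, ∼9,000 generations corresponds to the ∼270 ka quoted for the Ju|’hoansi. Because the dates scale linearly with both constants, a different assumed mutation rate or generation time would shift all divergence estimates proportionally.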

Fig. 3.

Population divergence estimates. (A) Schematic overview of the estimated population divergences. The colored nodes correspond to the population divergences that were estimated with the TT method and GPhoCS, and the estimates are presented in (B). (B) Distribution of divergence time estimates based on GPhoCS (unscaled estimates, means, and medians available in supplementary table S7.1, Supplementary Material online) and mean ± standard error of the divergence time estimated with the TT method (supplementary table S7.2, Supplementary Material online).

We note that the population history of humans may not always be well represented by divergence models, as gene flow often occurs among human groups, and isolation-by-distance models may sometimes be better descriptions (Vicente et al. 2019). For instance, there is distinct sharing of private alleles between the !Xun/Ju|’hoansi (who traditionally live in the northwestern part of southern Africa) and Mbuti central African rainforest foragers, indicating gene flow across south-central Africa (fig. 2). The indigenous southern African hunter-gatherer genetic component might thus have extended far beyond southern Africa in the past (Skoglund et al. 2017; Henn et al. 2018; Scerri et al. 2018, 2019; Schlebusch and Jakobsson 2018; Vicente et al. 2019). A likely consequence is that all population divergence estimates should be interpreted as lower bounds and that the actual population structure could be much older.

The effective ancestral population size (Ne) of currently living individuals can be estimated from genome data (Li and Durbin 2011), and the resolution for certain time periods can be affected by evaluating different numbers of genomes, with increasing numbers improving resolution closer to the present day (Schiffels and Durbin 2014).
All human groups were inferred to have had an Ne of ∼30,000 about 300 ka, with a reduction in estimated effective size starting around 150–100 ka (assuming a mutation rate of 1.25 × 10⁻⁸ per base pair per generation and a generation time of 30 years; fig. 4 and supplementary fig. S7.11, Supplementary Material online). Non-African populations reached their lowest level (Ne ∼2,000) in the bottleneck around 80 ka, coinciding with the Homo sapiens Out-of-Africa migration event (Nielsen et al. 2017). Surprisingly, most African populations also showed a reduction in estimated Ne during this period, reaching ∼1/3 of the previous Ne (fig. 4 and supplementary figs. S7.10 and S7.11, Supplementary Material online). The decline in effective population sizes appears to be the largest among eastern African populations, followed by western Africans, and subsequently by the rainforest hunter-gatherer populations. Khoe-San groups seem to be the least affected; however, the genome of the 2,000-year-old Ballito Bay boy (unaffected by recent admixture into Khoe-San groups) also showed a reduction in effective population size (fig. 3 Schlebusch et al. 2017).

Fig. 4.

Estimates of effective population size across time. (A) Effective population sizes estimated for autosomal data from single individuals (i.e., two chromosomes) for the Khoe-San (average over the five individuals in each population), the HGDP individuals, and the Stone Age southern African Ballito Bay A boy (BBA; Schlebusch et al. 2017). (B) African temperature variation estimated from the reconstruction of sea surface temperature in the southwestern Indian Ocean (Caley et al. 2018). (C) Khoe-San effective population sizes estimated from single individuals (“two chromosomes,” solid gray), pairs of individuals (“four chromosomes,” solid colored lines), and five individuals (“ten chromosomes,” colored dotted lines). The curves are averaged over all MSMC runs for all different combinations of individuals (respectively, five, ten, and one).

If we jointly analyze two individuals (four haploid genomes) instead of one, it should provide more resolution on the timing of the bottleneck (Schiffels and Durbin 2014), because the mean time to first coalescence for four haploid genomes is 85 ka (assuming an average ancestral Ne of 17,000 and a generation time of 30 years). For this analysis, we found that all Khoe-San groups showed a reduction to 1/3 of the previous Ne between 100 and 20 ka (fig. 4 and supplementary fig. S7.12, Supplementary Material online). The same pattern was also observed with samples of five individuals (ten haploid genomes), though it could not be detected with samples of one single modern-day Khoe-San individual (fig. 4 and supplementary fig. S7.12, Supplementary Material online). We simulated data under a bottleneck model and ran MSMC on samples of one, two, four, and five individuals under a range of varying conditions of bottleneck strength, duration, and age.
From this investigation, we observed a qualitatively similar pattern (supplementary sections 7.3 and 9, Supplementary Material online) of reduced power to infer population-size changes around 80 ka when basing the inference on single genomes. Thus, all human groups appeared to have suffered reduced Ne, of varying degrees, between ∼100 and ∼20 ka, declining to between 50% and 10% of an Ne of ∼30,000 at ∼300 ka. We note that Ne does not necessarily capture the census size and that population structure clearly can impact the estimates of Ne (Mazet et al. 2016). However, in terms of population genetics and understanding of past population histories, estimates of Ne are informative as they tell us about the rate of genetic drift, which in turn can be important for understanding the evolutionary history.

With the 25 complete genomes from Khoe-San individuals that represent one of two legs of the deepest population divergence in Homo sapiens, we have a unique opportunity to search for regions in the genome that display an unusual signal of high numbers of derived variants among all groups of humans. This pattern will be an indicator of distinct adaptation prior to the deepest population divergence, >300,000 years ago. We developed and investigated three Population Branch Statistic (PBS)-derived analyses (supplementary section 12, Supplementary Material online; Schlebusch et al. 2012) that target different parts of human evolutionary history (fig. 5 and supplementary section 12 and table S12.2, Supplementary Material online) and used the 3P-CLR (Racimo 2016) statistic to investigate adaptation in the lineage leading to Homo sapiens.
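The mean time to first coalescence of 85 ka quoted above for four haploid genomes can be reproduced with a standard coalescent expectation, although the text does not state which formula was used. The sketch below assumes that each pair of lineages coalesces at rate 1/Ne per generation (that is, Ne = 17,000 is treated as the number of gene copies), so that the expected time to the first coalescence among k lineages is Ne divided by the number of pairs. Under the alternative diploid scaling, with a pairwise rate of 1/(2Ne), the same inputs would give twice this value. The function name and the `ploidy_factor` parameter are hypothetical.

```python
from math import comb


def mean_time_to_first_coalescence(n_lineages, ne, generation_time=30,
                                   ploidy_factor=1):
    """Expected time (years) until the first coalescence among n_lineages.

    Each pair is assumed to coalesce at rate 1 / (ploidy_factor * ne) per
    generation. ploidy_factor=1 reproduces the 85 ka quoted in the text;
    ploidy_factor=2 is the usual diploid scaling. Which scaling the authors
    used is an assumption here.
    """
    if n_lineages < 2:
        raise ValueError("need at least two lineages")
    n_pairs = comb(n_lineages, 2)
    generations = ploidy_factor * ne / n_pairs
    return generations * generation_time


four_genomes_years = mean_time_to_first_coalescence(4, 17_000)
two_genomes_years = mean_time_to_first_coalescence(2, 17_000)
print(round(four_genomes_years), round(two_genomes_years))
```

Under either scaling, four haploid genomes contain six pairs of lineages instead of one, so the first coalescence is expected six times more recently than with a single individual. This is consistent with the improved resolution for the 100–20 ka period described above.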

Fig. 5.

Signatures of adaptation in the genomes. (A) Schematic overview of the three different population branch statistic (PBS) based analyses. The different PBS-based statistics are designed to capture adaptation signals in different parts of the phylogeny. (B) Manhattan plot of the archaicPBS statistic across the genome (supplementary fig. S12.3, Supplementary Material online, displays the aPBS and the emhPBS results). The eight dashed red lines show all the top-five peaks among the three PBS statistics (they are highly correlated). The most likely candidate genes are written below the peaks with genes involved with brain functions, immune system, and other functions indicated in blue, green, and black, respectively. The dashed horizontal line shows the 99.9th percentile of the archaicPBS statistic for these data. (C) A close-up of the strongest peak for archaicPBS, which is located upstream of the gene LPHN3. (D) An example of a local selection signal in southern Khoe-San. |iHS| for southern Khoe-San is shown in green, |iHS| for northern Khoe-San in red, and XP-EHH in purple. The strong negative XP-EHH values suggest adaptation in southern Khoe-San.

Four of the top-ten 3P-CLR peaks and four of the eight top-five regions for the three PBS-statistics (because there is overlap among the PBS top lists, the three top-five lists sum up to eight genomic regions) can be linked to selection for brain development (supplementary section 12, Supplementary Material online). The region with the strongest signal common to all three PBS statistics implicates the LPHN3 (latrophilin 3) gene on chromosome 4 (fig. 5), which has an important function in determining the connectivity rates between the principal neurons in the cortex, and the gene is associated with attention deficit–hyperactivity disorder (Lu et al. 2015).
For several of these genes, there is also a strong effect on skull morphology, in addition to the brain-associated effect (see supplementary section 12, Supplementary Material online), a result that has been reported previously (Green et al. 2010; Schlebusch et al. 2012). Furthermore, the regions with strong signals of adaptation in the lineage leading to Homo sapiens are enriched for brain development genes in gene ontology (GO) analyses (Kofler and Schlotterer 2012) (supplementary tables S12.1 and S12.5, Supplementary Material online). Immune response genes also overlap with signals of adaptation in the lineage leading to Homo sapiens. For instance, the third and fourth strongest 3P-CLR signals and two of the top-five regions for the PBS-statistics overlap with immune response genes (supplementary sections 12.1 and 12.2, Supplementary Material online). Additional strong signals are found for genes in sperm/flagellum motility (supplementary sections 12.1 and 12.2, Supplementary Material online); for example, the DNAL1 gene expressed in motile flagella is located in the region with the strongest 3P-CLR signal (supplementary section 12.1, Supplementary Material online) and the flagellum category is an enriched GO-term in two of the three PBS statistics (supplementary tables S12.3 and S12.5, Supplementary Material online). We note that identifying targets of selection in early humans, several hundred thousands of years ago, is a difficult problem and that, similar to previous studies (Schlebusch et al. 2012; Racimo et al. 2014; Racimo 2016), our approach also results in a list of potential targets of selection, which need further investigation. However, although there is modest overlap with previous studies, the emerging trend of these investigations points to some similarity in gene functions (Green et al. 2010; Schlebusch et al. 2012; Racimo et al. 2014; Racimo 2016). 
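For readers unfamiliar with the statistic underlying these scans, the sketch below shows the standard three-population PBS: pairwise FST values are log-transformed into branch lengths, and the length of the branch leading to the focal population is isolated. This is the generic form only. The three PBS-derived statistics used in this study (archaicPBS, aPBS, and emhPBS) are modifications described in supplementary section 12 and are not reproduced here. The function names and example FST values are hypothetical.

```python
import math


def fst_to_branch_length(fst):
    """Log-transform a pairwise FST into an additive branch length."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_focal_b, fst_focal_c, fst_b_c):
    """Standard population branch statistic for the focal population.

    Takes pairwise FST between the focal population and two comparison
    populations (B and C), plus FST between B and C.
    """
    t_ab = fst_to_branch_length(fst_focal_b)
    t_ac = fst_to_branch_length(fst_focal_c)
    t_bc = fst_to_branch_length(fst_b_c)
    return (t_ab + t_ac - t_bc) / 2.0


# Hypothetical FST values for a single genomic window.
example_pbs = pbs(0.30, 0.35, 0.10)
print(round(example_pbs, 4))
```

A large value indicates that allele frequencies in the focal population differ from both comparison populations more than those two differ from each other, which is the pattern expected after selection on the focal branch.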
In addition to adaptation in the lineage leading to Homo sapiens, we searched for gene regions targeted by selection in specific groups, that is, local adaptation signals using haplotype-based methods for within-population (iHS; Voight et al. 2006) and between-population comparisons (XP-EHH; Sabeti et al. 2007) (see supplementary section 11, Supplementary Material online). Signals of local adaptation frequently overlapped with genes involved in immune response to infectious diseases in several of the analyses and on different levels of population groupings. For instance, within the northern Khoe-San the strongest signal overlapped with the MHC region (supplementary table S11.1 and fig. S11.1, Supplementary Material online), and the two strongest signals in the southern Khoe-San were found close to the MHC region, near several genes coding for immunoglobulins (supplementary table S11.3 and fig. S11.1, Supplementary Material online). When contrasting the northern and southern Khoe-San, two other regions within the MHC were identified as strong targets of adaptation (in the top-ten regions in XP-EHH analysis; supplementary table S11.2 and fig. S11.2, Supplementary Material online). GO-term analyses (Kofler and Schlotterer 2012) show enrichment for immune response genes among the adaptation signals in the southern Khoe-San as well as in other Africans (supplementary table S11.9, Supplementary Material online). Previous studies found the MHC region to be a common target of selection in various Khoe-San groups (Schlebusch et al. 2012; Owers et al. 2017; Sugden et al. 2018) as well as other populations (Pickrell et al. 2009). The greatest single iHS-value in the northern Khoe-San overlaps with the anthrax toxin receptor-like pseudogene 1 (ANTXRLP1, on chromosome 10), which is near the anthrax toxin receptor-like (ANTXRL) gene.
Anthrax is endemic to Namibia, where many of the northern San groups live, and causes intense sporadic disease outbreaks affecting wild animals and humans (Turner et al. 2013). This signal has not been reported previously. In summary, immune system-related genes appear to be targets of adaptation irrespective of time and group, but with slightly different genes involved, which, sometimes, can be directly linked to local and endemic disease conditions. Signals of local adaptation also overlap with genes associated with diet. For instance, the FRRS1 gene involved in dietary absorption of iron shows a strong signal in the northern Khoe-San (supplementary table S11.2 and supplementary section 11.5, Supplementary Material online), and the SLCO1B3 gene that mediates fat metabolism and uptake of xenobiotic compounds shows a strong adaptive signal in the southern Khoe-San (the genome-wide greatest single iHS-value; fig. 5 and supplementary table S11.3 and supplementary section 11.6, Supplementary Material online). Adaptation to increased metabolism of endo- and xenobiotics (Schuster et al. 2010) and fat storage (Sugden et al. 2018) has been reported previously for Khoe-San groups. The genome-wide greatest signal of group-specific adaptation (supplementary table S11.8, Supplementary Material online) overlaps with the MINPP1 gene region, which codes for the only enzyme known to hydrolyze phytic acid in humans. Phytic acid stores phosphorus in many plant tissues, particularly in bran, seeds, cereals, and grains. Phytic acid is not digested by humans, but it chelates minerals and vitamins and tends to decrease their uptake from food (supplementary section 11.8, Supplementary Material online). The sign of the signal indicates that this gene has been under much stronger selection in the non-Khoe-San group than in the Khoe-San group.
This signal has not been reported previously and is an ideal candidate for future studies that focus on potential targets of selection related to the change in food-producing lifeways. Genes involved in skeletal muscle development show signals of adaptation, specifically among the Khoe-San populations (supplementary sections 11.5–11.7, Supplementary Material online). In southern Khoe-San, two strong selection signals (the second strongest XP-EHH signal and the widest XP-EHH signal) both implicate genes associated with muscle function (the DTNB gene and the NAA35 gene; supplementary table S11.4, Supplementary Material online), the SNTB1 gene was among the top-ten XP-EHH regions in northern Khoe-San (supplementary table S11.2, Supplementary Material online), and the strongest iHS signal in the Khoe-San group as a whole overlaps with the PPP1R12B gene region that plays a regulatory role in muscle contraction (supplementary table S11.5, Supplementary Material online). Selection acting on genes related to muscle development and function has been reported previously for Khoe-San groups (Schlebusch et al. 2012) and other populations (Pickrell et al. 2009). Interestingly, the DTNB gene specifically also appeared in the top 1% of selected genes in East Asians, the SNTB1 gene in the top 1% in Oceania (it was the top-11th iHS signal), and the PPP1R12B gene in the top 1% in Bantu-speaking groups (it was the top-14th iHS signal) (Pickrell et al. 2009). Based on the complete genomes, we also examined the distribution of loss-of-function (LOF) variants in the Khoe-San and estimated levels of functional significance (supplementary section 5.9, Supplementary Material online). Biological functions associated with LOF variants in the Khoe-San included the detection of chemical stimuli (smell and taste), receptor activity, immune response, and keratin/intermediate filaments (supplementary table S5.7, Supplementary Material online).
We found two genes, CASP12 and FMO2, whose functional forms are close to completely lost in most non-African populations but are found at moderate to high frequencies among the 25 Khoe-San individuals. The functional form of the CASP12 gene was found at 48% among the 25 Khoe-San individuals, whereas the global average is around 5%. The loss of the Caspase-12 protein has been associated with an increased risk of sepsis, as it is involved in the downregulation of inflammatory cytokines (Saleh et al. 2004, 2006). Although the nonfunctional form of FMO2 is close to fixation in most populations, the functional form was found among the Khoe-San at 60%. The gene product is an enzyme that metabolizes thiourea but, in doing so, produces toxic derivatives (Veeramah et al. 2008). Carriers of the functional allele may be at increased risk for pulmonary toxicity when exposed to thiourea, which is present in a wide range of industrial, household, and medical products. The high frequencies of these functional alleles in the Khoe-San may point to differing selective pressures experienced in the past by these populations.
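
To make the reported frequencies concrete, the following minimal Python sketch shows how an allele frequency is obtained from diploid genotype counts. It assumes that all 25 individuals were genotyped at both sites (50 chromosomes), so that 48% and 60% would correspond to 24 and 30 functional copies. The genotype splits used below are hypothetical and chosen only to reproduce those frequencies; the per-genotype counts are not given in the text, and the function name `allele_frequency` is illustrative.

```python
def allele_frequency(n_hom_functional, n_het, n_hom_nonfunctional):
    """Frequency of the functional allele from diploid genotype counts.

    Each homozygous-functional individual carries two functional copies,
    each heterozygote carries one, and every individual contributes two
    chromosomes to the denominator.
    """
    n_individuals = n_hom_functional + n_het + n_hom_nonfunctional
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    functional_copies = 2 * n_hom_functional + n_het
    return functional_copies / (2 * n_individuals)


# HYPOTHETICAL genotype splits for 25 individuals (not taken from the paper);
# only the resulting frequencies (48% and 60%) match the reported values.
casp12 = allele_frequency(n_hom_functional=6, n_het=12, n_hom_nonfunctional=7)
fmo2 = allele_frequency(n_hom_functional=9, n_het=12, n_hom_nonfunctional=4)

print(f"CASP12 functional allele frequency: {casp12:.2f}")
print(f"FMO2 functional allele frequency: {fmo2:.2f}")
```

Many different genotype splits give the same allele frequency, so the frequency alone does not reveal how many individuals are homozygous for the functional form.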

Conclusion

The genetic diversity among the Khoe-San is the greatest among all human groups across the world, which is explained in part by relatively recent (pre-colonial) admixture. When the admixed DNA portion was excluded, the genetic diversity of the Khoe-San approached levels seen in other African populations. All human groups, including the Khoe-San, showed a reduction in Ne (between 1/3 and 1/10) between ∼100 and 20 ka (fig. 4). The early phase of the reduction coincides with the Out-of-Africa bottleneck for non-Africans. Sub-Saharan African populations would not have been impacted by this migration bottleneck, but they all (including the Khoe-San) show a reduction in Ne (fig. 4). This observation suggests that an additional factor, beyond the migration out of Africa, impacted all humans at this time, perhaps the change in climate. For example, work on the Lake Malawi core indicates severe drought and a low lake stage between ∼109 and 92 ka, when the area was also shifting from leaf- to grass-dominated vegetation (Veeramah et al. 2008; Beuning et al. 2011; Scholz et al. 2011), which roughly aligns with a change from warm toward colder temperatures for Africa (fig. 4). These events may have caused a reduction in the number of humans, potentially also driving them out of arid African regions, such as the Sahara, and into western Asia. By revealing substantial and previously unknown genetic variation, we demonstrate that a sizable portion of human genetic variation, including common variants, remains undiscovered among populations often overlooked in medical genetics. We inferred adaptation signals in the genomes and found an overrepresentation of these signals overlapping immunity genes, irrespective of group or time period. This suggests that immunity genes have been under selection throughout human evolutionary history and across the globe.

Materials and Methods

A full description of materials and methods is included in the Supplementary Material online.

45 in total

1. A draft sequence of the Neandertal genome.

Authors: Johannes Krause; Adrian W Briggs; Tomislav Maricic; Udo Stenzel; Martin Kircher; Nick Patterson; Richard E Green; Heng Li; Weiwei Zhai; Markus Hsi-Yang Fritz; Nancy F Hansen; Eric Y Durand; Anna-Sapfo Malaspinas; Jeffrey D Jensen; Tomas Marques-Bonet; Can Alkan; Kay Prüfer; Matthias Meyer; Hernán A Burbano; Jeffrey M Good; Rigo Schultz; Ayinuer Aximu-Petri; Anne Butthof; Barbara Höber; Barbara Höffner; Madlen Siegemund; Antje Weihmann; Chad Nusbaum; Eric S Lander; Carsten Russ; Nathaniel Novod; Jason Affourtit; Michael Egholm; Christine Verna; Pavao Rudan; Dejana Brajkovic; Željko Kucan; Ivan Gušic; Vladimir B Doronichev; Liubov V Golovanova; Carles Lalueza-Fox; Marco de la Rasilla; Javier Fortea; Antonio Rosas; Ralf W Schmitz; Philip L F Johnson; Evan E Eichler; Daniel Falush; Ewan Birney; James C Mullikin; Montgomery Slatkin; Rasmus Nielsen; Janet Kelso; Michael Lachmann; David Reich; Svante Pääbo
Journal: Science Date: 2010-05-07 Impact factor: 47.728

2. Structural Basis of Latrophilin-FLRT-UNC5 Interaction in Cell Adhesion.

Authors: Yue C Lu; Olha V Nazarko; Richard Sando; Gabriel S Salzman; Nan-Sheng Li; Thomas C Südhof; Demet Araç
Journal: Structure Date: 2015-07-30 Impact factor: 5.006

3. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history.

Authors: Carina M Schlebusch; Pontus Skoglund; Per Sjödin; Lucie M Gattepaille; Dena Hernandez; Flora Jay; Sen Li; Michael De Jongh; Andrew Singleton; Michael G B Blum; Himla Soodyall; Mattias Jakobsson
Journal: Science Date: 2012-09-20 Impact factor: 47.728

4. Beyond multiregional and simple out-of-Africa models of human evolution.

Authors: Eleanor M L Scerri; Lounès Chikhi; Mark G Thomas
Journal: Nat Ecol Evol Date: 2019-10 Impact factor: 15.460

5. Bayesian inference of ancient human demography from individual genome sequences.

Authors: Ilan Gronau; Melissa J Hubisz; Brad Gulko; Charles G Danko; Adam Siepel
Journal: Nat Genet Date: 2011-09-18 Impact factor: 38.330

6. The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors: Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal: Nature Date: 2013-12-18 Impact factor: 49.962

7. Genetic Affinities among Southern Africa Hunter-Gatherers and the Impact of Admixing Farmer and Herder Populations.

Authors: Mário Vicente; Mattias Jakobsson; Peter Ebbesen; Carina M Schlebusch
Journal: Mol Biol Evol Date: 2019-09-01 Impact factor: 16.240

8. A map of recent positive selection in the human genome.

Authors: Benjamin F Voight; Sridhar Kudaravalli; Xiaoquan Wen; Jonathan K Pritchard
Journal: PLoS Biol Date: 2006-03-07 Impact factor: 8.029

9. Localization of adaptive variants in human genomes using averaged one-dependence estimation.

Authors: Lauren Alpert Sugden; Elizabeth G Atkinson; Annie P Fischer; Stephen Rong; Brenna M Henn; Sohini Ramachandran
Journal: Nat Commun Date: 2018-02-19 Impact factor: 14.919

10. Y-Chromosome Variation in Southern African Khoe-San Populations Based on Whole-Genome Sequences.

Authors: Thijessen Naidoo; Jingzi Xu; Mário Vicente; Helena Malmström; Himla Soodyall; Mattias Jakobsson; Carina M Schlebusch
Journal: Genome Biol Evol Date: 2020-07-01 Impact factor: 3.416
