Mark Lipson1, Isabelle Ribot2, Swapan Mallick3,4,5, Nadin Rohland3, Iñigo Olalde3,6, Nicole Adamski3,5, Nasreen Broomandkhoshbacht3,5,7, Ann Marie Lawson3,5, Saioa López8, Jonas Oppenheimer3,5,9, Kristin Stewardson3,5, Raymond Neba'ane Asombang10, Hervé Bocherens11,12, Neil Bradman8,13, Brendan J Culleton14, Els Cornelissen15, Isabelle Crevecoeur16, Pierre de Maret17, Forka Leypey Mathew Fomine18, Philippe Lavachery19, Christophe Mbida Mindzie10, Rosine Orban20, Elizabeth Sawchuk21, Patrick Semal20, Mark G Thomas8,22, Wim Van Neer20,23, Krishna R Veeramah24, Douglas J Kennett25, Nick Patterson3,26, Garrett Hellenthal8,22, Carles Lalueza-Fox6, Scott MacEachern27, Mary E Prendergast3,28, David Reich3,4,5,26.
1. Department of Genetics, Harvard Medical School, Boston, MA, USA. mlipson@genetics.med.harvard.edu. 2. Département d'Anthropologie, Université de Montréal, Montreal, Quebec, Canada. 3. Department of Genetics, Harvard Medical School, Boston, MA, USA. 4. Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 5. Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA. 6. Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain. 7. Department of Anthropology, University of California, Santa Cruz, CA, USA. 8. UCL Genetics Institute, University College London, London, UK. 9. Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA. 10. Department of Arts and Archaeology, University of Yaoundé I, Yaoundé, Cameroon. 11. Department of Geosciences, Biogeology, University of Tübingen, Tübingen, Germany. 12. Senckenberg Research Centre for Human Evolution and Palaeoenvironment, University of Tübingen, Tübingen, Germany. 13. The Henry Stewart Group, London, UK. 14. Institutes of Energy and the Environment, Pennsylvania State University, University Park, PA, USA. 15. Department of Cultural Anthropology and History, Royal Museum for Central Africa, Tervuren, Belgium. 16. CNRS, UMR 5199-PACEA, Université de Bordeaux, Bordeaux, France. 17. Faculté de Philosophie et Sciences Sociales, Université Libre de Bruxelles, Brussels, Belgium. 18. Department of History and African Civilization, University of Buea, Buea, Cameroon. 19. Agence Wallonne du Patrimoine, Service Public de Wallonie, Namur, Belgium. 20. Royal Belgian Institute of Natural Sciences, Brussels, Belgium. 21. Department of Anthropology, Stony Brook University, Stony Brook, NY, USA. 22. Department of Genetics, Evolution and Environment, University College London, London, UK. 23. Department of Biology, University of Leuven, Leuven, Belgium. 24. Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA. 25. Department of Anthropology, University of California, Santa Barbara, CA, USA. 26. Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA. 27. Division of Social Science, Duke Kunshan University, Kunshan, China. 28. Department of Sociology and Anthropology, Saint Louis University, Madrid, Spain.
Abstract
Our knowledge of ancient human population structure in sub-Saharan Africa, particularly prior to the advent of food production, remains limited. Here we report genome-wide DNA data from four children (two buried approximately 8,000 years ago and two buried 3,000 years ago) from Shum Laka (Cameroon), one of the earliest known archaeological sites within the probable homeland of the Bantu language group [1-11]. One individual carried the deeply divergent Y chromosome haplogroup A00, which today is found almost exclusively in the same region [12, 13]. However, the genome-wide ancestry profiles of all four individuals are most similar to those of present-day hunter-gatherers from western Central Africa, which implies that populations in western Cameroon today, as well as speakers of Bantu languages from across the continent, are not descended substantially from the population represented by these four people. We infer an Africa-wide phylogeny that features widespread admixture and three prominent radiations, including one that gave rise to at least four major lineages deep in the history of modern humans.
The deposits at Shum Laka, a rockshelter located in the Grassfields region of western Cameroon, are among the most important archaeological sources for the study of Late Pleistocene and Holocene prehistory in West-Central Africa [1-4]. The oldest human-occupied layers at the site date to ∼30,000 calendar years before present (BP), but of special interest are a series of artifacts and skeletons from ∼8000–3000 BP, between the Later Stone Age (LSA) and the Iron Age (Extended Data Fig. 1; Supplementary Information section 1). This transitional period, sometimes referred to as the Stone to Metal Age (SMA), featured a gradual appearance of new stone tools as well as pottery [3-5]. Subsistence evidence in the rockshelter during the SMA points primarily to foraging, but with increasing use of fruits from Canarium schweinfurthii coinciding with developments in material culture and serving as a foundation for later agriculture [3] (Supplementary Information section 1; Supplementary Table 1). These cultural changes and their early appearance at Shum Laka are particularly intriguing because the Cameroon/Nigeria border area during the late Holocene was likely the cradle of Bantu languages, and of populations whose descendants would spread across much of the southern half of Africa between ∼3000 and 1500 BP, resulting in the vast range and diversity of the Bantu language family [6-11].
A total of 18 human skeletons have been discovered at Shum Laka, from two distinct burial phases (Supplementary Information section 1) [1-3]. We attempted to retrieve DNA from six petrous bone samples and obtained usable data from two early SMA and two late SMA individuals (∼8000 and ∼3000 BP, respectively; Table 1, Supplementary Table 2). The two earlier individuals, a boy of 4±1 years (2/SE I) lying on top of the lower limbs of an adolescent male of 15±3 years (2/SE II) [2], were recovered from a primary double burial, while the two later individuals, a boy of 8±2 years (4/A) and a girl of 4±1 years (5/B) [2], were in adjacent single burials.
Table 1:
Details for the four ancient Shum Laka individuals in the study
ID | Age at death (yrs) | Date (cal BP) | Radiocarbon date (uncal) | Sex | Mt hap | Y hap | Cov | SNPs | Mt/X contam (%)
2/SE I | 4±1 | 7920–7690 | 6985 ± 30 BP (PSUAMS-6307) | M | L0a2a1 | B | 0.70 | 564164 | 1.0/1.0
2/SE II | 15±3 | 7970–7800 | 7090 ± 35 BP (PSUAMS-6308) | M | L0a2a1 | A00 | 7.71 | 1082018 | 1.5/0.6
4/A | 8±2 | 3160–2970 | 2940 ± 20 BP (PSUAMS-6309) | M | L1c2a1b | B2b | 3.83 | 935777 | 0.3/0.5
5/B | 4±1 | 3210–3000 | 2970 ± 25 BP (PSUAMS-6310) | F | L1c2a1b | .. | 6.41 | 1014618 | 0.5/..
Calibrated direct radiocarbon dates are given as 95.4% CI (Methods). Age (mean ± SE) was determined from skeletal remains [2], and sex from genetic data. Mt/Y hap, mtDNA/Y-chromosome haplogroup; Cov, average sequencing coverage; Mt/X contam, estimated contamination from mtDNA/X chromosome; .., not applicable. See also Supplementary Table 2.
We extracted DNA from bone powder and prepared 2–4 libraries per individual for Illumina sequencing, enriching for ∼1.2 million target single-nucleotide polymorphisms (SNPs) across the genome (Methods; Supplementary Table 2). Final coverage ranged from 0.7× to 7.7× (0.56–1.08 million SNPs covered). Authenticity of the data was supported by the observed rate of apparent C-to-T substitutions in the final base of sequenced fragments (4–10%, within the expected range given our library preparation strategy [14]) and by low rates of apparent heterozygosity in mitochondrial DNA (mtDNA) and on the X chromosome in males (estimated contamination 0.3–1.5%). We also generated whole-genome shotgun sequence data for individuals 2/SE II (∼18.5× coverage) and 4/A (∼3.9×), as well as genome-wide data (∼598,000 SNPs) for 63 individuals from five present-day Cameroonian populations (Extended Data Table 1; Supplementary Table 3).
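The damage-based authenticity check above rests on a simple count: among fragments whose terminal reference base is C, how many read T. The following is a minimal sketch under loudly stated assumptions, not the pipeline used in this study: reads are already aligned without indels or clipping, are supplied as hypothetical (read, reference) string pairs oriented 5' to 3' along the fragment, and only the 5'-terminal position is tallied.

```python
from typing import Iterable, Tuple


def terminal_ct_rate(alignments: Iterable[Tuple[str, str]]) -> float:
    """Fraction of reads showing T where the reference has C at the 5'-terminal base.

    Each alignment is a hypothetical (read_seq, ref_seq) pair of equal length,
    oriented 5'->3' along the sequenced fragment (simplifying assumption:
    no indels, no soft clipping, no strand handling).
    """
    c_sites = 0  # reads whose 5'-terminal reference base is C
    ct_hits = 0  # of those, reads carrying T at that position
    for read_seq, ref_seq in alignments:
        if not read_seq or len(read_seq) != len(ref_seq):
            continue  # skip empty or malformed alignments
        if ref_seq[0].upper() == "C":
            c_sites += 1
            if read_seq[0].upper() == "T":
                ct_hits += 1
    # NaN signals that no informative (C-initial) fragments were seen.
    return ct_hits / c_sites if c_sites else float("nan")


# Hypothetical toy input: one of the two C-initial fragments shows C-to-T.
example = [("TAGG", "CAGG"), ("CAGG", "CAGG"), ("GATT", "GATT")]
print(terminal_ct_rate(example))  # 0.5
```

Whether a given rate counts as "within the expected range" depends on the library treatment, so the threshold is a judgment made outside this function.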
Extended Data Table 1:
Populations used in the study
Population | Country | Language family | Date | Sample size | Data type | Reference
Shum Laka | Cameroon | .. | ~8000–3000 BP | 4/1/1 | 1240k/DG/SG | This paper
Ancient Malawi HG | Malawi | .. | ~8100–2500 BP | 7* | 1240k | [22]
Mota | Ethiopia | .. | ~4500 BP | 1 | SG | [23]
Ancient South African HG | South Africa | .. | ~2000 BP | 3† | SG | [21,22]
Taforalt | Morocco | .. | ~15,000–14,000 BP | 6 | 1240k | [26]
Altai Neanderthal | Russia | .. | ~120,000 BP | 1 | DG | [78]
Aghem | Cameroon | NC | Present | 28 | HO | This paper
Bafut | Cameroon | NC | Present | 11 | HO | This paper
Baka | Cameroon | NC | Present | 2 | DG | [20]
Bakoko | Cameroon | NC | Present | 1 | HO | This paper
Bakola | Cameroon | NC | Present | 2 | DG | [20]
Bangwa | Cameroon | NC | Present | 2 | HO | This paper
Bedzan | Cameroon | NC | Present | 2 | DG | [20]
Fulani | Cameroon | NC | Present | 2 | DG | [20]
Lemande | Cameroon | NC | Present | 2 | DG | [25]
Mada | Cameroon | AA | Present | 2 | DG | [20]
Mbo | Cameroon | NC | Present | 21 | HO | This paper
Ngumba | Cameroon | NC | Present | 2 | DG | [20]
Tikar | Cameroon | NC | Present | 2 | DG | [20]
Agaw | Ethiopia | AA | Present | 2 | DG | [20]
Aka (Biaka) | Central African Republic | NC | Present | 20/2 | HO/DG | [22,25]
Chewa | Malawi | NC | Present | 11 | HO | [22]
Dinka | Sudan | NS | Present | 7/4 | HO/DG | [22,25]
French | France | IE | Present | 3 | DG | [25]
Hadza | Tanzania | KS | Present | 5(2)/1 | HO/DG | [22,25]
Han | China | ST | Present | 4 | DG | [25]
Herero | Namibia | NC | Present | 2 | DG | [25]
Khoesan | Namibia | KS | Present | 22 | HO | [22]
Mbuti | DR Congo | NC, NS | Present | 10/4 | HO/DG | [22,25]
Mende | Sierra Leone | NC | Present | 8/2 | HO/DG | [22,25]
Mursi | Ethiopia | NS | Present | 2 | DG | [20]
Sandawe | Tanzania | KS | Present | 22 | HO | [22]
Somali | Kenya | AA | Present | 13 | HO | [22]
Yoruba | Nigeria | NC | Present | 70/3 | HO/DG | [22,25]
List of populations used in analyses in the study. Data types are in-solution targeted SNP capture (1240k), whole-genome sequence with pseudo-haploid genotype calls (SG), high-coverage whole-genome sequence with diploid genotype calls (DG), and Human Origins SNP array (HO). For some populations, we used different sample sets for different analyses, indicated by slashes; Human Origins array genotyped individuals were used for PCA and for f-statistics testing differential relatedness to Shum Laka (Fig. 3B, Extended Data Fig. 3B). For Hadza, we used five individuals with Human Origins data for PCA and two of those five individuals for admixture graph modeling. HG, hunter-gatherers; AA, Afroasiatic; IE, Indo-European; KS, Khoesan; NC, Niger-Congo; NS, Nilo-Saharan; ST, Sino-Tibetan.
* Individuals from Hora, Chencherere, and Fingira.
† Individuals from Ballito Bay (A and B) and St. Helena Bay.
Uniparental markers and kinship analysis
All of the mtDNA and Y chromosome haplogroups we observe at Shum Laka are associated today with sub-Saharan Africans. The two earlier individuals carry mtDNA haplogroup L0a (specifically L0a2a1), which is widespread in Africa, while the two later individuals carry L1c (specifically L1c2a1b), which is found among both farmers and hunter-gatherers in Central and West Africa [15, 16]. Individuals 2/SE I and 4/A have Y chromosomes from macrohaplogroup B, often found today in Central African hunter-gatherers [17], while 2/SE II has the rare Y chromosome haplogroup A00, which was discovered in 2013 and is present at appreciable frequencies only in Cameroon, in particular among the Mbo and Bangwa in the western part of the country [12, 13]. A00 is the oldest known branch of the modern human Y chromosome tree, with a split time of ∼300,000–200,000 BP [12, 18, 19]. At 1666 positions (from whole-genome sequence data; Supplementary Table 4) that differ between present-day A00 [18] and all other Y chromosomes, the Shum Laka A00 carries the non-reference allele at 1521, translating to a within-A00 split at ∼37,000–25,000 BP (95% CI; Methods; Fig. 1).
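The A00 comparison above reduces to counting, at lineage-diagnostic positions, how often the ancient Y chromosome carries the A00-specific allele. The sketch below is a minimal illustration with a hypothetical input layout (dictionaries mapping position to allele) that skips uncovered sites. It deliberately stops at the counts: converting the shared fraction into a split date depends on the mutation-rate and calibration assumptions described in Methods, which are not reproduced here.

```python
from typing import Dict, Optional, Tuple


def count_shared_derived(
    diagnostic: Dict[int, str], sample_calls: Dict[int, Optional[str]]
) -> Tuple[int, int]:
    """Count diagnostic sites where a sample carries the lineage-specific allele.

    diagnostic: position -> allele found in present-day A00 and in no other
        Y lineage (hypothetical input format).
    sample_calls: position -> allele observed in the ancient sample, or None
        where the site is not covered.
    Returns (sites carrying the diagnostic allele, covered diagnostic sites).
    """
    shared = covered = 0
    for pos, allele in diagnostic.items():
        call = sample_calls.get(pos)
        if call is None:
            continue  # uncovered sites carry no information
        covered += 1
        if call == allele:
            shared += 1
    return shared, covered


# Hypothetical toy positions and calls.
diagnostic = {101: "A", 202: "G", 303: "T"}
calls = {101: "A", 202: "C", 303: None}
print(count_shared_derived(diagnostic, calls))  # (1, 2)
```

With the counts reported above, the shared fraction is 1521/1666 ≈ 0.913.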
Figure 1:
Y chromosome phylogeny.
Circles represent mutations along the (unrooted) A00 lineage where we observe the alternative (filled) or reference (empty) allele in the Shum Laka A00.
Leveraging the effects of chromosomal segments shared identical by descent (IBD), we computed rates of allelic identity for each pair of individuals to infer degrees of relatedness. Both contemporaneous pairs display elevated identity, with 2/SE I and 2/SE II at the level of fourth-degree relatives and 4/A and 5/B at the level of second-degree relatives (either uncle and niece, aunt and nephew, or half-siblings; Extended Data Fig. 2), supporting archaeological interpretations that the rockshelter was used as an extended family cemetery during both burial phases [2]. We would expect more recent shared ancestry for the contemporaneous pairs even if they were not closely related, but we observe clear signatures of long IBD segments across the genome, confirming their close family relatedness (Supplementary Information section 2). All four individuals also show evidence of recent inbreeding (i.e., intra-individual IBD).
Extended Data Figure 2:
Kinship analysis.
Shown are mean genome-wide allelic mismatch rates for each pair of individuals (blue), as well as intra-individual comparisons (red). We selected one read per individual at random at each targeted SNP (using all 1,233,013 targeted sites). Monozygotic twins (or intra-individual comparisons) are expected to have a value one-half as large as unrelated individuals; first-degree relatives, halfway between monozygotic twins and unrelated individuals; second-degree relatives, halfway between first-degree relatives and unrelated individuals; and so on. The presence of inbreeding also serves to reduce the rates of mismatches. For 4/A and 5/B, because both died as children, we can eliminate a grandparent-grandchild relationship, and the lack of long segments with both homologous chromosomes shared IBD implies that they are not double cousins (the few ostensible double-IBD stretches are likely a result of inbreeding; see Supplementary Information section 2). Thus, we can conclude that they were either uncle and niece (or aunt and nephew) or half-siblings. Bars show 99% confidence intervals (computed by block jackknife).
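The scaling rule in this caption can be made concrete. If U is the mismatch rate between unrelated individuals, the expected rate is U/2 for intra-individual or monozygotic comparisons, 3U/4 for first-degree relatives, 7U/8 for second-degree relatives, and in general U(1 − 2^-(k+1)) for degree k. The sketch below, a minimal illustration with hypothetical function names and input layout, draws one read per site to build pseudo-haploid calls, computes pairwise mismatch rates, and inverts that rule into a continuous degree estimate. It ignores inbreeding (which lowers mismatch rates, as noted above) and omits the block jackknife used for the confidence intervals.

```python
import math
import random
from typing import List, Optional, Sequence


def pseudo_haploid(reads_per_site: Sequence[Sequence[str]],
                   rng: random.Random) -> List[Optional[str]]:
    """Pick one read allele at random per site (None where there are no reads)."""
    return [rng.choice(alleles) if alleles else None for alleles in reads_per_site]


def mismatch_rate(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Mean allelic mismatch over sites called in both pseudo-haploid samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def relatedness_degree(rate: float, unrelated_rate: float) -> float:
    """Continuous degree estimate k from rate = U * (1 - 2**-(k + 1)).

    k = 0 corresponds to self/monozygotic, 1 to first degree, and so on.
    Inbreeding is ignored, so inbred pairs will look more closely related.
    """
    frac = 1.0 - rate / unrelated_rate
    if frac <= 0:
        return math.inf  # at or above the unrelated baseline
    return -math.log2(frac) - 1.0
```

In practice the unrelated baseline U would be taken from pairs known to be unrelated, and a value such as 1.9 would be read as "about second degree" rather than as an exact category.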
PCA and allele-sharing statistics
We visualized the genome-wide relationships between the Shum Laka individuals and diverse present-day and ancient sub-Saharan Africans (Extended Data Table 1) using principal component analysis (PCA). Initially, we computed axes using East and West Africans and southern and East-Central African hunter-gatherers (Fig. 2A). The Shum Laka individuals project to the right of Bantu speakers and related West African populations (Chewa, Mbo, and Mende), closest to present-day West-Central hunter-gatherers from Cameroon (Baka, Bakola, and Bedzan [20]) and the Central African Republic (Aka, often known as Biaka). We then carried out a second PCA using only West and East Africans and Aka to compute the axes, and again the Shum Laka individuals project in the direction of West-Central hunter-gatherers (Fig. 2B). By contrast, present-day Niger-Congo-speaking groups from western Cameroon cluster tightly with other West Africans (Fig. 2; Extended Data Fig. 3A). In both plots, the two earlier Shum Laka individuals fall slightly closer to West and East Africans, but based on their overall similarity, we grouped all four together for most subsequent analyses.
Figure 2:
PCA results.
(A) Broad-scale analysis. (B) Narrow-scale analysis. Groups in blue (including ancient individuals, filled symbols) were projected onto axes computed using the other populations, using 593,124 SNPs (Methods). HG, hunter-gatherers; S.L., Shum Laka; W-Cent. HG, Aka plus Cameroon hunter-gatherers (Baka, Bakola, and Bedzan).
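The projection step described above (axes computed from one set of populations, other individuals placed onto them) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the software used here: genotypes are a numeric matrix, SNPs are only mean-centered (no per-SNP variance normalization), and each projected individual, which may have missing data as ancient samples typically do, is placed by least squares using only its observed SNPs, similar in spirit to the least-squares projection (lsqproject) option of EIGENSOFT's smartpca. The function name is hypothetical.

```python
import numpy as np


def pca_with_projection(ref: np.ndarray, proj: np.ndarray, n_pcs: int = 2):
    """PCA on reference genotypes, then least-squares projection of other samples.

    ref:  (n_ref, n_snps) genotype values with no missing data.
    proj: (n_proj, n_snps) genotypes to project; np.nan marks missing SNPs.
    Returns (ref_scores, proj_scores), each with n_pcs columns.
    """
    mean = ref.mean(axis=0)
    centered = ref - mean
    # Rows of vt are orthonormal SNP loadings of the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_pcs].T  # shape (n_snps, n_pcs)
    ref_scores = centered @ loadings

    proj_scores = np.full((proj.shape[0], n_pcs), np.nan)
    for i, row in enumerate(proj):
        observed = ~np.isnan(row)
        if observed.sum() < n_pcs:
            continue  # too few observed SNPs to solve for the coordinates
        # Solve loadings[observed] @ x ~= row[observed] - mean[observed].
        x, *_ = np.linalg.lstsq(loadings[observed],
                                row[observed] - mean[observed], rcond=None)
        proj_scores[i] = x
    return ref_scores, proj_scores
```

With complete data the least-squares solution equals the ordinary projection onto the axes, so the design only changes behavior for individuals with missing SNPs.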
Extended Data Figure 3:
Alternative PCA and allele-sharing analyses.
(A) Broad-scale PCA (differing from Fig. 2A by projecting all present-day Cameroon populations; again using 593,124 Human Origins SNPs). Groups shown in blue were projected onto axes computed using the other populations. HG, hunter-gatherers; S. L., Shum Laka. The W-Cent. HG grouping consists of Aka and Cameroon hunter-gatherers (Baka, Bakola, and Bedzan). The majority of the present-day Cameroon individuals fall in a tight cluster near other West Africans and Bantu speakers. (B) Relative allele sharing (mean ± SE, multiplied by 10,000, computed on 538,133 SNPs, as in Fig. 3B) with Shum Laka versus East Africans (f4 (X, Yoruba; Shum Laka, Somali); x-axis) and versus Aka (f4 (X, Yoruba; Shum Laka, Aka); y-axis) for present-day populations from Cameroon (blue points) and southern and eastern Bantu speakers (Herero in red and Chewa in orange). Mada and Fulani share more alleles with Shum Laka than with Aka, but this is likely a secondary consequence of admixture from East or North African sources (as reflected in greater allele sharing with Somali; see also Supplementary Information section 3). Bars show one standard error in each direction.
Using f-statistics (Fig. 3A), we investigated components of “deep ancestry” from sources diverging earlier than the split between non-Africans and most sub-Saharan Africans (above point (2) in Fig. 4A). We began with the statistic f4 (X, Mursi; South Africa HG, Han), which is expected to be increasingly positive for increasing deep ancestry in population X (via allele-sharing between X and ancient South African hunter-gatherers [21, 22]), with a baseline of zero set by Mursi, Nilotic-speaking pastoralists from western Ethiopia [20]. Shum Laka shows a large positive statistic, comparable to West-Central African hunter-gatherers (Fig. 3A, top), while other West Africans (e.g., Yoruba and Mende) yield smaller but significantly positive values, as do East African hunter-gatherers (Hadza from Tanzania and the ∼4500 BP Mota individual from Ethiopia [23]). We also obtained consistent results from analogous statistics with different reference groups (Extended Data Table 2).
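The statistics above follow the standard definition f4(A, B; C, D) = mean over SNPs of (pA − pB)(pC − pD), so a positive f4(X, Mursi; South Africa HG, Han) means that X, relative to Mursi, is shifted toward the South African hunter-gatherers and away from Han. A minimal NumPy sketch with a delete-one-block jackknife for the standard error and Z-score follows. It assumes allele-frequency arrays aligned by SNP and in genomic order; its equal-count blocks and unweighted jackknife are simplifications of the distance-based blocks used by tools such as ADMIXTOOLS, and the function name is hypothetical. The tables above report values multiplied by 1000.

```python
import numpy as np


def f4_jackknife(pa, pb, pc, pd, n_blocks: int = 100):
    """Return (f4, standard error, Z) for f4(A, B; C, D).

    f4 is the mean over SNPs of (pA - pB) * (pC - pD). The standard error
    comes from a delete-one-block jackknife over contiguous, roughly
    equal-count blocks of SNPs (a simplification of genetic-distance blocks).
    """
    pa, pb, pc, pd = (np.asarray(x, dtype=float) for x in (pa, pb, pc, pd))
    per_snp = (pa - pb) * (pc - pd)
    n = per_snp.size
    if n_blocks < 2 or n_blocks > n:
        raise ValueError("need 2 <= n_blocks <= number of SNPs")
    est = per_snp.mean()
    blocks = np.array_split(per_snp, n_blocks)
    block_sums = np.array([b.sum() for b in blocks])
    block_sizes = np.array([b.size for b in blocks], dtype=float)
    # Estimate recomputed with each block left out in turn.
    loo = (per_snp.sum() - block_sums) / (n - block_sizes)
    g = n_blocks
    se = float(np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2)))
    z = est / se if se > 0 else float("nan")
    return float(est), se, float(z)
```

Swapping A and B flips the sign of f4, and placing the same population in both the third and fourth positions gives exactly zero, which is why such entries are uninformative.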
Figure 3:
Allele-sharing statistics.
(A) Statistics sensitive to deep ancestry (mean ± 2SE, multiplied by 1000; blue, deeper than non-Africans; red, deeper than South African hunter-gatherers; computed on 1,121,119 SNPs). S.L., Shum Laka; SA, ancient South African hunter-gatherers. (B) Relative allele sharing (mean ± SE, multiplied by 10,000; computed on 538,133 SNPs) with Shum Laka versus East Africans (f4(X, Yoruba; Shum Laka, Somali); x-axis) and versus Aka (f4 (X, Yoruba; Shum Laka, Aka); y-axis) for present-day populations from Cameroon (blue) and southern (Herero, red) and eastern (Chewa, orange) Bantu speakers. See also Extended Data Fig. 3B.
Figure 4:
Admixture graph results.
Points at which multiple lineages are shown diverging simultaneously indicate splits occurring in short succession (whose order we cannot confidently assess) but do not represent exact multifurcations. Key points are (1) early modern human split, (2) East African divergences, and (3) Bantu expansion. Branch lengths not drawn to scale. (A) Full model; see also Extended Data Fig. 4. HG, hunter-gatherer; AP, agro-pastoralist; *proportion not well constrained. (B) Geographical structure: shaded areas denote hypothesized historical locations of lineages descended from split point (1) in panel (A), and branching order is shown for populations descended from split point (2) (one ancestry component per population, with leaf nodes at sampling locations). The blue star represents Shum Laka (dashed line, possible direction of gene flow).
Extended Data Table 2:
Allele-sharing statistics for deep ancestry
Test pop | f4(X, Mursi; SA, Han) | Z-score | f4(X, Mota; SA, Han) | Z-score | f4(X, Han; SA, Mursi) | Z-score | f4(X, Mota; SA, Mursi) | Z-score
Dinka | 1.4 | 5.8 | −2.0 | −5.5 | 0.1 | 0.2 | −6.3 | −20.2
Mota | 3.4 | 9.0 | 0 | 0 | 6.3 | 18.1 | 0 | 0
Hadza | 4.1 | 10.3 | 0.8 | 1.7 | 7.3 | 21.2 | 1.0 | 2.7
Yoruba | 4.7 | 17.8 | 1.3 | 3.8 | 5.2 | 18.2 | −1.1 | −3.5
Lemande | 5.0 | 16.8 | 1.7 | 4.5 | 5.7 | 18.2 | −0.6 | −2.1
Mende | 5.7 | 19.1 | 2.3 | 6.3 | 6.3 | 20.0 | 0 | 0
Shum Laka | 11.7 | 38.7 | 8.3 | 22.6 | 12.7 | 40.8 | 6.4 | 20.5
Aka | 13.3 | 39.1 | 9.9 | 25.2 | 13.6 | 40.4 | 7.3 | 22.0
Mbuti | 16.4 | 50.4 | 13.0 | 34.9 | 16.4 | 49.9 | 10.0 | 31.8
Mursi | 0 | 0 | −3.4 | −9.0 | .. | .. | .. | ..
Agaw | .. | .. | .. | .. | 0.1 | 0.3 | −6.2 | −18.9
SA | .. | .. | .. | .. | .. | .. | .. | ..
Test pop | f4(X, Mursi; SA, Mota) | Z-score | f4(X, Han; SA, Mota) | Z-score | f4(X, Han; SA, Yor) | Z-score | f4(X, Mursi; Chimp, Yor) | Z-score
Dinka | 0.8 | 3.3 | 3.7 | 11.9 | −0.7 | −2.8 | −0.9 | −4.7
Mota | .. | .. | .. | .. | 5.7 | 18.1 | 5.2 | 17.7
Hadza | 4.1 | 11.5 | 7.0 | 17.7 | 4.8 | 15.2 | 3.4 | 11.4
Yoruba | 4.1 | 15.7 | 7.1 | 21.6 | .. | .. | .. | ..
Lemande | 4.1 | 14.5 | 7.1 | 21.0 | .. | .. | .. | ..
Mende | 4.8 | 17.3 | 7.8 | 22.5 | .. | .. | .. | ..
Shum Laka | 9.1 | 29.8 | 12.0 | 33.7 | 8.0 | 28.7 | 8.3 | 31.9
Aka | 10.3 | 33.4 | 13.2 | 35.5 | 7.8 | 24.8 | 8.5 | 30.1
Mbuti | 12.5 | 41.8 | 15.5 | 44.1 | 11.6 | 40.8 | 11.8 | 46.3
Mursi | 0 | 0 | 3.0 | 8.8 | 0.6 | 2.2 | 0 | 0
Agaw | −2.4 | −7.7 | 0.6 | 1.8 | 0 | 0.2 | −0.2 | −0.9
SA | .. | .. | .. | .. | .. | .. | 20.3 | 66.0
Variations of allele-sharing statistics (multiplied by 1000; computed on 1,121,119 SNPs) sensitive to ancestry in the test population X from a deeply splitting lineage, along with Z-scores for difference from zero. We note that the zero level has a different meaning depending on which population is in the second position in the statistic. Entries marked ".." are statistics that are confounded by specific relationships between the test population and one of the reference populations (in the third or fourth position: either duplication of the same group, Agaw with Han due to non-African-related ancestry, or Yoruba with other West Africans). From the statistics f4 (Mursi/Agaw, Han; South Africa HG, Yoruba), we find minimal differences in deep ancestry proportions among Han, Mursi, and Agaw; from f4 (X, Mursi; Chimp, Yoruba), we obtain a value for South African hunter-gatherers that is roughly twice as large as for Central African hunter-gatherers. SA, ancient South African hunter-gatherers; Yor, Yoruba.
Next, we computed f4 (X, Mursi; Chimp, South Africa HG) (using chimpanzee as an outgroup symmetric to all human populations) to evaluate whether any of this deep ancestry is from sources diverging more deeply than southern African hunter-gatherers (the modern human lineage with the oldest known average split date [21, 24, 25]). Previous work has shown that southern African hunter-gatherers are not a symmetric outgroup relative to other sub-Saharan Africans, with West Africans (especially Mende) having excess affinity toward deeper outgroups [22]. Indeed, our test statistic is largest in Mende and other West Africans (Fig. 3A, bottom). Hadza and Mota have values close to zero, and Shum Laka and Central African hunter-gatherers are intermediate. Some populations yield positive values for both f4-statistics (Fig. 3A), but the two sets are poorly correlated, implying that they in part reflect separate signals.
Combining our newly genotyped individuals with published data [20], we searched for differential allele sharing between the Shum Laka individuals (compared to either East Africans [Somali] or Aka) and present-day Cameroonians (Fig. 3B, Extended Data Fig. 3B). We identified three distinct clusters: (a) Mada and Fulani, (b) hunter-gatherers, and (c) other Niger-Congo-speaking populations (shown close up in Fig. 3B). Within the third cluster, the only groups with significantly Shum Laka-directed statistics in both dimensions are Mbo, Aghem, and Bafut, all living close to Shum Laka today, consistent with small proportions of Shum Laka-related admixture (maximum ∼7–8%; Supplementary Information section 3).
Admixture graph analysis
Finally, we built an admixture graph (Methods, Fig. 4A, Extended Data Fig. 4) co-modeling the ancient Shum Laka, Mota, and South African hunter-gatherer individuals; present-day Mbuti, Aka, Agaw (Afroasiatic speakers from Ethiopia [20]), Yoruba, Mende, and Lemande; non-Africans (French); and two outgroups (Altai Neanderthal and chimpanzee). We also fit versions of the model using alternative SNP ascertainments and additional populations (Hadza, Mbo, Herero, Chewa, Mursi, Baka, Bakola, Bedzan, Mada, Fulani, and ancient individuals from Taforalt in Morocco [26]) and obtained similar results (Extended Data Table 3; Supplementary Information section 3).
Extended Data Figure 4:
Primary inferred admixture graph with full parameters.
Of the ∼1.2M targeted SNPs, 932k are used for fitting (i.e., are covered by all populations in the model). Branch lengths (in units of squared allele frequency divergence) are rounded to the nearest integer. All f-statistics relating the populations are predicted to within 2.3 standard errors of their observed values.
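The fit criterion quoted in this caption (every f-statistic predicted to within 2.3 standard errors) amounts to checking the largest standardized residual between observed and model-predicted values. A minimal sketch, assuming the observed statistics, the graph's predictions, and their standard errors are already available as parallel arrays (a hypothetical layout and function name); fitting the graph itself, which is the hard part, is not shown.

```python
import numpy as np


def worst_fit_z(observed, predicted, std_err):
    """Return (index, Z) of the f-statistic with the largest |observed - predicted| / SE."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    se = np.asarray(std_err, dtype=float)
    z = (obs - pred) / se  # standardized residuals
    idx = int(np.argmax(np.abs(z)))
    return idx, float(z[idx])


# Hypothetical values: residual Z-scores are about -1, 0, and 2.
idx, z = worst_fit_z([1.0, 2.0, 3.0], [1.1, 2.0, 2.5], [0.1, 0.5, 0.25])
print(idx, abs(z) <= 2.3)  # 2 True
```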
Extended Data Table 3:
Admixture graph parameter estimates
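Model version | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23
Mixture proportions (%)
Shum Laka basal WA | 64 | 66 | 62 | 71 | 64 | 58 | 63 | 61 | 63 | 61 | 64 | 64 | 64 | 64 | 63 | 63 | 63 | .. | 64 | 61 | 69 | 63 | 67/62*
Aka Bantu-associated | 59 | 59 | 57 | 63 | 59 | 56 | 58 | 57 | 59 | 58 | 59 | 59 | 59 | 59 | 59 | 59 | 58 | 58 | 59 | 58 | 62 | 61 | 59
Mbuti Bantu-associated | 26 | 24 | 33 | 19 | 28 | 27 | 26 | 12 | 28 | 30 | 32 | 25 | 24 | 26 | 29 | 28 | 35 | 35 | 25 | 35 | 23 | 36 | 27
Mbuti East African-related | 17 | 19 | 10 | 27 | 14 | 9 | 16 | 23 | 15 | 13 | 11 | 19 | 20 | 18 | 13 | 14 | 6 | 6 | 18 | 9 | 23 | 8 | 16
West African clade archaic | 2 | 2 | 4 | 4 | 3 | 3 | 3 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 3 | 3 | 2 | .. | .. | .. | .. | 2
West African clade deep modern human ancestry | 10 | 9 | 17 | 8 | 12 | 29 | 15 | 24 | 11 | 18 | 19 | 9 | 8 | 9 | 14 | 13 | 29 | 29 | .. | .. | .. | .. | 11
Mende deep ancestry | 4 | 4 | 4 | 3 | 4 | 3 | 4 | 6 | 5 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 5 | 5 | 4 | 4 | 4 | 3 | 4
Mota deep ancestry | 29 | 29 | 30 | 29 | 30 | 31 | 31 | 30 | 29 | 31 | 29 | 29 | 29 | 28 | 30 | 31 | 29 | 29 | 29 | 30 | 27 | 26 | 29
Branch lengths
Basal WA split† | 2 | 3 | 3 | 3 | 3 | 1 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | .. | 2 | 3 | 3 | 1 | 3
South African HG split‡ | 1 | 1 | 0 | 4 | 1 | −1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 4 | 0 | 1
Ghost modern human split# | 1 | 1 | 1 | −3 | 1 | 1 | 0 | −2 | 1 | 0 | −1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | .. | .. | .. | .. | 2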
Key admixture graph parameter estimates across different model versions (see Supplementary Information section 3 for full details): 1, primary model; 2, no “dummy” admixture; 3, African-ascertained SNPs; 4, transversion SNPs; 5, Shum Laka whole-genome sequence data; 6, outgroup-ascertained transversions; 7, Hadza added; 8, Mbo in place of Lemande; 9, Herero added; 10, Chewa added; 11, Mursi in place of Agaw; 12, Baka added; 13, Bakola added; 14, Bedzan added; 15, Mada added; 16, Fulani added; 17, Taforalt added; 18, alternative admixture for Shum Laka; 19, alternative deep source; 20, alternative deep source with African-ascertained SNPs; 21, alternative deep source with transversion SNPs; 22, alternative deep source with outgroup-ascertained transversions; 23, Shum Laka pairs fit separately. HG, hunter-gatherers.
* Earlier pair/later pair.
† Units above the main West African clade.
‡ Units below the split of the Central African hunter-gatherer lineage (negative value indicates distance above).
# Units along the Central African hunter-gatherer lineage (negative values indicate distances along an adjacent edge).
Among modern humans, the deepest-splitting branch is inferred to be the one leading to Central African hunter-gatherers, although four lineages diverge in a very short span: those contributing the primary ancestry to (a) Central African hunter-gatherers, (b) southern African hunter-gatherers, and (c) other modern human populations, along with (d) a “ghost” source contributing a minority of the ancestry in West Africans and the Mota individual. Central African hunter-gatherers separate into eastern (Mbuti) and western clades, with the latter then branching into components represented in Aka and Shum Laka. Next, a second cluster of divergences involves West Africans, two East African lineages (hunter-gatherer-associated and agro-pastoralist-associated), and non-Africans, the latter tentatively inferred to be a sister group to Mota but with no deep “ghost” ancestry. Within the West African clade, we identify Yoruba and Mende as sister groups, with Lemande as an outgroup, and most basally a separate West African-related lineage contributing to Shum Laka (64%). A Bantu-associated source (most closely related to Lemande) contributes 59% of the ancestry in Aka and 26% in Mbuti (who also harbor ancestry [17%] from an East African agro-pastoralist-related source). In a model separating the ∼8000 BP and ∼3000 BP Shum Laka pairs, the latter have ∼5% more Central African hunter-gatherer-related ancestry (as confirmed by the significantly positive statistic f4 (Shum Laka 8000 BP, Shum Laka 3000 BP; Yoruba, Aka) [Z=4.2]; Supplementary Information section 3).
We can also obtain a good fit for the Shum Laka individuals in a less-parsimonious alternative model using three components, replacing the basal West African source with a combination of ancestry from inside the clade defined by the other West African populations and from a source splitting between East and West Africans (near one lineage contributing to Taforalt; Extended Data Fig. 5, Supplementary Information section 3). However, two-component models for Shum Laka with the majority source splitting closer to other West or East Africans are rejected (Z=7.1 and Z=3.7, respectively).
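The mixture proportions above come from fitting a full admixture graph. A much simpler and different technique for a single proportion is the f4-ratio estimator, alpha = f4(A, O; X, C) / f4(A, O; B, C), which is valid only when the clade relationships listed in the docstring hold. It is shown here purely to illustrate how an admixture proportion can be read off f-statistics; it is not the method used for the estimates in this study, and the function names and toy frequencies below are hypothetical and synthetic.

```python
import numpy as np


def f4(pa, pb, pc, pd) -> float:
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    pa, pb, pc, pd = (np.asarray(x, dtype=float) for x in (pa, pb, pc, pd))
    return float(np.mean((pa - pb) * (pc - pd)))


def f4_ratio(p_a, p_o, p_x, p_b, p_c) -> float:
    """alpha = f4(A, O; X, C) / f4(A, O; B, C).

    Estimates the fraction of X's ancestry from a source related to B,
    assuming: O is an outgroup; A shares drift with B's lineage but not C's;
    X's B-related source forms a clade with B relative to A, O, and C; and
    X's other source forms a clade with C.
    """
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Synthetic check: X is an exact 35/65 mixture of B and C, and A shares
# part of B's variation so the denominator is clearly nonzero.
rng = np.random.default_rng(2)
p_b, p_c, p_o, noise = rng.random((4, 2000))
p_a = 0.5 * p_b + 0.5 * noise
p_x = 0.35 * p_b + 0.65 * p_c
print(round(f4_ratio(p_a, p_o, p_x, p_b, p_c), 6))  # 0.35
```

Because f4 is linear in each argument, an exact mixture recovers alpha exactly; with real data, drift on the admixing lineages and violations of the assumed topology bias the estimate, which is one reason full graph fitting is preferred when many populations are modeled jointly.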
Extended Data Figure 5:
Schematic of first alternative admixture graph.
Results are shown including ancient individuals from Taforalt in Morocco associated with the Iberomaurusian culture, with the Shum Laka individuals modeled as having a mixture of hunter-gatherer-related ancestry plus two additional components: one from within the main portion of the West African clade, and one splitting at nearly the same point as one of the sources contributing ancestry to Taforalt. Branch lengths are not drawn to scale. Points at which multiple lineages are shown diverging simultaneously indicate splits occurring in short succession (whose order we cannot confidently assess) but are not meant to represent exact multifurcations. HG, hunter-gatherer; AP, agro-pastoralist. ∗ Proportion not well constrained (for Mbuti, the sum of the two indicated proportions is well constrained but not the separate values). See Supplementary Information section 3 for full inferred model parameters.
The West African clade is distinguished by admixture from a deep source that can be modeled as a combination of modern human and archaic ancestry. The modern human component diverges at almost the same point as Central and southern African hunter-gatherers and is tentatively related to the deep source contributing ancestry to Mota, while the archaic component diverges close to the split between Neanderthals and modern humans (Supplementary Information section 3). The signals of deep ancestry in West African-related groups (Fig. 3A) can be explained by two admixture events: one along the ancestral West African lineage, and a second, smaller contribution (∼4%) to Mende from the same source (Fig. 4A). Accordingly, f4-statistics testing for ancestry basal to southern African hunter-gatherers (Fig. 3A, bottom) correlate well with inferred proportions of ancestry from the West African clade (Extended Data Fig. 6). We estimate the shared admixture to introduce 10% deep modern human and 2% archaic ancestry, although the first proportion is not well constrained (Extended Data Table 3). An alternative model with no archaic component, in which the West African clade receives deep ancestry from a single source [22] splitting before point (1) in Fig. 4A, also provides a reasonable fit to the data (Extended Data Fig. 5, Supplementary Information section 3), although it does not account for previous evidence of archaic ancestry in sub-Saharan Africans [27-31].
Extended Data Figure 6:
Deep ancestry correlation from the West African clade.
An allele-sharing statistic sensitive to ancestry splitting more deeply than South African hunter-gatherers (f4 (X, Mursi; Chimp, South Africa HG), mean ± 2SE from block jackknife, computed on 1,121,119 SNPs, as in Fig. 3A) is shown as a function of West African-related ancestry (from admixture graph results; Mota, Yoruba, and Lemande shifted slightly away from the boundaries for legibility). The (relative) allele-sharing rate for Mursi is identically zero according to the definition of the statistic.
Shum Laka in genetic and archaeological context
Our analyses show that the four sampled children from Shum Laka can be modeled as admixed with ∼35% ancestry related to West-Central African hunter-gatherers and ∼65% from a basal West African-related source, or alternatively as a mixture of hunter-gatherer-related ancestry plus two additional components, one from inside the clade of present-day West Africans and one splitting between East and West Africans. The first component plausibly represents ancestry present in the area since at least the LSA, whereas the second component (third in the alternative model) may have originated farther to the north, given the geography and phylogeny of other sampled populations (Fig. 4B). The chronology of the archaeological record at Shum Laka suggests a possible northern influence on cultural developments during the SMA [3, 9]; these include changes in stone tools, which can be interpreted as a fusion of local LSA tool-making traditions with new macrolithic technologies introduced from the north [3], and the appearance of ceramics (four sherds found in the early SMA burial layer, and more abundant and distinct ceramics in later SMA deposits) potentially related to earlier pottery-working traditions in the Sahara and Sahel [3, 32]. Gene flow from the north before 8000 BP is also plausible in light of a short period of Saharan and Sahelian aridification [3, 33]. Present-day groups in northern West Africa and the Sahel have substantial admixture connected to later migrations [34], so identifying the exact source area may await additional ancient DNA studies.
Although the scope of our sampling is limited to two individuals at either end of the SMA, the observed genetic similarity across a span of almost 5000 years (also consistent with skeletal morphometric analyses) suggests a long-term presence of related peoples who used the rockshelter for various activities, including burying their dead (Supplementary Information section 1).
Today, however, most populations in Cameroon are more closely related to other West Africans than to the group represented by these individuals. Present-day hunter-gatherers in Cameroon are also not descended substantially from this specific group, as they lack the signal of basal West African ancestry (Supplementary Information section 3). We do observe elevated allele-sharing between the Shum Laka individuals and present-day Grassfields populations, so the genetic discontinuity is not absolute. Additionally, the adolescent male 2/SE II carried an A00 Y chromosome, suggesting that the concentration of this haplogroup in western Cameroon may have a long history, and moreover that A00 was formerly more diverse, given that the Shum Laka sequence falls outside of known present-day variation [12, 13]. The ∼300,000–200,000 BP divergence time of A00 from other modern human haplogroups [18, 19] could support its association either with the Central African hunter-gatherer-related ancestry component of the Shum Laka individuals or with the deep modern human portion of their West African-related ancestry.
Linguistic and genetic evidence points to western Cameroon as the most likely area for the development of Bantu languages and as the ultimate source of subsequent migrations of Bantu speakers, and while the regional mid-Holocene archaeological record is sparse, Shum Laka has been highlighted as a possibly important site in the early phase of this process [1–4, 6–11]. However, the genetic profiles of our four sampled individuals (even by ∼3000 BP, when the spreads of Bantu languages and of ancestry associated with Bantu-speaking populations were already underway) are very different from those of most Niger-Congo speakers today, implying that these individuals are not representative of the primary source population(s) ancestral to present-day Bantu speakers.
These results neither support nor contradict a central role for the Grassfields area in the origins of Bantu-speaking peoples, and it may be that multiple, highly differentiated populations formerly lived in the region, with potentially either high or low levels of linguistic diversity. It would not be surprising if the Shum Laka site itself was used (either successively or concurrently) by multiple groups with different ancestry, cultural traditions, or languages [1], evidence of which may not be visible from the collection of remains as preserved today.
Implications for deep African population history
By analyzing data from Shum Laka and other ancient individuals in conjunction with present-day groups, we gain new insights into African population structure on multiple timescales. First, we infer a series of closely spaced population splits involving West African-related and two East African-related lineages, as well as non-Africans (point (2) in Fig. 4A). From the geography of the populations involved, the center of this radiation was plausibly in East Africa (Fig. 4B), with a date of ∼80,000–60,000 BP based on estimated divergences of African and non-African populations [24, 35]. Such an expansion is also consistent with mtDNA phylogeography (specifically the diversification of haplogroup L3, likely originating in East Africa ∼70,000 BP [36, 37]) and potentially with the origins of clade CT in the Y chromosome tree at a similar time depth [18, 38].
Second, we infer a phase of divergences involving at least four lineages early in the history of modern humans (point (1) in Fig. 4A). Recent consensus has been that southern African hunter-gatherers, who split from other populations ∼250,000–200,000 BP, represent the deepest sampled branch of modern human variation [21, 24, 25]. Our results suggest that Central African hunter-gatherers split at close to the same time (perhaps slightly earlier), and thus that both clades, as well as the lineage that would later diversify at point (2), originated as part of a large-scale African radiation.
In addition to the well-characterized deep lineages, we also detect at least one deep “ghost” source contributing to West Africans and East African hunter-gatherers. This signal corroborates previous evidence for Hadza and Sandawe [39] and for West Africans [22], although we find that the best fit is a source splitting near the same point as southern and Central African hunter-gatherers. Our results are also consistent with previous reports of archaic ancestry in African populations [27-31], specifically in West Africans.
The presence of deep ancestry in the West African clade is notable in light of the Pleistocene archaeological record [5, 40], which includes Homo sapiens fossils dated to ∼300,000 BP in northwestern Africa [41], as well as an individual with archaic features buried ∼12,000 BP in southwestern Nigeria (the oldest known human fossil from West Africa proper) [42]. Middle Stone Age artifacts have also been found in parts of West Africa into the terminal Pleistocene [43], despite the development of LSA technologies elsewhere (e.g., Shum Laka). Thus, the available material and fossil evidence is concordant with our genetic results in indicating long-term African population structure and admixture [44, 45].
Further genetic studies may reveal additional complexities in deep human population history, while some early human groups will likely remain known only through fossils [44, 45]. Based on our current understanding, the presence of at least four modern human lineages that diversified ∼250,000–200,000 BP and are represented in people living today supports archaeological evidence that this was a pivotal period for human evolution in Africa.
Methods
Ancient DNA sample processing
We obtained bone powder from the Shum Laka skeletons (see Supplementary Information section 1 for more information on the site and burials) by drilling cochlear portions of petrous bone samples in a clean room facility at the Royal Belgian Institute of Natural Sciences. In dedicated clean rooms at Harvard Medical School, we extracted DNA using published protocols [46, 47]. From the extracts, we prepared barcoded double-stranded libraries treated with uracil-DNA glycosylase (UDG) to reduce the rate of characteristic ancient DNA damage [14, 48] in a modified partial UDG preparation including magnetic bead cleanups [14, 49]. For the SNP capture data, we used two rounds of in-solution target hybridization to enrich for sequences overlapping the mitochondrial genome and approximately 1.2 million genome-wide SNPs [50-54]. We then added 7-base-pair indexing barcodes to the adapters of each library [55] and sequenced on an Illumina NextSeq 500 machine with 76-base-pair paired-end reads. For individuals 2/SE II and 4/A, we also generated whole-genome shotgun data from the same libraries but without the target enrichment step. Sequencing was performed at the Broad Institute on an Illumina HiSeq X Ten machine, using 19 lanes for 2/SE II (yielding approximately 18.5× average coverage, including 1,216,658 sites covered from the set of target SNPs used in most analyses) and two lanes for 4/A (3.9× average coverage, 1,158,884 sites covered).
From the raw sequencing results, we retained reads with no more than one mismatch per read pair to the library-specific barcodes. Prior to alignment, we merged paired-end sequences based on forward and reverse mate overlaps and trimmed barcodes and adapters. Preprocessed reads were then mapped to both the mitochondrial reference genome RSRS [37] and the human reference genome (version hg19) using the “samse” command with default parameters in BWA (version 0.6.1) [56].
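The barcode rule above (at most one mismatch per read pair to the library-specific barcodes) can be sketched as a Hamming-distance filter summed over both reads of a pair. This is a minimal illustration, not the pipeline actually used: the `ReadPair` layout, the function names, and the 7-bp barcode strings are all hypothetical, and real demultiplexing may involve several barcode/index positions per molecule.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

class ReadPair(NamedTuple):
    name: str
    barcode1: str  # barcode bases observed on the first read (hypothetical layout)
    barcode2: str  # barcode bases observed on the second read

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def pair_mismatches(pair: ReadPair, expected: Tuple[str, str]) -> int:
    """Total barcode mismatches summed over both reads of a pair."""
    return hamming(pair.barcode1, expected[0]) + hamming(pair.barcode2, expected[1])

def filter_by_barcode(pairs: Iterable[ReadPair], expected: Tuple[str, str],
                      max_mismatches: int = 1) -> Iterator[ReadPair]:
    """Yield read pairs with at most `max_mismatches` mismatches to the library's barcodes."""
    for pair in pairs:
        if pair_mismatches(pair, expected) <= max_mismatches:
            yield pair

# Usage with made-up 7-bp barcodes:
library_barcodes = ("ACGTACG", "TTGACCA")
reads = [
    ReadPair("r1", "ACGTACG", "TTGACCA"),  # exact match -> kept
    ReadPair("r2", "ACGTACC", "TTGACCA"),  # one mismatch in total -> kept
    ReadPair("r3", "ACGTACC", "TTGACCT"),  # two mismatches in total -> discarded
]
kept = [p.name for p in filter_by_barcode(reads, library_barcodes)]
print(kept)  # ['r1', 'r2']
```

Counting mismatches per pair, rather than per read, means a pair with one error on each read is discarded even though each read alone would pass.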
Duplicate molecules (having the same mapped start and end positions and strand orientation) were removed post-alignment. We filtered the mapped sequences (requiring mapping quality scores of at least 10 for targeted SNP capture and 30 for whole-genome shotgun data) and trimmed two terminal bases to eliminate (almost all) damage-induced errors.
For mitochondrial DNA, we called haplogroups using HaploGrep2 [57]. For nuclear DNA obtained from SNP capture and for the whole-genome shotgun data for individual 4/A, we selected one allele at random per site to create pseudo-haploid genotypes. For the whole-genome shotgun data for individual 2/SE II, we used a previously described reference-bias-free diploid genotype calling procedure [25], converting resulting genotypes into a fasta-like encoding allowing for extraction of data at specified sites via cascertain and cTools [25]. We determined the sex of each individual by examining the fractions of sequences mapping to the X and Y chromosomes [58], and we determined Y-chromosome haplogroups by comparing sequence-level SNP information to the tree established by the International Society of Genetic Genealogy (http://www.isogg.org).
To ensure authenticity, we computed the proportion of C-to-T deamination errors in terminal positions of sequenced molecules and evaluated possible contamination via heterozygosity at variable sites in haploid genome regions, using contamMix [50] and ANGSD [59] for mtDNA and the X chromosome (in males), respectively. Observed damage rates (4–10%) were relatively low but within the expected range after partial UDG treatment [14], and apparent heterozygosity rates for mtDNA (0.3–1.5% estimated contamination) and the X chromosome (0.5–1.0% estimated contamination) were minimal. The molecular preservation of the samples is impressive given the long-term warm and humid climate at Shum Laka [60], which supports a mixed forest-savannah environment at an elevation of ∼1650 meters above sea level.
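The read-level steps described here (duplicate removal by start, end, and strand; a mapping-quality threshold; ignoring two terminal bases; and a pseudo-haploid call from one randomly chosen covering read) can be sketched in a few lines. This is an illustration under stated assumptions, not the study's code: the `Alignment` record and function names are hypothetical, alignments are assumed ungapped, and the duplicate filter keeps the first copy it sees, whereas a production tool may instead keep the highest-quality copy.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Alignment:
    chrom: str
    start: int       # 0-based leftmost reference position
    end: int         # exclusive end (assumes ungapped: end - start == len(seq))
    reverse: bool    # strand orientation
    mapq: int        # mapping quality
    seq: str         # read bases in reference orientation

def remove_duplicates(alns: List[Alignment]) -> List[Alignment]:
    """Keep the first alignment seen for each (chrom, start, end, strand) key."""
    seen = set()
    kept = []
    for a in alns:
        key = (a.chrom, a.start, a.end, a.reverse)
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept

def bases_at(alns: List[Alignment], chrom: str, pos: int,
             min_mapq: int, trim: int = 2) -> List[str]:
    """Bases observed at `pos` from alignments passing the MAPQ filter,
    ignoring `trim` bases at each end of every read (damage-prone positions)."""
    out = []
    for a in alns:
        if a.chrom != chrom or a.mapq < min_mapq:
            continue
        offset = pos - a.start
        if trim <= offset < len(a.seq) - trim:
            out.append(a.seq[offset])
    return out

def pseudo_haploid_call(bases: List[str], rng: random.Random) -> Optional[str]:
    """Pseudo-haploid genotype: the base from one covering read drawn at random."""
    return rng.choice(bases) if bases else None

alns = [
    Alignment("1", 100, 110, False, 37, "ACGTACGTAC"),
    Alignment("1", 100, 110, False, 37, "ACGTACGTAC"),  # duplicate molecule
    Alignment("1", 102, 112, True, 5, "GTACGTACGT"),    # fails MAPQ >= 10
    Alignment("1", 95, 105, False, 30, "TTTTTACGTA"),   # covers 104 only at a terminal base
]
dedup = remove_duplicates(alns)
site_bases = bases_at(dedup, "1", 104, min_mapq=10)  # 10 = capture-data threshold in the text
print(len(dedup), site_bases, pseudo_haploid_call(site_bases, random.Random(1)))
```

For the shotgun data the same sketch would use `min_mapq=30`. Drawing one read, rather than calling a diploid genotype, avoids assumptions about heterozygosity at low coverage at the cost of discarding half the information per site.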
Radiocarbon dates
At the Pennsylvania State University (PSU) Radiocarbon Laboratory, we generated new direct radiocarbon dates via accelerator mass spectrometry (AMS) for the four analyzed individuals, using fragments of the same temporal bone portions that were sampled for ancient DNA. We extracted and purified amino acids using a modified XAD process [61] and assessed sample quality via stable isotope analysis. C:N ratios for all four samples fell between 3.3 and 3.4, well within the nominal range of 2.9–3.6 indicating good collagen preservation [62]. The PSU dates were in good agreement with previously reported direct dates for different bones from individuals 2/SE II (8160–7790 cal BP, 7150 ± 70 BP, OxA-5203) and 4/A (3380–3010 cal BP, 3045 ± 60 BP, OxA-5205) [1, 2, 63, 64], but on the basis of a (modestly) aberrant date [65] from a rib of individual 2/SE I (Supplementary Table 5), we restricted our final reported results to the temporal bones. We performed calibrations using OxCal [66] version 4.3.2 with a mixture of the IntCal13 [67] and SHCal13 [68] curves, specifying “U(0,100)” to allow for a flexible combination [66, 69], and rounding final results to the nearest 10 years (see also Supplementary Information section 1).
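The calibration itself (OxCal 4.3.2 with an IntCal13/SHCal13 mixture) is not reproduced here. This sketch only encodes the two simple numerical rules stated in the paragraph: the 2.9–3.6 C:N screen for collagen preservation, and rounding of final calibrated ages to the nearest 10 years. The function names and example values are hypothetical, and treating exact halves as rounding up is an assumption.

```python
import math

# Nominal C:N range indicating good collagen preservation, as cited in the text.
CN_MIN, CN_MAX = 2.9, 3.6

def collagen_ok(c_to_n: float) -> bool:
    """True if a C:N ratio falls inside the nominal acceptable range."""
    return CN_MIN <= c_to_n <= CN_MAX

def round_to_decade(years_bp: float) -> int:
    """Round a calibrated age to the nearest 10 years (exact halves round up; an assumption)."""
    return int(math.floor(years_bp / 10 + 0.5)) * 10

print([collagen_ok(r) for r in (3.3, 3.4, 3.8)])  # [True, True, False]
print(round_to_decade(8163), round_to_decade(3017))  # 8160 3020
```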
New present-day data
We generated genome-wide SNP genotype data for 63 individuals from five present-day Cameroonian populations on the Human Origins array: Aghem (28), Bafut (11), Bakoko (1), Bangwa (2), and Mbo (21) (Extended Data Table 1; Supplementary Table 3). Samples were collected with informed consent, with collection and analysis approved by the UCL/UCLH Committee on the Ethics of Human Research, Committee A and Alpha.
A00 Y chromosome split time estimation
Present-day A00 Y chromosomes are classified into the subtypes A00a, A00b, and A00c, whose divergence times from each other have not been precisely estimated but are quite recent, perhaps only a few thousand years [12, 13]. To estimate the split time of the Shum Laka A00 Y chromosome from present-day A00, we called genotypes for individual 2/SE II (from our whole-genome sequence data) at a set of positions where sequences from two present-day individuals with haplogroup A00 [18] differ from all non-A00 individuals. (At every subtype-specific site for which we had coverage, the Shum Laka A00 carries the ancestral allele.) To avoid needing to determine the status of mutations as ancestral or derived, we considered the entire unrooted lineage specific to A00 (see Fig. 1). The total time span represented by this lineage is approximately 359,000 years, using published values of ∼275,000 BP for the divergence of the A00 lineage from other modern human haplogroups [19] and ∼191,000 BP for the next-oldest split within macrohaplogroup A [70]: ∼275,000 years from the present-day A00 sequences back to the root, plus ∼84,000 years (275,000 − 191,000) from the root to the next split. With a requirement of at least 90% agreement among the reads at each site, we called 1521 positions as having the alternative allele (i.e., matching present-day A00 and differing from the human reference sequence) and 145 as having the reference allele (taking the average of 143 and 147 for the two present-day individuals). The fraction 145/(145+1521) then defines the position of the Shum Laka split along the (unrooted) A00 lineage. We note that split times computed either from all sites (relaxing the 90% threshold and using the majority allele), or from additionally requiring at least two reads per site, differ from our primary estimate by only a few hundred years. To produce a confidence interval, we used the variance in the published estimates and assumed an independent Poisson sampling error for the number of observed reference alleles.
The final point estimate was ∼31,000 BP (95% CI: 37,000–25,000 BP), meaning that the Shum Laka A00 (with a sample date of ∼8000 BP) cannot be directly ancestral to the present-day subtypes.
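The point estimate can be reproduced from the numbers given above: 145/(145 + 1521) × 359,000 ≈ 31,000 years. Below is a minimal sketch of that arithmetic, assuming a constant mutation rate along the lineage and present-day A00 tips at time zero. The interval function is an illustrative assumption, not the authors' exact procedure: it treats the two published split times as independent and normally distributed, applies Poisson error to the reference-allele count only, and leaves the published standard errors as parameters because they are not stated in this text.

```python
import math

def a00_lineage_years(t_a00_split: float, t_next_split: float) -> float:
    """Length of the unrooted A00-specific lineage: from the (recent) present-day A00 tips back
    to the root at t_a00_split, plus from the root forward to the next-oldest split."""
    return t_a00_split + (t_a00_split - t_next_split)

def shum_laka_split(n_ref: float, n_alt: float,
                    t_a00_split: float = 275_000, t_next_split: float = 191_000) -> float:
    """Split time (years before present) of the ancient A00 from present-day A00.

    n_ref: A00-lineage sites where the ancient sample carries the reference allele, i.e.
           mutations on the branch to present-day A00 after the split.
    n_alt: A00-lineage sites where the ancient sample matches present-day A00.
    """
    frac = n_ref / (n_ref + n_alt)
    return frac * a00_lineage_years(t_a00_split, t_next_split)

def shum_laka_split_ci(n_ref, n_alt, se_a00_split, se_next_split,
                       t_a00_split=275_000, t_next_split=191_000, z=1.96):
    """Approximate 95% CI (assumed model): independent normal errors on the two published split
    times plus Poisson error on n_ref, with the total number of sites treated as fixed."""
    t = shum_laka_split(n_ref, n_alt, t_a00_split, t_next_split)
    length = a00_lineage_years(t_a00_split, t_next_split)
    var_length = (2 * se_a00_split) ** 2 + se_next_split ** 2  # length = 2*T1 - T2
    rel_var = var_length / length ** 2 + 1.0 / n_ref
    half_width = z * t * math.sqrt(rel_var)
    return t - half_width, t + half_width

t = shum_laka_split(145, 1521)
print(round(t), round(t, -3))
```

With both standard errors set to zero, the interval reflects only the Poisson term (about ±5,100 years). The published interval is wider because it also includes uncertainty in the calibration dates.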
PCA and allele-sharing statistics
We performed PCA using smartpca (with the “lsqproject” and “autoshrink” options) [71, 72] and computed f4-statistics using ADMIXTOOLS (with standard errors estimated via block jackknife over 5 cM chromosomal segments) [73]. We projected all ancient individuals in PCA rather than using them to compute axes in order to avoid artifacts caused by missing data. In each PCA, we also projected a subset of the present-day populations to allow controlled comparisons with ancient individuals. In most cases, reported f4-statistics are based on the approximately 1.15M autosomal SNPs from our target capture set. For PCA and for f4-statistics testing differential relatedness to Shum Laka, we used autosomal SNPs from the Human Origins array (a subset of the target capture set), with some populations in the analyses only genotyped on this subset (see Extended Data Table 1). For these latter f4-statistics, we excluded for all populations a set of roughly 40k SNPs having high missingness in the present-day Cameroon data.
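To make the allele-sharing statistic concrete, here is a sketch of f4(A, B; C, D) as the mean over SNPs of (pA − pB)(pC − pD), with a standard error from a delete-one-block jackknife over 5 cM windows. This is an illustration, not ADMIXTOOLS: ADMIXTOOLS uses a weighted variant of the block jackknife, whereas this version weights blocks equally for simplicity. The function names and the simulated frequencies are hypothetical.

```python
import numpy as np

def block_ids(chrom, pos_cm, block_cm=5.0):
    """Integer block label per SNP: consecutive block_cm-wide windows of genetic position,
    defined separately on each chromosome."""
    windows = np.floor(np.asarray(pos_cm, dtype=float) / block_cm).astype(int)
    keys = list(zip(list(chrom), windows.tolist()))
    label = {key: i for i, key in enumerate(dict.fromkeys(keys))}
    return np.array([label[key] for key in keys])

def f4_jackknife(pa, pb, pc, pd, blocks):
    """f4(A, B; C, D) = mean over SNPs of (pA - pB)(pC - pD), with a delete-one-block
    jackknife standard error (equal block weights) and a Z-score."""
    per_snp = (np.asarray(pa) - np.asarray(pb)) * (np.asarray(pc) - np.asarray(pd))
    blocks = np.asarray(blocks)
    estimate = float(per_snp.mean())
    labels = np.unique(blocks)
    g = len(labels)
    if g < 2:
        raise ValueError("need at least two blocks")
    leave_one_out = np.array([per_snp[blocks != b].mean() for b in labels])
    se = float(np.sqrt((g - 1) / g * np.sum((leave_one_out - leave_one_out.mean()) ** 2)))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z

# Hypothetical data: two chromosomes of 100 cM each, random allele frequencies.
rng = np.random.default_rng(1)
n = 5000
chrom = np.repeat(["1", "2"], n // 2)
pos_cm = np.tile(np.linspace(0, 99.9, n // 2), 2)
blocks = block_ids(chrom, pos_cm)
pa, pb, pc, pd = rng.uniform(0, 1, (4, n))
est, se, z = f4_jackknife(pa, pb, pc, pd, blocks)
print(f"f4 x 10,000 = {est * 1e4:.2f} +/- {se * 1e4:.2f} (Z = {z:.2f})")
```

Blocks of several centimorgans are used so that linkage disequilibrium within a block does not make the jackknife replicates look more independent than they are. The multiplication by 10,000 matches the scaling used in the figure captions.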
Admixture graphs
We fit admixture graphs with the ADMIXTUREGRAPH (qpGraph) program in ADMIXTOOLS (with the options “outpop: NULL,” “lambdascale: 1,” “inbreed: YES,” and “diag: 0.0001”) [73-75], using the 1.15M autosomal SNPs from our target capture set by default, and other sets of SNPs in alternative model versions as specified. The program requires as input the branching order of the populations in the graph and a list of admixture events, and it then solves for the optimal parameters of the model (branch lengths and mixture proportions) via an objective function measuring the deviation between predicted and observed values of a basis set of f-statistics. From the inferred parameters, poorly fitting topologies (including positions of admixture sources) can be corrected by changing split orders at internal nodes that appear as trifurcations under the constraints enforced by the input (see Supplementary Information section 3).
To evaluate the fit quality of output models, we employed two metrics: first, a list of residual Z-scores for all f-statistics relating the populations in the graph, and second, a combined approximate log-likelihood score. The first metric is useful for identifying particularly poorly fitting models and the elements that are most responsible for the poor fits, while the second provides a means for comparing the overall fits of separate models (Supplementary Information section 3). In order to assess the degree of constraint on individual parameter inferences, we were guided primarily by the variability across different model versions (using different populations and SNP sets; see Extended Data Table 3 and Supplementary Information section 3), which reflects both statistical uncertainty and changes in model-specific assumptions.
In our primary model, all f-statistics relating subsets of the populations are predicted to within 2.3 standard errors of their observed values.
Initially, we detected a slight but significant signal (max Z=2.5) of allele-sharing between Shum Laka and non-Africans, which we hypothesize is due to a small amount of DNA contamination. To prevent this effect from influencing our results, we included a “dummy” admixture of non-African ancestry into Shum Laka (inferred 1.1%, consistent with mtDNA- and X chromosome-based contamination estimates), although model parameters without the dummy admixture are also very similar (Extended Data Table 3, Supplementary Information section 3).
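The two fit metrics described above can be illustrated with a small sketch. It computes residual Z-scores (observed minus predicted f-statistic, divided by its standard error), reports the worst-fitting statistic, and gives a Gaussian approximate log-likelihood, −½ rᵀC⁻¹r, for residual vector r and covariance C. This sketch is not qpGraph's code, and it does not claim that qpGraph's reported score uses exactly this form. All statistic names and numbers below are hypothetical.

```python
import numpy as np

def residual_z(observed, predicted, se):
    """Residual Z-score per f-statistic: (observed - predicted) / standard error."""
    return (np.asarray(observed, float) - np.asarray(predicted, float)) / np.asarray(se, float)

def worst_fit(names, z):
    """Name and Z-score of the statistic with the largest absolute residual."""
    i = int(np.argmax(np.abs(z)))
    return names[i], float(z[i])

def gaussian_log_likelihood(observed, predicted, cov):
    """Approximate log-likelihood (up to a constant): -0.5 * r^T C^{-1} r, r = observed - predicted."""
    r = np.asarray(observed, float) - np.asarray(predicted, float)
    return float(-0.5 * r @ np.linalg.solve(np.asarray(cov, float), r))

names = ["f4(P1, P2; P3, P4)", "f4(P1, P3; P2, P4)", "f3(P5; P1, P2)"]  # hypothetical
obs = [0.0100, -0.0040, 0.0020]
pred = [0.0095, -0.0040, 0.0025]
se = [0.0005, 0.0004, 0.0003]
z = residual_z(obs, pred, se)
print(worst_fit(names, z), bool(np.all(np.abs(z) < 2.3)))
print(gaussian_log_likelihood(obs, pred, np.diag(np.square(se))))
```

With a diagonal covariance the log-likelihood reduces to −½ Σ z², so the two metrics agree in that special case. With correlated statistics, which is typical for f-statistics sharing populations, the full covariance matters.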
Data availability
The aligned sequences are available through the European Nucleotide Archive under accession number PRJEB32086. Genotype data used in analysis are available at https://reich.hms.harvard.edu/datasets.
Shown are mean genome-wide allelic mismatch rates for each pair of individuals (blue), as well as intra-individual comparisons (red). We selected one read per individual at random at each targeted SNP (using all 1,233,013 targeted sites). Monozygotic twins (or intra-individual comparisons) are expected to have a value one-half as large as unrelated individuals; first-degree relatives, halfway between monozygotic twins and unrelated individuals; second-degree relatives, halfway between first-degree relatives and unrelated individuals; and so on. The presence of inbreeding also serves to reduce the rates of mismatches. For 4/A and 5/B, because both died as children, we can eliminate a grandparent-grandchild relationship, and the lack of long segments with both homologous chromosomes shared IBD implies that they are not double cousins (the few ostensible double-IBD stretches are likely a result of inbreeding; see Supplementary Information section 2). Thus, we can conclude that they were either uncle and niece (or aunt and nephew) or half-siblings. Bars show 99% confidence intervals (computed by block jackknife).
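The expectations in this caption follow a simple rule. If b is the mismatch rate between unrelated individuals, identical twins or self-comparisons sit at b/2, and each further degree of relatedness moves halfway toward b: degree k gives b(1 − 2^−(k+1)), so first-degree relatives sit at 3b/4 and second-degree relatives at 7b/8. Below is a minimal sketch using pseudo-haploid calls. The function names are hypothetical, the baseline value is made up (in practice it would be estimated from pairs assumed to be unrelated), and the sketch omits the block-jackknife confidence intervals and any correction for inbreeding.

```python
from typing import Optional, Sequence

def mismatch_rate(calls_i: Sequence[Optional[str]], calls_j: Sequence[Optional[str]]) -> float:
    """Fraction of sites with differing pseudo-haploid calls, over sites called in both (None = no call)."""
    both = [(a, b) for a, b in zip(calls_i, calls_j) if a is not None and b is not None]
    if not both:
        raise ValueError("no overlapping sites")
    return sum(a != b for a, b in both) / len(both)

def expected_rate(degree: int, baseline: float) -> float:
    """Expected mismatch rate for a degree of relatedness: degree 0 (identical twins or the same
    individual) = baseline / 2; each further degree moves halfway toward the baseline."""
    return baseline * (1 - 0.5 ** (degree + 1))

def nearest_degree(rate: float, baseline: float, max_degree: int = 3) -> Optional[int]:
    """Degree whose expected rate is closest to `rate`; None if closest to the unrelated baseline.
    Ignores inbreeding, which also lowers mismatch rates."""
    options = {d: expected_rate(d, baseline) for d in range(max_degree + 1)}
    options[None] = baseline
    return min(options, key=lambda d: abs(options[d] - rate))

print(mismatch_rate(["A", "C", None, "G"], ["A", "T", "T", None]))  # 0.5
print(nearest_degree(0.152, 0.2), nearest_degree(0.199, 0.2))  # 1 None
```

A rate-based degree estimate cannot separate relationships of the same degree (for example, half-siblings versus uncle and niece). This is why the caption relies on the individuals' ages and on the absence of long double-IBD segments to narrow down the relationship between 4/A and 5/B.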
Alternative PCA and allele-sharing analyses.
(A) Broad-scale PCA (differing from Fig. 2A by projecting all present-day Cameroon populations; again using 593,124 Human Origins SNPs). Groups shown in blue were projected onto axes computed using the other populations. HG, hunter-gatherers; S. L., Shum Laka. The W-Cent. HG grouping consists of Aka and Cameroon hunter-gatherers (Baka, Bakola, and Bedzan). The majority of the present-day Cameroon individuals fall in a tight cluster near other West Africans and Bantu speakers. (B) Relative allele sharing (mean ± SE, multiplied by 10,000, computed on 538,133 SNPs, as in Fig. 3B) with Shum Laka versus East Africans (f4 (X, Yoruba; Shum Laka, Somali); x-axis) and versus Aka (f4 (X, Yoruba; Shum Laka, Aka); y-axis) for present-day populations from Cameroon (blue points) and southern and eastern Bantu speakers (Herero in red and Chewa in orange). Mada and Fulani share more alleles with Shum Laka than with Aka, but this is likely a secondary consequence of admixture from East or North African sources (as reflected in greater allele sharing with Somali; see also Supplementary Information section 3). Bars show one standard error in each direction.
Primary inferred admixture graph with full parameters.
Of the ∼1.2M targeted SNPs, 932k are used for fitting (i.e., are covered by all populations in the model). Branch lengths (in units of squared allele frequency divergence) are rounded to the nearest integer. All f-statistics relating the populations are predicted to within 2.3 standard errors of their observed values.
Schematic of first alternative admixture graph.
Results are shown including ancient individuals from Taforalt in Morocco associated with the Iberomaurusian culture, with the Shum Laka individuals modeled as having a mixture of hunter-gatherer-related ancestry plus two additional components: one from within the main portion of the West African clade, and one splitting at nearly the same point as one of the sources contributing ancestry to Taforalt. Branch lengths are not drawn to scale. Points at which multiple lineages are shown diverging simultaneously indicate splits occurring in short succession (whose order we cannot confidently assess) but are not meant to represent exact multifurcations. HG, hunter-gatherer; AP, agro-pastoralist. ∗ Proportion not well constrained (for Mbuti, the sum of the two indicated proportions is well constrained but not the separate values). See Supplementary Information section 3 for full inferred model parameters.
Schematic of second alternative admixture graph.
Results are shown with a single-component deep source for West Africans. Branch lengths are not drawn to scale. Points at which multiple lineages are shown diverging simultaneously indicate splits occurring in short succession (whose order we cannot confidently assess) but are not meant to represent exact multifurcations. HG, hunter-gatherer; AP, agro-pastoralist. ∗ Proportion not well constrained (for Mbuti, the sum of the two indicated proportions is well constrained but not the separate values). See Supplementary Information section 3 for full inferred model parameters.
Populations used in the study
List of populations used in analyses in the study. Data types are in-solution targeted SNP capture (1240k), whole-genome sequence with pseudo-haploid genotype calls (SG), high-coverage whole-genome sequence with diploid genotype calls (DG), and Human Origins SNP array (HO). For some populations, we used different sample sets for different analyses, indicated by slashes; Human Origins array genotyped individuals were used for PCA and for f-statistics testing differential relatedness to Shum Laka (Fig. 3B, Extended Data Fig. 3B). For Hadza, we used five individuals with Human Origins data for PCA and two of those five individuals for admixture graph modeling. HG, hunter-gatherers; AA, Afroasiatic; IE, Indo-European; KS, Khoesan; NC, Niger-Congo; NS, Nilo-Saharan; ST, Sino-Tibetan.
Individuals from Hora, Chencherere, and Fingira.
Individuals from Ballito Bay (A and B) and St. Helena Bay.
Allele-sharing statistics for deep ancestry
Variations of allele-sharing statistics (multiplied by 1000; computed on 1,121,119 SNPs) sensitive to ancestry in the test population X from a deeply-splitting lineage, along with Z-scores for difference from zero. We note that the zero level has a different meaning depending on which population is in the second position in the statistic. Blank entries are statistics that are confounded by specific relationships between the test population and one of the reference populations (in the third or fourth position; either duplication of the same group, Agaw with Han due to non-African-related ancestry, or Yoruba with other West Africans). From the statistics f4 (Mursi/Agaw, Han; South Africa HG, Yoruba), we find minimal differences in deep ancestry proportions among Han, Mursi, and Agaw; from f4 (X, Mursi; Chimp, Yoruba), we obtain a value for South African hunter-gatherers that is roughly twice as large as for Central African hunter-gatherers. SA, ancient South African hunter-gatherers; Yor, Yoruba.
Admixture graph parameter estimates
Key admixture graph parameter estimates across different model versions (see Supplementary Information section 3 for full details): 1, primary model; 2, no “dummy” admixture; 3, African-ascertained SNPs; 4, transversion SNPs; 5, Shum Laka whole-genome sequence data; 6, outgroup-ascertained transversions; 7, Hadza added; 8, Mbo in place of Lemande; 9, Herero added; 10, Chewa added; 11, Mursi in place of Agaw; 12, Baka added; 13, Bakola added; 14, Bedzan added; 15, Mada added; 16, Fulani added; 17, Taforalt added; 18, alternative admixture for Shum Laka; 19, alternative deep source; 20, alternative deep source with African-ascertained SNPs; 21, alternative deep source with transversion SNPs; 22, alternative deep source with outgroup-ascertained transversions; 23, Shum Laka pairs fit separately. HG, hunter-gatherers.
Earlier pair/later pair
Units above the main West African clade
Units below the split of the Central African hunter-gatherer lineage (negative value indicates distance above)
Units along the Central African hunter-gatherer lineage (negative values indicate distances along an adjacent edge)