
Along the Indian Ocean Coast: Genomic Variation in Mozambique Provides New Insights into the Bantu Expansion.

Armando Semo1, Magdalena Gayà-Vidal1, Cesar Fortes-Lima2, Bérénice Alard1, Sandra Oliveira3, João Almeida1, António Prista4, Albertino Damasceno5, Anne-Maria Fehn1,6, Carina Schlebusch2,7,8, Jorge Rocha1,9.   

Abstract

The Bantu expansion, which started in West Central Africa around 5,000 BP, constitutes a major migratory movement involving the joint spread of peoples and languages across sub-Saharan Africa. Despite the rich linguistic and archaeological evidence available, the genetic relationships between different Bantu-speaking populations and the migratory routes they followed during various phases of the expansion remain poorly understood. Here, we analyze the genetic profiles of southwestern and southeastern Bantu-speaking peoples located at the edges of the Bantu expansion by generating genome-wide data for 200 individuals from 12 Mozambican and 3 Angolan populations using ∼1.9 million autosomal single nucleotide polymorphisms. Incorporating a wide range of available genetic data, our analyses confirm previous results favoring a "late split" between West and East Bantu speakers, following a joint passage through the rainforest. In addition, we find that Bantu speakers from eastern Africa display genetic substructure, with Mozambican populations forming a gradient of relatedness along a North-South cline stretching from the coastal border between Kenya and Tanzania to South Africa. This gradient is further associated with a southward increase in genetic homogeneity, and involved minimum admixture with resident populations. Together, our results provide the first genetic evidence in support of a rapid North-South dispersal of Bantu peoples along the Indian Ocean Coast, as inferred from the distribution and antiquity of Early Iron Age assemblages associated with the Kwale archaeological tradition.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


Keywords:  Bantu expansion; Mozambique; admixture; migration; population structure

Year:  2020        PMID: 31593238      PMCID: PMC6993857          DOI: 10.1093/molbev/msz224

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

It is generally believed that the dispersal of Bantu languages over a vast geographical area of sub-Saharan Africa is the result of a migratory wave that started in the Nigeria-Cameroon borderlands around 4,000–5,000 BP (Rocha and Fehn 2016; Bostoen 2018; Schlebusch and Jakobsson 2018). Although the earliest stages of the Bantu expansions were probably not associated with plant cultivation and domestication, Bantu speech communities added agriculture and iron metallurgy to their original subsistence strategies and subsequently replaced or assimilated most of the resident forager populations who lived across sub-Saharan Africa (Mitchell and Lane 2013; Bostoen et al. 2015). For this reason, the dispersal of Bantu-speaking peoples has often been considered a prime example of the role of food production in promoting demic migrations and language spread (Diamond and Bellwood 2003). While genetic studies had a pivotal role in demonstrating that the Bantu expansions involved a movement of people (demic diffusion) rather than a mere spread of cultural traits (Tishkoff et al. 2009; de Filippo et al. 2012; Schlebusch et al. 2012; Li et al. 2014), the majority of research on the specific routes and detailed dynamics of the spread of Bantu-speakers has been conducted in the fields of linguistics and archaeology. Linguistic studies focusing on the reconstruction of the historical relationships between modern Bantu languages have led to some rather concrete proposals about links between individual languages and language areas, including the establishment of three widely accepted geographical subgroups: North-West Bantu, East Bantu, and West Bantu (Guthrie 1948; Vansina 1995; Bostoen 2018). Among them, the East Bantu languages, which currently extend from Uganda to South Africa, have been shown to form a single monophyletic clade that is believed to be a relatively late offshoot of West Bantu (Holden 2002; Currie et al. 2013; Grollemund et al. 2015). 
Assuming that the phylogenetic trees inferred from the comparison of lexical data can be used to trace the migratory routes of ancestral Bantu-speaking communities, the linguistic pattern favors a dispersal scenario whereby populations from the Nigeria-Cameroon homeland first migrated to the south of the rainforest and later diversified into several branches before occupying eastern and southern Africa (Currie et al. 2013; Grollemund et al. 2015). According to archaeological evidence, the earliest Bantu speakers in East Africa appeared around 2,600 BP in the Great Lakes region, associated with pottery belonging to the so-called Urewe tradition, also characterized by a distinctive iron smelting technology and farming (Phillipson 2005; Bostoen 2018). However, the link between Urewe and pottery traditions further west is unclear, and the historical events leading to its introduction to the interlacustrine area are still poorly understood (Bostoen 2007). Some interpretations of the archaeological data have proposed that, in contrast with the “late split” between East and West Bantu suggested by linguistic evidence, East Bantu peoples introduced the Urewe tradition into the Great Lakes by migrating out of the proto-Bantu heartland along the northern fringes of the rainforest after an early separation from Bantu speakers occupying the western half of Africa (Phillipson 1977; Huffman 2007). This model, however, is not supported by recent genetic studies showing that Bantu-speaking populations from eastern and southern Africa are more closely related to West Bantu speakers that migrated to the south of the rainforest than they are to West Bantu speakers that remained in the north (Busby et al. 2016; Patin et al. 2017; Schlebusch et al. 2017). 
In spite of their uncertain origins, the Urewe assemblages display pottery styles similar to the younger Kwale and Matola traditions that are distributed along coastal areas ranging from southern Kenya across Mozambique to KwaZulu-Natal (Sinclair et al. 1993; Phillipson 2005; Bostoen 2007, 2018). This archaeological continuity has been interpreted as the earliest material evidence for an extremely rapid dispersion of East Bantu speakers from the Great Lakes, starting around the second century AD and reaching South Africa in less than two centuries (Sinclair et al. 1993; Phillipson 2005; Bostoen 2018). Such a migration remains, however, to be documented by genetic data, due to insufficient sampling of the areas lying between eastern and southern Africa that roughly correspond to present-day Mozambique. In this study, we fill this important gap by investigating the population history of Mozambique using ∼1.9 million quality-filtered single nucleotide polymorphisms (SNPs) that were genotyped in 161 individuals from 12 populations representing all major Mozambican languages, and in 39 individuals from 3 contextual populations from Angola (fig. 1 and supplementary table 1, Supplementary Material online). By making use of a wide range of available genetic and linguistic data, we show that East Bantu-speaking populations display genetic substructure, and detect a strong signal for the dispersal of East Bantu peoples along a North–South cline, which possibly started in the coastal border between Kenya and Tanzania and involved minimum admixture with local foragers until the Bantu speakers reached South Africa. Together, our results provide strong support for reconstructions of the eastern Bantu migrations based on the distribution of Kwale archaeological sites.
Fig. 1.

Genetic structure in Angolan and Mozambican populations. (A) Geographic locations of sampled individuals. The geographic subgroups of Bantu languages (“Guthrie zones”) following Maho (Maho 2003) are given in parentheses in the legend. (B) Principal components 1 and 2 of Angolan and Mozambican individuals rotated to fit geography (Procrustes correlation: 0.89; P < 0.001). (C) Population structure estimated with ADMIXTURE assuming 2 and 3 clusters (K). Vertical lines represent the estimated proportion of each individual’s genotypes that are derived from the assumed genetic clusters (note that the order of individuals in K = 2 is not the same as K = 3). The lowest cross-validation error (CV) was associated with K = 2 (CV values are reported in supplementary table 2, Supplementary Material online).
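The Procrustes correlation reported for panel B measures how well a two-dimensional PC configuration matches the sampling coordinates after translation, scaling, and rotation or reflection. The following is a minimal sketch of that statistic for the two-dimensional case only, using a closed form for the sum of singular values of a 2 × 2 matrix. The function names and the toy coordinates are hypothetical, and the permutation step behind the reported P value is not shown; this is not the authors' pipeline.

```python
import math

def center_and_scale(points):
    """Center a list of (x, y) points and scale it to unit Frobenius norm."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]

def procrustes_correlation(config_a, config_b):
    """Symmetric Procrustes correlation between two 2-D configurations.

    After standardization, the statistic equals the sum of the singular
    values of the 2x2 cross-product matrix M = A^T B, which for a 2x2
    matrix is sqrt(||M||_F^2 + 2 * |det M|).
    """
    a = center_and_scale(config_a)
    b = center_and_scale(config_b)
    m00 = sum(pa[0] * pb[0] for pa, pb in zip(a, b))
    m01 = sum(pa[0] * pb[1] for pa, pb in zip(a, b))
    m10 = sum(pa[1] * pb[0] for pa, pb in zip(a, b))
    m11 = sum(pa[1] * pb[1] for pa, pb in zip(a, b))
    frobenius_sq = m00 ** 2 + m01 ** 2 + m10 ** 2 + m11 ** 2
    determinant = m00 * m11 - m01 * m10
    return math.sqrt(frobenius_sq + 2 * abs(determinant))

# Hypothetical toy data: "geography" and a rotated, rescaled, shifted copy of it.
geography = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 3.0)]
pc_scores = [(-3.0 * y + 5.0, 3.0 * x - 2.0) for x, y in geography]
perfect_fit = procrustes_correlation(geography, pc_scores)
```

A value near 1 indicates that the PC plot is essentially a rotated map of the sampling locations; significance would be assessed by recomputing the statistic after shuffling the population labels of one configuration.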


Results and Discussion

Genetic Variation in Mozambique

To assess the genetic relationships between Angolan and Mozambican individuals, we performed principal component analysis (PCA; Patterson et al. 2006) and unsupervised clustering analysis using ADMIXTURE (Alexander et al. 2009; fig. 1). The PCA patterns are closely related to geography, with the first PC (PC1) separating Mozambican and Angolan individuals, and the second PC (PC2) revealing a noticeable heterogeneity among samples from Mozambique (fig. 1 and supplementary fig. 1, Supplementary Material online; Procrustes correlation: 0.89; P < 0.001). The ADMIXTURE analysis confirmed the substantial differentiation between populations from Angola and Mozambique (at K = 2), and the genetic substructure among Mozambican populations (at K = 3; fig. 1). Within Mozambique, the association between genetic patterns and geography is further highlighted by a strong correlation between average PC2 scores and latitude (r = –0.97, P < 10⁻⁶), showing that genetic variation is structured along a North–South cline corresponding to the orientation of the country’s major axis (fig. 2). The highest genetic divergence was found between Yao and Mwani speakers in the north, and Tswa-Ronga (Tswa, Changana, Ronga) and Inhambane (Bitonga and Chopi) speakers in the south, whereas Makhuwa, Sena, Nyanja, and Shona (Manyika and Ndau) speakers occupy intermediate genetic and geographic positions (fig. 1). Qualitatively, this trend is consistent with the geographic distribution of subclusters of Mozambican languages in the Bantu phylogeny proposed by Grollemund et al. (Grollemund et al. 2015; cf. their supplementary fig. 1, Supplementary Material online).
Our own lexicostatistical analyses (supplementary fig. 2; supplementary tables 3–5, Supplementary Material online) reveal significant correlations between genetic and linguistic pairwise distances (Mantel test: r = 0.68; P = 2.9 × 10⁻⁵), as well as between genetic and latitudinal distances (r = 0.61; P = 7 × 10⁻⁴), and linguistic and latitudinal distances (r = 0.79; P = 6.3 × 10⁻⁵). In contrast, correlations with longitude, involving either language or genetics, were not significant, further emphasizing the importance of latitude in structuring genetic and linguistic diversity in Mozambique (supplementary table 3, Supplementary Material online). We also performed partial Mantel tests to evaluate the respective effect of language and geography on genetic variation. We found that while genetic and linguistic distances remained correlated when latitude was kept constant, genetic and latitudinal distances were not significantly correlated when holding language constant (supplementary table 3, Supplementary Material online). The latter result indicates that language is a more important predictor of genetic differentiation than geography, as populations speaking similar languages tend to be genetically closer than expected on the basis of their location along the latitudinal axis. Since it has been recently shown that the relationships between Bantu languages can be represented by robust phylogenetic trees reflecting the fission history of Bantu-speaking groups (Currie et al. 2013; Grollemund et al. 2015), the correlation results can be interpreted as an indication that the spatial patterns of genetic and linguistic variation in Mozambique are the outcome of successive population splits during a North–South range expansion, rather than a consequence of geographically structured gene flow underlying isolation by distance (cf. Smouse et al. 1986; Sokal 1988; Smouse and Long 1992).
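The logic of the Mantel and partial Mantel tests used above can be illustrated with a short sketch. It correlates the off-diagonal entries of two distance matrices and obtains a P value by jointly permuting the rows and columns of one of them; the partial statistic removes the linear effect of a third matrix. All function names and the toy latitudes are hypothetical, and the permutation scheme for the partial test is omitted; this is an illustration of the technique, not a reproduction of the analysis reported here.

```python
import math
import random

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, one-sided P) for the association of two distance matrices."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of dist_a together; dist_b stays fixed.
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

def partial_r(r_ab, r_ac, r_bc):
    """Correlation of A and B after removing the linear effect of C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical toy data: six populations placed along a latitudinal axis.
latitudes = [-11.0, -13.5, -16.0, -19.0, -22.5, -25.5]
geo = [[abs(a - b) for b in latitudes] for a in latitudes]
genetic = [[0.001 * d for d in row] for row in geo]
linguistic = [[math.sqrt(d) for d in row] for row in geo]

r_gen_lang, p_gen_lang = mantel(genetic, linguistic)
```

In a partial Mantel test the same permutation idea is applied to `partial_r`, so that, for example, the genetic and latitudinal matrices can be compared while the linguistic matrix is held constant.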
Fig. 2.

Genetic variation and geography in Mozambique. The plots show the correlations between latitude and (A) average PC2 scores (supplementary fig. 1B, Supplementary Material online), (B) average number of RoHs, and (C) average LD (r²). In B and C, Tswa and Ronga were lumped and are identified by the Tswa symbol (see supplementary material, Supplementary Material online).

A stepwise reduction in levels of genetic diversity with increasing geographic distance from a reference location is generally considered to be the typical outcome of a demic migration involving serial bottlenecks (Ramachandran et al. 2005). In the global context of the Bantu expansion, a significant decrease of genetic diversity with distance to the Bantu homeland was previously reported for mitochondrial DNA and the Y-chromosome, but not for autosomes (de Filippo et al. 2012). Moreover, to the best of our knowledge, there have been no reports for such patterns at more local scales. In order to evaluate the relationship between genetic diversity and geography, we studied the distribution of haplotype heterozygosity (HH), numbers and total lengths of runs of homozygosity (RoHs), and linkage disequilibrium (LD), as measured by the squared correlation of allele frequencies (r²), across all sampled Mozambican populations (supplementary material, Supplementary Material online). We found that the number of RoHs and LD were significantly correlated with latitude, with northern populations displaying higher genetic diversity than southern populations (fig. 2; supplementary figs. 3–5, Supplementary Material online). We also observed a decrease of HH with absolute latitude that did not reach significance (supplementary fig. 3A, Supplementary Material online; r = 0.51, P = 0.104). However, HH was still significantly correlated with LD (supplementary fig. 4C, Supplementary Material online).
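As an illustration of the LD measure mentioned above, the sketch below computes r² between two biallelic SNPs from phased haplotypes, using r² = D² / [p_A(1 − p_A) p_B(1 − p_B)] with D = p_AB − p_A p_B. It assumes phased 0/1 allele codes and a hypothetical function name; the software and settings actually used for the LD estimates are described in the supplementary material and are not reproduced here.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic loci.

    haplotypes: list of (a, b) tuples, with alleles coded 0 or 1 at
    locus A and locus B on the same phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator

# Hypothetical toy samples.
complete_ld = [(0, 0), (0, 0), (1, 1), (1, 1)]   # alleles always co-occur
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]         # alleles are independent
```

Averaging such pairwise values over SNP pairs within a population gives one summary per population, which can then be correlated with latitude as in figure 2C; higher average r² is expected in populations that passed through stronger or more recent bottlenecks.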
Together, these results suggest that East Bantu-speaking peoples entered Mozambique from the North and underwent sequential reductions in effective population size, leading to increased genetic homogeneity and differentiation as they moved southwards.
To further assess the relationship between population structure and geography in Mozambique, we used the Estimated Effective Migration Surfaces (EEMS) method, which identifies local zones with increased or decreased migration rates, relative to the global migration across the whole country (Petkova et al. 2016; fig. 3). We detected two zones of low migration between northern and central Mozambique (fig. 3): one associated with Yao speakers, located in the northwestern highlands of the Nyasa Province between Lake Nyasa/Malawi and the Lugenda River (fig. 3); the other, located in the Northeast, to the north of the Ligonha River, around Makhuwa-speaking areas (fig. 3). An additional low-migration zone was found around the Save River, between southern and south-central Mozambique (fig. 3). Interestingly, the EEMS analysis also shows that the Zambezi River in central Mozambique is not an obstacle but rather a corridor for migration (fig. 3). This is in line with archaeological findings supporting the importance of the Zambezi Basin in long-distance trading networks between the Indian Ocean Coast and the southern African hinterland from the mid-first millennium onwards (Chirikure 2014; Nikis and Smith 2017). Overall, the geographic patterns revealed by the EEMS method are consistent with the PC cline in showing that the highest genetic differentiation between the northernmost and southernmost populations is reinforced by intervening low-migration zones, whereas the relative genetic proximity between central Mozambican groups was enhanced by increased migration around the Zambezi Basin (fig. 3).
Fig. 3.

Estimated Effective Migration Surface (EEMS) analysis. See figure 1 for legend of population symbols. (A) EEMS estimated with 12 Mozambican populations. (B, C) Major rivers (B) and mountains (C) associated with barriers and corridors of migration. The effective migration rates are presented in a log10 scale: white indicates the mean expected rate in the data set; blue and brown indicate migration rates that are X-fold higher or lower than average, respectively. The orographic map (C) was generated with the raster package (Hijmans and van Etten 2011). Altitude is given in meters.


Genetic Relationships with Other African Populations

To place the genetic variation of Mozambican and Angolan samples into the wider context of the Bantu expansion, we combined our data set with available genome-wide comparative data from other African populations (fig. 4 and supplementary table 6, Supplementary Material online).
Fig. 4.

Genetic structure in African populations. (A) Geographic locations of sampled populations. (B, C) PC plots rotated to geography using Procrustes analysis. (B) All Bantu-speaking populations (Procrustes correlation: 0.76; P < 0.001). (C) Only East Bantu-speaking populations (Procrustes correlation: 0.44; P < 0.001). The numbers in (C) refer to groups of populations that are discussed in the text. Additional PCA and ADMIXTURE plots are shown in supplementary figures 6 and 8, Supplementary Material online. (D) Population structure estimated with ADMIXTURE assuming eight clusters (K = 8), with Mozambican and Angolan groups from this study labeled in red. Vertical lines represent the estimated proportions of each individual’s genotypes that are derived from the assumed genetic clusters (CV values are reported in supplementary table 2, Supplementary Material online). The maps, obtained by interpolation, display the mean proportions of major ADMIXTURE components (K = 8) from Niger-Congo-speaking populations. The colors in the maps match the colors in the ADMIXTURE plot.

Genetic clustering analysis shows that three partially overlapping components can be roughly associated with major geographic areas and linguistic subdivisions of the Niger-Congo phylum, of which the Bantu languages form part (fig. 4 and supplementary fig. 6, Supplementary Material online): non-Bantu Niger-Congo in West Africa, to the north of the rainforest (beige); West Bantu, including Angolans, along the Atlantic coast (green); and East Bantu, including Mozambicans, in East Africa and along the Indian Ocean Coast (blue). A pairwise Fst analysis measuring the genetic divergence among Niger-Congo-speaking populations further shows that the highest levels of differentiation (Fst = 0.01) are found between non-Bantu Niger-Congo groups and East Bantu-speaking peoples (supplementary fig. 7, Supplementary Material online).
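To make the pairwise Fst comparison concrete, the following sketch computes Fst between two populations from per-SNP allele frequencies. It uses Hudson's estimator combined across SNPs as a ratio of averages, which is an assumption made here for illustration: the estimator used for the values reported above is not stated in this section. The function name, sample sizes, and frequencies are hypothetical.

```python
def hudson_fst(freqs_1, freqs_2, n_1, n_2):
    """Hudson's FST between two populations, as a ratio of averages.

    freqs_1, freqs_2: per-SNP frequencies of the same allele in each population.
    n_1, n_2: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_1 - 1)
                      - p2 * (1 - p2) / (n_2 - 1))
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator

# Hypothetical toy frequencies for three SNPs in two samples of 40 chromosomes.
population_x = [0.50, 0.30, 0.80]
population_y = [0.60, 0.25, 0.70]
fst_xy = hudson_fst(population_x, population_y, 40, 40)
```

Values close to zero, such as the maximum of 0.01 reported here, indicate that allele frequencies differ only slightly between groups; the sample-size correction can make the estimate slightly negative when two samples are drawn from the same population.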
Other major genetic components revealed by clustering analysis are associated with Kx’a, Tuu, and Khoe-Kwadi-speaking peoples from southern Africa, also known as Khoisan (brown), Rainforest Hunter-Gatherers (RHG; violet and light green), non-Bantu Eastern Africans (black), and Europeans (pink). As found in previous works (Pickrell et al. 2012; Schlebusch et al. 2012; Patin et al. 2017), several Bantu-speaking populations have varying proportions of these genetic components, which were likely acquired through admixture with local residents: 11% (range: 4–21%) of RHG-related component in West Bantu speakers; 16% (range: 9–38%) of non-Bantu eastern African-related component in East Bantu speakers from Kenya and Tanzania; and 17% (range: 16–18%) of Khoisan-related component in southeastern Bantu speakers from South Africa. To mitigate the effect of admixture with resident populations, we carried out a PC analysis of all Bantu-speaking groups, together with one representative group of non-Bantu Eastern Africans (Amhara) and one representative group of southern African Khoisan (Ju|’hoansi), which are the two most important sources for external admixture with Bantu-speaking populations from the East and South, respectively. As expected, the first two principal axes are driven by genetic differentiation between the Amhara (PC1) and the Ju|’hoansi (PC2), relative to Bantu-speaking groups (supplementary fig. 8A, Supplementary Material online). Moreover, some Bantu peoples from eastern (e.g., Kikuyu and Luhya) and southern Africa (e.g., Sotho and Zulu) stand out from a tight cluster encompassing all Bantu speakers by extending toward the Amhara and Ju|’hoansi, respectively, indicating admixture of local components into the genomes of Bantu-speaking populations.
When considering PCs that explain less variance, a close link between the internal differentiation of Bantu-speaking groups and geography becomes apparent (fig. 4 and supplementary fig. 8E, Supplementary Material online; Procrustes correlation: 0.76; P < 0.001). PC3 represents an east–west axis displaying a noticeable gap between West and East Bantu speakers, and PC4 highlights the differentiation of Mozambican and South African groups from eastern African populations located to their north. As shown in figure 4, the heterogeneity of East Bantu populations is further emphasized when West Bantu speakers are removed (Procrustes correlation: 0.44; P < 0.001; supplementary fig. 8F, Supplementary Material online). While PC4 is correlated with longitude (r = –0.73; P < 10⁻⁴), PC3 is highly correlated with latitude (r = 0.95; P < 10⁻¹³), showing that the gradient of genetic differentiation previously observed within Mozambique extends from eastern to southern Africa (fig. 2 and supplementary fig. 9, Supplementary Material online).
Heuristically, the genetic differentiation among East Bantu speakers can be described by defining four groups that are broadly associated with different geographic regions in eastern and southeastern Africa, and partially correspond to various linguistic zones of Guthrie’s Bantu classification (Guthrie 1948; Maho 2003; fig. 4, supplementary figs. 9 and 10, Supplementary Material online): 1) the first group includes peoples from the western fringe of eastern Africa (Kikuyu, Luhya, Baganda, Barundi, and Kinyarwanda), who live around Lake Nyanza/Victoria and mostly speak languages belonging to Bantu zone J (Lakes Bantu; Bastin et al. 1999); 2) the second group includes populations from coastal Kenya (Chonyi, Giriama, Kambe, and Kauma), who belong to the Mijikenda ethnic group and speak languages from zone E; 3) the third group is genetically intermediate between groups 1 and 2, and includes the Mzigua, Wabondei, and Wasambaa from Tanzania, who speak languages from zone G; 4) the fourth group, formed by Mozambicans and South Africans, is a heterogeneous set of populations covering linguistic zones N, P, and S, who bridge the area between eastern and southern Africa and are genetically closer to groups from Tanzania than to other East Africans.
These findings have important implications for integrating archaeological, linguistic, and genetic data in the reconstruction of the Bantu migrations in the easternmost regions of Africa. Although many crucial areas like the Democratic Republic of Congo, Zambia, and Zimbabwe still need to be included in genome-wide analyses, the available data suggest that the occupation of eastern Africa by Bantu-speaking populations was associated with genetic structuring in the relatively small area between the Great Lakes and the Indian Ocean Coast, with Tanzanian groups being closest to the ancestors of southeastern Bantu-speaking populations. This scenario agrees with the migratory path inferred from the continuity between Early Iron Age (EIA) archaeological sites from the Kwale ceramic tradition, which extend from coastal Kenya and Tanzania to South Africa across a Mozambican corridor (Sinclair et al. 1993; Phillipson 2005; Bostoen 2018).
To further investigate the origins of the migratory streams linking different Bantu-speaking groups and to better characterize the admixture dynamics between Bantu speakers and resident populations, we applied the haplotype-based approaches implemented in CHROMOPAINTER and GLOBETROTTER (Lawson et al. 2012; Hellenthal et al. 2014).
We found that the haplotype copy profiles of Angolans differ significantly from Mozambicans + South Africans (fig. 5): whereas the former derive most of their haplotypes from West Bantu-speaking populations located to their North, the latter trace most of their ancestry to Bantu-speaking groups from East Africa, in close agreement with the PCA results (fig. 4). More specifically, we found that the best donor population proxy (Mzigua) for Bantu speakers from Mozambique and South Africa is located in Tanzania (range: 72–93%), whereas Angolans derive most of their ancestry from Bantu-speaking groups in Gabon and Cameroon (range: 77–83%; fig. 5; supplementary table 7, Supplementary Material online).
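Differences between copy profiles of this kind are summarized in figure 5B as pairwise TVDxy values. A commonly used definition of the total variation distance between two normalized copy vectors x and y is TVDxy = 0.5 × Σ |x_i − y_i|, summed over donor groups; the sketch below assumes that definition. The function names, donor labels, and proportions are hypothetical and are not taken from the data.

```python
def normalize(profile):
    """Rescale a donor -> amount mapping so that the values sum to 1."""
    total = sum(profile.values())
    return {donor: value / total for donor, value in profile.items()}

def total_variation_distance(profile_x, profile_y):
    """Half the summed absolute difference between two copy profiles."""
    x = normalize(profile_x)
    y = normalize(profile_y)
    donors = set(x) | set(y)
    return 0.5 * sum(abs(x.get(d, 0.0) - y.get(d, 0.0)) for d in donors)

# Hypothetical copy profiles (fraction of haplotype chunks copied per donor group).
recipient_a = {"west_bantu": 0.8, "east_bantu": 0.2}
recipient_b = {"west_bantu": 0.1, "east_bantu": 0.9}
distance_ab = total_variation_distance(recipient_a, recipient_b)
```

The distance is 0 for recipients that copy from donors in identical proportions and 1 for recipients that share no donors, so larger values in a matrix such as figure 5B correspond to more dissimilar ancestry profiles.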
Fig. 5.

Inferred ancestry of Bantu-speaking groups from Angola, Mozambique and South Africa. (A) CHROMOPAINTER coancestry matrix based on the number of haplotype segments (chunk counts) shared between representative donor groups (columns) and recipient populations (rows) from Angola, Mozambique and South Africa. The copy profile of each recipient group is an average of the copy profiles of all individuals belonging to that group. (B) Matrix of pairwise TVDxy values based on the ancestry profiles of Angolan, Mozambican, and South African groups. The scales of chunk counts and TVDxy values are shown to the right of the matrices in (A) and (B), respectively. (C) Ancestry profiles of Angolan, Mozambican, and South African populations (pie charts) as inferred by the MIXTURE MODEL implemented in GLOBETROTTER. The colored circles indicate the most important contributing regions where best source populations were found: West Bantu-speaking groups (green); Tanzanian East Bantu-speaking groups (yellow); Great Lakes Bantu-speaking groups (red); and Khoisan groups (blue).

Estimated Khoisan ancestry in the South African Sotho (24%) and Zulu (24%) is much higher than in their close Mozambican neighbors Ronga (5%) and Changana (4%), or in any other Mozambican group (range: 1–5%; fig. 5; supplementary table 7, Supplementary Material online). This pattern suggests that Bantu speakers scarcely admixed with local foragers, in agreement with recent findings about Bantu speakers from Malawi, who displayed no Khoisan ancestry, despite the confirmed presence of a Khoisan-related genetic component in ancient samples from the region (Skoglund et al. 2017). It therefore seems that the processes governing earlier admixture events between Bantu speakers and local hunter-gatherer groups in modern-day Mozambique and Malawi were very different from what has been reported for South Africa and Botswana (Pickrell et al. 2012; Schlebusch et al. 2012; González-Santos et al. 2015).
As previously suggested on the basis of genetic variation in uniparental markers and archaeological modeling, the differences in admixture dynamics leading to increased Bantu/Khoisan admixture beyond the southern border of Mozambique could have been caused by a slowdown of the Bantu expansion due to adverse ecoclimatic conditions (Marks et al. 2015). In addition, the better conditions found in Mozambique and Malawi may have favored the rapid population growth of Bantu-speaking migrants, resulting in a demographic imbalance between residents and incomers and leading to low levels of Khoisan admixture, even in the event of total assimilation. To evaluate the effect of Khoisan ancestry on the pattern of southward increase of genetic homogeneity detected in Mozambique (fig. 2; supplementary fig. 3, Supplementary Material online), we reassessed the correlations between genetic diversity and latitude after masking Khoisan segments in Mozambican groups (supplementary material, Supplementary Material online). Although the masking procedure led to a decrease in power due to a reduction of the number of available SNPs (950,000 vs. 500,000), we still found a strong signal of southward increase in the number of RoHs after removal of Khoisan ancestry (supplementary fig. S12B, Supplementary Material online). These results favor the hypothesis that the decreasing levels of genetic diversity in Mozambique are associated with a range expansion with serial founder effects, confirming that the effect remains after masking admixed fragments.
A recent genome-wide study found that the best-matching source population for South African Bantu speakers is located in Angola (Kimbundu) rather than in East Africa (as represented by the Bakiga and Luhya from around the Great Lakes; Patin et al. 2017). Here, we used a stepwise approach to rank the best proxies for the ancestry of two South African Bantu-speaking groups (Sotho + Zulu) among all populations contained in our data set (fig. 6; supplementary material; supplementary table 7, Supplementary Material online). We found that the Changana and Ronga from Mozambique, and a southern Khoisan descendant group (the Karretjie People of South Africa) are the best proxies for the ancestry of the South African Bantu speakers (fig. 6). When Mozambican populations are removed from the list of sources, the next best non-Khoisan proxies are the Mzigua from Tanzania (fig. 6). The contribution of Angola becomes increasingly relevant only when Tanzanian (fig. 6), Kenyan (fig. 6), and Great Lakes (fig. 6) populations are successively removed from the list of donors. Nevertheless, the fact that Angola still represents a better proxy for the ancestry of southeastern Bantu speakers than populations closer to the Bantu homeland provides additional evidence in favor of a “late split” between southwestern and southeastern Bantu-speaking groups after a single passage through the rainforest, as suggested in previous studies (Busby et al. 2016; Patin et al. 2017).
Fig. 6. Inferred average ancestry of Bantu-speaking groups from South Africa. The most important contributing regions and best source populations are provided in the legend. (A) 71 source populations from Sub-Saharan Africa. (B) As in (A), but removing Mozambique from the list of sources. (C) As in (B), but removing Tanzanian Bantu speakers from the list of sources. (D) As in (C), but removing Bantu speakers from coastal Kenya from the list of sources. (E) As in (D), but removing Bantu speakers from the Great Lakes from the list of sources. Full lists of source populations are provided in supplementary table 7, Supplementary Material online.

In a further step, we identified and dated signals of admixture in the history of the studied populations using GLOBETROTTER. We found no evidence for admixture between any two Mozambican populations (not shown), suggesting that the intermediate position of central Mozambique in the North–South gradient of genetic relatedness (figs. 1) is not the result of admixture between populations from northern and southern Mozambique but rather reflects a cline of stepwise genetic differentiation. At the same time, we found that the Khoisan ancestry detected in South Africans (Sotho and Zulu) and at low frequencies in southern Mozambican populations (Ronga, Changana, Tswa, Bitonga, and Chopi; fig. 5) resulted from admixture events occurring around 1,165 BP (range: 756–1,851 BP), involving the Karretjie People from South Africa as the best-matching Khoisan source and the Tanzanian Mzigua as the best-matching Bantu-speaking population (P-values for evidence of admixture < 0.05; supplementary table 8, Supplementary Material online). This date is remarkably consistent with the first Iron Age arrivals in southern Mozambique associated with the Matola pottery, which stylistically resembles the Kwale ceramics from Tanzania and has been dated to the early and mid-first millennium AD (Sinclair et al. 1993).
We also found evidence (P < 0.05) for admixture with Afro-Asiatic (Amhara and Oromo) and Nilotic (Kalenjin and Maasai) speakers in Bantu-speaking groups from the Great Lakes, coastal Kenya and Tanzania (supplementary table 8, Supplementary Material online). The average estimated antiquity of these admixture events dates to ∼760 BP (570–1,047 BP) and is in close agreement with Bantu/non-Bantu eastern African admixture dates inferred by Skoglund et al. (2017). These estimates postdate the Bantu/Khoisan admixture inferred for Mozambique and South Africa, suggesting that the bulk of admixture between Bantu and non-Bantu speakers in East Africa occurred only after Bantu speakers had already begun their migration toward the South. This is also supported by the low eastern African ancestry detected in Bantu speakers from Mozambique and South Africa.

Conclusion

Using a country-wide sample of 12 Mozambican populations, we were able to fill an important gap in the understanding of the expansion of Bantu speakers from the Great Lakes region to the eastern half of southern Africa. Our results suggest that, in spite of the present-day homogeneity of East Bantu languages, the arrival of Bantu-speaking groups in eastern Africa was associated with a period of genetic differentiation in the area between the Great Lakes and the Indian Ocean Coast, followed by a southward dispersal out of Tanzania, along a latitudinal axis spanning across Mozambique into South Africa. The resulting gradient of genetic relatedness is accompanied by a gradual reduction in genetic diversity, possibly indicative of serial bottlenecks, as well as by a progressive loss of the genetic similarity between East Bantu speakers and Bantu-speaking peoples remaining in West-Central Africa. This increased genetic differentiation, however, cannot be attributed to admixture with resident populations. In fact, the absence of a substantial Khoisan contribution to the genetic make-up of Mozambican Bantu speakers (1–5%) suggests that the migrants had very low levels of admixture with resident populations until they reached the southernmost areas of eastern Africa, where Sotho and Zulu display considerable admixture proportions (24%). Moreover, the dates we obtained for admixture between Bantu speakers and Khoisan groups (∼1,165 BP) are remarkably close to the dates for the first archaeological attestations of the presence of Bantu speakers in southeastern Africa. We therefore conclude that our results provide a genetic counterpart to the distribution of Early Iron Age (EIA) assemblages associated with the Kwale ceramic tradition, which are thought to constitute the material evidence for the southward movement of Bantu speech communities along the Indian Ocean coast.

Materials and Methods

Population Samples

A total of 221 samples from 12 ethnolinguistic groups from Mozambique and three groups from Angola were included in the present study (fig. 1). Sampling procedures in Mozambique and Angola were described elsewhere (Alves et al. 2011; Oliveira et al. 2018). All samples were collected with informed consent from healthy adult donors, in collaboration with the Portuguese-Angolan TwinLab established between CIBIO/InBIO and ISCED/Huíla Angola and the Pedagogic and Eduardo Mondlane Universities of Mozambique. Ethical clearances and permissions were granted by CIBIO/InBIO-University of Porto, ISCED, the Provincial Government of Namibe (Angola), and the Mozambican National Committee for Bioethics in Health (CNBS).

Genotyping and Phasing

DNA samples were extracted from buccal swabs and genotyped with the Illumina Infinium Omni2-5Exome-8 v1-3_A1 BeadChip (Gunderson et al. 2005; Steemers et al. 2006), after Whole Genome Amplification (WGA). Of a total of 2,612,357 genomic variants initially typed in 221 samples from Angola and Mozambique, a final set of 200 individuals typed for 1,946,715 autosomal SNPs was retained after applying quality control filters. Haplotypes and missing genotypes were inferred using SHAPEIT2 (Delaneau et al. 2013). Geographic locations, linguistic affiliations and sample sizes for all groups are presented in supplementary table 1, Supplementary Material online. Details about DNA extraction, genotyping, haplotyping and quality control filtering are provided in supplementary material, Supplementary Material online.

Data Merging

The newly generated data from Angola and Mozambique were merged with eight publicly available data sets (Li et al. 2008; Henn et al. 2011; Schlebusch et al. 2012; 1000 Genomes Project Consortium et al. 2015; Gurdasani et al. 2015; Busby et al. 2016; Montinaro et al. 2017; Patin et al. 2017), following the approach described in supplementary material, Supplementary Material online. The final merged data set consists of 1,466 individuals from 89 populations typed for 105,286 SNPs (supplementary table 6, Supplementary Material online).

Genetic Data Analysis

PCA was performed with the EIGENSOFT v7.2.1 package (Patterson et al. 2006). Unsupervised clustering analysis was done with ADMIXTURE (Alexander et al. 2009) applying a cross-validation (CV) procedure. We performed 20 independent runs for each number of clusters (K) and postprocessed and plotted the results with the pong software (Behr et al. 2016). For PCA and ADMIXTURE analyses, SNPs in LD (r² > 0.5) were removed using PLINK 1.9 (Chang et al. 2015), which reduced the newly generated and merged data sets to 927,435 and 98,570 independent autosomal SNPs, respectively. To assess the relationship between genetic, geographic, and linguistic data, we used Procrustes analysis (Wang et al. 2010), EEMS (Petkova et al. 2016), and Mantel tests (Mantel 1967), as detailed in supplementary material, Supplementary Material online. Levels of genetic diversity were assessed by using HH, RoH, and LD, as described in supplementary material, Supplementary Material online. All reported correlations were assessed using the Pearson correlation coefficient (r). To infer “painting” or copying profiles and quantify the ancestry contributions of different African groups to Bantu-speaking populations of Mozambique, Angola and South Africa, we used CHROMOPAINTER v.2 (Lawson et al. 2012) in combination with the MIXTURE MODEL regression implemented in the GLOBETROTTER software (Hellenthal et al. 2014). GLOBETROTTER was also used to infer and date admixture events. Details on the application of these methods are provided in supplementary material, Supplementary Material online.
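The pairwise TVDxy values reported for the ancestry profiles (fig. 5B) can be illustrated with a minimal sketch. It assumes the usual definition of total variation distance, that is, half the sum of absolute differences between two profiles that each sum to 1; the exact computation used in the study is described only in the supplementary material. The function names and the chunk counts below are hypothetical and serve for illustration only.

```python
def normalize(chunk_counts):
    """Convert raw chunk counts into a profile of proportions summing to 1."""
    total = sum(chunk_counts)
    if total <= 0:
        raise ValueError("chunk counts must sum to a positive value")
    return [count / total for count in chunk_counts]


def tvd(profile_x, profile_y):
    """Total variation distance between two profiles over the same donors.

    Assumed definition: 0.5 * sum(|x_i - y_i|), ranging from 0 (identical
    profiles) to 1 (no donor shared).
    """
    if len(profile_x) != len(profile_y):
        raise ValueError("profiles must cover the same donor groups")
    return 0.5 * sum(abs(x - y) for x, y in zip(profile_x, profile_y))


# Hypothetical chunk counts copied from four donor groups (not real data).
group_a = normalize([40, 30, 20, 10])
group_b = normalize([10, 20, 30, 40])

distance_ab = tvd(group_a, group_b)
print(f"TVD between the two hypothetical groups: {distance_ab:.3f}")
```

Under this definition, two groups with identical profiles have a TVD of 0, and the value grows as their inferred ancestry contributions diverge.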

Linguistic Data Analysis

We collected published lexical data from 24 languages from Mozambique (10), Angola (3), eastern (9), and southern Africa (2; supplementary figs. 2 and 10, Supplementary Material online), based on the wordlist published by Grollemund et al. (2015) consisting of 100 meanings (supplementary table 4, Supplementary Material online). Using reconstructions provided in the online database Bantu lexical reconstructions 3 (Bastin et al. 2002) in combination with standard methodology from historical-comparative linguistics, we identified 636 cognate sets, and all languages were coded for presence (1) or absence (0) of a particular lexical root. On the basis of our coded data set (supplementary table 5, Supplementary Material online), we used the software SplitsTree v4.14.2 (Huson and Bryant 2006) to generate a matrix of pairwise linguistic distances (1 − the percentage of cognate sharing) and computed Neighbor-Joining networks with 10,000 bootstrap replicates (supplementary figs. 2, Supplementary Material online). We further applied to our coded data set a Bayesian phylogenetic approach as implemented in the BEAST2 software (Bouckaert et al. 2014), using the Continuous Time Markov Chain (CTMC) model (Greenhill and Gray 2009) included in the Babel package (Bouckaert 2016). We assumed 10,000,000 generations and sampled every 1,000th generation. The first 1,000 generations were discarded as burn-in. The resulting consensus tree was converted into a radial tree using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/; supplementary figs. 2, Supplementary Material online).
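The distance measure described above (1 minus the proportion of cognate sharing) can be sketched on presence/absence codings as follows. This is an illustration under a stated assumption, not the SplitsTree computation itself: cognate sharing is taken here as the number of roots present in both languages divided by the number of roots present in at least one of them, and the study may normalize differently. The language names and codings are hypothetical.

```python
def cognate_distance(coding_a, coding_b):
    """Return 1 - (shared roots / roots attested in either language).

    Each coding is a list of 0/1 values, one per cognate set.
    The normalization by roots attested in either language is an assumption.
    """
    if len(coding_a) != len(coding_b):
        raise ValueError("codings must cover the same cognate sets")
    shared = sum(1 for a, b in zip(coding_a, coding_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(coding_a, coding_b) if a == 1 or b == 1)
    if either == 0:
        return 0.0  # no root attested in either language
    return 1.0 - shared / either


def distance_matrix(codings):
    """Build a symmetric matrix of pairwise distances keyed by language name."""
    names = sorted(codings)
    return {
        x: {y: cognate_distance(codings[x], codings[y]) for y in names}
        for x in names
    }


# Hypothetical codings over five cognate sets (not real data).
codings = {
    "language_1": [1, 1, 0, 1, 0],
    "language_2": [1, 0, 1, 1, 0],
    "language_3": [1, 1, 0, 1, 0],
}

matrix = distance_matrix(codings)
print(matrix["language_1"]["language_2"])
```

A matrix of this kind is the sort of input from which a Neighbor-Joining network can then be computed.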
References

Review 1.  Farmers and their languages: the first expansions.

Authors:  Jared Diamond; Peter Bellwood
Journal:  Science       Date:  2003-04-25       Impact factor: 47.728

2.  Whole-genome genotyping with the single-base extension assay.

Authors:  Frank J Steemers; Weihua Chang; Grace Lee; David L Barker; Richard Shen; Kevin L Gunderson
Journal:  Nat Methods       Date:  2006-01       Impact factor: 28.547

3.  Genetic homogeneity across Bantu-speaking groups from Mozambique and Angola challenges early split scenarios between East and West Bantu populations.

Authors:  Isabel Alves; Margarida Coelho; Christopher Gignoux; Albertino Damasceno; Antonio Prista; Jorge Rocha
Journal:  Hum Biol       Date:  2011-02       Impact factor: 0.553

4.  Improved whole-chromosome phasing for disease and population genetic studies.

Authors:  Olivier Delaneau; Jean-Francois Zagury; Jonathan Marchini
Journal:  Nat Methods       Date:  2013-01       Impact factor: 28.547

5.  Cultural phylogeography of the Bantu Languages of sub-Saharan Africa.

Authors:  Thomas E Currie; Andrew Meade; Myrtille Guillon; Ruth Mace
Journal:  Proc Biol Sci       Date:  2013-05-08       Impact factor: 5.349

6.  Comparing spatial maps of human population-genetic variation using Procrustes analysis.

Authors:  Chaolong Wang; Zachary A Szpiech; James H Degnan; Mattias Jakobsson; Trevor J Pemberton; John A Hardy; Andrew B Singleton; Noah A Rosenberg
Journal:  Stat Appl Genet Mol Biol       Date:  2010-01-27

7.  The genetic structure and history of Africans and African Americans.

Authors:  Sarah A Tishkoff; Floyd A Reed; Françoise R Friedlaender; Christopher Ehret; Alessia Ranciaro; Alain Froment; Jibril B Hirbo; Agnes A Awomoyi; Jean-Marie Bodo; Ogobara Doumbo; Muntaser Ibrahim; Abdalla T Juma; Maritha J Kotze; Godfrey Lema; Jason H Moore; Holly Mortensen; Thomas B Nyambo; Sabah A Omar; Kweli Powell; Gideon S Pretorius; Michael W Smith; Mahamadou A Thera; Charles Wambebe; James L Weber; Scott M Williams
Journal:  Science       Date:  2009-04-30       Impact factor: 47.728

8.  Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago.

Authors:  Carina M Schlebusch; Helena Malmström; Torsten Günther; Per Sjödin; Alexandra Coutinho; Hanna Edlund; Arielle R Munters; Mário Vicente; Maryna Steyn; Himla Soodyall; Marlize Lombard; Mattias Jakobsson
Journal:  Science       Date:  2017-09-28       Impact factor: 47.728

9.  Genomic variation in seven Khoe-San groups reveals adaptation and complex African history.

Authors:  Carina M Schlebusch; Pontus Skoglund; Per Sjödin; Lucie M Gattepaille; Dena Hernandez; Flora Jay; Sen Li; Michael De Jongh; Andrew Singleton; Michael G B Blum; Himla Soodyall; Mattias Jakobsson
Journal:  Science       Date:  2012-09-20       Impact factor: 47.728

10.  Admixture into and within sub-Saharan Africa.

Authors:  George Bj Busby; Gavin Band; Quang Si Le; Muminatou Jallow; Edith Bougama; Valentina D Mangano; Lucas N Amenga-Etego; Anthony Enimil; Tobias Apinjoh; Carolyne M Ndila; Alphaxard Manjurano; Vysaul Nyirongo; Ogobara Doumba; Kirk A Rockett; Dominic P Kwiatkowski; Chris Ca Spencer
Journal:  Elife       Date:  2016-06-21       Impact factor: 8.140
