
Dissecting the Pre-Columbian Genomic Ancestry of Native Americans along the Andes-Amazonia Divide.

Guido Alberto Gnecchi-Ruscone¹,², Stefania Sarno¹, Sara De Fanti¹, Laura Gianvincenzo¹, Cristina Giuliani¹, Alessio Boattini¹, Eugenio Bortolini³, Tullia Di Corcia⁴, Cesar Sanchez Mellado⁵, Taylor Jesus Dàvila Francia⁵, Davide Gentilini⁶, Anna Maria Di Blasio⁶, Patrizia Di Cosimo⁷, Elisabetta Cilli³, Antonio Gonzalez-Martin⁸, Claudio Franceschi⁹, Zelda Alice Franceschi¹⁰, Olga Rickards⁴, Marco Sazzini¹, Donata Luiselli³, Davide Pettener¹.

Abstract

Extensive European and African admixture, coupled with the loss of Amerindian lineages, makes the reconstruction of the pre-Columbian history of Native Americans based on present-day genomes extremely challenging. Open questions remain about the dispersals that occurred throughout the continent after the initial peopling from Beringia, especially concerning the number and dynamics of diffusions into South America. Indeed, although environmental and historical factors contributed to shaping distinct gene pools in the Andes and Amazonia, the origins of this East-West genetic structure and the extent of further interactions between populations residing along this divide are still not well understood. To this end, we generated new high-resolution genome-wide data for 229 individuals representative of one Central and ten South Amerindian ethnic groups from Mexico, Peru, Bolivia, and Argentina. Low levels of European and African admixture in the sampled individuals allowed the application of fine-scale haplotype-based methods and demographic modeling approaches. These analyses revealed highly specific Native American genetic ancestries and great intragroup homogeneity, along with limited traces of gene flow mainly from the Andes into Peruvian Amazonians. The substantial amount of genetic drift differentially experienced by the considered populations underlined distinct patterns of recent inbreeding or prolonged isolation. Overall, our results support the hypothesis that all non-Andean South Americans are compatible with descending from a common lineage, while we found low support for common Mesoamerican ancestors of both Andeans and other South American groups. These findings suggest extensive back-migrations into Central America from non-Andean sources or conceal distinct peopling events into the Southern Continent.


Keywords: Amazonia; Andes; Native American ancestry; genome-wide SNPs; population genomics


Year: 2019 PMID: 30895292 PMCID: PMC6526910 DOI: 10.1093/molbev/msz066

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

The history of Native American populations is one of the most debated topics in the study of ancient human migrations, which puzzles academics from many different fields (Dillehay 2009). Recently, new sources of evidence coming from genomic data of both modern populations and ancient human specimens have helped to unveil novel aspects of the genetic ancestry and population history of First Americans (FA). Overall, it has been confirmed that present-day Native American groups descend from human expansions entering North America from East Asia through the Beringia land corridor, although subsequent timings, number of founder events, and especially diffusion processes within the Americas are still a matter of intense debate (Skoglund and Reich 2016). Ancient genomes from North America and Siberia revealed that present-day Northern Native American populations harbor an intricate mixture of four main streams of ancestry, which were brought into the continent during at least three different diffusion processes (Raghavan et al. 2014, 2015; Rasmussen et al. 2014, 2015; Lindo et al. 2017; Moreno-Mayar, Potter, et al. 2018). Although these studies documented a complex pattern of secondary migrations into North America, the first and oldest of these waves (i.e., FA) was until recently supposed to be the one contributing to the ancestry of all present-day Central and South American groups (Schurr and Sherry 2004; Tamm et al. 2007; Wang et al. 2007; Fagundes et al. 2008; Kitchen et al. 2008; Perego et al. 2010; Reich et al. 2012; Battaglia et al. 2013). Consistent with this scenario, a 12.6-ka human sample recovered in western Montana (Anzick-1) was found to derive all of his ancestry from the same FA source and, in fact, proved to be genetically closer to Native Central and South Americans than to any other Northern American group (Rasmussen et al. 2014).
In addition, coalescent analyses of ancient mitochondrial genomes from South America further suggested a small population entering the Americas around 16 ka after a few millennia of Beringia standstill and rapidly expanding southward (Llamas et al. 2016). In agreement with archaeological records, and particularly with the presence in Southern Chile of one of the oldest American archaeological sites (Monte Verde, 14–15 ka), these results pointed toward a strong founder effect and an early and rapid peopling from North to South America, plausibly along a coastal Pacific route (Dillehay and Collins 1988; Dillehay et al. 2008). However, other studies questioned the model of a single wave of genetically homogeneous migrants as being responsible for the entire ancestry of Central and South American populations (Skoglund et al. 2015; Brandini et al. 2018; Scheib et al. 2018). Accordingly, recent evidence based on genomic data generated from ancient human remains retrieved in Central and South America confirmed the occurrence of multiple waves of diffusion into the south of the continent and suggested a complex scenario involving the spread of ancient populations that were already genetically structured (Moreno-Mayar, Vinner, et al. 2018; Posth et al. 2018). Among these samples, the oldest ones (dating to ∼11–10 ka), either from North America (Nevada) or from South America (West and East of the Andes), were those showing the highest genetic affinity with Anzick-1. This corroborates the hypothesis that the first diffusion from North into South America was extremely rapid (∼1–2 ka) and was not limited to the West coast, since it is supported by samples from the entire continent. However, the genetic footprints of this first peopling event were found to be subtler in more recent samples, suggesting an extensive population replacement beginning from around 9 ka by an ancestral lineage different from that represented in the Clovis-associated Anzick-1 (Posth et al.
2018). Subsequent migrations after the initial diffusion were associated with an expansion from Mesoamerica that occurred sometime after ∼8.7 ka and spread first southward (contributing to the ancestry of all present-day South Americans) and then northward, as suggested by ∼2 ka ancient samples from Nevada (Moreno-Mayar, Vinner, et al. 2018). Despite the common view that these processes contributed significantly to the formation of the modern South American genomic landscape, the two above-mentioned studies did not clarify in detail the relative proportions of these ancestries in the genomes of contemporary populations, being instead focused mostly on the relationship between ancient samples. Finally, minor contributions to the South American gene pool (i.e., <5% of ancestry) were ascribable to the affinity with an Austro Melanesian–related ancestry source already attested for some present-day Amazonian groups (Skoglund et al. 2015). This pattern was recognized also in a ∼10-ka sample from Brazil (Moreno-Mayar, Vinner, et al. 2018), coupled with a newly described connection between ancient samples from the California Channel Islands and the Late Central Andes since around 4.2 ka (Posth et al. 2018). Overall, these studies revealed that the dynamics of the demographic events that occurred between Central and South America after the initial peopling of these regions, as well as within the southern continent itself, have been more complicated than previously thought.

Within South America, mitochondrial DNA, Y-chromosome and autosomal data showed a clear structure East/West of the Andes, in agreement with a long-standing geographic barrier between Andeans and Amazonians (Luiselli et al. 2000; Tarazona-Santos et al. 2001; Fuselli et al. 2003; Reich et al. 2012; Homburger et al. 2015). In addition, populations from the Andean cordillera experienced an additional history of adaptation to high-altitude environments with respect to the other South Americans (Bigham et al.
2010; Crawford et al. 2017; Lindo, Haas, et al. 2018). More recently, the Andes were the cradle of the major South American Pre-Columbian civilizations, the last of which was the Inca Empire (D’Altroy 2014). In the same way, complex demographic histories and different patterns of gene flow among and within Central and South America may have further affected the genetic structuring of Southern Native Americans during and after the initial peopling process. In conclusion, the dynamics that characterized the entry and diffusion of Native American ancestors in South America are still unclear and many questions remain open. More specifically, there is no clear evidence about 1) how the different diffusions into South America described by Moreno-Mayar, Vinner, et al. (2018) and Posth et al. (2018) reconcile with the genetic structure observable in present-day South Americans East and West of the Andes and 2) the extent to which subsequent contacts and gene flow between Central and South Americans, as well as between Andeans and Amazonians, occurred. Indeed, although local patterns of back-migrations and gene flow between the Caribbean region and northern South America have been detected (Reich et al. 2012; Moreno-Estrada et al. 2013; Schroeder et al. 2018), it is still not clear if and to what extent the Andean and non-Andean gene pools have admixed after their initial split, with inevitable implications for the identification of a correct divergence time between the two groups. Hints from uniparental markers suggested that some gene flow between the Andes and Amazon could have occurred (Barbieri et al. 2014; Di Corcia et al. 2017; Gómez-Carballa et al. 2018). Furthermore, they also show different patterns of genetic drift and gene flow, with larger effective population sizes and higher migration rates within the Andes, compared with lower gene flow and higher genetic drift in the eastern populations settled in Amazonian and Chaco regions (Tarazona-Santos et al.
2001; Lewis et al. 2005; Sevini et al. 2013). From a genome-wide perspective, a strong limitation in the study of Native American population history is due to the dramatic demographic changes that they experienced after the European colonization of the 15th century (Lindo et al. 2016; Llamas et al. 2016; Lindo, Rogers, et al. 2018). In fact, it is well known that, because of these processes, present-day American populations appear as a mixture of ancestral sources from different continents, mostly Europe and Africa, with varying proportions from one country to another. This severely limits the possibility of making historical inferences and testing demographic models based only on the fragmented Native American genomic portions (Gravel et al. 2013; Moreno-Estrada et al. 2014; Homburger et al. 2015; Kehdy et al. 2015; Montinaro et al. 2015).

In the present study, we aimed at investigating some aspects of the peopling processes of South America both 1) at a continent-wide scale in relation to Central American populations and 2) at a more local scale as concerns the interactions between the high-altitude Andeans and the neighboring populations from Peruvian Amazon and Argentinian Gran Chaco regions. To this end, we analyzed 229 individuals representative of 11 Central and South Native American ethnic groups whose DNA was genotyped for ∼720,000 genome-wide single nucleotide polymorphisms (SNPs). Taking advantage of samples previously typed for uniparental markers, we generated new genomic data for five ethnic groups from Peruvian Amazon (Barbieri et al. 2014; Di Corcia et al. 2017), one group from the Gran Chaco region (Sevini et al. 2013) and four high-altitude Andean ethno-linguistic groups from the Titicaca lake area in Peru (Barbieri et al. 2011), as well as for newly collected samples from the Bolivian Andes. In addition, we included one Mexican ethnic group (Tzotzil) as representative of the “Mayan Cluster” identified by Moreno-Estrada et al.
(2014), which was missing in previous Native American reference data sets that we included in our study (Li et al. 2008; Reich et al. 2012). By applying fine-scale haplotype-based analyses and demographic modeling inferences, we provided new insights into the origins of ancestral gene pools East-West of the Andes–Amazonia divide, as well as on local patterns of isolation and admixture that differently shaped the genetic and cultural complexity of present-day South American populations.

Results

After the quality control (QC) steps detailed in Materials and Methods, we obtained an “extended” data set consisting of 207,165 genome-wide SNPs typed in 178 newly analyzed samples from Mesoamerican (Meso) and South American (SA) populations, 431 individuals from 50 additional Amerindian groups already included in previous reference studies, and 92 non-Native American populations retrieved from the literature (supplementary table S1, Supplementary Material online; Li et al. 2008; Reich et al. 2012; 1000 Genomes Project Consortium et al. 2015). We used this “extended” data set to frame the genetic variation of analyzed populations into the context of worldwide genomic landscape (supplementary Results and supplementary fig. S1, Supplementary Material online) and to assess the extent of non-Native American admixture in the studied Amerindian groups (supplementary Results, supplementary fig. S2, and supplementary table S2, Supplementary Material online). Overall, our newly generated data revealed very limited non-Native American admixture, with only one Wichi and two Yanesha samples showing appreciable levels of African ancestry and a low number of individuals per group presenting proportions of European admixture higher than the considered threshold (supplementary Results and supplementary fig. S2, Supplementary Material online).

Native American Genetic Structure

Principal Component Analysis (PCA) performed only on the Native American populations retained in the pruned “un-admixed” data set showed good correspondence with both the geographic distribution and the linguistic affiliation of the analyzed populations (fig. 1). Accordingly, it generally confirmed a pattern of North-to-South variability, with the exceptions of Costa Ricans and western Brazilians (i.e., Surui and Karitiana), which instead occupied an outlier position along PC1 and PC2, respectively. In this context, our newly analyzed SA groups formed two well-distinguishable clusters, encompassing all the Andeans on one hand and the Amazonians with Gran Chaco populations on the other.

Fig. 1.

Principal component and ADMIXTURE analyses performed on Native American populations included in the pruned “un-admixed” data set. (a) Plot of PC1 versus PC2 for the 43 un-admixed Native American groups reported in the bottom legends of left and right plots. Individuals are color-coded according to their country of origin (left) or language family affiliation (right). In order to allow continent-wide comparison, we used the same Greenberg’s classification (Greenberg 1987) of languages as in Reich et al. (2012). (b) Results of ADMIXTURE unsupervised cluster-based analysis at K = 8. Average proportions of inferred ancestral components are plotted at population level. Pie chart diameters are proportional to the sample sizes of each considered group ranging from N = 1 to N = 20 (full set of populations, K tested and cross-validation errors are reported in supplementary figs. S3 and S4, Supplementary Material online). The geographical map has been plotted using the R software (v.3.2.4).

In order to investigate more deeply patterns of Native American substructure and to infer proportions of different ancestral genetic components, we ran the unsupervised ADMIXTURE analysis on this Native American “un-admixed” pruned data set (supplementary fig. S3, Supplementary Material online), including a European population (Utah residents with Northern and Western European ancestry, CEU) as a further check for non-Amerindian gene flow. At the best-fit model of K = 8, all Native Americans clustered according to seven highly specific genetic components, also corroborating the absence of any detectable European admixture since the remaining last component was restricted exclusively to CEU (fig. 1 and supplementary figs. S3 and S4, Supplementary Material online). Overall, the detected Native American genetic ancestries revealed a clear geographic distribution. One component is highly represented in Mesoamerican populations and gradually decreases southward.
Another component is mostly observed in Costa Rican groups, such as Maleku, Teribe, Bribri, and Cabecar, being also present at lower proportions in Colombian populations (i.e., Waunana, Embera, Wayuu, and Kogi). Importantly, two other components were highly enriched in all Andeans and in Peruvian Amazonian populations, respectively, thus suggesting an East-West structuring pattern between different Andean- and Amazonian-specific ancestries. Instead, the remaining three components were private to Karitiana, Surui, and Wichi, respectively, although this last one was observed also in Chane, Guarani, and Jamanadi. Interestingly, at K = 9, Cashibo acquired a private genetic component as well (supplementary fig. S3, Supplementary Material online). Outgroup-f3 statistics were used to formally infer the sharing of genetic drift between pairs of populations (i.e., genetic relatedness between groups). In agreement with ADMIXTURE and PCA results, all non-Andean SA populations were found to be more closely related to each other than to all Andeans and finally to all Meso populations, and symmetrically all the Andean groups appeared to be more closely related to each other than to all the other SA and then to Mesoamericans (supplementary fig. S5, Supplementary Material online). The sole exception to this trend was represented by Costa Rican groups, particularly Cabecar, to which almost all Amazonians (except Shipibo and Yanesha) and Gran Chaco populations are genetically closer than they are to the Andeans. Consistent with these results and previous studies, the topologies of phylogenetic trees reconstructed with TreeMix generally confirmed a North-to-South progressive pattern of population splitting, with Meso branching out from the tree before the split of Costa Rican and SA groups (supplementary figs. S6 and S7, Supplementary Material online; Reich et al. 2012).
However, allowing for migration events among populations revealed more complex patterns of genetic relationships between groups, involving changes in the order of splits between Costa Ricans and SA or between the Andean and the non-Andean SA major clades, as well as some connections between single populations (supplementary Results and supplementary figs. S6 and S7, Supplementary Material online).
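
As an illustration of the outgroup-f3 statistic mentioned above, the following minimal Python sketch computes f3(Outgroup; A, B) as the mean product of allele-frequency differences across SNPs. It is not the pipeline used in the study: the function name and the toy frequencies are hypothetical, and the sketch omits the finite-sample bias correction and the standard errors that dedicated software reports.

```python
from statistics import mean

def outgroup_f3(p_out, p_a, p_b):
    """Mean of (p_out - p_a) * (p_out - p_b) over SNPs.

    Larger values indicate more genetic drift shared by A and B
    after their divergence from the outgroup.
    """
    if not (len(p_out) == len(p_a) == len(p_b)):
        raise ValueError("frequency vectors must have the same length")
    return mean((o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b))

# Hypothetical allele frequencies at four SNPs (illustration only).
outgroup = [0.5, 0.5, 0.5, 0.5]
pop_a = [0.8, 0.2, 0.7, 0.3]
pop_b = [0.8, 0.2, 0.6, 0.4]  # drifts away from the outgroup together with pop_a
pop_c = [0.5, 0.5, 0.9, 0.1]  # shares less of that drift with pop_a

f3_ab = outgroup_f3(outgroup, pop_a, pop_b)
f3_ac = outgroup_f3(outgroup, pop_a, pop_c)
print(f3_ab > f3_ac)  # pop_a is closer to pop_b than to pop_c in this toy case
```

In this toy example the pair (A, B) yields the larger value, which is the kind of ranking used to say that two groups are "more closely related to each other" than to a third.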

Intrapopulation Patterns of Genomic Diversity

To better understand how the different histories of Meso, Andean, and non-Andean SA groups have shaped their genomic diversity, we explored patterns of within-population genetic variation. In particular, to test how the demographic and evolutionary history of each population may have affected the observed ancestry patterns, we calculated the extent of regions with continuous homozygous SNPs (i.e., runs of homozygosity, ROH) and we classified them according to length into three different classes (see Materials and Methods). By investigating the distribution of ROH length over all individuals in each population, we found that Peruvian and Brazilian Amazonians, particularly the Surui, Karitiana, and Cashibo, showed enrichment of longer ROH classes (fig. 2), especially when compared with Andean groups and with the Wichi from Gran Chaco, who instead harbor shorter ROH segments.
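
To make the notion of a run of homozygosity concrete, here is a deliberately simplified Python sketch that scans one individual's genotypes and reports stretches of consecutive homozygous SNPs. The genotype coding (0 and 2 homozygous, 1 heterozygous), the `min_snps` threshold, and the function name are assumptions made for illustration. Real ROH callers also tolerate occasional heterozygous or missing calls and apply length and density filters, and the length-class assignment described in the text is not reproduced here.

```python
def find_roh(genotypes, positions, min_snps=3):
    """Return (start_bp, end_bp) for each run of >= min_snps homozygous SNPs.

    genotypes: allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    positions: base-pair coordinate of each SNP, in the same order.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call closes the current run, if any.
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs

# Hypothetical genotypes at ten evenly spaced SNPs.
genotypes = [0, 2, 2, 1, 0, 0, 0, 2, 1, 2]
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
runs = find_roh(genotypes, positions, min_snps=3)
lengths = [end - start for start, end in runs]
```

The resulting per-individual lengths are the quantity whose distribution would then be summarized per population and split into length classes.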

Fig. 2.

Intrapopulation patterns of homozygosity and haplotype sharing. (a) ROH calculated for the Native American groups with N ≥ 5 included in the “un-admixed” data set. Top panel shows the distribution of all ROH lengths (black) and their inferred assignment into four bin classes identified by Mclust (blue, red, green, and purple for class 1–4, respectively). Since class 4 is represented by only three outlier individuals, they were removed from downstream analyses. Bottom panel shows the average length of ROHs over all individuals within each Native American population for each of three considered length classes (i.e., 1–3). (b) Pattern of intrapopulation haplotype sharing measured as the average total length of genome shared IBD between every pair of samples within each population (WAB). Within-population IBD-sharing was calculated for nine bins of IBD lengths, corresponding to different degrees of relatedness according to Moreno-Estrada et al. (2014). Dashed lines represent the distribution of inferred statistics over the considered length classes (see also supplementary fig. S8, Supplementary Material online). The mode of the distribution is plotted as the corresponding labeled point for each population.

These patterns were further explored with fastIBD by comparing values of identity by descent (IBD) sharing within the analyzed “un-admixed” Native American populations and by visualizing the distribution of the total length of shared IBD segments at different bin thresholds (see Materials and Methods). Consistent with ROH results, values of within-population average IBD-sharing (WAB) appeared significantly higher for Cashibo, Surui, and Karitiana. Furthermore, these groups showed tract length distributions that, compared with the rest of the Native American groups, are particularly shifted toward the highest classes of IBD binning, which indicates more recent genetic relatedness (fig. 2 and supplementary fig. S8, Supplementary Material online).
Compared with the rest of Native Americans, Wichi and Cabecar also showed relatively high values of WAB, but their tract length distributions are within the ranges observed for all the other Amerindian populations analyzed (fig. 2).
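
The WAB summary described above (average total length of genome shared IBD between every pair of samples within a population) can be sketched as follows. This is an illustrative reading of the definition, not the authors' code: the input format (tuples of two sample IDs and a segment length), the sample names, and the choice to count pairs without any detected segment as zero are all assumptions. The study additionally computes the statistic separately for nine length bins, which is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def within_population_ibd(samples, segments):
    """Average total IBD length per pair of samples in one population.

    samples:  list of unique sample IDs belonging to the population.
    segments: iterable of (id1, id2, length) IBD segments; segments that
              involve a sample outside the population are ignored.
    """
    members = set(samples)
    pair_total = defaultdict(float)
    for a, b, length in segments:
        if a in members and b in members and a != b:
            pair_total[frozenset((a, b))] += length
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    # Pairs with no detected segment contribute zero (assumption).
    return sum(pair_total[frozenset(p)] for p in pairs) / len(pairs)

# Hypothetical segments; lengths are in arbitrary units.
samples = ["s1", "s2", "s3"]
segments = [
    ("s1", "s2", 5.0),
    ("s2", "s1", 3.0),   # same pair, listed in the other order
    ("s1", "s3", 4.0),
    ("s1", "x9", 10.0),  # x9 is not in this population, so it is skipped
]
wab = within_population_ibd(samples, segments)
```

Restricting `segments` to one length bin before calling the function would give a per-bin value of the same kind.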

Fine-Scale Interpopulation Haplotype Sharing

To evaluate at a finer scale the genomic structure of un-admixed Native American groups, we applied the fineSTRUCTURE clustering algorithm to the CHROMOPAINTER “chunk-counts” matrix of individual haplotype sharing. We first included also European (CEU), East Asian (Han Chinese in Beijing, CHB), and African (Yoruba from Nigeria, YRI) groups to definitively verify the absence of post-Columbian admixture (supplementary Results and supplementary fig. S9, Supplementary Material online), and then we considered only the Native American groups (excluding Chipewyan) to specifically focus on intra-Amerindian haplotype sharing patterns. Overall, clusters of genetically homogeneous individuals identified by fineSTRUCTURE largely matched with population labels (fig. 3 and supplementary fig. S10, Supplementary Material online). The few exceptions involved individuals belonging to closely related groups. For example, one Guahibo individual clustered with all the Piapoco, as did one Guarani with the neighboring Chane and one Shipibo within the Cashibo cluster. Analogously, Aymara individuals previously sampled in Bolivia (Reich et al. 2012) and Bolivian Aymara from our study appeared highly intermingled, as well as some Cabecar individuals with the other Costa Rican groups (i.e., Maleku, Teribe, and Bribri). We caution that fineSTRUCTURE hierarchical clustering does not imply any evolutionary relationship between distinct clusters, and thus should not be interpreted as a phylogenetic tree (Lawson et al. 2012; Leslie et al. 2015). Nevertheless, when the clusters are considered independently (fig. 3), they perfectly matched the clades identified by TreeMix and were consistent with the broad pattern described by outgroup-f3 statistics (supplementary figs. S5–S7, Supplementary Material online). In fact, all Andeans formed a clade that departs from all the other SA.
Similarly, and in agreement with the genetic difference between Peruvians and other Amazonian groups appreciable with TreeMix, all Peruvian Amazonians formed a separate clade, the sole exception being Huambisa, which instead clustered with all the other Amazonian groups, as well as with Chane and Guarani (fig. 3). As concerns Mesoamericans, they all clustered together, presenting internal relationships again in agreement with TreeMix results, that is, Pima split first with respect to the Central Mexican groups of Tepehuano, Zapotec, and Mixe on one hand and the Southern Mexican Tzotzil and Guatemala populations on the other. Finally, all the Wichi (i.e., the 17 new individuals from our study and four previously published by Reich et al. [2012]) formed an outlier cluster, and so did a separate clade encompassing all the Costa Rican groups (i.e., Cabecar, Maleku, Teribe, and Bribri).

Fig. 3.

fineSTRUCTURE hierarchical clustering dendrogram calculated between pairs of Native American individuals of the “un-admixed” data set. The 26 clusters highlighted with different colors are highly concordant with the actual population labels, with the exception of partially overlapping, geographically close groups: those of Costa Rica (i.e., Maleku, Bribri, Teribe, and some Cabecar samples), Chane and Guarani, Wayuu and Kogi, Waunana and Embera, and Aymara from Bolivia. In the figure, these samples were thus merged in the same cluster. For detailed annotation of individuals inside each cluster, see supplementary figure S10, Supplementary Material online.

Comparison between the clustering pattern and the “chunk-lengths” matrix pinpointed additional interesting features (supplementary fig. S11, Supplementary Material online). First, Karitiana, Surui, Cashibo, Wichi, and Cabecar, who showed higher proportions of homozygous segments and of total length of shared IBD (fig. 2), were also the ones presenting the lowest (∼0) proportion of haplotype “copying” with other groups. However, the difference between these groups is that although Surui, Karitiana, and Cashibo clustered within their corresponding clade (i.e., West and Peruvian Amazonian, respectively), the Wichi and Cabecar were outliers with respect to all the other groups. Furthermore, only a few sets of populations revealed evident traces of high haplotype sharing outside their own cluster, namely the Chane and Guarani with the Wichi, and the Colombian Wayuu, Waunana, Embera, and Kogi with the Costa Rican clade, thus signaling possible events of gene flow between these populations. However, attempts to date these admixture events using the GLOBETROTTER pipeline, which is based on the “chunk-lengths” matrix produced by CHROMOPAINTER, were unsuccessful. In fact, the coancestry curves were too noisy to successfully fit an exponential function describing the admixture parameters (supplementary fig.
S12, Supplementary Material online). This may be due in part to the low haplotype resolution and ascertainment bias of SNP-chip data and in part to the fact that the populations involved in this study are related to each other to the point that the method is unable to produce clear patterns of haplotype chunks belonging to one or another ancestral source.

Demographic Modeling

We attempted to formally assess the genealogical relationships between the Andean and non-Andean SA with respect to the Meso populations with simplified four-population treelike models by applying f4 and D-statistics for all possible combinations of the studied groups (supplementary tables S3 and S4, Supplementary Material online). Overall, tests in the form of (CHB, Meso; SA, SA) and (CHB, Andean; Meso, non-Andean) confirmed that SA groups are consistent with forming a clade with respect to the Meso groups. The only populations breaking this trend were the Cabecar when considered in the Meso position and the Tzotzil when SA were specified as combinations of Andean and Amazonian groups, respectively (supplementary Results and supplementary tables S3 and S4, Supplementary Material online). In fact, Andeans and non-Andeans proved to be differently related to Mesoamericans when we tested the topology in the form of (CHB, Meso; Meso, Andeans or non-Andeans) and the two Meso populations were the Zapotec and the Tzotzil, respectively (supplementary Results and supplementary table S5, Supplementary Material online). That being so, to identify demographic models explaining the intricate relationships between Mesoamericans, Andean, and non-Andean SA, we finally used the admixture graph (AG) approach as described in Materials and Methods. The simplest AG test (supplementary fig. S13a and supplementary table S6, Supplementary Material online) modeled Andean and non-Andean SA as descending from a common ancestral population that is a sister group of the Zapotec and provided good fits except for some non-Andean populations (i.e., Guahibo and Huambisa). However, poor fits to the data extended to all non-Andean groups when we included the Tzotzil in the demography as the last Meso group before the divergence within SA (supplementary fig. S13b, Supplementary Material online).
In these cases, none of the combinations between Andeans and non-Andeans can be successfully modeled as forming a clade with respect to the Tzotzil (supplementary table S7, Supplementary Material online). Since AGs without admixture represent poor fits in the history of these populations, we tried to model alternative topologies allowing for mixture events. In particular, following the results of f4 and D analyses (supplementary Results, Supplementary Material online), we tested AG configurations connecting the Andeans to the Zapotec or the non-Andeans to the Tzotzil through one admixture event between a lineage ancestral to these Mesoamerican groups and the other SA ancestral pool, respectively (supplementary fig. S14a and b and supplementary tables S8 and S9, respectively, Supplementary Material online). Interestingly, although both such cases provided no good fit, a demography where the Andeans are instead admixed between a deeper Mesoamerican node (i.e., ancestral to the Tepehuano) and the non-Andean SA lineage showed several good fits and thus cannot be definitively ruled out (supplementary fig. S14c and supplementary table S10, Supplementary Material online). However, among all the tested demographic models, the ones that maximized the fits to the data are those where the Tzotzil were modeled as a mixture of ancestry strands related to a lineage leading to all non-Andean SA and to a node ancestral to the Zapotec (fig. 4 and supplementary table S11, Supplementary Material online). Furthermore, Guahibo and Huambisa groups can be successfully modeled only if considering a further admixture event between a node ancestral to the Andeans and a node ancestral to the Zapotec, before the above-mentioned admixture involving non-Andean groups with the Tzotzil (supplementary fig. S14d and supplementary table S12, Supplementary Material online).
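
For readers unfamiliar with the four-population tests used in this section, the sketch below gives a minimal Python version of the f4 and D statistics computed from population allele frequencies. It follows the commonly used frequency-based definitions; the function names and the toy frequencies are hypothetical, sign conventions differ between software packages, and no standard errors are computed, so this is an illustration rather than the procedure applied to the data.

```python
def f4(p_a, p_b, p_c, p_d):
    """Mean of (pA - pB) * (pC - pD) over SNPs.

    A value near zero is expected when (A, B) and (C, D) form clades
    in an unrooted tree without admixture.
    """
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(terms) / len(terms)

def d_statistic(p_a, p_b, p_c, p_d):
    """Normalized version of f4, bounded between -1 and 1."""
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(p_a, p_b, p_c, p_d):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    return num / den

# Hypothetical allele frequencies at three SNPs.
pop_a = [0.1, 0.9, 0.4]
pop_b = [0.2, 0.8, 0.5]
pop_c = [0.6, 0.3, 0.7]

tree_like = f4(pop_a, pop_b, pop_c, pop_c)   # C and D identical: no excess sharing
correlated = f4(pop_a, pop_b, pop_a, pop_b)  # differences fully correlated
d_value = d_statistic(pop_a, pop_b, pop_a, pop_b)
```

A test such as (CHB, Meso; SA, SA) asks whether the first kind of outcome holds for real populations, that is, whether the statistic is statistically indistinguishable from zero.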

Fig. 4.

Best-fitting AGs obtained with qpGraph. (a) Schematic summary of models testing a topology where the Tzotzils descend from an admixture between a node ancestral to the Zapotec and the non-Andean lineage, while using all possible combinations of Andean and non-Andean populations. (b) Schematic summary of all AGs obtained testing in turn four Peruvian Amazonian groups (i.e., Cashibo, Shipibo, Yanesha, and Ashaninka) as admixed between a non-Andean, specifically Amazonian, lineage and a node ancestral to the Andeans. Dotted lines represent the two-way admixture events tested and the percentages of ancestry on each line denote the proportions of admixture relative to the two admixing lineages. Units along solid lines indicate the measure of drift. Ranges of admixture proportions and drift lengths represent the min and max values reported in supplementary tables S11 and S13, Supplementary Material online, for all of the tests performed. Red nodes represent the two possible events of diffusion into South America hypothesized in the Discussion section.

We finally attempted to test the demography within SA and especially between the different non-Andean clades taking into account the results from Reich et al. (2012) and modeling our newly generated data (fig. 4 and supplementary fig. S15, Supplementary Material online). We found good fits for the Gran Chaco (represented by the Wichi) as the first non-Andean clade branching out, and the Guarani could be successfully modeled only as admixed between a node ancestral to this Gran Chaco lineage and a node leading to other Amazonians, thus confirming results from Reich et al. (2012). Importantly, the Peruvian Amazonian groups (i.e., Cashibo, Shipibo, and Yanesha) best fit when modeled as admixed between the SA Amazonian lineage and the Andean clade (fig. 4 and supplementary table S13, Supplementary Material online).
On the contrary, trying to fit them in a demography without admixture generally resulted in f-statistics deviating by more than three standard errors from expectation (|Z| > 3), thus supporting a model with admixture as a better choice (supplementary fig. S15, Supplementary Material online). It is worth noting that in such a model the Yanesha presented an extra affinity with the YRI outgroup, that is, an outlier f4-statistic (Z < −3) in the form (YRI, Zapo/Wichi; Surui/Karitiana, Yanesha). This result is consistent with some outlier f4 and D-statistics in the form (CHB, Meso; non-Andean, Yanesha) (supplementary table S3, Supplementary Material online) and may suggest a possible remnant of cryptic post-Columbian African admixture undetected by previous analyses. The Ashaninka revealed good fits for both models, that is, either accounting for additional mixture or not, but again with a slight increase in fit for the admixture case (supplementary fig. S15 and supplementary table S13, Supplementary Material online). Overall, the “Andean” admixture component in Peruvian Amazonians was very low, ranging from ∼5% in Ashaninka to ∼15% in Yanesha, which is consistent with them harboring mostly a non-Andean and specifically an Amazonian genetic ancestry (fig. 4).

Discussion

To shed light on the genetic history of SA populations with fine-scale genomic analyses and to overcome the inferential limitations imposed by recent post-Columbian admixture, we generated genome-wide genotype data for individuals representative of ten South and one Central American ethnic groups. In particular, to investigate both broader and local scale patterns of peopling processes and of genetic relationships East and West of the Andes/Amazonian divide, we integrated previous Native American reference panels (Reich et al. 2012) with new data from high-altitude Andean groups from Peru and Bolivia, Peruvian Amazonians, Wichí from the Gran Chaco and Mexican Mayan Tzotzil (see Materials and Methods). The reduced non-Native American ancestry detected especially in the newly typed samples (supplementary Results and supplementary figs. S2 and S9, Supplementary Material online) allowed us to exclude recently admixed individuals while still retaining a good sample size per group (supplementary table S1, Supplementary Material online). Global population structure analyses on the Native American “un-admixed” data set revealed a clear-cut pattern of structuring between groups, coupled with substantial intrapopulation homogeneity (fig. 1). Overall, individuals belonging to the same population formed tight clusters on the PCA space (fig. 1) and presented similar admixture proportions (fig. 1). Even at the finer-scale structuring level explored by haplotype-based fineSTRUCTURE analyses, individuals were consistently found to cluster according to their respective population, with only a few exceptions of single samples assigned to neighboring groups (fig. 3 and supplementary fig. S10, Supplementary Material online). Genetic relationships between populations were broadly concordant with their language family affiliation and corresponded to geographic locations at a local scale (fig. 1). 
The Meso groups showed a general North to South clustering pattern (with the exception of the outlier position of Costa Ricans), whereas among the Peruvian samples emerged a sharp distinction between the tight cluster of high-altitude Andeans and the Amazonians, the latter grouping with the bulk of other non-Andean SA from Brazil, Colombia and Gran Chaco (fig. 1). Inferences of ancestry proportions showed the presence of distinct Native American genetic components largely corresponding to one Central American (i.e., highest in all Mexican groups), one Costa Rican, one Andean and different non-Andean SA components maximized in Peruvian Amazonians, Brazilian Surui and Karitiana, and Wichi from the Gran-Chaco (fig. 1 and supplementary fig. S3, Supplementary Material online). Proportions of Meso components (fig. 1) were observed at different levels among some non-Andean populations, suggesting shared ancestry or recent contacts between Meso and SA groups. In particular, this latter case could explain the proportions of the Costa Rican-like component observed in Northern SA from Colombia (i.e., Kogi, Embera, Waunana, and Wayuu), who in fact occupied an intermediate position in the PCA with respect to the neighboring Amazonians (fig. 1). In agreement with previous studies (Reich et al. 2012; Homburger et al. 2015), admixture between Colombian groups and Costa Ricans emerged also from several TreeMix runs (supplementary Results and supplementary figs. S6 and S7, Supplementary Material online) and was supported by the high sharing of haplotypes between these two clusters revealed by CHROMOPAINTER analyses (supplementary figs. S9 and S11, Supplementary Material online). Patterns of haplotype sharing from outside their own-specific cluster were observed also for the Chane and Guarani groups, which revealed significant proportions of the Wichi-like component. 
In fact, they clustered with the Wichi in TreeMix phylogenies, although showing migration edges with Amazonians (supplementary Results and supplementary figs. S6 and S7, Supplementary Material online). Analyses of intrapopulation diversity, measuring both the length of genotype homozygous tracts (fig. 2) and the genome-wide haplotype IBD-sharing between individuals belonging to the same group (fig. 2), concurrently confirmed a general pattern of higher drift experienced by non-Andean SA and Costa Rican groups with respect to the Meso and Andean populations. This likely reflects known differences in the past population histories and effective population sizes between these groups (Wang et al. 2007). In fact, during pre-Columbian times the area of present-day Mexico in Mesoamerica and the Andes witnessed the rise of complex urban societies, whereas in other regions the populations remained mainly organized in smaller groups thus probably incrementing inbreeding within populations and experiencing variable degrees of isolation (D’Altroy 2014; Arias et al. 2018). Nevertheless, detected differences in intragroup genetic patterns allowed the distinction between the effects of substantial inbreeding and/or small effective population sizes (Ne) from the ones of prolonged isolation. For instance, the Brazilian Amazonian groups of Surui and Karitiana and the Peruvian Cashibo, besides exhibiting private genetic components according to ADMIXTURE analysis (fig. 1), also presented higher long-tract ROH and IBD values (fig. 2), as well as longer tip branches in the inferred TreeMix trees (supplementary figs. S6 and S7, Supplementary Material online) and AGs (supplementary tables S4–S12, Supplementary Material online), thus signaling evidence of recent population-level relatedness and high genetic drift. This confirmed previous results obtained for Surui and Karitiana (Wang et al. 2007; Li et al. 2008; Verdu et al. 
2014) and is in line with the reduced uniparental lineage composition already observed for Cashibo (Di Corcia et al. 2017). On the contrary, Wichi and Cabecar, despite showing overall high levels of homozygosity and intragroup haplotype sharing, presented an average shorter length of homozygous tracts and chunks of shared haplotypes, more compatible with a prolonged isolation rather than high inbreeding (fig. 2). This is reflected also in their outlier position with respect to other Central and South American clusters identified by fineSTRUCTURE (fig. 3 and supplementary fig. S10, Supplementary Material online). As for the Wichi, these patterns were in agreement with the strong founder effect and the subsequent high diversification of mitochondrial lineages observed in a previous study (Sevini et al. 2013), thus confirming that this region has been long populated and that Wichi remained genetically isolated from both the neighboring Andes on the West and Amazonia on the North. Effects of such an isolation were evident in the population-specific clustering of Wichi both in genotype-based ADMIXTURE and haplotype-based fineSTRUCTURE analyses (figs. 1 and 3). As for their relationships with the other main branches of SA lineages (i.e., Andean and Amazonian), the unstable position of Wichi in TreeMix phylogenies was paralleled by an ancestral connection between Meso and Andean clades each time the Gran Chaco group split out before the Andeans instead of being a sister clade of all the other non-Andean SA (supplementary Results and supplementary figs. S6 and S7, Supplementary Material online). Consistent with the closer relationship with Amazonians outlined by both PCA and outgroup-f3 analyses, formal tests of treeness through four-population statistics (supplementary table S3, Supplementary Material online) showed that the Wichi belong to the same non-Andean lineage as all Amazonians. 
Importantly, AGs further suggested that they likely descend from one of the first splits within this lineage (fig. 4 and supplementary fig. S15, Supplementary Material online). In fact, when formally tested with f4 and D-statistics, the non-Andean SA together with Costa Ricans were consistent with forming a clade with respect to Meso and Andeans (supplementary Results and supplementary table S3, Supplementary Material online). This is also in agreement with them being more closely related to each other than to all Andeans according to the outgroup-f3 analyses (supplementary fig. S5, Supplementary Material online). Instead, f4 and D-statistics revealed that Andeans and non-Andeans are differently related to the Meso groups. In fact, all non-Andean SA and Cabecar if tested as forming a clade together with the Andeans showed a significant extra genetic affinity with the Tzotzil (supplementary Results and supplementary table S3, Supplementary Material online). Furthermore, when we directly assessed to which Meso lineage the SA are more closely related, the Andeans revealed a closer relationship to the Zapotec with respect to the Tzotzil, whereas the opposite applied for all non-Andean and Cabecar groups (supplementary Results and supplementary table S3, Supplementary Material online). AGs corroborated these patterns and formally confirmed that for Meso and SA populations a demographic model that follows the simple treelike topology of TreeMix does not fit with the data unless accounting for at least one admixture event between the main branches (supplementary fig. S13a and supplementary table S4, Supplementary Material online). 
More specifically, the best-fitting model obtained by testing all different combinations of possible admixture events suggested by outlier f4 and D-statistics, was the one where the Tzotzil descend from an admixture between a Meso branch and the South American lineage contributing to all present-day non-Andean populations, after the divergence between Andean and non-Andean SA (fig. 4). This demographic model is in accordance with two opposite scenarios (and with all possible events in between). According to the first scenario, if the split between the Andean and non-Andean lineages happened within SA, back-migrations and/or bi-directional gene flow between Meso and SA extended far beyond the attested contacts between North SA and the Caribbean region (see also Reich et al. 2012; Moreno-Estrada et al. 2013). In particular, these processes would have involved northward diffusions of the ancestors of all non-Andean SA that can be genetically detected up to present-day highlands of Southern Mexico. However, this hypothesis does not seem to find support from the ancient genomes recently studied by Moreno-Mayar, Vinner, et al. (2018) and Posth et al. (2018). In fact, both these studies did not detect any evidence of back-migrations from South America into Central America, even though the only ancient data from the entire Meso and North SA regions (i.e., northern Peru) are represented by two samples from Belize dated >7.4 ka (Posth et al. 2018). Therefore, it is not possible to exclude that such population movements occurred after this period. According to the second scenario, if the split between the Andean and non-Andean lineages occurred before the entrance into SA (i.e., somewhere in North America or most likely in Mesoamerica), the admixture involving the Tzotzil ancestors would have happened outside SA between two already diverged Meso lineages, one ancestral to the Zapotec and the other one instead related to all present-day non-Andean SA. 
This latter lineage could therefore represent an additional gene pool, distinct from the one leading to the Andeans, which spread into South America on one side and admixed in Mesoamerica on the other. However, it is not possible to attest which of these events happened earlier. Interestingly, we cannot exclude an alternative model where the Andeans are admixed between the non-Andean SA lineage and a more ancient Meso branch of the Northern Mexican Tepehuano lineage (supplementary fig. S14c and supplementary table S8, Supplementary Material online). This alternative demography is more concordant with the second scenario underlying the previous model and reconciles the pattern observed in several TreeMix replicates, which showed a migration edge between the Andean cluster and a node ancestral to all the Mesoamericans (supplementary Results and supplementary figs. S6 and S7, Supplementary Material online). By considering recent findings emerged from ancient DNA, the connection between the Andeans and a northern Meso lineage (i.e., ancestral to Tepehuano) could be associated with the small amount of gene flow (2–4%) identified by Posth et al. (2018) between ancient individuals from the California Channel Islands (California) and the ancient Late Central Andeans, which dates back to sometime after ∼4.2 ka. Finally, both models described so far, together with TreeMix results (fig. 4 and supplementary figs. S6, S7, and S14c, Supplementary Material online), could imply the same event of slow population replacement that started around 9 ka and went on for several thousands of years (Moreno-Mayar, Vinner, et al. 2018; Posth et al. 2018). Indeed, a connection between the high-altitude Andeans and a more rooted (i.e., ancient) Meso lineage with respect to non-Andeans is in agreement with the longer standing genetic continuity attested so far in the Andes (started 8–9 ka; Lindo, Haas, et al. 2018; Posth et al. 2018) as compared with other regions of SA. 
For instance, in Patagonia this population replacement occurred not earlier than ∼5 ka (Moreno-Mayar, Vinner, et al. 2018). Therefore, such a long-term population replacement could be at the origin of the East and West of the Andes genetic structure that we observe in present-day Native South Americans. Within South America, we finally confirmed the role of the Andes as a sharp genetic barrier. This pattern holds even considering the geographically close groups from both the Gran Chaco region in Argentina and the neighboring Peruvian Amazon included in our newly typed samples. Nevertheless, even if these low-altitude Peruvian groups are indeed of Amazonian ancestry (i.e., more closely related to the rest of non-Andean populations; fig. 1 and supplementary fig. S5, Supplementary Material online), a small proportion of gene flow from the Andeans into the Peruvian Amazonians can be detected (fig. 4), in agreement with previous results from uniparental data (Barbieri et al. 2014; Di Corcia et al. 2017). Conversely, we did not detect evidence of recent gene flow in the opposite direction, from non-Andean sources into the Andean groups, as suggested in a recent study (Harris et al. 2018). Although we are aware that the over-simplified nature of demographic models cannot explain the actual complexity of thousands of years of population history, and that isolation-by-distance may account for a great deal of the genetic structure among present-day populations (thus masking most of the demographic events that occurred through time), our results support the view that a simple divergence from common Mesoamerican ancestors along with a unidirectional latitudinal expansion is not sufficient to explain the genetic diversity of Native South Americans. Overall, the present study reconciles the genetic structure of modern South American populations with the recent findings that emerged from ancient DNA analyses (Moreno-Mayar, Vinner, et al. 2018; Posth et al. 2018) and provides intriguing hypotheses that could be tested with new data from additional ancient samples. Furthermore, although future studies will further benefit from the analyses of new complete genomes, our results stress the importance of implementing accurate sampling strategies and of selecting representative populations based on historical/linguistic and anthropological information to add new insights into the pre-Columbian history of Native Americans.

Materials and Methods

Sample Collection and Genotyping

In this study, we analyzed a total of 229 individuals belonging to 11 ethnic groups from Meso and South America, namely, Tzotzil from Chiapas (Mexico); Cashibo, Shipibo, Huambisa, Ashaninka, and Yanesha from Peruvian Amazon (Peru); Quechua, Aymara, and Uros from the Peruvian Andes (Peru); Aymara from the Bolivian Andes (Bolivia); and Wichi from the Gran-Chaco (Argentina). Subjects were surveyed for being native of their respective ethnic group by at least three generations. Saliva samples from Bolivian Aymara were collected with the Oragene-DNA Self Collection Kit OG500 (DNA Genotek Inc., Ottawa, Ontario, Canada). Genomic DNA was purified with the prepIT-L2P protocol (DNA Genotek) and quantified by fluorometric methods (Qubit dsDNA BR Assay Kit, Life Technologies, Carlsbad, CA). The other samples were collected and processed for DNA extraction as described elsewhere (Barbieri et al. 2011, 2014; Sevini et al. 2013; Moreno-Estrada et al. 2014; Di Corcia et al. 2017). The participants provided a written informed consent to data treatment and project objectives. Approvals from local Institutional Review Boards were obtained as well. In particular, authorization by the Unidad de Identificación Genética (UNIGEN) de la Universidad Mayor de San Andrés (UMSA) was obtained for the collection of new samples from Bolivia. 
Approvals by the representative of the regional organization of Ucayali (AIDESEP) and by the president of COSHIKOX (Consejo Shipibo Conibo Xetebo), by the representative of the Yanesha political association and the FECONAYA (Federacion de Comunidades Nativas Yanesha), by the University Hospital of Maternity and Neonatology of the Universidad Nacional de Cordoba and the Ministry of Health of the province of Chaco, as well as by the University of Guadalajara, the National Institute of Medical Sciences and Nutrition Salvador Zubirán (INNSZ) and the National Institute of Genomic Medicine (INMEGEN), were previously obtained for already collected samples from Peru (Barbieri et al. 2011, 2014; Di Corcia et al. 2017), Argentina (Sevini et al. 2013), and Mexico (Moreno-Estrada et al. 2014). On April 8, 2013, the Bioethics Committee of the University of Bologna also approved all the procedures concerning this study (within the framework of the ERC-2011-AdG 295733 project). Moreover, this study was designed and conducted according to the relevant guidelines, regulations and ethical principles for medical research involving human subjects stated by the WMA Declaration of Helsinki. All DNA samples (n = 229) were genotyped for ∼720,000 SNPs distributed along the whole genome at an average spacing of 4 kb, with the HumanOmniExpress 1.1 BeadChip (Illumina, San Diego, CA). Genotyping experiments were performed at the facilities of the Center for Biomedical Research and Technologies of the Italian Auxologic Institute.

Data Curation

Obtained genotype data were filtered using the PLINK software package v.1.07 (Purcell et al. 2007) by applying a series of QC steps to remove individuals and variants with low call rates, SNPs with ambiguous alleles and inbred individuals. More precisely, we excluded variants with missing call rates exceeding 5%, SNPs showing significant deviations from the Hardy–Weinberg equilibrium (P < 0.01) and those with ambiguous A/T or G/C strand polymorphisms. As for per-individual QC, we removed samples showing more than 2% of missing genotypes (n = 28) and/or a high degree of IBD-sharing (n = 24). In particular, we estimated inbreeding for each pair of individuals on an LD pruned data set (r2 > 0.1), by calculating the genome-wide proportion of shared alleles and we randomly excluded one individual from each pair showing an IBD coefficient >0.25. Moreover, because populations that experienced long-term isolation and small Ne are generally characterized by higher mean values of IBD, we used the Grubbs test (package outliers of the R software) to assess the presence of outlier values in the IBD distribution of all possible pairs of individuals belonging to the same ethnic group. We then removed one individual from every outlier pair showing P values <0.05. A final high-density “clean” data set of 178 samples typed for 660,772 SNPs was used for merging with a reference population panel of publicly available genome-wide data from the HGDP project (Li et al. 2008), the 1000 Genomes Project (1000 Genomes Project Consortium et al. 2015) and Native American populations from Reich et al. (2012) (supplementary table S1, Supplementary Material online). Before the merging procedure, we performed the same QC described above on each reference data set separately. After merging, we obtained an “extended” data set including 431 additional individuals from 50 Native American ethnic groups typed for a common set of 207,165 SNPs. 
This data set was used to perform the haplotype-based analyses described below, and thinned for genotype-based analyses by removing SNPs in high LD (r2 > 0.2) within a sliding window of 50 SNPs advanced by 5 SNPs at a time, as well as variants with a minor allele frequency <0.01, thus obtaining a pruned “extended” data set consisting of 96,991 SNPs.
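The call-rate and strand-ambiguity filters described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study: the function names, the use of `None` for a missing call, and the list-of-lists genotype layout are assumptions made for illustration, and the Hardy–Weinberg and IBD filters are not covered.

```python
# Illustrative QC sketch; not the PLINK implementation used in the study.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}
MISSING = None  # hypothetical encoding for a missing genotype call


def is_strand_ambiguous(allele1, allele2):
    """True for A/T or G/C SNPs, whose strand cannot be resolved from alleles."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS


def missing_rate(calls):
    """Fraction of missing calls in a list of genotype calls."""
    return sum(call is MISSING for call in calls) / len(calls)


def filter_variants(genotypes, alleles, max_variant_missing=0.05):
    """Return indices of variants passing the missingness and strand filters.

    genotypes: list of per-individual lists, one call per variant.
    alleles:   list of (allele1, allele2) tuples, one per variant.
    """
    kept = []
    for j, (a1, a2) in enumerate(alleles):
        column = [individual[j] for individual in genotypes]
        if missing_rate(column) > max_variant_missing:
            continue  # variant missing call rate exceeds 5%
        if is_strand_ambiguous(a1, a2):
            continue  # A/T or G/C polymorphism
        kept.append(j)
    return kept


def filter_individuals(genotypes, max_individual_missing=0.02):
    """Return indices of individuals with at most 2% missing genotypes."""
    return [i for i, calls in enumerate(genotypes)
            if missing_rate(calls) <= max_individual_missing]
```

The default thresholds mirror the 5% per-variant and 2% per-individual cutoffs stated in the text; the order in which the two filters are applied is left to the caller.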

Population Structure and Admixture Analyses

PCA was carried out on the pruned “extended” data set by using the smartpca method implemented in the EIGENSOFT package v6.0.1 (Patterson et al. 2006). PCA was first applied on all the worldwide populations to check for the presence of genotyping errors or inconsistency between the data (supplementary fig. S1, Supplementary Material online). Then, we performed PCA only on the Native American groups included in the pruned “un-admixed” data set, that is, after having checked for limited non-Native American admixture (fig. 1). Indeed, since recent events of European and African admixture may complicate the study of pre-Columbian history, we assessed the presence of non-Native American genetic components in the analyzed American populations of the pruned “extended” data set by running unsupervised clustering analyses implemented in ADMIXTURE v.1.22 (Alexander et al. 2009). First, we tested K = 2 to K = 15 clusters including 32 European, African, and Asian groups in addition to 66 American groups and we excluded all American individuals (or entire groups) showing proportions of European and African ancestry >2% and 1%, respectively, by considering K = 6 because at this number of clusters all the main non-Native American ancestral components were resolved (supplementary fig. S2, Supplementary Material online). Then, we replicated ADMIXTURE testing from K = 2 through K = 10 on the remaining 43 Amerindian populations contained in this pruned “un-admixed” data set, also including the European-ancestry CEU sample to further check for the absence of external admixture (fig. 1 and supplementary fig. S3, Supplementary Material online). For each K tested, we performed 50 independent ADMIXTURE runs with a different random seed to monitor convergence and only those with the highest log-likelihood were considered for the plots. 
Concurrently, we calculated cross-validation errors for each K in order to identify the most reliable number of genetic clusters concordant with the data (supplementary fig. S4, Supplementary Material online). The reliability of obtained ADMIXTURE results was further assessed applying the pong algorithm (Behr et al. 2016). This method identifies the number of modes (i.e., number of different Q matrices) present across the 50 independent runs performed for each given K and evaluates the average pairwise similarity within and between the eventual different modes. The maximum-likelihood clustering approach implemented in ADMIXTURE indeed ignores possible cases of multimodality, that is, multiple sets of membership coefficients inferred from a set of runs on the same data that may differ nontrivially as belonging to different modes (Behr et al. 2016). The ADMIXTURE performed on the “extended” data set produced highly consistent results across the different runs, and up to K = 9 pong identified no more than two modes per K, with the average similarity being always >99.9% within mode and >88% between modes. In particular, the K = 6 run that we considered for inferring European and African admixture belongs to the major mode (i.e., the one replicated in most runs) of the two identified, with a between-mode similarity of 88%. The ADMIXTURE performed on the “un-admixed” data set produced highly consistent results. No more than four modes per K were identified and the average similarity within mode and between modes was always >99.9% and >87%, respectively. More specifically, the K = 8 run reported in figure 1 belongs to the major out of four modes with a between-mode similarity of 93%. To formally test for the presence of African and European admixture on the pruned “un-admixed” data set, we also calculated f3-statistics using the qp3Pop program implemented in the ADMIXTOOLS v3.0 package (Reich et al. 2009). 
We considered negative statistics with Z-score values below −2 as significant signals of admixture (supplementary table S2, Supplementary Material online). The same software was used to apply an outgroup-f3 approach in order to measure the sharing of genetic drift (i.e., genetic similarity) between each pair of Native American populations (Raghavan et al. 2014; supplementary fig. S5, Supplementary Material online). In particular, we tested each possible pair of Native American groups with N ≥ 5 as sources of admixture and an outgroup population (YRI) as “target” of such an admixture. Finally, we used TreeMix v1.12 (Pickrell and Pritchard 2012) to build phylogenetic trees on the Amerindian populations present in the pruned “un-admixed” data set including CEU as root population for an additional check of further signals of European gene flow. TreeMix was first used to construct a tree without allowing any migration and then we tested sequentially 1–4 migration events. These analyses were performed both including only populations with N ≥ 5 individuals, as well as on the whole data set implementing the TreeMix sample size correction flag (supplementary figs. S6 and S7, Supplementary Material online).
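To make the logic of the two f3 tests explicit, the following Python sketch computes a naive f3-statistic from per-SNP allele frequencies. It is an illustration under stated assumptions, not the qp3Pop implementation: the function name `f3` and the frequency-list inputs are hypothetical, and the estimator omits the finite-sample bias correction and the standard errors from which qp3Pop derives Z-scores.

```python
def f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Computes the mean of (c - a) * (c - b) across SNPs, where c, a, and b
    are the frequencies in the target and in the two sources. This sketch
    omits the bias correction and the standard errors reported by qp3Pop.
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Admixture test: a target whose frequencies lie between those of the two
# sources yields a negative value.
admixed = f3([0.5, 0.5], [0.1, 0.9], [0.9, 0.1])

# Outgroup test: with an outgroup as "target", a larger value indicates
# more genetic drift shared by the two tested populations.
close_pair = f3([0.5, 0.5], [0.1, 0.9], [0.2, 0.8])
distant_pair = f3([0.5, 0.5], [0.1, 0.9], [0.4, 0.6])
```

In this toy example `admixed` is negative, and `close_pair` exceeds `distant_pair`, which is the ordering the outgroup-f3 approach relies on to rank pairs of populations by genetic similarity.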

Intrapopulation Genetic Structure

To explore patterns of within-population genetic variation, we calculated the extension of ROH segments and the average length of genome shared IBD. ROH were calculated on the “un-admixed” data set considering only groups with N ≥ 5 individuals (to reduce possible biases due to small sample sizes) by using the command “--homozyg” implemented in the PLINK software package v.1.07 (Purcell et al. 2007) under default parameter settings. SNPs were considered to be part of a homozygous segment when the proportion of overlapping homozygous windows was above 5% (Anagnostou et al. 2017). Then, we considered a default Gaussian fitting of the ROH length distribution, using the Mclust function from the R package mclust V3 (Fraley and Raftery 2002), which identified three different ROH length classes (fig. 2). Patterns of IBD-sharing within populations were estimated on the phased data by using the fastIBD method implemented in the BEAGLE 3.3 software (Browning and Browning 2011). The phase of haplotypes for the “un-admixed” data set was statistically reconstructed using SHAPEIT2 v2.r790 (Delaneau et al. 2013) by applying default parameters and HapMap phase 3 recombination maps. FastIBD was run ten times for each chromosome using different random seeds. To call IBD blocks we postprocessed results with the “plus-process-fibd.py” pipeline modified by Ralph and Coop (2013). We set the fastIBD threshold to 1e-10 and considered only blocks longer than 1 cM. As summary IBD-statistics, we computed the total length of genome shared IBD averaged over the number of possible pairs of individuals within each population (WAB metric; Atzmon et al. 2010). The average IBD-sharing was calculated for nine different bin categories (supplementary fig. S8, Supplementary Material online) (Moreno-Estrada et al. 2014).
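The summary IBD-statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name, the `(id1, id2, length_cm)` block format, and the handling of blocks involving individuals outside the population are assumptions.

```python
def mean_pairwise_ibd(individuals, ibd_blocks, min_length_cm=1.0):
    """Total IBD length (cM) averaged over all possible pairs in a group.

    individuals: iterable of sample identifiers from one population.
    ibd_blocks:  iterable of (id1, id2, length_cm) tuples (assumed format).
    Blocks not longer than min_length_cm, or involving a sample outside
    the population, are ignored.
    """
    members = set(individuals)
    n_pairs = len(members) * (len(members) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two individuals")
    total = 0.0
    for id1, id2, length_cm in ibd_blocks:
        if id1 == id2 or id1 not in members or id2 not in members:
            continue
        if length_cm > min_length_cm:
            total += length_cm
    return total / n_pairs
```

Dividing by the number of possible pairs, rather than by the number of pairs that actually share a block, means that pairs with no detected IBD contribute zero to the average.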

CHROMOPAINTER and fineSTRUCTURE Analyses

To explore fine-grained population structure and define clusters of genetically homogeneous individuals, we exploited the haplotype-based approach implemented in CHROMOPAINTERv2/fineSTRUCTURE. Samples were phased with the SHAPEIT software as specified above. We applied the CHROMOPAINTERv2/fineSTRUCTURE pipeline (Lawson et al. 2012) separately, but following the same steps detailed below to 1) the “un-admixed” data set including CEU, CHB, and YRI to further control for allele sharing pattern with non-Native American populations and 2) the “un-admixed” data set including only the Meso and South American groups (fig. 3 and supplementary figs. S9–S11, Supplementary Material online). We first estimated the mutation/emission and the switch rate parameters with ten steps of the Expectation–Maximization (E–M) algorithm on a subset of chromosomes {4, 10, 15, 22} using every individual both as “donor” and “recipient.” Then, we averaged the obtained values across chromosomes (weighting by the number of markers) and individuals, and we used the estimated mutation/emission and switch rate parameters to run CHROMOPAINTER again on all chromosomes, considering a parameter k = 50 to specify the number of expected chunks to define a region. This value was suggested to be preferable compared with the default value of 100 (Leslie et al. 2015) when painting closely related populations. The obtained matrix of haplotype sharing “chunk” counts was summed up across all the 22 autosomes and submitted to the fineSTRUCTURE clustering algorithm version fs2.1 (Lawson et al. 2012). We ran fineSTRUCTURE pipeline by setting 1,000,000 “burn-in” MCMC iterations, followed by additional 2,000,000 iterations and sampling the inferred clustering patterns every 10,000 runs. Finally, we set 1,000,000 additional hill-climbing steps to improve posterior probability and merge clusters in a step-wise fashion. Individuals were hierarchically assembled into clusters until reaching the final configuration tree. 
We then applied the GLOBETROTTER algorithm (Hellenthal et al. 2014) in the attempt to infer dates for the admixture events between Native American groups suggested by different analyses. We ran GLOBETROTTER on CHROMOPAINTER runs performed only on the Meso and South American groups of the “un-admixed” data set, grouping the samples according to the clusters identified by fineSTRUCTURE and excluding each time from the donors the cluster tested as target of admixture in GLOBETROTTER. In particular, we tried to fit the Waunana_Embera and the Wayuu clusters as targets of admixture by using all the other Native American clusters of the “extended un-admixed” data set as parental proxies (supplementary fig. S12, Supplementary Material online). Moreover, we performed an additional CHROMOPAINTER/GLOBETROTTER workflow on the phased high-density “clean” data set (i.e., consisting of only our newly-typed populations, but relying on a greater number of markers), in the attempt to date the admixture events observed for the Peruvian Amazonian groups between all Andean and non-Andean possible sources present in the original data set (supplementary fig. S12, Supplementary Material online). All GLOBETROTTER runs were conducted according to guidelines reported in Hellenthal et al. (2014) and performing a first run standardizing over a null individual.
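The averaging of the E–M parameter estimates described above (across individuals and across chromosomes, weighting by the number of markers) can be sketched as follows. This is an assumption-laden illustration rather than the CHROMOPAINTER tooling itself: the function name and input layout are hypothetical, and the order of averaging (individuals first, then marker-weighted chromosomes) is one plausible reading of the text.

```python
def average_em_estimates(per_chromosome):
    """Marker-weighted average of one E-M parameter across chromosomes.

    per_chromosome: list of (n_markers, per_individual_values) tuples,
    one per chromosome used for estimation (e.g., 4, 10, 15, and 22).
    Values are first averaged across individuals within each chromosome,
    then across chromosomes with weights equal to the marker counts.
    """
    total_markers = sum(n_markers for n_markers, _ in per_chromosome)
    if total_markers <= 0:
        raise ValueError("total number of markers must be positive")
    weighted_sum = 0.0
    for n_markers, values in per_chromosome:
        chromosome_mean = sum(values) / len(values)
        weighted_sum += n_markers * chromosome_mean
    return weighted_sum / total_markers
```

The same function would be called once for the switch rate and once for the mutation/emission rate, and the two resulting values passed to the final painting of all chromosomes.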

Tests for Treeness

We assessed consistency with a four-population tree topology using the functions implemented in ADMIXTOOLS v3.0 to calculate f4 (Reich et al. 2009) and D-statistics (Green et al. 2010). In particular, we tested whether all SA form a consistent clade with respect to all Meso and an outgroup (CHB), considering all possible combinations of populations with f4 and D in the form (CHB, Meso; SA, SA) (supplementary table S3, Supplementary Material online). Again using CHB as the outgroup, we then tested whether Meso populations were consistent with forming a clade with non-Andean SA (i.e., Amazonian and Gran Chaco groups), testing all possible combinations of f4 and D in the form (CHB, Andeans; Meso, non-Andean SA) (supplementary table S4, Supplementary Material online). Finally, we tested whether Andean and non-Andean SA were differently related to present-day Meso groups by testing all possible combinations of f4 and D in the form (CHB, Meso; Meso, Andeans) and (CHB, Meso; Meso, non-Andean SA) (supplementary table S5, Supplementary Material online).
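To make the quantities behind these tests concrete, the following Python sketch computes f4 and D for a configuration (A, B; C, D) from per-SNP allele frequencies, together with a Z-score from a simplified delete-one-block jackknife. It is an illustrative assumption-laden sketch, not the ADMIXTOOLS implementation: the function names are hypothetical, blocks are equal-sized runs of consecutive SNPs rather than genetic-distance blocks, and the jackknife is unweighted.

```python
import math


def f4_and_d(pa, pb, pc, pd):
    """Return (f4, D) for the configuration (A, B; C, D).

    Each argument is a list of allele frequencies, one value per SNP.
    """
    num = sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd))
    den = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d)
        for a, b, c, d in zip(pa, pb, pc, pd)
    )
    f4 = num / len(pa)
    d_stat = num / den if den else float("nan")
    return f4, d_stat


def f4_jackknife_z(pa, pb, pc, pd, n_blocks=5):
    """Z-score of f4 from an unweighted delete-one-block jackknife."""
    n = len(pa)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    full_f4, _ = f4_and_d(pa, pb, pc, pd)
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = bounds[i], bounds[i + 1]
        # Drop the SNPs of block i from every population.
        kept = [list(p[:lo]) + list(p[hi:]) for p in (pa, pb, pc, pd)]
        leave_one_out.append(f4_and_d(*kept)[0])
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    return full_f4 / se if se > 0 else float("nan")
```

Under a tree in which C and D form a clade relative to A and B, f4 and D are expected to be zero, so a Z-score far from zero is evidence against that topology.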

Admixture Graphs

To test more refined demographic hypotheses explaining relationships between Meso and SA populations, as well as within SA itself, we applied the f-statistic-based modeling approach implemented in the qpGraph software of the ADMIXTOOLS v3.0 package (Reich et al. 2012). We set YRI as the outgroup and CHB as the last non-Native American population in root position to all considered Meso and SA groups. To keep the models simple, thereby avoiding overfitting and disregarding gene flow between closely related populations, which would add unnecessary complexity (Patterson et al. 2012), we considered one population as representative of each of the main clusters identified by TreeMix and fineSTRUCTURE. In particular, we used Northern Tepehuano, Southern Zapotec, and Mayan Tzotzil as representatives of the three Mesoamerican clades, and we then iteratively tested all possible combinations of one non-Andean and one Andean group as representatives of the two main South American gene pools (fig. 4 and supplementary figs. S13 and S14 and supplementary tables S6–S12, Supplementary Material online). For the intra-SA models, we also added one additional Andean population (but not all three together, to simplify the otherwise complex high intra-Andean gene flow) and one representative for each non-Andean cluster (i.e., Gran Chaco, Peruvian Amazonians, and Brazilian Amazonians) (fig. 4 and supplementary fig. S15 and supplementary table S13, Supplementary Material online). Unless otherwise specified, we considered as significant evidence of rejection the models presenting one or more outlier f-statistics (with |Z-score| > 3) and significant P values (<0.05) for the nominal χ2 statistic, for which a nonsignificant value indicates no evidence of a poor fit (Patterson et al. 2012). Conversely, models with no outlier f-statistics and P values between 0.01 and 0.05 (i.e., nonsignificant at the 0.01 level) are still discussed as reasonable fits in the absence of better-fitting alternative models.
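The model-evaluation rule described above can be written out as a small Python sketch. The function name, the labels, and the handling of the boundary P = 0.05 are assumptions made for illustration; combinations that the text does not address (for example, an outlier f-statistic together with a nonsignificant P value) are deliberately returned as uncovered rather than guessed.

```python
def classify_model(worst_abs_z, p_value):
    """Classify a qpGraph model from its worst |Z| and chi-square P value.

    Thresholds follow the text: |Z| > 3 marks an outlier f-statistic and
    P < 0.05 is treated as significant. Labels are hypothetical.
    """
    has_outlier = worst_abs_z > 3
    if has_outlier and p_value < 0.05:
        return "rejected"
    if not has_outlier and p_value >= 0.05:
        return "fit"
    if not has_outlier and 0.01 < p_value < 0.05:
        return "reasonable fit if no better alternative"
    # Cases the stated rule does not address are left explicit.
    return "not covered by the stated rule"
```

For instance, a model with a worst |Z| of 2.1 and P = 0.03 would fall into the intermediate category, which the text treats as acceptable only when no better-fitting alternative exists.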

Data Availability

The genotype data generated during the current study are available at https://figshare.com/articles/South_American_dataset_Gnecchi-Ruscone_et_al_2019_/7667174.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
