
The genetic history of admixture across inner Eurasia.

Choongwon Jeong^1,2,3, Oleg Balanovsky^4,5, Elena Lukianova^4, Nurzhibek Kahbatkyzy^6,7, Pavel Flegontov^8,9, Valery Zaporozhchenko^4,5, Alexander Immel^10, Chuan-Chao Wang^10,11, Olzhas Ixan^6, Elmira Khussainova^6, Bakhytzhan Bekmanov^6,7, Victor Zaibert^12, Maria Lavryashina^13, Elvira Pocheshkhova^14, Yuldash Yusupov^15, Anastasiya Agdzhoyan^4,5, Sergey Koshel^16, Andrei Bukin^17, Pagbajabyn Nymadawa^18, Shahlo Turdikulova^19, Dilbar Dalimova^19, Mikhail Churnosov^20, Roza Skhalyakho^5, Denis Daragan^5, Yuri Bogunov^4,5, Anna Bogunova^5, Alexandr Shtrunov^5, Nadezhda Dubova^21, Maxat Zhabagin^22,23, Levon Yepiskoposyan^24, Vladimir Churakov^25, Nikolay Pislegin^25, Larissa Damba^26, Ludmila Saroyants^27, Khadizhat Dibirova^4,5, Lubov Atramentova^28, Olga Utevska^28, Eldar Idrisov^29, Evgeniya Kamenshchikova^5, Irina Evseeva^30, Mait Metspalu^31, Alan K Outram^32, Martine Robbeets^33, Leyla Djansugurova^6,7, Elena Balanovska^5, Stephan Schiffels^10, Wolfgang Haak^10, David Reich^34,35, Johannes Krause^36.

Abstract

The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbour tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine and Uzbekistan. We furthermore report additional damage-reduced genome-wide data of two previously published individuals from the Eneolithic Botai culture in Kazakhstan (~5,400 BP). We find that present-day inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries, mirroring geography. The Botai and more recent ancient genomes from Siberia show a decrease in contributions from so-called 'ancient North Eurasian' ancestry over time, which is detectable only in the northern-most 'forest-tundra' cline. The intermediate 'steppe-forest' cline descends from the Late Bronze Age steppe ancestries, while the 'southern steppe' cline further to the south shows a strong West/South Asian influence. Ancient genomes suggest a northward spread of the southern steppe cline in Central Asia during the first millennium BC. Finally, the genetic structure of Caucasus populations highlights a role of the Caucasus Mountains as a barrier to gene flow and suggests a post-Neolithic gene flow into North Caucasus populations from the steppe.

Year: 2019 PMID: 31036896 PMCID: PMC6542712 DOI: 10.1038/s41559-019-0878-2

Source DB: PubMed Journal: Nat Ecol Evol ISSN: 2397-334X Impact factor: 15.460

Present-day human population structure is often marked by a correlation between geographic and genetic distances1,2, reflecting continuous gene flow among neighboring groups, a process known as “isolation by distance”. However, there are also striking failures of this model, whereby geographically proximate populations can be quite distantly related. Such barriers to gene flow often correspond to major geographic features, such as the Himalayas3 or the Caucasus Mountains4. Many cases also suggest the presence of social barriers to gene flow. For example, early Neolithic farming populations in Central Europe show a remarkable genetic homogeneity suggesting minimal genetic exchange with local hunter-gatherer populations through the initial expansion; mixing of these two gene pools became evident only after thousands of years in the middle Neolithic5. Present-day Lebanese populations provide another example by showing a population stratification reflecting their religious community6. There are also examples of geographically very distant populations that are closely related: for example, people buried in association with artifacts of the Yamnaya horizon in the Pontic-Caspian steppe and the contemporaneous Afanasievo culture 3,000 km east in the Altai-Sayan Mountains7,8. The vast region of the Eurasian inland (“inner Eurasia” herein) is split into distinct ecoregions, such as the Eurasian steppe in central Eurasia, boreal forests (taiga) in northern Eurasia, and the Arctic tundra at the periphery of the Arctic Ocean (Fig. 1). These ecoregions stretch in an east-west direction within relatively narrow north-south bands. Various cultural features show a distribution that broadly mirrors the eco-geographic distinction in inner Eurasia. For example, indigenous peoples of the Eurasian steppe traditionally practice nomadic pastoralism9,10, while northern Eurasian peoples in the taiga mainly rely on reindeer herding and hunting11. 
The subsistence strategies in each of these ecoregions are often considered to be adaptations to the local environments12.

Fig. 1

Geographic locations of the Eneolithic Botai site (red triangle), 65 groups including newly sampled individuals (filled diamonds) and nearby groups with published data (filled squares).

Mean latitude and longitude values across all individuals under each group label were used. Two zoom-in plots for the Caucasus (blue) and the Altai-Sayan (magenta) regions are presented in the lower left corner. A list of new groups, their three-letter codes, and the number of new individuals (in parentheses) are provided at the bottom. Present-day populations are color-coded based on the language family for Figs. 1-3, following key codes listed in Fig. 2. Corresponding information for the previously published groups is provided in Supplementary Table 2. The map is overlaid with ecoregional information, divided into 14 biomes, downloaded from https://ecoregions2017.appspot.com/ (credited to Ecoregions 2017 © Resolve). The main inner Eurasian map is on the Albers equal area projection and was produced using the spTransform function in the R package rgdal v1.2-5.

At present there is limited information about how environmental and cultural influences are mirrored in the genetic structure of inner Eurasians. Recent genome-wide studies of inner Eurasians mostly focused on detecting and dating genetic admixture in individual populations13–16. So far only three studies have reported recent genetic sharing between geographically distant populations based on the analysis of “identity-by-descent” segments13,17,18. One study reports extra long-distance genetic sharing between Turkic populations based on a detailed comparison between Turkic-speaking groups and their non-Turkic neighbors13. The other two studies extend this approach to some Uralic and Yeniseian-speaking populations17,18. However, a comprehensive spatial genetic analysis of inner Eurasian populations is still lacking. Ancient DNA studies have already shown that human populations of this region have dramatically transformed over time. For example, the Upper Paleolithic genomes from the Mal’ta and Afontova Gora sites in southern Siberia revealed a genetic profile, often called “Ancient North Eurasians (ANE)”, which is deeply related to Paleolithic/Mesolithic hunter-gatherers in Europe and also substantially contributed to the gene pools of present-day Native Americans, Siberians, Europeans and South Asians19,20. Studies of Bronze Age steppe populations found the appearance of additional Western Eurasian-related ancestries across the steppe from the Pontic-Caspian to the Altai-Sayan regions, which we collectively refer to here as “Western Steppe Herders (WSH)”: the earlier populations associated with the Yamnaya and Afanasievo cultures (often called “steppe Early and Middle Bronze Age”; “steppe_EMBA”) and the later ones associated with many cultures such as Potapovka, Sintashta, Srubnaya and Andronovo, to name a few (often called “steppe Middle and Late Bronze Age”; “steppe_MLBA”)8. 
The steppe_MLBA gene pool was largely descended from the preceding steppe_EMBA gene pool, with a substantial contribution from Late Neolithic Europeans.21 Also, recent archaeogenetic studies trace multiple large-scale trans-Eurasian migrations over the last several millennia using ancient inner Eurasian genomes22,23, including individuals from the Eneolithic Botai culture in northern Kazakhstan in the 4th millennium BC24. These studies now provide a rich context to interpret present-day population structure of inner Eurasians and to characterize ancient admixtures in fine resolution. In this study, we analyzed newly produced genome-wide data for 763 individuals belonging to 60 self-reported ethnic groups to provide a dense portrait of the genetic structure of inner Eurasians. We also produced damage-reduced genome-wide data of two ancient Botai individuals, whose genome-wide data were recently published23, to explore the genetic structure of pre-Bronze Age populations in inner Eurasia (Table 1). We aimed at characterizing the genetic composition of inner Eurasians in fine resolution by applying both allele frequency- and haplotype-based methods. Based on the fine-scale genetic profile, we further explored if and where the barriers and conduits of gene flow exist in inner Eurasia.

Table 1

Sequencing statistics and radiocarbon dates of two Eneolithic Botai individuals analyzed in this study.

For the Botai individuals for which we produced additional data, we provide the corresponding individual ID from a previous publication23 (“Published ID”), radiocarbon date, the number of total reads sequenced, mean autosomal coverage for the 1240K target sites, the number of SNPs covered at least once for the 1240K and HumanOrigins panels, uniparental haplogroups and contamination estimates.

ID	Published ID	Genetic sex	Uncal. ¹⁴C date	Cal. ¹⁴C date (2-sigma)^b	# of reads sequenced	Mean autosomal coverage	# of SNPs covered^c	MT / Y haplogroup	MT.cont^d	X.cont^e
TU45	BOT14	M	4620 ± 80^a	3632-3100 cal. BCE	84,170,835	0.827x	169,053 (77,363)	K1b2 / R1b1a1	0.02 (0.01-0.03)	0.0122 (0.0050)
BKZ001	BOT2016	F	4660 ± 25	3517-3367 cal. BCE	69,678,735	2.420x	825,332 (432,078)	Z1 / NA	0.01 (0.00-0.02)	NA

^a The uncalibrated date of TU45 was published in Levine (1999) under the ID OxA-431670.

^b The calibrated ¹⁴C dates are calculated from the uncalibrated dates by the OxCal v4.3.2 program71 using the INTCAL13 atmospheric curve72.

^c The number of SNPs in the 1240K panel (out of 1,233,013) or autosomal SNPs in the HumanOrigins array (out of 581,230; in parentheses) covered by at least one read. Only transversion SNPs are considered for the non-UDG libraries (both of the TU45 libraries, one of two BKZ001 libraries).

^d The contamination rate of mitochondrial reads estimated by the Schmutzi program (95% confidence interval in parentheses).

^e The nuclear contamination rate for the male (TU45) estimated based on X chromosome data by ANGSD software (standard error in parentheses).

Results

Present-day Inner Eurasians form distinct east-west genetic clines mirroring geography

We generated genome-wide genotype data of 763 participants who represent a majority of large ethnic groups in Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan (Fig. 1 and Table S1). We merged new data with published data of present-day20,25,26 and ancient individuals3,8,19–23,27–42 (Table S2). The final data set covers 581,230 autosomal single nucleotide polymorphisms (SNPs) in the Affymetrix Axiom® Genome-wide Human Origins 1 (“HumanOrigins”) array platform43. In a Principal Component Analysis (PCA) of Eurasian individuals, we find that PC1 separates eastern and western Eurasian populations, PC2 splits eastern Eurasians along a north-south cline, and PC3 captures variation in western Eurasians with Caucasus and northeastern European populations at opposite ends (Fig. 2a and Supplementary Figs. 1-2). Inner Eurasians are scattered across PC1 in between, mirroring their geographic locations. Strikingly, they seem to be structured into three distinct west-east genetic clines running between different western and eastern Eurasian groups, instead of being evenly spaced in PC space. The uppermost cline, composed of individuals from northern Eurasia, mostly speaking Uralic or Yeniseian languages, connects northeast Europeans and the Uralic (Samoyedic) speaking Nganasans from northern Siberia. The other two lower clines are occupied by individuals from the Eurasian steppe, mostly speaking Turkic and Mongolic languages. Both clines run into Turkic/Mongolic-speaking populations in southern Siberia and Mongolia, and further into Tungusic-speaking populations in Manchuria and the Russian Far East in the East; however, they diverge in the west, one heading to the Caucasus and the other heading to populations of the Volga-Ural area (Fig. 2 and Supplementary Fig. 2). Four groups, Daur, Mongola, Tu and Dungans, are located alongside other East Asian populations and displaced from the three inner Eurasian clines.

Fig. 2

The genetic structure of inner Eurasian populations.

(a) The first two PCs of 2,077 Eurasian individuals separate western and eastern Eurasians (PC1) and Northeast and Southeast Asians (PC2). Most inner Eurasians are located between western and eastern Eurasians on PC1. Ancient individuals (color-filled shapes) are projected onto PCs calculated based on contemporary individuals. Present-day individuals are marked by grey dots, with their per-group mean coordinates marked by three-letter codes listed in Supplementary Table 2. Individuals are colored by their language family. (b) ADMIXTURE results for a chosen set of ancient and present-day groups (K = 14). The top row shows ancient inner Eurasians and representative present-day eastern Eurasians. The following three rows show forest-tundra, steppe-forest and southern steppe cline populations. Most inner Eurasians are modeled as a mixture of components primarily found in eastern or western Eurasians. Results for the full set of individuals are provided in Supplementary Fig. 3.

A model-based clustering analysis using ADMIXTURE shows a similar pattern (Fig. 2b and Supplementary Fig. 3). Overall, the proportions of ancestry components associated with eastern or western Eurasians are well correlated with longitude in inner Eurasians (Fig. 3). Notable outliers include known historical migrants such as Kalmyks, Nogais and Dungans. The Uralic- and Yeniseian-speaking populations, as well as Russians from multiple locations, derive most of their eastern Eurasian ancestry from a component most enriched in Nganasans, while Turkic/Mongolic-speakers have this component together with another component most enriched in populations from the Russian Far East, such as Ulchi and Nivkh (Supplementary Fig. 3). Turkic/Mongolic-speakers comprising the bottom-most cline have a distinct western Eurasian ancestry profile: they have a high proportion of a component most enriched in Mesolithic Caucasus hunter-gatherers (“CHG”)30 and Neolithic Iranians (“Iran_N”)20 and frequently harbor another component enriched in present-day South Asians (Supplementary Fig. 4). Based on the PCA and ADMIXTURE results, we heuristically assign inner Eurasians to three clines: the “forest-tundra” cline includes Russians and all Uralic- and Yeniseian-speakers, the “steppe-forest” cline includes Turkic- and Mongolic-speaking populations from the Volga and the Altai-Sayan regions and southern Siberia, and the “southern steppe” cline includes the remaining populations. We separate four groups (Daur, Mongola, Tu and Dungans) as “others” (Supplementary Table 2).

Fig. 3

Correlation of longitude and ancestry proportion across inner Eurasian populations.

Across inner Eurasian populations, mean longitudinal coordinates (x-axis) and mean eastern Eurasian ancestry proportions (y-axis) are strongly correlated. Eastern Eurasian ancestry proportions are estimated from ADMIXTURE results with K=14 by summing up six components maximized in Surui, Chipewyan, Itelmen, Nganasan, Atayal and early Neolithic Russian Far East individuals (“Devil’s Gate”), respectively (Supplementary Fig. 3). The yellow curve shows a probit regression fit following the model in Sedghifar et al.69. Three groups (Kalmyks, Dungans, Nogai2) are marked with grey squares because of their substantial deviation from the curve and their historically known migration history.
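
The summation described in this caption can be illustrated with a minimal Python sketch. The component labels, the dictionary layout and the example proportions below are assumptions made for illustration only; they are not the authors' file format or data, and the probit fit itself is not reproduced here.

```python
# Hypothetical labels for the six eastern Eurasian components named in the caption.
EASTERN_COMPONENTS = ("Surui", "Chipewyan", "Itelmen", "Nganasan", "Atayal", "DevilsGate")


def eastern_proportion(components):
    """Sum the ADMIXTURE proportions of the six eastern Eurasian components."""
    return sum(components.get(name, 0.0) for name in EASTERN_COMPONENTS)


def group_mean(individuals):
    """Mean eastern Eurasian ancestry proportion across a group's individuals."""
    values = [eastern_proportion(ind) for ind in individuals]
    return sum(values) / len(values)


# Made-up proportions for two individuals of one group (not values from the paper).
example_group = [
    {"Nganasan": 0.30, "DevilsGate": 0.10, "Western_A": 0.60},
    {"Nganasan": 0.20, "DevilsGate": 0.20, "Western_A": 0.60},
]

print(round(group_mean(example_group), 2))
```

Each group would then contribute one point (mean longitude, mean eastern proportion) to a plot like Fig. 3.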

The genetic barriers splitting the inner Eurasians are also found in the EEMS (“estimated effective migration surface”) analysis44 (Supplementary Fig. 5). Inferred barriers to gene flow are often co-localized with geographic features or genetic gaps. We observe a barrier overlapping with the Urals, one separating Beringian populations from the rest, one separating southern Siberians from central and northern Siberians, and one separating Caucasus populations from those further to the north. The southern Siberian barrier matches with our distinction between the steppe-forest and forest-tundra populations, with the exception of two northern-most Turkic speaking populations, Yakuts and Dolgans. The Caucasus barrier also matches with our distinction between the southern steppe and steppe-forest populations. A local EEMS analysis on the Caucasus shows fine-scale barriers and conduits of gene flow, matching with the fine-scale structure within Caucasus populations (Supplementary Note 1).

High-resolution tests of admixture distinguish the genetic profile of source populations in the inner Eurasian clines

We performed both allele frequency-based three-population (f3) tests and a haplotype-sharing-based GLOBETROTTER analysis to characterize the admixed gene pools of inner Eurasian groups. For these group-based analyses, we manually removed 87 outliers based on PCA results (Supplementary Table 1). We also split a few inner Eurasian groups showing genetic heterogeneity into subgroups based on PCA results and their sampling locations (Supplementary Table 1). This was done to minimize false positive admixture signals. Including two Aleut populations as positive control targets, we chose a total of 73 groups as the targets of admixture tests and another 260 groups (167 present-day and 93 ancient groups) as the “sources” to represent world-wide genetic diversity (Supplementary Table 2). Testing all possible pairs of 167 present-day “source” groups as references, we detect highly significant negative f3 statistics for 66 of 73 targets (f3 < -3 SE, where SE is the standard error; Supplementary Table 3). Negative f3 values mean that allele frequencies of the target group are on average intermediate between those of the references, providing unambiguous evidence that the target population is a mixture of groups related, perhaps deeply, to the source populations.43 When the references are extended to include the 93 ancient groups, the remaining seven groups also have small f3 statistics around zero (-5.1 SE to +2.7 SE). Reference pairs with the most negative f3 statistics for the most part involve one eastern and one western Eurasian group, supporting the qualitative impression of east-west admixture from the PCA and ADMIXTURE analyses. To highlight the difference between the distinct inner Eurasian clines, we looked into f3 results with representative reference pairs comprising two ancient western groups (Srubnaya to represent steppe_MLBA ancestry21 and Chalcolithic Iranians (“Iran_ChL”) to represent West/South Asian-related ancestry20; Supplementary Table 1) and three eastern Eurasian groups (Mixe, Nganasan and Ulchi). 
In the southern steppe cline populations, reference pairs with Chalcolithic Iranians tend to produce more negative f3 statistics than those with Srubnaya, while the opposite pattern is uniformly observed for the steppe-forest and forest-tundra populations (Fig. 4a). Reference pairs with Nganasans mostly result in more negative f3 statistics than those with Ulchi in the forest-tundra populations, but the opposite pattern is dominant in the southern steppe populations. The steppe-forest cline populations show an intermediate pattern: seven northern groups (Chuvash, Bashkir_north, Tatar_Zabolotniye, Todzin, Tofalar, Dolgan and Yakut) have more negative f3 with Nganasans while the others have more negative f3 with Ulchi. Most of these seven groups are also upward-shifted in PCA toward the forest-tundra cline, suggesting cross-talk between the two clines.
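
To make the logic of the three-population test concrete, here is a minimal Python sketch of a naive f3 estimate computed from population allele frequencies. It is an illustration under simplifying assumptions, not the procedure used in the study: it omits the finite-sample correction and the block-jackknife standard errors that real analyses rely on, and the frequency vectors are made up.

```python
def f3(target, ref_a, ref_b):
    """Naive f3(target; ref_a, ref_b): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(ref_a) == len(ref_b)):
        raise ValueError("frequency vectors must have equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, ref_a, ref_b)]
    return sum(products) / len(products)


# Made-up allele frequencies at four SNPs for two hypothetical source groups.
west = [0.10, 0.80, 0.35, 0.60]
east = [0.70, 0.20, 0.65, 0.10]

# A target formed as an equal mixture has frequencies between the two sources.
mixed = [0.5 * w + 0.5 * e for w, e in zip(west, east)]

print(f3(mixed, west, east) < 0)  # admixed target: negative f3
print(f3(west, mixed, east) < 0)  # unadmixed source as target: not negative
```

With the mixture as target, every per-SNP product is negative because the target frequency lies between the two references. This is the "intermediate allele frequencies" argument made in the text.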

Fig. 4

Characterization of the western and eastern Eurasian source ancestries in inner Eurasian populations.

(a) Admixture f3 values are compared for different eastern Eurasian references (Mixe, Nganasan, Ulchi; left) or western Eurasian ones (Srubnaya, Iran_ChL; right). For each target group, darker shades mark more negative f3 values. (b) Weights of donor populations in two sources characterizing the main admixture signal (“date 1 PC 1”) in the GLOBETROTTER analysis. We merged 167 donor populations into 12 groups, as listed on the top right side. Target populations are split into five groups: Aleuts, the forest-tundra cline populations, the steppe-forest cline populations, the southern steppe cline populations and the remaining four populations (“others”), from top to bottom.

For a higher-resolution characterization of the admixture landscape, we performed a haplotype-based GLOBETROTTER analysis. We took a “regional” approach, meaning that all 73 target groups were modeled as a patchwork of haplotypes from the 167 reference groups but not those from any target. The goal of this approach was to minimize false negative results due to sharing of admixture history between targets. All 73 targets show a robust signal of admixture: i.e. a correlation of ancestry status shows a distinct pattern of decay over genetic distance in all bootstrap replicates (bootstrap p < 0.01 for all 73 targets; Supplementary Table 4). When the relative contribution of references, categorized into 12 groups (Supplementary Table 2), to the two main sources of the admixture signal (“date 1 PC 1”) is considered, we observe a pattern comparable to PCA, ADMIXTURE and f3 results (Fig. 4b). The European references provide a major contribution for the western Eurasian-related source in the forest-tundra and steppe-forest populations while the Caucasus/Iranian references do so in the southern steppe populations. Similarly, Siberian references make the highest contribution to the eastern Eurasian-related source in the forest-tundra populations, followed by the steppe-forest and southern steppe ones. Admixture date estimates from GLOBETROTTER range from 7 to 55 generations (200-1600 BP; years before present; using 29 years per generation45; Supplementary Fig. 6 and Supplementary Note 2). These match previous reports using similar methodologies13, but are much younger than the admixtures observed in ancient genomes from the Late Bronze and Iron Ages8,39.
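
The conversion from generations to calendar time quoted above is simple arithmetic, sketched below in Python. The function and constant names are illustrative; only the generation time of 29 years and the 7-55 generation range come from the text.

```python
YEARS_PER_GENERATION = 29  # generation time stated in the text


def generations_to_years_bp(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


youngest = generations_to_years_bp(7)
oldest = generations_to_years_bp(55)
print(youngest, oldest)  # 203 1595
```

Rounded to the nearest century, these bounds give the 200-1600 BP range reported in the text.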

Admixture modeling of inner Eurasians shows multiple different temporal layers for present-day admixture clines

Using F-statistic-based approaches, we show that the Eneolithic Botai gene pool was closely related to the ANE ancestry and substantially contributed to the later Okunevo individuals (Supplementary Note 3). To test if this ancient layer left a genetic legacy in later populations of inner Eurasia, we systematically fitted diverse qpAdm-based admixture models to inner Eurasian populations. A two-way mixture of Ulchi/Nganasan and Srubnaya approximates the steppe-forest populations surprisingly well (χ² p ≥ 0.05 and ≥ 0.01 for 12/24 and 18/24 populations, respectively; Supplementary Table 5). A more complex three-way model of Ulchi+Srubnaya+AG3 fits all steppe-forest populations (χ² p ≥ 0.05 for 24/24 populations; Fig. 5 and Supplementary Table 5). Similarly, Nganasan+Srubnaya+AG3 provides a good fit to most populations, but with a negative contribution from AG3 (χ² p ≥ 0.05 for 19/24 populations). We interpret this as reflecting a minor heterogeneity in the eastern Eurasian source, whose average affinity to the ANE ancestry is intermediate between Ulchi and Nganasan. Based on this admixture modeling, we suggest that the steppe-forest cline does not keep a detectable level of contribution from the older clines, the sources of which have higher ANE ancestry in both western and eastern Eurasian parts.

Fig. 5

qpAdm-based admixture models for the forest-tundra and steppe-forest cline populations.

For the forest-tundra populations to the west of the Urals, Nganasan+Srubnaya+WHG+LBK_EN or its submodel provides a good fit, while an additional ANE-related contribution (AG3) is required for those to the east of the Urals (Enets, Selkups, Kets, and Mansi). For the steppe-forest populations, Srubnaya+Ulchi, Srubnaya+Ulchi+AG3, or Srubnaya+Nganasan provides a good fit. 5 cM jackknifing standard errors are marked by the horizontal bars. Models with a p-value between 0.01 and 0.05 are marked in grey and those with a p-value < 0.01 are marked in grey and italic font. Details of the model information are presented in Supplementary Tables 5 and 8.

In contrast, the southern steppe populations do not fit the Ulchi+Srubnaya model (χ² p ≤ 1.34×10^-7; Supplementary Table 6). Adding Chalcolithic Iranians as the third ancestry significantly improves model fit with a substantial contribution from them (χ² p ≤ 5.10×10^-5 with 7.0-64.6% contribution; Fig. 5 and Supplementary Table 6), although the three-way model still does not adequately explain the data. Ancient individuals from the Tian Shan region22, dated to 2,200-1,100 BP, show a similar pattern (Supplementary Table 7). However, older individuals from Central Kazakhstan dated to 2,500 BP (“Saka_Kazakhstan_2500BP”)22 are adequately modeled as Nganasan+Srubnaya or Ulchi+Srubnaya+AG3 (χ² p = 0.057 and 0.824, respectively; Supplementary Table 7). For the forest-tundra populations, the Nganasan+Srubnaya model is adequate only for the two Volga region populations, Udmurts and Besermyans (Fig. 5 and Supplementary Table 8). For the other populations west of the Urals, six from the northeastern corner of Europe are modeled with an additional contribution from Mesolithic western European hunter-gatherers (“WHG”; 8.2-11.4%; Supplementary Table 8), while the rest need both WHG and early Neolithic European farmers (EEF; represented by “LBK_EN”; Supplementary Table 2)5,21. Nganasan-related ancestry substantially contributes to their gene pools and cannot be removed from the model without a significant decrease in model fit (4.1% to 29.0% contribution; χ² p ≤ 1.68×10^-5; Supplementary Table 8). For the four populations east of the Urals (Enets, Selkups, Kets and Mansi), for which the above models are not adequate, Nganasan+Srubnaya+AG3 provides a good fit (χ² p ≥ 0.018; Fig. 5 and Supplementary Table 8). 
When early Bronze Age populations from the Lake Baikal region (“Baikal_EBA”; Supplementary Table 2)23 are substituted for Nganasans, the two-way model of Baikal_EBA+Srubnaya provides a reasonable fit (χ² p ≥ 0.016; Supplementary Table 8), and the three-way model of Baikal_EBA+Srubnaya+AG3 is adequate but with a negative AG3 contribution for Enets and Mansi (χ² p ≥ 0.460; Supplementary Table 8). Bronze/Iron Age populations from southern Siberia also show a similar ancestry composition with high ANE affinity (Supplementary Table 9). The additional ANE contribution beyond the Nganasan+Srubnaya model suggests a legacy from ANE-ancestry-rich clines prior to the Late Bronze Age.

Discussion

In this study, we analyzed new genome-wide data of indigenous peoples from inner Eurasia, providing a dense representation of human genetic diversity in this vast region. Our finding that inner Eurasian populations are structured into three largely distinct clines shows a striking correlation between genes, geography and language (Figs. 1-2). In terms of ecoregions, the three clines match boreal forests and tundra, the forest-steppe zone, and steppe/shrub-land further to the south, respectively. In terms of language, they match the distribution of Uralic, northern Turkic and southern Turkic languages. We acknowledge that the distinction of three clines is far from complete and that there are cases of intermediate patterns. For example, Turkic- and Uralic-speakers from the Volga region are genetically quite similar, but the Uralic speakers still have extra affinity with the Uralic speakers further to the east (e.g. Nganasans; Supplementary Fig. 4b). Likewise, a number of Turkic-speaking populations (e.g. Dolgans, Todzins, Tofalars and Tatar_Zabolotniye), living at the periphery or even inside of the taiga belt, do show a genetic influence from the forest-tundra cline (Fig. 4). One could object that our sampling scheme is not geographically uniform, even though it covers the vast majority of ethnic groups and is quite dense geographically. Indeed, the gaps between distinct genetic clines (with only a few groups located in between) tend to correspond to the gaps in sampling locations (Figs. 1-2). Although this non-uniformity of sampling largely results from the non-uniformity in the density of (language-defined) ethnic groups, it is important to organize a future study for further sampling in sparsely populated regions between the clines (e.g. central Kazakhstan or East Siberia).

The steppe cline populations derive their eastern Eurasian ancestry from a gene pool similar to contemporary Tungusic speakers from the Amur river basin (Figs. 
2 and 4), thus suggesting a genetic connection among the speakers of languages belonging to the Altaic macrofamily (Turkic, Mongolic and Tungusic families). Based on our results as well as early Neolithic genomes from the Russian Far East38, we speculate that such a gene pool may represent the genetic profile of prehistoric hunter-gatherers in the Amur river basin. On the other hand, a distinct Nganasan-related eastern Eurasian ancestry in the forest-tundra cline suggests a substantial separation between these two eastern ancestries. Nganasans have high genetic affinity with prehistoric individuals with the “ANE” ancestry in North Eurasia, such as the Upper Paleolithic Siberians or the Mesolithic EHG, which is exceeded only by Native Americans and by Beringians among eastern Eurasians (Supplementary Fig. 7). Also, Northeast Asians are closer to Nganasans than they are to either Beringians, Native Americans or ancient Baikal populations, and the ANE affinity in East Asians is correlated well with their affinity with Nganasans (Supplementary Fig. 8). We hypothesize that Nganasans may be relatively isolated descendants of a prehistoric Siberian meta-population with high ANE affinity, which formed present-day Northeast Asians by mixing with populations related to the Neolithic Northeast Asians38. Forest-tundra populations to the east of the Urals, such as Selkups and Kets, show excess ANE affinity, suggesting a legacy from the ANE-ancestry-rich pre-Bronze Age gene pools (Supplementary Table 8). In contrast, admixture modeling finds that no contemporary steppe-forest cline population is required to have additional ANE ancestry beyond what a mixture model of Bronze Age steppe plus present-day Eastern Eurasians can explain (Supplementary Table 5). 
This suggests that both western and eastern Eurasian ancestries of the steppe-forest populations are largely inherited from later gene flows since Late Bronze Age: Srubnaya-like WSH ancestry for the western Eurasian part and present-day Tungusic speaker-related ancestry for the eastern Eurasian part. Additional ancient genomes from Siberia will be critical to reconstruct changes in the ANE-related ancestries in Siberia over time and to understand the formation of Nganasan gene pool. The southern steppe populations differentiate from the steppe-forest ones to the north by having a strong genetic affinity broadly to West/ South Asian ancestries (Supplementary Fig. 4 and Supplementary Table 6). Ancient Tian Shan populations dating back up to 2,200 BP show the same property (Supplementary Table 7), while Sintashta culture-related WSH ancestry was widely reported in this region during the Late Bronze Age46. Together with the lack of West/South Asian affinity in the Saka culture individuals in Kazakhstan around 2,500 BP (Supplementary Table 7), we suggest a northward influx of West/South Asian-related ancestry into the Tian Shan region during the first half of the first millennium BC and into Kazakhstan further to the north slightly later. It will be extremely important to expand the set of available ancient genomes across inner Eurasia. Inner Eurasia has functioned as a conduit for human migration and cultural transfer since the first appearance of modern humans in this region. As a result, we observe deep sharing of genes between western and eastern Eurasian populations in multiple layers: the Pleistocene ANE ancestry in Mesolithic EHG and contemporary Native Americans, Bronze Age steppe ancestry from Europe to Mongolia, and Nganasan-related ancestry extending from western Siberia into Eastern Europe. 
More recent historical migrations, such as the westward expansions of Turkic and Mongolic groups, further complicate genomic signatures of admixture and have overwritten those from older events. Ancient genomes of Iron Age steppe individuals, already showing signatures of west-east admixture in the 5th to 2nd century BC39, provide further direct evidence for the hidden old layers of admixture, which is often difficult to appreciate from present-day populations as shown in our finding of a discrepancy between the estimates of admixture dates from contemporary individuals and those from ancient genomes.

Methods

Study participants and genotyping

We collected samples from 763 participants from nine countries (Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan). The sampling strategy included sampling a majority of large ethnic groups in the studied countries. Within groups, we sampled subgroups if they were known to speak different dialects; for ethnic groups spread over a large area, we sampled in several districts across that area. We sampled individuals whose grandparents were all self-identified members of the given ethnic groups and were born within the studied district(s). Most of the ethnic Russian samples were collected from indigenous Russian areas (present-day Central Russia) and had been stored for years in the Estonian Biocenter; samples from Mongolia, Tajikistan, Uzbekistan, and Ukraine were collected partially in the framework of the Genographic project. Most DNA samples were extracted from venous blood via the phenol-chloroform method. For this study we identified 112 subgroups (belonging to 60 ethnic group labels) which were not previously genotyped on the Affymetrix Axiom® Genome-wide Human Origins 1 (“HumanOrigins”) array platform43 and selected on average 7 individuals per subgroup (Fig. 1 and Supplementary Table 1). Genome-wide genotyping experiments were performed on the HumanOrigins array platform. We removed 18 individuals from further analysis either due to high genotype missing rate (> 0.05; n=2) or due to being outliers in principal component analysis (PCA) relative to other individuals from the same group (n=16). The remaining 745 individuals assigned to 60 group labels were merged with published HumanOrigins data sets of world-wide contemporary populations20 and of four Siberian ethnic groups (Enets, Kets, Nganasans and Selkups)25. Diploid genotype data of six contemporary individuals (two Saami, two Sherpa and two Tibetans) were obtained from the Simons Genome Diversity Panel data set26. 
We also added ancient individuals from published studies3,8,19–23,27–42, by randomly sampling a single allele for 581,230 autosomal single nucleotide polymorphisms (SNPs) in the HumanOrigins array (Supplementary Table 2).
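The missing-rate filter described above can be illustrated with a minimal Python sketch. The 0.05 cutoff comes from the text; the genotype coding (0/1/2 with None for a missing call), the function names and the sample IDs are assumptions made for illustration and do not describe the software actually used.

```python
# Sketch of a per-individual genotype missing-rate filter.
# Assumption: genotypes are coded 0/1/2, with None marking a missing call.

MISSING_RATE_CUTOFF = 0.05  # threshold stated in the text


def missing_rate(genotypes):
    """Fraction of sites with no genotype call for one individual."""
    if not genotypes:
        raise ValueError("individual has no genotyped sites")
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_by_missingness(samples, cutoff=MISSING_RATE_CUTOFF):
    """Split sample IDs into (kept, removed); remove when rate > cutoff."""
    kept, removed = [], []
    for sample_id, genotypes in samples.items():
        if missing_rate(genotypes) > cutoff:
            removed.append(sample_id)
        else:
            kept.append(sample_id)
    return kept, removed


# Toy data with hypothetical sample IDs, 20 sites each.
toy_samples = {
    "ind_A": [0, 1, 2, 1] * 5,         # no missing calls
    "ind_B": [None, None] + [1] * 18,  # 10% missing, above the cutoff
    "ind_C": [None] + [2] * 19,        # exactly 5% missing, not above the cutoff
}
kept_ids, removed_ids = filter_by_missingness(toy_samples)
```

The comparison is strict, matching the "> 0.05" wording, so an individual with exactly 5% missing calls is retained.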

Sequencing of the ancient Botai genomes

We extracted genomic DNA from four skeletal remains belonging to two individuals and built sequencing libraries either with no uracil-DNA glycosylase (UDG) treatment or with partial treatment following published protocols47,48 (Table 1). Radiocarbon dating of BKZ001 was conducted by the CEZ Archaeometry gGmbH (Mannheim, Germany) for one of two bone samples used for DNA extraction. All libraries were barcoded with two library-specific 8-mer indices49. The samples were manipulated in dedicated clean room facilities at the University of Tübingen or at the Max Planck Institute for the Science of Human History (MPI-SHH). Indexed libraries were enriched for about 1.24 million informative nuclear SNPs using the in-solution capture method (“1240K capture”)5,21. Libraries were sequenced on the Illumina HiSeq 4000 platform with either single-end 75 bp (SE75) or paired-end 50 bp (PE50) cycles following the manufacturer’s protocols. Output reads were demultiplexed by allowing up to 1 mismatch in each of two 8-mer indices. FASTQ files were processed using EAGER v1.9250. Specifically, Illumina adapter sequences were trimmed using AdapterRemoval v2.2.051, and reads 30 base pairs or longer were aligned to the human reference genome (hg19) using BWA aln/samse v0.7.1252 with a relaxed edit distance parameter (“-n 0.01”). Seeding was disabled for reads from non-UDG libraries by adding an additional parameter (“-l 9999”). PCR duplicates were then removed using DeDup v0.12.250 and reads with Phred-scaled mapping quality score < 30 were filtered out using Samtools v1.353. We performed several checks of data authenticity. First, patterns of chemical damage typical of ancient DNA were tabulated using mapDamage v2.0.654. Second, mitochondrial contamination for all libraries was estimated by Schmutzi55. Third, nuclear contamination for libraries derived from males was estimated by the contamination module in ANGSD v0.91056. 
Prior to genotyping, the first and last 3 bases of each read were masked for libraries with partial UDG treatment using the trimBam module in bamUtil v1.0.1357. To obtain haploid genotypes, we randomly chose one high-quality base (Phred-scaled base quality score ≥ 30) for each of the 1.24 million target sites using pileupCaller (https://github.com/stschiff/sequenceTools). We used masked reads from libraries with partial UDG treatment for transition (Ts) SNPs and used unmasked reads from all libraries for transversions (Tv). Mitochondrial consensus sequences were obtained by the log2fasta program in Schmutzi with the quality cutoff 10 and subsequently assigned to haplogroups using HaploGrep258. Y haplogroup R1b was assigned using the yHaplo program59. To estimate the phylogenetic position of the Botai Y haplogroup more precisely, Y chromosomal SNPs were called with Samtools mpileup using bases with quality score ≥ 30: a total of 2,481 SNPs out of ~30,000 markers included in the 1240K capture panel were called with mean read depth of 1.2. Twenty-two SNP positions relevant to the up-to-date haplogroup R1b tree (www.isogg.org; www.yfull.com) confirmed that the sample was positive for the markers of the R1b-P297 branch but negative for its R1b-M269 sub-branch. The frequency distribution map of this Y chromosomal clade was created by the GeneGeo software60,61 using the average weighted interpolation procedure with the weight function of degree 3 and radius 1,200 km. The initial frequencies were calculated as the proportion of samples positive for the “root” R1b marker M343 but negative for M269; these proportions were calculated for the 577 populations from the in-house Y-base database, which was compiled mainly from the published datasets.
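The haploid genotyping rule described above (end-masked reads from partial-UDG libraries for transitions, unmasked reads from all libraries for transversions, one random base with quality ≥ 30 per site) can be sketched as follows. This is a simplified illustration and not the pileupCaller implementation; the observation record layout and all function names are hypothetical.

```python
import random

MIN_BASE_QUALITY = 30  # Phred cutoff stated in the text
MASK_LENGTH = 3        # bases masked at each read end (partial-UDG libraries)
TRANSITIONS = {frozenset("CT"), frozenset("AG")}


def is_transition(ref, alt):
    """True for C/T and A/G SNPs, False for transversions."""
    return frozenset((ref, alt)) in TRANSITIONS


def usable_bases(observations, transition_site):
    """Return the bases eligible for random sampling at one SNP.

    Each observation is a dict with hypothetical keys: base, qual,
    pos (0-based position in the read), read_len, partial_udg (bool).
    """
    eligible = []
    for obs in observations:
        if obs["qual"] < MIN_BASE_QUALITY:
            continue
        if transition_site:
            # Transitions: partial-UDG libraries only, excluding masked ends.
            if not obs["partial_udg"]:
                continue
            at_read_end = (obs["pos"] < MASK_LENGTH
                           or obs["pos"] >= obs["read_len"] - MASK_LENGTH)
            if at_read_end:
                continue
        eligible.append(obs["base"])
    return eligible


def call_pseudo_haploid(observations, ref, alt, rng):
    """Pick one eligible base at random, or None if no base qualifies."""
    bases = usable_bases(observations, is_transition(ref, alt))
    return rng.choice(bases) if bases else None


# Toy pileup at one site (all values invented for illustration).
toy_pileup = [
    {"base": "T", "qual": 40, "pos": 10, "read_len": 50, "partial_udg": False},
    {"base": "T", "qual": 40, "pos": 1, "read_len": 50, "partial_udg": True},
    {"base": "C", "qual": 35, "pos": 20, "read_len": 50, "partial_udg": True},
    {"base": "T", "qual": 12, "pos": 25, "read_len": 50, "partial_udg": True},
]
ts_call = call_pseudo_haploid(toy_pileup, "C", "T", random.Random(1))
```

In the toy pileup, the two high-quality T observations are excluded at a C/T site because they come from a non-UDG library or from a masked read end, which is exactly where deamination damage is expected to mimic a true T allele.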

Analysis of population structure

We performed principal component analysis (PCA) of various groups using smartpca v13050 in the EIGENSOFT v6.0.1 package62. We used the “lsqproject: YES” option to project individuals not used for calculating PCs (this procedure avoids bias due to missing genotypes). We performed unsupervised model-based genetic clustering as implemented in ADMIXTURE v1.3.063. For that purpose, we used 118,387 SNPs with minor allele frequency (maf) 1% or higher in 3,507 individuals after pruning out linked SNPs (r2 > 0.2) using the “--indep-pairwise 200 25 0.2” command in PLINK v1.9064. For each value of K ranging from 2 to 20, we ran 5 replicates with different random seeds and took one with the highest log likelihood value.
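Choosing one ADMIXTURE replicate per K by log likelihood, as described above, amounts to a simple maximum over runs. The sketch below assumes that the log likelihood of each run has already been collected into (K, seed, log likelihood) tuples; the tuple layout, the function name and the toy values are illustrative assumptions.

```python
def best_replicates(runs):
    """Return {K: (seed, log_likelihood)} for the highest-likelihood run per K.

    `runs` is an iterable of (K, seed, log_likelihood) tuples.
    """
    best = {}
    for k, seed, loglik in runs:
        if k not in best or loglik > best[k][1]:
            best[k] = (seed, loglik)
    return best


# Toy values, invented for illustration only.
toy_runs = [
    (2, 11, -1500.0), (2, 12, -1490.5), (2, 13, -1502.3),
    (3, 11, -1400.2), (3, 12, -1410.9), (3, 13, -1399.8),
]
chosen = best_replicates(toy_runs)
```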

F-statistics analysis

We computed various f3 and f4 statistics using the qp3Pop (v400) and qpDstat (v711) programs in the ADMIXTOOLS package43. We computed f4 statistics with the “f4mode: YES” option. For these analyses, we studied a total of 301 groups, including 73 inner Eurasian target groups and 167 contemporary and 61 ancient reference groups (Supplementary Table 2). We included two groups from the Aleutian Islands (“Aleut” and “Aleut_Tlingit”; Supplementary Table 2) as positive control targets with known recent admixture. Aleut_Tlingits are Aleut individuals whose mitochondrial haplogroup lineages are related to Tlingits31. For each target, we calculated the outgroup f3 statistic of the form f3(Target, X; Mbuti) against all targets and references to quantify overall allele sharing and performed the admixture f3 test of the form f3(Ref1, Ref2; Target) for all pairs of references to explore the admixture signal in targets. We estimated standard error (SE) using a block jackknife with 5 centiMorgan (cM) blocks62. We performed f4 statistic-based admixture modeling using the qpAdm (v632) program20 in the ADMIXTOOLS package. We used a basic set of 7 outgroups, unless specified otherwise, to provide high enough resolution to distinguish various western and eastern Eurasian ancestries: Mbuti (n=10; central African), Natufian (n=6; early Holocene Levantine)20, Onge (n=11; from the Andaman Islands), Iran_N (n=5; Neolithic Iranian)20, Villabruna (n=1; Paleolithic European)28, Ami (n=10; Taiwanese aborigine) and Mixe (n=10; Central American). Prior to qpAdm modeling, we checked if the reference groups are well distinguished by their relationship with the outgroups using the qpWave (v400) program65. We used the qpGraph (v6065) program in the ADMIXTOOLS package for graph-based admixture modeling. 
Starting with a graph of (Mbuti, Ami, WHG), we iteratively added AG3 (n=1; Paleolithic Siberian)28, EHG (n=4; Mesolithic hunter-gatherers from Karelia or Samara)5,23,28, and Botai onto the graph by testing all possible topologies allowing up to one additional gene flow. After obtaining the best two-way admixture model for Botai, we tested additional three-way admixture models.
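For readers unfamiliar with these statistics, the following Python sketch shows the basic quantities from population allele frequencies together with a delete-one-block jackknife standard error. It is a simplified illustration and not the ADMIXTOOLS implementation: it omits the finite-sample bias correction applied to f3 and uses an unweighted jackknife, whereas the programs weight blocks by their SNP counts. All function names and toy values are assumptions.

```python
import math


def f3(a, b, c):
    """f3(A, B; C): mean over SNPs of (c - a) * (c - b).

    a, b, c are equal-length lists of allele frequencies. A clearly
    negative value is the classic signal that C is admixed between
    sources related to A and B.
    """
    return sum((ci - ai) * (ci - bi) for ai, bi, ci in zip(a, b, c)) / len(c)


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def block_jackknife_se(per_snp_values, block_ids):
    """Unweighted delete-one-block jackknife SE of the mean per-SNP value.

    block_ids assigns each SNP to a contiguous genomic block
    (5 cM blocks in the text).
    """
    blocks = sorted(set(block_ids))
    n_blocks = len(blocks)
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_sum = {b: 0.0 for b in blocks}
    block_count = {b: 0 for b in blocks}
    for value, block in zip(per_snp_values, block_ids):
        block_sum[block] += value
        block_count[block] += 1
    total = sum(block_sum.values())
    count = sum(block_count.values())
    # Estimate recomputed with each block left out in turn.
    leave_one_out = [(total - block_sum[b]) / (count - block_count[b])
                     for b in blocks]
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out)
    return math.sqrt(variance)


# Toy frequencies: the target lies between the two references at both SNPs.
ref1, ref2, target = [0.1, 0.2], [0.3, 0.4], [0.2, 0.3]
admixture_f3 = f3(ref1, ref2, target)
```

In the toy example the target is intermediate between the two references at every SNP, so the admixture f3 is negative. Dividing a statistic by its jackknife SE gives the Z-score usually reported for such tests.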

GLOBETROTTER analysis

We performed a GLOBETROTTER analysis of admixture for 73 inner Eurasian target populations to obtain haplotype sharing based evidence of admixture, independent of the allele frequency based f-statistics, as well as estimates of admixture dates and a fine-scale profile of their admixture sources14. We followed the “regional” approach described in Hellenthal et al.14, in which target haplotypes can only be copied from the haplotypes of 167 contemporary reference groups, but not from those of the other target groups. This approach is recommended when multiple target groups share a similar admixture history14, which is likely to be the case for our inner Eurasian populations. We jointly phased the contemporary genome data without a pre-phased set of reference haplotypes, using SHAPEIT2 v2.837 in its default setting66. We used a genetic map for the 1000 Genomes Project phase 3 data, downloaded from: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html. We used haplotypes from a total of 2,615 individuals belonging to 240 groups (73 recipients and 167 donors; Supplementary Table 2) for the GLOBETROTTER analysis. To reduce computational burden and to provide a more balanced set of donor populations, we randomly sampled 20 individuals if a group contained more than 20 individuals. Using these haplotypes, we performed GLOBETROTTER analysis following the recommended workflow14. We first ran 10 rounds of the expectation-maximization (EM) algorithm for chromosomes 4, 10, 15 and 22 in ChromoPainter v2 with “-in” and “-iM” switches to estimate chunk size and switch error rate parameters67. Both recipient and donor haplotypes were modeled as a patchwork of donor haplotypes. The “chunk length” output was obtained by running ChromoPainter v2 across all chromosomes with the estimated parameters averaged over both recipient and donor individuals (“-n 238.05 -M 0.000617341”). 
We also generated 10 painting samples for each recipient group by running ChromoPainter with the parameters averaged over all recipient individuals (“-n 248.455 -M 0.000535236”). Using the chunklength output and painting samples, we ran GLOBETROTTER with the “prop.ind: 1” and “null.ind: 1” options. We estimated the significance of each estimated admixture date by running 100 bootstrap replicates using the “prop.ind: 0” and “bootstrap.date.ind: 1” options; we considered date estimates between 1 and 400 generations as evidence of admixture14. For populations that gave evidence of admixture by this procedure, we repeated GLOBETROTTER analysis with the “null.ind: 0” option14. We also compared admixture dates from GLOBETROTTER analysis with those based on weighted admixture linkage disequilibrium (LD) decay, as implemented in ALDER v1.368. As the reference pair, we used (French, Eskimo_Naukan), (French, Nganasan), (Georgian, Ulchi), (French, Ulchi) and (Georgian, Ulchi) for the target group categories 1 to 5, respectively, based on their genetic profile (Supplementary Table 2). We used a minimum inter-marker distance of 1.0 cM to account for LD in the references.
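The down-sampling of large groups to 20 individuals described above can be sketched in a few lines of Python. The cap of 20 is taken from the text; the function name, the fixed seed (used only to make the sketch reproducible) and the toy group labels are assumptions.

```python
import random

MAX_PER_GROUP = 20  # cap stated in the text


def downsample_groups(group_members, max_per_group=MAX_PER_GROUP, seed=0):
    """Randomly keep at most `max_per_group` individuals per group.

    `group_members` maps a group label to a list of individual IDs.
    Groups at or below the cap are returned unchanged.
    """
    rng = random.Random(seed)
    sampled = {}
    for group in sorted(group_members):
        members = list(group_members[group])
        if len(members) > max_per_group:
            members = sorted(rng.sample(members, max_per_group))
        sampled[group] = members
    return sampled


# Toy input with hypothetical group labels and IDs.
toy_groups = {
    "GroupA": [f"A{i:02d}" for i in range(35)],
    "GroupB": [f"B{i:02d}" for i in range(8)],
}
balanced = downsample_groups(toy_groups)
```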

EEMS analysis

To visualize the heterogeneity in the rate of gene flow across inner Eurasia, we performed the EEMS (“estimated effective migration surface”) analysis44. We included a total of 1,214 individuals from 98 groups in the analysis (Supplementary Table 2). In this dataset, we kept 101,370 SNPs with maf ≥ 0.01 after LD pruning (r2 ≤ 0.2). We computed the mean squared genetic difference matrix between all pairs of individuals using the “bed2diffs_v1” program in the EEMS package. To reduce distortion in northern latitudes due to map projection, we used geographic coordinates in the Albers equal area conic projection (“+proj=aea +lat_1=50 +lat_2=70 +lat_0=56 +lon_0=100 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs”). We converted geographic coordinates of each sample and the boundary using the “spTransform” function in the R package rgdal v1.2-5. We ran five initial MCMC runs of 2 million burn-ins and 4 million iterations with different random seeds and took a run with the highest likelihood. Starting from the best initial run, we set up another five MCMC runs of 2 million burn-ins and 4 million iterations as our final analysis. We used the following proposal variance parameters to keep the acceptance rate around 30-40%, as recommended by the developers44: qSeedsProposalS2 = 5000, mSeedsProposalS2 = 1000, qEffctProposalS2 = 0.0001, mrateMuProposalS2 = 0.00005. We set up a total of 532 demes automatically with the “nDemes = 600” parameter. We visualized the merged output from all five runs using the “eems.plots” function in the R package rEEMSplots44. We performed the EEMS analysis for Caucasus populations in a similar manner, including a total of 237 individuals from 21 groups (Supplementary Table 2). In this dataset, we kept 95,442 SNPs with maf ≥ 0.01 after LD pruning (r2 ≤ 0.2). We applied the Mercator projection of geographic coordinates to the map of Eurasia (“+proj=merc +datum=WGS84”). 
We ran five initial MCMC runs of 2 million burn-ins and 4 million iterations with different random seeds and took a run with the highest likelihood. Starting from the best initial run, we set up another five MCMC runs of 1 million burn-ins and 4 million iterations as our final analysis. We used the following default proposal variance parameters: qSeedsProposalS2 = 0.1, mSeedsProposalS2 = 0.01, qEffctProposalS2 = 0.001, mrateMuProposalS2 = 0.01. A total of 171 demes were automatically set up with the “nDemes = 200” parameter.
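The dissimilarity matrix used as EEMS input can be illustrated with a short Python sketch: for each pair of individuals, the squared genotype difference is averaged over the SNPs at which both are called. This follows our understanding of the pairwise-complete behavior of the “bed2diffs_v1” variant and should be read as an assumption rather than a description of that program's code; the 0/1/2 coding with None for missing calls and the function name are illustrative.

```python
def mean_squared_differences(genotypes):
    """Pairwise mean squared genotype difference between individuals.

    `genotypes` is a list of per-individual lists coded 0/1/2, with None
    for a missing call. Only SNPs called in both individuals contribute
    to a given pair.
    """
    n = len(genotypes)
    diffs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            squared = [(gi - gj) ** 2
                       for gi, gj in zip(genotypes[i], genotypes[j])
                       if gi is not None and gj is not None]
            if not squared:
                raise ValueError(f"no shared called SNPs for pair {i}, {j}")
            diffs[i][j] = diffs[j][i] = sum(squared) / len(squared)
    return diffs


# Toy genotypes for three individuals at four SNPs.
toy_genotypes = [
    [0, 1, 2, None],
    [0, 2, 2, 1],
    [2, 1, 0, 1],
]
diff_matrix = mean_squared_differences(toy_genotypes)
```

The resulting matrix is symmetric with a zero diagonal.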

59 in total

1. Worldwide human relationships inferred from genome-wide patterns of variation.

Authors: Jun Z Li; Devin M Absher; Hua Tang; Audrey M Southwick; Amanda M Casto; Sohini Ramachandran; Howard M Cann; Gregory S Barsh; Marcus Feldman; Luigi L Cavalli-Sforza; Richard M Myers
Journal: Science Date: 2008-02-22 Impact factor: 47.728

2. Reconstructing Native American population history.

Authors: David Reich; Nick Patterson; Desmond Campbell; Arti Tandon; Stéphane Mazieres; Nicolas Ray; Maria V Parra; Winston Rojas; Constanza Duque; Natalia Mesa; Luis F García; Omar Triana; Silvia Blair; Amanda Maestre; Juan C Dib; Claudio M Bravi; Graciela Bailliet; Daniel Corach; Tábita Hünemeier; Maria Cátira Bortolini; Francisco M Salzano; María Luiza Petzl-Erler; Victor Acuña-Alonzo; Carlos Aguilar-Salinas; Samuel Canizales-Quinteros; Teresa Tusié-Luna; Laura Riba; Maricela Rodríguez-Cruz; Mardia Lopez-Alarcón; Ramón Coral-Vazquez; Thelma Canto-Cetina; Irma Silva-Zolezzi; Juan Carlos Fernandez-Lopez; Alejandra V Contreras; Gerardo Jimenez-Sanchez; Maria José Gómez-Vázquez; Julio Molina; Angel Carracedo; Antonio Salas; Carla Gallo; Giovanni Poletti; David B Witonsky; Gorka Alkorta-Aranburu; Rem I Sukernik; Ludmila Osipova; Sardana A Fedorova; René Vasquez; Mercedes Villena; Claudia Moreau; Ramiro Barrantes; David Pauls; Laurent Excoffier; Gabriel Bedoya; Francisco Rothhammer; Jean-Michel Dugoujon; Georges Larrouy; William Klitz; Damian Labuda; Judith Kidd; Kenneth Kidd; Anna Di Rienzo; Nelson B Freimer; Alkes L Price; Andrés Ruiz-Linares
Journal: Nature Date: 2012-08-16 Impact factor: 49.962

3. Inferring admixture histories of human populations using linkage disequilibrium.

Authors: Po-Ru Loh; Mark Lipson; Nick Patterson; Priya Moorjani; Joseph K Pickrell; David Reich; Bonnie Berger
Journal: Genetics Date: 2013-02-14 Impact factor: 4.562

4. Inference of population structure using dense haplotype data.

Authors: Daniel John Lawson; Garrett Hellenthal; Simon Myers; Daniel Falush
Journal: PLoS Genet Date: 2012-01-26 Impact factor: 5.917

5. The genetic legacy of the expansion of Turkic-speaking nomads across Eurasia.

Authors: Bayazit Yunusbayev; Mait Metspalu; Ene Metspalu; Albert Valeev; Sergei Litvinov; Ruslan Valiev; Vita Akhmetova; Elena Balanovska; Oleg Balanovsky; Shahlo Turdikulova; Dilbar Dalimova; Pagbajabyn Nymadawa; Ardeshir Bahmanimehr; Hovhannes Sahakyan; Kristiina Tambets; Sardana Fedorova; Nikolay Barashkov; Irina Khidiyatova; Evelin Mihailov; Rita Khusainova; Larisa Damba; Miroslava Derenko; Boris Malyarchuk; Ludmila Osipova; Mikhail Voevoda; Levon Yepiskoposyan; Toomas Kivisild; Elza Khusnutdinova; Richard Villems
Journal: PLoS Genet Date: 2015-04-21 Impact factor: 5.917

6. Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry.

Authors: Pavel Flegontov; Piya Changmai; Anastassiya Zidkova; Maria D Logacheva; N Ezgi Altınışık; Olga Flegontova; Mikhail S Gelfand; Evgeny S Gerasimov; Ekaterina E Khrameeva; Olga P Konovalova; Tatiana Neretina; Yuri V Nikolsky; George Starostin; Vita V Stepanova; Igor V Travinsky; Martin Tříska; Petr Tříska; Tatiana V Tatarinova
Journal: Sci Rep Date: 2016-02-11 Impact factor: 4.379

7. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

8. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA.

Authors: Gabriel Renaud; Viviane Slon; Ana T Duggan; Janet Kelso
Journal: Genome Biol Date: 2015-10-12 Impact factor: 13.583

9. The Complex Admixture History and Recent Southern Origins of Siberian Populations.

Authors: Irina Pugach; Rostislav Matveev; Viktor Spitsyn; Sergey Makarov; Innokentiy Novgorodov; Vladimir Osakovsky; Mark Stoneking; Brigitte Pakendorf
Journal: Mol Biol Evol Date: 2016-03-18 Impact factor: 16.240

10. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

Authors: Swapan Mallick; Heng Li; Mark Lipson; Iain Mathieson; Melissa Gymrek; Fernando Racimo; Mengyao Zhao; Niru Chennagiri; Susanne Nordenfelt; Arti Tandon; Pontus Skoglund; Iosif Lazaridis; Sriram Sankararaman; Qiaomei Fu; Nadin Rohland; Gabriel Renaud; Yaniv Erlich; Thomas Willems; Carla Gallo; Jeffrey P Spence; Yun S Song; Giovanni Poletti; Francois Balloux; George van Driem; Peter de Knijff; Irene Gallego Romero; Aashish R Jha; Doron M Behar; Claudio M Bravi; Cristian Capelli; Tor Hervig; Andres Moreno-Estrada; Olga L Posukh; Elena Balanovska; Oleg Balanovsky; Sena Karachanak-Yankova; Hovhannes Sahakyan; Draga Toncheva; Levon Yepiskoposyan; Chris Tyler-Smith; Yali Xue; M Syafiq Abdullah; Andres Ruiz-Linares; Cynthia M Beall; Anna Di Rienzo; Choongwon Jeong; Elena B Starikovskaya; Ene Metspalu; Jüri Parik; Richard Villems; Brenna M Henn; Ugur Hodoglugil; Robert Mahley; Antti Sajantila; George Stamatoyannopoulos; Joseph T S Wee; Rita Khusainova; Elza Khusnutdinova; Sergey Litvinov; George Ayodo; David Comas; Michael F Hammer; Toomas Kivisild; William Klitz; Cheryl A Winkler; Damian Labuda; Michael Bamshad; Lynn B Jorde; Sarah A Tishkoff; W Scott Watkins; Mait Metspalu; Stanislav Dryomov; Rem Sukernik; Lalji Singh; Kumarasamy Thangaraj; Svante Pääbo; Janet Kelso; Nick Patterson; David Reich
Journal: Nature Date: 2016-09-21 Impact factor: 49.962
