The Genetic Structure and East-West Population Admixture in Northwest China Inferred From Genome-Wide Array Genotyping.

Bin Ma1, Jinwen Chen2, Xiaomin Yang3, Jingya Bai1, Siwei Ouyang1, Xiaodan Mo1, Wangsheng Chen1, Chuan-Chao Wang2,3,4, Xiangjun Hai1.   

Abstract

Northwest China is a contact zone between East and West Eurasia and an important region for investigating the migration and admixture history of human populations. However, the genetic structure and admixture history of the Altaic-speaking populations and the Hui group in Northwest China have not been fully characterized because of insufficient sampling and the lack of genome-wide data. We therefore genotyped genome-wide SNPs for 140 individuals from five groups in Northwest China: the Mongolic- and Turkic-speaking Dongxiang, Bonan, Yugur, and Salar, as well as the Hui. Allele-sharing and haplotype-sharing analyses were used to elucidate the population history of these groups, including PCA, ADMIXTURE, pairwise Fst genetic distance, f-statistics, qpWave/qpAdm, ALDER, fineSTRUCTURE, and GLOBETROTTER. We observed that the Dongxiang, Bonan, Yugur, Salar, and Hui are admixed populations deriving ancestry from both East and West Eurasians, with West Eurasian related contributions ranging from 9 to 15%. The admixture was probably driven by male-biased migration, as West Eurasian related lineages were more frequent among Y chromosomes than among mtDNA in Northwest China. ALDER and haplotype-based GLOBETROTTER analyses showed that this West Eurasian admixture signal was introduced into East Eurasia approximately 700–1,000 years ago. Overall, our findings support the view that flourishing transcontinental communication between East and West Eurasia played a vital role in the genetic formation of Northwest Chinese populations.
Copyright © 2021 Ma, Chen, Yang, Bai, Ouyang, Mo, Chen, Wang and Hai.

Keywords:  admixture history; gansu; gene flow; genetic structure; northwest China; steppe population; trans-Eurasia; west Eurasia

Year:  2021        PMID: 34992635      PMCID: PMC8724515          DOI: 10.3389/fgene.2021.795570

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

The human history of East Asia can be traced back to the Late Paleolithic. Anatomically modern humans permanently occupied East Asia about 50,000 years ago (Alexander et al., 2009). Evidence from ancient and present-day human genomes suggests an initial settlement of East Asia about 60,000 years ago, followed by multiple waves of population expansion in the Paleolithic and Neolithic periods (Fu et al., 2013; Bai et al., 2020; Zhang et al., 2020). By analyzing genome-wide data of 1900 individuals from 73 populations, the Pan-Asia project suggested that the main southern migration route contributed much more to the peopling of East Asia than the northern migration route (HUGO Pan-Asian SNP Consortium et al., 2009; Cao et al., 2020). However, paternal Y chromosome and maternal mitochondrial DNA indicated that gene flow from western and northern Eurasia into East Asia came through the northern migration route (Su et al., 1999; Wen et al., 2004). East Asia is one of the earliest centers of animal and plant domestication in the world (Wang et al., 2021a). Paleogenomic studies documented that genetic diversity in prehistoric Asia was higher than in more recent periods of human history, and that population migration between northern and southern East Asia, which started in the Late Neolithic, influenced the genetic formation of modern East Asians (Ning et al., 2020; Yang et al., 2020; Wang et al., 2021a; Wang et al., 2021b; Xiaowei et al., 2021). These expansion events were associated with the spread of the major language families of East Asia. The languages spoken in East Asia are also remarkably diverse, including Sino-Tibetan, Hmong-Mien, Austroasiatic, Tai-Kadai, Austronesian, Indo-European, Turkic, Mongolic, Tungusic, Japonic, Koreanic, Yukaghiric, and Chukotko-Kamchatkan (Wang et al., 2021a; Uesugi et al., 2021).
The formation of East Asians is thought to have involved genetic contributions from various ancestral human populations (Duan et al., 2018; Sun et al., 2019; Wang et al., 2021a). The Eastern Steppe is characterized by grasslands, forest steppe, and desert steppe, and connects Russia, Mongolia, and China. The Eastern Eurasian Steppe was home to historic empires of nomadic pastoralists, including the Xiongnu, the Turkic Khaganate, and the Mongols. The Eastern Steppe has also served as an important communication node between West and East Eurasia. The Central and Eastern Steppe has witnessed intensive East-West communication and interaction in many respects (Elfari et al., 2005; Hyten et al., 2010; Stoneking and Delfin, 2010; Liu et al., 2018; Chen et al., 2019; Lan et al., 2019; Cao et al., 2020; Tangkanchanapas et al., 2020; Rodin et al., 2021). Historical and archeological studies demonstrated that western Eurasian cultural elements were brought into northern China through the East-West communication corridors (Sanchez-Burks et al., 2003; Xu, 2008; Ning et al., 2019). The ancient Silk Road was an important connection between West Eurasia and China and contributed much to the intensified transcontinental cultural and population exchanges between East and West Eurasia (Cheng, 1985; Robino et al., 2014). The Silk Road was at its busiest during the Tang Dynasty, but east-west communication had been established long before that time and can be traced back to the Early Bronze Age (Haak et al., 2015; Goldberg et al., 2017; Lazaridis and Reich, 2017; Saag et al., 2017).
The corresponding trans-continental population migrations during the Late Neolithic, the Bronze Age to the Iron Age, and the historical period have been demonstrated in the core regions of Siberia (Abelson, 1978; Matsumoto et al., 1995; Hemphill and Mallory, 2004; Maramovich et al., 2008; Jeong et al., 2018; Juras et al., 2020; Stoof-Leichsenring et al., 2020). Archeological evidence supports an interaction between the westward spread of millet agriculture and the eastward spread of barley and wheat agriculture, accompanied by population migration (Zohary and Hopf, 1973; Medjugorac et al., 1994; Hemphill and Mallory, 2004; Saisho and Purugganan, 2007; Wang et al., 2016; De Barros Damgaard et al., 2018b; Bento et al., 2018; Jeong et al., 2018). Trans-Eurasian cultural and genetic exchanges have significantly influenced the demographic dynamics of Eurasian populations (Peel and Talley, 1996; Khan et al., 2017; Miller et al., 2017; De Barros Damgaard et al., 2018a; De Barros Damgaard et al., 2018b; Damgaard et al., 2018; Antwerpen et al., 2019; Coulehan, 2020; Saint Onge and Brooks, 2020; Zhou et al., 2020). The Eastern Eurasian forest steppe zone was genetically structured during the Pre-Bronze and Early Bronze Age, with a strong west-east admixture cline of ancestry stretching from Botai in central Kazakhstan to Lake Baikal in southern Siberia and to Devil’s Gate Cave in the Russian Far East (Jeong et al., 2020). During the Bronze Age, the eastward migration of Western Eurasian nomadic populations related to the Afanasievo and Andronovo cultures into the Eastern Steppe not only influenced the gene pool of eastern Eurasian populations (Ning et al., 2019; Wang et al., 2021a) but also drastically changed lifeways and subsistence on the Eastern Steppe. Milk consumption in Mongolia started prior to 2500 BCE among groups related to the Afanasievo and Chemurchek cultures (Jeong et al., 2018).
It was not until the Iron Age that pastoralists established a nomadic empire on the Eastern Steppe. The Xiongnu empire was the first historically recorded nomadic empire on the Eastern Steppe and had a profound influence on the demographics and geopolitics of Eurasia by expanding into northern China, southern Siberia, and Central Asia, reaching as far as West Eurasia (Damgaard et al., 2018). During the 13th century, the Mongols eventually controlled a vast territory and numerous trade routes stretching from China to the Mediterranean (Jeong et al., 2020). Archaeological evidence shows that the Mongolian Plateau was a conduit for cultural exchanges between East and West Eurasia (Malyarchuk et al., 2016; Wang et al., 2021a; Liu et al., 2021). Northwest China lies in the core region of west-east Eurasian interaction, and populations in this region mainly speak languages of the Altaic family, which includes the Mongolic, Turkic, and Tungusic languages. Modern populations in Northwest China are typically admixtures of populations from across the trans-Eurasian continent (Feng et al., 2017; Yao et al., 2021). The Uyghur derived their western related ancestry from West Eurasians and South Asians, while their eastern related components came from East Asians and Siberians (Ma et al., 2014; Feng et al., 2017; Heizhati et al., 2020). Gansu province, which connects the Hexi Corridor and the Tibetan-Yi Corridor in Northwest China, not only takes part in west-east Eurasian communication but also plays an important role in the southward population expansion that contributed to the formation of Tibeto-Burman speaking populations (Feng et al., 2020; Luo et al., 2020). Population genetic studies based on low-density genetic markers and limited sample sizes have explored the genetic history of Gansu province (Yao et al., 2016; Yao et al., 2017; Wen et al., 2019).
However, comprehensive surveys of the genetic diversity and fine-scale genetic structure of Gansu province based on genome-wide data are still sparse. Therefore, to shed more light on the genetic profile of Northwest China, we collected samples from 140 individuals from Gansu, including the Hui, Dongxiang, Bonan, Yugur, and Salar ethnic groups, and genotyped them with Illumina arrays at approximately 700,000 genome-wide single-nucleotide polymorphisms (SNPs). We merged the genotyping data with reference data from worldwide populations and carried out population genetic analyses to explore the genetic structure and uncover the admixture history of Altaic-speaking populations in Northwest China.

Materials and Methods

Ethics Statement

The sample collection and study procedures were reviewed and approved by the Medical Ethics Committees of Xiamen University and Northwest Minzu University and were in accordance with the recommendations of the revised Helsinki Declaration of 2000. Our study staff informed all potential participants about the purposes of this project, and every participant provided informed consent.

Sample Collection

Our study focused on Gansu province in Northwest China. We collected 140 saliva samples from unrelated individuals of Altaic-speaking populations and the Hui group in Sunan, Linxia, Lanzhou, Dahejia, and Kangle, including 24 samples from Hui, 30 samples from Dongxiang, 30 samples from Bonan, 30 samples from Yugur, and 26 samples from Salar (Figure 1). All included individuals were required to be self-declared indigenous people, defined as individuals whose families had lived in the area for at least three generations and who were the offspring of a non-consanguineous marriage within the population.
FIGURE 1

The geographical map of our samples collection.

Genotyping and Data Merging

We extracted DNA with the PureLink Genomic DNA Mini Kit (Thermo Fisher Scientific) and measured its concentration with the Nanodrop-2000, following the manufacturer’s instructions. All qualified samples were genotyped using the Illumina WeGene arrays covering about 700,000 single nucleotide polymorphisms (SNPs) at the WeGene genotyping centre in Shenzhen. We first assessed the biological relatedness of the individuals using PLINK (Chang et al., 2015) and filtered them accordingly. We then carried out quality control. A total of 25,653 SNPs were removed because of a high percentage of missingness using the PLINK options “--geno 0.1 --mind 0.1”. We then applied a Hardy-Weinberg equilibrium (HWE) threshold of 0.001, which removed a further 17,153 SNPs. For the ADMIXTURE analysis, we pruned SNPs in linkage disequilibrium with “--indep-pairwise 200 25 0.4”. We obtained a dataset covering 72,541 SNPs when our 140 samples were merged with the previously published Human Origins dataset, and a dataset covering 95,675 SNPs when they were merged with the 1240 K capture dataset from the David Reich Lab (https://reich.hms.harvard.edu/downloadablegenotypes-present-day-and-ancient-dna-data-compiled-published-papers) (Patterson et al., 2006; 2012).
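The quality-control steps above can be sketched as a sequence of PLINK 1.9 calls. This is a minimal illustration rather than the authors' actual pipeline: the file prefixes (`gansu`, `gansu_qc`) and the choice to run each step as a separate call are assumptions, and only the thresholds come from the text.

```python
def plink_qc_commands(bfile, out_prefix):
    """Build PLINK 1.9 argument lists for the QC steps described in the text.

    `bfile` and `out_prefix` are hypothetical file prefixes.
    """
    # Step 1: drop SNPs (--geno) and samples (--mind) with >10% missing genotypes.
    missing = ["plink", "--bfile", bfile, "--geno", "0.1", "--mind", "0.1",
               "--make-bed", "--out", out_prefix + "_miss"]
    # Step 2: drop SNPs failing the Hardy-Weinberg test at p < 0.001.
    hwe = ["plink", "--bfile", out_prefix + "_miss", "--hwe", "0.001",
           "--make-bed", "--out", out_prefix + "_hwe"]
    # Step 3: LD pruning with a 200-SNP window, a step of 25 SNPs, and r^2 > 0.4.
    prune = ["plink", "--bfile", out_prefix + "_hwe",
             "--indep-pairwise", "200", "25", "0.4",
             "--out", out_prefix + "_prune"]
    return [missing, hwe, prune]


# Print the commands instead of executing them, so the sketch runs without PLINK.
for cmd in plink_qc_commands("gansu", "gansu_qc"):
    print(" ".join(cmd))
```

Building argument lists rather than shell strings keeps the thresholds easy to audit and lets the same lists be passed to `subprocess.run` if PLINK is installed.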

Principal Component Analysis

Principal component analysis (PCA) was carried out using smartpca from the EIGENSOFT package (Patterson et al., 2006). The PCA was performed at the individual level to describe the genetic structure of all of our samples from Gansu province and the reference populations. We used the options numoutlieriter: 0 and lsqproject: YES. We projected ancient individuals onto the first two components calculated from present-day samples. We visualized the PCA results with the ggplot2 package in R (http://www.r-project.org/).
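A smartpca run of this kind is driven by a plain-text parameter file. The sketch below writes one with the two options named in the text. The file prefix and the population list file are hypothetical, and using `poplistname` to restrict the axes to present-day populations is an assumption about how the projection was set up.

```python
def smartpca_parfile(prefix, poplist_file):
    """Return the text of a smartpca parameter file.

    `prefix` and `poplist_file` are hypothetical names. Populations listed in
    `poplist_file` define the principal components; all other samples
    (here, the ancient individuals) are projected onto them.
    """
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {poplist_file}",
        "lsqproject: YES",    # least-squares projection, robust to missing data
        "numoutlieriter: 0",  # disable automatic outlier removal
    ]
    return "\n".join(lines) + "\n"


print(smartpca_parfile("merged_HO", "present_day_pops.txt"))
```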

ADMIXTURE

We carried out ADMIXTURE (Alexander et al., 2009) analysis after pruning for strong linkage disequilibrium in PLINK v1.9 (Purcell et al., 2007; Chang et al., 2015) with the parameters “--indep-pairwise 200 25 0.4”. We ran ADMIXTURE with 10-fold cross-validation (--cv=10), varying the number of ancestral populations from K = 2 to K = 20, in 100 bootstraps with different random seeds. We chose the best run according to the highest log-likelihood and the lowest CV error.
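Choosing K by cross-validation error can be sketched as follows, assuming the ADMIXTURE logs contain the usual `CV error (K=...)` lines. The error values in the example are invented for illustration and are not results from this study.

```python
import re

# Pattern for the cross-validation summary line that ADMIXTURE prints per run.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) from concatenated logs."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    return min(errors, key=errors.get), errors


# Invented example values, only to show the selection step.
example_log = """CV error (K=4): 0.5321
CV error (K=5): 0.5298
CV error (K=6): 0.5310
"""
k, errors = best_k(example_log)
print(k, errors[k])
```

When several seeds are run per K, the same function can be applied after first keeping the highest-likelihood run for each K.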

F-Statistics

We computed f-statistics using ADMIXTOOLS with the default parameters and estimated standard errors, and hence statistical significance, using block jackknife resampling across the genome (Patterson et al., 2006). We computed outgroup f3-statistics of the form f3(X, Y; Mbuti) to measure the genetic drift shared between population X and population Y since their separation from an outgroup population. We used Mbuti, a group living in the Congo basin in central Africa, as the outgroup. We next used admixture f3-statistics of the form f3(X, Y; Target) for all pairs of reference populations to evaluate possible admixture signals in the target populations. We visualized the outgroup f3 values as a heatmap with the pheatmap package in R.
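The logic of the statistic and its jackknife error can be shown in a small self-contained sketch. It uses the naive estimator, the mean over SNPs of (t − a)(t − b) for allele frequencies t (the population written last in the notation above), a and b, with an equal-weight delete-one-block jackknife. ADMIXTOOLS itself uses bias-corrected estimators and a weighted jackknife over genomic blocks, so this is an illustration of the idea and not a reimplementation. The allele frequencies are made up.

```python
from math import sqrt


def f3(target, src1, src2):
    """Naive f3(src1, src2; target): mean of (t - a)(t - b) over SNPs."""
    products = [(t - a) * (t - b) for t, a, b in zip(target, src1, src2)]
    return sum(products) / len(products)


def f3_jackknife(target, src1, src2, n_blocks):
    """Return (f3, standard error, Z) from a delete-one-block jackknife.

    SNPs are split into `n_blocks` contiguous blocks of near-equal size;
    equal block weights are a simplifying assumption.
    """
    n = len(target)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    estimates = []
    for lo, hi in zip(bounds, bounds[1:]):
        keep = [i for i in range(n) if not lo <= i < hi]
        estimates.append(f3([target[i] for i in keep],
                            [src1[i] for i in keep],
                            [src2[i] for i in keep]))
    mean = sum(estimates) / n_blocks
    se = sqrt((n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates))
    full = f3(target, src1, src2)
    return full, se, (full / se if se > 0 else float("nan"))


# Made-up frequencies: the target is an exact 50/50 mixture of the two sources,
# so every per-SNP product is negative and f3 is below zero.
a = [0.1, 0.2, 0.3, 0.4]
b = [0.9, 0.6, 0.7, 0.8]
t = [(x + y) / 2 for x, y in zip(a, b)]
print(f3_jackknife(t, a, b, n_blocks=2))
```

A significantly negative value for a target population is the admixture signal, whereas with an outgroup in the last position the value is positive and grows with the drift shared by X and Y.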

Streams of Ancestry and the Inference of Admixture Proportions

We investigated the number of admixture sources, the plausible sources, and the corresponding admixture proportions using the qpAdm program implemented in ADMIXTOOLS (Patterson et al., 2006). This f-statistics based admixture modeling tests whether a set of target populations is consistent with being related via N streams of ancestry from the source populations, relative to a basic set of outgroups, and quantitatively estimates the admixture proportions of the given source populations.

Y-Chromosomal and mtDNA Haplogroup Assignment

We assigned Y-chromosomal haplogroups by identifying the most derived allele upstream and the most ancestral allele downstream in the phylogenetic tree, using an in-house script that follows the recommendations of the International Society of Genetic Genealogy (ISOGG; http://www.isogg.org/). The mtDNA haplogroups were assigned with the mtDNA phylogenetic tree Build 16 (http://www.phylotree.org/).

Fst Calculation

The Fst values were calculated with smartpca from EIGENSOFT (Patterson et al., 2006). We ran smartpca with the parameters inbreed: YES and fstonly: YES and wrote the results using the phylipoutname parameter. The inbreeding-corrected and uncorrected Fst values were nearly identical. We then used the Fst values of the populations in Eurasia to build a neighbor-joining (NJ) phylogenetic tree in MEGA (Kumar et al., 2016).

Weighted Linkage Disequilibrium Analysis

Linkage disequilibrium decay was computed by ALDER (Loh et al., 2013) to infer the admixture time for our studied populations.

Fine-Scale Genetic Structure Based on FineSTRUCTURE

Bayesian clustering implemented in FineSTRUCTURE was used to reconstruct phylogenetic relationships and further identify population structure. To reduce the computational burden, we randomly selected 10–20 individuals from each reference group and 15 individuals from each studied group. We phased the genome-wide dense SNP data using SHAPEIT2 (Delaneau et al., 2013) and then conducted the FineSTRUCTURE (Lawson et al., 2012) analysis.

ChromoPainterv2 and GLOBETROTTER Admixture Modeling

We performed a GLOBETROTTER (Hellenthal et al., 2014) analysis for our studied groups to obtain haplotype-sharing based evidence of admixture. Using the haplotypes phased with SHAPEIT2, we obtained the “chunk length” output by running ChromoPainterv2 across all chromosomes. We ran GLOBETROTTER with 100 bootstrap replicates to estimate admixture events, assuming detectable admixture, using the “prop.ind: 1” and “bootstrap.date.ind: 1” options.

Runs of Homozygosity

We calculated runs of homozygosity (ROH) with PLINK using the parameters “--homozyg-density 50 --homozyg-window-het 1 --homozyg-window-threshold 0.05”. We then summarized the counts and lengths of the ROH segments.
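Summarizing ROH counts by length can be sketched as below, assuming segment lengths in kilobases such as those in the `KB` column of a PLINK `.hom` file. The class boundaries are illustrative assumptions chosen around the short (1–2 Mb) and long (>20 Mb) classes discussed in the Results, and the example lengths are invented.

```python
def roh_length_classes(lengths_kb):
    """Count ROH segments per length class; input lengths are in kb."""
    counts = {"1-2 Mb": 0, "2-20 Mb": 0, ">20 Mb": 0, "other": 0}
    for kb in lengths_kb:
        mb = kb / 1000.0
        if 1 <= mb < 2:
            counts["1-2 Mb"] += 1
        elif 2 <= mb <= 20:
            counts["2-20 Mb"] += 1
        elif mb > 20:
            counts[">20 Mb"] += 1
        else:
            counts["other"] += 1  # segments shorter than 1 Mb
    return counts


# Invented segment lengths (kb), only to show the binning.
print(roh_length_classes([1200, 1500, 2500, 21000, 800]))
```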

Results

Population Genetic Structure of the Northwest China

We began the population genetic analysis with the ROH results (Figure 2). In our studied populations in Northwest China, the ROH segments were mainly short, between 1 and 2 Mb, and long segments of more than 20 Mb were rare. This suggests that our studied populations were not consanguineous communities.
FIGURE 2

The results of ROH calculation. (A) the length distribution; (B) the average length.

We first conducted PCA to infer the general genetic structure of our sampled populations together with other East Asians (Figure 3). In the PCA plot, the genetic clusters were consistent with the geographic and linguistic categories in East Asia. We observed the following clear genetic clusters or clines: a genetic cline related to Turkic-speaking populations, driven by populations with a large amount of West Eurasian related ancestry, such as the Uyghur and Uzbek ethnic groups; a cluster of Mongolic-speaking populations; a cluster related to Tungusic-speaking populations; a cluster of populations in West Eurasia; a cluster of Tibetan populations from the high-altitude region; a cluster of Han Chinese groups; and a large cluster related to southern populations in East Asia speaking Hmong-Mien, Austroasiatic, Tai-Kadai, and Austronesian languages. Our newly reported samples from Gansu province clustered genetically between the Han Chinese groups and the Turkic-speaking populations. We next removed the populations from southern China and Southeast Asia and the groups from West Eurasia to show the clustering pattern among northern populations more clearly. In the zoomed PCA, our newly reported populations were close to the Han Chinese cluster but were also shifted towards the Turkic genetic cline, showing genetic affinity with both Turkic populations and Han Chinese.
FIGURE 3

Patterns of genetic relationship among published East Asian populations and our newly genotyped five populations inferred from the principal component analysis. (A) East Asians including southern populations and with the West Eurasians; (B) East Asians without southern populations and without the West Eurasians.

We next carried out the model-based ADMIXTURE clustering analysis. We observed the lowest CV error at K = 5 and visualized the result at K = 5 with five colors (Figure 4): the red component was primarily enriched in West Eurasians; the blue component was largely found in the Mongolic- and Tungusic-speaking populations; the orange component was mainly detected in the Tibetan groups; the green component was largely found in Austronesian-speaking populations; and the purple component was mainly enriched in some southern groups in East Asia. Our newly reported Hui, Dongxiang, Bonan, Yugur, and Salar samples harbored large orange and purple ancestral components related to East Asia and a smaller red ancestral component related to West Eurasia. The ancestry assignment was consistent with the PCA results.
FIGURE 4

ADMIXTURE analysis result visualization at K = 5 as the corresponding cross-validation error was the lowest. And our studied populations in Gansu were marked by red color.

We then calculated the pairwise Fst values for our studied populations in Gansu province together with reference populations in Eurasia and constructed a phylogenetic tree (Figure 5). In this tree, our newly reported groups from Gansu province clustered closely with the surrounding Altaic-speaking populations in northern China. Notably, the Yugur group clustered together with Tibetans from Xunhua and Gannan and with the Tu.
FIGURE 5

Phylogenetic tree among our studied populations in Gansu and reference populations in Eurasia. Our samples in Gansu province were marked with red color.

Next, we characterized the finer-scale population structure of our studied groups in Gansu with the haplotype-based fineSTRUCTURE. The inferred phylogenetic tree based on the linked coancestry matrix showed that all populations clustered well according to geographical position and language classification. Overall, our studied populations clustered with published Mongolic speakers and the Turkic-speaking Kazakh in China, forming the major branch that also included Han, Tibetan, and Mongolian populations of China. Individuals from our Yugur_Gansu population were relatively scattered and formed several small branches, and one individual even clustered with the published Yugur (Figure 6A). In addition, the Hui clustered with the Bonan, Dongxiang, Salar, and Yugur groups. The heatmap (Figure 6B) and the corresponding clustering patterns showed five major clusters. The Sino-Tibetan-Mongolic cluster included Chinese Mongolic populations in northwestern China, Tibetan and Han populations, and our studied populations, reflecting the larger amount of haplotype sharing among those populations.
FIGURE 6

The heat map of sharing haplotypes and clustering dendrogram by fineSTRUCTURE. DX = Dongxiang. (A) the dendrogram. (B) the heat map of sharing haplotypes.

Continuity and Admixture of Populations by the Allele-Shared f-Statistics

We then calculated outgroup f3-statistics of the form f3(X, Y; Mbuti) to quantify population differentiation across East Asia and showed the results as a heatmap (Figure 7). A larger value indicates that the two groups share more genetic drift since their separation from the African outgroup. We found that the majority of Han Chinese populations shared more alleles with each other and clustered together. The Mongolic and Tungusic populations (Ulchi, Nanai, Oroqen, Daur, Hezhen) also clustered together. Our studied Hui, Dongxiang, Bonan, Yugur, and Salar populations clustered together and shared more genetic drift with Han Chinese populations than with Tibetan groups.
FIGURE 7

Heatmap results of the outgroup-f statistics of the form f (X, Y; Mbuti). The larger values indicated that they shared more genetic drifts. Here the Outgroup was Mbuti.

In addition, we computed admixture f3-statistics of the form f3(Source1, Source2; Target) to explore the possible ancestral source populations of our studied populations in Gansu province. We observed the most significant negative signals when using the Neolithic Yellow River farming groups and the Bronze Age to Iron Age Steppe groups from West Eurasia and Central Asia as sources (Table 1), suggesting gene flow from West Eurasia into Northwest China.
TABLE 1

Admixture f3 statistics of the form (Source1, Source2; Target) with the lowest f3 values.

Source 1 | Source 2 | Target | f_3 | Std. err | Z | SNPs
Kazakhstan_Andronovo.SG | Upper_YR_LN | Hui | −0.010305 | 0.001639 | −6.288 | 55271
Kazakhstan_Andronovo.SG | Upper_YR_IA | Hui | −0.009701 | 0.001942 | −4.995 | 53232
Kazakhstan_Kangju.SG | Shimao_LN | Hui | −0.009247 | 0.001029 | −8.99 | 161279
Russia_Alan.SG | WLR_LN | Hui | −0.009221 | 0.001209 | −7.63 | 136872
Russia_Alan.SG | YR_LBIA | Hui | −0.009003 | −0.000714 | −12.609 | 165864
Russia_Alan.SG | WLR_LN | Dongxiang | −0.014067 | −0.001139 | −12.348 | 138809
CHB.SG | Anatolia_N | Dongxiang | −0.013755 | −0.000316 | −43.553 | 172663
CHB.SG | Russia_MLBA_Sintashta | Dongxiang | −0.013573 | 0.000324 | −41.928 | 171061
Anatolia_N | YR_LBIA | Dongxiang | −0.013525 | −0.000628 | −21.527 | 170054
Kazakhstan_Kangju.SG | Shimao_LN | Dongxiang | −0.013429 | −0.00097 | −13.847 | 163617
Russia_MLBA_Sintashta | Miaozigou_MN | Bonan | −0.010536 | 0.001382 | −7.621 | 49714
Kazakhstan_Andronovo.SG | Upper_YR_LN | Bonan | −0.010432 | 0.001621 | −6.436 | 55987
Russia_Alan.SG | WLR_LN | Bonan | −0.010298 | 0.001191 | −8.645 | 138290
Kazakhstan_Kangju.SG | Shimao_LN | Bonan | −0.010185 | −0.000996 | −10.226 | 163067
CHB.SG | Anatolia_N | Bonan | −0.010179 | −0.000311 | −32.782 | 172482
Kazakhstan_Andronovo.SG | Upper_YR_LN | Yugur | −0.008896 | 0.001651 | −5.388 | 55670
Russia_Alan.SG | Wuzhuangguoliang | Yugur | −0.008814 | 0.002125 | −4.148 | 28513
Russia_MLBA_Sintashta | Miaozigou_MN | Yugur | −0.008676 | 0.001367 | −6.346 | 49569
Kazakhstan_Kangju.SG | Shimao_LN | Yugur | −0.00843 | 0.001004 | −8.393 | 162332
Kazakhstan_Andronovo.SG | Upper_YR_IA | Yugur | −0.008279 | 0.001914 | −4.326 | 53651
Kazakhstan_Andronovo.SG | Upper_YR_LN | Salar | −0.011922 | 0.001611 | −7.4 | 55593
Kazakhstan_Andronovo.SG | Upper_YR_IA | Salar | −0.011297 | 0.001916 | −5.897 | 53574
Kazakhstan_Kangju.SG | Shimao_LN | Salar | −0.011024 | −0.001022 | −10.781 | 162165
Russia_Alan.SG | WLR_LN | Salar | −0.010966 | 0.001182 | −9.28 | 137507
Russia_Alan.SG | Shimao_LN | Salar | −0.010849 | −0.000996 | −10.893 | 165726

The Ancestry Inference of the Populations in Northwest China

We next carried out qpAdm analysis to infer the admixture proportions in our studied Gansu populations (Figure 8; Table 2). As eastern ancestral sources we selected the Yellow River farming groups from the Bronze Age to Iron Age, and as western ancestral sources we selected ancient populations of the Andronovo and Alan cultures, since they provided the most significant negative admixture f3 values. We used the following set of populations as outgroups: Mbuti, Russia_EBA_Yamnaya_Samara, Anatolia_N, Russia_MA1, Russia_Afanasievo, Mongolia_N_East, Ust_Ishim, Russia_Kostenki14, Iran_C_SehGabi. Our studied populations could be modeled as a two-way admixture with p-value > 0.05 at rank = 1. We estimated that Russia_Andronovo related ancestry accounted for 9.1–11.8% and YR_LBIA farming group related ancestry for 88.2–90.9% in the Hui, Bonan, Yugur, and Salar groups. Because the pair consisting of Late Neolithic farmers from the West Liao River (WLR_LN) and Iron Age Alans from Russia (Russia_Alan) gave the most significant signal in the admixture f3 test, we modeled the Dongxiang with these sources and found that they derived 14.9% of their ancestry from Russia_Alan related groups and the remainder from WLR_LN related groups. In general, the qpAdm models indicated west-east admixture in all five studied populations, with East Asian related ancestry making the dominant contribution to the genetic formation of Northwest Chinese Altaic-speaking groups, alongside varying proportions of West Eurasian related ancestry.
FIGURE 8

qpAdm based admixture models for the populations in our study in Gansu province. The 2-way admixture models for our Gansu samples were presented when the p values >0.05 at the rank = 1. (A) Hui, Bonan, Yugur, Salar ethnic groups. (B) Dongxiang ethnic group.

TABLE 2

Two-way qpAdm models of studied populations in Gansu.

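Studied population | Western source | Proportion | Std. err | Eastern source | Proportion | Std. err | p value
Hui | Russia_Andronovo.SG | 0.091 | 0.007 | YR_LBIA | 0.909 | 0.007 | 0.206
Bonan | Russia_Andronovo.SG | 0.109 | 0.007 | YR_LBIA | 0.891 | 0.007 | 0.0536
Yugur | Russia_Andronovo.SG | 0.111 | 0.007 | YR_LBIA | 0.889 | 0.007 | 0.546
Salar | Russia_Andronovo.SG | 0.118 | 0.007 | YR_LBIA | 0.882 | 0.007 | 0.098
Dongxiang | Russia_Alan.SG | 0.149 | 0.011 | WLR_LN | 0.851 | 0.011 | 0.304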

Y Chromosomal and MtDNA Haplogroup Assignment

We assigned Y-chromosome and mtDNA haplogroups for our newly genotyped samples (Table 3). Haplogroup R1a1a1b2 was the most frequent patrilineal lineage in the Hui, Bonan, and Salar groups. We also detected haplogroups D1a1a1a1a2a∼, H1a1a1a, J2a1a, J2a1h2b, J2a2, N1a2b3, O2a2a1a2a1a, and O2a2b1a1a6b in our Hui samples. Haplogroups D1a1a1a1a2a∼ and O2a2b1a1a were also found in the Bonan group, and haplogroup O1b1a1a1b2 was also present in the Salar group. Haplogroup J2a1h2, which is mostly found in the Middle East, was the most prevalent lineage in the Dongxiang. We also found D1a1a1a2, O2a2b1a1a6, and R2a2 in the Dongxiang group. Haplogroups C2b1a1, D1a1a1a1a2a∼, O2a2b1a2a1a2, O2a2b1a2b2, and Q1b2b1b2b2∼ were the prevalent lineages in the studied Yugur group. The distribution of Y-chromosomal haplogroups indicated the influence of the westward expansion of several ancestral sources on the genetic formation of Northwest Chinese Altaic populations, including West Eurasian, Sino-Tibetan, and common-ancestor Altaic related ancestry.
TABLE 3

The Y-chromosome haplogroups distribution of our studied populations.

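Population | Y haplogroup | Frequency
Hui | D1a1a1a1a2a∼ | 0.100
Hui | H1a1a1a | 0.100
Hui | J2a1a | 0.100
Hui | J2a1h2b | 0.100
Hui | J2a2 | 0.100
Hui | N1a2b3 | 0.100
Hui | O2a2a1a2a1a | 0.100
Hui | O2a2b1a1a6b | 0.100
Hui | R1a1a1b2 | 0.200
Dongxiang | D1a1a1a2 | 0.133
Dongxiang | E1b1a1a1a2a1a3b1a10b∼ | 0.067
Dongxiang | J2a1h2 | 0.200
Dongxiang | J2a2 | 0.067
Dongxiang | L1a2a1b2∼ | 0.067
Dongxiang | N1a1a1a1a3a2a∼ | 0.067
Dongxiang | N1a3∼ | 0.067
Dongxiang | O2a2b1a1a6 | 0.133
Dongxiang | R1a1a1b2 | 0.067
Dongxiang | R2a2 | 0.133
Bonan | C2b1a2a2a∼ | 0.0625
Bonan | D1a1a1a1a2a∼ | 0.125
Bonan | D1a2a1∼ | 0.0625
Bonan | J2a1h2 | 0.0625
Bonan | N1a2b3a∼ | 0.0625
Bonan | O1b1a1a1a1b1b | 0.0625
Bonan | O1b1a1a1a2 | 0.0625
Bonan | O2a2b1a1a | 0.125
Bonan | O2a2b1a2a1d | 0.0625
Bonan | Q1b1a3b1a1∼ | 0.0625
Bonan | Q2a1c1b1∼ | 0.0625
Bonan | R1a1a1b2 | 0.1875
Yugur | C2b1a1 | 0.133
Yugur | C2b1a3b∼ | 0.067
Yugur | D1a1a1a1a2a∼ | 0.133
Yugur | D1a1a1a2 | 0.067
Yugur | O2a2b1a1a6 | 0.067
Yugur | O2a2b1a2a1a2 | 0.133
Yugur | O2a2b1a2b2 | 0.133
Yugur | O2a2b2a2a1 | 0.067
Yugur | Q1b1a3a∼ | 0.067
Yugur | Q1b2b1b2b2∼ | 0.133
Salar | I2a2a1b2a1b1b2a2∼ | 0.091
Salar | J2a1 | 0.091
Salar | N1b2a2∼ | 0.091
Salar | O1b1a1a1b2 | 0.182
Salar | O2a1a1b1a2 | 0.091
Salar | O2a1c1a1a1a1a1b1a∼ | 0.091
Salar | O2a2b1a2a1a1a1 | 0.091
Salar | R1a1a1b2 | 0.273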
We next assigned the matrilineal mtDNA haplogroups for our studied populations. In the Hui group, we observed diverse mtDNA haplogroups, including D4, D5a2a1, F1, G3a1′2, M7, M8, Z3, and Z4. The maternal profile of the Dongxiang group was similar to that of the Hui group, but haplogroups A, B4, and F2 were more prevalent in the Dongxiang. D4 was the most dominant lineage in the Bonan group, in which we also detected B and G2a. Haplogroup D4 was also the most dominant haplogroup in the Yugur group, followed by A1, C4, F1g, and M9a1. Haplogroup A was the most prevalent haplogroup in the Salar group, followed by F1, M9a1b1, and Z3. The main mtDNA haplogroups in our samples are also prevalent in East Asia, suggesting that local East Asians contributed largely to the maternal gene pool of Gansu Altaic-speaking populations. The genetic influence of West Eurasian populations was more pronounced in the patrilineal lineages than in the matrilineal lineages. The details of the distribution of mtDNA haplogroups are listed in Table 4.
TABLE 4

The mtDNA haplogroups distribution for our studied populations.

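| Hui haplogroup | Hui frequency | Dongxiang haplogroup | Dongxiang frequency | Bonan haplogroup | Bonan frequency | Yugur haplogroup | Yugur frequency | Salar haplogroup | Salar frequency |
|---|---|---|---|---|---|---|---|---|---|
| A16 | 0.041666667 | A | 0.06666667 | A | 0.03333333 | A1 | 0.1 | A | 0.115385 |
| B4c1b2c | 0.041666667 | A1 | 0.1 | B4 | 0.06666667 | A6b | 0.033333333 | A18 | 0.038462 |
| B5b2 | 0.041666667 | A6b | 0.03333333 | B5 | 0.06666667 | B4a3 | 0.033333333 | A5b1b | 0.038462 |
| C4d | 0.041666667 | B4 | 0.1 | B6a | 0.03333333 | C4 | 0.1 | A8a | 0.038462 |
| D4 | 0.083333333 | C5d2 | 0.03333333 | C4 | 0.1 | D4 | 0.4 | B4b1a2a | 0.038462 |
| D5 | 0.041666667 | D4 | 0.1 | C5b1b | 0.03333333 | D5a2a1 | 0.033333333 | C4d | 0.038462 |
| D5a2a1a1 | 0.04166667 | D5 | 0.06666667 | D4 | 0.2 | F1g | 0.1 | D4 | 0.076923 |
| F1 | 0.125 | F1 | 0.06666667 | F1g | 0.03333333 | M9a1a1c1b1a | 0.066666667 | F1 | 0.115385 |
| F4a2 | 0.041666667 | F2 | 0.1 | G2 | 0.1 | M9a1b1 | 0.033333333 | F3a1 | 0.038462 |
| G3a1′2 | 0.083333333 | F4b | 0.03333333 | H | 0.06666667 | R9b1a3 | 0.033333333 | G1a1 | 0.038462 |
| M7 | 0.125 | H15 | 0.03333333 | M10a1a1b | 0.03333333 | U4b1a1a1 | 0.033333333 | G2a | 0.076923 |
| M8 | 0.083333333 | H5 | 0.06666667 | M7b1a | 0.03333333 | U7a | 0.033333333 | H7b1 | 0.038462 |
| N9a2 | 0.04166667 | M7b1a1a3 | 0.03333333 | M8 | 0.06666667 | | | M11a2 | 0.038462 |
| Z3 | 0.083333333 | M8 | 0.06666667 | M9a1a1c1a | 0.03333333 | | | M21b | 0.038462 |
| Z4 | 0.083333333 | T2a1a | 0.03333333 | X2b4 | 0.03333333 | | | M9a1b1 | 0.115385 |
| | | X2 | 0.03333333 | Z3a | 0.06666667 | | | Z3 | 0.115385 |
| | | Y1b1a | 0.03333333 | | | | | | |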
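The frequencies in Table 4 are simple proportions: the number of carriers of a haplogroup divided by the number of individuals typed in that group. As a minimal sketch, assuming the smallest frequency in a column corresponds to a single carrier, the implied sample size and per-haplogroup counts can be recovered in Python. The function name `implied_counts` is illustrative and not part of the original analysis; the values are the Salar column of Table 4.

```python
# Salar mtDNA frequencies copied from Table 4.
salar_freqs = {
    "A": 0.115385, "A18": 0.038462, "A5b1b": 0.038462, "A8a": 0.038462,
    "B4b1a2a": 0.038462, "C4d": 0.038462, "D4": 0.076923, "F1": 0.115385,
    "F3a1": 0.038462, "G1a1": 0.038462, "G2a": 0.076923, "H7b1": 0.038462,
    "M11a2": 0.038462, "M21b": 0.038462, "M9a1b1": 0.115385, "Z3": 0.115385,
}


def implied_counts(freqs):
    """Recover sample size and carrier counts from frequencies.

    Assumption (not stated in the source): the smallest frequency
    corresponds to exactly one carrier.
    """
    n = round(1 / min(freqs.values()))
    counts = {hap: round(f * n) for hap, f in freqs.items()}
    return n, counts


n, counts = implied_counts(salar_freqs)
print(f"implied sample size: {n}, total carriers: {sum(counts.values())}")
```

If the recovered counts sum to the implied sample size, the column is internally consistent; a mismatch would point to a missing or mis-split row.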

The Admixture Time Estimation for the Populations in Northwest China

We estimated the admixture time between the East and West Eurasian related ancestries in Northwest Chinese populations using the weighted linkage disequilibrium-based admixture inference implemented in ALDER (Loh et al., 2013). We used Han_HGDP and Sardinian as the two ancestral surrogates to calculate the east-west admixture time and listed the results in Table 5. The admixture times estimated by the 2-ref weighted LD for our five studied populations ranged from 25 to 31 generations, or approximately 750–930 years before present assuming 30 years per generation (Table 5). These dates suggest that the east-west interactions occurred around the Song and Yuan Dynasties of China.
TABLE 5

The admixture time estimation by ALDER for our studied populations.

| Population | 1-ref weighted LD, Sardinian weights (generations) | Z-score | 1-ref weighted LD, Han_HGDP weights (generations) | Z-score | 2-ref weighted LD, Sardinian and Han_HGDP weights (generations) | Z-score |
|---|---|---|---|---|---|---|
| Hui | 34.98 ± 4.20 | 8.32 | 97.35 ± 35.92 | 2.71 | 31.36 ± 3.27 | 9.58 |
| Dongxiang | 28.71 ± 2.60 | 11.03 | 40.77 ± 7.40 | 5.51 | 26.73 ± 2.61 | 10.24 |
| Bonan | 33.21 ± 2.42 | 13.72 | - | - | 26.08 ± 2.50 | 10.42 |
| Yugur | 33.65 ± 4.53 | 7.42 | - | - | 25.32 ± 3.81 | 6.65 |
| Salar | 25.70 ± 3.64 | 7.07 | 33.74 ± 11.91 | 2.83 | 24.77 ± 3.83 | 6.47 |
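The conversion from generations to years used in the text is a single multiplication. A minimal sketch, using the 2-ref estimates from Table 5 and the 30-year generation time assumed in the text (the function and variable names are illustrative, not from the original analysis):

```python
YEARS_PER_GENERATION = 30  # generation time assumed in the text

# 2-ref weighted LD estimates from Table 5: (generations, standard error)
alder_2ref = {
    "Hui": (31.36, 3.27),
    "Dongxiang": (26.73, 2.61),
    "Bonan": (26.08, 2.50),
    "Yugur": (25.32, 3.81),
    "Salar": (24.77, 3.83),
}


def to_years(generations, se, years_per_generation=YEARS_PER_GENERATION):
    """Scale an admixture date and its standard error from generations to years."""
    return generations * years_per_generation, se * years_per_generation


years = {pop: to_years(g, se) for pop, (g, se) in alder_2ref.items()}
for pop, (y, y_se) in years.items():
    print(f"{pop}: {y:.0f} ± {y_se:.0f} years before present")

youngest = min(y for y, _ in years.values())
oldest = max(y for y, _ in years.values())
print(f"range: {youngest:.0f} to {oldest:.0f} years before present")
```

With the unrounded estimates the range is about 743 to 941 years; the 750–930 figure in the text follows from rounding to 25 and 31 generations first. A different generation time would shift every date proportionally.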
We further performed haplotype-based GLOBETROTTER analysis to obtain the admixture landscape of our studied northwestern Chinese populations (Table 6). The east-west admixture could be traced back to ∼21 to ∼25 generations ago (approximately 630–750 years ago assuming 30 years per generation), with the inferred West Eurasian related ancestry, represented by English, ranging from 16 to 24%, broadly consistent with the results from ALDER. In addition, we observed minor admixture from southern populations in Hui, Yugur, and Salar (proportions of 0.2, 0.06, and 0.04, respectively).
TABLE 6

The admixture events of our studied populations by GLOBETROTTER.

Recipient.PopulationModelGen.1dateProportion.source1Bestmatch.event1.source1Bestmatch.event1.source2Proportion.event2.source1Bestmatch.event2.source1Bestmatch.event2.source2MaxR2fit.1dateFit.quality.1eventFit.quality.2eventsGen.2dates.date1Gen.2dates.date2Proportion.date1.source1Bestmatch.date1.source1Bestmatch.date1.source2Proportion.date2.source1Bestmatch.date2.source1Bestmatch.date2.source2MaxScore.2events
Hui_Gansu1-DATE24.888017030.18EnglishHan_NChina0.2Kinh_VietnamSalar_Gansu0.9201028250.999969660.9999965171.00000432724.445149190.42AtayalSalar_Gansu0.18EnglishHan_NChina0.111809238
Dongxiang_Gansu1-DATE21.430936940.24EnglishHan_NChina0.41MongolDongxiang0.9436710530.9999968810.9999996121.00002345926.761531430.13TurkmenTu0.24EnglishHan_NChina0.154655956
Bonan_Gansumultiple-dates24.952477580.19EnglishHan_NChina0.49YugurBonan0.9182129590.9999690010.99999798.27027773330.573663680.34Uyghur.DGBonan0.18EnglishHan_NChina0.478201465
Yugur_Gansu1-DATE23.475100820.16EnglishYugur0.06AtayalTibetan_Lhasa0.8842533470.9999912890.99999897411.7111950844.603204050.07EnglishTu0.13EnglishYugur0.073485128
Salar_Gansu1-DATE20.802955570.2EnglishHan_NChina0.04AtayalHui_Gansu0.9433659420.99999999110.7407022543.59907880.07EnglishHui_Gansu0.16EnglishHan_NChina0.176636782

Discussion

East Asia is a region with diverse cultural exchanges, multiple language interactions, and a complex population history. Many previous studies have shown that the genetic substructure of populations in East Asia is consistent with their language affinities. The Hexi Corridor and its surrounding regions were known for the famous Majiayao civilization in the middle and late Neolithic Age and were subsequently controlled by the Rong-Di tribes before the Han Dynasty. Moreover, Northwest China witnessed the intersection of the eastward expansion of barley and wheat agriculture and the westward expansion of millet agriculture from the Neolithic to the Bronze Age. Gansu province is one of the key regions in Northwest China and connects the Hexi Corridor with the Tibetan-Yi Corridor. The genetic diversity, fine-scale genetic substructure, and western Eurasian admixture of the populations of Gansu still need to be fully explored. We collected samples from 140 modern individuals of the Hui, Dongxiang, Bonan, Yugur, and Salar groups in Gansu province and genotyped them for genome-wide SNPs. We reconstructed the population admixture history of the Altaic speaking populations in Northwest China. Our studied populations from Northwest China showed similar genetic profiles, suggesting relative genetic homogeneity in Northwest China despite subtly different proportions of East and West Eurasian related ancestry. The close genetic affinity among Chinese Turkic, Tungusic, and Mongolic speaking populations indicated the possibility of a common ancestor of Altaic speakers. Our results showed that both West and East Eurasians contributed to the genetic formation of Altaic populations in Northwest China, which coincides with previous studies that suggested east-west admixture in Altaic populations and the Hui population (Xu and Jin, 2008; Bai et al., 2018; Jeong et al., 2019; Zhao et al., 2020; Ma et al., 2021).
The closer genetic relationship between our studied populations and Sino-Tibetan populations, together with the results of qpAdm and GLOBETROTTER, suggested that the majority of the East Eurasian ancestry might derive from a population related to millet farmers in the Yellow River Basin. The eastward expansion of Bronze Age Western Steppe nomadic groups had a limited impact on the East Eurasian gene pool. The five studied Altaic speaking groups were inferred to harbor a low proportion of ancestry related to Middle and Late Bronze Age Western Steppe pastoralists, represented by the Andronovo culture. This was also supported by the high frequencies of Y chromosomal haplogroup R1a1a1b2, which prevailed in Middle and Late Bronze Age Steppe populations, in the Hui, Bonan, and Salar groups (Narasimhan et al., 2019). The genetic admixture from West Eurasians was probably driven by male-dominated migration, as shown by the higher frequencies of West Eurasian related paternal Y chromosome lineages and the near absence of West Eurasian related maternal mtDNA lineages. Paleogenomic studies have revealed a complex pattern of male-biased admixture in the demographic dynamics of the Eastern Steppe (Jeong et al., 2020). Because the West Eurasian related ancestry proportions were limited in our studied populations (<15%), it was hard to determine the exact genetic source of the admixture. Sequencing more ancient genomes from Northwest China may shed more light on the West Eurasian sources. Based on ALDER and GLOBETROTTER, we estimated that the admixture event occurred in the historic period (approximately 750–930 and 630–750 years ago, respectively). The admixture we identified roughly corresponds to the Song to Yuan Dynasties. However, if the admixture did not happen immediately after arrival, or happened multiple times over an extended period, the true start of admixture would have been more ancient.
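The caveat about prolonged or repeated admixture can be illustrated with a toy calculation. In the simplest single-pulse picture, admixture LD decays with genetic distance d (in Morgans) roughly as exp(-n*d), where n is the number of generations since admixture. The sketch below is a simplified illustration, not ALDER's actual fitting procedure: it fits one exponential to a curve produced by two hypothetical pulses (20 and 40 generations ago with equal weights; all values are made up for illustration) and omits the amplitude and affine terms that a real fit would include.

```python
import math


def weighted_ld(d, pulses):
    """Expected LD decay at distance d (Morgans) for (weight, generations) pulses."""
    return sum(w * math.exp(-n * d) for w, n in pulses)


def fit_single_date(distances, values):
    """Least-squares slope of log(LD) against distance; returns -slope in generations."""
    logs = [math.log(v) for v in values]
    mean_d = sum(distances) / len(distances)
    mean_l = sum(logs) / len(logs)
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var


# Illustrative distance grid from 0.5 to 20 cM, expressed in Morgans.
distances = [0.005 * i for i in range(1, 41)]

# Sanity check: a true single pulse at 25 generations is recovered.
single = fit_single_date(distances, [weighted_ld(d, [(1.0, 25)]) for d in distances])

# Hypothetical two-pulse history fitted with a single-date model.
two_pulses = [(0.5, 20), (0.5, 40)]
estimate = fit_single_date(distances, [weighted_ld(d, two_pulses) for d in distances])

print(f"single pulse recovered: {single:.2f} generations")
print(f"two pulses, single-date fit: {estimate:.2f} generations")
```

The single-date estimate falls between the two pulses, so it is younger than the oldest pulse. This is the sense in which the true onset of admixture can predate the inferred date.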
Furthermore, intensive and continuous contact between West and East Eurasian populations started as early as the Bronze Age, aided by the use of horses, and the interaction became more frequent with the opening of the Silk Road in the Han Dynasty. The establishment of the Mongol Empire and the Mongol conquests in the 13th and 14th centuries facilitated west-east contacts. The true admixture history in Northwest China could be more complex than the simplified models presented in this study; the populations studied here, however, harbored predominantly local East Eurasian related ancestry and limited West Eurasian related ancestry. The human groups living along the ancient Silk Road all presented a west-east admixture structure, of which the Uyghur in Xinjiang are a typical example. In addition, the Altaic speaking populations in Central Asia all show west-east interactions in genetic structure and culture. The eastern endpoint of the ancient Silk Road was near Chang’an City, and the Gansu pathway was the only route to it. The Altaic populations in this region have lacked large-scale sampling and genome-wide genetic analysis. Our research addressed this gap to a certain degree, but a more detailed admixture history remains to be explored through whole-genome sequencing.
  82 in total

1.  Phylogeography of Y-chromosome haplogroup Q1a1a-M120, a paternal lineage connecting populations in Siberia and East Asia.

Authors:  Na Sun; Peng-Cheng Ma; Shi Yan; Shao-Qing Wen; Chang Sun; Pan-Xin Du; Hui-Zhen Cheng; Xiao-Hua Deng; Chuan-Chao Wang; Lan-Hai Wei
Journal:  Ann Hum Biol       Date:  2019-07-10       Impact factor: 1.533

2.  Cultural variation in communal versus exchange norms: Implications for social support.

Authors:  Joan G Miller; Hiroko Akiyama; Shagufa Kapadia
Journal:  J Pers Soc Psychol       Date:  2017-02-27

3.  The exchange and use of cultural and social capital among community health workers in the United States.

Authors:  Jarron M Saint Onge; Joanna Veazey Brooks
Journal:  Sociol Health Illn       Date:  2020-11-19

4.  Genetic substructure and admixture of Mongolians and Kazakhs inferred from genome-wide array genotyping.

Authors:  Jing Zhao; Jin Sun; Ziyang Xia; Guanglin He; Xiaomin Yang; Jianxin Guo; Hui-Zhen Cheng; Yingxiang Li; Song Lin; Tie-Lin Yang; Xi Hu; Hua Du; Peng Cheng; Rong Hu; Gang Chen; Haibing Yuan; Xiu-Fang Zhang; Lan-Hai Wei; Hu-Qin Zhang; Chuan-Chao Wang
Journal:  Ann Hum Biol       Date:  2020-11-23       Impact factor: 1.533

5.  Paleolithic genetic link between Southern China and Mainland Southeast Asia revealed by ancient mitochondrial genomes.

Authors:  Fan Bai; Xinglong Zhang; Xueping Ji; Peng Cao; Xiaotian Feng; Ruowei Yang; Minsheng Peng; Shuwen Pei; Qiaomei Fu
Journal:  J Hum Genet       Date:  2020-07-11       Impact factor: 3.172

6.  Y-Chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age.

Authors:  B Su; J Xiao; P Underhill; R Deka; W Zhang; J Akey; W Huang; D Shen; D Lu; J Luo; J Chu; J Tan; P Shen; R Davis; L Cavalli-Sforza; R Chakraborty; M Xiong; R Du; P Oefner; Z Chen; L Jin
Journal:  Am J Hum Genet       Date:  1999-12       Impact factor: 11.025

7.  Impacts of sugarcane agriculture expansion over low-intensity cattle ranch pasture in Brazil on greenhouse gases.

Authors:  Camila Bolfarini Bento; Solange Filoso; Leonardo Machado Pitombo; Heitor Cantarella; Raffaella Rossetto; Luiz Antonio Martinelli; Janaina Braga do Carmo
Journal:  J Environ Manage       Date:  2017-12-07       Impact factor: 6.789

8.  Phylogenetic Placement of Isolates Within the Trans-Eurasian Clade A.Br.008/009 of Bacillus anthracis.

Authors:  Markus Antwerpen; Wolfgang Beyer; Olga Bassy; María Victoria Ortega-García; Juan Carlos Cabria-Ramos; Gregor Grass; Roman Wölfel
Journal:  Microorganisms       Date:  2019-12-12

9.  The deep population history of northern East Asia from the Late Pleistocene to the Holocene.

Authors:  Xiaowei Mao; Hucai Zhang; Shiyu Qiao; Yichen Liu; Fengqin Chang; Ping Xie; Ming Zhang; Tianyi Wang; Mian Li; Peng Cao; Ruowei Yang; Feng Liu; Qingyan Dai; Xiaotian Feng; Wanjing Ping; Chuzhao Lei; John W Olsen; E Andrew Bennett; Qiaomei Fu
Journal:  Cell       Date:  2021-05-27       Impact factor: 41.582

10.  Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History.

Authors:  Siyang Liu; Shujia Huang; Fang Chen; Lijian Zhao; Yuying Yuan; Stephen Starko Francis; Lin Fang; Zilong Li; Long Lin; Rong Liu; Yong Zhang; Huixin Xu; Shengkang Li; Yuwen Zhou; Robert W Davies; Qiang Liu; Robin G Walters; Kuang Lin; Jia Ju; Thorfinn Korneliussen; Melinda A Yang; Qiaomei Fu; Jun Wang; Lijun Zhou; Anders Krogh; Hongyun Zhang; Wei Wang; Zhengming Chen; Zhiming Cai; Ye Yin; Huanming Yang; Mao Mao; Jay Shendure; Jian Wang; Anders Albrechtsen; Xin Jin; Rasmus Nielsen; Xun Xu
Journal:  Cell       Date:  2018-10-04       Impact factor: 66.850
