Chao Zhang1,2, Yan Lu1, Qidi Feng1,2, Xiaoji Wang1,2,3, Haiyi Lou1, Jiaojiao Liu1,2,3, Zhilin Ning1,2, Kai Yuan1,2, Yuchen Wang1,2, Ying Zhou1,2, Lian Deng1,2, Lijun Liu4, Yajun Yang5, Shilin Li5, Lifeng Ma4, Zhiying Zhang4, Li Jin5, Bing Su6, Longli Kang4, Shuhua Xu7,8,9,10.
Abstract
BACKGROUND: The genetic relationships reported by recent studies between Sherpas and Tibetans are controversial. To gain insights into the population history and the genetic basis of high-altitude adaptation of the two groups, we analyzed genome-wide data in 111 Sherpas (Tibet and Nepal) and 177 Tibetans (Tibet and Qinghai), together with available data from present-day human populations.
Keywords: Gene flow; High-altitude adaptation; Next-generation sequencing; Population history; Sherpa; Tibetan
Year: 2017 PMID: 28619099 PMCID: PMC5472941 DOI: 10.1186/s13059-017-1242-y
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Culturally defined regions of historical Tibet and geographical locations of Sherpa and Tibetan samples analyzed in this study. The culturally defined regions in historical Tibet are illustrated in different colors: red, Ü-Tsang (central Tibet); green, Kham (eastern Tibet); and purple, Amdo (northeastern Tibet). Dots with distinct colors represent subgroups classified according to the collected geographical locations: blue for two SHP subpopulations; and non-blue for regional Tibetans. The locations of the Amdo and Kham regions make Tibetans there more susceptible to cultural and genetic influence from East Asians or Central Asians/Siberians. The figure was modified from one obtained from Wikipedia (https://en.wikipedia.org/wiki/Kham)
Fig. 2 Principal component analysis (PCA) for SHP, TBN, and their subgroups. PCA of (a) SHP and TBN within the context of some East Asians, (b) SHP and TBN, (c) SHP subgroups, and (d) TBN subgroups. Subgroups are classified according to their geographic locations. Numbers in parentheses denote variance explained by each principal component (PC). Note that three outliers in (d) (one in TBN.Nyingchi and two in TBN.Shigatse) were removed when we drew the figure
Fig. 3 Panel 2 dataset-based results of genetic admixture when assuming K = 6. Each individual is represented by a single line broken into K = 6 colored segments, with lengths proportional to the K = 6 inferred clusters. Results for all SHP and TBN are further summarized and displayed in the two large pie charts in the center of the circle plot with component proportion denoted as percentage. Proportions of each genetic component for SHP and TBN subgroups are summarized in the small pie charts with their proportions listed below. TC Tibetan major component, SC SHP major component, EAC East Asian major component, SC1 Central Asian/Siberian major component 1, SC2 Central Asian/Siberian major component 2, SAC South Asian major component
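The pie-chart summaries described in this caption amount to averaging per-individual ancestry proportions within each subgroup. The following is a minimal sketch of that averaging, not the authors' pipeline. The function name `subgroup_means`, the column order in `COMPONENTS`, and the toy proportions are all assumptions made for illustration; in real ADMIXTURE output the cluster order is arbitrary and must be matched to labels by inspection.

```python
from collections import defaultdict

# Assumed column order for the K = 6 clusters (hypothetical; real output
# columns carry no labels and must be assigned after inspection).
COMPONENTS = ["TC", "SC", "EAC", "SC1", "SC2", "SAC"]


def subgroup_means(q_rows, labels):
    """Average per-individual ancestry proportions within each subgroup.

    q_rows: one list of K proportions per individual (each sums to ~1).
    labels: the subgroup symbol of each individual, in the same order.
    """
    sums = defaultdict(lambda: [0.0] * len(COMPONENTS))
    counts = defaultdict(int)
    for row, label in zip(q_rows, labels):
        if len(row) != len(COMPONENTS):
            raise ValueError("each row needs one value per component")
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: {c: sums[label][k] / counts[label] for k, c in enumerate(COMPONENTS)}
        for label in sums
    }


# Toy, made-up proportions for three individuals (not data from the study).
toy_q = [
    [0.5, 0.3, 0.1, 0.05, 0.03, 0.02],
    [0.7, 0.1, 0.1, 0.05, 0.03, 0.02],
    [0.2, 0.6, 0.1, 0.0, 0.0, 0.1],
]
toy_labels = ["TBN.Lhasa", "TBN.Lhasa", "SHP.Zhangmu"]
toy_means = subgroup_means(toy_q, toy_labels)
```

Each value in `toy_means["TBN.Lhasa"]` would correspond to one slice of that subgroup's small pie chart.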
Fig. 4 Evidence of gene flow between SHP and TBN subgroups. We performed f3 tests to detect gene flow events from the TBN subgroup to SHP subgroup (Additional file 1: Figure S23), SHP subgroup to TBN subgroup (Additional file 1: Figure S24), and within SHP subgroups (Additional file 1: Figure S25). The f3 statistics were significantly negative (with Z score ≤ −3) for: (a) f3(SHP.Zhangmu; TBN.Shigatse, X) when X was a South Asian population; (b) f3(TBN.Nyingchi; SHP.Khumbu, X) when X was an East Asian population; and (c) f3(SHP.Zhangmu; SHP.Khumbu, X) when X was a South Asian or one of some Central Asian/Siberian populations. Results provide evidence for gene flow from South Asians and Nepalese Sherpas to Chinese Sherpas, and from East Asians and Nepalese Sherpas to Tibetans in Nyingchi. **Significantly negative value with Z score ≤ −3; *Z score of −3 < Z ≤ −2. Highlander subgroups are highlighted with red fonts and blue arrows
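The logic of the f3 test can be sketched in a few lines: f3(C; A, B) is the average over SNPs of (c − a)(c − b), where c, a, and b are allele frequencies in the target and the two source populations, and a significantly negative value suggests that C is admixed. The sketch below is a simplified illustration, not the software used in the study. It assumes allele frequencies are known without sampling error (so it omits the finite-sample correction that production tools apply), uses equal-size contiguous SNP blocks for an unweighted delete-one-block jackknife, and all function names and toy frequencies are hypothetical.

```python
import math


def f3(c, a, b):
    """Naive f3(C; A, B): mean of (c - a) * (c - b) over SNPs."""
    if not (len(c) == len(a) == len(b)) or not c:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def f3_z(c, a, b, n_blocks=5):
    """Return (f3, Z) using an unweighted delete-one-block jackknife."""
    n = len(c)
    size = n // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least 2 blocks and 1 SNP per block")
    estimate = f3(c, a, b)
    leave_one_out = []
    for k in range(n_blocks):
        start = k * size
        stop = n if k == n_blocks - 1 else start + size  # last block takes the rest
        keep = [i for i in range(n) if not (start <= i < stop)]
        leave_one_out.append(
            f3([c[i] for i in keep], [a[i] for i in keep], [b[i] for i in keep])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, z


def significance(z):
    """Map a Z score to the caption's markers: '**' (Z <= -3), '*' (-3 < Z <= -2)."""
    if z <= -3:
        return "**"
    if z <= -2:
        return "*"
    return ""


# Toy, made-up frequencies: C is an exact 50/50 mixture of A and B.
toy_a = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2]
toy_b = [0.9, 0.7, 0.8, 0.5, 0.6, 0.9, 0.4, 0.8, 0.7, 0.6]
toy_c = [(x + y) / 2 for x, y in zip(toy_a, toy_b)]
toy_f3, toy_z = f3_z(toy_c, toy_a, toy_b)
```

For an exact mixture every per-SNP term equals −0.25(a − b)², so the toy estimate is negative by construction; real data require many more SNPs and blocks defined by genomic distance.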
Summary of population samples and data used in this study
| Population | Number of samples | Number passing QC | Platform | Collected region | Altitude (m) | Source | Symbol | Panel |
|---|---|---|---|---|---|---|---|---|
| Tibetan | 31 | 31 | Affy 6.0 | Qinghai (31) | ~4350 | Simonson et al. [ | TBN.Qinghai (42) | 1, 2 |
| Tibetan | 50 | 49 | Affy 6.0 | Lhasa (20), Shigatse (18), Qinghai (11) | >3000 | Peng et al. [ | ||
| Tibetan | 69 | 64 | Affy 6.0 | Lhasa (10), Chamdo (9), Nyingchi (9), Shannan (9) and Shigatse (25) | >3000 | Xu et al. [ | ||
| Sherpa | 61 | 55 | Affy 6.0 | Zhangmu Town, Shigatse (55) | ~3400 | This study | SHP.Zhangmu (55) | 1, 2 |
| Sherpa | 2 | 2 | NGS | Solo-Khumbu region, Nepal (2) | ~3800 | Jeong et al. [ | SHP.Khumbu (2) | 1, NGS panel |
| Sherpa | 69 | 49 | Illumina HO-Q | Solo-Khumbu region, Nepal (49) | ~3800 | Jeong et al. [ | SHP.Khumbu (49) | 2 |
| Sherpa | 5 | 5 | NGS | Zhangmu Town, Tibet (5) | ~3400 | Lu et al. [ | SHPseq (5) | NGS panel |
| Tibetan | 33 | 33 | NGS | Lhasa (3), Chamdo (6), Nagqu (3), Nyingchi (2), Shannan (7), and Shigatse (12) | >3000 | Lu et al. [ | TBNseq (33) | NGS panel |
| Han Chinese | 39 | 39 | NGS | Diverse regions in China (39) | <2500 | Lu et al. [ | HANseq (39) | NGS panel |
| Indian | 7 | 7 | NGS | Diverse regions in South Asia | <2500 | Chambers et al. [ | IND | NGS panel |
| 203 worldwide populations | 2345 | 2345 | Affy HumanOri | Worldwide regions (2345) | - | Patterson et al. [ | Followed the original paper | 1, 2 |
| Tibetan | 118 | 118 | SNaPshot | Six prefectures in Tibet | >3000 | This study | - | Target-genotyping panel |
| Sherpa | 78 | 78 | SNaPshot | Zhangmu Town, Tibet | ~3400 | This study | - | Target-genotyping panel |
Included are both our newly generated genomes and other previously published samples. We assigned four different panels for distinct investigations: panels 1 and 2 comprised SNP array data, except the Nepalese Sherpas; the NGS panel contained enrolled NGS genomes; and the Target-genotyping panel was used to validate allele frequencies of interesting SNPs by enlarging the sample size. Subgroup symbols are classified according to their geographical locations (see also Fig. 1). Numbers in brackets are the counts of individuals after quality control with proportion of identity by descent (IBD) smaller than 3.5 and individual SNP missing rate less than 0.1. Abbreviations: Affy 6.0 Affymetrix Genome-wide Human SNP Array 6.0, Illumina HO-Q Illumina HumanOmni1-Quad beadchip, NGS next-generation sequencing, Affy HumanOri Affymetrix Axiom Genome-wide Human Origins 1 array
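The per-individual missing-rate criterion in this legend can be illustrated with a short sketch. It is an assumption-laden toy, not the authors' QC procedure: genotypes are represented as lists with `None` for a missing call, the function names are hypothetical, and the IBD-based filter is not modeled here.

```python
def missing_rate(genotypes):
    """Fraction of SNP calls that are missing (None) for one individual."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_individuals(samples, max_missing=0.1):
    """Keep individuals whose SNP missing rate is below max_missing."""
    return {
        name: calls
        for name, calls in samples.items()
        if missing_rate(calls) < max_missing
    }


# Toy, made-up genotype calls (0/1/2 = alternate allele count, None = missing).
toy_samples = {
    "ind1": [0, 1, 2, None, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    "ind2": [None, None, 0, 1],
}
toy_kept = filter_individuals(toy_samples)
```

Here `ind1` has 1 missing call out of 20 (rate 0.05) and is kept, while `ind2` has 2 out of 4 (rate 0.5) and is dropped.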
Fig. 5 The historical effective population size (Ne) and divergence time between SHP and TBN. Estimates of (a) Ne and divergence time between (b) SHP.Zhangmu and others and (c) SHP.Khumbu and others using MSMC. Ne was estimated using autosomal sequences of two genomes (four haplotypes) for each population. Divergence time between each pair of populations was evaluated using autosomal sequences of four genomes, i.e., two individuals for each population. An autosomal mutation rate (μAuto) of 1.25 × 10⁻⁸ per base pair per generation and a generation time (g) of 25 years were used
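The mutation rate and generation time quoted in this caption are what turn MSMC's scaled output into years and effective population sizes. The sketch below shows that arithmetic under the usual MSMC convention, stated here as an assumption: scaled time t converts to years as t / μ × g, a scaled coalescence rate λ converts to Ne as 1 / (2μλ), and the relative cross-coalescence rate between two populations is 2λ01 / (λ00 + λ11). Function names and the example inputs are hypothetical.

```python
MU_AUTO = 1.25e-8  # autosomal mutation rate per base pair per generation
GEN_YEARS = 25     # years per generation


def scaled_time_to_years(t_scaled, mu=MU_AUTO, gen_years=GEN_YEARS):
    """Convert a mutation-scaled time boundary to years."""
    return t_scaled / mu * gen_years


def lambda_to_ne(lam, mu=MU_AUTO):
    """Convert a scaled coalescence rate to effective population size."""
    if lam <= 0:
        raise ValueError("coalescence rate must be positive")
    return 1.0 / (2.0 * mu * lam)


def relative_cross_coalescence(lam_00, lam_01, lam_11):
    """Relative cross-coalescence rate: ~1 when populations are unseparated, ~0 when fully split."""
    return 2.0 * lam_01 / (lam_00 + lam_11)


# Made-up example inputs, chosen only to give round numbers.
example_years = scaled_time_to_years(1.25e-4)  # 10,000 generations -> 250,000 years
example_ne = lambda_to_ne(4000.0)              # -> Ne of 10,000
example_rccr = relative_cross_coalescence(2.0, 1.0, 2.0)
```

A divergence time is often read off as the point where the relative cross-coalescence rate falls to 0.5; whether the study used that threshold is not stated in this caption.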
Fig. 6 A proposed model of demographic history of SHP and TBN. A simplified model for the origins and evolutionary history of Tibetans and Sherpas based on the observations and estimations from this study. GF gene flow, MRCA most recent common ancestor. Dashed lines indicate gene flow events and arrows denote directions. MRCA1, MRCA2, and MRCA3 are based on Fig. 5b. We inferred GF1 from the treemix results (Additional file 1: Figures S31 and S32) and the observation that both SHP (mainly for Chinese Sherpa) and TBN contain an East Asian genetic component (EAC) (Fig. 3). GF2 was based on the excess EAC in TBN compared to SHP (Fig. 3; Additional file 1: Figure S27). Based on the f3 tests (Fig. 4b; Additional file 1: Figure S22) and the higher proportion of EAC in Kham and Amdo Tibetans (Fig. 3), we confirmed GF3. GF4 is based on Fig. 4b and Additional file 1: Figure S23 and the historical record that Sherpas migrated from the Kham region in eastern Tibet to Nepal within the last 300–400 years, possibly supporting the genetic contact between Khumbu Sherpas and Kham Tibetans. GF5 is based on the excess Sherpa genetic component in Ü-Tsang Tibetans compared to that in Kham and Amdo Tibetans (Fig. 3) and also on the results shown in Additional file 1: Figure S26. GF6 is based on Fig. 4a. The higher South Asian component in Chinese Sherpas compared to that in Nepalese Sherpas (Fig. 3) and the f3 statistics (Fig. 4a) validated the presence of GF7. Population substructures in both SHP and TBN are based on PCA (Fig. 2), ADMIXTURE (Fig. 3), FST (Additional file 1: Figures S4 and S5), outgroup f3 tests (Additional file 1: Figures S8–S10), and D statistics (Additional file 1: Figures S28–S30). Estimates of MRCA1, MRCA2, and MRCA3 are based on Fig. 5b and Additional file 1: S34
Selected putatively adaptive genetic variants in SHP
| Chrom | Position | rsID | Ref | Alt | Ances | DAF (SHPseq) | DAF (TBNseq) | DAF (HANseq) | DAF (TBN*) | DAF (SHP*) | DAF (SHPseq2) | DAF (ESA) | DAF (SAS) | DAF (AFR) | DAF (EUR) | DAF (AMR) | CADD | GERP | Gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 107691513 | rs28921397 | A | G | A | 0.5 | 0.0000 | 0.0000 | 0.0172 | 0.0789 | 0.0000 | 0.0029 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | | |
| 8 | 108264111 | NA | G | A | G | 0.5 | 0.0152 | 0.0000 | 0.0086 | 0.0724 | 0.0000 | - | - | - | - | - | | | |
| 17 | 19645417 | NA | T | A | T | 0.4 | 0.0000 | 0.0000 | 0.0000 | 0.0987 | 0.0000 | - | - | - | - | - | | | ALDH3A1 |
| 3 | 196921405 | rs527829647 | A | G | A | 0.3 | 0.0000 | 0.0000 | 0.0129 | 0.0789 | 0.0000 | 0.0011 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | | |
| 12 | 117768315 | rs549340789 | G | A | G | 0.3 | 0.0000 | 0.0000 | 0.0086 | 0.0855 | 0.0000 | 0.0022 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | | |
| 10 | 45956828 | rs3764990 | G | A | G | 0.4 | 0.0303 | 0.0256 | 0.0905 | 0.1974 | 0.2500 | 0.0528 | 0.0226 | 0.0143 | 0.0787 | 0.0293 | | | |
| 7 | 21948010 | rs200891942 | A | G | A | 0.3 | 0.0152 | 0.0000 | 0.0129 | 0.1053 | 0.0000 | 0.0010 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | | |
| 2 | 37232879 | rs2302657 | A | C | A | 0.3 | 0.0152 | 0.0128 | 0.0086 | 0.1067 | 0.2500 | 0.0325 | 0.0023 | 0.0000 | 0.0000 | 0.0000 | | | |
| 2 | 109545691 | rs61761321 | T | C | T | 0.3 | 0.0152 | 0.1795 | 0.0345 | 0.1447 | 0.0000 | 0.0910 | 0.0023 | 0.0012 | 0.0000 | 0.0000 | 10.27 | | |
| 1 | 91405998 | rs149597385 | C | T | C | 0.3 | 0.0152 | 0.0000 | 0.0259 | 0.1184 | 0.0000 | 0.0100 | 0.0012 | 0.0000 | 0.0000 | 0.0000 | | | |
| 1 | 156551628 | rs116035113 | G | T | G | 0.3 | 0.0152 | 0.0000 | 0.0474 | 0.1250 | 0.0000 | 0.0030 | 0.0010 | 0.0475 | 0.0125 | 0.0124 | | | |
| 2 | 46707674 | rs116983452 | C | T | C | 0.4 | 0.2121 | 0.0256 | 0.7241 | 0.6389 | 1.0000 | 0.0207 | 0.0035 | 0.0000 | 0.0000 | 0.0000 | 11.18 | | TMEM247 |
| 1 | 231557623 | rs186996510 | G | C | G | 0.1 | 0.5910 | 0.0385 | 0.5500 | 0.4800 | 0.25 | 0.0100 | 0.0020 | 0.0000 | 0.0000 | 0.0020 | 14.73 | | EGLN1 |
Among these adaptive genetic variants (AGVs), ten (top 10) showed differences between SHP and TBN, and two (rs116983452 in TMEM247 near the EPAS1 region and rs186996510 in EGLN1) had similar derived allele frequencies (DAFs), suggesting that both distinct and shared genetic adaptations occurred between TBN and SHP. Conservation scores with CADD >15 and GERP >2 are highlighted in bold. NA denotes that the variant is novel and has no current rsID. The physical position of each site follows GRCh37. The p value for each candidate was estimated by simulation based on the demographic history of SHP.Zhangmu estimated by MSMC. Chrom chromosome, Ref reference, Alt alteration, Ances ancestral, ESA East Asians, SAS South Asians, AFR Africans, EUR Europeans, and AMR Americans
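The derived allele frequency (DAF) columns follow from the Ref, Alt, and Ances columns: whichever allele is not ancestral is the derived one. A minimal sketch of that polarization step follows. It is an illustration under stated assumptions, not the authors' code; the function name is hypothetical and the allele counts in the examples are made up.

```python
def derived_allele_frequency(alt_count, n_chrom, ref, alt, ancestral):
    """Polarize an alternate-allele count into a derived allele frequency.

    Returns None when the ancestral allele matches neither Ref nor Alt,
    because the site cannot then be polarized.
    """
    if n_chrom <= 0 or not 0 <= alt_count <= n_chrom:
        raise ValueError("alt_count must lie between 0 and n_chrom")
    alt_freq = alt_count / n_chrom
    if ancestral == ref:
        return alt_freq          # Alt is the derived allele
    if ancestral == alt:
        return 1.0 - alt_freq    # Ref is the derived allele
    return None


# Illustrative counts only: 5 alternate alleles among 10 chromosomes at a
# site with Ref A, Alt G, and ancestral allele A gives a DAF of 0.5.
example_daf = derived_allele_frequency(5, 10, "A", "G", "A")
```

Comparing such values between two groups, for example the SHP and TBN columns of the table, is what distinguishes the ten differentiated variants from the two with similar DAFs.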
Fig. 7 Example of a putatively functional adaptive variant. A novel missense variant (chr17: 19645417) located in ALDH3A1 was selected as an example. (a) The derived allele frequency (DAF) of this SNP in SHP and TBN was estimated based on the Target-genotyping panel (Tables 1 and 2). (b) Median-joining network of ALDH3A1 showing a Sherpa-specific haplogroup. Haplotypes consisted of the missense variant and 30 randomly selected shared variants between SHP and non-SHP residing at the ALDH3A1 region with minor allele frequency (MAF) larger than 5%. The derived allele is specific to SHP in the SHP-specific haplogroup. (c) Positive selection signals of extended haplotype homozygosity (EHH) and Integrated Haplotype Score (iHS). Analyses in (b) and (c) are based on 55 imputed genomes of Zhangmu Sherpas. (d) Functional consequences of the missense variant
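Extended haplotype homozygosity (EHH) is the probability that two randomly chosen chromosomes carrying the same core allele are identical across the whole interval from the core site to a given marker. The sketch below computes that quantity from phased haplotypes. It is a toy illustration rather than the tool used in the study: haplotypes are assumed to be equal-length strings of '0'/'1' that all carry the same core allele, the function name is hypothetical, and the example haplotypes are made up.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, marker):
    """EHH from the core site to a marker (both given as string indices).

    haplotypes: phased haplotypes, all assumed to carry the same core allele.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, marker))
    # Group chromosomes by their extended haplotype over the interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Identical pairs within each group, divided by all possible pairs.
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


# Toy, made-up haplotypes; the core site is index 0.
toy_haps = ["1100", "1100", "1101", "1011"]
toy_decay = [ehh(toy_haps, 0, m) for m in range(4)]
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the haplotypes; iHS then compares the integrated decay curves of the derived and ancestral core alleles.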