Yun-Hua Lo1, Hsueh-Chien Cheng1, Chia-Ni Hsiung2, Show-Ling Yang2, Han-Yu Wang1, Chia-Wei Peng1, Chun-Yu Chen1, Kung-Ping Lin1, Mei-Ling Kang1, Chien-Hsiun Chen2, Hou-Wei Chu2, Chiao-Feng Lin3, Mei-Hsuan Lee4, Quintin Liu5, Yoko Satta5, Cheng-Jui Lin6, Marie Lin6, Shu-Miaw Chaw7, Jun-Hun Loo6, Chen-Yang Shen2, Wen-Ya Ko1.
Abstract
The Taiwanese people are composed of diverse indigenous populations and the Taiwanese Han. About 95% of the Taiwanese identify themselves as Taiwanese Han, but this may not be a homogeneous population because they migrated to the island from various regions of continental East Asia over a period of 400 years. Little is known about the underlying patterns of genetic ancestry, population admixture, and evolutionary adaptation in the Taiwanese Han people. Here, we analyzed the whole-genome single-nucleotide polymorphism genotyping data from 14,401 individuals of Taiwanese Han collected by the Taiwan Biobank and the whole-genome sequencing data for a subset of 772 people. We detected four major genetic ancestries with distinct geographic distributions (i.e., Northern, Southeastern, Japonic, and Island Southeast Asian ancestries) and signatures of population mixture contributing to the genomes of Taiwanese Han. We further scanned for signatures of positive natural selection that caused unusually long-range haplotypes and elevations of hitchhiked variants. As a result, we identified 16 candidate loci in which selection signals can be unambiguously localized at five single genes: CTNNA2, LRP1B, CSNK1G3, ASTN2, and NEO1. Statistical associations were examined in 16 metabolic-related traits to further elucidate the functional effects of each candidate gene. All five genes appear to have pleiotropic connections to various types of disease susceptibility and significant associations with at least one metabolic-related trait. Together, our results provide critical insights for understanding the evolutionary history and adaptation of the Taiwanese Han population.
Keywords: Taiwanese Han; adaptation; admixture; ancestry; natural selection
Year: 2021 PMID: 33170928 PMCID: PMC8476137 DOI: 10.1093/molbev/msaa276
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Inferred genetic ancestries in the Sino-Tibetan people and their neighboring populations in East Asia. (A) Admixture results for the Sino-Tibetan and their neighboring populations. Each individual is indicated by a vertical line, which is subdivided into K (=13) colored segments, where K is the number of ancestral populations assumed in the analysis. The y-axis represents the estimated ancestry proportions. Ethnicity names are labeled on the x-axis. The abbreviations of all populations are given in supplementary table S4, Supplementary Material online. The ADMIXTURE analysis was performed across 99 Asian populations for a total of 19,290 SNPs in 2,304 individuals, but only the Sino-Tibetan populations and several neighboring populations from the Altaic, Turkic, Tungusic, Mongolic, Koreanic, and Japonic linguistic groups, as well as two Taiwanese Austronesian populations, Ami (AX-AM) and Atayal (AX-AT), are shown. (B) Average proportions of ancestry of these populations. (C) Geographic distributions of the four major ancestries of the Taiwanese Han are shown for the populations with average proportions ≥0.35 in each ancestry. The genetic ancestries for the remaining populations are provided in supplementary figure S1, Supplementary Material online.
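Panels (B) and (C) rest on a simple summary of the per-individual ancestry proportions: average them within each population, then keep the populations whose average reaches 0.35 for a given ancestry. A minimal Python sketch of that summary follows. The function names, the population labels, and the input layout (one list of K proportions per individual) are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def mean_ancestry(labels, q_rows):
    """Average per-individual ancestry proportions within each population.

    labels: population label of each individual (hypothetical names).
    q_rows: one list of K ancestry proportions per individual.
    """
    sums = {}
    counts = defaultdict(int)
    for pop, q in zip(labels, q_rows):
        if pop not in sums:
            sums[pop] = [0.0] * len(q)
        sums[pop] = [s + v for s, v in zip(sums[pop], q)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


def populations_above(means, component, cutoff=0.35):
    """Populations whose mean proportion for one ancestry is >= cutoff."""
    return sorted(pop for pop, m in means.items() if m[component] >= cutoff)


# Toy input: two individuals from "P1", one from "P2", with K = 2.
labels = ["P1", "P1", "P2"]
q_rows = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
means = mean_ancestry(labels, q_rows)
print(populations_above(means, component=0))
print(populations_above(means, component=1))
```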
Fig. 2. Geographic distributions of populations with admixture signatures with the Taiwanese Han or Chinese Han, and the inferred ancestries of 14,401 individual genomes in the Taiwan Biobank. (A) Geographic distribution of populations in the F3 tests of the form F3(Taiwanese Han; Ami, pop), where the Taiwanese Han (recipient population) is labeled in blue and the Ami (donor population) is labeled in green to represent the Island Southeast Asian (ISEA) ancestry. (B) Geographic distribution of populations in the F3 tests of the form F3(Chinese Han; CN-HM, pop), where the recipient population is the Chinese Han and the donor population is the Chinese Hmong (CN-HM), representing the Southeastern ancestry. (C) Admixture results across 52 populations for a total of 101,959 SNPs (after removing all populations from the Pan-Asia SNP data set).
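For readers unfamiliar with the F3 notation in this caption: F3(C; A, B) is conventionally the average over SNPs of (c − a)(c − b), where a, b, and c are the allele frequencies in populations A, B, and C. A significantly negative value is evidence that C carries ancestry related to both A and B. The Python sketch below illustrates the statistic on toy frequencies. It omits the finite-sample bias correction and the standard-error estimation that real analyses apply, and the function name and input values are assumptions for illustration.

```python
def f3(target, source1, source2):
    """Plain F3(target; source1, source2) from per-SNP allele frequencies.

    Each argument is a list of allele frequencies at the same SNPs.
    No finite-sample correction is applied (illustrative sketch only).
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    if not target:
        raise ValueError("at least one SNP is required")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Toy case: the target sits exactly halfway between the two sources,
# as expected for an equal mixture, so every product is negative.
source_a = [0.1, 0.9, 0.2]
source_b = [0.9, 0.1, 0.8]
mixture = [0.5, 0.5, 0.5]
print(f3(mixture, source_a, source_b) < 0)
```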
F4 Test of Population Mixture for the Sino-Tibetan Speaking Populations.
| Pop1 (A) | Pop2 (B) | Pop3 (C) | Pop4 (D) | F4 | Z score | P value | Adjusted P value |
|---|---|---|---|---|---|---|---|
| Yoruba | AX-AM | SG-CH | TWB | 0 | 0.49 | 0.49 | 0.49 |
| Yoruba | AX-AM | CN-GA | TWB | 0.0011 | 0.22 | 0.22 | 0.24 |
| Yoruba | AX-AM | Tujia | TWB | 0.0062 | 3.2 | 0.00078 | 0.0010 |
| Yoruba | AX-AM | Han | TWB | 0.0066 | 5.3 | 7.0 × 10−8 | 1.0 × 10−7 |
| Yoruba | AX-AM | Lahu | TWB | 0.0144 | 5.8 | 4.2 × 10−9 | 7.3 × 10−9 |
| Yoruba | AX-AM | CHB | TWB | 0.01 | 7.7 | 6.8 × 10−15 | 1.4 × 10−14 |
| Yoruba | AX-AM | CN-SH | TWB | 0.015 | 8.8 | <6.8 × 10−15 | <1.4 × 10−14 |
| Yoruba | AX-AM | Yizu | TWB | 0.020 | 9.3 | <6.8 × 10−15 | <1.4 × 10−14 |
| Yoruba | AX-AM | Naxi | TWB | 0.023 | 9.3 | <6.8 × 10−15 | <1.4 × 10−14 |
| Yoruba | AX-AM | TH-KA | TWB | 0.026 | 11.7 | <6.8 × 10−15 | <1.4 × 10−14 |
| Yoruba | AX-AM | IN-TB | TWB | 0.0823 | 35.3 | <6.8 × 10−15 | <1.4 × 10−14 |
Note.—F4 was conducted by assuming F4(A, B; C, D) where the four populations are related by the unrooted population tree ((A, B), (C, D)). Population abbreviations are: AX-AM, Ami; SG-CH, Singapore Chinese; CN-GA, Chinese Cantonese; CHB, Chinese Han in Beijing; CN-SH, Chinese Han in Shanghai; and TH-KA, Thailand Karen.
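As a reading aid for the table above: F4(A, B; C, D) is conventionally the average over SNPs of (a − b)(c − d), which is expected to be zero when the unrooted tree ((A, B), (C, D)) holds without gene flow between the two pairs. Its significance is usually judged with a block-jackknife Z score. The Python sketch below uses equal-sized blocks of consecutive SNPs, a simplification of the weighted jackknife used by standard tools. The function names, block layout, and toy frequencies are assumptions.

```python
import math


def f4(a, b, c, d):
    """Plain F4(A, B; C, D): mean of (a - b) * (c - d) over SNPs."""
    terms = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(terms) / len(terms)


def f4_z_score(a, b, c, d, n_blocks):
    """F4 estimate and Z score from a delete-one-block jackknife.

    SNPs are split into n_blocks equal blocks in the given order;
    trailing SNPs that do not fill a block are dropped.
    """
    terms = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    size = len(terms) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    terms = terms[: size * n_blocks]
    total = sum(terms)
    estimate = total / len(terms)
    # Re-estimate F4 with each block left out in turn.
    leave_one_out = []
    for i in range(n_blocks):
        block_sum = sum(terms[i * size:(i + 1) * size])
        leave_one_out.append((total - block_sum) / (len(terms) - size))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, z


# Toy allele frequencies at six SNPs (made-up numbers).
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
b = [0.2, 0.1, 0.5, 0.3, 0.7, 0.4]
c = [0.3, 0.3, 0.4, 0.6, 0.5, 0.9]
d = [0.1, 0.4, 0.2, 0.5, 0.2, 0.8]
estimate, z = f4_z_score(a, b, c, d, n_blocks=3)
print(estimate, z)
```

In the table's setting, A would be Yoruba, B the Ami (AX-AM), C the comparison population, and D the Taiwan Biobank sample, with genome-wide SNPs grouped into contiguous blocks.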
Fig. 3. Genome-wide scans of signatures of recent positive selection for the Taiwanese Han people. The Manhattan plot demonstrates the |iHS| scores across 22 autosomes and the X chromosome for a total of 562,982 SNPs in 14,401 Taiwanese Han people. The threshold is set at the highest |iHS| score (4.18) of EDAR, a well-studied gene targeted by recent positive selection in the Han Chinese (Sabeti et al. 2007; Grossman et al. 2010; Kamberov et al. 2013). The selection-candidate SNP clusters are colored in red. Gene symbols of the selection-candidate loci are shown if the underlying candidate genes can be unambiguously identified by iSAFE.
List of Candidate Regions Targeted by Positive Selection in the Taiwanese Han Population.
| Chr | Site (Mb) | SNP | \|iHS\| | Freq. | #SNP | Gene | SNPiSAFE | iSAFE |
|---|---|---|---|---|---|---|---|---|
| 2 | 80.2–80.5 | rs10496236 | 5.15 | 0.55 | 18 | CTNNA2 | rs17018689 | 0.18 |
| 2 | 108.8–109.8 | rs17034770 | 4.18 | 0.89 | 10 | | rs1469965 | 0.72 |
| 2 | 141.5–141.9 | rs79810070 | 5.50 | 0.05 | 19 | LRP1B | rs17516755 | 0.20 |
| 2 | 163.8–164.3 | rs61158130 | 5.96 | 0.08 | 29 | Intergenic | rs10167931 | 0.21 |
| 3 | 13.6–14.3 | rs873853 | 6.16 | 0.56 | 10 | Multiple genes | rs17038710 | 0.09 |
| 4 | 18.8–19.6 | rs73803337 | 4.73 | 0.10 | 14 | Intergenic | rs1382157 | 0.16 |
| 5 | 111.3–111.8 | rs59969240 | 6.42 | 0.07 | 27 | Multiple genes | rs351772 | 0.40 |
| 5 | 122.6–123.1 | rs4572998 | 4.31 | 0.13 | 22 | CSNK1G3 | rs6868518 | 0.31 |
| 6 | 28.5–33.1 | rs9262558 | 7.51 | 0.11 | 325 | | — | — |
| 6 | 83.0–83.7 | rs287848 | 4.55 | 0.31 | 43 | Intergenic | rs992013 | 0.25 |
| 6 | 107.6–108.0 | rs79851990 | 5.94 | 0.17 | 15 | | rs7767511 | 0.23 |
| 7 | 133.3–134.0 | rs9656434 | 4.91 | 0.49 | 19 | | rs992013 | 0.15 |
| 9 | 3.9–4.1 | rs4741879 | 5.50 | 0.22 | 20 | | rs72685692 | 0.11 |
| 9 | 119.0–119.3 | rs10983123 | 5.99 | 0.19 | 17 | ASTN2 | rs888401 | 0.19 |
| 14 | 35.6–36.0 | rs10483453 | 9.56 | 0.13 | 25 | | rs10144857 | 0.21 |
| 15 | 72.8–73.8 | rs9806341 | 6.12 | 0.46 | 25 | NEO1 | rs8039418 | 0.41 |
Note.—“SNP” is the rs ID of the core SNP with the highest |iHS| score in a given candidate region. “Freq.” is the derived-allele frequency of the core SNP. “#SNP” is the number of SNPs with |iHS| ≥ 2.66 (top 1%). “SNPiSAFE” is the rs ID of the SNP with the highest iSAFE score. “iSAFE” is the iSAFE value of each SNPiSAFE. “Chr” represents chromosome.
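The “#SNP” column combines two steps: find the |iHS| value that marks the top 1% of all scanned SNPs (2.66 in this data set), then count the SNPs inside a candidate window that reach it. A minimal Python sketch of both steps follows. The function names, the window coordinates, and the toy scores are assumptions for illustration, not the authors' code.

```python
def top_fraction_threshold(scores, fraction=0.01):
    """Smallest |score| that still belongs to the top `fraction` of SNPs."""
    ranked = sorted((abs(s) for s in scores), reverse=True)
    if not ranked:
        raise ValueError("no scores given")
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def count_extreme_snps(positions_mb, scores, start_mb, end_mb, threshold):
    """Count SNPs in [start_mb, end_mb] whose |score| is >= threshold."""
    return sum(
        1
        for pos, score in zip(positions_mb, scores)
        if start_mb <= pos <= end_mb and abs(score) >= threshold
    )


# Toy window on one chromosome, using the 2.66 cutoff quoted in the note.
positions_mb = [80.1, 80.3, 80.4, 80.6]
ihs_scores = [3.0, -2.7, 2.5, 5.0]
print(count_extreme_snps(positions_mb, ihs_scores, 80.2, 80.5, 2.66))
```

Taking absolute values matters because iHS is signed, with the sign indicating whether the ancestral or the derived allele sits on the longer haplotypes.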
Fig. 4. Plots of |iHS| and iSAFE scores and LD heat maps of five selection-targeted genes in the Taiwanese Han population. The |iHS| and iSAFE scores were plotted against each of the five selection-candidate loci where the selection-targeted gene can be unambiguously identified (based on the iSAFE signals). In each iSAFE plot, the point size and color gradient represent C scores that were estimated to profile the degree of functional importance (deleteriousness) according to Kircher et al. (2014) and Rentzsch et al. (2019). The heat map demonstrates the pairwise estimates of LD. Each pixel represents a pairwise LD estimate using the squared correlation coefficient scaled by allele frequency (r²). All possible pairs of polymorphic sites were measured. Levels of LD ranging from 0 to 1 are illustrated according to a white to red color gradient. The physical position of each polymorphic site is marked by a black line segment above the heat map, which is aligned with the plot of gene structures (based on the GRCh37/hg19, UCSC genome browser). The plots for the remaining candidate loci are presented in supplementary figure S6, Supplementary Material online.
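The pairwise LD measure in the heat map can be illustrated with the standard definition r² = D² / (p_A (1 − p_A) p_B (1 − p_B)), where D = p_AB − p_A p_B is the difference between the observed haplotype frequency and its expectation under independence. The Python sketch below assumes phased haplotypes coded 0/1 at two sites. The function name and toy data are illustrative, and the authors' software may estimate r² from unphased genotypes instead.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic sites.

    hap_a, hap_b: alleles (0 or 1) carried by the same haplotypes
    at site A and site B, in the same order.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(x * y for x, y in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Perfectly linked sites versus statistically independent sites.
print(ld_r2([0, 1, 1, 0], [0, 1, 1, 0]))
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```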
List of SNPs and Associated Metabolic-Related Traits in the Five Genes Targeted by Positive Natural Selection.
| Gene | SNP | Chr | Position | Trait | Reg. Cof. | P value | iSAFE | Imp. |
|---|---|---|---|---|---|---|---|---|
| CTNNA2 | 2_80464202 | 2 | 80464202 | Albumin | −1.25 | 9.3 × 10−5 | — | 0.54 |
| | rs554504577 | 2 | 80362623 | | −0.91 | 0.0026 | — | 0.51 |
| | rs17018689 | 2 | 80373740 | | 0.043 | 0.071 | 0.19 | 0.99 |
| LRP1B | rs186045033 | 2 | 141638598 | LDLC | 0.20 | 3.3 × 10−5 | — | 0.87 |
| | rs185095358 | 2 | 141631133 | | 0.20 | 3.3 × 10−5 | — | 0.87 |
| | rs144464547 | 2 | 141580213 | SGOT | −0.69 | 5.2 × 10−5 | — | 0.91 |
| CSNK1G3 | 5_123001857 | 5 | 123001857 | HbA1C | −1.09 | 3.0 × 10−5 | — | 0.81 |
| | 5_122978454 | 5 | 122978454 | | −1.09 | 3.0 × 10−5 | — | 0.89 |
| | rs79451111 | 5 | 122983696 | TG | 1.19 | 3.8 × 10−5 | — | 0.52 |
| | rs6868518 | 5 | 122838766 | BUN | 0.021 | 0.069 | 0.31 | 1.00 |
| ASTN2 | rs564508867 | 9 | 119135159 | FBG | −0.69 | 1.0 × 10−4 | — | 0.73 |
| | rs888401 | 9 | 119207606 | Albumin | −0.028 | 0.028 | 0.19 | 1.00 |
| | | | | T-BIL | 0.024 | 0.054 | | |
| NEO1 | 15_73481424 | 15 | 73481424 | SGOT | 0.58 | 1.2 × 10−5 | — | 0.85 |
| | rs146077526 | 15 | 73424172 | | 0.32 | 2.4 × 10−4 | — | 0.93 |
| | 15_73587033 | 15 | 73587033 | Creatinine | 0.72 | 8.6 × 10−5 | — | 0.69 |
| | rs8039418 | 15 | 73441432 | BUN | 0.035 | 0.0023 | 0.41 | 0.99 |
| | | | | UA | 0.019 | 0.053 | | |
Note.—Multiple linear regressions were conducted for the iSAFE peak region (iSAFE ≥ 0.1) in each of the five genes across 16 metabolic-related traits. The SNP with an iSAFE score is the SNP of the highest iSAFE for a given candidate gene. The listed traits are albumin (g/dl), low-density lipoprotein cholesterol (LDLC, g/dl), serum level of aspartate aminotransferase (SGOT, U/l), hemoglobin A1c (HbA1C, %), triglyceride (TG, mg/dl), blood urea nitrogen (BUN, mg/dl), fasting blood glucose (FBG, mg/dl), total bilirubin (T-BIL, mg/dl), creatinine (mg/dl), and uric acid (UA, mg/dl). “Chr” represents chromosome. “Reg. cof.” represents regression coefficient. “Imp.” represents imputation posterior probability.
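The regression coefficients in this table come from multiple linear regression of a trait on SNP genotype plus other predictors. The Python sketch below shows the general idea with ordinary least squares solved through the normal equations, using only the standard library. The covariate (age), the additive 0/1/2 genotype coding, the function name, and all values are hypothetical; the note does not list the covariates the authors used, and the sketch does not compute P values.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...].

    rows: one list of predictor values per individual.
    y: trait value per individual.
    Solves (X^T X) beta = X^T y by Gauss-Jordan elimination.
    """
    x = [[1.0] + [float(v) for v in r] for r in rows]  # add intercept
    k = len(x[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    m = [
        [sum(xi[i] * xi[j] for xi in x) for j in range(k)]
        + [sum(xi[i] * yi for xi, yi in zip(x, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Hypothetical data: genotype dosage (0/1/2) and age as predictors.
predictors = [(0, 30), (1, 40), (2, 50), (0, 60), (1, 35), (2, 45)]
# Simulated trait with a known genotype effect of 0.5 and no noise.
trait = [4.0 + 0.5 * g + 0.01 * age for g, age in predictors]
intercept, genotype_effect, age_effect = ols(predictors, trait)
print(round(genotype_effect, 6))
```

The coefficient on the genotype term plays the role of the “Reg. Cof.” column: the expected change in the trait per additional copy of the tested allele, holding the other predictors fixed.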
Fig. 5. Multiple linear regression analyses of metabolic-related traits for five selection-candidate genes in the Taiwanese Han population. For each trait, significant variants (highlighted in red) were identified based on an LD-based clumping method (Purcell et al. 2007) within the selection-targeted region (colored in dark gray). The plots for the remaining metabolic traits are presented in supplementary figure S7, Supplementary Material online.