Hao Hu1, Nayia Petousi2, Gustavo Glusman3, Yao Yu1, Ryan Bohlender4, Tsewang Tashi5, Jonathan M Downie6, Jared C Roach3, Amy M Cole7, Felipe R Lorenzo5, Alan R Rogers4, Mary E Brunkow3, Gianpiero Cavalleri7, Leroy Hood3, Sama M Alpatty8, Josef T Prchal5,6, Lynn B Jorde6, Peter A Robbins9, Tatum S Simonson10, Chad D Huff1.
Abstract
The indigenous people of the Tibetan Plateau have been the subject of much recent interest because of their unique genetic adaptations to high altitude. Recent studies have demonstrated that the Tibetan EPAS1 haplotype is involved in high-altitude adaptation and originated in an archaic Denisovan-related population. We sequenced the whole genomes of 27 Tibetans and conducted analyses to infer a detailed history of the demography and natural selection of this population. We detected evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago. The CMS test ranked EPAS1 and EGLN1 as the top two positive selection candidates and, in addition, identified PTGIS, VDR, and KCTD12 as new candidate genes. The advantageous Tibetan EPAS1 haplotype shared many variants with the Denisovan genome, with an ancient gene-tree divergence between the Tibetan and Denisovan haplotypes of about 1 million years ago. With the exception of EPAS1, we observed no evidence of positive selection on Denisovan-like haplotypes.
Year: 2017 PMID: 28448578 PMCID: PMC5407610 DOI: 10.1371/journal.pgen.1006675
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. MSMC estimate of population histories for Tibetans and Han Chinese.
(A) The effective population size as a function of time in the past for Tibetans (red) and Han (blue), estimated using 4 Han and 4 Tibetan genomes, each with greater than 99% of its corresponding genetic ancestry. (B) The relative coalescence rate between Tibetans and Han as a function of time in the past, estimated using 1 Han and 1 Tibetan genome, each with greater than 99% of its corresponding genetic ancestry. Solid colors indicate the estimates from the actual data, and the corresponding lighter colors indicate the estimates from 20 bootstrapped datasets.
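Panel B's relative coalescence rate follows the MSMC definition: the coalescence rate between the two populations divided by the mean of the two within-population rates. A minimal sketch of that ratio, assuming per-time-segment rate estimates are already in hand (the function and argument names are hypothetical, not MSMC's own output format):

```python
from typing import Sequence


def relative_cross_coalescence(
    lambda_within_a: Sequence[float],
    lambda_within_b: Sequence[float],
    lambda_across: Sequence[float],
) -> list[float]:
    """Per-time-segment relative cross-coalescence rate, 2*l_ab / (l_aa + l_bb).

    Hypothetical helper: each argument holds one coalescence-rate estimate
    per MSMC time segment, ordered identically.
    """
    if not (len(lambda_within_a) == len(lambda_within_b) == len(lambda_across)):
        raise ValueError("all rate vectors must cover the same time segments")
    return [
        2.0 * ab / (aa + bb)
        for aa, bb, ab in zip(lambda_within_a, lambda_within_b, lambda_across)
    ]


# Two segments: identical rates in the first, reduced cross-coalescence in the second.
print(relative_cross_coalescence([1.0, 2.0], [1.0, 2.0], [1.0, 0.5]))
```

A value near 1 in a time segment suggests the two groups still behaved as a single population; values falling toward 0 indicate increasing separation.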
Fig 2. Illustration of the best-fitting ∂a∂i model.
Point estimate and 95% confidence interval of the parameters of the best-fitting ∂a∂i model (see Fig 2 for the explanation of the parameters).
| Parameter | Point estimate | Lower 95% CI | Upper 95% CI |
|---|---|---|---|
| nA1 | 12804 | 12619 | 13445 |
| nC1 | 500 | 412 | 607 |
| nC2 | 75203 | 70047 | 86703 |
| nC3 | 1326988 | 490163 | 5940902 |
| nT1 | 2445 | 2098 | 5251 |
| nT2 | 13292 | 10292 | 14033 |
| nT3 | 77743 | 63131 | 148702 |
| T1 (years) | 6355 | 4365 | 38991 |
| T2 (years) | 44588 | 35784 | 47330 |
| T3 (years) | 9419 | 8573 | 11166 |
| m11 (/gen/chrom) | 6.8E-04 | 5.8E-04 | 9.0E-04 |
| m12 (/gen/chrom) | 9.0E-04 | 8.0E-04 | 1.0E-03 |
| m21 (/gen/chrom) | 1.3E-11 | 8.6E-12 | 1.9E-11 |
| m22 (/gen/chrom) | 4.0E-07 | 2.3E-18 | 6.5E-06 |
Fig 3. Manhattan plot of the CMS score across the autosomes.
The x-axis represents the chromosome number, and each dot represents one SNV. A) All autosomal SNVs; B) SNVs that are present in the high-coverage reference Denisovan genome and in Tibetans but uncommon (MAF<5%) in Yoruba, Europeans, Native Americans and Asians. All CMS scores are negative; higher (less negative) scores correspond to stronger positive selection signals.
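The SNV subset in panel B can be expressed as a simple predicate. This sketch reads "uncommon" as MAF below 5% in each of the four reference groups separately; the `Snv` record, its field names and the population keys are hypothetical illustrations, not the authors' data format.

```python
from dataclasses import dataclass

# Hypothetical labels for the four reference groups named in the caption.
OUTGROUPS = ("Yoruba", "Europeans", "Native Americans", "Asians")


@dataclass
class Snv:
    """Hypothetical per-SNV record; field names are illustrative only."""
    chrom: int
    pos: int
    in_denisovan: bool               # allele seen in the high-coverage Denisovan genome
    tibetan_count: int               # copies of the allele among the Tibetan genomes
    outgroup_maf: dict[str, float]   # minor allele frequency per reference group


def is_panel_b_snv(snv: Snv, maf_cutoff: float = 0.05) -> bool:
    """True if the SNV is shared by the Denisovan genome and Tibetans
    but has MAF below the cutoff in every reference group."""
    if not snv.in_denisovan or snv.tibetan_count == 0:
        return False
    return all(snv.outgroup_maf[pop] < maf_cutoff for pop in OUTGROUPS)
```

Whether the 5% cutoff applies per group or to a pooled frequency is not stated in the caption; switching to a pooled reading would only change the final `all(...)` line.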
Top 10 regions with the highest CMS scores (hg19 coordinates), with adjacent 200-kb windows merged.
| Chr | Start | End | Highest CMS score | Genes in the region |
|---|---|---|---|---|
| 1 | 231200001 | 232000000 | -41.7 | |
| 2 | 46400001 | 46800000 | -46.4 | |
| 13 | 77200001 | 77600000 | -57.8 | |
| 21 | 37800001 | 38000000 | -58.5 | |
| 20 | 48000001 | 48200000 | -59.0 | |
| 7 | 35400001 | 35600000 | -59.8 | None |
| 20 | 1800001 | 2000000 | -59.9 | |
| 12 | 48200001 | 48400000 | -60.3 | |
| 10 | 101600001 | 101800000 | -61.0 | |
| 7 | 52400001 | 52800000 | -61.7 | None |
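Merging adjacent 200-kb windows into regions, as in the table above, can be sketched as follows. It assumes the input has already been restricted to candidate windows (for example, the top-scoring ones), since merging every genome-wide window would collapse whole chromosomes. The `Window` type is a hypothetical illustration; coordinates follow the table's 1-based, inclusive convention.

```python
from typing import NamedTuple


class Window(NamedTuple):
    chrom: int
    start: int    # 1-based, inclusive (e.g. 231200001)
    end: int      # inclusive (e.g. 231400000)
    score: float  # highest CMS score in the window (higher = stronger signal)


def merge_adjacent(windows: list[Window]) -> list[Window]:
    """Merge touching windows on the same chromosome, keeping the best score."""
    merged: list[Window] = []
    for w in sorted(windows, key=lambda x: (x.chrom, x.start)):
        last = merged[-1] if merged else None
        if last is not None and last.chrom == w.chrom and w.start == last.end + 1:
            # Extend the previous region and keep its highest (least negative) score.
            merged[-1] = Window(last.chrom, last.start, w.end, max(last.score, w.score))
        else:
            merged.append(w)
    return merged


def top_regions(windows: list[Window], n: int = 10) -> list[Window]:
    """Rank merged regions by their highest (least negative) CMS score."""
    return sorted(merge_adjacent(windows), key=lambda x: x.score, reverse=True)[:n]
```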
Fig 4. Haplotype map and CMS score in four genomic regions.
The upper bar plot shows the CMS value at each SNV. The middle plot shows the haplotype structure of the 27 Tibetan genomes in the region; each row represents one genome, and each column is one SNV aligned with its CMS score in the upper plot. Both red and green indicate variants that are uncommon (MAF<5%) in Yoruba, Europeans, Native Americans and Asians; variants shown in green must, in addition, be present in the high-coverage reference Denisovan genome. When applicable, the lower plot shows the gene model for the protein-coding gene in the region, with each vertical line representing one exon. a) VDR gene region (chr12: 48257328–48357328); b) EPAS1 gene region (chr2: 46533376–46792633). The arrowed block above the bar plot indicates a previously identified 32.7-kb region enriched for Denisovan variants; the dot above the bar plot indicates a previously identified deletion common in Tibetans (chr2: 46694276–46697683); c) EGLN1 gene region (chr1: 231457651–231657496); d) a 200-kb genomic region that presumably underwent no positive selection (chr21: 32800690–33000356, the 200-kb region with the median CMS score).
Over-represented GO terms within the top 0.2% CMS windows with q-value <0.05.
| GO term | description | q-value | Genes |
|---|---|---|---|
| GO:0050892 | intestinal absorption | | |
| GO:0071456 | cellular response to hypoxia | 0.0295875 | |
| GO:0036294 | cellular response to decreased oxygen levels | 0.0295875 | |
| GO:0071453 | cellular response to oxygen levels | 0.0295875 | |
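The caption does not state which over-representation test produced these q-values. A one-sided hypergeometric test is a common choice for GO enrichment and is sketched here as an assumption; the counts are defined in the docstring, and the resulting p-values would still need multiple-testing correction before being reported as q-values.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes covered by CMS windows)
    K: background genes annotated with the GO term
    n: genes falling in the top-scoring windows
    k: genes in the top windows that carry the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

Exact integer arithmetic via `math.comb` avoids underflow in the tail sum; only the final division is done in floating point.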
200-kb genomic regions with higher Denisovan ancestry in Tibetans than in Han Chinese in both the S* and D* tests.
| Chr | Start | End | Raw D-value | D* | p-value | q-value | Genes in the region |
|---|---|---|---|---|---|---|---|
| 7 | 26800001 | 27000000 | 0.86 | 6.25 | 6.1E-06 | 2.7E-02 | |
| 2 | 46400001 | 46600000 | 0.56 | 6.15 | 6.1E-06 | 2.7E-02 | |
| 2 | 47600001 | 47800000 | 0.65 | 5.92 | 6.1E-06 | 2.7E-02 | |
| 5 | 200001 | 400000 | 0.52 | 4.99 | 6.1E-05 | 1.4E-01 | |
| 12 | 8800001 | 9000000 | 0.63 | 4.63 | 1.2E-04 | 1.7E-01 | |
| 4 | 100400001 | 100600000 | 0.48 | 4.58 | 1.2E-04 | 1.7E-01 | |
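The source does not say how the q-values in this table were computed. Assuming a Benjamini-Hochberg adjustment across all tested windows, the conversion from p-values to q-values looks like the minimal sketch below; the function name is hypothetical. Under this procedure tied p-values receive the same q-value.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```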
Fig 5. Timeline of important evolutionary events in the demographic history of Tibetans.
The horizontal axis represents the estimated time of events (in years) in the past. The age of the EGLN1 D4E mutation is based on previous estimates.