Christina A Eichstaedt, Luca Pagani, Tiago Antao, Charlotte E Inchley, Alexia Cardona, Alexander Mörseburg, Florian J Clemente, Timothy J Sluckin, Ene Metspalu, Mario Mitt, Reedik Mägi, Georgi Hudjashov, Mait Metspalu, Maru Mormina, Guy S Jacobs, Toomas Kivisild.
Abstract
The aim of this study is to identify genetic variants that harbour signatures of recent positive selection and may facilitate physiological adaptations to hypobaric hypoxia. To achieve this, we conducted whole genome sequencing and lung function tests in 19 Argentinean highlanders (>3500 m) comparing them to 16 Native American lowlanders. We developed a new statistical procedure using a combination of population branch statistics (PBS) and number of segregating sites by length (nSL) to detect beneficial alleles that arose since the settlement of the Andes and are currently present in 15–50% of the population. We identified two missense variants as significant targets of selection. One of these variants, located within the GPR126 gene, has been previously associated with the forced expiratory volume/forced vital capacity ratio. The other novel missense variant mapped to the EPAS1 gene encoding the hypoxia inducible factor 2α. EPAS1 is known to be the major selection candidate gene in Tibetans. The derived allele of GPR126 is associated with lung function in our sample of highlanders (p < 0.05). These variants may contribute to the physiological adaptations to hypobaric hypoxia, possibly by altering lung function. The new statistical approach might be a useful tool to detect selected variants in population studies.
Year: 2017 PMID: 29026132 PMCID: PMC5638799 DOI: 10.1038/s41598-017-13382-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. MSMC plot, effective population sizes and split times of Collas, Native American lowlanders, Siberians and Africans. Effective population size (Ne) and split time estimates are based on 1, 2 or 4 genomes for the Native Americans and on 2 and 4 genomes for the other populations. Coloured dots show where the MSMC curves based on 1, 2 or 4 genomes were joined together to provide a comprehensive representation of the changes in Ne over time for each of the analysed populations. Around 100,000 years ago, the out-of-Africa exit starts to reduce population size in non-Africans. Yoruba show a limited decline as they remain in Africa. Ne of Collas rises up to 30,000 units 3,000 years ago. Andean highlanders: Collas; Calchaquíes (Cachi): intermediate altitude population in Argentina (2300 m); American lowlanders: Mexican, Wichí; Siberians: Eskimo, Koryaks, Chukchi.
Top 1% PBS alleles located within top 1% nSL windows.
| Chr: position | Identifier | Allele frequency Collas | Allele frequency Lowlanders | Allele frequency Siberians | 1000 Genomes DAF | Impact | Associated gene | PBS p-value |
|---|---|---|---|---|---|---|---|---|
| 11:67209515 | rs111451405 | 74% | 44% | 3% | 4% | syn | | 0.001 |
| 11:67072382 | rs61731786 | 71% | 44% | 3% | 7% | syn | | 0.002 |
| 11:67076748 | rs12417770 | 71% | 44% | 3% | 7% | intronic | | 0.002 |
| 11:67159257 | rs17880138 | 71% | 44% | 3% | 7% | promoter | | 0.002 |
| 11:67165015 | rs2066494 | 71% | 44% | 3% | 6% | syn | | 0.002 |
| 11:67057599 | rs1574103 | 71% | 47% | 3% | 4% | syn | | 0.005 |
| 6:159062992 | rs882735 | 63% | 38% | 0% | 2% | intronic | | 0.006 |
| 20:2840929 | rs2297048 | 58% | 25% | 18% | 4% | syn | | 0.008 |
| 20:2844911 | rs2274671 | 58% | 25% | 18% | 5% | syn | | 0.008 |

Chr = chromosome, DAF = derived allele frequency, syn = synonymous; CADD score for all SNPs > 5.
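
As an illustration of the PBS component of the scan, the sketch below computes a population branch statistic for a single SNP from three allele frequencies, using the standard definition PBS = (T_AB + T_AC − T_BC) / 2 with T = −ln(1 − FST). It rests on several assumptions that are not stated in this record: FST is computed with a simple Nei-style (Ht − Hs) / Ht estimator without sample-size correction, Collas are taken as the focal population with lowlanders as sister group and Siberians as outgroup, and all function names are hypothetical. It is not the authors' pipeline and does not reproduce the p-values in the table.

```python
import math


def nei_fst(p1, p2):
    """Nei-style Fst = (Ht - Hs) / Ht for one biallelic SNP.

    Assumption: equal population weights, no sample-size correction.
    """
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                    # pooled heterozygosity
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    if ht == 0:
        return 0.0  # SNP is monomorphic in both populations
    return (ht - hs) / ht


def branch_length(fst):
    """Transform Fst into a divergence time estimate T = -ln(1 - Fst)."""
    fst = min(fst, 1 - 1e-9)  # cap to avoid log(0) for fixed differences
    return -math.log(1 - fst)


def pbs(p_focal, p_sister, p_outgroup):
    """Population branch statistic for the focal population."""
    t_fs = branch_length(nei_fst(p_focal, p_sister))
    t_fo = branch_length(nei_fst(p_focal, p_outgroup))
    t_so = branch_length(nei_fst(p_sister, p_outgroup))
    return (t_fs + t_fo - t_so) / 2


# Frequencies of the first table row (Collas 74%, lowlanders 44%, Siberians 3%).
collas_pbs = pbs(0.74, 0.44, 0.03)
lowland_pbs = pbs(0.44, 0.74, 0.03)
print(collas_pbs, lowland_pbs)
```

Swapping the focal and sister populations, as in the last lines, shows how the statistic assigns the excess differentiation to one branch: a variant that rose in frequency in Collas gives a longer Collas branch than lowlander branch.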
Figure 2. Distribution of nSL + PBS and power calculations. (a) Observed joint distribution of nSL and p(PBS|DAF) of all exonic SNPs (blue), with top-1% outliers coloured teal (not missense), orange (missense) or red with black border (missense; the 11 highly damaging variants). (b) Simulated joint distribution of nSL and p(PBS|DAF), with X- and Y-axes as in (a). Results from neutral scenarios are shown in blue, with those from selected scenarios (s = 0.02) in red. Note the clustering of selected SNPs in the low-p(PBS|DAF), low-nSL regime, corresponding to that detected by our PBS + nSL selection statistic; also note the especial clustering of most damaging missense variants in (a) in this region of the plot. (c) Power from the simulations of the three statistics to detect selection, with selected variants with a p-value ≤ 0.0062 classified as successful detections. X- and Y-axes are derived allele frequency and power, respectively.
Top PBS + nSL missense hits based on intermediate derived allele frequencies between 15–50%.
| Chr: Position | Identifier^a | Ref | Alt | Allele frequency Collas | Allele frequency Lowlanders | Allele frequency Siberians | 1000 Genomes DAF | Gene | Amino acid change | PBS + nSL score |
|---|---|---|---|---|---|---|---|---|---|---|
| 8:24349417 | rs3736281 | T | C | 47% | 6% | 11% | 2.8% | | I453T | 0.000 |
| 6:142688969 | rs17280293 | A | G | 50% | 3% | 8% | 2.8% | | S123G | 0.000 |
| 21:33689199 | rs762225 | G | C | 34% | 3% | 5% | 1.4% | | P2071R | 0.001 |
| 1:40766943 | rs140041506 | G | T | 37% | 6% | 5% | 0.7% | | P661T | 0.001 |
| 14:58958923 | rs3783697 | A | G | 29% | 6% | 32% | 2.4% | | D1263G | 0.001 |
| 14:33165230 | rs543290190 | T | C | 16% | 0% | 0% | 0.1%^b | | W972R | 0.002 |
| 3:77629200 | rs188582283 | C | T | 24% | 0% | 0% | 0.9% | | R811W | 0.002 |
| 2:46588031 | rs570553380 | A | G | 32% | 6% | 0% | 0.2%^c | | H194R | 0.004 |
| 6:76631823 | rs76604824 | C | T | 24% | 3% | 3% | 4.6% | | D793N | 0.004 |
| 22:30888494 | rs17738527 | C | T | 50% | 16% | 0% | 14.4% | | E211K | 0.005 |
| 16:3708170 | rs2791 | C | T | 24% | 0% | 0% | 1.7% | | R692H | 0.006 |

^a All variants were classed as possibly/probably damaging in PolyPhen 2 and damaging in SIFT and had a CADD score > 17; for values see Supplementary Table S1.
^b Only found in 4 heterozygote Peruvians from Lima out of 85 individuals.
^c Only found in 12 heterozygote Peruvians from Lima out of 85 individuals; lowland allele frequency strongly influenced by 20% allele frequency in Calchaquíes (n = 2).
Chr = chromosome, Ref = reference allele, Alt = alternative allele, DAF = derived allele frequency.
Figure 3. FEV1/FVC in Collas sorted by derived allele frequency. The ratio increases with the number of derived alleles, indicating an improvement of lung function in Collas. One-tailed p-value = 0.029, r² = 0.195; FEV1: forced expiratory volume in the first second, FVC: forced vital capacity.
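
The kind of genotype-phenotype association summarised in Figure 3 can be sketched as a regression of FEV1/FVC on the number of derived alleles (0, 1 or 2) per individual. This record does not say which test produced the one-tailed p-value, so the example below substitutes a one-sided permutation test as an assumption. The function names are hypothetical, and the individual-level data are not part of this record, so no real values are used.

```python
import random


def _sums(x, y):
    """Return centred cross-product and sums of squares for x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def slope(x, y):
    """Least-squares slope of y on x (change in ratio per derived allele)."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    sxy, sxx, syy = _sums(x, y)
    return sxy * sxy / (sxx * syy)


def one_tailed_perm_p(x, y, n_perm=10000, seed=0):
    """One-sided permutation p-value for a positive slope.

    Assumption: phenotypes are exchangeable across genotypes under the null.
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so float rounding does not drop exact ties
        if slope(x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

Calling `r_squared(allele_counts, ratios)` and `one_tailed_perm_p(allele_counts, ratios)` on per-individual derived allele counts and FEV1/FVC ratios would give quantities analogous to the r² and one-tailed p-value in the caption, although the values need not match those obtained with the authors' test.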