Andrey V Khrunin1, Gennady V Khvorykh1, Alexei N Fedorov1,2, Svetlana A Limborska1.
Abstract
Natural selection of beneficial genetic variants played a critical role in human adaptation to a wide range of environmental conditions. Northern Eurasia, despite its severe climate, is home to many ethnically diverse populations. The genetic variants associated with the survival of these populations have hardly been analyzed. We searched for the genomic signatures of positive selection in (1) the genome-wide microarray data of 432 people from eight different northern Russian populations and (2) the whole-genome sequences of 250 people from Northern Eurasia from a public repository, by testing extended haplotype homozygosity (EHH) and by directly comparing allele frequencies, respectively. The 20 loci with the strongest selection signals were characterized in detail. Among the top EHH hits were the NRG3 and NBEA genes, which are involved in the development and functioning of the neural system, the PTPRM gene, which mediates cell-cell interactions and adhesion, and a region on chromosome 4 (chr4:28.7-28.9 Mb) that contained several loci affiliated with different classes of non-coding RNAs (RN7SL101P, MIR4275, MESTP3, and LINC02364). NBEA and the region on chromosome 4 were novel selection targets that were identified for the first time in Western Siberian populations. Cross-population comparisons of EHH profiles suggested a particular role for the chr4:28.7-28.9 Mb region in the local adaptation of Western Siberians. The strongest selection signal identified in Siberian sequenced genomes was formed by six SNPs on chromosome 11 (chr11:124.9-125.2 Mb). This region included the well-known genes SLC37A2 and PKNOX2. SLC37A2 is most highly expressed in the gut. Its expression is regulated by vitamin D, which is often deficient in northern regions. The PKNOX2 gene encodes a transcription factor of the homeobox family that is expressed in the brain and many other tissues.
This gene is associated with alcohol addiction, which is widespread in many Northern Eurasian populations.
Year: 2020 PMID: 32023328 PMCID: PMC7001972 DOI: 10.1371/journal.pone.0228778
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Geographical locations of the populations and samples studied.
The numbers denote the following population samples: 1, Veps; 2, Russians from Ustyuzhna, Vologda region; 3, Russians from Mezen, Archangelsk region; 4, Priluzski Komi from Ob’yachevo District, Komi Republic; 5, Izhemski Komi from Izhma district, Komi Republic; 6, Mansi from Khanty-Mansi Autonomous Okrug; 7, Khanty from Khanty-Mansi Autonomous Okrug; 8, Nenets from Yamalo-Nenets Autonomous Okrug; 9, Karelians; 10, Estonians; 11, Ingrians; 12, Vepsas; 13, Komis; 14, Belarusians; 15, Ukrainians West; 16, Ukrainians East; 17, Swedes; 18, Latvians; 19, Lithuanians; 20, Poles; 21, Hungarians; 22, Moldavians; 23, Saami; 24, Finnish; 25, Mordvins; 26, Maris; 27, Chuvashes; 28, Udmurds; 29, Bashkirs; 30, Tatars; 31, Mansis; 32, Khantys; 33, Selkups; 34, Nenets; 35, Shor; 36, Nganasans; 37, Evenks; 38, Evens; 39, Sakha; 40, Kets; 41, Eskimo; 42, Chukchis; 43, Koryaks; 44, Kazakhs; 45, Altaians; 46, Tuvinians; 47, Mongolians; 48, Buryats. Samples 1 to 8 were genotyped by the authors with microarrays; the rest were obtained from the Estonian Biocentre Human Genome Diversity Panel (EGDP [5,29]). The colors represent the following populations and groups of samples: red, datasets obtained with microarrays; sky blue, Altaians; yellow, Asia; deep green, Chukchi; magenta, Europe; light green, samples from North East Europe populations (NEE group); blue, samples from North West Asia populations (NWA group); pink, Slavs; brown, Tatar. The complete list of samples taken from the EGDP is presented in S3 Table (i.e., samples of non-Eurasian geography). The map was generated using the R package tmap (v2.3-1) [30].
Fig 2. Genome-wide (autosomes 1–22) distribution of p-values for iHS scores in three out of the eight populations studied.
The populations are (from top to bottom): Russians from Mezen, Russians from Ustyuzhna, and Veps. Horizontal red lines indicate the P-value threshold applied (P ≤ 1 × 10⁻⁵). Loci of interest are indicated with arrows.
Table 1. SNPs with significant (p ≤ 1 × 10⁻⁵) iHS scores found in the populations tested.
| SNP #rs ID | Chr | Position | Alleles | Selected allele | Functional consequence | Annotated genes | \|iHS\| score | iHS log P-value | Population** |
|---|---|---|---|---|---|---|---|---|---|
| rs3738544 | 1 | 236,914,576 | C/T | C | Intron variant | | 4.5; 4.6 | 5.2; 5.4 | Ru-Ust, Veps |
| rs17014454 | 3 | 24,277,049 | C/T | T | Intron variant | | 4.4; 5.1 | 5.0; 6.4 | Ru-Ust, Ru-Me |
| rs1387010 | 3 | 161,937,965 | C/T | T | Intergenic | | 5.1; 4.9 | 6.0; 6.5 | Veps, Ru-Me |
| rs7695045 | 4 | 28,850,524 | A/G | G | Intergenic | | 5.1; 4.5 | 5.3; 6.4 | Khanty, Mansi |
| rs12774724* | 10 | 83,958,152 | C/T | C | Intron variant | | 5.6; 4.8 | 5.8; 7.7 | Ru-Ust, Komi-Izh |
| rs12769829* | 10 | 83,958,312 | A/G | G | Intron variant | | 5.6; 4.4 | 5.0; 7.7 | Ru-Ust, Komi-Izh |
| rs958793 | 13 | 36,076,555 | A/G | A | Intron variant | | 5.0; 4.6 | 5.4; 6.3 | Khanty, Nenets |
| rs554825 | 18 | 7,486,106 | A/G | A | Intergenic | | 4.5; 5.2; 5.0; 4.9; 4.9 | 5.2; 6.6; 6.2; 6.0; 6.1 | Veps, Ru-Me, TSI, CEU, FIN |
| rs10502389 | 18 | 8,978,472 | A/G | A | Intergenic | | 4.9; 4.7; 4.9 | 5.9; 5.7; 5.9 | Ru-Ust, Ru-Me, TSI |
*The SNPs are very closely spaced (170 bp apart) and can thus be considered a single locus.
**Ru-Ust–Russians from Ustyuzhna, Ru-Me–Russians from Mezen, Komi-Izh–Izhemski Komi.
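The relation between the |iHS| scores and the log P-values listed above can be illustrated with a short sketch. It assumes that standardized iHS scores are treated as standard normal and converted to a two-sided p-value, a convention used by common haplotype-scan software; the source does not state the exact formula, so the function names and the conversion itself are assumptions. Under this assumption, |iHS| = 4.5 maps to roughly 5.2 on the −log10 scale.

```python
import math


def ihs_neg_log10_p(ihs: float) -> float:
    """Two-sided -log10 p-value for a standardized iHS score.

    Assumption: the score follows a standard normal distribution under
    neutrality. The source does not spell out this formula.
    """
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail.
    p_value = math.erfc(abs(ihs) / math.sqrt(2.0))
    return -math.log10(p_value)


def passes_threshold(ihs: float, p_max: float = 1e-5) -> bool:
    """True if the score is at least as extreme as the p-value cutoff."""
    return ihs_neg_log10_p(ihs) >= -math.log10(p_max)


if __name__ == "__main__":
    for score in (4.0, 4.5, 5.1):
        print(score, round(ihs_neg_log10_p(score), 2), passes_threshold(score))
```

With the threshold used in the table (p ≤ 1 × 10⁻⁵), this conversion corresponds to an |iHS| cutoff of about 4.4.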
Fig 3. The decay of extended haplotype homozygosity in the region of SNP rs554825: Evidence from the Veps population.
The bottom of the figure illustrates the location of the SNP in the corresponding part of chromosome 18, as shown in the NCBI Variation Viewer (GRCh37.p13), and the distances at which EHH for the ancestral allele drops to the threshold limit.
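The EHH decay plotted in such a figure can be sketched as follows. This is a minimal illustration of the standard definition (the probability that two randomly chosen haplotypes carrying the core allele are identical over the whole interval from the core SNP to a given marker); the toy haplotypes, the string encoding, and the function name `ehh` are hypothetical and are not taken from the study data.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH for haplotypes carrying `allele` at index `core`, extended to `end`.

    Each haplotype is a string of allele codes, one character per SNP.
    """
    lo, hi = min(core, end), max(core, end)
    # Keep only haplotypes with the core allele, cut to the tested interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    # Count pairs of identical haplotypes among all possible pairs.
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(n, 2)


if __name__ == "__main__":
    # Hypothetical toy data: six haplotypes, four SNPs, core SNP at index 0.
    toy = ["1100", "1100", "1101", "1011", "0100", "0110"]
    print([round(ehh(toy, 0, "1", end), 3) for end in range(4)])
```

EHH starts at 1 at the core SNP and falls as recombination and mutation break up the shared haplotype, so a slow decay around one allele is the pattern that haplotype-based tests interpret as a candidate signal of selection.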
Candidate loci for positive selection identified using XP-EHH tests (p ≤ 1 × 10⁻⁵)*.
| Chr | Position (Mb) | CEU | CHB | FIN | Komi-Izh | Khanty | Mansi | Ru-Me | Nenets | Komi-Ob | TSI | Ru-Ust | Veps | Genes and gene regions annotated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 55.5 | Y | Y, T | Y, T | Y, T | |||||||||
| 1 | 76.6 | Y | ||||||||||||
| 1 | 154.8 | Y | Y, T | Y | Y, T | |||||||||
| 1 | 189.5 | Y | Y | Y | ||||||||||
| 2 | 108.9 | Y, T | Y, T | T | Y, T | |||||||||
| 2 | 109.6 | Y | Y | |||||||||||
| 2 | 178.5 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 3 | 45.1 | Y, T | T | |||||||||||
| 4 | 28.7 | Y, T | Y, T | Y, T | ||||||||||
| 4 | 123.8 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| 5 | 109.6 | Y | Y | Y | Y | Y | Y | Y, T | Y | |||||
| 6 | 35.4 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 6 | 136.1 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |||
| 10 | 83.9 | C | Y, C | |||||||||||
| 10 | 94.8 | Y, T | Y, T | Y, T | Y, T | |||||||||
| 12 | 1.6 | Y, T | Y, T | Y, T | ||||||||||
| 12 | 1.6 | T, C | T, C | T, C | ||||||||||
| 12 | 89.2 | Y | Y | Y | Y | Y | Y | Y | Y | |||||
| 13 | 36.1 | Y, C | Y, C | |||||||||||
| 15 | 45.4 | Y | Y | Y | Y | Y | Y | Y | Y | |||||
| 15 | 64.2 | Y, T | ||||||||||||
| 17 | 28.5 | Y | Y | Y | Y | |||||||||
| 17 | 28.6 | Y | Y | Y | Y | |||||||||
| 17 | 59.2 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| 18 | 7.5 | Y | Y | Y | Y | Y | Y | Y | ||||||
| 18 | 7.7 | Y, C | Y, C | Y, C | Y | Y, C | Y | Y, C | Y, C | Y, C | Y, C |
*Y, T, C–reference populations: YRI, TSI and CHB, respectively.
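As a rough illustration of what an XP-EHH comparison against a reference population measures, the sketch below computes an unstandardized cross-population score as the log ratio of the areas under two EHH curves. It is a simplified sketch, not the pipeline used in the study: real implementations also truncate the EHH curves at a cutoff, integrate in both directions from the core SNP, and standardize scores genome-wide before deriving p-values. The function names and the toy EHH values are hypothetical.

```python
import math


def integrated_ehh(positions, ehh_values):
    """Area under an EHH curve (trapezoidal rule) over the given positions."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += 0.5 * (ehh_values[i] + ehh_values[i - 1]) * width
    return area


def unstandardized_xpehh(positions, ehh_test, ehh_ref):
    """Log ratio of integrated EHH in the test vs. the reference population."""
    return math.log(
        integrated_ehh(positions, ehh_test) / integrated_ehh(positions, ehh_ref)
    )


if __name__ == "__main__":
    # Hypothetical EHH curves at four marker positions (in kb from the core SNP).
    positions = [0, 10, 20, 30]
    ehh_test = [1.0, 0.9, 0.7, 0.5]   # slow decay in the tested population
    ehh_ref = [1.0, 0.5, 0.2, 0.1]    # fast decay in the reference population
    print(round(unstandardized_xpehh(positions, ehh_test, ehh_ref), 3))
```

A positive score means haplotypes stay homozygous over a longer distance in the tested population than in the reference, which is the direction of interest when the reference is YRI, TSI, or CHB.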
SNPs of sequenced genomes within 9 loci described in Table 1*.
| Locus | SNP #rs ID | Chr | Position | Derived allele frequency | SNP over-representation | P-value | Annotated genes |
|---|---|---|---|---|---|---|---|
| 1 | Not found | 1 | 236,914,576 | ||||
| 2 | Not found | 3 | 24,277,049 | ||||
| 3 | rs149915236 rs148419167 rs148368744 rs182084000 rs78230670 rs79428199 rs140812904 rs75151827 rs149123142 rs117166396 rs73172723 rs73172728 | 3 | 160,963,232 160,974,524 161,109,954 161,135,155 161,652,536 161,652,656 161,681,122 161,693,930 161,697,079 161,731,309 161,777,884 161,779,104 | 0.30 NWA | ≥ x2.2 | 0.003 | |
| 4 | Not found | 4 | 28,850,524 | ||||
| 5,6 | rs1336274 rs61863039 rs72827309 rs10509451 rs61863041 rs61863047 rs61863048 rs17737264 rs61863049 rs61864201 rs11596426 rs72827335 rs75497737 | 10 | 84,028,864 84,029,031 84,029,228 84,032,154 84,034,357 84,038,763 84,039,605 84,042,161 84,042,276 84,070,366 84,071,127 84,074,280 84,074,407 | 0.30 NEE | ≥x2.2 | 0.1 | |
| 7 | rs147713651 rs117041926 rs147476568 rs111977790 rs112259913 rs190723450 | 13 | 35,159,886 35,162,526 35,182,715 35,456,591 35,481,662 35,550,828 | 0.21 NEE | ≥x5.0 | 0.001 | |
| 8 | rs141455074 | 18 | 8,441,576 | 0.28 NWA | ≥x3.4 | 0.0001 | |
| 9 | rs67830720 | 18 | 9,493,156 | 0.34 NEE | ≥x2.0 | 0.1 | |
*Column 1 refers to the rows of Table 1 that describe SNPs from the same chromosomal regions. Column 5 shows the highest derived allele frequency of the identified SNPs observed in North West Asia (NWA) populations, located east of the Ural Mountains, or North East Europe (NEE) populations, located west of the Ural Mountains. Column 6 presents the over-representation ratio of the SNPs in North Eurasian populations compared with the other populations under analysis, that is, how many times higher the SNP frequency is in NEE or NWA than in all other populations. Column 7 shows the P-value: the estimated probability of finding by chance an SNP overrepresented in North Eurasian populations at the observed ratio within the current locus (± 0.8 Mb from the location in Table 1). Column 8 presents the genes that host the identified SNPs associated with signals of positive selection.
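The over-representation ratio described in this note can be sketched as a simple frequency comparison. The reading used here, dividing the derived allele frequency in the target group by the highest frequency found in any other group, is an assumption chosen because it yields a conservative "at least this many times" bound; the source does not give the exact computation, and the group frequencies below are hypothetical.

```python
def overrepresentation_ratio(freqs, target):
    """Ratio of the target group's derived allele frequency to the highest
    frequency observed in any other group.

    Assumption: this "ratio to the highest other group" reading is one
    possible interpretation of the table note, not a documented formula.
    """
    others = [freq for group, freq in freqs.items() if group != target]
    highest_other = max(others)
    if highest_other == 0:
        return float("inf")  # allele absent everywhere else
    return freqs[target] / highest_other


if __name__ == "__main__":
    # Hypothetical derived allele frequencies per population group.
    freqs = {"NWA": 0.28, "NEE": 0.08, "Europe": 0.05, "Asia": 0.02}
    print(round(overrepresentation_ratio(freqs, "NWA"), 2))
```

Comparing against the highest competing frequency means that a reported value such as ≥x3.4 holds against every other group, not only against a pooled average.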
Six loci with the strongest signals for positive selection in North Eurasia populations identified by whole-genome allele frequency analysis*.
| SNP #rs ID | Chr | Position | Haplotype size (kb) | Derived allele frequency | SNP over-representation | P-value | Annotated genes |
|---|---|---|---|---|---|---|---|
| rs77191500 rs75647011 rs77338913 rs78178069 | 3 | 54,081,807 54,082,785 54,083,724 54,084,458 | 4 | 0.34 NWA | ≥x4.4 | <10⁻⁸ | |
| rs17198295 rs41345846 rs115064616 rs4678429 rs116387571 rs76578737 rs74641205 | 3 | 135,856,318 135,956,639 135,966,726 135,993,276 136,339,949 136,350,891 136,374,804 | 500 | 0.30 NEE | ≥x4.3 | <10⁻⁷ | |
| rs77693347 | 3 | 143,571,375 | | 0.26 NWA | ≥x3.2 | <10⁻⁸ | |
| rs62469388 rs62471762 rs62471765 rs62471766 rs62471768 | 7 | 89,038,148 89,168,136 89,195,490 89,228,812 89,253,624 | 210 | 0.27 NEE | ≥x5.3 | <10⁻⁸ | |
| rs117000964 | 11 | 40,094,935 | | 0.26 NWA | ≥x3.2 | <10⁻⁸ | |
| rs118138358 rs75705739 rs36015256 rs148184827 rs117952463 rs74566282 | 11 | 124,948,795 124,954,830 125,074,999 125,085,473 125,103,343 125,159,749 | 210 | 0.30 NWA | ≥x9.0 | <10⁻⁸ | |
*Column 4 presents the size of the haplotype formed by the SNPs (the distance between the first and the last SNP of the locus). Column 5 shows the highest derived allele frequency of the SNPs observed in NWA or NEE populations. The specific geographic locations and the individuals carrying these alleles are listed in S3 Table. Column 6 presents the over-representation ratio of these SNPs in North Eurasian populations compared with the other populations under analysis. Column 7 shows the P-value: the estimated probability of finding by chance an SNP overrepresented in North Eurasian populations at the observed ratio. Column 8 presents the genes that host the detected SNPs associated with positive selection.
List of the most prominent missense mutations that are overrepresented in North Eurasia.
| SNP #rs ID | Chr | Position | Mutation Ref -> Alt | Gene | Over-representation | Population |
|---|---|---|---|---|---|---|
| rs12039961 | 1 | 203,276,546 | G->A | | ≥x1.8 | NWA |
| rs147417448 | 7 | 47,894,518 | T->A | | ≥x2.2 | NWA |
| rs17132289 | 7 | 48,428,715 | A->T | | ≥x1.9 | NWA |
| rs78225095 | 7 | 99,725,146 | C->G | | ≥x2.3 | NWA+NEE |
| rs78279132 | 7 | 100,137,085 | G->C | | ≥x2.5 | NWA |
| rs143390591 | 15 | 75,980,291 | T->C | | ≥x1.8 | NWA |
| rs116676388 | 19 | 19,613,284 | G->A | | ≥x2.6 | NWA |
| rs16986050 | 19 | 55,401,170 | A->G | | ≥x1.9 | NEE |
Column 1 shows SNPs inside coding regions that have signals of positive selection revealed by allele frequency analysis. Column 2 presents the number of the chromosome containing the described SNP. Column 3 shows the position of these SNPs on the chromosomes. Column 4 shows the mutations that change the coding sequences. Column 5 presents the genes that host the detected SNPs associated with positive selection. Column 6 presents the over-representation ratio of these SNPs in North Eurasian populations compared with the other populations under analysis. Column 7 lists the North Eurasian populations in which these derived alleles are overrepresented.