Larisa Fedorova, Andrey Khrunin, Gennady Khvorykh, Jan Lim, Nicholas Thornton, Oleh A Mulyar, Svetlana Limborska, Alexei Fedorov.
Abstract
Common alleles tend to be more ancient than rare alleles. These common SNPs appeared thousands of years ago and reflect intricate human evolution including various adaptations, admixtures, and migration events. Eighty-four thousand abundant region-specific alleles (ARSAs) that are common in one continent but absent in the rest of the world have been characterized by processing 3100 genomes from 230 populations. Also computed were 17,446 polymorphic sites with regional absence of common alleles (RACAs), which are widespread globally but absent in one region. A majority of these region-specific SNPs were found in Africa. America has the second greatest number of ARSAs (3348) and is even ahead of Europe (1911). Surprisingly, East Asia has the highest number of RACAs (10,524) and the lowest number of ARSAs (362). ARSAs and RACAs have distinct compositions of ancestral versus derived alleles in different geographical regions, reflecting their unique evolution. Genes associated with ARSA and RACA SNPs were identified and their functions were analyzed. The core 100 genes shared by multiple populations and associated with region-specific natural selection were examined. The largest part of them (42%) are related to the nervous system. ARSA and RACA SNPs are important for both association and human evolution studies.
Keywords: computational biology; genetic variation; genomics; polymorphism; single nucleotide
Year: 2022 PMID: 36011383 PMCID: PMC9408407 DOI: 10.3390/genes13081472
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
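The ARSA and RACA definitions in the abstract can be read as a simple rule on per-region allele frequencies. The sketch below is a minimal illustration, not the authors' pipeline: the function name `classify_allele`, the 18% cutoff for "common" (borrowed from the MAF > 18% footnote to the ARSA table), and the reading of "absent" as a frequency of exactly zero are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Assumed thresholds; the abstract only says "common" and "absent".
COMMON_MIN = 0.18  # assumption, taken from the MAF > 18% footnote
ABSENT_MAX = 0.0   # assumption: "absent" means the allele was never observed


def classify_allele(
    freq_by_region: Dict[str, float],
    common_min: float = COMMON_MIN,
    absent_max: float = ABSENT_MAX,
) -> Optional[Tuple[str, str]]:
    """Return ("ARSA", region), ("RACA", region), or None for one allele."""
    common = [r for r, f in freq_by_region.items() if f > common_min]
    absent = [r for r, f in freq_by_region.items() if f <= absent_max]
    n_regions = len(freq_by_region)

    # ARSA: common in exactly one region, absent in every other region.
    if len(common) == 1 and len(absent) == n_regions - 1:
        return ("ARSA", common[0])
    # RACA: absent in exactly one region, common in every other region.
    if len(absent) == 1 and len(common) == n_regions - 1:
        return ("RACA", absent[0])
    return None


# Toy frequencies, invented for illustration only.
arsa_like = {"AFR": 0.30, "AMR": 0.0, "EAS": 0.0, "EUR": 0.0, "OCE": 0.0}
raca_like = {"AFR": 0.40, "AMR": 0.50, "EAS": 0.0, "EUR": 0.30, "OCE": 0.60}
print(classify_allele(arsa_like))
print(classify_allele(raca_like))
```

A real analysis would also have to handle admixed populations, which the figure legends describe as a source of low, nonzero frequencies outside the region of origin.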
Figure 1. Computational methodology for characterization of ARSA SNPs with three validation points.
Figure 2. Examples of ARSA SNPs for Africa, East Asia, and Europe from population data of the 1000 Genomes Project as presented on the Ensembl genome browser. Pie charts of regional SNP allele frequencies were taken from the Ensembl website “Allele and genotype frequencies by population” (https://useast.ensembl.org/Homo_sapiens/Variation/ accessed on 20 May 2022) through SNP RS-identifiers. (A) Allele “T” is present only in Africa but absent in other continents (the small frequency of “T” in the Americas is due to admixture in 1000 Genomes populations). (B) Allele “A” is present only in East Asia and nowhere else. (C) Allele “G” is present in Europe but absent in Africa and East Asia. The presence of the “G” allele in the Americas and South Asia (SAS) is due to admixture of Europeans with populations from these regions. African populations from the USA (ASW and ACB from 1000 Genomes) also have the “G” allele with 4 to 9% frequency, which results in its 2% frequency in the total African sample. The ASW and ACB populations were excluded from our Step-2 calculations. The distribution of ARSA SNPs among continents is presented in Table 1.
Table 1. Numbers of ARSA SNPs in five remote geographical regions.
| Region | Step-1. | Step-2. | Step-3. | Step-4. | ARSA SNP Allele Status (Ancestral vs. Derived) |
|---|---|---|---|---|---|
| Africa | 204,983 | 112,658 | 77,820 | 28,774 | 22% vs. 78% |
| Americas | 46,994 | 4133 | 3348 | 3222 | 1% vs. 99% |
| East Asia | 7789 | 441 | 362 | 272 | 7% vs. 93% |
| Europe | 6585 | 2484 | 1911 | 1394 | 3% vs. 97% |
| Oceania | 77,437 | 71,848 * | 1358 | 453 | 4% vs. 96% |
* Oceania populations are absent in 1000 Genomes; thus, the requirement for MAF > 18% is omitted for OCE at Step-2.
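As a quick arithmetic check, the Step-3 column of the ARSA table reproduces the figures quoted in the abstract (3348 for the Americas, 1911 for Europe, 362 for East Asia) and sums to the "eighty-four thousand" total. The snippet below only re-adds the numbers printed in the table; the variable names are illustrative.

```python
# Step-1 to Step-4 counts copied from the ARSA table.
arsa_counts = {
    "Africa":    (204_983, 112_658, 77_820, 28_774),
    "Americas":  (46_994, 4_133, 3_348, 3_222),
    "East Asia": (7_789, 441, 362, 272),
    "Europe":    (6_585, 2_484, 1_911, 1_394),
    "Oceania":   (77_437, 71_848, 1_358, 453),
}

STEP3 = 2  # zero-based index of the Step-3 column

step3_total = sum(counts[STEP3] for counts in arsa_counts.values())
africa_share = arsa_counts["Africa"][STEP3] / step3_total

print(step3_total)            # 84799
print(f"{africa_share:.1%}")  # 91.8%
```

The Africa share of roughly 92% is consistent with the statement that a majority of region-specific SNPs were found in Africa.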
Figure 3. Examples of RACA SNPs for Africa, East Asia, and Europe. Pie charts of regional SNP allele frequencies were taken from the Ensembl website “Allele and genotype frequencies by population” (https://useast.ensembl.org/Homo_sapiens/Variation/ accessed on 20 May 2022) through SNP RS-identifiers. (A) Allele “G” is present globally, except for East Asia. (B) Allele “A” is present globally, except for Africa. (C) Allele “G” is present globally, except for Europe. One explanation for this phenomenon is fixation of common alleles in particular regions/continents. The distribution of RACA SNPs is shown in Table 2.
Table 2. Numbers of RACA SNPs in three remote geographical regions.
| Region | Number of RACA SNPs | Number of RACA SNP Clusters | RACA SNP Allele Status (Ancestral vs. Derived) |
|---|---|---|---|
| Africa | 6897 | 4159 | 3% vs. 97% |
| East Asia | 10,524 | 3021 | 38% vs. 62% |
| Europe | 25 | 16 | 88% vs. 12% |
Note: Due to admixture of European, American, and Indian populations, the numbers may be biased, so Europe might be underrepresented. Nonetheless, European counts were much lower than African and EAS counts. The last column shows whether RACA is the ancestral or derived allele.
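The RACA counts can be checked against the abstract in the same way, and dividing SNPs by clusters gives a rough sense of how tightly the RACA SNPs group in each region. This is a sketch on the table values only; the "SNPs per cluster" ratio is an illustrative summary that is not reported in the source.

```python
# (number of RACA SNPs, number of RACA SNP clusters) copied from the RACA table.
raca_counts = {
    "Africa":    (6_897, 4_159),
    "East Asia": (10_524, 3_021),
    "Europe":    (25, 16),
}

raca_total = sum(snps for snps, _ in raca_counts.values())
snps_per_cluster = {
    region: snps / clusters for region, (snps, clusters) in raca_counts.items()
}

print(raca_total)  # 17446, the total quoted in the abstract
for region, ratio in snps_per_cluster.items():
    print(f"{region}: {ratio:.2f} SNPs per cluster")
```

East Asia has both the most RACA SNPs and the highest average number of SNPs per cluster of the three regions.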
Number of SNPs from Table 1 and Table 2, with biological effects accessed with BioMart.
| Region | # ARSA SNPs Total | # ARSA SNPs | # RACA SNPs Total | # RACA SNPs |
|---|---|---|---|---|
| Africa | 353 | 191 | 382 | 362 |
| America | 17 | 5 | N/A | N/A |
| East Asia | 4 | 4 | 445 | 408 |
| Europe | 92 | 87 | 4 | 4 |
| Oceania | 3 | 0 | N/A | N/A |
Figure 4. UpSet diagrams showing the number and nature of intersections of gene lists for ARSA (A) and RACA (B). Vertical bars show the number of genes resulting from the intersections between gene lists. Horizontal bars depict the total number of genes in each continental group. The nature of intersections is shown with gray and black dots. A black dot means the presence of the gene in the corresponding continental group, while a gray dot means the absence of the gene in the corresponding group. Black dots connected by lines indicate the continental groups involved in the intersection. For example, one black dot and four gray ones correspond to the vertical bar that shows the number of genes present in one continental group and absent in all other groups.
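The vertical bars of an UpSet diagram count exclusive intersections: each gene is assigned to the exact combination of groups that contain it. A minimal sketch of that counting is shown below; the function name and the toy gene identifiers are hypothetical and do not come from the paper's gene lists.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(gene_sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count genes by the exact combination of groups that contain them."""
    membership: Dict[str, Set[str]] = {}
    for group, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(group)
    # One vertical bar per distinct combination of groups (the black dots).
    return dict(Counter(frozenset(groups) for groups in membership.values()))


# Toy gene lists, invented for illustration only.
toy_sets = {
    "AFR": {"GENE_A", "GENE_B", "GENE_C"},
    "EAS": {"GENE_B", "GENE_D"},
    "EUR": {"GENE_B", "GENE_C"},
}

bars = exclusive_intersections(toy_sets)
for groups, count in sorted(bars.items(), key=lambda kv: -kv[1]):
    print(sorted(groups), count)
```

The horizontal bars are simply the sizes of the input sets, `len(toy_sets["AFR"])` and so on.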
Figure 5. Distribution of the number of ARSA (A) and RACA (B) SNPs per gene.
Functional annotation of genes for ARSA SNPs obtained by the DAVID bioinformatics resource.
| Term | AFR | AMR | EAS | EUR | OCE |
|---|---|---|---|---|---|
| Biological process | | | | | |
| Cell adhesion | 8.8 × 10⁻⁶ | 0.003 | + | + | |
| Neurosciences | 0.007 | 0.0004 | ||||
| Ion transport | 0.007 | 0.005 | ||||
| Calcium transport | 0.007 | + | + | + | ||
| Transport | + | 0.008 | ||||
| Potassium transport | 0.021 | |||||
| Endosome | + | 0.029 | ||||
| Golgi apparatus | 0.02 | 0.05 | ||||
| Molecular function | | | | | |
| Calcium channel | 0.002 | + | + | ||
| Ion channel | 0.005 | 0.015 | ||||
| Guanine nucleotide releasing factor | 0.009 | 0.015 | ||||
| Actin binding | 0.014 | + | + | |||
| Kinase | 0.014 | 0.04 | ||||
| Serine/threonine-protein kinase | 0.03 | + | ||||
| Voltage-gated channel | 0.03 | 0.04 | + | |||
| Potassium channel | 0.015 | + | ||||
| Cellular component | | | | | |
| Cell junction | 8.4 × 10⁻¹⁰ | 3.3 × 10⁻⁷ | 5.4 × 10⁻⁴ | + | |
| Synapse | 1.2 × 10⁻⁶ | 6.6 × 10⁻⁵ | + | 1.2 × 10⁻² | + | |
| Cell membrane | 1.8 × 10⁻⁴ | 0.006 | + | |||
| Cell protection | 1.9 × 10⁻⁴ | 1.8 × 10⁻⁴ | + | 5.4 × 10⁻⁴ | ||
| Cytoskeleton | 0.002 | + | + | + | ||
| Membrane | 0.0024 | 0.032 | + | 0.035 | ||
| Postsynaptic cell membrane | + | 0.009 | ||||
| Cytoplasm | + | 0.0025 | + | 0.049 | ||
| Endosome | + | 0.03 | ||||
| Golgi apparatus | 0.02 | 0.049 | ||||
| Pathway | | | | | |
| Cortisol synthesis and secretion | 0.007 | ||||
| cGMP-PKG signaling pathway | 0.013 | + | + | |||
| Tight junction | 0.013 | |||||
| Axon guidance | 0.025 | 0.0035 | ||||
| Arrhythmogenic right ventricular cardiomyopathy | 0.026 | 0.025 | ||||
| Parathyroid hormone synthesis secretion and action | 0.026 | + | ||||
| Type II diabetes mellitus | 0.026 | |||||
| Calcium signaling pathway | 0.026 | + | + | |||
| Adrenergic signaling in cardiomyocytes | 0.026 | + | + | |||
| Circadian entrainment | 0.028 | + | ||||
| Oxytocin signaling pathway | 0.028 | + | ||||
| Cushing syndrome | 2.80 × 10⁻² | |||||
| MAPK signaling pathway | 4.90 × 10⁻² | + | ||||
| Long-term potentiation | 4.90 × 10⁻² | |||||
| Cholinergic synapse | + | 6.30 × 10⁻³ | + | |||
| Pathways in cancer | + | 7.70 × 10⁻³ | ||||
| Glutamatergic synapse | + | 1.20 × 10⁻² | ||||
| Dopaminergic synapse | + | 3.90 × 10⁻² | ||||
| Insulin secretion | + | 3.90 × 10⁻² | ||||
| Inflammatory mediator regulation of TRP channels | + | 3.90 × 10⁻² | ||||
| Choline metabolism in cancer | + | 3.90 × 10⁻² | ||||
| Pancreatic secretion | 5.00 × 10⁻² |
Note: Significant (p < 0.05) p-values after false discovery Benjamini correction for multiple testing are presented. Plus “+” denotes nonsignificant p-values, while the empty cells correspond to the biological process, molecular function, or pathway that were missing in the DAVID output for a specific region.
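For readers unfamiliar with the correction named in the note, the sketch below shows the standard Benjamini-Hochberg step-up adjustment. It assumes that the "Benjamini" values in the table follow this standard procedure; it is not code from DAVID or from the authors, and the input p-values are invented.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values, invented for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print([round(p, 4) for p in adjusted])
print(significant)
```

A term would be reported with its adjusted value when that value is below 0.05 and shown as "+" otherwise, following the convention described in the note.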
The number of ARSA- and RACA-associated genes shared with the published core sets of genes under natural selection.
| Tab/Continental Group | SNPs | Set 1 (1995 Genes) [ | | Set 2 (273 Genes) [ | | Set 3 (1365 Genes) * | | Set 4 (4172 Genes) ** | |
|---|---|---|---|---|---|---|---|---|---|
| | | n | p | n | p | n | p | n | p |
| Africa | ARSA | 337 | 1 | 53 | 1 | 369 | 1.6 × 10⁻² | 1171 | 2.2 × 10⁻²² |
| America | ARSA | 68 | 0.17 | 9 | 0.5 | 47 | 0.17 | 112 | 0.9 |
| Europe | ARSA | 12 | 0.93 | 5 | 0.21 | 10 | 0.93 | 39 | 0.21 |
| East Asia | ARSA | 15 | 5.6 × 10⁻⁴ | 4 | 0.02 | 4 | 0.87 | 12 | 0.87 |
| Oceania | ARSA | 10 | 0.1 | 0 | 1 | 2 | 1 | 9 | 1 |
| AFR | RACA | 115 | 1 | 16 | 1 | 143 | 5.3 × 10⁻⁶ | 349 | 1.6 × 10⁻⁵ |
| EUR | RACA | 2 | 0.71 | 0 | 1 | 3 | 0.31 | 5 | 0.31 |
| EAS | RACA | 89 | 0.83 | 10 | 0.83 | 80 | 2.8 × 10⁻² | 242 | 3.4 × 10⁻¹⁰ |
* Genes from genome-wide published data (Table S2 from [33]). ** Genes from rawPophumanscanTable of the PopHumanScan catalog [34].
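One common way to attach a p-value to an overlap count like those in the table above is a one-sided hypergeometric test. The source excerpt does not state which test the authors used, so the sketch below is an assumption about the general approach, and the function name, the background size, and the toy inputs are all hypothetical.

```python
from math import comb


def overlap_pvalue(overlap: int, query_size: int, reference_size: int, background: int) -> float:
    """P(X >= overlap) when query_size genes are drawn from `background` genes,
    of which reference_size belong to the published selection set."""
    upper = min(query_size, reference_size)
    total = comb(background, query_size)
    tail = sum(
        comb(reference_size, k) * comb(background - reference_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example, invented for illustration: 10 background genes, 5 in the
# reference set, 5 in the query list, all 5 shared.
print(overlap_pvalue(5, 5, 5, 10))  # 1/252, about 0.004
```

Applying this to the table would require the size of each ARSA or RACA gene list and a choice of background gene count, neither of which is given in the excerpt above.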