Rocio Caro-Consuegra, Maria A Nieves-Colón, Erin Rawls, Verónica Rubin-de-Celis, Beatriz Lizárraga, Tatiana Vidaurre, Karla Sandoval, Laura Fejerman, Anne C Stone, Andrés Moreno-Estrada, Elena Bosch.
Abstract
Peru hosts extremely diverse ecosystems which can be broadly classified into the following three major ecoregions: the Pacific desert coast, the Andean highlands, and the Amazon rainforest. Since its initial peopling approximately 12,000 years ago, the populations inhabiting such ecoregions might have differentially adapted to their contrasting environmental pressures. Previous studies have described several candidate genes underlying adaptation to hypobaric hypoxia among Andean highlanders. However, the adaptive genetic diversity of coastal and rainforest populations has been less studied. Here, we gathered genome-wide single-nucleotide polymorphism-array data from 286 Peruvians living across the three ecoregions and analyzed signals of recent positive selection through population differentiation and haplotype-based selection scans. Among highland populations, we identify candidate genes related to cardiovascular function (TLL1, DUSP27, TBX5, PLXNA4, SGCD), to the Hypoxia-Inducible Factor pathway (TGFA, APIP), to skin pigmentation (MITF), as well as to glucose (GLIS3) and glycogen metabolism (PPP1R3C, GANC). In contrast, most signatures of adaptation in coastal and rainforest populations comprise candidate genes related to the immune system (including SIGLEC8, TRIM21, CD44, and ICAM1 in the coast; CBLB and PRDM1 in the rainforest; and BRD2, HLA-DOA, HLA-DPA1 regions in both), possibly as a result of strong pathogen-driven selection. This study identifies candidate genes related to human adaptation to the diverse environments of South America.
Keywords: Peruvian populations; high-altitude adaptation; human adaptation
Year: 2022 PMID: 35860855 PMCID: PMC9356722 DOI: 10.1093/molbev/msac158
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 8.800
Individual Samples Grouped by Department and Ecoregion
| Ecoregion | Department | Altitude range (mamsl) | No. of individuals |
|---|---|---|---|
| Coast | Ica | 15–585 | 11 (+1) |
| | La Libertad | 28 | 28 (+1) |
| | Lambayeque | 5–43 | 18 |
| | Piura | 5–60 | 11 |
| | Tumbes | 6–12 | 5 |
| Highlands | Apurimac | 2,760–3,665 | 20 |
| | Cusco | 3,345–3,913 | 24 |
| | Ancash | 2,965–3,281 | 10 |
| | La Libertad | 2,641–3,099 | 3 |
| | Junin | 3,249 | 2 |
| | Lima | 2,836 | 10 |
| | Puno | 3,827 | 25 (+75) |
| Rainforest | Amazonas | 1,022–1,630 | 2 |
| | Loreto | 100–111 | 9 |
| | San Martin | 207–860 | 9 |
| | Junin | 631 | 3 |
| | Ucayali | 280 | 4 |
Samples were assigned to each department and ecoregion according to the population collection site and/or additional origin information available (for details, see supplementary table S1, Supplementary Material online). La Libertad and Junin departments comprise two differentiated ecoregions.
One individual from La Libertad and one from Ica were excluded from the selection analyses due to the high proportions of European and African ancestries detected in them, respectively.
Puno samples were randomly downsampled to avoid over-representation in the genetic structure and selection analyses.
mamsl, meters above mean sea level.
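The random downsampling mentioned in the notes above can be illustrated with a minimal sketch. It assumes 100 available Puno individuals of which 25 are retained, as the 25 (+75) entry suggests. The sample IDs, the seed, and the `downsample` helper are hypothetical and are not taken from the study.

```python
import random

def downsample(sample_ids, keep, seed=0):
    """Draw `keep` IDs without replacement; returned sorted for stable output."""
    if keep > len(sample_ids):
        raise ValueError("cannot keep more samples than are available")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return sorted(rng.sample(sample_ids, keep))

# Hypothetical IDs standing in for the Puno individuals.
puno_ids = [f"PUNO_{i:03d}" for i in range(1, 101)]
puno_kept = downsample(puno_ids, keep=25, seed=42)
```

Fixing the seed keeps the retained subset identical across the structure and selection analyses, which is one plausible reason to record it.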
Fig. 1. (A) Map of Peru indicating the 15 departments where sampled populations are located. Data points are colored by ecoregion. Elevation is represented in meters above sea level using the colorbar scale. (B) PCA of Peruvian individuals colored by ecoregion, including populations from the 1000 Genomes Project (1KGP*). (C) ADMIXTURE plot at K = 4 with the newly analyzed Peruvian samples classified per ecoregion. (D) Zoom in on the ADMIXTURE plot at K = 4 for the analyzed Peruvian samples further divided by department. *YRI, Yoruba from Ibadan, Nigeria; CEU, Utah residents with Northern and Western European Ancestry; CHB, Han Chinese from Beijing; PEL, Peruvians from Lima.
Fig. 2. Manhattan plots representing the scores of each statistic for selection tests performed on highland (A), coastal (B), and rainforest (C) populations. In red, the top 10 significant candidate regions identified in each test. H, Highlands; C, Coast; R, Rainforest. (D) Number of top signals of positive selection detected with the PBS, iHS, and XP-EHH statistics per ecoregion. Genomic regions are considered under selection exclusively in an ecoregion when they are not found within the top 50 signals obtained by tests performed on other ecoregions. In brackets, total number of top 10 signals detected without filtering out those also within the top 50 of the remaining ecoregions. *Indicates that the highest scoring SNPs within a peak are in an intergenic region. *1 in (B): HLA-DRB1, HLA-DQA1, HLA-DQA2, HLA-DQB2, HLA-DOB, TAP2, PSMB8, PSMB8-AS1, TAP1, PSMB9, LOC100294145, HLA-DMB, HLA-DMA, BRD2, HLA-DOA, HLA-DPA1. *2 in (B): SSSCA1-AS1, EHBP1L1, KCNK7, MAP3K11, PCNX3, RELA, KAT5, RNASEH2C, AP5B1, MIR1234, OVOL1-AS1, OVOL1, SNX32, CFL1, MUS81, EFEMP2, FIBP.
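The exclusivity rule in the caption (a top 10 region counts as exclusive only if it is absent from the top 50 of the other ecoregions) can be sketched as an interval filter. This assumes that "found within" means genomic overlap on the same chromosome, which the caption does not state. The `Region` type and all coordinates below are hypothetical.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int
    end: int

def overlaps(a, b):
    """True when two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def exclusive_signals(top10, other_top50):
    """Keep top-10 regions that overlap none of the other ecoregions' top-50 regions."""
    return [r for r in top10 if not any(overlaps(r, o) for o in other_top50)]

# Hypothetical candidate regions, for illustration only.
highland_top10 = [Region("1", 1000, 2000), Region("2", 5000, 6000), Region("3", 100, 200)]
coast_top50 = [Region("2", 5500, 7000)]
rainforest_top50 = [Region("3", 300, 400)]

exclusive = exclusive_signals(highland_top10, coast_top50 + rainforest_top50)
```

Here the chromosome 2 region is dropped because it overlaps a coastal signal, while the other two are retained.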
Top 10 Selection Signals per Statistic and Population Comparison Exclusively Found in the Highland Ecoregion
| Candidate region | Candidate genes | PBS | iHS | XP-EHH | Nat Am LAP (%) | Nat Am LAD (SDs) | RS id—Ref. allele | Allele freq. in H | Allele freq. in C | Allele freq. in R |
|---|---|---|---|---|---|---|---|---|---|---|
| 1:111740082–111944073 | | | | | 88.03 | 0.53 | rs2275254-T | 0.89 | 0.84 | 0.76 |
| 1:178563987–178671750 | | | | | 93.62 | 1.68 | rs1122579-T | 0.15 | 0.42 | 0.41 |
| 1:184029993–184283674 | Intergenic— | | — | — | 94.15 | 1.90 | rs12119930-A | 0.21 | 0.5 | 0.43 |
| 2:146944278–147020927 | Intergenic— | | | | 90.43 | 0.35 | rs2016340-G | 0.72 | 0.58 | 0.65 |
| 2:70187173–72306479 | | | — | H (HR) | 92.02 | 1.02 | rs6714409-G | 0.24 | 0.49 | 0.61 |
| 4:166597189–167143385 | | | | | 92.02 | 1.02 | rs1995126-A | 0.94 | 0.74 | 0.94 |
| 4:178992130–179614998 | Intergenic | — | H | | 88.83 | 0.31 | rs2702432-G | 0.54 | 0.73 | 0.57 |
| 4:61626653–61950476 | | | | | 90.96 | 0.58 | rs7699903-G | 0.18 | 0.28 | 0.35 |
| 5:11641039–11742668 | | | | | 90.43 | 0.35 | rs4702813-G | 0.95 | 0.72 | 0.93 |
| 5:154745227–155568469 | Intergenic— | | H | | 91.49–90.43 | 0.8–0.35 | rs1432723-G | 0.11 | 0.34 | 0.48 |
| 5:29886971–30137291 | Intergenic— | | | | 94.68 | 2.12 | rs10940848-C | 0.14 | 0.44 | 0.19 |
| 5:86932232–87197535 | Intergenic— | | | | 90.96 | 0.58 | rs710375-T | 0.84 | 0.69 | 0.72 |
| 6:153600685–153955693 | | | — | — | 90.43 | 0.35 | rs1221930-G | 0.22 | 0.49 | 0.63 |
| 9:18046709–18349993 | Intergenic— | | — | — | 95.74 | 2.56 | rs10810942-A | 0.97 | 0.84 | 0.78 |
| 9:3962727–4529671 | | | | | 94.15 | 1.90 | rs7024944-A | 0.90 | 0.78 | 0.85 |
| 10:93369096–93542186 | | | — | — | 91.49 | 0.80 | rs150183914-T | 0.06 | 0.16 | 0.28 |
| 11:134196849–134622517 | Intergenic— | | | | 91.49 | 0.80 | rs10750576-A | 0.90 | 0.75 | 0.81 |
| 12:47312454–47985899 | | HC HR | — | | 90.96 | 0.58 | rs855185-G | 0.25 | 0.49 | 0.56 |
| 14:21924207–22160553 | | | | | 93.09 | 0.90 | rs1263807-T | 0.14 | 0.36 | 0.30 |
| 18:72073738–72108787 | | | | | 94.15 | 1.90 | rs377380065-T | 0.04 | 0.16 | 0.22 |
| 21:38251558–39104180 | | | | | 88.83 | 0.31 | rs11911146-T | 0.36 | 0.55 | 0.48 |
| 22:45421536–45527815 | | | | | 92.55 | 1.24 | rs738548-A | 0.80 | 0.53 | 0.54 |
For each candidate region, the Native American local ancestry proportion (LAP) is included together with the corresponding local ancestry deviation (LAD). The allele frequency per ecoregion (H, C, and R for highlands, coast and rainforest, respectively) for the reference allele of a top outlier SNP within each candidate region is also indicated. When the region is identified by multiple tests, the top SNP is taken from the highest ranked and marked with *. Genes in intergenic candidate regions are only shown when their distance to the peak is <500 kbp. In bold, top 10 selection signals with the number representing the ranking of that signal; otherwise additional hits detected within the top 50 signals in the highlands.
PBS, Population Branch Statistic; iHS, integrated Haplotype Score; XP-EHH, cross-population Extended Haplotype Homozygosity; HC, highlands to coast comparison; HR, highlands to rainforest comparison; H (HC), Highlands signal for the XP-EHH highlands to coast comparison; H (HR), Highlands signal for the XP-EHH highlands to rainforest comparison; H, highlands.
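To make the Population Branch Statistic concrete, the sketch below computes it from the three ecoregion allele frequencies listed for one SNP in the table (rs738548-A: H 0.80, C 0.53, R 0.54). It uses the standard definition PBS = (T_AB + T_AC − T_BC) / 2 with T = −ln(1 − F_ST). The per-SNP F_ST estimator is a simplifying assumption: a heterozygosity-based estimate from frequencies alone, ignoring sample sizes. It is not claimed to be the estimator used in the study, so the values are illustrative and will not match the published scores.

```python
import math

def fst(p1, p2):
    """Heterozygosity-based F_ST for one biallelic SNP (equal weights; an assumption)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:  # SNP monomorphic in both populations
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total

def branch_length(f):
    """Log-transform F_ST into a divergence time; capped to avoid log(0) at F_ST = 1."""
    return -math.log(1 - min(f, 1 - 1e-9))

def pbs(p_focal, p_a, p_b):
    """Branch length specific to the focal population in a three-population tree."""
    t_fa = branch_length(fst(p_focal, p_a))
    t_fb = branch_length(fst(p_focal, p_b))
    t_ab = branch_length(fst(p_a, p_b))
    return (t_fa + t_fb - t_ab) / 2

# Allele frequencies for rs738548-A as listed in the table above.
p_h, p_c, p_r = 0.80, 0.53, 0.54
pbs_h = pbs(p_h, p_c, p_r)
pbs_c = pbs(p_c, p_h, p_r)
pbs_r = pbs(p_r, p_h, p_c)
```

With these frequencies the highland branch is by far the longest of the three, consistent with the SNP appearing among the highland signals.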
Top 10 Selection Signals per Statistic and Population Comparison Exclusively Found in the Coast Ecoregion
| Candidate region | Candidate genes | PBS | iHS | XP-EHH | Nat Am LAP (%) | Nat Am LAD (SDs) | RS id—Ref. allele | Allele freq. in H | Allele freq. in C | Allele freq. in R |
|---|---|---|---|---|---|---|---|---|---|---|
| 2:227205102–227428810 | Intergenic— | CR | — | | 78.77 | 0.81 | rs9789638-A | 0.80 | 0.84 | 0.65 |
| 2:231044200–231429220 | | | | | 81.51 | 1.63 | rs58941251-C | 0.63 | 0.88 | 0.81 |
| 2:98629966–100380563 | | | | | 72.60 | 1.05 | rs2632277-C | 0.76 | 0.48 | 0.65 |
| 3:195477791–196737295 | | | | | 76.03 | 0.02 | rs544688-G | 0.48 | 0.72 | 0.74 |
| 5:1875037–1948519 | | | | | 82.19 | 1.84 | rs200756822-G | 0.53 | 0.33 | 0.46 |
| 5:63309892–64417806 | | | | | 78.77 | 0.81 | rs16892721-A | 0.51 | 0.53 | 0.85 |
| 7:101060777–101449071 | | | | | 76.03 | 0.02 | rs28759973-T | 0.81 | 0.77 | 0.74 |
| 8:10390452–10485154 | | | | | 73.29 | 0.84 | rs150931842-G | 0.85 | 0.97 | 0.94 |
| 8:82335354–82849452 | | | | | 73.29 | 0.84 | rs11991098-A | 0.51 | 0.32 | 0.46 |
| 8:96200507–96223793 | | | | | 69.18 | 2.08 | rs77609822-C | 0.95 | 0.89 | 0.91 |
| 9:109802363–109982588 | Intergenic— | CR | — | | 75.34 | 0.22 | rs12551497-A | 0.71 | 0.87 | 0.69 |
| 10:63248358–63583070 | | | — | — | 78.77 | 0.81 | rs1456279-A | 0.22 | 0.12 | 0.30 |
| 11:35123195–35360016 | | | — | — | 81.51 | 1.63 | rs4756196-G | 0.49 | 0.79 | 0.83 |
| 11:4434519–4460261 | | | | | 82.19 | 1.84 | rs10633520-A | 0.32 | 0.14 | 0.15 |
| 13:24656407–24789809 | | | | | 77.40 | 0.39 | rs60187376-T | 0.18 | 0.11 | 0.19 |
| 15:63290705–63372180 | | | | | 84.25 | 2.46 | rs72741190-A | 0.87 | 0.94 | 0.70 |
| 18:6667606–6776759 | Intergenic— | | | | 71.92 | 1.25 | rs12456358-T | 0.69 | 0.91 | 0.91 |
| 19:10285806–13466988 | | | | | 71.23 | 1.46 | rs74257295-G | 0.90 | 0.94 | 0.59 |
| 19:51892016–52250216 | | CH | — | | 80.14 | 1.22 | rs39711-T | 0.68 | 0.83 | 0.81 |
| 19:52009560–52246157 | | | | C (CH) | 80.82 | 1.42 | rs10500308-T | 0.61 | 0.37 | 0.54 |
| 20:37013181–37291377 | | | | | 79.45 | 1.01 | rs74983286-T | 0.96 | 0.95 | 0.81 |
| 22:45116127–45367385 | | | | | 77.40 | 0.39 | rs132410-A | 0.66 | 0.78 | 0.69 |
For each candidate region, the Native American local ancestry proportion (LAP) is included together with the corresponding local ancestry deviation (LAD). The allele frequency per ecoregion (H, C and R for highlands, coast and rainforest, respectively) for the reference allele of a top outlier SNP within each candidate region is also indicated. Genes in intergenic candidate regions are only shown when their distance to the peak is <500 kbp. In bold, top 10 selection signals with the number representing the ranking of that signal; otherwise additional hits detected within the top 50 signals in the coast.
PBS, Population Branch Statistic; iHS, integrated Haplotype Score; XP-EHH, cross-population Extended Haplotype Homozygosity. CR, coast to rainforest comparison; CH, coast to highlands comparison; C (HC), Coast signals from the XP-EHH highlands to coast comparison; C (RC), Coast signals from the XP-EHH rainforest to coast comparison; C, coast.
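The local ancestry deviation (LAD) column is expressed in standard deviations. One plausible reading, sketched below, is the absolute distance of a region's Native American ancestry proportion from the genome-wide mean, divided by the genome-wide standard deviation. Both the use of an absolute value (the tables list only positive values) and the population standard deviation are assumptions, and the genome-wide proportions shown are hypothetical rather than study data.

```python
from statistics import mean, pstdev

def local_ancestry_deviation(region_lap, genome_laps):
    """Absolute deviation of a region's ancestry proportion from the genome-wide mean, in SDs."""
    mu = mean(genome_laps)
    sd = pstdev(genome_laps)  # population SD across genomic windows (an assumption)
    if sd == 0:
        raise ValueError("genome-wide ancestry proportions show no variation")
    return abs(region_lap - mu) / sd

# Hypothetical genome-wide Native American ancestry proportions (%), one per window.
genome_laps = [70.0, 74.0, 76.0, 78.0, 82.0]
lad = local_ancestry_deviation(84.0, genome_laps)
```

A large LAD flags regions where admixture, rather than selection on a Native American haplotype, might contribute to the signal, which is why it is reported next to each candidate.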
Top 10 Selection Signals per Statistic and Population Comparison Exclusively Found in the Rainforest Ecoregion
| Candidate region | Candidate genes | PBS | iHS | XP-EHH | Nat Am LAP (%) | Nat Am LAD (SDs) | RS id—Ref. allele | Allele freq. in H | Allele freq. in C | Allele freq. in R |
|---|---|---|---|---|---|---|---|---|---|---|
| 1:198957867–199707262 | Intergenic— | | | | 88.89 | 2.03 | rs1325187-C | 0.43 | 0.51 | 0.31 |
| 2:12821349–13089226 | | | — | — | 81.48 | 0.42 | rs973977-T | 0.61 | 0.64 | 0.22 |
| 2:164399671–164879136 | | | — | — | 81.48 | 0.42 | rs13003002-T | 0.51 | 0.52 | 0.31 |
| 2:41894321–42063475 | Intergenic— | | | | 79.63 | 0.02 | rs4952511-C | 0.60 | 0.66 | 0.63 |
| 2:54506070–55183537 | | | | R (RC) | 70.37 | 1.99 | rs17046413-A | 0.54 | 0.65 | 0.54 |
| 3:105334386–105790348 | | | R | | 87.04 | 1.63 | rs61138958-A | 0.40 | 0.55 | 0.31 |
| 3:192279892–192338861 | | | | | 81.48 | 0.42 | rs781417-C | 0.88 | 0.78 | 0.89 |
| 3:59667385–59821071 | | | — | — | 83.33 | 0.82 | rs1683366-G | 0.71 | 0.66 | 0.41 |
| 6:119335865–119655145 | | | | | 72.22 | 1.59 | rs612607-G | 0.74 | 0.82 | 0.91 |
| 7:116617532–117003695 | | | | | 83.33 | 0.82 | rs4612282-T | 0.60 | 0.73 | 0.81 |
| 8:59017474–59194917 | | | | | 87.04 | 1.63 | rs7843838-A | 0.49 | 0.56 | 0.33 |
| 9:73915883–74044936 | Intergenic— | | | | 85.19 | 1.22 | rs147846688-G | 0.84 | 0.84 | 0.52 |
| 10:31860345–32054109 | Intergenic— | RC | — | | 85.19 | 1.22 | rs796159-T | 0.46 | 0.51 | 0.74 |
| 12:97635817–97737126 | Intergenic— | | | | 74.07 | 1.19 | rs6538791-A | 0.61 | 0.68 | 0.46 |
| 13:29222286–29411957 | Intergenic— | | | | 79.63 | 0.02 | rs17561728-A | 0.69 | 0.80 | 0.44 |
| 13:97027560–97323933 | | | | | 85.19 | 1.22 | rs643765-T | 0.57 | 0.60 | 0.76 |
| 14:32389856–32488986 | | | R | — | 79.63 | 0.02 | rs4640079-G | 0.62 | 0.59 | 0.39 |
| 15:92467322–92717179 | | | | | 77.78 | 0.38 | rs371469142-C | 0.17 | 0.12 | 0.04 |
| 16:50361170–50505628 | | | — | — | 81.48 | 0.42 | rs142711401-G | 0.78 | 0.86 | 0.50 |
| 17:9536109–9724255 | | | | | 83.33 | 0.82 | rs380596-G | 0.79 | 0.87 | 0.98 |
| 18:65485121–65528209 | | | | | 85.19 | 1.22 | rs12454152-G | 0.72 | 0.84 | 0.70 |
| 19:48622545–48935653 | | | | | 77.78 | 0.38 | rs4801753-G | 0.59 | 0.72 | 0.76 |
| 21:43444731–43454614 | | | | | 79.63 | 0.02 | rs2839444-C | 0.84 | 0.80 | 0.57 |
For each candidate region, the Native American local ancestry proportion (LAP) is included together with the corresponding local ancestry deviation (LAD). The allele frequency per ecoregion (H, C and R for highlands, coast and rainforest, respectively) for the reference allele of a top outlier SNP within each candidate region is also indicated. When the region is identified by multiple tests, the top SNP is taken from the highest ranked and marked with *. Genes in intergenic candidate regions are only shown when their distance to the peak is <500 kbp. In bold, top 10 selection signals with the number representing the ranking of that signal; otherwise additional hits detected within the top 50 signals in the rainforest ecoregion.
PBS, Population Branch Statistic; iHS, integrated Haplotype Score; XP-EHH, cross-population Extended Haplotype Homozygosity. RH, rainforest to highlands comparison; RC, rainforest to coast comparison; R (HR), Rainforest signal from the XP-EHH highlands to rainforest comparison; R (RC), Rainforest signal from the XP-EHH rainforest to coast comparison; R, rainforest.
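The table notes state that genes flanking an intergenic peak are listed only when they lie less than 500 kbp from it. A minimal sketch of that filter follows, assuming the distance is measured from the peak position to the nearest gene boundary on the same chromosome. The gene names and coordinates are hypothetical placeholders.

```python
def distance_to_interval(pos, start, end):
    """Distance in bp from a position to the nearest edge of [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0

def genes_near_peak(peak_pos, genes, max_dist=500_000):
    """Names of genes (name, start, end) lying strictly less than `max_dist` bp from the peak."""
    return [name for name, start, end in genes
            if distance_to_interval(peak_pos, start, end) < max_dist]

# Hypothetical genes on the same chromosome as an intergenic peak at 1.2 Mb.
genes = [
    ("GENE_A", 1_000_000, 1_050_000),  # 150 kb upstream of the peak
    ("GENE_B", 2_000_000, 2_100_000),  # 800 kb downstream, excluded
    ("GENE_C", 1_600_000, 1_700_000),  # 400 kb downstream
]
near = genes_near_peak(1_200_000, genes)
```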