Literature DB >> 34037784

Broad Concordance in the Spatial Distribution of Adaptive and Neutral Genetic Variation across an Elevational Gradient in Deer Mice.

Rena M Schweizer1, Matthew R Jones1,2, Gideon S Bradburd3, Jay F Storz4, Nathan R Senner1,5, Cole Wolf1, Zachary A Cheviron1.   

Abstract

When species are continuously distributed across environmental gradients, the relative strength of selection and gene flow shape spatial patterns of genetic variation, potentially leading to variable levels of differentiation across loci. Determining whether adaptive genetic variation tends to be structured differently than neutral variation along environmental gradients is an open and important question in evolutionary genetics. We performed exome-wide population genomic analysis on deer mice sampled along an elevational gradient of nearly 4,000 m of vertical relief. Using a combination of selection scans, genotype-environment associations, and geographic cline analyses, we found that a large proportion of the exome has experienced a history of altitude-related selection. Elevational clines for nearly 30% of these putatively adaptive loci were shifted significantly up- or downslope of clines for loci that did not bear similar signatures of selection. Many of these selection targets can be plausibly linked to known phenotypic differences between highland and lowland deer mice, although the vast majority of these candidates have not been reported in other studies of highland taxa. Together, these results suggest new hypotheses about the genetic basis of physiological adaptation to high altitude, and the spatial distribution of adaptive genetic variation along environmental gradients.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  clinal selection; gene–environment association; high-altitude adaptation; local adaptation; population structure

Mesh:

Year:  2021        PMID: 34037784      PMCID: PMC8476156          DOI: 10.1093/molbev/msab161

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

For species that are continuously distributed across environmental gradients, the spatial scale of local adaptation is determined by the interplay between divergent selection and gene flow (Slatkin 1987; Lenormand 2002; Polechová and Barton 2015; Riesch et al. 2018; Bachmann et al. 2020). One way to gain insight into the spatial scale of local adaptation is through geographic cline analyses of allele frequencies across an environmental transect (Nagylaki 1975; Barton 1979, 1983; Stankowski et al. 2016; Storfer et al. 2018; Bradburd and Ralph 2019). When applied at a genomic scale, geographic cline analyses can be used to generate hypotheses about the environmental drivers of allele frequency variation and to identify processes that shape the distribution of adaptive genetic variation among interconnected populations (reviewed in Storfer et al. 2018). Locus-specific spatial patterns of genetic variation are determined by the combined effects of selection, gene flow, recombination rate, and the distribution of selection coefficients on nearby loci (Barton 1979, 1983; Lenormand 2002; Polechová and Barton 2015; Bachmann et al. 2020). These processes can result in clines in the frequencies of alleles at selected loci that are offset from one another, and from those of neutral loci, reflecting a combination of background population structure and selection at linked sites (Lenormand 2002; Yeaman and Whitlock 2011). However, because patterns of allele frequency variation are jointly determined by multiple demographic processes, as well as the spatial scale of selective pressures and the intensity of selection on specific loci, it is not always the case that neutral and adaptive clines are offset. For example, in zones of ecological transition, demographic processes and environmental selective pressures may align such that allele frequencies for both neutral and adaptive loci have similar geographic patterns (Endler 1977; Moore 1977). Under these scenarios, clines for adaptive and neutral loci may often be concordant. Given these expectations, an open and important question in evolutionary genetics is whether adaptive genetic variation tends to be structured differently than neutral variation along environmental gradients. Elevational gradients are particularly well-suited to address this question. High-elevation environments are characterized by extreme cold and low partial pressures of oxygen. Along elevational gradients, these environmental selection pressures covary in intensity and can often be combined with other covarying stressors (e.g., aridity). As a result, highland animals have evolved physiological modifications to cope with the environmental challenges of alpine environments. Many of these adaptations influence multiple steps of the oxygen transport cascade—the series of physiological processes that transport oxygen from the environment to respiring cells—to ensure matching of O2 supply and demand (fig. 1; e.g., Storz et al. 2010; Storz and Scott 2019). In North America, deer mice (Peromyscus maniculatus) have an elevational range of almost 4,500 m and populations at different elevations experience a wide range of oxygen availability, temperature, precipitation, and snowpack. Deer mice native to high elevations in the Rocky Mountains have evolved a suite of physiological changes that contribute to adaptive enhancements of whole-animal aerobic performance in hypoxia ( fig. 1; Cheviron et al. 2012, 2013, 2014; Lui et al. 2015; Ivy and Scott 2017, 2018; Lau et al. 2017; Dawson et al. 2018; Scott et al. 2015; Mahalingam et al. 2017, 2020; Nikel et al. 2018; Tate et al. 2017; Storz et al. 2019), a trait that influences survival in this species (Hayes and O’Connor 1999; Wilde et al. 2019).
Fig. 1.

Deer mice native to high elevations in the Rocky Mountains have evolved a suite of physiological changes that contribute to adaptive enhancements of whole-animal aerobic performance in hypoxia. For each step of the oxygen transport cascade (OTC), physiological differences between highland and lowland deer mice are summarized on the left and previously-collected representative physiological data are on the right. References: 1) Tate et al. (2020); 2) West et al. (2021); 3) Tate et al. (2017); 4) Ivy et al. (2020); 5) Scott et al. (2015); 6) Mahalingam et al. (2017); 7) Lui et al. (2015); 8) Nikel et al. (2018).

Deer mice native to high elevations in the Rocky Mountains have evolved a suite of physiological changes that contribute to adaptive enhancements of whole-animal aerobic performance in hypoxia. For each step of the oxygen transport cascade (OTC), physiological differences between highland and lowland deer mice are summarized on the left and previously-collected representative physiological data are on the right. References: 1) Tate et al. (2020); 2) West et al. (2021); 3) Tate et al. (2017); 4) Ivy et al. (2020); 5) Scott et al. (2015); 6) Mahalingam et al. (2017); 7) Lui et al. (2015); 8) Nikel et al. (2018). Although the physiological mechanisms of high-elevation adaptation have been well-characterized in deer mice, our understanding of the genetic basis of these adaptations is far from complete (Storz et al. 2019; Storz 2021). Transcriptomic analyses have demonstrated that some phenotypic adaptations are associated with differential regulation of core metabolic and cell signaling pathways (Cheviron et al. 2012, 2014; Scott et al. 2015; Velotta et al. 2016, 2020), but direct connections between genotypic and phenotypic variation have been restricted to studies of a few candidate genes (e.g., Storz et al. 2009, 2010; Natarajan et al. 2015; Schweizer et al. 2019; Wearing et al. 2020; Song et al. 2021). These studies have revealed sharp clines in allele frequency that are centered at relatively low elevations in the geographic transition between the Great Plains of the central United States and the Front Range of the Rocky Mountains in Colorado (Storz et al. 2012; Schweizer et al. 2019). Two key examples include Epas1 (endothelial PAS domain-containing protein 1), a gene involved in the transcriptional response to hypoxia, and tandemly linked gene duplicates that encode the β-subunits of adult hemoglobin (β-globin genes Hbb-T1 and Hbb-T2). Across the Rocky Mountains, clines for both Epas1 allele and β-globin haplotype frequencies are centered at ∼1,400 m above sea level (a.s.l.) and span approximately 600 m of vertical relief (Storz et al. 2012; Schweizer et al. 2019). Clines for both of these putatively adaptive loci are also statistically indistinguishable from those representing background population structure (Schweizer et al. 2019). Whether these loci are representative of other putatively adaptive loci across the genome is not known. Here, we examined elevational patterns of exome-wide variation to: 1) test whether genes that are associated with known, putatively adaptive, phenotypic differences between highland and lowland deer mice also bear signatures of positive selection in highland mice, and 2) identify additional loci that may have contributed to high-altitude adaptation. We complemented this analysis with a post hoc survey of similar population genomic studies of high-elevation adaptation in other endothermic species to determine whether targets of selection in deer mice are common in other taxa. Finally, once these putatively selected loci were identified, we then characterized geographic clines of allele frequency variation along an elevational gradient of almost 4,000 m of elevational relief. This analysis enabled us to examine the extent to which patterns of clinal variation at putatively adaptive loci differ from observed patterns across the remainder of the genome. Together, our results shed light on the spatial distribution of adaptive genetic variation across elevational gradients and suggest new hypotheses about the genetic basis of physiological adaptation to hypoxia.

Results

Sequencing Results

We sampled deer mice at low-elevation sites in the Great Plains, as well as along a transect spanning ∼4,000 m of vertical relief in the southern Rocky Mountains in Colorado (fig. 2; table 1). In addition, we included samples from a low-elevation site in Merced, CA, as a reference population for some of our selection tests (supplementary table S1, Supplementary Material online). We captured ∼77.4 Mb of sequence from each individual using a custom-designed capture array that targeted two sequence classes from the deer mouse genome, including: 1) the nuclear exome, and 2) 5,000 randomly selected nongenic segments of 500 bp each, to be used as a neutral control to model population structure (see Materials and Methods).
Fig. 2.

(A) Sampling localities for deer mice along the transition between the Great Plains and the Front Range of the Southern Rocky Mountains. Dashed box represents those populations grouped within the “Rocky Mountains” for analyses. (B) Within the Front Range of the Southern Rocky Mountains, we sampled four elevational transects: 1) the Pikes Peak Transect: Pike Low, Pike Medium, Pike High (purples, circles); 2) the Boreas Pass Transect: Pike, Lost Park, Colorado Trail (reds, squares); 3) the Mount Evans Transect: Mount Evans, Summit Lake, Echo Lake, Chicago Creek, Spring Gulch (greens, triangles); 4) the Niwot Peak Transect: Niwot Peak, Saint Vrain, Lefthand Canyon (blues, diamonds). Shape sizes in (B) are proportional to number of individuals (see table 1 for sample sizes). SV, Saint Vrain; BT, Big Thompson; NP, Niwot Peak; LC, Lefthand Canyon; BP, Boreas Pike; LP, Lost Park; CT, Colorado Trail.

Table 1.

Locations and Sampling Efforts for 256 Deer Mouse Samples.

LocalityTotal No. of Indiv.No. of Unrel. Indiv.Mean Elevation (m)Latitude (°)Longitude (°)
Rocky Moutains
Big Thompson1181,83240.4300−105.3149
Boreas Pike National Forest1063,63239.4071−105.9577
Chicago Creek542,92639.7053−105.6062
Colorado Trail862,31139.3480−105.3558
Echo Lake983,26439.6611−105.5779
Lefthand Canyon1062,30640.0757−105.4120
Lost Park1082,99739.3856−105.7370
Mesa Reservoir1051,63940.0739−105.2659
Mount Evans48324,30239.5872−105.6444
Niwot Peak853,49840.0948−105.5571
Pike High534,03438.8493−105.0589
Pike Low962,08938.8789−104.9417
Pike Middle443,29638.8872−105.0694
Saint Vrain852,61840.1754−105.5259
Spring Gulch1052,41139.7381−105.5304
Summit Lake1053,91239.5986−105.6406
Great Plains
Bonny Reservoir1091,15839.6173−102.2507
Fort Larned10762038.1831−99.2181
Lincoln372735840.8106−96.6803
Pawnee991,59440.7580−104.0309
Merced15n/a23937.5986−120.3415
Total256168
(A) Sampling localities for deer mice along the transition between the Great Plains and the Front Range of the Southern Rocky Mountains. Dashed box represents those populations grouped within the “Rocky Mountains” for analyses. (B) Within the Front Range of the Southern Rocky Mountains, we sampled four elevational transects: 1) the Pikes Peak Transect: Pike Low, Pike Medium, Pike High (purples, circles); 2) the Boreas Pass Transect: Pike, Lost Park, Colorado Trail (reds, squares); 3) the Mount Evans Transect: Mount Evans, Summit Lake, Echo Lake, Chicago Creek, Spring Gulch (greens, triangles); 4) the Niwot Peak Transect: Niwot Peak, Saint Vrain, Lefthand Canyon (blues, diamonds). Shape sizes in (B) are proportional to number of individuals (see table 1 for sample sizes). SV, Saint Vrain; BT, Big Thompson; NP, Niwot Peak; LC, Lefthand Canyon; BP, Boreas Pike; LP, Lost Park; CT, Colorado Trail. Locations and Sampling Efforts for 256 Deer Mouse Samples. After filtering, we proceeded with a set of 256 individuals genotyped at 5,546,642 high-quality biallelic variable sites, with a mean depth of coverage of 22.24 ± 7.90× (supplementary fig. S1, Supplementary Material online) and mean missing data of 9.07 ± 12.11% (supplementary table S1, Supplementary Material online). We also generated a data set for analyses within the Rocky Mountain and Great Plains populations (table 1) that consisted of 168 unrelated individuals genotyped at 6,029,294 sites of which 267,264 were nongenic LD-pruned sites to be used for analyses of population structure. Exploratory PCA plots did not show evidence of any consistent effects of missing data (supplementary figs. S2 and S3, Supplementary Material online).

Population Genetic Structure of Rocky Mountain and Great Plains Mice

To determine how populations should be grouped for downstream analyses, we assessed fine-scale patterns of population structure across the Great Plains-Rocky Mountains transition. We used a statistical framework that tests for discrete patterns of population structure against a backdrop of continuous geographic differentiation, implemented in the software conStruct (Bradburd et al. 2018). Analyses using conStruct showed strong and consistent support for two clusters—one comprised largely of Rocky Mountain populations (dashed box in fig. 2 Pike Low, Pike Medium, Pike High, Boreas Pike, Lost Park, Colorado Trail, Mount Evans, Summit Lake, Echo Lake, Chicago Creek, Spring Gulch, Niwot Peak, St Vrain, Lefthand Canyon, Mesa, Big Thompson, CO), the other centered on the Great Plains samples (Pawnee, CO; Bonny Reservoir, CO; Ft. Larned, KS; Lincoln, NE)—with a clear pattern of isolation by distance within each cluster (supplementary figs. S4 and S5, Supplementary Material online). Given that the Rocky Mountain samples were well-described by a single, spatial cluster, we treated populations from across the elevational sampling (fig. 2) as a single transect for downstream analyses.

Exome-Wide Signatures of Selection in Mt Evans Mice

To identify genes under positive selection in high-elevation mice, we used the population branch statistic (PBS; Yi et al. 2010) to measure the branch length of Mt Evans relative to both Lincoln—the lowest elevation site along our elevational transect—and Merced populations by a transformation of pairwise FST values (see Materials and Methods). Loci with elevated PBS values are indictive of loci under selection in Mt Evans. Overall, we observed relatively low differentiation across most of the 105,571 overlapping 5 kb (supplementary fig. S6, Supplementary Material online) windows, which is consistent with previous analyses of broad-scale population structure (Storz and Kelly 2008; Storz et al. 2012; Natarajan et al. 2015; Schweizer et al. 2019). Using data simulated under an inferred demographic model (Schweizer et al. 2019), we determined the upper 0.1% of the simulated distribution of PBS (PBS ≥ 0.112) and used that as a threshold to identify candidate windows. A total of 4,118 windows exceeded the top 0.1% of the simulated PBS distribution, and these windows represented 1,993 unique genes (fig. 3).
Fig. 3.

Distribution of (A) PBS, (B) RDA, and (C) Δπ scores for 105,571 5-kb windows (PBS, Δπ) and 1,109,794 SNPs (RDA) across the exome. (A) For PBS, points above the red line indicate windows with a PBS score above the 99.9th percentile of the demographically corrected null distribution. (B) For RDA, points above and below the upper and lower red lines, respectively, indicate SNPs with an RDA value that is ±3 standard deviations of the mean. (C) For delta pi, points below the red line indicate windows with a Δπ in the 99.9th percentile of the null distribution for ME versus LN, and pink dots indicate windows that are outliers for ME relative to both LN and CA. See text for details.

Distribution of (A) PBS, (B) RDA, and (C) Δπ scores for 105,571 5-kb windows (PBS, Δπ) and 1,109,794 SNPs (RDA) across the exome. (A) For PBS, points above the red line indicate windows with a PBS score above the 99.9th percentile of the demographically corrected null distribution. (B) For RDA, points above and below the upper and lower red lines, respectively, indicate SNPs with an RDA value that is ±3 standard deviations of the mean. (C) For delta pi, points below the red line indicate windows with a Δπ in the 99.9th percentile of the null distribution for ME versus LN, and pink dots indicate windows that are outliers for ME relative to both LN and CA. See text for details.

Genotype−Environment Analysis Narrows Down Top Candidate Genes

To complement the PBS analysis, we used redundancy analysis (RDA) to test for genotypic associations with elevation in the full set of samples from our Great Plains-Rocky Mountains transect. Our RDA analysis of the 1,109,794 sites (a subset with low missingness that were pruned for linkage, see Materials and Methods) revealed two significant RDA axes related to elevation (RDA1) and precipitation (RDA2) (supplementary fig. S7, Supplementary Material online). We identified a total of 15,713 unique candidate SNPs that loaded ±3 standard deviations from the mean on each axis (12,637 on RDA1, 2,230 on RDA2, and 846 on RDA1 and RDA2). Of the total RDA outliers, 89.4% (14,055 SNPs representing 5,643 genes; fig. 3) were more strongly correlated with elevation than precipitation. We chose to focus on the elevation outliers in subsequent analyses, but we acknowledge that there are potentially interesting links with precipitation. A total of 992 unique genes overlapped between the SNP-based RDA elevation outliers and the window-based PBS outliers (top 35 in table 2; all 992 in supplementary table S2, Supplementary Material online). These genes represent a set of loci (henceforth, “two-way” candidate loci) that are associated with elevation and exhibit evidence of selection in the Mt. Evans population. Among these genes, there was a significant functional enrichment of eight Biological Process, four Cellular Component, and four Molecular Function categories (supplementary table S3, Supplementary Material online). The most significantly enriched category was “anatomical structure development” (GO: 0048856; P-value: 6.42E−06). Other significant categories of interest included “ion binding” (GO: 0043167, P-value: 0.0109), “regulation of gene expression” (GO:0010468; P-value: 0.024), “muscle cell migration” (GO:0014812; P-value: 0.0293), and “G-protein coupled purinergic nucleotide receptor signaling pathway” (GO:0035589; P-value: 0.0307; supplementary table S3, Supplementary Material online).
Table 2.

Top 35 Candidate Genes That Are Outliers for Both PBS and RDA (two-way candidates).

Gene Symbol (P. maniculatus)Gene Name (P. maniculatus)Gene Symbol (M. musculus)Window LocationPBSSNP LocationRDA
Stx16 Syntaxin 16 Stx16 6501107.1:962501 − 9675001.9816501107.1:9678160.795
Aff2 AF4/FMR2 family member 2 Aff2 6501704.1:907501 − 9125001.7886501704.1:9107600.790
Hps3 HPS3 biogenesis of lysosomal organelles complex 2 subunit 1 Hps3 6501722.1:610001 − 6150001.5736501722.1:5900830.737
Gabrq Gamma-aminobutyric acid type A receptor theta subunit Gabrq 6501587.1:1297501 − 13025001.5306501587.1:13002790.657
Cpa3 Carboxypeptidase A3 Cpa3 6501722.1:272501 − 2775001.3816501722.1:2839510.486
Cwf19l2 CWF19-like 2 cell cycle control (S. pombe) Cwf19l2 6501598.1:270001 − 2750001.3246501598.1:3341830.777
Usp28 a Ubiquitin specific peptidase 28 Usp28 6501163.1:165001 − 1700001.3206501163.1:2082770.823
LOC102906935 Interstitial collagenase A-like Mmp1a 6501245.1:2847501 − 28525001.2996501245.1:28414070.596
Znfx1 Zinc finger protein OZF Znfx1 6501708.1:275001 − 2800001.2756501708.1:2841430.625
Dync2h1 Dynein cytoplasmic 2 heavy chain 1 Dync2h1 6501245.1:2427501 − 24325001.2686501245.1:23991270.808
Exoc6b Exocyst complex component 6B Exoc6b 6501567.1:1457501 − 14625001.2516501567.1:10610860.687
Gria4 Glutamate ionotropic receptor AMPA type subunit 4 Gria4 6501245.1:37501 − 425001.2196501245.1:402690.837
Kmt2a Lysine methyltransferase 2A Kmt2a 6501694.1:475001 − 4800001.1866501694.1:4816880.832
Ptprz1 Protein tyrosine phosphatase type IVA member 1 Ptprz1 6501405.1:145001 − 1500001.1696501405.1:1882270.646
Rims2 Regulating synaptic membrane exocytosis 2 Rims2 6501417.1:1140001 − 11450001.1526501417.1:11776900.456
Cpsf6 Cleavage and polyadenylation specific factor 6 Cpsf6 6501066.1:10150001 − 101550001.1366501066.1:101729040.852
LOC102906541 Pyrethroid hydrolase Ces2e-like Ces2b 6501344.1:235001 − 2400001.1026501344.1:2442020.686
Ctsz Cathepsin Z Ctsz 6501107.1:1315001 − 13200001.0956501107.1:13061340.417
Hspa4l Heat shock protein family A (Hsp70) member 4 like Hspa4l 6501366.1:1650001 − 16550001.0916501366.1:16823580.736
Edn3 Endothelin 3 Edn3 6501107.1:1652501 − 16575001.0716501107.1:16602230.656
Bud13 BUD13 homolog Bud13 6501163.1:3185001 − 31900001.0586501163.1:31868630.803
Ncam1 a Neural cell adhesion molecule 1 Ncam1 6501057.1:4725001 − 47300001.0586501057.1:47336620.781
Cpb1 Carboxypeptidase B1 Cpb1 6501722.1:225001 − 2300001.0526501722.1:2214960.736
Cadm1 Cell adhesion molecule 1 Cadm1 6501163.1:1602501 − 16075001.0416501163.1:15528480.831
Nnat Neuronatin Nnat 6501257.1:617501 − 6225001.0376501257.1:6201800.343
Supt20h SPT20 homolog SAGA complex component Supt20 6501125.1:3700001 − 37050001.0356501125.1:36700220.530
Med12l a Mediator complex subunit 12 like Med12l 6501814.1:835001 − 8400001.0326501814.1:8720410.661
LOC102923525 Probable G-protein coupled receptor 83 Gpr165 6501190.1:5030001 − 50350001.0126501190.1:50315320.496
Plch1 a Phospholipase C eta 1 Plch1 6501060.1:2525001 − 25300001.0076501060.1:24402720.704
Fxr1 FMR1 autosomal homolog 1 Fxr1 6501046.1:2650001 − 26550000.9946501046.1:26336640.512
Angpt1 Angiopoietin 1 Angpt1 6501143.1:3235001 − 32400000.9916501143.1:30327400.735
Mme a Membrane metallo-endopeptidase Mme 6501060.1:2105001 − 21100000.9726501060.1:21489090.693
Peg3 Paternally expressed 3 Peg3 6501712.1:890001 − 8950000.9676501712.1:8892160.781
LOC102914701 Arylacetamide deacetylase-like 2 Gm8298 6501814.1:1227501 − 12325000.9616501814.1:12554210.468
Nlgn1 Neuroligin 1 Nlgn1 6501046.1:11272501 − 112775000.9606501046.1:112810420.700

Gene is also a significant three-way outlier (see supplementary table S2, Supplementary Material online). Locations are provided as NW contig ID and position.

Top 35 Candidate Genes That Are Outliers for Both PBS and RDA (two-way candidates). Gene is also a significant three-way outlier (see supplementary table S2, Supplementary Material online). Locations are provided as NW contig ID and position.

Evidence of Recent Selective Sweeps

To identify loci that bear signatures of recent selective sweeps at high elevations, we also identified windows with significantly reduced nucleotide diversity (Δπ) in Mt Evans mice relative to the two lowland populations (see Materials and Methods). We used these data for two purposes. First, we tested whether the 992 two-way outliers exhibited significantly higher Δπ values than a random sample of nonoutlier loci, as would be expected if they had experienced a recent selective sweep at high elevation. This analysis revealed that the mean Δπ value for two-way outliers was indeed higher than that of random loci (Welch two sample t-test; t = −11.283, P < 2.2 × 10−16). Second, we performed a scan of Δπ outliers to identify additional loci that may have experienced a recent selective sweep at high elevation. As we did with PBS, we simulated nucleotide diversity for 100,000 5 kb windows under our demographic model to generate a null distribution of Δπ values. Our scan identified 314 windows with a Δπ value that was greater than the top 0.1% of the demographically corrected distribution for both highland/lowland comparisons (fig. 3). Of the 246 unique genes in these windows, 89 overlapped with the 992 two-way candidate loci (supplementary table S4, Supplementary Material online). Gene enrichment analyses of this reduced candidate set (henceforth, “three-way” candidate loci) showed an enrichment of GO categories such as “negative regulation of catecholamine secretion” (GO:0033604) and “G-protein coupled nucleotide receptor activity” (GO:0001608) (supplementary table S5, Supplementary Material online). There was also significant enrichment within the reactome pathway (R-HSA-417957) “P2Y receptors.”

Clinal Patterns of Variation for Candidate Loci

Given that we observed high levels of gene flow and low levels of differentiation among Rocky Mountain populations, we analyzed all populations, including those from the Great Plains, as a single elevational transect to assess clinal patterns of variation. For each of the 992 two-way candidate loci, we used the software HZAR (Derryberry et al. 2014) to fit a sigmoidal tanh cline model to the relationship between allele frequency and sampling elevation (see Materials and Methods). We estimated the cline center (c) and width (w) for 992 SNPs within our two-way candidate subset (supplementary table S6, Supplementary Material online). The variables c and w characterize the geographic location along the transect where the allele frequency turnover is greatest and the geographic region corresponding to the inverse of the maximum cline slope, respectively. We compared these best-fit clines with a cline generated using PC1 of our nongenic SNPs (supplementary fig. S8, Supplementary Material online; see Materials and Methods), and identified a subset of two-way candidate loci with cline centers (n = 297; 29.9%) and widths (n = 240; 24.2%) outside of the 95% confidence interval (CI) of the nongenic PC1 cline (fig. 4). Genes with clines centers that were offset upslope (i.e., occurring at higher elevation; n = 158 or 15.9%) of the nongenic PC1 cline showed an enrichment of functions related to catecholamine secretion and multiple categories related to odor perception (“sensory perception of smell,” “odorant binding,” and “olfactory receptor activity”; supplementary table S7, Supplementary Material online), whereas genes with cline centers that were offset downslope (n = 297 or 29.9%) showed enrichment of reactome pathways related to transport and cell junction organization (supplementary table S7, Supplementary Material online). There were no significantly enriched GO categories (supplementary table S8, Supplementary Material online) among the genes with cline widths that were narrower than the nongenic PC1 cline, whereas genes with cline widths greater than the nongenic PC1 cline only showed enrichment of broad categories such as “cell periphery” and “membrane part” (supplementary table S8, Supplementary Material online). The majority of our two-way candidate loci, however, were characterized by cline centers (n = 695 or 70.1%) or widths (n = 752 or 75.8%) that were indistinguishable from the PC1 cline representing background population structure.
Fig. 4.

Histograms of cline centers and cline widths for 992 SNPs from the two-way candidate loci. Solid and dashed lines represent the mean and upper and lower 95% CIs, respectively, of the nongenic PC1 cline.

Histograms of cline centers and cline widths for 992 SNPs from the two-way candidate loci. Solid and dashed lines represent the mean and upper and lower 95% CIs, respectively, of the nongenic PC1 cline.

Post Hoc Review of Candidate Genes in High-Elevation Vertebrates

To place the results of our selection scan into a broader context and to assess the degree of overlap in the genomic targets of selection across other similar studies, we performed a post hoc review of 14 studies in nine different species of terrestrial vertebrates (supplementary table S9, Supplementary Material online). We limited our survey to those that used population genomic data sets and allele-frequency-based tests of selection to allow for more direct comparison with our results. This survey identified a total of 3,983 unique genes that have been identified as selection candidates in other studies. Of our 992 two-way outlier candidate genes, only 163 (16.4%) have been identified in other selection scans of high-altitude populations. Of those 163 genes, 139 (85.3%) were only identified in one other study, 22 were identified in two additional studies, and 2 were identified in three or more additional studies (Epas1 and Kcnma1).

Discussion

In this study, we took a population genomic approach to assess the concordance of spatial patterns of genetic variation between adaptive and neutral loci. We combined selection scans with genotype−environment associations and geographic cline analysis along an elevational gradient of nearly 4,000 m vertical relief to identify loci that bear the signature of natural selection at high elevations. We then compared spatial patterns of allele frequency variation at putatively adaptive loci with those at neutral loci. The combined results of these analyses reveal three primary insights. First, multiple lines of evidence indicate that altitude-related selection has shaped patterns of genetic variation across a large portion of the genome. As detailed below, we identified hundreds of loci that bear signatures of selection in highland population samples, many of which can be plausibly linked to known physiological differences between high- and low-elevation deer mice. Second, the vast majority of the selection candidates we identified have not been reported in similar studies of other highland taxa, suggesting potentially novel candidate genes and physiological pathways for adaptation to high elevation. Finally, our geographic cline analysis revealed that most loci under selection show clinal patterns of allele frequency variation that are concordant with background population structure, as only a small subset of putatively adaptive loci are characterized by cline centers shifted significantly up- or downslope from the genome-wide average.

Genomic Signatures of Selection on Genes That Span Multiple Physiological Systems

Recent studies have uncovered many of the physiological traits involved in high-elevation adaptation in deer mice, in addition to some of their gene regulatory underpinnings (reviewed in Storz et al. 2019; Storz and Cheviron 2021). This suite of physiological adaptations spans multiple steps in the transport pathways for oxygen and metabolic substrates (fig. 1). Consistent with this pattern of multitrait adaptation, we found hundreds of loci bearing signatures of positive selection at high elevations, many of which are likely involved in functions that relate to processes that influence oxygen homeostasis and aerobic metabolism (fig. 1). For brevity, we highlight just a few of the most promising candidate genes that may relate to known physiological differences between highland and lowland deer mice below. A list of the top 35 two-way candidates is presented in table 2, and the full list of selection candidates is provided in the supplemental materials (supplementary tables S2, S4, and S6, Supplementary Material online). A key aspect of circulatory oxygen and metabolic fuel delivery is the regulation of blood flow via dynamic modifications of local blood pressure. Several recent studies have documented adaptive modifications of blood pressure regulation under hypoxia in highland deer mice. For example, highland mice exhibit reduced pulmonary hypertension under hypoxia compared with their lowland counterparts (Velotta et al. 2018); this lack of a global vasoconstrictive response likely contributes to their ability to achieve higher pulmonary O2 extraction (Tate et al. 2017) (fig. 1). Angiotensins, and their precursor angiotensinogen, are key regulators of blood pressure and fluid homeostasis (Wu et al. 2011). One of our most compelling two-way outliers is an angiotensin receptor (Agtr1b, angiotensin II receptor type 1) that mediates cardiovascular effects of angiotensin, including vasoconstriction (Bonnardeaux et al. 1994). Similarly, several other outliers are P2Y receptors—purinergic G protein-coupled receptors that also play a role in vasodilation (Burnstock and Ralevic 2013; Sluyter 2015). It is conceivable that allelic variation at these loci could contribute to the modifications of adaptive pulmonary function in highland deer mice. Highland deer mice also show a greater capacity for tissue oxygen extraction (Tate et al. 2020), in part because of evolved differences in skeletal muscle capillarity, fiber composition, and mitochondrial density and distribution (Lui et al. 2015; Scott et al. 2015; Mahalingam et al. 2017). One potential candidate gene that could contribute to these differences in muscle phenotype is Itga7 (integrin subunit alpha 7; two-way outlier), which may have a role in the formation of muscle fibers (Mayer et al. 1997). Other candidate genes seem to be related to these phenotypic differences as well. One example is Angpt1 (angiopoietin; two-way outlier), which plays an important role in vascular development and angiogenesis, as well as blood vessel maturation. Due to their effects on relevant muscle phenotypes, variants at these loci may contribute to known differences in tissue diffusion capacity between highlanders and lowlanders (Lui et al. 2015; Scott et al. 2015; Mahalingam et al. 2017; Nikel et al. 2018; Mahalingam et al. 2020). Finally, highland deer mice exhibit greater whole-animal aerobic performance under hypoxia that is associated with a number of measured phenotypes, including differential regulation of core metabolic pathways that contribute to lipid and carbohydrate metabolism and oxidative phosphorylation (Cheviron et al. 2012, 2013; 2014; Lau et al. 2017). Several genes that participate in these processes also bear signatures of selection in the highland population. One example is Ndufc1 (NADH: Ubiquinone Oxidoreductase Subunit C1), which encodes a subunit in the first enzyme complex of the electron transport chain in mitochondria. Previous work on deer mice has demonstrated an increased mitochondrial respiratory capacity in muscle, and other evolved changes in mitochondrial physiology of high-altitude populations (Mahalingam et al. 2017; fig. 1); variation at Ndufc1 and other genes in related pathways may contribute to these differences in mitochondrial function (supplementary tables S2, S4, and S6, Supplementary Material online). Importantly, although we have highlighted a few key genes related to several steps of the oxygen homeostasis and aerobic metabolism pathways, we have not demonstrated associations between putatively adaptive phenotypes and allelic variation at these loci. Many of the other candidate genes we have identified, but did not highlight, could also be involved in these complex physiological processes (supplementary tables S2, S4, and S6, Supplementary Material online), although it is also possible that our candidate gene lists contain false positives. Future work should aim to experimentally document phenotypic effects of mutations in some of the most compelling candidates, and to test for their effects on fitness (Barrett and Hoekstra 2011). Additionally, further efforts should focus on formal tests of polygenic adaptation by determining specific alleles associated with phenotypes of interest (e.g., through a genome-wide association study), and demonstrating that those alleles have population frequency differences that consistently increase or decrease (Berg and Coop 2014; Jeong et al. 2018).

Most Genomic Targets of Selection Are Unique to Deer Mice

Recent surveys in humans and domesticated animals have documented overlap in the genomic targets of selection among independent high-elevation populations, suggesting that adaptation to these environments may often involve repeated selection on a common set of genes (Witt and Huerta-Sanchez 2019; Storz and Cheviron 2021). Although a number of recent studies have documented selection on obvious candidate genes, such as Epas1, in independent lineages of wild vertebrates, these examples are often cherry-picked from a list of outliers specifically because they have been highlighted in other studies which may therefore give a biased view of the degree of convergence in selection targets at high elevation. In our study, only two genes—Epas1 and KCNMA1 (potassium calcium-activated channel subfamily M alpha 1)—that were detected as outliers in highland deer mice have been identified in more than three additional highland populations of other species. Epas1 encodes the oxygen sensitive subunit of hypoxia-inducible factor, a transcription factor that coordinates the transcriptional response to hypoxia, and was an outlier in 9 of the 14 studies representing 7 different species (Simonson et al. 2010; Yi et al. 2010; Li et al. 2014; Zhang et al. 2014; Song et al. 2016; Liu et al. 2019). KCNMA1 encodes the alpha-pore of calcium-sensitive potassium channels that influences vascular tone and blood flow by regulating K+ efflux in vascular smooth muscle cells (Brayden and Nelson 1992; Knaus et al. 1995), and was a target of selection in four different species in addition to deer mice (Zhang et al. 2014; Qu et al. 2015; Song et al. 2016; Jeong et al. 2018). These two examples aside, the general lack of overlap suggests a diversity of different mechanisms underlying high-elevation adaptation. However, we cannot rule out the possibility that the lack of overlap is, to some extent, due to the presence of false positives in our or other studies. Nonetheless, the unique selection targets identified here may provide novel insight into physiological mechanisms to surmount the challenges of high-elevation environments.

Spatial Patterns in Adaptive Genetic Variation Are Largely Consistent With Population Structure

Across environmental gradients, the relative strength of selection and gene flow shape spatial patterns of genetic variation (Slatkin 1987; Lenormand 2002). When many loci are subject to spatially varying selection, variation in local recombination rates may lead to variable levels of gene flow across loci (Endler 1977). This process can result in allele frequency clines for selected loci that are discordant with background population structure, though this need not always be the case (Yeaman and Whitlock 2011; Lenormand 2002). Our analysis revealed that the majority of putatively selected loci had cline centers (70.1%) and cline widths (75.8%) that fell within the 95% CI of genome-wide estimates from nongenic regions. This result suggests that, for the most part, loci that have experienced a history of altitude-related selection do not exhibit patterns of spatial structure that are distinct from background population structure. Not all cline shapes were concordant with population structure, however, suggesting that locus-specific levels of gene flow may structure allelic variation at such loci differently than background neutral genetic variation. This finding supports previous results from simulation studies suggesting that spatially varying selection can structure groups of locally adapted alleles over large geographic distances, even among populations that are connected by high rates of gene flow (Yeaman and Whitlock 2011). Often, the loci with cline shapes that deviated from background population structure were not enriched for specific functions, suggesting that this class of loci participate in a broad range of biological functions. The one exception is for genes with clines that are centered upslope of the nongenic PC1 cline. This list of genes showed an enrichment of functions related to catecholamine secretion and odor perception. Catecholamine synthesis and secretion by the adrenal gland is responsive to environmental hypoxia, and catecholamines can affect many physiological processes that impinge on oxygen delivery and consumption, including heart rate and vasoconstrictive responses (Brown et al. 2009). High- and low-elevation deer mice differ in their catecholamine response to hypoxia (Scott et al. 2019), and it is possible that these loci may contribute to this physiological difference. For example, previous work has shown that Epas1, a two-way outlier, is a target of selection in deer mice and is associated with variation in heart rate under hypoxia (Schweizer et al. 2019). Allelic variation in Epas1 causes a differential regulation of the catecholamine biosynthesis pathway (Schweizer et al. 2019), and the specific mutation under selection at high elevation disrupts interaction with a transcriptional coactivator, thus providing a possible mechanistic explanation (Song et al. 2021). We identified several additional promising candidate genes, such as P2RY1, DRD2, and P2RY12, that are related to catecholamine regulation under hypoxia (fig. 5). Studies of the phenotypic effects of allelic variation at these loci would be a fruitful area for future work.
Fig. 5.

Maximum-likelihood allele frequency clines for catecholamine genes located upslope of the nongenic PC1 cline. For the nongenic cline, the y-axis shows the PC1 values, whereas for the DRD2, P2RY1, and P2RY12 clines, the y-axis represents the frequency of the nonreference allele. Shaded regions show the 95% CIs.

Maximum-likelihood allele frequency clines for catecholamine genes located upslope of the nongenic PC1 cline. For the nongenic cline, the y-axis shows the PC1 values, whereas for the DRD2, P2RY1, and P2RY12 clines, the y-axis represents the frequency of the nonreference allele. Shaded regions show the 95% CIs. No other grouping of loci with discordant clines exhibited clear gene ontology enrichment. Loci with clines that were centered downslope, as well as those with widths that were significantly narrower or greater than the PC1 cline, were not enriched for terms that were obviously related to abiotic selective pressures along elevational gradients. We note that functional enrichments such as those presented above are post hoc tests that generate hypotheses to be addressed in future work, rather than formal tests of previously specified hypotheses.

Conclusions

Our results demonstrate that hundreds of genes have experienced a history of spatially varying selection at high elevation in deer mice, and many of these loci participate in physiological processes that underlie known phenotypic differences between highland and lowland populations. The vast majority of selection targets we identified have not been reported in similar studies of other highland terrestrial vertebrates. Although convergence in phenotypic traits is relatively common in high-altitude vertebrates (reviewed in Storz and Scott 2019), the general lack of overlap among selection targets between deer mice and other highland species suggests that the genetic underpinnings of this phenotypic convergence may be more idiosyncratic. Finally, our results also show that, at least for deer mice along this elevational gradient, adaptive and neutral genetic variation tend to be structured similarly across the landscape. If this is a general outcome, it may have important implications not only for our understanding of the process of local adaptation, but also for more directed applications in conservation genetics, such as assisted migration and genetic rescue.

Materials and Methods

Sampling Scheme

Tissue samples of deer mice were collected from a variety of sources. In the field, we live-trapped deer mice using baited Sherman traps. We also used liver samples from euthanized mice, and blood or tail clip samples from mice that were part of a mark-recapture study (e.g., Wilde et al. 2019), in which individuals were released after capture. To augment our sample sizes for some sites, we used tissue samples from previously collected museum specimens that are cataloged in the mammal collections of the Denver Museum of Nature and Science, the University of Arizona Museum of Natural History Museum, the Museum of Vertebrate Zoology, and the Museum of Southwestern Biology (see Natarajan et al. 2015).

Exome Design, Capture, and High-Throughput Sequencing

To identify all annotated exons, we downloaded the Peromyscus maniculatus bairdii general feature file (GFF) v101 from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Peromyscus_maniculatus_bairdii/GFF/), and extracted all features annotated as an exon. The final set of unique, nonpseudogenized exonic regions consisted of 218,065 exons in 25,246 genes. We designed 500 bp nongenic regions to be located at least 10 kb from any annotated gene, outside repetitive DNA regions, containing GC content within one standard deviation of the mean GC content, and located on contigs larger than 20 kb (Wall et al. 2008; Schweizer et al. 2016). Five thousand autosomal regions were randomly chosen that satisfied these criteria. In total, a custom Roche NimbleGen SeqCap EZ Library captured 226,973 regions (77,559,614 bp). We used 256 mouse specimens for the analysis of exomic variation (table 1), 100 of which were used in a previous study (NCBI Short Read Archive PRJNA528923) (Schweizer et al. 2019). We extracted DNA from tissue samples of 156 deer mice (supplementary table S1, Supplementary Material online) using a Qiagen DNeasy kit and sheared DNA to ∼300 bp using a Covaris E220 Focused Ultrasonicator. A genomic library for each individual was prepared using 200 ng of sheared DNA with a NEBNext Ultra II kit and unique index following the manufacturer’s protocols (New England Biolabs). Batches of 24 indexed libraries were pooled, then target enriched and PCR amplified according to the NimbleGen Seq Cap EZ protocol (Roche). Quality control for each capture pool included a check of its size distribution on an Agilent TapeStation, as well as a check for enrichment of targeted regions and lack of enrichment of nontargeted regions using custom primers and the Luna qPCR master mix (NEB). Each capture pool of 24 individuals was sequenced with 100 bp paired-end sequencing on an Illumina HiSeq 4000. All 256 individuals were processed concurrently. Data preprocessing and variant discovery on all samples followed the recommendations of the Broad Institute GATK v3.7-0-gcfedb67 Best Practices pipeline (https://software.broadinstitute.org/gatk/best-practices/workflow; last accessed June 2, 2021) and followed our previously published methods (Schweizer et al. 2019). Briefly, we trimmed sequence reads of adapter sequences and bases with quality below 20 using fastq_illumina_filter 0.1 (http://cancan.cshl.edu/labmembers/gordon/fastq_illumina_filter/; last accessed November 1 2018) and trim_galore 0.3.1 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/; last accessed June 2, 2021). Forward and reverse reads were aligned and mapped to the P. maniculatus baiardii genome using bwa mem (Li and Durbin 2010), then duplicates were removed using samtools rmdup (Li et al. 2009). After two rounds of GATK Base Quality Score Recalibration, we genotyped each sample using GATK HaplotypeCaller with the “–emitRefConfidence” flag, then called variants using GATK GenotypeGVCFs. GVCFs were combined and filtered to remove SNPs with a quality of depth <2.0, an FS >60, mapping quality <40, mapping quality rank sum <−12.5, and read position rank sum <−8.0. This process identified 106,883,914 variable sites in at least 1 of our 256 individuals. However, three sampled individuals were dropped from further analysis due to high levels of missing data (>50% sites with missing genotype calls). After assessing the quality of filtered reads using the vcftools package (Danecek et al. 2011), we further filtered variants so that a site was called in at least 75% of individuals, was biallelic, and had a minimum depth of 5 and genotype quality of 20. To focus our efforts on the Rocky Mountains-Great Plains transect, we removed the geographically disparate Merced samples and used an LD-pruned set of nongenic SNPs (using the “–indep-pairwise 50 5 0.5” flag in PLINK; Purcell et al. 2007). These nongenic SNPs are best suited for assessing neutral population processes. We also identified a subset of unrelated individuals using PRIMUS (Staples et al. 2013) and a maximum identity-by-descent of 0.1875, as recommended in Anderson et al. (2010). The method implemented in the software conStruct is well-suited to our sampling design and the geographic distribution of deer mice, where there is a high likelihood of isolation-by-distance and the sampling is discontinuous (Bradburd et al. 2018). We ran conStruct on two data sets: All unrelated samples (N = 168), and just those east of the Rocky Mountains (i.e., excluding Bonny Reservoir, Ft. Larned, and Lincoln; N = 117). We evaluated both the spatial and nonspatial models with the number of discrete populations (K) varying between 1 and 5. For each model, we ran 2 replicate analyses, each for 5,000 iterations. The performance of the MCMC was assessed by comparison between replicate runs and visual inspection of marginal parameter estimate trace plots. We compared models using the “layer comparisons” approach outlined in Bradburd et al. (2018).

Exome Scan for Selection

To calculate PBS, for each population pair we used vcftools to calculate pairwise FST in 5-kb windows with a step size of 2.5 kb (following Jones et al. 2018) and specified vcftools to only use sequenced sites for those calculations. We then transformed FST and calculated PBS for each window following Yi et al. (2010). We calculated PBS for 105,571 overlapping 5 kb windows from the exomes of 48 Mt Evans, 37 Lincoln, and 15 Merced mice, then, as a demographic control, simulated the distribution of FST under our previously characterized demographic history of these same populations (Schweizer et al. 2019). Using point estimates from our previously characterized demographic history of the Mt Evans, Lincoln, and Merced mice (Schweizer et al. 2019), we used msms (Ewing and Hermisson 2010) to simulate the distributions of FST for 17,000 5-kb windows. We calculated PBS values for all simulated windows, then used the 99.9% quantile of the simulated distribution to set the significance threshold for the empirical data using the ecdf() function in R. For each 5 kb window, we identified which genes overlapped that window using the bedtools intersect function and a bed file of targeted exonic regions.

Multivariate Genotype−Environment Associations

RDA characterizes a response matrix (here, genotypes) in relation to an explanatory matrix (here, environmental data) using multivariate linear regressions, followed by a PCA to produce canonical axes (Van Den Wollenberg 1977; Legendre et al. 2011). We implemented RDA in our set of mice sampled across the entire transect (N = 165 unrelated individuals) following the recommendations of (Forester et al. 2018). Three of the 168 individuals were not included in RDA because high missingness might bias the genotype imputation done for that analysis. RDA shows low false positive and high true positive rates under a variety of selection and demographic scenarios (Forester et al. 2018). Briefly, we obtained population-level environmental data for precipitation and temperature from BIOCLIM (Hijmans et al. 2005) using the getData function within the “raster” package (https://rspatial.org/raster/; last accessed June 2, 2021). We also obtained estimates of snowpack (measured as daily mean snow water equivalent from 1915 to 2011) from Livneh data provided by the NOAA/OAR/ESRL PSL, Boulder, Colorado, USA at their web site at https://psl.noaa.gov/ (last accessed June 2, 2021) (Livneh et al. 2013). We initially chose 11 environmental variables summarizing precipitation, temperature, and snowpack, then removed those variables with a Pearson correlation greater than |0.70|, as recommended by Dormann et al. (2013), while prioritizing the retention of elevation as a variable. Given that many environmental variables are highly correlated with elevation (supplementary fig. S9, Supplementary Material online), we subsequently chose a subset of two variables (elevation and annual precipitation) for further analysis. With this approach we aimed to identify the variable (or correlated variables) that caused the observed spatial patterning of allele frequencies. Due to computational limitations within RDA, we subsampled our genotype data by further pruning for linkage (r2 ≤ 0.75) and missingness (≤1%) in PLINK, resulting in a set of 1,109,794 sites. Given that RDA requires no missing data and removing sites with any missing data would have resulted in the loss of 483,261 (43.5%) sites, we imputed missing genotypes for our data set (n = 483,261 or 0.26% of all data) using the most common genotype across all individuals (Forester et al. 2018). After running RDA, we identified significant constrained axes with a P-value of <0.05 after 999 permutations of the genotypic data. We identified candidate SNPs as those with a loading greater than 3 standard deviations of the mean and characterized each candidate SNP by the environmental variable with which it had the highest correlation.

Detection of Selective Sweeps

To identify loci that bear signatures of selective sweeps in the Mt Evans population, we identified 5 kb windows with a reduced nucleotide diversity (π) both in Mt Evans relative to Lincoln (by calculating delta pi (Δπ), or πMt Evans−πLincoln), and in Mt Evans relative to Merced (by calculating Δπ as πMt Evans−πMerced). With this approach we attempted to mirror the polarized calculations of the PBS. As with PBS, we simulated nucleotide diversity for 100,000 5 kb windows under our demographic model. Then, we used a custom Python script to calculate empirical nucleotide diversity in overlapping 5 kb windows and a demographically controlled threshold for significance of 99.9% (see Supplemental Materials).

GO Enrichment

We performed functional enrichment analysis to test for significantly enriched gene sets and functional categories of genes within our two-way and three-way outlier sets that may reflect a history of altitude-related selection. For each P. maniculatus gene, we identified the orthologous Mus musculus gene, then used gProfiler (Reimand et al. 2007, 2011, 2016) to analyze enrichment for GO, biological pathways such as Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto 2000), and Reactome (Jassal et al. 2019), regulatory motifs in DNA, protein databases, and human phenotype ontology (Reimand et al. 2016). We used strong hierarchical filtering (returning only the most general term per parent term) to identify enriched gene functional categories below a false discovery rate corrected significance of P < 0.05.

Clinal Patterns of Variation for Candidate Genes

HZAR fits genetic data to equilibrium cline models using an MCMC algorithm and estimates parameters such as the cline center (c) and width (w); c and w characterize the geographic location along the transect where the allele frequency turnover is greatest and the geographic region corresponding to the inverse of the maximum cline slope, respectively. The values of these parameters can be estimated within HZAR by 15 models that vary in the number of estimated parameters (e.g., exponential decay on either side of the cline center, minimum and/or maximum allele frequencies). For each of our two-way candidate loci, we identified the highest ranked SNP (first by occurrence in an outlier PBS window and then ranked by maximum RDA correlation) with a ≥75% call rate amongst our 165 unrelated individuals from the Rocky Mountain and Great Plains populations, then calculated that SNP’s allele sample frequency for each population sampled across our elevational transect. For each of those SNPs, we modeled the cline shape parameters using HZAR and determined the cline center and cline width. We used sampling elevation in meters as a proxy for geographic distance and used a burn-in of 10,000 iterations. To set a neutral background expectation for clinal patterns of allele frequency variation, we performed a PCA on the set of 282,617 LD-pruned nongenic SNPs within PLINK. There is a clear pattern of geographic structure on both the first (PC1; east/west) and second principal component axes (PC2; north/south). Therefore, we used PC1 values for each individual to fit clines in HZAR, as has recently been done elsewhere (Hague et al. 2020). We used similar parameters as when generating the allele frequency clines, with appropriate modifications for trait data (e.g., avoiding models that set scaling to a fixed minimum and maximum of 0 and 1, respectively). Because principal components, especially those generated on geographically structured data, can be shaped by mathematical artifacts that can make interpretation unintuitive (Novembre and Stephens 2008), we visually confirmed that the cline fit to PC1 is representative of the clines fit to 20 SNPs with the highest loadings on PC1 (supplementary fig. S10, Supplementary Material online). We identified statistically discordant clines as those SNPs whose cline center or cline width CIs do not overlap with the CIs of the neutral PC1 cline.

Degree of Candidate Gene Overlap with Other High-Elevation Studies

To determine the degree of overlap in the genomic targets of selection across other similar studies, we performed a post hoc review of 14 studies in nine different species of terrestrial vertebrates (supplementary table S9, Supplementary Material online). Prior to spring 2020 (our last search date), there were to our knowledge 56 studies of high-altitude adaptation; we subsequently eliminated studies that were not population genomic comparisons, did not publish a complete list of outlier genes, did not use a comparable allele-frequency based test of selection, did not sample a highland population at a high enough elevation to be comparable, and/or focused on an ectotherm.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.
  78 in total

1.  Effects of chronic hypoxia on diaphragm function in deer mice native to high altitude.

Authors:  N J Dawson; S A Lyons; D A Henry; G R Scott
Journal:  Acta Physiol (Oxf)       Date:  2018-02-01       Impact factor: 6.311

2.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

3.  Interpreting principal component analyses of spatial population genetic variation.

Authors:  John Novembre; Matthew Stephens
Journal:  Nat Genet       Date:  2008-04-20       Impact factor: 38.330

4.  Maladaptive phenotypic plasticity in cardiac muscle growth is suppressed in high-altitude deer mice.

Authors:  Jonathan P Velotta; Catherine M Ivy; Cole J Wolf; Graham R Scott; Zachary A Cheviron
Journal:  Evolution       Date:  2018-11-01       Impact factor: 3.694

5.  Geographic cline analysis as a tool for studying genome-wide variation: a case study of pollinator-mediated divergence in a monkeyflower.

Authors:  Sean Stankowski; James M Sobel; Matthew A Streisfeld
Journal:  Mol Ecol       Date:  2016-06-10       Impact factor: 6.185

6.  Gene flow and the geographic structure of natural populations.

Authors:  M Slatkin
Journal:  Science       Date:  1987-05-15       Impact factor: 47.728

7.  Physiological Genomics of Adaptation to High-Altitude Hypoxia.

Authors:  Jay F Storz; Zachary A Cheviron
Journal:  Annu Rev Anim Biosci       Date:  2020-11-23       Impact factor: 8.923

8.  Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis.

Authors:  Jeffrey Staples; Deborah A Nickerson; Jennifer E Below
Journal:  Genet Epidemiol       Date:  2012-09-19       Impact factor: 2.135

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.