Literature DB >> 35202502

Whole-genome sequencing illuminates multifaceted targets of selection to humic substances in Eurasian perch.

Mikhail Ozerov1,2,3, Kristina Noreikiene4, Siim Kahar4, Magnus Huss5, Ari Huusko6, Toomas Kõiv7, Margot Sepp7, María-Eugenia López1, Anna Gårdmark5, Riho Gross4, Anti Vasemägi1,4.   

Abstract

Extreme environments are inhospitable to the majority of species, but some organisms are able to survive in such hostile conditions due to evolutionary adaptations. For example, modern bony fishes have colonized various aquatic environments, including perpetually dark, hypoxic, hypersaline and toxic habitats. Eurasian perch (Perca fluviatilis) is among the few fish species of northern latitudes that is able to live in very acidic humic lakes. Such lakes represent almost "nocturnal" environments; they contain high levels of dissolved organic matter, which in addition to creating a challenging visual environment, also affects a large number of other habitat parameters and biotic interactions. To reveal the genomic targets of humic-associated selection, we performed whole-genome sequencing of perch originating from 16 humic and 16 clear-water lakes in northern Europe. We identified over 800,000 single nucleotide polymorphisms, of which >10,000 were identified as potential candidates under selection (associated with >3000 genes) using multiple outlier approaches. Our findings suggest that adaptation to the humic environment may involve hundreds of regions scattered across the genome. Putative signals of adaptation were detected in genes and gene families with diverse functions, including organism development and ion transportation. The observed excess of variants under selection in regulatory regions highlights the importance of adaptive evolution via regulatory elements, rather than via protein sequence modification. Our study demonstrates the power of whole-genome analysis to illuminate the multifaceted nature of humic adaptation and provides the foundation for further investigation of causal mutations underlying phenotypic traits of ecological and evolutionary importance.
© 2022 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.

Entities:  

Keywords:  Candidate gene; DOC; SNP; fish; humic adaptation; water colour

Mesh:

Substances:

Year:  2022        PMID: 35202502      PMCID: PMC9314028          DOI: 10.1111/mec.16409

Source DB:  PubMed          Journal:  Mol Ecol        ISSN: 0962-1083            Impact factor:   6.622


INTRODUCTION

Most organisms cannot live in extreme environments, but some are able to thrive in such challenging habitats due to evolutionary changes driven by natural selection. This has fuelled a long‐standing interest in how organisms cope with and adapt to extreme conditions (Riesch et al., 2015). Modern bony fishes belong to the most species‐rich clade of vertebrates, consisting of more than 30,000 described species (Nelson, 2006), many of which have colonized the most extreme types of aquatic environments, including caves and deserts, and deep‐sea, high‐altitude, hypoxic, temporary, hypersaline and toxic habitats. In recent years, fish living in extreme habitats have become emerging model species for studying adaptation and the predictability of evolution (Riesch et al., 2014; Tobler & Plath, 2011). For example, genome‐wide studies are starting to reveal the molecular mechanisms of adaptation linked to various abiotic factors, such as salinity (Dalongeville et al., 2018; Garcia‐Elfring et al., 2021), light (Marques et al., 2017), toxins (Pfenninger et al., 2014; Reid et al., 2016), and acidic (Haenel et al., 2019) and alkaline environments (Tong & Li, 2020; Xu et al., 2017). Over the last century we have also witnessed the expansion of hypoxic zones coinciding with eutrophication and extreme heat waves (Altieri & Gedan, 2015; Breitburg et al., 2018), as well as acidification (Hannan & Rummer, 2018; Tagliarolo, 2019) and increase of coloration (e.g., brownification) of lakes and rivers in the northern hemisphere (Creed et al., 2018; Evans et al., 2005; Forsberg, 1992; Kritzberg et al., 2020; Roulet & Moore, 2006; Solomon et al., 2015; Vuorenmaa et al., 2006). However, the evolutionary consequences of these and other rapid human‐induced environmental changes are currently not well understood (Grummer et al., 2019). Organic matter is a complex mixture of organic molecules originating from the decay of plant and animal debris. Most aquatic organic matter occurs in the dissolved form. Humic and fulvic acids account for the majority of dissolved organic matter in many lake ecosystems (McKnight & Aiken, 1998), and their presence in water is observable as a yellowish or brownish coloration (Bricaud et al., 1981). Due to the heterogeneity of dissolved organic matter, it is usually quantified as dissolved organic carbon (DOC; Wood et al., 2011). DOC plays a significant role in freshwater ecosystems by driving the carbon and energy cycle, and also contributes significantly to greenhouse gas emissions (Battin et al., 2009; Sobek et al., 2006; Tranvik et al., 2018). Humic (also known as dystrophic) lakes containing high levels of DOC are extreme, almost “nocturnal” visual environments—both down‐welling short‐wavelength and almost all up‐ or side‐welling light is absorbed, while only long‐wavelength red light is able to partially penetrate the water column (Eloranta, 1978; Jones, 1992). A striking example of the evolutionary adaptation to long‐wavelength visual environments has been described in three‐spined stickleback (Gasterosteus aculeatus), where nonsynonymous mutations observed in the SWS2 opsin gene have caused a red‐shift in light absorption of the visual photo pigment (Marques et al., 2017). Adaptation to extreme visual environments may also include adjustments at the gene expression level, and/or gene duplications (Carleton et al., 2020; Musilova et al., 2019). In addition to the visual environment, DOC affects a myriad of other habitat parameters and biotic interactions. Importantly, a high level of DOC contributes to acidification in low‐alkalinity and weakly buffered waters (Hope et al., 1996; Sobek et al., 2003). Therefore, humic lakes generally have low pH levels (Arvola et al., 2010; Erlandsson et al., 2010; Keskitalo & Eloranta, 1999). Freshwater fish living in acidic water must constantly import ions from their food and environment to compensate for diffusive ion losses, primarily from the gills (Dymowska et al., 2012). Thus, it is probable that low pH represents a strong selective force and physiological challenge for teleosts in humic lakes, especially during early life stages (Parra & Baldisserotto, 2007; Rask, 1984). On the other hand, DOC may also partially protect against acidic water by altering gill membrane permeability and stimulating ion uptake (Wood et al., 2011). Acidification and increased DOC can, in turn, affect ion composition in humic lakes, where the concentration of essential ions, including calcium, is very low (Weyhenmeyer et al., 2019). Calcium plays an important role in many cellular functions, and is a critical component of body structures, such as bones, carapaces, scales and shells (Greenaway, 1985). Moreover, humic lakes usually exhibit steep thermal and oxygen stratification, leading to hypoxia in deeper areas (Bastviken et al., 2004; Kankaala et al., 2006). Hence, increased DOC can also limit whole‐lake primary production and underwater higher vegetation, supporting the microbial loop and influencing resource availability and quality for consumers (Ask et al., 2009; Cole, 2009; Karlsson et al., 2009). Thus, DOC influences whole lake food‐webs from primary producers to top predators and parasites (Grether et al., 2001; Tobler & Path, 2011; Kaeuffer et al., 2012; Noreikiene et al., 2020). Total biomass production, body growthand mean body size of Eurasian perch (Perca fluviatilis) have been shown to differ between humic and nonhumic lakes (van Dorst et al., 2019). Therefore, selective agents in humic lakes are probably multifaceted and may include multiple factors, such as biotic, abiotic, and interactions within and between them. Yet, the evolutionary responses to extreme environments have been traditionally studied by focusing on a single key abiotic factor (e.g. Garcia‐Elfring et al., 2021; Haenel et al., 2019; Marques et al., 2017; Reid et al., 2016; Xu et al., 2017). Alternatively, instead of choosing a few a priori defined traits and genes, a more comprehensive view of the main mechanisms, molecular targets and selective agents could be achieved by characterizing the signatures of divergent selection based on analyses of whole genomes (Jones et al., 2012; Miller et al., 2019). Eurasian perch is among the few fish species of northern latitudes able to live in acidic and humic lakes, and is also widely distributed from the Iberian Peninsula to the River Kolyma in Siberia (Collette & Bănărescu, 1977). Perch are abundant in both humic and clear‐water lakes, and represent an important component in freshwater food‐webs; it acts as a key predator, and is also an important prey species for birds and other fishes (Diehl, 1992). Recent work has shown that perch in humic lakes play a significant role in regulating bacterial abundance via feeding on zooplankton, which affects carbon cycling via methane efflux (Devlin et al., 2015). However, the evolutionary mechanisms and molecular processes that allow perch to inhabit these extreme conditions remain unknown. Thus, dissecting the molecular mechanisms of humic adaptation will not only help to obtain an unbiased view of the molecular targets and main mechanisms of adaptation, but may also shed light on the evolutionary changes related to brownification (Creed et al., 2018), with implications for overall food web dynamics and greenhouse gas (methane) emission rates from lakes through trophic cascades. Given that humic lakes are very common around the world (Wetzel, 2001) and up to one third of known freshwater fish species are associated with tropical peat swamps containing high levels of DOC (Ng et al., 1994; Posa et al., 2011), understanding the nature of humic adaptation has a broad significance across species and geographical regions. In this study, we characterize shared genomic signatures of adaptation to extreme humic environments using whole‐genome sequencing of perch sampled from 16 humic and 16 clear‐water lakes in northern Europe. We predicted that this adaptive process may involve both well‐characterized genes with known function (e.g., visual performance, maintenance of ionic balance), as well as novel targets. By using over 800,000 high‐quality single nucleotide polymorphisms (SNPs) and applying genome‐wide divergence and environmental association approaches, we aimed to ascertain whether adaptation occurs by major shifts of allele frequencies in a few key genes (e.g., visual opsins and ion transporter genes), or if footprints of selection are scattered across the genome and include many regions and genes with diverse functions. In addition, we tested if signatures of selection are randomly distributed across the genome or if they are concentrated within or near coding and regulatory regions. Finally, we evaluated if genes putatively under divergent selection between humic and clear‐water environments are associated with specific biological processes and molecular functions using Gene Ontology (GO) analysis.

MATERIAL AND METHODS

Biological samples

In total, 32 perch individuals were sampled from 16 humic and 16 clear‐water lakes in northern Europe—a single specimen per lake, as in Jones et al. (2012) (Figure 1a; Table S1). The selection of lakes for the analysis was based on drastic differences in water colour (medianHUMIC = 212.5; range = 95.0–765.0 vs. medianCLEAR = 20.0; range = 7.5–60.0, nonparametric Mann–Whitney test p < .001, Figure S1) and geographical proximity of different type of lakes to increase the power of detecting loci under selection and to minimize the effect of population structuring (De Mita et al., 2013; Hoban et al., 2016; Lotterhos & Whitlock, 2015). Most fish were caught using a gill‐net, beach seine or rod (Table S1). Egg ribbons were collected from two Swedish lakes (Nedre Björntjärnen and Snottertjärn) and transported to laboratory facilities (Institute of Freshwater Research, Drottningholm, Sweden). After hatching, fry were killed using a benzocaine overdose and stored in 96% ethanol without measuring fork length or body mass or determining sex. The fish from the remaining lakes were killed with a sharp blow to the head. Thereafter, fork or total length (to the nearest mm) and wet body mass (to the nearest g) were measured, sex was determined by visual examination of the gonads, and a tissue sample (pelvic fin) was placed in 96% ethanol for DNA extraction.
FIGURE 1

(a) Map indicating sampling locations. (b) Dot plot showing the relationships between DOC and water colour values (ln‐transformed) in each lake. Regression is shown as a black solid line; 95% confidence interval is shaded in grey. Photographs illustrate (c) humic and (d) clear‐water habitat. Black‐ and white‐filled points represent humic and clear‐water populations, respectively. The country of origin is depicted by different shapes as shown in the key. Population ID numbers follow those given in Table S1

(a) Map indicating sampling locations. (b) Dot plot showing the relationships between DOC and water colour values (ln‐transformed) in each lake. Regression is shown as a black solid line; 95% confidence interval is shaded in grey. Photographs illustrate (c) humic and (d) clear‐water habitat. Black‐ and white‐filled points represent humic and clear‐water populations, respectively. The country of origin is depicted by different shapes as shown in the key. Population ID numbers follow those given in Table S1 The requirements outlined in Annex III (Requirements for establishments and for the care and accommodation of animals) and Annex IV (Methods of killing animals) Section B point 11 of the “Directive 2010/63/EU of the European parliament and of the council of 22 September 2010 on the protection of animals used for scientific purpose” were fully met. The authors have followed the principles of the 3Rs (replacement, reduction and refinement) and have involved the minimum number of animals to produce statistically reproducible results. The collection of samples was conducted in accordance with national legislation based on permits issued by the Estonian Ministry of Environment (54/2016) and the regional ethical review board in Uppsala, Sweden (Dnr 5.8.18‐03449/2017). Sampling in Finland and Lithuania was carried out using recreational fishing gear (rod and line), following the fishing rules set by the national legislations, and ethical issues related to recreational fishing and handling of caught fish recommended by the recreational fishing federations of the countries.

Environmental data

Lake water was collected at the time of fish sampling for chemical analysis. Two measures reflecting the amount of dissolved organic matter in the water and the visual environment in lakes–DOC (mg L–1) and water colour (mg Pt L–1)–were analysed as described in Sepp et al. (2019). The latter parameter is also influenced by iron concentrations in the water (Lei et al., 2020; Maloney et al., 2005; Weyhenmeyer et al., 2014). Clear‐water lakes have been oligotrophic, alcalitrophic or developed to mesotrophic lakes since melting of the last ice sheet. The humic lakes, on the other hand, formed after the last glaciation when organic matter (carbon) from terrestrial (allochthonous) origin entered lakes from surrounding soils and vegetation (Tranvik, 2009). The increase of DOC in lakes, however, has not been linear over time since these changes are ultimately climate‐driven, reflecting both climate and lake‐catchment coupling processes (e.g., Rantala et al., 2015). Lake surface area and lake shoreline length were manually estimated using satellite photos on Google maps (https://www.google.com/maps/; Table S1) to estimate the size of the lakes. Both DOC and water colour values were log‐normalized prior to environmental association analyses. DOC and water colour were highly correlated (Pearson's r = .96, p < .001, Figure 1b), and both showed significant differences between humic and clear‐water lakes (nonparametric Mann–Whitney test, both p < .001; Figure S1). Neither lake surface area nor shoreline distance differed between humic and clear‐water lakes (Figure S1).

DNA extraction and sequencing

Total genomic DNA (gDNA) was extracted from the tissue samples using a NucleoSpin Tissue kit (Macherey‐Nagel) according to the manufacturer's protocol. The quality of the total gDNA was assessed using a Fragment Analyzer (Advanced Analytical), and the concentration was measured using Qubit fluorometric quantification (Life Technologies). Sequencing libraries (average insert size 350 bp) were constructed from 1 µg of gDNA for each individual according to the Illumina TruSeq DNA PCR‐Free Library Preparation Guide (part #15036187). A unique Illumina TruSeq indexing adapter was ligated to each gDNA sample. The samples were normalized and pooled for automated cluster preparation using an Illumina cBot station. Sixteen libraries (each corresponding to one perch sample from Estonia) were pooled and sequenced in eight lanes on an Illumina HiSeq 3000 using paired‐end sequencing (2 × 150‐bp read length with 8‐bp index), and another 16 libraries (each corresponding to one perch sample from Finland, Lithuania and Sweden) were pooled and sequenced in four lanes on an Illumina NovaSeq 6000 using paired‐end sequencing (2 × 150‐bp read length with 8‐bp index).

SNP calling and filtering

Read quality was assessed using fastqc version 0.11.8 (Andrews, 2017). Illumina adapters, as well as short (<60 bp) and low‐quality reads (average quality score <25) were trimmed using trimmomatic version 0.36 (Bolger et al., 2014; CROP:149 HEADCROP:9 SLIDINGWINDOW:5:25 MINLEN:60) for the 16 samples sequenced on the HiSeq 3000 instrument, or fastp version 0.20 (Chen et al., 2018) for the 16 samples sequenced on the NovaSeq 6000 instrument, with the same parameters (‐g ‐w 12 ‐r ‐W 5 ‐M 25 ‐‐trim_front1 9 ‐‐trim_front2 9 ‐‐trim_tail1 2 ‐‐trim_tail2 2 ‐l 60) due to the excess of polyG tails in the latter (Arora et al., 2019; De‐Kayne et al., 2021). Filtered sequence reads of each individual were mapped to the Eurasian perch reference genome (NCBI: GCA_010015445.1) using bowtie2 version 2.3.5.1 (Langmead et al., 2009), applying default parameters except the modified score minimum threshold (‐‐score‐min L,‐0.3,‐0.3) and maximum fragment length for valid paired‐end alignments (‐X 700). Mean coverage (d) across all 32 perch individuals was 29.9, and varied from 24.0 to 40.6. Raw SNPs were called using two alternative pipelines. First, samtools version 1.10 (Li, 2011) was applied on the locally realigned and sorted BAM files (samtools mpileup ‐uIg ‐t DP,AD,INFO/AD,ADF,ADR,SP ‐q 20) with the following variant calling using bcftools version 1.8 (Li, 2011). Second, the HaplotypeCaller subroutine from gatk version 4.1.4.1 (McKenna et al., 2010) with similar parameters was applied to call variants using the same BAM files (‐ERC GVCF ‐‐minimum‐mapping‐quality 20 ‐mbq 13 ‐‐indel‐size‐to‐eliminate‐in‐ref‐model 12 ‐G AS_StandardAnnotation ‐G StandardAnnotation) with the following import of single‐sample GVCF files into GenomicsDB using GenomicsDBImport, and final calling of consensus genotypes with GenotypeGVCFs. Further quality filtering of raw variants generated by the two pipelines was performed using vcftools version 0.1.15 (Danecek et al., 2011) retaining only the variants that met the following criteria: (i) minimum and maximum mean sequencing depth (d) was set to 10 (as a minimum recommended by Illumina) and 66 (maximum depth = d max + 4√d max; Li, 2014), respectively; (ii) the consensus quality was ≥30; (iii) a variant had at least two copies of an allele; (iv) no missing data were allowed; (v) only bi‐allelic sites were included; and (vi) a variant did not occur in repetitive genomic regions (‐‐max‐meanDP 66 ‐‐min‐meanDP 10 ‐‐max‐missing 1 ‐‐mac 2 ‐‐min‐alleles 2 ‐‐max‐alleles 2 ‐‐minQ 30 ‐‐exclude‐bed Pfluv_repeats.bed). In total, 1,078,577 and 1,074,879 SNPs were retained after applying the samtools‐bcftools and gatk pipelines, respectively. To ensure the quality and reliability of the data set, only the variants consistently called by both pipelines were retained (1,025,544 SNPs). To further exclude false heterozygous calls and regions with an excess of variable sites (such as multilocus contigs, paralogues, repetitive elements and otherwise superficially similar sequences; Hohenlohe et al., 2011; Willis et al., 2017; O'Leary et al., 2018), the additional quality control filtering was applied: number of heterozygotes per SNP ≤ 16 and MAF ≥ 0.05, which resulted in 810,591 SNP loci in the final data set. The identified SNPs were annotated using snpeff version 5.0 (Cingolani et al., 2012). The SnpEff database was generated using the Eurasian perch reference genome sequence and annotation (NCBI: GCA_010015445.1).

Population diversity and structure

Observed heterozygosity per individual was calculated using vcftools version 0.1.15 (Danecek et al., 2011). The difference in observed heterozygosity between humic and clear‐water lakes was tested using a nonparametric Mann–Whitney test (wilcox.test function in the basic R‐package stats). The same package was used to examine linear relationships of observed heterozygosity and DOC, water colour, lake surface area and lake shoreline length with the cor.test function. The R package adegenet version 2.1.3 (Jombart, 2008; Jombart & Ahmed, 2011) was used to convert SNP data into a genind object. Overall population genetic structure was examined by applying principal component analysis (PCA) using the dudi.pca function of the ade4 version 1.7–16 R‐package (Dray & Dufour, 2007) on three sets of SNP loci: (i) all 810,591 SNPs; (ii) 256,880 intergenic SNPs located exclusively in the intergenic regions and assumed as putatively neutral (if a given SNP was linked to multiple annotations, it was not included to the intergenic SNP list; File S1); and (iii) 10,245 candidate SNPs potentially under natural selection (identified based on two or three outlier tests, see details below). The per cent of variation explained by each PC axis was extracted using the factorextra version 1.0.7 R package (Kassambara & Mundt, 2020). Pairwise genetic differentiation among the samples was estimated as pairwise mean absolute allele frequency differences (D; Prevosti et al., 1975). Genetic relationships among the samples were explored using Nei's genetic distances (Nei, 1972) in the R package poppr version 2.8.2 (Kamvar et al., 2014, 2015); neighbour‐joining trees (Saitou & Nei, 1987) were reconstructed with 100 bootstrap replicates among loci using the R‐package ape version 5.4–1 (Paradis & Schliep, 2019).

Signatures of selection

Candidate SNPs potentially under natural selection were identified using three approaches: one population divergence method and two environmental association methods.

Highly divergent loci

The genetic divergence of each SNP locus was estimated as a mean absolute allele frequency difference (δ) between humic and clear‐water lakes. Candidate SNPs were identified as SNPs with a δ that was higher than 2.5 SD (standard deviations) from the mean δ (δ threshold = 0.268; empirical two‐tailed p = .012; Miller, 1991).

Environmental association analysis

Redundancy analysis (RDA)

RDA is an extension of multiple linear regression (Legendre & Legendre, 2012) that compares a matrix of multiple response variables (i.e., allele frequencies) with multiple independent predictor variables (i.e., environmental variables). We analysed allele frequencies and environmental variables of each lake with linear regressions. Further, PCA was applied to constrain the fitted values of the regressions and to produce ordination axes, which are linear combinations of the original predictor variables. RDA was performed to test the effects of DOC and water colour on the estimated allele frequencies. Although both DOC and water colour estimates were highly correlated to each other (Pearson's r = .96, p < .001), this does not influence the ability of RDA to obtain a good fit nor does it influence the predictions and precision of the predictions (Kutner et al., 2005). Furthermore, we did not aim to distinguish the effects of DOC and water colour, but to predict their influence on genetic variation as a complex. The effect of genetic and spatial structure was controlled using PC1 loadings based on intergenic SNPs (Figure S3) and geographical coordinates (latitude and longitude) of the sampled lakes. For the RDA, we used the rda function of the R package vegan version 2.5–7 (Oksanen et al., 2020). Candidate SNPs were identified on each of the first two ordination axes as SNPs that had a “locus score” above ± 2.5 SD from the mean score for that axis (RDA loading threshold at axis 1 = ±0.0202 and at axis 2 = ±0.0196; empirical two‐tailed p = .012; Dalongeville et al., 2018; Forester et al., 2016; Miller, 1991; Xuereb et al., 2018).

Latent factor mixed model (LFMM)

LFMMs are factor regression models in which allele–environment correlations are estimated between each locus and each environmental variable (Frichot & François, 2015; Frichot et al., 2013). Environmental variables are considered as fixed effects, while population structure is modelled using a number of latent factors (K) included in the model as covariates (Frichot et al., 2013). The latent factors comprise the levels of population structure due to genetic variation or shared demographic history (Frichot et al., 2013). The analysis was performed using the lfmm version 1.0 package in R, implementing a least‐squares approach for latent factor estimation (Caye et al., 2019). Here, we used K = 2 to correct for population structure inferred using the PCA of intergenic SNPs (see Results; Figure S3). The significance of the association was tested and p‐values were calibrated using the genomic control method (Devlin & Roeder, 1999). Genomic control uses a robust estimate of the variance of z‐scores called "genomic inflation factor." The p‐value threshold (p ≤ .012) for LFMM was chosen (above 2.5 SD) to correspond to the two other methods (RDA and δ). However, this does not necessarily mean that all three methods have similar levels of false discovery rate in terms of detecting loci under divergent selection.

Candidate SNPs supported by multiple methods

The final set of candidate SNPs potentially under natural selection included 10,245 candidates, which were identified using at least two methods, which is expected to reduce the number of false positive loci in the final set and control for genetic and spatial structuring (Figure 2e). Based on the empirical p‐value threshold of .012 and assuming that detection of candidate loci is independent in each outlier method, we expect 445 false positives (3.4%) among the identified candidate SNPs.
FIGURE 2

PCA summarizing the genetic structure for (a) 810,591 genome‐wide SNPs and for (b) 10,245 candidate SNPs. Neighbour‐joining dendrogram based on D Nei genetic distances, demonstrating the genetic relationships among perch samples based on (c) 810,591 genome‐wide SNPs and (d) 10,245 candidate SNPs. The branches with bootstrap value support <80% are represented as dashed lines. Humic and clear‐water populations are indicated as black‐ and white‐filled symbols, respectively. The country of origin is depicted by different shapes as shown in the key. (e) Venn diagram showing overlap among candidate SNPs revealed by three methods; the final set of candidate SNPs is highlighted in bold

PCA summarizing the genetic structure for (a) 810,591 genome‐wide SNPs and for (b) 10,245 candidate SNPs. Neighbour‐joining dendrogram based on D Nei genetic distances, demonstrating the genetic relationships among perch samples based on (c) 810,591 genome‐wide SNPs and (d) 10,245 candidate SNPs. The branches with bootstrap value support <80% are represented as dashed lines. Humic and clear‐water populations are indicated as black‐ and white‐filled symbols, respectively. The country of origin is depicted by different shapes as shown in the key. (e) Venn diagram showing overlap among candidate SNPs revealed by three methods; the final set of candidate SNPs is highlighted in bold To test for excess and deficiency of identified candidate SNPs in each annotation category, a chi‐squared test was performed using the stats version 3.6.2 package in R to compare the 10,245 candidate SNPs with all identified SNPs. Genes that were located in genomic regions with at least one candidate SNP in exon, intron or regulatory sequences, including 5K up‐ and downstream regions, were considered as candidate genes. In addition, we used the circlize version 0.4.12 (Gu et al., 2014) package in R to visualize the distribution of all and candidate SNPs, as well as the genes, genomic divergence and diversity across the perch genome. The density of SNPs, genes and candidate SNPs across the genome was estimated at each chromosome based on nonoverlapped 500‐kb windows using the genomicDensity function of the circlize R package. In addition, the density of candidate SNPs was estimated in 25‐kb windows using the same package. Genomic regions were identified as regions with a high density of candidate SNPs if the density estimate was higher than 2.5 SD from the mean density. The linear plots representing candidate regions affected by divergent selection were visualized using karioploter version 1.16.0 (Gel & Serra, 2017).

Gene Ontology analysis

For GO analysis, we first identified perch genes that are orthologous to human and zebrafish genes using the rentrez (Winter, 2017) package in R. GO enrichment analysis of candidate genes against all orthologous gene symbols as a background was performed using a binomial test in panther (Thomas et al., 2003). The GO terms with a false discovery rate (FDR) ≤ 0.05 were considered as significant.

RESULTS

Population genomic variation and structure

Overall genome‐wide genetic diversity (estimated as observed heterozygosity, H O) varied from 0.064 (ELOO) to 0.309 (EUIA) with a mean H O = 0.215 (Table S1). Average genetic diversity among perch in humic lakes (mean H O = 0.195, median H O = 0.221) was slightly lower compared to that in clear‐water lakes (mean H O = 0.236, median H O = 0.240), although the difference was not significant (Figure S2a). There was a weak, nonsignificant tendency for genetic diversity to decrease with increasing DOC and water colour, and to increase with lake size (Figure S2b–e). Overall genome‐wide genetic divergence (measured as D; Prevosti et al., 1975) among the samples was D = 0.309, and ranged from D = 0.204 (FIMU vs. FHYV) to D = 0.356 (FITA vs. ELOO; Table S2). The first two PCA axes based on 810,591 SNPs explained nearly 79% of the genomic variation, and depicted a genetic structure corresponding to geographical origin of perch (Figure 2a). Similarly, the topology of the neighbour‐joining tree based on D Nei genetic distances reflected the main clusters identified by the PCA (Figure 2c). In general, perch samples clustered into the two main groups: (i) Finnish, including the samples from Finland, and (ii) South‐Western Baltic, including perch from Sweden, Estonia and Lithuania.

Candidate SNPs under divergent selection

The combination of genetic divergence and two environmental association analyses revealed 10,245 candidate SNPs (Figure 2e). Based on the PCA using these 10,245 candidate SNPs, 6% of the variation on the second axis was explained by humic vs. clear‐water separation, whereas 65% was attributed to the geographical origin (Figure 2b). Similarly, the topology of the neighbour‐joining tree based on D Nei distances using 10,245 candidate SNPs showed the separation of humic and clear‐water perch within the Finnish and South‐Western Baltic groups (Figure 2d). The majority of candidate SNPs (8,271 SNPs; 80.7%) were polymorphic in both Finnish and South‐Western Baltic groups, whereas a lower proportion of SNPs were monomorphic within Finnish (1920 SNPs; 18.7%) and South‐Western Baltic fish (54 SNPs; 0.5%). Significant enrichment (chi‐squared test: χ 2 = 11.55–85.50, p = 2.3 × 10−20–4.8 × 10−4) and depletion (chi‐squared test: χ 2 = 3.98–37.55, p = 4.6 × 10−2–8.9 × 10−10) for candidate SNPs was observed in seven and 10 chromosomes, respectively (Figure 3; Table S3). Furthermore, the distribution of candidate SNPs across the chromosomes was not even; several peaks of high density were identified (Figure 3; File S2). The number of genomic regions with a high density of candidate SNPs varied from one to six (mean = 2.4) per chromosome with 500‐kb windows or from 17 to 49 (mean = 31.4) per chromosome when 25‐kb windows were used for density estimation. The total number of regions with a high density of candidate SNPs was 50 or 658 when using 500 or 25‐kb windows, respectively (Table S4). Most of the candidate SNPs were located in intergenic regions (32.3%) and introns (33.8%), whereas ~4% were found in exons (Table 1). However, the enrichment analysis indicated that the frequency of candidate SNPs in intergenic regions was lower compared to the rest of the genome (chi‐squared test: χ 2 = 9.76, p = 1.9 × 10−3; Table 1). On the other hand, the 5′UTR (untranslated region), 3′UTR and 5K downstream gene regions were enriched for candidate SNPs (chi‐squared test: χ 2 = 8.88–5.42, p = 2.0 × 10−2–2.8 × 10−3; Table 1).
FIGURE 3

Circos and insert plots showing the distribution of SNPs, genes, genomic divergence and diversity in humic and clear‐water perch (from 32 lakes) across 24 chromosomes. Chromosomes with a significant enrichment or depletion of candidate SNPs are highlighted in red and light blue, respectively. The circles from outer to inner and graphs on the insert plot show (a) all SNP density (green), (b) gene density (orange), (c) candidate SNP density (magenta), (d) genetic divergence (δ) and (e) difference of genetic diversity (H O) between clear‐water and humic perch. Genomic densities were calculated using window sizes of 0.5 Mb. Candidate SNPs, SNPs related to calmodulin binding genes and other SNPs on the δ plots are shown as red, blue and grey dots, respectively. The dashed line indicates δ threshold = 0.268, which corresponds to 2.5 SD of mean δ. An excess or deficiency of H O in humic perch is shown by brown and cyan lines, respectively. The observed heterozygosity difference between humic and clear‐water perch was estimated as the difference between moving averages of H O across 500 SNPs

TABLE 1

Enrichment and depletion of candidate SNPs in each annotation category

VariantAll SNPs a Candidate SNPs a χ 2 p All SNPs (%)Candidate SNPs (%)
5K upstream132,08716750.056.81312.2212.14
5'UTR80401275.415.020 0.740.92
5'UTR premature start codon gain1386130.979.3220.130.09
Synonymous25,0463050.607.4362.322.21
Splice region & synonymous58080.001.9740.050.06
Nonsynonymous18,8012370.024.8761.741.72
Nonsynonymous & splice region44530.826.3630.040.02
Intron358,88546681.207.27233.2033.83
Splice region & intron3564420.192.6610.330.30
Splice region15420.0001.0000.010.01
3'UTR33,5524928.883.003 3.103.57
5K downstream130,02917756.937.008 12.0312.86
Intergenic368,10544509.759.002 34.0532.25
Stop gained14510.064.8010.010.01

The sum of SNPs per annotation category does not correspond to the total number of SNPs due to multiple annotations of some SNPs located in nearby (<5 K) or overlapped genes (see details in File S1).

Circos and insert plots showing the distribution of SNPs, genes, genomic divergence and diversity in humic and clear‐water perch (from 32 lakes) across 24 chromosomes. Chromosomes with a significant enrichment or depletion of candidate SNPs are highlighted in red and light blue, respectively. The circles from outer to inner and graphs on the insert plot show (a) all SNP density (green), (b) gene density (orange), (c) candidate SNP density (magenta), (d) genetic divergence (δ) and (e) difference of genetic diversity (H O) between clear‐water and humic perch. Genomic densities were calculated using window sizes of 0.5 Mb. Candidate SNPs, SNPs related to calmodulin binding genes and other SNPs on the δ plots are shown as red, blue and grey dots, respectively. The dashed line indicates δ threshold = 0.268, which corresponds to 2.5 SD of mean δ. An excess or deficiency of H O in humic perch is shown by brown and cyan lines, respectively. The observed heterozygosity difference between humic and clear‐water perch was estimated as the difference between moving averages of H O across 500 SNPs Enrichment and depletion of candidate SNPs in each annotation category The sum of SNPs per annotation category does not correspond to the total number of SNPs due to multiple annotations of some SNPs located in nearby (<5 K) or overlapped genes (see details in File S1).

Candidate genes under divergent selection

In total, 3,245 candidate genes were identified (Table S5). Since the strongest selective sweeps are expected to affect multiple adjacent SNPs, further ranking of the candidate gene list (average δ > 0.4, >5 SNPs per gene) revealed 31 of the strongest candidates containing six to 38 outlier SNPs (Table S5). A large proportion of genes among these are involved in anatomical structure development (ASXL1, CDON, ECE1, FGF11, FLII, HOXB9, LRRN1, MYLIP, PLAGL2, RTN1, TDRD9, TTLL3, TTN) and regulation of nervous system development (MYLIP, BHLHE40, CDON, FGF11, LRRN1, RTN1). The highest number of SNPs was observed in the MYLIP gene located on chromosome 13 (Figure 4c). This gene is involved in various biological processes, including multicellular organism development, regulation of plasma membrane‐bound cell projection organization and nervous system development (Bornhauser et al., 2003; Calkin et al., 2011; Olsson et al., 1999). The functions of other strong candidate genes included regulation of voltage‐gated sodium channel activity (FGF11, SLMAP), inorganic ion transmembrane transport (KCNMA1, TMEM163; Li et al., 2018) and regulation of circadian rhythm (BHLHE40; Honma et al., 2002 ).
FIGURE 4

Examples of (a–d) genomic regions of high genetic divergence (δ) between humic and clear‐water perch, and (e, f) opsins with nonsynonymous mutations. Candidate SNPs, nonsynonymous candidate SNPs and other SNPs are shown by red, black and grey dots, respectively. Moving averages of δ, H O of humic perch and H O of clear‐water perch across 50 SNPs are shown with black, brown and cyan solid lines, respectively. Gene symbols are presented as human (and zebrafish for opsins) orthologues and perch GenBank gene IDs (PFLUV_G) for genes with unidentified functions. Candidate genes in the regions of elevated genetic divergence and opsin genes with nonsynonymous substitutions are highlighted in black, while other opsin genes at chromosome 4 are highlighted in italic. The dashed line indicates δ threshold = 0.268, which corresponds to 2.5 SD of mean δ

Examples of (a–d) genomic regions of high genetic divergence (δ) between humic and clear‐water perch, and (e, f) opsins with nonsynonymous mutations. Candidate SNPs, nonsynonymous candidate SNPs and other SNPs are shown by red, black and grey dots, respectively. Moving averages of δ, H O of humic perch and H O of clear‐water perch across 50 SNPs are shown with black, brown and cyan solid lines, respectively. Gene symbols are presented as human (and zebrafish for opsins) orthologues and perch GenBank gene IDs (PFLUV_G) for genes with unidentified functions. Candidate genes in the regions of elevated genetic divergence and opsin genes with nonsynonymous substitutions are highlighted in black, while other opsin genes at chromosome 4 are highlighted in italic. The dashed line indicates δ threshold = 0.268, which corresponds to 2.5 SD of mean δ We observed several 50–100‐kb regions of elevated δ between humic and clear‐water perch in multiple chromosomes, involving genes adjacent to the strongest candidates (Figure 4a,b,d). For example, the ~100‐kb region on chromosome 5 included genes regulating anatomical structure development (ASXL1, MYT1L, PLAGL2), oxidative stress responding regulation (PLAGL2), response to oxygen‐containing compound (ASXL1) and mitochondrial fusion (MFN2); the ~‐kb region on chromosome 7 contained genes regulating response to stress (RPA2, THEMIS2), lipid (CYP4B1) and glycogen (PPP1R8) metabolism; and the ~100‐kb region on chromosome 15 involved genes regulating ion transport (LLGL1, SHISA9, DESI1) and cell differentiation (FLII, LLGL1). As expected, most of the regions showing elevated δ also showed a reduction in H O among humic perch, suggesting that directional selection associated with high DOC content has reduced the genetic diversity of adjacent genomic regions. In contrast to our previous expectations, SNPs found in visual opsin genes (orthologous to human OPN1LW, OPN1MW, OPN1SW and RHO) did not show high allele frequency differences between humic and clear‐water populations (Figure S4), suggesting that the strongest selective sweeps associated with humic adaptation in perch do not involve visual opsins. However, we found a nonsynonymous candidate SNP (CM020912_7210778; T669G; Ile223Met) in one of the four red‐sensitive opsin‐like gene orthologues (OPN1LW) showing moderate allele frequency difference between humic and clear‐water perch (AF HUMIC =0.53 vs. AF CLEAR =0.25, δ = 0.28; Figure 4e). Another nonsynonymous mutation (CM020919_22496453; A1120G; Asn374Asp) exhibiting high allele frequency difference (AF HUMIC =0.19 vs. AF CLEAR =0.59) was observed in a nonvisual opsin 7: group member a gene (OPN5/opn7a, δ = 0.41; Figure 4f). The other nonsynonymous mutations in visual or nonvisual opsin genes did not show high allele frequency differences or significant associations with environmental parameters (File S1).

Gene ontology analysis

Overall assessment of the gene functions using GO analyses showed that candidate genes were enriched for 134 GO biological process terms, 98 GO biological component terms and 10 GO molecular function terms (FDR ≤ 0.05; Figure 5; Table S6). The top 10 most significant GO terms included regulation of organism development and nervous system development (biological process); actin and calmodulin binding (molecular function); and cell junction, including nervous tissues (cellular component). A group of genes among those involved in calmodulin binding regulate calcium (calcium‐transporting ATP‐ases) and potassium (KCN genes) transport, as well as sodium–calcium exchange (SLC8 gene family, Table S6), and play an important role in osmoregulation (Hwang et al., 2011; Pallone et al., 2012; Pizzagalli et al., 2021).
FIGURE 5

Top 10 significantly enriched gene ontology (GO) terms (FDR ≤ 0.05) among the 3245 candidate genes for humic adaptation in perch. Bar length and numbers on the right represent the fold enrichment and number of enriched genes for each GO term, respectively

Top 10 significantly enriched gene ontology (GO) terms (FDR ≤ 0.05) among the 3245 candidate genes for humic adaptation in perch. Bar length and numbers on the right represent the fold enrichment and number of enriched genes for each GO term, respectively

DISCUSSION

Based on analysis of 32 whole genomes, we discovered that footprints of selection associated with humic environments comprise hundreds of regions scattered across the Eurasian perch genome. Putative signals of adaptation were detected in genes and gene families with diverse functions. Most frequently, the candidate genes were involved in the regulation of organism development, nervous system development and calcium/potassium/sodium exchange, suggesting their possible role during early development and in maintenance of ion balance. In contrast, we did not observe strong evidence of selection on visual opsins, despite the extreme differences in visual environment between the studied lakes. The observed overrepresentation of candidate SNPs in regulatory regions (5′UTR, 3′UTR and 5K downstream gene regions) suggests that regulatory changes may play a more important role in humic adaptation compared to changes in amino acids.

Role of regulatory regions in humic adaptation

Most candidate SNPs were detected in intergenic and intronic regions, which comprise a large part of the perch genome (Ozerov et al., 2018). At the same time, we observed a significant excess of candidate SNPs in regulatory regions (5′UTR, 3′UTR and 5K downstream gene regions), indicating their important role in humic adaptation of perch. On the other hand, the proportion of nonsynonymous mutations among candidates did not show an increase compared to the whole data set. Thus, our findings are in line with a growing body of evidence suggesting that natural selection is predominantly acting at regulatory regions (e.g., Fagny & Austerlitz, 2021; Fraser, 2013; Glaser‐Schmitt & Parsch, 2018; Verta & Jones, 2019). For example, Fraser (2013) showed that local adaptations found in a subset of human populations are over 10‐fold more likely to affect gene expression than to alter protein sequences. The excess of candidate SNPs in 5′ and 3′ UTRs observed in perch is in agreement with this, as UTRs are among the main elements involved in the regulation of gene expression (Barrett et al., 2012). 5′UTRs, located upstream of the protein coding sequence, play an important role in translation initiation, and expression of alternative 5′UTRs allows variation in expression from a single gene and tissue‐specific expression patterns (Barrett et al., 2012). 3′UTRs, located downstream of the protein coding region, impact post‐transcriptional and translational processes, including mRNA localization (Andreassi & Riccio, 2009), stability (Goldstrohm & Wickens, 2008) and expression levels (Matoulkova et al., 2012). In general, 3′UTRs are more polymorphic and variable in length than 5′UTRs, resulting in a greater evolutionary potential of these regulatory elements (Barrett et al., 2012; Steri et al., 2018). Indeed, the observed number of SNPs in 3′UTRs in the perch genome was nearly four times higher than that in 5′UTRs (Table 1). The important role of 3′UTRs in teleost evolution has been recently highlighted in cichlid fishes, suggesting that these regions might act as metaregulators (i.e., regulators of other mechanisms governing post‐transcriptional regulation), particularly in species undergoing rapid adaptation and speciation (Xiong et al., 2018). Therefore, our results suggest that the changes in regulatory regions may play a more important role in shaping humic adaptation in perch, compared to mutations in protein sequences.

Candidate genes and gene families involved in humic adaptation

We detected multiple signals of selection consisting of over 3,000 genes scattered across the perch genome. These candidate genes are involved in multiple biological processes and cellular functions, most significant of which are multicellular organism processes and system development, actin and calmodulin binding, and ion transfer. Several genes that showed elevated genetic divergence between humic and clear‐water perch were also involved in development processes. For example, the signal of divergent selection involving more than 30 SNPs was centred around the MYLIP gene, essential for embryonic development (Knowlton et al., 2003). Moreover, the role of MYLIP in early embryonic development of zebrafish involves calcium‐dependent mechanisms during gastrulation (Knowlton et al., 2003). Therefore, it is possible that the observed adaptive variation in perch may be linked to Ca2+ deficiency compensation during embryonic development in humic lakes. However, drawing firm conclusions about the role of MYLIP in perch embryogenesis and humic adaptation requires further investigation (e.g., Kurko et al., 2020). Similar to MYLIP, many other genes involved in embryonic development were found in the regions of elevated genetic divergence. For example, a region on chromosome 5 includes genes shown to regulate brain and eyes (PLAGL2; Pendeville et al., 2006), hypothalamus (MYT1L; Blanchet et al., 2017), axonal and neuromuscular (MFN2; Vettori et al., 2011), and neutrophil development (ASXL1; Fang et al., 2021) in zebrafish. The LLGL1 gene located on chromosome 15 was found to regulate zebrafish cardiac development (Flinn et al., 2020). Other regions located on chromosomes 7 and 15 included genes involved in the immune response, such as the T‐cell receptor signalling pathway (THEMIS2; Cheng et al., 2017; Peirce et al., 2010), inflammation and wound repair (FLII; Strudwick & Cowin, 2020), as well as genes involved in DNA replication and reparation (RPA2, DESI1; Mer et al., 2000; Shin et al., 2012). However, more elaborate molecular and ecological studies are needed to shed light on specific phenotypic consequences and fitness effects of these and other candidate genes. Among the candidate genes, we identified a large group of genes encoding transmembrane transportation of ions, such as solute carriers (SLCs; Hediger et al., 2004), calcium channel subunits (CACNs; Catterall et al., 2005), potassium channel subunits (KCNs; Shieh et al., 2000) and sodium channel subunits (SCNs; Yu & Catterall, 2003). There were also genes involved in energy‐dependent transportation, such as ATP‐binding cassette transporters (ABCs; Jones & George, 2004). Given the low pH and low level of calcium ions in humic lakes, we expected a major role of Na+/H+, K+ and Ca2+ exchangers during osmoregulation and maintenance of internal homeostasis in humic perch (Hwang et al., 2011). Accordingly, among the candidate genes were several SLC family 9 genes (SLC9A1, SLC9A3R1, SLC9A5, SLC9A6 and SLC9B2) and one SLC family 4 gene (SLC4A5), which play an important role in the regulation of cellular pH via the transport of bicarbonate ions and protons (Pizzagalli et al., 2021). Moreover, selection signals were also observed in SLC family 8 (SLC8A1 and SLC8A2) and family 24 (SLC24A2), and in calcium channel subunits alpha (CACNA1A, CACNA1B, CACNA1C, CACNA1D, CACNA1E, CACNA1G, CACNA1I, CACNA2D1 and CACNA2D4), which play a significant role in the regulation of intracellular Ca2+ concentrations and Ca2+ influx (Pallone et al., 2012; Pizzagalli et al., 2021). Similarly, selection signals were found in several genes linked to pH maintenance via ion and acid–base equivalent exchanges, such as SCNs (SCN1B, SCN3B) and KCNs (KCNA7, KCNAB1, KCNC3, KCNC4, KCNE1, KCNF1, KCNH2, KCNIP4, KCNJ12, KCNK13, KCNMA1, KCNN2, KCNQ1, KCNQ2, KCNQ3, KCNQ5). Therefore, our results suggest that adaptation to high DOC concentrations probably includes a large group of genes involved in ion transport and balance. The above‐mentioned gene families have also been associated with adaptation to acidic (Haenel et al., 2019) and alkaline (Xu et al., 2017) environments in three‐spined stickleback (G. aculeatus) and Amur ide (Leuciscus waleckii), respectively. Together, our results corroborate findings that variation in water chemistry, including DOC content, ion concentration and pH (Parra & Baldisserotto, 2007; Rask, 1984; Wood et al., 2011), is a strong selective force shaping molecular mechanisms of adaptation in teleosts.

Opsins

Although we found several nonsynonymous substitutions in green‐ and red‐sensitive opsins and in rhodopsin, the allele frequency differences between humic and clear‐water habitats were small and could be explained by random drift alone. One potential exception was the single nonsynonymous SNP in one of the four red‐sensitive opsin‐like gene orthologues (OPN1LW) that showed a moderate allele frequency difference (δ = 0.28) between humic and clear‐water perch. However, as this putative selective sweep region consisted of several genes (e.g., TFE3, CXXC1) and multiple SNPs with a high allele frequency difference, it is not trivial to pinpoint the specific variant under divergent selection. Furthermore, a high allele frequency difference (δ = 0.45) at OPN1LW between humic and clear‐water perch was observed only among South‐Western Baltic samples, whereas all 10 studied Finnish lakes were homozygous for one allele irrespective of visual environment. Further comparison of P. fluviatilis OPN1LW protein sequence with other teleosts revealed that Met and Ile at position 223 probably correspond to the ancestral and the derived allele, respectively, and the frequency of the derived allele was higher in clear‐water lakes. Therefore, we cannot exclude the possibility that the allele frequency difference at nonsynonymous SNPs in OPN1LW is shaped by divergent selection occurring in clear‐water, rather than in the humic environment. We also detected a nonsynonymous substitution in the nonvisual opsin 7a (OPN5/opn7a) that had a large difference in allele frequency between humic and clear‐water perch (δ = 0.41; 99.94 percentile of all nonsynonymous SNPs). The opsin 7 family shows responsiveness at wavelengths <380 nm, which might be expected for a tissue that is exposed to direct sunlight. These genes are expressed in various tissues including the brain, digestive system, eye, heart and testis (Davies et al., 2015; Liu et al., 2020). However, as the SNP diversity adjacent to OPN5/opn7a was rather similar in both habitats, the role of this nonsynonymous variation in the context of adaptation to contrasting visual environments needs further investigation. To conclude, our results suggest that amino acid changes in perch visual opsins most likely do not play a main role in adaptation to extreme visual environments. As such, these findings contrast recent reports in three‐spine stickleback (Marques et al., 2017) and Baltic herring (Hill et al., 2019), where a single amino acid change was shown to have a major effect on ecological adaptation. Nevertheless, recent work based on RNA‐sequencing analysis of the whole eye in Eurasian perch has shown differential expression of visual red‐, green‐ and short‐wavelength‐sensitive opsins (OPN1LW/opn1lw1, OPN1MW/opn1mw1 and OPN1SW/opn1sw2, respectively) between humic and clear‐water environments (Noreikiene et al., 2020), indicating that regulation of visual opsin expression still occurs in humic lakes and may represent an important mechanism of adaptation.

Potential implications of genetic and spatial structuring on outlier detection

Genetic and spatial structuring has important consequences on the performance of outlier tests, since the true signals of selection may be harder to detect among highly differentiated populations (Hoban et al., 2016) and the effects of drift may overcome the effects of selection if selection is weak and/or the population is small (Hedrick, 2005). We aimed to mitigate the influence of random genetic drift and population stratification in our work by the following measures. First, the paired sampling strategy applied in our study (i.e., focusing on geographically proximate lakes with drastic environmental differences in humic content) allowed us to minimize the potential effects of spatial genetic structuring and maximize the number of independent population replicates. Second, the applied strategy potentially increases the power to detect common loci under weak to moderate selection compared to random sampling designs (Hoban et al., 2016; Lotterhos & Whitlock, 2015). On the other hand, analysis of a limited number of populations combined with an unequal number of samples per geographical region precluded us from carrying out detailed investigations of the population‐ or region‐specific signatures of adaptation. For example, splitting the samples into two genetically distinct groups corresponding to the South‐Western Baltic and the Finnish lakes likely reduces power and increases the false discovery rate due to a decrease of sample size, which, in turn, makes the analyses more sensitive to high between‐population variance (De Mita et al., 2013). Instead, we concentrated on identification of common signatures of selection from standing genetic variation after perch colonized the studied area ~12,000–11,000 years ago (Nesbø et al., 1999). Furthermore, candidate SNPs potentially influenced by humic adaptation were called only when multiple methods supported the outlier identification, and two approaches applied in the current study accounted for global genetic (LFMM and RDA) and spatial structuring (RDA). Integration of population genetic structure, such as random factors, latent factors or covariates in the models, can improve the accuracy of both differentiation and genetic–environment association outlier tests (Frichot et al., 2013; Lotterhos & Whitlock, 2015). The lack of substantial influence of population stratification on our set of candiate SNPs was further supported by the high number of common outlier SNPs detected using both RDA (which takes genetic and geographical structuring into account) and simple allele frequency difference approaches, showing 7111 overlapping SNPs (Figure 2a). Therefore, it is unlikely that the set of candidate SNPs is driven strongly by large‐scale genetic or spatial structuring. However, because most of the studied lakes originated from a geographically small region, the relative weight of these samples (i.e., patterns observed among Estonian samples) were probably also detected as global signals, while population‐specific or less common signals of selection remained undetected using our sampling design.

Conclusions

This study revealed that humic adaptation in perch comprises a large number of regions and genes scattered across the genome. The excess of putatively adaptive variants that are found in 5′UTR, 3′UTR and 5K downstream gene regions highlights the importance of adaptive evolution via regulatory elements, rather than via amino acid changes in proteins. Putative adaptation signals were detected in genes and gene families with diverse functions, including genes involved in organism development, plasma membrane and ion transportation, underlying the multifaceted nature of humic‐driven selection. Our work illustrates the power of whole genome analysis to identify the most promising candidates involved in adaptation to complex environmental conditions and provides the foundation to further investigate functions of candidate genes underlying phenotypic traits of ecological and evolutionary importance.

AUTHOR CONTRIBUTIONS

M.O. analysed the data and wrote the first draft of the manuscript together with A.V. M.L. analysed the data. T.K. and M.S. performed laboratory analyses of water samples. S.K., A.H., M.H., A.G., K.N. and A.V. were involved in fieldwork. A.V. and R.G. conceived the study. All the authors contributed to revisions of the draft manuscript. All the authors read and approved the final manuscript.

CONFLICT OF INTEREST

The authors declare no competing interests. Fig S1‐S4 Click here for additional data file. Table S1‐S6 Click here for additional data file. File S1 Click here for additional data file. File S2 Click here for additional data file.
  107 in total

1.  Colonisation of toxic environments drives predictable life-history evolution in livebearing fishes (Poeciliidae).

Authors:  Rüdiger Riesch; Martin Plath; Ingo Schlupp; Michael Tobler; R Brian Langerhans
Journal:  Ecol Lett       Date:  2013-11-04       Impact factor: 9.492

2.  Haplotyping RAD loci: an efficient method to filter paralogs and account for physical linkage.

Authors:  Stuart C Willis; Christopher M Hollenbeck; Jonathan B Puritz; John R Gold; David S Portnoy
Journal:  Mol Ecol Resour       Date:  2017-02-09       Impact factor: 7.090

3.  Assessment of dissolved organic carbon and iron effects on water color between a forest and pasture-dominated fine-scale catchment in a Central Appalachian region, West Virginia.

Authors:  Lili Lei; James A Thompson; Louis M McDonald
Journal:  Environ Sci Pollut Res Int       Date:  2020-05-22       Impact factor: 4.223

4.  Putatively adaptive genetic variation in the giant California sea cucumber (Parastichopus californicus) as revealed by environmental association analysis of restriction-site associated DNA sequencing data.

Authors:  Amanda Xuereb; Christopher M Kimber; Janelle M R Curtis; Louis Bernatchez; Marie-Josée Fortin
Journal:  Mol Ecol       Date:  2018-12-10       Impact factor: 6.185

5.  FERM-dependent E3 ligase recognition is a conserved mechanism for targeted degradation of lipoprotein receptors.

Authors:  Anna C Calkin; Benjamin T Goult; Li Zhang; Louise Fairall; Cynthia Hong; John W R Schwabe; Peter Tontonoz
Journal:  Proc Natl Acad Sci U S A       Date:  2011-11-22       Impact factor: 11.205

6.  PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

Authors:  Paul D Thomas; Anish Kejariwal; Michael J Campbell; Huaiyu Mi; Karen Diemer; Nan Guo; Istvan Ladunga; Betty Ulitsky-Lazareva; Anushya Muruganujan; Steven Rabkin; Jody A Vandergriff; Olivier Doremieux
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

Review 7.  The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteinsIntroduction.

Authors:  Matthias A Hediger; Michael F Romero; Ji-Bin Peng; Andreas Rolfs; Hitomi Takanaga; Elspeth A Bruford
Journal:  Pflugers Arch       Date:  2003-11-18       Impact factor: 3.657

Review 8.  Overview of the voltage-gated sodium channel family.

Authors:  Frank H Yu; William A Catterall
Journal:  Genome Biol       Date:  2003-02-24       Impact factor: 13.583

9.  Functional characterization of adaptive variation within a cis-regulatory element influencing Drosophila melanogaster growth.

Authors:  Amanda Glaser-Schmitt; John Parsch
Journal:  PLoS Biol       Date:  2018-01-11       Impact factor: 8.029

10.  Combining six genome scan methods to detect candidate genes to salinity in the Mediterranean striped red mullet (Mullus surmuletus).

Authors:  Alicia Dalongeville; Laura Benestan; David Mouillot; Stephane Lobreaux; Stéphanie Manel
Journal:  BMC Genomics       Date:  2018-03-27       Impact factor: 3.969

View more
  1 in total

1.  Whole-genome sequencing illuminates multifaceted targets of selection to humic substances in Eurasian perch.

Authors:  Mikhail Ozerov; Kristina Noreikiene; Siim Kahar; Magnus Huss; Ari Huusko; Toomas Kõiv; Margot Sepp; María-Eugenia López; Anna Gårdmark; Riho Gross; Anti Vasemägi
Journal:  Mol Ecol       Date:  2022-03-06       Impact factor: 6.622

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.