Continental-scale footprint of balancing and positive selection in a small rodent (Microtus arvalis).

Martin C Fischer1, Matthieu Foll2, Gerald Heckel3, Laurent Excoffier3.   

Abstract

Genetic adaptation to different environmental conditions is expected to lead to large differences between populations at selected loci, thus providing a signature of positive selection. In contrast, balancing selection can maintain polymorphisms over long evolutionary periods and even across large geographic scales, and thus leads to low levels of divergence between populations at selected loci. However, little is known about the relative importance of these two selective forces in shaping genomic diversity, partly due to difficulties in recognizing balancing selection in species showing low levels of differentiation. Here we address this problem by studying genomic diversity in the European common vole (Microtus arvalis), which presents high levels of differentiation between populations (average F ST = 0.31). We studied 3,839 Amplified Fragment Length Polymorphism (AFLP) markers genotyped in 444 individuals from 21 populations distributed across the European continent and hence over different environmental conditions. Our statistical approach to detect markers under selection is based on a Bayesian method specifically developed for AFLP markers, which treats AFLPs as a nearly codominant marker system, and therefore has increased power to detect selection. The high number of screened populations allowed us to detect the signature of balancing selection across a large geographic area. We detected 33 markers potentially under balancing selection, hence strong evidence of stabilizing selection in 21 populations across Europe. However, our analyses identified four times more markers (138) under positive selection, and geographical patterns suggest that some of these markers are probably associated with alpine regions, which seem to have environmental conditions that favour adaptation.
We conclude that despite favourable conditions in this study for the detection of balancing selection, this evolutionary force seems to play a relatively minor role in shaping the genomic diversity of the common vole, which is more influenced by positive selection and neutral processes like drift and demographic history.

Year:  2014        PMID: 25383542      PMCID: PMC4226552          DOI: 10.1371/journal.pone.0112332

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Despite nearly six decades of genetic investigations, it remains unclear for most organisms to what extent the demographic history of populations, genetic drift or selection influences the pattern of genetic diversity of a species. Historically, the observation that many genes are genetically polymorphic within populations was first explained by a selective advantage of heterozygotes [1]. This explanation was challenged by Kimura's neutral [2], [3] and nearly neutral [4] theory of molecular evolution, which provided a competing explanation for the high frequency of genetic polymorphism. Nowadays it is generally accepted that the majority of genetic variation evolves nearly neutrally, but that natural selection plays a decisive role in evolution and leaves footprints in the genome. Natural selection acts in at least three forms, which are positive, purifying and balancing selection. Positive selection can lower genetic diversity locally but increase it globally, to a level depending on the spatial and environmental heterogeneity [5]–[7]. Balancing selection maintains genetic variation within populations [8] and leads to generally low levels of differentiation between populations, even though it can contribute to increased population differentiation if selective pressures are spatially heterogeneous [5]. Finally, purifying selection generally decreases levels of genetic diversity, even though strong background selection can promote increased differences between populations by lowering their effective size [9]. In the past, balancing selection played an important role in evolutionary genetics in explaining the high level of genomic polymorphism observed among species or populations [8], [10]. However, the effects of selection can be multifarious and the impact of each form is still under debate [11], especially for balancing selection.
At least in humans, a number of common genetic diseases have been proposed to be maintained in populations as a result of balancing selection, e.g. sickle-cell anaemia [12], [13], glucose-6-phosphate dehydrogenase deficiency [14], thalassemia [15] and cystic fibrosis [16]. Other examples are the ABO blood group [17], polymorphisms of beta-globin [18], the major histocompatibility complex (MHC; [19]) including the human HLA-G promoter [20], CCR5 in humans [21], the complementary sex determination locus in bees [22], response to pathogens [23], high diversity genes in Arabidopsis [24] or self-incompatibility and nuclear-cytoplasmic gynodioecy in plants (see e.g. [25]). However, all these examples were identified by a candidate gene approach and not by genome-wide scans. Hence they do not give any information about the importance of balancing selection in shaping genomic diversity. In this context, there are only a few genome-wide studies of balancing selection in humans [26], [27] or sticklebacks [28], and these studies remain inconclusive about the importance of balancing selection in shaping and maintaining genetic diversity, potentially due to methodological limitations (see below). Compared to balancing selection, the occurrence and influence of positive selection on an organism's genetic variation is much less questioned, as positive selection should allow the spread of advantageous traits and play a central role in the evolution of species (see e.g. [29], [30]). The prevalence of balancing selection is still highly debated, mainly due to missing evidence in organisms other than humans, but also because methods developed specifically to detect balancing selection are still few (see e.g. [27], [31]–[34]).
Moreover, the classical detection of balancing selection based on levels of differentiation between populations is difficult in organisms with low levels of differentiation (see [35]) like humans or Drosophila [36], [37], and a sufficiently large number of populations needs to be investigated to have the statistical power to detect balancing selection [35]. In order to better detect signals of balancing selection, we focused in this study on an organism showing particularly high levels of differentiation, the common vole (Microtus arvalis). This species has a very wide distribution in Europe, and it is found in most open grassland and farmland habitats up to 2,000 m altitude [38]–[40]. It ranges from the Atlantic coast of France to Central Russia, as well as from the Orkney archipelago in the North to the Mediterranean coast in Spain (Figure 1). Previous studies have shown that vole populations have overall high levels of differentiation for both mtDNA (F ST = 0.7) and nuclear markers (STR, F ST = 0.17) [41]–[43]. The widespread distribution of this species in different habitats and environments, and its peculiar pattern of genetic diversity, make it particularly suitable for the detection of markers with high or low levels of differentiation, and by extension for the determination of the respective roles of positive or balancing selection over a large geographic scale.
Figure 1

Geographic location of the 21 Microtus arvalis populations analyzed.

The grey area corresponds to the European distribution of the species (after [98]). See Table 1 for sample abbreviations.

Table 1

Location and properties of 444 M. arvalis samples from 21 populations across Europe.

| ID | Country | Location | N | Latitude | Longitude | F ST (CI) | F IS (CI) | Variable AFLPs (%) | AFLP diversity |
|---|---|---|---|---|---|---|---|---|---|
| BSt | Belgium | Stalhille | 21 | 51°13′N | 3°05′E | 0.304 (0.292–0.318) | 0.0010 (0.0000–0.0029) | 30.2 | 0.091 |
| BVe | Belgium | Veurne | 21 | 51°03′N | 2°40′E | 0.345 (0.331–0.359) | 0.0008 (0.0000–0.0024) | 25.3 | 0.078 |
| CZD | Czech Republic | Drnholec | 21 | 48°53′N | 16°27′E | 0.162 (0.151–0.172) | 0.0431 (0.0288–0.0576) | 46.3 | 0.132 |
| FAv | France | Avallon | 20 | 47°29′N | 3°54′E | 0.248 (0.236–0.261) | 0.0008 (0.0000–0.0023) | 37.3 | 0.115 |
| FCm | France | Clermont-Ferrand | 21 | 45°47′N | 3°05′E | 0.262 (0.250–0.275) | 0.0006 (0.0000–0.0018) | 38.1 | 0.116 |
| FFr | France | Fressenneville | 22 | 50°05′N | 1°34′E | 0.302 (0.289–0.315) | 0.0020 (0.0001–0.0058) | 30.8 | 0.090 |
| FTh | France | Thaon | 21 | 49°16′N | 0°26′W | 0.294 (0.281–0.306) | 0.0058 (0.0004–0.0137) | 35.4 | 0.108 |
| DGo | Germany | Gotha | 20 | 50°57′N | 10°42′E | 0.224 (0.212–0.235) | 0.0238 (0.0118–0.0382) | 37.6 | 0.113 |
| DSc | Germany | Schiltach | 20 | 48°17′N | 8°21′E | 0.237 (0.225–0.250) | 0.0007 (0.0000–0.0022) | 33.8 | 0.107 |
| INa | Italy | Naturno | 21 | 46°39′N | 11°01′E | 0.349 (0.335–0.363) | 0.0030 (0.0002–0.0083) | 31.6 | 0.096 |
| PSr | Poland | Srodkowy | 21 | 53°35′N | 22°47′E | 0.297 (0.283–0.309) | 0.0055 (0.0000–0.0136) | 34.2 | 0.108 |
| EAv | Spain | Avila | 21 | 40°42′N | 4°48′W | 0.426 (0.411–0.440) | 0.0009 (0.0000–0.0026) | 27.8 | 0.084 |
| ESe | Spain | Segovia | 21 | 40°54′N | 4°06′W | 0.442 (0.429–0.457) | 0.0006 (0.0000–0.0019) | 26.1 | 0.078 |
| CHAP (1) | Switzerland | Alp di Plaun | 20 | 46°47′N | 9°29′E | 0.224 (0.212–0.236) | 0.0129 (0.0001–0.0240) | 39.3 | 0.123 |
| CHBo (1) | Switzerland | Bonaduz | 22 | 46°48′N | 9°24′E | 0.297 (0.285–0.310) | 0.0050 (0.0000–0.0125) | 32.3 | 0.098 |
| CHBw (2) | Switzerland | Brienzwiler | 21 | 46°45′N | 8°07′E | 0.328 (0.315–0.343) | 0.0031 (0.0000–0.0085) | 27.7 | 0.082 |
| CHCa (1) | Switzerland | Calandahütte | 22 | 46°53′N | 9°29′E | 0.257 (0.245–0.269) | 0.0124 (0.0003–0.0232) | 35.5 | 0.105 |
| CHDP (1) | Switzerland | Domat/Ems | 22 | 46°50′N | 9°26′E | 0.278 (0.265–0.290) | 0.0061 (0.0000–0.0145) | 32.6 | 0.097 |
| CHGS (2) | Switzerland | Grosse Scheidegg | 22 | 46°40′N | 8°06′E | 0.386 (0.372–0.400) | 0.0011 (0.0000–0.0034) | 22.6 | 0.068 |
| CHMe (2) | Switzerland | Meiringen | 22 | 46°43′N | 8°11′E | 0.411 (0.396–0.424) | 0.0007 (0.0000–0.0021) | 19.9 | 0.061 |
| CHSF (2) | Switzerland | Schreck-Feld | 22 | 46°40′N | 8°04′E | 0.354 (0.341–0.368) | 0.0019 (0.0001–0.0055) | 25.1 | 0.076 |

We report the population ID, sampling site (country and location), number of individuals per population (N), coordinates, estimates of population specific F ST and F IS with 95% credibility intervals (CI), proportion of variable AFLP markers and AFLP marker diversity for each population.

(1) AFLPs of six primer combinations have been published in Foll et al. [56].

(2) AFLPs of 21 primer combinations have been published in Fischer et al. [40].

The aim of this study is to detect selective patterns in populations across the European mainland to disentangle the importance of balancing and positive selection in shaping the genetic diversity observed in the distribution range of the common vole. However, a major challenge in identifying genomic regions under selection is to separate the footprint of selection from that of population history and demography (e.g. [10], [44], [45]). Hence examining a large number of loci scattered throughout the genome is an effective way to tell apart the effect of selection from the confounding effects of population history and demography [10], [46], [47]. Cavalli-Sforza [48] and Lewontin and Krakauer [49] proposed that genetic drift and gene flow should affect all loci similarly, leading to some overall degree of differentiation between populations, but that selected loci would deviate significantly from this distribution. Indeed, positive selection acting on a given locus should increase population differentiation (and lead to high F ST) whereas balancing selection should reduce it and lead to low F ST (see e.g. [40], [47], [50], [51]). For non-model organisms, Amplified Fragment Length Polymorphisms (AFLPs) allow the screening of thousands of randomly distributed loci in a genome [52], [53]. To detect AFLP outliers, we used a recently developed extension of the Bayesian F ST-based approach [35], [54] based on the F-model [55]. BayeScan 2.1 [40] provides estimates of allele frequencies and F-statistics from AFLPs by incorporating for each individual the band intensity of a marker instead of simply using presence-absence patterns [56], [57]. This procedure implicitly allows one to distinguish between homo- and heterozygotes, and significantly improves the detection of selection with AFLP markers, which nearly reach the power obtained with single nucleotide polymorphism (SNP) data for which individual genotypes are known [40].

Materials and Methods

Sample and DNA extraction

We analysed 21 vole populations across most of the distribution range of M. arvalis in Europe, with a total of 444 individuals (see Figure 1 and Table 1). The populations were spread over 2,500 km from Spain (EAv) to Poland (PSr), and over a 750 km latitudinal gradient from Belgium (BSt) to Italy (INa). The samples for this study were obtained by strictly following the legislation on animal protection and experimentation of Switzerland and the other European countries involved. Microtus arvalis is not specifically protected by Swiss laws on animal protection (Tierschutzgesetz from December 16 2005) and hunting legislation (Verordnung zum Jagdschutzgesetz, February 29 1988) because of its role as an agricultural pest and general abundance. The use of snap traps for sampling M. arvalis is not a stress-inducing animal experiment (Schweregrad 0; Art. 137ff Swiss federal regulations on animal experimentation). However, Swiss samples analyzed in this study (some of them also covered in earlier studies; [40]–[43], [56], [58]–[62]) were also obtained under animal experimentation permits No. 55/02; 107/05; BE08/10; BE90/10 issued by the cantonal veterinary office of Bern according to federal law after ethical approval by the Bernese cantonal commission on animal experimentation.
Additional samples were obtained from the researcher network on rodent-borne pathogens based at the German Federal Research Institute for Animal Health (FLI; http://www.fli.bund.de; GH is one of the coordinators) [63]–[65] and its international partners in the European projects EDEN and EDENext on biology and control of vector-borne infections in Europe (http://www.edenext.eu). Sample acquisition strictly followed the legislation of the relevant countries after approval by the responsible animal protection and ethics committees as required by the European Commission Seventh Framework Programme (FP7; http://cordis.europa.eu/fp7/home_en.html) [66], [67]. Total genomic DNA was extracted from foot, tail or liver tissue stored in absolute ethanol and later deep-frozen using a standard phenol-chloroform protocol [68]. The quality and quantity of the DNA were determined on 0.8% agarose gels and with a spectrophotometer (NanoDrop ND-1000 Spectrophotometer, NanoDrop Technologies, Inc., Wilmington, USA). The DNA concentration was standardized to 100 ng DNA/µL for all individuals to ensure similar PCR yield across samples [40].

AFLP analyses

AFLP analyses were performed according to standard protocols as established by Vos et al. [52] and modified by Fink et al. [69]. Selective amplifications were performed using 21 primer combinations (Table S1). These primer combinations were then named according to the last two selective bases of each primer, e.g. the combination E01-AAC/M02-cag is referred to as ACag. Special care was taken to guarantee the reproducibility of AFLP marker analyses: a liquid-handling robot (Microlab STAR, Hamilton Bonaduz AG, Bonaduz, Switzerland) was used for selective amplification, multiplexing of PCR products and loading of the 96-well sequencer plate, and 38 individuals (9%) were independently replicated for all 21 primer combinations (see [40] for more details).

AFLP fragment scoring and diversity

AFLP fragment scoring was performed with GeneMapper software version 3.7 (Applied Biosystems). Bin sets were created automatically and manually revised [40]. Two AFLP data matrices were produced, one with band intensity information and one standard binary presence-absence matrix. The binary AFLP data matrix was used to estimate reproducibility and AFLP diversity, and to run the first PCA. A particular AFLP band was scored as ‘present’ (1) if its intensity was larger than 10% of the 95% quantile of the band intensity distribution, or ‘absent’ (0) if its intensity was smaller than this value. AFLP marker frequencies, the number of variable markers per population and AFLP diversity were then calculated with the program AFLPDAT [70]. AFLP diversity was calculated as the average proportion of pairwise differences between individuals for each population, which is an index similar to Nei's gene diversity calculated from marker frequencies [71], [72].
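The scoring rule and the diversity index described above can be illustrated with a minimal Python sketch. It is not the GeneMapper or AFLPDAT implementation: the function names are hypothetical, the quantile is assumed to be taken per marker across all individuals with linear interpolation, and an intensity exactly equal to the cutoff (a case the text does not define) is treated as absent.

```python
from itertools import combinations


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def score_bands(intensities, fraction=0.10, q=0.95):
    """Score one marker: 1 if intensity > fraction * q-quantile, else 0.

    Assumption: the quantile is computed over all individuals for this marker.
    """
    cutoff = fraction * quantile(intensities, q)
    return [1 if x > cutoff else 0 for x in intensities]


def aflp_diversity(binary_rows):
    """Average proportion of pairwise differences between individuals.

    binary_rows holds one presence/absence list per individual of a population.
    """
    n_markers = len(binary_rows[0])
    props = [
        sum(a != b for a, b in zip(r1, r2)) / n_markers
        for r1, r2 in combinations(binary_rows, 2)
    ]
    return sum(props) / len(props)
```

For example, `score_bands([0, 5, 50, 100, 1000])` uses a cutoff of about 82 (10% of the interpolated 95% quantile, 820), so only the last two individuals are scored as carrying the band.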

Outlier detection

A Bayesian genome scan approach (BayeScan) was used to detect markers under selection. This procedure is more efficient than classical outlier detection methods (like DETSELD, a modified version of [73], or DFDIST, a modified version of [74]) in the discovery of truly selected loci, as it results in a lower number of false positives [75]. BayeScan 2.1 was specifically developed for AFLP markers. The inclusion of band intensity information makes the BayeScan analysis of dominant AFLPs almost as powerful as an analysis of the same number of codominant markers (e.g. reaching 92% of the power of a SNP data set) to detect selection (for more details see [40]). Moreover, this additional information makes it possible to infer population-specific inbreeding coefficients (F IS) from AFLP data [56]. Band intensity information required by BayeScan 2.1 was obtained from the AFLP data matrix of marker band intensity provided by GeneMapper. Since markers with a low minor allele frequency systematically bias the F ST estimates downwards [76], only markers with band frequencies between 5% and 95% were used for subsequent analyses. This procedure prevents an artificial increase in the number of inferred outlier markers under positive selection [76]. Note that markers having band frequencies higher than 95% were still considered as polymorphic if the distribution of band intensity across all individuals was bimodal [40] and if they did not exceed three times the 95% quantile of the band intensity distribution for that marker, to avoid artefacts of the sequencing machine. These markers are probably informative to infer F IS, as they contain a high proportion of fixed and/or heterozygous individuals. BayeScan assumes that allele frequencies within populations follow a multinomial-Dirichlet distribution [55], [77], [78] with F ST parameters being a function of population-specific components shared among all loci and of locus-specific components shared among all populations.
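In the F-model literature, this decomposition of F ST is commonly written as a logistic regression. The text does not spell the formula out, so the notation below is an assumption used for illustration, with α_i the locus-specific component of locus i and β_j the population-specific component of population j:

```latex
\log\!\left(\frac{F_{ST}^{ij}}{1 - F_{ST}^{ij}}\right) = \alpha_i + \beta_j
```

In this notation the neutral model corresponds to α_i = 0. A positive α_i raises F ST at locus i in all populations, as expected under positive selection, whereas a negative α_i lowers it, as expected under balancing selection.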
For a given locus, departure from neutrality is assumed when the locus-specific component is required to explain the observed pattern of diversity. BayeScan directly infers the posterior probability of each locus to be under the effect of selection by defining and comparing two alternative models: one model includes the locus-specific component, while the other excludes it [35]. The ratio of the model posterior probabilities is then used to calculate the posterior odds (PO), which measure how much more likely the model with selection is compared to the model without selection (see [40]). We used a threshold of PO>10 for a marker to be considered under selection, which refers to “strong evidence” for the alternative model (in this case the model with selection) as defined by Jeffreys [79]. For the Markov chain Monte Carlo (MCMC) algorithm we used 20 pilot runs of 5,000 iterations to adjust the proposal distribution to acceptance rates between 0.25 and 0.45 for the runs. A burn-in of 50,000 iterations was used and visually checked for convergence of the MCMC chains, followed by 50,000 iterations for estimation using a thinning interval of 10. False Discovery Rate (FDR) was used to control for multiple testing [40], [80].
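The relationship between posterior probabilities, posterior odds and the PO>10 threshold can be sketched in a few lines of Python. This is an illustration only, not BayeScan code: the function names are hypothetical, and the FDR is assumed to be the average posterior probability of the neutral model among the markers called as outliers, which is one common Bayesian definition.

```python
import math


def posterior_odds(p_selection):
    """PO = P(model with selection | data) / P(model without selection | data)."""
    if p_selection >= 1.0:
        return math.inf  # the neutral model was never supported
    return p_selection / (1.0 - p_selection)


def call_outliers(post_probs, po_threshold=10.0):
    """Return indices of markers with PO above the threshold and their expected FDR.

    Assumption: expected FDR = mean posterior probability of the neutral
    model among the selected markers.
    """
    selected = [i for i, p in enumerate(post_probs)
                if posterior_odds(p) > po_threshold]
    if not selected:
        return selected, 0.0
    fdr = sum(1.0 - post_probs[i] for i in selected) / len(selected)
    return selected, fdr
```

A threshold of PO>10 is equivalent to a posterior probability for the model with selection above 10/11, i.e. about 0.91. For posterior probabilities of 0.95, 0.5, 0.999 and 0.9, only the first and third markers pass, with an expected FDR of (0.05 + 0.001)/2 = 0.0255.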

Inference of neutral genetic structure across Europe

We performed two principal component analyses (PCAs) in R [81] to infer the patterns of neutral genetic structure in common voles across Europe. The PCAs were performed on the neutral (excluding outlier loci) and evolutionary informative AFLP markers. Evolutionary informative AFLPs have band frequencies ranging between 5% and 95%, which excludes uninformative and rare markers [76]. One PCA was done at the individual level using AFLP marker presence/absence data for all 444 individuals, and the second was done at the population level, on the basis of marker allele frequencies estimated by BayeScan [40], [56] using band intensity information.
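The selection of neutral, evolutionary informative markers described above amounts to two filters on the binary matrix. The following Python sketch shows one way to express them; the function names are hypothetical, and treating the 5% and 95% bounds as inclusive is an assumption, since the text does not say how boundary values were handled.

```python
def band_frequencies(binary_rows):
    """Band frequency of each marker across individuals (one row per individual)."""
    n = len(binary_rows)
    return [sum(col) / n for col in zip(*binary_rows)]


def neutral_informative(binary_rows, outlier_idx, low=0.05, high=0.95):
    """Indices of markers with low <= band frequency <= high that are not outliers."""
    outliers = set(outlier_idx)
    freqs = band_frequencies(binary_rows)
    return [j for j, f in enumerate(freqs)
            if low <= f <= high and j not in outliers]
```

Applied to the full data set, this kind of filter would first retain the markers with informative band frequencies and then drop those flagged by the genome scan before the PCA is run.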

Inference of balancing selection

Markers detected under balancing selection were investigated in more detail, as heterozygosity information can be gained from the population-specific band intensity distribution for a specific AFLP marker. A marker under balancing selection should indeed have evenly distributed allele frequencies across most populations and heterozygous individuals should be observed within populations, leading to a bimodal band intensity distribution for this AFLP marker [56]. The markers inferred as under balancing selection were thus carefully examined for bimodality of band intensities. However, sex-chromosome linked markers may also show bimodal distributions and low differentiation between populations in samples with equal sex ratios, as males only have one X-chromosome. A t-test implemented in R was thus used to check for association between band intensity and sex. Markers were retained as candidates only if the test was not significant (p>0.05); no correction for multiple testing was applied, which makes the identification of markers under balancing selection conservative. We used the same approach to test for any amplification difference among different 96-well PCR plates of the same primer pair (batch effect).
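The logic of the sex-linkage check can be sketched in Python. Note that this sketch swaps in a two-sided permutation test on the difference in mean band intensity instead of the t-test used in the study, so that it runs with the standard library only; the function name, the number of permutations and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean


def sex_association_pvalue(female, male, n_perm=2000, seed=0):
    """Permutation p-value for a difference in mean band intensity between sexes.

    This is a stand-in for the t-test described in the text, not a
    reimplementation of it.
    """
    rng = random.Random(seed)
    observed = abs(mean(female) - mean(male))
    pooled = list(female) + list(male)
    n_f = len(female)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_f]) - mean(pooled[n_f:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

A marker whose band intensities are roughly twice as high in females as in males yields a small p-value and would be flagged as likely X-linked, whereas a marker with similar intensities in both sexes yields a large p-value and would remain a candidate.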

Inference of positive selection patterns across Europe

To infer the patterns of positive selection in common voles across Europe we performed a scaled PCA in R of the population allele frequencies of loci inferred under positive selection by BayeScan. To identify the strongest geographic patterns of selection across Europe, we used a locus-by-locus SAMOVA approach [82] to separate, for each marker, populations into the two groups (k = 2) leading to the highest level of genetic differentiation (F CT). The three outlier loci showing the highest F CT were identified and plotted onto the European map using the R package plotrix to visualize the population-specific allele frequencies of these patterns of selection. To find loci showing similar geographic patterns of selection across Europe, which could result from multi-genic adaptation (similar selective pressures acting on different loci) or from genetic linkage of markers, we computed pairwise Pearson's correlations between the population-specific allele frequencies of the outlier loci using the R package psych and Holm's correction for multiple testing [83].
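The two statistical ingredients of the last step, Pearson's correlation and Holm's step-down correction, can be sketched in Python as follows. This is not the psych package: the function names are hypothetical, and the p-values passed to `holm_adjust` are assumed to come from a separate significance test of each pairwise correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation between two equally long lists of allele frequencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def holm_adjust(pvalues):
    """Holm-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted
```

With 138 outlier loci there are 138 × 137 / 2 = 9,453 pairwise correlations, so the smallest p-value is multiplied by 9,453 under Holm's procedure, which explains why only very strong correlations remain significant.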

Results

AFLP variation and neutral genetic structure across Europe

The AFLP analyses of the 21 European vole populations provided 3,839 markers. The majority of these AFLP markers were polymorphic (3,318; 86%) and 2,054 (54%) showed informative band frequencies between 5% and 95% overall. For each individual, we obtained on average 2,342 AFLPs (range: 2,169–2,418) across all primer combinations, and the mean length of the fragments was 239 bp. An average of 183 AFLP markers was scored per primer combination across all individuals (range: 86–256; Table 2). The average proportion of variable AFLP bands per population was 31%, with an average AFLP diversity of 9.6%. F IS estimates were low for all populations, ranging between 0.001 and 0.043 (Table 1). Average genetic differentiation among populations was globally high with an average population-specific F ST of 0.31. The population from the Czech Republic (CZD) had the highest number of variable AFLP bands per population (46%), and consequently the lowest population-specific F ST (0.16), whereas the lowest diversity was observed in a population of the Swiss Alps (CHMe), with only 20% of variable markers and hence a high population-specific F ST (0.41).
Table 2

AFLP markers detected under positive selection with BayeScan.

| Primers | Markers | # BayeScan | Marker ID of specific primer combination |
|---|---|---|---|
| ACag | 201 | 9 | 33; 77; 84; 85; 119; 136; 186; 187; 200 |
| ACtc | 175 | 2 | 21; 94 |
| ACtt | 157 | 5 | 40; 76; 79; 115; 142 |
| AGaa | 147 | 4 | 5; 56; 60; 101 |
| AGac | 200 | 7 | 3; 13; 84; 144; 161; 188; 190 |
| AGtg | 141 | 4 | 97; 103; 106; 126 |
| CAat | 210 | 5 | 102; 103; 145; 166; 181 |
| CAta | 185 | 8 | 73; 84; 116; 155; 157; 158; 165; 169 |
| CCac | 195 | 4 | 17; 129; 153; 167 |
| CCta | 219 | 3 | 88; 153; 213 |
| CCtt | 256 | 12 | 2; 34; 41; 42; 50; 67; 97; 158; 168; 169; 193; 208 |
| CGag | 101 | 4 | 67; 69; 70; 97 |
| CGtt | 86 | 6 | 24; 40; 50; 51; 79; 87 |
| CTaa | 239 | 11 | 3; 7; 78; 86; 166; 173; 184; 185; 187; 208; 232 |
| CTag | 208 | 9 | 64; 90; 102; 125; 132; 133; 147; 154; 176 |
| CTtg | 211 | 10 | 46; 93; 97; 104; 117; 133; 170; 174; 175; 187 |
| GCat | 224 | 12 | 15; 31; 60; 106; 169; 175; 180; 181; 192; 208; 212; 213 |
| GCta | 199 | 6 | 7; 20; 33; 82; 98; 187 |
| GCtc | 181 | 7 | 16; 22; 40; 43; 160; 166; 172 |
| GGac | 168 | 4 | 30; 31; 36; 164 |
| GGtc | 136 | 6 | 15; 18; 64; 85; 125; 129 |
| Total | 3839 | 138 | |

Given are the 21 primer combinations investigated, the total number of scored AFLP markers per primer combination across all individuals, the number of markers with a posterior odds (PO)>10 and the primer combination specific IDs of the markers under selection.

The two “evolutionary neutral” PCAs were based on 1,843 neutral AFLP markers - these were the 2,054 evolutionary informative AFLP markers minus the 211 inferred outlier loci (see more details below). These neutral markers led to a clustering of individuals that approximately matches the geographic origin of the samples (Figure 1) except that the Swiss vole populations were somewhat farther apart than geography would suggest. The entire individual-based AFLP data set (Figure 2A) as well as the PCA from estimated population-specific allele frequencies (Figure 2B) show very similar patterns and allow a clear separation of the populations, which indicates the high information content of these AFLP markers.
Figure 2

Principal component analysis (PCA) of (A) the neutral binary AFLP data matrix of 444 individuals from 21 populations across Europe using 1,843 neutral and evolutionary informative AFLP markers (see Material and Methods).

(B) PCA of the 21 population-specific allele frequency estimates of neutral AFLP markers by BayeScan. The distribution of the populations on the plot roughly follows the geographic origin of the samples. (C) PCA of the estimated population-based allele frequencies of the 138 outliers probably under positive selection. For population IDs see Table 1. Colours correspond to country affiliation (see Figure 1).


Genome scan

The BayeScan analysis of the 2,054 informative AFLP markers in 21 populations across Europe revealed 211 markers with a PO for selection larger than 10 with an associated FDR of less than 1.4%. Among these markers under selection, 138 (6.7%) had high F ST (mean F ST: 0.52) indicative of positive selection, and 73 were associated with very low F ST (mean F ST: 0.08) indicative of balancing selection (Figure 3; Table 2).
Figure 3

Results of BayeScan analysis for 2,054 informative AFLPs genotyped in 444 M. arvalis voles sampled in 21 populations.

The marker-specific F ST is plotted against the posterior odds (PO) of being under selection. The vertical line shows the critical PO of 10 used to identify outlier markers. Markers on the right side of the vertical line are outliers: 138 markers with high F ST indicative of positive selection and 73 markers with low F ST indicative of balancing selection were identified. Markers having a log10(PO)>4 were summarized in the category 4.

Bimodal band intensity information of AFLPs (for more details see Figure 4A and B, or [40], [56], [57]) was used to identify prime candidates for balancing selection and to exclude false positives among the 73 low F ST outliers. Among these, 40 markers were considered as unlikely to be under balancing selection, either because outliers showed significant band intensity differences between males and females (t-test, p<0.05) and were thus likely sex-chromosome linked (33 markers, Table 3, Figure 4C and D) or because of PCR amplification strength differences between 96-well plates (7 markers of the primer combination GGtc).
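The marker counts reported in this section can be cross-checked with simple arithmetic. The short Python sketch below only restates numbers given in the text; the variable names are chosen for illustration.

```python
informative = 2054                 # markers with band frequencies between 5% and 95%
positive, low_fst = 138, 73        # high-F ST and low-F ST outliers (PO > 10)

outliers = positive + low_fst      # all markers called under selection
neutral = informative - outliers   # markers used for the neutral PCAs

sex_linked, plate_effect = 33, 7   # low-F ST outliers excluded as false positives
excluded = sex_linked + plate_effect
balancing_candidates = low_fst - excluded

share_positive = round(100 * positive / informative, 1)   # percent of informative markers
ratio = positive / balancing_candidates                    # positive vs. balancing candidates

print(outliers, neutral, excluded, balancing_candidates, share_positive, round(ratio, 2))
```

The totals agree with the 211 outliers, the 1,843 neutral markers and the 6.7% share of markers under positive selection given above, and the 33 remaining candidates correspond to roughly a fourfold excess of markers under positive selection.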
Figure 4

Bimodal band intensity distribution of two low F ST outlier markers.

(A) Bimodal distribution of a marker likely to be under balancing selection (CTaa125). The zero class of the distribution represents individuals not showing any band, the following first peak corresponds to heterozygous individuals and the second peak represents homozygous individuals. The black line represents a fitted density curve. (B) Comparison of band intensities in males and females for marker CTaa125 shown in (A), revealing no statistical difference and suggesting it is an autosomal marker. (C) Bimodal distribution for marker ACtt16. (D) Corresponding box plot of sex-specific band intensities for marker ACtt16, where females have band intensities about twice as high as those of males, suggesting it is an X-linked marker.

Table 3

List of the 73 markers identified by BayeScan as being potentially under balancing selection.

| ID | PO | F ST | Marker frequency | Allele frequency | Marker categorization |
| --- | --- | --- | --- | --- | --- |
| CCta185 | 226.3 | 0.105 | 0.991 | 0.843 | bimodal (balancing selection) |
| CGag21 |  | 0.057 | 0.918 | 0.764 | bimodal (balancing selection) |
| CTaa125 | 63.1 | 0.136 | 0.927 | 0.699 | bimodal (balancing selection) |
| CTaa235 | 14.0 | 0.157 | 0.813 | 0.618 | bimodal (balancing selection) |
| GCat153 |  | 0.094 | 0.723 | 0.476 | bimodal (balancing selection) |
| GCtc42 | 118.0 | 0.120 | 0.904 | 0.770 | bimodal (balancing selection) |
| CAta44 |  | 0.032 | 1 | 0.738 | bimodal (a) |
| GCtc76 |  | 0.029 | 0.984 | 0.847 | bimodal (a) |
| ACtc64 | 999.0 | 0.063 | 0.995 | 0.972 | multimodal (b) |
| AGaa48 | 554.6 | 0.084 | 1 | 0.844 | multimodal (b) |
| AGaa96 | 22.5 | 0.153 | 0.785 | 0.546 | unimodal (b) |
| AGtg79 | 130.6 | 0.117 | 0.941 | 0.728 | unimodal (b) |
| CAat185 | 249.0 | 0.078 | 0.969 | 0.812 | unimodal (b) |
| CAta9 |  | 0.035 | 1 | 0.940 | unimodal (b) |
| CAta91 | 32.1 | 0.107 | 0.932 | 0.783 | unimodal (b) |
| CCta61 | 28.6 | 0.126 | 0.934 | 0.761 | unimodal (b) |
| GCat12 |  | 0.033 | 1 | 0.982 | unimodal (b) |
| GCat95 |  | 0.054 | 0.913 | 0.739 | unimodal (b) |
| GCat152 | 1249 | 0.075 | 0.93 | 0.740 | unimodal (b) |
| GCta26 | 91.6 | 0.124 | 0.888 | 0.696 | unimodal (b) |
| GCtc48 | 12.1 | 0.148 | 0.857 | 0.596 | unimodal (b) |
| ACag100 | 134.1 | 0.113 | 0.226 | 0.131 | low frequency (c) |
| CCac84 | 11.2 | 0.120 | 0.052 | 0.049 | low frequency (c) |
| CCta81 | 27.9 | 0.105 | 0.074 | 0.044 | low frequency (c) |
| CCta90 | 59.2 | 0.100 | 0.064 | 0.052 | low frequency (c) |
| CCta180 | 31.5 | 0.102 | 0.06 | 0.037 | low frequency (c) |
| CCta184 | 10.7 | 0.127 | 0.065 | 0.043 | low frequency (c) |
| CCtt32 | 10.1 | 0.132 | 0.07 | 0.054 | low frequency (c) |
| CCtt65 |  | 0.078 | 0.33 | 0.206 | low frequency (c) |
| CCtt131 | 39.3 | 0.109 | 0.119 | 0.082 | low frequency (c) |
| CCtt145 | 383.6 | 0.068 | 0.075 | 0.074 | low frequency (c) |
| CGtt4 | 16.5 | 0.139 | 0.204 | 0.123 | low frequency (c) |
| CGtt6 | 40.0 | 0.109 | 0.098 | 0.069 | low frequency (c) |
| ACtc47 |  | 0.066 | 0.991 | 0.812 | sex-linked (d) |
| ACtt16 |  | 0.026 | 1 | 0.787 | sex-linked (d) |
| ACtt18 | 105.4 | 0.120 | 0.881 | 0.734 | sex-linked (d) |
| ACtt31 |  | 0.024 | 0.998 | 0.762 | sex-linked (d) |
| ACtt46 | 276.8 | 0.121 | 0.938 | 0.713 | sex-linked (d) |
| AGac127 |  | 0.079 | 0.998 | 0.829 | sex-linked (d) |
| AGtg131 | 160.3 | 0.135 | 0.78 | 0.596 | sex-linked (d) |
| CAat112 | 33.2 | 0.144 | 0.847 | 0.618 | sex-linked (d) |
| CAta114 |  | 0.024 | 0.986 | 0.744 | sex-linked (d) |
| CAta142 |  | 0.068 | 0.961 | 0.772 | sex-linked (d) |
| CCac46 |  | 0.038 | 0.995 | 0.747 | sex-linked (d) |
| CCac57 |  | 0.029 | 1 | 0.763 | sex-linked (d) |
| CCta11 |  | 0.024 | 1 | 0.752 | sex-linked (d) |
| CCta29 | 34.7 | 0.148 | 0.66 | 0.477 | sex-linked (d) |
| CCta73 | 12.4 | 0.101 | 0.998 | 0.983 | sex-linked (d) |
| CCtt100 |  | 0.031 | 0.991 | 0.710 | sex-linked (d) |
| CCtt150 |  | 0.042 | 0.998 | 0.757 | sex-linked (d) |
| CCtt191 |  | 0.029 | 0.995 | 0.712 | sex-linked (d) |
| CTaa27 |  | 0.040 | 0.984 | 0.775 | sex-linked (d) |
| CTaa102 |  | 0.040 | 0.934 | 0.662 | sex-linked (d) |
| CTaa113 |  | 0.049 | 1 | 0.780 | sex-linked (d) |
| CTag19 |  | 0.045 | 1 | 0.772 | sex-linked (d) |
| CTag94 |  | 0.027 | 0.998 | 0.797 | sex-linked (d) |
| CTtg10 |  | 0.076 | 0.896 | 0.738 | sex-linked (d) |
| CTtg105 |  | 0.038 | 0.971 | 0.730 | sex-linked (d) |
| CTtg155 |  | 0.059 | 0.943 | 0.721 | sex-linked (d) |
| GCat44 |  | 0.068 | 0.946 | 0.778 | sex-linked (d) |
| GCat69 |  | 0.023 | 0.995 | 0.742 | sex-linked (d) |
| GCat98 |  | 0.044 | 0.995 | 0.792 | sex-linked (d) |
| GCta14 |  | 0.060 | 1 | 0.817 | sex-linked (d) |
| GCta96 | 4999 | 0.090 | 0.895 | 0.632 | sex-linked (d) |
| ACag54 | 262.2 | 0.125 | 0.576 | 0.831 | sex-linked (d) |
| AGtg13 | 13.4 | 0.153 | 0.444 | 0.301 | sex-linked (d) |
| GGtc1 |  | 0.033 | 0.998 | 0.866 | lab amplification difference (e) |
| GGtc9 | 45.3 | 0.122 | 0.988 | 0.748 | lab amplification difference (e) |
| GGtc10 |  | 0.047 | 0.995 | 0.981 | lab amplification difference (e) |
| GGtc21 | 4999 | 0.097 | 0.846 | 0.714 | lab amplification difference (e) |
| GGtc34 | 16.2 | 0.089 | 0.995 | 0.968 | lab amplification difference (e) |
| GGtc37 |  | 0.023 | 0.998 | 0.863 | lab amplification difference (e) |
| GGtc55 |  | 0.057 | 0.998 | 0.817 | lab amplification difference (e) |

The top six markers are prime candidates for being under balancing selection due to their clear bimodal intensity distribution in all populations. Thirty-three additional markers are sex-linked. Given are marker ID, posterior odds (PO) for the marker, F ST, marker frequency, estimated allele frequency and marker categorization.

(a) Clear bimodality was not found in single populations.

(b) Unimodal or multimodal band intensity distribution.

(c) Low allele frequency.

(d) Sex-chromosome linkage.

(e) Bimodal distribution probably due to PCR amplification strength differences between 96-well plates for one specific primer pair (GGtc).

Among the remaining 33 markers with low F ST values, 27 showed distributions that could be compatible with factors other than balancing selection alone. Two markers (CAta44 and GCtc76) had an overall bimodal distribution, but a clear bimodality was missing in individual populations. Thirteen markers had either a unimodal or multimodal band intensity distribution. Twelve markers had low allele frequencies (0.04–0.21) that could be a consequence of negative selection or of frequency-dependent selection, which is also a form of balancing selection.
Finally, six markers were identified as prime candidates for balancing selection (Table 3), as homozygous individuals had approximately twice the band intensity of heterozygous individuals (Figure 4A) and all populations showed intermediate allele frequencies across the European continent (see e.g. Figure 5A).
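To illustrate how a bimodal band intensity distribution separates genotype classes, the sketch below assigns each individual zero, one or two copies of the band allele according to the nearest expected peak (no band, the heterozygote peak, and twice that intensity for homozygotes). The nearest-peak rule, the function names and the intensity values are hypothetical simplifications; the study itself uses a Bayesian method developed for AFLP markers, not this rule.

```python
def call_genotypes(intensities, het_peak):
    """Assign 0, 1 or 2 band-allele copies by the nearest expected peak.

    Expected peaks are 0 (no band), het_peak (heterozygote) and
    2 * het_peak (homozygote), following the bimodal pattern in Figure 4A.
    """
    expected = [0.0, het_peak, 2.0 * het_peak]
    return [min(range(3), key=lambda k: abs(x - expected[k])) for x in intensities]


def allele_frequency(calls):
    """Frequency of the band allele, counting two gene copies per individual."""
    return sum(calls) / (2 * len(calls))


def marker_frequency(calls):
    """Fraction of individuals showing a band at all (one or two copies)."""
    return sum(1 for c in calls if c > 0) / len(calls)


# Hypothetical band intensities for ten individuals (arbitrary units).
intensities = [0.0, 0.0, 95, 110, 102, 205, 190, 98, 210, 0.0]
calls = call_genotypes(intensities, het_peak=100.0)
print(calls, allele_frequency(calls), marker_frequency(calls))
```

Because heterozygotes and homozygotes both show a band, the marker frequency in such a sketch is higher than the allele frequency, which is the same relationship seen between the two frequency columns of Table 3 for most markers.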
Figure 5

Allele frequency distributions of a locus potentially under balancing and three loci potentially under positive selection across Europe.

Pie charts indicate the minor allele frequency. (A) Potential pattern of continental balancing selection at the locus CTaa125, which shows very homogeneous allele frequencies across Europe. (B–D) Three loci under potential positive selection, which produced the strongest splits (F CT) between two groups of populations across Europe identified by the locus-by-locus SAMOVA approach [82]. Shown are the loci (B) ACag119 with an F CT of 0.93, (C) CTaa3 with an F CT of 0.89, and (D) GGac31 with an F CT of 0.87.


Inference of positive selection across Europe

We detected a total of 138 markers potentially under positive selection across Europe, with an average of 6.6 outliers per tested primer combination (range: 2–12; Table 2). For these outliers, strong allele frequency differences were always identified in three or more populations compared to the rest, showing that selection was inferred independently in multiple populations (see e.g. Figure 5B–D). The PCA based on allele frequencies estimated for the 138 loci potentially under positive selection revealed a different pattern than the neutral markers (Figure 2C). In particular, the populations within the Swiss Alps (CHAP, CHBo, CHCa, CHDP, CHBw, CHMe, CHGS and CHSF) and the Italian Alps (INa) are much more separated from the other populations and show a larger extent of differentiation among themselves compared to the PCA on neutral loci (Figure 2A and B), which is potentially indicative of strong selection pressures in the alpine area. SAMOVA allowed us to identify the outlier loci that produced the strongest splits between two groups of populations across Europe, leading to the highest level of genetic differentiation (F CT), which might be an indication of the strength of selection. The three loci that showed the strongest splits are ACag119 with an F CT of 0.93 (Figure 5B), CTaa3 with an F CT of 0.89 (Figure 5C) and GGac31 with an F CT of 0.87 (Figure 5D). The pairwise comparison of allele frequencies of outlier loci showed that ACag119 was significantly correlated with only three other loci, CTaa3 with six and GGac31 with 16 loci. Among the 138 loci under positive selection, the average number of associations was 6.1, with a range of 0 to 24 associations. Additional information for all 138 outlier loci can be found in Table S2.
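The counting of associations between loci can be pictured with a small Python sketch: for every pair of loci, the per-population allele frequencies are correlated, and a pair counts as an association when the correlation is high. The Pearson correlation, the fixed cutoff of 0.9, the function names and the frequency values are all assumptions for illustration; the significance test actually used is not described in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of allele frequencies."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_associations(freqs, cutoff):
    """Count, for each locus, the other loci whose frequencies correlate at or above the cutoff."""
    counts = {locus: 0 for locus in freqs}
    loci = list(freqs)
    for i, first in enumerate(loci):
        for second in loci[i + 1:]:
            if pearson(freqs[first], freqs[second]) >= cutoff:
                counts[first] += 1
                counts[second] += 1
    return counts


# Hypothetical allele frequencies in five populations (not data from the study).
freqs = {
    "locus_A": [0.90, 0.80, 0.10, 0.20, 0.15],
    "locus_B": [0.85, 0.90, 0.20, 0.10, 0.10],
    "locus_C": [0.50, 0.40, 0.60, 0.50, 0.45],
}
print(count_associations(freqs, cutoff=0.9))
```

In this toy example only the first two loci share a similar geographic pattern, so each of them has one association and the third has none.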

Discussion

The current study illustrates the capacity of Bayesian F ST outlier approaches to identify the signature of positive and balancing selection in non-model organisms. The nearly 4,000 AFLP markers, of which 2,054 were evolutionarily informative, clearly allowed us to screen a representative part of the common vole genome for loci linked to recent adaptation on a continental scale in Europe.

Genetic structure across Europe inferred by AFLPs

The neutral AFLP markers allowed us to accurately resolve the population genetic structure of the 21 vole populations across the European continent, and the PCA led to a clustering of individuals and populations that corresponds approximately to the geographic origin of the samples (Figure 1 vs. Figure 2A and B). Similar patterns were found in humans, where genetic data also mirror geography in Europe [84]. This high resolution indicates the large information content present in this AFLP data set and is further supported by a very similar PCA-based clustering of populations inferred from 6,807 polymorphic SNPs (see Figure S2 in [60]), which were used to resolve the four evolutionary lineages present in Europe [43].

Pattern of selection across European continent

We scanned 21 vole populations across the European continent for evidence of selection. Overall, slightly more than 8% of all informative markers were potentially under positive or balancing selection. Despite the detection of some candidate loci for balancing selection (1.6%), more loci for local positive selection (6.7%) were identified. These results suggest that drift and the demographic history of vole populations have strongly influenced the observed genetic diversity, but that positive selection also plays an important role in shaping the genetic diversity of vole populations, while balancing selection is less common. Nevertheless, the detection of several markers with multiple lines of evidence for balancing selection is remarkable, especially the signature of a stabilizing evolutionary process on such a large geographic scale. In contrast to our results, balancing selection historically played an important role in evolutionary genetics in explaining the high level of genomic polymorphism observed among species or populations [8], [10]. Six decades ago, Dobzhansky [1] suggested that genetic polymorphisms were maintained in populations by selection favouring heterozygotes, thus by balancing selection. Later, Kimura [2], [85] showed that most polymorphisms in the genome should be selectively neutral after the action of purifying selection. It follows that clear examples of balancing selection in any organism should be quite limited and mainly inferred by a candidate gene approach (see Introduction and e.g. [12]–[25]), but little is known about the prevalence of balancing selection on a genome-wide scale [26]–[28]. In humans, balancing selection is thought to have a limited role in preserving genome-wide polymorphisms [26], [86], as a specific survey of balancing selection in humans identified only 60 out of 13,400 genes [27].
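The proportions quoted in this discussion can be checked with simple arithmetic. The sketch below uses only counts reported in the text (2,054 informative markers, 138 high F ST outliers, 73 low F ST outliers of which 33 were retained, and 60 of 13,400 human genes); the variable and function names are illustrative.

```python
INFORMATIVE_MARKERS = 2054   # informative AFLP markers
POSITIVE = 138               # high F ST outliers
LOW_FST_OUTLIERS = 73        # all low F ST outliers before filtering
BALANCING = 33               # low F ST outliers retained after filtering


def percent(count, total):
    """Share of count in total, expressed in percent."""
    return 100.0 * count / total


print(round(percent(BALANCING, INFORMATIVE_MARKERS), 1))             # balancing selection
print(round(percent(POSITIVE, INFORMATIVE_MARKERS), 1))              # positive selection
print(round(percent(POSITIVE + BALANCING, INFORMATIVE_MARKERS), 1))  # both combined
print(round(percent(60, 13400), 2))                                  # human survey [27]
print(INFORMATIVE_MARKERS - POSITIVE - LOW_FST_OUTLIERS)             # markers treated as neutral
```

The last line recovers the 1,843 informative and neutral AFLPs used for the neutral PCA.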
In this study we identified 33 loci with significantly low levels of differentiation among populations, which represent slightly more than 1.5% of all informative markers and hence slightly more than the 0.5% inferred in humans [27]. Our findings, together with the human studies [27], indicate that balancing selection on a large geographic scale is probably not as frequent as previously suspected, and hence only plays a minor role in maintaining polymorphism in a population or in shaping the genetic diversity of a species. The observation of evenly distributed allele frequencies across the whole European continent (e.g. see Figure 5A) despite extremely strong levels of differentiation among populations (average F ST = 0.31) is quite remarkable, especially for a species with limited dispersal ability [43], [87]. Such even allele frequencies across a large geographic range are difficult to explain in the absence of strong stabilizing selection and hence provide good support for the presence of balancing selection. This study used a conservative post-hoc evaluation of AFLP marker band intensity distributions to provide further support for the authenticity of the signature of balancing selection, which allowed us to prioritize prime candidate loci for balancing selection. Six markers were characterized by low F ST values, evenly distributed allele frequencies among populations (Figure 5A) and especially by the bimodal band intensity distribution, which clearly indicates the presence of heterozygous individuals in several populations (Figure 4A). Apart from these six loci, 27 markers showed peculiarities that are also compatible with factors other than balancing selection alone. Twelve markers had low allele frequencies across Europe, possibly as a result of frequency-dependent selection, a selective mechanism that favours alleles when they are rare and might result in balanced genetic polymorphisms in populations [11].
But the observed low allele frequencies might also be explained by slightly negative selection [27]. For 15 markers no obvious bimodal band intensity distribution was observed, hence no clear signal of heterozygous individuals within populations could be identified, which might be explained by slight stochastic technical variation in the sequencing machine that might have eroded the signal. However, the detection of 33 sex-chromosome linked markers (Figure 4C and D) in particular clearly supports the use of AFLPs as a partially codominant marker system and indicates that heterozygous individuals or individuals carrying only one gene copy can reliably be identified from the band intensity distribution in AFLP markers [56], [57]. Compared to balancing selection, the inference of directional selection is less questioned, even though some confounding demographic factors (e.g. surfing during range expansions; [88], [89]) might produce some false positives. However, as we have used a quite stringent threshold for accepting a locus to be under selection (PO>10), our results suggest a very low false discovery rate of less than 1.4%. We detected that 6.7% of the informative markers probably evolved as a consequence of directional selection, which might be linked to adaptation to spatial heterogeneity of the environments of European vole populations. Given the wide distribution range and highly heterogeneous environments where these voles are found, it is indeed expected that different polymorphisms might be selected in different populations and habitats [5], [26]. The markers detected under positive selection in this study display a wide variety of allele frequency patterns across Europe.
The PCA based on 138 markers under positive selection revealed a quite different structure (Figure 2C) than the PCA computed on 1,843 informative and neutral AFLPs (Figure 2A and B), indicating that selection acts differently on these loci than the interplay of drift and geographic separation. It is difficult to draw conclusions on the selection pressure from the allele frequency distribution of these markers; nevertheless, there are some interesting patterns, which might be explained by environmental differences among populations. The two outlier loci that showed the strongest splits between two groups of populations across Europe (Figure 5B and C) were driven by populations from Alpine areas (some of the vole populations lived above 2,000 m asl). Hence they might be related to an adaptation to high elevation [40], [90] or just to the highly heterogeneous environment observed at a small geographic scale, which is specific to Alpine regions [91]. These Swiss and Italian Alpine populations are much more separated in the PCA on loci under selection (Figure 2C) than in the neutral PCAs, indicating that probably many loci are under selection in this region. However, there are also patterns that are more difficult to interpret in an environmental or geographic context (e.g. Figure 5D), and biotic interactions can also be very important for local adaptation and are much more difficult to infer.
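The PO>10 threshold mentioned in this discussion can be related to posterior probabilities with a short sketch. Posterior odds of PO = p / (1 - p) correspond to a posterior probability p = PO / (1 + PO), and one common way to summarize a Bayesian false discovery rate is the mean of 1 - p over the flagged markers. Whether the reported figure of less than 1.4% was computed exactly this way is an assumption here, and the four PO values taken from Table 3 serve only as an example, so the printed mean is not meant to reproduce that figure.

```python
def posterior_probability(po):
    """Convert posterior odds PO = p / (1 - p) into the posterior probability p."""
    return po / (1.0 + po)


def expected_fdr(po_values):
    """Mean posterior probability of neutrality (1 - p) among the flagged markers."""
    po_list = list(po_values)
    return sum(1.0 - posterior_probability(po) for po in po_list) / len(po_list)


# Posterior odds listed in Table 3 for four of the prime candidate markers.
table3_po = {"CCta185": 226.3, "CTaa125": 63.1, "CTaa235": 14.0, "GCtc42": 118.0}

print(round(posterior_probability(10.0), 3))       # probability at the PO = 10 threshold
print(round(expected_fdr(table3_po.values()), 4))  # illustrative mean over four markers
```

A marker sitting exactly at the threshold therefore still has roughly a 9% posterior probability of being neutral, while markers with PO in the hundreds contribute very little to the expected false discovery rate.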

Outlook

AFLP genome scans enable us to detect markers under recent selection in the common vole genome, but it is unfortunately impossible to determine their function and location in the absence of a sequenced genome for this species. New high-throughput technologies make full genomes more accessible than before (for review see [92]–[94]), but target-capture sequencing of hundreds of individuals is still prohibitive for most non-model organisms [57] and full genome re-sequencing studies of pooled population data (Pool-Seq) are only possible for rather small genomes (see e.g. [91], [95], [96]). An alternative strategy would be the investigation of candidate loci for selection by direct high-throughput sequencing of AFLP fragments [60], [97], which could be useful to further characterize candidate regions and genes linked with AFLP markers in this non-model organism.

The 21 selective primer combinations and their fluorescent labels used in the AFLP assay. (DOCX)

SAMOVA results of the 138 loci probably under positive selection ranked by F CT and the number of significant associations with other loci having similar allele frequencies across Europe. (DOCX)
  83 in total

1.  Protective effects of the sickle cell gene against malaria morbidity and mortality.

Authors:  Michael Aidoo; Dianne J Terlouw; Margarette S Kolczak; Peter D McElroy; Feiko O ter Kuile; Simon Kariuki; Bernard L Nahlen; Altaf A Lal; Venkatachalam Udhayakumar
Journal:  Lancet       Date:  2002-04-13       Impact factor: 79.321

2.  A review of some fundamental concepts and problems of population genetics.

Authors:  T DOBZHANSKY
Journal:  Cold Spring Harb Symp Quant Biol       Date:  1955

3.  Estimating population structure from AFLP amplification intensity.

Authors:  Matthieu Foll; Martin C Fischer; Gerald Heckel; Laurent Excoffier
Journal:  Mol Ecol       Date:  2010-09-27       Impact factor: 6.185

4.  Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles.

Authors:  M Kimura
Journal:  Genet Res       Date:  1968-06       Impact factor: 1.588

5.  The fate of mutations surfing on the wave of a range expansion.

Authors:  Seraina Klopfstein; Mathias Currat; Laurent Excoffier
Journal:  Mol Biol Evol       Date:  2005-11-09       Impact factor: 16.240

6.  Contrasting effects of natural selection on human and chimpanzee CC chemokine receptor 5.

Authors:  Stephen Wooding; Anne C Stone; Diane M Dunn; Srinivas Mummidi; Lynn B Jorde; Robert K Weiss; Sunil Ahuja; Michael J Bamshad
Journal:  Am J Hum Genet       Date:  2004-12-29       Impact factor: 11.025

7.  Transalpine colonisation and partial phylogeographic erosion by dispersal in the common vole (Microtus arvalis).

Authors:  Sonja Braaker; Gerald Heckel
Journal:  Mol Ecol       Date:  2009-04-23       Impact factor: 6.185

8.  Comparing three different methods to detect selective loci using dominant markers.

Authors:  A Pérez-Figueroa; M J García-Pereira; M Saura; E Rolán-Alvarez; A Caballero
Journal:  J Evol Biol       Date:  2010-10       Impact factor: 2.411

9.  Uninformative polymorphisms bias genome scans for signatures of selection.

Authors:  Marius Roesti; Walter Salzburger; Daniel Berner
Journal:  BMC Evol Biol       Date:  2012-06-22       Impact factor: 3.260

10.  Validation of SNP allele frequencies determined by pooled next-generation sequencing in natural populations of a non-model plant species.

Authors:  Christian Rellstab; Stefan Zoller; Andrew Tedder; Felix Gugerli; Martin C Fischer
Journal:  PLoS One       Date:  2013-11-07       Impact factor: 3.240
