Daniela Polic1, Yeşerin Yıldırım1,2, Kyung Min Lee3,4, Markus Franzén3, Marko Mutanen3, Roger Vila5, Anders Forsman1. 1. Department of Biology and Environmental Science, Linnaeus University, Kalmar, Sweden. 2. Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden. 3. Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland. 4. Zoology Unit, Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland. 5. Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain.
Abstract
Understanding which factors and processes are associated with genetic differentiation within and among species remains a major goal in evolutionary biology. To explore differences and similarities in genetic structure and its association with geographical and climatic factors in sympatric sister species, we conducted a large-scale (>32° latitude and >36° longitude) comparative phylogeographical study on three Argynnini butterfly species (Speyeria aglaja, Fabriciana adippe and F. niobe) that have similar life histories, but differ in ecological generalism and dispersal abilities. Analyses of nuclear (ddRAD-sequencing derived SNP markers) and mitochondrial (COI sequences) data revealed differences between species in genetic structure and how genetic differentiation was associated with climatic factors (temperature, solar radiation, precipitation, wind speed). Geographical proximity accounted for much of the variation in nuclear and mitochondrial structure and evolutionary relationships in F. adippe and F. niobe, but only explained the pattern observed in the nuclear data in S. aglaja, for which mitonuclear discordance was documented. In all species, Iberian and Balkan individuals formed genetic clusters, suggesting isolation in glacial refugia and limited postglacial expansion. Solar radiation and precipitation were associated with the genetic structure on a regional scale in all species, but the specific combinations of environmental and geographical factors linked to variation within species were unique, pointing to species-specific responses to common environments. Our findings show that the species share similar colonization histories, and that the same ecological factors, such as niche breadth and dispersal capacity, covary with genetic differentiation within these species to some extent, thereby highlighting the importance of comparative phylogeographical studies in sympatric sister species.
Understanding which factors and processes are associated with genetic differentiation within and among species remains a major goal in evolutionary biology. To explore differences and similarities in genetic structure and its association with geographical and climatic factors in sympatric sister species, we conducted a large-scale (>32° latitude and >36° longitude) comparative phylogeographical study on three Argynnini butterfly species (Speyeria aglaja, Fabriciana adippe and F. niobe) that have similar life histories, but differ in ecological generalism and dispersal abilities. Analyses of nuclear (ddRAD-sequencing derived SNP markers) and mitochondrial (COI sequences) data revealed differences between species in genetic structure and how genetic differentiation was associated with climatic factors (temperature, solar radiation, precipitation, wind speed). Geographical proximity accounted for much of the variation in nuclear and mitochondrial structure and evolutionary relationships in F. adippe and F. niobe, but only explained the pattern observed in the nuclear data in S. aglaja, for which mitonuclear discordance was documented. In all species, Iberian and Balkan individuals formed genetic clusters, suggesting isolation in glacial refugia and limited postglacial expansion. Solar radiation and precipitation were associated with the genetic structure on a regional scale in all species, but the specific combinations of environmental and geographical factors linked to variation within species were unique, pointing to species-specific responses to common environments. Our findings show that the species share similar colonization histories, and that the same ecological factors, such as niche breadth and dispersal capacity, covary with genetic differentiation within these species to some extent, thereby highlighting the importance of comparative phylogeographical studies in sympatric sister species.
Evolutionary biologists and ecologists have long been studying the biogeographical history (phylogeography) of species to understand how different factors act together in forming genetic patterns of populations (Avise, 2000; Avise et al., 2016; Bermingham & Moritz, 1998; Shikano et al., 2010). In particular, large‐scale comparative phylogeography among closely related species with similar yet diverging habitat requirements and overlapping distribution areas can help understand if and how common historical factors, and biogeographical and environmental parameters shape biological diversity (Michaux et al., 2005; Papadopoulou & Knowles, 2016; Victoriano et al., 2008). Similar responses to the same factors would be expected to result in convergent phylogeographical patterns across species, whereas discrepant patterns might reflect species‐specific responses (Avise, 2000, Avise et al., 2016, Bermingham & Moritz, 1998, Hickerson et al., 2010, Zink et al., 2001). Incongruence among species could stem from different historical and contemporary reactions to drivers of spatial genetic divergence: geographical boundaries such as landscape barriers (vicariance), spatial isolation (isolation by distance), selective regimes and differences in effective population size (Avise et al., 2016, Crispo et al., 2006, Riddle, 2016, Sobel et al., 2010, Wright, 1943). Ultimately, differential responses to spatial and temporal variation in environmental conditions may be determined by species traits such as dispersal capacity, thermal regulation, body size, physiological tolerance, life history traits, adaptive capacity and plasticity (Bernardo & Spotila, 2006; Kawecki & Ebert, 2004; Paz et al., 2015; Rundle & Nosil, 2005).The Quaternary climatic oscillations are considered crucial for today's species' distribution and genetic structure (Davis & Shaw, 2001; Hewitt, 2004). During the Pleistocene (2.58 million to 12,000 years ago), many populations have been isolated in different glacial refugia which presumably resulted in allopatric differentiation, intraspecific variation and local adaptation (Ren et al., 2017; Stewart et al., 2010). Depending on how long populations and species have been confined to such refugia, and how fast populations have expanded their ranges after the last glacial maximum (LGM, 26,500–20,000 years ago), gene flow between previously separated populations would have occurred at different rates, thereby differently influencing the contemporary genetic structure of species (Roed et al., 2016; Stewart et al., 2010; Xun et al., 2016).For ectothermic organisms like butterflies, abiotic factors such as temperature, solar radiation, precipitation and wind speed influence the need for thermoregulation, define selective regimes, and set boundaries on activity and dispersal, thereby potentially affecting the genetic and phenotypic variability of natural populations (Kingsolver & Watt, 1983; Kingsolver & Wiernasz, 1991; Polgar et al., 2013; Zaman et al., 2019). Further, geographical factors such as altitude may differently affect distinct butterfly species. In high‐altitude habitats, the combination of relatively short vegetation periods (which translates to short foraging times during the larval stage), low temperatures, high solar radiation, and frequent and sudden changes in weather might lead to adaptive divergence in butterfly populations at different altitudinal clines (Karl et al., 2008). Depending on the species' mobility, high mountain ranges may hinder dispersal and restrict gene flow. In addition to the direct effects that climatic and geographical factors may have on the evolution of species, there is potential for differential and divergent selection owing to spatial variation in community species composition and biotic interactions with host plants, competitors, predators and enemies to influence local adaptations and genetic structure (Magalhaes et al., 2011; Mopper & Strauss, 2013; Pilot et al., 2006; Runquist et al., 2020; Xie et al., 2019). Again, large‐scale comparative biogeographical approaches can inform about the relative roles of different abiotic and biotic factors.Argynnini butterfly species are well suited for analysing and comparing the genetic structure in closely related species with partially overlapping ecological niches and distribution ranges, as they can occur sympatrically across large parts of their shared distribution areas. Here, we use three Argynnini species, Speyeria aglaja, Fabriciana adippe and Fabriciana niobe, as model species. These species belong to the same tribe within the fritillaries (i.e., Nymphalidae butterflies with mostly checkered wing patterns) and share overall morphological similarities. Despite having similar distribution ranges, they differ in their northern range limits, presumably owing to differences in their temperature tolerance (Ellis et al., 2010). Speyeria aglaja is distributed from Northern Africa over Europe and Russia to Southeast Asia, reaching as far north as Southern Lapland, whereas F. adippe's northern distribution only extends to central Sweden (Fox et al., 2011; Franzén & Johannesson, 2007; Tolman & Lewington, 2012; Warren, 1995), and F. niobe only reaches southern Sweden (Eliasson et al., 2005; Reinhardt & Bolz, 2011; van Swaay & Warren, 1999). Further, these species differ in their degree of ecological generalism, dispersal abilities, habitat preferences and host plant use (Eliasson et al., 2005; Forster & Wohlfahrt, 1955; Fric et al., 2005; Higgins & Riley, 1970; Komonen et al., 2004; Öckinger et al., 2006; Polic et al., 2020; Pollard & Yates, 1994; Zima et al., 2013). Speyeria aglaja is a widespread and rather common species that occupies a wide range of habitats and can breed at lower temperatures than the other two fritillaries (Ellis et al., 2010; Zimmermann et al., 2009). By contrast, both F. adippe and F. niobe are habitat specialists confined to warm microhabitats; F. adippe is restricted to woodland‐dominated areas, while F. niobe is associated with dry habitats and has strongly declined in Europe in recent decades (Ellis et al., 2019; Robertson et al., 1995; Salz & Fartmann, 2009; Spitzer et al., 2009). All three species are univoltine and overwinter as first‐instar larvae or as formed larvae within the eggs, which are usually placed directly on or nearby their larval food plants (Eliasson et al., 2005). While the larvae of F. adippe and F. niobe only feed on Viola species, S. aglaja larvae have also been found foraging on Bistorta major and Persicaria spp., depending on their geographical location (Forster & Wohlfahrt, 1955; Fric et al., 2005; Higgins & Riley, 1970; Tolman & Lewington, 2012). According to their dispersal ability and to rankings based on “mobility indices” as assessed by butterfly experts, all three species are considered as intermediately mobile (Komonen et al., 2004; Öckinger et al., 2006; Pollard & Yates, 1994; Zima et al., 2013), although F. adippe has been reported to be more likely to engage in long‐distance dispersal of several kilometres than S. aglaja (Polic et al., 2020). These interspecific divergences could lead to different large‐scale genetic patterns among the species, despite them being phylogenetically close and exposed to similar environmental conditions and selection regimes.To quantify and compare population genetic structures among closely related species and gain insights into the underlying evolutionary drivers and historical changes, different molecular markers can be used. Traditionally, population genetic studies of non‐model organisms relied on relatively limited numbers of neutral nuclear DNA (nuDNA) markers, mostly microsatellites, or on mitochondrial DNA (mtDNA) sequences, such as the mitochondrial cytochrome c oxidase I (COI) gene (Guichoux et al., 2011; Hajibabaei et al., 2007; Harrison, 1989). Thanks to the rapid development of high‐throughput technologies, approaches such as double‐digest restriction‐site associated DNA sequencing (ddRAD‐sequencing) generate thousands of mainly nuclear SNP (single nucleotide polymorphism) loci randomly distributed across the genome, allowing for genotyping of even non‐model organisms (Andrews et al., 2016; Peterson et al., 2012). While nuclear loci are inherited biparentally, mtDNA is haploid, nonrecombinant and only inherited maternally (Lansman et al., 1983), resulting in a four‐fold (assuming a balanced sex ratio) smaller effective population size of the mitochondrial genome. Further, mutation rates are higher for mtDNA than nuDNA in most organisms (Allio et al., 2017; Ballard & Whitlock, 2004). These intrinsic characteristics could lead to mitonuclear discordance following incomplete lineage sorting, sex‐biased dispersal, introgression, horizontal gene transfer, selective sweeps and demographic expansion (Despres, 2019; Gompert et al., 2008; Hinojosa et al., 2019; Soucy et al., 2015; Yıldırım et al., 2018). Some of these effects can be mediated by Wolbachia bacteria as infection by them has been documented to be one of the most frequent causes of mitonuclear discordance in insects (Toews & Brelsford, 2012). These maternally inherited bacteria may induce cytoplasmic incompatibility and male‐killing, potentially resulting in selective sweeps favouring the mitochondrial genome of the Wolbachia‐infected lineage (Hurst & Jiggins, 2005; Ritter et al., 2013; Werren et al., 2008). As different markers might tell different stories about the genetic structuring of species, incorporating more than one marker type into phylogeographical studies can provide a more comprehensive picture of the history and variation of contemporary genetic patterns.Here, we studied 148 individuals belonging to three Argynnini species (i.e., S. aglaja, F. adippe and F. niobe) from various geographical locations covering large parts of Europe (and Morocco for S. aglaja, Figure 1) to analyse their genetic structure using nuclear SNP markers derived from ddRAD‐sequencing and mtDNA COI sequences. To our knowledge, this is the first genome‐wide study that addresses large‐scale comparative phylogeographical patterns in butterflies using a genome‐wide nuDNA approach in addition to mtDNA. Specifically, we aim to answer the following questions: (i) Are the large‐scale population genetic structures of the studied species similar? (ii) Are the genetic structures of the studied species associated with the same or different environmental and geographical variables, and can any differences be accounted for by species characteristics? (iii) Do the respective marker types (ddRADseq, COI) generate different large‐scale genetic patterns? In this study, we highlight the importance of studying spatial genetic variation in multiple species simultaneously to understand how even sister species may respond differently to common biotic and abiotic factors, possibly due to their unique history and ecology.
FIGURE 1
Map of sampling locations of Speyeria aglaja (turquoise), Fabriciana adippe (yellow) and Fabriciana niobe (red). The right side of the butterflies shows the dorsal wings and the left side the ventral wings
Map of sampling locations of Speyeria aglaja (turquoise), Fabriciana adippe (yellow) and Fabriciana niobe (red). The right side of the butterflies shows the dorsal wings and the left side the ventral wings
MATERIAL AND METHODS
Study area and data collection
We studied 148 individuals belonging to three Argynnini species, 43 Speyeria aglaja, 74 Fabriciana adippe and 31 F. niobe individuals, selected from butterfly collections at the Institut de Biologia Evolutiva, Universitat Pompeu Fabra in Barcelona and the Department of Biology and Environmental Science, Linnaeus University in Kalmar. Samples were chosen from various locations (one to three individuals per sampling location for 22 sampling locations in S. aglaja, one to nine individuals per sampling location for 26 sampling locations in F. adippe, and one to seven individuals per sampling location for 13 sampling locations in F. niobe), so that large parts of their European (and Moroccan for S. aglaja) distribution areas were covered (33.542–65.856 degrees latitude, and −8.227 to 28.689 degrees longitude) and that individuals in all species stemmed from a wide range of locations (Figure 1; Table S1). All Moroccan samples belonged to S. aglaja, as the other two species do not occur in the Maghreb. The butterflies were collected between 2008 and 2017. After capture, the specimens were either directly transferred to 95–99% ethanol or dried, and stored at −20°C upon arrival to the laboratory until DNA extraction. The coordinates for each sampling location were recorded directly in the field.
Environmental data
Climate data for each sample location, namely information on temperature, solar radiation, precipitation and wind speed, were attained from WorldClim – Global Climate Data (Fick & Hijmans, 2017). We used the R package raster 3.5 (Hijmans et al., 2021) to import the WorldClim data as raster layers, and the function extract to add the environmental values to each butterfly sampling point. We extracted information on altitude for each sampling location from raster data obtained from the U.S. Geological Survey Earth explorer using qgis 2.18.25 (QGIS.org, 2022). Information about the environmental and geographical data for each sample location can be found in Table S1.
Library preparation and ddRAD‐sequencing
For DNA extraction, thorax tissue from each sample was ground with stainless steel beads (Next Advance) in a bullet blender, and an E.Z.N.A. Genomic DNA Isolation Kit (OMEGA Bio‐Tek) was used for further processing. The quality of genomic DNA (gDNA) was estimated by performing agarose gel electrophoresis to identify signs of DNA degradation. DNA samples of relatively low quality were subjected to whole genome amplification with a REPLI‐g Mini Kit (Qiagen) prior to the ddRADseq library preparation to obtain sufficient gDNA quantity and quality. We prepared ddRADseq libraries at the molecular laboratory at the Ecology and Genetics Research Unit, University of Oulu, Finland, following the protocol in Lee et al. (2018) and Peterson et al. (2012) with a few exceptions. A total of 500 ng of DNA for each sample and PstI and MseI restriction enzymes (New England Biolabs) were used, a purification step with AMPure XP magnetic beads (Agencourt) was carried out in a Biomek automated liquid handler (Beckman Coulter), and the size distribution and concentration of the pools were measured with a Bioanalyzer (Agilent Technologies). The individuals used in this study were a subset of 285 Argynnini samples, which were pooled into three libraries (95 samples per pool), and sequenced on an Illumina HiSeq 2500 PE 101 v4 machine at the Institute for Molecular Medicine Finland (FIMM, Helsinki). The demultiplexed fastq data are archived in the NCBI SRA under the BioProject ID PRJNA70744.
Quality control and data trimming
We demultiplexed the raw data and performed quality control using the process_radtags unit in stacks version 2.2 (Catchen et al., 2011; Catchen et al., 2013) with default settings and the following additional specifications: ‐c (reads with an uncalled base were discarded), ‐q (reads were discarded when the raw Phred quality score dropped below 10 within a sliding window spanning 15% of the read length), and ‐r (reads were rescued in case of one nucleotide mismatch at the enzyme cut‐site or individual barcodes). The filtering process retained 90% of the raw data. We used trimmomatic 0.36 (Bolger et al., 2014) to remove adapters and to trim all the reads to 92 bp.
Building the catalogue of loci
stacks' de novo pipeline (Catchen et al., 2011; Catchen et al., 2013) was used for assembly of ddRADtag loci and SNP calling by running each step of the pipeline (ustacks, cstacks, sstacks, tsv2bam and gstacks) separately. The parameters used for further processing in the stacks pipeline were determined after parameter optimization on a randomly selected subset of individuals covering a wide geographical range (16 S. aglaja, 16 F. adippe, 12 F. niobe and 16 individuals across all three species), as described in Paris et al. (2017), and Rochette and Catchen (2017). Parameter m is the minimum number of raw reads to form a stack or putative allele within an individual, M is the number of mismatches tolerated between stacks to form a putative locus and n is the number of mismatches allowed between individual loci across samples to build a catalogue of loci across individuals. We decided to use the same parameter setup for all four data sets, that is m = 4, M = 5 and n = 6 (Figures S1 and S2). The coverage at m4 was around 80× for all data sets (Figure S1). We ran the de novo pipeline on 47 S. aglaja, 84 F. adippe, 36 F. niobe and 167 samples across all three species using these parameters and otherwise default settings, which resulted in catalogues containing 395,548, 221,181, 420,316 and 931,511 loci, respectively. The catalogues were further processed with the populations unit in stacks. For all data sets, we chose the r80 setup, where a locus had to be present in at least 80% of individuals in order to be retained. This rather conservative approach was used, as the number of retained loci was still relatively high, while we can be confident to avoid spurious associations between the presence of a locus and genetic distance. For all data sets, we chose one random SNP per locus, a minimum minor allele frequency of 0.05 and a maximum observed heterozygosity of 0.7. After populations, samples with more than 25% of missing data were excluded, which resulted in 43 S. aglaja, 74 F. adippe, 31 F. niobe and 148 individuals across all three species. Then, populations was repeated with the reduced data sets. The final data sets consisted of 8627, 7503, 9492 and 2022 SNPs for S. aglaja, F. adippe, F. niobe and all three species pooled, respectively.Since our analyses showed a strong genetic differentiation of Iberian and/or Moroccan populations in S. aglaja and F. adippe (the latter is only valid for S. aglaja, as the other species do not occur in the Maghreb, see details in the Results), potential finer structuring between populations in the rest of Europe might have been concealed. We therefore repeated populations in stacks with r80 excluding individuals from Iberia (and Morocco), and ran the downstream analyses for the reduced data sets, which consisted of 30 (S. aglaja) and 59 (F. adippe) individuals.To detect and eliminate potential mitochondrial sequences from ddRADseq data sets, the sequences obtained after populations in stacks were blasted against three mitochondrial Lepidoptera genomes: Melitaea cinxia (GenBank ID: CM002851, Ahola et al., 2014), Zerene cesonia (GenBank: CM022631, Rodriguez‐Caro et al., 2020) and Heliconius melpomene rosina (GenBank: KP100653, Meng et al., 2016). Three reads from F. adippe and one read from F. niobe sequences mapped to the mitochondrial genomes, and all these loci were removed from the data sets.
Assessing potential Wolbachia infections
The resulting sequences obtained after populations in stacks were mapped to the Wolbachia pipientis genome (GenBank: NZ_JQAM01000001) using Geneious mapper in geneious 11.1.05 (Biomatters) to evaluate the risk of contamination by the bacterial parasite Wolbachia, which could influence the genetic structure of species and thus the outcome of the population genetic analyses (Gompert et al., 2008; Werren et al., 2008). Only one sequence for F. niobe mapped to the Wolbachia genome, and it was excluded from the data set for the analyses. This sequence was also blasted against Wolbachia genomes in NCBI's nucleotide database, and mapped to the genome.
Sequencing of the mitochondrial
gene
The barcode region of mitochondrial cytochrome c oxidase 1 (COI) was amplified using the primers LepF1 and LepR1 (Cooper et al., 2007), and the PCR (polymerase chain reaction) amplification conditions as stated in Dincă et al. (2019). Sanger sequencing of the amplicons was carried out by Eurofins Genomics (Eurofins Genomics Europe Shared Services). Six samples with sequencing quality issues were not included in the COI analyses (Table 1). After aligning and trimming the COI reads in geneious, sequences with a length of 601 bp of the barcode region were used for further analyses. The sequences were submitted to the NCBI nucleotide database (Accession codes: MZ026157–MZ026168, MZ026170, MZ026171, MZ026174–MZ026200, MZ026202, MZ026242, MZ026245–MZ026261, MZ026263–MZ026290). Sequencing data for 20 of the samples were retrieved from the NCBI nucleotide database (see Table S1 for accession codes).
TABLE 1
Diversity indices for each species according to nuclear data obtained through ddRADseq and mitochondrial data obtained through COI sequencing
Number of samples (ddRADseq/mtDNA)
ddRADseq
COI
Number of SNP sites
Size (bp)
Number of haplotypes
Nucleotide diversity
Haplotype diversity
k
Segregating sites
S. aglaja
43/40
8627
601
10
0.005
0.835
3.29
16
F. adippe
74/73
7503
601
17
0.009
0.881
5.38
23
F. niobe
31/30
9492
601
7
0.005
0.823
3.03
9
Note: k: average number of nucleotide differences.
Diversity indices for each species according to nuclear data obtained through ddRADseq and mitochondrial data obtained through COI sequencingNote: k: average number of nucleotide differences.
Data analysis
Genetic structure
We examined patterns of genetic differentiation in the study organisms by using multiple approaches with different algorithms for the purpose of comparing results from differing analyses and thereby obtaining a more comprehensive picture of the genetic structure. The program faststructure 1.0 (Raj et al., 2014) was used as a Bayesian approach to determine the most likely number of genetic clusters (K) by running the analysis for each number of K (1–22, 1–26 and 1–13 for S. aglaja, F. adippe and F. niobe, respectively; based on the number of sampling locations) with 10 iterations using a simple prior and five‐fold cross‐validation. The most likely number of K was determined using the faststructure script chooseK.py based on two criteria: “model complexity that maximizes marginal likelihood” (henceforth marginal likelihood), and “model components used to explain structure in data” (henceforth model complexity). The plots were created using the faststructure script distruct.py for the chosen values of K (Raj et al., 2014). Next, fineradstructure (Malinsky et al., 2018) was used as a formal clustering analysis to evaluate shared ancestry among individuals in each species. This program uses Bayesian haplotype‐based statistics to group individuals with high levels of shared co‐ancestry. Using the R scripts fineRADstructurePlot.R and FinestructureLibrary.R that come with the software, we visualized the shared co‐ancestry between individuals in a heatmap. We also assessed the genetic structure in each species visually via principal component analysis (PCA) of the ddRADseq data, using a distance matrix. Pairwise distances between the individuals were based on the proportion of shared alleles. PCA was performed using the prcomp function in rstudio 3.6.1 (R Core Team, 2020). To statistically test for an association of genetic distance with geographical distance, we performed linear regressions for each species separately in rstudio. First, we calculated the geographical distance between all points using the R package Imap 1.32 (Wallace, 2012), converted the distance matrix as used for PCA into a Manhattan dissimilarity matrix, and then regressed the individual geographical distances against the genetic distances using the lm function.A colonization pattern towards northern regions following the LGM is expected to lead to a lower genetic diversity in northern regions compared to southern regions (Adams & Hadly, 2013; Hughes & Hughes, 2007; Pereira, 2016). We performed linear regressions for each species separately in rstudio to test for associations between individual genetic diversity (expected heterozygosity [H
E]) and latitude using the SNP data derived from ddRADseq. Expected heterozygosity H
E was calculated in arlequin 3.5 (Excoffier et al., 2005).
Phylogenetic analyses
To assess phylogenetic relationships between individuals from different sampling locations for each species separately and all three species pooled based on concatenated ddRADseq data, we used the maximum‐likelihood (ML) tree inference in iq‐tree 2.1.2 (Nguyen et al., 2015) and branch support bootstrap values were calculated with ultrafast bootstrap (UFBoot, 1000 bootstraps), as implemented in iq‐tree (Hoang et al., 2017). Prior to analyses, the best fitting mutational model was obtained through iq‐tree's modelfinder (Kalyaanamoorthy et al., 2017) based on Bayesian information criterion (BIC; see Table S2 for all models and corresponding BIC values). The chosen best fitting model was TVMe+G4 (Transversion model, AG = CT, equal base frequency, discrete Gamma model with default 4 rate categories) for S. aglaja (BIC = 223,146.3), PMB + F + G4 (Probability Matrix from Blocks, empirical base frequencies, discrete Gamma model with default 4 rate categories) for F. adippe (BIC = 1,506,379.7) and for F. niobe (BIC = 197,869.5), and for all three species together (BIC = 78,144.5), the best fitting model was TVM + F + G4 (Transversion model, AG = CT, unequal base frequency, empirical base frequencies, discrete Gamma model with default 4 rate categories). We visualized the phylogenetic trees with Interactive Tree Of Life (itol) 6 (Letunic & Bork, 2019).Genetic diversity indices for COI sequences were calculated in dnasp 6 (Rozas et al., 2017). The evolutionary relationships between the haplotypes were assessed by haplotype networks using the Median Joining Network method (Bandelt et al., 1999), as implemented in popart 1.7 (Leigh & Bryant, 2015). The phylogenetic analyses for the three fritillary species based on COI sequences were also performed in iq‐tree. We only included the unique haplotypes in the analysis, and used Boloria pales (GenBank ID: HM901782.1, Dapporto et al., 2019) as an outgroup for the COI tree. We included COI sequences from Speyeria zerene (GenBank ID: HQ561199.1) and Argynnis paphia (GenBank ID: MN145295.1) to show the relationship between the study species at a higher resolution. modelfinder, which was used to estimate the best‐fitting model prior to construction of the tree, suggested TIM2 + F + R2 (AC = AT, CG = GT, unequal base frequency, empirical base frequencies, FreeRate model with 2 categories; Soubrier et al., 2012; Yang, 1995) as the best‐fitting model based on BIC (=4040.6); see Table S2 for all models and corresponding BIC values. The consensus phylogenetic trees were visualized with itol. We calculated divergence percentages between COI haplotypes in geneious based on nucleotide substitutions.
Associations of the genetic structure with geography and environmental factors
Potential associations of the genetic structure and environmental and spatial factors—mean annual temperature, mean annual solar radiation, mean summer wind speed, mean annual precipitation, latitude, longitude and altitude—were investigated using distance‐based redundancy analysis (db‐RDA), a constrained version of a PCA (Legendre & Anderson, 1999). We chose mean annual temperature, mean annual solar radiation and mean annual precipitation, as these variables influence the vegetation periods in temperate climates (Piao et al., 2006), thereby potentially affecting the studied butterfly species' host plants growth and distribution, as well as the availability of nectar plants. This in turn may affect body size in butterflies, as longer vegetation periods might prolong the larval food stage, thereby increasing the body size. Body size itself is often used as a proxy for dispersal capacity in insects, which can have implications for gene flow (Kuussaari et al., 2014; Lenormand, 2002). Similarly, wind speed during the butterflies' flight period (i.e., summer) may influence wing architecture and size, thus impacting mobility. We included altitude because climatic variables change with variations in this factor (e.g., higher solar radiation at higher altitudes, but same latitudes). The distance matrix used for PCA (see above) was converted into a Manhattan dissimilarity matrix and used as input to a principal coordinates analysis (PCoA). We used the capscale function in the R package vegan 2.5 (Oksanen et al., 2019) to build db‐RDA models in order to assess which variables contribute the most to the PCoA ordination. When building the models, we first checked for multicollinearity between variables by calculating the variance inflation factors (VIFs), and excluded variables with a VIF > 10 from the model, as such high values are indicative of a problematic level of collinearity (James et al., 2013). The overall statistical significance of the model and the predictor variables was determined with the function anova.cca using 999 permutations. Nonsignificant variables were removed stepwise, and adjusted R
2 was calculated for each model, which was then used as the basis for model selection. The p‐values were adjusted for multiple testing by using Holm's correction (Holm, 1979). To account for spatial autocorrelation between samples, we used longitude and latitude as a condition in the model, thereby partialling out these terms from the test before analysing the other variables. As the environmental and geographical variables used in this study were heterogeneous in dimension, we standardized them via z‐scoring, that is, for each variable, the mean value was subtracted from the observed value and then divided by the standard deviation of all observed values.
RESULTS
Genetic structure
For Speyeria aglaja, the most likely number of genetic clusters determined with faststructure was either three (marginal likelihood) or four (model complexity); that is, Morocco and Southern Spain constituted clear distinct clusters, whereas the rest of the individuals seemed to present one cluster (K = 3, Figure 2), or a vague north–south structure (K = 4, Figure S3). For Fabriciana adippe, faststructure identified two distinct genetic clusters (K = 2 with both marginal likelihood and model complexity) with individuals from Iberia grouping together and the rest of the European samples forming a second cluster. Individuals from Catalonia were intermediate and about half of their analysed sequences were attributed to each cluster, constituting a transition from the Iberian Peninsula to the rest of Europe (Figure 2). For Fabriciana niobe, faststructure identified either one (model complexity) or three (marginal likelihood) clusters; that is, samples from Eastern Europe and the Southern Balkans formed one cluster, and the rest of the samples composed two clusters with a somewhat east–west structuring (K = 3, Figure 2).
FIGURE 2
(a–c) Genetic clusters according to faststructure analyses of nuclear data obtained through ddRADseq with the chosen number of K for each species (Speyeria aglaja; K = 3, Fabriciana adippe: K = 2, Fabriciana niobe: K = 3), where each bar represents one individual and each colour a genetic cluster. In (a), the green bars represent individuals from Southern Iberia. (d–f) Co‐ancestry matrices for each species according to fineradstructure analyses based on genetic similarity. Higher levels of co‐ancestry (darker colours) imply less divergence between two individuals and frame colours of individuals show the sampling region
(a–c) Genetic clusters according to faststructure analyses of nuclear data obtained through ddRADseq with the chosen number of K for each species (Speyeria aglaja; K = 3, Fabriciana adippe: K = 2, Fabriciana niobe: K = 3), where each bar represents one individual and each colour a genetic cluster. In (a), the green bars represent individuals from Southern Iberia. (d–f) Co‐ancestry matrices for each species according to fineradstructure analyses based on genetic similarity. Higher levels of co‐ancestry (darker colours) imply less divergence between two individuals and frame colours of individuals show the sampling regionThe fineradstructure results were largely in line with faststructure (Figure 2): in S. aglaja, the highest levels of shared co‐ancestry were revealed between individuals from Morocco, followed by a group of Iberian individuals and some of the Southern Balkan individuals, while in F. adippe, individuals from Iberia constituted the most distinct group. In F. niobe, most of the Iberian samples formed the furthest diverged cluster, followed by the Southern Balkan individuals. Lastly, central European F. niobe individuals showed comparably high levels of shared co‐ancestry.Prinicipal component analyses visually supported the faststructure and fineradstructure results. However, PCA suggested a finer structure than faststructure (Figure 3). In S. aglaja, Southern Iberian and Moroccan samples formed two clusters, which were well separated from each other and from the rest of the samples. When excluding these individuals from the analyses, a finer structure in the remaining samples was revealed, with Scandinavian, Eastern European and central European individuals, respectively, loosely grouping together (Figure 3). In F. adippe, PCA illustrated the main axis of variation between samples from Iberia and the rest of the locations. Individuals from Catalonia were recovered at an intermediate position from the other two. Non‐Iberian individuals visually clustered into three vague groups, namely Scandinavia, Central Europe, and Southern Italy together with Southern Balkans; some Eastern European samples showed overlaps between the clusters. These loose clusters became more obvious when individuals from Iberia were excluded from the analysis (Figure 3). The strongest differentiation in F. niobe was visualized between individuals from the Southern Balkans and the rest. Central Europe and Iberian samples also formed two separate clusters, but the divergence of the latter from the remaining individuals was not as strong as in F. adippe (Figure 3).
FIGURE 3
(a–f) Biplots showing the genetic clustering according to PCA of nuclear data obtained by ddRADseq for the three Argynnini species. The colours represent geographical regions, or species identity, respectively (f). For Speyeria aglaja (a, b) and Fabriciana adippe (c, d), PCAs were performed for the data sets including all individuals (a, c), and the data sets excluding individuals from Iberia (and Morocco for S. aglaja) (b, d). (g) Scatterplot showing the significant relationship of pairwise geographical distance and genetic distance based on the proportion of distinct alleles. (h) Scatterplot showing a decrease in heterozygosity H
E from southern towards northern sampling regions (only significant in S. aglaja)
(a–f) Biplots showing the genetic clustering according to PCA of nuclear data obtained by ddRADseq for the three Argynnini species. The colours represent geographical regions, or species identity, respectively (f). For Speyeria aglaja (a, b) and Fabriciana adippe (c, d), PCAs were performed for the data sets including all individuals (a, c), and the data sets excluding individuals from Iberia (and Morocco for S. aglaja) (b, d). (g) Scatterplot showing the significant relationship of pairwise geographical distance and genetic distance based on the proportion of distinct alleles. (h) Scatterplot showing a decrease in heterozygosity H
E from southern towards northern sampling regions (only significant in S. aglaja)There was a significant and positive association of geographical and genetic distance in S. aglaja (regression, R
2 = 0.13, F
1,901 = 131.9; effect of geographical distance, b = 1.008e−08, p < .001), F. adippe (regression, R
2 = 0.44, F
1,2699 = 2119; effect of geographical distance, b = 3.591e−08, p < .001) and F. niobe (regression, R
2 = 0.48, F
1,463 = 420.3; effect of geographical distance, b = 2.129e−08, p < .001, Figure 3). Heterozygosity decreased significantly towards the North in S. aglaja (regression, R
2 = 0.09, F
1,41 = 4.16; effect of latitude, b = −5.723e‐05, p = .05, Figure 3). The trend of a decrease in diversity towards higher latitudes was also observed in F. adippe (regression, R
2 = 0.02, F
1,72 = 1.64; effect of latitude, b = −3.542e‐05, p = .21, Figure 3) and F. niobe (regression, R
2 = 0.04, F
1,29 = 1.26; effect of latitude, b = −7.395e‐05, p = .3, Figure 3), although it was not significant.
Phylogenetic analyses
In all data sets obtained by ddRADseq, the phylogenetic (ML) trees revealed similarities in the genetic patterns with faststructure, fineradstructure and PCA. Phylogenetic analyses of nuclear data retrieved through ddRADseq could separate the three species from each other with high bootstrap support (100%), with S. aglaja forming a sister lineage to F. adippe and F. niobe. The trees of each individual species were in line with the PCA plots, forming clades associated with geographical proximity (Figure 4).
FIGURE 4
(a–d) Maximum likelihood trees according to nuclear data obtained by ddRADseq for (a) all three Argynnini species together (rooted at the base of Speyeria aglaja), (b) S. aglaja, (c) Fabriciana adippe and (d) Fabriciana niobe. Trees (b–d) were rooted at the furthest diverged geographical group in each respective species according to PCA. (e) Median Joining Haplotype network for three Argynnini species according to mtDNA COI data. The coloration is based on the sampling regions. Black nodes represent the unsampled haplotypes connecting the present haplotypes, and the hashes indicate the mutational steps between the haplotypes. Each distinct haplotype was given a unique ID
(a–d) Maximum likelihood trees according to nuclear data obtained by ddRADseq for (a) all three Argynnini species together (rooted at the base of Speyeria aglaja), (b) S. aglaja, (c) Fabriciana adippe and (d) Fabriciana niobe. Trees (b–d) were rooted at the furthest diverged geographical group in each respective species according to PCA. (e) Median Joining Haplotype network for three Argynnini species according to mtDNA COI data. The coloration is based on the sampling regions. Black nodes represent the unsampled haplotypes connecting the present haplotypes, and the hashes indicate the mutational steps between the haplotypes. Each distinct haplotype was given a unique IDCorroborating the results from the phylogenetic analyses of nuclear data obtained by ddRADseq, both the haplotype network (Figure 4) and the phylogenetic tree (Figure S4) constructed from the COI sequences showed that S. aglaja is a sister to the two Fabriciana species. The divergence between the Fabriciana species was 2.39–3.49%, whereas it was 5.16–6.82% between S. aglaja and the two Fabriciana species. Many individuals shared common haplotypes for the COI gene in all three species. The number of unique haplotypes was 10 for S. aglaja, 18 for F. adippe and seven for F. niobe (Figure 4). Structuring based on geographical regions could be observed, but these structures were not parallel among the three species. Haplotype diversity was richer in F. adippe than in the other two species (Table 1). Further, there was some incongruence between data obtained via mtDNA COI sequencing and nuclear data obtained via ddRADseq, especially in S. aglaja.According to the network analysis in S. aglaja, the COI data lacked a clear geographical structure; almost every geographical region had more than one haplotype with a divergence of 1–5 bp between them, and the distribution of the haplotypes among the regions was patchy (Figure 4). The most common haplotype (Ag5) was shared among several regions, namely Southern Balkans, Central Europe and Eastern Europe. According to both the network and the ML tree (Figure S4), one common haplotype shared by several regions (Ag8; Iberia, Eastern Europe, Central Europe and Scandinavia) gave rise to the other haplotypes. The majority of the haplotypes were closely related to each other (1–4 bp), and similar to the F. adippe network, the relationship between the haplotypes was complex with ambiguities in the evolutionary relationships among them. We found high incongruence with the nuclear data retrieved from ddRADseq insofar as the Moroccan (Ag3) and some Iberian samples (Ag2) were not distant from the rest. Instead, one Eastern European haplotype shared between some Romanian and Slovakian samples (Ag1) formed the most distant lineage (6 bp from the closest haplotypes).In F. adippe, the outcome of COI analyses was similar to that of the nuclear data from ddRADseq analyses. Iberian individuals (except one Catalan individual; haplotypes Ad1 and Ad2, Figure 4) were differentiated from the rest of the regions by 8 bp according to the COI haplotype network. The divergence between these haplotypes and the rest varied between 1.64% and 2.49%. Apart from the Iberian cluster, two main groups were observed: Scandinavian, Eastern European and some Central European samples formed one clade (Ad3–Ad10), while Southern Balkan, Southern Italian and the rest of the Central European samples formed another cluster (Ad14–Ad17). The haplotypes connecting the clusters were almost fully connected to each other with high ambiguity in the order of mutational changes. Overall, the geographical proximity determined the population structure. The ML tree supported the network by separating the Iberian samples with high bootstrap values (95%) (Figure S4). However, the resolution of the evolutionary relationship among the other haplotypes was mostly backed by only ~50% bootstrap support. This was in parallel with the haplotype network, pointing towards ambiguity in the evolutionary relationship among the present haplotypes. The relationship between the geographical regions was mostly congruent with the nuclear data (Figures 3 and 4).Despite some similarities, a clear geographical separation as nuclear data obtained from ddRADseq suggested was not observed for F. niobe haplotypes, yet haplotypes clustered based on geographical proximity (Figure 4). The haplotypes were closely related and showed a vague latitudinal gradient. Some Southern Balkan and Central European haplotypes (Peloponnesian, and the majority of Italian samples, haplotypes N1 and N2) formed a distinct cluster, which was 4–5 bp distant from the rest of the haplotypes. The second distinct cluster included the haplotypes from Eastern European, Central European and Iberian samples (N3–N7). Strong ambiguity was not observed between the F. niobe haplotypes data. For F. niobe, the ML tree supported the network, and the separation of the two clusters had relatively high bootstrap support (84%). Further separation between the haplotypes had intermediate to high support (62–99%).
Associations of the genetic structure with geographical and environmental factors
A different set of environmental variables was associated with the genetic structure in the three studied species, following db‐RDA of nuclear data obtained by ddRADseq (Table 2). In S. aglaja (whole data set), we found a significant association of latitude, mean annual temperature and mean annual precipitation with the genetic structure. The genetic structure in F. adippe (whole data set) was associated with altitude, mean annual temperature, mean annual precipitation and mean summer wind speed. F. niobe individuals clustered together according to latitude, mean annual solar radiation and mean annual precipitation.
TABLE 2
Geographical and environmental variables associated with the genetic structure in the three studied fritillary species according to db‐RDA of nuclear data obtained through ddRADseq
S. aglaja
F. adippe
F. niobe
Whole data set
Excl. Iberia and Morocco
Whole data set
Excl. Iberia
Longitude
—
—
—
—
—
Latitude
***
—
—
—
***
Altitude
—
—
***
—
—
Solar radiation
—
***
—
***
*
Temperature
*
—
*
—
—
Precipitation
**
—
*
—
*
Wind speed
—
—
**
**
—
***p < .001, **p < .01, *p < .05.
Geographical and environmental variables associated with the genetic structure in the three studied fritillary species according to db‐RDA of nuclear data obtained through ddRADseq***p < .001, **p < .01, *p < .05.In S. aglaja, the final selected model for the whole data set (all S. aglaja individuals) included mean annual temperature, mean annual precipitation, mean summer wind speed, latitude and longitude as explanatory variables (adjusted R
2 = 0.04). Latitude (F
1,37 = 1.86, p < .001), mean annual temperature (F
1,37 = 1.31, p = .02) and mean annual precipitation (F
1,37 = 1.38, p < .01) were significantly associated with the genetic structure in S. aglaja, when analysing all individuals together. Examination of the biplot reveals that the separation of the Moroccan samples from the rest of the individuals is mainly linked to temperature and precipitation (Figure 5). When we excluded individuals from Iberia and Morocco, the selected model included the variables mean annual solar radiation, mean annual temperature, mean annual precipitation, mean summer wind speed and longitude (adjusted R
2 = 0.03). There was a significant association of mean annual solar radiation (F
1,24 = 1.55, p < .001) with the genetic structure. The Scandinavian samples seem to be separated from most of the other samples following the direction associated with solar radiation and temperature (Figure 5, Table 2).
FIGURE 5
Biplots according to db‐RDA of nuclear data obtained through ddRADseq showing the associations of environmental and spatial factors with the genetic structuring in each studied fritillary species. In Speyeria aglaja and Fabriciana adippe, we performed these analyses once on the data sets including all individuals (a, c), and once on the data sets excluding individuals from Iberia (and Morocco for S. aglaja) (b, d). Colours are according to geographical region and arrows show the direction of association of environmental and geographical variables with the genetic structure
Biplots according to db‐RDA of nuclear data obtained through ddRADseq showing the associations of environmental and spatial factors with the genetic structuring in each studied fritillary species. In Speyeria aglaja and Fabriciana adippe, we performed these analyses once on the data sets including all individuals (a, c), and once on the data sets excluding individuals from Iberia (and Morocco for S. aglaja) (b, d). Colours are according to geographical region and arrows show the direction of association of environmental and geographical variables with the genetic structureThe final selected model for F. adippe when using the whole data set (all F. adippe individuals) included mean annual temperature, mean annual precipitation, mean summer wind speed, longitude and altitude as explanatory variables (adjusted R
2 = 0.02). Altitude (F
1,68 = 1.65, p < .001), mean summer wind speed (F
1,68 = 1.4, p < .01), mean annual precipitation (F
1,68 = 1.29, p = .01) and mean annual temperature (F
1,68 = 1.19, p = .03) were significantly associated with the genetic structure in F. adippe. The biplot showed that the direction of separation associated with altitude and temperature was linked to the differentiation between Iberian and Southern Balkan samples from Scandinavian samples in F. adippe. Precipitation and wind speed were associated with the separation of butterflies from central Europe from Iberia and some Scandinavian samples (Figure 5, Table 2). When we excluded the individuals from Iberia, the final selected model included mean annual solar radiation, mean annual temperature, mean summer wind speed, mean annual precipitation and longitude (adjusted R
2 = 0.01). The genetic structure in F. adippe when excluding Iberian individuals was significantly associated with mean annual solar radiation (F
1,53 = 1.49, p < .001), and mean summer wind speed (F
1,53 = 1.23, p < .01). The separation of genetic clusters was associated with similar environmental and geographical factors as in the whole data set (Figure 5, Table 2).In F. niobe, the final selected model included mean annual solar radiation, mean annual precipitation, mean summer wind speed, latitude, longitude and altitude as predictor variables (adjusted R
2 = 0.08). The genetic structure in F. niobe was significantly associated with latitude (F
1,24 = 2.84, p < .001), mean annual solar radiation (F
1,24 = 1.27, p = .03) and mean annual precipitation (F
1,24 = 1.34, p = .02). According to the biplot, the separation of the Iberian, Southern Balkan and Southern Italian samples from the rest was associated with mean annual solar radiation and latitude (Figure 5, Table 2).
DISCUSSION
In this study, we compared phylogeographical patterns of three Argynnini species throughout Europe (and Morocco for Speyeria aglaja) to shed light on their postglacial colonization histories, and explored similarities and differences in genetic variation and structure among closely related sympatric species. Our data suggest that the contemporary structure is mainly linked to geographical proximity, but also reflects the species' unique refugial and postglacial histories. Apparent differences in the species' present genetic structure might be at least partly explained by species‐specific ecological requirements and differences in their dispersal abilities, showing that even sister species with partially overlapping distribution ranges may respond differently to common ecological factors. Note that the sampling localities only partially overlap between the three studied species, which is why comparisons have to be made with care. However, since the results generated by the different analytical approaches used were largely in line with each other, we are confident that our conclusions are valid.
Distinct genetic clusters suggest differences in the biogeographical history of the study species
Studying and comparing the phylogeography of three fritillary species using nuclear data obtained through ddRADseq revealed three main genetic clusters in all studied species in Europe, namely Iberia, Southern Balkans (and Italy) and the rest of Europe. In S. aglaja, samples from Morocco formed a separate cluster (the other two species do not occur in the Maghreb).According to nuclear data attained via ddRADseq, Iberian samples were clearly separated from the rest, forming the most distinct cluster for both S. aglaja and Fabriciana adippe. An Iberian cluster was also observed for Fabriciana niobe, although it did not constitute the furthest diverged group. We propose that this points towards an important glacial refugium in Iberia for these species, as Iberian populations were probably separated from the rest of Europe at a relative early stage as a result of population retraction to warmer areas during the ice ages. This peninsula has previously been identified as an important refugium during the LGM for many plant and animal species (e.g., Cheddadi et al., 2006; Ferrero et al., 2011).Unlike in F. adippe, the northern Iberian samples were distinct from the southern Iberian ones for S. aglaja, possibly because the Baetic and Sierra Morena Mountain Ranges might limit dispersal for butterflies across these landscape elements. Such topographical features can shape major axes of genetic differentiation, which is also evident in multiple other terrestrial taxonomic groups, such as grasshoppers (Gonzalez‐Serna et al., 2019), hummingbirds (Benham & Witt, 2016) and mammals (Castillo et al., 2014). In F. adippe, the whole of Iberia formed the most distinct cluster separated from the rest of Europe, with the Pyrenees presumably acting as a geographical barrier for eastward movement that might have reduced gene flow after the LGM. Catalan F. adippe individuals seemed to constitute a mixture of the two main clusters, which may represent a very balanced outcome of a recent contact between the two lineages. The very high divergence of the Iberian lineage coincides with the distribution area of the subspecies F. adippe subsp. chlorodippe, whose status needs to be revised according to some authors (Tolman & Lewington, 2012; Tshikolovets, 2011). The highest divergence recovered between the Iberian haplotypes (obtained via mitochondrial COI sequencing) and the rest was within the range of the differentiation between F. adippe and F. niobe haplotypes, which may suggest the Iberian clade of F. adippe as a sister species. However, since all Catalan individuals in this study were genetically exactly intermediate according to nuclear data obtained via ddRADseq, occasional F1 hybridization can be dismissed, and thus Catalonia could represent a recent contact zone between the two main European lineages. Further testing of this scenario with a more balanced and larger samples size would be worthwhile. In light of our present results, F. adippe can be considered as a single species with high probability.The Southern Balkans emerged as another important refugium for all studied species, with Southern Balkan samples forming the most distinct cluster in F. niobe. During the LGM, the Balkans and Asia Minor were probably connected by a land bridge, rendering the Balkans an important European refugial region for species confined to Asia Minor during the last glaciation (Hewitt, 1999; Perissoratis & Conispoliatis, 2003). After removing Iberian and Moroccan samples in S. aglaja, a clear separation of Southern Balkan samples was also evident in this species, suggesting this area as another glacial refugium. However, the lack of geographical elements that hinder gene flow between the Balkans and central Europe has presumably allowed some admixture between these regions during interglacial periods (Hewitt, 2000). The stronger divergence of Balkan samples relative to the Iberian ones in F. niobe suggests a longer isolation of the Balkan populations in this species.According to nuclear data obtained via ddRADseq, there was a gradual differentiation among the samples in S. aglaja and F. adippe that could be explained by geographical distance. In both species, samples from Scandinavia formed one clade along with northern European samples, following a latitudinal gradient. The evolutionary relationships between the mitochondrial COI haplotypes along with the analyses of nuclear ddRADseq data for F. adippe suggest a northwards population expansion from the glacial refugia in Iberia and Southern Balkans following the LGM, which was supported by a decreased genetic diversity in the northern regions.Overall, the detected large‐scale genetic structuring in the studied species is in line with the main European biogeographical paradigm, proposing that most temperate species were restricted to isolated refugia in Southern Europe (Iberia, Italy, Balkans and Mediterranean islands) during the Pleistocene, leading to genetically diverged lineages along these regions in many European species (Dapporto et al., 2019; Hewitt, 2000). For example, a broad‐scaled genetic structuring with three major groups for Iberia, Italy and the Balkans was also found in the butterfly species Polyommatus coridon and Melanargia galathea (Habel et al., 2005; Schmitt & Seitz, 2001). Further, a European‐wide study on Maculinea alcon butterflies revealed a genetic pattern mainly explained by geographical proximity, which corresponded with the above‐mentioned three refugial regions (Koubínová et al., 2017).
The unique ecology of species might explain differences in the genetic structure
Besides geographical space and geological time, one factor that could contribute to the structural differences between the species is variation in dispersal capacity. In S. aglaja, northern and southern Iberian samples differed substantially from each other, possibly enforced by the Baetic and Sierra Morena Mountain Ranges, while such a structuring was not observed in F. adippe. As F. adippe seems to more likely perform long‐distance dispersal than S. aglaja (Polic et al., 2020), there might have been more exchange between northern and southern Iberian populations in F. adippe than in S. aglaja, resulting in the slight differences in the contemporary genetic structure in these species. Differences in dispersal capacity have been shown to affect spatial genetic patterns in other taxa as well, with an increased genetic structure in organisms of limited mobility as opposed to highly mobile species, such as in grasshoppers (Ortego et al., 2021) and songbirds (Cros et al., 2020).Another factor potentially contributing to differences in genetic architecture between sister species is ecological niche breadth. In Iberia, S. aglaja is found at higher altitudes, whereas F. adippe also occupies habitats at lower elevations (Vila et al., 2018). This relatively higher degree of generalism in Iberian F. adippe populations might have led to a less structured genetic pattern on the peninsula.On a large scale, the stronger geographical structuring in the mitochondrial COI data as observed in the two specialist species F. adippe and F. niobe might be due to the less widespread warm microhabitats to which these species are confined. By contrast, the detected pattern of the relative generalist S. aglaja hints at higher levels of gene flow between local populations, probably because of a higher temperature tolerance, a less patchy distribution of suitable habitats and a slightly wider larval food plant range. It should be mentioned that this generalist–specialist differentiation is based on large‐scale distribution patterns and habitat preferences of these species. On a local scale, this distinction may vary; for example, on the Iberian Peninsula, S. aglaja is considered a specialist species relative to the other two, as mentioned above (Vila et al., 2018). Similar broad‐scale phylogeographical patterns to our findings have been observed in a comparative genetic study of two closely related, but cold‐adapted fritillary species, namely the specialist Boloria eunomia, which showed a more distinct geographical structuring than the generalist Boloria selene (Marešová et al., 2019). These findings show that ecological specialization may affect population expansion, thereby potentially leading to more pronounced regional population differentiation in specialist than in generalist species, which has been shown for many organisms (Cros et al., 2020; Engelhardt et al., 2011; Matern et al., 2009). Our data also suggest an association of ecological niche breadth and range expansion in F. niobe. Although the Balkans are not separated from central Europe by mountain ranges or other landscape elements (as is Iberia), the furthest diverged genetic cluster in F. niobe consisted of individuals from this region. One explanation could be that the ecological restrictions of the highly specialized F. niobe, such as the need for warm microhabitats (Salz & Fartmann, 2009; Spitzer et al., 2009), might have constrained its northward movement at a similar speed as in F. adippe and S. aglaja. Our findings thereby suggest that species‐specific traits, such as dispersal ability and the degree of ecological generalism, might influence genetic population divergences over time.
Mitonuclear discordance
While nuclear data obtained via ddRADseq and mitochondrial COI data were mostly congruent in F. adippe and F. niobe, there was discrepancy between the two data sets in S. aglaja: The Moroccan and the Southern Iberian samples from the Sierra Nevada formed distinct and well‐separated lineages based on ddRADseq data, but they were poorly differentiated from the rest based on COI. Common haplotypes were shared among several regions without forming clear geographical lineages. Although our sampling does not allow for elucidating the species' biogeography in complete detail, we suggest that gene flow between the isolated lineages during interglacial cycles or selective sweeps affecting the mtDNA could have resulted in decreased haplotype diversity and thus might have erased the signs of early differentiation among the populations in the COI gene. Because mtDNA is more vulnerable to selective sweeps, such phenomena can be observed more strongly when sequencing the mtDNA. In contrast to nuclear data attained via ddRADseq, the most distinct lineage formed by Eastern European samples (Ag1, 6 bp from the rest) suggests that some areas in the Carpathians might have constituted an extra‐Mediterranean microrefugium (Schmitt & Varga, 2012) for S. aglaja, as observed for some other Lepidoptera species, such as Polyommatus coridon (Kühne et al., 2017), Erebia euryale (Paučulová et al., 2017) and Coenonympha arcania (Cassel‐Lundhagen et al., 2020). An early split might have happened between this region and the rest of Europe (Schmitt & Varga, 2012), with the Carpathians staying separated for a relatively long time and thus being less affected by gene flow or selective sweeps.Incongruence between nuclear data as obtained through ddRADseq and mtDNA data as obtained through COI sequencing (mitonuclear discordance) is not uncommon (e.g., Campbell et al., 2020; Hinojosa et al., 2019; Ivanov et al., 2018), and can have several reasons, one of which is sex‐biased dispersal (e.g., Nietlisbach et al., 2012). In a study on demographic patterns in S. aglaja in the Czech Republic, Zimmermann et al. (2009) report on higher dispersal distances in females and a lifespan twice as long as in males, which would support the hypothesis of sex‐biased dispersal as the reason for the observed mitonuclear discordance. Most other explanations for mitonuclear discordance such as a faster mutation rate in mtDNA, the nonrecombinant nature of mtDNA and introgression events often lead to a higher structure in mtDNA as opposed to nuDNA (Toews & Brelsford, 2012), which is not the case in our study. In fact, the pattern of nuDNA being far more structured than mtDNA across a large range is rather unusual, and frequently the opposite applies: potential cryptic species highlighted by COI that are not when studied with nuDNA (e.g., Hinojosa et al., 2019; Thielsch et al., 2017). Besides sex‐biased dispersal, Wolbachia‐mediated genetic sweeps could explain homogeneity of mtDNA, but Wolbachia infection was not detected in the S. aglaja data set. Although we cannot rule out the other explanations for mitonuclear discordance with certainty, we argue that sex‐biased dispersal, with females dispersing further than males, is one of the likely reasons for the detected pattern. Note that the furthest diverged groups according to genome‐wide SNP data obtained through ddRADseq (in discordance with the COI data) are two isolated sampling locations at the extreme of the distribution range: Sierra Nevada in Southern Iberia and Morocco. Thus, low population numbers and stochasticity have to be considered as important factors. The detected mitonuclear discrepancy suggests that some females dispersed from the European lineage to these two isolated enclaves. However, the impact of these immigrants on the nuclear genome seems to have been minimal, suggesting that the dispersal was done by extremely few individuals. This might have happened during the LGM, considering that S. aglaja inhabited lower altitudes then (as higher mountains were covered by ice) in these regions, and that COI haplotypes between Southern Iberia and Morocco are slightly differentiated (Figure 4). It is worth mentioning that the probably higher gene flow among the S. aglaja populations due to its relative generalist ecology and tolerance of lower temperatures might also have led to the mitonuclear discordance as shown in another Lepidoptera species (Hinojosa et al., 2019). More rigorous sampling would allow to formally test various hypotheses regarding mitonuclear discordance in S. aglaja.Overall, it seems that geographical distance is the main driver of genetic divergence between populations in the studied species and that differences in the prevailing environmental factors may further refine the differentiation. Our analyses showed that the genetic structure is associated with a unique combination of environmental variables in each of the studied Argynnini species, although some factors were seemingly important for more than one species. Solar radiation and precipitation were associated with the genetic structure in all three species, and temperature was linked with the genetic structure in S. aglaja and F. adippe, while wind speed and altitude were only linked with the genetic patterns in F. adippe. For ectothermic organisms such as butterflies, solar radiation, temperature, precipitation and wind speed are important factors connected to thermoregulation, and these parameters might therefore contribute to the current genetic structure in butterfly species. However, this interpretation is potentially complicated by the fact that the extent and location of the sampling regions of the three studied species only partially overlap. Further, while these environmental variables could contribute to the contemporary structure by exerting selective pressures or creating heterogeneous habitats which results in resistance to migration (Manel et al., 2003), the interaction of processes shaping the genetic structure of species is complex. Besides geographical features, species' intrinsic characteristics, such as ecological and life history traits, play an important role in determining phylogeographical patterns (Paz et al., 2015). The genetic structure we see today probably results from a combination of several interacting phenomena: events in the phylogeographical history of the species, stochastic processes, adaptation to prevailing environmental factors and gene flow (Cassel‐Lundhagen et al., 2020; Joshi, 1997). Because environmental conditions also change over time, a joint consideration of landscape genetics and phylogeography could provide a more comprehensive explanation of the factors shaping the current distribution of genetic variation.
CONCLUSIONS AND FUTURE DIRECTIONS
We utilized both genome‐wide and mitochondrial sequence data in a comparative phylogeographical analysis framework to investigate how the large‐scale genetic structure of three closely related fritillary butterfly species is associated with environmental and geographical factors. Taken together, the results suggest that, following isolation in several refugia during the LGM and expansion after the Pleistocene, the diverging ecology of the species together with geographical distances and barriers to dispersal has resulted in differentiation among regions, with responses to environmental factors being partly unique to each species. The mitonuclear discordance that was observed in Speyeria aglaja may be the result of sex‐biased dispersal, selective sweeps or admixture events between lineages after geographical isolation during the Pleistocene. However, because of low sample sizes at each location it remains uncertain what caused the discrepancy, something that may be investigated in the future using a strategy with larger sample sizes.Our phylogeographical comparisons suggest that ecological specialization and dispersal capacity might have modulated the responses to geographical and environmental factors, and thereby contributed to the variation in genetic structure between species. Future studies of small‐scale structure and biogeographical histories in distinct regions across the species' distribution ranges are required to further tease apart the contributions of different species characteristics and identify the underlying mechanisms responsible for the species‐specific evolutionary responses to key drivers of population genetic structure indicated by the present results. As more studies become available, systematic evaluations and meta‐analysis approaches may be used to assess the generality of findings and determine whether and how genetic structure is modified by species characteristics, geography and environmental conditions. This would deepen our understanding of how contemporary distribution ranges are shaped, improve the ability to predict a species' future development in response to environmental change, and hence inform conservation management in the respective countries of occurrence about appropriate measures.
AUTHOR CONTRIBUTIONS
D.P., Y.Y., A.F., M.F. and R.V. designed the study. Funding was secured by A.F., M.F., D.P. and Y.Y. D.P., Y.Y. and K.M.L. performed the laboratory work. D.P., M.F. and R.V. contributed to field sampling. M.M. provided laboratory infrastructure to prepare the ddRADseq libraries. D.P. and Y.Y. analysed the data, and wrote the first draft of the manuscript. All authors contributed to the final version and approved the submitted manuscript.
CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose.Appendix S1Click here for additional data file.
Authors: Joan C Hinojosa; Darina Koubínová; Mark A Szenteczki; Camille Pitteloud; Vlad Dincă; Nadir Alvarez; Roger Vila Journal: Mol Ecol Date: 2019-07-18 Impact factor: 6.185
Authors: Jason K Cooper; Greg Sykes; Steve King; Karin Cottrill; Natalia V Ivanova; Robert Hanner; Pranvera Ikonomi Journal: In Vitro Cell Dev Biol Anim Date: 2007-10-13 Impact factor: 2.416
Authors: Julio Rozas; Albert Ferrer-Mata; Juan Carlos Sánchez-DelBarrio; Sara Guirao-Rico; Pablo Librado; Sebastián E Ramos-Onsins; Alejandro Sánchez-Gracia Journal: Mol Biol Evol Date: 2017-12-01 Impact factor: 16.240
Authors: Emilie Cros; Balaji Chattopadhyay; Kritika M Garg; Nathaniel S R Ng; Suzanne Tomassi; Suzan Benedick; David P Edwards; Frank E Rheindt Journal: Mol Ecol Date: 2020-06-24 Impact factor: 6.185
Authors: Daniela Polic; Yeşerin Yıldırım; Kyung Min Lee; Markus Franzén; Marko Mutanen; Roger Vila; Anders Forsman Journal: Mol Ecol Date: 2022-07-15 Impact factor: 6.622
Authors: Daniela Polic; Yeşerin Yıldırım; Kyung Min Lee; Markus Franzén; Marko Mutanen; Roger Vila; Anders Forsman Journal: Mol Ecol Date: 2022-07-15 Impact factor: 6.622