Jérémy Gauthier1, Mila Pajkovic2, Samuel Neuenschwander3, Lauri Kaila4, Sarah Schmid2,5, Ludovic Orlando6,7, Nadir Alvarez1,2. 1. Geneva Natural History Museum, Geneva, Switzerland. 2. Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland. 3. Vital-IT, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland. 4. Zoology Unit, Finnish Museum of Natural History, University of Helsinki, Finland. 5. Department of Computational Biology, University of Lausanne, Lausanne, Switzerland. 6. Laboratoire AMIS CNRS UMR 5288, Faculté de Médecine de Purpan, Toulouse, France. 7. Globe Institut, Lundbeck Foundation GeoGenetics Centre, University of Copenhagen, Copenhagen, Denmark.
Abstract
Erosion of biodiversity generated by anthropogenic activities has been studied for decades and in many areas at the species level, using taxa monitoring. In contrast, genetic erosion within species has rarely been tracked, and is often studied by inferring past population dynamics from contemporaneous estimators. An alternative to such inferences is the direct examination of past genes, by analysing museum collection specimens. While providing direct access to genetic variation over time, historical DNA is usually not optimally preserved, and it is necessary to apply genotyping methods based on hybridization-capture to unravel past genetic variation. In this study, we apply such a method (i.e., HyRAD), to large time series of two butterfly species in Finland, and present a new bioinformatic pipeline, namely PopHyRAD, that standardizes and optimizes the analysis of HyRAD data at the within-species level. In the localities for which the data retrieved have sufficient power to accurately examine genetic dynamics through time, we show that genetic erosion has increased across the last 100 years, as revealed by signatures of allele extinctions and heterozygosity decreases, despite local variations. In one of the two butterflies (Erebia embla), isolation by distance also increased through time, revealing the effect of greater habitat fragmentation over time.
Erosion of biodiversity generated by anthropogenic activities has been studied for decades and in many areas at the species level, using taxa monitoring. In contrast, genetic erosion within species has rarely been tracked, and is often studied by inferring past population dynamics from contemporaneous estimators. An alternative to such inferences is the direct examination of past genes, by analysing museum collection specimens. While providing direct access to genetic variation over time, historical DNA is usually not optimally preserved, and it is necessary to apply genotyping methods based on hybridization-capture to unravel past genetic variation. In this study, we apply such a method (i.e., HyRAD), to large time series of two butterfly species in Finland, and present a new bioinformatic pipeline, namely PopHyRAD, that standardizes and optimizes the analysis of HyRAD data at the within-species level. In the localities for which the data retrieved have sufficient power to accurately examine genetic dynamics through time, we show that genetic erosion has increased across the last 100 years, as revealed by signatures of allele extinctions and heterozygosity decreases, despite local variations. In one of the two butterflies (Erebia embla), isolation by distance also increased through time, revealing the effect of greater habitat fragmentation over time.
An increasing number of studies have revealed that a dramatic collapse of biodiversity has been ongoing since the early 20th century, and that decay rates have accelerated over the past 50 years (de Oliveira Roque et al., 2018; Johnson et al., 2017). There is no doubt that this global ongoing crisis of biodiversity loss is due to anthropogenic factors that have transformed species' habitats, and caused a decrease in population sizes, with potential long‐term consequences for local or global extinction. Many studies have highlighted the consequences of this extinction in terms of erosion of species diversity for decades (Ceballos, Ehrlich, & Dirzo, 2017; Ehrlich, 1995), providing the basis for international reports assessing the state of biodiversity and the ecosystem services it provides to society (e.g., the Intergovernmental Platform on Biodiversity and Ecosystem Services, IPBES; https://bit.ly/IPBESReport). However, it is only recently that observations and empirical approaches have shown that biomass and, by extension, population sizes have also declined significantly over the last 50 years, especially for insects, resulting in multitrophic cascades affecting the biomass of many insectivorous species (Karp et al., 2013). In particular, Hallmann et al. (2017) have measured insect biomass during 27 years at 63 German localities, and found a greater than three‐quarters decline in this short time span, which represents an average 3% drop per year. Similarly, Lister and Garcia (2018) reported a decrease of more than 90% of the terrestrial arthropod biomass and 80% of that of the canopy over the last 36 years in Puerto Rico, which has in turn strongly reduced the abundance of predator populations of lizards, frogs and birds. Anthropogenic factors are strong candidates for this recognized “insectageddon” including: pesticide and fertilizer use in agriculture, land‐use change (habitat destruction), climate change, chemical contamination (Whiteside & Marvin Herndon, 2018), light pollution, invasion by exogenous species (Sánchez‐Bayo & Wyckhuys, 2019; van Strien, van Swaay, van Strien‐van Liempt, Poot, & WallisDeVries, 2019) and wireless communication systems (Thielens et al., 2018).These anthropic factors will impact the populations at different intensities, ranging from reduction of their size to local extinction. The reduction in population sizes will have several consequences. A reduction in effective population size (N
e) will cause a loss of genetic diversity (i.e., the number of alleles in the populations), thus decreasing both neutral variation and adaptive potential. In addition, shrinking populations will experience increased allele extinction due to inbreeding that causes allele fixation by genetic drift. Also, inbred populations will tend to accumulate deleterious alleles and such a mechanism, referred to as inbreeding depression, will further decrease the average fitness of a population (Keller & Waller, 2002; Kristensen, Pedersen, Vermeulen, & Loeschcke, 2010). Furthermore, the extinction dynamics will weaken the connectivity network between remaining populations and thus reduce overall gene flow. This will increase their divergence and prevent their capacity to exchange potential adaptive alleles. The combination of these mechanisms will further decrease the population persistence likelihood over time (Bouzat, 2010).However, empirical data providing time series of abundance data and genetic diversity remain scarce, which limits our ability to precisely infer the recent demographic trajectory of most populations. A variety of indirect proxies can help to overcome the general lack of long‐term abundance data across taxa to estimate demographics. For example, the genetic information present in museum specimens collected across the last few decades or even centuries provides a unique opportunity to obtain temporal snapshots of past population genetic diversity and quantify the extent and dynamics of the current biodiversity crisis (Jensen et al., 2018; Meineke, Davies, Daru, & Davis, 2018; Ryan et al., 2018). It is estimated that the number of museum specimens collected around the world exceeds 1 billion individuals, and covers ~1.2 million species (Pyke & Ehrlich, 2010). The molecular diversity present in these specimens can help assess population sizes through time but signatures can also help track adaptation to changing environments (Hoffmann, Sgrò, & Kristensen, 2017). These approaches can thus both help define conservation priorities and estimate the future resilience to ongoing environmental change.However, analysing DNA contained in museum collections (hereafter referred to as historical DNA) is not devoid of difficulties owing to post‐mortem decay reactions fragmenting the DNA backbone and modifying the chemical nature of individual nucleotidic bases (Dabney, Meyer, & Pääbo, 2013). The extensive fragmentation of historical DNA molecules precludes widely used conventional genetic analyses, such as high‐density array genotyping (Decker et al., 2009) and Restriction Site‐Associated DNA Sequencing (RADseq) (Linck, Hanna, Sellas, & Dumbacher, 2017). Over the past decade, a number of alternative molecular methods have been developed to gather historical DNA data at population and genome‐wide scales (reviewed in Burrell, Disotell, & Bergey, 2015; Horn, 2012; Orlando, Gilbert, & Willerslev, 2015). These most often rely on the construction of next‐generation sequencing DNA libraries, and the capture of those DNA library templates annealing to short synthetic nucleic acid baits spread across predefined loci of interest. This rationale was applied in 2016 to target Genotyping‐By‐Sequencing (GBS) or RADseq loci, either through bench‐top synthesized probes (i.e., HyRAD; Suchan et al., 2016), or commercially produced synthetic GBS or RADseq oligonucleotides (Ali et al., 2016; Boucher, Casazza, Szövényi, & Conti, 2016; Hoffberg et al., 2016; Sánchez Barreiro et al., 2017). These methods generally improve the quality of the genotypic information retrieved by increasing the overall average depth‐of‐coverage and reducing the fraction of loci for which no data could be obtained (Suchan et al., 2016). The HyRAD method, which utilizes the standard double digest (dd) RAD protocol (Mastretta‐Yanes et al., 2015) to prepare a set of DNA probes from fresh samples, has proven especially versatile in its applications to ancient (Schmid et al., 2017) and historical DNA (Schmid et al., 2018; Suchan et al., 2016), and for obtaining genome‐wide data at the population scale in a cost‐effective manner.The museum collections of Finland, in particular the Finnish Museum of Natural History in Helsinki (Luonnontieteellinen keskusmuseo, Luomus), contains ~22 million specimens collected over recent centuries (Tegelberg, Haapala, Mononen, Pajari, & Saarenmaa, 2012). This provides a fantastic opportunity to obtain molecular data across wide time series, in a range of species, including butterfly taxa that have declined over the past century. This is notably the case of Erebia embla—a northern European and eastern Palearctic butterfly species inhabiting bogs—which has experienced a strong decrease in southern Finland, following the extensive drainage of its habitat (Rassi, Hyvärinen, Juslén, & Mannerkoski, 2010). The same holds true for Lycaena helle, an arctic‐alpine Palearctic butterfly species inhabiting fresh, damp meadows, which shows a patchy distribution throughout Europe and has declined throughout most of its range, especially in Finland, where it remains present at only two sites, Kuivaniemi Simo and Kuusamo (Heino, Poykko, & Itames, 1998).In this study, we aim to examine how the genetic structures and diversities of E. embla and L. helle have been impacted by recent environmental changes and human activities, relying on HyRAD data generated from 118 and 165 museum specimens from Finland, respectively, collected across the last century as well as from extant populations. To achieve this, we developed a pipeline, namely PopHyRAD, that (a) aligns each sequence read against the probe catalogue (including reference genomes when available, or probe sequences provided by users), (b) identifies and controls for deamination patterns (typical of historical DNA), (c) eliminates putative paralogues, PCR duplicates, low‐quality genotypes and indels, and (d) keeps only bi‐allelic loci for downstream analyses. We then investigated the spatial and temporal dynamics of genetic diversity indices as well as isolation by distance in the two above‐mentioned butterflies, both locally and regionally.
MATERIAL AND METHODS
Sampling
Historical samples of L. helle and E. embla were sampled among the Lepidoptera collection of the Finnish Museum of Natural History in 2014 and 2015. Fresh samples were collected following fieldwork in summer 2015, by capturing flying adults with a net, sampling and storing a single leg in 95% EtOH, before releasing individuals alive. The sampling details are given in Table 1, and illustrated in Figure 1.
TABLE 1
Sample information comprising locality name, approximate GPS positions, year of collection and number of samples. The three time slices used for the analyses have been indicated
Erebia embla
Lycaena helle
Locality
Latitude
Longitude
Year
N
Locality
Latitude
Longitude
Year
N
Before 1950
Haminalahti
62.8534641
27.5326783
1909
12
Kuusamo Liikasenvaara
65.9645637
29.1883283
1928
10
Pirkkala
61.4654497
23.6456252
1909
1
Paanajarvi
66.4555006
28.9798017
1934
1
Pirkkala
61.4654497
23.6456252
1925
2
Paanajarvi
66.4555006
28.9798017
1935
6
Pirkkala
61.4654497
23.6456252
1930
5
Ivalo
68.6588185
27.5348114
1937
13
Pernio
60.2050782
23.1235771
1932
1
Mikkeli
61.6877956
27.2726569
1938
3
Portom
62.7100207
21.6163442
1937
13
Harmoinen
61.4852477
25.1409736
1940
8
Muonio
67.9593397
23.6774037
1938
13
Kannus
63.9007773
23.9170363
1940
5
Nurmes
63.5422079
29.1410100
1941
6
Nurmes
63.5422079
29.1410100
1941
13
Pernio
60.2050782
23.1235771
1944
2
Ruovesi
61.9856303
24.0703481
1941
13
Pirkkala
61.4654497
23.6456252
1945
2
Mikkeli
61.6877956
27.2726569
1942
9
Jakobstad Pietarsaari
63.6666709
22.7000229
1947
1
Haapavesi
64.1378737
25.3658176
1946
5
Pelkosenniemi
67.1095969
27.5118116
1947
9
Pelkosenniemi
67.1095969
27.5118116
1947
12
Jakobstad Pietarsaari
63.6666709
22.7000229
1949
4
Paltamo
64.4068668
27.8335512
1949
10
Between 1950 and 2000
Jakobstad Pietarsaari
63.6666709
22.7000229
1951
2
Kuusamo Liikasenvaara
65.9645637
29.1883283
1955
1
Jakobstad Pietarsaari
63.6666709
22.7000229
1953
6
Tohmajarvi
62.2259448
30.3335512
1957
1
Kuusamo
65.9645637
29.1883283
1955
8
Tohmajarvi
62.2259448
30.3335512
1958
5
Tyrvanto
61.1546112
24.3283168
1959
1
Kuopio
62.8241424
27.5945615
1959
4
Karttula
62.8952013
26.9723784
1963
4
Kuusamo Liikasenvaara
65.9645637
29.1883283
1959
1
Ikaalinen
61.7701493
23.0633777
1965
5
Kuopio
62.8241424
27.5945615
1960
4
Ikaalinen
61.7701493
23.0633777
1969
5
Tohmajarvi
62.2259448
30.3335512
1960
4
Tuulos
61.1181656
24.8337064
1970
2
Kuusamo Liikasenvaara
65.9645637
29.1883283
1962
2
Tuulos
61.1181656
24.8337064
1973
2
Kuopio
62.8241424
27.5945615
1964
1
Tuulos
61.1181656
24.8337064
1975
1
Kuusamo Liikasenvaara
65.9645637
29.1883283
1975
3
Kuusamo
65.9645637
29.1883283
1977
4
Kuivaniemi Simo
65.6040217
25.2038392
1980
6
Mikkeli
61.6877956
27.2726569
1979
2
Kuusamo Liikasenvaara
65.9645637
29.1883283
1985
12
Kuusamo
65.9645637
29.1883283
1980
1
Kuusamo Liikasenvaara
65.9645637
29.1883283
1991
13
Kuusamo
65.9645637
29.1883283
1981
3
Mikkeli
61.6877956
27.2726569
1983
1
2015
Ivalo
68.6588185
27.5348114
2015
5
Kuivaniemi Simo
65.6040217
25.2038392
2015
9
Kuusamo
65.9645637
29.1883283
2015
9
Kuusamo Liikasenvaara
65.9645637
29.1883283
2015
23
Muonio
67.9593397
23.6774037
2015
9
Oulu
65.0118734
25.4716809
2015
12
Pelkosenniemi
67.1095969
27.5118116
2015
8
Rovaniemi
66.4976214
25.7192101
2015
11
FIGURE 1
(a, c) Pictures of the two focal species, Lycaena helle and Erebia embla. (b, d) Maps of Finland with the sampling localities. The different colours indicate whether the samples considered were historical, with a distinction before and after 1950, or modern (year 2015)
Sample information comprising locality name, approximate GPS positions, year of collection and number of samples. The three time slices used for the analyses have been indicated(a, c) Pictures of the two focal species, Lycaena helle and Erebia embla. (b, d) Maps of Finland with the sampling localities. The different colours indicate whether the samples considered were historical, with a distinction before and after 1950, or modern (year 2015)
DNA extraction and HyRAD protocol
The HyRAD protocol was carried out according to Suchan et al. (2016). Briefly, for historical and fresh samples, DNA was extracted from a leg using the QIAamp DNA Micro kit (Qiagen). The probe precursors were prepared using a ddRAD protocol applied to six fresh samples from each focal species (Mastretta‐Yanes et al., 2015; Peterson, Weber, Kay, Fisher, & Hoekstra, 2012). Total genomic DNA was digested with the restriction enzymes SbfI and MseI, DNA adapters were ligated and the resulting library was purified, size‐selected with a range of 200–250 bp and amplified by PCR for 30 cycles. An aliquot of the final library was sequenced on one lane of an Illumina MiSeq 150‐bp paired‐end at the Lausanne Genomic Technology Facility (LGTF) in order to obtain a sequence catalogue of the loci represented in the ddRAD probes, and the rest of the library was converted into probes by removing adapter sequences.Individual Illumina DNA libraries were prepared from the fresh and museum specimens based on a published protocol for degraded DNA samples (Tin, Economo, & Mikheyev, 2014). Hybridization‐capture and enrichment was performed as described in Schmid et al. (2017), using a dual‐indexing tagging (i.e., with different combinations of barcodes and indexes), allowing extensive multiplexing of samples on a single sequencing lane: in L. helle, nine and 25 indexes and barcodes were used, respectively; while 10 and 25 were used in E. embla, respectively. For each butterfly species, the final (capture‐hybridized) enriched libraries were sequenced on two lanes of a HiSeq 2500 instrument using a 100‐bp paired‐end protocol at the LGTF.
PopHyRAD bioinformatic pipeline
The sequence pairs generated from the ddRAD probe libraries were first cleaned and overlapping reads were collapsed using adapterremoval version 2 (Schubert, Lindgreen, & Orlando, 2016). Then a first data set reduction was performed following the first step of the ddocent pipeline (Puritz, Hollenbeck, & Gold, 2014), keeping only alleles covered by at least four reads and, thus, removing the vast majority of sequencing errors. We merged intra‐individual loci using cd‐hit‐est from the cd‐hit tool (Fu, Niu, Zhu, Wu, & Li, 2012) using a minimum identity threshold of 90%. This step was repeated across samples to produce a combined catalogue of all loci identified for a given species. Loci shared by at least half of the probe samples (i.e., three out of six) were kept to remove uninformative specific loci. The absence of contamination in the catalogue was evaluated using centrifuge (Kim, Song, Breitwieser, & Salzberg, 2016).Reads from each historical sample were cleaned using trimmomatic (Bolger, Lohse, & Usadel, 2014) and individually mapped on the loci catalogue generated above using bwa aln (Li & Durbin, 2009). PCR duplicates were removed using markduplicates from the picard toolkit (http://broadinstitute.github.io/picard). Nucleotide misincorporation patterns were investigated using mapdamage
2.0 (Jónsson, Ginolhac, Schubert, Johnson, & Orlando, 2013), and base quality scores were rescaled according to their probability to represent a post‐mortem DNA deamination event in order to reduce the impact of DNA decay on downstream analyses. Finally, individual genotypes were called using freebayes (Garrison & Marth, 2012), and considered for further analyses when (a) showing qualities higher than 100, (b) shared by at least 80% of the samples and (c) biallelic (Figure 2). This complete pipeline has been implemented under the name PopHyRAD, with “Pop” standing for population genetics.
FIGURE 2
Schematic PopHyRAD pipeline including the creation of the catalogue from probes and the treatment of samples
Schematic PopHyRAD pipeline including the creation of the catalogue from probes and the treatment of samplesAlthough bwa aln is the recommended tool for ancient DNA analysis (Schubert et al., 2012), we evaluated the performance of two other mapping tools, namely bwa mem (Li, 2013) and bowtie2 (Langmead & Salzberg, 2012). Similarly, we evaluated the possible impact of the genotyping method on downstream analyses using varscan (Koboldt et al., 2009).
Population genetics analyses
Allelic frequencies, observed gene diversity (H
s) and inbreeding level (F
IS) were estimated using the r package hierfstat (Goudet, 2005). The frequency of fixed alleles, which is a proxy for the frequency of allele extinction, was estimated using a custom script. For these estimations only localities with more than eight samples were kept. These estimations were performed after the samples were binned within temporal groups (or time slices). More precisely, we merged samples from each geographical location within three time slices: before 1950, between 1950 and 2000, and present samples (i.e., 2015; see details in Table 1). Additionally, we merged all samples from all locations within the same temporal bin to identify patterns at the scale of Finland.Population genetic structure was investigated using a subset of SNPs, where only one SNP per locus was considered so as to minimize linkage disequilibrium (i.e., to minimize redundant information), as generally recommended (Falush, Stephens, & Pritchard, 2003). We used Bayesian admixture analysis implemented in structure (Pritchard, Stephens, & Donnelly, 2000) to estimate admixture proportions, that is, the proportion of each individual's genome inherited from each of K hypothetical source populations. We ran analyses with K from 1 to 5 with 10 independent Markov chains each, using 1,000,000 steps and including 50,000 burn‐in steps. We visually checked the results obtained in each run to assess whether the Markov chains were convergent. The most likely number of clusters was identified using Evanno's method (Evanno, Regnaut, & Goudet, 2005) implemented in structure harvester (Earl & vonHoldt, 2012). Then each sample was associated to the corresponding cluster and the proportion of the samples present in each cluster was drawn on the Finland map using pie charts.Isolation‐by‐distance (IBD) was tested by examining the correlation between genetic differentiation, F
ST/1 − F
ST, using F
ST estimations between population pairs performed with hierfstat (Goudet, 2005), and the log‐transformed distance, as suggested by Rousset (1997). Due to the ongoing debate in using Mantel tests for IBD examination (Diniz‐Filho et al., 2013), we used a simple Spearman's correlation. To corroborate the role of geographical distance as well as time in the differentiation between localities with another approach than a simple Spearman's correlation, we performed a distance‐based redundancy analysis (dbRDA) integrating geographical variables and year of collection, in an individual‐based approach, following recommendations from Laura Benestan (unpublished data; https://github.com/laurabenestan/db‐RDA‐and‐db‐MEM). In breif, we first created spatial variables using Moran Eigenvector's Maps (MEMs) implemented in the adespatial
r package (Dray, Blanchet, Borcard, Guenard, & Jombart, 2016). Second, a principal coordinates analysis (PCoA) was performed on the Euclidean genetic distances based on genotypes of each sample. Finally a global dbRDA was applied by integrating the year of collection as an additional variable to spatial components, and an ANOVA with 1,000 permutations was performed to assess the significance of each variable within the model using the vegan r package (Oksanen, Kindt, Legendre, & O'Hara, 2016).
RESULTS
HyRAD efficiency
The HyRAD wetlab and PopHyRAD bioinformatic pipeline developed for this study were particularly efficient in retrieving DNA sequences of historical samples collected up to more than a century ago and stored in museum conditions. The analysis of ddRAD data used for designing the probes allowed us to obtain sufficient data at 3,826 loci for E. embla and 3,443 for L. helle, which allowed us to undertake population genomic analyses. In addition, the identification of deamination patterns at the extremities of the reads from the collection samples and not in the fresh samples confirmed the endogenous nature of DNA present in these historical samples (Figure S1).
Mapping and SNP calling comparison
We first aimed to test whether different read alignment procedures could impact on the amount of HyRAD data retrieved. bwa aln was able to align only a limited number of reads against the catalogue of HyRAD probes, representing a mean of 11.2% (SD 6.7) across all samples. This proportion was improved to 37.1% (SD 8.1) when using bwa mem (Table S2). bowtie2 showed intermediate efficiency with a mean value of 18.3% (SD 7.9) across all samples. These differences, however, decreased after quality filtering, with bwa aln showing 6.9% (SD 2.0) of mapped reads, bwa mem 12.1% (SD 4.8) and bowtie2 11.0% (SD 2.8).We also observed broad differences between methodologies aimed at calling genotypes. We counted the total number of positions covered by at least 80% of the samples and showing biallelic SNP polymorphisms. This number was maximal when combining bwa mem alignments and varscan genotype calling, representing a total of 10,116 SNPs from 1,289 loci in E. embla and 11,534 SNPs from 1,241 loci for L. helle. The number was minimal when bwa aln alignments and freebayes genotype calls were combined, which led to the identification of 2,742 SNPs from 869 loci for E. embla and 2,549 SNPs from 1,015 loci for L. helle.For each of these six combinations (three mappers and two SNP callers), we applied the complete pipeline and subsequently estimated genetic diversity statistics. We observed that these statistics were extremely variable according to the SNP calling tool used, especially the inbreeding coefficient (F
IS) (Figure S2). With varscan, a large proportion of the SNPs demonstrate an inbreeding coefficient below zero with a non‐normal distribution, while the normal distribution obtained from freebayes is more consistent with biological expectations. Therefore, we decided to utilize the conservative data set obtained using freebayes on the mapping generated by bwa aln, despite being associated with an overall smaller number of SNPs; and we implemented freebayes and bwa aln as the standard caller*mapper combination in the PopHyRAD pipeline.
Patterns of population declines
To investigate genetic patterns associated with population size reduction, we focused on three descriptive statistics: (a) the observed gene diversity (H
s), (b) the frequency of fixed alleles and (c) the inbreeding coefficient (F
IS). Results are depicted in Figure 3.
FIGURE 3
Observed gene diversity (H
s), frequency of fixed alleles and inbreeding coefficient (F
IS) estimated for each locality and years. For each point a number indicates the number of samples. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time points are linked by a line of the corresponding colour
Observed gene diversity (H
s), frequency of fixed alleles and inbreeding coefficient (F
IS) estimated for each locality and years. For each point a number indicates the number of samples. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time points are linked by a line of the corresponding colourFor E. embla, sufficient temporal data could only be retrieved for three localities, namely Kuusamo, Muonio and Pelkosenniemi. These indicated reduction of the observed gene diversity (H
s) and the frequency of fixed alleles through time. This overall reduction is, paradoxically, associated with a sharp increase of the inbreeding coefficient at only one locality, Kuusamo. For L. helle, only one locality, Kuusamo Liikasenvaara, was sampled at different time points. It showed a dynamic diversity trend in which the observed gene diversity (H
s) and the frequency of fixed alleles were relatively stable between the 1920s and the 1990s, but declined after the 1990s. In contrast, the inbreeding coefficient (F
IS) was found to decrease between the 1920s and the 1990s and to increase sharply since.Grouping all samples within three main time slices (i.e., before 1950, between 1950 and 2000, and in 2015) increased the sample size of each locality and reinforced the observations made above (Figure 4). Indeed, we observed a decrease of the observed allelic richness (H
s) over time in both species, concomitant with an increase in the frequency of fixed alleles.
FIGURE 4
Observed gene diversity (H
s) and frequency of fixed alleles over time (i.e., before 1950, between 1950 and 2000, and 2015), and for each locality. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time slices are linked by a line of the same colour
Observed gene diversity (H
s) and frequency of fixed alleles over time (i.e., before 1950, between 1950 and 2000, and 2015), and for each locality. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time slices are linked by a line of the same colourTo investigate whether the patterns observed at the local level were retrieved at the global level (i.e., throughout Finland), we merged all the samples within the three time slices considered above, regardless of their geographical origin. This provided an opportunity to retrace the temporal trajectory of the Finland‐wide genetic diversity present in both focal species. We found that the observed gene diversity decreased through time and that the frequency of fixed alleles increased, especially in the most recent time period (Figure 5).
FIGURE 5
Observed gene diversity (H
s) and frequency of fixed alleles estimated by time slices, before 1950, between 1950 and 2000, and 2015, over Finland as a whole after samples merging. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle)
Observed gene diversity (H
s) and frequency of fixed alleles estimated by time slices, before 1950, between 1950 and 2000, and 2015, over Finland as a whole after samples merging. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle)
Genetic structure and isolation by distance
To investigate the mechanisms that may explain the observed decrease in genetic diversity and the increase in the inbreeding coefficient (F
IS), we studied the spatial genetic structure of populations, which only revealed a faint structuring (Figure 6). In contrast, a more marked pattern of spatial structuring was retrieved in the IBD analysis, revealing a varying correlation between genetic and geographical distances over time (Figure 7). Before 1950, no correlation between genetic and geographical distances was found in any of the two species investigated. Between the 1950s and the 2000s, a significant correlation between genetic and geographical distances was found for E. embla. In this species the level of correlation and the associated slope increased further when considering the modern time period (year 2015), suggesting an increasing spatial structuring from 1950 onwards, probably in relation to increased habitat fragmentation. No significant correlation could be retrieved in L. helle in any of the two historical time periods considered (this correlative analysis could not be carried out for modern times, as only two extant populations are known) (Figure 7). Despite an overall low level of genetic structuring through space and time (overall R
2 of 0.95% and 1.52%, based on one temporal and eight spatial axes, for E. embla and L. helle, respectively), the dbRDA approach indicated a significant impact of time (p = .002) to explain population differentiation, as well as a fainter effect of 3/8 spatial variables for E. embla and 2/8 spatial variables for L. helle, although with respective contributions of spatial variables to the overall R
2 remaining <0.5% (see Table S3).
FIGURE 6
Population structure of all sampled localities according to the three studied periods. The pie charts indicate the proportion of samples associated with each cluster as identified by structure with K = 2
FIGURE 7
Isolation by distance was investigated for each time slice by computing the correlation between the genetic differentiation between populations (F
ST/1 − F
ST) and the log‐transformed geographical distance (log(distance)). Correlations were tested using Spearman's rank correlation, and coefficient of determination, intercept and slopes using a linear model are shown
Population structure of all sampled localities according to the three studied periods. The pie charts indicate the proportion of samples associated with each cluster as identified by structure with K = 2Isolation by distance was investigated for each time slice by computing the correlation between the genetic differentiation between populations (F
ST/1 − F
ST) and the log‐transformed geographical distance (log(distance)). Correlations were tested using Spearman's rank correlation, and coefficient of determination, intercept and slopes using a linear model are shown
DISCUSSION
A direct estimation of genetic variation across the past
The study of past population dynamics has received considerable attention in recent decades (Bi et al., 2019; Nadachowska‐Brzyska, Li, Smeds, Zhang, & Ellegren, 2015; Tallavaara, Luoto, Korhonen, Järvinen, & Seppä, 2015). Those studies classically identify the most likely demographic model underlying the allelic frequency spectrum measured in modern specimens (Csilléry, Blum, Gaggiotti, & François, 2010; Espíndola et al., 2012; François & Durand, 2010). These demographic inference approaches are, however, often limited as different models can produce similar allelic frequency spectrum and summary statistics, and cannot necessarily be discriminated (Lapierre, Lambert, & Achaz, 2017). In contrast, genomic data from historical specimens catch evolution red‐handed, and can help overcome such limitations by providing direct snapshots of the past genetic diversity present in a population.In this study, we collected a large sample set of two butterfly species spread across Finland, and spanning the last 110 years. This sampling provided us with a unique opportunity to quantify the variation of the genetic diversity in both species at a time when their distribution drastically declined (Rassi et al., 2010). We have benefited from the HyRAD genome–complexity–reduction method to obtain genetic data from these valuable samples. HyRAD has been increasingly used in different laboratories, not only to identify genetic variation in historical material (Crates et al., 2019; Keighley, Heinsohn, Langmore, Murphy, & Peñalba, 2019; Linck et al., 2017; Linck, Freeman, & Dumbacher, 2019; Schmid et al., 2017), but also in ancient DNA (Schmid et al., 2017). Indeed, these methods based on hybridization capture allow us to retrieve even very small quantities of degraded DNA, which often remain unquantifiable before capture (Table S1). The amount of DNA in historical samples and the ability to extract, capture and sequence it depends on the history of the sample, the conditions of collection, sample preparation (drying, pinning etc.) and storage. Unfortunately, for most of our historical samples we do not have access to such information. However, in this study we were able to perform the entire process from historical specimen subsampling to SNP calling for ~75% of the samples analysed from both species, thus suggesting that it is compliant with most preparation histories.For this study we developed a specific pipeline, PopHyRAD, to optimally exploit the genetic information contained in samples. For now, the PopHyRAD computational pipeline released here facilitates HyRAD sequence analysis at the within‐species level by automating the steps underlying read cleaning, trimming and merging, as well as read mapping, and probe clustering. This pipeline is versatile, and can be used to analyse any type of hybridization‐capture data, using either probes from ddRAD or another RADseq protocol, or any tool able to reduce genomic complexity such as selective extraction of organellar genomes or amplification of specific genes. Before this study, the catalogue definition and remaining analytical workflow using HyRAD data have essentially been empirically explored (or a posteriori chosen), considering the outputs, and thus the tools that provided the best geographical or phylogenetic structure (Schmid et al., 2018). Here, we take the opportunity to test more accurately the performance of aligners and SNP‐callers on HyRAD data, using different tool combinations, and using a realistic criterion from the point of view of population genetics, namely F
IS. The results revealed large differences on the SNPs identified and on the estimation of genetic diversity and inbreeding coefficient and suggested the bwa aln read aligner and the freebayes SNP‐caller as the most conservative combination. The nonbiologically relevant F
IS values obtained with other combinations are likely to be due, at least in part, to increased false positive alignment rates (e.g., misidentified paralogues) as well as to the oversplitting of loci (i.e., a locus separated in two loci in the catalogue). This type of difference has already been highlighted in analyses of standard RADseq data (Shafer et al., 2017) and calls for caution in downstream analyses. An analysis based on data simulation is outside the scope of this study but would probably clarify the specificity and sensitivity of the different aligners and SNP‐callers, and help each user to refine the most appropriate parameters for their analyses and model species.
Genetic diversity decline in butterfly populations across Finland
The HyRAD data gathered in this study supported an overall erosion of genetic diversity at the country‐wide level of Finland in both species (Figure 5). Although interpretation of variation in genetic diversity should be tempered due to our relatively reduced sampling size per locality and temporal binning, one should keep in mind that in the context of museomics, our sampling remains substantial. This pattern of genetic diversity reduction parallels those found by similar studies on butterflies in Northern Europe (Fountain et al., 2016; Ugelvig, Nielsen, Boomsma, & Nash, 2011) but also more broadly in other taxa (Dufresnes et al., 2018; Schmid et al., 2018). Our data also uncovered strong regional differences, with at least one locality (i.e., Kuusamo) showing a local increase in diversity at a given time point, potentially following migration linked to the persistence of their habitat in these specific localities (Habel, Meyer, & Schmitt, 2014) playing a role as refuges for individuals from other populations carrying genetic diversity (Craioveanu, Sitar, & Rákosy, 2014). However, estimations based on recent samples (i.e., collected in 2015) still show a decline in genetic diversity in this particular locality. The overall erosion of genetic diversity, both locally and country‐wide, is expected given that most Finnish populations of these two butterflies have gone extinct through the 20th century, as a result of a drastic reduction in habitat availability, with the remaining populations not being able to maintain genetic diversity to levels that once existed in an area of ~340,000 km2 a century or even a few decades ago.The second striking result of this study is the increase in IBD over time, at least in one of the two species. Indeed, in E. embla, when considering time slices that divide the time frame of collected specimens into three periods, only the two last (i.e., 1950–2000, and 2015) are associated with significant IBD, with an increasing slope as we reach contemporaneous times. The effect of time was also retrieved in the dbRDA approach, although because this analysis is individual‐based, it was less representative of genetic variation per deme through space and time, thus explaining the low R
2 retrieved in the overall model.Our main result of an overall increase in the slope of the IBD pattern is probably the consequence of an increase in habitat fragmentation, revealing a reduced number of migrants among demes, and thus an increase in the differentiation of populations, essentially due—given the short time span involved—to drift. This signature might be also found in L. helle, even if our sampling does not allow the estimation of IBD for the most recent period (i.e., only two populations are still extant today). This transition from a virtually countrywide panmictic system to a more marked structuring in space is indicative of the fact that despite acknowledged dispersal capabilities of these butterflies in Finland (Habel, Rödder, Schmitt, & Nève, 2011; Habel, Finger, Schmitt, & Nève, 2011), generally related to a colonization‐edge syndrome characteristic of populations found at the northern edge of a species' distribution (Duplouy, Wong, Corander, Lehtonen, & Hanski, 2017), the fragmentation of habitats has led to a decrease in these exchanges, and thus to local differentiation.Through their impact on biodiversity, human activities are accelerating the extinction of populations and the differentiation of those that persist. This could be catalyzing lineage divergence, except that habitat destruction is an ongoing process, potentially hampered by geopolitical, but potentially ubiquitous, decisions. Our study of two species of butterflies in Finland indicates that not all species might respond identically to this fragmentation, and that comparative studies, involving a larger number of species represented by fresh but also historical specimens, are needed to understand how life history traits influence the species' population response to anthropogenic habitat disturbance and destruction. With the application of both the wetlab HyRAD protocol to historical and fresh specimens, and the PopHyRAD bioinformatic pipeline as described in this study, access to both past and extant genetic diversity should allow a better understanding and anticipation of the neutral response of populations to drastic habitat loss.
AUTHOR CONTRIBUTIONS
N.A. and M.P. designed the study. M.P. and L.K. performed sampling. M.P. performed laboratory work. J.G. analysed the molecular data, with contributions from S.N., M.P., N.A., L.O. and S.S. All authors took part in discussions concerning the analyses and interpretations. J.G. and N.A. wrote the paper, with contributions from all authors.
DATA ACCESSIBILITY
Sequence reads are archived at Zenodo: http://doi.org/10.5281/zenodo.3668644 for E. embla and http://doi.org/10.5281/zenodo.3668660 for L. helle. Scripts for the whole analytical process have been uploaded to Github (https://github.com/JeremyLGauthier/Scripts_Gauthier_et.al_2019_MER). The PopHyRAD pipeline is constantly under development and improvement. The current version can be found at https://github.com/JeremyLGauthier/PHyRAD).Supplementary MaterialClick here for additional data file.
Authors: Ross Crates; George Olah; Marcin Adamski; Nicola Aitken; Sam Banks; Dean Ingwersen; Louis Ranjard; Laura Rayner; Dejan Stojanovic; Tomasz Suchan; Brenton von Takach Dukai; Robert Heinsohn Journal: PLoS One Date: 2019-10-24 Impact factor: 3.240
Authors: Ke Bi; Tyler Linderoth; Sonal Singhal; Dan Vanderpool; James L Patton; Rasmus Nielsen; Craig Moritz; Jeffrey M Good Journal: PLoS Genet Date: 2019-05-03 Impact factor: 5.917
Authors: Deborah M Leigh; Charles B van Rees; Katie L Millette; Martin F Breed; Chloé Schmidt; Laura D Bertola; Brian K Hand; Margaret E Hunter; Evelyn L Jensen; Francine Kershaw; Libby Liggins; Gordon Luikart; Stéphanie Manel; Joachim Mergeay; Joshua M Miller; Gernot Segelbacher; Sean Hoban; Ivan Paz-Vinas Journal: Nat Rev Genet Date: 2021-08-18 Impact factor: 53.242
Authors: Tomasz Suchan; Mariya A Kusliy; Naveed Khan; Loreleï Chauvey; Laure Tonasso-Calvière; Stéphanie Schiavinato; John Southon; Marcel Keller; Keiko Kitagawa; Johannes Krause; Alexander N Bessudnov; Alexander A Bessudnov; Alexander S Graphodatsky; Silvia Valenzuela-Lamas; Jarosław Wilczyński; Sylwia Pospuła; Krzysztof Tunia; Marek Nowak; Magdalena Moskal-delHoyo; Alexey A Tishkin; Alexander J E Pryor; Alan K Outram; Ludovic Orlando Journal: Mol Ecol Resour Date: 2021-10-15 Impact factor: 8.678
Authors: Jérémy Gauthier; Mila Pajkovic; Samuel Neuenschwander; Lauri Kaila; Sarah Schmid; Ludovic Orlando; Nadir Alvarez Journal: Mol Ecol Resour Date: 2020-05-14 Impact factor: 7.090
Authors: Evelyn L Jensen; Maud C Quinzin; Joshua M Miller; Michael A Russello; Ryan C Garrick; Danielle L Edwards; Scott Glaberman; Ylenia Chiari; Nikos Poulakakis; Washington Tapia; James P Gibbs; Adalgisa Caccone Journal: Heredity (Edinb) Date: 2022-02-25 Impact factor: 3.832
Authors: Malene Nygaard; Alexander Kopatz; James M D Speed; Michael D Martin; Tommy Prestø; Oddmund Kleven; Mika Bendiksby Journal: Ecol Evol Date: 2022-08-12 Impact factor: 3.167
Authors: Emmanuel F A Toussaint; Jérémy Gauthier; Julia Bilat; Conrad P D T Gillett; Harlan M Gough; Håkan Lundkvist; Mickael Blanc; Carlos P Muñoz-Ramírez; Nadir Alvarez Journal: Genome Biol Evol Date: 2021-07-06 Impact factor: 3.416