Literature DB >> 32304133

Museomics identifies genetic erosion in two butterfly species across the 20th century in Finland.

Jérémy Gauthier1, Mila Pajkovic2, Samuel Neuenschwander3, Lauri Kaila4, Sarah Schmid2,5, Ludovic Orlando6,7, Nadir Alvarez1,2.   

Abstract

Erosion of biodiversity generated by anthropogenic activities has been studied for decades and in many areas at the species level, using taxa monitoring. In contrast, genetic erosion within species has rarely been tracked, and is often studied by inferring past population dynamics from contemporaneous estimators. An alternative to such inferences is the direct examination of past genes, by analysing museum collection specimens. While providing direct access to genetic variation over time, historical DNA is usually not optimally preserved, and it is necessary to apply genotyping methods based on hybridization-capture to unravel past genetic variation. In this study, we apply such a method (i.e., HyRAD), to large time series of two butterfly species in Finland, and present a new bioinformatic pipeline, namely PopHyRAD, that standardizes and optimizes the analysis of HyRAD data at the within-species level. In the localities for which the data retrieved have sufficient power to accurately examine genetic dynamics through time, we show that genetic erosion has increased across the last 100 years, as revealed by signatures of allele extinctions and heterozygosity decreases, despite local variations. In one of the two butterflies (Erebia embla), isolation by distance also increased through time, revealing the effect of greater habitat fragmentation over time.
© 2020 John Wiley & Sons Ltd.

Entities:  

Keywords:  HyRAD; Lepidoptera; museomics; past gene frequencies; population dynamics

Mesh:

Year:  2020        PMID: 32304133      PMCID: PMC7540272          DOI: 10.1111/1755-0998.13167

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090


INTRODUCTION

An increasing number of studies have revealed that a dramatic collapse of biodiversity has been ongoing since the early 20th century, and that decay rates have accelerated over the past 50 years (de Oliveira Roque et al., 2018; Johnson et al., 2017). There is no doubt that this global ongoing crisis of biodiversity loss is due to anthropogenic factors that have transformed species' habitats, and caused a decrease in population sizes, with potential long‐term consequences for local or global extinction. Many studies have highlighted the consequences of this extinction in terms of erosion of species diversity for decades (Ceballos, Ehrlich, & Dirzo, 2017; Ehrlich, 1995), providing the basis for international reports assessing the state of biodiversity and the ecosystem services it provides to society (e.g., the Intergovernmental Platform on Biodiversity and Ecosystem Services, IPBES; https://bit.ly/IPBESReport). However, it is only recently that observations and empirical approaches have shown that biomass and, by extension, population sizes have also declined significantly over the last 50 years, especially for insects, resulting in multitrophic cascades affecting the biomass of many insectivorous species (Karp et al., 2013). In particular, Hallmann et al. (2017) have measured insect biomass during 27 years at 63 German localities, and found a greater than three‐quarters decline in this short time span, which represents an average 3% drop per year. Similarly, Lister and Garcia (2018) reported a decrease of more than 90% of the terrestrial arthropod biomass and 80% of that of the canopy over the last 36 years in Puerto Rico, which has in turn strongly reduced the abundance of predator populations of lizards, frogs and birds. Anthropogenic factors are strong candidates for this recognized “insectageddon” including: pesticide and fertilizer use in agriculture, land‐use change (habitat destruction), climate change, chemical contamination (Whiteside & Marvin Herndon, 2018), light pollution, invasion by exogenous species (Sánchez‐Bayo & Wyckhuys, 2019; van Strien, van Swaay, van Strien‐van Liempt, Poot, & WallisDeVries, 2019) and wireless communication systems (Thielens et al., 2018). These anthropic factors will impact the populations at different intensities, ranging from reduction of their size to local extinction. The reduction in population sizes will have several consequences. A reduction in effective population size (N e) will cause a loss of genetic diversity (i.e., the number of alleles in the populations), thus decreasing both neutral variation and adaptive potential. In addition, shrinking populations will experience increased allele extinction due to inbreeding that causes allele fixation by genetic drift. Also, inbred populations will tend to accumulate deleterious alleles and such a mechanism, referred to as inbreeding depression, will further decrease the average fitness of a population (Keller & Waller, 2002; Kristensen, Pedersen, Vermeulen, & Loeschcke, 2010). Furthermore, the extinction dynamics will weaken the connectivity network between remaining populations and thus reduce overall gene flow. This will increase their divergence and prevent their capacity to exchange potential adaptive alleles. The combination of these mechanisms will further decrease the population persistence likelihood over time (Bouzat, 2010). However, empirical data providing time series of abundance data and genetic diversity remain scarce, which limits our ability to precisely infer the recent demographic trajectory of most populations. A variety of indirect proxies can help to overcome the general lack of long‐term abundance data across taxa to estimate demographics. For example, the genetic information present in museum specimens collected across the last few decades or even centuries provides a unique opportunity to obtain temporal snapshots of past population genetic diversity and quantify the extent and dynamics of the current biodiversity crisis (Jensen et al., 2018; Meineke, Davies, Daru, & Davis, 2018; Ryan et al., 2018). It is estimated that the number of museum specimens collected around the world exceeds 1 billion individuals, and covers ~1.2 million species (Pyke & Ehrlich, 2010). The molecular diversity present in these specimens can help assess population sizes through time but signatures can also help track adaptation to changing environments (Hoffmann, Sgrò, & Kristensen, 2017). These approaches can thus both help define conservation priorities and estimate the future resilience to ongoing environmental change. However, analysing DNA contained in museum collections (hereafter referred to as historical DNA) is not devoid of difficulties owing to post‐mortem decay reactions fragmenting the DNA backbone and modifying the chemical nature of individual nucleotidic bases (Dabney, Meyer, & Pääbo, 2013). The extensive fragmentation of historical DNA molecules precludes widely used conventional genetic analyses, such as high‐density array genotyping (Decker et al., 2009) and Restriction Site‐Associated DNA Sequencing (RADseq) (Linck, Hanna, Sellas, & Dumbacher, 2017). Over the past decade, a number of alternative molecular methods have been developed to gather historical DNA data at population and genome‐wide scales (reviewed in Burrell, Disotell, & Bergey, 2015; Horn, 2012; Orlando, Gilbert, & Willerslev, 2015). These most often rely on the construction of next‐generation sequencing DNA libraries, and the capture of those DNA library templates annealing to short synthetic nucleic acid baits spread across predefined loci of interest. This rationale was applied in 2016 to target Genotyping‐By‐Sequencing (GBS) or RADseq loci, either through bench‐top synthesized probes (i.e., HyRAD; Suchan et al., 2016), or commercially produced synthetic GBS or RADseq oligonucleotides (Ali et al., 2016; Boucher, Casazza, Szövényi, & Conti, 2016; Hoffberg et al., 2016; Sánchez Barreiro et al., 2017). These methods generally improve the quality of the genotypic information retrieved by increasing the overall average depth‐of‐coverage and reducing the fraction of loci for which no data could be obtained (Suchan et al., 2016). The HyRAD method, which utilizes the standard double digest (dd) RAD protocol (Mastretta‐Yanes et al., 2015) to prepare a set of DNA probes from fresh samples, has proven especially versatile in its applications to ancient (Schmid et al., 2017) and historical DNA (Schmid et al., 2018; Suchan et al., 2016), and for obtaining genome‐wide data at the population scale in a cost‐effective manner. The museum collections of Finland, in particular the Finnish Museum of Natural History in Helsinki (Luonnontieteellinen keskusmuseo, Luomus), contains ~22 million specimens collected over recent centuries (Tegelberg, Haapala, Mononen, Pajari, & Saarenmaa, 2012). This provides a fantastic opportunity to obtain molecular data across wide time series, in a range of species, including butterfly taxa that have declined over the past century. This is notably the case of Erebia embla—a northern European and eastern Palearctic butterfly species inhabiting bogs—which has experienced a strong decrease in southern Finland, following the extensive drainage of its habitat (Rassi, Hyvärinen, Juslén, & Mannerkoski, 2010). The same holds true for Lycaena helle, an arctic‐alpine Palearctic butterfly species inhabiting fresh, damp meadows, which shows a patchy distribution throughout Europe and has declined throughout most of its range, especially in Finland, where it remains present at only two sites, Kuivaniemi Simo and Kuusamo (Heino, Poykko, & Itames, 1998). In this study, we aim to examine how the genetic structures and diversities of E. embla and L. helle have been impacted by recent environmental changes and human activities, relying on HyRAD data generated from 118 and 165 museum specimens from Finland, respectively, collected across the last century as well as from extant populations. To achieve this, we developed a pipeline, namely PopHyRAD, that (a) aligns each sequence read against the probe catalogue (including reference genomes when available, or probe sequences provided by users), (b) identifies and controls for deamination patterns (typical of historical DNA), (c) eliminates putative paralogues, PCR duplicates, low‐quality genotypes and indels, and (d) keeps only bi‐allelic loci for downstream analyses. We then investigated the spatial and temporal dynamics of genetic diversity indices as well as isolation by distance in the two above‐mentioned butterflies, both locally and regionally.

MATERIAL AND METHODS

Sampling

Historical samples of L. helle and E. embla were sampled among the Lepidoptera collection of the Finnish Museum of Natural History in 2014 and 2015. Fresh samples were collected following fieldwork in summer 2015, by capturing flying adults with a net, sampling and storing a single leg in 95% EtOH, before releasing individuals alive. The sampling details are given in Table 1, and illustrated in Figure 1.
TABLE 1

Sample information comprising locality name, approximate GPS positions, year of collection and number of samples. The three time slices used for the analyses have been indicated

  Erebia embla Lycaena helle
LocalityLatitudeLongitudeYear N LocalityLatitudeLongitudeYear N
Before 1950Haminalahti62.853464127.5326783190912Kuusamo Liikasenvaara65.964563729.1883283192810
Pirkkala61.465449723.645625219091Paanajarvi66.455500628.979801719341
Pirkkala61.465449723.645625219252Paanajarvi66.455500628.979801719356
Pirkkala61.465449723.645625219305Ivalo68.658818527.5348114193713
Pernio60.205078223.123577119321Mikkeli61.687795627.272656919383
Portom62.710020721.6163442193713Harmoinen61.485247725.140973619408
Muonio67.959339723.6774037193813Kannus63.900777323.917036319405
Nurmes63.542207929.141010019416Nurmes63.542207929.1410100194113
Pernio60.205078223.123577119442Ruovesi61.985630324.0703481194113
Pirkkala61.465449723.645625219452Mikkeli61.687795627.272656919429
Jakobstad Pietarsaari63.666670922.700022919471Haapavesi64.137873725.365817619465
Pelkosenniemi67.109596927.511811619479Pelkosenniemi67.109596927.5118116194712
Jakobstad Pietarsaari63.666670922.700022919494Paltamo64.406866827.8335512194910
Between 1950 and 2000Jakobstad Pietarsaari63.666670922.700022919512Kuusamo Liikasenvaara65.964563729.188328319551
Jakobstad Pietarsaari63.666670922.700022919536Tohmajarvi62.225944830.333551219571
Kuusamo65.964563729.188328319558Tohmajarvi62.225944830.333551219585
Tyrvanto61.154611224.328316819591Kuopio62.824142427.594561519594
Karttula62.895201326.972378419634Kuusamo Liikasenvaara65.964563729.188328319591
Ikaalinen61.770149323.063377719655Kuopio62.824142427.594561519604
Ikaalinen61.770149323.063377719695Tohmajarvi62.225944830.333551219604
Tuulos61.118165624.833706419702Kuusamo Liikasenvaara65.964563729.188328319622
Tuulos61.118165624.833706419732Kuopio62.824142427.594561519641
Tuulos61.118165624.833706419751Kuusamo Liikasenvaara65.964563729.188328319753
Kuusamo65.964563729.188328319774Kuivaniemi Simo65.604021725.203839219806
Mikkeli61.687795627.272656919792Kuusamo Liikasenvaara65.964563729.1883283198512
Kuusamo65.964563729.188328319801Kuusamo Liikasenvaara65.964563729.1883283199113
Kuusamo65.964563729.188328319813     
Mikkeli61.687795627.272656919831     
2015Ivalo68.658818527.534811420155Kuivaniemi Simo65.604021725.203839220159
Kuusamo65.964563729.188328320159Kuusamo Liikasenvaara65.964563729.1883283201523
Muonio67.959339723.677403720159     
Oulu65.011873425.4716809201512     
Pelkosenniemi67.109596927.511811620158     
Rovaniemi66.497621425.7192101201511     
FIGURE 1

(a, c) Pictures of the two focal species, Lycaena helle and Erebia embla. (b, d) Maps of Finland with the sampling localities. The different colours indicate whether the samples considered were historical, with a distinction before and after 1950, or modern (year 2015)

Sample information comprising locality name, approximate GPS positions, year of collection and number of samples. The three time slices used for the analyses have been indicated (a, c) Pictures of the two focal species, Lycaena helle and Erebia embla. (b, d) Maps of Finland with the sampling localities. The different colours indicate whether the samples considered were historical, with a distinction before and after 1950, or modern (year 2015)

DNA extraction and HyRAD protocol

The HyRAD protocol was carried out according to Suchan et al. (2016). Briefly, for historical and fresh samples, DNA was extracted from a leg using the QIAamp DNA Micro kit (Qiagen). The probe precursors were prepared using a ddRAD protocol applied to six fresh samples from each focal species (Mastretta‐Yanes et al., 2015; Peterson, Weber, Kay, Fisher, & Hoekstra, 2012). Total genomic DNA was digested with the restriction enzymes SbfI and MseI, DNA adapters were ligated and the resulting library was purified, size‐selected with a range of 200–250 bp and amplified by PCR for 30 cycles. An aliquot of the final library was sequenced on one lane of an Illumina MiSeq 150‐bp paired‐end at the Lausanne Genomic Technology Facility (LGTF) in order to obtain a sequence catalogue of the loci represented in the ddRAD probes, and the rest of the library was converted into probes by removing adapter sequences. Individual Illumina DNA libraries were prepared from the fresh and museum specimens based on a published protocol for degraded DNA samples (Tin, Economo, & Mikheyev, 2014). Hybridization‐capture and enrichment was performed as described in Schmid et al. (2017), using a dual‐indexing tagging (i.e., with different combinations of barcodes and indexes), allowing extensive multiplexing of samples on a single sequencing lane: in L. helle, nine and 25 indexes and barcodes were used, respectively; while 10 and 25 were used in E. embla, respectively. For each butterfly species, the final (capture‐hybridized) enriched libraries were sequenced on two lanes of a HiSeq 2500 instrument using a 100‐bp paired‐end protocol  at the LGTF.

PopHyRAD bioinformatic pipeline

The sequence pairs generated from the ddRAD probe libraries were first cleaned and overlapping reads were collapsed using adapterremoval version 2 (Schubert, Lindgreen, & Orlando, 2016). Then a first data set reduction was performed following the first step of the ddocent pipeline (Puritz, Hollenbeck, & Gold, 2014), keeping only alleles covered by at least four reads and, thus, removing the vast majority of sequencing errors. We merged intra‐individual loci using cd‐hit‐est from the cd‐hit tool (Fu, Niu, Zhu, Wu, & Li, 2012) using a minimum identity threshold of 90%. This step was repeated across samples to produce a combined catalogue of all loci identified for a given species. Loci shared by at least half of the probe samples (i.e., three out of six) were kept to remove uninformative specific loci. The absence of contamination in the catalogue was evaluated using centrifuge (Kim, Song, Breitwieser, & Salzberg, 2016). Reads from each historical sample were cleaned using trimmomatic (Bolger, Lohse, & Usadel, 2014) and individually mapped on the loci catalogue generated above using bwa aln (Li & Durbin, 2009). PCR duplicates were removed using markduplicates from the picard toolkit (http://broadinstitute.github.io/picard). Nucleotide misincorporation patterns were investigated using mapdamage 2.0 (Jónsson, Ginolhac, Schubert, Johnson, & Orlando, 2013), and base quality scores were rescaled according to their probability to represent a post‐mortem DNA deamination event in order to reduce the impact of DNA decay on downstream analyses. Finally, individual genotypes were called using freebayes (Garrison & Marth, 2012), and considered for further analyses when (a) showing qualities higher than 100, (b) shared by at least 80% of the samples and (c) biallelic (Figure 2). This complete pipeline has been implemented under the name PopHyRAD, with “Pop” standing for population genetics.
FIGURE 2

Schematic PopHyRAD pipeline including the creation of the catalogue from probes and the treatment of samples

Schematic PopHyRAD pipeline including the creation of the catalogue from probes and the treatment of samples Although bwa aln is the recommended tool for ancient DNA analysis (Schubert et al., 2012), we evaluated the performance of two other mapping tools, namely bwa mem (Li, 2013) and bowtie2 (Langmead & Salzberg, 2012). Similarly, we evaluated the possible impact of the genotyping method on downstream analyses using varscan (Koboldt et al., 2009).

Population genetics analyses

Allelic frequencies, observed gene diversity (H s) and inbreeding level (F IS) were estimated using the r package hierfstat (Goudet, 2005). The frequency of fixed alleles, which is a proxy for the frequency of allele extinction, was estimated using a custom script. For these estimations only localities with more than eight samples were kept. These estimations were performed after the samples were binned within temporal groups (or time slices). More precisely, we merged samples from each geographical location within three time slices: before 1950, between 1950 and 2000, and present samples (i.e., 2015; see details in Table 1). Additionally, we merged all samples from all locations within the same temporal bin to identify patterns at the scale of Finland. Population genetic structure was investigated using a subset of SNPs, where only one SNP per locus was considered so as to minimize linkage disequilibrium (i.e., to minimize redundant information), as generally recommended (Falush, Stephens, & Pritchard, 2003). We used Bayesian admixture analysis implemented in structure (Pritchard, Stephens, & Donnelly, 2000) to estimate admixture proportions, that is, the proportion of each individual's genome inherited from each of K hypothetical source populations. We ran analyses with K from 1 to 5 with 10 independent Markov chains each, using 1,000,000 steps and including 50,000 burn‐in steps. We visually checked the results obtained in each run to assess whether the Markov chains were convergent. The most likely number of clusters was identified using Evanno's method (Evanno, Regnaut, & Goudet, 2005) implemented in structure harvester (Earl & vonHoldt, 2012). Then each sample was associated to the corresponding cluster and the proportion of the samples present in each cluster was drawn on the Finland map using pie charts. Isolation‐by‐distance (IBD) was tested by examining the correlation between genetic differentiation, F ST/1 − F ST, using F ST estimations between population pairs performed with hierfstat (Goudet, 2005), and the log‐transformed distance, as suggested by Rousset (1997). Due to the ongoing debate in using Mantel tests for IBD examination (Diniz‐Filho et al., 2013), we used a simple Spearman's correlation. To corroborate the role of geographical distance as well as time in the differentiation between localities with another approach than a simple Spearman's correlation, we performed a distance‐based redundancy analysis (dbRDA) integrating geographical variables and year of collection, in an individual‐based approach, following recommendations from Laura Benestan (unpublished data; https://github.com/laurabenestan/db‐RDA‐and‐db‐MEM). In breif, we first created spatial variables using Moran Eigenvector's Maps (MEMs) implemented in the adespatial r package (Dray, Blanchet, Borcard, Guenard, & Jombart, 2016). Second, a principal coordinates analysis (PCoA) was performed on the Euclidean genetic distances based on genotypes of each sample. Finally a global dbRDA was applied by integrating the year of collection as an additional variable to spatial components, and an ANOVA with 1,000 permutations was performed to assess the significance of each variable within the model using the vegan r package (Oksanen, Kindt, Legendre, & O'Hara, 2016).

RESULTS

HyRAD efficiency

The HyRAD wetlab and PopHyRAD bioinformatic pipeline developed for this study were particularly efficient in retrieving DNA sequences of historical samples collected up to more than a century ago and stored in museum conditions. The analysis of ddRAD data used for designing the probes allowed us to obtain sufficient data at 3,826 loci for E. embla and 3,443 for L. helle, which allowed us to undertake population genomic analyses. In addition, the identification of deamination patterns at the extremities of the reads from the collection samples and not in the fresh samples confirmed the endogenous nature of DNA present in these historical samples (Figure S1).

Mapping and SNP calling comparison

We first aimed to test whether different read alignment procedures could impact on the amount of HyRAD data retrieved. bwa aln was able to align only a limited number of reads against the catalogue of HyRAD probes, representing a mean of 11.2% (SD 6.7) across all samples. This proportion was improved to 37.1% (SD 8.1) when using bwa mem (Table S2). bowtie2 showed intermediate efficiency with a mean value of 18.3% (SD 7.9) across all samples. These differences, however, decreased after quality filtering, with bwa aln showing 6.9% (SD 2.0) of mapped reads, bwa mem 12.1% (SD 4.8) and bowtie2 11.0% (SD 2.8). We also observed broad differences between methodologies aimed at calling genotypes. We counted the total number of positions covered by at least 80% of the samples and showing biallelic SNP polymorphisms. This number was maximal when combining bwa mem alignments and varscan genotype calling, representing a total of 10,116 SNPs from 1,289 loci in E. embla and 11,534 SNPs from 1,241 loci for L. helle. The number was minimal when bwa aln alignments and freebayes genotype calls were combined, which led to the identification of 2,742 SNPs from 869 loci for E. embla and 2,549 SNPs from 1,015 loci for L. helle. For each of these six combinations (three mappers and two SNP callers), we applied the complete pipeline and subsequently estimated genetic diversity statistics. We observed that these statistics were extremely variable according to the SNP calling tool used, especially the inbreeding coefficient (F IS) (Figure S2). With varscan, a large proportion of the SNPs demonstrate an inbreeding coefficient below zero with a non‐normal distribution, while the normal distribution obtained from freebayes is more consistent with biological expectations. Therefore, we decided to utilize the conservative data set obtained using freebayes on the mapping generated by bwa aln, despite being associated with an overall smaller number of SNPs; and we implemented freebayes and bwa aln as the standard caller*mapper combination in the PopHyRAD pipeline.

Patterns of population declines

To investigate genetic patterns associated with population size reduction, we focused on three descriptive statistics: (a) the observed gene diversity (H s), (b) the frequency of fixed alleles and (c) the inbreeding coefficient (F IS). Results are depicted in Figure 3.
FIGURE 3

Observed gene diversity (H s), frequency of fixed alleles and inbreeding coefficient (F IS) estimated for each locality and years. For each point a number indicates the number of samples. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time points are linked by a line of the corresponding colour

Observed gene diversity (H s), frequency of fixed alleles and inbreeding coefficient (F IS) estimated for each locality and years. For each point a number indicates the number of samples. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time points are linked by a line of the corresponding colour For E. embla, sufficient temporal data could only be retrieved for three localities, namely Kuusamo, Muonio and Pelkosenniemi. These indicated reduction of the observed gene diversity (H s) and the frequency of fixed alleles through time. This overall reduction is, paradoxically, associated with a sharp increase of the inbreeding coefficient at only one locality, Kuusamo. For L. helle, only one locality, Kuusamo Liikasenvaara, was sampled at different time points. It showed a dynamic diversity trend in which the observed gene diversity (H s) and the frequency of fixed alleles were relatively stable between the 1920s and the 1990s, but declined after the 1990s. In contrast, the inbreeding coefficient (F IS) was found to decrease between the 1920s and the 1990s and to increase sharply since. Grouping all samples within three main time slices (i.e., before 1950, between 1950 and 2000, and in 2015) increased the sample size of each locality and reinforced the observations made above (Figure 4). Indeed, we observed a decrease of the observed allelic richness (H s) over time in both species, concomitant with an increase in the frequency of fixed alleles.
FIGURE 4

Observed gene diversity (H s) and frequency of fixed alleles over time (i.e., before 1950, between 1950 and 2000, and 2015), and for each locality. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time slices are linked by a line of the same colour

Observed gene diversity (H s) and frequency of fixed alleles over time (i.e., before 1950, between 1950 and 2000, and 2015), and for each locality. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle). Localities sampled at different time slices are linked by a line of the same colour To investigate whether the patterns observed at the local level were retrieved at the global level (i.e., throughout Finland), we merged all the samples within the three time slices considered above, regardless of their geographical origin. This provided an opportunity to retrace the temporal trajectory of the Finland‐wide genetic diversity present in both focal species. We found that the observed gene diversity decreased through time and that the frequency of fixed alleles increased, especially in the most recent time period (Figure 5).
FIGURE 5

Observed gene diversity (H s) and frequency of fixed alleles estimated by time slices, before 1950, between 1950 and 2000, and 2015, over Finland as a whole after samples merging. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle)

Observed gene diversity (H s) and frequency of fixed alleles estimated by time slices, before 1950, between 1950 and 2000, and 2015, over Finland as a whole after samples merging. Vertical lines represent the standard error across all SNPs (2,742 for Erebia embla and 2,549 for Lycaena helle)

Genetic structure and isolation by distance

To investigate the mechanisms that may explain the observed decrease in genetic diversity and the increase in the inbreeding coefficient (F IS), we studied the spatial genetic structure of populations, which only revealed a faint structuring (Figure 6). In contrast, a more marked pattern of spatial structuring was retrieved in the IBD analysis, revealing a varying correlation between genetic and geographical distances over time (Figure 7). Before 1950, no correlation between genetic and geographical distances was found in any of the two species investigated. Between the 1950s and the 2000s, a significant correlation between genetic and geographical distances was found for E. embla. In this species the level of correlation and the associated slope increased further when considering the modern time period (year 2015), suggesting an increasing spatial structuring from 1950 onwards, probably in relation to increased habitat fragmentation. No significant correlation could be retrieved in L. helle in any of the two historical time periods considered (this correlative analysis could not be carried out for modern times, as only two extant populations are known) (Figure 7). Despite an overall low level of genetic structuring through space and time (overall R 2 of 0.95% and 1.52%, based on one temporal and eight spatial axes, for E. embla and L. helle, respectively), the dbRDA approach indicated a significant impact of time (p = .002) to explain population differentiation, as well as a fainter effect of 3/8 spatial variables for E. embla and 2/8 spatial variables for L. helle, although with respective contributions of spatial variables to the overall R 2 remaining <0.5% (see Table S3).
FIGURE 6

Population structure of all sampled localities according to the three studied periods. The pie charts indicate the proportion of samples associated with each cluster as identified by structure with K = 2

FIGURE 7

Isolation by distance was investigated for each time slice by computing the correlation between the genetic differentiation between populations (F ST/1 − F ST) and the log‐transformed geographical distance (log(distance)). Correlations were tested using Spearman's rank correlation, and coefficient of determination, intercept and slopes using a linear model are shown

Population structure of all sampled localities according to the three studied periods. The pie charts indicate the proportion of samples associated with each cluster as identified by structure with K = 2 Isolation by distance was investigated for each time slice by computing the correlation between the genetic differentiation between populations (F ST/1 − F ST) and the log‐transformed geographical distance (log(distance)). Correlations were tested using Spearman's rank correlation, and coefficient of determination, intercept and slopes using a linear model are shown

DISCUSSION

A direct estimation of genetic variation across the past

The study of past population dynamics has received considerable attention in recent decades (Bi et al., 2019; Nadachowska‐Brzyska, Li, Smeds, Zhang, & Ellegren, 2015; Tallavaara, Luoto, Korhonen, Järvinen, & Seppä, 2015). Those studies classically identify the most likely demographic model underlying the allelic frequency spectrum measured in modern specimens (Csilléry, Blum, Gaggiotti, & François, 2010; Espíndola et al., 2012; François & Durand, 2010). These demographic inference approaches are, however, often limited as different models can produce similar allelic frequency spectrum and summary statistics, and cannot necessarily be discriminated (Lapierre, Lambert, & Achaz, 2017). In contrast, genomic data from historical specimens catch evolution red‐handed, and can help overcome such limitations by providing direct snapshots of the past genetic diversity present in a population. In this study, we collected a large sample set of two butterfly species spread across Finland, and spanning the last 110 years. This sampling provided us with a unique opportunity to quantify the variation of the genetic diversity in both species at a time when their distribution drastically declined (Rassi et al., 2010). We have benefited from the HyRAD genome–complexity–reduction method to obtain genetic data from these valuable samples. HyRAD has been increasingly used in different laboratories, not only to identify genetic variation in historical material (Crates et al., 2019; Keighley, Heinsohn, Langmore, Murphy, & Peñalba, 2019; Linck et al., 2017; Linck, Freeman, & Dumbacher, 2019; Schmid et al., 2017), but also in ancient DNA (Schmid et al., 2017). Indeed, these methods based on hybridization capture allow us to retrieve even very small quantities of degraded DNA, which often remain unquantifiable before capture (Table S1). The amount of DNA in historical samples and the ability to extract, capture and sequence it depends on the history of the sample, the conditions of collection, sample preparation (drying, pinning etc.) and storage. Unfortunately, for most of our historical samples we do not have access to such information. However, in this study we were able to perform the entire process from historical specimen subsampling to SNP calling for ~75% of the samples analysed from both species, thus suggesting that it is compliant with most preparation histories. For this study we developed a specific pipeline, PopHyRAD, to optimally exploit the genetic information contained in samples. For now, the PopHyRAD computational pipeline released here facilitates HyRAD sequence analysis at the within‐species level by automating the steps underlying read cleaning, trimming and merging, as well as read mapping, and probe clustering. This pipeline is versatile, and can be used to analyse any type of hybridization‐capture data, using either probes from ddRAD or another RADseq protocol, or any tool able to reduce genomic complexity such as selective extraction of organellar genomes or amplification of specific genes. Before this study, the catalogue definition and remaining analytical workflow using HyRAD data have essentially been empirically explored (or a posteriori chosen), considering the outputs, and thus the tools that provided the best geographical or phylogenetic structure (Schmid et al., 2018). Here, we take the opportunity to test more accurately the performance of aligners and SNP‐callers on HyRAD data, using different tool combinations, and using a realistic criterion from the point of view of population genetics, namely F IS. The results revealed large differences on the SNPs identified and on the estimation of genetic diversity and inbreeding coefficient and suggested the bwa aln read aligner and the freebayes SNP‐caller as the most conservative combination. The nonbiologically relevant F IS values obtained with other combinations are likely to be due, at least in part, to increased false positive alignment rates (e.g., misidentified paralogues) as well as to the oversplitting of loci (i.e., a locus separated in two loci in the catalogue). This type of difference has already been highlighted in analyses of standard RADseq data (Shafer et al., 2017) and calls for caution in downstream analyses. An analysis based on data simulation is outside the scope of this study but would probably clarify the specificity and sensitivity of the different aligners and SNP‐callers, and help each user to refine the most appropriate parameters for their analyses and model species.

Genetic diversity decline in butterfly populations across Finland

The HyRAD data gathered in this study supported an overall erosion of genetic diversity at the country‐wide level of Finland in both species (Figure 5). Although interpretation of variation in genetic diversity should be tempered due to our relatively reduced sampling size per locality and temporal binning, one should keep in mind that in the context of museomics, our sampling remains substantial. This pattern of genetic diversity reduction parallels those found by similar studies on butterflies in Northern Europe (Fountain et al., 2016; Ugelvig, Nielsen, Boomsma, & Nash, 2011) but also more broadly in other taxa (Dufresnes et al., 2018; Schmid et al., 2018). Our data also uncovered strong regional differences, with at least one locality (i.e., Kuusamo) showing a local increase in diversity at a given time point, potentially following migration linked to the persistence of their habitat in these specific localities (Habel, Meyer, & Schmitt, 2014) playing a role as refuges for individuals from other populations carrying genetic diversity (Craioveanu, Sitar, & Rákosy, 2014). However, estimations based on recent samples (i.e., collected in 2015) still show a decline in genetic diversity in this particular locality. The overall erosion of genetic diversity, both locally and country‐wide, is expected given that most Finnish populations of these two butterflies have gone extinct through the 20th century, as a result of a drastic reduction in habitat availability, with the remaining populations not being able to maintain genetic diversity to levels that once existed in an area of ~340,000 km2 a century or even a few decades ago. The second striking result of this study is the increase in IBD over time, at least in one of the two species. Indeed, in E. embla, when considering time slices that divide the time frame of collected specimens into three periods, only the two last (i.e., 1950–2000, and 2015) are associated with significant IBD, with an increasing slope as we reach contemporaneous times. The effect of time was also retrieved in the dbRDA approach, although because this analysis is individual‐based, it was less representative of genetic variation per deme through space and time, thus explaining the low R 2 retrieved in the overall model. Our main result of an overall increase in the slope of the IBD pattern is probably the consequence of an increase in habitat fragmentation, revealing a reduced number of migrants among demes, and thus an increase in the differentiation of populations, essentially due—given the short time span involved—to drift. This signature might be also found in L. helle, even if our sampling does not allow the estimation of IBD for the most recent period (i.e., only two populations are still extant today). This transition from a virtually countrywide panmictic system to a more marked structuring in space is indicative of the fact that despite acknowledged dispersal capabilities of these butterflies in Finland (Habel, Rödder, Schmitt, & Nève, 2011; Habel, Finger, Schmitt, & Nève, 2011), generally related to a colonization‐edge syndrome characteristic of populations found at the northern edge of a species' distribution (Duplouy, Wong, Corander, Lehtonen, & Hanski, 2017), the fragmentation of habitats has led to a decrease in these exchanges, and thus to local differentiation. Through their impact on biodiversity, human activities are accelerating the extinction of populations and the differentiation of those that persist. This could be catalyzing lineage divergence, except that habitat destruction is an ongoing process, potentially hampered by geopolitical, but potentially ubiquitous, decisions. Our study of two species of butterflies in Finland indicates that not all species might respond identically to this fragmentation, and that comparative studies, involving a larger number of species represented by fresh but also historical specimens, are needed to understand how life history traits influence the species' population response to anthropogenic habitat disturbance and destruction. With the application of both the wetlab HyRAD protocol to historical and fresh specimens, and the PopHyRAD bioinformatic pipeline as described in this study, access to both past and extant genetic diversity should allow a better understanding and anticipation of the neutral response of populations to drastic habitat loss.

AUTHOR CONTRIBUTIONS

N.A. and M.P. designed the study. M.P. and L.K. performed sampling. M.P. performed laboratory work. J.G. analysed the molecular data, with contributions from S.N., M.P., N.A., L.O. and S.S. All authors took part in discussions concerning the analyses and interpretations. J.G. and N.A. wrote the paper, with contributions from all authors.

DATA ACCESSIBILITY

Sequence reads are archived at Zenodo: http://doi.org/10.5281/zenodo.3668644 for E. embla and http://doi.org/10.5281/zenodo.3668660 for L. helle. Scripts for the whole analytical process have been uploaded to Github (https://github.com/JeremyLGauthier/Scripts_Gauthier_et.al_2019_MER). The PopHyRAD pipeline is constantly under development and improvement. The current version can be found at https://github.com/JeremyLGauthier/PHyRAD). Supplementary Material Click here for additional data file.
  57 in total

1.  Target enrichment via DNA hybridization capture.

Authors:  Susanne Horn
Journal:  Methods Mol Biol       Date:  2012

2.  Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.

Authors:  G Evanno; S Regnaut; J Goudet
Journal:  Mol Ecol       Date:  2005-07       Impact factor: 6.185

Review 3.  Research on inbreeding in the 'omic' era.

Authors:  Torsten N Kristensen; Kamilla S Pedersen; Cornelis J Vermeulen; Volker Loeschcke
Journal:  Trends Ecol Evol       Date:  2009-09-04       Impact factor: 17.712

4.  Biological annihilation via the ongoing sixth mass extinction signaled by vertebrate population losses and declines.

Authors:  Gerardo Ceballos; Paul R Ehrlich; Rodolfo Dirzo
Journal:  Proc Natl Acad Sci U S A       Date:  2017-07-10       Impact factor: 11.205

Review 5.  Revisiting Adaptive Potential, Population Size, and Conservation.

Authors:  Ary A Hoffmann; Carla M Sgrò; Torsten N Kristensen
Journal:  Trends Ecol Evol       Date:  2017-05-02       Impact factor: 17.712

6.  Genomic impact of severe population decline in a nomadic songbird.

Authors:  Ross Crates; George Olah; Marcin Adamski; Nicola Aitken; Sam Banks; Dean Ingwersen; Louis Ranjard; Laura Rayner; Dejan Stojanovic; Tomasz Suchan; Brenton von Takach Dukai; Robert Heinsohn
Journal:  PLoS One       Date:  2019-10-24       Impact factor: 3.240

7.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

8.  The development of a digitising service centre for natural history collections.

Authors:  Riitta Tegelberg; Jaana Haapala; Tero Mononen; Mika Pajari; Hannu Saarenmaa
Journal:  Zookeys       Date:  2012-07-20       Impact factor: 1.546

9.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

10.  Temporal genomic contrasts reveal rapid evolutionary responses in an alpine mammal during recent climate change.

Authors:  Ke Bi; Tyler Linderoth; Sonal Singhal; Dan Vanderpool; James L Patton; Rasmus Nielsen; Craig Moritz; Jeffrey M Good
Journal:  PLoS Genet       Date:  2019-05-03       Impact factor: 5.917

View more
  10 in total

Review 1.  Opportunities and challenges of macrogenetic studies.

Authors:  Deborah M Leigh; Charles B van Rees; Katie L Millette; Martin F Breed; Chloé Schmidt; Laura D Bertola; Brian K Hand; Margaret E Hunter; Evelyn L Jensen; Francine Kershaw; Libby Liggins; Gordon Luikart; Stéphanie Manel; Joachim Mergeay; Joshua M Miller; Gernot Segelbacher; Sean Hoban; Ivan Paz-Vinas
Journal:  Nat Rev Genet       Date:  2021-08-18       Impact factor: 53.242

2.  Performance and automation of ancient DNA capture with RNA hyRAD probes.

Authors:  Tomasz Suchan; Mariya A Kusliy; Naveed Khan; Loreleï Chauvey; Laure Tonasso-Calvière; Stéphanie Schiavinato; John Southon; Marcel Keller; Keiko Kitagawa; Johannes Krause; Alexander N Bessudnov; Alexander A Bessudnov; Alexander S Graphodatsky; Silvia Valenzuela-Lamas; Jarosław Wilczyński; Sylwia Pospuła; Krzysztof Tunia; Marek Nowak; Magdalena Moskal-delHoyo; Alexey A Tishkin; Alexander J E Pryor; Alan K Outram; Ludovic Orlando
Journal:  Mol Ecol Resour       Date:  2021-10-15       Impact factor: 8.678

3.  Museomics identifies genetic erosion in two butterfly species across the 20th century in Finland.

Authors:  Jérémy Gauthier; Mila Pajkovic; Samuel Neuenschwander; Lauri Kaila; Sarah Schmid; Ludovic Orlando; Nadir Alvarez
Journal:  Mol Ecol Resour       Date:  2020-05-14       Impact factor: 7.090

4.  A new lineage of Galapagos giant tortoises identified from museum samples.

Authors:  Evelyn L Jensen; Maud C Quinzin; Joshua M Miller; Michael A Russello; Ryan C Garrick; Danielle L Edwards; Scott Glaberman; Ylenia Chiari; Nikos Poulakakis; Washington Tapia; James P Gibbs; Adalgisa Caccone
Journal:  Heredity (Edinb)       Date:  2022-02-25       Impact factor: 3.832

5.  Phylogenomic and mitogenomic data can accelerate inventorying of tropical beetles during the current biodiversity crisis.

Authors:  Michal Motyka; Dominik Kusy; Matej Bocek; Renata Bilkova; Ladislav Bocak
Journal:  Elife       Date:  2021-12-20       Impact factor: 8.140

6.  Spatiotemporal monitoring of the rare northern dragonhead (Dracocephalum ruyschiana, Lamiaceae) - SNP genotyping and environmental niche modeling herbarium specimens.

Authors:  Malene Nygaard; Alexander Kopatz; James M D Speed; Michael D Martin; Tommy Prestø; Oddmund Kleven; Mika Bendiksby
Journal:  Ecol Evol       Date:  2022-08-12       Impact factor: 3.167

7.  The untapped potential of macrofossils in ancient plant DNA research.

Authors:  Christoph Schwörer; Maria Leunda; Nadir Alvarez; Felix Gugerli; Christoph Sperisen
Journal:  New Phytol       Date:  2022-04-05       Impact factor: 10.323

Review 8.  Challenges in quantifying genome erosion for conservation.

Authors:  Mirte Bosse; Sam van Loon
Journal:  Front Genet       Date:  2022-09-26       Impact factor: 4.772

9.  HyRAD-X Exome Capture Museomics Unravels Giant Ground Beetle Evolution.

Authors:  Emmanuel F A Toussaint; Jérémy Gauthier; Julia Bilat; Conrad P D T Gillett; Harlan M Gough; Håkan Lundkvist; Mickael Blanc; Carlos P Muñoz-Ramírez; Nadir Alvarez
Journal:  Genome Biol Evol       Date:  2021-07-06       Impact factor: 3.416

10.  The Genome Assembly and Annotation of the Apollo Butterfly Parnassius apollo, a Flagship Species for Conservation Biology.

Authors:  Lars Podsiadlowski; Kalle Tunström; Marianne Espeland; Christopher W Wheat
Journal:  Genome Biol Evol       Date:  2021-08-03       Impact factor: 3.416

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.