Literature DB >> 26797914

PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers.

Krystyna Nadachowska-Brzyska1, Reto Burri1, Linnéa Smeds1, Hans Ellegren1.   

Abstract

Climatic fluctuations during the Quaternary period governed the demography of species and contributed to population differentiation and ultimately speciation. Studies of these past processes have previously been hindered by a lack of means and genetic data to model changes in effective population size (Ne ) through time. However, based on diploid genome sequences of high quality, the recently developed pairwise sequentially Markovian coalescent (PSMC) can estimate trajectories of changes in Ne over considerable time periods. We applied this approach to resequencing data from nearly 200 genomes of four species and several populations of the Ficedula species complex of black-and-white flycatchers. Ne curves of Atlas, collared, pied and semicollared flycatcher converged 1-2 million years ago (Ma) at an Ne of ≈ 200 000, likely reflecting the time when all four species last shared a common ancestor. Subsequent separate Ne trajectories are consistent with lineage splitting and speciation. All species showed evidence of population growth up until 100-200 thousand years ago (kya), followed by decline and then start of a new phase of population expansion. However, timing and amplitude of changes in Ne differed among species, and for pied flycatcher, the temporal dynamics of Ne differed between Spanish birds and central/northern European populations. This cautions against extrapolation of demographic inference between lineages and calls for adequate sampling to provide representative pictures of the coalescence process in different species or populations. We also empirically evaluate criteria for proper inference of demographic histories using PSMC and arrive at recommendations of using sequencing data with a mean genome coverage of ≥18X, a per-site filter of ≥10 reads and no more than 25% of missing data.
© 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

Entities:  

Keywords:  Atlas flycatcher; PSMC; collared flycatcher; coverage; effective population size; pied flycatcher; semicollared flycatcher; whole-genome sequencing

Mesh:

Year:  2016        PMID: 26797914      PMCID: PMC4793928          DOI: 10.1111/mec.13540

Source DB:  PubMed          Journal:  Mol Ecol        ISSN: 0962-1083            Impact factor:   6.185


Introduction

Patterns of genetic diversity within and among species are mainly a consequence of their evolutionary history. Among other factors, Quaternary (2.4 Myr to the present) climatic fluctuations strongly influenced species’ demography and distribution (Hewitt 2000, 2004). During glaciation periods, many suitable habitats disappeared or, in case of the Northern Hemisphere, shifted to the south because of ice and permafrost. As a consequence, species went extinct or shifted to new areas where they survived in glacial refugia and/or adapted to new conditions (Arenas et al. 2012). During interglacial periods, large areas of suitable habitats again became available and were recolonized. These range shifts likely had several genetic consequences (Hewitt 2004; Arenas et al. 2012). For example, previously large populations may have lost significant amounts of genetic diversity due to bottlenecks in glacial refugia and/or founder effects during postglacial expansions. Geographic separation of populations in glacial refugia promoted differentiation between populations and in some cases led to allopatric speciation (Hewitt 2004). Moreover, newly established species came into secondary contact during expansions from different refugia, exchanged migrants and sometimes formed stable hybrid zones (Barton & Gale 1993; Hewitt 2001). In Europe, the main southern refugial areas included Iberia, Italy, the Balkans and the Caucasus. Their contribution to recolonization of northern Europe via relatively well‐defined postglacial migration routes has differed from species to species (Taberlet et al. 1998; Hewitt 1999, 2000). Growing evidence also suggests more complex patterns of spatial separation of species during glacial periods (Stewart & Lister 2001; Gómez & Lunt 2007; Schmitt 2007). For example, some species had their refugia in more northern parts of Europe (Kotlík et al. 2006; Parducci et al. 2012; Ruiz‐González et al. 2013), while southern refugia likely consisted of heterogeneous climatic niches/regions that created complex ‘refugia‐within‐refugium’ patterns (Canestrelli & Nascetti 2008; Dubreuil et al. 2008; Ursenbacher et al. 2008; Ferrero et al. 2011; Miraldo et al. 2011; Velo‐Antón et al. 2012; Pabijan et al. 2015).

Demographic inference based on genetic data and the pairwise sequentially Markovian coalescent

A wide range of approaches based on genetic data have been developed to infer the demographic history of species (Emerson et al. 2001; Hey & Machado 2003). For example, relatively recent examples that attracted a great deal of interest are Skyline plots (Ho & Shapiro 2011) and methods based on approximate Bayesian computation (Beaumont 2010; Csilléry et al. 2010). However, many demographic methods are restricted to particular molecular markers (Pybus et al. 2000; Drummond et al. 2005; Nikolic & Chevalet 2014), can only use a limited number of loci (Hey & Nielsen 2004; Heled & Drummond 2008) and/or are computational intensive when applied to large data sets (Beaumont et al. 2002; Heled & Drummond 2008; Csilléry et al. 2010). Sampling large numbers of unlinked loci spread across the genome may be needed to accurately estimate population genetic parameters and determine the timing of speciation events and rates of gene flow (Edwards & Beerli 2000; Arbogast et al. 2002; Ballard & Whitlock 2004). But until recently, the application of genomewide approaches has been limited by the access to genomewide population genetic data, although this is about to change with the rapid developments in sequencing technology and population genomics (Ellegren 2014). New analytical methods have been developed that provide unique opportunities to model changes in effective population size (N ) through time using information from whole‐genome sequences (Li & Durbin 2011). Specifically, the pairwise sequentially Markovian coalescent (PSMC) model uses a hidden Markov framework and identifies historical recombination events across a single diploid genome. It also infers the time to the most recent common ancestor (TMRCA) for each independent DNA segment and, based on the rate of coalescent events and the TMRCA distribution, infers ancestral N at a given time epoch (Li & Durbin 2011). The approach capitalizes on coalescent theory according to which pairwise sequence divergence is proportional to the time of the coalescent, with long stretches of low heterozygosity corresponding to recent coalescent events and short stretches of high heterozygosity corresponding to more ancient coalescent events. The rate of the inferred coalescent events at a given time is inversely proportional to N . A major strength of the method is that capitalizes on the combined pattern of the distributions of the TMRCA between two alleles in an individual at a very large number of loci spread across the genome. This is in contrast to previous coalescent work analysing nonrecombining loci like mtDNA. The PSMC approach has recently been applied in several genome sequencing projects (e.g. Groenen et al. 2012; Cho et al. 2013; Zhan et al. 2013; Zhao et al. 2013; Carbone et al. 2014; Lamichhaney et al. 2015), including investigation of past population‐size changes of endangered species (Zhao et al. 2013; Li et al. 2014; Nadachowska‐Brzyska et al. 2015), detection of population declines or expansions to assist in the interpretation of population genomic statistics (Deinum et al. 2015), or to complement other demographic history inference (Nadachowska‐Brzyska et al. 2013; Freedman et al. 2014). However, in spite of its recent wide usage, it is still not clear how the quality and quantity of sequence data affect the outcome of PSMC analyses.

Study system

Western Palearctic black‐and‐white flycatchers of the genus Ficedula are represented by four closely related species: Atlas flycatcher (F. speculigera), collared flycatcher (F. albicollis), pied flycatcher (F. hypoleuca) and semicollared flycatcher (F. semitorquata). They are all small‐bodied (15 g), migratory, forest‐dwelling passerine birds. The pied flycatcher (von Haartman 1949, 1956; Campbell 1968; Harvey et al. 1985; Alatalo et al. 1986; Lundberg & Alatalo 1992) and the collared flycatcher (Gustafsson & Sutherland 1988; Gustafsson & Pärt 1990; Gustafsson et al. 1995; Merilä & Sheldon 2000; Qvarnström et al. 2000, 2006; Both & Visser 2001; Merilä et al. 2001) represent two of the most well‐studied bird species in ecological and evolutionary research, including aspects of how the two species diverged and speciated (Alatalo et al. 1982, 1990; Sætre et al. 1997; Veen et al. 2001; Saether et al. 2007; Qvarnström et al. 2010; Sætre & Saether 2010; Ellegren et al. 2012). Whereas collared flycatcher and semicollared flycatcher occur in restricted ranges in eastern and southeastern Europe (with semicollared extending into the Middle East and southwestern Asia), and Atlas flycatchers are endemic to northwestern Africa, pied flycatcher is widely distributed from the Iberian peninsula over central and northern Europe to Western Asia (Fig. 1). Their current range distributions must have been shaped by climate changes during Pleistocene, with species origins likely reflecting survival in separate glacial refugia (Sætre et al. 2001a). In a previous study, we found that the demographic model with the highest support in approximate Bayesian computation modelling was one with significant declines in N in both collared flycatcher and pied flycatcher since their divergence from a common ancestor (Nadachowska‐Brzyska et al. 2013). Secondary contact has been established at least between collared flycatcher and pied flycatcher in central Europe and on the Baltic Sea islands (Tegelstrom & Gelter 1990; Sætre et al. 1999a; Veen et al. 2001). Previous genetic studies of flycatchers have indicated no or very low levels of gene flow between allopatric populations of the species, and moderate gene flow in the area of recent sympatry on the Baltic Sea islands (Hogner et al. 2012; Backström et al. 2013; Nadachowska‐Brzyska et al. 2013).
Figure 1

Range distributions of the four black‐and‐white Ficedula species: pied flycatcher (green), collared flycatcher (violet), semicollared flycatcher (red) and Atlas flycatcher (yellow). Black circles indicate sampling locations.

Range distributions of the four black‐and‐white Ficedula species: pied flycatcher (green), collared flycatcher (violet), semicollared flycatcher (red) and Atlas flycatcher (yellow). Black circles indicate sampling locations. We have recently sequenced and de novo assembled the genome of collared flycatcher (Ellegren et al. 2012; Kawakami et al. 2014), providing an excellent platform for in‐depth population genomic analyses. Here, we use whole‐genome resequencing data from 200 individuals, representing 10 different populations of the four flycatcher species, to infer changes in N over time using PSMC modelling. We first analyse how the quality and quantity of sequence data affect the outcome of PSMC analyses and then go on to test for concordance in the demographic history of populations and species, and examine when they are likely to have had a common ancestry.

Material and methods

Sampling

We analysed 200 individuals of four closely related Ficedula flycatcher species that have been recently subject to whole‐genome resequencing (Burri et al. 2015). In brief, sequencing was performed with Illumina paired‐end sequencing technology on a HiSeq 2000 instrument at the SNP&SEQ Technology Platform of Uppsala University. Individually tagged libraries with an insert size of approximately 450 bp were created and sequenced from both ends using 100 cycles. All raw sequencing reads were mapped to a repeat‐masked version of the collared flycatcher genome assembly version ficalb1.5 (Kawakami et al. 2014) using bwa 0.7.4 (Li & Durbin 2009). Alignment quality was enhanced by local realignment with gatk (McKenna et al. 2010; DePristo et al. 2011), and duplicates were marked with picard (http://picard.sourceforge.net). Base quality score recalibrations (BQSR) were performed for each population separately. Eighty individuals were sequenced of each of collared flycatcher and pied flycatcher, with two samples (one from each taxon set) excluded after identification as F1 hybrids, 20 of Atlas flycatcher and 20 of semicollared flycatcher. Collared flycatchers were sampled at four distinct locations in Italy, Hungary, the Czech Republic and on the Baltic Sea island Öland (Sweden), and pied flycatchers were sampled in Spain, Sweden (mainland), Czech Republic and on Öland. We will for simplicity refer to these birds as representing populations from their geographical origin of sampling and are well aware of that their geographical distribution during glaciation cycles must have been different. Atlas flycatchers were sampled in the Moroccan Atlas Mountains and semicollared flycatchers in Bulgaria.

PSMC analysis

PSMC analysis requires a consensus genome sequence (fastq) that can be filtered to account for coverage and sequencing errors. For each sampled individual, we obtained a fastq sequence for autosomal regions using the ‘mpileup’ command in samtools (Li et al. 2009). The samtools pipeline uses single individuals for SNP calling, that is it is not based on population frequencies for variant calling and does not assume Hardy–Weinberg equilibrium. This approach has been used in the majority of PSMC‐based studies so far. By default, sites were marked as missing data if the root‐mean‐squared mapping quality of reads covering the site was below 25, the site was within 10 bp of a predicted insertion–deletion polymorphism or the inferred consensus quality (i.e. the Phred‐scaled probability that the consensus is wrong) was below 20. To minimize the impact of collapsed regions in the assembly, we also masked all sites at which read depth was more than twice the average read depth across the genome. As our data included individuals for which the mean genomewide coverage varied from 7 to 29, we had the opportunity to investigate the influence of the combined effect of mean coverage and minimum per‐site coverage thresholds on the results of the PSMC analysis. To perform such a test, we applied a set of four additional filters to the sequence data from one collared flycatcher population (Baltic Sea, n = 19). First, we ran PSMC analysis without coverage filtering. Second, we excluded sites for which read depth was less than one‐third of the average read depth (representing default settings in PSMC when analysing highly covered genomes, https://github.com/lh3/psmc). Third, we excluded sites for which read depth was less than six. Finally, we included only sites for which read depth ≥10. The last two filters were based on the results from several simulation studies (Crawford & Lazzaro 2012; Alex Buerkle & Gompert 2013; Korneliussen et al. 2013; Han et al. 2014) and led to the elimination of sites in the focal individual. We also noted how much missing data different coverage filters generated. Based on the results from the test described above (see Results and Discussion), we ran the PSMC analyses for all other populations with a minimum read depth per site of 10. The settings of the PSMC analysis (−p and −t options) were chosen manually according to suggestions given by H. Li and R. Durbin (2011, https://github.com/lh3/psmc) and based on our previous experience (Nadachowska‐Brzyska et al. 2013). The upper limit of the TMRCA was set to 5 (this is set by ‐t option) and the initial θ/ρ value to 1 (this is set by ‐r option). N was inferred across 34 free atomic time intervals (4 + 30*2 + 4+6 + 10), which means that the first population‐size parameter spans the first four atomic time intervals, each of the next 30 parameters spans two intervals, while the last three parameters span four, six and 10 intervals, respectively (this is set by the −p option). To check for variance in N , we performed 100 bootstrap replicates (the number of replicates limited by computational time). Bootstrapping was conducted by randomly sampling with replacement 5‐Mb sequence segments obtained from the consensus genome sequence. For denoting estimates of N , we use . A generation time of 2 years (Brommer et al. 2004) and a mutation rate of 1.4 × 10−9 year/site (Ellegren et al. 2012) were applied.

Results

Influence of per‐site coverage filter

We started by examining the effect of different filters for the per‐site coverage (no filter, ≥1/3 of mean genomewide coverage, ≥6 reads or ≥10 reads) in one collared flycatcher population. Overall, the shape of the N curve was not strongly affected by varying the filtering criteria for individuals with high (≥20X) mean genome coverage. However, N estimates increased with increased filter level, with estimates on average ≈30% higher at a filter of ≥10 than when no filter was used (Fig. 2A, Fig. S1, Supporting information). Moreover, there was slight shift in the N curve towards more recent times when filtering level was increased (Fig. 2A, Fig. S1, Supporting information).
Figure 2

Influence of different per‐site filtering regimes on PSMC plots for example individuals with high (A; coverage 23) and low (B; coverage 8) mean genomewide coverage. Colours of the lines indicate filtering thresholds: no filtering (red), read depth ≥1/3 of the average read depth (green), read depth ≥6 (blue) and read depth ≥10 (violet). Note that each panel shows data from a single individual subject to different filtering thresholds.

Influence of different per‐site filtering regimes on PSMC plots for example individuals with high (A; coverage 23) and low (B; coverage 8) mean genomewide coverage. Colours of the lines indicate filtering thresholds: no filtering (red), read depth ≥1/3 of the average read depth (green), read depth ≥6 (blue) and read depth ≥10 (violet). Note that each panel shows data from a single individual subject to different filtering thresholds. For individuals with lower mean genome coverage (especially ≤10X), the per‐site filtering criteria had a significant effect not only on the magnitude of N estimates but also on the shape of the N curve, such that population expansions and contractions were poorly captured at low filters (Fig. 2B). The lower the mean coverage, the more pronounced were the differences among different filtering regimes. A filter of one‐third of the mean coverage or less in individuals with a mean genome coverage ≤10X produced nearly flat lines without signals of population‐size changes, while expansions and contractions were visible with a stringent filter of 10 (Fig. 2B). In all further analyses, we therefore applied a per‐site filter of 10.

The effect of mean genome coverage and missing data

A similar shape of the N curve was seen among all individuals in the test population although amplitudes were less pronounced at low coverage (Fig. 3), likely reflecting a failure to correctly call heterozygous sites. The tendency of more marked N dynamics at higher mean genome coverage was manifested in a strong positive correlation between coverage and the maximum N estimate along the N curve (r = 0.83, P < 0.00001; Fig. 4A). A similar correlation was found between mean coverage and the timing of the maximum N (r = 0.91, P < 0.00001; Fig. 4B). Individual N curves did not converge until a mean genome coverage of ≥18X. On the other hand, the precise shapes were then remarkably similar among individuals.
Figure 3

PSMC results for a test population of collared flycatcher. The numbers in figure legend indicate mean genomewide coverage with percentage of missing data in brackets.

Figure 4

Correlations between maximum N estimate and mean genomewide coverage (A), timing of the maximum N estimate and mean coverage (B), maximum N estimate and percentage of missing data (C), and percentage of missing data and mean coverage (D).

PSMC results for a test population of collared flycatcher. The numbers in figure legend indicate mean genomewide coverage with percentage of missing data in brackets. Correlations between maximum N estimate and mean genomewide coverage (A), timing of the maximum N estimate and mean coverage (B), maximum N estimate and percentage of missing data (C), and percentage of missing data and mean coverage (D). The proportion of sites with missing data (which may either be due to failure to meet the per‐site filter or because of poor sequencing quality) in the test population ranged from 19% to 75% per individual. Not surprisingly, there was a strong negative correlation between amount of missing data and mean coverage (r = −0.89; P < 0.00001), but some outliers (relatively high coverage coupled with relatively high amount of missing data) were also seen (Fig. 4C; Fig. 3). There was also a strong negative correlation between maximum N and amount of missing data (r = −0.85, P < 0.00001; Fig. 4D). For individuals where the amount of missing data was less than 25%, N curves were essentially identical (Fig. 3). This suggests that both mean coverage and percentage of missing data should be use as filtering thresholds in PSMC analysis. Based on these results, we will address and discuss patterns of N variation among populations and species of Ficedula flycatchers by only considering individuals that have no more than 25% of missing data and a mean genomewide coverage of ≥18X (and applying a per‐site coverage filter of ≥10; Table S1, Supporting information). We present PSMC plots for all individuals not meeting these criteria in Figs S2–S10 (Supporting information).

Variation in patterns of N dynamics among collared flycatcher populations

The PSMC results clearly indicated that all four analysed collared flycatcher populations shared ancestry and demography for most of the investigated time period (Fig. 5, Fig. S11, Supporting information). The species’ started to increase from a level of ≈200 000 approximately 1 Ma, reached its highest population size ( ≈ 600,000) at approximately 200 kya and then declined for a period of about hundred thousand years. A minimum of ≈ 300,000 was reached about 50 kya for the populations from Hungary, Czech Republic and Baltic Sea; the minimum of the Italian population was ≈ 400,000 at the same time point. Then, a new and significant population expansion occurred 20–50 kya, visible for all four populations. Up until this time point, all populations thus showed similar N curves. However, the Baltic Sea population declined to a level of  ≈ 170 000 at 10–20 kya. The Baltic Sea islands (including Gotland) are thought to have been colonized by collared flycatchers in modern times, perhaps in the 18th or 19th century (Lundberg & Alatalo 1992). To exclude that inbreeding in connection with founding of these island populations affected PSMC estimates, we filtered identity‐by‐descent (IBD) regions larger than 10 kb, that is regions with runs of homozygosity (runs of consecutive homozygous SNPs) in individual birds. However, this had essentially no effect on the PSMC curves (Fig. S12, Supporting information; thresholds of 5 kb and 20 kb had similarly no effect).
Figure 5

PSMC estimates of the changes in effective population size over time for four populations of collared flycatcher. Each line represents one individual and only individuals with no more than 25% of missing data and a mean genomewide coverage of ≥18X are included (Italy, n = 4; Hungary, n = 2; Czech Republic, n = 5; Baltic Sea, n = 8).

PSMC estimates of the changes in effective population size over time for four populations of collared flycatcher. Each line represents one individual and only individuals with no more than 25% of missing data and a mean genomewide coverage of ≥18X are included (Italy, n = 4; Hungary, n = 2; Czech Republic, n = 5; Baltic Sea, n = 8).

Variation in patterns of N dynamics among pied flycatcher populations

Patterns in pied flycatcher differed from collared flycatcher both in terms of the shape of the N curve and in the consistency among populations (Fig. 6, Fig. S13, Supporting information). Populations from Sweden (mainland), the Baltic Sea and the Czech Republic shared similar N dynamics through time. They had a stable of ≈ 200 000 from approximately 3 Ma to 700–800 kya and then started to slowly increase in size. From ≈ 50 kya, a significant population expansion took off, with the most recent N estimates >1 000 000. These pied flycatcher populations did not show any signs of population decline during the investigated time period.
Figure 6

PSMC estimates of the changes in effective population size over time for four populations of pied flycatcher. Each line represents one individual and only individuals with no more than 25% of missing data and a mean genomewide coverage of ≥18X are included (Spain, n = 2; Sweden, n = 8; Czech Republic, n = 2; Baltic Sea, n = 1).

PSMC estimates of the changes in effective population size over time for four populations of pied flycatcher. Each line represents one individual and only individuals with no more than 25% of missing data and a mean genomewide coverage of ≥18X are included (Spain, n = 2; Sweden, n = 8; Czech Republic, n = 2; Baltic Sea, n = 1). PSMC analysis of Spanish pied flycatchers revealed similar N dynamics up until 300–400 kya. However, from then on, the Spanish population experienced a significant expansion much earlier than the central/northern European populations. Moreover, in sharp contrast to the other populations, the expansion halted at an  ≈ 600 000 at 100 kya and was followed by a population decline at the beginning of the last glacial period (LGP). Another cycle of expansion–contraction took place subsequently within a relatively short period of time, with reaching ≈ 1 000 000 20–30 kya but then decreasing to ≈ 340 000 by the end of the LGP.

N dynamics in Atlas flycatcher and semicollared flycatcher

For the two remaining Ficedula species, we had data from one population of each species. Atlas flycatchers increased in numbers from  ≈ 200 000 1 Ma to 500 000–600 000 about 150 kya. Then, the species went through a cycle of population contraction and expansion, with a minimum of ≈ 300 000 50 kya and up again to ≈ 600 000 at the end of the LGP (Fig. 7). The semicollared flycatcher was the one of the four species that showed the least fluctuations in over time; was relatively constant between 200 000 and 300 000 up until 50 kya (Fig. 7). More recently, increased to ≈ 800 000 by the end of the LGP.
Figure 7

PSMC estimates of the changes in effective population size over time for four black‐and‐white Ficedula species.

PSMC estimates of the changes in effective population size over time for four black‐and‐white Ficedula species.

Among‐species variation in the dynamics of N trajectories and statistical inference of PSMC curves

Our results showed that the N curves of the four species converged back in time around 1–2 Ma, at  ≈ 200 000 and then started to diverge (Fig. 7). This is consistent with the existence of a common ancestor at this time, followed by rapid radiation and speciation. The curve for the collared flycatcher indicates a slightly higher ancestral than for the other species (240 000 vs 200 000), but the difference is small and could be related to a minor difference in generation time or some methodological aspect. To our knowledge, there is no specifically developed statistics available to test if PSMC curves from two or more populations differ significantly. However, by bootstrapping, we can analyse to what extent curves from different populations overlap at certain time points. Figure 8 shows bootstrapped PSMC curves for each of the four species, and Figs S11 and S13 (Supporting information) show bootstrapped curves for different populations of collared flycatcher and pied flycatcher, respectively. Some important conclusion can be made from these plots. For example, the separation into four different N curves, one for each species, starting about 1 Ma is well supported by separated confidence intervals. The recent decrease in N of the Baltic Sea collared flycatcher is supported by separated confidence intervals compared to other collared flycatcher populations. Finally, the rapid increase in N of Spanish pied flycatcher 200–80 kya is supported by separated confidence intervals compared to other pied flycatcher populations. As a general note, the bootstrap curves clearly confirm the limitations of PSMC analysis to infer demography in the more recent past; there is considerable variation in N estimates among bootstrap replicates for the most recent time intervals.
Figure 8

PSMC estimates of changes in the effective population size over time for four Ficedula flycatcher species, with bootstrap results indicated with thin lines.

PSMC estimates of changes in the effective population size over time for four Ficedula flycatcher species, with bootstrap results indicated with thin lines.

Discussion

PSMC in molecular ecology

Stimulated by the work of McVean & Cardin (2005), the PSMC approach was developed by Li & Durbin (2011). PSMC is a rare example of a method that uses whole‐genome sequence information and at the same time is relatively fast and straightforward to apply to any organisms. Indeed, it has been common in recent genome assembly projects to analyse the data in a PSMC framework (e.g. Cho et al. 2013; Zhan et al. 2013; Zhao et al. 2013; Carbone et al. 2014; Kelley et al. 2014; Lamichhaney et al. 2015). An obvious advantage of the PSMC method is that it gives detailed information on historical‐ancient N dynamics, information that is often impossible or very hard to obtain from other demographic methods (e.g. Beaumont et al. 2002; Hey & Nielsen 2004). As for any other population genetic method that uses NGS data, PSMC is prone to biases that are the result of sequencing/genotyping errors and missing data (Crawford & Lazzaro 2012; MacLeod et al. 2013; Han et al. 2014). Not filtering out sites with low coverage will intuitively lead to heterozygous positions being erroneously called homozygous (Han et al. 2014). Alex Buerkle and Gompert (2013) noted that six or more reads of the same allele are needed to call an individual homozygous at a particular site with >95% confidence (using binomial probability for an individual's genotype at a locus). Filtering is likely to be particularly critical to PSMC analysis because it estimates the TMRCA based on the observed patterns of local heterozygosity across the genome sequence. At least two types of coverage parameters may be important: coverage at individual sites and mean genomewide coverage. The latter will affect the confidence with which SNP calling can capture the true genotype state across the genome (Crawford & Lazzaro 2012; Han et al. 2014). Our results suggested that a mean genomewide coverage of 18X and no more than 25% of missing data is necessary to make reliable demographic inference using PSMC analysis. These filters may provide a general recommendation to similar studies in other species. Importantly, we note that this recommendation is more stringent than what has been used in some recent PSMC‐based demographic studies (Li et al. 2013; Lamichhaney et al. 2015). As many other methods that infer demography from sequence data, PSMC suffers from uncertainty in parameter estimation due to factors unrelated to data quality, like scaling parameters used to interpret the results and/or confounding effects of selection or particular demographic situations [e.g. PSMC does not deal well with very sudden changes in population size and may smooth them out; (Li & Durbin 2011)]. Mutation rate and generation time estimates are necessary to scale the results of PSMC analyses and if these estimates are under‐ or overestimated, the PSMC would also be biased. However, mutation rate and generation time estimates influence the PSMC outputs in a predictable manner: they do not change the principal shape of the PSMC curve but only move the curve along the axes (Nadachowska‐Brzyska et al. 2015). For example, a halved generation time will double the estimate of N (given a fixed mutation rate per year) and a halved mutation rate per year will move the curve to older times and also double the estimate of N . In the context of PSMC analysis, mutation rate estimates are related to the influence of selection on sequence variation. Heterozygosity is reduced in regions subject to purifying selection (or to selective sweeps), as well as in genomic regions linked to loci under selection. The rate of mutation is typically considered independent of the fitness effects of new mutations, but (Li & Durbin 2011) noted that the influence of reduced heterozygosity due to selection is similar to the effect of reduced mutation rate. When masking coding sequences in the human genome they found 5% higher heterozygosity than the genomewide average and by making a corresponding correction of the mutation rate, the PSMC estimates from both data sets were nearly identical. In general, species‐wide (global) N estimates can be biased if there is population structure. In the case of PSMC analysis, historical‐ancient population structure would imply that coalescences occur less frequently. As a consequence, PSMC may overestimate N at any period of time when the species has been subdivided into separated populations (Li & Durbin 2011). Broadly speaking, the effect should be positively correlated with the number of isolated populations; the more structure within species, the more overestimated will N be at the time of separation. As the extent and character of ancient population structure (and how this has varied over time, for example during glacial cycles) is hard to know for most species, it is difficult to judge the effect on PSMC analyses, or on other coalescent‐based demographic methods for that matter. In summary, applying genetic methods for inferring demography comes with several caveats and so is the case also with PSMC analysis (see further Freedman et al. 2014; Deinum et al. 2015; Nadachowska‐Brzyska et al. 2015). An important aspect in future work using the PSMC approach should be to keep the caveats in mind when interpreting results. Also, attempts should be made to reduce biases caused by the quality of data. In the present study, we feel that we have been able to reduce the impact of several potential biases. We made an extensive investigation of the effects of variable data quality, and based on the results of this analysis, we used stringent filtering criteria. Moreover, with the access to a flycatcher genome assembly and a wealth of comparative genomic data based on sequence alignment to other bird species (Ellegren et al. 2012), we have access to detailed mutation rate estimates for the flycatcher lineage. Furthermore, accurate generation time estimates are available from life tables on fecundity and survivorship (Brommer et al. 2004). Importantly, the possibility of substantial differences in genomewide mutation rate and/or generation time among the investigated species seems negligible given their close relatedness and similar ecology. As a final remark it should be emphasized that the PSMC cannot accurately resolve demography in times relatively recent compared to the average TMRCA (the number of recent coalescent events joining sequences will in most cases be very small). An interesting solution to this limitation has recently been presented by Schiffels & Durbin (2014) by the introduction of the multiple sequential Markovian coalescent (MSMC). This method uses genome sequences from several individuals (but not too many, for reasons of the approximations involved and computational complexity) and focuses on the first coalescence between all pairs of haplotypes. It can also handle population structure. At present, the MSMC framework builds on haploid sequences and thus requires phased data, which can be difficult to retrieve for many organisms.

Demography of Ficedula flycatchers

The phylogenetic relationships among the four black‐and‐white Ficedula flycatchers have been a matter of discussion (Lundberg & Alatalo 1992; Sætre et al. 2001ab). Mitochondrial DNA (mtDNA) data suggested that they shared a common ancestor <1.5–2.0 Ma (Sætre et al. 2001a) and ABC modelling of whole‐genome sequences indicated that collared flycatcher and pied flycatcher diverged <1 Ma (Nadachowska‐Brzyska et al. 2013). An extensive phylogenomic analysis indicates that the four species radiated in quick succession (Nater at al. 2015). Several coalescence‐based species tree methods support an asymmetric topology with a sister relationship between Atlas flycatcher and pied flycatcher, with collared flycatcher as outgroup and semicollared flycatcher at the root. Parameter estimation suggest a 90% highest posterior density interval (HPDI) for divergence time between semicollared flycatcher and the other species of 815–1731 kya, between collared flycatcher and Atlas flycatcher‐pied flycatcher of 386–888 kya, and between Atlas flycatcher and pied flycatcher of 185–645 kya (Nater et al. 2015). Formally, identical shapes of N curves do not exclude that two or more populations were differentiated yet showed similar N dynamics. For example, divergent lineages may have been similarly affected by glacial cycles. However, the different PSMC curves displayed by the four species <1 Ma demonstrate that this has not been the case at least during that time period and it seems indeed unlikely that four species with geographically distinct distributions and variation in habit preferences (see further below) would have nearly identical N . We therefore favour the interpretation of convergent PSMC curves (similar N trajectories) of all four species up until 1–2 Ma as shared ancestry and demographic history at that time. This is consistent with what has previously been found and suggests that PSMC analyses of multiple species can accurately pinpoint the timing of lineage splitting. All four species (but only the Spanish population of pied flycatchers) have experienced at least one cycle of population expansion and decline, which is not unexpected for western Palearctic species during Quaternary periods of alternating glacials and interglacials (Fig. 7). We estimated that the first decline in all species started 100–200 kya, which means that it approximately corresponds to the time of glaciation period MIS 6 (191–130 kya; Lisiecki & Raymo 2005), indicating range contraction and associated reduction in abundance. The decline continued through the warmer MIS5e period (130–109 kya) and during the LGP. In contrast, pied flycatchers from central/northern populations increased in throughout the whole investigated time period, especially in more recent times. Given its current distribution over large parts of Europe including boreal forests up to a latitude of 70°N, and its tolerance of higher altitudes in central Europe than other Ficedula species, it may have been less sensitive to glacial periods and able to maintain stable and widespread populations even under such periods. While pied flycatchers breed in both coniferous and deciduous forests, collared flycatcher and semicollared flycatcher are limited to deciduous forests and their current distributions are restricted to southern Europe. It is interesting to note that after a long period of relatively slow increase in , the start of the more exponential increase of central/northern European pied flycatchers ≈ 50 kya approximately coincides with a relatively long period of decline of both collared flycatcher and semicollared flycatcher. It is tempting to see these as related events with a possible scenario being that declining collared flycatcher populations allowed pied flycatchers to expand into new habitats, or, conversely, that expanding pied flycatcher populations led to contraction of collared flycatchers. In areas where pied flycatcher and collared flycatcher currently occur in sympatry, there is strong interspecific competition for food sources and suitable nest cavities (Löhrl 1955; Alerstam et al. 1978;. Collared flycatcher has a more aggressive behaviour than its sister species and is dominant in competition over nest sites (Lundberg & Alatalo 1992). This may lead to the exclusion of pied flycatchers in areas where they co‐occur and forced use of less preferred habitats (Sætre et al. 1999a), seemingly inconsistent with the observed expansion of pied flycatcher populations concurrent with reduction of collared flycatcher populations. However, it has been suggested that the collared flycatcher is more sensitive to climate fluctuations, a hypothesis supported by the finding of a correlation between index of large‐scale climate variation and annual density fluctuations in collared flycatcher but not in pied flycatcher (Sætre et al. 1999b). An interesting topic for further research would therefore be to use climate data to model the change in distribution of the two species over time. The pronounced increase in of Spanish pied flycatchers starting 300–400 kya, not seen in other pied flycatcher populations, could potentially be due to gene flow. The trajectory of the N curve of Spanish pied flycatchers resembles that of Atlas flycatchers and its geographical vicinity (in the Atlas Mountains of Morocco, Algeria and Tunisia) and phylogenetic sister relationship to the pied flycatcher (Nater et al. in 2015) makes gene flow between these adjacent populations a possible explanation to a common pattern of change in . The Atlas flycatcher was previously regarded as a subspecies of pied flycatcher but its taxonomic status was revised by DNA evidence suggesting that is should be seen as a distinct species (Sætre et al. 2001a, b), a conclusion more recently supported by genomewide data (Nater et al. in 2015). The discrepancy in the timing of significant population expansion and the contrasting patterns of population dynamics subsequently shown by Spanish and other pied flycatcher populations could suggest long‐term structure in refugial populations on the Iberian Peninsula. For example, central/northern Europe populations may represent postglacial expansions from a distinct origin compared to the refugial Spanish population. This scenario would be compatible with the geographical heterogeneity of the Iberian Peninsula and its refugia‐within‐refugium structure (Gómez & Lunt 2007). The existence of distinct genetic lineages in Iberia has been reported for other species including birds (Centeno‐Cuadros et al. 2009; Ferrero et al. 2011; Miraldo et al. 2011; Velo‐Antón et al. 2012). Moreover, our results fit well with observed degrees of genetic differentiation among pied flycatcher populations in Europe with significant F ST estimates between Spanish and central/northern European populations (Lehtonen et al. 2009; Hogner et al. 2012; Backström et al. 2013).

Changes in N in the more recent past

Table 1 summarizes population‐size estimates through time in the investigated species, with focus on the recent past including the most recent PSMC estimates of N towards the end of LGP, ABC and IMa estimates of current N , and current census sizes (N). These numbers suggest dramatic fluctuations in population size in connection with the last glacial maximum (LGM) and subsequently during Holocene up to present. For example, PSMC estimated the N of central/northern European pied flycatchers at >1 000 000 40 kya. This is about an order of magnitude higher than current N , indicating a severe population decline after 40 kya (Table 1). On the other hand, the current census size of pied flycatchers in Europe – 36–60 million birds – indicating a striking recent population expansion. The combined data from estimates of population sizes in pied flycatcher thus suggest that range contractions by the end of LGM implied a significant reduction in population size. Then, as the ice sheet disappeared after the LGM, a dramatic population expansion took place, compatible with the successful recolonization of large parts of Europe including subarctic areas. Clearly, census and effective population size are two very different parameters and we note that the N /N ratio for pied flycatcher (≈0.001) is much lower than what is often estimated or assumed for many species (≈0.1; Frankham 1995). Such low N /N ratios as seen in pied flycatchers are indeed expected after population expansions (Kalinowski & Waples 2002; Hedrick 2005).
Table 1

Point estimates of recent (0–40 kya) N and current census sizes of the four black‐and‐white Ficedula species

SpeciesCensus sizeCurrent N e (ABC estimate)a Current N e (IMa estimate)b N e at the end of LGP estimated by PSMC
Atlas flycatcher?111 000–176 000600 000
Collared flycatcher4 200 000–7 200 00079 000232 000–435 000700 000–1 000 000
Pied flycatcher36 000 000–60 000 00031 000106 000–269 000>1 000 000
Semicollared flycatcher40 000–210 000136 000–235 000800 000

Data from Nadachowska‐Brzyska et al. (2013).

Data from Hogner et al. (2012). Intervals are from in each case three point estimates.

no estimates available.

Point estimates of recent (0–40 kya) N and current census sizes of the four black‐and‐white Ficedula species Data from Nadachowska‐Brzyska et al. (2013). Data from Hogner et al. (2012). Intervals are from in each case three point estimates. no estimates available. Collared flycatcher also shows recent fluctuations in population size, although less pronounced than in the case of the pied flycatcher. decreased from the LGM to the present and the species may recently have increased in size again. A different pattern is seen in the semicollared flycatcher (Fig. 7). As for the other species, decreased from the LGM to the present but there is no sign of recovery as census size is similar to current N (i.e. N /N ≈ 1). Habitat destruction is likely to underlie an ongoing decline (BirdLife International 2004) and the distribution is fragmented and the species listed a near‐threatened on the IUCN Red List (http://www.iucnredlist.org/details/full/22709319/0).

Conclusions

PSMC analysis offer unique possibilities to investigate population dynamics back in time. As illustrated in this study of four species and multiple populations of Ficedula flycatchers, it can reveal fluctuations in N during a time period when glacial cycles profoundly affected the climate and thereby the distribution of habitats and species. The fluctuations observed in this system suggest that demographic modelling including scenarios of constant population size or relatively simple trends of declines or expansions may be biologically less relevant and misleading. Coupled with recent effects of the LGP and LGM on the distribution and abundance of species, the complex demographic history displayed by these species with several cycles of population expansion and decline may very well be representative for many species in temperate regions of the Northern Hemisphere. It may also be of some general significance that our results demonstrate heterogeneity in N dynamics among closely related species, and even between populations within species (like in the case of the pied flycatcher). This would caution against extrapolation of demographic inference between lineages and call for adequate sampling to provide representative pictures of the coalescence process in different species or populations. Moreover, our results provide important recommendations and guidelines for the amount and quality of sequence data needed to make reliable inference of N dynamics. K.N.B. and H.E. conceived of the study and wrote the manuscript. K.N.B. performed the PSMC analyses. R.B. and L.S. processed raw sequencing data.

Data accessibility

DNA sequences: All sequences have been deposited to the European Nucleotide Archive under the BioProject Accession no. PRJEB7359. Table S1 Number of individuals per population with a mean genome coverage of ≥18X, and meeting the criteria of a per‐site filter of ≥10 reads and with no more than 25% of missing data. Fig. S1 PSMC results for a test population of collared flycatchers for different per‐site filtering regimes: no filtering (A), read depth ≥1/3 of the average read depth (B), read depth ≥6 (C). Fig. S2 PSMC estimates of changes in the effective population size over time for collared flycatchers from Italy. The numbers in the figure legend indicate mean coverage with percentage of missing data in brackets. Fig. S3 PSMC estimate of the effective population size change over time for collared flycatcher from Hungary. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S4 PSMC estimate of the effective population size change over time for collared flycatcher from Czech Republic. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S5 PSMC estimate of the effective population size change over time for pied flycatcher from Spain. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S6 PSMC estimate of the effective population size change over time for pied flycatcher from Sweden. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S7 PSMC estimate of the effective population size change over time for pied flycatcher from Czech Republic. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S8 PSMC estimate of the effective population size change over time for pied flycatcher from Ӧland. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S9 PSMC estimate of the effective population size change over time for Atlas flycatcher. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S10 PSMC estimate of the effective population size change over time for semicollared flycatcher. The numbers in figure legend indicate mean coverage and percentage of missing data in brackets. Fig. S11 PSMC estimates of changes in the effective population size over time for example individuals from four collared flycatcher populations, with bootstrap results indicated with thin lines. Fig. S12 Comparison of original PSMC results (red line) and PSMC results obtained after removing long runs (10 kb) of homozygosity (green line). Fig. S13 PSMC estimates of changes in the effective population size over time for example individuals from four pied flycatcher populations, with bootstrap results indicated with thin lines. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file.
  71 in total

1.  Speciation, introgressive hybridization and nonlinear rate of molecular evolution in flycatchers.

Authors:  G P Saetre; T Borge; J Lindell; T Moum; C R Primmer; B C Sheldon; J Haavie; A Johnsen; H Ellegren
Journal:  Mol Ecol       Date:  2001-03       Impact factor: 6.185

Review 2.  Speciation, hybrid zones and phylogeography - or seeing genes in space and time.

Authors:  G M Hewitt
Journal:  Mol Ecol       Date:  2001-03       Impact factor: 6.185

3.  Adaptive plasticity in mate preference linked to differences in reproductive effort.

Authors:  A Qvarnström; T Pärt; B C Sheldon
Journal:  Nature       Date:  2000-05-18       Impact factor: 49.962

Review 4.  Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies.

Authors:  S V Edwards; P Beerli
Journal:  Evolution       Date:  2000-12       Impact factor: 3.694

5.  An integrated framework for the inference of viral population history from reconstructed genealogies.

Authors:  O G Pybus; A Rambaut; P H Harvey
Journal:  Genetics       Date:  2000-07       Impact factor: 4.562

Review 6.  The genetic legacy of the Quaternary ice ages.

Authors:  G Hewitt
Journal:  Nature       Date:  2000-06-22       Impact factor: 49.962

7.  Cryptic evolution in a wild bird population.

Authors:  J Merilä; L E Kruuk; B C Sheldon
Journal:  Nature       Date:  2001-07-05       Impact factor: 49.962

8.  Adjustment to climate change is constrained by arrival date in a long-distance migrant bird.

Authors:  C Both; M E Visser
Journal:  Nature       Date:  2001-05-17       Impact factor: 49.962

9.  Hybridization and adaptive mate choice in flycatchers.

Authors:  T Veen; T Borge; S C Griffith; G P Saetre; S Bures; L Gustafsson; B C Sheldon
Journal:  Nature       Date:  2001-05-03       Impact factor: 49.962

10.  Lifetime Reproductive Success and Heritability in Nature.

Authors:  J Merilä; B C Sheldon
Journal:  Am Nat       Date:  2000-03       Impact factor: 3.926

View more
  65 in total

1.  Novel Candidate Genes Underlying Extreme Trophic Specialization in Caribbean Pupfishes.

Authors:  Joseph A McGirr; Christopher H Martin
Journal:  Mol Biol Evol       Date:  2017-04-01       Impact factor: 16.240

2.  Inferring Individual Inbreeding and Demographic History from Segments of Identity by Descent in Ficedula Flycatcher Genome Sequences.

Authors:  Marty Kardos; Anna Qvarnström; Hans Ellegren
Journal:  Genetics       Date:  2017-01-18       Impact factor: 4.562

Review 3.  Making sense of genomic islands of differentiation in light of speciation.

Authors:  Jochen B W Wolf; Hans Ellegren
Journal:  Nat Rev Genet       Date:  2016-11-14       Impact factor: 53.242

4.  One Hundred Years of Linkage Disequilibrium.

Authors:  John A Sved; William G Hill
Journal:  Genetics       Date:  2018-07       Impact factor: 4.562

5.  Complete genomes of two extinct New Zealand passerines show responses to climate fluctuations but no evidence for genomic erosion prior to extinction.

Authors:  Nicolas Dussex; Johanna von Seth; Michael Knapp; Olga Kardailsky; Bruce C Robertson; Love Dalén
Journal:  Biol Lett       Date:  2019-09-04       Impact factor: 3.703

6.  Breaking Free: The Genomics of Allopolyploidy-Facilitated Niche Expansion in White Clover.

Authors:  Andrew G Griffiths; Roger Moraga; Marni Tausen; Vikas Gupta; Timothy P Bilton; Matthew A Campbell; Rachael Ashby; Istvan Nagy; Anar Khan; Anna Larking; Craig Anderson; Benjamin Franzmayr; Kerry Hancock; Alicia Scott; Nick W Ellison; Murray P Cox; Torben Asp; Thomas Mailund; Mikkel H Schierup; Stig Uggerhøj Andersen
Journal:  Plant Cell       Date:  2019-04-25       Impact factor: 11.277

7.  Historical demography of common carp estimated from individuals collected from various parts of the world using the pairwise sequentially markovian coalescent approach.

Authors:  Zihao Yuan; Wei Huang; Shikai Liu; Peng Xu; Rex Dunham; Zhanjiang Liu
Journal:  Genetica       Date:  2018-01-03       Impact factor: 1.082

8.  Fluctuating fortunes: genomes and habitat reconstructions reveal global climate-mediated changes in bats' genetic diversity.

Authors:  Balaji Chattopadhyay; Kritika M Garg; Rajasri Ray; Frank E Rheindt
Journal:  Proc Biol Sci       Date:  2019-09-18       Impact factor: 5.349

9.  The evolutionary history and genomics of European blackcap migration.

Authors:  Kira Delmore; Juan Carlos Illera; Javier Pérez-Tris; Gernot Segelbacher; Juan S Lugo Ramos; Gillian Durieux; Jun Ishigohoka; Miriam Liedvogel
Journal:  Elife       Date:  2020-04-21       Impact factor: 8.140

10.  Intra-species differences in population size shape life history and genome evolution.

Authors:  David Willemsen; Rongfeng Cui; Martin Reichard; Dario Riccardo Valenzano
Journal:  Elife       Date:  2020-09-01       Impact factor: 8.140

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.