Literature DB >> 31693149

Divergent Selection and Primary Gene Flow Shape Incipient Speciation of a Riparian Tree on Hawaii Island.

Jae Young Choi1, Michael Purugganan1,2, Elizabeth A Stacy3.   

Abstract

A long-standing goal of evolutionary biology is to understand the mechanisms underlying the formation of species. Of particular interest is whether or not speciation can occur in the presence of gene flow and without a period of physical isolation. Here, we investigated this process within Hawaiian Metrosideros, a hypervariable and highly dispersible woody species complex that dominates the Hawaiian Islands in continuous stands. Specifically, we investigated the origin of Metrosideros polymorpha var. newellii (newellii), a riparian ecotype endemic to Hawaii Island that is purportedly derived from the archipelago-wide M. polymorpha var. glaberrima (glaberrima). Disruptive selection across a sharp forest-riparian ecotone contributes to the isolation of these varieties and is a likely driver of newellii's origin. We examined genome-wide variation of 42 trees from Hawaii Island and older islands. Results revealed a split between glaberrima and newellii within the past 0.3-1.2 My. Admixture was extensive between lineages within Hawaii Island and between islands, but introgression from populations on older islands (i.e., secondary gene flow) did not appear to contribute to the emergence of newellii. In contrast, recurrent gene flow (i.e., primary gene flow) between glaberrima and newellii contributed to the formation of genomic islands of elevated absolute and relative divergence. These regions were enriched for genes with regulatory functions as well as for signals of positive selection, especially in newellii, consistent with divergent selection underlying their formation. In sum, our results support riparian newellii as a rare case of incipient ecological speciation with primary gene flow in trees.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  Metrosideros; ecological speciation; gene flow; genomic islands of divergence; incipient speciation; sympatric speciation

Mesh:

Substances:

Year:  2020        PMID: 31693149      PMCID: PMC7038655          DOI: 10.1093/molbev/msz259

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

A major aim of evolutionary biology is to understand the mechanisms underlying the formation of species. Of particular interest is the formation of species from within a panmictic population (Mayr 1963) or between populations that co-occur within geographic “cruising range” (Mallet et al. 2009), as gene flow between diverging populations is expected to erode linkage between alleles associated with differential adaptation and reproductive isolation, thus reversing speciation (Turelli et al. 2001). As a result, the “speciation-with-gene-flow” (i.e., sympatric speciation) model is highly controversial (Bolnick and Fitzpatrick 2007; Fitzpatrick et al. 2008; Nosil 2008). Moreover, when diverging populations are connected by ongoing (“primary”) gene flow without any period of physical isolation (i.e., allopatry), speciation is generally not expected (Felsenstein 1981; Dieckmann and Doebeli 1999; Kondrashov and Kondrashov 1999; Turelli et al. 2001). In fact, recent analyses enabled by the availability of population-level genome sequence data have challenged the few purported cases of speciation with primary gene flow (Foote 2018) by revealing historical periods of allopatry between the diverging lineages or secondary contact with an allopatric lineage (i.e., “secondary” gene flow) (Alcaide et al. 2014; Martin, Cutler, et al. 2015; Malinsky et al. 2018). In the latter scenario, gene flow from an external lineage may facilitate speciation, since reproductive isolating barriers are more easily formed in allopatry (Otto et al. 2008). Sympatric speciation can thus be differentiated into primary and secondary models (Foote 2018) with the expectation that the former is less likely to occur, based on theoretical predictions and the paucity of empirical evidences (Richards et al. 2019). Despite the rarity of cases, speciation with primary gene flow is an intriguing model of speciation for evolutionary biologists, one that requires the integration of both ecology and genetics for its inference (Fitzpatrick et al. 2008). Sympatric speciation occurs predominantly through strong disruptive selection (Kirkpatrick and Ravigné 2002; Bolnick and Fitzpatrick 2007). When incipient species inhabit different environments, divergent selection (i.e., selection acting in contrasting directions between populations or selection favoring phenotypes of opposite extremes; Rundle and Nosil 2005; Via 2012) favors locally adaptive alleles that are beneficial in the alternative ecological niches. In the ecological speciation model (Schluter 2000), reproductive isolation evolves between populations as a consequence of ecologically based divergent selection (but see Coyne and Orr 2004 for nonecological mechanisms). For instance, when a pair of incipient species have adapted to contrasting environments, reproductive barriers can arise in the form of immigrant inviability, where selection acts against migrants in each environment (Nosil et al. 2005). In plants, immigrant inviability is a strong isolating mechanism (Lowry et al. 2008; Westberg et al. 2010; Melo et al. 2014; Baack et al. 2015), suggesting that ecological speciation may be a principal driver of incipient speciation by restricting gene flow between sympatric populations through a process analogous to reinforcement (Servedio and Noor 2003; Nosil et al. 2005). The ecotypes that arise through divergent selection should represent the earliest stage of speciation (Rundle and Nosil 2005). These lineages are likely to be only partially reproductively isolated, which makes them ideal for studying the initial stages of barrier formation (Via 2009; Stacy et al. 2017) before the onset of confounding genetic changes that accumulate once speciation is complete (Matute et al. 2010). At the genomic level, under the ecological speciation-with-gene-flow model (Via 2009) gene flow will homogenize the genomic landscape and prevent the divergence of regions not directly under divergent selection or linked to such regions (Feder et al. 2012; Via 2012; Seehausen et al. 2014). Intense interest has focused on identifying these genomic islands of divergence (Turner et al. 2005; Harr 2006; Nosil, Harmon, et al. 2009), since these regions should contain the genetic variants involved in speciation (Turner et al. 2005; Nadeau et al. 2012; Renaut et al. 2013; Carneiro et al. 2014; Poelstra et al. 2014; Malinsky et al. 2015; Marques et al. 2016; Han et al. 2017; Riesch et al. 2017). Any genomic region resistant to gene flow would form localized peaks of differentiation that can be readily detected through genome-wide scans of differentiation (e.g., FST) (Seehausen et al. 2014; Wolf and Ellegren 2017). However, because differential selection rather than differential gene flow can also produce islands of divergence through genetic hitchhiking or background selection (Burri et al. 2015; Delmore et al. 2015; Irwin et al. 2016), absolute measures of differentiation such as Dxy have been proposed as an alternative for identifying islands of divergence (Nachman and Payseur 2012; Cruickshank and Hahn 2014). In plants, studies of the role of ecological divergence in the formation of reproductive barriers at the genic/genomic level are few and involve just a handful of “model” species (Lexer and Widmer 2008). Reconstructing the past demographic histories of the diverged populations and using the combined measures of FST and Dxy in multiple populations can aid our understanding of the genomic changes that accompany incipient speciation (Ma et al. 2018). Metrosideros Gaud. (Myrtaceae) is a young (likely 3–4 myo; Percy et al. 2008; Dupuis et al. 2019) woody species complex that dominates Hawaii’s native flora, comprising >20 vegetatively distinct (Dawson and Stemmermann 1990; Stacy and Sakishima 2019) but interfertile taxa that are nonrandomly distributed across ecotones and environmental gradients on each of the main islands (Mueller-Dombois 1987; Dawson and Stemmermann 1990). Differential local adaptation across Hawaii’s heterogeneous landscape (Corn and Hiesey 1973; Stemmermann 1983; Drake 1993; Vitousek et al. 1995; Kitayama et al. 1997; Cordell et al. 1998; Morrison and Stacy 2014; Ekar et al. 2019) and partial reproductive isolation (Rhoades 2012; Stacy et al. 2017) have evolved among Metrosideros taxa despite their long life spans (Hart 2010) and the high dispersibility of their seeds and pollen (Carpenter 1976; Corn 1979; Drake 1993), two characters that are expected to impede diversification (Petit and Hampe 2006). Metrosideros appears to be an example of incipient adaptive radiation with gene flow in trees (Stacy et al. 2014; Stacy and Sakishima 2019). Metrosideros polymorpha var. newellii (hereafter newellii) is an incipient species endemic to the waterways of the windward (wet) side of Hawaii Island. Newellii is a morphologically distinct, narrow-leaved ecotype (fig. 1) that appears to derive from the archipelago-wide, broad-leaved M. polymorpha var. glaberrima (hereafter glaberrima; fig. 1) (Stacy et al. 2014) following a progenitor-derivative speciation model (Crawford 2010). Riparian zones on Hawaii Island are narrow (typically the width of a single tree on either side of the waterway) and embedded within glaberrima-dominated wet forest. Along waterways, mating opportunities among adults of newellii and glaberrima are presumably random, consistent with a strict definition of sympatry (Barraclough and Vogler 2000; Gavrilets 2003; Butlin et al. 2008; Fitzpatrick et al. 2008, 2009; Mallet et al. 2009), and apparent hybrids are occasionally found. For sympatric, conspecific trees, glaberrima and newellii show surprisingly strong genetic differentiation (mean FST at SSR loci = 0.094 vs. a mean pairwise FST = 0.054 among the three common varieties on the island [i.e., all except newellii]; Stacy et al. 2014).
. 1.

Genetic relationship between glaberrima and newellii, two varieties of Metrosideros polymorpha. GH1, GH2, and N are the genetic populations identified through this study. (A) Geographic locations of the 40 sampled trees of glaberrima and newellii. Inset shows the leaf morphology of glaberrima and newellii. (B) ADMIXTURE plot of K = 2, 3, 4, and 5 for the 40 trees. Individuals from GH1, GH2, and N are indicated with ^, *, and #, respectively. (C) PCA plot of the 40 trees. (D) Neighbor-joining tree of the 40 trees. Nodes with >90% bootstrap support are indicated with black circles.

Genetic relationship between glaberrima and newellii, two varieties of Metrosideros polymorpha. GH1, GH2, and N are the genetic populations identified through this study. (A) Geographic locations of the 40 sampled trees of glaberrima and newellii. Inset shows the leaf morphology of glaberrima and newellii. (B) ADMIXTURE plot of K = 2, 3, 4, and 5 for the 40 trees. Individuals from GH1, GH2, and N are indicated with ^, *, and #, respectively. (C) PCA plot of the 40 trees. (D) Neighbor-joining tree of the 40 trees. Nodes with >90% bootstrap support are indicated with black circles. Riparian zones on Hawaii Island are harsh, and disruptive selection across a sharp ecotone results in partial reproductive isolation through significant reciprocal immigrant inviability (Nosil et al. 2005) in adjacent forest and riparian environments (Ekar et al. 2019). Seedlings of newellii are better adapted than those of glaberrima to the strong water current and secondarily to the high-light stress of these environments (Ekar et al. 2019). Whether additional reproductive isolating barriers exist is not yet known. Regardless, the morphological and ecological divergence and strength of genetic differentiation observed between newellii and glaberrima equal or exceed those of other documented incipient species pairs or radiations (Motley and Carr 1998; Kocher 2004; Huber et al. 2007; Knope et al. 2012). Although newellii is endemic to Hawaii Island (0.5 Ma; Clague 1996), an archipelago-wide analysis of SSR variation in Hawaiian Metrosideros suggested that it may have arisen from glaberrima on the prehistoric island of Maui Nui, comprising modern-day Maui, Lanai, Molokai, and Kahoolawe (≤1.8 Ma; Clague 1996; Price and Elliott-Fisk 2004). We investigated the demographic history of M. polymorpha during its colonization of Hawaii Island, the relationship between newellii and glaberrima, and the genomic regions that were involved in the origin of newellii. We found evidence of extensive admixture between close and divergent lineages, including evidence that incipient speciation between glaberrima and newellii is consistent with a speciation-with-primary-gene-flow model. Examination of the genomic islands of divergence that formed during incipient speciation revealed evidence of positive selection on genes associated with the regulation of gene expression, consistent with the origin of newellii through disruptive selection across a forest-riparian ecotone. We estimate that newellii emerged 0.3–1.2 Ma. Our results shed light on the genomic changes associated with local adaptation and diversification within a long-lived and highly dispersible tree.

Results

Forty-two adults were chosen for genome sequencing, including 18 glaberrima and 20 newellii adults from Hawaii Island, two glaberrima adults from Molokai, and two closely related species (M. rugosa and M. tremuloides) from Oahu (fig. 1 also see supplementary fig. S1, Supplementary Material online, for locations of the Hawaii Island samples). Genome coverage was on an average ∼19.9×, ranging from 1× to 43× per individual (supplementary table S1, Supplementary Material online). For the majority of the samples >95% of the reads could be aligned to the reference M. polymorpha genome (supplementary table S1, Supplementary Material online) (Izuno et al. 2016). Because the reference genome was still in a draft state with over 55,000 scaffolds, however, we restricted alignments to just scaffolds that were >100,000 bp (totaling 285 Mb of the 304-Mb genome assembly) to avoid analyzing potential repetitive regions in unassembled contigs.

Population Structure of Glaberrima and Newellii on Hawaii Island

Because of the varying sequence depth among individuals, population relationships were examined using a complete probabilistic approach to account for the uncertainty in genotypes (Fumagalli et al. 2014; Korneliussen et al. 2014). Polymorphic sites were pruned randomly to minimize the effect of linkage between physically close sites. Using the ADMIXTURE method (Alexander et al. 2009; Skotte et al. 2013), ancestry proportions for each individual were estimated by varying the assumed number of ancestral populations (K) from 2 to 7 (supplementary fig. S2, Supplementary Material online). At K = 2, individuals were largely divided into two groups representing the two varieties (fig. 1). Principal components analysis (PCA) corroborated the K = 2 ancestry result with the first component largely dividing the individuals by variety (fig. 1). There were several individuals with an admixed ancestry (labeled with * in fig. 1), which in principal component space were positioned between the two genetic groups (varieties). The phylogenetic analysis also revealed a largely complete division of individuals by variety (fig. 1). Genetic structure was observed within each variety as well (fig. 1). We used monophyly (fig. 1) to identify three core groups (GH1, GH2, and N). GH1 comprised eight glaberrima trees from the Alakahi Bog on Kohala Volcano that formed a separate cluster in the principal component space (fig. 1) and showed a unique ancestry in the ADMIXTURE analyses with higher K values (fig. 1). GH2 comprised four closely spaced glaberrima trees situated alongside newellii at Kalohewahewa Stream. These four individuals, along with three others (E70, E72, E74) at the Wailuku_A site (supplementary fig. S1, Supplementary Material online) were glaberrima trees sampled immediately adjacent to newellii trees along waterways (fig. 1). These three groups were evident in the PCA and ADMIXTURE analysis regardless of the SNP data set used (i.e., including or excluding more SNPs to account for linkage between SNPs. See supplementary figure S3, Supplementary Material online, for ADMIXTURE and supplementary figure S4, Supplementary Material online, for PCA results, respectively). Phylogenetically, GH2 was sister to newellii (fig. 1) and in principal component space was closer to newellii than was GH1 (fig. 1). The ancestry proportions for GH2 were inconsistent across K values, but the results for K = 2, 3, and 4 indicated that ancestry of GH2 included newellii and a glaberrima population that was not related to GH1. Group N comprised 14 individuals of newellii that occur along the largest waterway on Hawaii Island, the Wailuku River. Although individuals from the N group clustered together in principal component space (fig. 1), there was evidence of substructure within the group. Trees from the most up-river population (Wailuku_A) were most diverged in the phylogenetic tree and formed their own cluster at higher K values (K = 4 and 5; fig. 1). Among the individuals excluded from these three monophyletic groups were two glaberrima trees (E70 and E74) that were genetically similar to glaberrima from Molokai despite occurring alongside the most divergent newellii subpopulation (i.e., Wailuku_A; fig. 1). Hence, there may be further genetic structure with both varieties on Hawaii Island, but low sample sizes from these additional subgroups precluded further analysis. There were two other individuals (H196 and H504) with unusually low coverage that grouped together in PC space. Although we implemented the ANGSD method, which takes low coverage into account, we excluded these samples from downstream analyses in case their position were artifacts of poor coverage. Lastly, two trees, H439 and H493, from northern waterways were phenotypically characterized as newellii, but were more closely related to the glaberrima group, and a single phenotypically defined glaberrima (E72) grouped within newellii (fig. 1).

Admixture between Glaberrima and Newellii on Hawaii Island

We limited subsequent analyses to the GH1, GH2, and N populations, glaberrima from Molokai (individuals X83 and X36, hereafter referred as GM), and the two closely related species M. rugosa, and M. tremuloides. There were 8,436,114 variable positions across all samples, 7,566,381 variable positions within M. polymorpha, and 6,818,011 variable positions among the samples from Hawaii Island. Consistent with the genotype likelihood-based tree (fig. 1), the phylogenetic relationships from computationally called genotypes indicated a tree topology of [(((GH2, N), GH1), GM), M. rugosa, M. tremuloides] (supplementary fig. S5, Supplementary Material online). Since admixture events occurring between branches are not detected in a simple bifurcating tree, we used methods that quantify, or test for evidence of, admixture occurring between lineages. To characterize the relationships among GH1, GH2, and N on Hawaii Island, we implemented a topology-weighting method (Martin and Van Belleghem 2017). Using one of GM,M. rugosa, or M. tremuloides as the outgroup, we conducted sliding-window analyses and in each window quantified the taxon topology weight, which is a count of all of the unique subtrees within which a single individual of each taxon is represented. Initially, results showed that regardless of the outgroup used, no single topology dominated the weighting (fig. 2), suggesting that many variants are shared among GH1, GH2, and N. The choice of outgroup, however, did affect the majority topology. Specifically, with GM or M. rugosa as the outgroup, the topology that grouped N and GH2 as sisters had the highest weighting (fig. 2 dotted box). In contrast, with M. tremuloides as the outgroup the topology that grouped N and GH2 as sisters had the lowest weighting, consistent with admixture between M. tremuloides and either GH2 or N.
. 2.

Relationships among the three focal populations of Metrosideros polymorpha on Hawaii Island. (A) Topology weight for a three-population topology under three different outgroups. Dotted box represents the topological relationship obtained from genome-wide data. (B) Treemix graph of the three focal populations and three outgroups assuming four migration edges. (C) Results of f4 tests of admixture in the m = 4 treemix graph.

Relationships among the three focal populations of Metrosideros polymorpha on Hawaii Island. (A) Topology weight for a three-population topology under three different outgroups. Dotted box represents the topological relationship obtained from genome-wide data. (B) Treemix graph of the three focal populations and three outgroups assuming four migration edges. (C) Results of f4 tests of admixture in the m = 4 treemix graph. Admixture among the three Hawaii Island populations and each of the outgroup populations was examined using treemix (Pickrell and Pritchard 2012) by fitting migration edges on a bifurcating tree (fig. 2). The initial treemix tree with zero migration recapitulated the population relationships recovered using the topology-weighting method, while the treemix graphs with increasing numbers of migration edges showed different relationships (supplementary fig. S6, Supplementary Material online). Briefly, the first migration edge was fitted between M. tremuloides and the common ancestor of GH2 and N, consistent with the topology-weighting results. At two migration edges, there was admixture between GH1 and N, whereas in higher migration models the common ancestor of all three Hawaii Island populations admixed into N. At three migration edges, there was admixture between GM and GH2, while at four migration edges, an unsampled M. rugosa-like population was admixed with N. At five migration edges, there was additional gene flow originating from the common ancestor of the Hawaii Island populations into GH2. This suggests that five migration edges may over-fit the data or that Hawaii Island M. polymorpha may have originated as a hybrid swarm (Jeffery et al. 2017; Wang et al. 2017; Richards et al. 2018). In addition, at five migration edges, the log-likelihood of the model started to plateau (supplementary fig. S7, Supplementary Material online), suggesting the four migration edge model best explains the admixture history of these Metrosideros populations. The treemix results suggested that N and GH2, the two sister groups from Hawaii Island, had extensive evidence of admixture with other populations from both Hawaii Island and one or more older islands. To specifically test for evidence of introgression, we implemented the four-population (f4) test (Reich et al. 2009). All significant f4 test results were consistent with the treemix result with four migration edges (fig. 2) and revealed two major admixture patterns. 1) The significant f4 test on the topology [(GH2, N);(GH1, T or GM)] was consistent with evidence of admixture between N and GH1, and admixture between GH2 and GM. The nonsignificance of the topology [(GH2, N);(GH1, R)] is likely due to N but not GH2 sharing alleles with both GH1 and M. rugosa. 2) The significant f4 test on topology [(GH2 or N, GH1);(R, T)] was consistent with M. tremuloides admixing with either GH2 and N, or their common ancestor. This also indicated that among the three Hawaii Island groups, GH1 was the least affected by admixture from lineages on older islands.

Demographic History of Hawaii Island Glaberrima and Newellii

Initially, we focused on the demographic histories of GH1 and N to infer the recent splitting event that led to the formation of newellii. Although the close evolutionary relationship observed between GH2 and N may initially suggest sympatric divergence, we focused on the comparison between GH1 and N for two reasons. First, geographically, the GH1 population inhabits the oldest volcano on Hawaii Island (fig. 1), and its proximity to the second youngest island, Maui, suggests that GH1 may represent one of the earliest colonizing Metrosideros lineages on Hawaii Island. Because GH1 was a potential initial colonizer, subsequent lineages may have derived from, or interacted with, this population, making GH1 a population of interest in analyses of Hawaii Island Metrosideros. Second, while GH2 and N are geographically sympatric, GH2 had evidence of admixture with several allopatric lineages, especially with the GM group (fig. 2), suggesting that secondary gene flow may have contributed to the divergence between GH2 and N (see Discussion for further detail). Because we were interested in the genomic changes associated with speciation with primary gene flow, we examined genomic patterns of divergence between GH1 and N. Levels of polymorphism were similar between the two groups (median θπ = 0.0041 and 0.0043 for GH1 and N, respectively), while linkage disequilibrium decreased rapidly to low levels (r2 < 0.2) within 1 kb for both populations (supplementary fig. S8, Supplementary Material online). Median FST between GH1 and N was 0.043 (supplementary fig. S9, Supplementary Material online). The demographic histories of GH1 and N were estimated using the diffusion approximation method of δaδi (Gutenkunst et al. 2009). Twenty demographic models (supplementary fig. S10, Supplementary Material online) (Portik et al. 2017) were fit onto the observed 2D joint-site-frequency spectrum (2D-SFS), which was estimated using genotype likelihoods between GH1 and N (see supplementary table S2, Supplementary Material online, for all models with their estimated parameters and log-likelihood). The top three best-fitting models for the 2D-SFS all involved an ongoing or secondary contact-based gene flow between GH1 and N (asymmetric secondary gene flow model AIC = 2076.62; symmetric primary gene flow model AIC = 2111.68; asymmetric primary gene flow model AIC = 2124.5). We also used the 2D-SFS generated from genotype calls to fit the 20 demographic models (supplementary fig. S10, Supplementary Material online) and found the top three best-fitting models (see supplementary table S3, Supplementary Material online, for all δaδi results using genotype calls) again involved an ongoing or secondary contact-based gene flow (symmetric primary gene flow model AIC = 255.26; asymmetric primary gene flow model AIC = 257.14; secondary contact with symmetric migration and population size change model AIC = 257.56). Regardless of whether the underlying 2D-SFS was estimated using genotype likelihood or genotype calls, the demographic model with primary gene flow and without a population size change consistently appeared among the top three best-fitting models. The top three best-fitting models were then further optimized to identify the best-fitting demographic model (supplementary table S4, Supplementary Material online). The optimized best-fitting model included ongoing asymmetric migration between GH1 and N (AIC = 2037.04; see supplementary fig. S11, Supplementary Material online, for model fit). This model was a large improvement from the initial models (AIC difference >39), and the difference in AIC from the second best-fitting model (ΔAIC) was 18.8. The model estimated a higher level of gene flow from GH1 into N, while the effective population size of GH1 was slightly higher than that for N (fig. 3).
. 3.

Estimation of demographic parameters for N and GH1 (the glaberrima closely related and purported progenitor to newellii) on Hawaii Island. (A) δaδi parameter estimates for the best-fitting model. Effective population sizes of current populations are scaled by the effective population size of the ancestral population (Nanc) that is arbitrarily set to 1. Time is reported in unscaled units of Tdiv where Tdiv×2Nanc=τdiv (divergence time in generations). Migration rates are reported in units of m12 where m12/2Nanc=M12 (proportion of individuals in each generation that are new migrants from population 2 to population 1). Bootstrap parameter estimates are shown in square brackets. nuGH1: effective population size of GH1; nuN: effective population size of N; m12: migration rate from N to GH1; m21: migration rate from GH1 to N; and Tdiv: unscaled divergence time between GH1 and N. (B) Divergence time (years) between GH1 and N under various mutation rates and generation times. Bootstrap confidence intervals are shown in shaded areas.

Estimation of demographic parameters for N and GH1 (the glaberrima closely related and purported progenitor to newellii) on Hawaii Island. (A) δaδi parameter estimates for the best-fitting model. Effective population sizes of current populations are scaled by the effective population size of the ancestral population (Nanc) that is arbitrarily set to 1. Time is reported in unscaled units of Tdiv where Tdiv×2Nanc=τdiv (divergence time in generations). Migration rates are reported in units of m12 where m12/2Nanc=M12 (proportion of individuals in each generation that are new migrants from population 2 to population 1). Bootstrap parameter estimates are shown in square brackets. nuGH1: effective population size of GH1; nuN: effective population size of N; m12: migration rate from N to GH1; m21: migration rate from GH1 to N; and Tdiv: unscaled divergence time between GH1 and N. (B) Divergence time (years) between GH1 and N under various mutation rates and generation times. Bootstrap confidence intervals are shown in shaded areas. We then tested if different proportions of the ancestral population contributed to GH1 and N, testing whether N was derived from a smaller ancestral population than GH1 (Charles et al. 2018). Specifically, the model was set up by allowing a proportion of the ancestral population to form N, and proportion of the ancestral population to form GH1. This approach is analogous to testing an island colonization model, where GH1 originates as the “mainland” population while N originates as a potentially small environmentally restricted “island” population. Seven scenarios (supplementary fig. S12, Supplementary Material online) were tested, and all models indicated almost equal proportions of the ancestral population contributed to GH1 and N ( was between 0.45 and 0.49 for all tested models; supplementary table S5, Supplementary Material online). After further optimization, the best-fitting model was one that assumed an asymmetric secondary gene flow between GH1 and N while (AIC = 2042.46; supplementary table S6, Supplementary Material online). This best-fitting island model, however, was less fit than the model of a simple split with continuous asymmetric gene flow (ΔAIC = 5.42). Further, none of the demographic models allowing a change in population size was fit in any of the island or non-island δaδi models. In the end, these results suggested that N was not likely to have derived from GH1. Rather, the results were consistent with divergence of the ancestral population into the two ecotypes, GH1 and N, with primary gene flow. We also used δaδi to estimate the demography between GH2 and N by testing the same 20 demographic models (see supplementary fig. S10, Supplementary Material online, for models and supplementary table S7, Supplementary Material online, for initial optimization results). The top three best-fitting models were symmetric primary gene flow (AIC = 1095.3), two epochs with asymmetric gene flow (AIC = 1112.98), and symmetric primary gene flow with population size change (AIC = 1143.94). After further optimization the best-fitting model was a two-epoch model of asymmetric migration (AIC = 1050.2; supplementary table S8, Supplementary Material online). The resulting unscaled divergence time (Tdiv) between GH2 and N (0.52) almost overlapped the bootstrap confidence interval of the divergence time for GH1 and N (0.54–1.38). To infer the evolutionary history of both Hawaii Island (N, GH1, and GH2) and non-Hawaii Island (GM, M. rugosa, M. tremuloides) species together, the Bayesian coalescence method of G-PhoCS (Gronau et al. 2011) was used to estimate the demographic histories from the genome-wide variations of a single representative individual from each Metrosideros group. Because the topological relationship between M. rugosa and M. tremuloides was not certain, we conducted G-PhoCS analysis assuming two topologies 1) where M. rugosa was outgroup to all species and 2) where M. rugosa and M. tremuloides were sisters (supplementary fig. S13, Supplementary Material online). Our previous analysis could not determine the precise direction of gene flow between populations (i.e., for the pairs GH1GH2, GH1–N, GH2–N, GH2GM, GH2–M. tremuloides, and N–M. tremuloides). Because of this, for the G-PhoCS analysis, we fitted bidirectional migration bands between lineages with significant evidence of admixture from the treemix and f4 tests (fig. 2). Results showed that parameter estimates for effective population size and migration rate were not affected by the underlying topology (supplementary fig. S13, Supplementary Material online). Effective population size was highest for GM followed by GH2. The confidence intervals of the parameter estimates for GH1 and M. rugosa overlapped, suggesting similar population sizes. Group N had the lowest effective population size. Migration bands largely recapitulated the evidence of gene flow detected from the admixture test results; however, the direction of gene flow was always from an older to a younger lineage (supplementary fig. S13, Supplementary Material online), which is a known bias of G-PhoCS (Gronau et al. 2011). Interestingly, the admixture between M. tremuloides and both N and GH2 detected through treemix and f4 test was not evident with G-PhoCS. Meanwhile, parameter estimates for N and GH2 fluctuated the most, reflecting the difficulty of estimating demographic parameters for recent lineages using G-PhoCS (Gronau et al. 2011). In the end, G-PhoCS was used primarily as a method to incorporate the effect of gene flow on divergence time estimation with two major aims: 1) to compare G-PhoCS-estimated divergence times to the δaδi-estimated divergence times, and 2) to infer the divergence times for other Metrosideros lineages that were not amenable to δaδi analysis. δaδi estimated Tdiv between GH1 and N can be converted to years (i.e., scaled divergence time, τdiv) with a mutation rate and generation time, both of which are unknown for Metrosideros. Hence, we estimated τdiv assuming a range of mutation rates (Arabidopsis mutation rate = 7 × 10−9; Ossowski et al. 2010, Prunus mutation rate = 9.5 × 10−9; Xie et al. 2016) and generation times that were biologically plausible (i.e., 10, 15, and 20 years per generation) for Metrosideros (fig. 3). τdiv varied the most at lower mutation rates, but within previously known plant mutation rates (>7 × 10−9 mutations per bp per generation) all estimated divergence times were within 1.25 My regardless of generation time. The G-PhoCS analysis indicated that regardless of the species topology used for the demographic modeling, the confidence intervals of Tdiv estimated for the common ancestor of the Hawaii Island group (N, GH1, and GH2) overlapped with the confidence intervals of Tdiv estimated for the common ancestor of all Metrosideros (supplementary fig. S13, Supplementary Material online). For the topology that assumed M. rugosa and M. tremuloides as sisters, Tdiv was significantly higher for the Hawaii Island group (4.43 × 10−4 [4.33 × 10−4–4.54 × 10−4]) compared with the Tdiv between M. rugosa and M. tremuloides (2.99 × 10−4 [2.75 × 10−4–3.23 × 10−4]). Tdiv values estimated using M. rugosa as the outgroup to all lineages showed no significant difference between the timing of the split that led to M. tremuloides (4.21 × 10−4 [4.11 × 10−4–4.32 × 10−4]) and the split that led to M. rugosa (4.23 × 10−4 [4.14 × 10−4–4.32 × 10−4]), which also overlapped the Tdiv estimated for the common ancestor of the Hawaii Island group (4.20 × 10−4 [4.09 × 10−4–4.30 × 10−4]). We note in both topologies the Tdiv estimated for the common ancestor of the Hawaii Island group had confidence intervals that almost overlapped each other, suggesting the topology does not affect this divergence time estimate. Tdiv was converted into years (τdiv) using the same approach as the δaδi analysis by using a range of mutation rate and generation times. Given known mutation rates in plants (>7 × 10−9 mutations per bp per generation), τdiv estimates for the Hawaii Island populations were within 1.5 My regardless of generation time (supplementary fig. S14, Supplementary Material online) and overlapped with the estimate from the δaδi analysis.

Genomic Islands of Divergence between GH1 and N

The genomic landscape of differentiation between GH1 and N was estimated through local genomic windows of FST values. FST was Z-transformed (zFST), and genomic regions with zFST >4 were considered significant outliers (Han et al. 2017) (fig. 4). Using a lower zFST threshold of 3 did not change the results (supplementary fig. S15A, Supplementary Material online). Genomic islands with increased levels of relative divergence (FST) were identified in 237 of 27,508 windows (merged into 147 nonoverlapping windows) for a total of 2.37 Mb. Compared with the genomic background, these islands also had significantly elevated levels of absolute divergence (Dxy) (fig. 4 Mann–Whitney U [MWU] P ≤1e−10). Between GH1 and GH2, the same genomic windows that had elevated levels of Dxy between GH1 and N were also elevated between GH1 and GH2, indicating that the genomic regions that formed islands of divergence between GH1 and N were also shared between GH2 and N. Between GH2 and N, however, Dxy in these genomic islands was significantly lower than that observed between GH1 and N (supplementary table S9, Supplementary Material online).
. 4.

Patterns of genome-wide divergence between GH1 and N. (A) Genome-wide zFst values calculated from 10 kb windows. Scaffolds were ordered from longest to shortest. (B) Absolute measures of divergence (Dxy) between genomic islands of divergence (zFST>4) and the genomic background (zFST<4). (C) Selective sweep test statistics for genomic islands of divergence and the genomic background. Significant differences are indicated with *<0.05, **<0.01, and ***<0.001.

Patterns of genome-wide divergence between GH1 and N. (A) Genome-wide zFst values calculated from 10 kb windows. Scaffolds were ordered from longest to shortest. (B) Absolute measures of divergence (Dxy) between genomic islands of divergence (zFST>4) and the genomic background (zFST<4). (C) Selective sweep test statistics for genomic islands of divergence and the genomic background. Significant differences are indicated with *<0.05, **<0.01, and ***<0.001. Levels of polymorphism were significantly lower in genomic islands relative to background in both GH1 and N (supplementary fig. S16, Supplementary Material online; MWU P≤1e−15), suggesting that the increased FST and Dxy were partly caused by those regions being frequent targets of selection. We measured haplotype homozygosity (H12) (Garud et al. 2015) and haplotype length (H) (Schlamp et al. 2016), and performed an LD-based test of selection (ωmax) (Kim and Nielsen 2004), and found that all selective sweep statistics were significantly elevated in genomic islands (fig. 4 MWU P≤1e−10) for both GH1 and N. N, however, had higher levels of selective sweep statistics compared with GH1. This result did not change whether we decreased the window size to 5 kb (supplementary fig. S15B, Supplementary Material online), or increased it to 50 kb (supplementary fig. S15C, Supplementary Material online). Because our analysis used a SNP data set that was imputed and phased, we also used the original unimputed/unphased SNP data and recalculated the selective sweep statistics to examine any potential artifacts arising from imputation. Results showed that the selective sweep statistics from the unimputed/unphased SNP data were lower than those from the imputed SNP set (supplementary fig. S15D, Supplementary Material online vs. fig. 4); however, the selective sweep statistics remained significantly elevated in the genomic islands. We note that recurrent sweeps that affect the same genomic region in both the common ancestor and the current population are expected to lead to a decrease in Dxy (Cruickshank and Hahn 2014). Our observations of increased Dxy and evidence of selection, in contrast, suggest that islands of divergence were shaped by positive selection on divergent ancestral haplotypes predating the split between GH1 and N (Nosil, Funk, et al. 2009). We then examined if the secondary contact between GH1 and/or N and outgroup species on older Hawaiian Islands (GM, M. rugosa, and M. tremuloides) had contributed to the formation of the genomic islands of divergence. If secondary gene flow was recurrent, the genomic islands of divergence identified between GH1 and N would also have elevated Dxy levels between GH1 or N and the relevant outgroup. However, no such effect was observed (supplementary table S9, Supplementary Material online), suggesting that sufficient levels of genome-wide divergence had eroded the genomic islands in allopatric comparisons (Feder et al. 2012). We then further examined the N group because of its extensive evidence of admixture with outgroup species, in particular M. tremuloides, and its incipient speciation status. We tested whether genomic islands had origins relating to M. tremuloides by estimating localized windows of f4 statistics for the topology [GH1, N; GM, tremuloides]. Because GH1 was least affected by admixture with any of the outgroups (fig. 2), an excess of positive f4 statistics across local windows would indicate increased admixture between N and M. tremuloides in the region. There was no significant difference in f4 statistics between the genomic islands of divergence and the genomic background. In addition, we calculated the genic proportion of each window to examine if the admixture had affected coding sequences more than noncoding regions, suggesting a role for selection maintaining the introgression. No significant correlation was seen between the proportion of gene sequence per window and either positive or negative f4 statistics (supplementary fig. S17, Supplementary Material online). These results, in the end, suggested that secondary contact with allopatric lineages did not contribute to the incipient speciation of N. Gene ontology (GO)-enrichment analysis was conducted for the 341 genes (supplementary table S10, Supplementary Material online) that overlapped with genomic islands of divergence. GO could be assigned to 147 genes, which were significantly enriched (hypergeometric test P < 0.05) for functions relating to two major categories: 1) DNA binding and transcriptional regulation (GO: 0003700, GO: 0140110, and GO: 0003677) and 2) acetyltransferase activity (GO: 0016747 and GO: 0016407).

Discussion

Newellii on Hawaii Island appears to be a case of early-stage incipient speciation in a highly dispersible and long-lived tree. Based on our evidence, we argue that divergence between newellii and glaberrima is consistent with a speciation-with-primary-gene-flow model, where barriers to gene flow are developing through disruptive selection. The incipient status of newellii notwithstanding, our results place this riparian tree in the small but growing number of cases of speciation occurring despite the homogenizing effects of gene flow. This pattern is consistent with the observation of differential adaptation of newellii to the mechanical stress of rushing water and high light (Ekar et al. 2019) and the greater shade tolerance of wet-forest-dominant glaberrima (Morrison and Stacy 2014) resulting from disruptive selection across the forest-riparian ecotone. Evidence of admixture was extensive among the sampled populations, occurring within and between different island lineages, and may be common throughout the species complex (Izuno et al. 2017). Our results are in line with the recent view of sympatric speciation in the genomic era, in which gene flow is almost ubiquitous between recently diverged sister species (Marques et al. 2019; Richards et al. 2019). Clarifying the relative roles of primary versus secondary gene flow during speciation, however, is critical for understanding the evolutionary processes underlying the divergence between sympatric sister lineages (Richards et al. 2019). For instance, our genomic analyses suggested a close genetic relationship between GH2 and N, yet one that did not result from divergence in sympatry. N and GH2 shared islands of divergence, but these regions were not elevated in Dxy between N and GH2. This result suggests that the shared islands of divergence between N and GH2 arose through recent shared selective sweeps (Cruickshank and Hahn 2014), excluding the possibility of primary gene flow (and subsequent divergent selection) underlying their formation. Given the close proximity of N and GH2, the appearance of locally adaptive alleles from N in GH2 may result from introgression between varieties where the forest-riparian ecotone is more diffuse and disruptive selection is less intense (Ekar et al. 2019). The recent admixture between GH2 and GM is also consistent with the formation of GH2 through secondary contact between a GM-like lineage and N. Our results also indicated gene flow between N and an allopatric lineage (M. tremuloides on Oahu). The genomic regions resulting from this secondary contact, however, were not enriched in the islands of divergence between GH1 and N. This result suggests that primary gene flow between GH1 and N, but not secondary gene flow between N and M. tremuloides, contributed to the formation of genomic islands of divergence. This interpretation requires some caution, however, since we have not functionally validated the genes that contributed to the divergence of N. It is possible that a small number of loci with large effects may have originated from the allopatric lineage (Richards and Martin 2017). On the other hand, the conflicting gene flow estimates from the f4 test and G-PhoCS modeling suggest that the admixture detected between N and M. tremuloides may be an artifact resulting from ancestral population structure (Durand et al. 2011; Martin, Davey, et al. 2015). The evidence of a potential rapid lineage splitting at the root of the Metrosideros samples examined in this study is consistent with this scenario. Lastly, while this secondary gene flow may have been neutral during speciation, it is possible that some of the introgressed regions were utilized during local adaptation unrelated to incipient speciation (Heliconius Genome Consortium 2012; Lamichhaney et al. 2015; Malinsky et al. 2015; Miao et al. 2016; Richards and Martin 2017; Enard and Petrov 2018). Hence, an important next step is to examine the evolutionary history of the candidate genes that were involved in divergent selection between GH1 and N to differentiate ecological speciation genes from other confounding evolutionary events. Based on the δaδi analysis, the divergence time between glaberrima and newellii was estimated to be between 347 and 695 ka (assuming a mutation rate of 7 × 10−9 bp per site per generation and generation time of between 10 and 20 years). This time frame coincides roughly with the geological age of Hawaii Island, which formed ∼500 ka (Clague 1996), and is consistent with an in situ origin of newellii. Our more conservative estimates of divergence time, however, extend back to ∼1.2 Ma and suggest that divergence between glaberrima and newellii may have begun on the next youngest island in the chain, Maui Nui, before the colonization of Hawaii Island. Although the site of origin of newellii remains unresolved, our results suggest that Hawaii-Island glaberrimanewellii may have derived from a hybrid swarm-like population (fig. 2). Harsh environments such as newly formed volcanic islands may promote colonization by hybrid progenies (Seehausen 2004) with transgressive traits that are not seen in either parent (Gross and Rieseberg 2005; Yakimowski and Rieseberg 2014). A denser sampling of M. polymorpha on the islands of Maui Nui may distinguish whether the Hawaii Island glaberrimanewellii complex has a deeper ancestry (i.e., hybridization of multiple progenitors) or a recent ancestry (i.e., deriving from a single ancestral background). Interestingly, the G-PhoCS estimate of divergence time for all Metrosideros in this study, which included lineages from Molokai and Oahu, was similar to the G-PhoCS- and δaδi-estimated divergence time for the Hawaii Island group. This result is consistent with an ancient rapid radiation of Hawaiian Metrosideros across the islands (Whitfield and Lockhart 2007). The phylogenetic trees (fig. 2 and supplementary fig. S5, Supplementary Material online) were consistent with this scenario in that the internal branch lengths were often much shorter than the terminal branch lengths. Based on the ecological differentiation and range of isolating barriers observed between taxa, the Hawaiian Metrosideros complex has been proposed as a case of incipient adaptive radiation in trees (Stacy et al. 2014), and our demographic modeling suggests that this process may have been very rapid. Population sampling of Metrosideros across the Hawaiian archipelago will be necessary to examine the extent of, and evolutionary mechanisms underlying, the rapid radiation. Genome-wide FST levels suggested that gene flow has largely homogenized the genomes of newellii and glaberrima, but we discovered regions resistant to gene flow based on elevated absolute and relative divergence, and these genomics islands of divergence were enriched for evidence of positive selection in both varieties. Given the contrasting ecological niches of newellii and glaberrima (Ekar et al. 2019), it seems likely that these islands of divergence are associated with ecological divergence between the varieties (Sicard et al. 2015; Pease et al. 2016). In addition to shared selective sweeps or divergent selection, there are other evolutionary factors that can cause elevated genetic differentiation between diverging populations that we were unable to explore given the available data. Factors such as variation in recombination or mutation rates (Ravinet et al. 2017), or lineage sorting of ancestral polymorphisms (Guerrero and Hahn 2017) have been implicated as affecting genome-wide patterns of divergence. Variation in recombination rates, in particular, have been associated with genome-wide patterns of divergence, where islands of divergence are commonly localized to regions of low recombination rates (Burri et al. 2015; Han et al. 2017; Ravinet et al. 2017). Whether this pattern is an artifact of a potential correlation between nucleotide divergence and recombination rate (Haddrill et al. 2007) or a result of a genetic architecture of ecological speciation that involves tight linkage of barrier loci (Yeaman 2013; Rafajlović et al. 2016) is unclear. Given that our LD-based selection statistics were elevated in islands of divergence for both GH1 and N, low recombination rates may partially explain the pattern of divergence in Hawaii Island M. polymorpha. The higher values of selective sweep statistics in N compared with GH1, however, support a role for selection (stronger in newellii) in the genomic divergence observed. The generation of a recombination map and sequencing parent–offspring trios would allow discrimination of the effects of recombination and mutation rate variation, and QTL mapping or GWAS would provide independent molecular evidence of the loci affected by divergent selection. Our results suggest that selection has been stronger or more frequent in the island-endemic newellii relative to the widespread glaberrima, consistent with isolation of newellii by adaptation to Hawaii Island’s harsh riparian environment. The genomic islands of divergence were enriched for genes with functions related to transcription factor activity, suggesting that divergence in the gene expression landscape underlies the emergence of this riparian specialist. Indeed, many “speciation genes” are known to have functions relating to DNA binding and transcriptional regulation (Presgraves 2010; Maheshwari and Barbash 2011), and global gene expression patterns of hybrids from reproductively isolated species are often misregulated (Mack and Nachman 2017). Future work is needed to understand the changes in genetic architecture that coincide with incipient speciation of newellii.

Materials and Methods

Sequencing data generated from this study were deposited under NCBI project ID PRJNA534153. Codes used in the analysis can be found from github (https://github.com/cjy8709/Hawaii_Metrosideros_polymorpha; last accessed November 10, 2019) and data generated from this study can be found in zenodo (https://doi.org/10.5281/zenodo.3475496; last accessed November 10, 2019).

Sample Genome Sequencing

Samples used in this study can be found in supplementary table S1, Supplementary Material online. DNA was extracted from leaf buds using Qiagen DNeasy Plant Mini Kit and 2 × 100 bp paired-end libraries were constructed using Nextera library kit. Libraries were sequenced on an Illumina HiSeq 2500 system. Sequencing reads were aligned to the reference Metrosideros polymorpha var. glaberrima genome sequence from Izuno et al. (2017). FASTQ reads were aligned using the program bwa-mem version 0.7.16a-r1181 (Li 2013), and duplicate reads from PCR during the library preparation step were removed using picard version 2.9.0 (http://broadinstitute.github.io/picard/). Alignment statistics were reported using GATK version 3.8–0 (https://software.broadinstitute.org/gatk/; last accessed November 10, 2019) and samtools version 1.9.

SNP Calling and Genotyping

Genotype calls were executed using GATK. Alignment BAM files were used by GATK HaplotypeCaller engine with the option “-ERC GVCF” to output variants as the genomic variant call format (gVCF). The gVCFs of each sample were merged together to conduct a multisample joint genotyping procedure using the GATK GenotypeGVCFs engine. Variants were divided into SNPs and INDELs using GATK SelectVariants engine, and variant filtration were conducted using GATK VariantFiltration enigne with GATK bestpractice hard filter guidelines. All SNPs located within 5 bp of an INDEL variant and any SNPs for which <80% of the samples were genotyped were removed.

Population Relationship Analysis

Initially, genetic relationships between and within the individuals were examined using genotype likelihoods under a complete probabilistic framework using ANGSD version 0.929 (Korneliussen et al. 2014) and ngsTools (Fumagalli et al. 2014). For all analyses, we required a minimum base and mapping quality score of 30, while requiring genotypes from a minimum of 75% of individuals per SNP site, a minimum of one-third of the average total sequencing depth across the population (270×) per site, and a maximum of three times the average total sequencing depth across the population (2,434×) per site. Our admixture analysis suggested evidence of admixture between the Oahu species M. rugosa and M. tremuloides and Hawaii Island M. polymorpha, suggesting that neither Oahu species would be an appropriate outgroup for downstream analysis of the Hawaii Island populations. Therefore, we implemented a folded-site-frequency spectrum. To minimize the effect of linkage on inferences of population relationships, we randomly pruned the polymorphic sites. Within each sliding window of size 10,000 bp, a single polymorphic site was randomly chosen, keeping a minimum distance of 5,000 bp between adjacent SNP sites. To examine the effect of window size on population relationships, we generated two data sets: 1) a sliding window size of 50,000 bp with a minimum distance of 25,000 bp between adjacent SNP sites, and 2) the entire set of SNP sites. Ancestry proportions (K) were estimated with NGSadmix (Skotte et al. 2013) using genotype likelihoods calculated from ANGSD. K was estimated from 2 to 7, and for each K the analysis was repeated 100 times to choose the run with the highest log-likelihood. Genotype posterior probabilities (GPP) were calculated with ANGSD for use in principal components analysis (PCA) and phylogeny reconstruction; GPP were translated to genetic distances between individuals using ngsCovar (Fumagalli et al. 2014) for PCA and NGSdist (Vieira et al. 2016) for genetic distances. Using the genetic distances, FastME ver. 2.1.5 (Lefort et al. 2015) was used to reconstruct a neighbor-joining tree, which was visualized using iTOL version 3.4.3 (http://itol.embl.de/; last accessed November 10 2019). Bootstrap resampling of the GPP was conducted using NGSdist generating a 1,000-bootstrap resampled data set. Each resampled data set was used to estimate genetic distances, which were then used to estimate the bootstrap confidence level for the phylogenetic tree. Genotype calls from Hawaii Island individuals plus the outgroups, M. rugosa and M. tremuloides, from Oahu were used to generate a neighbor-joining tree (supplementary fig. S5, Supplementary Material online). Genetic distance between two individuals (X and Y) was estimated using the Kronecker delta function-based equation (Freedman et al. 2014): where L is the number of sites, a and b are the two allele copies in sample X, c and d are the two allele copies in sample Y, and δ is the Kronecker delta function (equals one if allele j is identical to allele k and 0 otherwise). From the genetic distance matrix, FastME was used to build the neighbor-joining tree.

Tests for Admixture

The local genomic topological relationship was examined using the topological weighting procedure implemented in twisst (Martin and Van Belleghem 2017) (https://github.com/simonhmartin/twisst; last accessed November 10, 2019). Initially, local phylogenetic trees were estimated in sliding windows with a size of 100 polymorphic sites using RAxML with the GTRCAT substitution model. We used the script raxml_sliding_windows.py from the genomics_general package by Simon Martin (https://github.com/simonhmartin/genomics_general/tree/master/phylo; last accessed November 10, 2019), a package that has been used in several related studies (Richards and Martin 2017; Martin et al. 2019; Stankowski et al. 2019; Tian et al. 2019). We then used the “complete” option of twisst to calculate the exact weighting of each local window by considering all subtrees. Treemix version 1.13 (Pickrell and Pritchard 2012) was used to fit migration edges on a bifurcating tree. We thinned the polymorphism data by randomly sampling a SNP every 1 kb using plink version 2.0 to minimize the effects of LD between SNPs. The four-population test (Reich et al. 2009) was conducted using the fourpop program from the Treemix package. Because the SNPs were thinned to reduce the LD, each SNP was used as a “block” to conduct a block jack-knife procedure to calculate the SE, which was used to calculate the Z-score for significance testing.

Demography Modeling with δaδi

Demography models were tested using the diffusion approximation method of δaδi (Gutenkunst et al. 2009). A visual representation of the 27 demographic models we tested can be found in supplementary figures S10 and S12, Supplementary Material online. Because we lacked an appropriate outgroup (i.e., a population with no evidence of gene flow with any ingroup), folded 2D site frequency spectra (2D-SFS) were analyzed. 2D-SFS were generated using two different approaches: 1) using ANGSD to calculate the likelihood of the sample allele frequency followed by realSFS from the ANGSD package to calculate the expected number of sites within a given allele frequency based on that likelihood, and 2) using the genotype calls to calculate the allele frequency. With ANGSD the random polymorphic sites that were chosen in the “Population relationship analysis” were used to calculate the site allele frequency. With the genotype call data set, we randomly sampled a SNP every 10 kb using plink and used this thinned variant call data set. The script easySFS.py (https://github.com/isaacovercast/easySFS; last accessed November 10, 2019) was then used to generate the site frequency spectrum and choose the best sample size to project down and maximize the number of segregating sites to be analyzed. Optimization of the model parameters was performed through four rounds of randomly perturbing the parameter values using the Nelder-Mead method. In the first round, random starting parameters were 3-fold perturbed for a total of 10 replicates with a maximum of three iterations. Each optimized parameter was then used to simulate the 2D-SFS, and a multinomial approach was used to compare and estimate the log-likelihood of the observed 2D-SFS. Best-scoring likelihood was used in the second round for 20 replicates with parameters perturbed 2-fold and maximum iterations of 5. The best likelihood model of the previous round was used in a third round for 30 replicates with parameters perturbed 2-fold and maximum iterations of 10. In the final round the best likelihood model of the third round was used for 40 replicates, with parameters perturbed 1-fold, and maximum iterations of 15. Parameters from the round with the highest likelihood were selected to represent each demographic model. Akaike Information Criteria (AIC) values were used to compare demography models. The three best-fitting models were further analyzed with increased replicates and maximum iteration values. Specifically, with increasing rounds the replicates were 20, 40, 50, and 70; with maximum iterations of 10, 10, 25, and 50; and 3-, 2-, 2-, and 1-fold parameter perturbations. The demography model with the lowest AIC was chosen as the best-fitting model. Because ANGSD-based and genotype call-based site frequency spectra gave largely concordant results, all further analysis used only the ANGSD-based site frequency spectrum. Confidence of the best-fitting demography model parameter values was obtained through bootstrapping the site frequency spectrum and re-estimating the demography parameter values using the bootstrap data. The bootstrap site frequency spectrum was obtained using the realSFS program and randomly selected polymorphic sites by chromosome. Divergence time between GH1 and N was a main parameter of interest, and the unscaled divergence times, Tdiv, were converted into years using the following equation: where Nanc is the ancestral population size and g is the generation time in years. Nanc is unknown but can be inferred from the effective mutation rate θ (4 NancμL, where μ is the mutation rate and L is the total length of the sequenced region that was examined to analyze the SNPs), which was calculated by δaδi in the optimal demographic model. Here, the previous equation can be rewritten as: For L, ANGSD had analyzed 280,279,511 positions to detect polymorphic sites in 2,261,660 positions. But because we pruned the sites to 28,742 positions L was calculated as 280,279,511 × (28,742/2,261,660). Because μ and g for M. polymorpha are unknown, various biologically plausible values (see fig. 3 for the values) were used to estimate divergence time.

Demographic Modeling with G-PhoCS

To estimate the parameters of population size, divergence time, and migration rate across the sampled populations, we used the software Generalized Phylogenetic Coalescent Sampler (G-PhoCS) ver 1.2.3 (Gronau et al. 2011). Because G-PhoCS uses the genome-wide variation of a single individual, we selected one representative individual from each population, specifically selecting samples with genome coverage comparable to that for M. rugosa (22×) and M. tremuloides (18×). The samples selected were: H198 (21×) from GH1, E111 (19×) from GH2, X36 (23×) from GM, and H269 (19×) from N. Following previous studies that used G-PhoCS (Gronau et al. 2011; Freedman et al. 2014; McManus et al. 2015; Choi et al. 2017; Poelstra et al. 2018), we analyzed 1-kb sized loci that were close to neutrality. Neutral loci were determined by looking for genomic regions that were 5 kb away from a genic sequence and 500 bp away from a repetitive DNA sequence. Loci that were at least 10 kb away from each other were selected to minimize the effect of linkage disequilibrium between the neutral loci. For every demographic scenario, we ran five replicates to ensure convergence. Each MCMC run had 1,000,000 iterations, and 50% of the iterations were discarded as burn-in runs. Priors were modeled using a gamma distribution (α = 1 and β = 10,000 for population size and divergence time; α = 0.002 and β = 0.00001 for migration rates). Migration bands were fitted for lineage pairs that showed evidence of admixture in the treemix and f4 test results. Specifically, we fitted migration bands for the pairs GH1GH2, GH1–N, GH2–N, GH2GM, GH2–M. tremuloides, and N–M. tremuloides. We used the program Tracer version 1.6 (http://tree.bio.ed.ac.uk/software/tracer/; last accessed November 10, 2019) to determine the burn-in cutoff and 95% highest posterior density for each parameter. The unscaled divergence times were converted to years using various biologically plausible values of μ and g using the equation:

Population Genetic Analysis

The gVCFs from the SNP-calling step were used again for the GATK GenotypeGVCFs engine but to call genotypes for all sites including the nonvariant positions (option “–includeNonVariantSites”). This was done in order to correctly infer the number of variable and invariant sites for downstream calculation of window-based population genetics statistics. The genomics_general package was used to calculate θ, Dxy, and FST in 10 kb sliding windows and 5 kb increments. Within each window, sites were required to have a minimum quality score of 30 and a minimum depth of 5×. Only windows with a minimum of 3 kb of sites that passed the quality filter were further analyzed. For the calculation of sliding windows of f4 statistics, we used the script from Richards and Martin (2017) and analyzed 10 kb windows with a minimum of 50 polymorphic sites. To check for a possible window-size effect, we also analyzed two additional window sizes: 1) 5 kb windows with a minimum of five polymorphic sites and 2) 50 kb windows with a minimum of 250 polymorphic sites. Using the genotype calls, plink was used to calculate linkage disequilibrium (LD). LD was measured in squared correlations (r2) between polymorphic sites that were <3 kb apart. Evidence of selective sweeps was detected using haplotype-based or LD-based methods (Kim and Nielsen 2004; Garud et al. 2015; Schlamp et al. 2016). These approaches were taken because of the lack of an appropriate outgroup (i.e., a population with no evidence of gene flow with any ingroup). As such, it was not possible to polarize the variants and infer the high-frequency alleles, which are necessary for many allele frequency-based methods of detecting selective sweeps. The focal populations GH1 and N were phased and imputed using the program beagle ver. 5.0 (Browning and Browning 2016). Sweeps were detected using the haplotype homozygosity (H12), haplotype length (H), and LD-based tests of selection (ωmax). For the calculation of ωmax statistics, the total number of grids varied across scaffolds (Pavlidis et al. 2010). Grids were set so that ωmax would be calculated every 5,000 bp, and for each grid the minimum and maximum window size was set at 5,000 and 100,000 bp, respectively. The H12 statistic was calculated by setting the analysis window size to 100 SNPs and sliding the analysis window by 10 SNPs. The H statistics were estimated for every SNP position. For any given window, statistics were averaged across all haplotype-based tests of selection to represent the selective sweep value for that window.

Gene Ontology Enrichment

Coding sequences of each gene model were assigned a computationally predicted function and gene ontology using the eggnog pipeline (Huerta-Cepas et al. 2017). An ontology could be assigned to 12,448 genes out of the total 37,956 gene models. We required an ontology to have more than one gene group member for further consideration. Gene ontology enrichment was tested through a hypergeometric test.

Multiple Testing Corrections

All statistical tests with a P value (Mann–Whitney U test, Pearson’s correlation, f4 test, jack-knife bootstrap P value, and hypergeometric test of gene ontology enrichment) were pooled together and corrected for multiple hypotheses testing using the Benjamini and Hochberg (Benjamini and Hochberg 1995) correction method. Click here for additional data file.
  110 in total

Review 1.  Divergence hitchhiking and the spread of genomic isolation during ecological speciation-with-gene-flow.

Authors:  Sara Via
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2012-02-05       Impact factor: 6.237

Review 2.  What, if anything, is sympatric speciation?

Authors:  B M Fitzpatrick; J A Fordyce; S Gavrilets
Journal:  J Evol Biol       Date:  2008-09-18       Impact factor: 2.411

Review 3.  Review. The strength and genetic basis of reproductive isolating barriers in flowering plants.

Authors:  David B Lowry; Jennifer L Modliszewski; Kevin M Wright; Carrie A Wu; John H Willis
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2008-09-27       Impact factor: 6.237

4.  Testing for ancient admixture between closely related populations.

Authors:  Eric Y Durand; Nick Patterson; David Reich; Montgomery Slatkin
Journal:  Mol Biol Evol       Date:  2011-02-15       Impact factor: 16.240

Review 5.  A Combinatorial View on Speciation and Adaptive Radiation.

Authors:  David A Marques; Joana I Meier; Ole Seehausen
Journal:  Trends Ecol Evol       Date:  2019-03-15       Impact factor: 17.712

Review 6.  Making sense of genomic islands of differentiation in light of speciation.

Authors:  Jochen B W Wolf; Hans Ellegren
Journal:  Nat Rev Genet       Date:  2016-11-14       Impact factor: 53.242

7.  Strong extrinsic reproductive isolation between parapatric populations of an Australian groundsel.

Authors:  Maria C Melo; Alicia Grealy; Beth Brittain; Greg M Walter; Daniel Ortiz-Barrientos
Journal:  New Phytol       Date:  2014-03-28       Impact factor: 10.151

8.  Evolution of Darwin's finches and their beaks revealed by genome sequencing.

Authors:  Sangeet Lamichhaney; Jonas Berglund; Markus Sällman Almén; Khurram Maqbool; Manfred Grabherr; Alvaro Martinez-Barrio; Marta Promerová; Carl-Johan Rubin; Chao Wang; Neda Zamani; B Rosemary Grant; Peter R Grant; Matthew T Webster; Leif Andersson
Journal:  Nature       Date:  2015-02-11       Impact factor: 49.962

9.  Butterfly genome reveals promiscuous exchange of mimicry adaptations among species.

Authors: 
Journal:  Nature       Date:  2012-07-05       Impact factor: 49.962

10.  Divergent sorting of a balanced ancestral polymorphism underlies the establishment of gene-flow barriers in Capsella.

Authors:  Adrien Sicard; Christian Kappel; Emily B Josephs; Young Wha Lee; Cindy Marona; John R Stinchcombe; Stephen I Wright; Michael Lenhard
Journal:  Nat Commun       Date:  2015-08-13       Impact factor: 14.919

View more
  8 in total

1.  Patterns of genomic divergence in sympatric and allopatric speciation of three Mihoutao (Actinidia) species.

Authors:  Yongbo Liu; Wenhao Yu; Baofeng Wu; Junsheng Li
Journal:  Hortic Res       Date:  2022-03-03       Impact factor: 7.291

2.  Admixture may be extensive among hyperdominant Amazon rainforest tree species.

Authors:  Drew A Larson; Oscar M Vargas; Alberto Vicentini; Christopher W Dick
Journal:  New Phytol       Date:  2021-09-12       Impact factor: 10.323

3.  Population genomics reveal rapid genetic differentiation in a recently invasive population of Rattus norvegicus.

Authors:  Yi Chen; Lei Zhao; Huajing Teng; Chengmin Shi; Quansheng Liu; Jianxu Zhang; Yaohua Zhang
Journal:  Front Zool       Date:  2021-01-26       Impact factor: 3.172

4.  Genomic Adaptations to Salinity Resist Gene Flow in the Evolution of Floridian Watersnakes.

Authors:  Rhett M Rautsaw; Tristan D Schramer; Rachel Acuña; Lindsay N Arick; Mark DiMeo; Kathryn P Mercier; Michael Schrum; Andrew J Mason; Mark J Margres; Jason L Strickland; Christopher L Parkinson
Journal:  Mol Biol Evol       Date:  2021-03-09       Impact factor: 16.240

5.  Ancestral polymorphisms shape the adaptive radiation of Metrosideros across the Hawaiian Islands.

Authors:  Jae Young Choi; Xiaoguang Dai; Ornob Alam; Julie Z Peng; Priyesh Rughani; Scott Hickey; Eoghan Harrington; Sissel Juul; Julien F Ayroles; Michael D Purugganan; Elizabeth A Stacy
Journal:  Proc Natl Acad Sci U S A       Date:  2021-09-14       Impact factor: 11.205

6.  Hybrid zone of a tree in a Cerrado/Atlantic Forest ecotone as a hotspot of genetic diversity and conservation.

Authors:  André Carneiro Muniz; Ricardo José Gonzaga Pimenta; Mariana Vargas Cruz; Jacqueline Gomes Rodrigues; Renata Santiago de Oliveira Buzatti; Myriam Heuertz; José P Lemos-Filho; Maria Bernadete Lovato
Journal:  Ecol Evol       Date:  2022-01-22       Impact factor: 2.912

Review 7.  Cell types as species: Exploring a metaphor.

Authors:  Jeff J Doyle
Journal:  Front Plant Sci       Date:  2022-08-22       Impact factor: 6.627

8.  Demography and selection analysis of the incipient adaptive radiation of a Hawaiian woody species.

Authors:  Ayako Izuno; Yusuke Onoda; Gaku Amada; Keito Kobayashi; Mana Mukai; Yuji Isagi; Kentaro K Shimizu
Journal:  PLoS Genet       Date:  2022-01-21       Impact factor: 5.917

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.