Sonal Singhal1, Guarino R Colli2, Maggie R Grundler3,4, Gabriel C Costa5, Ivan Prates6,7, Daniel L Rabosky8,7. 1. Department of Biology, California State University, Dominguez Hills, Carson, CA 90747; sonal.singhal1@gmail.com drabosky@umich.edu. 2. Departamento de Zoologia, Universidade de Brasília, Brasília, Distrito Federal 70910-900, Brazil. 3. Department of Environmental Science, Policy, & Management, University of California, Berkeley, CA 94720. 4. Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720. 5. Department of Biology and Environmental Sciences, Auburn University at Montgomery, Montgomery, AL 36117. 6. Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109. 7. Museum of Zoology, University of Michigan, Ann Arbor, MI 48109. 8. Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109; sonal.singhal1@gmail.com drabosky@umich.edu.
Abstract
Rates of species formation vary widely across the tree of life and contribute to massive disparities in species richness among clades. This variation can emerge from differences in metapopulation-level processes that affect the rates at which lineages diverge, persist, and evolve reproductive barriers and ecological differentiation. For example, populations that evolve reproductive barriers quickly should form new species at faster rates than populations that acquire reproductive barriers more slowly. This expectation implicitly links microevolutionary processes (the evolution of populations) and macroevolutionary patterns (the profound disparity in speciation rate across taxa). Here, leveraging extensive field sampling from the Neotropical Cerrado biome in a biogeographically controlled natural experiment, we test the role of an important microevolutionary process-the propensity for population isolation-as a control on speciation rate in lizards and snakes. By quantifying population genomic structure across a set of codistributed taxa with extensive and phylogenetically independent variation in speciation rate, we show that broad-scale patterns of species formation are decoupled from demographic and genetic processes that promote the formation of population isolates. Population isolation is likely a critical stage of speciation for many taxa, but our results suggest that interspecific variability in the propensity for isolation has little influence on speciation rates. These results suggest that other stages of speciation-including the rate at which reproductive barriers evolve and the extent to which newly formed populations persist-are likely to play a larger role than population isolation in controlling speciation rate variation in squamates.
Rates of species formation vary widely across the tree of life and contribute to massive disparities in species richness among clades. This variation can emerge from differences in metapopulation-level processes that affect the rates at which lineages diverge, persist, and evolve reproductive barriers and ecological differentiation. For example, populations that evolve reproductive barriers quickly should form new species at faster rates than populations that acquire reproductive barriers more slowly. This expectation implicitly links microevolutionary processes (the evolution of populations) and macroevolutionary patterns (the profound disparity in speciation rate across taxa). Here, leveraging extensive field sampling from the Neotropical Cerrado biome in a biogeographically controlled natural experiment, we test the role of an important microevolutionary process-the propensity for population isolation-as a control on speciation rate in lizards and snakes. By quantifying population genomic structure across a set of codistributed taxa with extensive and phylogenetically independent variation in speciation rate, we show that broad-scale patterns of species formation are decoupled from demographic and genetic processes that promote the formation of population isolates. Population isolation is likely a critical stage of speciation for many taxa, but our results suggest that interspecific variability in the propensity for isolation has little influence on speciation rates. These results suggest that other stages of speciation-including the rate at which reproductive barriers evolve and the extent to which newly formed populations persist-are likely to play a larger role than population isolation in controlling speciation rate variation in squamates.
Speciation rates vary widely across the tree of life, and this variation contributes to profound disparities in species richness among clades, across regions, and through time (1). Despite the importance of speciation rates for broad-scale diversity patterns, only recently have we been able to quantify and compare these rates across taxa. Ernst Mayr, who laid much of the conceptual foundation of modern speciation research, noted in his landmark book Animal Species and Evolution that “[t]here is perhaps no other aspect of speciation of which we know so little as its rate. We shall probably never have very accurate information on this phenomenon” (ref. 2, p. 575). Yet, today, our understanding of speciation rate variation has been fundamentally transformed by the ascendance of molecular-based phylogenetic inference. The availability of time-calibrated phylogenetic trees (3, 4), in concert with analytical models of macroevolution (5), has demonstrated wide variation in the rate of speciation, even among closely related groups of organisms (Fig. 1 and ) (6–11). In mammals, for example, the fastest speciating lineages (Rattus rats: 1.12 species per million years [My]; per lineage rates) are estimated to form new species nearly 100-fold faster than the slowest speciating lineages [aardvarks: 0.013 species per My (12)].
Fig. 1.
Speciation rate variation across major vertebrate clades. Speciation rates are estimated using the DR statistic (DR) (5, 6); data are from Cooney and Thomas (11). Speciation rate distributions are highly skewed and are thus presented on a log10 scale. Within each group, the slowest and fastest 2.5% of speciation rates differ anywhere from 14-fold to 46-fold. The sources of this variation remain largely unexplained. Speciation rates estimated using a model-based approach to speciation rate estimation [BAMM (136)] are presented in .
Speciation rate variation across major vertebrate clades. Speciation rates are estimated using the DR statistic (DR) (5, 6); data are from Cooney and Thomas (11). Speciation rate distributions are highly skewed and are thus presented on a log10 scale. Within each group, the slowest and fastest 2.5% of speciation rates differ anywhere from 14-fold to 46-fold. The sources of this variation remain largely unexplained. Speciation rates estimated using a model-based approach to speciation rate estimation [BAMM (136)] are presented in .To understand why speciation rate varies, we can deconstruct the speciation process into population-level factors that control, at least in principle, the observed rate of new lineage origination at the macroevolutionary scale (2, 13, 14). “Successful” speciation typically requires that new populations become geographically isolated (15–17) and that reproductive barriers evolve between the populations. Finally, these nascent species must persist while they acquire reproductive and potentially ecological isolation from other populations. Any factor modifying the rate of these processes can potentially influence speciation rate (14, 18, 19). For example, species with traits leading to high dispersal (e.g., marine organisms with long larval periods) are predicted to maintain panmictic populations connected by gene flow across large areas in the face of biogeographic processes that might otherwise fragment species with lower capacity for cohesion (20–22). If the formation of population isolates is a rate-limiting step for speciation, we would then predict that species with high dispersal should exhibit reduced speciation rates compared to species with low dispersal. Similarly, biogeographic and climatic factors—including the dynamic nature of some high-elevation and high-latitude landscapes—can facilitate increased speciation rates by promoting population isolation in newly colonized geographic areas (23–25). This framework thus links the microevolutionary processes leading to species formation with macroevolutionary patterns of species diversification (26), including the speciation rates we typically estimate from phylogenetic trees or the fossil record (Fig. 1).Directly estimating the extent to which variation in population processes affects speciation requires comparable datasets on the population processes themselves (26). These must be collected across a broad sample of taxa characterized by phylogenetically independent variation in both evolutionary rates and the focal population processes. Collecting such datasets is a daunting task for most groups of organisms (27–29). For this reason, studies often use organismal traits as proxies for population processes, implicitly assuming that these proxies capture lineage-specific propensities toward reproductive isolation or population splitting (30, 31). For example, avian wing morphology has been used as a proxy for dispersal and population isolation (32, 33), and sexual dimorphism has been used as a proxy for the strength of sexual selection and thus the rate at which reproductive barriers evolve (34, 35). Results from this trait-based approach have generally been mixed. Traits often exhibit inconsistent associations with speciation across clades (36), and, even for a given clade, purported relationships vary when challenged with alternate methodologies or different datasets (37, 38). This inconsistency may partially reflect the differential effects of specific organismal traits upon diversification across clades. For example, the evolution of reproductive barriers might be a rate-limiting step in one clade but not another. However, this inconsistency might also indicate that these organismal traits are imperfect proxies for the population-level processes they are trying to capture.Here, we apply a direct approach to determining the factors that predict speciation rate. We measure population isolation across multiple species and then test whether population isolation predicts speciation rate, without relying on organismal traits as proxies for isolation. The formation of geographically isolated populations is an initial and essential step in most speciation events (15–17), and population structure has long been recognized as a potential source of incipient species, across analyses of phylogeographic structure (39, 40), studies of ecotypes and subspecies (41–43), and explorations of species richness in the fossil record (22, 44). In this study, we focus on the macroevolutionary consequences of population isolation in the lizards and snakes from a single geographic region: the Neotropical Cerrado, South America’s second-largest biome (45). The Cerrado savannas harbor one of the world's most diverse squamate reptile communities (46, 47), enabling us to sample phylogenetically distinct lineages that span nearly the complete range of speciation rates as measured from fully sampled squamate phylogenies (Fig. 2). Because most of these species have large ranges that span the same geographic area (Fig. 2 and ), they likely experienced similar biogeographic histories. Thus, our analyses partially control for the effect of extrinsic factors like biogeographic changes and environmental conditions on speciation rates.
Fig. 2.
Speciation rates, geographic sampling, and genomic data coverage for focal lizard and snake taxa. (A) Speciation rate (ClaDS) across all squamates for the Tonini et al. (50) phylogeny. Major squamate clades are labeled at nodes: Gekkota (G), Scincoidea (Sc), Lacertoidea (L), Iguania (I), and Serpentes (S). Phylogeny shown for a random 50% subsample of lineages to ease visualization. Focal taxa included in this study are labeled with circles at the tips, with fill color indicating tip-averaged speciation rate. A subset of tips is represented by photos of taxa; photos are not to scale (photos by R. Recoder, G.R.C., and I.P.). (Inset) Rank order plot with speciation rates of the focal taxa relative to the distribution of speciation rates of all squamate taxa. (B) Map of South America with sampling sites shown in white. (C) Concatenated phylogeny of 4,796 loci and 375 individuals; colored tip groupings denote the 59 putative species-level taxa studied here. (D) Summary of genetic data for the 375 individuals included in this study. (E) Landscape images of the Cerrado, the grassland biome on which field sampling was centered (image credit for Left photo: D.A.A. Souza [photographer]; Center and Right photos, G.R.C.). This study’s focal taxa span a wide phylogenetic distribution and encompass almost the full range of speciation rates seen in squamates.
Speciation rates, geographic sampling, and genomic data coverage for focal lizard and snake taxa. (A) Speciation rate (ClaDS) across all squamates for the Tonini et al. (50) phylogeny. Major squamate clades are labeled at nodes: Gekkota (G), Scincoidea (Sc), Lacertoidea (L), Iguania (I), and Serpentes (S). Phylogeny shown for a random 50% subsample of lineages to ease visualization. Focal taxa included in this study are labeled with circles at the tips, with fill color indicating tip-averaged speciation rate. A subset of tips is represented by photos of taxa; photos are not to scale (photos by R. Recoder, G.R.C., and I.P.). (Inset) Rank order plot with speciation rates of the focal taxa relative to the distribution of speciation rates of all squamate taxa. (B) Map of South America with sampling sites shown in white. (C) Concatenated phylogeny of 4,796 loci and 375 individuals; colored tip groupings denote the 59 putative species-level taxa studied here. (D) Summary of genetic data for the 375 individuals included in this study. (E) Landscape images of the Cerrado, the grassland biome on which field sampling was centered (image credit for Left photo: D.A.A. Souza [photographer]; Center and Right photos, G.R.C.). This study’s focal taxa span a wide phylogenetic distribution and encompass almost the full range of speciation rates seen in squamates.We characterized population isolation across 66 snake and lizard taxa from vouchered samples collected over 20+ y of fieldwork in the Cerrado, generating a genome-wide perspective on intraspecific variation (mean: 5,019 loci per individual) from 398 individuals. We then measured how levels of genetic differentiation accumulate across geographic space (e.g., isolation-by-distance [IBD]). The slope of this IBD relationship reflects both population density of demes and the extent of gene flow among them (48) and serves as a reasonably direct measure of species cohesion, which we define here as the capacity of species’ geographic ranges to resist allopatric fragmentation into new population isolates. We then tested a foundational hypothesis in speciation research— that the formation of geographically isolated populations is often an initial and essential step in speciation—by determining whether an increased propensity for population isolation correlates with higher speciation rates.
Results
We collected an average of 3.2 Mb across 5,019 nuclear loci and 3.8 kb of the mitochondrial genome for the 398 individuals in this study (Fig. 2 and ). We deliberately sampled geographically unique taxon–locality combinations to maximize spatial coverage, and thus our dataset spans 386 distinct localities. Using these high-coverage and high-quality data, we first delimited species-level taxa (operational taxonomic unit [OTU]; see Methods) to ensure our analyses were not confounded by cryptic diversity or taxonomic uncertainty. Fifty-one of our 66 nominal species (76%) directly corresponded to an OTU; for the purposes of analysis, we revised 15 nominal species in total. Our final dataset consisted of 59 provisional OTUs spanning 375 individuals (Fig. 2 and ).For each OTU, we used an average of 28,500 variant sites to estimate the slope at which genetic isolation (FST) accumulates across space (βIBD; Fig. 3). On average, geographic distance explained 31% of the variation in FST across individuals within OTUs (). OTUs showed notable differences in βIBD, with values ranging from −0.41 (indicating low levels of differentiation across space) to 2.48 (indicating steep turnover of genetic variation across space; Figs. 3 and 4 and ). βIBD exhibited moderate phylogenetic signal ( = 0.24, P = 0.009; Fig. 4), suggesting that the variation in βIBD across taxa is phylogenetically conserved. βIBD is expected to vary as a function of both population density of demes and levels of migration between them (48). Accordingly, we identified organismal traits that might serve as proxies for density or migration and tested whether they could predict βIBD. We find that βIBD is correlated with a measure of body elongation (phylogenetic generalized least squares [PGLS], P = 0.004; Fig. 5 and and Table S2) that likely affects both population density and migration. Animals that were more elongated and had a more snake-like form (e.g., snakes and legless lizards) had lower levels of differentiation across space than animals with more lizard-like forms. Finally, although the genetic loci used in this study are likely under purifying selection (49), estimates of genetic diversity and differentiation across these markers and putatively neutrally evolving loci were highly concordant (r = 0.81 to 0.97; ). These results suggest our estimates of βIBD are likely robust to the evolutionary history of the loci used for inference.
Fig. 3.
Illustrative IBD slopes (βIBD) generated for two species. These taxa exhibit strongly contrasting rates of population isolation: After controlling for geographic distance, the viper Bothrops moojeni (n = 10, Top) has a low rate of divergence between populations, whereas the gymnophthalmid lizard Micrablepharus atticolus (n = 6, Bottom) shows a high rate. Maps illustrate geographic ranges (see for information on reconstruction) for each taxon along with sampling points. Scatter plots show how genetic differentiation (as measured by FST) accumulates across space. shows IBD plots across all taxa.
Fig. 4.
Correlation between population isolation (βIBD) and speciation rates (ClaDS) across squamate reptiles. (A) βIBD exhibits phylogenetic signal, as might be expected if it tracks intrinsic organismal traits such as dispersal capacity. Snakes (node “S”) are nested within squamates; all other lineages are traditionally known as “lizards.” (B) βIBD is not significantly correlated with speciation rate (PGLS r = 0.04, P = 0.77), and snakes and lizards show little overlap in the bivariate space defined by βIBD and speciation rate.
Fig. 5.
Speciation rate and population structure for taxa with lizard-like and snake-like body forms. More elongated animals (higher body elongation values) have lower rates of population isolation, but most of this variation is captured by the evolutionary split between snakes and “lizards.” Body elongation was computed as the ratio of length to diameter, treating each animal as an idealized cylinder. The focal lizards in this study (n = 24) exhibit higher levels of population isolation than the focal snakes (n = 35). Despite exhibiting higher levels of population isolation, lizards have overall lower speciation rates compared to snakes, contrary to our central hypothesis.
Illustrative IBD slopes (βIBD) generated for two species. These taxa exhibit strongly contrasting rates of population isolation: After controlling for geographic distance, the viper Bothrops moojeni (n = 10, Top) has a low rate of divergence between populations, whereas the gymnophthalmid lizard Micrablepharus atticolus (n = 6, Bottom) shows a high rate. Maps illustrate geographic ranges (see for information on reconstruction) for each taxon along with sampling points. Scatter plots show how genetic differentiation (as measured by FST) accumulates across space. shows IBD plots across all taxa.Correlation between population isolation (βIBD) and speciation rates (ClaDS) across squamate reptiles. (A) βIBD exhibits phylogenetic signal, as might be expected if it tracks intrinsic organismal traits such as dispersal capacity. Snakes (node “S”) are nested within squamates; all other lineages are traditionally known as “lizards.” (B) βIBD is not significantly correlated with speciation rate (PGLS r = 0.04, P = 0.77), and snakes and lizards show little overlap in the bivariate space defined by βIBD and speciation rate.Speciation rate and population structure for taxa with lizard-like and snake-like body forms. More elongated animals (higher body elongation values) have lower rates of population isolation, but most of this variation is captured by the evolutionary split between snakes and “lizards.” Body elongation was computed as the ratio of length to diameter, treating each animal as an idealized cylinder. The focal lizards in this study (n = 24) exhibit higher levels of population isolation than the focal snakes (n = 35). Despite exhibiting higher levels of population isolation, lizards have overall lower speciation rates compared to snakes, contrary to our central hypothesis.Using the most comprehensive squamate phylogeny available to date (50), we estimated speciation rates using both a model-based [CLaDS; (8, 51)] and a semiparametric approach [DR; (6)]. These estimates of speciation rate are highly correlated both across all squamates (n = 9,755, r = 0.87) and across our focal taxa (n = 59, r = 0.87; ). Given this high concordance, all subsequent results focus on model-based estimates of speciation rate. Speciation rates across all squamates ranged from 0.018 to 0.424 species per My, with the highest rates occurring in snakes (Fig. 2). Speciation rates in our focal taxa ranged from 0.027 to 0.280 species per My, encompassing nearly the full variation seen in squamates (Fig. 2).We tested the prediction that OTUs with greater rates of population differentiation would show greater speciation rates, as would be expected if population differentiation is a rate-limiting step in speciation. We found no such pattern (PGLS r = 0.04, P value = 0.77; Fig. 4). Rather, we recovered some evidence for an inverse relationship, which can be seen when comparing βIBD and rates of speciation in snakes versus lizards (Fig. 5). Snakes have high speciation rates but low βIBD, opposite to our predictions.We confirmed that these patterns are robust to possible methodological and technical artifacts, by conducting a series of analyses in which we accounted for limited sampling, measurement error, alternate measures of genetic differentiation across space, possible taxonomic error, and the effects of biogeography. Across all these analyses, we recover the same result: no evidence for a relationship between βIBD and speciation rate (). Most importantly, restricting our analyses to only those OTUs with significant βIBD relationships (n = 29) did not change these results and, in fact, yielded a nonsignificant but negative correlation between βIBD and speciation rate (r = −0.004, P = 0.98; ). Alternate measures of genetic differentiation (e.g., d across geographic space) are only weakly correlated with each other (), and, across all these robustness analyses, our finding that population structure and speciation rates are decoupled holds (). Further, although we sampled relatively few individuals for some OTUs, both subsampling and simulations show that few individuals can effectively recover βIBD similar to that seen with larger datasets (). Finally, we conducted a power analysis to estimate our power to identify a correlation between isolation and speciation rates, if one exists. We found the power to detect a true correlation was moderate if correlations were <0.5, and power was >65% if correlations were ≥0.5 (). However, this analysis also suggests that the potential correlation between βIBD and speciation rates is unlikely to be substantial—and is indeed equally likely to be negative or positive—given the small magnitude of our inferred correlation between βIBD and speciation rates (mean r = 0.04; 95% range: −0.37 to 0.45; ).
Discussion
We recovered wide variation in population isolation, as indexed by βIBD, across our focal taxa (Fig. 4). Moreover, we found evidence that βIBD is an evolutionarily structured trait—close relatives of species that are prone to population isolation are themselves prone to isolation—and isolation varies with morphology, with more elongate species showing lower βIBD than more lizard-like species (Figs. 4 and 5). Elongate species might either experience greater dispersal or maintain greater population density, either of which would result in lower βIBD (). Whatever the mechanism, these results suggest that species cohesion varies across taxa and that it is influenced by organismal traits, consistent with previous studies in squamates and other taxa (21, 28, 52).Species cohesion is a fundamental property of species that influences the formation of new species, as described in both verbal and mathematical models (39, 53, 54). The homogenizing effects of gene flow (55) have long been thought to inhibit speciation by reducing the probability of population fission. We might thus predict those taxa that have a higher potential for isolation, as indexed by IBD slopes, will also have greater propensity to form new species (20, 55). Yet, despite our broad phylogenetic sampling, we found no correlation between the potential for isolation and speciation rates (Fig. 4).Factors such as limited sampling and environmental and historical effects can make estimating IBD slopes challenging, potentially weakening our ability to recover a true underlying relationship between IBD slope and speciation rate. Our IBD slopes appear to be capturing a real property of species propensity to isolation, however. These slopes are phylogenetically structured and correlate with organismal morphology (Figs. 4 and 5). Further, our results are qualitatively and quantitatively similar across a series of robustness analyses (). It is unlikely that our results are solely due to low statistical power. Perhaps the most substantive and well-supported contrast in our dataset is the evolutionary split between lizards and snakes. This partition accounts for much of the variation in both speciation rate (ANOVA r2 = 0.54) and IBD slope (ANOVA r2 = 0.27) across our full dataset. Under our hypothesis, we would expect to see much higher patterns of IBD in snakes than lizards, given that speciation rates are markedly higher in snakes (lizards: average CLaDS = 0.068; snakes: average CLaDS = 0.16). However, we see the opposite pattern (Fig. 5). While this pattern reflects just a single evolutionary contrast, it is a major feature of our dataset and suggests our negative result is robust and biologically meaningful.Most previous studies exploring the link between population isolation and diversification have used organismal traits related to dispersal as proxies for gene flow and population isolation. For example, in birds, species with more elongate, narrower wings are hypothesized to have higher dispersal (33), higher levels of gene flow, and thus reduced rates of population isolation. Lower rates of population isolation would then result in lower rates of diversification, as has been seen in some avian datasets (25, 56, 57). Other organismal traits that have been hypothesized to predict both patterns of genetic connectivity and broad-scale diversification include pollination strategy (58), length of pelagic dispersal in marine species (refs. 44 and 59, but see ref. 38), and seed size in angiosperms (60). These studies necessarily make several simplifying assumptions about how organismal traits impact dispersal and thus gene flow. For example, although levels of population isolation vary as a function of both migration and local genetic drift, these analyses focus solely on the role of dispersal in determining levels of population isolation. However, as a whole, these studies confirm the potentially important role of variation in dispersal in driving broad-scale diversity patterns (but see ref. 61).Studies that have instead directly estimated levels of population isolation have found mixed results (21, 27–29, 52, 62). In particular, a previous study in lizards—focused solely on a ∼300-species radiation endemic to Australia (28)—also found no connection between population isolation and speciation. What might explain our results and the mixed conclusions of previous studies? First, population isolation can be quantified in many ways. Population isolation spans a continuum that ranges from IBD to discrete phylogroups that experience little gene flow among themselves. In the present study, we chose to estimate population isolation directly by measuring patterns of IBD. A pattern of IBD reflects an early stage in population isolation and primarily captures the role of declining gene flow over distance in determining genetic differentiation. IBD is thus a continuous measure of genetic differentiation (63). How IBD interacts with landscape stability, geographic barriers, and environmental gradients then determines the likelihood of formation of discrete breaks characteristic of phylogeographic lineages (64, 65). We expect these continuous and discrete measures of population isolation to be correlated; species with greater IBD should also exhibit greater levels of phylogeographic structuring. With our current sampling, we cannot test this expectation, and it also remains relatively untested in the broader literature (26). If continuous and discrete measures of population isolation are uncorrelated, then using alternate metrics of population isolation like the number of subspecies (66, 67), genetic population clusters (63), or phylogenetically distinct lineages within a species (27) might better explain speciation rate. That said, despite its simplicity, IBD explains 31% of the average variation in patterns of genetic divergence across our focal species (), suggesting IBD remains a powerful approach for measuring population isolation. Further, exploration of alternate metrics of isolation—including levels of genetic differentiation at a fixed geographic distance—recovers the same patterns seen when isolation is measured as IBD ().Additionally, population isolation might impact speciation in ways more complicated than typically outlined in verbal and formal models (61). For example, an increased propensity to form population isolates might lead to smaller populations, which would be more vulnerable to demographic fluctuations, making them more likely to go locally extinct (44). In such a scenario, species cohesion—which is expected to increase with gene flow—may have opposing effects on rate of population formation and population extinction. The net effect of increased population structure on speciation rate might then be neutral.Alternatively, perhaps population isolation is not a significant factor in determining speciation rate variation at all. Indeed, our power simulations suggest that, if such an effect exists, it is likely to be small (). Instead, other unmeasured intrinsic and extrinsic factors might be the source of speciation rate variation in squamates. Speciation consists of multiple stages, and variation in progress through any of these stages might influence speciation rate. For example, some species might evolve reproductive barriers more quickly than others, whether because they undergo more rapid body size divergence (11), more rapid changes in genome structure and organization (68), or more intense levels of sexual selection (34). If evolving reproductive barriers serves as a rate-limiting step on speciation, then species that evolve reproductive barriers more quickly will also have higher speciation rates (but see ref. 69). Additionally, a variety of factors might influence the propensity for new species to persist through time (19, 70), including the evolution of ecological traits that reduce competition and facilitate range expansions and/or secondary sympatry (2, 14, 71–73). Further, historical changes in the environment and geography can have marked effects on population fission and thus speciation rate (23–25). In our focal system, however, where all species are found in a common geographic region, the impact of extrinsic forces on speciation rate variation is likely to be more uniform across taxa. Of course, multiple factors could reasonably be acting concurrently or unevenly to influence speciation rates across the squamate tree of life. Given the broad phylogenetic scale of this study (∼50 species spanning all squamate diversity, representing ∼180 million years of evolution), the forces that control speciation rate variation might be so variable across clades that the overall impact of any single force ultimately is rather diffuse.Focused studies of species complexes—including many in lizards and snakes (74–77)—have revealed the central role of population isolation in the formation of new species, consistent with the conceptual model of speciation popularized by Ernst Mayr (2, 78). Given this, it is perhaps intuitive that population-level processes such as population isolation should predict broad-scale speciation patterns. Here, we have shown that “population isolation” is itself a trait that varies widely among clades, consistent with numerous studies that have directly or indirectly linked organismal traits to broad-scale patterns of population structure. And, like most other groups of organisms (9–11), squamate reptiles from Neotropical savannas are characterized by extensive clade-specific variation in rates of species formation (Fig. 1). However, we find no support for the hypothesis that population isolation—as indexed by contemporary patterns of population structure—is predictably related to phylogenetic variation in the rate of species formation.This study provides one of the most comprehensive tests of a single component of the speciation process as described by Mayr and others (2, 13, 14). Generating the comparative datasets to test the role of other steps in the speciation process will be challenging. In particular, we need to assemble datasets that measure the rate at which reproductive barriers evolve and the likelihood that populations persist across species that vary in their speciation rate. Relative to inferring present-day levels of population isolation, measuring reproductive barriers and species persistence is difficult. For example, quantifying reproductive barriers in even a single species or species complex can require decades of work for a single research group, in both controlled and natural settings (79–83). To understand how processes identified from single species groups contribute to broad-scale diversity patterns, biologists studying a diversity of taxa will need to collaborate to measure population-level processes in standardized ways that can be collated across study systems and tested within a common analytical framework (84). Only then can we truly understand the causes of speciation and the extent to which those causes underlie the dynamics of biological diversity at the largest scales of time and space.
Methods
Sampling and Genetic Data Collection.
We sampled squamate (lizard and snake) taxa broadly across clades that differ in speciation rates, thus increasing the power of our study by maximizing the amount of phylogenetically independent variation in both diversification rates (DRs) and biological traits (Fig. 2). Our study leverages over 20 y of field sampling in the Cerrado and adjoining biomes (46, 47, 85, 86). Most of these samples correspond to voucher specimens preserved at Coleção Herpetológica da Universidade de Brasília. Across all samples, we identified nominal species for which individuals were sampled at four or more locations. For each of those species, we included one individual per sampling locality. Given our levels of genomic sampling, sampling one individual is akin to sampling multiple individuals in a population for fewer loci (87). In total, we sampled 398 individuals from 66 nominal species (Fig. 2 and ).We sequenced homologous loci across species using a target capture approach. We used the Squamate Conserved Loci (SqCL) as our targets, a set of 5,462 loci that span ultraconserved elements, anchored hybrid enrichment loci, and single-copy genes used traditionally in phylogenetics (88–90). This bait set has been shown to work effectively across lizards and snakes (89, 91). For each individual, we extracted DNA from either tail or liver tissue using high-salt extraction (92). After shearing DNA to a modal length of ∼350 base pairs (bp) using a sonicator, the commercial provider RAPiD Genomics then prepared doubly indexed libraries following standard Illumina protocols. We created pools of 16 equimolar libraries, grouping closely related individuals into the same pool. The commercial provider Arbor Biosciences used these pools for the target capture experiment, following the myBaits v3 protocol. Finally, we combined six pools (or 96 libraries) and sequenced each set on one 125-bp paired-end lane of an Illumina HiSeq 4000.
Genetic Data Analysis.
Data processing.
For the raw sequencing reads per sample, we first trimmed adapters using Trimmomatic v0.39, removed low-quality sequences, and then merged overlapping paired reads using PEAR v0.9.11 (93, 94). Next, we assembled cleaned reads using Trinity v2.1 (95) and annotated the resulting contigs by comparing them to our SqCL targets via blat v36x2 (96).In target capture experiments, typically anywhere from 50 to 80% of sequencing reads map to targeted loci; the remaining reads are called by-catch. These by-catch reads can be used to assemble high-copy loci like the mitochondrial genome, which can be an important source of additional genetic data. We assembled mitochondrial genomes from our raw sequence reads using MITObim v1.9.1 (97). As a starting reference, we used the full mitochondrial genome from the most closely related species available on National Center for Biotechnology Information (NCBI) GenBank. We then annotated each assembled mitochondrial genome for the 13 coding mitochondrial genes and ribosomal RNA 12S and 16S genes using exonerate v2.4.0 (98). Finally, we extracted these loci, aligned them using mafft v7.310 (99), and concatenated them to generate a mitochondrial DNA (mtDNA) alignment across all individuals.
Species delimitation.
Studies across lizards and snakes from both temperate and tropical regions have shown that nominal species often consist of multiple, cryptic lineages (100–102). Accordingly, we first analyzed our sampled individuals’ genetic data to ensure that nominal species assignments reflected evolutionary relationships. We delimited lineages (OTUs) based on evidence for evolutionary cohesion and independence. As detailed below, we used a combination of phylogenetic, population genetic, and spatial data to identify putative species-level taxa that form a monophyletic group and show a single and continuous IBD curve across their geographic range. We refrain from providing formal taxonomic recommendations, given our study only collects genetic data.To determine whether individuals within nominal species form monophyletic groups, we first inferred a nuclear phylogeny across all individuals. We created locus-specific alignments using mafft, removing any loci sampled for <40% of individuals and any individuals sampled for <5% of loci. We then concatenated these alignments and used RAxML v8.2.11 (103) to infer a phylogeny across all individuals. To determine whether individuals within nominal species show coherent IBD relationships, we calculated pairwise genetic divergence (dxy) based on the concatenated alignment and determined geographic distances among individuals. For each nominal species, we plotted the monophyletic group spanning all the samples identified to that species, pairwise levels of genetic divergence (dxy) by pairwise geographic distance, and spatial distribution of samples. Using this approach, we were able to identify samples that had been misidentified in the field and cases of likely cryptic speciation (). For a few species complexes that require major taxonomic revision (e.g., the snake taxon Bothrops moojeni), this approach provided a provisional taxonomy. In general, we took a conservative approach and only revised taxa in which monophyletic groups included individuals from other nominal species or where a taxon consisted of two or more deeply divergent lineages.
Measures of divergence.
After defining provisional species limits, we then called genetic variants in each OTU. To do so, per OTU, we first created a reference set of loci, which comprised the longest assembly per locus across all individuals. We then mapped individual trimmed reads to the reference using bwa v0.7.15 (104). We called variants across all individuals using samtools v1.5 (105), filtered variants to retain only those with coverage > 20× and quality > 20, and used this variant set to recalibrate alignments using GATK v4.1.8 (106). We then called genotypes across variable and invariable sites using samtools, removing genotypes with coverage < 10× and sites with quality < 20.For all pairwise individual comparisons within an OTU, we calculated three metrics of genetic divergence: FST (107), nuclear dxy (108), and mtDNA dxy. We then used these metrics to calculate four different rates of differentiation. First, we calculated our focal metric for genetic differentiation (βIBD), which is the slope of inverse FST across the log of geographic distance (109). We determined whether βIBD shows evidence for phylogenetic signal using Pagel’s lambda () (110) as implemented in the R package phytools (111). Further, βIBD is predicted to vary as a function of both population density and dispersal (48, 109). Accordingly, we tested whether βIBD is predictable based on a few organismal traits that are proxies for these two properties: average genetic diversity (π), geographic range area, body mass, and elongation index. We calculated average genetic diversity across synonymous sites with >10× coverage and geographic range area using squamate geographic ranges presented in ref. 112. Body mass and elongation index were based on published morphological data (113). We calculated the elongation ratio as a ratio of length to diameter, treating each individual as a cylinder with length equivalent to its snout–vent length. Correlations between βIBD and these four organismal traits were tested using PGLS as implemented in the R package nlme (114).We calculated four additional measures that might better capture how individuals differentiate across space: magnitude of inverse FST at 1,000 km (which captures the effects of baseline differentiation), slope of nuclear dxy across geographic distance, slope of mtDNA dxy across geographic distance, and slope of FST across environmental distance. To calculate environmental distance, we extracted climatic data at each individual’s location using the CHELSA (climatologies at high resolution for the earth's land surface areas) climatic layers (115) and the R raster package (116). We then summarized these climatic data using a scaled and centered principal component analysis and calculated environmental distance between individuals as the summed Euclidean distance across all principal component axes. Finally, we assessed whether correlations between divergence and distance matrices were significant, using Mantel tests using the R package ade4 (117).
Comparison to ddRAD.
Many of the loci targeted in the SqCL dataset are highly conserved loci assumed to be under purifying selection (49). Using such loci can bias demographic inference (118). To test this possibility, we compared SqCL-based estimates of genetic divergence with those derived from double-digest restrictionsite–associated DNA sequencing (ddRAD). The ddRAD loci are randomly selected across the genome and are assumed to be mostly evolving neutrally (119). We used 61 individuals from 10 species of Australian lizards and snakes for which we have previously collected and analyzed both ddRAD and SqCL data (28, 120). We then calculated and compared the correlation in estimates of genetic diversity (π), genetic differentiation (FST), and rate of genetic differentiation (βIBD) across these two marker sets.
Phylogenetic Framework and Speciation Rate Estimation.
We estimated “recent” speciation rates (e.g., tip rates) across squamates, which index the instantaneous rate of lineage splitting in the limit as time approaches the present day (121). Although tip rates can be biased by incomplete taxon sampling, they are much less susceptible to biases that affect diversification rate as measured over deeper timescales and/or at earlier points in a clade’s history (122, 123). The distribution of branch lengths in a reconstructed phylogenetic tree is a function of rates of lineage splitting and extinction (121); the net diversification rate is the difference between the rate at which lineages give rise to new species and the rates at which lineages become extinct. Although tip rate metrics have been interpreted as estimates of net diversification rate (e.g., DR statistic), theory and simulations show that branch lengths near the tips of the tree are much more highly correlated with speciation rate than with net diversification rate (5).We estimated tip speciation rates across the phylogeny first published in Tonini et al. (50). This phylogeny is based on genetic data for 5,415 of 9,754 squamate taxa. Tonini et al. generated a pseudoposterior distribution of 10,000 timetrees by adding the remaining 4,339 taxa that lacked genetic data using a stochastic birth–death polytomy resolver (124); phylogenies constructed in this fashion yield conservative inferences about tip speciation rates and are more accurate than approaches that rely on analytical “sampling fraction” corrections (125).For each of 100 trees sampled randomly from the full Tonini et al. (50) distribution, we inferred speciation rates across each tree sample using both semiparametric and model-based approaches and then took the average tip rates across all trees to use in downstream analyses. For a semiparametric estimate, we used the inverse splits statistic (DR, DR) (6, 126). The DR statistic is a weighted mean of inverse branch lengths along the path connecting a given tip to the root of the tree, with recent branches afforded proportionately more weight. Although the metric weights recent splitting events more, it estimates tip speciation rates with reasonable accuracy, even in the presence of extensive stochastic variation in branch lengths (5). For a model-based approach, we used CLaDS (cladogenetic diversification rate shift model; 51) to estimate speciation rates (CLaDS). CLaDS applies a Bayesian approach to inferring speciation rates along a phylogeny and assumes that rates change after every speciation event. Both these approaches to measuring speciation rates appear capable of capturing fine-scale variation in speciation rates.Our taxonomic modifications identified a few OTUs that do not occur in the tree. For these OTUs, in all comparative analyses, we mapped them to a closely related congeneric taxon.
Test of Genetic Differentiation and Diversification.
To test whether rates of genetic differentiation (βIBD) positively correlate with rates of speciation, we used two complementary approaches. First, we tested the correlation between βIBD and speciation rate using a PGLS approach, implemented in the R package nlme (114). Here, we first took the natural log of both βIBD and speciation rate and inferred the phylogenetic signal of the residuals under a Brownian motion model. Second, we used the simulation-based approach ES-SIM (127), which tests for correlations between speciation rates and continuous traits. In this approach, traits are simulated along the given phylogeny, following the same Brownian motion model parameters inferred from the empirical trait data. The true correlation is then compared to the null distribution of correlations between the simulated traits and speciation rate. Under a range of trait evolution scenarios, ES-SIM has greater power than PGLS. We repeated these two tests for both speciation rates inferred using DR and ClaDS. Because of the sparseness of our IBD trait data (59 species-level taxa from a clade with ∼10,000 species), we cannot use formal state-dependent models of evolutionary diversification (128, 129).We additionally conducted a series of robustness analyses to ensure that our results were not due to technical or methodological artifacts. First, we investigated the effects of sampling. We iteratively dropped OTUs sampled for a minimum number of individuals and then tested the correlation between IBD slopes and speciation rates using these filtered datasets. Further, for those OTUs where we sampled more than five individuals, we subsampled all possible combinations of four individuals and measured IBD slopes to determine the potential for error due to small sample sizes. Additionally, we conducted individual-based forward genetic simulations using SLiM v3.6 (130) to determine how sample size affects error in IBD estimation (see SI Appendix for full details). Then, we checked for the effects of error in IBD slope measurement in three ways. First, we repeated our analysis after removing OTUs with nonsignificant IBD slopes. Second, we iteratively dropped OTUs based on minimum r and then tested the correlation between IBD slopes and speciation rates using these filtered datasets. Finally, we fit a phylogenetic mixed model with the squared SE of the estimated slope as a factor in the model [MCMCglmm (131)]; IBD slopes with greater error are weighted less in the model. Next, we used alternate rates of differentiation—absolute inverse FST at 1,000 km, rate of nuclear dxy divergence across geographic distance, rate of mtDNA dxy divergence across geographic distance, and rate of FST divergence across environmental distance—because these alternate estimates might better capture the differentiation of these OTUs across space (132). Then, we repeated our analyses using species designations as recognized by current taxonomy to ensure our results were not affected by our revised species delimitations. Finally, a modest number of individuals were sampled outside of the core Cerrado region (); these individuals might have experienced a different biogeographic history. We removed these individuals, recalculated IBD slopes, and again tested our hypothesis.Finally, we tested the power of our approach to recover a significant correlation between βIBD and speciation rate, should one exist. Here, we simulated traits on our phylogeny under a Brownian motion of model, with a known correlation to speciation rate. For a range of correlations from 0 to 1.0, we simulated 100 datasets and determined how often both PGLS and ES-SIM recovered a significant correlation. Then, to determine the potential error in estimates of correlation, we simulated 100,000 datasets, drawing correlations from a uniform distribution of −1 to 1. We accepted correlations that deviated by less than 0.01 from the observed value (r = 0.04). This yielded a distribution of actual correlations between βIBD and speciation rate that could potentially have generated our observed correlation.
Authors: Robb T Brumfield; Elizabeth P Derryberry; Michael G Harvey; Gustavo A Bravo; Santiago Claramunt; Andrés M Cuervo; Graham E Derryberry; Jaqueline Battilana; Glenn F Seeholzer; Jessica Shearer McKay; Brian C O'Meara; Brant C Faircloth; Scott V Edwards; Jorge Pérez-Emán; Robert G Moyle; Frederick H Sheldon; Alexandre Aleixo; Brian Tilston Smith; R Terry Chesser; Luís Fábio Silveira; Joel Cracraft Journal: Science Date: 2020-12-11 Impact factor: 47.728
Authors: Michael G Harvey; Glenn F Seeholzer; Brian Tilston Smith; Daniel L Rabosky; Andrés M Cuervo; Robb T Brumfield Journal: Proc Natl Acad Sci U S A Date: 2017-05-30 Impact factor: 11.205
Authors: Sol Katzman; Andrew D Kern; Gill Bejerano; Ginger Fewell; Lucinda Fulton; Richard K Wilson; Sofie R Salama; David Haussler Journal: Science Date: 2007-08-17 Impact factor: 47.728