
Selection on ancestral genetic variation fuels repeated ecotype formation in bottlenose dolphins.

Marie Louis^1,2,3,4, Marco Galimberti^5,6, Frederick Archer^7,8, Simon Berrow^9,10, Andrew Brownlow^11, Ramon Fallon^12, Milaja Nykänen^13, Joanne O'Brien^9,10, Kelly M Roberston^7, Patricia E Rosel^14, Benoit Simon-Bouhet^2, Daniel Wegmann^5,6, Michael C Fontaine^3,15,16, Andrew D Foote^17,18, Oscar E Gaggiotti^1.

Abstract

Studying repeated adaptation can provide insights into the mechanisms allowing species to adapt to novel environments. Here, we investigate repeated evolution driven by habitat specialization in the common bottlenose dolphin. Parapatric pelagic and coastal ecotypes of common bottlenose dolphins have repeatedly formed across the oceans. Analyzing whole genomes of 57 individuals, we find that ecotype evolution involved a complex reticulated evolutionary history. We find parallel linked selection acted upon ancient alleles in geographically distant coastal populations, which were present as standing genetic variation in the pelagic populations. Candidate loci evolving under parallel linked selection were found in ancient tracts, suggesting recurrent bouts of selection through time. Therefore, despite the constraints of small effective population size and long generation time on the efficacy of selection, repeated adaptation in long-lived social species can be driven by a combination of ecological opportunities and selection acting on ancestral standing genetic variation.


Year: 2021 PMID: 34705499 PMCID: PMC8550227 DOI: 10.1126/sciadv.abg1245

Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136

INTRODUCTION

Understanding the processes that allow species to extend their ranges and adapt to novel environments is a long-standing question in biology, in which interest now extends well beyond this disciplinary field because of the potential effect of global change on species ranges. The colonization of novel environments may result in new selective pressures on individuals and promote local adaptation (). However, linking genetic divergence to local adaptation is particularly challenging as genetic differentiation may also arise because of demographic history () and other selective processes such as background selection (). Replicate adaptation of different populations to similar environments is often considered strong evidence of the repeated action of natural selection (). Hence, we can study repeated evolution to gain insights into the mechanisms driving genetic variation and adaptation. Iconic examples of repeated evolution include adaptation to similar environments—i.e., parallel adaptation to freshwater environments from marine habitats in threespine sticklebacks, Gasterosteus aculeatus (); adaptation to the same host species in stick insects, Timema cristinae (); high-altitude adaptation in multiple human, Homo sapiens, populations (); and different light conditions in cichlid fish ()—or similar responses to comparable stressors [e.g., myxoma virus ()]. Our understanding of the mechanisms involved in repeated phenotypic evolution has recently shifted from a binary view of identical versus idiosyncratic processes to a continuum ranging from parallel (i.e., selection on the same variants), to convergent (selection on different variants in the same genes or in pathways with similar functions), to nonparallel (i.e., selection specific to one population) (, –). 
Adaptation to novel habitats may occur rapidly if the genetic substrate which selection acts upon was already segregating in the ancestral population [i.e., standing genetic variation (SGV) (, , , , , )], as balanced polymorphisms (), or introgressed from a locally adapted outgroup (). In the latter scenario, gene flow may be an important driver of recurrent adaptation, implying that independent evolution is not necessarily a prerequisite (). Alleles present as SGV may have been selected in past environments, potentially increasing their chances to be the recurrent target of natural selection (). Recent studies have highlighted that the origin of the alleles that enable populations to recurrently adapt to similar environments may be much older than the divergence of the populations themselves (, , ). For example, the reservoir of freshwater-adaptive alleles in marine populations of threespine sticklebacks, which have been recurrently selected after freshwater colonization during the past 12,000 years and presumably during previous interglacials, has been segregating for millions of years (). With the rare exception of humans (, ), reported cases of parallel and convergent evolution (repeated adaptation) among populations almost exclusively involve relatively short-lived species (, , , , ). In long-lived species, such as large mammals, long generation time, low fecundity, and small effective population size may reduce the efficacy of selection (, ). In humans, cultural innovations such as animal domestication and the colonization of new habitats, such as high altitude, likely exposed extant neutral genetic variation to novel selective regimes (, ). 
In other long–life span social mammals, the colonization of new ecological niches followed by stable transmission of learned behaviors, such as foraging strategies or habitat preferences, may also create opportunities for natural selection to act upon locally adaptive genetic variation, although examples are scarce [but see killer whales, Orcinus orca ()]. Here, we tested for repeated evolution driven by ecological niche specialization in a highly social marine mammal, the common bottlenose dolphin, Tursiops truncatus, which has a worldwide temperate and tropical distribution. Two ecotypes of common bottlenose dolphins (“pelagic” or “offshore” and “coastal”) have repeatedly formed in multiple regions of the world (–). Coastal populations were suggested to have been founded from pelagic source populations (–, ). They are thus an excellent study system to test whether repeated evolution occurred and involved the same molecular processes during the repeated colonization of coastal habitats. Throughout their range, coastal populations have different diets compared with pelagic populations (, ) and can display phenotypic traits adapted to coastal waters, in particular, for feeding (–). Coastal populations in distinct regions of the world can share some morphological traits such as larger teeth, rostra, and internal nares when compared to pelagic populations. They can also show some population-specific traits, such as body size in western North Atlantic (WNA) coastal bottlenose dolphins, which are smaller than their pelagic counterparts, while in other regions, the pattern is reversed or there are no discernable differences (–). There is no overall convergence in the morphology of the species in coastal habitats across its range. 
However, coastal populations tend to share more cryptic behavioral phenotypes, such as strong site fidelity and reduced dispersal (, , ), and stable foraging ecology, which can be socially transmitted from mother to calves and from conspecifics in the same social group (, ). The colonization of coastal habitats may have created opportunities for local adaptation to arise, and social behavior and learning abilities may have facilitated the transmission of advantageous learned behaviors, such as habitat-specific foraging techniques. In this study, we tested the hypothesis that SGV was repeatedly targeted by selection when the common bottlenose dolphins colonized the coastal environment. We first identified population structure and demographic history. We showed that the pelagic and coastal ecotype pairs have a complex non–tree-like history. Then, we assessed the evidence of parallel selection to the coastal habitat across the genome and identified candidate genes under parallel linked selection, potentially involved in cognitive abilities and feeding.

RESULTS AND DISCUSSION

Genetic structure

We analyzed the genomic variation at single-nucleotide polymorphisms (SNPs) by sequencing the whole genomes of 57 common bottlenose dolphins (Fig. 1A) at a mean sequencing depth (± SD) of 10.56X (± 2.18), after read quality and variant filtering (see table S1 and Materials and Methods). Our sampling includes 10 pelagic and 10 coastal individuals from the eastern North Atlantic (ENAp and ENAc), 10 pelagic and 7 coastal individuals from the western North Atlantic (WNAp and WNAc), and 11 pelagic and 9 coastal individuals from California, eastern North Pacific (ENPp and ENPc). Whenever possible, we used genotype likelihoods in the analyses to account for the uncertainty inherently present in low-depth sequencing data (, ) or used methods based on allele frequencies (table S2).

Fig. 1.

Sampling location and population structure of coastal and pelagic common bottlenose dolphins.


(A) Map of sample locations of the common bottlenose dolphin ecotypes, in the eastern North Atlantic (ENA), western North Atlantic (WNA), and eastern North Pacific (ENP). (B) Ancestry proportions for each of the 57 individuals inferred in NGSAdmix () for a number of clusters, K = 4, identified as the highest level of structure using the Evanno method (). (C) PCA showing the first and second PCs. The proportion of genetic variance captured by each component is indicated between parentheses.

The genetic structure obtained from a principal components analysis (PCA) () and the individual-based ancestry and clustering analysis of NGSAdmix () based on a set of 798,572 unlinked high-quality SNPs indicated that the samples assigned a priori to a population clustered together (Fig. 1, B and C, and figs. S2 and S3). The analyses showed two major axes of differentiation: Atlantic versus Pacific (PC1) and pelagic versus coastal (PC2). Coastal populations drove this pattern in both cases: ENPc along PC1 and WNAc along PC2 (Fig. 1C). The ENAc population was intermediate between the WNAc and Atlantic pelagic populations. Genetic differentiation among the three pelagic populations was less than between any pelagic and coastal populations, even those in parapatry (Fig. 1, B and C, figs. S2 and S3, and pairwise FST in table S3). These patterns of differentiation suggest that each coastal population has a history that includes both population-specific drift and drift shared with the other coastal populations and/or differences in gene flow with parapatric pelagic populations.

Evolutionary relationships among ecotypes

To test whether our three geographic pairs of coastal and pelagic ecotypes represent independent divergence events, we reconstructed their population histories using approaches that draw inferences from covariance in allele frequencies among populations. We first explored the evolutionary relationships among populations and potential admixture events using TreeMix (), using the Indo-Pacific bottlenose dolphin, Tursiops aduncus, as an outgroup to root the tree. The TreeMix results may be difficult to interpret as our samples cover only a part of the geographical distribution of the species, implying that there are ghost populations in the tree inference. Thus, caution should be taken against any strong interpretation on branching orders and migration edges. All the internal branch lengths were relatively short in the TreeMix analyses, consistent with populations rapidly radiating from a shared ancestral population. Branch lengths support marginal drift in the pelagic populations, in contrast to stronger drift in allele frequencies in the coastal populations, from a shared common ancestral gene pool (Fig. 2A, figs. S4 and S5, and Supplementary Text). Given this, the allele frequencies in present-day pelagic populations will be closer to the common shared ancestral population than to any of the present-day coastal populations. The best supported population tree had two migration edges, supporting the existence of conflicting genealogies across the genome (Fig. 2A). There was no clear single bifurcation between pelagic and coastal ecotypes or between Pacific and Atlantic populations. There were also no clear independent pelagic and coastal bifurcations associated with each geographic region.

Fig. 2.

Admixture among populations of common bottlenose dolphins.


(A) Left: TreeMix consensus tree and bootstrap values displaying the relationships among populations as a bifurcating maximum-likelihood tree with two migration edges (M = 2), inferred as the best topology. Branch lengths on the horizontal axis represent the amount of genetic drift that has occurred along each branch. Bootstrap supports for each of the nodes are indicated. Right: Residual fit of the observed versus the predicted squared allele frequency difference, expressed as the number of SE of the deviation. SE values are represented by colors according to the palette on the right. Residuals above zero indicate populations that are more closely related to each other in the data than in the best-fit tree and have potentially undergone admixture. Negative residuals represent populations that are less closely related in the data than represented in the best-fit tree. (B) Patterns of allele sharing expressed using the F4 statistics of the form F4(pelagic_x, coastal_x; pelagic_y, coastal_y) or F4(pelagic_x, pelagic_y; coastal_x, coastal_y). All SE estimations are less than 1 × 10^-4, and all F4 statistics were significant on the basis of z scores greater than 3, which is the equivalent of a significance of P < 0.0026. The red line on the x axis of the TreeMix graph in (A) represents the extent of variation in the shared drift parameter represented on the x axis in the F4 statistics plot in (B) and illustrates that drift in coastal populations is largely population specific. The placement of the red line on the x axis is arbitrary and not indicative of where on the tree shared drift is inferred to have occurred.

We further tested whether geographic pairs of pelagic and coastal ecotypes had evolved independently by exploring patterns of correlated changes in allele frequencies between two pairs of populations, estimated using F4 statistics (, ) of the form F4(pelagic_x, coastal_x; pelagic_y, coastal_y), where x and y represent different geographic regions.
We also calculated F4 statistics comparing drift between the same ecotypes from different geographic regions, F4(pelagic_x, pelagic_y; coastal_x, coastal_y), therefore testing for a single origin of each ecotype. Tests of both forms were all significantly positive (Fig. 2B and Supplementary Text), indicating that none of the topologies were perfect representations of the relationships among the four populations. Statistics of the form F4(XNAp,ENPp;XNAc,ENPc), with X representing east or west, had the highest values, which were much higher than those of F4(XNAp,XNAc;ENPp,ENPc), which were close to 0, indicating relatively independent evolution of the ecotype pairs in the Atlantic and the Pacific. In contrast, F4(ENAp,WNAp;ENAc,WNAc) was the lowest of all and lower than F4(ENAp,ENAc;WNAp,WNAc), suggesting that the two Atlantic coastal populations may be derived from the same ancestral pelagic population and/or experienced gene flow. Thus, TreeMix and F4 statistics both support complex non–tree-like relationships. However, our TreeMix, NGSAdmix, and PCA results indicate that each coastal population has experienced independent histories to such an extent that allele frequencies are clearly differentiated among all coastal populations. Ultimately, a more comprehensive geographical sampling together with the temporal resolution offered by haplotype-based inferences may be needed to fully resolve these relationships ().
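
As an illustration of the F4 statistics discussed above, the following Python sketch computes F4(A, B; C, D) as the mean across SNPs of (pA - pB)(pC - pD), which is the standard definition of the statistic, and attaches an unweighted delete-one-block jackknife SE and z score. It is not the pipeline used in this study: the function names, the equal-sized contiguous blocks, and the toy allele frequencies are assumptions made purely for illustration.

```python
import math


def f4_terms(p_a, p_b, p_c, p_d):
    """Per-SNP products (pA - pB) * (pC - pD) for four populations."""
    return [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]


def f4(p_a, p_b, p_c, p_d):
    """F4(A, B; C, D): mean over SNPs of the per-SNP products."""
    terms = f4_terms(p_a, p_b, p_c, p_d)
    return sum(terms) / len(terms)


def f4_jackknife(p_a, p_b, p_c, p_d, n_blocks=5):
    """Return (estimate, SE, z) from an unweighted delete-one-block jackknife.

    Assumption: SNPs are ordered along the genome and are split into
    n_blocks contiguous blocks of (almost) equal size.
    """
    terms = f4_terms(p_a, p_b, p_c, p_d)
    n = len(terms)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one SNP per block")
    total = sum(terms)
    estimate = total / n
    size = n // n_blocks
    leave_one_out = []
    for k in range(n_blocks):
        start = k * size
        end = n if k == n_blocks - 1 else start + size
        block = terms[start:end]
        # F4 recomputed with this block of SNPs removed.
        leave_one_out.append((total - sum(block)) / (n - len(block)))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Hypothetical allele frequencies at six SNPs (not data from the study).
pelagic_x = [0.10, 0.40, 0.55, 0.20, 0.70, 0.35]
coastal_x = [0.30, 0.10, 0.90, 0.05, 0.95, 0.60]
pelagic_y = [0.12, 0.38, 0.50, 0.25, 0.65, 0.30]
coastal_y = [0.35, 0.15, 0.85, 0.10, 0.90, 0.55]

same_region_form = f4_jackknife(pelagic_x, coastal_x, pelagic_y, coastal_y, n_blocks=3)
same_ecotype_form = f4_jackknife(pelagic_x, pelagic_y, coastal_x, coastal_y, n_blocks=3)
```

Under a strictly tree-like history, the statistic for the grouping that matches the true topology is expected to be zero, which is why jointly nonzero values of both forms point to a reticulated history.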

Demographic history

We then estimated historical variation in effective population size (Ne) for each population, which reflects changes in genetic diversity because of variation in population size, population structure, gene flow (), and linked selection (i.e., background selection or selective sweeps) (, ). For that purpose, we used the coalescent-based approach SMC++ () to carry out two analyses, one with the putatively neutral regions as identified by Flink [see details below ()] and another with all regions. As results were very similar, we present here the results with the putatively neutral regions (Fig. 3A and fig. S6); those with all regions can be found in the Supplementary Materials (fig. S7, A and B). We found that pelagic populations experienced demographic expansions, ~150,000 to 120,000 years before present (yBP), followed by a period of more stable Ne than the coastal populations (Fig. 3A and figs. S6 and S7, A and B). Population expansion in all populations during the first part of the last glacial period, which started ~115,000 yBP, may reflect changes in connectivity, rather than an increase in Ne, as suitable habitat became scarce ().

Fig. 3.

Demographic history of common bottlenose dolphin populations.


(A) Changes in effective population size through time inferred for each common bottlenose dolphin population using SMC++, using a mutation rate of 2.56 × 10^-8 substitutions per nucleotide per generation () and a generation time of 21.1 years (). The timing of the onset of the last glacial period (110,000 yBP) and the Last Glacial Maximum (26,500 to 19,000 yBP) are indicated in gray shading. (B) Split time between ecotypes within each region estimated using SMC++. Populations are eastern North Atlantic coastal (ENAc), eastern North Atlantic pelagic (ENAp), eastern North Pacific coastal (ENPc), eastern North Pacific pelagic (ENPp), western North Atlantic coastal (WNAc), and western North Atlantic pelagic (WNAp). (C) Tajima’s D estimated for each population; the violin plots indicate the kernel probability density of the data, the box indicates the interquartile range, and the horizontal marker indicates the median of the data.

Coastal populations experienced more erratic fluctuations in Ne than the pelagic populations. Ne in the ENA coastal population closely followed the same trajectory as the ENA pelagic population up to 50,000 yBP, suggesting that they were a single ancestral population up to this point. In contrast, the WNA and ENP coastal populations displayed different Ne changes than their pelagic counterparts from ~150,000 and ~115,000 yBP, respectively. We observe a long period of low Ne in the WNA and ENP coastal populations from 25,000 to 12,000 yBP. The ENA coastal population showed a steady decrease from 70,000 to 7000 yBP. This suggests that during the Last Glacial Maximum (LGM), coastal populations had low Ne or were fragmented. Thus, all coastal populations went through an inferred reduction in Ne followed by a postglacial expansion.
However, other mechanisms, which are not accounted for in SMC++, could be confounding factors in these demographic inferences; similar Ne trajectories can result from changes in gene flow and population structure () and/or linked selection (, ). Reduced nucleotide diversity, Watterson’s theta, and consequently positive Tajima’s D estimates—in particular, for the ENAc and ENPc populations (Fig. 3C and figs. S8, A and B, and S9)—and large amount of drift (Fig. 2A and fig. S5) may also indicate that the coastal populations have experienced reductions in Ne and suggest that they are derived from larger ancestral populations. Again, other processes may influence those estimates such as gene flow, which can lead to positive Tajima’s D estimates (). Nevertheless, access to novel, previously ice-covered shallow coastal habitats during past climate change such as at the end of the LGM in the ENA (, ) or during warm interstadials in the WNA and ENP has likely created opportunity for ecological differentiation. Coastal habitats provide a mosaic of environments and different and potentially more stable food resources (, ). We estimated divergence time between the two ecotypes within each region using SMC++ on putatively neutral regions. The oldest divergence between pelagic and coastal ecotypes occurred in the WNA (around 80,000 yBP), and the youngest was around 12,000 yBP, during a postglacial divergence in the ENA (Fig. 3B and fig. S10, A and B). We acknowledge that these estimations do not consider gene flow, which may have occurred between ecotypes since their divergences and possibly not at the same rate in all three regions. Overall, our analyses reveal a complex, reticulated evolutionary history of common bottlenose dolphins, with the pelagic populations being genetically similar to the common ancestral population. In contrast, each coastal population has experienced strong population-specific drift. 
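
To make the Tajima's D values referred to above more concrete, the sketch below computes Watterson's theta and Tajima's D from a small set of phased haplotypes using the standard formulas. It is only a toy illustration: the study worked with low-depth data and genotype likelihoods where possible, which this sketch does not attempt, and the function names and example haplotypes are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites, and
    mean number of pairwise differences pi over the same region."""
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        return float("nan")  # D is undefined without variation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator of theta
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


def summarize(haplotypes):
    """Return (n, S, pi) for equal-length strings of '0'/'1' alleles."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0
    for column in zip(*haplotypes):
        k = column.count("1")
        if 0 < k < n:
            seg_sites += 1
            # k * (n - k) of the n_pairs sequence pairs differ at this site.
            pi += k * (n - k) / n_pairs
    return n, seg_sites, pi


# Hypothetical haplotypes: variants at intermediate frequency vs. singletons.
intermediate = ["111", "111", "000", "000"]
singletons = ["100", "010", "001", "000"]

d_intermediate = tajimas_d(*summarize(intermediate))
d_singletons = tajimas_d(*summarize(singletons))
```

An excess of intermediate-frequency variants, as in the first toy sample, pushes pi above Watterson's theta and yields a positive D, whereas an excess of rare variants yields a negative D.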
Local PCA (), a method that describes heterogeneity in patterns of relatedness among populations (fig. S12 and Supplementary Text), indicated that the dolphin genomes were composed of regions with different evolutionary histories. This analysis supports both the demographic histories suggested by admixture plot and PCA results and those inferred by the F4 statistics and TreeMix where the ENA and WNA coastal populations were more closely related than expected under entirely independent ecotype splits on each side of the Atlantic (Fig. 2, A and B, and fig. S5). Furthermore, on PC3 and PC4, coastal populations from the Atlantic and Pacific clustered together and likewise for the pelagic populations, potentially suggesting parallel ecotype-based processes (fig. S12).

Mechanisms of repeated evolution to coastal habitat

To test whether the above results can indicate repeated selective sweeps associated with coastal habitat, we used Flink (), an extension of BayeScan () that takes linkage among loci into account. It uses a hierarchical island model, and we considered ecotype (coastal versus pelagic) as the top hierarchical level. Although details about the evolutionary history of the species need to be further studied, our results support this hierarchical structure. In particular, NGSAdmix results with K = 2 to 4 group all pelagic populations in the same genetic cluster (Fig. 1 and fig. S2C). Furthermore, TreeMix and F4 statistics results group the Atlantic populations by ecotype (Fig. 2, A and B, and fig. S5). Our analyses show notable differences in patterns of inferred selection involving mainly divergent selection between coastal and pelagic ecotypes (higher hierarchy) and both divergent selection and selection homogenizing allele frequencies (less genetic differentiation than expected under neutrality) among coastal populations (Fig. 4, A and B, Supplementary Text, and figs. S13A and S14A). For the sake of brevity, in what follows, we describe the latter as “homogenizing selection” [but note that it is referred to as “balancing selection” in Flink () and BayeScan ()].

Fig. 4.

Patterns of selection within and between common bottlenose dolphin ecotypes.


(A) Boxplots of the genomic patterns of selection that are the proportion of neutral, homogenizing, and divergent loci within coastal (C) and pelagic (P) ecotypes and between the two ecotypes (CvsP). (B) Patterns of selection (divergent: yellow, homogenizing: blue) inferred using Flink from one super-scaffold for the different hierarchical groupings that are between coastal and pelagic populations (top), among pelagic populations (middle), and among coastal populations (bottom). The y axis indicates the locus-specific FDR for divergent (orange) and homogenizing (blue) selection. The black dashed line shows the 1% FDR threshold, above which we consider a locus under selection.

Divergent selection patterns may be inflated by false positives associated with drift in coastal populations. We therefore conservatively consider only the 7165 SNPs inferred to be evolving under both homogenizing selection among coastal populations and divergent selection between ecotypes, as putative loci underlying parallel evolution to geographically distant coastal habitats (Fig. 5, A to C, and Supplementary Text), and focus on those variants in the rest of our study. Considering the possible origins of the variants inferred to be underlying parallel evolution, we find that most (87%) were polymorphic, i.e., present as SGV, in the pelagic populations, and 57% were polymorphic in all three pelagic populations. This suggests that each of the coastal populations had access to much of the same SGV, which was responsible for the homogenization of allele frequencies across geographically distant coastal populations. This same SGV would also be responsible for the differentiation between coastal and pelagic populations at these genomic sites. This can be visualized in the PCA and unrooted neighbor-joining tree based on these 7165 SNPs, where the populations cluster by ecotype (Fig. 5, B and C).
In these analyses, the Atlantic coastal populations are more closely related to each other than to the ENPc population, and we therefore acknowledge that some of those 7165 SNPs may be under selection within the Atlantic only, possibly due to their partially shared ancestry or more similar SGV within oceans.

Fig. 5.

Patterns of genetic variation of the 7165 SNPs under parallel linked selection to coastal habitat, i.e., under both homogenizing selection among coastal populations and divergent selection between ecotypes.


These SNPs included closely linked sites scattered across the genome in 362 regions separated by at least 100 kb. (A) Plot of the homozygote reference genotypes in blue, heterozygote in green, and homozygote for the alternate allele in red. (B) PCA and (C) neighbor-joining distance tree showing the genetic structure of the common bottlenose dolphin samples for this particular SNP set.

Note that Flink (and any other genome-scan method) is more likely to detect sites linked to the targets of selection rather than the targets themselves (). These 7165 SNPs are linked among themselves [median distance of 54 base pairs (bp)] into 362 distinct clusters, which are separated by at least 100 kb and, therefore, may represent linked selection acting upon a much smaller number of haplotypes. The genotypes at those SNPs are mainly heterozygous (Fig. 5A), and the site frequency spectrums (SFSs) of the variable sites are shifted toward intermediate allele frequencies in all three coastal populations (fig. S18), in contrast to the SFS for all the SNPs (fig. S9). These two observations are consistent with selective sweeps from SGV. Although loci directly under selection are expected to quickly become fixed for the beneficial allele, linked neutral loci are expected to have alleles at intermediate frequencies () and therefore a high prevalence of heterozygous genotypes, such as observed in the coastal dolphin populations (Fig. 5A). In addition, under incomplete soft selective sweeps from SGV, we would expect both loci directly under selection and neutral loci closely linked to the selected variant to have intermediate frequencies, again as we see in the coastal populations. Incomplete sweeps are expected when effective population size is reduced and under low migration, such as in human populations () and the coastal bottlenose dolphin populations studied here. We therefore hypothesize that these 7165 SNPs likely include hitchhiking SNPs closely linked to the selected variants.
However, we cannot exclude that some of them may be under balancing selection, i.e., due to heterozygote advantage/heterosis or frequency-dependent selection. We hereafter refer to these 7165 SNPs as evolving under parallel linked selection across geographically distant coastal populations, as they may include both the targets of selection and sites in tight linkage to them.

Evidence of ancient origins of selected variants

Given our finding of parallel linked selection in coastal populations separated across spatial scales, we hypothesize that this process may have occurred recurrently during coastal habitat colonization in previous interglacial periods. Under this proposed scenario, the present-day coastal populations of bottlenose dolphins would be only the latest of a series of postglacial colonizers to make use of these alleles. While we do not have genomic data from previous interglacials with which to investigate this hypothesis, we can make a prediction that SGV that has been subject to recurrent bouts of selection across multiple interglacials would be found in tracts of older ancestry. By this, we mean that the estimates of the time to the most recent common ancestor (TMRCA) of coastal and pelagic populations would be older for tracts containing SNPs under parallel linked selection than the genome-wide average. To identify such tracts in coastal dolphin genomes, we searched for clusters of dense private mutations () segregating in each coastal population relative to parapatric pelagic populations (see Materials and Methods, Supplementary Text, and fig. S11), taking variation in mutation rate along the genome into account. Regions of high densities of mutations private to each coastal population relative to the parapatric pelagic populations are indicative of an older TMRCA, and we hereafter refer to such tracts as “ancient.” We found ancient tracts in all three coastal populations (tables S4 and S5 and Supplementary Text), with the length of all those tracts being between 10 and 25 Mb (table S4). The inferred TMRCA of these ancient tracts (0.6 to 2.3 million years) was much older than those of the rest of the genome (0.1 to 0.4 million years) (table S5).

The divergence dates of T. aduncus and T. truncatus estimated by Moura et al. () and McGowen et al. () are close to the TMRCA of the ancient tracts found in the WNA coastal individuals (1.0 to 2.6 million years; table S5), after correcting for the different mutation rates used between studies. We found that a large proportion (66%) of the 7165 candidate SNPs under parallel linked selection in coastal populations were found in these ancient tracts. In contrast, only an average of 22% (range, 21.2 to 22.7; SD, 0.32) of 100 random samples of the same number of putatively neutral SNPs were found in the ancient tracts (fig. S19). The ancient tracts containing coastal-associated alleles could have been introgressed from an unsampled “ghost” population (, ), which diverged from the sampled populations a long time ago, so that the introgressed regions contain mutations, which accumulated in the ghost population over time, likely close to the split time between T. truncatus and T. aduncus. The spread of these ancient alleles may have also occurred by gene flow between coastal populations. However, we do not have further support for these two hypotheses, and it is difficult to explain how gene flow could have happened between coastal populations in the Pacific and the Atlantic. Alternatively, coastal populations could have independently diverged from the same or closely related ancestral pelagic populations, making repeated adaptation through shared initial ancient SGV a possible alternative explanation for our results. This hypothesis may hold particularly for the Atlantic populations, given their partially shared ancestry, and could explain the stronger patterns of parallel linked selection within the Atlantic (Fig. 5, B and C).
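
The comparison above between the candidate SNPs and 100 random samples of putatively neutral SNPs can be pictured with the following minimal Python sketch. It assumes, for simplicity, that SNP positions and ancient tracts are integer coordinates on a single scaffold and that tracts are non-overlapping closed intervals; the function names, the seed, and the toy coordinates are hypothetical and do not reproduce the authors' implementation.

```python
import bisect
import random


def fraction_in_tracts(positions, tracts):
    """Fraction of positions falling inside any (start, end) tract.

    Assumption: tracts are non-overlapping, inclusive intervals.
    """
    tracts = sorted(tracts)
    starts = [start for start, _ in tracts]
    hits = 0
    for pos in positions:
        # Index of the last tract starting at or before pos, if any.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos <= tracts[i][1]:
            hits += 1
    return hits / len(positions)


def null_fractions(neutral_positions, tracts, sample_size, n_samples=100, seed=0):
    """Fractions in tracts for repeated random samples of neutral SNPs."""
    rng = random.Random(seed)
    return [
        fraction_in_tracts(rng.sample(neutral_positions, sample_size), tracts)
        for _ in range(n_samples)
    ]


# Toy coordinates (hypothetical): two tracts covering about 20% of a scaffold.
toy_tracts = [(100, 200), (500, 600)]
toy_candidates = [120, 150, 180, 250, 510, 590]
toy_neutral = list(range(1000))

observed = fraction_in_tracts(toy_candidates, toy_tracts)
null = null_fractions(toy_neutral, toy_tracts, sample_size=len(toy_candidates))
n_as_extreme = sum(1 for f in null if f >= observed)
```

Comparing `observed` with the spread of `null` mirrors the logic of the reported contrast between 66% for the candidate SNPs and about 22% for the random neutral samples.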
Another parsimonious hypothesis, considering the relatively independent demography of ecotype pairs in the Atlantic and Pacific, and given the prevalence of these SNPs as SGV in the pelagic populations, is that coastal adaptation occurred in different oceans by repeated selection through space and time on ancient SGV, which persisted at low frequencies in the large pelagic populations. There are precedents for such recurrent use of old SGV in nature; ancient polymorphisms have enabled rapid parallel ecotype formation in saltmarsh beetles () and in threespine sticklebacks (, ). In sticklebacks, freshwater-adapted alleles have persisted as SGV in the large marine populations as a result of episodic recurrent gene flow from freshwater populations (the so-called transporter hypothesis) (, ). A similar mechanism could apply to our study system; most of the SNPs inferred to be evolving under parallel linked selection in coastal populations are located in ancient tracts. The age of these tracts (0.6 to 2.3 million years) predates the start of the last glacial period (115,000 to 11,700 yBP) and of many other previous Quaternary glacial periods (table S5). We therefore speculate that ancient tracts containing variants evolving under parallel linked selection may have contributed to the recurrent colonization of newly available coastal habitats by bottlenose dolphins during past interglacial periods. In addition, we propose that migration back into the pelagic populations potentially retained these ancient tracts as standing variation at low or intermediate frequency within the pelagic populations. We see this akin to the “sieving” of balanced polymorphism during the speciation process proposed by Guerrero and Hahn (). Together, our results contribute toward the emerging hypothesis that old polymorphisms may allow rapid ecotype formation when new ecological opportunities arise and, ultimately, ecological speciation ().

Patterns of selection and ecology and behavior

Although the exact evolutionary scenario involved in repeated evolution still requires further investigation, likely involving extensive sampling across the range of the species, our results together with previous studies on human populations represent rare examples of species with long generation times for which repeated evolution from SGV has been uncovered (, ). In humans, similar stable lifestyles [e.g., life in high altitudes ()] or the same cultural revolutions [e.g., cattle domestication ()] likely created opportunities for cryptic phenotypes such as resistance to hypoxia or lactase persistence to become beneficial and for convergent phenotypic adaptation to occur. Nonhuman examples of socially driven local adaptation are scarce, but killer whale ecotypes have likely evolved as a result of demographic history, ecological opportunity, and gene-culture interactions (). Coastal bottlenose dolphins (Tursiops sp.) also exhibit complex behaviors, such as stable habitat specializations or social learning of foraging techniques, that strongly influence their patterns of genetic variability (, , ), and we hypothesize that these also facilitated their ability to adapt to novel conditions. Although further investigation is warranted as many complex traits may be polygenic () and it is difficult to prove causal relations between behavioral/ecological traits and genes under selection, the 7165 SNPs inferred to have evolved under parallel linked and recurrent selection overlapped with 45 genes. Those include genes related to behavioral and ecologically relevant functions, and thus cryptic phenotypic variations (Supplementary Text and table S7). We found genes related to cognitive abilities, learning, and memory [RELN (, ) and ADARB2 ()]. RELN encodes the reelin protein, which has a role in the modulation of synaptic transmission in response to experience (, ).
Coastal bottlenose dolphins (Tursiops sp.) develop habitat-specific foraging techniques, which are transmitted maternally or in social groups (, ) and may require genetic adaptations for increased cognitive abilities. RELN has been found under positive selection in sea otters, which also exhibit maternally transmitted foraging behavior (). Other ecologically relevant genes include those involved in lipid metabolism and storage [AGK, LPIN2, and KLB ()], which may be involved in adaptation to the differing diets documented in coastal populations, mainly involving large fish, and pelagic populations, which primarily eat pelagic fish and squid (, ). We acknowledge that other processes may contribute to convergent ecotype adaptation, driven by selection on SGV or new mutations in other regions of the genome that are not shared by different ecotype pairs; these would not have been detected in our analysis, which focused on testing for parallel patterns of evolution. In addition, we observed that 113,530 SNPs were under divergent selection among coastal populations, and although these may include false positives due to extensive drift in coastal populations, this suggests divergent selection linked to heterogeneous habitat. This is not unexpected given the environmental variations of coastal habitats across the bottlenose dolphins’ range. Our findings corroborate other studies highlighting that other processes such as environmental heterogeneity may contribute to genotypic and phenotypic variation within ecotypes (–). This holds even for the most emblematic example of parallel evolution, the threespine sticklebacks, where deviation from parallel adaptation may be the result of geographic distance, stochastic processes, and adaptation to environmental variation within habitat types (, ). To conclude, we find that selection acting upon ancient SGV fueled repeated adaptation of common bottlenose dolphins to coastal environments.
Recurrent bouts of selection on genetic variation may promote adaptation to coastal habitat via reusing linked variants with minimal pleiotropic effects, thereby facilitating their persistence at low frequency in source populations and enabling repeated evolution of derived populations at the range margins (). Our study contributes to the growing body of evidence that ancient polymorphisms are a major substrate for rapid ecological adaptive divergence () and can have a key role in local adaptation of long-lived organisms. We propose that such variation has been the source of past adaptation during the glacial cycles and may prove to be critical for species to cope with the rapid directional environmental changes driven by current global climatic change.

MATERIALS AND METHODS

Ethics statement

Samples were collected under MMPA permits 779-1633 and 779-1339 for the WNA and NMFS permit 14097 for the ENP. They were shipped from the Southwest Fisheries Science Center, NOAA Fisheries, USA, to the University of St. Andrews, UK, under CITES institutional permits US057 and GB035 and the U.S. Fish and Wildlife Service permit 16US690343/9, and to BGI in Hong Kong, China, under CITES export permit 547016/01.

Sample collection

Epidermal tissue samples were collected from 57 bottlenose dolphins (Fig. 1A, table S1, and Supplementary Text).

Laboratory procedures

DNA extraction procedures are detailed in the Supplementary Text. Library preparation and whole-genome resequencing were performed at the Beijing Genomics Institute (BGI). Illumina libraries were built on 300-bp DNA fragments and sequenced on an Illumina HiSeq X Ten platform (Supplementary Text).

Data processing and filtering

The read trimming and mapping and data filtering are described in detail in the Supplementary Text. Sequencing reads were processed with Trimmomatic v.0.32 () using default parameters, and sequence reads shorter than 75 bp were discarded. The remaining filtered reads were first mapped to a modified version of a published common bottlenose dolphin mitochondrial genome (GenBank: KF570351.1) (). Reads that did not map to the mitochondrial genome were then mapped to the reference common bottlenose dolphin genome assembly (GenBank: GCA_001922835.1, NIST Tur_tru v1) using BWA mem (v.0.7.15) with default options (). Picard-tools v.2.1.0 () was used to add read groups, merge the bam files from each individual from the different lanes, and remove duplicate reads. Then, indel realignment was performed using GATK v.3.6.0 (). We kept only the mapped reads with a mapping quality of at least 30 and removed repeated regions as identified using RepeatMasker (), regions of excessive coverage, and the sex chromosomes (see details in the Supplementary Text).

SNP calling using genotype likelihoods

We called SNPs taking genotype uncertainty into account by calculating genotype likelihoods in ANGSD v.0.913 () using the samtools model (GL 1) and keeping SNPs with a minor allele frequency (MAF) of at least 0.05 and having data in a minimum of 75% of the individuals. In ANGSD, all analyses described below were run considering only SNPs with a phred quality score and a mapping quality score of at least 30. We further filtered the bam files by excluding SNPs that showed both significant deviation from Hardy-Weinberg equilibrium and an inbreeding coefficient (F) value <0 within populations in ANGSD, as these can be the result of paralogs or other mapping artifacts.
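The two site-level filters above (MAF of at least 0.05 and data in at least 75% of individuals) can be illustrated with a minimal sketch. It assumes hard-called genotypes coded as alternate-allele counts, which is a simplification: ANGSD applies these filters to allele frequencies estimated from genotype likelihoods. The function name `filter_sites` and the input layout are hypothetical and not part of the original pipeline.

```python
import numpy as np

def filter_sites(genotypes, min_maf=0.05, min_call_rate=0.75):
    """Return a boolean mask of sites passing the MAF and missing-data filters.

    genotypes: (n_individuals, n_sites) array of alternate-allele counts
    (0, 1, 2), with -1 marking a missing genotype (hypothetical layout).
    """
    g = np.asarray(genotypes)
    called = g >= 0
    n_called = called.sum(axis=0)
    call_rate = n_called / g.shape[0]
    alt_count = np.where(called, g, 0).sum(axis=0)
    # Allele frequency among called chromosomes (two per diploid individual).
    with np.errstate(divide="ignore", invalid="ignore"):
        alt_freq = np.where(n_called > 0, alt_count / (2 * n_called), 0.0)
    maf = np.minimum(alt_freq, 1 - alt_freq)
    return (maf >= min_maf) & (call_rate >= min_call_rate)

# Toy data: 4 individuals x 4 sites.
toy = np.array([
    [0, 0, 1, -1],
    [1, 0, 2, -1],
    [2, 0, 1, 0],
    [1, 0, 0, 1],
])
mask = filter_sites(toy)
print(mask)
```

In this toy example the second site fails because it is monomorphic and the fourth fails because only half of the individuals have data.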

Linkage disequilibrium pruning and population structure analyses

We excluded one individual (sample 7Tt182 from the WNA pelagic population) from the population structure analyses, which were not based on population allele frequencies, as this individual had a coverage much lower than the others (table S1). We used NgsLD () to obtain a set of unlinked SNPs (Supplementary Text). Population structure analyses, admixture analysis in NGSAdmix (), and PCAs in PCAngsd () were run using a set of 798,572 unlinked SNPs. Note that all SNPs were included in the other analyses. NGSAdmix was run 10 times for each K value between 2 and 8, using a tolerance for convergence of 1 × 10^−10 and a minimum likelihood ratio value of 1 × 10^−6. Consistency between runs was checked, and the runs with the highest likelihood were plotted. The highest level of structure was identified using the Evanno method ().
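The Evanno method picks the K at which the rate of change of the log-likelihood between successive K values is largest relative to its run-to-run spread. A minimal sketch follows. It assumes one log-likelihood per replicate run and uses the common mean-based form of ΔK, that is, |mean L(K+1) − 2 mean L(K) + mean L(K−1)| divided by the standard deviation of L(K). The function name and the toy likelihoods are illustrative only, not values from this study.

```python
import numpy as np

def evanno_delta_k(loglik_by_k):
    """Compute Evanno's delta K from replicate log-likelihoods.

    loglik_by_k: dict mapping K -> list of log-likelihoods across runs.
    Returns a dict K -> delta K for every K that has both neighbours.
    """
    ks = sorted(loglik_by_k)
    mean = {k: float(np.mean(loglik_by_k[k])) for k in ks}
    sd = {k: float(np.std(loglik_by_k[k], ddof=1)) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta

# Made-up log-likelihoods for three replicate runs per K.
runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-890.0, -892.0, -888.0],
    5: [-886.0, -884.0, -885.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

ΔK is undefined for the smallest and largest K tested, so with K from 2 to 8 it can only rank K = 3 to 7.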

Ancestral state reconstruction

We describe how we reconstructed the ancestral state of alleles in the Supplementary Text.

Genotype calling

We called variants (i.e., generation of a vcf file) using samtools v.1.2 mpileup and bcftools multiallelic and rare-variant calling (option -m) on the filtered bam files (, ). Variable sites with a minimum mapping quality of 30, a phred score quality of 30, and genotype quality of 20 were retained in vcftools v.0.1.16 (). We kept SNPs with a minimum MAF of 0.05 and having genotype data in a minimum of 75% of all the individuals and a minimum of five individuals in each of the six populations. The vcf file was also filtered for monomorphic and nonbiallelic sites, totaling 2,003,833 SNPs. Coverage was estimated using vcftools. A vcf file was used as an input for the analyses described below apart from the unfolded SFS and diversity estimates, which were estimated using genotype likelihoods in ANGSD, and the ancient ancestry analyses, which were based on pseudohaploid calls (random sampling of an allele at each site; see details below and table S2).

Admixture analyses

We reconstructed the relationships of the coastal and pelagic ecotypes from the different regions as a maximum likelihood bifurcating tree using TreeMix version 1.13 (Supplementary Text) (). We ran TreeMix using one T. aduncus individual [SRX2653496/SRR5357656 ()] as a root. Reads of this T. aduncus individual were mapped to the common bottlenose dolphin reference genome assembly as described above and processed as described earlier for our data. The vcf file was further filtered for sites with missing data in T. aduncus. We first ran TreeMix 10 times for each value of M (migration events) ranging from 0 to 10 (-global -k 1000). We estimated the optimal number of migration events to be two using the OptM R package (https://cran.r-project.org/web/packages/OptM/index.html), with five migration events as a possible suboptimal number. We then ran TreeMix 100 times for 0 (as null model) to five migration events and obtained a consensus tree and bootstrap values using the BITE R package (). The residual covariance matrix was estimated for each M value and the consensus tree using TreeMix. We then estimated F4 statistics to test whether geographic pairs of pelagic and coastal ecotypes had evolved independently (, ). The F4 statistics can be used to test whether a given tree describes accurately the relationships among four test populations and to detect admixture events (see Supplementary Text for details). F4 statistics were computed for each possible combination of populations using the fourpop function in TreeMix version 1.13 (). We accounted for linkage disequilibrium by jackknifing in windows of 1000 SNPs. This block jackknife was used to obtain an SE on the estimate of the F4 statistics and test for significance using a z score. We computed demographic history, that is, changes in effective population sizes (Ne) through time and ecotype splits within a region and splits of the different pelagic ecotypes, using the program SMC++ v.1 ().
Details of the analysis procedure, run on autosome scaffolds that were more than 10 Mb and on a vcf file not filtered for any MAF, are provided in the Supplementary Text. Briefly, the repeated regions and excessive coverage regions were included as a mask file so that they were not misidentified as very long runs of homozygosity. The analysis was run both using all regions and taking out all the regions under selection, as identified with Flink (see below). Regions under selection were defined as 50 kb around each outlier SNP under any type of selection. Regions under selection were included in the mask file when they were taken out from the dataset. Population size histories and split times between ecotypes in each region were estimated using the default settings, a generation time of 21.1 years for the species (), and two different mutation rates. Mutation rates were (i) 9.10 × 10^−10 substitutions per site per year, that is, 1.92 × 10^−8 substitutions per nucleotide per generation (), and (ii) 1.21 × 10^−9 substitutions per site per year (), that is, 2.56 × 10^−8 substitutions per nucleotide per generation. Results were plotted in R v.3.6.1 (Supplementary Text) ().
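The F4 test with a block jackknife described earlier in this section can be sketched in a few lines. This is a generic illustration, not the TreeMix fourpop implementation. It assumes per-SNP allele frequencies for four populations, takes F4(A,B;C,D) as the mean of (pA − pB)(pC − pD), and uses a delete-one jackknife over blocks of consecutive SNPs. The function name and the toy frequencies are hypothetical.

```python
import numpy as np

def f4_block_jackknife(p_a, p_b, p_c, p_d, block_size=1000):
    """Estimate f4(A,B;C,D) with a delete-one block-jackknife SE and z score."""
    p_a, p_b, p_c, p_d = (np.asarray(p, dtype=float)
                          for p in (p_a, p_b, p_c, p_d))
    per_snp = (p_a - p_b) * (p_c - p_d)
    n_blocks = len(per_snp) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    # Trailing SNPs that do not fill a block are dropped for simplicity.
    blocks = per_snp[: n_blocks * block_size].reshape(n_blocks, block_size)
    block_sums = blocks.sum(axis=1)
    total = block_sums.sum()
    estimate = total / blocks.size
    # Estimates obtained after leaving out each block in turn.
    loo = (total - block_sums) / (blocks.size - block_size)
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z

# Toy frequencies: 6 SNPs in 3 blocks of 2 SNPs each.
p_a = np.full(6, 0.5)
p_b = np.full(6, 0.4)
p_c = np.array([0.6, 0.6, 0.8, 0.8, 0.7, 0.7])
p_d = np.full(6, 0.5)
estimate, se, z = f4_block_jackknife(p_a, p_b, p_c, p_d, block_size=2)
print(estimate, se, z)
```

A z score far from zero indicates that the tested tree does not fit the four populations, which is the signal interpreted as admixture. Blocking by consecutive SNPs keeps linked sites in the same block, so the jackknife SE is not deflated by linkage disequilibrium.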

Diversity and population structure statistics

We estimated the unfolded SFS and the 2D-SFS using the ancestral state and nucleotide diversity, Watterson’s theta, and Tajima’s D for each population using ANGSD v.0.921 (see details in the Supplementary Text). We calculated nucleotide diversity and Watterson’s theta for each site, and then we estimated both the latter and Tajima’s D using a sliding-window size of 50 kb and a step size of 10 kb. We estimated mean pairwise-weighted FST using vcftools across all sites.
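To make the window-based statistics concrete, the sketch below computes nucleotide diversity, Watterson's theta, and Tajima's D for one window, and lists 50-kb windows with a 10-kb step. It is a simplification: it assumes known derived-allele counts at each segregating site, whereas ANGSD derives these statistics from the SFS estimated from genotype likelihoods. Function names are hypothetical, and pi and theta are returned as per-window sums rather than per-site values.

```python
import math

def site_pi(k, n):
    """Pairwise diversity at one biallelic site with k derived copies among n chromosomes."""
    return 2.0 * k * (n - k) / (n * (n - 1))

def tajimas_d(derived_counts, n):
    """Return (pi, theta_w, D) for one window of derived-allele counts."""
    seg = [k for k in derived_counts if 0 < k < n]
    s = len(seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    pi = sum(site_pi(k, n) for k in seg)
    theta_w = s / a1
    # Constants of the variance term from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return pi, theta_w, d

def sliding_windows(length, size=50_000, step=10_000):
    """Yield half-open (start, end) windows fully contained in a scaffold."""
    start = 0
    while start + size <= length:
        yield start, start + size
        start += step

# Ten singletons versus ten intermediate-frequency sites, n = 10 chromosomes.
_, _, d_rare = tajimas_d([1] * 10, n=10)
_, _, d_common = tajimas_d([5] * 10, n=10)
windows = list(sliding_windows(80_000))
print(d_rare, d_common, windows)
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D, which is why D is informative about both demography and selection.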

Ancient ancestry analyses

Ancient tracts introgressed into the coastal ecotype from a divergent lineage after splitting from the pelagic source population or differentially sorted from structure in an ancestral population will contain clusters of private alleles, the density of which will depend on the divergence time of the introgressing and receiving lineages (fig. S11 and Supplementary Text) (, ). We therefore set out to screen for genomic tracts of consecutive or clustered private (i.e., relative to the allopatric pelagic individuals) alleles in each of the individuals from the coastal ecotype, taking variation in mutation rate along the genome into account. To ensure that the results are comparable despite variation in coverage between samples at some sites, we randomly sampled a single allele at each site from each diploid modern genome in all scaffolds longer than 1 Mb using ANGSD. We did not apply any MAF filter for this analysis. For the outgroup, we used all variants found in a dataset consisting of all non-allopatric pelagic samples (fig. S11). We then used a hidden Markov model (HMM) to classify 1-kb windows into “nonancient” and ancient states based on the density of private alleles (). The background mutation rate was estimated in windows of 100 kb, using the variant density in all individuals. We then weighted each 1-kb window by the proportion of sites not masked by our RepeatMasker and CallableLoci bed files. The HMM was trained using a set of starting parameters based on those used for humans (). We trained the model across five independent runs, varying the starting parameters each time to ensure the consistency of the final parameter input. Posterior decoding then determines whether consecutive 1-kb windows change or retain state (ancient or nonancient) depending on the posterior probability. Considering windows inferred as ancient with posterior probabilities of >0.8 (, ), we identified >1000 ancient tracts totaling >10 Mb in each coastal genome tested (table S4).
The emission probabilities of the HMM are modeled as Poisson distributions with means of λ_Ancient = μ · L · T_Ancient for the introgressed state and λ_Ingroup = μ · L · T_Ingroup for the nonancient (or ingroup) state (), where L is the window size (1000 bp) and μ is the mutation rate [1.92 × 10^−8 and 2.56 × 10^−8 substitutions per nucleotide per generation (, )]. This allowed us to estimate the mean TMRCA of the ancient and ingroup windows with the corresponding segments in the outgroup dataset.
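The Poisson emissions and posterior decoding described above can be illustrated with a two-state forward-backward pass. This is a minimal sketch and not the software used in the study. The transition matrix, starting probabilities, and the two coalescence times (in generations) are made-up values chosen only for illustration. Per-window weights stand in for the callable proportion, and the local mutation-rate scaling is omitted.

```python
import numpy as np
from math import lgamma

def posterior_ancient(counts, weights, mu, window, t_ingroup, t_ancient,
                      trans, start):
    """Posterior probability of the ancient state for each window."""
    counts = np.asarray(counts)
    weights = np.asarray(weights, dtype=float)
    trans = np.asarray(trans, dtype=float)
    start = np.asarray(start, dtype=float)
    # Poisson means: lambda_state = mu * L * T_state, scaled by window weight.
    lam = mu * window * np.array([t_ingroup, t_ancient])
    means = np.clip(np.outer(weights, lam), 1e-12, None)
    log_fact = np.array([lgamma(c + 1) for c in counts])
    emis = np.exp(counts[:, None] * np.log(means) - means - log_fact[:, None])
    n = len(counts)
    fwd = np.zeros((n, 2))
    bwd = np.ones((n, 2))
    fwd[0] = start * emis[0]
    fwd[0] /= fwd[0].sum()
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ trans) * emis[i]
        fwd[i] /= fwd[i].sum()  # rescale to avoid underflow
    for i in range(n - 2, -1, -1):
        bwd[i] = trans @ (emis[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

# Private-allele counts in ten 1-kb windows (toy data).
counts = [0, 0, 0, 3, 4, 2, 5, 0, 0, 0]
post = posterior_ancient(
    counts, weights=np.ones(10), mu=1.92e-8, window=1000,
    t_ingroup=10_000, t_ancient=100_000,      # illustrative values only
    trans=[[0.99, 0.01], [0.05, 0.95]],       # illustrative values only
    start=[0.95, 0.05],
)
ancient = post > 0.8
print(np.round(post, 3), ancient)
```

With these toy settings the four windows carrying clusters of private alleles exceed the 0.8 posterior threshold, while the flanking empty windows do not.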

Patterns of structuration across the genome

We used Local PCA () to describe the three major patterns of relatedness (“corners”) among populations on four PCs for the 56 scaffolds longer than 10 Mb using the default options [two PCs and two multidimensional scaling (MDS) coordinates], the R code available on GitHub, and bins of 100 SNPs. We plotted the pairwise plots of the first four PCs for each of the three corners.

Selection analyses

We used Flink () to test for selection to pelagic versus coastal habitat. Flink is an extension of BayeScan (), which partitions genetic differentiation into locus-specific and population-specific components, respectively describing selection and drift; Flink additionally takes linkage among loci into account. Specifically, it applies an HMM to identify the effect of selection at linked markers using correlation in the locus-specific elements along the genome. Flink was run grouping the populations into two groups: pelagic and coastal (higher hierarchical level). Each group was composed of the three populations from each region. Scaffolds were grouped into super-scaffolds, so that each contains at least 50,000 SNPs, but each scaffold was considered independent in the analysis. In Flink, the function estimate was run, and parameter settings are described in the Supplementary Text. The number of iterations was set to 500,000, the burnin to 300,000, and the thinning to 100. We considered a locus under selection when it was within the 1% false discovery rate (FDR) threshold. Given the uncertainty about evolutionary relationships of dolphin populations, we also ran Flink using the three regions as the higher hierarchical level. This resulted in much more prevalent selection with between ~165,000 and ~195,000 SNPs under divergent selection between ecotypes within each geographical region, potentially including many false positives because of deviation from the assumed hierarchical structure model. The approach presented here, considering ecotype as the higher hierarchical level, is therefore more conservative. To get further insights into the results obtained by Flink, we plotted the raw genotypes of all the SNPs, SNPs under homogenizing selection in the coastal populations, SNPs under divergent selection between ecotypes, and SNPs under both homogenizing selection in the coastal populations and divergent selection between ecotypes (defined as the SNPs under parallel linked selection) in R (see details in the Supplementary Text).
We also plotted a neighbor-joining distance tree and a PCA for the SNPs under each type of selection. To determine the origin of the SNPs under selection, we determined how many were also polymorphic in the pelagic populations and compared the 2D-SFS between all pairs of populations, estimated in ANGSD (see details in Supplementary Text). Then, we determined how many SNPs under the different types of selection were found in ancient tracts. We compared the results with the proportion found in ancient tracts in each of 100 random samples of the same number of putatively neutral SNPs. We identified the genes directly overlapping with the SNPs under parallel linked selection using the reference genome annotation file. We describe how we determined the putative functions of the genes under selection in the Supplementary Text.
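The comparison between candidate SNPs and random samples of putatively neutral SNPs amounts to a simple resampling scheme, sketched below. The sketch assumes positions on a single scaffold and sorted, non-overlapping tracts given as half-open intervals. All names and toy coordinates are hypothetical and do not come from the study data.

```python
import numpy as np

def in_tracts(positions, tract_starts, tract_ends):
    """Boolean array: does each position fall inside any [start, end) tract?"""
    positions = np.asarray(positions)
    tract_starts = np.asarray(tract_starts)
    tract_ends = np.asarray(tract_ends)
    # Index of the last tract starting at or before each position.
    idx = np.searchsorted(tract_starts, positions, side="right") - 1
    valid = idx >= 0
    inside = np.zeros(len(positions), dtype=bool)
    inside[valid] = positions[valid] < tract_ends[idx[valid]]
    return inside

def tract_enrichment(candidate_pos, neutral_pos, tract_starts, tract_ends,
                     n_rep=100, seed=0):
    """Observed proportion in tracts and a null from size-matched neutral samples."""
    rng = np.random.default_rng(seed)
    observed = in_tracts(candidate_pos, tract_starts, tract_ends).mean()
    neutral_inside = in_tracts(neutral_pos, tract_starts, tract_ends)
    null = np.array([
        rng.choice(neutral_inside, size=len(candidate_pos), replace=False).mean()
        for _ in range(n_rep)
    ])
    return observed, null

# Toy example: two tracts, four candidate SNPs, 100 neutral SNPs.
starts, ends = [100, 500], [200, 600]
candidates = [110, 150, 550, 700]
neutral = np.arange(0, 1000, 10)
observed, null = tract_enrichment(candidates, neutral, starts, ends)
print(observed, null.mean(), null.min(), null.max())
```

Reporting the mean, range, and SD of the null proportions alongside the observed proportion gives the kind of summary quoted in the Discussion for the candidate and neutral SNPs.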

88 in total

1. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.

Authors: G Evanno; S Regnaut; J Goudet
Journal: Mol Ecol Date: 2005-07 Impact factor: 6.185

2. Adaptation from standing genetic variation.

Authors: Rowan D H Barrett; Dolph Schluter
Journal: Trends Ecol Evol Date: 2007-11-14 Impact factor: 17.712

3. Speciation as a sieve for ancestral polymorphism.

Authors: Rafael F Guerrero; Matthew W Hahn
Journal: Mol Ecol Date: 2017-09-08 Impact factor: 6.185

4. Cultural transmission of tool use combined with habitat specializations leads to fine-scale genetic structure in bottlenose dolphins.

Authors: Anna M Kopps; Corinne Y Ackermann; William B Sherwin; Simon J Allen; Lars Bejder; Michael Krützen
Journal: Proc Biol Sci Date: 2014-03-19 Impact factor: 5.349

5. Convergent adaptation of human lactase persistence in Africa and Europe.

Authors: Sarah A Tishkoff; Floyd A Reed; Alessia Ranciaro; Benjamin F Voight; Courtney C Babbitt; Jesse S Silverman; Kweli Powell; Holly M Mortensen; Jibril B Hirbo; Maha Osman; Muntaser Ibrahim; Sabah A Omar; Godfrey Lema; Thomas B Nyambo; Jilur Ghori; Suzannah Bumpstead; Jonathan K Pritchard; Gregory A Wray; Panos Deloukas
Journal: Nat Genet Date: 2006-12-10 Impact factor: 38.330

6. A high-resolution map of human evolutionary constraint using 29 mammals.

Authors: Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis
Journal: Nature Date: 2011-10-12 Impact factor: 49.962

7. ANGSD: Analysis of Next Generation Sequencing Data.

Authors: Thorfinn Sand Korneliussen; Anders Albrechtsen; Rasmus Nielsen
Journal: BMC Bioinformatics Date: 2014-11-25 Impact factor: 3.169

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

9. The contribution of ancient admixture to reproductive isolation between European sea bass lineages.

Authors: Maud Duranton; François Allal; Sophie Valière; Olivier Bouchez; François Bonhomme; Pierre-Alexandre Gagnaire
Journal: Evol Lett Date: 2020-04-15

10. Detecting archaic introgression using an unadmixed outgroup.

Authors: Laurits Skov; Ruoyun Hui; Vladimir Shchur; Asger Hobolth; Aylwyn Scally; Mikkel Heide Schierup; Richard Durbin
Journal: PLoS Genet Date: 2018-09-18 Impact factor: 5.917
