Literature DB >> 34731007

Speciation dynamics and extent of parallel evolution along a lake-stream environmental contrast in African cichlid fishes.

Alexandra A-T Weber1, Jelena Rajkov1, Kolja Smailus1, Bernd Egger1, Walter Salzburger1.   

Abstract

Understanding the dynamics of speciation is a central topic in evolutionary biology. Here, we investigated how morphological and genomic differentiation accumulated along the speciation continuum in the African cichlid fish Astatotilapia burtoni. While morphological differentiation was continuously distributed across different lake-stream population pairs, we found that there were two categories with respect to genomic differentiation, suggesting a “gray zone” of speciation at ~0.1% net nucleotide divergence. Genomic differentiation was increased in the presence of divergent selection and drift compared to drift alone. The quantification of phenotypic and genetic parallelism in four cichlid species occurring along a lake-stream environmental contrast revealed parallel and antiparallel components in rapid adaptive divergence, and morphological convergence in species replicates inhabiting the same environments. Furthermore, we show that the extent of parallelism was higher when ancestral populations were more similar. Our study highlights the complementary roles of divergent selection and drift on speciation and parallel evolution.

Entities:  

Year:  2021        PMID: 34731007      PMCID: PMC8565912          DOI: 10.1126/sciadv.abg5391

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.136


INTRODUCTION

The formation of new species, speciation, is a fundamental evolutionary process that has attracted much interest during the past 160 years (–). However, relatively little is known about how genomic and morphological differentiation accumulates along the so-called speciation continuum (). The dynamics of genomic differentiation associated with the divergence of populations has previously been explored using modeling (), empirical data (–), or a combination of both (), revealing examples for both scenarios, the “sudden” transition from a state of well-intermixed populations to two reproductively isolated entities (, ) and the more gradual accumulation of genomic divergence (, ). In contrast, the dynamics of morphological diversification has only rarely been assessed in the context of the speciation continuum. Likewise, only few study systems have been established that allow joint investigation of early and late stages of genomic and morphological differentiation along the entire speciation continuum. Inferring these differentiation trajectories is crucial, however, to understand the underlying processes and, possibly, to identify thresholds of differentiation that delimit species boundaries. The integrative examination of morphological and genomic differentiation also allows the evaluation of whether the speciation process is predictable (, ), for example, by examining evolutionary parallelism at the level of phenotypes and of genotypes across population pairs diverging along similar environmental contrasts (). In fishes, it appears that adaptive divergence involves both parallel and nonparallel components, that is, while some traits or genes were found to evolve in parallel, others in different systems did not [reviewed in ()]. Recently, the concept of (non-)parallel evolution has been introduced as a gradient ranging from truly parallel to completely divergent evolution (). This concept also allows for a better quantification of the extent of parallelism and to distinguish between convergence and parallelism, whereby in convergent evolution, similar types evolve from different ancestral states, whereas in parallel evolution, the starting points are similar (). In general, it is assumed that the extent of parallelism becomes smaller the more distantly related two populations are, which is likely due to reduced levels of shared genetic variation in these populations (). Empirical work in threespine stickleback fish that repeatedly colonized freshwater habitats from marine environments supports this view (, ). However, to the best of our knowledge, the extent of parallelism as a function of relatedness has not yet been quantified across a continuum of genomic divergence spanning from closely related populations to different species that diverged along a similar environmental contrast. Here, we investigate differentiation trajectories and the extent of parallel evolution in East African cichlid fishes, which constitute important model taxa in speciation research and serve as textbook examples of adaptive radiations characterized by rapid speciation accompanied by the evolution of substantial phenotypic, behavioral, and ecological diversity (–). More specifically, we examine the dynamics of morphological and genomic diversification along a lake-stream environmental contrast in the East African cichlid fish Astatotilapia burtoni (Günther, 1893) across its native distribution range. A. burtoni, which occurs in African Lake Tanganyika and affluent rivers (Fig. 1A), was among the first five cichlid species to have their genomes sequenced (). It is a generalist able to feed on a variety of food sources and can thrive in varied environments such as rivers and lakes (), thus representing an ideal model to investigate the dynamics of diversification along an environmental contrast across different temporal and geographic scales. Lake Tanganyika is the oldest African Great Lake (~10 million years old) and has a complex paleoenvironmental history characterized by, for example, substantial lake-level fluctuations caused by variations in rainfall regimes, temperature, evaporation, and tectonic activity (, ). In particular, lake-level lowstands during arid phases are likely to also have affected the tributaries of Lake Tanganyika and the locations of their estuaries into the main lake. Many of these tributaries, be they small creeks or larger rivers, are inhabited by A. burtoni populations derived from lake fish, thereby forming “population pairs” consisting of a source (that is, ancestral) population in the lake and a phenotypically distinct river population featuring habitat-specific adaptations in morphology and ecology (, ). For example, stream fish have a more inferior mouth position and a more slender body than lake fish (). The different population pairs display varying levels of genomic differentiation, ranging from virtually no divergence to highly differentiated populations, and show strong signals of isolation by distance (). A comprehensive phylogeographic study of A. burtoni across Lake Tanganyika revealed that the populations from the North and the South of Lake Tanganyika are genetically clearly distinct ().
Fig. 1.

The study system comprising nine lake-stream population pairs in four cichlid fish species from African Lake Tanganyika and surroundings.

(A) Illustrations of the four species used in this study and a schematic representation of their phylogenetic relationships [see fig. S1 and () for details]. (B) Map of sampling localities and names of the different lake-stream population pairs, that is, systems. A. burtoni (ngenomes = 132; nmorphometrics = 289), C. horei (ngenomes = 24; nmorphometrics = 67), H. stappersii (ngenomes = 24; nmorphometrics = 81), and P. philander (ngenomes = 24; nmorphometrics = 31).

The study system comprising nine lake-stream population pairs in four cichlid fish species from African Lake Tanganyika and surroundings.

(A) Illustrations of the four species used in this study and a schematic representation of their phylogenetic relationships [see fig. S1 and () for details]. (B) Map of sampling localities and names of the different lake-stream population pairs, that is, systems. A. burtoni (ngenomes = 132; nmorphometrics = 289), C. horei (ngenomes = 24; nmorphometrics = 67), H. stappersii (ngenomes = 24; nmorphometrics = 81), and P. philander (ngenomes = 24; nmorphometrics = 31). To quantify the extent of parallel evolution in lake-stream divergence not only within species but also between species co-occurring in the same river systems, we further examined lake-stream population pairs of three additional cichlid species (), namely, Ctenochromis horei, Haplochromis stappersii, and Pseudocrenilabrus philander (Fig. 1A) [the taxonomy follows ()]. Two of these species co-occur in sympatry with A. burtoni in two of the largest tributaries to Lake Tanganyika (C. horei in the Lufubu River in the South and H. stappersii in the Rusizi River in the North), providing an excellent opportunity to examine two species replicates of lake-stream divergence in the very same river systems. We used whole-genome resequencing, geometric morphometric analyses, and mate-choice experiments to (i) assess the dynamics of morphological and genomic differentiation along the lake-stream environmental contrast and across geography in A. burtoni, (ii) evaluate to what extent genome-wide differentiation scales with reproductive isolation in A. burtoni, (iii) quantify the extent of morphological and genomic parallelism and convergence among nine lake-stream population pairs from four cichlid species, and (iv) test whether the extent of parallelism is higher when ancestral (i.e., lake) populations are more similar.

RESULTS

We performed whole-genome resequencing of 180 specimens of cichlid fishes from Lake Tanganyika and its surroundings (132 A. burtoni and 24 of each of C. horei and H. stappersii) and included available sequence data of 24 individuals of P. philander (). We examined six A. burtoni lake-stream population pairs and one population pair for each of the three additional species (C. horei, H. stappersii, and P. philander), totaling nine lake-stream population pairs (see Materials and Methods, Fig. 1, fig. S1, and tables S1 and S2). Each population pair consisted of one lake and one stream population, whereby the lake population was sampled from a lake habitat close to the stream’s estuary. The population pairs, or “systems,” were named after the respective river, except for the Lake Chila system (Fig. 1B). Note that from the Kalambo drainage, we sampled two ecologically distinct riverine populations, one from a locality near the estuary where the river is deep and flows slowly (Kalambo 1) and the other one from about 6-km upstream in a white-water environment (Kalambo 2), resulting in two population pairs for this system (). In addition, we quantified body shape of 468 specimens covering all 17 populations (table S2). Last, we evaluated the degree of reproductive isolation between A. burtoni populations displaying increasing levels of genomic divergence. To do so, we revisited published studies that examined the same A. burtoni populations as in the present study and, in addition, performed mate-choice experiments in the laboratory between the genetically most distinct A. burtoni populations from the North and the South of Lake Tanganyika.

The dynamics of morphological and genomic diversification

We first compared morphological [Mahalanobis distance (DM)] and genomic (genome-wide FST) differentiation across all 11 A. burtoni populations to examine the respective roles of environmental variation and geography in diversification, classifying the 55 pairwise comparisons according to habitat contrasts (lake-lake, lake-stream, and stream-stream) and to the geographic distance between populations (1 to 40 km, South-East of Lake Tanganyika; 70 to 140 km, East-West; and 700 to 750 km, North-South; see Fig. 1B). The analysis of body shape revealed that all populations differed significantly from one another in average body shape (table S3). The distribution of pairwise DM values, which ranged from 2.1 (Lunzua lake versus Kalambo lake) to 7.6 (Rusizi lake versus Lufubu River), was continuous (Fig. 2A and table S3). In all three geographic groups (South-East, East-West, and North-South), the most similar system pairs according to pairwise DM values were lake-lake comparisons (i.e., populations from similar environments), whereas the most distinct ones were lake-stream comparisons (i.e., populations from different environments), suggesting that ecological factors affect body shape differentiation. Last, a wide range of DM values was observed within a small geographic range (1 to 40 km: Lunzua lake versus Kalambo lake, DM = 2.1; Chitili River versus Kalambo lake, DM = 6.3), suggesting that body shape can differ substantially across small geographic distances and at low levels of genome-wide divergence (Fig. 2A and fig. S2).
Fig. 2.

Dynamics of diversification in A. burtoni.

(A) Genome-wide FST between A. burtoni populations plotted against DM calculated from body shape data. Geographic distance between populations and habitat type for each comparison are highlighted with symbol color and shape, respectively. The sympatric lake-stream systems are colored using the same color coding as in Fig. 1. Morphological distances increase gradually, whereas two groups of genetic distances are observed: the one-species (FST < 0.3) and the two-species (FST > 0.6) categories. (B) Genome-wide FST plotted against the net nucleotide difference [Da = d − (π1 + π2)/2; note the log-scale for the x axis to which the gray shading has been adjusted] in A. burtoni populations, covering the entire speciation continuum. Genomic differentiation accumulated rapidly during early stages of divergence (FST < 0.3; that is, the one-species category) but then slowed down as Da increases (FST > 0.6; that is, the two-species category). Levels of reproductive isolation increase along a continuum of genomic differentiation (see Table 1 for details on the experiments and population used). (C) Isolation by distance in the two comparison categories lake-stream (the color coding for lake-stream systems is the same as in Fig. 1) and lake-lake (light blue) in the southern populations of A. burtoni (that is, within the one-species category). R2, Pearson’s correlation coefficient. (D) Trajectories for three differentiation axes: morphology, geography, and genetics. In lake-stream comparisons, morphological differentiation builds up first, and then genomic differentiation increases sharply. In contrast, in the absence of strong divergent selection (that is, in the lake-lake comparisons), morphological differentiation builds up first, and then genomic differentiation accumulates only moderately. All pairwise comparisons are represented with circles in (B) to (D).

Dynamics of diversification in A. burtoni.

(A) Genome-wide FST between A. burtoni populations plotted against DM calculated from body shape data. Geographic distance between populations and habitat type for each comparison are highlighted with symbol color and shape, respectively. The sympatric lake-stream systems are colored using the same color coding as in Fig. 1. Morphological distances increase gradually, whereas two groups of genetic distances are observed: the one-species (FST < 0.3) and the two-species (FST > 0.6) categories. (B) Genome-wide FST plotted against the net nucleotide difference [Da = d − (π1 + π2)/2; note the log-scale for the x axis to which the gray shading has been adjusted] in A. burtoni populations, covering the entire speciation continuum. Genomic differentiation accumulated rapidly during early stages of divergence (FST < 0.3; that is, the one-species category) but then slowed down as Da increases (FST > 0.6; that is, the two-species category). Levels of reproductive isolation increase along a continuum of genomic differentiation (see Table 1 for details on the experiments and population used). (C) Isolation by distance in the two comparison categories lake-stream (the color coding for lake-stream systems is the same as in Fig. 1) and lake-lake (light blue) in the southern populations of A. burtoni (that is, within the one-species category). R2, Pearson’s correlation coefficient. (D) Trajectories for three differentiation axes: morphology, geography, and genetics. In lake-stream comparisons, morphological differentiation builds up first, and then genomic differentiation increases sharply. In contrast, in the absence of strong divergent selection (that is, in the lake-lake comparisons), morphological differentiation builds up first, and then genomic differentiation accumulates only moderately. All pairwise comparisons are represented with circles in (B) to (D).
Table 1.

Summary of experimental testing for reproductive isolation in A. burtoni.

Populations FST Genetic markers Experimental setup Main results Reproductive isolation Reference
Kalambo lake ×Kalambo River 20.113Whole genomeTransplantexperiment in thelake environmentAdaptivephenotypicplasticityWeak forwild-caught adults;not observed for F1(27)
Kalambo lake ×Lunzua River0.123Whole genomeCommon gardenexperiment inmesocosmsRandom matingNot observed(20)
Kalambo lake ×Ndole lake0.189*Whole genome*Common gardenexperiment inmesocosmsImmigrant andextrinsic hybridinviability (weak)Extrinsic prezygoticand postzygoticisolation (weak)(28)
Ndole lake × LufubuRiver0.472*Whole genome*Common gardenexperiment inmesocosmsImmigrant andextrinsic hybridinviability (strong)Extrinsic prezygoticand postzygoticisolation (strong)(28)
Kalambo lake ×Lufubu River0.624Whole genomeCommon gardenexperiment inmesocosmsImmigrant andextrinsic hybridinviability (strong)Extrinsic prezygoticand postzygoticisolation (strong)(28)
Kalambo lake ×Rusizi lake0.693Whole genomeLaboratorymate-choiceexperimentPartial assortativematingPremating isolationThis study

*Genetic distances inferred from the Lufubu lake population rather than Ndole lake population [both populations are geographically close and belong to the same genetic cluster ()].

Contrary to morphological differentiation, we observed a sudden increase in the levels of genomic differentiation between population pairs, with two clearly separated categories in the pairwise FST values (≤0.3 and ≥0.6), hereafter referred to as the “one-species category” and the “two-species category,” respectively (Fig. 2B). All pairwise comparisons between the South-East populations and between any of the South-East populations and the Lufubu lake population were assigned into the one-species category, just as the comparison between the two Rusizi populations. The two-species category included all pairwise comparisons between any of the South-East populations and the Lufubu River population and all comparisons involving one southern and one northern population. From this, it emerges that geographic distance alone is not sufficient to explain the extent of genetic divergence, as, for example, pairwise comparisons of populations with moderate geographic distances (70 to 140 km) were found in both FST categories. Instead, it appears that the separation at moderate geographic distance was driven by both adaptation to different habitats and genetic drift (see also below). We found a single pairwise comparison showing an FST value in between the one-species and the two-species categories, namely, the population pair from the Lufubu system (Lufubu lake versus Lufubu River: FST = 0.46). Demographic modeling of this population pair indicated that the most likely scenario of divergence between these populations involved a past secondary contact event, that is, one or several periods of allopatric divergence in the past with currently ongoing gene flow (fig. S3 and table S4). Next, we investigated the increase in genomic differentiation against a proxy of time since population divergence, net nucleotide difference [Da = d − ((π1 + π2)/2)]. To extend the comparative framework to encompass the entire continuum of genomic divergence from populations to species, we included the between-species comparisons of the three additional species C. horei, H. stappersii, and P. philander. We found a rapid increase in FST at low levels of Da, which slowed down as Da increased. Given this trend, we fitted a logarithmic model to the data, which was highly significant [linear regression FST ~ log(Da), P < 2.2 × 10−16; Pearson correlation coefficient, R2 = 0.96]. Therefore, to better visualize the dynamics of diversification at early stages of genomic differentiation, we plotted Da on a logarithmic scale (Fig. 2B). The one-species category encompassed Da values ranging from 5 × 10−6 to 5 × 10−4, while the two-species category features Da values of 0.002 and above. With Da ~ 0.001, the population pair from the Lufubu system occupied the “gray zone” of speciation between the one-species and two-species categories.

Reproductive isolation begins establishing at low levels of genome-wide differentiation

As a next step, we assessed to what extent the observed levels of genome-wide differentiation scale with the degree of reproductive isolation between populations in A. burtoni. To this end, we revisited available data from previous experiments that used the same A. burtoni populations as in the present study (, , ) and conducted additional mate-choice experiments in the laboratory between the genetically most distinct populations of A. burtoni from the North and the South of Lake Tanganyika (Table 1, Supplementary Text, and fig. S4) (). *Genetic distances inferred from the Lufubu lake population rather than Ndole lake population [both populations are geographically close and belong to the same genetic cluster ()]. Previous experiments involving populations that feature low levels of genomic divergence [Kalambo lake versus Kalambo River 2, FST = 0.11 (); Kalambo lake versus Lunzua River, FST = 0.12 ()] revealed random mating patterns with respect to source population, suggesting a lack of reproductive isolation. In a mesocosm experiment in a seminatural setting at Lake Tanganyika with populations at a slightly higher level of genome-wide differentiation, yet still within the one-species category (Kalambo lake versus Ndole lake, FST = 0.18), a weak signal of extrinsic hybrid inviability was detected (). In contrast, populations at intermediate (Ndole lake versus Lufubu River, FST = 0.47) and high (Kalambo lake versus Lufubu River, FST = 0.62) levels of genome-wide divergence showed strong signs of extrinsic hybrid inviability in the same mesocosm experiment (). Notably, these differences in the extent of reproductive isolation between population pairs were significant (). Thus, there is evidence for reproductive isolation between populations in the gray zone of speciation and in the two-species category, indicating that our species categories that are solely based on levels of genome-wide differentiation in A. burtoni are biologically meaningful. In support of this, our new laboratory-based mate-choice experiments targeting two populations that feature one of the highest genome-wide FST values (Kalambo lake versus Rusizi lake: FST = 0.69) revealed signatures of assortative mating with respect to source population, at least in a multisensory laboratory setting, allowing for a combination of mating cues (see Supplementary Text and fig. S4).

Divergent selection and drift accelerate genome-wide differentiation

We then investigated the relative contribution of divergent selection and geography on genomic and morphological diversification in A. burtoni. To do so, we compared the more closely related South-West and East-West populations of A. burtoni, focusing on within habitat (lake-lake) versus between habitat (lake-stream) comparisons. The latter consisted of the five lake-stream systems (i.e., Chitili, Kalambo 1, Lunzua, Kalambo 2, and Lufubu) (Fig. 1B), whereas the lake-lake comparisons consisted of all pairwise comparisons of lake populations from that area. In both cases, we found an increase in FST over geographic distance (Fig. 2C), which is compatible with an isolation-by-distance scenario for lake-stream (linear regression, P = 0.0013; Pearson correlation coefficient, R2 = 0.97) and lake-lake (Mantel test, P = 0.08; Pearson correlation coefficient, R2 = 0.98) contrasts. Genomic differentiation increased much more strongly with respect to geographic distance when populations were compared that inhabit contrasting environments (lake-stream population comparisons, i.e., in the presence of divergent selection and drift) than when they inhabit similar environments (lake-lake comparisons, i.e., primarily in the presence of drift) (Fig. 2C). Such a signature has previously been shown in threespine sticklebacks (). We also performed demographic modeling of all lake-stream population pairs to test different scenarios of divergence and to infer demographic parameters such as population size and the extent of gene flow between lake and stream populations. We found that there is ongoing gene flow between lake and stream fish in all population pairs (fig. S3 and table S4) and that the effective population size of most stream populations was much smaller than that of the respective lake population (table S4), which is compatible with the scenario that the rivers were colonized from the respective lake stocks (, ). As the impact of genetic drift is stronger in small populations, drift may also have contributed to the increased genomic divergence of stream populations. We found that morphological differentiation built up first at small genetic and geographic distances (Fig. 2D) and that genomic differentiation built up much more rapidly compared to geographic distance in the presence of divergent selection and increased drift (lake-stream comparisons) (Fig. 2D). This highlights that the environment (via divergent selection) and demographic events play a crucial role in differentiation trajectories in A. burtoni.

Little overlap between differentiation regions among independent lake-stream systems

We then turned our attention to the question of parallel morphological and genomic evolution along a lake-stream environmental contrast in cichlids. To do so, we compared all lake-stream population pairs from the four species to also include between-species comparisons. At the morphological level, we found that in eight of the nine lake-stream population pairs, lake fish had on average deeper bodies than river fish (Fig. 3A). A clustering analysis in A. burtoni revealed that, with the exceptions of Chitili lake and Kalambo River 1, populations clustered according to habitat and not to locality (Fig. 3B), suggesting the presence of two main ecotypes in this species, a “lake” and a “river” ecotype. This is further supported by a canonical variate analysis (CVA) (Fig. 3, C and E) and a principal components analysis (PCA; fig. S2), showing that A. burtoni lake-stream population pairs diverged along a common axis of morphological differentiation. Contrastingly, there was no such common axis of morphological differentiation along the lake-stream contrast when all species were compared (Fig. 3, D and F).
Fig. 3.

Body shape variation among nine lake-stream population pairs from four cichlid species: A. burtoni (N = 289), H. stappersii (N = 81), C. horei (N = 67), and P. philander (N = 31).

(A) Size-corrected comparison of average body depth among lake and river fishes. Black lines indicate decreasing value between lake and river populations. Gray line indicates increasing value between lake and river populations. (B) Dendrogram of body shape based on DM among populations. Clustering algorithm, unweighted pair group method with arithmetic mean. (C) CVA of body shape for A. burtoni. Lake population outlines are shown in dashed lines, and river population outlines are shown in solid lines. (D) CVA of body shape for all species. Lake population outlines are shown in dashed lines, and river population outlines are shown in solid lines. (E) CV1 of shape change in A. burtoni only (scaling factor, 10; outlines are for illustration purposes only, from light gray to dark outlines with increasing values). (F) CV2 shape change for all four species (scaling factor, 10; outlines are for illustration purposes only, from light gray to dark outlines with increasing values).

Body shape variation among nine lake-stream population pairs from four cichlid species: A. burtoni (N = 289), H. stappersii (N = 81), C. horei (N = 67), and P. philander (N = 31).

(A) Size-corrected comparison of average body depth among lake and river fishes. Black lines indicate decreasing value between lake and river populations. Gray line indicates increasing value between lake and river populations. (B) Dendrogram of body shape based on DM among populations. Clustering algorithm, unweighted pair group method with arithmetic mean. (C) CVA of body shape for A. burtoni. Lake population outlines are shown in dashed lines, and river population outlines are shown in solid lines. (D) CVA of body shape for all species. Lake population outlines are shown in dashed lines, and river population outlines are shown in solid lines. (E) CV1 of shape change in A. burtoni only (scaling factor, 10; outlines are for illustration purposes only, from light gray to dark outlines with increasing values). (F) CV2 shape change for all four species (scaling factor, 10; outlines are for illustration purposes only, from light gray to dark outlines with increasing values). At the genomic level, the A. burtoni lake-stream population pairs displayed contrasting levels of genome-wide differentiation, ranging from FST = 0.005 (Chitili) to FST = 0.465 (Lufubu) (Fig. 4). Differentiation between lake-stream population pairs in the other species ranged from low (H. stappersii; FST = 0.046) to intermediate (C. horei; FST = 0.532) to high (P. philander; FST = 0.733), corroborating that the two P. philander populations may actually represent two distinct species, as, for example, suggested by their different sex-determining systems (). Consistent with this, the P. philander populations display relatively high levels of absolute divergence (d = 3.6 × 10−3), which are above the levels of divergence measured between the northern and southern lineages in A. burtoni (d = 3.0 × 10−3 to 3.3 × 10−3) (Table 2).
Fig. 4.

Distribution of genome-wide FST for the nine lake-stream systems sorted by increasing genome-wide FST values.

Each dot represents an FST value calculated in a 10-kb window along a given linkage group (LG). Linkage groups are highlighted in different shades of gray. Regions of differentiation (overlap of the top 5% values of FST, d, and π) are highlighted in red.

Table 2.

d values between divergent populations and species used in this study.

d values are presented as a 10−3 factor for readability. The category A. burtoni “South” included all populations from southeast of Lake Tanganyika and the Lufubu lake population. A. burtoni “Lufubu” included the Lufubu River population. A. burtoni “North” included both populations (Rusizi lake and Rusizi River) from the North of Lake Tanganyika.

A. burtoni South A. burtoni Lufubu A. burtoni North H. stappersii C. horei P. philander
A. burtoni South0.9–1.3
A. burtoni Lufubu1.7–2.10
A. burtoni North3.0–3.23.31.0
H. stappersii 6.7–6.86.76.4–6.51.6
C. horei 8.4–8.58.4–8.58.4–8.58.5–8.71.1
P. philander 10.5–10.910.5–10.810.6–10.910.7–11.09.6–10.03.6

Distribution of genome-wide FST for the nine lake-stream systems sorted by increasing genome-wide FST values.

Each dot represents an FST value calculated in a 10-kb window along a given linkage group (LG). Linkage groups are highlighted in different shades of gray. Regions of differentiation (overlap of the top 5% values of FST, d, and π) are highlighted in red.

d values between divergent populations and species used in this study.

d values are presented as a 10−3 factor for readability. The category A. burtoni “South” included all populations from southeast of Lake Tanganyika and the Lufubu lake population. A. burtoni “Lufubu” included the Lufubu River population. A. burtoni “North” included both populations (Rusizi lake and Rusizi River) from the North of Lake Tanganyika. We next investigated genomic regions of high differentiation between lake-stream population pairs and evaluated to which extent these outlier regions (defined here conservatively as the intersection between the top 5% 10-kb windows with respect to FST, d, and absolute value of π difference) are shared between population pairs and species. Our analyses revealed between 2 and 101 outlier regions of high differentiation per lake-stream population pair and that these regions were between 10 to 70 kb in length (red dots in Fig. 4 and table S5). It has previously been shown that heterogeneity in crossover rates can produce contrasting patterns of genomic differentiation between diverging populations that are not due to divergent selection. These are manifested, for example, in greater levels of differentiation near chromosome centers (where crossover rates are lower) compared to the peripheries (where crossover rates are higher) (). In our case, we did not find evidence for an accumulation of differentiation regions in the chromosome centers in any of the lake-stream population pairs, nor when all 525 outlier regions were considered jointly (table S5), suggesting that our results reflect true signatures of divergent selection. We then evaluated to what extent differentiation regions were shared among lake-stream population pairs and species. We found that there was little overlap (Fig. 5A) and that no such region of high differentiation was shared between more than two population pairs (Fig. 5A). The regions of high differentiation were distributed across all linkage groups, although there seemed to be an overrepresentation on LG3 (64 of 525), which remained when correcting for chromosome length (table S5). The 525 differentiation regions contained a total of 637 genes. There was no obvious overrepresentation in functional enrichment with respect to Gene Ontology (GO) categories. The only exception was the Lunzua lake-stream comparison, in which outlier genes were significantly enriched for the molecular function “binding” (table S6). The 25 genes located in the 19 differentiation regions shared between two systems also showed no functional enrichment (table S7).
Fig. 5.

Extent of parallel evolution among nine lake-stream cichlid population pairs.

(A) Number of unique and overlapping differentiation regions (overlap of top 5% values of FST, d, and π) in the nine lake-stream population pairs. (B) Number of unique and overlapping differentiation windows in the three sets of genomic outliers from BayPass in A. burtoni from the North of Lake Tanganyika, A. burtoni from the South, and H. stappersii. (C) The angles of phenotypic (θP) and genetic outlier (θG_outlier) lake-stream divergence vectors are positively correlated (Mantel test, P = 0.0075; Pearson correlation coefficient, R2 = 0.27). (D) The lengths of phenotypic (LP) and genetic outlier (LG_outlier) lake-stream divergence vectors were not correlated (linear regression model, P = 0.55). The color scheme is the same as in (A). (E) The differences between phenotypic (ΔLP) and genetic outlier (ΔLG_outlier) vector length were not correlated (Mantel test, P = 0.26). (F) Histogram of pairwise angles between lake-stream phenotypic divergence vectors (θP) in degrees. (G) A multivariate analysis of phenotypic parallelism revealed one significant dimension of parallel evolution. (H) The angles of phenotypic divergence vectors (θP) and morphological distances (DM) between ancestral (i.e., lake) populations were positively correlated (Mantel test, P = 0.045; R2 = 0.17). (I) Histogram of pairwise angles between lake-stream genetic outlier divergence vectors (θG_outlier) in degrees. (J) A multivariate analysis of genetic parallelism revealed one significant dimension of parallel evolution. (K) The angles of genetic outlier divergence vectors (θG_outlier) and genetic (FST) distances between lake populations were positively correlated (Mantel test, P = 0.0039; R2 = 0.46).

Extent of parallel evolution among nine lake-stream cichlid population pairs.

(A) Number of unique and overlapping differentiation regions (overlap of top 5% values of FST, d, and π) in the nine lake-stream population pairs. (B) Number of unique and overlapping differentiation windows in the three sets of genomic outliers from BayPass in A. burtoni from the North of Lake Tanganyika, A. burtoni from the South, and H. stappersii. (C) The angles of phenotypic (θP) and genetic outlier (θG_outlier) lake-stream divergence vectors are positively correlated (Mantel test, P = 0.0075; Pearson correlation coefficient, R2 = 0.27). (D) The lengths of phenotypic (LP) and genetic outlier (LG_outlier) lake-stream divergence vectors were not correlated (linear regression model, P = 0.55). The color scheme is the same as in (A). (E) The differences between phenotypic (ΔLP) and genetic outlier (ΔLG_outlier) vector length were not correlated (Mantel test, P = 0.26). (F) Histogram of pairwise angles between lake-stream phenotypic divergence vectors (θP) in degrees. (G) A multivariate analysis of phenotypic parallelism revealed one significant dimension of parallel evolution. (H) The angles of phenotypic divergence vectors (θP) and morphological distances (DM) between ancestral (i.e., lake) populations were positively correlated (Mantel test, P = 0.045; R2 = 0.17). (I) Histogram of pairwise angles between lake-stream genetic outlier divergence vectors (θG_outlier) in degrees. (J) A multivariate analysis of genetic parallelism revealed one significant dimension of parallel evolution. (K) The angles of genetic outlier divergence vectors (θG_outlier) and genetic (FST) distances between lake populations were positively correlated (Mantel test, P = 0.0039; R2 = 0.46). In situations of a shared evolutionary history of the populations under investigation, such as in our setup, frequency-based outlier detection methods may not represent the most appropriate strategy to detect regions under selection. Thus, we also performed Bayesian analyses of selection at the single-nucleotide polymorphism (SNP) level that take into account population relatedness (). Because of their high levels of genome-wide differentiation, the population pairs of C. horei and P. philander were excluded from these analyses, as signatures of selection and drift cannot easily be disentangled in these cases. For the same reason, we treated the northern and southern populations of A. burtoni as separate units. We identified 1704 shared 10-kb windows of differentiation between A. burtoni from the South and H. stappersii, 1683 shared windows between A. burtoni from the North and H. stappersii, and 1542 shared windows between A. burtoni from the North and the South (Fig. 5B). In total, 373 windows were shared among the three core sets, containing 367 genes (Fig. 5B and table S8). Genes involved in sensory perception (sound and light) were overrepresented in the common set of outliers between H. stappersii and the southern A. burtoni populations.

The extent of parallel evolution in lake-stream divergence

We further aimed to quantify the extent of phenotypic and genomic parallel evolution among lake-stream pairs. To examine how (non-)parallel the nine lake-stream population pairs are, we performed vector analyses (, ) using the morphological (37 traits and landmarks representing body shape) and genomic (outlier and non-outlier SNPs) data at hand (outlier SNPs: BayPass outliers, based on 78 principal components (PCs) from a genomic PCA, potentially affected by natural selection; non-outlier SNPs: all SNPs but excluding BayPass outliers). Following Stuart et al. (), we quantified variation in lake-stream divergence by calculating vectors of phenotypic and genomic differentiation for each lake-stream population pair, whereby the length of a vector (L) represents the magnitude of lake-stream divergence and the angle between two vectors (θ) informs about the directionality of divergence. Accordingly, two lake-stream systems are more “parallel” if θ is small (similar direction of divergence) and ΔL (difference in length between two vectors) is close to zero (similar magnitude of divergence) (). Most comparisons between independent lake-stream population pairs fell into the category “nonparallel,” with close to orthogonal (~90°) angles of differentiation in both phenotype (θP) and genotype (θG_outlier and θG) (Fig. 5, F and I, and fig. S5D). However, parallelism was higher when only the within A. burtoni comparisons were considered, with values of θP and θG_outlier between 70° and 80° in many cases (Fig. 5, F and I), but not for θG (fig. S5D). More clear signatures of parallelism were found in closely related A. burtoni lake-stream systems, with Lunzua–Kalambo 1 being the most parallel system at the phenotype level (θP, 42°) and Chitili–Kalambo 1 at the genotype level (θG_outlier, 26°). Overall, the directions of phenotypic (θP) and genetic (θG_outlier) differentiations were significantly positively correlated (Mantel test, P = 0.0075; Pearson correlation coefficient, R2 = 0.27; Fig. 5C), whereas their magnitudes were not (Mantel test ΔLP and ΔLG_outlier, P = 0.26; linear regression model LP and LG_outlier, P = 0.55; Fig. 5, D and E). In contrast, none of the non-outlier genetic vectors were correlated to phenotypic vectors (θP versus θG and ΔLP versus ΔLG: Mantel tests, P = 0.34; P = 0.35; LP and LG: linear regression, P = 0.93; fig. S5, A to C). It has recently been proposed to examine the vectors of divergence using a multivariate approach as a complementary set of analyses to investigate (non-)parallelism and convergence (). For each dataset (phenotype, genotype outlier, and genotype non-outlier), we thus calculated the eigen decomposition of the respective C matrix (m lake-stream systems × n traits). For the phenotypic data, we found that the first eigenvector (or PC) encompassed about 33% of the total phenotypic variance, which was significantly higher than expected under the null Wishart distribution (Fig. 5G). That is, there was one dimension of shared evolutionary change that contained significant parallelism. We next examined whether the different lake-stream systems evolve in parallel or antiparallel directions by examining the loading of each PC, where a shared sign (positive or negative) is indicative of parallel evolution (). The H. stappersii and the P. philander lake-stream systems had positive loadings, while all remaining seven lake-stream systems (all A. burtoni and C. horei) had a negative loading on their first PC axes. This indicates that all A. burtoni and C. horei systems evolve in parallel but in an antiparallel direction compared to H. stappersii and P. philander. Therefore, we detected both parallel and antiparallel evolution in the divergence of the four cichlid species along a similar environmental contrast. To infer which phenotypic characteristics were evolving in parallel among the different lake-stream systems, we examined which landmarks were contributing the most to the first PC axis by examining the loading of PC1. We found that the seven landmarks with the most extreme loading values (<−0.4 or >0.4) were related to mouth position (landmark y1), eye size (landmarks x4 and y5), and slenderness of the body (body depth/standard length ratio and landmarks y7, y8, and y9) (fig. S5, J and M). When examining the genetic outlier data, we found that PC1 encompassed about 30% of the total genetic variance, which was significantly higher than expected under the null Wishart distribution (Fig. 5J). The signs of the PC1 loadings (positive or negative) were the same in each lake-stream system as in the phenotypic data, namely, positive for H. stappersii and P. philander and negative for the remaining seven lake-stream systems. This again highlights parallel and antiparallel components in lake-stream divergence. Among the eight PCs of the genetic outliers with the most extreme loading values (<−0.3 or >0.3), four PCs (PC5, PC7, PC10, and PC11) were separating lake and stream populations (fig. S5, K and N). This indicates that half of the PCs contributing to genetic parallelism are involved in lake-stream divergence. Last, when examining the genetic non-outlier data, we found that the first PC encompassed about 25% of the total genetic (non-outlier) variance, which was also significantly higher than expected under the null Wishart distribution (fig. S5E). However, the interpretation of parallelism in the context of genetic non-outliers is less straightforward, as the seven genetic non-outlier PCs with the most extreme loading values (<−0.3 or >0.3) were not related to lake-stream divergence but rather to species separation or to divergence between the H. stappersii populations (fig. S5, L and O).

Parallelism is higher when ancestral populations are more similar

We then examined whether the degree of similarity in the ancestral lake populations correlates with the extent of morphological and genetic parallelisms. We found a significant positive correlation in both datasets, with a stronger effect in the genetic compared to the morphological data (Mantel tests: θG_outlier versus FST, P = 0.0039; θP versus DM, P = 0.045; R2 = 0.46 and 0.17, respectively) (Fig. 5, H and K). This indicates that phenotypic and genetic parallelisms are higher when ancestral populations are phenotypically and genetically more similar. The genetic non-outliers (that is, the neutral markers) did not reveal such a correlation (Mantel test: θG versus FST, P = 0.48; fig. S5F). We also found that the proportion of standing genetic variation between ancestral populations was negatively correlated with θP and θG_outlier (Mantel tests, P = 0.0061 and 0.0117; R2 = 0.32 and 0.43, respectively) (fig. S5, G and H), but not with θG (Mantel test, P = 0.19; fig. S5I), indicating that lake-stream population pairs sharing larger amounts of standing genetic variation display more parallelism at the level of both phenotype and genotype.

Body shape convergence in species inhabiting the same rivers

We lastly assessed the levels of convergence (and divergence) in phenotype and genotype across all lake-stream systems. To do so, we first examined how genomic differentiation scales with morphological differentiation across species. We thus contrasted the levels of absolute genomic (d) and morphological (DM) differentiation between all A. burtoni populations and those of the three other species, resulting in 136 pairwise comparisons. We used d rather than FST here, as d is an absolute measure of genomic differentiation that is better suited for between-species comparisons. As above, we classified these comparisons according to the environmental contrasts (lake-lake, lake-stream, and stream-stream). The extent of body shape differentiation was only partially in agreement with the respective levels of genome-wide divergence (Fig. 6A). For example, the population pair from the Kalambo River involving the upstream population (A. burtoni Kalambo 2, DM = 5.6) was morphologically more distinct than the between-species comparisons in the Rusizi River (A. burtoni versus H. stappersii, DM = 4.1). In agreement with the levels of genomic differentiation (d = 3.6 × 10−3), morphological differentiation was high in the P. philander population pair (DM = 6.0).
Fig. 6.

Body shape convergence among sympatric species pairs.

(A) Absolute divergence (d) plotted against morphological distance (DM) for 136 pairwise comparisons across all populations from the four species used in this study. A wide range of morphological distances can be observed for the same amount of genomic divergence. The sympatric populations (A. burtoni and H. stappersii from the Rusizi system; A. burtoni and C. horei from the Lufubu system) are highlighted. The habitat type for each comparison is indicated with different symbols (circle, square, and triangle). The color scheme is the same as in Fig. 5A. (B) Smaller morphological distances between sympatric riverine populations compared to sympatric lake populations revealed body shape convergence in the riverine populations (A. burtoni and H. stappersii from the Rusizi system; A. burtoni and C. horei from the Lufubu system).

Body shape convergence among sympatric species pairs.

(A) Absolute divergence (d) plotted against morphological distance (DM) for 136 pairwise comparisons across all populations from the four species used in this study. A wide range of morphological distances can be observed for the same amount of genomic divergence. The sympatric populations (A. burtoni and H. stappersii from the Rusizi system; A. burtoni and C. horei from the Lufubu system) are highlighted. The habitat type for each comparison is indicated with different symbols (circle, square, and triangle). The color scheme is the same as in Fig. 5A. (B) Smaller morphological distances between sympatric riverine populations compared to sympatric lake populations revealed body shape convergence in the riverine populations (A. burtoni and H. stappersii from the Rusizi system; A. burtoni and C. horei from the Lufubu system). We then calculated the among-lineage covariance matrices of mean trait values Driver and Dlake for each dataset (phenotype, genotype outlier, and genotype non-outlier) to investigate the levels of convergence and divergence in lake-stream differentiation. Following the definition of De Lisle and Bolnick (), less variance in Driver compared to Dlake is indicative of convergent evolution, while more variance in Driver compared to Dlake is indicative of divergent evolution. We found divergence at the genomic level in both “outlier” and “non-outlier” datasets since trace subtraction [tr(Driver) − tr(Dlake)] was positive in both cases. In contrast, we found convergence at the phenotypic level since the result of trace subtraction was negative. In this latter case, two traits encompassed more than 99% of the total variance, namely, centroid size (66%) and total length (33%). This might be explained by the fact that the different traits and landmarks have different units accounting differently for the total amount of variance (e.g., variation in total length is measured in millimeters that ranges from 42 to 73, while variation in landmarks is measured in Procrustes units that ranges from −0.3 to 0.4). This inherent issue to the method has been identified by De Lisle and Bolnick () and references therein. A more intuitive and biologically more meaningful way to assess convergence versus divergence is to compare Euclidean distances for a specific trait. Thus, for each pair of lake-stream systems, we compared the DM between the respective lake and the stream populations, where convergence is suggested when DM river − DM lake < 0, with a more negative value indicative of higher convergence (). Within A. burtoni, the majority of pairwise comparisons indicated divergent evolution in body shape except for Chitili/Kalambo 1 and Chitili/Kalambo 2, with the highest level of convergence observed for the latter comparison (table S9). The between-species comparisons revealed mostly divergent evolution, except for comparisons between the southern populations of A. burtoni and C. horei, for which a signature of convergence was detected. The highest level of convergence was found between A. burtoni and C. horei sampled in sympatry in the Lufubu River. The lake populations were highly differentiated (DM lake = 11.6), representing one of the most divergent comparisons in terms of morphology across all comparisons (Fig. 6A). In contrast, A. burtoni and C. horei from Lufubu River were morphologically rather similar (DM river = 7.2) given their high level of genome-wide divergence (d = 8.5 × 10−3) (Fig. 6A) and similar levels of within-species lake-stream divergence (A. burtoni, DM = 4.8; C. horei, DM = 4.9) (Fig. 6B). Body shape convergence was also observed in the A. burtoni and H. stappersii populations from the Rusizi system. While the Rusizi lake populations were morphologically differentiated (DM lake = 5.4), the respective populations from Rusizi River were much more similar (DM river = 4.1) despite almost identical levels of genome-wide differentiation (d = 6.5 × 10−3 for both comparisons) (Fig. 6B and Table 2). Moreover, the level of between-species morphological differentiation in the same river (DM = 4.1) was smaller than the within A. burtoni differentiation in contrasting environments (DM = 4.6) (Fig. 6B and table S3). Convergent evolution becomes even more evident in the clustering analysis, in which A. burtoni specimens from the Rusizi River clustered with H. stappersii, rather than with other A. burtoni populations (Fig. 3B). Thus, we observed body shape convergence in species that co-occur in the same river systems and, hence, diversified along the very same environmental contrast. We note that gene flow between species can be ruled out as the reason for the similarities between co-occurring species (figs. S6 and S7).

DISCUSSION

Morphological and genomic diversification in A. burtoni

The first aim of this study was to investigate the dynamics of morphological and genomic diversification in the cichlid species A. burtoni. Making use of body shape information of 289 specimens and whole-genome data of 132 individuals from six lake-stream population pairs displaying various levels of genome-wide divergence, we investigated how morphological and genomic differences accumulated along a lake-stream environmental contrast and across geographic distances. Consistent with previous findings based on restriction-site associated DNA sequencing (RAD-seq) data (), we found that stream populations feature smaller effective population sizes than the respective lake populations (fig. S3), possibly reflecting founder effects and/or physical limitations in the riverine environment. When considering only the populations from the South of Lake Tanganyika, for which we had a denser geographic sampling (Fig. 1), it turned out that, at a given geographic distance, genomic differentiation is greater in lake-stream than in lake-lake population comparisons (Fig. 2C). This suggests that in A. burtoni, genomic differentiation builds up more rapidly between populations in the presence of divergent selection and drift (i.e., in a lake-stream setting) compared to a situation dominated by only drift (i.e., in a lake-lake setting), at least over short geographic distances. Thus, it appears that the combined effect of divergent selection and genetic drift accelerate genome-wide differentiation in A. burtoni, just as has been reported for parapatric lake-stream population pairs of threespine sticklebacks (). The pairwise population comparisons revealed that, in both settings (lake-lake and lake-stream), morphological differentiation increased rapidly over short geographic distances and at low levels of genetic distances (Fig. 2D). The substantial differences in body shape observed between closely related and geographically adjacent populations are likely to reflect adaptive variation on the basis of genetic and environmental factors (, ). Through the examination of the southern populations of A. burtoni, we could therefore establish that divergent natural selection together with genetic drift facilitates differentiation, but whether these forces are sufficient to complete speciation remains open. To address this question, we investigated the patterns of morphological and genomic differentiation across all available A. burtoni population pairs, which span a continuum of divergence ranging from virtually panmictic to population pairs that resemble separate species (Fig. 2). We found opposite results for morphological and genomic differentiation: While morphological differentiation was continuous (Fig. 2A), the population pairwise FST values fell into two main categories, the “one-species” category (FST < 0.3) and the “two-species” category (FST > 0.6), with a single population pair in between (Fig. 2B). The observed gap in genomic differentiation can only partially be explained by geographic distance. While pairwise FST values were smaller than 0.2 for geographic distances between the respective populations ranging from 1 to 40 km, FST values for geographic distances from 70 to 140 km were assigned into both categories. Overall, our results in cichlids are, thus, similar to what has been observed in Timema stick insects, in which comparable levels of genomic differentiation were found along an ecologically driven speciation continuum (FST < 0.3 in within-species population comparisons versus FST > 0.7 in between-species comparisons) (). This suggests that there might be a certain threshold of genomic differentiation delimiting species. We note, however, that additional population pairs from additional taxa with different levels of genomic divergence should be investigated to confirm this pattern.

Reproductive isolation increases with genomic divergence

The second aim of our study was to assess to what extent genomic divergence scales with reproductive isolation. We therefore reviewed previous data that used the very same A. burtoni populations as in the present study and performed mate-choice experiments between the genetically most distinct A. burtoni populations from the North and the South of Lake Tanganyika. Assortative mating has not been detected between populations showing low levels of genomic differentiation (, ). However, there is evidence for reproductive isolation via extrinsic inviability between parapatric and allopatric lake and stream populations with intermediate (FST = 0.47) and high (FST = 0.62) levels of genomic differentiation (Fig. 2B) (). Furthermore, our new mate-choice experiments revealed that genetically highly distinct allopatric A. burtoni populations (FST = 0.69) show signatures of assortative mating in a setting including all cues [i.e., visual, olfactory and, possibly, acoustic; note that it has previously been shown that olfactory cues are more important than visual cues in A. burtoni mate choice ()]. This suggests that premating reproductive isolation mechanisms are at play between the genetically most distinct A. burtoni clades, the northern and southern lineages (), at a level of genomic differentiation that is similar to the one typically observed between other cichlid species (Fig. 2B). Together, the results from previous studies and our new experiment indicate that the extent of reproductive isolation between A. burtoni populations scales positively with their levels of genome-wide divergence, as has been shown for other taxa as well (, ).

Rapid diversification in cichlid fishes

Compared to other organisms, reproductive isolation is in place between A. burtoni populations featuring low levels of net nucleotide divergence (Da), which serves as a proxy of time since population divergence. For example, on the basis of the comparative analysis of 63 population or species pairs along a continuum of genomic divergence, Roux and coworkers () suggested the gray zone of speciation to correspond to Da values ranging from 0.5 to 2%. In the case of A. burtoni, however, extrinsic inviability was detected for population pairs at Da values ranging from 0.1 to 0.2% (), and evidence for premating isolation was found at Da = 0.4% (this study). While our results are not exactly comparable to () because of different ways of measuring reproductive isolation [gene flow modeling in () versus experiments in our study] and because Da is a relative measure of genome differentiation depending on within-species genetic diversity, it nevertheless appears that reproductive barriers build up more rapidly in cichlids compared to other animals. In line with this, in addition, the d values as absolute measure of divergence were lower between cichlid species compared to other organisms. For example, the levels of absolute divergence between A. burtoni populations from the North and the South of Lake Tanganyika (d = 3.0 × 10−3 to 3.2 × 10−3) or between A. burtoni from the North and H. stappersii (d = 6.4 × 10−3 to 6.5 × 10−3) are smaller or similar than what has been reported between allopatric populations in one species of stickleback fish (Pungitius sinensis, d = 6.2 × 10−3 to 9.8 × 10−3) (). One the other hand, the between-species d values reported in this study are similar to those measured for 73 species representative of the adaptive radiation of cichlid fishes in Lake Malawi (d = 1.0 × 10−3 to 2.4 × 10−3) () and to those between two Pundamilia cichlid species pairs from Lake Victoria (“Python,” d = 2.05 × 10−3; “Makobe,” d = 2.17 × 10−3) (). Together, this once again underpins the explosive nature of speciation in African cichlid fishes ().

Is there a role for allopatry in the completion of speciation?

Considering the allopatric distribution of the three most divergent lineages of A. burtoni in Lake Tanganyika (southern, northern, and Lufubu River) (fig. S1), it appears possible that geographic separation could have facilitated their diversification. A phylogeographic analysis based on genome-wide markers has even suggested that the northern lineage of A. burtoni originated from a colonization event of the lake from the Lukuga River (a temporary outlet in the central western part of Lake Tanganyika), whereas the ancestors of the southern lineage may have colonized the lake through the Lufubu basin (). Thus, it is possible that the initial divergence between the northern and the southern lineage of A. burtoni occurred outside Lake Tanganyika. We would like to note, however, that the evolutionary history of A. burtoni is complex with strong indications for past hybridization events [for example, there are shared mitochondrial haplotypes between populations from the North and the South ()], making it difficult to determine the relative contribution of allopatry in the diversification of populations. Nevertheless, in agreement with a role of allopatry, our demographic modeling for the divergent Lufubu lake-stream population pair in the “gray zone of speciation” suggested that, while there is ongoing gene flow, there likely was at least one period of allopatry in the past. Such a demographic scenario of secondary contact after isolation would be compatible with a complete separation between lake and stream populations during periods of low lake levels in Lake Tanganyika, during which the lake’s shoreline was substantially shifted (). While this would be in line with the view that allopatry contributes to the completion of speciation (), we cannot, with the data at hand, conclusively answer the question about a possible role of allopatry in the divergence of A. burtoni.

Little overlap between regions of differentiation among lake-stream systems

The third aim of this study was to examine and quantify the levels of parallelism among the different lake-stream populations. The inspection of genomic regions of differentiation, defined here as overlap between the top 5% in FST, d, and π, revealed that, with between 2 and 101 such regions, the number of outlier regions reported here is relatively small compared to other studies in cichlids (, ). This can, however, be explained by our more stringent definition of such regions of differentiation (the intersection between three metrics). Overall, we found little overlap among the differentiation regions from different lake-stream systems. Only up to 19 outlier regions were shared between two lake-stream systems, and not a single such region was shared by more than two systems. However, the Bayesian analyses of selection revealed that H. stappersii and the southern A. burtoni populations shared a common set of overrepresented outlier loci involved in sensory perception (sound and light). This suggests that, although a large majority of the genomic outlier regions are not shared among systems, some functions appear to be particularly important for riverine adaptation and may, hence, repeatedly be the target of natural selection. Consistent with this, a previous study identified common highly differentiated genomic regions between a young and an old cichlid species pair diverging along a depth gradient in Lake Victoria, with two-thirds of the differentiation regions being private to each species pair (). These results highlight that adaptive divergence often encompasses both parallel and system-specific (nonparallel) components. The overall low levels of gene and function sharing among the different lake-stream systems reported here may partly be explained by cryptic environmental heterogeneity, as the rivers from which the populations were sampled differ in size and other environmental features, possibly requiring specific mechanisms of adaptation [e.g., ()]. This may particularly be the case for the P. philander population pair, which was not sampled in Lake Tanganyika but in the small Lake Chila. A nonmutually exclusive explanation for the lack of sharing of regions of differentiation is that these regions may not be due to divergent selection but due to drift and background selection (), as the majority of population pairs did not diversify under a demographic scenario of primary divergence with gene flow (fig. S3). This effect could have further been enhanced by the small effective population sizes of the riverine populations. On the other hand, background selection is realistic mainly for population pairs with elevated levels of genome-wide differentiation (), which is not the case for some of the population pairs examined here. It is also possible that adaptive phenotypic plasticity (, ) and epigenetic factors (, ) play complementary roles during adaptive divergence. However, their investigation was beyond the scope of the present study.

Levels of parallelism in lake-stream divergence

To quantify the level of parallelism among the nine lake-stream systems, we performed phenotypic change vector analyses (, ). Similar to what has previously been reported in lake-stream population pairs of threespine sticklebacks (), we found that diversification was not particularly parallel among the different lake-stream systems, as the majority of θP and θG_outlier values were close to orthogonal (Fig. 5, F and I). θG_outlier values correlated with θP values, but θG values did not. This strongly suggests that adaptive phenotypic divergence has a genetic basis in the species under investigation. Furthermore, we conducted a recently proposed multivariate analysis of the C matrix, which is essentially a PCA of (phenotypic or genomic) vectors of differentiation (). We found that for both phenotypic and genomic outlier datasets, there was one significant axis of parallel evolutionary change, in which seven of nine lake-stream systems evolved in parallel. These seven lake-stream systems were the same in the phenotypic and in the genomic outlier datasets. That is, all A. burtoni and C. horei lake-stream systems have a common major axis of phenotypic and genomic parallel evolution, while H. stappersii and P. philander evolved in an antiparallel manner with regard to the seven other lake-stream systems (Fig. 5, G and J). The traits underlying parallel phenotypic evolution were related to mouth position, eye size, and slenderness of the fish body, all of which have previously been suggested to contribute to adaptive divergence in A. burtoni (). We show here that these phenotypic traits evolved in parallel not only in all A. burtoni lake-stream systems but also in C. horei. Furthermore, eight PCs contributing most to parallel genotypic evolution were related to lake-stream divergence (four PCs), species differences (three PCs), and geographic divergence within A. burtoni (one PC). Therefore, the genomic outliers mostly encompass candidate genomic loci for lake-stream adaptive divergence. Unexpectedly, the analysis of the genomic non-outliers also revealed a significant axis of parallel evolution. However, these genomic traits were not related to lake-stream divergence but rather summarized between-species (one PC), geographic (three PCs), or within-population (three PCs) divergence. This major axis of divergence may, hence, be the signature of drift under non-isotropic genetic variance ().

The distance between ancestral populations determines the level of parallelism

It has previously been suggested that the probability of parallelism at the molecular level decreases with time since divergence (). Furthermore, it has been suggested that the extent of parallelism should be higher when ancestral populations were closely related (). We thus examined whether there was a correlation between the distances between ancestral (i.e., lake) populations and the level of parallelism in lake-stream divergence. We found that for both phenotypic and genetic outliers, there was a significant positive correlation between these metrics (θP and DM; θG_outlier and FST; Fig. 5, H and K), corroborating the view that the morphological and genetic distances between ancestral populations influence the level of parallelism. Standing genetic variation is an essential component in replicated adaptive evolution (). Thus, we also tested whether the amount of standing genetic variation between ancestral populations was correlated to the levels of parallelism in the respective lake-stream systems. In line with our predictions, we found that the levels of standing genetic variation and parallelism were negatively correlated (fig. S5, G and H). That is, lake-stream population pairs sharing larger amounts of standing genetic variation display more parallelism at the level of both the phenotype and the genotype. Parallelism is, thus, likely constrained by the amount of standing genetic variation upon which natural selection can act, as the effect of de novo beneficial mutations on parallel evolution is much less likely to play an important role in the rapid adaptive divergence of recently evolved population pairs. Previous theoretical work has further shown that even small differences in the directionality of selection can greatly reduce genetic parallelism, especially in the case of complex traits (). This suggests that, besides time since divergence, cryptic habitat heterogeneity also (leading to small differences in the directionality of selection) can decrease the likelihood of parallelism. In support of this, a comparison of regional (within Vancouver Island) versus global (North America versus Europe) lake-stream population pairs of sticklebacks showed that parallelism was lower at increased spatial scales, with global replicates showing less parallelism ().

Body shape convergence between sympatric species pairs

Last, we investigated levels of convergence or divergence among the nine lake-stream population pairs. While we found divergence at the genomic levels (for both outlier and non-outlier datasets), we found convergence at the morphological level (Fig. 6B). We found body shape convergence in both species pairs coinhabiting the same river systems, namely, A. burtoni and H. stappersii in the Rusizi River and A. burtoni and C. horei in the Lufubu River (Fig. 6B) (also note that the sympatric population pairs show rather similar FST values across species; Fig. 3). This highlights that local selection regimes constrain body shape evolution in cichlids. This has previously been shown in Midas cichlids (Amphilophus sp.), where the body shapes of two syntopic species were more similar to one other than the body shape average of the first species from different localities (). That we did not find convergence at the genomic level may be due to several nonmutually exclusive factors. For example, the outlier detection method based on lake-stream differences might not be the most appropriate to uncover the genomic basis of morphological evolution. A quantitative traits loci approach might be more suitable here, as has previously been applied to uncover genomic regions underlying body shape differences along a benthic-limnetic axis of Midas cichlids (, ). Further, adaptive phenotypic plasticity may also play a role in body shape evolution, as has recently been suggested for A. burtoni (, , ). Future studies should thus focus on quantifying the respective influence of genomic divergence versus phenotypic plasticity in adaptive divergence.

CONCLUSION

By examining the dynamics of differentiation in the African cichlid A. burtoni, we found that morphological differentiation was continuous along the speciation continuum. Contrastingly, we detected a gap in genomic differentiation that was only partially explained by geographic patterns. Our results, therefore, provide additional support for the hypothesis that there is a tipping point in genomic differentiation during the speciation process (), suggesting that there might be a threshold of genomic differentiation to delimit species. We further showed that genomic differentiation was accelerated in the presence of both divergent selection and genetic drift, highlighting the combined effect of selective and neutral processes in speciation. To gain insights into the potential predictability of the speciation process, we investigated the extent of parallel evolution in nine population pairs from four cichlid species that diverged along a similar lake-stream environmental contrast. While pairwise comparisons failed to identify strong signatures of phenotypic and genomic parallelism, multivariate analyses uncovered major axes of shared evolutionary changes along the lake-stream contrast. Last, we found that levels of parallelism were higher between closely related and, hence, genetically more similar population pairs. While the speciation process is overall difficult to predict, our results support the view that evolution can be predictable to a certain extent if appropriate models and data are used (). To conclude, our study corroborates that contingency plays an important role in speciation and that speciation is a complex product of differentiation trajectories through multivariate space and time.

MATERIALS AND METHODS

Study design

Research objectives

In this study, we investigated the dynamics of morphological and genomic diversification in A. burtoni populations that have diverged along a lake-stream environmental contrast and across geography. We further aimed to investigate the extent and predictability of (non-)parallelism and convergence in nine lake-stream population pairs from four East African cichlid species.

Research subjects or units of investigation

There were four cichlid species: A. burtoni, C. horei, H. stappersii, and P. philander.

Experimental design

Individuals of A. burtoni (N = 132), C. horei (N = 24), H. stappersii (N = 24), and P. philander (N = 24) were collected in Zambia and Burundi between January and November 2015 (tables S1 and S2). All fishes were sampled with a ~1:1 sex ratio and were adult specimens except for three P. philander juveniles from the Mbulu River. Fishes were sampled in six different tributaries to Lake Tanganyika, whereby each system comprises a riverine population (N = 10 to 12) and a lake population (N = 12 to 14) and was named after the river, except for the Lake Chila system that was sampled outside the Lake Tanganyika basin. H. stappersii were sampled at the Rusizi River, in the North of Lake Tanganyika, along with sympatric A. burtoni (Fig. 1 and table S1). All other populations were sampled in the South of Lake Tanganyika. A. burtoni and C. horei were sampled at the Lufubu River; A. burtoni was further sampled in the Lunzua, Chitili, and Kalambo rivers (Fig. 1 and table S1). As two river populations were sampled in the Kalambo River, two lake-stream population comparisons were used for this river, namely, Kalambo 1 (comparison Kalambo lake versus Kalambo 1) and Kalambo 2 (comparison Kalambo lake versus Kalambo 2). Last, P. philander were sampled in small Lake Chila and in Mbulu creek (Fig. 1 and table S1). All fishes were caught with fishing rods or minnow traps and anesthetized using clove oil. Photographs of the left lateral side were taken using a Nikon D5000 digital camera, under standardized lighting conditions, and with a ruler as a scale. To aid in digital landmark placement, three metal clips were used to spread the fins at the anterior insertions of the dorsal and anal fin and at the insertion of the pectoral fin (fig. S2A). To increase the sample size for morphological analyses, additional individuals were sampled and photographed at the same locations and time points as the individuals whose genomes were sequenced (table S2). Standard length, total length, and weight were measured. A piece of fin clip was preserved in 99% ethanol for DNA extraction. Whole specimens were preserved in 70% ethanol.

Sample size

For the genomic analyses, we intended to sample 12 individuals per population (6 males and 6 females), as 24 alleles per population are sufficient to obtain accurate allele frequency estimates. We sampled only 10 A. burtoni specimens from the Rusizi River because we did not succeed in catching additional specimens after numerous attempts. To compensate, 14 A. burtoni specimens from Rusizi lake were sampled to obtain a total of 24 A. burtoni specimens from the Rusizi lake-stream system. For the morphometric analyses, we intended to photograph at least 17 specimens per population, which is equal to the number of landmarks used. Fewer specimens were photographed in three populations (A. burtoni Chitili lake, N = 13; A. burtoni Rusizi River, N = 10; P. philander Mbulu Creek, N = 9) because we did not succeed in catching additional specimens after numerous attempts and because three of the individuals caught in Mbulu Creek were juveniles and, thus, excluded from the morphometric analyses.

Data inclusion/exclusion criteria and outliers

For the genomic analyses, all individuals were included in the analyses except one hybrid specimen between A. burtoni and another species that was detected only after whole-genome sequencing. For the morphometric analyses, juveniles were excluded on the basis that their morphology is different from the adults’ morphology.

Replicates

The first mate-choice experiment (including only visual cues) was replicated 44 times; each replicate included one trio of fish (one female and two males from different populations). The second mate-choice experiment (including direct contact) was replicated eight times; each replicate included eight fish (two males and six females). Additional replicates could not be performed because of the limited number of fish available in the laboratory.

DNA extraction, sequencing, and data processing

DNA was extracted from fin clips using the E.Z.N.A. Tissue DNA Kit (Omega Bio-Tek) following the manufacturer’s instructions. Individual genomic libraries were prepared using TruSeq DNA PCR-Free Low Sample Kit (Illumina) and subsequently sequenced [150–base pair (bp) paired-end] on an Illumina HiSeq 3000 sequencer at the Genomics Facility Basel operated jointly by the ETH-Zürich Department of Biosystems Science and Engineering and the University of Basel. For each library, the quality of raw reads was visually inspected using FastQC (v0.11.3), and Illumina adapters were trimmed using Trimmomatic (v0.36). Filtered reads of each individual were aligned separately against the Metriaclima zebra reference genome (assembly M_zebra_UMD1). We chose this reference genome rather than the A. burtoni reference genome () to avoid any reference bias when comparing A. burtoni with the other species. We also chose M. zebra rather than Oreochromis niloticus as reference genome to maximize the number of reads mapped, as M. zebra is phylogenetically closer to A. burtoni, C. horei, H. stappersii, and P. philander than O. niloticus. The M. zebra reference genome was indexed using Burrows-Wheeler Aligner (BWA; v.0.7.12), and alignments were performed using BWA-mem with default parameters. Obtained alignments in SAM format were coordinate-sorted and indexed using SAMtools (v.1.3.1). The average coverage per individual ranged from 9.8× to 24.5× (table S1). We performed an indel realignment using RealignerTargetCreator and IndelRealigner of the Genome Analysis Tool Kit (GATK; v3.4.0). Variants were called using the GATK functions HaplotypeCaller (per individual and per scaffold), GenotypeGVCFs (per scaffold), and CatVariants (to merge all VCF files). The VCF file corresponding to the mitochondrial genome (scaffold CM003499.1) was then isolated from the VCF file corresponding to the nuclear genome (that is, all other scaffolds). The latter was annotated with the features ExcessHet (that is, the Phred-scaled P value for an exact test of excess heterozygosity) and FisherStrand (that is, the strand bias estimated using Fisher’s exact test) using the GATK function VariantAnnotator. To filter the VCF file, empirical distributions of depth (DP) and quality (QUAL) were examined. The VCF file was filtered using the GATK function VariantFiltration with the following values (variants meeting the criteria were excluded): DP < 2000, DP > 4000, QUAL<600, FisherStrand > 20, and ExcessHet > 20. In addition, variants were called using SAMtools mpileup (per scaffold) with the following options: -C50 -pm2 -F0.2 -q 10 -Q 15. Files per scaffold were then converted to VCF format, concatenated (except for the mitochondrial genome), and indexed using BCFtools (v.1.3.1). The VCF file was annotated for ExcessHet and FisherStrand, and the distribution of depth and quality were visually assessed as described above. The VCF file was filtered using the GATK function VariantFiltration with the following values: DP < 1500, DP > 4000, QUAL < 210, FisherStrand > 20, and ExcessHet > 20. Filtered GATK and SAMtools datasets were then combined using bcftools isec. The final VCF file contained variants present in both datasets. Genotypes were then imputed and phased per scaffold using beagle (v.4.0). In total, the final VCF file contained 26,704,097 variants. Chomonomer (v.1.05; http://catchenlab.life.illinois.edu/chromonomer/) was used to place the 3555 M. zebra scaffolds in 22 linkage groups using two linkage maps (). For BayPass selection analyses and allele frequency calculations (see below), we excluded indels and nonbiallelic sites, resulting in a VCF file containing 20,343,366 SNPs.

Genetic structure and phylogenetic relationships

Population genetic structure was examined using PCAs implemented in the smartPCA module of Eigensoft (v.6.1.1). To reconstruct a whole-genome nuclear phylogeny, a sequence corresponding to the first haplotype of each scaffold was extracted using bcftools consensus --haplotype 1 of BCFtools v1.5 (https://github.com/samtools/bcftools). Individual whole-genome sequences were then concatenated and a maximum likelihood (ML) analysis was performed in RAxML (v.8.2.11) using the GTRGAMMA sequence evolution model and 20 fast bootstrap replicates. The option –f a was used to report the best-scoring ML tree with branch lengths. KBC4, a putative A. burtoni individual sampled in the Rusizi River, did not cluster with other A. burtoni individuals in the phylogeny (labeled “hybrid” in fig. S1A). This specimen results most likely from a hybridization event with Astatoreochromis alluaudi, as its mitochondrial genome is identical to A. alluaudi (fig. S8). Therefore, this individual was excluded from further analyses. To test for introgression or retention of ancestral polymorphism between sympatric species (that is, A. burtoni and H. stappersii in the Rusizi system and A. burtoni and C. horei in the Lufubu system), a topology weighting analysis reconstructing fixed-length 5-kb-window phylogenies was performed using Twisst [topology weighting by iterative sampling of subtrees ()].

Demographic modeling

For each of the nine lake-stream population pairs, demographic simulations based on the joint site frequency spectrum (SFS) were performed to estimate the most likely model of population divergence and the best values of demographic parameters (effective population sizes, divergence times, and migration rates). Simulated SFS were obtained using diffusion approximation implemented in ∂a∂I (v.1.7.0) (). A modified version of the program including additional predefined models and the calculation of the Akaike information criterion (AIC) for model selection was used for the simulations (). Eight demographic models of population divergence were tested (fig. S3A): Bottle-Growth (BG), Strict Isolation (SI), Isolation with Migration (IM), Ancient Migration (AM), Secondary Contact (SC), and versions of these models including two categories of migration rates (IM2M, AM2M, and SC2M). These two categories of migration rates can separate, for example, selected versus neutral loci. For the population pairs with a genome-wide FST > 0.47 (A. burtoni Lufubu, C. horei Lufubu, and P. philander), the model BG was not tested, as it was obvious that the populations are separated. Each model was fitted to the observed joint SFS using three successive optimization steps: “hot” simulated annealing, “cold” simulated annealing, and Broyden-Fletcher-Goldfarb-Shanno method (BFGS) (). For each lake-stream population pair, 20 replicates comparing seven or eight models were run using different parameter starting values to ensure convergence. After these 20 runs, the model displaying the lowest AIC and the least variance among the replicates was chosen as the best model. For parameter estimation, additional runs were performed so that the total number of runs was 20 for the best model. To calculate the divergence times in years, a generation time of 1 year was used. As the scaled mutation rate parameter θ was estimated, we used the relation θ = 4 × Ne × μ × L to infer the ancestral effective population size (Ne). The mutation rate (μ) (3.51 × 10−9 mutation per generation per year) of Lake Malawi cichlids () and the length of the genome assembly (L) of M. zebra (UMD1: 859,842,111 bp) were applied.

Regions of differentiation

For each lake-stream system, genome-wide FST (Hudson’s estimator of FST), |πlake − πstream| (absolute value of the difference in nucleotide diversity between the lake and the river populations), and d (absolute divergence) were calculated in 10-kb nonoverlapping sliding windows using evo (https://github.com/millanek/evo). We defined as window of differentiation each window that was contained in the overlap of the top 5% values of these three metrics. Adjacent differentiation windows and windows separated by 10 kb were then merged in differentiation regions. To test whether the differentiation regions of each system are affected by chromosome center–biased differentiation (), each region was placed either in the “center” or in the “periphery” categories. These categories were defined by splitting each chromosome into four parts of equal length, where the center category encompassed the two central parts of the chromosome and the periphery category encompassed the two external parts.

Bayesian selection and association analyses

To detect signatures of selection at the SNP level, we used the Bayesian method BayPass (). The core model performs a genome scan for differentiation by estimating a population covariance matrix of allele frequencies. It allows determining outlier SNPs based on the top 1% of simulated XtX values, where XtX is a differentiation measure corresponding to an SNP-specific FST corrected for the scaled covariance of population allele frequencies (). As the northern and southern populations of A. burtoni are highly divergent [see () and fig. S1], only the southern A. burtoni populations were analyzed jointly. We thus compared outlier sets for the southern populations of A. burtoni, the northern populations of A. burtoni, and H. stappersii. For the southern A. burtoni populations, an additional association analysis using one categorical covariate (lake versus stream) was performed using the auxiliary variable covariate model. Five replicate runs were performed using different starting seed values and default search parameters, except for the number of pilot runs (). The final correlated SNP set contained the overlap of SNPs for which the Bayes factor was higher than 10 and which were in the top 5% of δ values (the posterior probability of association of the SNP with the covariable) in the five replicate runs.

GO annotation and enrichment analyses

To infer whether candidate genes in differentiation regions were enriched for a particular function, all genes included in differentiation regions of each lake-stream system were extracted. In addition, genes including overlapping SNPs between the southern A. burtoni, the northern A. burtoni, and H. stappersii core outliers were reported, as well as genes including overlapping SNPs between A. burtoni core outliers and SNPs significantly correlated with lake versus stream environment and morphology. All candidate genes were blasted (blastx) against the NR database (version 12.10.2017) using BLAST+ v.2.6.0, and the first 50 hits were reported. To obtain a reference gene set, all M. zebra genes were blasted against NR in the same way. GO and InterProScan annotations were retrieved from Blast2GO PRO (v.4.1.9). Enrichment analyses were performed using Fisher’s exact test for each differentiated gene set (one set per system) against the reference gene set (significance level, 0.001). For the genes located in the overlap of differentiation regions among systems, an additional step was performed by manually retrieving the annotations from Homo sapiens dataset in UniProt (accessed online 20.11.2017).

Diversification dynamics and genomic divergence in similar versus contrasted environments

To infer the dynamics of genomic differentiation along the lake-stream axis, genome-wide pairwise FST and the net nucleotide difference Da [as proxy of time since differentiation; d − (π1 + π2/2)] were calculated for all possible population pair combinations, resulting in 136 within and between systems comparisons. A logarithmic regression was fitted to the data using the lm function [lm(FST) ~ ln(Da)] implemented in R (v.3.4.2). To estimate the influence of divergent selection at early stages of genomic differentiation in sympatry/parapatry, population pairwise FST of the southern populations of A. burtoni were used (“lake-lake,” six comparisons; “lake-stream,” five comparisons), as well as the pairwise DM (see below). The northern A. burtoni populations were not used because of the high levels of genomic divergence compared to the southern populations. For each comparison, the geographic coastline distance between populations was measured using Google Earth (www.google.com/intl/en/earth/). Then, a linear model was fitted for each category (lake-lake and lake-stream) and the adjusted coefficient of determination R2 was reported in R.

Morphometric analyses

Geometric morphometrics was used to compare adult body shape between populations. Three juvenile individuals of P. philander from Mbulu creek whose genomes had been sequenced were excluded from the morphological analyses. In total, the photographs of 468 individuals (table S2) were used for geometric morphometric analyses (289 A. burtoni, 81 H. stappersii, 67 C. horei, and 31 P. philander). Using tpsDig2 (v.2.26), we placed 17 homologous landmarks (fig. S2A) on the lateral image of each fish. The tps file with x and y coordinates was used as an input for the program MorphoJ (v.1.06d) and superimposed with a Procrustes generalized least squares fit algorithm to remove all nonshape variation (i.e., size, position, and orientation). In addition, the data were corrected for allometric size effects using the residuals of the regression of shape on centroid size for further analyses. CVA and PCA were used to assess shape variation among A. burtoni populations (Fig. 3C) and among all populations of the four species (Fig. 3D and fig. S2, B to D). The mean shape distances of CVA were obtained using permutation tests (10,000 permutations). Mahalanobis distances (DM) among groups from the CVA were calculated and plotted against genetic distances (FST for within A. burtoni comparisons and d for between species comparisons).

Vectors of phenotypic and genomic divergence

We followed the method first developed by Adams and Collyer () and described in detail by Stuart et al. () and Bolnick et al. () to calculate multivariate vectors of phenotypic and genomic divergence. For vectors of morphological divergence, 37 traits and landmarks were used: centroid size (17 landmarks), standard length (landmarks 1 to 14), body depth (landmarks 8 to 12) corrected by standard length (BD/SL ratio), and the x and y coordinates of each of the 17 landmarks. We then calculated two types of vectors of genomic divergence. First, outlier vectors were calculated using the first 78 PCs (88% variance; same as for non-outliers) of a genomic PCA based on outlier SNPs present in at least one set of BayPass outliers (H. stappersii core model, A. burtoni North core model, A. burtoni South core model, and A. burtoni South auxiliary model). Then, “neutral” vectors were calculated using the first 78 PCs of a genomic PCA based on all remaining SNPs (that is, non-outliers) (88% variance; all PC summarizing between-population variation). For each lake-stream pair and, separately, for phenotypic and genomic outlier and non-outlier data, we calculated vector length (L), the difference in length between two vectors (ΔL), and the angle in degrees between two vectors (θ). For each morphological trait (each genomic PC, respectively), we ran t tests and used the t statistic as an estimate of lake-stream divergence for each trait. Thus, vectors of phenotypic divergence were represented as the matrix CP of 37 t statistics for each trait/landmark (columns) × 9 lake-stream pairs (rows), and vectors of genomic divergence were represented as a matrix CG of 78 t statistics for each PC × 9 lake-stream pairs. We calculated nine LP (phenotype), nine LG (genotype), nine LG_outlier values, all lake-stream pairwise comparisons ΔLP, θP, ΔLG, θG, ΔLG_outlier, and θG_outlier. We used Mantel tests (mantel.test function in the ape R package v.5.3; 9999 permutations) to test the correlation between θP and θG/θG_outlier, ΔLP and ΔLG/ΔLG_outlier, and linear regression models (lm function in R) to test the correlation between LP and LG/LG_outlier.

Multivariate analyses of (non-)parallelism and convergence

In addition to comparing all lake-stream divergence vectors in a pairwise manner, we performed the recently described eigen analyses of vector correlation matrices (C) (). These multivariate analyses allow the quantification of the extent of parallelism/antiparallelism by assessing the percentage of variance explained by the leading eigenvectors of C. These analyses also reveal how many dimensions of shared evolutionary change exist, by inferring how many eigenvalues are significant. For each dataset (phenotype, genotype outlier, and genotype non-outlier), we calculated the eigen decomposition of the respective matrix of vector correlations C described above (C = QVQ−1) to extract eigenvectors (Q) and eigenvalues (V) of each dataset. To construct a null expectation of evolutionary parallelism, we used an identity matrix sampled from the m-dimensional Wishart distribution with n degrees of freedom as suggested by De Lisle and Bolnick (), where m is the number of lake-stream pairs and n is the number of traits/landmarks. To infer which traits/landmarks/PCs contribute the most to the leading eigenvector of C, we investigated the value of trait loadings in the matrix A, where A = XQ. Furthermore, we investigated levels of convergence/divergence in multivariate trait space by comparing the among-lineage covariance matrices of trait mean values (D) for each environment (). Specifically, for each one of the three datasets (phenotype, genotype outlier, and genotype non-outlier), we calculated Driver and Dlake and their respective trace, which encompasses the total among-lineage variance per environment. To assess levels of convergence/divergence in lake-stream divergence, we then compared both trace values [tr(Driver) − tr(Dlake)], where a negative value indicates convergence (less among-lineage variance in the river environment) and a positive value indicates divergence (more among-lineage variance in the river environment). Last, it has also been proposed to investigate convergence/divergence of a specific trait by estimating the change in Euclidean distance between lineage pairs from different environments (here, river versus lake) (, ). We therefore calculated, for each pair of lake-stream systems, the difference in DM between the respective stream population pair versus lake population pair. For instance, the convergence/divergence in body shape between A. burtoni and C. horei from the Lufubu system was calculated as follows: (DM ) − (DM ), where a negative value indicates convergence and a positive value indicates divergence in body shape.

Predictability of (non-)parallelism

We then investigated the influence of similarity of ancestral (i.e., lake) populations on parallel evolution. We used DM and FST between lake populations as proxies of morphological and genetic similarities, respectively. We performed Mantel tests between lake-lake DM and θP and between lake-lake FST and θG/θG_outlier to infer whether the similarity of ancestral populations between systems was correlated to the direction of (non-)parallelism. Last, we assessed whether the proportion of standing genetic variation was correlated with the extent of morphological or genetic parallelism. For each of the 136 population pairwise comparison, we extracted allele frequencies of biallelic SNPs using VCFtools --freq command. We then categorized SNPs in four different groups: identical sites (fixed in both populations for the same allele), differentially fixed sites (fixed in both populations for different alleles), fixed and variable sites (fixed in one population and variable in the second population), and standing genetic variation (variable in both populations). We performed Mantel tests between the proportion of standing genetic variation SNPs between lake populations and θP and between the proportion of standing genetic variation SNPs and θG / θG_outlier.

Mate-choice experiments

We used two mate-choice experiments to test for reproductive isolation between the two geographically most distant and genetically most divergent A. burtoni populations, Rusizi lake and Kalambo lake (see Supplementary Text and fig. S4). Detailed methods and results of the experiments are provided in Supplementary Text. Briefly, in the first experiment, a two-way female choice setup was used to test whether females preferred males of their own population over others when only visual cues were available (fig. S4A). We placed a gravid female of either population (N = 44) in a central tank and allowed visual contact and interaction with two sized-matched males from Rusizi lake and Kalambo lake presented in two outer tanks. Within a period of up to 12 days, we assessed whether the female had laid the eggs in front of the Rusizi lake male, the Kalambo lake male, in front of both, or in the central section. The position of the laid eggs was used as a measure for female preference (conspecific choice coded as 1 and heterospecific and no choice coded as 0). The binomial data were then analyzed with a generalized linear mixed model, which tested whether the probability of the females spawning with the conspecific male was significantly different from 0.5. In the second experiment, female spawning decisions were determined in a multisensory setting with free contact between females and males. A single tank was subdivided into three equally sized compartments by plastic grids (fig. S4B). The middle compartment offered a resting and hiding place for the females, whereas the two outer compartments served as male territories. The grid size was chosen to allow the smaller females to migrate between the three compartments and to prevent direct contact between the larger males to exclude male-male competition. We conducted eight trials, each time using two males (one of each population) and six females from both populations (Nmales = 16; Nfemales = 46). Mouthbrooding females were caught, and 10 larvae from each female were collected for paternity analyses based on five microsatellite markers (). All adult males and females were also genotyped for these five markers. The percentage of fertilized offspring by conspecific or heterospecific males in each replicate was then used to infer whether the females spawned more frequently with the conspecific males. These experiments were performed under the cantonal veterinary permit number 2356.

Statistical analysis

Statistical parameters including the exact value of N are reported in the methods and figure legends. All statistical analyses were performed using R v.3.4.2.
  59 in total

1.  A general framework for the analysis of phenotypic trajectories in evolutionary studies.

Authors:  Dean C Adams; Michael L Collyer
Journal:  Evolution       Date:  2009-02-03       Impact factor: 3.694

2.  Demography and genome divergence of lake and stream populations of an East African cichlid fish.

Authors:  Bernd Egger; Marius Roesti; Astrid Böhne; Olivia Roth; Walter Salzburger
Journal:  Mol Ecol       Date:  2017-09-11       Impact factor: 6.185

Review 3.  Contingency and determinism in evolution: Replaying life's tape.

Authors:  Zachary D Blount; Richard E Lenski; Jonathan B Losos
Journal:  Science       Date:  2018-11-09       Impact factor: 47.728

4.  Genome-wide patterns of divergence and introgression after secondary contact between Pungitius sticklebacks.

Authors:  Yo Y Yamasaki; Ryo Kakioka; Hiroshi Takahashi; Atsushi Toyoda; Atsushi J Nagano; Yoshiyasu Machida; Peter R Møller; Jun Kitano
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2020-07-13       Impact factor: 6.237

5.  Genomics of Parallel Ecological Speciation in Lake Victoria Cichlids.

Authors:  Joana Isabel Meier; David Alexander Marques; Catherine Elise Wagner; Laurent Excoffier; Ole Seehausen
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

6.  Baby fish working out: an epigenetic source of adaptive variation in the cichlid jaw.

Authors:  Yinan Hu; R Craig Albertson
Journal:  Proc Biol Sci       Date:  2017-08-16       Impact factor: 5.349

7.  Olfactory communication in a Cichlid fish, Haplochromis burtoni.

Authors:  M D Crapon de Caprona
Journal:  Z Tierpsychol       Date:  1980

8.  A microsatellite-based genetic linkage map of the cichlid fish, Astatotilapia burtoni (Teleostei): a comparison of genomic architectures among rapidly speciating cichlids.

Authors:  Matthias Sanetra; Frederico Henning; Shoji Fukamachi; Axel Meyer
Journal:  Genetics       Date:  2008-08-30       Impact factor: 4.562

9.  On the causes of geographically heterogeneous parallel evolution in sticklebacks.

Authors:  Bohao Fang; Petri Kemppainen; Paolo Momigliano; Xueyun Feng; Juha Merilä
Journal:  Nat Ecol Evol       Date:  2020-06-22       Impact factor: 15.460

10.  Nothing in Evolution Makes Sense Except in the Light of Biology.

Authors:  Martha O Burford Reiskind; Michael L Moody; Daniel I Bolnick; Charles T Hanifin; Caroline E Farrior
Journal:  Bioscience       Date:  2021-02-03       Impact factor: 8.589

View more
  1 in total

Review 1.  Detecting (non)parallel evolution in multidimensional spaces: angles, correlations and eigenanalysis.

Authors:  Junya Watanabe
Journal:  Biol Lett       Date:  2022-02-16       Impact factor: 3.703

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.