Literature DB >> 34865126

Limited Introgression between Rock-Wallabies with Extensive Chromosomal Rearrangements.

Sally Potter1,2, Jason G Bragg3, Rustamzhon Turakulov1, Mark D B Eldridge2, Janine Deakin4, Mark Kirkpatrick5, Richard J Edwards6, Craig Moritz1.   

Abstract

Chromosome rearrangements can result in the rapid evolution of hybrid incompatibilities. Robertsonian fusions, particularly those with monobrachial homology, can drive reproductive isolation amongst recently diverged taxa. The recent radiation of rock-wallabies (genus Petrogale) is an important model to explore the role of Robertsonian fusions in speciation. Here, we pursue that goal using an extensive sampling of populations and genomes of Petrogale from north-eastern Australia. In contrast to previous assessments using mitochondrial DNA or nuclear microsatellite loci, genomic data are able to separate the most closely related species and to resolve their divergence histories. Both phylogenetic and population genetic analyses indicate introgression between two species that differ by a single Robertsonian fusion. Based on the available data, there is also evidence for introgression between two species which share complex chromosomal rearrangements. However, the remaining results show no consistent signature of introgression amongst species pairs and where evident, indicate generally low introgression overall. X-linked loci have elevated divergence compared with autosomal loci indicating a potential role for genic evolution to produce reproductive isolation in concert with chromosome change. Our results highlight the value of genome scale data in evaluating the role of Robertsonian fusions and structural variation in divergence, speciation, and patterns of molecular evolution.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  Robertsonian fusion; chromosome rearrangement; introgression; marsupial; speciation

Mesh:

Substances:

Year:  2022        PMID: 34865126      PMCID: PMC8788226          DOI: 10.1093/molbev/msab333

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   8.800


Introduction

There is a long history of theory and empirical work investigating the role of chromosome rearrangements on divergence and speciation. Links between chromosome evolution and rapid divergence and speciation have been outlined across diverse taxa (e.g., Leaché et al. 2016; de Vos et al. 2020; Wang et al. 2020). Classical models of chromosomal evolution focus on hybrid sterility resulting from the unbalanced gametes produced by hybrids that are heterozygous for chromosome rearrangements (White 1969; King 1993). Although supported in various systems (see Brown and O’Neill [2010]), this model has been challenged due to the difficulty for underdominant rearrangements to fix within populations (see Hoffman and Rieseberg [2008]; Faria and Navarro [2010]). The alternate model focusses on the suppression of recombination in heterokaryotypes within or surrounding a rearrangement, and subsequent accumulation of alleles within the region of alleles causing local adaptation and/or reproductive isolation (Noor et al. 2001; Rieseberg 2001). This model has been explored theoretically (Navarro and Barton 2003; Kirkpatrick and Barton 2006; Guerrero and Kirkpatrick 2014; Dagilis and Kirkpatrick 2016) and has gained substantial empirical support from diverse organisms, including plants (e.g., Rieseberg et al. 1995; Rieseberg 2001) and insects (e.g., Noor et al. 2007; Manoukis et al. 2008; see also Ayala and Coluzzi [2005]). Understanding how chromosome differences between populations can suppress gene flow can be challenging. Some rearrangements may have little or no effect on fertility of hybrids (Rieseberg 2001). Thus, the number of rearrangements by which two species differ may be an inaccurate gauge of how much the rearrangements contribute to their isolation. Rearrangements can themselves act as Bateson Dobzhansky Muller Incompatibilities (Bateson 1909; Dobzhansky 1936; Muller 1942) or establish genic incompatibilities by the alleles that they carry. Alternatively, isolation might be entirely attributable to genic incompatibilities, with no contribution from the structural changes themselves (Coyne and Orr 2004). Further empirical work across diverse systems is required to understand the role of structural variation in divergence and speciation; in particular, whether different chromosome rearrangements (e.g., inversion vs. fusion) have different mechanisms for driving reproductive isolation. One type of chromosomal rearrangement, Robertsonian fusions, is particularly interesting in the context of chromosomal speciation. These occur when two acrocentric chromosomes fuse at their centromeres to form a new metacentric chromosome. A single Robertsonian fusion may have little impact on hybrid fitness because the fused and unfused chromosomes are able to pair successfully, by forming a trivalent that is able to segregate normally (e.g., in shrews and mice; reviewed in Searle [1993]; Garagna et al. [2014]; Searle et al. [2019]). Sterility and barriers to gene flow can evolve, however, when different populations fix for different fusions that involve the same chromosome arms (monobrachial homology). This situation can lead to complex chains of chromosomes in hybrid meiosis, causing segregation errors that lead to sterility (Gropp et al. 1982; Baverstock et al. 1983; Baker and Bickham 1986; Garagna et al. 2014). Longer chains of chromosomes are thought to generally show lower fertility (Searle 1993; Basset et al. 2019). Thus, establishment of such complex combinations of rearrangements is expected to cause stronger reproductive isolation. Two prominent organismal systems used to explore the role of Robertsonian fusions in divergence and speciation are mice (genus Mus; Giménez et al. 2013, Garagna et al. 2014, Britton-Davidian et al. 2017) and shrews (genus Sorex; see Searle et al. [2019]). The insight gained from these systems has come from in depth cytogenetic research, fertility, and meiotic studies and importantly, detailed research on the molecular composition and organization of centromeric regions across contact zones as well as between species (reviewed in Garagna et al. [2014]; Searle et al. [2019]). The recent divergence, multiple parallel pairwise comparisons between populations and/or species with varied combinations of Robertsonian fusions and the ability to conduct laboratory research has revealed a lot. However, such deep experimental evidence is lacking for most organisms with extensive chromosome change. New genomic tools and evidence are expected to improve our understanding of chromosome change and speciation for diverse organisms (e.g., Potter et al. 2017; Campbell et al. 2018; Mérot et al. 2020), and for some systems have already done so (Carbone et al. 2014; Franchini et al. 2020); yet most exemplars of rapid chromosome change are still to be examined at a genome scale. To avoid conflating the contributions to reproductive isolation of genic evolution with chromosome change, we need to focus genome-scale analyses on very recently diverged species groups (Coyne and Orr 2004). Here, we highlight an approach to evaluate the role of chromosomal structure in the history of divergence and speciation, as well as molecular evolution processes in a group of species with a complex evolutionary and taxonomic history due to recent divergence and complex chromosome rearrangements. The rock-wallabies (genus Petrogale) are a group of Australian marsupials that have long been promoted as an example of rapid chromosomal speciation caused by hybrid dysfunction (Sharman et al. 1989; King 1993; Brown and O’Neill 2010). This radiation is young, between 0.44 and 1.6 My old (Potter et al. 2012). Rock-wallabies typically live in small, isolated populations, a lifestyle that may facilitate fixation of underdominant chromosome rearrangements by genetic drift (Eldridge and Close 1993). The unusual population structure of Petrogale, together with their extensive chromosomal rearrangements, make this group an interesting contrast to other model systems in studies of the role of structural variation in speciation. The 17 recognized species of Petrogale comprise 23 distinct karyotypes, which is the greatest chromosomal diversity in any group of marsupial (Eldridge and Close 1993). Most rearrangements in Petrogale are Robertsonian fusions that are fixed between species, and the genus is unusual for having numerous examples of monobrachial homology (see Eldridge and Close [1993]). Inversions and centric shifts are also present (Eldridge et al. 1989, 1990, 1991, 1992; Eldridge and Close 1992, 1993), and some such rearrangements are polymorphic within species (Eldridge and Close 1997; refer to fig. 1). Given the diverse chromosomal rearrangements and little time for confounding effects of genic divergence, Petrogale is ripe for investigations into the effects of chromosomal rearrangements on speciation.
Fig. 1.

Karyotypes of six species of rock-wallaby (Petrogale) from north-eastern Australia which form part of the penicillata complex, including a map of sample locations and distributions for each species. The color of individuals on the map is species specific and linked to the karyotypes. Robertsonian fusions are highlighted by chromosome numbers, inversions are denoted as (i), and polymorphic karyotypes are shown for chromosome 2 and the X chromosome. The SAM (P. sharmani, P. assimilis, P. mareeba) complex is outlined. The outcomes of experimental hybrid crosses are outlined for both males and females: X = infertile; SF = subfertile; ? = unknown. Based on simple versus complex heterozygote formation between species pairs, we have predicted whether we expect to see introgression or not. For simple heterozygotes which form trivalents during meiosis, we expect introgression could be present as chromosomes could segregate properly during meiosis of hybrids. For complex heterozygotes which from chains of four or five chromosomes, we expect to see no introgression.

Karyotypes of six species of rock-wallaby (Petrogale) from north-eastern Australia which form part of the penicillata complex, including a map of sample locations and distributions for each species. The color of individuals on the map is species specific and linked to the karyotypes. Robertsonian fusions are highlighted by chromosome numbers, inversions are denoted as (i), and polymorphic karyotypes are shown for chromosome 2 and the X chromosome. The SAM (P. sharmani, P. assimilis, P. mareeba) complex is outlined. The outcomes of experimental hybrid crosses are outlined for both males and females: X = infertile; SF = subfertile; ? = unknown. Based on simple versus complex heterozygote formation between species pairs, we have predicted whether we expect to see introgression or not. For simple heterozygotes which form trivalents during meiosis, we expect introgression could be present as chromosomes could segregate properly during meiosis of hybrids. For complex heterozygotes which from chains of four or five chromosomes, we expect to see no introgression. Six species of Petrogale that form the penicillata complex (Eldridge et al. 1991) have extensive differences in chromosome structure (fig. 1). They are distributed parapatrically, and rare chromosomal hybrids have been found at contact zones (Briscoe et al. 1982). Some species differ by simple Robertsonian fusions (F1’s having chains of three chromosomes at meiosis), whereas others have more complex differences (monobrachial homology with F1’s having chains of four or five). Experimental hybrid crosses result in infertile males, which show extensive mispairing in meiosis, and partially fertile F1 females (Sharman et al. 1989; Close et al. 1996; Close and Bell 1997). These results suggest that postzygotic isolation is strong. The stronger sterility in Petrogale males conforms to Haldane’s rule, whereby in crosses between two taxa, the heterogametic sex is more likely to be sterile or inviable (Haldane 1922). Both Haldane’s rule and meiotic sex chromosome inactivation have been linked to a disproportionally large role of the X chromosome in hybrid incompatibilities (the large X-effect; Coyne and Orr 1989; Masly and Presgraves 2007). However, in mice where there is extensive evidence for the large X-effect, it is unclear if chromosome rearrangements are important to it (Larson et al. 2017). Recent analysis of X loci in Petrogale (Potter et al. 2017) identified greater phylogenetic resolution on the X compared with autosomes, as well as greater divergence (i.e., faster X effect; Vicoso and Charlesworth 2006). Comparisons of divergence across the genome will be important in understanding the role of sex chromosome versus autosomal evolution in reproductive isolation. Existing molecular data have given an inconclusive picture of gene flow between species in Petrogale. Mitochondrial DNA (mtDNA) phylogenies show strong discordance with species boundaries as defined by karyotypes and hybrid sterility (Potter et al. 2015, 2017). This discord could result from introgression or from incomplete lineage sorting. Coalescent analysis of microsatellites inferred high levels of introgression, even among the most chromosomally divergent taxa (Potter et al. 2015). Allozymes, mtDNA, microsatellites (Briscoe et al. 1982; Bee and Close 1993; Potter et al. 2015), and (most recently) a sample of some 2,000 nuclear exons (Potter et al. 2017) have only partially resolved the evolutionary relationships of these six species. Proper evaluation of the competing hypotheses for the roles of chromosome rearrangements in speciation demands a knowledge of the history of divergence among species. But these relationships are challenging to estimate when there is extensive discordance between gene trees resulting from introgression and/or incomplete lineage sorting (Maddison 1997; Edwards 2009; Linkem et al. 2016). In some cases, introgression is so extensive that only a small fraction of the genome shares an evolutionary history that is consistent with the species phylogeny (Fontaine et al. 2015; Edelman et al. 2019; McGee et al. 2020). However, approaches are now available to test for introgression separate from incomplete lineage sorting, using both population and phylogenomic approaches (e.g., Pickrell and Pritchard 2012; Pease and Hahn 2015; Wen et al. 2018). How taxa are sampled can also affect the ability to resolve recent population divergence with gene flow; sampling of multiple geographically dispersed individuals per taxon should result in improved parameter estimates compared with just one or two individuals (Robinson et al. 2014; McLaughlin and Winker 2020; e.g., Teng et al. 2017; Linck et al. 2019). Here, using population genomic and phylogenomic approaches, we resolve divergence histories among these closely related taxa and test for introgression while accounting for incomplete lineage sorting. We expect that introgression will be most strongly reduced between species with complex monobrachial homology (chains of four or more chromosomes in hybrid meiosis) compared with hybrids between species with simple chromosomal differences (fig. 1). We found varied evidence for introgression between species pairs, with the most extensive support found between two species which share a simple Robertsonian fusion (P. assimilis and P. inornata). There was evidence of introgression between other species pairs where complex heterozygotes would form: P. godmani and P. mareeba which have a known contact zone and F1 hybrids; and less compelling evidence between P. assimilis and P. sharmani. As we expected based on infertility of male hybrids and potential links to Haldane’s rule, we found the X chromosome to have greater divergence than loci from autosomes.

Results

Genomic Data Sets

We used the draft genome assembly for Petrogale penicillata to serve as a reference for mapping reads and calling single-nucleotide polymorphisms (SNPs; see Materials and Methods). We generated three data sets, two exon sequence data sets for the six species in the penicillata group and one SNP data set for five species (excluding P. coenensis). The SNP data set we refer to as “DArT” based on the genome reduction approach used (see Materials and Methods). It genotyped 22,724 SNPs from 77 individuals with a mean of 15 per taxon (including four known F1 P. godmani×P. mareeba hybrids). The second data set is referred to as “phased exons” because it consists of phased haplotype sequences (627,699 bp) from exon capture experiments (see supplementary table S1, Supplementary Material online, for sampling). It resolved haplotypes at 1,215 loci for 67 individuals (mean of 15 per taxon). The third data set is referred to as “unphased exons” and is based on sequence data (843,619 bp) from exon capture experiments, and includes 1,617 loci on those same individuals. Except where noted, all the analyses reported here are based on the phased exons. A set of 583 loci were successfully mapped to the tammar wallaby (Notamacropus eugenii) genome (supplementary table S2, Supplementary Material online). These were sorted into three genomic categories: X-linked loci (n = 45); loci on rearranged autosomes, that is, those involved in fusions (chromosomes 5, 6, 9, 10; n = 108); and loci on nonrearranged autosomes not involved in fusions (chromosomes 1, 2, 7, 8; n = 229). Unfortunately, we are not able to identify which SNPs fall within rearranged parts of the chromosomes.

Population Structure and Phylogenetic Relationships

Principal coordinate analysis (PCoA) based on the DArT data set clearly separate out each of P. godmani and P. inornata from the clade of P. sharmani, P. assimilis, and P. mareeba, hereon referred to as the sharmani–assimilis–mareeba (SAM) clade (fig. 2). The three SAM species could not be further subdivided based on this analysis. Four F1 hybrids between P. godmani and P. mareeba that were identified previously by karyotype form a cluster intermediate between those two species. By contrast, discriminant analysis of principal components (Jombart 2008, Jombart et al. 2010) correctly assigned individuals to the three species. This result is based on 19 PCs (63.1% of conserved variance) and two discriminant eigenvalues (fig. 2).
Fig. 2.

Summary of population genomic (a, b) and phylogenomic (c, d) results based on the DArT and phased exon data sets, respectively. (a) PCoA results for the five penicillata complex taxa examined in this study (not including P. penicillata used as an outgroup). Species are colored individually and the symbols match species identification from (c). The PCoA (left) includes four known F1 hybrids (designated by +) between P. godmani and P. mareeba. Structure results for these same SNPs and individuals (right) are grouped into three clusters, with P. assimilis, P. mareeba, and P. sharmani (SAM) forming one group despite K = 4 clusters supported. (b) DAPC result for the SAM group based on 19 principal components (PCAs) and two discriminant functions (DAs), colored by species (left). Structure results for just the SAM group indicate two populations and are colored by cluster (right). The lines on the right represent the species boundaries. (c) Phylogenetic species tree of the penicillata complex based on the multispecies coalescent analysis in SVDquartets. Species colors and symbols match those in the population analyses (a, b) and the bootstrap support values are located on branches. (d) Network for individuals estimated using a distance-based approach in SplitsTree (Neighbor-Net). The regions of the network where multiple branches connect individuals and groups indicate reticulation.

Summary of population genomic (a, b) and phylogenomic (c, d) results based on the DArT and phased exon data sets, respectively. (a) PCoA results for the five penicillata complex taxa examined in this study (not including P. penicillata used as an outgroup). Species are colored individually and the symbols match species identification from (c). The PCoA (left) includes four known F1 hybrids (designated by +) between P. godmani and P. mareeba. Structure results for these same SNPs and individuals (right) are grouped into three clusters, with P. assimilis, P. mareeba, and P. sharmani (SAM) forming one group despite K = 4 clusters supported. (b) DAPC result for the SAM group based on 19 principal components (PCAs) and two discriminant functions (DAs), colored by species (left). Structure results for just the SAM group indicate two populations and are colored by cluster (right). The lines on the right represent the species boundaries. (c) Phylogenetic species tree of the penicillata complex based on the multispecies coalescent analysis in SVDquartets. Species colors and symbols match those in the population analyses (a, b) and the bootstrap support values are located on branches. (d) Network for individuals estimated using a distance-based approach in SplitsTree (Neighbor-Net). The regions of the network where multiple branches connect individuals and groups indicate reticulation. Bayesian clustering of individuals using Structure (Pritchard et al. 2000) also failed to resolve the SAM clade. With the full DArT data set, K = 4 clusters were identified. Despite this, only P. inornata and P. godmani are separated, and the closely related SAM clade forms a third cluster (fig. 2). When these three species were analyzed alone, two clusters (K = 2) were identified but with different outcomes across replicate runs. With K = 3, we again fail to separate the three species, and the groupings are identical to those with K = 2 (fig. 2). We estimated the phylogenetic relationships amongst species, applying SVDquartets and a species tree approach with the phased exon data set at both species- and individual levels. In the species-tree analysis, there is strong support for P. godmani and P. coenensis forming a clade, and P. inornata is most likely the sister lineage to the SAM clade, which itself is strongly supported (fig. 2). These patterns also emerge in examining divergence histories of individuals, although the relationships amongst the three species in the SAM clade are unresolved, and most relationships have low bootstrap support (<50%; supplementary fig. S1, Supplementary Material online). These results are consistent with, but improve on, previous estimates of the phylogeny based on exon sequences with just two individuals per species (Potter et al. 2017). The SplitsTree network (Bryant and Moulton 2002; Huson and Bryant 2006) correctly assigns individuals to species, but with a few exceptions in the SAM clade (fig. 2). The network indicates introgression among species, or incomplete lineage sorting, manifest as webs in the network. We analyzed divergence, measured as dXY, between the four most closely related species (P. inornata and the SAM clade). Consistent with our expectations, we find that divergence is significantly greater on the X than on the autosomes (fig. 3). Inconsistent with expectations, we find greater divergence in nonrearranged than in rearranged chromosomes (fig. 3). Restricting the analysis to just the three species in the SAM clade shows the same trends and significant differences, although with lower dXY values (fig. 3 and supplementary fig. S2 and table S3, Supplementary Material online).
Fig. 3.

(a) Average pairwise divergence (dXY) between P. inornata and P. sharmani, P. assimilis, P. mareeba (SAM) species for comparisons between rearranged (R: 5,6,9,10) and nonrearranged (NR: 1,2,7,8) chromosomes, and between X and autosomal (A) loci. Tests for significant differences between the average dXY were calculated using t-tests and significant values (P < 0.05) are marked with an asterisk (*). (b) Average dXY within the SAM group only, for R versus NR and X versus A.

(a) Average pairwise divergence (dXY) between P. inornata and P. sharmani, P. assimilis, P. mareeba (SAM) species for comparisons between rearranged (R: 5,6,9,10) and nonrearranged (NR: 1,2,7,8) chromosomes, and between X and autosomal (A) loci. Tests for significant differences between the average dXY were calculated using t-tests and significant values (P < 0.05) are marked with an asterisk (*). (b) Average dXY within the SAM group only, for R versus NR and X versus A.

Introgression

We used three approaches to explore introgression (table 1). The first is based on TreeMix, an algorithm that uses SNP frequency data to evaluate population splitting, drift, and introgression (Pickrell and Pritchard 2012). Our DArT data supported a model of one or two introgression events over alternatives with up to four events. Using P. godmani as the outgroup based on the phylogeny (fig. 2) and ad hoc statistics in OptM (Fitak 2021), a model with one introgression event fit our data best, and accounted for 99.76% of the variance in the data (supplementary fig. S3 and table S4, Supplementary Material online). This model indicates introgression from the lineage leading to P. godmani into P. mareeba (fig. 4). A model with a second introgression event from P. assimilis into P. inornata fit the data better (fig. 4), but the improvement was not statistically significant (supplementary fig. S3 and table S4, Supplementary Material online).
Table 1.

Summary of Introgression Results from the Three Analyses: TreeMix, ABBA-BABA, and MIGRATE.

Pairwise Comparison (P1/P2)Expected No. of Chromosomes in Meiotic Chain TreeMix ABBA-BABA
Migrate
D-Statistic Z Score P ValueP1 > P2P2 > P1
P. mareeba/P. godmani 5 P. godmani>P. mareeba
P. mareeba/P. assimilis 50.01762.1340.01640.00060.1346
P. mareeba/P. sharmani 30.00010.0180.4930.01620.0024
P. assimilis/P. sharmani 40.02343.4460.00030.11880.0004
P. assimilis/P. inornata 3 P. assimilis>P. inornata0.114615.2920.00.18810.0004

Note.—Pairwise comparisons are listed as Parental 1 (P1) and Parental 2 (P2) taxa with MIGRATE results indicating directionality of migration between these taxa. Expected chains of arms in meiosis for chromosomal heterozygotes between the pairs of species are listed, with simple heterozygotes representing trivalent chain formations and complex heterozygotes representing chains of four or five. TreeMix results highlight significant introgression events and the direction of gene flow. ABBA-BABA results outline the D-statistic value, the Z score, and the P value. MIGRATE results summarize the absolute rate of introgression (mutation-scaled migration rate M×effective population size Θ/ploidy of data [4Ne]).

Fig. 4.

(a) TreeMix results which support a single introgression event from the ancestor of P. godmani into P. mareeba. The migration rate is intermediate for this introgression event. (b) TreeMix results from a model of two introgression events, building off of the m = 1 model (a) and including an introgression event with high migration from P. assimilis into P. inornata.

(a) TreeMix results which support a single introgression event from the ancestor of P. godmani into P. mareeba. The migration rate is intermediate for this introgression event. (b) TreeMix results from a model of two introgression events, building off of the m = 1 model (a) and including an introgression event with high migration from P. assimilis into P. inornata. Summary of Introgression Results from the Three Analyses: TreeMix, ABBA-BABA, and MIGRATE. Note.—Pairwise comparisons are listed as Parental 1 (P1) and Parental 2 (P2) taxa with MIGRATE results indicating directionality of migration between these taxa. Expected chains of arms in meiosis for chromosomal heterozygotes between the pairs of species are listed, with simple heterozygotes representing trivalent chain formations and complex heterozygotes representing chains of four or five. TreeMix results highlight significant introgression events and the direction of gene flow. ABBA-BABA results outline the D-statistic value, the Z score, and the P value. MIGRATE results summarize the absolute rate of introgression (mutation-scaled migration rate M×effective population size Θ/ploidy of data [4Ne]). In a second approach to introgression, we analyzed the DArT data with the D-statistics (ABBA-BABA tests) using Dsuite (Malinsky et al. 2021) which allows for population sampling by using allele frequency data. We tested for introgression between all possible pairs of the three species in the SAM clade using P. inornata as the outgroup. We found significant evidence of introgression between P. assimilis and P. sharmani (P < 0.05, Z > 3). We also tested for introgression between parapatric P. assimilis and P. inornata, with P. godmani as the outgroup. Significant introgression was detected here as well (P < 0.05, Z > 3). We note that the D-statistic is five times greater in the second species pair than in the first (0.02 vs. 0.11; supplementary table S5, Supplementary Material online). Other species pairs also showed P < 0.05, suggestive of introgression. But in those cases, Z < 3, which we interpreted as insufficient to determine that introgression had occurred (see Zheng and Janke [2018]). Our third approach to introgression used MIGRATE (Beerli and Palczewski 2010, Beerli et al. 2019). This is a Bayesian coalescent-based algorithm that estimates the mutation-scaled effective population size and introgression rates assuming a star phylogeny (see Materials and Methods). Here, we used the unphased exon data set as this program takes full account of ambiguity-coded sequence data. Estimated population sizes were very low for all species (Θ = 4 Neµ=3×10−5), while introgression rates between some pairs of taxa were substantial (M = m/µ up to 2.5×104). The product of Θ/4 and M estimates the absolute rate of introgression; here, the number of heterospecific genomes entering the population per generation. This rate was greatest for introgression from P. assimilis into P. inornata (0.19 genomes per generation), as well as from P. assimilis into P. sharmani and P. mareeba (0.12 and 0.13 genomes per generation, respectively) (supplementary table S6, Supplementary Material online). We estimated introgression separately for each of the three genomic categories: X-linked, rearranged autosomes, and nonrearranged autosomes. For the X chromosome, we used all 45 loci, and a subsampled similar number of loci for the nonrearranged and rearranged autosomes (n = 50) to allow for similar power across the three categories. The per-generation introgression rates between all pairs of species were low and similar across the three categories (M = 0.004–0.12), again with highest values from P. assimilis into other species (M = 0.05–0.12, supplementary table S6, Supplementary Material online). These results are not significantly different from zero and do not represent strong evidence of ongoing introgression (one migrant per generation). In sum, there is consistent evidence of introgression between P. assimilis and P. inornata based on all three analyses. However, only the MIGRATE analysis estimates introgression in terms of genomes per generation and these were below the rates needed to homogenize neutral variation. Both MIGRATE and TreeMix results suggest introgression is unidirectional, from P. assimilis into P. inornata. The concordance of introgression results for this species pair agrees with our expectation as these two species differ by only a single Robertsonian fusion. Evidence of introgression between other species pairs, however, is mixed. The ABBA-BABA and MIGRATE analyses suggest there has been some introgression between P. assimilis and P. sharmani, which have different fusions with monobrachial homology; however, this result was not supported by TreeMix results. Last, we detected introgression from an ancestor of P. godmani into P. mareeba using TreeMix, but data were not available to test this with the other two methods.

Discussion

Resolving the evolutionary histories of divergence and gene flow amongst very closely related species is important to determine effects of chromosome rearrangements on isolation, but is also expected to be challenging. Studies have shown different impacts of Robertsonian fusions on levels of introgression, largely associated with the complexity of chromosomal rearrangement (e.g., simple vs. complex heterozygotes; see Garagna et al. [2014]; Yannic et al. [2019]). Our previous work based on microsatellites inferred introgression even amongst species with complex rearrangements (Potter et al. 2015). Here, however, the much larger genomic data sets provide new and more robust insights. For the first time, we were able to demonstrate genetic differentiation between the three species in the SAM clade (P. assimilis, P. mareeba, and P. sharmani) and determined the divergence histories between all six species in the notoriously difficult penicillata group (fig. 2). These results validate some previous conclusions based on just two individuals per species (Potter et al. 2017). In contrast to Potter et al. (2017), this current work finds moderate support for P. assimilis and P. mareeba as sister species. We do, however, also find evidence of reticulate evolutionary processes (incomplete lineage sorting or introgression) between species (fig. 2table 1). The new phylogeny suggests that the history of structural changes is more complex than was previously appreciated. For example, five to ten fusions have fixed independently in the parapatric P. mareeba and P. sharmani or there has been a reversal in P. assimilis (fig. 1). Given the low rates of introgression that we found, it is likely that incomplete lineage sorting rather than introgression is driving most of the uncertainty in phylogenetic relationships.

Introgression between Species

Across the penicillata group, strong evidence for introgression is restricted to just some species pairs (table 1). Where it has occurred, it primarily involves the geographically widespread and centrally located P. assimilis. The clearest evidence for introgression is between P. assimilis and P. inornata, which differ by a single fusion between chromosomes 6 and 10, and centric shifts on chromosomes 3 and 4. This supports our prediction that introgression should be less impeded between species whose hybrids form simple heterozygotes (trivalents). The finding is also consistent with introgression seen in other taxa where hybrids show proper segregation of chromosomes during meiosis (e.g., Mus musculus domesticus, Sorex araneus; see Garagna et al. [2014]; Searle et al. [2019], respectively). Introgression between P. assimilis and P. inornata has previously been reported from mtDNA and microsatellite loci (Potter et al. 2015), as well as at their contact zone based on allozymes and mtDNA (Briscoe et al. 1982; Bee and Close 1993). Our results also show some evidence of introgression between P. godmani and P. mareeba, based on TreeMix and PCoA results (fig. 2; table 1). The PCoA demonstrates the mixed ancestry of F1 hybrids midway between these two species, consistent with previous reports of mtDNA leakage and known karyotypic hybrids (Briscoe et al. 1982; Sharman et al. 1989; Bee and Close 1993; Close and Bell 1997; Potter et al. 2015). Evidence of introgression between these two species was also found from microsatellite markers (Potter et al. 2015). These species have complex chromosomal differences: P. godmani has fusions between chromosomes 6 and 10, and an inversion on chromosome 5; P. mareeba has fusions between chromosomes 6 and 9, and between chromosomes 5 and 10. In hybrid meiosis, these form a chain of five chromosomes. The results from these two species contradict our prediction that complex rearrangements should inhibit introgression. Similarly, complexity of rearrangements has not seemed to effect gene flow between races of the common shrew (S. araneus) (Horn et al. 2012). Empirical systems where more fine-scale mapping of loci to rearranged and nonrearranged loci are suggesting introgression, when present, is localized to nonrearranged regions of the genome (e.g., Giménez et al. 2013). Unfortunately, we do not know which of the SNPs used in our analyses lie within rearranged parts of the genome, so we cannot test the hypothesis that there will be lower introgression in those regions. We find limited evidence of introgression between members of the SAM clade. The lack of substantial introgression is not surprising given that they differ in multiple structural rearrangements. Meiosis in an F1 hybrid between P. assimilis and P. mareeba would involve a chain of five chromosomes, whereas a hybrid between P. assimilis and P. sharmani hybrid would generate a chain of four chromosomes. For this last pair, there is indication of introgression from ABBA-BABA tests on SNPs and MIGRATE analyses of exons, but that conclusion is not supported by the TreeMix analysis of SNPs (table 1). Previous research has suggested limited introgression at the contact zone between P. assimilis and P. sharmani based on mtDNA, but incomplete lineage sorting could not be ruled out (Bee and Close 1993). Introgression between these species has been found based on microsatellite loci (Potter et al. 2015). Conversely, P. mareeba and P. sharmani show no evidence of introgression even though they differ by only a single fusion. However, this fusion of chromosomes 6 and 9 also involves a subsequent centric shift resulting in an acrocentric chromosome (Eldridge et al. 1989; Metcalfe et al. 1997), which could influence crossover of segments of chromosome 9 and be more complex than a simple Robertsonian fusion. Higher levels of introgression among these taxa were previously inferred using coalescent analyses of microsatellite loci (Potter et al. 2015) and that result could be due in part to homoplasy in the microsatellite data (Balloux et al. 2000; Queney et al. 2001). Identical allele sizes could be present due to different mutational histories (convergent evolution), not shared ancestry, and have the potential to increase estimates of introgression and reduce estimates of divergence. A study in shrews (S. araneus) found extreme underrepresentation of population structure across a chromosomal hybrid zone and cautioned that studies based on autosomal microsatellite loci could overestimate the extent of gene flow (Balloux et al. 2000). We therefore caution the interpretation of gene flow based solely on microsatellites. Introgression will be better estimated using SNP or sequence data, particularly for empirical systems where gene exchange could be limited as a result of chromosomal rearrangements. Our findings here for the SAM group suggest if introgression is present, it could be very localized in the genome so that whole genome sequencing will be required to obtain robust and detailed results.

Variation in Introgression and Divergence

We find that the X chromosome shows greater divergence (measured as dXY) than do the autosomes in comparisons between P. assimilis, P. inornata, P. mareeba, and P. sharmani. Similar results have been reported from many other taxa (faster X effect), for which several hypotheses have been proposed (Presgraves 2018). In many species, the X chromosome is also enriched for genes involved in hybrid incompatibilities (Coyne and Orr 2004), which can inhibit introgression of the X across species boundaries. This outcome has been seen, for example, in Anopheles mosquitoes (Fontaine et al. 2015; Thawornwattana et al. 2018). Surprisingly, in our system, loci mapped to chromosomes involved in rearrangements (5, 6, 9, 10) show significantly less divergence than do loci mapped to nonrearranged chromosomes (1, 2, 7, 8), opposite to expectations if restriction of gene flow is greater for fused chromosomes (e.g., via localized suppression of recombination). The number of loci we could accurately assign to chromosomes was limited though. In addition, cytogenetic banding results may have underestimated rearrangements in putatively nonrearranged chromosomes (e.g., Eldridge and Close 1993). De novo genome assemblies which can resolve spatial organization (e.g., Hi-C approaches—Belton et al. 2012; Kong and Zhang 2019), have been shown to identify novel rearrangements (Fan et al. 2019, Damas et al. 2021). Further whole genome sequencing with de novo approaches will improve our understanding of rearrangements and whether there is localized introgression (Mérot et al. 2020). This will enable more robust testing of hypotheses surrounding the faster X effect and divergence histories in relation to rearrangements. The introgression detected between P. assimilis and P. inornata, between which male hybrids are sterile, could suggest a role for genic divergence rather than chromosome rearrangements driving speciation, or a combination of the two in play. In both house mice (Mus musculus domesticus) and common shrews (S. araneus) simple Robertsonian fusion heterozygotes (trivalents) generally progress normally through meiosis (Garagna et al. 2014; Borodin et al. 2019). We find conflicting evidence for the prediction that introgression should be reduced between species that differ in larger numbers of rearrangements. We see some suggestion of introgression between two species pairs that differ by multiple rearrangements (P. godmani and P. mareeba, P. assimilis and P. sharmani, respectively). It is possible that introgression predated the chromosomal rearrangements, or occurred during the early stages of divergence when only a single or more simple rearrangements separated these species. However, given the evidently low level of introgression among species within the more recently evolved SAM clade, it is likely that complex combinations of chromosome rearrangements do in fact play an important (if not sole) role in divergence and speciation in this genus. Unlike some of the better studied systems (e.g., Mus and Sorex), where data can be collected across hybrid zone transects, the disjunct populations of Petrogale make continuous sampling approaches unfeasible. We note that the highly fragmented population structure and small effective population size of rock-wallabies are mixed fortunes for studies of chromosomal speciation. On the one hand, those features generate strong drift and promote the fixation of potentially underdominant rearrangements. On the other hand, they result in extremely low levels of molecular polymorphism, which makes tests of hypotheses challenging. The data here are consistent with the recombination suppression model, as introgression may be ongoing outside of rearranged regions. The data are also compatible with the hybrid dysfunction model, as introgression could have occurred before the chromosome rearrangements became fixed. Last, it is possible that the rearrangements are not important to genetic isolation, and instead mediated by genic divergence. Even in the better studied systems it is still difficult to distinguish between these hypotheses (Giménez et al. 2013; Garagna et al. 2014).

Conclusions

Individual Robertsonian fusions are thought to cause little disruption of meiosis and can fix via meiotic drive or genetic drift (Sites and Moritz 1987; Chmátal et al. 2014; Garagna et al. 2014). When species differ in multiple fusions that share chromosome arms, however, more severe meiotic disruption is expected (Baker and Bickham 1986). Due to their diverse karyotypes, rock-wallabies have long been identified as an interesting model group to explore chromosomal speciation (King 1993; Brown and O’Neill 2010). Our results suggest that fusions have played a significant role in reducing genome-wide introgression within Petrogale. Consistent evidence of introgression is present between two species which share a single Robertsonian fusion. Introgression was supported for one pair of species which share complex chromosomal rearrangements. For the remaining species, there is limited evidence of introgression, and it appears to be at low levels indicative of infrequent events. Like the house mouse, speciation in Petrogale could reflect both hybrid dysfunction and recombination suppression associated with chromosome change, and these in concert with genic divergence, especially on the X chromosome. Comparisons and integration with evidence from model systems of Mus and Sorex, where detailed insight come from hybrid zone research and laboratory crosses (e.g., Garagna et al. 2014; Searle et al. 2019), will be important in drawing insight about the role of chromosomal evolution in other diverse systems, including Petrogale. Further, a much more nuanced test of the roles played by structural and genic variation in the evolution of reproductive isolation will become possible with the arrival of whole genome sequences of this and other taxa with rapidly evolving chromosomes.

Materials and Methods

Data Sets

Three different data sets were analyzed for this study with individuals overlapping where possible. The first data set consisted of SNPs generated from a complexity reduction approach (n = 77), the second data set consisted of full-length phased exon sequences based on an exon capture approach (n = 67) and the third data set consisted of unphased exon sequences from this same exon capture approach (supplementary table S1, Supplementary Material online; see below for methods).

Sampling and DNA Extraction

DNA was extracted from ear and liver biopsies stored at the Australian Museum using a salting-out method (Sunnucks and Hales 1996). A total of 88 samples, identified to species by karyotyping (Sharman et al. 1989; Bee and Close 1993), were analyzed across five species including Petrogale assimilis (n = 23), P. godmani (n = 19), P. inornata (n = 14), P. mareeba (n = 17), P. sharmani (n = 11). Sampling was geographically widespread across the known distribution of each species (fig. 1). Based on previous mtDNA sequencing (Potter et al. 2015), we subsampled individuals from across the distribution to encompass the mitochondrial diversity as well as geographic spread within each species. In addition, individuals from P. coenensis (n = 2) and P. penicillata (n = 2) were included as outgroups and to complete the exon capture data set.

Exon Capture Approach

We applied a custom in-solution exon capture approach using target sequences from a yellow-footed rock-wallaby (Petrogale xanthopus) transcriptome (Bragg et al. 2016; see Potter et al. [2017]). Briefly, we identified orthologs of exon sequences in the transcriptome based on reciprocal best BLAST (blastall v 2.2.25, program BlastN; Altschul et al. 1990) hits to the Tasmanian devil (Sarcophilus harrisii) genome (derived from GTF file, Ensembl release 74, Flicek et al. 2014). A total of 3,960 target exons were filtered as being >200 bp in length and based on BLAST hits to Sarcophilus harrisii and the tammar wallaby genome (N. eugenii) (O’Neill RJ, unpublished data) expected to be single copy. Probes were then synthesized for these targets (1.83 Mb) in a SeqCap EZ Developer Library (Roche NimbleGen; scripts to identify targets are available in Dryad Repository https://doi.org/10.3389/fgene.2017.00010). Genomic libraries for each individual were prepared following the protocol of Meyer and Kircher (2010), including modifications made by Bi et al. (2013). In addition, we included a double-sided size selection bead clean up prior to the blunt-end repair step to remove fragments <200 and >500 bp. Each individual sample was given a unique barcode (supplementary table S1, Supplementary Material online) and quantified using a LabChip DS Droplet Spectrophotometer (PerkinElmer). For the in-solution hybridization, we pooled samples in equimolar ratios (1.2 μg total). Samples were prepared across multiple experiments, limited by the number of individuals pooled together for a single hybridization reaction (n = 56). In addition to the genomic libraries, 5 μg of mouse Cot-1 DNA (Life Technologies Corporation) and 56 barcode specific blocking oligos (1,000 pmol) designed to block the unique barcodes and adapters used in the Meyer and Kircher (2010) protocol were combined. Following the manufacturer’s protocol (SeqCap EZ Developer Library), samples were hybridized for ∼72 h and the hybridization reaction then cleaned up. Two independent enrichment PCRs (17 cycles) were run on the cleaned “postcapture” hybridization reaction and checked against the precapture pooled libraries as a quality control check, to assess global enrichment efficiency using the DyNAmo Flash SYBR green qPCR kit (Thermo Fisher Scientific Inc.; see Bi et al. [2012]). The qPCR amplified target loci from the capture using primers specifically designed to hit targets of the hybridization probes to test for enrichment, as well as primers to amplify nontargets that are known to amplify but not in the capture to test de-enrichment. The enriched hybridization samples were run on a BioAnalyzer (2100; Agilent Technologies, Inc.) to check the quality and quantity of the sample and then sequenced on a single lane of an Illumina HiSeq 2500 (100 bp paired-end run) at the ACRF Biomolecular Resource Facility.

SNP Screening

Genomic DNA samples (20 μl of 50–100 ng/μl concentration) were sent to Diversity Arrays Technologies Pty Ltd (DArT). SNP genotyping was performed using the DArT complexity reduction approach and proprietary DArT analytical pipelines (Kilian et al. 2012). This approach uses restriction enzymes, double digest complexity reduction, fragment size selection, and sequencing. Technical replicates were included by DArT to assess allele calls and accuracy. We requested raw sequencing reads and mapped these to a P. penicillata genome to call SNPs (see below), rather than using genotypes produced by DArT pipelines.

Bioinformatics—Exon Capture Data

Illumina sequencing reads were cleaned following a workflow developed by Singhal (2013). Briefly, this workflow removes duplicate, contaminant (human: GRCh37, ensemble version 67; Escherichia coli: str. K-12 substr. MG1655, GenBank accession number U00096.2), and low complexity reads. Then, TRIMMOMATIC (v 0.22; Bolger et al. 2014) is used to remove adaptors and low-quality bases, and FLASH (v 1.2.2; Magoc and Salzberg 2011) is used to merge overlapping read pairs. All scripts used in this analysis are archived in Dryad Repository (https://doi.org/10.5061/dryad.mm856). Each locus was assembled de novo from the cleaned sequencing reads, with heterozygous sites called and phased using a workflow described in Bragg et al. (2016) (see Dryad Repository https://doi.org/10.5061/dryad.mm856 for code that was used to assemble exon sequences). Using BLAST (blastall v 2.2.25, program BlastX; Altschul et al. 1990), sequencing reads with homology to each target exon were identified. Then, using VELVET (K values of 31, 41, 51, 61, and 71; v 1.2.08; Zerbino and Birney 2008), sequencing reads homologous to the target were assembled. CAP3 (Huang and Madan 1999) was used to assemble contigs from different K values. These were then aligned to the target Sarcophilus protein using EXONERATE (v. 2.2.0; Slater and Birney 2005), and the contigs were trimmed to the target exon boundaries. Where multiple contigs were present, we took the one with the maximum bit score for the BlastX hit and checked it did not hit any other Sarcophilus proteins. Using BOWTIE 2 (v 2.2.2; Langmead and Salzberg 2012) and GATK (v. 3.3-0-g37228af; McKenna et al. 2010), heterozygous sites were identified by mapping clean sequencing reads to the best assembled contigs and then phased. We inserted unknown (N) bases at sites that did not exceed a minimum genotyping quality (GQ) score of 20. This pipeline has alternative fully automated containerized version available from the Docker hub: https://hub.docker.com/repository/docker/trust1/ubuntu. Finally, haplotype sequences obtained from the steps above were aligned and filtered using the EAPhy (v1.2; Blom 2015) pipeline. This uses MUSCLE (v 3.8.31; Edgar 2004) to align haplotype sequences, perform checks of coding frame to ensure amino acid coding (starting in frame 1), removes missing data from the ends of alignments and creates data files ready for analysis (fasta and phylip format) allowing for different amounts of missing data (we set 0–10%). Samples were visually inspected using Geneious Prime 2020.1.1 (https://www.geneious.com). We used snp-sites (Page et al. 2016) to extract SNP data from FASTA alignments of exon capture loci. We located our target sequences on the scaffolds of the N. eugenii genome assembly using BLAST (blastall v 2.2.25, program BlastN; Altschul et al. 1990). Based on knowledge of scaffold physical maps, and relationship of tammar wallaby chromosome to homologous Petrogale chromosomes (Eldridge and Close 1997; O’Neill et al. 1999), we were able to group loci into chromosomes. Average pairwise divergences between species across the X, rearranged autosomal (chromosomes 5, 6, 9, 10) and nonrearranged autosomal loci (chromosomes 1, 2, 7, 8) were estimated using PopGenome (Pfeifer et al. 2014). Averages were then estimated across all groups and divided by the total length of the alignment of the X, autosomal loci, as well as rearranged autosomal and nonrearranged autosomal loci to get comparisons for average net divergence.

Mapped Loci

We estimated average net divergence (dXY) between species of loci mapped to chromosomes on N. eugenii. Based on our knowledge of chromosome synteny (Deakin and O’Neill 2020), we were able to estimate the location of these loci on the penicillata group species’ chromosomes. Loci were separated into three groups: X chromosome, rearranged chromosomes (including chromosomes 5, 6, 9, and 10), and nonrearranged chromosomes (including chromosomes 1, 2, 7, and 8).

Bioinformatics—DArT Data

The DArT sequencing reads were mapped to a new draft Petrogale penicillata genome sequence (https://doi.org/10.5061/dryad.6m905qg0d) for variant calling. To generate the draft genome sequence, we extracted high molecular weight DNA using a Qiagen genomic tips and DNA buffer kit (Qiagen, Hilden, Germany). Then a 10× Genomics linked read sequencing library (10× Genomics, Pleasanton, CA) was prepared and sequenced using an Illumina HiSeq X Ten instrument (The Ramaciotti Centre for Gene Analysis, UNSW). A draft genome was assembled using the Supernova assembler (“pseudohap” format; v 1.2.0, Weisenfeld et al. 2017). We assessed the completeness of the draft genome using BUSCO (v 2.0.1, Simão et al. 2015, tetrapoda_odb9 data set), which measures the fraction of expected single-copy ortholog genes that were present in the assembly. BUSCO was configured with HMMer (v 3.1b2, Eddy 2011), AUGUSTUS (v 3.2.2, Keller et al. 2011), EMBOSS (v 6.5.7, Rice et al. 2000), and BLAST+ (v 2.2.31, Altschul et al. 1990). The draft genome assembly for Petrogale penicillata had a scaffold N50 of 5.5 Mb, L50 of 179, and contained 89.7% of expected single-copy ortholog genes (n = 3,950) rated as Complete with a low level of duplication (50 of the 3,543 Complete BUSCOs). A further 5.4% of BUSCO genes were fragmented, with only 4.3% missing. The raw DArT Illumina single-end short reads were mapped to the draft genome sequence using “bwa mem” (v. 0.7.17, Li and Durbin 2009). Variant calling followed by GATK best practice protocol with GATK (v. 4.1.4.1, Van der Auwera et al. 2014). Briefly, this protocol first ran single sample variant calling which generates a panel of polymorphic sites detected across all samples. Then, a second stage processing step where sequence base qualities in the alignment files were recalibrated with information taken from the first stage variant calling. Recalibrated alignment files proceeded with second pass calling via making combined g.vcf file across all samples including technical replicates. This protocol allows us to accurately detect low frequency allelic variants. Default parameters and thresholds suggested by GATK best practice recommendations were applied. GATK pipeline script and all used tools were containerized with Docker image and available from the/trust1/gatk repository on the official Docker website alone with additional documentation: https://hub.docker.com/r/trust1/gatk. The DArT SNPs produced by the GATK pipeline were filtered prior to analysis. We used vcftools to apply an initial set of SNP filters including minimum coverage (8), maximum coverage (150), and genotyping quality (20). After applying these proposed filters, we compared the genotypes that were estimated for each pair of technical replicate samples. We found that, on average, technical replicates had the same estimated genotype at 99.73% of called loci. We were satisfied that this level of genotyping precision was likely to produce robust downstream analyses. We therefore removed technical replicates from the data set (keeping the replicate sample with the most complete data) and applied the same set of filters. We also applied several additional filters. This included the removal of SNPs that had missing data for more than 30 samples.

Genetic Structure

We first evaluated genetic clustering of individuals using a PCoA in the package dartR (Gruber et al. 2018) in R (v. 3.6.3; R Core Team). This clustering approach uses Euclidean genetic distances which we applied to our SNP data generated from DArT. We applied an hierarchical approach, whereby we removed the most divergent species and reran the analysis to determine clustering of more closely related species. We also ran a DAPC using the package adegenet (Jombart 2008; Jombart et al. 2010) and poppr (Kamvar et al. 2014) packages in R on the most closely related SAM group. We used the species identity to inform this analysis. This multivariate approach identifies clusters of genetically related individuals and finds a model which maximizes the between-group variation whilst minimizes the within-group variation, thus can make use of the a priori species identity. In addition, we explored genetic structure of the DArT SNP data using a Bayesian clustering algorithm implemented in Structure (Pritchard et al. 2000), which uses Hardy–Weinberg equilibrium to estimate individual ancestry coefficients and determine distinct genetic groupings. Again, we used a hierarchical approach, where we pulled out distinct populations/species and reran the program with fewer individuals to detect more fine-scale structure. This approach is suggested to counteract the K = 2 conundrum (Janes et al. 2017). These analyses were performed using the admixture model, independent allele frequencies and lambda set to 1.0. Genetic clusters (K) were estimated across 10 replicates, using a burnin of 100,000 and sampling every 1 million iterations. We then ran CLUMPAK (Kopelman et al. 2015) to infer the best K value based on evaluation of the maximum posterior probability (L[K]; Pritchard et al. 2000) and the maximum delta log likelihood (ΔK; Evanno et al. 2005).

Phylogenetic Analyses

Given our population-scale sampling with a large number of loci, use of full Bayesian species-tree or network methods is infeasible. We therefore estimated phylogenetic relationships using SVDquartets in PAUP* (Swofford 2003; Chifman and Kubatko 2014), which analyses a decomposition matrix of site pattern frequencies to compute quartet scores and infer a species phylogeny. We analyzed the concatenated phased exons (h0 haplotype only) from the exon capture data set to estimate the phylogeny using two approaches. One assigning tips to species and running the multispecies coalescent and the other not assigning tips to species. Both analyses evaluated 100,000 random quartets and performed 1,000 bootstrap replicates.

Tests for Introgression

We implemented a phylogenetic network analysis using the Neighbor-Net approach in SplitsTree4 (Bryant and Moulton 2002; Huson and Bryant 2006) to examine the patterns of reticulation. The Neighbor-Net approach creates a distance matrix based on uncorrected P distances and agglomerates clusters of individuals and is a useful exploratory statistical approach to examine complex evolutionary processes of reticulation. Again, we used the concatenated nuclear alignment (h0 haplotypes), effectively ignoring coalescent variance in gene trees to visualize potential reticulation. We then used a population genomic approach to evaluate the phylogenetic relationships of species and explore admixture between species whilst accounting for incomplete lineage sorting. Using the DArT SNP data set, we ran TreeMix (Pickrell and Pritchard 2012), which uses allele frequency data to evaluate historical relationships, estimating patterns of population splitting, drift, and mixing. We calculated allele frequencies for each species and estimated a maximum likelihood tree using P. godmani as the outgroup and different inferences of admixture events were compared (zero to four events). We analyzed the data in windows of size k = 1, k = 100, and k = 500 to account for any nonindependence in SNPs. The residual plots of observed covariance, the percent of variance explained by the models (get_f() script in R from documentation accompanying TreeMix) and the package OptM in R (Fitak 2021), were examined to determine the best model fit to the data and infer the number of admixture events. OptM calculates the second-order rate of change in the log likelihood between the different models of introgression and infers delta M to estimate the most likely model. Based on the residual plots of covariance, positive residuals indicate the model underestimates the observed covariance between pairs of populations, whilst negative residuals imply overestimation of the model. Therefore, residuals close to zero infer the best fit model to the data. Once a model infers 99.8% of the variance in the data, it has been customary to stop adding migration events. Given the varied approaches to determining the best fit model to the data, we evaluated across them to estimate the number of migration events using TreeMix. In addition, to formally test for admixture, we used the three-population test (f3 statistic; threepop analysis) within TreeMix, where a negative statistic infers admixture. In addition to TreeMix, we calculated Patterson’s D (Green et al. 2010; Durand et al. 2011) also known as the ABBA-BABA statistic as well as the f4-ratio (f_G; Patterson et al. 2012) for all possible trios of species in Dsuite (Malinsky et al. 2021). Based on biallelic sites and the phylogeny (((P1, P2),P3),O), where the outgroup (O) defines the ancestral allele, and the pattern of the derived allele determines if there is introgression between P3 and P1 or P2. This site pattern approach assumes under a null hypothesis that there are equal patterns of shared derived alleles between P3 and P1 and between P3 and P2, and that deviation from this (greater than sampling error) implies introgression. We used this approach because it is computed for allele frequency estimates and thus, we could include multiple individuals from each species. We tested for introgression among parapatric species using the DArT data set. To test for introgression among SAM, we used P. inornata as the outgroup, and for evaluation of introgression between P. assimilis and P. inornata, we used P. godmani as the outgroup. Statistics were calculated for all possible trios of species using the Dtrio command and jackknife blocks of 20 and 100 (k = 20 or k = 100) SNPs. We also ran the analysis using user-defined input trees (-t option) to ensure all pairwise comparisons were conducted. To assess if D is significantly different from zero, the standard block-jackknife procedure was used to obtain an approximately normally distributed standard error (Green et al. 2010; Durand et al. 2011; Malinsky et al. 2021). The Z score, the value of the D-statistic divided by its standard error represents strong significance if Z score >3 (Zheng and Janke 2018). A combined result with P values (<0.05) and Z scores >3 was taken as significant evidence of introgression. We evaluated the Dmin, the lowest D-statistic reported for each trio, as a conservative approach. P1 and P2 are ordered so that nABBA≥nBABA and thus is not negative, so the program does make some assumptions of relatedness of species based on the shared allele patterns. If P1 is more closely related to P3 (more shared alleles) then it swaps to P2 and P2 becomes P1. Therefore, the results range from 0 = no introgression between P2 and P3, to 1 = no incomplete lineage sorting but introgression. Lastly, to exploit having extensive haplotype sequence data from exon-capture, we used a Bayesian coalescent-based approach to estimate effective population sizes (Θ) and past migration rates (M) in MIGRATE 3.6 (Beerli and Palczewski 2010). Using the exon capture data (P. godmani excluded due to small sample size), we estimated Θ for all species and M for an asymmetric migration matrix. Past migration was estimated for all pairs of species except for the geographically nonadjacent. These estimates are mutation-scaled estimates, such that Θ = 4 Neµ and M = m/µ, where µ is the mutation rate per generation and x is a multiplier that depends on the ploidy and inheritance of the data (e.g., x = 4 for nuclear data). The analysis was run using 1,617 nuclear loci, the Jukes–Cantor sequence model with base frequency 0.25, a uniform prior distribution for Θ (0,0.1,0.01; minimum, maximum, Delta respectively) and M (0, 50,000, 10,000). Random starting parameters from the prior distribution were used for estimation of Θ and M, and analyses were run starting from a random number seed, a constant mutation rate estimated from the data and two independent analyses were run, each with one long chain and four heated chains. A static heating scheme was used with temperatures set to 1, 1.5, 3, and 106 ordered from cold to hot, with sampling every 100 steps and run for 1 million steps after a burnin of 50,000.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.
  105 in total

Review 1.  Chromosomes, conflict, and epigenetics: chromosomal speciation revisited.

Authors:  Judith D Brown; Rachel J O'Neill
Journal:  Annu Rev Genomics Hum Genet       Date:  2010       Impact factor: 8.929

2.  Gene flow despite complex Robertsonian fusions among rock-wallaby (Petrogale) species.

Authors:  Sally Potter; Craig Moritz; Mark D B Eldridge
Journal:  Biol Lett       Date:  2015-10       Impact factor: 3.703

3.  Dense Geographic and Genomic Sampling Reveals Paraphyly and a Cryptic Lineage in a Classic Sibling Species Complex.

Authors:  Ethan Linck; Kevin Epperly; Paul Van Els; Garth M Spellman; Robert W Bryson; John E McCormack; Ricardo Canales-Del-Castillo; John Klicka
Journal:  Syst Biol       Date:  2019-11-01       Impact factor: 15.683

4.  Speciation by monobrachial centric fusions.

Authors:  R J Baker; J W Bickham
Journal:  Proc Natl Acad Sci U S A       Date:  1986-11       Impact factor: 11.205

5.  Testing for ancient admixture between closely related populations.

Authors:  Eric Y Durand; Nick Patterson; David Reich; Montgomery Slatkin
Journal:  Mol Biol Evol       Date:  2011-02-15       Impact factor: 16.240

6.  Numerous transposed sequences of mitochondrial cytochrome oxidase I-II in aphids of the genus Sitobion (Hemiptera: Aphididae).

Authors:  P Sunnucks; D F Hales
Journal:  Mol Biol Evol       Date:  1996-03       Impact factor: 16.240

7.  Stationary distributions of microsatellite loci between divergent population groups of the European rabbit (Oryctolagus cuniculus).

Authors:  G Queney; N Ferrand; S Weiss; F Mougel; M Monnerot
Journal:  Mol Biol Evol       Date:  2001-12       Impact factor: 16.240

8.  Quartet inference from SNP data under the coalescent model.

Authors:  Julia Chifman; Laura Kubatko
Journal:  Bioinformatics       Date:  2014-08-07       Impact factor: 6.937

9.  The Composite Regulatory Basis of the Large X-Effect in Mouse Speciation.

Authors:  Erica L Larson; Sara Keeble; Dan Vanderpool; Matthew D Dean; Jeffrey M Good
Journal:  Mol Biol Evol       Date:  2017-02-01       Impact factor: 16.240

10.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

View more
  2 in total

1.  The Genetic Differentiation of Pyrrhulina (Teleostei, Characiformes) Species is Likely Influenced by Both Geographical Distribution and Chromosomal Rearrangements.

Authors:  Pedro H N Ferreira; Fernando H S Souza; Renata L de Moraes; Manolo F Perez; Francisco de M C Sassi; Patrik F Viana; Eliana Feldberg; Tariq Ezaz; Thomas Liehr; Luiz A C Bertollo; Marcelo de B Cioffi
Journal:  Front Genet       Date:  2022-05-04       Impact factor: 4.772

Review 2.  Chromosome Changes in Soma and Germ Line: Heritability and Evolutionary Outcome.

Authors:  Irina Bakloushinskaya
Journal:  Genes (Basel)       Date:  2022-03-28       Impact factor: 4.141

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.