SignificanceWhether sympatric speciation (SS) is rare or common is still debated. Two populations of the spiny mouse, Acomys cahirinus, from Evolution Canyon I (EC I) in Israel have been depicted earlier as speciating sympatrically by molecular markers and transcriptome. Here, we investigated SS both genomically and methylomically, demonstrating that the opposite populations of spiny mice are sister taxa and split from the common ancestor around 20,000 years ago without an allopatric history. Mate choice, olfactory receptors, and speciation genes contributed to prezygotic/postzygotic reproductive isolation. The two populations showed different methylation patterns, facilitating adaptation to their local environment. They cope with abiotic and biotic stresses, due to high solar interslope radiation differences. We conclude that our new genomic and methylomic data substantiated SS.
SignificanceWhether sympatric speciation (SS) is rare or common is still debated. Two populations of the spiny mouse, Acomys cahirinus, from Evolution Canyon I (EC I) in Israel have been depicted earlier as speciating sympatrically by molecular markers and transcriptome. Here, we investigated SS both genomically and methylomically, demonstrating that the opposite populations of spiny mice are sister taxa and split from the common ancestor around 20,000 years ago without an allopatric history. Mate choice, olfactory receptors, and speciation genes contributed to prezygotic/postzygotic reproductive isolation. The two populations showed different methylation patterns, facilitating adaptation to their local environment. They cope with abiotic and biotic stresses, due to high solar interslope radiation differences. We conclude that our new genomic and methylomic data substantiated SS.
Entities:
Keywords:
adaptation; genome divergence; methylation; population genetics; sympatric speciation
Sympatric speciation (SS) is still debated, although there are numerous theoretical and empirical studies supporting it (1–5). There are four criteria supporting SS traditionally, including 1) living entirely or largely sympatric; 2) substantial reproductive isolation between the sister species, preferentially based on genetic difference; 3) the species should be sister groups; and 4) they are diverging from the same ancestor without an allopatric history (6). As the fourth criterion is not easy to prove, especially in a large open area, very few cases of SS were confirmed. Researchers suggested SS study in a small space or isolated islands, thereby excluding allopatric history scenarios (7, 8). Population divergence is mainly driven by divergent selection associated with contrasting environmental stresses (9, 10). Individuals specialized in different niches or resources might lead to SS (11).Divergent selection, imposed by the local environment, could lead to divergent adaptation and reproductive isolation (10). Divergent adaptation to the contrasting environments will promote genetic differentiation, especially the directly targeted genes on which the stresses acted. Each evolutionary process should have an underlying molecular basis, and deciphering the interaction among ecological stresses, innovative genetic changes by natural genetic engineering (NGE) (12), and evolutionary changes will advance our understanding of adaptation and SS (13). The genome revolution facilitated proving SS (14). Evolutionary processes like homogenization by gene flow and recombination, which break down the adaptation caused by selective sweep during SS, could shape genome architecture and leave a genetic signature in the genome (15), and this would allow us to study the speciation pattern. Screening signatures of divergence and selection across the genome can help us understand how the genome landscape was shaped during divergent adaptation and ecological speciation (16). Identifying speciation genes and adaptive genetic complexes that facilitate the adaptation to the challenging environment and increase fitness could help us reconstruct the mechanism of adaptation and local SS (17). Natural selection will reduce genome-wide rates of gene flow (18); by contrast, the antagonistic gene flow could homogenize the genome and impede population divergence. SS occurs when natural selection overrides gene flow (16), and it is characterized by divergence in some areas under strong divergent selection (19). Genomic regions with a low recombination rate will be more differentiated, especially after long-term differentiation accumulation (20). Sex pheromones are mate-recognition signals and will help animals find the right mate partner (21). Olfactory receptors (ORs) and mate choice reinforce prezygotic reproductive isolation (21). Other speciation genes, identified in many cases of speciation (22, 23), contribute to postzygotic reproductive isolation (22, 23).Ernst Mayr argued that geographical barriers were a must for speciation (24), and this hypothesis dominated for many years, although he changed his mind because of the accumulated evidence (25). Recently, speciation with gene flow is also proved to be possible by both theoretical models and empirical studies (1, 4, 9, 16, 26, 27). In these studies, physical barriers are absent, and natural selection is the driving force for divergent adaptation and speciation, based on NGE innovations (12). SS was proposed by Darwin about 160 y ago (28), and it occurred in a freely mating population, driven by natural selection, like host switching (29), divergent phenology (30), and resource division (4). SS is still debated and regarded as rare by most biologists despite increasing evidence. Our studies in divergent microclimatic and geological-edaphic microsites demonstrated the commonality of SS as a realistic model in microsites divergent ecologically with reproductive isolation pre- or postzygotic, or both (13).DNA methylation was supposed to play essential roles in adaptation and speciation (31–33), and it could extend the Bateson–Dobzhansky–Muller model to a broader spectrum to interpret speciation (33). Some methylation events could increase fitness, and thus, some loci were fixed and spread to the entire population, resulting in novel morphology, physiology, behavior, and ultimately reproductive isolation (34). Methylation, as plastic variation and facilitating adaptation to short-term environmental changes, may diverge before any genetic differentiation and could occur during the early age of speciation (34). Thus, population methylation comparison could help us understand how it initiates population divergence, facilitates reproductive isolation, and maintains species boundaries (34).EC I was proposed as a natural evolutionary laboratory in 1990 in lower Nahal Oren, Mount Carmel, Israel (32°24′N, 34°58′E). Together with Evolution Plateau (32°52′N, 35°33′E), both have been identified as hot spots of SS, based on microclimatic and geological-edaphic divergence in microsites, respectively (35). EC I consists of two abutting slopes: the hot, dry, and savanoid African slope (AS) and the temperate, humid, and forested European slope (ES) that are separated by ∼250 m at midslopes (). Both slopes share the upper Cenomanian limestone geology and terra rossa soils but differ in microclimate (35, 36). The spiny mouse, Acomys cahirinus, is a major model organism at EC I. It originated from tropical Africa and colonized rocky habitats across Israel, including AS of EC I, then dispersed to ES of EC I. The two spiny mouse populations from the AS and ES have been studied by mitochondrial DNA, amplified fragment length polymorphism , and RNA sequencing (RNA-seq) (26) for population differentiation and SS. These studies were based on a limited number of markers or SNPs from part of the genome and showed population divergence in the face of gene flow. Adaptive genes (2) and preliminary data of mate choice (37) showed incipient prezygotic isolation. Here, we extend the analysis to whole-genome and epigenetic methylation and strongly support its SS by a coalescent model based on whole-genome data. We showed an interslope divergence of ORs, implying prezygotic reproductive isolation. We also investigated the regulatory divergence between the two populations and deciphered how the populations adapt to the new environment and speciate sympatrically with reproductive isolation.
Results
Genome Assembly and Annotation.
We generated a de novo chromosomal-level genome of A. cahirinus based on clean long-reads (), and the assembly was polished by ∼50× Illumina pair-end (PE) reads. The genome size was 2.27 Gb with an N50 size of 55.02 Mb, consistent with the estimated 2.18 Gb in genome survey with the k = 19 (). The assembled genome consists of 120 contigs, with the longest contig size of 79.41 Mb (), suggesting high continuity of the sequences. According to the Benchmarking Universal Single-Copy Orthologs (BUSCOs) and Core Eukaryotic Genes Mapping Approach (CGEMA) results, 92.08% single-copy genes and 233 (93.95%) core genes were present in our assembly, respectively (), indicating completeness of the assembly. We anchored 108 (90%) contigs with a total length of 2.26 Gb (99.78%) onto 19 pseudochromosomes with 286,701,013 valid Hi-C PE reads from the same individual (). Finally, we obtained a high-quality A. cahirinus reference genome (Table 1 and Fig. 1).
Table 1.
Summary of the A. cahirinus genome assembly and annotation
Categories
Type
Length (bp)
No.
Assembly
Contigs
2,259,415,117
120
Contig N50
55,022,522
18
Contig N90
18,485,260
42
Longest
79,412,063
1
Noncoding RNAs
miRNA
54,596
586
snRNA
161,213
1,327
rRNA
135,583
281
tRNA
1,843,108
25,904
Transposable elements
DNA
69,916,114
1,274,726
LINE
301,085,698
999,508
SINE
213,457,863
1,682,988
LTR
227,021,489
1,225,763
RC
2,221,781
57,369
Satellite
2,505,804
186,818
Simple repeat
8,935,597
60,566
Low complexity
517,913
3,268
Unknown
13,955,040
275,508
Genes
Gene loci
—
21,879
Average gene length (bp)
—
35,363
Average CDS length (bp)
—
1,566
Average exon length (bp)
—
176
Average exons per gene
—
9
Average intron length
—
4,286
Fig. 1.
A. cahirinus genome, phylogenetic and evolutionary analyses. (A) Overview of the spiny mouse genome. From a to e, represented genomic positions (in Mb) of the 19 pseudochromosomes of spiny mouse, guanine-cytosine content, TE content, gene density, and interconnections of paralogs, respectively. (B) Phylogenetic tree of 14 species and the evolution of gene families. The x axis shows the estimated divergence time of each node (million years ago, Mya). The number of expaned and contracted gene families are indicated by red and green numbers, respectively. (C) The ration of three topologies (t1–t3) around focal internal branches (node 2) of ASTRAL species trees. The numbers 1, 3, 20, and 24 represent different branches in the species tree.
A. cahirinus genome, phylogenetic and evolutionary analyses. (A) Overview of the spiny mouse genome. From a to e, represented genomic positions (in Mb) of the 19 pseudochromosomes of spiny mouse, guanine-cytosine content, TE content, gene density, and interconnections of paralogs, respectively. (B) Phylogenetic tree of 14 species and the evolution of gene families. The x axis shows the estimated divergence time of each node (million years ago, Mya). The number of expaned and contracted gene families are indicated by red and green numbers, respectively. (C) The ration of three topologies (t1–t3) around focal internal branches (node 2) of ASTRAL species trees. The numbers 1, 3, 20, and 24 represent different branches in the species tree.Summary of the A. cahirinus genome assembly and annotationGenome annotation was performed by homology-based search and ab initio methods. We identified 21,879 genes with an average gene length of 35,363 bp and coding sequence (CDS) length of 1,565 bp (), and the reference genome contained 94.7% BUSCOs (). In addition, 281 rRNAs, 1,327 snRNAs, 586 miRNAs, and 25,904 tRNAs were identified. We annotated ∼878 Mb (39%) repetitive sequences in the reference genome. Long interspersed nuclear elements (LINEs) were the most abundant, accounting for 13.3% of the genome. DNA transposons, long terminal repeat (LTR), and short interspersed nuclear elements (SINEs) accounted for 3.1%, 10%, and 9.4% of the assembled genome, respectively (Table 1).
Gene Families and Phylogenetic Analysis.
To investigate the phylogeny of A. cahirinus, we constructed a phylogenetic tree based on 6,285 single-copy orthologous genes of 13 mammal species (Fig. 1). The phylogenetic tree of the four Acomys species (A. russatus, A. percivali, A. cahirinus, and A. kempi) based on the mitochondrial genome is consistent with that based on the nuclear genome (Fig. 1, , and Dataset S1). However, we observed a phylogenetic discordance that A. cahirinus showed a closer relationship with A. kempi rather than A. russatus (Fig. 1) in the concatenated tree, while A. cahirinus is closer to A. russatus in the coalescent tree (). The proportions of the three topologies on this node were ∼39%, ∼35%, and ∼26%, respectively (Fig. 1 and ). Therefore, quantifying introgression via branch lengths (38) was used to test whether incomplete lineage sorting (ILS) or introgression contributed to the discordance. Results revealed that ∼73% (∼46% with Bayesian information criteria, BIC filtering) of discordant loci were caused by ILS (). We conducted a collinearity analysis among the four Acomys species, and the results showed that A. percivali and A. kempi had high collinearity with large syntenic blocks (). Nevertheless, extensive chromosome breaks were found in A. russatus (). A. cahirinus diverged from its common ancestor ∼0.2 million years ago (Mya) (Fig. 1). Out of the 16,914 gene families in the A. cahirinus genome, we identified 238 expanded and 79 contracted gene families and 197 out of 238 rapidly expanded gene families (P < 0.01) (Fig. 1 and ). The Gene Ontology (GO) terms of expanded genes were related to defense response (GO:0051607, 0050778, 0045087, 0006955, 0006952, 0042613), energy metabolism (GO:0046034, 0006633, 0006096, 0016042), and protein modification (GO:0016579, 0006479, 0006468). Comparing spiny mouse to mouse, we obtained 200 positively selected genes in the spiny mice under positive selection, which were mainly enriched in brain development (GO:0007420) and embryonic morphogenesis (GO:0048562) (Dataset S2).
Genetic Divergence and Demographic History.
We first constructed the mitochondria tree based on 13 coding genes extracted from each sample from EC I and one individual out of EC. Results showed that samples from EC I had a common ancestor (). Whole-genome resequencing was performed on all individuals from the AS and ES populations located in EC I (), and the clean reads were mapped to the reference genome assembled above. Sequencing depths for each individual ranged from ∼19× to ∼42×, with an average mapping rate of 99.68% (). We removed 13 individuals with relationships that were too close from subsequent analyses. We identified 6,889,463 SNPs from all the individuals, and 496,955 and 518,762 SNPs were unique to AS and ES populations, respectively, and 5,873,746 SNPs were shared by both populations (). Phylogenetic neighbor-joining tree, principal component analysis (PCA), and structure analysis were used to explore population genetic differentiation, all results of which indicated that AS and ES had an obvious genetic divergence (Fig. 2 and ). Additionally, SVs and copy number variations (CNVs) were identified, and we found that SVs rather than CNVs could clearly distinguish AS from ES (Fig. 2 and ). Among five structure variation types (deletion, duplication, insertion, translocation, and inversion), insertion and deletion had the highest percentage, while duplication had the lowest proportion for each individual (). In CNVs, there are 1,567 CNV regions, including 892 losses and 675 gains in ES, and 1,439 in AS containing 766 losses and 673 gains (). Genetic parameters, including recombination rates, linkage disequilibrium (LD), FST, dXY, and nucleotide diversity, were calculated (). The recombination rate was significantly higher in AS. The nucleotide diversity was positively correlated with the recombination rate in both populations (AS correlation: 0.1885, P = 0; ES correlation: 0.1157, P = 0, Spearman correlation coefficient). The LD decayed to half maximum within 50 kb, and AS showed a sharp decay compared to ES (). Population demography was estimated using pairwise sequentially Markovian coalescent (PSMC). Similar fluctuations of effective population size (Ne) were observed in AS and ES (Fig. 2). There were two population declines during two glaciations: Saalian interpluvial (130∼300 kya) and the last glacial maximum (LGM) around 20,000 y ago (Fig. 2). The Ne of the two populations started to increase at ∼70,000 y ago and reached a peak around 25,000 y ago. We then simulated six speciation models: speciation with no gene flow, early gene flow, recent gene flow, split model, equal gene flow, and decreasing gene flow, a scenario meeting SS (). The best-fit model by a difference in Akaike information criterion values of two orders of magnitude suggested that AS and ES diverged with decreasing gene flow from ancient, larger gene flow (∼4 × 10−3) before 2.4 kya to the current, smaller gene flow (from AS to ES: ∼3 × 10−7, and from ES to AS: ∼2 × 10−7). The gene flow from AS to ES is larger than the opposite, and AS showed a larger population size (Fig. 2 and ).
Fig. 2.
Population structure and divergence. (A and B) neighbor-joining tree based on SVs and SNPs, respectively. (C and D) PCA based on SVs and SNPs, respectively. Colors in red and blue represent AS and ES, respectively. (E) Demographic history estimated using PSMC. Saalian and LGM are shown in a gray plot. Red and blue lines represent Ne of AS and ES, respectively. (F) The best-fit model was simulated by fastsimcoal2.7. Effective population sizes, divergence time, and an estimate of gene flow are shown. (G) The divergence of seven OR genes in pseudochromosome17 (33-Mb to 35-Mb segments). FST (in blue) is significantly elevated, π is declined, and Tajima's D is significantly differentiated between AS (in red) and ES (in green) populations.
Population structure and divergence. (A and B) neighbor-joining tree based on SVs and SNPs, respectively. (C and D) PCA based on SVs and SNPs, respectively. Colors in red and blue represent AS and ES, respectively. (E) Demographic history estimated using PSMC. Saalian and LGM are shown in a gray plot. Red and blue lines represent Ne of AS and ES, respectively. (F) The best-fit model was simulated by fastsimcoal2.7. Effective population sizes, divergence time, and an estimate of gene flow are shown. (G) The divergence of seven OR genes in pseudochromosome17 (33-Mb to 35-Mb segments). FST (in blue) is significantly elevated, π is declined, and Tajima's D is significantly differentiated between AS (in red) and ES (in green) populations.
Genomic Islands Analysis.
Genome-wide average divergence of the two populations, measured by FST, was 0.03, and 41% of windows showed zero FST (). As the divergence across the genome is heterogeneous (), elevated FST regions with Z-FST ≥ 4 were identified as genomic islands, also known as highly diverged regions (HDRs). We got 1,142 HDRs with a mean size of 15,946 bp scattered across the genome (). The total length of these regions was ∼18 Mb (∼1% of the genome). Furthermore, we observed a high FST in HDRs compared to the simulated data with neutral coalescent under the best-fit model identified above (P < 0.01; ). We compared the differences between island and background regions and found most HDRs were characterized by reduced nucleotide diversity and recombination rate, elevated LD, and excess of low-frequency variants (reduced levels of Tajima’s D) (). We further tested whether gene flow from the population outside the canyon could produce HDRs and contribute to speciation. We compared fd in HDRs with that in background regions, which revealed that there was no significantly elevated fd in HDRs (P = 0.09; ), rejecting the hypothesis of speciation caused by gene flow from a population outside the canyon.The speciation-with-gene-flow model predicted that loci involved in speciation should be characterized by both high FST and high dXY (39, 40). The HDRs with high FST, dXY and normal π simultaneously were suggested to directly contribute to speciation (41) . By contrast, continuing adaptive loci would be expected to have elevated FST but not dXY (41). After combining the adjacent HDRs, we identified 440 windows, including 110 genes in total with elevated dXY (P < 0.01, Mann–Whitney U test) and normal π both in AS and ES compared to background regions (). Furthermore, we calculated π-diff for each window, and finally, we got 385 and 255 HDRs in AS and ES (). In adaptative HDRs, we observed a reduced nucleotide diversity (P < 0.01, Mann–Whitney U test) compared to the rest of the regions (). Genes located in HDRs were recognized as HDR genes (Dataset S3). GO enrichment on genes under adaptation in AS was related to response to defense and inflammation (GO: 0031349, 0050729, 0039595, 0042129, 1902107), temperature homeostasis (GO: 0001659), and nervous system development (GO: 0051962). In ES, an OR (OR6K3) was selected, and biological processes were related to cellular energy production and cell cycle control. Furthermore, we scanned the ORs across the whole genome and identified 147 OR genes. FST in OR regions was higher than background regions (P < 0.01, Mann–Whitney U test). We observed a cluster of ORs in LG17 with elevated FST and negative Tajima’s D in AS and low nucleotide diversity in AS and ES (Fig. 2), indicating that the two populations differed in OR, which may result in gametic isolation and speciation.
Whole-Genome Methylation Analysis.
Epigenetic differences, especially DNA methylation patterns, improved the ability of adaptation. After mapping the whole-genome bisulfite sequencing data to the spiny mouse reference genome, ∼5.7 billion Cs were analyzed (). Similar to other mammalian genomes, methylated Cs mainly occurred in the cytosine-guanine (CG) format (72.6–77.3%), and only 0.6–0.9% were in the cytosine-H-guanine (CHG, H represented A, T, or C) context and 0.6–1% in the cytosine-H-H (CHH) context. The total number of differentially methylated regions (DMRs) was 28,636, with a total length of ∼5.9 Mb, and the mean methylation level in DMRs in AS was higher than in ES (0.70 versus 0.64, P < 0.01; ). Considering all DMRs, 36.4% are located in the gene body (). Based on the DMR regions, we estimated the mean percentage of Cs in each chromosome of each sample. The two populations showed two different methylation clusters in PCA (Fig. 3). The methylation levels in transcription start sites (TSSs), gene bodies, 10 kb upstream, and 10 kb downstream were compared, and the average methylation level is significantly higher in AS than in ES (Fig. 3). We observed low methylation levels in the TSS regions in AS and ES populations based on all genes (P < 0.01). We estimated the methylation levels in these four regions of the selected genes compared to the rest of the genes in AS and ES (Fig. 3). We found a similar pattern in the two populations: an obvious fluctuation and higher methylation levels (P < 0.01) were observed in selective sweeps. In the TSS region, nonselected genes showed an increase in comparison to selected genes. According to previous studies, methylation kept a negative influence on transposable elements (TEs). In this study, we divided TE into five classes: DNA transposons, LTR, SINE, LINE, and other classes. We compared the methylation levels between TE and non-TE regions and got the same results in AS and ES (Fig. 3). Methylation is higher in TE regions, except in LINE (P < 0.01, Fig. 3). Similar results were observed in each sample from two populations (). Finally, DMRs located in the gene body and its 3-kb flanking regions were identified as DMR genes. We got 1,790 DMR genes based on the DMR regions with different levels of more than 0.2, and GO enrichment showed that genes were enriched in circadian wake/sleep rhythm (GO: 0042752, 0042745, 0007623), DNA damage (GO: 0006307, 0045739, 2001022, 0006282), lipid storage and usage (GO: 0019915, 0055088, 0046461, 0006638), renal system development (GO: 007200, 0001822, 0072703), and temperature homeostasis (GO: 1990845, 0106106, 0001659, 0120161) (Fig. 3 and Dataset S4).
Fig. 3.
DNA methylation divergence across the whole genome between two populations. (A) PCA on CpGs between the AS and ES populations. (B) Methylation at TSS, gene bodies, and adjacent upstream and downstream regions in AS and ES. Red line and blue line represent AS and ES, respectively. (C) Differences in average methylation levels between TE regions (DNA, LINE, SINE, LTR, other classes) and background regions. Asterisk represents significantly elevated methylation levels compared to background regions (P < 0.01). (D) Differences in methylation between selected genes and nonselected regions. (E) DMR genes are associated with four functional enrichments, including DNA damage repair, circadian rhythm, kidney system development and liquid balancing, and lipid storage and usage.
DNA methylation divergence across the whole genome between two populations. (A) PCA on CpGs between the AS and ES populations. (B) Methylation at TSS, gene bodies, and adjacent upstream and downstream regions in AS and ES. Red line and blue line represent AS and ES, respectively. (C) Differences in average methylation levels between TE regions (DNA, LINE, SINE, LTR, other classes) and background regions. Asterisk represents significantly elevated methylation levels compared to background regions (P < 0.01). (D) Differences in methylation between selected genes and nonselected regions. (E) DMR genes are associated with four functional enrichments, including DNA damage repair, circadian rhythm, kidney system development and liquid balancing, and lipid storage and usage.
Daily and Circadian Sleep and Wake Activity Patterns of Two Populations.
To test whether differences existed in daily and circadian activity patterns of the two populations, we tested the circadian cycle (τ) and the onset of activity time differences under 12L:12D laboratory conditions between the AS and ES populations. Two populations did show different activity patterns (Fig. 4 and ). In constant darkness, samples from the AS had significantly longer τ values (average 23.85 ± 0.03 h, n = 8) compared to samples from ES (average 23.69 ± 0.11 h, n = 4, P = 0.05; Fig. 4 and ). In addition, the average center of gravity of samples in AS was significantly earlier than in ES (mean onset = 11.5 ± 0.1 h after lights onset in AS, 12.9 ± 0.1 h after lights onset in ES, P < 0.01; Fig. 4 and ).
Fig. 4.
Circadian wake/sleep differences between the AS and ES populations. (A and B) Circadian activities of AS and ES populations, respectively. (C) Center of gravity (hours after lights on) in AS and ES (one-tail t test, P < 0.01). (D) The circadian periods of activity rhythms of AS and ES individuals under constant darkness (one tail t test, P = 0.05).
Circadian wake/sleep differences between the AS and ES populations. (A and B) Circadian activities of AS and ES populations, respectively. (C) Center of gravity (hours after lights on) in AS and ES (one-tail t test, P < 0.01). (D) The circadian periods of activity rhythms of AS and ES individuals under constant darkness (one tail t test, P = 0.05).
Discussion
Phylogenetic Inference in Acomys.
Gene flow, ILS, or both could lead to phylogenetic discordance (42). The discordance between A. russatus and A. percivali we observed was due to ILS (Fig. 1 ). ILS could happen easily if the divergence happened in a short time (42, 43). The divergence of A. russatus and A. percivali occurred in a comparatively short time (Fig. 1), which explains the incongruent phylogeny (Fig. 1 ).
SS Inference in Acomys.
The two spiny mouse populations from the EC I were described elsewhere, and they followed the criteria of incipient SS (26, 37). The robust selection of contrasting ecologies—including different temperatures, humidity, and solar radiation on the two abutting slopes—overrides gene flow, leading to divergent adaptation and speciation (16, 36). As the two slopes are so close, from 0 to 250 m, it supplies us with an optimal model for SS.
Population Structure and Genetic Diversity.
A de novo genome was assembled with long-read sequencing () and generated the best-qualified chromosomal-level reference to date, which facilitated our genome research of this species. The assembled genome size is 2.29 Gb, which is a little larger than that estimated in the genome survey (), but it is reasonable. The resequencing depth for each individual ranges from 19× to 42× of the genome, which is deep enough to identify reliable SNPs and SVs. In the neighbor-joining tree (Fig. 2 ) and PCA (Fig. 2 ) based on SNPs and SVs, respectively, the two populations from EC I were separated into two clear-cut clusters, and the individuals from the same microclimatic environment were grouped together (Fig. 2 ). These analyses demonstrate that the divergent microclimate shaped the genetic divergence in both SNPs and SVs. This is congruent with our previous study using RNA-seq (26) and the study on the blind mole rat (1, 2), where the contrasting stresses drove the adaptive genome divergence. The PCA shows larger intrapopulation genetic variation, possibly because of the higher stress on the cool and humid (35, 36) ES slope, where the habitat is unfamiliar to dry-tropically originated Acomys.In the ancestral estimation analysis, individuals from the AS population show the same genetic background. However, the ES population shows intrapopulation variations (), possibly because of the different time migration from AS and the higher positive selection in ES (26). In the RNA-seq study, we found the same situation: the individuals from AS displayed the same ancestry, while the individuals from ES harbored different ancestral backgrounds (26). There are several individuals from ES that showed the same ancestral background as in AS, suggesting ancestry and gene flow from AS ().The mean LD was lower in AS than in the ES population, possibly because of the larger effective AS population size (Fig. 2), and when the animals dispersed to the ES from the AS, gene flow or admixture could create LD. In addition, humid and cool stresses (26) in ES might select for the higher linkage disequilibrium in ES (). Many evolutionary processes could influence the recombination rate, like environmental, molecular, and demographic factors (44). The significantly higher recombination rate in AS population also contributes to lower LD and higher genetic diversity (45). The recombination rate of the AS population is significantly higher than that of the ES population, which could probably generate more genetic diversity to help the population cope with the drought and the hot and high solar radiation stresses (35) on the AS (46).
Genomic Divergence and Olfactory Differentiation.
Although continuous but decreasing gene flow between the AS and ES populations was inferred, the genes displayed substantial heterogeneous genomic divergence (). High-FST genomic islands with elevated dXY and nonreduced genetic diversity () were reported to be involved in speciation; by contrast, high-FST and dXY regions with reduced genomic diversity were supposed to be adaptive loci. The different food resources shaped the different metabolisms. The richer food supply on the ES meets the cellular development and the function of the energetics category, which could resist lower temperature in ES, congruent with that found in the RNA-seq study (26). Temperature homeostasis may contribute to resistance to the excessive temperature difference between day and night.Mate choice could lead to fitness benefits, like increasing the number of offspring or being better at surviving and reproducing (47). Olfaction plays an essential role in mate choice, which will lead to reproductive isolation (48). ORs are known to be involved in chemical communication with the external environment, and their differentiation could contribute to mate choice and reproductive isolation (1). Regions of 147 ORs significantly diverged compared to the non-OR region (Fig. 2), suggesting prezygotic isolation. Remarkably, two genes, ERR2 and SPAG8 (49, 50), related to speciation were identified at highly diverged regions.Allochrony on a day, season, or year level could contribute to reproductive isolation and SS (51). In the present study, due to the chronological difference in the sunshine on the two slopes (AS earlier than ES), animals from the AS slope start activities 1 h earlier than those on the ES slope (Fig. 4). Correspondingly, genes related to the circadian sleep/wake cycle showed differential methylation (Fig. 4). Although the 1-h chronological difference may not be enough for SS in mammals, it could be more common than we thought (51).
Demographic History.
As a dry-tropically originated species, A. cahirinus colonized the hot, xeric savannoid AS, ECI more than 20,000 y ago. They dispersed to the cool, humid ES subsequently during the Natufian age, where the zoologist professor Georg Haas identified paleontologically a fossil Acomys in the Natufian of Um Usba cave, at the top of ES. He named it Acomys carmeliensis (52). This is the living species we identified here genomically and methylomically and, earlier, transcriptomically (27) and mitochondrially (37). The Ne of the two populations fluctuated simultaneously (Fig. 2) presumably because they are sympatric and under the same macroclimate. Two population declines (Fig. 2) were observed during the Saalian and LGM periods. Either the interpluvial or the glacial periods could lead to a food shortage for Acomys, and low temperature is a strong stress for tropical-originated species.Speciation patterns could be inferred from gene flow. In allopatric speciation, incipient species were isolated by physical barriers. Thus, no gene flow could be detected (). In secondary contact, very limited or no gene flow could be detected at the beginning stage, and later, gene flow increased (). In SS, sister species should diverge with decreasing gene flow without allopatric history (), and if the gene flow was blocked at the second stage (), it was also allopatric speciation. If populations were not diverging, stable gene flow is expected between lineages () (14). The best-fit model strongly showed () us that AS and ES populations started to split from the same ancestor 20,000 y ago. Decreasing gene flow (Fig. 2) occurred between the two populations (), which satisfies the SS scenario. The Ne of AS is larger than that of ES because the hot, dry, and savannoid environment is very similar to that in its origin, which is consistent with our field observation. The AS and ES populations contracted to about 42 k and 1 k, respectively, recently because of the last interpluvial climate.
Methylation Differentiation between the AS and ES Populations.
The AS and ES populations showed significantly different methylation patterns (Fig. 3), and they did show more complete separation than that based on SNPs (Fig. 2 ), which is consistent with the previous report that methylation divergence precedes genetic differentiation and may initiate divergence between populations (53). Generally, the average methylation is higher in AS population in TSSs, 10 kb up- and downstream of genes, and in gene bodies (Fig. 3), suggesting generally high stresses (54). Theory predicted that incipient species would show greater methylome differentiation than their corresponding genome divergence (34). In the present study, we did find more complete differentiation in methylome than in the genome (Fig. 2). Methylated cytosines could transit to thymine by deaminating, leading to stable genetic differences in short evolutionary periods (55). TEs were supposed to have substantial deleterious effects on the host genome, and their expression should be under control (56). In the present study, hypermethylation was also observed in regions with more TEs (Fig. 3), which could maintain a stable genome and facilitate adaptation and speciation. Significantly higher CpG methylation was observed within selected genes (Fig. 3), including gene bodies and 10 kb up- and downstream of gene bodies than the counterparts in nonselected genes, which is probably due to the selected genes being of great importance to adaptation and should be more stable, consistent with the previous study in the great tit (32).Prenatal famine exposure has been proven to leave a DNA methylation signature in the genome and may affect biological growth and metabolism (57). We found DMR genes enriched in lipid storage (Fig. 3 and Dataset S4), possibly because of the food shortage during xeric summer (36). No water resources could be used in the arid AS environment during the no-rain summer, and only very limited water could be obtained from the snails (58). We found several GO items in DMR genes related to kidney development, and the capacity of the kidney to concentrate urea in the arid AS environment was expected to be much stronger in order to protect the animals from losing too much water (59). The solar radiation in AS is 200–800% stronger than that in ES (36), and UV can create substantial damage to DNA and proteins and even lead to tumors (60). Damaged protein degradation and removal will improve the quality of proteome and maintenance of proteostasis, helping the cell cope with oxidative stresses (61), and could facilitate their adaptation to the stronger solar radiation in AS environment (26). DNA methylation was a mechanism to drive circadian clock plasticity (62). Interestingly, five GO items related to circadian sleep/wake cycle were obtained in DMRs (Dataset S4). This explains why the AS individuals are active 1 h earlier than in ES (Fig. 4).In conclusion, the divergences of population genomes, ORs and speciation genes, and methylomes, together with the behavior allochrony, substantiated SS of Acomys in EC I, Israel. Remarkably, EC I is a hot spot of SS across life from bacteria to mammals (13).
Materials and Methods
All the experiments on the animals were approved by the Ethics Committee of Lanzhou University and University of Haifa. Genome sequencing was performed on HiSeq 2500 with PE 125 bp. Genome assembly was performed by Nextdenovo. Divergence time among species was calculated by PAML. SNP was called for each individual using GATK, and genetic diversity was calculated by VCFtools. Linkage disequilibrium was constructed by PopLDdecay. Recombination rate was estimated by FastEPRR. Gene flow between the two populations, effective population size, and divergence time were calculated by fastsimcoal. SV was called by Delly, Lumpy, and Manta. CNV was called by Freec. Methylation data were extracted by Bismark, and DMRs were calculated by R package of DSS. GO enrichment was performed by Metascape. Detailed information is available in
Authors: Alexander S T Papadopulos; Javier Igea; Thomas P Smith; Ian Hutton; William J Baker; Roger K Butlin; Vincent Savolainen Journal: Evolution Date: 2019-08-09 Impact factor: 3.694
Authors: Elmar W Tobi; Jelle J Goeman; Ramin Monajemi; Hongcang Gu; Hein Putter; Yanju Zhang; Roderick C Slieker; Arthur P Stok; Peter E Thijssen; Fabian Müller; Erik W van Zwet; Christoph Bock; Alexander Meissner; L H Lumey; P Eline Slagboom; Bastiaan T Heijmans Journal: Nat Commun Date: 2014-11-26 Impact factor: 14.919
Authors: Lotte C Houtepen; Christiaan H Vinkers; Tania Carrillo-Roa; Marieke Hiemstra; Pol A van Lier; Wim Meeus; Susan Branje; Christine M Heim; Charles B Nemeroff; Jonathan Mill; Leonard C Schalkwyk; Menno P Creyghton; René S Kahn; Marian Joëls; Elisabeth B Binder; Marco P M Boks Journal: Nat Commun Date: 2016-03-21 Impact factor: 14.919
Authors: Daryl M Okamura; Elizabeth D Nguyen; Sarah J Collins; Kevin Yoon; Joshua B Gere; Mary C M Weiser-Evans; David R Beier; Mark W Majesky Journal: J Muscle Res Cell Motil Date: 2022-09-21 Impact factor: 3.352