Literature DB >> 29760079

Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri.

Guo-Dong Wang^1,2, Bao-Lin Zhang^1,3, Wei-Wei Zhou^1,4, Yong-Xin Li^1,3, Jie-Qiong Jin¹, Yong Shao¹, He-Chuan Yang⁵, Yan-Hu Liu⁶, Fang Yan¹, Hong-Man Chen¹, Li Jin⁷, Feng Gao⁸, Yaoguang Zhang⁷, Haipeng Li^2,8, Bingyu Mao^1,2, Robert W Murphy^1,9, David B Wake¹⁰, Ya-Ping Zhang^11,2, Jing Che^11,2,4.

Abstract

Tibetan frogs, Nanorana parkeri, are differentiated genetically but not morphologically along geographical and elevational gradients in a challenging environment, presenting a unique opportunity to investigate processes leading to speciation. Analyses of whole genomes of 63 frogs reveal population structuring and historical demography, characterized by highly restricted gene flow in a narrow geographic zone lying between matrilines West (W) and East (E). A population found only along a single tributary of the Yalu Zangbu River has the mitogenome only of E, whereas nuclear genes of W comprise 89-95% of the nuclear genome. Selection accounts for 579 broadly scattered, highly divergent regions (HDRs) of the genome, which involve 365 genes. These genes fall into 51 gene ontology (GO) functional classes, 14 of which are likely to be important in driving reproductive isolation. GO enrichment analyses of E reveal many overrepresented functional categories associated with adaptation to high elevations, including blood circulation, response to hypoxia, and UV radiation. Four genes, including DNAJC8 in the brain, TNNC1 and ADORA1 in the heart, and LAMB3 in the lung, differ in levels of expression between low- and high-elevation populations. High-altitude adaptation plays an important role in maintaining and driving continuing divergence and reproductive isolation. Use of total genomes enabled recognition of selection and adaptation in and between populations, as well as documentation of evolution along a stepped cline toward speciation.

Entities: Chemical Disease Gene Species

Keywords: gene flow; hybridization; natural selection; population genomics; speciation

Mesh：

Year: 2018 PMID： 29760079 PMCID： PMC5984489 DOI： 10.1073/pnas.1716257115

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN： 0027-8424 Impact factor: 11.205

Speciation, the fundamental phenomenon underlying biodiversity, continues to be a central focus of research in evolutionary biology. Disentangling pattern and process and gaining an understanding of the underlying genetic mechanisms as speciation proceeds are central issues in species biology (1, 2). Most vertebrate species arise by vicariant isolation. This form of speciation takes time and the right combination of stochastic genetic change and natural selection (3, 4). It can occur swiftly under certain circumstances, for example, when populations experience rapid demographic changes (e.g., bottlenecks, expansions) (5, 6) or when they are exposed to ecological shifts (7–9). Speciation becomes difficult to understand when it involves many nonexclusive mechanisms (10). Fortunately, approaches using whole genomes or transcriptomes are opening new research pathways (11–13). Speciation accompanied by gene flow (i.e., without complete geographical isolation) is thought to be common in animal evolution (14, 15). The role of gene flow in speciation remains controversial because it is often assumed to be an impediment to speciation (16). However, gene flow can also facilitate evolution and speciation by transferring adaptive genes or generating novel genes (17). A genomic approach has the potential to identify genes that are important for adaptation and speciation, which often evolve more rapidly than other genes (18). Growing numbers of studies have used genome-scale comparisons in phylogeography and speciation [e.g., birds (19–22), fishes (23, 24), mammals (25, 26), reptiles (27, 28)]. How organisms adapt and diversify in high-altitude environments has attracted much attention in recent years (29–31). Because of its heterogeneous topography, inhospitable environment, and complex paleoclimate history, the Qinghai-Tibetan Plateau (QTP) is a natural laboratory for studying adaptation and speciation (32–35). Differences in elevation between valleys and mountaintops often exceed 2,000 m, presenting strong ecological gradients in abiotic variables, such as oxygen partial pressure (36), UV radiation (37), precipitation (38), and ambient temperatures (39). The periodicity of climatic oscillations affects diverse phenomena, such as dispersal, fluctuations in population size, and speciation (32, 40). An endemic Tibetan frog, Nanorana parkeri, occurs in lentic environments of the southeastern QTP along the Yalu Zangbu River (YZR) drainage. It faces challenges that few other amphibians experience. Not unusual for frogs, it occurs at an elevation of 2,800 m, yet it is the only amphibian to also exist at 5,000 m (41), where oxygen is scarce and UV radiation is dangerously high. Western (W) and eastern (E) mitochondrial DNA matrilines of N. parkeri diverged in the middle Pleistocene and formed distinct entities with overlapping elevational ranges (42). Three nuclear DNA loci correspond to this pattern with limited admixture of E and W alleles in two localities near their geographic boundaries (43). However, neither the extent of gene flow nor the degree of isolation in the zone of admixture is known. No morphological differences between W and E have been noted (44). Clines in elevation, genetic-morphological discordance, and mtDNA introgression in the zone of admixture offer an unprecedented opportunity to investigate the role selection plays in driving rapid genetic change, environmental adaptation, and the evolution of species differences. The annotated genome of N. parkeri (45) facilitates investigations into the evolutionary drivers of population divergence, and possibly speciation, by enabling the identification of selected genes and their functions. Herein, we report on whole-genome sequences of 63 N. parkeri individuals from across the range of the species and decipher the genetic mechanisms that may be responsible for driving population divergence. We reconstruct the evolutionary history of N. parkeri from a genomic perspective, investigate the impacts of isolation and gene flow on the process of speciation, and explore genomic consequences. We also evaluate ecological factors leading to speciation and identify candidate genes underlying the observed differentiation.

Results

Sampling and Sequencing.

Collection localities ranged from 2,900 to 4,900 m in elevation and covered the entire documented distribution of N. parkeri (Fig. 1 and ). The 45 E and W individuals were sequenced for 33.77 Gb (16.89-fold coverage) on average, and 18 individuals with mixed E and W heritage were sequenced for 11.92 Gb (5.96-fold coverage). Most reads were aligned (90.88% average mappable rate) to the reference genome of N. parkeri belonging to E (45). After genotyping and stringent quality-filtering, we retained 8.59 million single-nucleotide polymorphisms (SNPs) of E and W individuals and 6.44 million SNPs of mixed heritage.

Fig. 1.

Sampling sites and population structure of the Tibetan frog (N. parkeri). (A) Sampling locations (ArcGIS 10.2; esri). Site numbers refer to . Colors denote the five main groups recovered from population structure and phylogenetic analyses. Gray indicates hybrid populations (25–27). (B) PCA of all high-coverage samples. (C) Principle component plot of E samples only. (D) ML tree based on concatenated sequences. W, green; E1, purple; E2, yellow; E3, light blue; E4, red.

Population Structure, Phylogeny, and Introgression.

Principal component analysis (PCA) and population structure analyses unambiguously identify five genetic clusters. The PCA plot separates populations W and E along the first eigenvector, which explains 57.76% of total genetic variance (Fig. 1). The second eigenvector identifies subpopulations E1–E4 and explains 8.80% of the variance. E1 and E3 comprise samples from low elevations (∼2,900–3,300 m; localities 1–5; Fig. 1 ) near the Yalu Zangbu Grand Canyon, and no obvious geographic barrier separates them. Subpopulation E2 includes samples from high elevations (∼3,900–4,900 m; localities 6–8; Fig. 1 ). The remaining localities, at elevations from ∼3,700–4,500 m, form subpopulation E4. Phylogenetic analyses based on 1,000 neutral loci (Fig. 1) and gene tree-based coalescent () methods display nearly identical topologies. Populations W and E split into distinct branches, with subpopulation E1 in the most basal position, subpopulation E2 following subpopulation E1, and then allopatric units E3 and E4. Population structure analysis corroborates the phylogenetic analyses and PCA, and identifies signs of nuclear gene flow from E to W at their boundary (Fig. 2). Two topology-based methods verify nuclear admixture among some of the matrilines. The TreeMix analysis () and the D-statistic test (Table 1) show gene flow within sympatric E1 and E3. A significant signature of admixture is found between E2 and E4 (|Z| > 3; Table 1).

Fig. 2.

Population structure plots with the number of ancestral clusters (K) = 2–5.

Table 1.

Admixture signatures from D-statistic tests

Test	D statistic	Z score
W, E1; E3, E4	−0.247	−50.868
W, E1; E3, E2	−0.264	−61.376
W, E1; E4, E2	−0.009	−2.354
W, E2; E3, E4	0.117	21.667
E1, E2; E3, E4	0.260	78.181

Populations with gene flow are denoted in boldface.

Population structure plots with the number of ancestral clusters (K) = 2–5. Admixture signatures from D-statistic tests Populations with gene flow are denoted in boldface.

Demography History.

We used the generalized phylogenetic coalescent sampler (G-PhoCS) (46) to infer ancestral population sizes, divergence times, and migration rates. We estimate the divergence time of W and E from ∼688.3–846.0 kya (Ka) (Fig. 3). The ancestral effective population size of E at this period was reduced by ∼56% (). Pairwise sequentially Markovian coalescence (PSMC) results suggest that W was reduced by ∼50% (). Population expansion occurred two- to threefold until the penultimate glaciation [Marine Isotope Stage 6, ∼100–200 Ka (47)]. E1 diverged from E2–E4 at ∼181.7 Ka [95% confidence interval (CI) = 158.0–205.4 Ka]. The time of divergence for E2 relative to E3 and E4, and between the latter two, is estimated at 45.9 Ka (95% CI: 36.9–55.6 Ka) and 39.3 Ka (95% CI: 32.1–46.6 Ka), respectively. We may underestimate the divergence time because the mutation rate [0.776e-09 per site per year (45)] is a conservative value. Demographic analysis shows evidence of genetic exchanges among all subpopulations of E as well as directional migration from E4 to W (Fig. 3). Gene flow from the hypothesized ancestral population of E2–E4 into W occurred between 45.9 and 181.7 Ka (G-PhoCS analyses). Coalescent simulations of gene flow between W and E under different migration scenarios suggest that the inferred demographic parameters were most concordant with the observed patterns of differentiation (Fig. 3).

Fig. 3.

Demographic inference. (A) Demographic history inferred by G-PhoCS. Widths of branches are proportional to Ne. Horizontal dashed lines denote posterior estimates for divergence times, associated mean values are shown in bold, and 95% credible intervals are shown in parentheses. Arrows indicate the direction of gene flow, and associated figures indicate the estimates of total migration rates. (B) Distribution of FST(W, E1) values in the observed data and in the simulated data under different migration scenarios between W and E. The full model shows simulation with the full set of demographic parameters inferred from G-PhoCS. The no_E234 refers to the simulation without the postdivergence migration from E234 to W. The no_E refers to the simulation without current and postdivergence migration from E to W.

Genomic Isolation Between West and East Matrilines.

We used several statistical approaches to assess the extent of genomic differentiation between W and E. High levels of differentiation between W and E and within E are found using pairwise mean relative divergence (FST) and absolute sequence divergence (dxy). The mean FST = 0.4787 ± 0.0097 and dxy = 0.0030 ± 0.0002 values are more than threefold higher than those among subpopulations of E (FST: 0.1405 ± 0.0322, dxy: 0.0009 ± 0.0001; Fig. 4 and ). Levels of divergence using the density of fixed differences per site between populations (df) are about 100-fold higher than those within subpopulations of E (). The genomic landscape of divergence using the df statistic finds large numbers of loci across the genome fixed between W and subpopulations of E (Fig. 4 and ).

Fig. 4.

Genomic divergence associated with species formation. (A) FST distribution between W and E1 and their landscape of genomic divergence measured by the df. Both statistics are measured from 50-kb nonoverlapping windows. Scaffolds are concatenated to reveal the whole-genomic divergence pattern. (B) FST distribution between E1 and E2 and their landscape of genomic divergence. (C) FST distribution between E3 and E4 and their landscape of genomic divergence. We used a modified population branch statistic (PBS) approach to measure the extent of genomic differentiation between W and all four subpopulations of E (). The mean PBS value of W (PBSw) is 0.5583. Neutral coalescent simulations based on the inferred demographic model find the top 2.5% of the observed PBSW values (ca. PBSW ≥ 0.8906; Fig. 5) to be significantly higher than simulated neutral values (P < 2.2e-16, two-tailed Mann–Whitney test; Fig. 6). Because all five populations of the species expanded recently (based on PSMC analysis), we performed the simulation taking this expansion into account; results were similar to those under the G-PhoCS model (). Accordingly, we can identify highly divergent regions (HDRs) of the genome with the observed PBSw ≥ 0.8906; there are 579 regions with a total length of 39.60 Mb. The longest fragment is 350 kb, but most (76.34%) are only a one-window-length (50 kb) fragment (). FST and dxy values are significantly higher in HDRs than those in the background genome (P < 2.2e-16, two-tailed Mann–Whitney test; Fig. 5 and ). The correlation between FST and dxy is significantly positive (Pearson’s R > 0.3, P < 2.2e-16; ), reflecting reduced gene flow in these regions. A contrasting pattern between W and E occurs with respect to nucleotide diversity (π) (). The value in HDRs of W is no less than for the rest of the genome (0.00078 vs. 0.00067) but is reduced significantly (P < 2.2e-16) in the corresponding regions of subpopulations of E (Fig. 5 and ). The reduced level of intrapopulation diversity in HDRs suggests that selection has affected E. Relative to the genomic background, these regions also contain a significantly high derived allele frequency (DAF) (P < 2.2e-16).

Fig. 5.

Identification of HDRs. (A) Distribution of FST-based statistic of PBSW. HDRs in top 2.5% PBS distribution, light blue; outside HDRs, gray. (B) Comparisons of HDRs of PBSW in terms of FST, π, dxy, and DAF versus the genomic background. (C) Distribution of PBSE1. (D) Comparisons of HDRs of PBSE1 with the genomic background. Asterisks designate levels of significance between HDRs and outside HDRs by a two-tailed Mann–Whitney test (*P < 0.01; **P < 1e-8; ***P < 2.2e-16).

Fig. 6.

Distribution of observed genome-wide top 2.5% PBSW values compared with the simulated PBSW values under the full model.

Divergence of W and E Matrilines in Relation to Reproductive Isolation.

We used the annotated genome of N. parkeri to infer the functions of candidate target genes within the HDRs that might relate to speciation. We annotated 365 protein-coding genes in the HDRs of PBSW. Gene ontology (GO) evaluations identified 51 functional classes that were significantly overrepresented (P < 0.05; ). In total, 21 genes distributed among 14 GO terms have functions related to reproduction, including, as examples, reproductive developmental process (GO:0003006), sexual reproduction (GO:0019953), and spermatogenesis (GO:0007283) (Table 2 and ).

Table 2.

GO analysis of genes located in regions that strongly differentiated W from all four subpopulations of E

Category	Term	No. of genes	P value
Cluster 1	GO:0048538∼thymus development	5	0.00
	GO:0048534∼hemopoietic or lymphoid organ development	12	0.01
	GO:0002520∼immune system development	12	0.01
Cluster 2	GO:0003006∼reproductive developmental process	15	0.00
	GO:0048610∼reproductive cellular process	11	0.00
	GO:0048609∼reproductive process in a multicellular organism	18	0.01
	GO:0032504∼multicellular organism reproduction	18	0.01
	GO:0019953∼sexual reproduction	17	0.01
	GO:0007281∼germ cell development	7	0.01
	GO:0007276∼gamete generation	15	0.01
	GO:0048232∼male gamete generation	12	0.02
	GO:0007283∼spermatogenesis	12	0.02
Cluster 3	GO:0035270∼endocrine system development	6	0.01
	GO:0030325∼adrenal gland development	3	0.01

Annotation clusters with an enrichment score of ≥2 are shown.

GO analysis of genes located in regions that strongly differentiated W from all four subpopulations of E Annotation clusters with an enrichment score of ≥2 are shown.

Little Representation of E Matrilines in the Nuclear Genome Within the Zone of Admixture.

Individuals in the zone of admixture are more closely related to W than E with respect to the nuclear genome (). However, all mixed individuals have the mitogenome of E (). We used PCAdmix to identify genomic regions from E in the genomes of the mixed individuals. The low level of E ancestry detected (∼5.9–11.2%; ) is not caused by systematic errors (<2%; ). Regions of E ancestry occur in no less than 67% (24 of 36 haploids) of mixed individuals. Such regions, about 1.6 Mb in size, are dispersed randomly across the genome, with a mean length of 62 Kb. These regions contain 31 protein-coding genes, one relevant to mitochondria [VDAC3 (48)], but none exhibit reduced levels of polymorphism (0.00059 > 0.00046) or significant variation in intergroup and intragroup nonsynonymous/synonymous ratios [P > 0.05, McDonald–Kreitman test (49)]. We detect no signal of selection on the mitogenome in the zone of admixture (Tajima D = −1.49, P > 0.10).

Incomplete Isolation and Gene Flow Within the Geographic Range of E Matrilines.

Levels of whole-genomic differentiation among E1–E4 are comparatively small. Interpopulation FST values range from 0.0974 to 0.1825, and dxy values range from 0.0007 to 0.0010 (). Mean df values within the subpopulations range from 1.27 × 10−07 to 6.02 × 10−07 (), which corresponds to ∼1.27–6.02 fixed differences for every 10 Mb of sequence data in compared populations. We applied the PBS statistic to identify HDRs and then examined the features of these regions. Similar patterns occur for each subpopulation as follows: (i) most outlier windows (>72%) are discontinuous and one window in length (50 kb) (); (ii) FST and dxy values are significantly higher than the background genome (P < 2.2 × 10−16, two-tailed Mann–Whitney test; Fig. 5 and ); (iii) π is significantly lower in HDRs relative to the rest of the genome, although the level varies among the subpopulations (Fig. 5 and ); and (iv) E1–E3 are skewed toward high frequencies of derived variants (Fig. 5 and ).

Environmental, Histological, and Physiological Divergence Affecting E Matrilines.

In a PCA of all bioclimatic variables associated with frog sampling sites (), PC1 explains 49.13% of the variation. This value differs significantly between low-elevation clades E1 and E3 and high-elevation clades E2 and E4 (P = 3.3e-4). Histological sections of middorsal skin show that the frogs from high elevations have a significantly greater number of granular glands than frogs from low elevations (P < 0.05, two-tailed t test; Fig. 7); such glands may function in response to environmental stimuli (50). Hemoglobin (Hb) levels in peripheral blood are correlated with elevation (P < 0.01; Fig. 7). In contrast to Hb levels, muscular oxygen content in relatively low-elevation populations is significantly higher than muscular oxygen content in populations from higher elevations (P < 0.01; Fig. 7); this corresponds to the decreased oxygen content in air.

Fig. 7.

Morphological and physiological changes associated with elevation. (A) Bar plots of the differences in numbers of granular glands in the middorsal skin between low-elevation (2,968 m, E1) and high-elevation (4,859 m, E4) populations (two-tailed test: P < 0.05). (B) Hb levels (grams per deciliter) at different elevations. Red lines show the best-fit regression line based on a third-order polynomial equation. The 95% confidence interval is shown in gray. (C) TOC (μmol/L) in low (E1) and high (E4) elevations. (D) Expression level of DNAJC8 in the brain, TNNC1 and ADORA1 in the heart, and LAMB3 in the lung from low-elevation (E1) and high-elevation (E4) populations of E, respectively. Eight replicates were performed for each group. Statistically significant differences in differential expression are indicated by asterisk(s) (two-tailed t test: *P < 0.05; **P < 0.01).

Genes, Ecological Factors, and the Evolution of E Matrilines.

Genes from the HDRs of each subpopulation were evaluated for functional categories that related to environmental factors. GO enrichment analyses revealed many overrepresented functional categories that appear to associate with adaptation to the environment of the QTP (). For instance, Kyoto Encyclopedia of Genes and Genomes pathways and GO categories related to metabolism (fatty acid metabolism, hsa00071), blood circulation (circulatory system process, GO:0003013), motility (muscle cell differentiation, GO:0042692), and immune response (inflammatory response, GO:0006954) characterize low-elevation E1 (). GO categories associated with apoptosis (induction of apoptosis, GO:0006917) were significantly enriched in high-elevation E2 (), and response to radiation (GO:0009314) and light stimulus (GO:0009416) were significantly enriched in high-elevation E4 (). GO categories associated with blood circulation (GO:0008015) and metabolism process (response to lipid, GO:0033993) are present in low-elevation E3 (). To verify the function of the genes in HDRs, we performed real-time quantitative PCR on seven candidate genes from three different tissues (heart, lung, and brain). These seven candidate genes were identified based upon what they potentially contribute to high-altitude adaptation (). Four of these differ in their levels of expression between low-elevation and high-elevation lineages (Fig. 7), including two related to cardiac function in the heart [Troponin C1 Slow (TNNC1) (51) and Adenosine A1 Receptor (ADORA1) (52)] and two hypoxia-related genes [DnaJ Homolog Subfamily C Member 8 (DNAJC8) in the brain (53) and Laminin Subunit Beta 3 (LAMB3) in the lung (54)].

Discussion

Genomic Isolation and Speciation Between E and W Matrilines.

Population genomic analyses find substantial genomic isolation between W and E, with limited directional gene flow. Gene flow has been found only in one geographically restricted area, where no obvious geographic barrier (e.g., mountains) exists that could restrict dispersal. Further, the genetic patterns indicate that frogs from both E and W cross the YZR within their ranges and reject the null hypothesis of panmixia within a single species. Neutral processes, including divergence, changes of current, and ancestral effective population size (Ne), and levels of gene flow have the potential to explain the broad pattern of differentiation between W and E. However, the HDRs between W and E exhibit significantly more genetic differentiation than our simulation using the inferred history. HDRs could result from positive selection and not demographic history (23). Furthermore, the HDRs in subpopulations of E exhibit significantly lower π and higher DAF compared with the rest of the genome (Fig. 5), suggesting the action of positive selection on E. Because the HDRs of W and E harbor many genes that relate to GO categories involved in fertilization, particularly to spermatogenesis, they likely lead to reproductive isolation by suppressing gene flow. Studies of other species have detected the rapid evolution of male reproductive genes, for instance, in primates (55), birds (19), fruit flies (56), and amphibians (57). Taken together with the fact that W, E2, and E4 occupy similar climatic environments () and display no differences in morphology, we suggest that endogenous selection is a dominant factor building reproductive isolation, resulting in speciation between W and E matrilines. Populations from the geographic region of admixture between E and W matrilines occur along a narrow valley containing a tributary of the YZR. They have the mitogenome of E, whereas nuclear genes of W strongly dominate. Such mitonuclear discordance is common in vertebrates (58); however, here, a cohort of N. parkeri genomes offers an unusual opportunity to decipher the genomic pattern and the genetic mechanism of the phenomenon. Population genomic analyses reveal the E introgressed regions in hybrids are small and randomly dispersed in the genome. Those short introgressed segments suggest to us that they formed by hybridization, and were subsequently broken into even smaller sizes by recombination operating over many generations (59, 60). We detect no signal of selection in either the regions of introgression or the mitochondrial genome, and no significant enrichment of the pathways relevant to mitochondria or of the GOs of genes in introgressed regions. Thus, the mitonuclear discordance is not caused by natural selection driven by environmental factors (29, 61, 62). A likely scenario is establishment of reproductive isolating barriers (RIBs) resulting from secondary contact with historical and limited introgression after mid-Pleistocene divergence (63, 64). We hypothesize that hybrids formed historically only between females of E and males of W upon secondary contact, with hybrid females continuing to backcross to the males of W, leading to postzygotic, prezygotic, or both categories of RIBs.

Ecological Adaptation Within E Matrilines.

E Tibetan frogs primarily segregate into high-elevation (>3,700 m, E2 and E4) and low-elevation (<3,700 m, E1 and E3) populations (Fig. 1). This pattern likely reflects an ecological stratification in the southeastern QTP, including oxygen partial pressure (36), UV radiation (37), precipitation (38), and ambient temperature (39). If ecological selection is one of the forces that drives differentiation of the populations of E, one expects to find an imprint on genomic regions involved in ecological adaptation (27, 65). The HDRs of each subpopulation of E exhibit signals of selection compared with the background, including low π, increased dxy between populations, and DAF. GO enrichment analyses point to environmental drivers of divergence. No clear geographic barriers separate the low-elevation sympatric pair E1 and E3. Although they were formed at different times (Fig. 3), they share several GO functional categories, such as blood circulation and metabolic processes. Given the phylogeny and population history, these functions likely evolved independently in each lineage. The histological, physiological, and expressional evidence (Fig. 7) corresponds to the scenario that ecological stratification in the southeastern QTP has greatly promoted adaptive population diversification in situ. In summary, N. parkeri is a useful model to investigate the processes and genetic bases of speciation along geographic and environmental gradients. We argue that natural selection plays important roles in driving continuing divergence within the species, and even in maintaining it. The extreme environments of the Tibetan Plateau can drive the rapid evolution of species [e.g., yak (33)]. Given the rapidity of changes and the challenging environment, the area is a natural laboratory for studying how selection drives adaptation; how environments influence evolutionary history; and, in some cases, how speciation can occur.

Materials and Methods

Sample Collection and Sequencing.

Tibetan frogs (45 in total) were selected for our genomic study based on matrilineal (mtDNA) genealogies (42). An additional 18 individuals come from the area of mixed clades in a tributary of the YZR (localities 25, 26, and 27; Fig. 1). One individual of Nanorana pleskei was used as the outgroup taxon (66). All collections were made according to animal use protocols approved by the Kunming Institute of Zoology Animal Care and Ethics Committee. Total genomic DNA was extracted from liver, muscle, toe clips, or tadpole samples using the phenol/chloroform method (67). For each individual, 1–3 μg of DNA was sheared into fragments of 300–800 bp using the Covaris system. DNA fragments were processed and sequenced using Illumina paired-end sequencing technology (Illumina, Inc.). The 45 nonhybrid individuals and the outgroup taxa were sequenced to a target depth of 15×, and the 18 hybrid individuals were sequenced to target depth of 5×. The raw sequence data from this study have been submitted to the Genome Sequence Archive (gsa.big.ac.cn/) under accession nos. CRA000919 (N. parkeri) and CRA000918 (N. pleskei).

SNP Calling and Filtering.

Raw sequence reads of each individual were mapped to the Tibetan frog reference genome (45) using BWA-ALN (v.0.7.4) (68) with default parameters. SAMtools (v.0.1.18) was used for sorting and removing PCR duplicates (69). To minimize false-positive SNP calls around indels, local realignment around indels was performed using the Genome Analysis Tool Kit (v.2.6-5) (70). Raw SNPs were extracted using SAMtools on the locally realigned BAM files with the command “samtools mpileup -q 20 -Q 20 -C 50 -uDEf.” To obtain high-quality genotype calls for downstream analyses, we kept SNPs that met the following criteria: (i) sites were at least 5 bp away from a predicted insertion/deletion, (ii) the consensus quality was ≥40, (iii) sites did not have triallelic alleles and indels, (iv) the depth ranged from 2.5 to 97.5% in depth quartile, (v) SNPs had minor allele frequencies ≥ 0.01, and (vi) SNPs occurred in more than 95% of high-coverage individuals (nonhybrids) and 75% of medium-coverage individuals (hybrids). The filtered data were phased using Beagle v.3.3.2 (71).

Population Structure, Phylogenetic Inference, and Admixture Analyses.

We used PCA and population structure analysis to evaluate the genetic structuring of frogs. SNPs in scaffolds longer than 500 kb were extracted; they occupied about 76% of the entire genome. PCA was performed using the package GCTA (v.1.24.2) (72). The genomic ancestry of each individual was inferred using Frappe (v.1.1) (73). To avoid the effect of linkage disequilibrium, we selected one SNP for each interval of 50 kb. The postulated number of ancestral clusters (K) was set from two to five, and the maximum number of expectation-maximization iterations was set to 10,000. Phylogenetic relationships were inferred via both coalescent and concatenation methods. To minimize the effects of potential alignment errors and regions with strong natural selection, we performed these analyses on putatively neutral genomic regions by filtering out the positions with repeat sequences, exons, and the 10 kb flanking them on each side. We randomly selected 1,000 neutral loci with a window size of 100 kb from the intergenic region. First, we reconstructed individual gene trees for each window based on the maximum likelihood (ML) approach using RAxML (v.8.1.15) (74). Support values for each node were inferred using 100 rapid bootstrap replicates based on the GTRGAMMA model. Second, for gene tree-based coalescent analysis, species trees were generated using MP-EST (75) and STAR (76, 77) and using population as the units of the tree tips. Third, for the concatenation analysis, ML trees were constructed based on the GTRGAMMA model using the concatenated sequences from the same set. To infer admixture events, we applied the D-statistic using the qpDstat module in the ADMIXtools package (78). We also adopted a tree-based approach, which was implemented in TreeMix (79), to verify the existence of gene flow through modeling the migration values set from 0 to 4 with a block of 5,000 SNPs.

Demographic Analysis Using G-PhoCS and PSMC.

G-PhoCS (46) was employed to infer the complete demographic history for N. parkeri, including population divergence times, ancestral population size, and migration rates based on 1,000 neutral loci (80). The parameters were inferred in a Bayesian manner using Markov Chain Monte Carlo to jointly sample model parameters and genealogies of the input loci (46). Migration scenarios were added by combining results of D-statistic tests, Frappe, and TreeMix because G-PhoCS often have limited ability to characterize complex migration scenarios (81). Additionally, two postdivergence migration bands were added to test if gene flow occurred between W and E since they separated. Each Markov chain was run for 2,000,000 generations while sampling parameter values every 20th iteration. Burn-in and convergence of each run were determined with TRACER 1.5 (82). More information about the control file of G-PhoCS is provided in . Divergence times in units of years, effective population sizes, and migration rates were calibrated by the estimates of generation time and neutral mutation rate from previous studies (45, 83). We repeated the G-PhoCS analysis with four separate runs to obtain reliable and stable estimates for the demographic parameters. To validate the G-PhoCS inferences and test if the differential gene flow to W was correctly identified, we used ms (84) to perform simulations under different migration scenarios between W and E (). We used the inferred demographic parameters to produce 50 Mbp of sequences and compared the observed patterns of differentiation for W–E1 with those under different migration scenarios. The trajectory of demographic histories for the five populations of Tibetan frogs was inferred by the PSMC model (85). Because PSMC has high false-negative rates at low sequence coverage, we restricted this analysis to the individual in each group with highest coverage (≥15×). In addition, a correction factor (-N) was invoked to correct for the false-negative rate caused by the shallow sequence depth (80). The PSMC analysis was set as the following parameters: -N25 -t15 -r5 -b -p “4+25 * 2+4+6”. A bootstrapping approach with 100 replicates was performed to assess the variation in the inferred Ne trajectories. A generation time of 5 y and a neutral mutation rate of 0.776e-09 per site per year were used to convert the population sizes and scaled time into real sizes and time (45, 83). The program fastsimcoal2 (86) was used to estimate the extent of population expansion within 30,000 y based on the model of G-PhoCS. Fifty time simulations were performed, and the result with the highest likelihood was kept. For each run, demographic estimates were obtained from 100,000 simulations (-n 100,000) and 40 expectation/conditional maximization cycles (-L 40) per parameter file.

Inference of Mitochondrial Phylogeny and Hybrid Ancestry Assignment.

We used BWA-ALN (v.0.7.4) to map all raw sequence reads from each individual to the previously assembled complete mitochondrial genomes of N. pleskei (87), the closest relative of our Tibetan frog. SNPs were identified and filtered based on the same nuclear genome. The matrilineal (mitochondrial) genealogy was inferred using the ML method based on the GTRGAMMA model implemented in RAxML (v.8.1.15) (74). PCAdmix (88) was used to identify the ancestries in each of the 18 hybrid individuals. We used W and E4 as the ancestral populations because they were the geographically closest populations and showed the highest signal of admixture in the f3 test. To prevent high-linkage blocks from having excessive influence on the inferred ancestry of a region, SNPs were thinned with r2 > 0.80 in all ancestral and admixed groups. PCA was performed using a window of 40 SNPs. A default calling threshold of 0.9 was used as the criterion to assign ancestry. However, systematic errors, such as incomplete lineage sorting and estimation errors, may have contributed to similar introgression signals in the PCAdmix analysis. To estimate these, we reanalyzed the PCAdmix analysis using non-E4 individuals as hybrids.

Inference HDRs.

Genomic differentiation was calculated based on the PBS under the topological transformation described previously (89). The PBS value estimates the amount of sequence change along a population branch since its divergence from other branches of a population tree. First, pairwise FST statistics were calculated using Weir and Cockerham’s method (90), implemented in VCFtools v0.1.11 (91), with nonoverlapping 50-kb genomic windows. Negative values of FST were treated as 0. Windows less 20 kb in length were excluded for further analysis. A log-transformation FST was used for the PBS calculation (92). The length of the branch leading to W since the divergence from all subpopulations of E was estimated as follows: The length of the branch leading to E1 since the divergence from the remaining subpopulations of E clades was then estimated as follows: Calculations for branch lengths leading to subpopulations E2, E3, and E4 were similar to the equation for E1. HDRs were defined as the upper 2.5% of each PBS distribution. To avoid the potential assembly or mapping errors at the ends of scaffolds, we further removed high divergence peaks less than two consecutive windows in both ends of scaffolds in both sides.

Characterization of HDRs in Terms of dxy, df, π, and DAF.

In addition to FST and PBS, we applied several population genomic parameters to quantify and compare the outlier windows with the background genome, including the following: dxy between populations (93), df between populations (19), within-population π level (19, 23, 91), DAF, and population-scaled recombination rates (ρ). We treated N. pleskei as the outgroup when comparing differences in DAF between W and E, and treated three nonadmixture W individuals (Fig. 2) as the outgroup when comparing DAF differences among subpopulations of E.

GO Enrichment Analysis.

GO enrichment analysis was performed using DAVID (Database for Annotation Visualization and Integrated Discovery) (94). GO terms with less than two genes were excluded from further analysis. GO terms with a P value <0.05 were considered to be significantly enriched.

Phenotype Data from Physiological and Morphological Detection.

Hb levels and tissue oxygen content (TOC) were collected in the field during June and July 2015. Measurements were taken in five communities in E along an elevation gradient: Nyingchi (29.61°N, 94.36°E, 2,968 m above sea level), Lhasa (29.67°N, 90.88°E, 3,671 m above sea level), Zhaxigang (29.75°N, 91.95°E, 4,011 m above sea level), Riduo (29.70°N, 92.23°E, 4,368 m above sea level), and Milashankou (29.80°N, 92.34°E, 4,859 m above sea level). At least 10 individuals were measured for each community. Hb level in peripheral blood was measured from 53 adult frogs, and muscular oxygen content was measured from 69 adult frogs. Hb was determined using Mission Plus Hb, immediately after drawing blood from the heart ventricle. TOC was determined via a fiber optic cable (PreSens Precision Sensing GmbH). We also made histological sections of the middorsal skin of seven frogs from the communities of Nyingchi and Milashankou, which represented populations from E at low and high elevations, respectively. The tissues were fixed in Heidenhain’s Susa fixative to make paraffin sections (5 μm thick).

RNA Isolation and Reverse Transcriptase PCR Assay.

Total RNAs from populations at low (Nyingchi, 2,968 m) and high (Milashankou, 4,859 m) elevations of E were extracted using the TRIzol total RNA extract kit (Tiangen). Reverse transcription was carried out using the Fermentas RevertAid First-Strand cDNA synthesis kit (Fermantas) to prepare templates for real-time quantitative PCR. The primers of candidate genes used for real-time PCR are displayed in . Actin, Beta (ACTB) was used as a loading control.

75 in total

1. Genomic divergence during speciation: causes and consequences.

Authors: Patrik Nosil; Jeffrey L Feder
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2012-02-05 Impact factor: 6.237

2. Estimating species phylogenies using coalescence times among sequences.

Authors: Liang Liu; Lili Yu; Dennis K Pearl; Scott V Edwards
Journal: Syst Biol Date: 2009-07-16 Impact factor: 15.683

3. Migration-selection balance and local adaptation of mitochondrial haplotypes in rufous-collared sparrows (Zonotrichia capensis) along an elevational gradient.

Authors: Zachary A Cheviron; Robb T Brumfield
Journal: Evolution Date: 2009-02-02 Impact factor: 3.694

4. Sequencing of 50 human exomes reveals adaptation to high altitude.

Authors: Xin Yi; Yu Liang; Emilia Huerta-Sanchez; Xin Jin; Zha Xi Ping Cuo; John E Pool; Xun Xu; Hui Jiang; Nicolas Vinckenbosch; Thorfinn Sand Korneliussen; Hancheng Zheng; Tao Liu; Weiming He; Kui Li; Ruibang Luo; Xifang Nie; Honglong Wu; Meiru Zhao; Hongzhi Cao; Jing Zou; Ying Shan; Shuzheng Li; Qi Yang; Peixiang Ni; Geng Tian; Junming Xu; Xiao Liu; Tao Jiang; Renhua Wu; Guangyu Zhou; Meifang Tang; Junjie Qin; Tong Wang; Shuijian Feng; Guohong Li; Jiangbai Luosang; Wei Wang; Fang Chen; Yading Wang; Xiaoguang Zheng; Zhuo Li; Zhuoma Bianba; Ge Yang; Xinping Wang; Shuhui Tang; Guoyi Gao; Yong Chen; Zhen Luo; Lamu Gusang; Zheng Cao; Qinghui Zhang; Weihan Ouyang; Xiaoli Ren; Huiqing Liang; Huisong Zheng; Yebo Huang; Jingxiang Li; Lars Bolund; Karsten Kristiansen; Yingrui Li; Yong Zhang; Xiuqing Zhang; Ruiqiang Li; Songgang Li; Huanming Yang; Rasmus Nielsen; Jun Wang; Jian Wang
Journal: Science Date: 2010-07-02 Impact factor: 47.728

Review 5. The role of the uplift of the Qinghai-Tibetan Plateau for the evolution of Tibetan biotas.

Authors: Adrien Favre; Martin Päckert; Steffen U Pauls; Sonja C Jähnig; Dieter Uhl; Ingo Michalak; Alexandra N Muellner-Riehl
Journal: Biol Rev Camb Philos Soc Date: 2014-05-01

6. Evolution of Darwin's finches and their beaks revealed by genome sequencing.

Authors: Sangeet Lamichhaney; Jonas Berglund; Markus Sällman Almén; Khurram Maqbool; Manfred Grabherr; Alvaro Martinez-Barrio; Marta Promerová; Carl-Johan Rubin; Chao Wang; Neda Zamani; B Rosemary Grant; Peter R Grant; Matthew T Webster; Leif Andersson
Journal: Nature Date: 2015-02-11 Impact factor: 49.962

7. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

Authors: Yan-Bo Sun; Zi-Jun Xiong; Xue-Yan Xiang; Shi-Ping Liu; Wei-Wei Zhou; Xiao-Long Tu; Li Zhong; Lu Wang; Dong-Dong Wu; Bao-Lin Zhang; Chun-Ling Zhu; Min-Min Yang; Hong-Man Chen; Fang Li; Long Zhou; Shao-Hong Feng; Chao Huang; Guo-Jie Zhang; David Irwin; David M Hillis; Robert W Murphy; Huan-Ming Yang; Jing Che; Jun Wang; Ya-Ping Zhang
Journal: Proc Natl Acad Sci U S A Date: 2015-03-02 Impact factor: 11.205

8. Adaptive protein evolution at the Adh locus in Drosophila.

Authors: J H McDonald; M Kreitman
Journal: Nature Date: 1991-06-20 Impact factor: 49.962

9. Inference of human population history from individual whole-genome sequences.

Authors: Heng Li; Richard Durbin
Journal: Nature Date: 2011-07-13 Impact factor: 49.962

10. Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers.

Authors: Reto Burri; Alexander Nater; Takeshi Kawakami; Carina F Mugal; Pall I Olason; Linnea Smeds; Alexander Suh; Ludovic Dutoit; Stanislav Bureš; Laszlo Z Garamszegi; Silje Hogner; Juan Moreno; Anna Qvarnström; Milan Ružić; Stein-Are Sæther; Glenn-Peter Sætre; Janos Török; Hans Ellegren
Journal: Genome Res Date: 2015-09-09 Impact factor: 9.043

19 in total

1. Metabolic characteristics of overwintering by the high-altitude dwelling Xizang plateau frog, Nanorana parkeri.

Authors: Yonggang Niu; Wangjie Cao; Kenneth B Storey; Jie He; Jinzhou Wang; Tao Zhang; Xiaolong Tang; Qiang Chen
Journal: J Comp Physiol B Date: 2020-04-09 Impact factor: 2.200

Review 2. A review on the conservation genetic studies of Indian amphibians and their implications on developing strategies for conservation.

Authors: Priti Hebbar; G Ravikanth; N A Aravind
Journal: J Genet Date: 2019-12 Impact factor: 1.166

3. Evolutionary responses of a reef-building coral to climate change at the end of the last glacial maximum.

Authors: Jia Zhang; Zoe T Richards; Arne A S Adam; Cheong Xin Chan; Chuya Shinzato; James Gilmour; Luke Thomas; Jan M Strugnell; David J Miller; Ira Cooke
Journal: Mol Biol Evol Date: 2022-10-11 Impact factor: 8.800

4. Freeze tolerance and the underlying metabolite responses in the Xizang plateau frog, Nanorana parkeri.

Authors: Yonggang Niu; Wangjie Cao; Jinzhou Wang; Jie He; Kenneth B Storey; Li Ding; Xiaolong Tang; Qiang Chen
Journal: J Comp Physiol B Date: 2020-10-06 Impact factor: 2.200

Review 5. From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection.

Authors: Hussein A Hejase; Noah Dukler; Adam Siepel
Journal: Trends Genet Date: 2020-01-15 Impact factor: 11.639

6. Whole-Genome Diversification Analysis of the Hornbeam Species Reveals Speciation and Adaptation Among Closely Related Species.

Authors: Zeyu Zheng; Ying Li; Minjie Li; Guiting Li; Xin Du; Hu Hongyin; Mou Yin; Zhiqiang Lu; Xu Zhang; Nawal Shrestha; Jianquan Liu; Yongzhi Yang
Journal: Front Plant Sci Date: 2021-02-10 Impact factor: 5.753

Review 7. Frog Skin Innate Immune Defences: Sensing and Surviving Pathogens.

Authors: Joseph F A Varga; Maxwell P Bui-Marinos; Barbara A Katzenback
Journal: Front Immunol Date: 2019-01-14 Impact factor: 7.561

8. Multi-Tissue Transcriptomes Yield Information on High-Altitude Adaptation and Sex-Determination in Scutiger cf. sikimmensis.

Authors: Sylvia Hofmann; Heiner Kuhl; Chitra Bahadur Baniya; Matthias Stöck
Journal: Genes (Basel) Date: 2019-10-31 Impact factor: 4.096

9. Convergent Evolution of Cysteine-Rich Keratins in Hard Skin Appendages of Terrestrial Vertebrates.

Authors: Florian Ehrlich; Julia Lachner; Marcela Hermann; Erwin Tschachler; Leopold Eckhart
Journal: Mol Biol Evol Date: 2020-04-01 Impact factor: 16.240

Review 10. Perspectives on studying molecular adaptations of amphibians in the genomic era.

Authors: Yan-Bo Sun; Yi Zhang; Kai Wang
Journal: Zool Res Date: 2020-07-18