Literature DB >> 34573405

De novo Assembly, Annotation, and Analysis of Transcriptome Data of the Ladakh Ground Skink Provide Genetic Information on High-Altitude Adaptation.

Sylvia Hofmann1,2, Chitra Bahadur Baniya3, Matthias Stöck4,5, Lars Podsiadlowski6.   

Abstract

The Himalayan Arc is recognized as a global biodiversity hotspot. Among its numerous cryptic and undiscovered organisms, this composite high-mountain ecosystem harbors many taxa with adaptations to life in high elevations. However, evolutionary patterns and genomic features have been relatively rarely studied in Himalayan vertebrates. Here, we provide the first well-annotated transcriptome of a Greater Himalayan reptile species, the Ladakh Ground skink Asymblepharus ladacensis (Squamata: Scincidae). Based on tissues from the brain, an embryonic disc, and pooled organ material, using pair-end Illumina NextSeq 500 RNAseq, we assembled ~77,000 transcripts, which were annotated using seven functional databases. We tested ~1600 genes, known to be under positive selection in anurans and reptiles adapted to high elevations, and potentially detected positive selection for 114 of these genes in Asymblepharus. Even though the strength of these results is limited due to the single-animal approach, our transcriptome resource may be valuable data for further studies on squamate reptile evolution in the Himalayas as a hotspot of biodiversity.

Entities:  

Keywords:  Himalayas; Scincidae; adaptation; evolution; genomic; high elevation

Mesh:

Year:  2021        PMID: 34573405      PMCID: PMC8466045          DOI: 10.3390/genes12091423

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

The Himalayan arc represents one of the world’s most important faunal and floral hotspots with high species diversity and endemism [1], which result from the Tertiary orogeny of this mountain chain, its complex topography as well as its great climatic heterogeneity and isolation. The genesis of the Tibetan highlands and the Himalayas since the Paleogene, with the Greater Himalayas starting to rise presumably the earliest in the post-Eocene (for a review, see the supplementary in Hofmann et al. [2]), triggered the evolution of unique biodiversity under gradual high-altitude adaptation, as already shown for anurans [3,4,5,6,7]. Besides amphibians, there are also several reptiles that can cope with life at high altitude in those regions, e.g., Thermophis [8], Phrynocephalus [9], and some Laudakia species [10]. Potential constraints to upslope migration of reptiles (and amphibians) to high-elevation environments are the substantial UV-radiation, the thermal extremes, and especially the oxidative stress, referred to as high-altitude hypoxia, which interacts with temperature in a context-dependent manner to influence thermal performance and limits in terrestrial ectotherms [11,12]. Recent advances in high-throughput sequencing technologies have led to a growing number of genomic studies that address the molecular basis of high-altitude adaptation, some of them focused also on reptiles [13,14,15]. However, such data have been scarce in non-model species of the Greater Himalayas (but see [16]). This results from the general understudied biodiversity of this high-mountain range, presuming a relatively large number of cryptic and undiscovered species [17], even among vertebrates. Molecular data from Himalayan organisms can contribute to understanding of the taxonomic and functional diversity spectra across this species-rich, fragile ecosystem. These data resources are even more important because Himalayan biodiversity is threatened at the very core; rapid warming due to climate change, especially at higher elevations, as well as higher rates of forest degradation and deforestation, infrastructural development, trade routes, and hydropower dams are driving species loss at a very alarming speed [18,19]. To allow future studies in evolutionary biology at a genomic level and to generally provide sufficient and relevant data for Himalayan reptiles, in the present study, we have generated a new genomics data set based on RNAseq for a scincid species from the Greater Himalayas. Using these data, we specifically aimed to identify genes known to play roles in adaptation of terrestrial ectothermic vertebrates to high elevations. Since exposure to oxidative stress can particularly affect the physiology during early development [20] and in oxygen-sensitive organs [21,22], such as the nerve system, we focused on embryonic and brain tissue samples. Our target species is a scincid lizard in the genus Asymblepharus, the Ladakh Ground Skink, A. ladacensis (GÜNTHER, 1864), which is endemic to the western part of the Himalayas. The genus further contains the following species (Figure 1): A. alaicus (ELPATJEVSKY, 1901), A. eremchenkoi PANFILOV, 1999, A. himalayanus (GÜNTHER, 1864), A. mahabharatus EREMCHENKO, SHAH & PANFILOV, 1998, A. nepalensis EREMCHENKO & HELFENBERGER, 1998, and A. tragbulensis (ALCOCK, 1898). Another two species, Asymblepharus medogensis JIANG, WU, GUO, LI & CHE, 2020 and A. nyingchiensis JIANG, WU, WANG, DING & CHE, 2020, have been described very recently from Mêdog, Nyingchi in SE Tibet, China. According to a large-scale phylogenetic study of squamates [23], the sampled specimen of A. sikimmensis (BLYTH, 1854) is nested within Scincella and was therefore suggested to be transferred to this genus. However, it remains unclear whether this single specimen had been misidentified as A. sikimmensis since originally it was labeled in the museum collection as Scincella potanini (voucher catalogue number CAS:HERP:194923, see http://portal.vertnet.org/o/cas/herp?id=urn-catalog-cas-herp-194923 (accessed on 29 July 2021) [24].
Figure 1

Map of Asymblepharus species based on GBIF (www.gbif.org; accessed on 20 July 2021) records of preserved specimens and georeferenced localities in the taxonomic reptile database (https://reptile-database.reptarium.cz/; accessed on 20 July 2021). The location of our RNA sample of the female A. ladacensis (photo) is indicated by a green circle with a dot in the middle and an arrow. * Note, according to a large-scale phylogeny of squamates, A. sikimmensis is nested within Scincella; however, it remains unclear whether this single “A. sikimmensis” specimen, on which the sequence data are based, had been taxonomically correctly identified. Therefore, we also show the GBIF records of specimens collected as A. sikimmensis. Records of A. eremchenkoi in the databases could not be georeferenced due to insufficient information on the collection site. Photo credit: Sylvia Hofmann.

In general, Asymblepharus is a genus with a still poorly known endemic distribution, origin, and evolutionary history. No studies of its population genetic structure and genetic diversity exist to date for any Asymblepharus taxon, and the current taxonomic relationships of its lineages are in flux [23,24,25]. The three Himalayan species A. himalayanus, A. ladacensis, and A. tragbulensis have frequently been assigned to the genus Himalblepharus Eremchenko, 1987. According to literature data ([26] and references therein) and personal observation, they show a remarkably wide vertical distribution from the foothills (~150 m a.s.l.) to the high alpine zone and even up to the snow line (~5500 m a.s.l.), making them an excellent model to study the genetic basis of adaptations to high altitudes in ectotherms and the evolutionary processes accounting for them. With this paper, we characterize the first transcriptome data set of a high-altitude reptile species from the Greater Himalayas and report genes known to play roles in adaptation of ectotherms to high elevations of this non-model reptile. Such data provide the necessary sources for future molecular studies in Himalayan reptiles and high-altitude vertebrates.

2. Materials and Methods

2.1. Sample Collection and Ethics Statement

A single gravid female Asymblepharus ladacensis was collected in Central Nepal, in the Dhaulagiri range, west of the Kali Gandaki valley (28.68° N, 83.59° E; 2714 m a.s.l.; Figure 1). Samples were collected in accordance with regulations for the protection of terrestrial wild animals under the permits of the Nepal expeditions of the Natural History Museum of Erfurt, Germany [27,28]. All treatments were carried out in accordance with approved guidelines and according to the permit as well as the local animal welfare committee’s instructions (VNME 17, 15–30). Tissues were transferred into RNAlater (Thermo Fisher), kept at ambient temperature during the time of the fieldwork, and later stored at −30 °C.

2.2. RNA Isolation, Library Preparation, and Sequencing

We followed the same procedure as previously described [16]. In brief, total RNA was isolated from the brain, an embryonic disc, and from pooled tissues (including lung, muscle, and heart) using TRIzol Reagent (Thermo Fisher Scientific, Waltham, MA, USA) according to the supplier’s recommendation, in combination with the RNeasy Mini Kit (Qiagen, Hilden, Germany) and adjusted to equal concentrations. RNA quality was assessed by RNA concentration, RIN (RNA Integrity Number) value, 28S/18S and fragment length distribution using an Agilent Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). Complementary DNA (cDNA) library preparation and paired-end sequencing were carried out by BGI (BGI-Hongkong Co., Ltd., Tai Po District, Hong Kong), using Illumina NextSeq500 sequencing system (Illumina, San Diego, CA, USA). The raw reads quality was examined using FastQC v0.11.9 [29].

2.3. Assembly and Assessment of Transcriptome Quality and Completeness

First, we controlled our data for rRNA quantity using SortMeRNA 4 [30]. The three de novo assemblies were then created following the Oyster River Protocol (ORP; Docker image 2.2.8) best practices [31]. This protocol implements both pre-assembly procedures and a number of different kmer lengths and assemblers, finally merging these assemblies into a single, comprehensive transcriptome. The rationale behind it is that assembling RNAseq reads with different assembly tools increases assembly quality and mapping rate and, in turn, the ability to draw conclusions from that fraction of the sample [31]. Thus, merging the contigs resulting from several assemblers and parameter configurations to combine the advantages of certain assembly mechanisms and to overcome their different disadvantages seems to be the best way to obtain a comprehensive de novo transcriptome assembly [32,33]. Illumina sequencing adapters and nucleotides with quality Phred ≤ 2 were removed using Trimmomatic v0.36 [34], then the reads were error corrected by Rcorrector version 1.0.2 [35]. These reads were then assembled using Trinity release 2.8.4 [36] with default settings (k = 25), two independent runs of SPAdes assembler version 3.11 with kmer lengths of 55 and 75 [37], and the assembler Shannon version 0.0.2 with a kmer length of 75 [38]. The resulting four distinct transcriptome assemblies were then merged to a single, comprehensive transcriptome using Ortho-Fuser [31]. This final transcriptome was evaluated with TransRate version 1.0.3 [39], which is modified for and packaged with the ORP, and with BUSCO (Benchmarking Universal Single-Copy Orthologs) version 4.1.4 [40,41]. Searching the assembly for conserved single-copy orthologs found in orthologous sets of genes constructed from genomes representing eukaryotes (70 species: 255 BUSCOs), vertebrates (67 species: 3354 BUSCOs), and tetrapods (38 species: 5310 BUSCOs). A detailed quality assessment of the assembly with respect to known genes was further obtained with rnaQUAST version 2.2.0 [42] using the reference genomes of Anolis carolinensis, Gekko japonicus, and Python bivittatus.

2.4. Functional Annotation

Transcripts were functionally annotated as previously described [16]. Briefly, sequence homology searches were conducted against seven databases (Gene Onthology, GO; Kyoto Encyclopedia of Genes and Genomes, KEGG; EuKaryotic Orthologous Groups, KOG; InterPro; the non-redundant nucleotide database, NT; the non-redundant protein database, NR; SwissProt). To align our data to KEGG, KOG, NR, and NT and SwissProt, we used Diamond v0.8.31 [43] or the BLASTx [44] algorithm; matched transcripts were filtered by using a cut-off e-value of 1 × 10−25. Transcripts that aligned to the NR database were transferred to the GO database with Blast2GO v2.5.0 [45] and assigned into the following three groups: biological process, cellular components, and molecular functions. InterProScan5 v5.11-51.0 tool [46] was used to annotate against the InterPro databases. Blast v2.2.23 was used to search in SwissProt and hmmscan v3.0 [47] for search against Pfam database (for each sample individually). Candidate coding areas within the transcript sequences were predicted by TransDecoder v.3.0.1 [36]. For a coding sequence with multiple open reading frames (ORF), the longest one was selected. We also used the getorf program of the EMBOSS v6.5.70 package [48] to find the ORF of each transcript and mapped them to the Animal Transcription Factor DataBase (AnimalTFDB2.0). The threshold of transcript lengths used for annotation and downstream analyses was ≥200 bp.

2.5. Positively Selected Genes Related to Mechanisms of High-Altitude Adaptation

We selected transcripts of genes reported to exhibit molecular adaptation to high elevations in lizards and anurans as provided in the literature [7,15]. This comprised a total of 143 genes identified to be under positive selection (PSG) in high-elevational lineages of lizards (Phrynocephalus vlangalii) [15]. We included further 1481 PSGs of these lizards (P. erythrurus, P. putjatia, P. vlangalii) and of dicroglossid frogs (Nanorana, Quasipaa) [7], genes that were identified across an elevational gradient (~1000 m to 4500 m). These additional, individual genes were grouped according to the phylogenetic tree branches across the elevational gradient as presented in Sun et al. (2018) [7]: PSGs attributed to branches that represent (i) lowland species (~1000 m), (ii) species distributed in colline zones up to about 2000 m, (iii) submontane and montane species (2000 and 3500 m), and (iv) subalpine and alpine species (>3500 and 4500 m). Given the vertical distribution of Asymblepharus ladacensis in the Himalayas between ~2500 m and 4500 m [49], we expected to find primarily genes under positive selection reported as PSGs for the submontane and montane, as well as the subalpine and alpine species. To identify scincid orthologs to the genes under positive selection (high-altitude adaptations) in reptiles and amphibians, we used the corresponding coding sequences from Anolis carolinensis (AnoCar2.0, gene build from ensemble 104.2; we used Anolis gene numbers that were mentioned in the publications cited in the last paragraph) as a reference for blast searches. We performed blast searches with the newly sequenced and assembled transcriptome of A. ladacensis, as well as with publicly available transcriptome data from the following scincid lizards: Scincella lateralis (NCBI-SRA SRR629642), Lampropholis guichenoti (NCBI-SRA SRR4293354), Lepidothyris fernandi (NCBI-SRA SRR10360868), and Pseudemoia entrecasteauxii (NCBI-SRA SRR3099521). Only the best reciprocal hit between Anolis and each scincid species (and only if the e-value was below 10−25) were used for subsequent analyses. Alignments of orthologs to the Anolis reference were done with mafft v.7.455 [50] making use of the “adjustdirection” and “keeplength” options to get alignments that are in the same direction and keep reading frames intact. Alignments were inspected for good representation of all species under study. When one or more species had substantially shorter, incomplete contigs as best reciprocal hits, we omitted those from the alignment. A phylogenetic tree for each alignment was produced using FastTree v. 2.1.10 [51] with default settings, except for the nucleotide option. Alignments and trees were analyzed with the HyPhy (Hypothesis Testing using Phylogenies) package v. 2.5 [52,53] using the following methods (for details see [54]): (i) The BUSTED (Branch-Site Unrestricted Statistical Test for Episodic Diversification) [55] model, to test whether a given gene has been subject to positive, diversifying selection at any site, at any time (we tested all lineages for positive selection); (ii) FUBAR (Fast, Unconstrained Bayesian AppRoximation) [56], a Bayesian approach to infer which site(s) in a gene are subject to pervasive, i.e., consistently across the entire phylogeny, diversifying selection (we considered a posterior probability of at least 0.90 as significant); and (iii) the branch-site model aBSREL (adaptive Branch-Site Random Effects Likelihood) [57,58], to test whether codon sites and individual branches are subject to positive selection across the phylogeny. The threshold for significance in BUSTED and aBSREL was set at a p-value lower than 0.05. GO term and metabolic pathway enrichment analysis was done using pantherdb (pantherdb.org; accessed on 2 August 2021, [59]), using the genome of Anolis carolinensis as reference.

2.6. Data Availability

Data generated in this study are publicly available from the NCBI GenBank database under the Bioproject ID PRJNA750278, BioSamples SAMN20458631, SAMN20458632, and SAMN20458631. All sequence data were deposited in the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra/; accessed on 2 August 2021) under the accession numbers SRR15283177, SRR15283178, and SRR15283179’; assembled sequences were transmitted to NCBI Transcriptome Shotgun Assembly Sequence Database (TSA, http://www.ncbi.nlm.nih.gov/genbank/tsa; accessed on 2 August 2021).

3. Results

3.1. Sequencing and Transcriptome Assembly

All RNA was reasonably high quality; A260/280 ratios ranged between 1.81 (brain), 1.76 (embryonic disc), and 2.09 (pooled tissues); RIN values were between 7.90, 8.20, and 9.10, respectively. RNA-seq libraries of the two tissues yielded a total of 74.78, 73.56, and 64.20 million raw sequence reads (Table 1). Pre-processing of reads via read trimming and read error correction removed approximately 2–4% of the raw data, resulting in 73.14, 71.89, and 61.40 million clean reads for the brain, embryonic disc, and pooled tissues (Table 1). GC content of these clean reads was 48%.
Table 1

Summary of sequencing data used to obtain the de novo transcriptome assemblies of Asymblepharus ladacensis based on paired-end Illumina sequencing. Final assemblies based on four unique assemblies per sample generated by ORP using different assemblers and k-mers.

Brain TissueEmbryonic Disc TissuePooled Tissue
Number of paired-end raw reads74,780,62873,557,12064,195,486
Number of cleaned reads73,139,29471,887,89261,397,610
Number of base pairs in final assembly102,605,07998,917,80747,613,446
Number of transcripts in final assembly151,718105,13366,696
Average transcript length (bp)676940712
Minimum transcript length (bp)131131131
Maximum transcript length (bp)17,54318,16815,866
N50121520521194
N90257311278
GC% content of the final ORP assembly0.480.480.48
The final de novo assemblies generated from ORP resulted in 151,718 (brain), 105,133 (embryonic disc), and 66,696 (pooled tissues) transcripts with a total length of ~102.61, ~98.92, and ~47.61 million bp, respectively. Transcripts had an average length of 676 bp (brain), 940 bp (embryonic disc), and 712 bp (pooled tissues), and an N50 of 1215 bp, 2052 bp, and 1194 bp (Table 1, Supplementary Table S1 and Figure S1). A total of 48,884 (32.22%; brain), 44,358 (42.19%; embryonic disc), and 25,772 (38.64%; pooled tissues) transcripts were longer than 500 bp (Supplementary Table S2).

3.2. Assembly Completeness

TransRate’s optimal assembly score (min 0.0, max 1.0) is considered to be a good parameter of the quality of an assembly [32]; it captures the confidence and completeness of the assembly. The TransRate scores of our final assemblies were high, ranging between 0.44 (optimized score 0.49) for the brain sample, 0.45 (optimized score 0.57) for the embryonic disc sample, and 0.44 (optimized score 0.52) for the pooled tissues. A transRate score >0.22 is generally thought to be acceptable [31,39]. More than 90% of the reads were used to assemble the transcriptomes, and 87% (brain, pooled tissues), as well as 90% (embryonic disc) of the fragments, were considered as good mappings, while only 1.6% (brain), 3.7% (embryonic disc), and 2.7% (pooled tissues) of assembled contigs had no coverage (Supplementary Table S1). The assessment of completeness of our assemblies by the BUSCO pipeline resulted in a moderate to high percentage of complete eukaryotic orthologues (from 69.8 to 98.0% of 255 BUSCOs) but also a significantly higher percentage of putative paralogues; in the vertebrate (3354 BUSCOs) and tetrapod (5310 BUSCOs) databases, more than half of the markers were recovered completely in the brain and embryonic tissue (Table 2). The fraction of missing BUSCOs ranged between 0.8% (Eukaryota; embryonic disc) and 54.9% (Tetrapoda; pooled tissues). These BUSCO values are comparable to recent de novo transcriptome studies in many vertebrates [60,61,62]. BUSCO recovery rate tends to be highest when full organism and/or multiple developmental stages were used to generate the transcriptomes, compared to those assembled from specific organs or tissues [63].
Table 2

Benchmarking Universal Single-Copy Orthologs (BUSCO) results based on the eukaryotic (EU, eukaryota_odb10; 255 BUSCOs), vertebrates (VB, vertebrata_odb10; 3354 BUSCOs), and tetrapod databases (TP, tetrapoda_odb10, 5310 BUSCOs) searched. BUSCO searches for completed, single-copy, duplicated, fragmented, and missing orthologs within given genomes.

BUSCO StatisticsBrainEmbryonic DiscPooled Tissues
EUVBTPEUVBTPEUVBTP
Complete220/255(86.3%)2052/3354(61.1%)2696/5310(50.8%)250/255(98.0%)2743/3354(81.8%)3778/5310(71.1%)178/255(69.9%)1405/3354(41.9%)1750/5310(33.0%)
Single-copy189/255(74.1%)1759/3354(52.4%)2305/5310(43.4%)185/255(72.5%)1871/3354(55.8%)2555/5310(48.1%)148/255(58.0%)1150/3354(34.3%)1421/5310(26.8%)
Duplicated31/255(12.2%)293/3354(8.7%)391/5310(7.4%)65/255(25.5%)872/3354(26.0%)1223/5310(23.0%)30/255(11.8%)255/3354(7.6%)329/5310(6.2%)
Fragmented23/255(9.0%)589/3354(17.6%)709/5310(13.4%)3/255(1.2%)221/3354(6.6%)323/5310(6.1%)47/255(18.4%)654/3354(19.5%)645/5310(12.1%)
Missing12/255(4.7%)713/3354(21.3%)1905/5310(35.8%)2/255(0.8%)390/3354(11.6%)1209/5310(22.8%)30/255(11.8%)1295/3354(38.6%)2915/5310(54.9%)
Coverage of specific gene databases (Anolis carolinensis, Gekko japonicus, Python bivittatus) ranged between 7% and 29%, being highest for the Gekko reference genome (Supplementary Table S2). Up to 40,000 transcripts (22–39%) could be aligned to one of the three databases. The duplication ratio varied between 1.3 and 1.5 and was in the range reported for vertebrate transcriptomes [32,64]. Although the proportion of misassembled contigs was low (<2%), the assemblies showed only a small number of 95%-assembled genes and isoforms (Supplementary Table S2). We assume that this might reflect biological novelty in the study species rather than fragmentation of the assemblies [41]. For example, the low proportion of completeness against the three reptile gene databases contrasts with a >40% completeness score for the vertebrate gene set and could be the result of an overrepresentation of gene sets from more intensively studied lineages [65]. Alternatively, suboptimal sample quality and a resulting higher proportion of fragmented genes due to potential RNA degradation could be a reason for the relatively low number of assembled genes and lower Eukaryota BUSCO scores, especially in the pooled tissues sample.

3.3. Functional Annotation

Annotation of the complete set of transcriptomes from all three samples resulted in 39,975 (51.94%) transcripts annotated in at least one of the seven databases used for functional annotation; 7292 sequences generated hits in all of these databases (Table 3). Approximately 70% of the top hits matched to genes from Gekko japonicus (7716; 23.07%), Pogona vitticeps (7239; 21.65%), Anolis carolinensis (5399; 16.14%), and Python bivittatus (3176; 9.5%), Figure 2.
Table 3

Number (N) of transcripts identified in Asymblepharus ladacensis that are shared (=Intersection) and unique among seven annotation database resources. GO—Gene Onthology; InterPro—integrative protein signature database; KEGG—Kyoto Encyclopedia of Genes and Genomes; KOG— EuKaryotic Orthologous Groups; NR—non-redundant protein database; NT—non-redundant nucleotide database; SwissProt—Swiss Protein Sequence Database.

TotalNRNTSwissProtKEGGKOGInterProGOIntersectionOverall
N76,96833,44434,11430,99428,96126,60827,01311,010729239,975
%10043.4544.3240.2737.6334.5735.1014.309.4751.94
Figure 2

Annotation of the Asymblepharus ladacensis transcriptome. (a) Species distribution of the top BLASTx hit performed against NR database; (b) GO (Gene Onthology) assignments as predicted by Blast2GO; (c) functional distribution of KOG (EuKaryotic Orthologous Groups) annotation and (d) KEGG (Kyoto Encyclopedia of Genes and Genomes) classifications of assembled transcripts.

In terms of the biological process ontology (GO database categories), the most common categories were cellular processes (6935), metabolic processes (4884), and biological regulation (4549). The most frequent classifications for the cellular component ontology were cellular (7485), cell compartments (7446), and organelles (5922). Regarding the molecular function, major categories involved binding (6397), catalytic activity (3899), and molecular function regulator (787) (Figure 2). The KOG functional classification revealed genes of signal transduction mechanisms (6714), general function (6592), unknown function (2599), posttranslational modification (2580), and transcription (2339) as top five categories (Figure 2). We found 44,956 transcripts in the KEGG database that aligned to entries associated with pathways of cellular processes (5650), environmental information processing (5752), genetic information processing (3409), human diseases (12,499), metabolism (7697), and organismal systems (9949). Predominantly, the genes were enriched in “signal transduction” (4256), followed by “global and overview maps” (2901), “cancers: overview” (2665), “immune system” (2324), and “infectious diseases: viral” (2299) categories (Figure 2).

3.4. Positive Selection

Among the 143 PSGs previously reported for the high-elevation lineage of the toad-headed agama Phrynocephalus vlangalii [15], we found ten genes to be likewise under positive selection in Asymblepharus based on all three tests that we performed. Two of these genes (IL1RAP, GRK6) were specific to the Asymblepharus branch of the gene tree (Table 4 and Supplementary Table S3).
Table 4

Summary of the positive selection analysis for high-altitude candidate genes of a toad-headed agama (Phrynocephalus vlangalii) [15] tested likewise positive in Asymblepharus (A) using BUSTED (B); p-value > 0.05) [55], FUBAR (FB; number of sites under positive selection) [56] and aBSREL (aB) [57,58] methods. PSGs are represented by the last six digits of the anole lizard’s (Anolis carolinensis) ENSEMBL gene and transcript identifiers (starting with ENSACAG00000, or ENSACAT00000, respectively).

GeneIDGenep-Value [15]Gene DescriptionTranscriptBFBaB
000773IL1RAP3.58 × 10−2Interleukin 1 receptor accessory protein000813<0.00 × 10−52A.
000907MICU11.79 × 10−2Mitochondrial calcium uptake 10009091.30 × 10−31yes
001142TARBP11.39 × 10−6TAR RNA binding protein 1001104<0.00 × 10−51yes
002254MIA31.83 × 10−2Melanoma inhibitory activity family member 30022761.93 × 10−21yes
002549RPS21.99 × 10−2Ribosomal protein S20025415.00 × 10−42yes
002995RNF104.38 × 10−2Ring finger protein 100030462.63 × 10−25yes
003987NUP1071.54 × 10−4Nucleoporin 107kDa004158<0.00 × 10−51yes
006133GRK61.07 × 10−2G protein-coupled receptor kinase 60062521.16 × 10−22A.
007074SMC41.56 × 10−3Structural maintenance of chromosomes 40071911.00 × 10−41yes
015860SH3RF14.55 × 10−3SH3 domain containing ring finger 10159688.90 × 10−31yes
Out of the 410 transcripts reported to be under positive selection in lowland frog and lizard species [7], 321 could be identified in Asymblepharus. Of these, 23 (7.2%) transcripts were found to be under positive selection in the Ladakh Ground Skink in all three tests (Table 5). Similarly, a total of 32 (7.1% of 449 transcripts) PSGs of colline, 24 (7.6% of 314 transcripts) of submontane and montane, and 24 (6.9% of 350 transcripts) of subalpine and alpine frog and lizard species were tested to be under positive selection in Asymblepharus. Moreover, out of the 32 parallel PSGs that were identified in both high-elevation frog and lizard species [7], one gene (PGS1) showed positive selection in the Ladakh Ground Skink (Table 5). Analyzing these genes under positive selection for GO terms and pathways reveals an overrepresentation of genes involved in the following processes: low-density lipoprotein receptors and catabolic processes; mitochondrial citrate transmembrane transport; glycolysis and fructose/galactose metabolism; nucleoside phosphate binding; p53 pathway feedback loop (involved in DNA repair), platelet-derived growth (PDGF) factor binding (involved in blood-vessel formation).
Table 5

Summary of the positive selection analysis for candidate genes of lineages of dicroglossid frogs and toad-headed agamas [7] identified across an elevational gradient, tested likewise positive in Asymblepharus (A) using BUSTED (B); p-value > 0.05) [55], FUBAR (FB; number of sites under positive selection) [56] and aBSREL (aB, number of branches with positive selection) [57,58] methods. Ensembl gene and transcript identifier (ENSACAG00000, ENSACAT00000) refers to Anolis carolinensis.

LowlandGene DescriptionTranscriptBFBaB
GeneIDGene
000146PCSK9Proprotein convertase subtilisin/kexin type 90001634.40 × 10−312
000201NPC1NPC intracellular cholesterol transporter 10002615.10 × 10−321
000264LAMP1Lysosomal associated membrane protein 10002472.90 × 10−382
000531SEPT12Septin 120006021.29 × 10−211
000768SYKSpleen associated tyrosine kinase000802<0.00 × 10−513
000798WBP4WW domain binding protein 40008042.00 × 10−231
000955QSOX1Quiescin sulfhydryl oxidase 10009629.20 × 10−331
001070CADM1Cell adhesion molecule 10011881.00 × 10−411
002270KANK1KN motif and ankyrin repeat domains 10022948.00 × 10−422
003015VLDLRVery low-density lipoprotein receptor0030961.66 × 10−231
003460ANKRD12Ankyrin repeat domain 120034841.80 × 10−311
003908MDM1Mdm1 nuclear protein0039232.30 × 10−322
005884CCDC66Coiled-coil domain containing 66025744<0.00 × 10−511
006279GLYR1Glyoxylate reductase 1 homolog006325<0.00 × 10−513
007694ACOX1Acyl-CoA oxidase 1007823<0.00 × 10−581
008005PHACTR2Phosphatase and actin regulator 20080474.30 × 10−351
008420VTA1Vesicle trafficking 10084633.36 × 10−211
009938 010074<0.00 × 10−563
013301COL1A2Collagen type I α 2 chain0136141.10 × 10−3173
013917MSH2MutS homolog 20140761.11 × 10−211
013984LRRCC1Leucine-rich repeat and coiled-coil centrosomal protein 10141002.50 × 10−212
016683TENT2Terminal nucleotidyltransferase 2016777<0.00 × 10−522
017936ATP6V0A1ATPase H+ transporting V0 subunit a10180083.50 × 10−311
Up to 2000 m
Gene ID GeneGene descriptionTranscriptBFBaB
000608FBXL3F-box and leucine-rich repeat protein 30005483.47 × 10−221
000768SYKSpleen associated tyrosine kinas 000802<0.00 × 10013
000837KATNB1Katanin p80 (WD repeat containing) subunit B 10008964.96 × 10−211
000955QSOX1Quiescin sulfhydryl oxidase 10009628.90 × 10−331
002090KIAA0232KIAA0232002069<0.00 × 10052
002091RANBP2RAN binding protein 20021007.00 × 10−321
002556SLC25A1Solute carrier family 25 member 10025435.00 × 10−321
002948RSPH1Radial spoke head component 10029631.59 × 10−221
003975TOGARAM1TOG array regulator of axonemal microtubules 1004004<0.00 × 10−513
003987NUP107Nucleoporin 107004158<0.00 × 10−512
004938RARS1Arginyl-tRNA synthetase 1005003<0.00 × 10−532
005569CCT4Chaperonin containing TCP1 subunit 40056832.00 × 10−421
006189PARP1Poly(ADP-ribose) polymerase 10063564.11 × 10−241
006926CTNND1Catenin delta 10070063.00 × 10−411
007100ZNF277Zinc finger protein 2770071731.17 × 10−233
007489PRDX4Peroxiredoxin 40074982.20 × 10−311
007985POLR3ARNA polymerase III subunit A0080771.37 × 10−211
008005PHACTR2Phosphatase and actin regulator 20080473.90 × 10−351
008475SNTA1Syntrophin α 10085445.20 × 10−351
008991KEAP1Kelch-like ECH-associated protein 10090102.00 × 10−431
009213SWAP70Switching B cell complex subunit SWAP700092428.70 × 10−311
013059JPH1Junctophilin 10130931.09 × 10−241
013301COL1A2Collagen type I α 2 chain0136141.10 × 10−3173
013326ADGRF5Adhesion G protein-coupled receptor F50134054.00 × 10−411
015062COL3A1Collagen type III α 1 chain0155393.00 × 10−2114
015374RANBP17RAN binding protein 170156307.90 × 10−332
015422BAIAP2BAR/IMD-domain-containing adaptor protein 20155404.96 × 10−222
016662PGS1Phosphatidylglycerophosphate synthase 1016740<0.00 × 10−511
017208FLOT1Flotillin 1017291<0.00 × 10−542
017228SLC4A1Solute carrier family 4 member 1 (Diego blood group) 0173454.36 × 10−221
017316PTBP3Polypyrimidine tract binding protein 30174076.00 × 10−423
018003FGFR1Fibroblast growth factor receptor 10180806.00 × 10−421
2000–3500 m
Gene IDGeneGene descriptionTranscriptBFBaB
000306ZNF622Zinc finger protein 6220002918.50 × 10−351
000907MICU1Mitochondrial calcium uptake 10009091.30 × 10−311
001396ABCC3ATP-binding cassette subfamily C member 3001480<0.00 × 10−511
002254MIA3Melanoma inhibitory activity family member 30022762.22 × 10−212
002779CDH1Cadherin 10030311.50 × 10−311
004137STARD13StAR-related lipid transfer domain containing 130042352.00 × 10−431
005084COL1A1Collagen type I α 1 chain0052985.00 × 10−4182
006739RALGAPBRal GTPase-activating protein, β subunit0068039.40 × 10−311
006920NEO1Neogenin 1007074<0.00 × 10−511
006926CTNND1Catenin delta 10070063.00 × 10−411
007694ACOX1Acyl-CoA oxidase 10078230.00 × 10081
007907FLOT2Flotillin 20080150.00 × 10011
008206ADGRG6Adhesion G protein-coupled receptor G60084051.21 × 10−241
009366PLEKHG3Pleckstrin homology and RhoGEF domain containing G30266491.43 × 10−212
010640MYLKMyosin light chain kinase0107353.00 × 10−421
011707FYNFYN proto-oncogene, Src family tyrosine kinase0117605.00 × 10−311
013938SPEGSPEG complex locus0233423.99 × 10−222
014232RBBP5RB binding protein 5014319<0.00 × 10−552
014373CD82Tetraspanin0144582.92 × 10−251
015062COL3A1Collagen type III α 1 chain0155392.92 × 10−2114
015121SLC26A4Solute carrier family 26 member 40152043.02 × 10−221
015894PNNPinin, desmosome associated protein0159731.56 × 10−211
016077 0161331.00 × 10−343
017036NIF3L1NGG1 interacting factor 3 like 10171108.50 × 10−341
3500–4500 m
GeneID GeneGene descriptionTranscriptBFBaB
000773IL1RAPInterleukin 1 receptor accessory protein000813<0.00 × 10−525
001545ZCCHC8Zinc finger CCHC-type containing 80015732.78 × 10−212
002034CCDC138Coiled-coil domain containing 1380020224.00 × 10−411
003612NR3C2Nuclear receptor subfamily 3 group C member 20036964.97 × 10−271
004231COL6A3Collagen type VI α 3 chain004512<0.00 × 10−562
005058RPL11Ribosomal protein L110050761.67 × 10−211
005562PFKMPhosphofructokinase, muscle0059574.83 × 10−232
006776FUSFUS RNA-binding protein0068956.70 × 10−311
006926CTNND1Catenin delta 10070063.00 × 10−411
007250ATP11BATPase phospholipid transporting 11B (putative)0073921.40 × 10−3121
007887ABHD3Abhydrolase domain containing 3, phospholipase0078964.00 × 10−322
009164NADK2NAD kinase 2, mitochondrial0092111.18 × 10−211
009700 0096865.50 × 10−311
009800FLNAFilamin A0102000.00 × 10022
011843SENP7SUMO specific peptidase 70118501.43 × 10−211
013301COL1A2Collagen type I α 2 chain0136141.20 × 10−3173
013313AGAP2ArfGAP with GTPase domain, ankyrin repeat, PH domain 20134553.40 × 10−311
014695RFWD2COP1 E3 ubiquitin ligase0147891.00 × 10−322
014919ALDOAAldolase, fructose-bisphosphate A0149842.99 × 10−221
015785CDK14Cyclin-dependent kinase 140158726.10 × 10−311
016662PGS1Phosphatidylglycerophosphate synthase 1016740<0.00 × 10−511
017054NFIXNuclear factor I X0171369.50 × 10−311
017166FARSAPhenylalanyl-tRNA synthetase subunit α0172392.58 × 10−212
017936ATP6V0A1ATPase H + transporting V0 subunit a10180083.50 × 10−311
Frogs and lizards, common genes at similar elevation
Gene IDGeneGene descriptionTranscriptBFBaB
016662PGS1Phosphatidylglycerophosphate synthase 1016740<0.00 × 10−511

4. Discussion

This study presents the first transcriptome sequences from different tissues of the Ladakh Ground Skink Asymblepharus ladacensis, a high-altitude reptile species endemic to the Greater Himalayas. We provide high-quality de novo transcript assemblies and well-annotated results, enabling comparisons with transcriptomes of related scincid or higher lizards available at public databases. Although squamate reptiles (lizards and snakes) represent one of the most diverse vertebrate groups with over 10,000 species spanning more than 200 million years of evolution [66], genomic data of squamates are limited and still poorly studied [67]. To our knowledge, no annotated transcriptome has been published for the genus Asymblepharus so far. Although our study is mostly descriptive, it has yielded discoveries with respect to genes known to play roles in the adaptation of vertebrates to high elevations and adds data resources for genomic studies in Himalayan herpetofauna. Yang et al. [15] used comparative transcriptomic analyses of two closely related lizards, Phrynocephalus przewalskii from low elevations (500–1500 m a.s.l.) and P. vlangalii from high elevations (2000–4600 m a.s.l.), to identify candidate genes that are potentially linked to adaptation to high elevation environments. In addition, Sun et al. [7] tested amphibian and reptile populations at various altitudes in Tibet, which show parallel evolution. These studies demonstrated convergent and continuous adaptation to high elevations in Anura (Ranidae) and Sauropsida (Agamidae). Genes with related functions, especially DNA repair and energy metabolism, exhibit featured rapid changes and are positively correlated to elevation. These data let us assume that a similar genomic high-elevation selection syndrome might be detectable in Asymblepharus, sampled in 2714 m a.s.l. (Methods), and with a vertical distribution between ~2500 and 5500 m ([26] and references therein). Indeed, we identified a total of 10 out of 143 and 104 out of ~1500 key genes [7,15] under positive selection in the Asymblepharus transcriptome. Interestingly, several of the 10 genes (Table 4) have been reported to be under positive selection or significantly enriched or differentially methylated for pathways consistent with physiological compensation for limited oxygen in high elevation dwellers, e.g., IL1RAP [68] (human), MIA3 [69] (pika), and MICU1 [70] (Ladakhi cow). Several genes we identified to be under positive selection have GO terms that suggest their involvement in, e.g., energy metabolism and DNA repair. It is well-known that high UV radiation and hypoxia are major challenges for organisms in high-altitude habitats. The extreme environments in the Greater Himalayas necessitate high energy metabolism, strong resistance to UV by an efficient DNA repair, and adaptation to hypoxia in species endemic to these mountains. However, the proportion of genes we found in Asymblepharus per ‘altitude-specific’ gene group does not appear to reflect a convergently evolved gene set as previously reported [7]. In other words, we found a comparable number of genes under positive selection from the group of genes identified for lowland species and those identified for colline, montane, or alpine species. Given the idea of convergent evolutionary changes and, thus, a gradual accumulation of high-elevation genetic adaptations, a higher number of candidate genes reported for montane and alpine species would have been expected in Asymblepharus compared to those genes reported in species distributed in lower elevations. The potential reasons for this supposed lack of confirmation of the suggested pattern are complex. A major limitation is that our analyses were restricted so far to a single Asymblepharus transcriptome. Due to limitations of sampling, data were unobtainable from radiation of species or an altitudinal gradient as available for frogs and other lizards [7,15], preventing us from intraspecific comparisons. Moreover, data of a single specimen cannot reflect the breadth of allelic diversity in the selected genes, putatively associated with adaptations to high altitude, especially in the view of wide vertical distribution. Another deficiency results from the fact that only one female but no male could be sequenced. Indeed, many genes that might be sex-specifically expressed might not have been sequenced or characterized with our approach. Therefore, future research with multiple high-elevation species and populations across a larger scale of altitudinal variation should validate genes known to contribute to high elevation adaptation in scincid reptiles and, thus, yield additional evidence for potential convergent evolution. A promising future genomic approach might be to include populations of Asymblepharus between the species‘ lower and upper distributional periphery, sampling three populations at each of four elevation levels (e.g., <2000 m, 3000 m, 4000 m, and 5000 m a.s.l.), to investigate the expression of genes presumably related to adaptions to high altitude. It would also be desirable to reveal potential tissue-specific expression patterns across altitude-associated genes, samples of organs sensitive to UV radiation, and for oxygen, e.g., heart, lung, skin, and embryonic structures, which are particularly of interest. Moreover, to address adaptive convergence, additional comparative transcriptomic analyses on Japalura or Laudakia species might be promising since these taxa have a similarly broad vertical distribution as Asymblepharus and often co-occur with sympatric ground skinks. Asymblepharus might even become an excellent model to study local adaptation by reciprocal transplantation experiments between high and low altitude populations. Such studies could enhance our understanding of how organisms might cope with rapid environmental changes in fluctuating demographic contexts. However, such intensive field studies require adequate access to suitable habitats in the Himalayas and, thus, a much higher logistic and financial effort than available in our pilot study.

5. Conclusions

In summary, our present study provides the first transcriptomic data for a Himalayan reptile of the genus Asymblepharus and evidence for genes under positive selection for high-altitude adaptation of the Ladakh Ground Skink. Further research is encouraged to validate the key genes confirmed in this study by population genetic and functional genomic approaches. Comparative sequencing analyses for other Asymblepharus species may enable further insights into the adaptive basis of reptiles to different altitude environments in the Himalayas. Our data are available for future investigations on the evolution and environmental adaptation in Himalayan high-altitude vertebrates.
  56 in total

1.  Fast and sensitive protein alignment using DIAMOND.

Authors:  Benjamin Buchfink; Chao Xie; Daniel H Huson
Journal:  Nat Methods       Date:  2014-11-17       Impact factor: 28.547

2.  Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection.

Authors:  Martin D Smith; Joel O Wertheim; Steven Weaver; Ben Murrell; Konrad Scheffler; Sergei L Kosakovsky Pond
Journal:  Mol Biol Evol       Date:  2015-02-19       Impact factor: 16.240

3.  BUSCO: Assessing Genome Assembly and Annotation Completeness.

Authors:  Mathieu Seppey; Mosè Manni; Evgeny M Zdobnov
Journal:  Methods Mol Biol       Date:  2019

4.  Species groups distributed across elevational gradients reveal convergent and continuous genetic adaptation to high elevations.

Authors:  Yan-Bo Sun; Ting-Ting Fu; Jie-Qiong Jin; Robert W Murphy; David M Hillis; Ya-Ping Zhang; Jing Che
Journal:  Proc Natl Acad Sci U S A       Date:  2018-10-22       Impact factor: 11.205

5.  Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq.

Authors:  Bingxin Lu; Zhenbing Zeng; Tieliu Shi
Journal:  Sci China Life Sci       Date:  2013-02-08       Impact factor: 6.038

6.  Comparative genomic investigation of high-elevation adaptation in ectothermic snakes.

Authors:  Jia-Tang Li; Yue-Dong Gao; Liang Xie; Cao Deng; Peng Shi; Meng-Long Guan; Song Huang; Jin-Long Ren; Dong-Dong Wu; Li Ding; Zi-Yan Huang; Hu Nie; Devon P Humphreys; David M Hillis; Wen-Zhi Wang; Ya-Ping Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2018-07-31       Impact factor: 11.205

7.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

8.  Blast2GO: A comprehensive suite for functional analysis in plant genomics.

Authors:  Ana Conesa; Stefan Götz
Journal:  Int J Plant Genomics       Date:  2008

9.  Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri.

Authors:  Guo-Dong Wang; Bao-Lin Zhang; Wei-Wei Zhou; Yong-Xin Li; Jie-Qiong Jin; Yong Shao; He-Chuan Yang; Yan-Hu Liu; Fang Yan; Hong-Man Chen; Li Jin; Feng Gao; Yaoguang Zhang; Haipeng Li; Bingyu Mao; Robert W Murphy; David B Wake; Ya-Ping Zhang; Jing Che
Journal:  Proc Natl Acad Sci U S A       Date:  2018-05-14       Impact factor: 11.205

10.  Molecular systematics of the Philippine forest skinks (Squamata: Scincidae: Sphenomorphus): testing morphological hypotheses of interspecific relationships.

Authors:  Charles W Linkem; Arvin C Diesmos; Rafe M Brown
Journal:  Zool J Linn Soc       Date:  2011-11-25       Impact factor: 3.286

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.