
Comprehensive transcriptome data for endemic Schizothoracinae fish in the Tibetan Plateau.

Chaowei Zhou1,2, Shijun Xiao1,3, Yanchao Liu1, Zhenbo Mou1, Jianshe Zhou1, Yingzi Pan1, Chi Zhang1, Jiu Wang1, Xingxing Deng2, Ming Zou3, Haiping Liu4.   

Abstract

The Schizothoracinae fishes, endemic to the Tibetan Plateau, are considered ideal models for investigating highland adaptation and speciation. Although several transcriptome studies of highland fishes have been reported, transcriptome information for Schizothoracinae is still lacking. To obtain comprehensive transcriptome data for Schizothoracinae, the transcriptomes of 183 samples from 14 representative Schizothoracinae species were sequenced and de novo assembled. As a result, about 1,363 Gb of clean transcriptome data were obtained. After assembly, we obtained 76,602-154,860 unigenes per species, with sequence N50 lengths of 1,564-2,143 bp. In most species, more than half of the unigenes were functionally annotated against public databases. The Schizothoracinae fishes in this work exhibit diverse ecological distributions, phenotypic characters and feeding habits; therefore, the comprehensive transcriptome data of these species provide valuable information for studying the environmental adaptation and speciation of Schizothoracinae in the Tibetan Plateau.

Year:  2020        PMID: 31964888      PMCID: PMC6972879          DOI: 10.1038/s41597-020-0361-6

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The Tibetan Plateau, the world’s largest and highest plateau, has unique geographical and climatic characteristics, such as high altitude, large differences between day and night temperatures, and strong solar radiation[1]. Owing to this special geographical environment, many highland species distributed in and around the Tibetan Plateau have gradually evolved unique characteristics that allow them to tolerate harsh living conditions[2]. The Schizothoracinae fishes, members of the family Cyprinidae, are endemic to the Asian highlands and comprise 15 genera and ca. 100 species[3]. In China, more than 70 species, accounting for over 80% of the world’s Schizothoracine fishes, are mainly distributed in the lakes and rivers of the Tibetan Plateau and adjacent areas[4]. According to their morphological characteristics, the Schizothoracine fishes can be divided into three groups: the primitive group, the specialized group and the highly specialized group[5]. Several studies on the morphology, archaeology and molecular biology of Schizothoracine fishes on the Tibetan Plateau have shown that species diversity is closely correlated with the uplift of the Tibetan Plateau[6,7], and that morphological traits of Schizothoracine fishes, such as pharyngeal teeth, scales and whiskers, are related to specific periods in the geological evolution of the Tibetan Plateau[5]. Therefore, the Schizothoracine fishes are considered good model species for investigating highland adaptation and speciation. More genomic and transcriptome data are required to decipher the relationship between speciation and the uplift of the Tibetan Plateau for the Schizothoracine fishes. Recent advances in sequencing technologies have made it possible to obtain the genomes of numerous highland animals, enabling us to better understand the adaptive evolution mechanisms of highland fish species.
So far, the vast majority of genomic studies of environmental adaptation have been performed on highland terrestrial animals (e.g., yak[8] and Tibetan antelope[9]). Few studies have been reported on highland fish, especially Schizothoracinae fishes. One of the major reasons is the complexity of their genomes, such as a high content of repeats and polyploidy[10]. Transcriptome sequencing is a good way to construct a sequence dataset of transcribed genes in many polyploid cases[11]. Although several transcriptome analyses of highland adaptation have been reported for Schizothoracine fishes[12-16], the species and tissues used for transcriptome sequencing are still limited. There is a great demand for more transcriptome sequencing data to study the adaptation and evolution of Schizothoracine fishes in the Tibetan Plateau. In this work, we obtained and released a total of ∼1.36 Tb of high-quality transcriptome data for 183 samples of 14 representative Schizothoracine fish species, covering 5 genera from 6 drainage systems and 3 lakes in the Tibetan Plateau (Tables 1, 2 and Fig. 1). The differences in distribution, ecological position and phenotype make the transcriptomes of these Schizothoracine species invaluable genetic resources for studying the adaptation and speciation of endemic fish in the Tibetan Plateau.
Table 1

Sample information for the species in the study.

| Genus | Species | Abbreviation | Geographic region | Drainage | Pairs of whiskers | Body scales |
|---|---|---|---|---|---|---|
| Schizothorax | S. oconnori | Soco | Gongga, Tibet, China | Yarlung Zangbo River | 2 | small scale |
| Schizothorax | S. lissolabiatus | Slis | Changdu, Tibet, China | Lancang River | 2 | small scale |
| Schizothorax | S. nukiangensis | Snuk | Bomi, Tibet, China | Nujiang River | 2 | small scale |
| Schizothorax | S. plagiostomus | Spla | Ali, Tibet, China | Shiquan River | 2 | small scale |
| Schizothorax | S. labiatus | Slab | Ali, Tibet, China | Shiquan River | 2 | small scale |
| Schizothorax | S. davidi | Sdav | Ganzi, Sichuan, China | Jinsha River | 2 | small scale |
| Ptychobarbus | P. kaznakovi | Pkaz | Changdu, Tibet, China | Lancang River | 1 | moderate degeneration |
| Gymnocypris | G. namensis | Gnam | Bange, Tibet, China | Lake Namtso | 0 | absence |
| Gymnocypris | G. przewalskii | Gprz | Haibei, Qinghai, China | Lake Qinghai | 0 | absence |
| Gymnocypris | G. eckloni | Geck | Xunhua, Qinghai, China | Yellow River | 0 | absence |
| Gymnocypris | G. selincuoensis | Gsel | Bange, Tibet, China | Lake Siling Co | 0 | absence |
| Schizopygopsis | S. younghusbandi | Syou | Lazi, Tibet, China | Yarlung Zangbo River | 0 | absence |
| Schizopygopsis | S. pylzovi | Spyl | Xunhua, Qinghai, China | Yellow River | 0 | absence |
| Platypharodon | P. extremus | Pext | Gonghe, Qinghai, China | Yellow River | 0 | absence |

"Pairs of whiskers" and "Body scales" are partial morphological features.
Table 2

Samples collected for transcriptome sequencing.

Number of samples per species:

| Species | Total samples |
|---|---|
| S. oconnori | 15 |
| S. lissolabiatus | 14 |
| S. nukiangensis | 13 |
| S. plagiostomus | 12 |
| S. labiatus | 13 |
| S. davidi | 14 |
| P. kaznakovi | 13 |
| G. namensis | 13 |
| G. przewalskii | 13 |
| G. eckloni | 14 |
| G. selincuoensis | 13 |
| S. younghusbandi | 12 |
| S. pylzovi | 12 |
| P. extremus | 12 |
| Total | 183 |

Number of samples per tissue (summed over the 14 species):

| Tissue | Total samples |
|---|---|
| Muscle | 14 |
| Liver | 14 |
| Spleen | 14 |
| Skin | 13 |
| Swim bladder | 13 |
| Gut | 14 |
| Eye | 14 |
| Gill | 14 |
| Kidney | 14 |
| Heart | 13 |
| Brain | 13 |
| Gonads | 13 |
| Vibrissa | 7 |
| Fat | 3 |
| Blood | 10 |
| Total | 183 |

The species names correspond to the abbreviations in Table 1. In the source table, each sequenced tissue is marked with 1 and a short line marks a tissue that was not sequenced for that species; only the row and column totals are listed here.

Fig. 1

Sampling sites of the 14 Schizothoracine species in our study. The abbreviations of species are identical to those in Table 1. The altitude is represented by the color bar, from white (high altitude) to green (low altitude).


Methods

Sample collection

To select representative Schizothoracine species for our study, we chose 14 species from 5 genera representing the three groups (primitive, specialized and highly specialized) defined in a previous morphological study[5]. The primitive group in our study contains 6 species of the genus Schizothorax: Schizothorax oconnori (S. oconnori), Schizothorax lissolabiatus (S. lissolabiatus), Schizothorax nukiangensis (S. nukiangensis), Schizothorax plagiostomus (S. plagiostomus), Schizothorax labiatus (S. labiatus) and Schizothorax davidi (S. davidi). The specialized group contains Ptychobarbus kaznakovi (P. kaznakovi) of the genus Ptychobarbus. The highly specialized group contains 7 species in 3 genera: Gymnocypris namensis (G. namensis), Gymnocypris przewalskii (G. przewalskii), Gymnocypris eckloni (G. eckloni) and Gymnocypris selincuoensis (G. selincuoensis) of the genus Gymnocypris; Schizopygopsis younghusbandi (S. younghusbandi) and Schizopygopsis pylzovi (S. pylzovi) of the genus Schizopygopsis; and Platypharodon extremus (P. extremus) of the genus Platypharodon. The samples were collected from six major rivers and three lakes of the Tibetan Plateau: the Yarlung Zangbo River, Shiquan River, Lancang River, Nujiang River, Jinsha River, Yellow River, Lake Namtso, Lake Qinghai and Lake Siling Co (Fig. 1 and Table 1). The Schizothoracine species in this work exhibit obvious morphological diversification, especially in whiskers and scales. For example, Gymnocypris, Schizopygopsis and Platypharodon species are naked, while small scales are observed in the genera Schizothorax and Ptychobarbus (Table 1). All individuals were anesthetized with MS-222 (Solarbio, Beijing, China) for a few minutes before sample collection. A total of 183 tissue samples were collected from the 14 representative Schizothoracine fish species in our study, including muscle, liver, spleen, gonads, skin, swim bladder, gut, eye, gill, kidney, heart, brain, blood, fat and vibrissa (Table 2).
All tissues were immediately frozen in liquid nitrogen after dissection and then stored at −80 °C until total RNA isolation.

RNA extraction and sequencing

Total RNA was isolated from each sample using RNAiso Plus (TaKaRa, Dalian, China) according to the manufacturer’s instructions, and RNA sample integrity was assessed with a photometer (Thermo Scientific, USA). RNA samples passing the quality criteria (see Technical Validation for details) were used for library preparation and RNA sequencing. All samples were sequenced on an Illumina HiSeq X Ten platform in 150 bp paired-end mode. In the present study, a total of more than 10 billion raw paired-end reads were obtained from all libraries. After removing adaptor sequences, contaminated reads and poor-quality reads, we obtained approximately 1.4 Tb of clean data, with Q20 base percentages above 96.94%. On average, 7.6 Gb of sequencing data were obtained per sample (Supplementary Table S1). The transcriptome data for Oxygymnocypris stewartii of the genus Oxygymnocypris, reported in our previous study[17], were also used for comparison in this work.

De novo assembly of fish transcriptome

We first used the publicly available program Trinity version 2.5.1[18] with default parameters for de novo assembly of fish transcripts. Contigs shorter than 200 bp in each assembly were discarded from subsequent analysis. Next, redundancy among the transcripts of each species was eliminated using the CD-HIT-EST program included in the cd-hit-v4.6.6 package[19], with parameters -c 0.98 -n 11 -d 0 -M 0 -T 8, and the longest transcript in each cluster was taken as the unigene. After assembly, the unigene numbers for the 15 Schizothoracine species (the 14 species sequenced here plus O. stewartii) ranged from 76,602 to 154,860 (Table 3). Of the 14 species sequenced here, the highest number of unigenes was observed in P. kaznakovi and the lowest in S. labiatus. The GC contents of the transcripts were rather stable across species, at around 40–42%. The N50 length of unigenes ranged from 1,564 to 2,143 bp, with an average of 1,250 bp for all fish transcriptomes. As shown in Fig. 2, the unigene length distribution is comparable for all Schizothoracine species, and the average length ranged from 1,120 to 1,392 bp.
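The two assembly statistics used above, the 200 bp length cutoff and the N50 length, can be sketched in a few lines of Python. This is only an illustration of the definitions, not the authors' pipeline: the function names and the toy contig lengths are assumptions introduced here.

```python
from typing import Iterable, List


def filter_short(lengths: Iterable[int], min_len: int = 200) -> List[int]:
    """Keep contigs of at least min_len bp (contigs < 200 bp are discarded in the text)."""
    return [n for n in lengths if n >= min_len]


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths, made up for illustration only.
toy_lengths = [150, 300, 400, 500, 800, 1000]
kept = filter_short(toy_lengths)
print(kept, n50(kept))
```

In practice the lengths would be read from the Trinity or CD-HIT-EST FASTA output; that parsing step is left out to keep the sketch self-contained.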
Table 3

The statistics of the de novo transcriptome assembly.

| Species | Total size (Mb) | GC (fraction) | Unigene number | Unigene N50 (bp) | Unigene longest (bp) | Transcript number | Transcript N50 (bp) | Transcript longest (bp) |
|---|---|---|---|---|---|---|---|---|
| S. oconnori | 117.00 | 0.415 | 88,676 | 1,948 | 36,581 | 831,353 | 1,527 | 36,694 |
| S. lissolabiatus | 104.06 | 0.422 | 79,073 | 1,946 | 33,187 | 667,802 | 1,573 | 33,187 |
| S. nukiangensis | 107.46 | 0.419 | 84,638 | 1,835 | 30,806 | 743,518 | 1,420 | 30,806 |
| S. plagiostomus | 98.95 | 0.419 | 83,169 | 1,725 | 17,902 | 736,405 | 1,255 | 17,910 |
| S. labiatus | 99.98 | 0.416 | 76,602 | 1,905 | 43,720 | 670,792 | 1,432 | 43,720 |
| S. davidi | 109.44 | 0.42 | 83,757 | 2,043 | 24,328 | 689,222 | 1,589 | 24,340 |
| P. kaznakovi | 173.48 | 0.409 | 154,860 | 1,564 | 77,434 | 1,363,461 | 1,198 | 77,434 |
| G. namensis | 107.09 | 0.415 | 84,464 | 1,825 | 23,933 | 813,474 | 1,294 | 23,933 |
| G. przewalskii | 105.49 | 0.413 | 78,762 | 1,974 | 28,230 | 751,137 | 1,409 | 28,231 |
| G. eckloni | 113.00 | 0.412 | 87,248 | 1,891 | 23,925 | 849,836 | 1,411 | 23,925 |
| G. selincuoensis | 122.36 | 0.406 | 106,851 | 1,588 | 25,730 | 1,187,251 | 914 | 25,730 |
| S. younghusbandi | 101.23 | 0.414 | 81,029 | 1,820 | 23,570 | 723,624 | 1,329 | 23,570 |
| S. pylzovi | 97.96 | 0.418 | 80,542 | 1,724 | 26,467 | 751,215 | 1,202 | 26,467 |
| P. extremus | 101.78 | 0.417 | 85,919 | 1,674 | 24,119 | 843,423 | 1,122 | 24,119 |
| O. stewartii# | 106.52 | 0.422 | 77,069 | 2,143 | 25,942 | 639,444 | 1,920 | 25,942 |

Note that the total size is the total number of bases in all transcripts for a species.

#The transcriptome data for Oxygymnocypris stewartii were reported in our previous study[17].

Fig. 2

Length distribution of unigenes for all species.

The assembled transcriptome sequences were analyzed with the BUSCO pipeline. BUSCO is generally used to evaluate the completeness of genome assemblies; we applied BUSCO version 3.0.2 to assess the quality of the transcriptome assemblies in our work. We found that more than 98% of the 2,586 vertebrate BUSCO genes were detected in our transcriptomes and that 85–92% were identified as complete, depending on the species (Fig. 3), suggesting that the transcriptomes represent a rather high level of completeness for conserved genes. We also found a high fraction of duplicated BUSCO genes in all species (Fig. 3), which is consistent with the fact that the majority of the Schizothoracine fishes are polyploid.
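As a rough illustration of how the "detected" and "complete" percentages relate, the sketch below parses a BUSCO one-line summary, assuming the C/S/D/F/M notation that BUSCO writes in its short summary file. The function name is hypothetical, the example line is made up rather than taken from the paper, and "detected" is taken here to mean complete plus fragmented, which is an interpretation of the text.

```python
import re
from typing import Dict

# One-line BUSCO notation: C = complete, S = single-copy, D = duplicated,
# F = fragmented, M = missing, n = number of BUSCO groups searched.
_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Extract the percentages from a BUSCO one-line summary."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    stats: Dict[str, float] = {key: float(match.group(key)) for key in "CSDFM"}
    stats["n"] = int(match.group("n"))
    # Assumption: "detected" = complete + fragmented.
    stats["detected"] = round(stats["C"] + stats["F"], 1)
    return stats


# Made-up example line, not a result from the paper.
example_line = "C:90.0%[S:40.0%,D:50.0%],F:8.5%,M:1.5%,n:2586"
stats = parse_busco_summary(example_line)
print(stats)
```

A large D value relative to S in such a line is the pattern described above as a high fraction of duplicated BUSCO genes.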
Fig. 3

BUSCO statistics of the assembled transcripts for each species. The rates of single-copy, duplicated, fragmented and missing BUSCO genes are colored purple, blue, green and pink, respectively.


Functional annotation of transcriptome

To annotate the assembled unigenes, we searched for sequences homologous to each unigene in four publicly available functional databases (BLASTX search with an E-value cutoff of 1 × 10^−10): the NCBI non-redundant protein database (NR), Swiss-Prot, the KEGG pathway database and the KOG database. Only the best hit, with the highest sequence homology, was used for annotation. Gene ontology (GO) term analysis of the predicted proteins, based on the NR hits, was then performed with Blast2GO version 3.1 with default parameters. We found that at least 40.2% of the unigenes in each species were annotated based on proteins in the four public databases (Table 4 and Supplementary Fig. S1). We also found a higher match efficiency for longer assembled unigenes (≥2,000 bp) than for shorter unigenes (≤500 bp) during annotation; the same result has been reported in other animals[20].
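The best-hit selection described above can be sketched as follows. The sketch assumes BLAST's standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) and takes "best" to mean the highest bit score among hits passing the E-value cutoff; both are assumptions, since the text does not specify the output format or the tie-breaking rule. The function name and the example hits are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple

Hit = Tuple[str, float, float]  # (subject id, E-value, bit score)


def best_hits(rows: Iterable[Sequence[str]], max_evalue: float = 1e-10) -> Dict[str, Hit]:
    """Keep, for each query, the passing hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # fails the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up tabular hits for two hypothetical unigenes.
example = "\n".join(
    "\t".join(fields)
    for fields in [
        ["u1", "protA", "90.0", "100", "10", "0", "1", "300", "1", "100", "1e-50", "200"],
        ["u1", "protB", "95.0", "100", "5", "0", "1", "300", "1", "100", "1e-60", "230"],
        ["u2", "protC", "40.0", "50", "30", "0", "1", "150", "1", "50", "1e-5", "45"],
    ]
)
hits = best_hits(csv.reader(io.StringIO(example), delimiter="\t"))
print(hits)
```

Here the second hit wins for `u1`, and `u2` is left unannotated because its only hit does not pass the cutoff.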
Table 4

Functional annotation summary for species.

| Species | NR | Swiss-Prot | KOG | GO | KEGG | Total | Ratio |
|---|---|---|---|---|---|---|---|
| S. oconnori | 45,296 | 29,701 | 40,793 | 28,842 | 28,816 | 46,972 | 52.97% |
| S. lissolabiatus | 45,091 | 30,793 | 41,064 | 30,203 | 29,922 | 46,516 | 58.83% |
| S. nukiangensis | 46,557 | 31,077 | 42,380 | 30,450 | 30,185 | 48,122 | 56.86% |
| S. plagiostomus | 49,111 | 33,194 | 44,034 | 34,896 | 32,267 | 51,264 | 61.64% |
| S. labiatus | 43,749 | 29,702 | 39,846 | 28,668 | 28,837 | 44,956 | 58.69% |
| S. davidi | 47,898 | 32,467 | 42,544 | 35,628 | 31,610 | 50,962 | 60.85% |
| P. kaznakovi | 58,392 | 34,174 | 49,960 | 33,669 | 33,253 | 62,216 | 40.18% |
| G. namensis | 44,310 | 29,970 | 40,147 | 28,721 | 29,102 | 45,732 | 54.14% |
| G. przewalskii | 43,104 | 29,502 | 39,141 | 28,524 | 28,628 | 44,387 | 56.36% |
| G. eckloni | 45,847 | 31,699 | 41,648 | 30,754 | 30,813 | 47,353 | 54.27% |
| G. selincuoensis | 49,768 | 32,165 | 44,381 | 31,049 | 31,239 | 51,828 | 48.50% |
| S. younghusbandi | 46,369 | 33,008 | 42,487 | 31,612 | 32,070 | 47,533 | 58.66% |
| S. pylzovi | 44,777 | 31,296 | 41,101 | 30,088 | 30,408 | 46,094 | 57.23% |
| P. extremus | 46,694 | 32,136 | 42,756 | 30,766 | 31,231 | 48,074 | 55.95% |
| O. stewartii# | 43,212 | 29,426 | 38,495 | 32,099 | 28,597 | 46,009 | 59.70% |

The numbers of hits for NR, Swiss-Prot, KOG, GO and KEGG are summarized. The ratio is the percentage of annotated unigenes relative to the total number of assembled unigenes.

#The transcriptome data for Oxygymnocypris stewartii were reported in our previous study[17].
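The ratio column can be checked directly from the two tables: it is the Total column of Table 4 divided by the unigene sequence number in Table 3. A minimal sketch, using the values for two species as printed in those tables (the function name is introduced here for illustration):

```python
def annotation_ratio(annotated: int, unigenes: int) -> float:
    """Percentage of unigenes with a hit in at least one database."""
    if unigenes <= 0:
        raise ValueError("unigene count must be positive")
    return 100.0 * annotated / unigenes


# (annotated unigenes from Table 4, unigene number from Table 3)
counts = {
    "S. oconnori": (46_972, 88_676),
    "P. kaznakovi": (62_216, 154_860),
}
for species, (annotated, unigenes) in counts.items():
    print(f"{species}: {annotation_ratio(annotated, unigenes):.2f}%")
```

Rounded to two decimals, these reproduce the 52.97% and 40.18% listed in Table 4; the latter is the minimum that the text rounds to 40.2%.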


Data Records

The transcriptome sequencing and assembly data for all samples were deposited in public repositories. The transcriptome sequencing data generated in this work were deposited in the NCBI Sequence Read Archive as SRP186751[21]. The assemblies were deposited in TSA as GHYM00000000[22], GHYL00000000[23], GHYK00000000[24], GHYJ00000000[25], GHYI00000000[26], GHYH00000000[27], GHYG00000000[28], GHYF00000000[29], GHYE00000000[30], GHYD00000000[31], GHYC00000000[32], GHYB00000000[33], GHYA00000000[34], GIBO00000000[35] and GHXZ00000000[36]. The transcriptome annotation information and the predicted coding and protein sequences for unigenes were uploaded to figshare[37].

Technical Validation

RNA integrity

The transcriptomes of twelve tissues from three fish individuals were sequenced. Before constructing the RNA-Seq libraries, the concentration and quality of total RNA were evaluated using a NanoVue Plus spectrophotometer (GE Healthcare, NJ, USA). The total amount of RNA, the RNA integrity number and the rRNA ratio were used to estimate the quality, content and degradation level of the RNA samples. In the present study, RNA samples with a total RNA amount ≥ 10 μg, an RNA integrity number ≥ 8 and an rRNA ratio ≥ 1.5 were used to construct the sequencing libraries.
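The three acceptance thresholds can be written as a simple predicate. The sketch below is illustrative only: the record type, field names and sample values are hypothetical, and only the thresholds themselves come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RnaSample:
    name: str
    total_ug: float    # total RNA amount in micrograms
    rin: float         # RNA integrity number
    rrna_ratio: float  # rRNA ratio as reported by the instrument


def passes_qc(sample: RnaSample,
              min_total_ug: float = 10.0,
              min_rin: float = 8.0,
              min_rrna_ratio: float = 1.5) -> bool:
    """Return True if the sample meets all three thresholds stated in the text."""
    return (sample.total_ug >= min_total_ug
            and sample.rin >= min_rin
            and sample.rrna_ratio >= min_rrna_ratio)


# Made-up samples for illustration.
samples = [
    RnaSample("liver_01", total_ug=25.0, rin=9.1, rrna_ratio=1.8),
    RnaSample("fat_01", total_ug=6.0, rin=8.5, rrna_ratio=1.7),
]
accepted = [s.name for s in samples if passes_qc(s)]
print(accepted)
```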

Quality filtering of Illumina sequencing raw reads

The raw sequencing reads generated on the Illumina platform were rigorously cleaned using the following procedure, as in a previous study[38]. First, adaptors in the reads were filtered out; second, reads with more than 10% N bases were filtered out; third, reads in which more than 50% of the bases were of low quality (Phred quality score ≤ 5) were filtered out. If either read of a pair was classified as low quality, both reads were discarded. The raw sequencing reads were also evaluated for quality distribution, GC content distribution, base composition, average quality score at each position and other metrics.

Supplementary Table S1; Supplementary Figure S1
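The second and third filtering rules, together with the paired-end rule, can be sketched as below. This is a minimal illustration rather than the tool actually used: adaptor removal is not covered, the function names are hypothetical, and the Phred+33 quality encoding is an assumption about the FASTQ files.

```python
from typing import List, Sequence, Tuple

Read = Tuple[str, Sequence[int]]  # (bases, per-base Phred scores)


def decode_phred33(quality_string: str) -> List[int]:
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - 33 for char in quality_string]


def read_passes(seq: str, quals: Sequence[int],
                max_n_frac: float = 0.10,
                low_q: int = 5,
                max_low_frac: float = 0.50) -> bool:
    """Reject reads with more than 10% N bases or more than 50% bases with Phred <= 5."""
    if not seq or len(seq) != len(quals):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def pair_passes(read1: Read, read2: Read) -> bool:
    """A pair is kept only if both of its reads pass."""
    return read_passes(*read1) and read_passes(*read2)


# Made-up reads for illustration.
good = ("ACGTACGTAC", [30] * 10)
many_n = ("NNACGTACGT", [30] * 10)              # 20% N bases
low_quality = ("ACGTACGTAC", [2] * 6 + [30] * 4)  # 60% low-quality bases
print(pair_passes(good, good), pair_passes(good, many_n), pair_passes(low_quality, good))
```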
Measurement(s): RNA • transcriptome • sequence_assembly • sequence feature annotation
Technology Type(s): RNA sequencing • sequence assembly process • sequence annotation
Sample Characteristic - Organism: Schizothorax oconnori • Schizothorax lissolabiata • Schizopyge nukiangensis • Schizothorax plagiostomus • Schizothorax labiatus • Schizothorax davidi • Ptychobarbus kaznakovi • Gymnocypris namensis • Gymnocypris przewalskii • Gymnocypris eckloni • Gymnocypris selincuoensis • Schizopygopsis younghusbandi • Schizopygopsis pylzovi • Platypharodon extremus • Oxygymnocypris stewartii
Sample Characteristic - Environment: lake • drainage basin
Sample Characteristic - Location: Tibetan Plateau
References

1.  Transcriptional response to heat shock in liver of snow trout (Schizothorax richardsonii)--a vulnerable Himalayan Cyprinid fish.

Authors:  Ashoktaru Barat; Prabhati Kumari Sahoo; Rohit Kumar; Chirag Goel; Atul Kumar Singh
Journal:  Funct Integr Genomics       Date:  2016-01-26       Impact factor: 3.410

2.  The sequence and de novo assembly of Oxygymnocypris stewartii genome.

Authors:  Hai-Ping Liu; Shi-Jun Xiao; Nan Wu; Di Wang; Yan-Chao Liu; Chao-Wei Zhou; Qi-Yong Liu; Rui-Bin Yang; Wen-Kai Jiang; Qi-Qi Liang; Chi Zhang; Jun-Hua Gong; Xiao-Hui Yuan; Zhen-Bo Mou
Journal:  Sci Data       Date:  2019-02-05       Impact factor: 6.444
