Dataset of the transcriptomes of Urechis unicinctus to identify differentially expressed genes (DEGs) under different temperature and exposure to open air.

Xudong Jiao1,2, Jiaxin Shi3, Song Qin1,2, Dong Huang1,4, Yinchu Wang1,2.   

Abstract

Urechis unicinctus contains a wide range of bioactive polypeptides and has high edible, economic and medicinal value. Artificial breeding is imperative and represents the key technical breakthrough. However, seedling transport has become a primary concern, which makes it essential to understand how Urechis unicinctus responds to various conditions. We compared the transcriptomes of Urechis unicinctus under desiccation, ultraviolet irradiation and different temperatures. The dataset was generated with the Illumina HiSeq X Ten system and will help in understanding the adaptation of Urechis unicinctus to changing temperature (low, high and room temperature) and exposure to open air (ultraviolet and desiccation). The transcriptomes were assembled using the isoform sequencing (Iso-seq) method. The functions of the expressed genes were annotated and categorized, and the DEGs are presented.
© 2021 The Authors.

Keywords:  RNA-seq; Transcriptome assembly; Urechis unicinctus

Year:  2021        PMID: 33842678      PMCID: PMC8020418          DOI: 10.1016/j.dib.2021.106941

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

These data show RNA-seq results of Urechis unicinctus under ultraviolet and desiccation treatments and changing temperature, providing new insights into the biological pathways of autolytic phenomena. These data are a useful resource for scientific communities working on the transcriptome of Urechis unicinctus, and of invertebrates more generally, as well as on animal stress biology, to understand specific and common stress response pathways. The functional analysis data can be used in future studies to anticipate the biological pathways of Urechis unicinctus when the temperature changes or when it is exposed to open air.

Data Description

Total RNA was extracted separately from five groups exposed to ultraviolet irradiation, desiccation, and high, low and room temperature. SMRTbell libraries were constructed after optimized polymerase chain reaction (PCR) amplification and sequenced on the PacBio Sequel platform (Iso-seq); sequencing was also performed on the Illumina HiSeq X Ten platform. Because the single-base error rate was irregular, multiple corrections were necessary. LoRDEC [1], a high-precision tool that uses hybrid error correction, was used to correct the third-generation sequencing data from PacBio. The results of the comparison are shown in Table 1.

The Illumina HiSeq platform produced the raw image data files. These files were converted into sequenced reads by CASAVA base calling and stored in FASTQ format [2]. The clean data (829,237,442 bp) were obtained by filtering with the NGS QC Toolkit v2.3.3; the main steps were adapter removal and elimination of low-quality bases. A total of 311,403 non-redundant unigenes, with an average length of 1625.39 bp, were annotated and classified by function. We used Trinity to assemble the clean reads into a reference transcriptome (ref), after which all clean reads were mapped to the ref with RSEM [3]. The result was approximately 101 million RNA-seq reads (Table 2). After redundancy removal, the assembly was annotated against the NCBI Nr protein database and categorized by function.

We used read counts for the DEG analysis. DESeq2 was adopted because the samples had biological replicates, with filtering thresholds of padj < 0.05 and |log2FoldChange| > 1. Additionally, a binomial distribution method was used to perform independent statistical hypothesis tests, which tends to produce a high overall number of false positives. The p-values obtained from the original hypothesis tests therefore needed to be corrected.
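The DEG filtering step can be illustrated with a short sketch. It assumes the p-value correction is the Benjamini-Hochberg procedure, which is the DESeq2 default; the text does not name the method, so this is an assumption. The gene identifiers and numbers are made up for illustration, and the sketch omits the independent filtering that DESeq2 also applies, so it will not reproduce DESeq2 output exactly.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    gene_id: str             # hypothetical identifier, for illustration only
    log2_fold_change: float  # treatment vs. control (RT)
    p_value: float           # raw p-value from the per-gene test


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results: List[GeneResult],
                padj_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> List[str]:
    """Keep genes with padj < cutoff and |log2FoldChange| > cutoff."""
    padj = benjamini_hochberg([r.p_value for r in results])
    return [r.gene_id for r, q in zip(results, padj)
            if q < padj_cutoff and abs(r.log2_fold_change) > lfc_cutoff]


if __name__ == "__main__":
    toy = [
        GeneResult("g1", 2.5, 0.001),
        GeneResult("g2", -1.8, 0.01),
        GeneResult("g3", 0.4, 0.02),   # significant, but fold change too small
        GeneResult("g4", 3.0, 0.2),    # large fold change, not significant
    ]
    print(select_degs(toy))  # ['g1', 'g2']
```

Both thresholds are applied to the adjusted value (padj), not to the raw p-value, which is what keeps the overall false-positive rate under control when many genes are tested at once.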
Table 1

Statistics of length distribution before and after transcript corrections.

Sample  Type            Total nucleotides  Total_number  Mean length  Min length  Max length  N50   N90
UU2019  Before correct  505,187,047        216,918       2329         177         14,121      2518  1493
UU2019  After correct   504,952,370        216,918       2328         176         14,194      2516  1492

Sample: the name of the sample.

Type: the state of correction.

Total nucleotides: the number of bases of the consensus.

Total_number: the number of the consensus.

Mean length: the average length of the consensus.

Min length: the minimum length of the consensus.

Max length: the maximum length of the consensus.

N50/N90: the consensus sequences are ranked by length from longest to shortest and their lengths are added up; N50 (N90) is the length of the sequence at which the running total is first no less than 50% (90%) of the total length of the consensus.
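The N50/N90 definition above can be sketched in a few lines of Python. The function name `nx_length` and the toy lengths are illustrative assumptions, not part of the dataset or of any tool used by the authors.

```python
from typing import Iterable


def nx_length(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx length (fraction=0.5 for N50, 0.9 for N90)."""
    ordered = sorted(lengths, reverse=True)  # longest sequence first
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First sequence at which the cumulative length reaches the threshold.
        if running >= threshold:
            return length
    return ordered[-1]


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths, total 54
    print(nx_length(toy_lengths, 0.5))  # 8
    print(nx_length(toy_lengths, 0.9))  # 4
```

Because N90 requires accumulating more of the total length, it is never larger than N50, which matches the values reported in Table 1.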

Table 2

Read alignment summary of transcriptomes of Urechis unicinctus under desiccation, ultraviolet and high, low and room temperature.

Sample  Raw reads   Clean reads  Clean bases  Error(%)  Q20(%)  Q30(%)  GC(%)  Total mapped
DRY_1   49,895,240  48,303,224   7.25 G       0.02      98.36   95.09   47.73  43,022,064 (89.07%)
DRY_2   59,168,576  57,459,078   8.62 G       0.02      98.33   95.00   47.13  50,889,032 (88.57%)
DRY_3   53,897,626  51,886,052   7.78 G       0.02      98.30   94.98   47.04  45,759,512 (88.19%)
UV_1    56,072,498  54,080,562   8.11 G       0.02      98.20   94.73   46.58  47,364,024 (87.58%)
UV_2    59,620,598  58,202,014   8.73 G       0.02      98.36   95.09   47.26  51,773,516 (88.95%)
UV_3    62,171,266  60,422,202   9.06 G       0.02      98.39   95.15   47.18  53,806,486 (89.05%)
RT_1    46,423,864  45,102,198   6.77 G       0.02      98.16   94.64   47.41  39,850,676 (88.36%)
RT_2    61,925,716  60,173,960   9.03 G       0.02      98.13   94.54   47.08  53,398,352 (88.74%)
RT_3    52,836,082  51,009,134   7.65 G       0.03      97.91   94.07   46.86  45,008,268 (88.24%)
HT_1    60,880,848  59,486,478   8.92 G       0.02      98.37   95.11   46.92  52,707,958 (88.60%)
HT_2    63,003,754  61,551,050   9.23 G       0.02      98.35   95.10   47.18  54,330,180 (88.27%)
HT_3    57,418,292  55,689,904   8.35 G       0.02      98.33   95.01   47.05  49,213,370 (88.37%)
LT_1    51,508,474  50,009,866   7.5 G        0.03      97.47   93.07   46.87  44,093,334 (88.17%)
LT_2    58,596,068  57,487,684   8.62 G       0.03      97.43   92.96   46.92  50,676,846 (88.15%)
LT_3    59,399,558  58,374,036   8.76 G       0.02      98.43   95.30   47.39  51,741,846 (88.64%)

Q20, Q30: Proportion of bases with Qphred > 20 and > 30, respectively (Qphred = −10 log10(e), where e is the sequencing error rate).

Raw reads: Original data from sequencing.

Clean bases: Number of clean reads multiplied by read length (in G).

Error: Average sequencing error rate, calculated through Qphred = −10 log10(e).

GC: Proportion of G and C in total bases.
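The relation Qphred = −10 log10(e) used for the Q20, Q30 and Error columns can be checked with a small sketch. The function names and the toy quality scores are illustrative assumptions; the strict "greater than" comparison follows the footnote wording above.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Error probability e for a Phred score Q, from Q = -10 * log10(e)."""
    return 10 ** (-q / 10)


def error_to_phred(e: float) -> float:
    """Phred score Q for an error probability e."""
    return -10 * math.log10(e)


def fraction_above(quals: Sequence[int], threshold: int) -> float:
    """Proportion of bases whose Phred score is strictly above the threshold."""
    return sum(q > threshold for q in quals) / len(quals)


if __name__ == "__main__":
    print(phred_to_error(20))                 # about 0.01 (1% error)
    print(phred_to_error(30))                 # about 0.001 (0.1% error)
    print(round(error_to_phred(0.0002), 2))   # 36.99, i.e. a 0.02% error rate
    toy_quals = [10, 25, 35, 40]              # made-up per-base scores
    print(fraction_above(toy_quals, 20))      # 0.75
    print(fraction_above(toy_quals, 30))      # 0.5
```

An average error rate of 0.02%, as reported for most samples in Table 2, therefore corresponds to a Phred score of roughly 37.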

Experimental Design, Materials and Methods

Animal materials and experimental design

We collected Urechis unicinctus in LaiShan Bay, BoHai, China (37°27′N, 31°30′E). Fifty 6-month-old individuals of Urechis unicinctus (average weight: 2.12 g; average length: 3.05 cm) were acclimated at 20 °C in flowing fresh seawater for 2 weeks. They were randomly divided into five groups: Group 1, named RT, was cultured in seawater at 20 °C for 2 h; Group 2, named DRY, was kept without water for 2 h; Group 3, named UV, was exposed to ultraviolet irradiation for 2 h; Group 4, named LT, had its temperature instantly decreased to −20 °C and held for 120 min; and Group 5, named HT, had its temperature rapidly raised to 28 °C and held for 120 min. Three individuals from each group were randomly sampled and instantly frozen in liquid nitrogen.

Total RNA extraction, library preparation and sequencing

We used the TRIzol method (Invitrogen, Carlsbad, USA) [4] to extract total RNA from each mixed sample, and a Nanodrop (OD260/280 ratio) was used to assess RNA purity [5,6]. The Qubit and Agilent 2100 were used to check RNA concentration and integrity according to the manufacturer's protocol. A total of 3 μg mRNA per sample, enriched with Oligo(dT) magnetic beads, was reverse-transcribed using the SMARTer® PCR cDNA synthesis kit (Clontech, Mountain View, USA). Large (>4 kb) double-stranded cDNA fragments were selected to construct the SMRTbell library, which was sequenced on the PacBio Sequel platform after DNA damage repair, end blunting and adapter ligation. In addition, the polyadenylated RNA was broken into short fragments (~200 bp). Double-stranded cDNA was synthesized with random hexamers after first-strand preparation, then purified and end-repaired. Libraries with an effective concentration > 2 nM were sequenced on the Illumina HiSeq X Ten platform.

Ethics Statement

All procedures used to handle and treat Urechis unicinctus during this study were in accordance with the Animal Management Regulations of China, revised on March 1, 2017, No. 676.

CRediT Author Statement

Xudong Jiao: Conceptualization, Project administration; Jiaxin Shi: Data curation, Writing - Original draft preparation; Song Qin: Validation, Investigation and Supervision; Dong Huang: Writing - Reviewing; Yinchu Wang: Data submission and Editing.

Declaration of Competing Interest

The authors declare that they have no competing financial interests that could influence the work reported in this article.

Subject: Biochemistry, Genetics and Molecular Biology
Specific subject area: Transcriptomics, Genomics
Type of data: Fastq read files
How data were acquired: Illumina HiSeq X Ten; Pacific Biosciences (PacBio), Iso-seq method
Data format: Raw sequencing reads (fastq)
Parameters for data collection: Total RNA was collected from 6-month-old Urechis unicinctus under room temperature (RT), high temperature (HT), low temperature (LT), desiccation treatment (DRY) and ultraviolet radiation (UV).
Description of data collection: Total RNA was obtained separately from 5 groups under conditions of UV, DRY, HT, LT and RT, where the RT group served as the control and every group had 3 replicates. Sequencing was performed on the Illumina HiSeq X Ten. Clean reads were obtained by removing reads containing adapters and low-quality bases and were subsequently mapped to the reference assembled by Trinity, a transcriptome assembly program that combines 3 separate software modules. The DEGs were analysed with DESeq2.
Data source location: Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, Shandong, China; Harbin Institute of Technology, Weihai, Shandong, China
Data accessibility: The complete RNA-seq data of Urechis unicinctus are available in the NCBI BioProject under accession number PRJNA603659.
Direct URL to data: https://www.ncbi.nlm.nih.gov/bioproject/603659
The sequencing reads of the three control groups (RT_1, RT_2, RT_3) used in assembly analysis are available in the NCBI SRA database under accession numbers SRX9623339, SRX9623338 and SRX9623337:
https://www.ncbi.nlm.nih.gov/sra/SRX9623339
https://www.ncbi.nlm.nih.gov/sra/SRX9623338
https://www.ncbi.nlm.nih.gov/sra/SRX9623337
The sequencing reads of the three groups under low temperature (LT_1, LT_2, LT_3) used in assembly analysis are available in the NCBI SRA database under accession numbers SRX9623329, SRX9623328 and SRX9623327:
https://www.ncbi.nlm.nih.gov/sra/SRX9623329
https://www.ncbi.nlm.nih.gov/sra/SRX9623328
https://www.ncbi.nlm.nih.gov/sra/SRX9623327
The sequencing reads of the three groups under high temperature (HT_1, HT_2, HT_3) used in assembly analysis are available in the NCBI SRA database under accession numbers SRX9623340, SRX9623326 and SRX9623325:
https://www.ncbi.nlm.nih.gov/sra/SRX9623340
https://www.ncbi.nlm.nih.gov/sra/SRX9623326
https://www.ncbi.nlm.nih.gov/sra/SRX9623325
The sequencing reads of the three ultraviolet groups (UV_1, UV_2, UV_3) used in assembly analysis are available in the NCBI SRA database under accession numbers SRX9623336, SRX9623335 and SRX9623334:
https://www.ncbi.nlm.nih.gov/sra/SRX9623333
https://www.ncbi.nlm.nih.gov/sra/SRX9623324
https://www.ncbi.nlm.nih.gov/sra/SRX9623323
The sequencing reads of the three desiccation groups (DRY_1, DRY_2, DRY_3) used in assembly analysis are available in the NCBI SRA database under accession numbers SRX9623332, SRX9623331 and SRX9623330:
https://www.ncbi.nlm.nih.gov/sra/SRX9623332
https://www.ncbi.nlm.nih.gov/sra/SRX9623331
https://www.ncbi.nlm.nih.gov/sra/SRX9623330
  5 in total

1.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

2.  Comparison of RNA isolation methods from insect larvae.

Authors:  J A Ridgeway; A E Timm
Journal:  J Insect Sci       Date:  2014-01-01       Impact factor: 1.857

3.  Comparison of the efficiency of different cell lysis methods and different commercial methods for RNA extraction from Candida albicans stored in RNAlater.

Authors:  Antonio Rodríguez; Mario Vaneechoutte
Journal:  BMC Microbiol       Date:  2019-05-14       Impact factor: 3.605

4.  LoRDEC: accurate and efficient long read error correction.

Authors:  Leena Salmela; Eric Rivals
Journal:  Bioinformatics       Date:  2014-08-26       Impact factor: 6.937

5.  Comparison of procedures for RNA-extraction from peripheral blood mononuclear cells.

Authors:  Antonio Rodríguez; Hans Duyvejonck; Jonas D Van Belleghem; Tessa Gryp; Leen Van Simaey; Stefan Vermeulen; Els Van Mechelen; Mario Vaneechoutte
Journal:  PLoS One       Date:  2020-02-21       Impact factor: 3.240
