De novo transcriptome datasets of Shorea balangeran leaves and basal stem in waterlogged and dry soil.

Fitri Indriani, Ulfah J Siregar, Deden D Matra, Iskandar Z Siregar.

Abstract

Shorea balangeran Burk, locally known as balangeran, has been widely used as a recommended species for tropical peat swamp forest restoration because of its capability to grow in both waterlogged and dry areas. However, information on the genetic basis of its adaptation to varying ecological conditions is limited, and no transcriptome study has been reported in this context. Here we report two sets of transcriptome data, from leaf and basal stem samples taken from seedlings growing in potted media containing peat and mineral soil. The raw reads are stored in the DDBJ platform under accession number DRA008633.
© 2019 The Author(s).

Keywords:  Adaptation; RNA-seq; Shorea balangeran; Transcriptome

Year:  2019        PMID: 32226802      PMCID: PMC7093797          DOI: 10.1016/j.dib.2019.104998

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Data

Shorea balangeran (balangeran) belongs to the family Dipterocarpaceae and is distributed in peat and heath forests in Indonesia [1]. This study reports the de novo transcriptome assembly of balangeran for the first time. The transcriptome data were obtained from leaves and basal stems of seedlings grown in potted media containing either peat or mineral soil. The extracted high-quality mRNA was sequenced on an Illumina HiSeq 4000. The statistics of the reads and assembled sequences are presented in Table 1, and an overview of the functional annotation is shown in Table 2. Of the 180,291 merged contigs, 113,998 (63.62%) had significant matches in the NCBI nr database, 78,407 (43.49%) in the Swiss-Prot database and 90,875 (50.40%) in the TrEMBL database. A total of 130,314 open reading frames (ORFs) were identified among the merged contigs (Table 3), comprising 31,209 (23.95%) 5prime partial ORFs, 17,633 (13.53%) 3prime partial ORFs and 64,374 (49.40%) complete ORFs. Microsatellite motifs were also identified in the merged contigs (Table 4): mononucleotides were the most abundant type (44,626, 70.30%), followed by trinucleotides (11,160, 17.58%) and dinucleotides (6,270, 9.88%).

Table 1

The properties of reads and assembled sequences of balangeran. Values in parentheses are percentages of the raw reads or bases.

Features                        Leaf                     Basal stem               Merged (b) (leaf and basal stem)
Reads
Number of reads                 64,101,942               56,537,051               120,638,993
Number of bases                 9,615,291,300            8,480,557,650            18,095,848,950
Number of post-trimming reads   62,400,243 (97.35%)      54,917,915 (97.14%)      117,318,158 (97.25%)
Number of post-trimming bases   9,360,036,450 (97.35%)   8,237,687,250 (97.14%)   17,597,723,700 (97.25%)
Transcripts (a)
Number of transcripts           279,598                  574,875                  -
Number of bases                 175,610,736              342,696,076              -
Length range (bp)               201-16,510               201-16,960               -
Average length (bp)             628.08                   596.12                   -
N50 (bp)                        940                      839                      -
GC content (%)                  42.28                    45.56                    -
Contigs (b)
Number of contigs               187,297                  440,665                  180,291
Number of bases                 118,677,247              252,486,917              197,305,352
Length range (bp)               201-16,510               201-16,960               201-17,014
Average length (bp)             633.63                   572.97                   1094.37
N50 (bp)                        918                      762                      1489
GC content (%)                  42.6                     46.2                     44.3

(a) Constructed by the Trinity program.

(b) Constructed by the CAP3, cd-hit-est, and Corset (merged contigs only) programs.
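
The average length and N50 values in Table 1 are standard assembly summary statistics. The following is a minimal sketch of how such values can be computed from a list of contig lengths; it is not the authors' script, and the function name and toy lengths are hypothetical. Only the two totals used in the last line are taken from Table 1.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical toy assembly, for illustration only.
toy_lengths = [1000, 800, 600, 400, 201]
print("N50 (bp):", n50(toy_lengths))
print("Average length (bp):", sum(toy_lengths) / len(toy_lengths))

# Average leaf contig length from the totals in Table 1
# (number of bases divided by number of contigs).
leaf_average = round(118_677_247 / 187_297, 2)
print("Leaf contig average (bp):", leaf_average)
```

Dividing the number of bases by the number of contigs in this way reproduces the average lengths listed in the table for leaf contigs.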

Table 2

Functional annotation of balangeran contigs using several databases.

Database source                         Number of contigs (%)
Number of contigs                       180,291
Non-redundant protein (nr), NCBI        113,998 (63.62)
Non-redundant nucleotide (nt), NCBI     53,407 (29.62)
Swiss-Prot, UniProt                     78,407 (43.49)
TrEMBL, UniProt                         90,875 (50.40)

Table 3

Open reading frame (ORF) prediction characteristics of balangeran contigs using TransDecoder.

Features                Number of contigs (%)
Contigs with ORFs       130,314
ORF type:
  a. 5prime_partial     31,209 (23.95)
  b. 3prime_partial     17,633 (13.53)
  c. Internal           17,104 (13.13)
  d. Complete           64,374 (49.40)

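
The four ORF types in Table 3 differ in whether the predicted coding sequence carries a start codon, a stop codon, both, or neither. The sketch below illustrates that distinction on short toy sequences. It is a simplified illustration under the assumption that the start codon is ATG and that the standard stop codons apply; it is not TransDecoder's own code, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def classify_orf(cds):
    """Classify an in-frame coding sequence by its start and stop codons."""
    cds = cds.upper()
    if len(cds) < 3 or len(cds) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    has_start = cds[:3] == "ATG"
    has_stop = cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"  # start codon missing at the 5' end
    if has_start:
        return "3prime_partial"  # stop codon missing at the 3' end
    return "internal"            # both ends missing


# Toy sequences, for illustration only.
for toy in ("ATGGCCTAA", "GCCGCCTAA", "ATGGCCGCC", "GCCGCCGCC"):
    print(toy, classify_orf(toy))
```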
Table 4

Number and motif of microsatellites in balangeran contigs. Values are numbers of contigs (percentage).

Motif               Leaf                Basal stem          Merged
Mononucleotide      26,259 (72.93)      48,786 (68.83)      44,626 (70.30)
Dinucleotide        3,939 (10.94)       6,943 (9.80)        6,270 (9.88)
Trinucleotide       5,192 (14.42)       13,443 (18.97)      11,160 (17.58)
Tetranucleotide     421 (1.17)          1,221 (1.72)        995 (1.57)
Pentanucleotide     142 (0.39)          292 (0.41)          267 (0.42)
Hexanucleotide      54 (0.15)           193 (0.27)          164 (0.26)

Experimental design, materials, and methods

Balangeran seedlings were treated and raised for 6 months in the nursery of the Department of Silviculture, Faculty of Forestry, IPB University, Bogor. Two seedlings were grown in peat soil, one in waterlogged peat and one in dry peat. Two further seedlings were grown in mineral soil, one in waterlogged soil and one in dry soil. Total RNA was isolated from leaves and basal stems using the Plant Total RNA Mini Kit (Geneaid) following the kit protocol. RNA quantity and integrity were evaluated using a P360 NanoPhotometer (Implen, München, Germany) and a Bioanalyzer 2100 (Agilent Technologies). The RNA samples had RNA integrity number (RIN) values between 7.4 and 8.6. The extracted total RNA was sequenced on the Illumina HiSeq 4000 platform following the NovogeneAIT (Singapore) protocol.

The quality of the raw data was examined using FastQC [2]. Adaptor sequences, contamination and low-quality reads were then removed using Trimmomatic 0.39 with the parameters TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25 [3]. Pre-processed reads from leaves and from basal stem were each assembled de novo using Trinity v.2.3.2 with default parameters (minimum length = 200) [4], which generated high-quality contigs [5]. The contigs from each tissue were reconstructed using CAP3 [6] and CD-HIT-EST v.4.6.8 [7]. The contigs from leaves and basal stem were also merged and reconstructed using CAP3, CD-HIT-EST and Corset [8,9].

The contigs were annotated with the BLAST+ program [10] against the NCBI non-redundant protein (nr) and NCBI nucleotide (nt) databases (both downloaded on October 1, 2018) and the Swiss-Prot and TrEMBL databases of UniProt (downloaded on September 14, 2018). Open reading frames (ORFs) of the contigs were predicted with the TransDecoder package (https://github.com/TransDecoder/TransDecoder), using a minimum ORF length of 100 bp [11].

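
The read-trimming step described above can be sketched as a command line. The following is a minimal sketch, assuming paired-end input (suggested by the TruSeq3-PE.fa adapter file). The input file names, output prefix and jar path are hypothetical, and the step name ILLUMINACLIP is an assumption for the adapter setting reported as TruSeq3-PE.fa:2:30:10. The other step settings are copied from the text. The sketch only builds and prints the command; it does not run Trimmomatic.

```python
import shlex


def trimmomatic_pe_command(forward, reverse, prefix, jar="trimmomatic-0.39.jar"):
    """Build a Trimmomatic paired-end command as an argument list."""
    return [
        "java", "-jar", jar, "PE",
        forward, reverse,
        # Paired and unpaired outputs for each mate (hypothetical names).
        f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",
        f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz",
        # Step name ILLUMINACLIP is assumed; the values are from the text.
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10",
        "SLIDINGWINDOW:4:5",
        "LEADING:5",
        "TRAILING:5",
        "MINLEN:25",
    ]


# Hypothetical file names, for illustration only.
cmd = trimmomatic_pe_command("leaf_1.fq.gz", "leaf_2.fq.gz", "leaf_trimmed")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string makes it straightforward to pass to `subprocess.run` without shell quoting issues.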
Microsatellites were identified using MISA software (http://pgrc.ipk-gatersleben.de/misa) with the following parameters (unit size-minimum repeats): 1-10, 2-6, 3-5, 4-5, 5-5 and 6-5. The interruption setting (maximum difference between microsatellites) was 100 bases.
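
The unit size and minimum repeat thresholds above can be illustrated with a small regular-expression search. This is a simplified sketch of perfect-repeat detection using the stated thresholds. It is not MISA's implementation, it ignores the 100-base interruption setting, and the function name and toy sequence are hypothetical.

```python
import re

# Unit size -> minimum number of repeats, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return (1-based start, unit size, motif, repeat count) for perfect repeats."""
    seq = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, such as "AA" or "ATAT".
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((match.start() + 1, unit, motif, len(match.group(0)) // unit))
    return sorted(hits)


# Toy sequence: a 12-base A run and six AG repeats, for illustration only.
toy = "GG" + "A" * 12 + "CC" + "AG" * 6 + "TT"
for hit in find_ssrs(toy):
    print(hit)
```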

Specifications Table

Subject: Agricultural and Biological Sciences: Forestry
Specific subject area: Molecular study in Forestry
Type of data: RNA Sequencing Data
How data were acquired: Illumina HiSeq 4000
Data format: Raw sequencing reads and assembled contigs
Parameters for data collection: Leaf and basal stem of balangeran seedlings planted in waterlogged peat, dry peat, waterlogged mineral soil and dry mineral soil
Description of data collection: Total RNA was sequenced using the Illumina HiSeq 4000 platform at NovogeneAIT, Singapore
Data source location: Bogor, West Java, Indonesia
Data accessibility: Repository name: DDBJ (DNA Data Bank of Japan). Data identification number: DRA008633. Direct URL to data: https://ddbj.nig.ac.jp/DRASearch/submission?acc=%20DRA008633
Related research article: F. Indriani, D.D. Matra, U.J. Siregar, I.Z. Siregar, Ecological aspects and genetic diversity of Shorea balangeran in two forest types of Muara Kendawangan Nature Reserve, West Kalimantan, Indonesia, Biodiversitas. 20 (2019) 482–488. https://doi.org/10.13057/biodiv/d200226

Value of the Data

These are the first transcriptome data from Shorea balangeran leaves and basal stem.

These data help to elucidate the molecular mechanisms and gene pathways underlying the response of Shorea balangeran to different ecological conditions.

These data allow further analysis to identify genes of interest that play roles in the adaptation of Shorea balangeran.

  9 in total

1.  CAP3: A DNA sequence assembly program.

Authors:  X Huang; A Madan
Journal:  Genome Res       Date:  1999-09       Impact factor: 9.043

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors:  Weizhong Li; Adam Godzik
Journal:  Bioinformatics       Date:  2006-05-26       Impact factor: 6.937

4.  Corset: enabling differential gene expression analysis for de novo assembled transcriptomes.

Authors:  Nadia M Davidson; Alicia Oshlack
Journal:  Genome Biol       Date:  2014-07-26       Impact factor: 13.583

5.  Dataset from de novo transcriptome assembly of Nephelium lappaceum aril.

Authors:  Deden Derajat Matra; Arya Widura Ritonga; Azis Natawijaya; Roedhy Poerwanto; Winarso Drajad Widodo; Eiichi Inoue
Journal:  Data Brief       Date:  2018-12-14

6.  Comparative transcriptome analysis of translucent flesh disorder in mangosteen (Garcinia mangostana L.) fruits in response to different water regimes.

Authors:  Deden Derajat Matra; Toshinori Kozaki; Kazuo Ishii; Roedhy Poerwanto; Eiichi Inoue
Journal:  PLoS One       Date:  2019-07-19       Impact factor: 3.240

7.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

9.  De novo transcriptome assembly of mangosteen (Garcinia mangostana L.) fruit.

Authors:  Deden Derajat Matra; Toshinori Kozaki; Kazuo Ishii; Roedhy Poerwanto; Eiichi Inoue
Journal:  Genom Data       Date:  2016-09-09
