
De novo transcriptome datasets of Shorea balangeran leaves and basal stem in waterlogged and dry soil.

Fitri Indriani¹, Ulfah J Siregar², Deden D Matra³, Iskandar Z Siregar².

Abstract

Shorea balangeran Burk, locally known as balangeran, is widely recommended for restoring tropical peat swamp forest because the species can grow in both waterlogged and dry areas. However, information on the genetic basis of its adaptation to varying ecological conditions is limited, and no transcriptome study has been reported in this context. Here we report two transcriptome datasets, one from leaves and one from basal stems, sampled from seedlings grown in potted media containing peat or mineral soil. The raw reads are deposited in DDBJ under accession number DRA008633.


Keywords: Adaptation; RNA-seq; Shorea balangeran; Transcriptome

Year: 2019 PMID: 32226802 PMCID: PMC7093797 DOI: 10.1016/j.dib.2019.104998

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409


Data

Shorea balangeran (balangeran) belongs to the Dipterocarpaceae family and is distributed in peat and heath forests in Indonesia [1]. This study reports the first de novo transcriptome assembly of balangeran. The transcriptome data were obtained from leaves and basal stems of seedlings grown in potted media containing either peat or mineral soil. High-quality mRNA was sequenced on an Illumina HiSeq 4000. Read and assembly statistics are presented in Table 1, and an overview of the functional annotation is shown in Table 2. Of the 180,291 merged contigs, 113,998 (63.62%) had significant matches in the NCBI nr database, 78,407 (43.49%) in Swiss-Prot, and 90,875 (50.40%) in TrEMBL. A total of 130,314 open reading frames (ORFs) were identified in the merged contigs (Table 3): 31,209 (23.95%) were 5prime partial, 17,633 (13.53%) were 3prime partial, 17,104 (13.13%) were internal, and 64,374 (49.40%) were complete. Microsatellite motifs were also identified in the merged contigs (Table 4); mononucleotides were the most abundant type (44,626; 70.30%), followed by trinucleotides (11,160; 17.58%) and dinucleotides (6,270; 9.88%).

Table 1

The properties of reads and assembled sequences of balangeran.

Features	Leaf	Basal Stem	Merged (b) (Leaf and Basal Stem)
Reads
Number of reads	64,101,942	56,537,051	120,638,993
Number of bases	9,615,291,300	8,480,557,650	18,095,848,950
Number of post-trimming reads (%)	62,400,243 (97.35)	54,917,915 (97.14)	117,318,158 (97.25)
Number of post-trimming bases (%)	9,360,036,450 (97.35)	8,237,687,250 (97.14)	17,597,723,700 (97.25)
Transcripts (a)
Number of transcripts	279,598	574,875	–
Number of bases	175,610,736	342,696,076	–
Length range (bp)	201-16,510	201-16,960	–
Average length (bp)	628.08	596.12	–
N50 (bp)	940	839	–
GC content (%)	42.28	45.56	–
Contigs (b)
Number of contigs	187,297	440,665	180,291
Number of bases	118,677,247	252,486,917	197,305,352
Length range (bp)	201-16,510	201-16,960	201-17,014
Average length (bp)	633.63	572.97	1,094.37
N50 (bp)	918	762	1,489
GC content (%)	42.6	46.2	44.3

(a) Constructed by the Trinity program.

(b) Constructed by the CAP3, CD-HIT-EST, and Corset (Corset for merged contigs only) programs.
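The summary statistics in Table 1 (count, total bases, length range, average length, N50, GC content) can be illustrated with a minimal Python sketch. It is not the code used in the study; the function name `assembly_stats` and the toy sequences are assumptions made for illustration, and the N50 definition used is the common one (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembled bases).

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Summarise a non-empty list of contig sequences (hypothetical helper)."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total_bases = sum(lengths)

    # N50: walk from the longest contig down and stop at the contig whose
    # cumulative length first reaches half of all assembled bases.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total_bases:
            n50 = length
            break

    # GC content: share of G and C bases over all bases, in percent.
    gc_bases = sum(s.upper().count("G") + s.upper().count("C") for s in contigs)

    return {
        "count": len(lengths),
        "total_bases": total_bases,
        "min_len": lengths[-1],
        "max_len": lengths[0],
        "mean_len": total_bases / len(lengths),
        "n50": n50,
        "gc_percent": 100.0 * gc_bases / total_bases,
    }


# Toy input (made-up sequences, not data from the study).
toy_contigs = ["ACGT" * 25, "GGCC" * 20, "AT" * 30, "ATGC" * 10, "AT" * 10]
stats = assembly_stats(toy_contigs)
print(stats)
```

With these toy contigs (lengths 100, 80, 60, 40, and 20), half of the 300 total bases is reached inside the second-longest contig, so the N50 is 80 even though the mean length is 60.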

Table 2

Functional annotation of balangeran contigs using several databases.

Database Source	Number (percentage)
Contig Number	180,291
Non-redundant protein (nr) NCBI	113,998 (63.62)
Non-redundant Nucleotide (nt) NCBI	53,407 (29.62)
Swiss-Prot UniProt	78,407 (43.49)
TrEMBL UniProt	90,875 (50.40)

Table 3

Open Reading Frames (ORFs) prediction characteristics of balangeran contigs using TransDecoder.

Features	Contigs Number (percentage)
ORF contig	130,314
ORF type:
a. 5prime_partial	31,209 (23.95)
b. 3prime_partial	17,633 (13.53)
c. Internal	17,104 (13.13)
d. Complete	64,374 (49.40)
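The four ORF types in Table 3 follow TransDecoder's labels: a complete ORF has both a start and a stop codon, a 5prime_partial ORF lacks the start codon, a 3prime_partial ORF lacks the stop codon, and an internal ORF lacks both. The Python sketch below illustrates that labelling on in-frame coding sequences. It is not TransDecoder's code; the function `classify_orf`, the assumption of an ATG start codon with the standard stop codons, and the toy sequences are all illustrative.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def classify_orf(cds: str) -> str:
    """Label an in-frame coding sequence with a TransDecoder-style ORF type."""
    cds = cds.upper()
    if len(cds) < 3 or len(cds) % 3 != 0:
        raise ValueError("expected an in-frame sequence (length a multiple of 3)")
    has_start = cds[:3] == "ATG"
    has_stop = cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"  # start codon missing
    if has_start:
        return "3prime_partial"  # stop codon missing
    return "internal"            # both ends missing


# Toy sequences (made up, not from the dataset).
toy_orfs = ["ATGGCCTAA", "GCCGCCTAA", "ATGGCCGCC", "GCCGCCGCC", "ATGAAATGA"]
counts = Counter(classify_orf(seq) for seq in toy_orfs)
print(counts)
```

Tallying the labels with `Counter` gives the same kind of per-type breakdown that Table 3 reports for the merged contigs.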

Table 4

Number and motifs of microsatellites in balangeran contigs. Values are numbers of contigs (percentage).

Motifs	Leaf	Basal Stem	Merged
Mononucleotide	26,259 (72.93)	48,786 (68.83)	44,626 (70.30)
Dinucleotide	3,939 (10.94)	6,943 (9.80)	6,270 (9.88)
Trinucleotide	5,192 (14.42)	13,443 (18.97)	11,160 (17.58)
Tetranucleotide	421 (1.17)	1,221 (1.72)	995 (1.57)
Pentanucleotide	142 (0.39)	292 (0.41)	267 (0.42)
Hexanucleotide	54 (0.15)	193 (0.27)	164 (0.26)


Experimental design, materials, and methods

Balangeran seedlings were raised for 6 months in the nursery of the Department of Silviculture, Faculty of Forestry, IPB University, Bogor. Two seedlings were grown in peat soil, one in waterlogged peat and one in dry peat. Two seedlings were grown in mineral soil, one in waterlogged soil and one in dry soil. Total RNA was isolated from leaves and basal stems using the Plant Total RNA Mini Kit (Geneaid) following the manufacturer's protocol. RNA quantity and integrity were evaluated using a P360 NanoPhotometer (Implen, München, Germany) and a Bioanalyzer 2100 (Agilent Technologies). The RNA samples had RNA integrity number (RIN) values between 7.4 and 8.6. The total RNA was sequenced on an Illumina HiSeq 4000 following the provider's protocol (NovogeneAIT, Singapore).

The quality of the raw data was examined using FastQC [2]. The reads were then processed with Trimmomatic 0.39 to remove adaptor sequences, contamination, and low-quality reads with default parameters (TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25) [3]. Pre-processed reads from leaves and from basal stems were each assembled de novo using Trinity v.2.3.2 with default parameters (minimum length = 200) [4], which generated high-quality contigs [5]. The contigs from each tissue were then reconstructed using CAP3 [6] and CD-HIT-EST v.4.6.8 [7]. The contigs from leaves and basal stems were merged and reconstructed using CAP3, CD-HIT-EST, and Corset [8,9].

The contigs were annotated with BLAST+ [10] against the NCBI non-redundant protein (nr) and NCBI nucleotide (nt) databases (both downloaded on October 1, 2018) and the Swiss-Prot and TrEMBL databases of UniProt (downloaded on September 14, 2018). Open reading frames (ORFs) were predicted with the TransDecoder package (https://github.com/TransDecoder/TransDecoder) using a minimum ORF length of 100 bp [11].
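The quality-trimming settings quoted in the methods above (SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25) can be illustrated with a simplified Python sketch. It is not Trimmomatic's implementation: adapter clipping (the TruSeq3-PE.fa step) is left out, the window rule is reduced to "cut at the first window whose mean quality falls below the threshold", and the function names `sliding_window_cut` and `trim_read` are hypothetical. The steps run in the order in which the settings are listed.

```python
from typing import List, Optional, Tuple


def sliding_window_cut(quals: List[int], window: int = 4, min_avg: float = 5.0) -> int:
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose mean
    quality is below min_avg. Reads shorter than the window are kept whole
    (a simplifying assumption of this sketch).
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(
    seq: str,
    quals: List[int],
    window: int = 4,
    window_avg: float = 5.0,
    leading: int = 5,
    trailing: int = 5,
    min_len: int = 25,
) -> Optional[Tuple[str, List[int]]]:
    """Apply window, leading, trailing, and minimum-length rules; None means dropped."""
    keep = sliding_window_cut(quals, window, window_avg)
    seq, quals = seq[:keep], quals[:keep]

    # LEADING: drop bases from the start while their quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1

    # TRAILING: drop bases from the end while their quality is below the threshold.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1

    seq, quals = seq[start:end], quals[start:end]

    # MINLEN: discard reads that end up shorter than min_len.
    if len(seq) < min_len:
        return None
    return seq, quals


# Toy read (made up): two poor bases, thirty good bases, eight poor bases.
toy_seq = "ACGT" * 10
toy_quals = [2, 2] + [30] * 30 + [2] * 8
trimmed = trim_read(toy_seq, toy_quals)
short = trim_read("ACGT" * 5, [30] * 20)  # 20 bases, below the 25-base minimum
print(trimmed, short)
```

In the toy read, the low-quality tail is removed by the window rule and the two poor leading bases by the leading rule, which leaves 30 bases; the second read is discarded because it is shorter than 25 bases.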
Microsatellites were identified using MISA software (http://pgrc.ipk-gatersleben.de/misa) with the following parameters (unit size–minimum repeats): 1–10, 2–6, 3–5, 4–5, 5–5, 6–5. The interruption setting (maximum difference between two microsatellites) was 100 bases.
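The unit size and minimum repeat thresholds listed above can be illustrated with a small Python sketch that finds perfect microsatellites with regular expressions. It is not MISA itself: the names `MIN_REPEATS`, `is_redundant_unit`, and `find_ssrs` are hypothetical, only perfect repeats are reported, and the 100-base interruption setting for compound microsatellites is not modelled.

```python
import re
from typing import Dict, List, Tuple

# Unit size -> minimum number of repeats, as quoted in the text.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant_unit(unit: str) -> bool:
    """True if the unit is itself a repeat of a shorter motif, e.g. 'AA' or 'ATAT'."""
    for size in range(1, len(unit)):
        if len(unit) % size == 0 and unit[:size] * (len(unit) // size) == unit:
            return True
    return False


def find_ssrs(seq: str) -> List[Tuple[int, str, int, int]]:
    """Return (unit_size, unit, repeat_count, 0-based start) for each perfect repeat."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while pos < len(seq):
            match = pattern.match(seq, pos)
            if match and not is_redundant_unit(match.group(1)):
                hits.append((size, match.group(1), len(match.group(0)) // size, pos))
                pos = match.end()  # continue after this repeat
            else:
                pos += 1
    return sorted(hits, key=lambda hit: hit[3])


# Toy contig (made up): an A run, an AG repeat, and a CAT repeat.
toy_contig = "GG" + "A" * 12 + "CGT" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
ssrs = find_ssrs(toy_contig)
print(ssrs)
```

The redundancy check keeps a run of twelve A bases from also being counted as a dinucleotide ("AA") repeat, so each repeat is assigned to its shortest motif.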

Specifications Table

Subject	Agricultural and Biological Sciences: Forestry
Specific subject area	Molecular study in Forestry
Type of data	RNA Sequencing Data
How data were acquired	Illumina HiSeq 4000
Data format	Raw sequencing reads and assembled contigs
Parameters for data collection	Leaf and basal stem of balangeran seedlings planted in waterlogged peat, dry peat, waterlogged mineral soil, and dry mineral soil
Description of data collection	Total RNA was sequenced on the Illumina HiSeq 4000 platform at NovogeneAIT, Singapore
Data source location	Bogor, West Java, Indonesia
Data accessibility	Repository name: DDBJ (DNA Data Bank of Japan). Data identification number: DRA008633. Direct URL to data: https://ddbj.nig.ac.jp/DRASearch/submission?acc=%20DRA008633
Related research article	F. Indriani, D.D. Matra, U.J. Siregar, I.Z. Siregar, Ecological aspects and genetic diversity of Shorea balangeran in two forest types of Muara Kendawangan Nature Reserve, West Kalimantan, Indonesia, Biodiversitas. 20 (2019) 482–488. https://doi.org/10.13057/biodiv/d200226

Value of the Data

• This is the first transcriptome dataset of Shorea balangeran from leaves and basal stem.

• These data help to elucidate the molecular mechanisms and gene pathways underlying the response of Shorea balangeran to different ecological conditions.

• These data allow further analysis to identify genes of interest that play roles in the adaptation of Shorea balangeran.

9 in total

1. CAP3: A DNA sequence assembly program.

Authors: X Huang; A Madan
Journal: Genome Res Date: 1999-09 Impact factor: 9.043

2. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

3. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

4. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes.

Authors: Nadia M Davidson; Alicia Oshlack
Journal: Genome Biol Date: 2014-07-26 Impact factor: 13.583

5. Dataset from de novo transcriptome assembly of Nephelium lappaceum aril.

Authors: Deden Derajat Matra; Arya Widura Ritonga; Azis Natawijaya; Roedhy Poerwanto; Winarso Drajad Widodo; Eiichi Inoue
Journal: Data Brief Date: 2018-12-14

6. Comparative transcriptome analysis of translucent flesh disorder in mangosteen (Garcinia mangostana L.) fruits in response to different water regimes.

Authors: Deden Derajat Matra; Toshinori Kozaki; Kazuo Ishii; Roedhy Poerwanto; Eiichi Inoue
Journal: PLoS One Date: 2019-07-19 Impact factor: 3.240

7. Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors: Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal: Nat Biotechnol Date: 2011-05-15 Impact factor: 54.908

8. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937

9. De novo transcriptome assembly of mangosteen (Garcinia mangostana L.) fruit.

Authors: Deden Derajat Matra; Toshinori Kozaki; Kazuo Ishii; Roedhy Poerwanto; Eiichi Inoue
Journal: Genom Data Date: 2016-09-09
