Abstract
The data presented in this article are associated with the research articles "DOI: 10.1007/s11295-019-1348-3" [1] and "DOI: 10.1007/s13205-018-1162-x" [2]. Clausena excavata Burm. f. and Sterculia lanceolata Cav. are medicinal trees [3,4] native to Southeast Asia and China, and most members of the genera Clausena and Sterculia contain valuable secondary metabolites with great potential for drug development. Although many phytochemical studies have been conducted using extracts from various parts of these plants [4,5], very limited genetic resources are available. RNA sequencing of C. excavata and S. lanceolata was conducted using the paired-end Illumina HiSeq 2500 sequencing system, producing the first de novo transcriptome data for the genera Clausena and Sterculia. Transcriptome shotgun assembly using three different assembly tools [2] generated a total of 16,638 non-redundant contigs (N50, 900 bp) from C. excavata and 7,857 (N50, 423 bp) from S. lanceolata. The data are accessible at NCBI BioProject under accession PRJNA428402 for C. excavata [2] and PRJNA435648 for S. lanceolata [1].
Keywords: Clausena excavata; Medicinal plant; Sterculia lanceolata; Transcriptome analysis
Year: 2019 PMID: 31489347 PMCID: PMC6717166 DOI: 10.1016/j.dib.2019.104297
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Summary of coding sequences of the contigs from the transcriptome data of C. excavata and S. lanceolata. Contigs labeled “5′ partial” contain only a start codon in the open reading frame, whereas contigs labeled “3′ partial” contain only a stop codon. Contigs labeled “internal” have neither a start nor a stop codon. Contigs labeled “complete” contain both a start and a stop codon in the open reading frame.
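The four categories in the Fig. 1 caption can be sketched as a small classifier. This is a minimal illustration, not the pipeline used to produce the figure: the function name `classify_cds`, the use of ATG as the only start codon, and the assumption that the input sequence is already in frame are all hypothetical choices made here. The label mapping follows the caption as written; some ORF predictors name partial classes after the end that is missing, which would swap the two partial labels.

```python
# Hypothetical sketch: label an in-frame coding sequence using the
# categories described in the Fig. 1 caption.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def classify_cds(cds: str) -> str:
    """Return 'complete', "5' partial", "3' partial" or 'internal'.

    Assumes `cds` is a DNA string whose first base is codon position 1.
    """
    seq = cds.upper()
    has_start = seq[:3] == "ATG"          # start codon at the 5' end
    has_stop = seq[-3:] in STOP_CODONS    # stop codon at the 3' end

    if has_start and has_stop:
        return "complete"
    if has_start:
        return "5' partial"   # caption wording: start codon only
    if has_stop:
        return "3' partial"   # caption wording: stop codon only
    return "internal"         # neither start nor stop codon


if __name__ == "__main__":
    for example in ("ATGAAATAA", "ATGAAACCC", "CCCAAATAA", "CCCAAACCC"):
        print(example, classify_cds(example))
```

Keeping the label mapping in one function makes it easy to adapt if a different naming convention for partial sequences is preferred.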
Summary of raw and assembled sequence data.
| Description of sequence data | C. excavata | S. lanceolata |
|---|---|---|
| Number of raw reads | 10,348,544 | 4,357,001 |
| Total length of raw reads (bp) | 2,607,833,088 | 1,097,964,252 |
| Number of filtered clean reads | 8,790,228 | 4,240,923 |
| Total length of filtered reads (bp) | 2,143,847,087 | 1,054,277,267 |
| Percentage of filtered read length (%) | 82.2 | 96.0 |
| Number of assembled contigs | 16,638 | 7,857 |
| GC contents of contigs (%) | 43.7 | 45.7 |
| Shortest and longest contigs (bp) | 297–4,065 | 297–5,754 |
| Total length of assembled contigs (bp) | 12,557,892 | 3,559,905 |
| Average length (bp) | 754.8 | 453.1 |
| N25 (bp) | 1,302 | 609 |
| N50 (bp) | 900 | 423 |
| N75 (bp) | 582 | 348 |
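The derived rows of the table (percentage of filtered read length, average contig length, and the N25/N50/N75 values) follow from simple arithmetic, sketched below. The function names `nx`, `percent_retained`, and `mean_length` are illustrative assumptions, and the Nx definition used is the common one: the length of the shortest contig in the smallest set of longest contigs that covers at least x% of the total assembly length. The assembly tools used by the authors may report it with slightly different tie-breaking. The toy contig lengths are invented for illustration; only the totals passed to the last two functions come from the table.

```python
# Sketch of the summary statistics reported in the table above.
from typing import Sequence


def nx(lengths: Sequence[int], x: int) -> int:
    """Nx statistic for a list of contig lengths (x in 1..100)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return ordered[-1]  # unreachable for x <= 100, kept for safety


def percent_retained(filtered_bp: int, raw_bp: int) -> float:
    """Percentage of raw read length that survives filtering."""
    return round(100 * filtered_bp / raw_bp, 1)


def mean_length(total_bp: int, n_contigs: int) -> float:
    """Average contig length in bp."""
    return round(total_bp / n_contigs, 1)


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6]  # invented contig lengths, total = 20
    print(nx(toy, 25), nx(toy, 50), nx(toy, 75))  # 6 5 4

    # Values taken from the table (C. excavata, then S. lanceolata).
    print(percent_retained(2_143_847_087, 2_607_833_088))  # 82.2
    print(percent_retained(1_054_277_267, 1_097_964_252))  # 96.0
    print(mean_length(12_557_892, 16_638))                 # 754.8
    print(mean_length(3_559_905, 7_857))                   # 453.1
```

Reproducing the N25, N50, and N75 rows themselves would require the individual contig lengths from the deposited assemblies, which are not listed here.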
Specifications Table
| Subject area | Plant Science |
| More specific subject area | Transcriptomics |
| Type of data | Table, figure, text file |
| How data was acquired | RNA sequence data obtained from RNA sequencing using Illumina HiSeq 2500 sequencing platform |
| Data format | Raw, analyzed |
| Experimental factors | Total RNAs were isolated from the leaves. |
| Experimental features | |
| Data source location | |
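| Data accessibility | Data are accessible at NCBI BioProject: PRJNA428402 (C. excavata) and PRJNA435648 (S. lanceolata) |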
These data are the first de novo transcriptome data for the genera Clausena and Sterculia. The data would be very useful for genetic and comparative studies of these genera. The assembled sequences will serve as a reference for future studies and would be valuable resources for examining molecular characteristics involved in the pharmaceutical properties of C. excavata and S. lanceolata.