Juan C Castro, J Dylan Maddox, Hicler N Rodríguez, Carlos G Castro, Sixto A Imán-Correa, Marianela Cobos, Jae D Paredes, Jorge L Marapara, Janeth Braga, Pedro M Adrianzén.
Abstract
Myrciaria dubia "camu-camu" is a shrub native to the Amazon that commonly grows in areas flooded for three to four months during the annual hydrological cycle. The species is exceptional for its capacity to biosynthesize and accumulate substantial quantities of a variety of health-promoting phytochemicals, especially vitamin C [1], yet few genomic resources are available for it [2]. Here we provide the dataset of a de novo assembly and functional annotation of the transcriptome from a pool of samples obtained from seeds during germination and from seedlings during initial growth (up to one month after germination). Total RNA/mRNA was purified from different types of plant material (i.e., imbibed seeds, germinated seeds, and seedlings one, two, three, and four weeks old) and pooled in equimolar ratio to generate the cDNA library, and paired-end RNA sequencing was conducted on an Illumina HiSeq™2500 platform. The transcriptome was de novo assembled using Trinity v2.9.1 and SuperTranscripts v2.9.1. A total of 21,161 transcripts were assembled, ranging in size from 500 to 10,001 bp, with an N50 of 1,485 bp. Completeness of the assembly was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) software v2/v3. Finally, coding regions were predicted with TransDecoder v3.0.1, and the assembled transcripts were functionally annotated using the web-based platforms Kyoto Encyclopedia of Genes and Genomes (KEGG) Automatic Annotation Server (KAAS) and FunctionAnnotator. The raw reads were deposited into NCBI and are accessible via BioProject accession number PRJNA615000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA615000) and Sequence Read Archive (SRA) accession number SRX7990430 (https://www.ncbi.nlm.nih.gov/sra/SRX7990430). Additionally, the transcriptome shotgun assembly sequences and functional annotations are available via Mendeley Data (https://data.mendeley.com/datasets/2csj3h29fr/1).
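The N50 reported above is the length L such that transcripts of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of how this value, together with the transcript count and length range, can be computed from a list of transcript lengths is given below. The function name `assembly_stats` and the example lengths are hypothetical, and this is not necessarily how the authors obtained their figures.

```python
def assembly_stats(lengths):
    """Return count, min, max and N50 for a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("no sequences supplied")
    half = sum(ordered) / 2
    running = 0
    n50 = ordered[0]
    for length in ordered:
        running += length  # cumulative bases covered so far
        if running >= half:
            n50 = length
            break
    return {"count": len(ordered), "min": ordered[-1], "max": ordered[0], "n50": n50}
```

For example, `assembly_stats([500, 700, 900, 1500, 2000])` gives an N50 of 1500, because 2000 + 1500 = 3500 bp already reaches half of the 5600 bp total. Fed with the sequence lengths from the deposited assembly FASTA, the same function could be used to check the summary values reported above; tools can differ slightly in how they define the cutoff.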
Keywords: Gene expression; Germination; Metabolic pathways; Molecular sequence annotation; Plant development; RNA-seq; Seedlings
Year: 2020 PMID: 32577459 PMCID: PMC7305401 DOI: 10.1016/j.dib.2020.105834
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Distribution of the transcript lengths of the de novo assembled transcripts of the transcriptome obtained during germination and initial growth of seedlings of M. dubia.
Fig. 2. Completeness scores of the de novo assembled transcripts of the transcriptome obtained during germination and initial growth of seedlings of M. dubia.
Fig. 3. Summary of ORFs predicted in the de novo assembled transcripts of the transcriptome obtained during germination and initial growth of seedlings of M. dubia.
Fig. 4. Gene Ontology classifications of the de novo assembled transcripts of the transcriptome obtained during germination and initial growth of seedlings of M. dubia.
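Fig. 3 summarizes ORFs predicted with TransDecoder. As a simplified illustration of what ORF prediction involves, the Python sketch below scans all six reading frames of a transcript for the longest complete ATG-to-stop ORF above a minimum length. It is a toy under stated assumptions, not TransDecoder's algorithm: TransDecoder additionally scores coding potential and can report partial ORFs lacking a start or stop codon, both of which this sketch ignores. The names `longest_orf` and `min_codons`, and the default of 100 codons, are illustrative choices.

```python
# Simplified six-frame ORF scan (illustrative only; not TransDecoder's algorithm).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_codons=100):
    """Return (strand, start, end) of the longest complete ATG...stop ORF, or None.

    Coordinates are 0-based and end-exclusive on the scanned strand ("+" is the
    input sequence, "-" its reverse complement); `end` includes the stop codon,
    and `min_codons` counts codons including the stop codon.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first in-frame ATG since the last in-frame stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    length = end - start
                    if length // 3 >= min_codons and (best is None or length > best[2] - best[1]):
                        best = (strand, start, end)
                    start = None  # look for the next ATG after this stop
    return best
```

For instance, `longest_orf("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=5)` returns `("+", 2, 17)`: a five-codon ORF (start, three internal codons, stop) on the forward strand. Taking the first ATG after each stop keeps the longest candidate within every stop-to-stop segment.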
| Specification | Details |
|---|---|
| Subject | Genetics, Genomics and Molecular Biology |
| Specific subject area | Transcriptomics |
| Type of data | Figures, raw paired-end sequencing data, transcriptome shotgun assembly sequence database, and functional annotation results. |
| How data were acquired | Total RNA was isolated from seeds during germination and from seedlings during initial growth (up to one month after germination). High-quality RNA samples were pooled and mRNA was purified. The library was constructed using standardized protocols and paired-end sequenced on an Illumina HiSeq™2500 platform. |
| Data format | Raw data in FASTQ format were deposited in the NCBI database and are available under BioProject accession number PRJNA615000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA615000) and SRA accession number SRX7990430 (https://www.ncbi.nlm.nih.gov/sra/SRX7990430). The transcriptome shotgun assembly sequence database (fasta.gz format) and functional annotation results were deposited in Mendeley Data (https://data.mendeley.com/datasets/2csj3h29fr/1). |
| Parameters for data collection | Total RNA was isolated from seeds during germination and from seedlings during initial growth (up to one month after germination). High-quality RNA samples were pooled and mRNA was purified. The library was constructed using standardized protocols and paired-end sequenced on an Illumina HiSeq™2500 platform. |
| Description of data collection | Cleaned, high-quality reads were de novo assembled with Trinity v2.9.1, and the multiple transcripts (isoforms) of each gene were collapsed into a single sequence with SuperTranscripts v2.9.1. Completeness of the assembly was evaluated using the Benchmarking Universal Single-Copy Orthologs (BUSCO) software v2/v3 as implemented in the web server gVolante (https://gvolante.riken.jp/). |
| Data source location | Institution: Universidad Nacional de la Amazonia Peruana |
| Data accessibility | Raw data in FASTQ format are available from NCBI under BioProject accession number PRJNA615000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA615000) and SRA accession number SRX7990430 (https://www.ncbi.nlm.nih.gov/sra/SRX7990430). |
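Because the raw reads are paired-end, downloading them from SRA (for example with the SRA Toolkit) typically yields two FASTQ files whose records must stay in step. The Python sketch below iterates over both files together and raises an error if they fall out of sync. It assumes plain four-line FASTQ records, optional gzip compression signalled by a `.gz` suffix, and mate names that differ at most by a trailing `/1` or `/2`. The helper names and file layout are hypothetical.

```python
import gzip
from itertools import zip_longest


def open_maybe_gzip(path):
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def mate_id(name):
    """Keep the first whitespace-delimited token and drop a trailing /1 or /2."""
    core = name.split()[0]
    if core.endswith(("/1", "/2")):
        core = core[:-2]
    return core


def read_pairs(path_1, path_2):
    """Yield matched (read1, read2) records, failing loudly on desynchronized files."""
    with open_maybe_gzip(path_1) as h1, open_maybe_gzip(path_2) as h2:
        for r1, r2 in zip_longest(read_fastq(h1), read_fastq(h2)):
            if r1 is None or r2 is None:
                raise ValueError("paired files contain different numbers of reads")
            if mate_id(r1[0]) != mate_id(r2[0]):
                raise ValueError(f"mate names differ: {r1[0]} vs {r2[0]}")
            yield r1, r2
```

A typical loop would be `for (n1, s1, q1), (n2, s2, q2) in read_pairs("reads_1.fastq.gz", "reads_2.fastq.gz"):`, where both file names are hypothetical. Reading lazily with generators keeps memory use constant regardless of run size.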