
Data on RNA-seq analysis of the cocoa pod borer pest Conopomorpha cramerella (Snellen) (Lepidoptera: Gracillariidae).

Nor Azlan Nor Muhammad1, Intan Azlinda Ramlee1, Diana Mohd Nor1, Mahasakthy Vijeyasri Satyavenathan1, Nur Lina Rahmat1, Alias Awang2, Maizom Hassan1.   

Abstract

The cocoa bean (Theobroma cacao L.) underpins the global cocoa and chocolate industry, which was valued at 44 billion US dollars in 2019. The cocoa pod borer (CPB), Conopomorpha cramerella, is a major pest of cocoa in Malaysia and Indonesia and is responsible for the decline in cocoa production. It has been detected since the 1980s. Unfortunately, current control strategies are inefficient for CPB management. Although biotechnological alternatives, including RNA interference (RNAi), have been proposed in recent years to control insect pests, characterizing the genetics of the target pest is essential for successful application of these emerging technologies. To improve the understanding of the molecular features of CPB, we generated a comprehensive RNA-seq dataset (135,915,430 clean reads) for the larval and adult stages of CPB using the Illumina HiSeq™ 4000 system. The CPB transcriptome was assembled de novo and annotated. The final assembly produced 249,280 unigenes, of which 75,929 were annotated against the NCBI NR database and were distributed among 156 KEGG pathways. The raw data were uploaded to the SRA database under BioProject ID PRJNA553611. The transcriptomic dataset presented here is the first report of transcriptome information for CPB and is valuable for further exploration and understanding of CPB molecular pathways.
© 2020 The Authors.

Keywords:  Cocoa pod borer; Conopomorpha cramerella; Insect development stages; RNA-seq; Transcriptomics

Year:  2020        PMID: 33365368      PMCID: PMC7749363          DOI: 10.1016/j.dib.2020.106638

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The RNA-seq data were obtained from the transcriptomes of cocoa pod borer (CPB) developmental stages. They enable the identification of differentially expressed genes between the developmental stages, which can reveal putative developmental pathways and mechanisms for further exploration. These data will benefit molecular biology researchers working on CPB gene discovery, characterization and cloning. The CPB transcriptome data provide a solid foundation for studies of CPB development. The data are valuable for further studies aimed at discovering putative genes and proteins that control CPB development. Understanding the molecular mechanisms of CPB may lead to novel methods for controlling the pest.

Data Description

RNA-seq transcriptome data from the CPB larval and adult stages, with three biological replicates each, were obtained using the Illumina HiSeq™ 4000 sequencing platform. Raw reads have been uploaded to the NCBI Sequence Read Archive (SRA) database. The accession number and link for each sample's FASTQ file are listed in Table 1. A total of 135,915,430 clean reads were obtained from the Illumina sequencing. The number of reads for each sample is given in Table 2. The data were also assembled de novo into a full-length transcriptome (Table 3); the assembly can be reproduced using the protocol in the methods section below.
Table 1

SRA accession numbers and links for raw data of CPB development transcriptome. CPB development stages: Larvae vs Adults.

| Stage | Replicate | Accession number | Accession link |
|---|---|---|---|
| Larvae | Lar_L1 | SRX6450311 | https://www.ncbi.nlm.nih.gov/sra/SRX6450311 |
| Larvae | Lar_L3 | SRX6450312 | https://www.ncbi.nlm.nih.gov/sra/SRX6450312 |
| Larvae | Lar_L4 | SRX6450313 | https://www.ncbi.nlm.nih.gov/sra/SRX6450313 |
| Adults | Adl_A2 | SRX6450314 | https://www.ncbi.nlm.nih.gov/sra/SRX6450314 |
| Adults | Adl_A3 | SRX6450309 | https://www.ncbi.nlm.nih.gov/sra/SRX6450309 |
| Adults | Adl_A4 | SRX6450310 | https://www.ncbi.nlm.nih.gov/sra/SRX6450310 |
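The accession links in Table 1 all follow the same pattern, so they can be regenerated from the accession numbers alone. The following is a minimal sketch; the names `SRA_EXPERIMENTS` and `sra_link` are introduced here for illustration and are not part of the original data article.

```python
# Accession numbers copied from Table 1, keyed by replicate name.
SRA_EXPERIMENTS = {
    "Lar_L1": "SRX6450311",
    "Lar_L3": "SRX6450312",
    "Lar_L4": "SRX6450313",
    "Adl_A2": "SRX6450314",
    "Adl_A3": "SRX6450309",
    "Adl_A4": "SRX6450310",
}

# URL prefix taken from the links listed in Table 1.
SRA_BASE_URL = "https://www.ncbi.nlm.nih.gov/sra/"


def sra_link(accession: str) -> str:
    """Return the SRA web link for one experiment accession."""
    return SRA_BASE_URL + accession


if __name__ == "__main__":
    for replicate, accession in SRA_EXPERIMENTS.items():
        print(replicate, sra_link(accession))
```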
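Table 2

Statistics of raw and clean reads of CPB development transcriptome.

| Stage | Replicate | Raw reads | Raw bases (G) | Clean reads | Clean bases (G) |
|---|---|---|---|---|---|
| Larvae | Lar_L1 | 24246778 | 7.3 | 23437855 | 7.0 |
| Larvae | Lar_L3 | 20315775 | 6.1 | 19650804 | 5.9 |
| Larvae | Lar_L4 | 26897022 | 8.1 | 25090045 | 7.5 |
| Adults | Adl_A2 | 26054201 | 7.8 | 25126987 | 7.5 |
| Adults | Adl_A3 | 21560361 | 6.5 | 20264992 | 6.1 |
| Adults | Adl_A4 | 22953673 | 6.9 | 22344747 | 6.7 |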
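The per-sample counts in Table 2 can be checked against the totals quoted in the text (142,027,810 raw reads and 135,915,430 clean reads). The sketch below does this; the dictionary name `TABLE_2` and the derived "fraction retained" figure are illustrative additions, not values reported by the authors.

```python
# (raw reads, clean reads) per replicate, copied from Table 2.
TABLE_2 = {
    "Lar_L1": (24_246_778, 23_437_855),
    "Lar_L3": (20_315_775, 19_650_804),
    "Lar_L4": (26_897_022, 25_090_045),
    "Adl_A2": (26_054_201, 25_126_987),
    "Adl_A3": (21_560_361, 20_264_992),
    "Adl_A4": (22_953_673, 22_344_747),
}

raw_total = sum(raw for raw, _ in TABLE_2.values())
clean_total = sum(clean for _, clean in TABLE_2.values())

if __name__ == "__main__":
    for replicate, (raw, clean) in TABLE_2.items():
        # Fraction of raw reads that survived filtering for this replicate.
        print(f"{replicate}: {clean / raw:.1%} of raw reads retained")
    print(f"Total raw reads:   {raw_total:,}")
    print(f"Total clean reads: {clean_total:,}")
```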
Table 3

Statistics of CPB development transcriptome assembly.

| Attribute | Value |
|---|---|
| Number of unigenes | 249,280 |
| Percent GC | 42.12 % |
| Average contig length | 579.31 bp |
| Total assembled bases | 144,410,072 bp |
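The average contig length in Table 3 is the total number of assembled bases divided by the number of unigenes. The short sketch below recomputes it from the other two table entries; the variable names are chosen here for illustration only.

```python
# Values copied from Table 3.
NUMBER_OF_UNIGENES = 249_280
TOTAL_ASSEMBLED_BASES = 144_410_072

# Mean contig length in base pairs.
average_contig_length = TOTAL_ASSEMBLED_BASES / NUMBER_OF_UNIGENES

if __name__ == "__main__":
    print(f"Average contig length: {average_contig_length:.2f} bp")
```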

Experimental Design, Materials and Methods

Insect collection and rearing

Cocoa pod borer (CPB) adults and larvae were collected from cocoa pods at the Malaysian Cocoa Board Bagan Datuk Plantation, Perak, Malaysia (3.894131 N 100.8642093 E). Cocoa pods showing premature yellowing of the husk, a characteristic symptom of CPB infestation, were collected in the field.

cDNA library construction and high-throughput sequencing

Each sample was homogenized with liquid nitrogen in a mortar and dissolved in 1 ml of TRI Reagent (Thermo Fisher Scientific) per 100 mg of tissue. Total RNA was purified using the RNeasy mini kit (Qiagen, Inc., Valencia, CA) following the manufacturer's RNA extraction protocol with modifications. Residual genomic DNA was removed using the DNA-free™ DNA Removal Kit (Invitrogen), according to the manufacturer's instructions. RNA quality was assessed using a Nanodrop 1000 spectrophotometer (Thermo Scientific, USA). The OD260/280 values of each RNA sample were between 1.8 and 2.0, indicating sufficient quality. Finally, the integrity of each total RNA sample was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA), with an expected RNA integrity number (RIN) threshold of 7.0. Poly(A) RNA was isolated using the NEBNext Poly(A) mRNA Magnetic Isolation Module, and libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina, both following the manufacturer's protocol. Library construction and sequencing were performed by a commercial service provider, Novogene Med NGS Clinical Laboratory (Tianjin, China), using the Illumina HiSeq 4000 with 150-bp paired-end (PE) reads (Table 1).

Data filtering

Weak signals and low-quality sequences were removed, and read ends were screened and trimmed for Illumina adaptor sequences by the sequencing company. Approximately 142 million (142,027,810) raw reads were obtained, amounting to over 42.7 Gb of paired-end data. Raw reads containing ambiguous ‘N’ nucleotides (with a ratio of ‘N’ greater than 10%) and low-quality sequences (with a quality score less than 5) were removed using in-house software (a service provided by Novogene) to obtain clean reads (Table 2). Next, adaptor sequences were removed from the raw reads using Trimmomatic (version 0.36) [1]. Quality control of the trimmed reads was performed using FastQC [2], a Java program that provides summary statistics for FASTQ files.
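The ambiguous-base criterion described above can be illustrated with a short sketch. This is not the Novogene in-house software, whose implementation is not published; it only shows the stated rule of discarding reads in which more than 10% of bases are ‘N’. The function names and the example reads are hypothetical, and the quality-score criterion is left out because the text does not say how the score is aggregated over a read.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def iter_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    stripped = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _separator, qual = stripped[i:i + 4]
        yield header, seq, qual


def n_ratio(seq: str) -> float:
    """Fraction of bases in a read that are the ambiguous base 'N'."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def passes_n_filter(seq: str, max_n_ratio: float = 0.10) -> bool:
    """Keep a read unless its 'N' ratio is greater than the threshold."""
    return n_ratio(seq) <= max_n_ratio


# Hypothetical 10-bp reads used only to exercise the filter.
EXAMPLE_FASTQ = """\
@r1
ACGTACGTAC
+
IIIIIIIIII
@r2
NNACGTACGT
+
IIIIIIIIII
@r3
NACGTACGTA
+
IIIIIIIIII
"""

kept = [rec for rec in iter_fastq(EXAMPLE_FASTQ.splitlines())
        if passes_n_filter(rec[1])]

if __name__ == "__main__":
    for header, seq, _qual in kept:
        print(header, seq, f"N ratio = {n_ratio(seq):.2f}")
```

For paired-end data, a real pipeline would normally drop both mates when either one fails the filter; that pairing logic is omitted here for brevity.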

De novo assembly

The Trinity RNA-Seq assembly software package, version 2.9.1 [3], was used to assemble the CPB transcriptome de novo, without a reference genome, with the command “Trinity --seqType fq --max_memory 80G --left [LEFT_READS_FILES] --right [RIGHT_READ_FILES] --CPU 48 --bflyCPU 12 --min_contig_length 200 --full_cleanup”. Transcriptome completeness was assessed using BUSCO 4.0.6 [4], giving 90.7% completeness against the Insecta Odb10 and 85.7% completeness against the Lepidoptera Odb10 BUSCO datasets. The Trinity SuperTranscripts script was used to output the unigene sequences from the assembly (Table 3).
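For readers who want to script the assembly step, the sketch below builds the Trinity argument list with the options quoted in the text. It only constructs and prints the command and does not run Trinity. The read file names are hypothetical placeholders for [LEFT_READS_FILES] and [RIGHT_READ_FILES]; multiple files are joined with commas, which is how Trinity accepts several inputs for --left and --right.

```python
import shlex
from typing import List, Sequence


def trinity_command(left_reads: Sequence[str],
                    right_reads: Sequence[str]) -> List[str]:
    """Build the Trinity argument list using the options given in the text."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--max_memory", "80G",
        "--left", ",".join(left_reads),
        "--right", ",".join(right_reads),
        "--CPU", "48",
        "--bflyCPU", "12",
        "--min_contig_length", "200",
        "--full_cleanup",
    ]


# Hypothetical file names; the article does not list the actual file names.
left = ["Lar_L1_1.fq.gz", "Adl_A2_1.fq.gz"]
right = ["Lar_L1_2.fq.gz", "Adl_A2_2.fq.gz"]
cmd = trinity_command(left, right)

if __name__ == "__main__":
    # Print a shell-ready command line; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```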

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Specifications Table

| Subject | Biochemistry, genetics and molecular biology |
|---|---|
| Specific subject area | Transcriptomics |
| Type of data | Table; Text file |
| How data were acquired | Illumina HiSeq™ 4000 |
| Data format | Raw (FASTQ) |
| Parameters for data collection | Cocoa pod borer larvae and adults were collected from Cocoa Board Bagan Datuk Plantation, Perak, Malaysia |
| Description of data collection | Transcriptome of Cocoa pod borer larvae and adults |
| Data source location | Institution: Malaysian Cocoa Board; City/Town/Region: Bagan Datuk, Perak; Country: Malaysia; Latitude and longitude for collected samples/data: 3.894131 N 100.8642093 E |
| Data accessibility | Repository name: NCBI Sequence Read Archive (SRA); Data identification number: BioProjectID: PRJNA553611; Direct URL to data: https://www.ncbi.nlm.nih.gov/bioproject/553611; Instructions for accessing these data: The raw sequence reads can be accessed from NCBI SRA with BioProjectID PRJNA553611 |

References

[1] Anthony M Bolger; Marc Lohse; Bjoern Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014.

[3] Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol, 2011.

[4] Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 2015.