Literature DB >> 25830018

Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae.

Sheina B Sim1, Bernarda Calla2, Brian Hall1, Theodore DeRego2, Scott M Geib2.   

Abstract

BACKGROUND: Bactrocera cucurbitae is a serious global agricultural pest. Basic genomic information is lacking for this species, and this would be useful to inform methods of control, damage mitigation, and eradication efforts. Here, we have sequenced, assembled, and annotated a comprehensive transcriptome for a mass-rearing sexing strain of this species. This forms a foundational genomic and transcriptomic resource that can be used to better understand the physiology and biochemistry of this insect as well as being a useful tool for population genetics.
FINDINGS: A transcriptome assembly was constructed containing 17,654 transcript isoforms derived from 10,425 unigenes. This transcriptome size is similar to reports from other Tephritid species and probably includes about 70-80% of the protein-coding genes in the genome. The dataset is publicly available in NCBI and GigaDB as a resource for researchers.
CONCLUSIONS: Foundational knowledge on the protein-coding genes in B. cucurbitae will lead to improved resources for this species. Through comparison with a model system such as Drosophila as well as a growing number of related Tephritid transcriptomes, improved strategies can be developed to control this pest.

Entities:  

Keywords:  Bactrocera cucurbitae; Melon fly; RNA-Seq; SIT; Sterile insect technique; Tephritidae; Translocation; White-pupae

Mesh:

Year:  2015        PMID: 25830018      PMCID: PMC4379760          DOI: 10.1186/s13742-015-0053-x

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

Background

Bactrocera cucurbitae (Diptera: Tephritidae) is an important agricultural pest attacking many fruits and vegetables in tropical and subtropical regions. To maximize efforts to control, mitigate the damage, and maintain eradication of this invasive species in the mainland United States, we must accumulate foundational information that describes the many aspects of the biology of this species. The data we present is a comprehensive transcriptome of the T1 translocated sexing strain of B. cucurbitae and represents genes expressed across all major life stages [1]. The white pupae trait of this line is sex-linked and makes this strain conducive to large-scale rearing and male-only mass release of sterilized flies for the sterile insect technique (SIT).

Samples

Samples were derived from the T1 white pupae translocated line of B. cucurbitae maintained at the USDA-ARS Daniel K. Inouye Pacific Basin Agricultural Research Center Insectary in Hilo, Hawaii, USA [1]. To generate samples that are representative of a broad range of life stages and ages, daily samples were collected from eggs (0–2 days old), larvae (~0-10 day post-hatch), pupae (0–10 days post-pupation) and adults (both unmated and mated males and females) as previously described [2]. Total RNA was extracted from samples across each stage, and then representative, stage-specific samples were generated through pooling to generate four RNA samples for sequencing which are described as NCBI BioSamples SAMN03010448- SAMN03010451 associated with BioProject PRJNA259566. For each sample, RNA was extracted using the Zymo Quick-RNA miniprep extraction kit (Zymo Research, Irvine, CA) following recommended procedures for whole-tissue extraction. The resulting RNA was quantified with the Qubit Broad Range RNA assay on a Qubit 2.0 fluorimeter (Life Technologies, Carlsbad, CA), and size and quality determined with an RNA 6000 nano chip on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA).

Sequencing

Total RNA was sent to the Beijing Genomics Institute (BGI Americas, at University of California, Davis, USA) and eukaryotic mRNA libraries were prepared using TruSeq technology (TruSeq RNA Kit v2). The resulting four libraries (egg, larvae, pupae, and adult) were barcoded and sequenced together on a single lane of Illumina HiSeq 2000, generating approximately 39.31 Gb of raw data from approximately 196 million 2 × 100 bp paired reads. These raw reads were filtered by quality and for adaptor contamination using an in-house pipeline at BGI, targeting reads containing adaptor, reads with more than 5% ambiguous bases, or reads with greater than 50% of bases with Phred quality score below 10. This reduced the reads to approximately 37.78 Gb after filtering, removing approximately 4% of the data. This filtered data was used as the input to the de novo assembly and also deposited at GenBank under the SRA accessions SRS691531- SRS691534.

Transcriptome assembly

A single representative de novo assembly was generated from a concatenation of the four libraries using the Trinity pipeline (r2014_07-17) [3,4]. In brief, read abundance was normalized in silico to 50X coverage, and then assembled using default Trinity parameters, with the exceptions of the addition of the ‘--jaccard_clip’ flag to reduce the generation of transcript fusions from non-strand specific data. After assembly, transcript and unigene level expression values were calculated using RSEM [5] and open reading frames (ORFs) were predicted with Transdecoder [4]. In addition to Transdecoder-predicted ORFs, ORFs were included that had a Pfam-A domain match utilizing Hmmer3 to perform searches [6]. Next, the raw transcriptome was filtered to discard poorly supported transcripts and maintain transcripts with strong evidence for protein-coding regions and reasonable support for being expressed. Transvestigator was implemented [7], and parameters were set to retain only those transcripts that have a transcript per million (TPM) value greater than 0.5, that at an isoform level represent at least 5% of the read abundance based expression for the parent unigene, and that have a predicted ORF. A similar filtering strategy was used for two previously published Tephritid transcriptomes [2,8], and should allow similar quality transcriptomes and comparison between these species. In addition to filtering the transcriptome, Transvestigator also prepared the dataset for NCBI Transcriptome Shotgun Assembly (TSA) submission by ensuring the predicted ORF is in the positive strand, confirming presence of only a single ORF per transcript, and generating a properly formatted NCBI .tbl file for submission. Details on transcriptome assembly and annotation statistics are listed in Table 1.
Table 1

Transcriptome assembly and annotation statistics compared with other Tephritid transcriptomes and the genome

Species B. cucurbitae B. dorsalis a C. capitata b D. melanogaster c
Number of read pairs used in assembly (SRA accession number)
Egg (SRA: SRS691534)4374131412462204--
Larvae (SRA: SRS691533)5156883511753084--
Pupae (SRA: SRS691532)470931781329114793256673-
Adult (SRA: SRS691531)465152434725012396929532-
Total18891857084756558190186205-
Normalized reads (in silico normalization)12792085779649117217414-
Unfiltered assembly
Number of unigenes (or Drosophila genes)5022047216118793-
N50 longest transcript/unigene219118821187-
Sum longest transcript/unigene (Mb)49.6340.2081.56-
Number of transcripts7668880345190958-
N50 transcript length (bp)262628022686-
Sum transcript length (Mb)100.20109.48236.18-
Transcripts per unigene1.531.701.61-
GC %38.1039.1136.21-
Filtered de novo assembly or current Drosophila release
Number of unigenes10425107841074115504
N50 unigene length (longest transcript/unigene)3464304333832979
Sum longest transcript/unigene (Mb)28.1224.4628.3430.53
Number of transcripts17654235392176125205
N50 transcript length (bp)3477346039133633
Sum transcript length (Mb)48.2862.0666.6568.47
Isoforms per unigene1.692.182.031.63
GC %40.1740.3239.4149.70
N50 protein length (amino acids)323301310370
Number of proteins with complete ORF (%)12936 (73.2)13017 (55.3)15740 (72.3)-
Annotation statistics
Number of proteins with PFAM domains identified130291661213646-
Number of proteins with Gene Ontology Terms10640-13648-
Number of proteins with gene names159561709315841-
Number of proteins with significant hit to Drosophila proteinsd160702071319245-

aData from Geib et al., 2014 [2]; bData from Calla et al., 2014 [8]; cData from Flybase r6.03 [11]; dBLASTP hit with e-value cutoff 1e-5.

Transcriptome assembly and annotation statistics compared with other Tephritid transcriptomes and the genome aData from Geib et al., 2014 [2]; bData from Calla et al., 2014 [8]; cData from Flybase r6.03 [11]; dBLASTP hit with e-value cutoff 1e-5.

Annotation

Annotation was performed at the peptide level, and annotations were used to generate a transcript name and product in addition to functional annotations. All predicted proteins were subjected to analysis through InterProScan5, searching all available databases including Gene Ontology and InterPro term lookup [9]. In addition, proteins were subjected to a BLASTP search against the UniProt SwissProt database (downloaded 10 November 2013). Transcripts were annotated with UniProt and InterProScan results using Annie [10], a program that extracts qualified gene names and products by cross-referencing SwissProt BLAST hits and performs database cross-referencing from InterProScan5 results. The resulting annotation file was provided to Transvestigator, described above, to include functional annotations in the resulting.gff3 and .tbl files. The filtered and annotated transcriptome was deposited at GenBank as a TSA under the accession GBXI00000000 associated with BioProject PRJNA259566. Annotation statistics are listed in Table 1.

Comparison of B. cucurbitae transcriptome with other published datasets

Two previously published de novo Tephritid transcriptomes (Bactrocera dorsalis [2] and Ceratitis capitata [8], as well as the current Drosophila melanogaster genome transcript and peptide datasets (Flybase.org r6.03) [11] were used to compare the relative quality and completeness of the B. cucurbitae transcriptome. Figure 1 displays a histogram distribution of transcript length and predicted peptide length for all four species, in addition to the raw unfiltered B. cucurbitae transcriptome dataset. This demonstrates that the relative distribution of transcript length is consistent with what is seen in other Tephritid species and also, the Tephritid distribution is consistent with what is seen in Drosophila. In addition, the majority of the filtered transcripts fall outside of the expected distribution, supporting their removal from the assembly. Looking at summary assembly statistics (Table 1), unigene and transcript abundance is similar to other Tephritid transcriptomes, and the proportion of transcripts that could be functionally annotated is similar across species. Based on these comparisons, the B. cucurbitae transcriptome presented here is of high quality, and should serve as a foundational resource to promote molecular and biochemical research on this important pest species.
Figure 1

Comparison oftranscriptome to related fly species. Distribution of (A) transcript length and (B) predicted polypeptide length of the B. cucurbitae transcriptome compared with published Bactrocera dorsalis and Ceratitis capitata de novo transcriptome assemblies and the current Drosophila melanogaster transcript/protein set (Flybase r6.03).

Comparison oftranscriptome to related fly species. Distribution of (A) transcript length and (B) predicted polypeptide length of the B. cucurbitae transcriptome compared with published Bactrocera dorsalis and Ceratitis capitata de novo transcriptome assemblies and the current Drosophila melanogaster transcript/protein set (Flybase r6.03).

Availability of supporting data

The filtered and annotated transcriptome has been deposited at GenBank as a transcriptome shotgun assembly (TSA) under the accession GBXI00000000 associated with BioProject PRJNA259566. Supporting data and analysis results also available from the GigaScience GigaDB database [12].
  10 in total

1.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

2.  FlyBase--the Drosophila genetic database.

Authors:  M Ashburner; R Drysdale
Journal:  Development       Date:  1994-07       Impact factor: 6.868

3.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

4.  Accelerated Profile HMM Searches.

Authors:  Sean R Eddy
Journal:  PLoS Comput Biol       Date:  2011-10-20       Impact factor: 4.475

5.  Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae.

Authors:  Sheina B Sim; Bernarda Calla; Brian Hall; Theodore DeRego; Scott M Geib
Journal:  Gigascience       Date:  2015-03-31       Impact factor: 6.524

6.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

7.  Characterizing the developmental transcriptome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae) through comparative genomic analysis with Drosophila melanogaster utilizing modENCODE datasets.

Authors:  Scott M Geib; Bernarda Calla; Brian Hall; Shaobin Hou; Nicholas C Manoukis
Journal:  BMC Genomics       Date:  2014-10-28       Impact factor: 3.969

8.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

9.  Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera.

Authors:  Brant C Faircloth; Michael G Branstetter; Noor D White; Seán G Brady
Journal:  Mol Ecol Resour       Date:  2014-09-29       Impact factor: 7.090

10.  A genomic perspective to assessing quality of mass-reared SIT flies used in Mediterranean fruit fly (Ceratitis capitata) eradication in California.

Authors:  Bernarda Calla; Brian Hall; Shaobin Hou; Scott M Geib
Journal:  BMC Genomics       Date:  2014-02-05       Impact factor: 3.969

  10 in total
  9 in total

1.  Recapitulation of the embryonic transcriptional program in holometabolous insect pupae.

Authors:  Alexandra M Ozerova; Mikhail S Gelfand
Journal:  Sci Rep       Date:  2022-10-20       Impact factor: 4.996

2.  Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae.

Authors:  Sheina B Sim; Bernarda Calla; Brian Hall; Theodore DeRego; Scott M Geib
Journal:  Gigascience       Date:  2015-03-31       Impact factor: 6.524

3.  Identification and preliminary characterization of chemosensory perception-associated proteins in the melon fly Bactrocera cucurbitae using RNA-seq.

Authors:  Samia Elfekih; Chien-Yu Chen; Ju-Chun Hsu; Mahdi Belcaid; David Haymer
Journal:  Sci Rep       Date:  2016-01-11       Impact factor: 4.379

4.  Isolation and Molecular Characterization of the Transformer Gene From Bactrocera cucurbitae (Diptera: Tephritidae).

Authors:  Ya Luo; Santao Zhao; Jiahui Li; Peizheng Li; Rihui Yan
Journal:  J Insect Sci       Date:  2017-01-01       Impact factor: 1.857

5.  De novo transcriptome assemblies of four xylem sap-feeding insects.

Authors:  Erica E Tassone; Charles C Cowden; S J Castle
Journal:  Gigascience       Date:  2017-03-01       Impact factor: 6.524

6.  A Chromosome-Scale Assembly of the Bactrocera cucurbitae Genome Provides Insight to the Genetic Basis of white pupae.

Authors:  Sheina B Sim; Scott M Geib
Journal:  G3 (Bethesda)       Date:  2017-06-07       Impact factor: 3.154

7.  De novo construction of an expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus.

Authors:  Erica E Tassone; Scott M Geib; Brian Hall; Jeffrey A Fabrick; Colin S Brent; J Joe Hull
Journal:  Gigascience       Date:  2016-01-28       Impact factor: 6.524

8.  Sequencing, de novo assembly and annotation of a pink bollworm larval midgut transcriptome.

Authors:  Erica E Tassone; Gina Zastrow-Hayes; John Mathis; Mark E Nelson; Gusui Wu; J Lindsey Flexner; Yves Carrière; Bruce E Tabashnik; Jeffrey A Fabrick
Journal:  Gigascience       Date:  2016-06-22       Impact factor: 6.524

9.  Twenty-Five New Viruses Associated with the Drosophilidae (Diptera).

Authors:  Claire L Webster; Ben Longdon; Samuel H Lewis; Darren J Obbard
Journal:  Evol Bioinform Online       Date:  2016-06-20       Impact factor: 1.625

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.