Exploration of RNA-Seq data to identify a potential pathogen of the leaf-mining moth, Stomphastis thraustica (Meyrick, 1908) (Lepidoptera: Gracillariidae).

Kayvan Etebari1, Dianne B J Taylor2, Md Mahbubur Rahman2, Kunjithapatham Dhileepan2, Michael J Furlong1, Sassan Asgari1.   

Abstract

The leaf-mining moth, Stomphastis thraustica (Meyrick, 1908), was imported to Australia from Peru as a potential biological control agent of an exotic weed, bellyache bush (Jatropha gossypiifolia). The insect colony has been maintained in the quarantine facility for over eight years, but recently significant mortality was observed in the culture. Affected larvae had swollen intersegments with a fragile integument. Infected larvae are a cloudy, muted green or yellowish, whereas a healthy late instar larva is a vivid green. They slowly dehydrate and eventually die, at which point the larval body becomes rubbery and turns black. We used next generation sequencing to identify the cause of mortality in the insects. Total RNA was extracted from 20 larvae in two cohorts, one with and one without apparent symptoms of disease, for deep sequencing on the NovaSeq platform after eukaryote ribosomal RNA depletion. We identified several non-insect sequences belonging to viruses, bacteria, and fungi, but none of these showed significant abundance or enrichment in the infected dataset. Sequences related to a unicellular yeast, Saccharomyces cerevisiae, were among the highly expressed non-insect contigs; more than 5% of reads in both libraries mapped to the genome of this opportunistic microorganism.
Crown Copyright © 2021 Published by Elsevier Inc.

Keywords:  Bellyache bush; Insect pathology; Insect transcriptome; Insect viruses; Saccharomyces cerevisiae; Weed biological control; Yeast

Year:  2021        PMID: 34977297      PMCID: PMC8685975          DOI: 10.1016/j.dib.2021.107708

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

These RNA-Seq data represent the first transcriptome of Stomphastis thraustica larvae and will facilitate future genomic studies of this species. The data are useful for biologists and insect pathologists who investigate the pathogenic role of Saccharomyces cerevisiae in insect populations. The transcriptomic analysis of these data provides a list of the microbial community of S. thraustica larvae. The data provide essential information for future work on this under-described insect species.

Data Description

The data described in this article originated from cDNA sequencing of two cohorts of Stomphastis thraustica late instar larvae, one with and one without apparent symptoms of disease. The first obvious sign in affected larvae is a decline in the feeding activity of late instar larvae in the leaf mine; they eventually die in the leaf. Dead larvae are more likely to be found in leaves lower on the plant than in leaves near the top of the plant. If they exit the leaf, affected larvae are lethargic and some fall from the plant to the cage floor. Only a few of these affected individuals pupate successfully. Affected larvae and prepupae are more delicate and are easily injured. While healthy late instar larvae are a vivid green colour, affected larvae are a cloudy, muted green or yellowish and have swollen intersegments with a fragile integument. They slowly dehydrate and eventually die, at which point the larval body becomes rubbery and turns black (Fig. 1).
Fig. 1

The progression of disease in Stomphastis thraustica larvae. A) healthy prepupa with vivid green colouration, B) cloudy, muted green colour with a fragile integument, C) dehydrated larva that is less active, and D) rubbery larva, body turns black after death.

In total, 207,772,228 paired-end reads were generated from two RNA-Seq libraries. We de novo assembled 26,816 contigs from 191,110,344 clean, trimmed reads using CLC Genomics Workbench v21.0.5 (Table 1). Reads that did not map to the proxy genomes were used as input for a further de novo metagenome assembly in the CLC Microbial Genomics module, which produced 17,315 contigs. More than 97% of trimmed reads mapped to the proxy genome references (Table 2). The assembled contigs are available in FASTA format through Gene Expression Omnibus (GEO) series accession number GSE185938 at the NCBI website. The BLASTx search results for 8926 contigs from the transcriptomic de novo assembly are available in Supplementary Table S1.
Table 1

Summary statistics of the de novo assembly of S. thraustica larvae.

Measurement                      De novo assembly in transcriptome mode (length or count)    De novo assembly in metagenome mode (length or count)
Number of contigs                26,816                                                      17,315
Number of contigs > 1 kb         4361                                                        1783
Total length of contigs          19,257,964                                                  10,335,551
Total length of contigs > 1 kb   7,329,740                                                   2,719,167
Minimum contig length            300                                                         300
Maximum contig length            25,029                                                      10,377
Mean contig length               718                                                         597
Median contig length             523                                                         462
N25                              1427                                                        1036
N50                              773                                                         629
N75                              495                                                         425
N90                              371                                                         342
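
Table 1 reports N25, N50, N75 and N90 for both assemblies. As a reminder of how such statistics are commonly defined, the minimal Python sketch below computes Nx from a list of contig lengths: contigs are sorted longest first, and Nx is the length of the contig at which the running total first reaches x% of the total assembly length. The function name and the toy lengths are illustrative assumptions; the values in Table 1 were produced by CLC Genomics Workbench, whose exact implementation is not described in this article.

```python
def nx(contig_lengths, x=50):
    """Return the Nx contig length (for example N50 when x=50).

    Commonly used definition, assumed here: sort contigs from longest to
    shortest and return the length of the contig at which the cumulative
    length first reaches x percent of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length


# Hypothetical contig lengths in bp, not taken from the dataset.
toy_lengths = [300, 400, 500, 800, 1000]

for x in (25, 50, 75, 90):
    print(f"N{x} = {nx(toy_lengths, x)}")
```

With real data, the list of lengths could be read from the assembled FASTA file deposited under the GEO accession; that step is omitted here.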
Table 2

Summary statistics of the mapping to the proxy genome reference.

                       Amyelois transitella          Conopomorpha cramerella
                       Read count     Percentage     Read count     Percentage
Mapped reads           185,533,100    97.08          187,504,643    98.11
Not mapped reads       5,577,244      2.92           3,605,701      1.88
Reads in pairs         183,285,598    95.91          160,371,752    83.91
Broken paired reads    2,247,502      1.18           27,132,891     14.19
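
The mapped percentages in Table 2 follow from dividing each mapped read count by the 191,110,344 clean, trimmed reads reported in the text. The short Python sketch below reproduces that arithmetic for the two proxy genomes. The variable and function names are illustrative assumptions; only the read counts come from the article.

```python
# Clean and trimmed reads reported in the Data Description.
TRIMMED_READS = 191_110_344

# Mapped read counts from Table 2.
mapped_reads = {
    "Amyelois transitella": 185_533_100,
    "Conopomorpha cramerella": 187_504_643,
}


def percent_of_trimmed(count, total=TRIMMED_READS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


mapped_percent = {
    genome: percent_of_trimmed(count) for genome, count in mapped_reads.items()
}

for genome, pct in mapped_percent.items():
    print(f"{genome}: {pct:.2f}% of trimmed reads mapped")
```

Both values exceed 97%, in line with the statement that more than 97% of trimmed reads mapped to the proxy genome references.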
We identified several partial sequences of insect-specific viruses from the families Rhabdoviridae, Metaviridae, and Chuviridae, but these data are not conclusive enough to consider those viruses the cause of disease in the larvae (Tables 3 and S1). Some of those sequences might represent Endogenous Viral Elements (EVEs), which could be part of the S. thraustica genome [1]. The S. thraustica larval RNA-Seq data provided no evidence for the presence of detectable Microsporidia-like Nosema sp., which is a well-known cause of mortality in other lepidopteran larvae [2,3]. Sequences related to a unicellular yeast, Saccharomyces cerevisiae, are among the highly expressed non-insect contigs, and more than 5% of reads in both libraries (symptomatic and asymptomatic) mapped to the genome of this opportunistic microorganism. Although previous studies have shown that S. cerevisiae can be pathogenic to insects [4], further investigation is required to confirm this potential cause of disease in S. thraustica larvae. We assume that S. cerevisiae causes mortality when the rearing or food conditions are not optimal for the insect host. The symptoms of this infection are similar to a previously described case in Galleria mellonella, in which larvae injected with a lethal dose of S. cerevisiae turned black (consistent with melanization) within 30 min post injection [4].
Table 3

The list of identified non-insect sequences from S. thraustica transcriptome data.

Microorganism name*                                  # Sequences   Group
Saccharomyces cerevisiae                             62            Fungi
Ogataea polymorpha                                   1             Fungi
Talaromyces stipitatus                               1             Fungi
Mixia osmundae                                       1             Fungi
Hanseniaspora opuntiae                               1             Fungi
Kluyveromyces marxianus                              1             Fungi
Histoplasma capsulatum                               1             Fungi
Hubei lepidoptera virus 4                            2             Viruses
Lambdina fiscellaria nucleopolyhedrovirus            2             Viruses
Trichoplusia ni TED virus                            2             Viruses
Xenotropic MuLV-related virus                        1             Viruses
Scaldis River bee virus                              1             Viruses
Xenotropic murine leukemia virus                     1             Viruses
Orgi virus                                           3             Viruses
Gata virus                                           1             Viruses
Hubei odonate virus 11                               1             Viruses
Spodoptera frugiperda rhabdovirus                    1             Viruses
Candidatus Woesearchaeota                            1             Archaea
Escherichia coli                                     3             Bacteria
Acinetobacter baumannii                              3             Bacteria
Salmonella enterica subsp. enterica serovar Typhi    1             Bacteria
Streptococcus pneumoniae                             1             Bacteria
Actinomyces odontolyticus                            1             Bacteria
Propionibacterium acnes                              1             Bacteria
Vibrio anguillarum                                   1             Bacteria
Streptococcus mutans                                 1             Bacteria
Salmonella enterica                                  1             Bacteria
Streptococcus mutans                                 1             Bacteria
Anaerotignum lactatifermentans                       1             Bacteria
Pseudomonas sp.                                      1             Bacteria
Bacteroidetes bacterium                              1             Bacteria
Piscirickettsia salmonis                             1             Bacteria
Pseudomonas syringae                                 1             Bacteria
Haemophilus influenzae                               1             Bacteria
Pseudomonas amygdali pv. mori                        1             Bacteria
Stylonychia lemnae                                   1             Eukaryota
Trichomonas vaginalis G3                             1             Eukaryota
Trypanosoma brucei brucei                            1             Eukaryota
Brugia malayi                                        2             Nematoda

* More information about these sequences can be found in Supplementary Table S1.

Experimental Design, Materials and Methods

Insect collection and sample preparation

The leaf-mining moth (S. thraustica) was imported from Peru to Australia in 2014 as a potential biological control agent of the weed bellyache bush (Jatropha gossypiifolia). The insect colony has been maintained in the Department of Agriculture and Fisheries quarantine facility in Brisbane since that time. Bellyache bush is a serious weed of rangelands and riparian zones of northern Australia, and it has the potential to expand its range significantly in the region [5]. S. thraustica larvae enter the leaf and remain within it until pupation. Prepupae are highly mobile, though most pupate on the leaf from which they emerge [6]. Recently, significant mortality was observed in the laboratory colony. We presume that the microorganism associated with larval disease is well established in the S. thraustica laboratory colony and that most individuals are already infected with this potential pathogen, so selecting healthy, non-infected larvae was not straightforward. We used a next generation sequencing approach to identify the cause of mortality in the insects. Twenty larvae with and 20 larvae without obvious signs of disease (symptomatic and asymptomatic individuals) were preserved in an RNA stabilization reagent (RNAprotect®, QIAGEN; Cat No.: 76104) for subsequent RNA extraction and sequencing. Whole larvae were transferred to Qiazol lysis reagent for RNA extraction according to the manufacturer's instructions (QIAGEN; Cat No.: 79306). The RNA samples were treated with DNase I for 1 h at 37°C, their concentrations were measured using a spectrophotometer, and their integrity was checked on a 1% (w/v) agarose gel. After the RNA quality check, total RNA from two samples (symptomatic and asymptomatic individuals) was submitted to the Genewiz sequencing facility (Jiangsu, China) for library preparation (after eukaryote ribosomal RNA depletion) and strand-specific total RNA sequencing on the NovaSeq platform.

RNA-Seq data analysis

CLC Genomics Workbench version 21.0.5 was used for bioinformatics analyses. Both libraries were trimmed of any remaining vector or adapter sequences. Low quality reads (quality score below 0.05) and reads with more than two ambiguous nucleotides were discarded. In the absence of a reference genome, we used a de novo assembly approach (word size 25, bubble size 50 and minimum contig length 300 bp) to process these data. The contigs were corrected by mapping all reads against the assembled sequences (min. length fraction of 0.8 and maximum mismatch, insertion, and deletion costs of 2, 3 and 3, respectively). The Reads Per Kilobase of transcript per Million mapped reads (RPKM) value was calculated for each of the assembled contigs. To search for a potential pathogen, we retained all contigs above 600 bp and all contigs with RPKM above 10 (1446 contigs), regardless of their size, for downstream analysis. Due to the lack of biological replicates, the nature of the preparation of this RNA-Seq library, and the possibility of asymptomatic infection in the control group, these datasets are not suitable for assessment of differentially expressed genes. We also mapped the trimmed reads to the genomes of Conopomorpha cramerella (GCA_012932125.1) and Amyelois transitella (GCA_001186105.1) as proxy genome references to discard insect-specific reads. The unmapped reads were retained for metatranscriptome de novo assembly using the CLC Microbial Genomics module. BLASTx was used to identify sequence similarity of all assembled contigs with the NCBI non-redundant protein database (nr). We considered a sequence a potential candidate for further analysis if it met more than one of these criteria: (1) the sequence belongs to one of the well-known insect pathogens, (2) the sequence is among the highly expressed contigs in the dataset, (3) the complete genome sequence has been identified, or (4) more than 5% of reads in the library mapped to that microorganism's genome.
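
The RPKM calculation, the contig retention rule and the candidate criteria described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CLC Genomics Workbench implementation: the Contig class, the function names and the example numbers are hypothetical, and RPKM uses the standard definition (reads mapped to the contig, scaled by contig length in kilobases and by millions of mapped reads).

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical container for one assembled contig."""
    name: str
    length_bp: int
    mapped_reads: int


def rpkm(mapped_reads, length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of contig per million mapped reads."""
    return mapped_reads * 1e9 / (length_bp * total_mapped_reads)


def keep_for_screening(contig, total_mapped_reads, min_length=600, min_rpkm=10):
    """Retain contigs above 600 bp, or with RPKM above 10 regardless of size."""
    expression = rpkm(contig.mapped_reads, contig.length_bp, total_mapped_reads)
    return contig.length_bp > min_length or expression > min_rpkm


def is_candidate(known_insect_pathogen, highly_expressed,
                 complete_genome_found, fraction_of_library_reads):
    """Apply the four criteria; more than one must hold."""
    criteria = [
        known_insect_pathogen,
        highly_expressed,
        complete_genome_found,
        fraction_of_library_reads > 0.05,
    ]
    return sum(criteria) > 1


# Hypothetical example values, not taken from the dataset.
total = 1_000_000
short_but_expressed = Contig("contig_a", length_bp=350, mapped_reads=5)
short_and_rare = Contig("contig_b", length_bp=350, mapped_reads=1)

print(keep_for_screening(short_but_expressed, total))  # RPKM is about 14.3
print(keep_for_screening(short_and_rare, total))       # RPKM is about 2.9
print(is_candidate(False, True, False, 0.06))          # two criteria met
```

How "highly expressed" is decided for criterion (2) is not specified in the text, so the sketch takes it as a boolean input rather than guessing a threshold.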

CRediT Author Statement

Kayvan Etebari: Conceptualization, Investigation, Methodology, Visualization, Data curation, Writing – original draft preparation; Dianne B. J. Taylor: Investigation, Resources, Writing – reviewing & editing; Md Mahbubur Rahman: Investigation, Resources, Writing – reviewing & editing; Kunjithapatham Dhileepan: Investigation, Resources, Writing – reviewing & editing; Michael Furlong: Resources, Writing – reviewing & editing; Sassan Asgari: Conceptualization, Methodology, Supervision, Writing – reviewing & editing.

Ethics Statement

This article is an original work of the authors. All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted and no human participants were involved in this article. Compliance with Ethical Standards.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Specifications Table

Subject: Agricultural and Biological Sciences
Specific subject area: Insect transcriptomics and pathogen discovery
Type of data: RNA-Seq data, tables and figure
How data were acquired: High-throughput strand-specific RNA sequencing after rRNA depletion on the NovaSeq PE150 platform by the Genewiz sequencing facility in China
Data format: Raw: FASTQ files. Analysed: assembled contigs in FASTA format
Description of data collection: Total RNA was extracted from 20 Stomphastis thraustica larvae in two cohorts, with and without apparent symptoms of unknown disease in the laboratory population. CLC Genomic Workbench v21.5 was used for RNA-Seq analysis. We also used the de novo assemble metagenome tool in the CLC Microbial Genomics module for metatranscriptomic analysis of these data.
Data source location: A small leaf-mining moth, S. thraustica (Lepidoptera: Gracillariidae), was imported into Australia from Peru in 2014. The insect colony has been maintained at the Department of Agriculture and Fisheries quarantine facility in Brisbane, Queensland, Australia.
Data accessibility: Repository name: Deep sequencing data have been deposited in the National Centre for Biotechnology Information's (NCBI's) Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE185938. Direct link to the dataset: https://www.ebi.ac.uk/ena/browser/view/PRJNA771399

  3 in total

1.  Genome-Wide Screen for Saccharomyces cerevisiae Genes Contributing to Opportunistic Pathogenicity in an Invertebrate Model Host.

Authors:  Sujal S Phadke; Calum J Maclean; Serena Y Zhao; Emmi A Mueller; Lucas A Michelotti; Kaitlyn L Norman; Anuj Kumar; Timothy Y James
Journal:  G3 (Bethesda)       Date:  2018-01-04       Impact factor: 3.154

2.  Pathogenicity of Nosema sp. (Microsporidia) in the diamondback moth, Plutella xylostella (Lepidoptera: Plutellidae).

Authors:  Nadia Kermani; Zainal-Abidin Abu-Hassan; Hamady Dieng; Noor Farehan Ismail; Mansour Attia; Idris Abd Ghani
Journal:  PLoS One       Date:  2013-05-13       Impact factor: 3.240

3.  In and Outs of Chuviridae Endogenous Viral Elements: Origin of a Potentially New Retrovirus and Signature of Ancient and Ongoing Arms Race in Mosquito Genomes.

Authors:  Filipe Zimmer Dezordi; Crhisllane Rafaele Dos Santos Vasconcelos; Antonio Mauro Rezende; Gabriel Luz Wallau
Journal:  Front Genet       Date:  2020-10-22       Impact factor: 4.599
