De novo construction of an expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus.

Erica E Tassone, Scott M Geib, Brian Hall, Jeffrey A Fabrick, Colin S Brent, J Joe Hull.

Abstract

BACKGROUND: The plant bug Lygus hesperus Knight is a polyphagous pest of many economically important crops. Despite its pest status, little is known about the molecular mechanisms responsible for much of the biology of this species. Earlier Lygus transcriptome assemblies were limited by low read depth, or because they focused on specific conditions. To generate a more comprehensive transcriptome, we supplemented previous datasets with new reads corresponding to specific tissues (heads, antennae, and male reproductive tissues). This transcriptome augments current Lygus molecular resources and provides the foundational knowledge critical for future comparative studies.
FINDINGS: An expanded, Trinity-based de novo transcriptome assembly for L. hesperus was generated using previously published whole body Illumina data, supplemented with 293 million new raw sequencing reads corresponding to five tissue-specific cDNA libraries and 11 Illumina sequencing runs. The updated transcriptome consists of 22,022 transcripts (average length of 2075 nt), 62 % of which contain complete open reading frames. Significant coverage of the BUSCO (benchmarking universal single-copy orthologs) dataset and robust metrics indicate that the transcriptome is a quality assembly with a high degree of completeness. Initial assessment of the new assembly's utility revealed that the length and abundance of transcripts predicted to regulate insect physiology and chemosensation have improved, compared with previous L. hesperus assemblies.
CONCLUSIONS: This transcriptome represents a significant expansion of Lygus transcriptome data, and improves foundational knowledge about the molecular mechanisms underlying L. hesperus biology. The dataset is publicly available in NCBI and GigaDB as a resource for researchers.

Keywords:  Lygus hesperus; Miridae; Plant bug; RNA-Seq; Transcriptome; Trinity

Year:  2016        PMID: 26823975      PMCID: PMC4730634          DOI: 10.1186/s13742-016-0109-6

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

Background

The western tarnished plant bug Lygus hesperus Knight is a polyphagous pest with an extensive host plant range including many economically important food, fiber, and seed crops [1]. While control measures have traditionally relied on broad-spectrum insecticides, negative ecological ramifications and evolving insecticide resistance have reduced the continued viability of this approach. As a consequence, there is growing interest in biorational-based strategies; however, the development of such approaches requires a comprehensive understanding of a species’ underlying biology. Towards this end, we previously reported on the sequencing and assembly of two L. hesperus transcriptomes: a general Roche 454-based assembly [2], and a second Illumina-based assembly incorporating sequence information from adults under thermal stress [3]. Those databases were developed using sequence data derived from whole bodies. Although this approach yields substantial data, whole body analysis tends to mask underrepresented genes that are expressed primarily in specific tissues or under specific conditions. To generate a more comprehensive transcriptome, here we supplement our previous thermal dataset with reads from specific tissues: heads, antennae, and male reproductive tissues. Incorporation of these new datasets expands the current L. hesperus database, provides greater depth of coverage, and enables new research for the better understanding of Lygus biology.

Samples

All samples and tissues were derived from an L. hesperus laboratory colony maintained at the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) Arid Land Agricultural Research Center (ALARC) in Maricopa, Arizona, USA. The colony was reared at 27–29 °C under 20 % humidity with an L14:D10 photoperiod, and fed an artificial diet [4]. Nymphs and adults used for RNA preparation were from eggs deposited in agar oviposition packets and maintained as described previously [5]. Our initial Illumina-based transcriptome [3] was generated using 10-day-old adults exposed for 4 h to one of three temperatures (4 °C, 25 °C, or 39 °C). To provide deeper coverage of transcripts encoding proteins functioning in olfaction, central nervous system-mediated behaviors, and male reproduction, sex-specific antennae, heads, and male accessory glands were dissected and stored at −20 °C in RNAlater (Ambion/Life Technologies, Carlsbad, CA). The antennae samples represent ~500 unmated 7–9-day-old adult males, and ~600 unmated 7–9-day-old adult females. Heads (8–12 per stage/age per replicate) without antennae were collected across three biological replicates from 3rd instar nymphs, 4th instar nymphs, late 5th instar nymphs, and unmated adults of both sexes at 1, 3, 7, 10, and 15 days post-eclosion. Accessory glands (30 per replicate) were dissected in phosphate-buffered saline from 7- to 8-day-old adult males 24 h post-mating and from similarly aged unmated cohorts. Total RNA extraction and library generation (TruSeq RNA Sample Preparation Kit v2; Illumina Inc., San Diego, USA) were performed as described previously [3] at the University of Arizona Genomics Center. All samples were sequenced using an Illumina HiSeq2000 or HiSeq2500 in Rapid Run mode (paired-end 100-bp reads).

Data filtering

Approximately 438 million reads were obtained, resulting in over 257 GB of 2 × 100 bp paired-end data. Raw read quality was assessed with FastQC (V 0.10.1), and reads were filtered in a custom pipeline with Trimmomatic (V 0.32), using the parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:10 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:36 to remove adapter sequences and filter by quality score. Short Read Archive (SRA) accessions for all data are found in Table 1.
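To make the quality-filtering parameters concrete, the following is a minimal Python sketch of what the LEADING, TRAILING, SLIDINGWINDOW, and MINLEN steps do to a single read, applied in the order listed above. It is a simplified illustration only, not Trimmomatic's actual implementation (adapter clipping via ILLUMINACLIP is omitted, and the exact cut position chosen by the real tool may differ). The function name `trim_read` and the example quality scores are assumptions introduced here.

```python
def trim_read(quals, leading=10, trailing=20, window=4, min_avg=25, min_len=36):
    """Return (start, end) of the retained region of a read, or None if dropped.

    quals is a list of per-base Phred quality scores. Simplified sketch of the
    quality steps only; not Trimmomatic's exact algorithm.
    """
    start, end = 0, len(quals)

    # LEADING:10 -> remove 5' bases while their quality is below the threshold
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING:20 -> remove 3' bases while their quality is below the threshold
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW:4:25 -> scan 5' to 3' and cut at the first window whose
    # mean quality falls below the required average
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN:36 -> discard reads that are too short after trimming
    if end - start < min_len:
        return None
    return start, end


# Hypothetical reads (quality values invented for illustration)
good_read = [5] * 3 + [35] * 50 + [12] * 10      # poor ends, good middle
dip_read = [35] * 40 + [20] * 4 + [35] * 10      # quality dip after base 40
short_read = [30] * 20                           # too short to keep

print(trim_read(good_read))
print(trim_read(dip_read))
print(trim_read(short_read))
```

In this toy version, a read is kept only if at least 36 bases survive all three trimming steps, which mirrors the intent of the parameter string even though the real tool operates on paired FASTQ records.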
Table 1

Accession numbers for L. hesperus sequence reads and assembled transcripts

| Sample | Short Read Archive | BioSample | BioProject |
|---|---|---|---|
| 10-day-old adults (a) | | | |
| 4 °C | SRX483635, SRX483674, SRX483877 | SAMN02679940-42 | PRJNA238835 |
| 25 °C | SRX483950, SRX484037, SRX484042 | SAMN02679943-45 | PRJNA238835 |
| 39 °C | SRX484076, SRX484077, SRX484079 | SAMN02679946-48 | PRJNA238835 |
| Antennae | | | |
| Male | SRX317887, SRX317888 | SAMN02222162-63 | PRJNA210219 |
| Female | SRX317885, SRX317886 | SAMN02222160-61 | PRJNA210219 |
| Accessory gland | | | |
| Mated | SRX318362, SRX318363 | SAMN02222164-65 | PRJNA210220 |
| Unmated | SRX318364, SRX318365 | SAMN02222166-67 | PRJNA210220 |
| Head | SRX1072689, SRX1155625, SRX1155629 | SAMN03792993-95 | PRJNA284294 |

(a) Data from Hull et al. 2014 [3]

Transcriptome assembly

Data used for assembly corresponded to the ~145 million sequence reads generated previously [3], and 293 million new reads from 11 Illumina runs covering five tissue-specific libraries. Prior to assembly, the four datasets (thermal-based, head, antennae, and accessory gland) were concatenated, and read abundance was normalized to 50X coverage using the in silico normalization tool in Trinity to improve assembly time and minimize memory requirements. Filtering and normalization reduced the dataset to 15 Gb, comprising approximately 32 million normalized read pairs, which were then assembled using default parameters in Trinity (r2014_07-17). Transcript expression levels were estimated with RSEM [6] and open reading frames (ORFs) were predicted using TransDecoder [7]. HMMER3 was used to identify additional ORFs matching Pfam-A domains. Following transcriptome assembly, reads were filtered, sorted, and prepared for NCBI transcriptome shotgun assembly (TSA) submission as previously described [8].
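The internals of the normalization step are not described here. As a rough illustration of the general idea behind in silico (digital) normalization, the Python sketch below estimates each read's coverage as the median abundance of its k-mers and assigns a retention probability of min(1, target / coverage), so that reads from highly expressed transcripts are down-sampled while rare reads are always kept. This is a toy model under stated assumptions, not Trinity's implementation; the k-mer size, the function names, and the example reads are all invented for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (assumes len(seq) >= k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def retention_probabilities(reads, k=5, target=50):
    """Probability of keeping each read so coverage approaches the target.

    Coverage of a read is approximated by the median count of its k-mers
    across the whole read set (toy sketch; not Trinity's code).
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    probs = []
    for read in reads:
        coverage = median(counts[km] for km in kmers(read, k))
        probs.append(min(1.0, target / coverage))
    return probs


# Hypothetical data: one read sequenced 100 times, one read seen only once
reads = ["ACGTTGCAAG"] * 100 + ["TTTTGGGGCC"]
probs = retention_probabilities(reads, k=5, target=50)

print(probs[0])   # abundant read: kept about half the time
print(probs[-1])  # rare read: always kept
```

Sampling each read with its computed probability would reduce the abundant read from roughly 100X to roughly 50X while leaving the rare read untouched, which is the behavior a 50X normalization target is meant to achieve.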

Annotation

Functional annotation was performed at the peptide level using a custom pipeline [8] that defines protein products and assigns transcript names. Predicted proteins/peptides were analyzed using InterProScan5, which searched all available databases including Gene Ontology (GO) [9]. BLASTp analysis of the resulting proteins was performed with the UniProt Swiss-Prot database (downloaded 11 February 2015). Annie [10], a program that cross-references Swiss-Prot BLAST and InterProScan5 results to extract qualified gene names and products, was used to generate the transcript annotation file. The resulting .gff3 and .tbl files were further annotated with functional descriptors in Transvestigator [8].

Quality, completeness and depth of the comprehensive L. hesperus transcriptome

To assess the relative quality and completeness of our assembly, we compared core statistics for published Lygus transcriptomes [2, 3, 11] with those of the L. hesperus transcriptome described in this study (Table 2). The total number of sequence reads used in the current assembly represents 1660- and 300-fold increases over those used in the L. lineolaris transcriptome [11] and the initial Roche 454-based L. hesperus transcriptome [2], respectively. The expansion of read inputs resulted in average transcript lengths increasing from 725 to 2075 bp, and a larger percentage of transcripts with BLAST hits and assigned GO terms. Compared with the previously published Illumina transcriptome, inclusion of nearly three times the number of reads had little effect on average transcript length, and only marginally increased the N50 for the longest transcript per unigene (Table 2). However, low abundance isoforms were specifically removed during data normalization in the expanded assembly, a process that was modified from that used in the construction of the previous Illumina assembly. Consequently, while the expanded assembly represents less overall “gene space” than the previous assembly, it likely provides a more accurate reflection of the transcript landscape. More importantly, the expanded dataset increases overall coverage of transcripts critical to tissue-specific functions.
Table 2

Transcriptome assembly and annotation statistics compared with previous Lygus transcriptomes

| Transcriptome | L. lineolaris (a) | L. hesperus (454) (b) | L. hesperus (thermal) (c) | L. hesperus (current) (d) |
|---|---|---|---|---|
| Assembly | | | | |
| Total no. read pairs | 262,555 | 1,429,818 | 144,898,116 | 437,850,562 |
| Normalized reads (in silico normalization) | - | - | 16,191,383 | 32,342,216 |
| Total no. transcripts | 6970 | 36,131 | 45,706 | 22,022 |
| Average transcript length | 392 (100–3466) | 725 (2–13,480) | 2237 (300–23,322) | 2073 (297–23,350) |
| Total assembled bases (all transcripts) | - | 32,252,977 | 102,246,199 | 45,687,929 |
| Total assembled bases (longest transcript per unigene) | - | 28.8 Mb | 39.8 Mb | 31.6 Mb |
| N50 (all transcripts) | - | 2430 | 2989 | 2610 |
| N50 (longest transcript per unigene) | - | 1849 | 2638 | 2726 |
| %GC | - | 0.41 | 0.44 | 0.45 |
| Proteins with complete ORF (%) | - | - | - | 13,689 (62.1 %) |
| Annotation | | | | |
| No. transcripts with a BLAST hit | 3126 (44.9 %) | 19,393 (54 %) | - | 16,942 (76.9 %) |
| No. transcripts with GO term | 2196 (31.5 %) | 7898 (21 %) | - | 12,114 (54.9 %) |
| PFAM | - | 3705 (22.2 %) | - | 14,575 (66.1 %) |

Data from: (a) Magalhaes et al. 2013 [11]; (b) Hull et al. 2013 [2]; (c) Hull et al. 2014 [3]
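For readers less familiar with the N50 statistic reported in Table 2, it is conventionally defined as the length L such that transcripts of length L or longer together account for at least half of all assembled bases. The short Python sketch below computes it from a list of transcript lengths; the function name and the example lengths are hypothetical and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and return the length at which the
    running total first reaches half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical transcript lengths (nt), invented for illustration
example = [100, 200, 300, 400]
print(n50(example))
```

Because N50 weights long transcripts heavily, it can exceed the average transcript length by a wide margin, which is why both statistics are listed side by side in the table.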

The respective L. hesperus assemblies were also evaluated using the BUSCO (benchmarking universal single-copy orthologs) arthropod gene set [12], which uses 2675 near-universal single-copy orthologs to assess the relative completeness of genome and transcriptome assemblies. The percentage of conserved genes identified in the new L. hesperus assembly compares favorably with metrics reported for a number of insect transcriptomes and model insect genome assemblies (Table 3). Compared with the previous Illumina assembly, BUSCO genes in the new L. hesperus assembly were less fragmented, indicating the presence of more full-length sequences. The relatively high number of duplicates identified in the L. hesperus assemblies likely reflects isoforms of single unigenes, rather than true gene duplications.
Table 3

BUSCO (a) analysis of assembly completeness

| Species | Complete (%) | Duplicated (%) | Fragment (%) | Missing (%) |
|---|---|---|---|---|
| L. hesperus Transcriptomes | | | | |
| 454-based (b) | 56 | 18 | 13 | 29 |
| Illumina-thermal (c) | 77 | 43 | 11 | 10 |
| Illumina-current | 74 | 33 | 7.3 | 17 |
| Select Insect Transcriptomes (d) | | | | |
| Nilaparvata lugens (GI:604923024) | 64 | - | 19 | 15 |
| Musca domestica (GI:510208131) | 6.4 | - | 6.3 | 87 |
| Spodoptera exigua (GI:556694752) | 73 | - | 11 | 14 |
| Drosophila serrata (GI:570485056) | 8.5 | - | 21 | 70 |
| Select Insect Genomes (d) | | | | |
| Pediculus humanus (PhumU2) | 92 | 3.9 | 6.1 | 1.6 |
| Acyrthosiphon pisum (GCA_000142985.2) | 72 | 6.1 | 15 | 12 |
| Drosophila melanogaster (Dmel_r5.55) | 98 | 6.4 | 0.6 | 0.3 |

(a) Simão et al. 2015 [13]
(b) Hull et al. 2013 [2]
(c) Hull et al. 2014 [3]
(d) See Supplementary Data [12] for arthropod BUSCO assessments

Next, we used sequences encoding neuropeptides, G protein-coupled receptors, and chemosensory receptors to more fully evaluate the effect of expanding the current assembly with tissue-specific sequencing data. These gene sets mediate much of insect physiology and behavior, and are frequently characterized by spatially restricted expression. The query sequences used in the tBLASTx analyses are from two insect species (Nilaparvata lugens and Rhodnius prolixus) within the same phylogenetic order (Hemiptera) as L. hesperus. The first analysis, which used the 48 neuropeptide sequences reported in N. lugens [13] as queries, revealed nearly twice as many homologous sequences in the Illumina-based assemblies as in the initial Roche 454 assembly (Fig. 1). Subsequent searches using N. lugens G protein-coupled receptors [13] or Rhodnius prolixus chemosensory receptors [14, 15] as queries identified more transcripts ≥300 nt in length in the new, expanded assembly than in the previous transcriptomes. Based on these comparisons, we conclude that the expanded transcriptome represents a marked improvement over the first 454-based assembly, and provides greater coverage of tissue-specific transcripts, such as chemosensory genes and neuropeptide precursors, relative to the previous Illumina assembly. This expanded assembly extends previous work and provides a more comprehensive resource to facilitate the development of new research avenues into the molecular basis of L. hesperus biology.
Fig. 1

Relative transcript depth of the respective L. hesperus transcriptomes. tBLASTx analyses were performed using queries corresponding to genes of interest identified in genome assemblies of Nilaparvata lugens or Rhodnius prolixus. The L. hesperus transcriptomes analyzed include the initial Roche 454-based assembly [2], an Illumina-based thermal assembly [3], and the current assembly. tBLASTx search criteria for the neuropeptide analysis used an e-value of 10⁻¹, whereas the G protein-coupled receptor (GPCR) and chemosensory receptor analyses used an e-value of 10⁻⁵ and transcripts ≥300 nt in length.
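The hit-filtering criteria described for the GPCR and chemosensory receptor searches can be sketched as follows. The example assumes BLAST results saved in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the output format actually used in this study is not stated, and the function name, sequence identifiers, and values below are hypothetical.

```python
def filter_hits(blast_lines, transcript_lengths, max_evalue=1e-5, min_length=300):
    """Return the set of transcript IDs with at least one qualifying hit.

    A hit qualifies when its e-value is at or below max_evalue and the matched
    transcript is at least min_length nt long. Assumes 12-column tabular BLAST
    output with the subject ID in column 2 and the e-value in column 11.
    """
    kept = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue and transcript_lengths.get(subject_id, 0) >= min_length:
            kept.add(subject_id)
    return kept


# Hypothetical tBLASTx hits and transcript lengths, invented for illustration
hits = [
    "NlGPCR1\tLh_tr1\t45.0\t120\t60\t2\t1\t360\t10\t369\t1e-20\t95.0",
    "NlGPCR1\tLh_tr2\t38.0\t80\t45\t1\t1\t240\t5\t244\t2e-3\t40.0",
    "NlGPCR2\tLh_tr3\t50.0\t70\t30\t0\t1\t210\t1\t210\t1e-10\t70.0",
]
lengths = {"Lh_tr1": 1500, "Lh_tr2": 900, "Lh_tr3": 250}

print(sorted(filter_hits(hits, lengths)))
```

Here only the first transcript passes both thresholds: the second fails the e-value cutoff and the third is shorter than 300 nt. Setting `max_evalue=1e-1` and `min_length=0` would approximate the more permissive criterion stated for the neuropeptide analysis.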

Availability of supporting data

The filtered and annotated transcriptome was deposited at GenBank as a TSA under the accession GDHC01000000, associated with BioProject PRJNA284294. NCBI accession identifiers for all of the associated SRA, BioSample, and BioProject data repositories are listed in Table 1. Datasets further supporting the results of this article are available in the GigaScience repository, GigaDB [16].
  12 in total

1.  Characterization of male-derived factors inhibiting female sexual receptivity in Lygus hesperus.

Authors:  Colin S Brent; J Joe Hull
Journal:  J Insect Physiol       Date:  2013-12-11       Impact factor: 2.354

2.  Patterns of expression of odorant receptor genes in a Chagas disease vector.

Authors:  Jose Manuel Latorre-Estivalis; Emerson Soares de Oliveira; Barbara Beiral Esteves; Letícia Santos Guimarães; Marina Neves Ramos; Marcelo Gustavo Lorenzo
Journal:  Insect Biochem Mol Biol       Date:  2015-05-21       Impact factor: 4.714

3.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

4.  Transcriptome analysis of neuropeptides and G-protein coupled receptors (GPCRs) for neuropeptides in the brown planthopper Nilaparvata lugens.

Authors:  Yoshiaki Tanaka; Yoshitaka Suetsugu; Kimiko Yamamoto; Hiroaki Noda; Tetsuro Shinoda
Journal:  Peptides       Date:  2013-08-08       Impact factor: 3.750

5.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

6.  VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics.

Authors:  Karine Megy; Scott J Emrich; Daniel Lawson; David Campbell; Emmanuel Dialynas; Daniel S T Hughes; Gautier Koscielny; Christos Louis; Robert M Maccallum; Seth N Redmond; Andrew Sheehan; Pantelis Topalis; Derek Wilson
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

7.  Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae.

Authors:  Sheina B Sim; Bernarda Calla; Brian Hall; Theodore DeRego; Scott M Geib
Journal:  Gigascience       Date:  2015-03-31       Impact factor: 6.524

8.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

9.  Transcriptome-based identification of ABC transporters in the western tarnished plant bug Lygus hesperus.

Authors:  J Joe Hull; Kendrick Chaney; Scott M Geib; Jeffrey A Fabrick; Colin S Brent; Douglas Walsh; Laura Corley Lavine
Journal:  PLoS One       Date:  2014-11-17       Impact factor: 3.240

10.  Sequencing and de novo assembly of the western tarnished plant bug (Lygus hesperus) transcriptome.

Authors:  J Joe Hull; Scott M Geib; Jeffrey A Fabrick; Colin S Brent
Journal:  PLoS One       Date:  2013-01-24       Impact factor: 3.240

