
Genome sequence of a rice pest, the white-backed planthopper (Sogatella furcifera).

Lin Wang1, Nan Tang1, Xinlei Gao1, Zhaoxia Chang1, Liqin Zhang1, Guohui Zhou2, Dongyang Guo1, Zhen Zeng1, Wenjie Li1, Ibukun A Akinyemi1, Huanming Yang3, Qingfa Wu1,4,5.   

Abstract

Background: Sogatella furcifera is an important phloem sap-sucking and plant virus-transmitting migratory insect pest of rice. Because of its high reproductive potential, dispersal capability and transmission of plant viral diseases, S. furcifera causes considerable damage to rice grain production and has great economic and agricultural impact. Comprehensive studies of the ecology and virus-host interactions of S. furcifera have been limited by the lack of a well-assembled genome sequence. Findings: A total of 241.3 Gb of raw reads from the whole genome of S. furcifera were generated by Illumina sequencing of 17 mate-pair and paired-end libraries with insert sizes ranging between 180 bp and 40 kb. The final genome assembly (0.72 Gb), with a contig N50 of 70.7 kb and a scaffold N50 of 1.18 Mb, covers 98.6 % of the estimated genome size of S. furcifera. Genome annotation, assisted by RNA-seq data from eight developmental stages (embryos, 1st–5th instar nymphs, 5-day-old adults and 10-day-old adults), generated 21 254 protein-coding genes, which captured 99.59 % (247/248) of core CEGMA genes and 91.7 % (2453/2675) of BUSCO genes. Conclusions: We report the first assembled and annotated whole genome sequence and transcriptome of S. furcifera. The assembled draft genome of S. furcifera will be a valuable resource for ecological and virus-host interaction studies of this pest.
© The Author 2017. Published by Oxford University Press.


Keywords:  Annotation; Assembly; Genomics; Sogatella furcifera genome


Year: 2017; PMID: 28369349; PMCID: PMC5437944; DOI: 10.1093/gigascience/giw004

Source DB: PubMed; Journal: GigaScience; ISSN: 2047-217X; Impact factor: 6.524


Data description

The white-backed planthopper, Sogatella furcifera (Horvath), an r-strategist hemipteran insect, feeds primarily on rice plants and can migrate over long distances in the temperate and tropical regions of Asia and Australia [1]. Sap-sucking by S. furcifera reduces plant vigor, delays tillering and causes stunting, chlorosis, shriveled grains and hopper burn, ultimately leading to rice plant death; it is responsible for the destruction of approximately 10 million hectares of rice crops annually [2]. More importantly, the planthopper transmits devastating rice viruses, including the Southern rice black-streaked dwarf virus (SRBSDV), which poses an additional threat to rice plants [3]. Insecticide misuse, cultivation of hybrid high-nutritional rice varieties, cultural and climatic factors, and the long-distance migration capability and robust fecundity of S. furcifera together result in outbreaks of this pest [2, 4].

Samples and sequencing

Inbred laboratory strains of Sogatella furcifera (Fig. 1) originated from the University of Science and Technology of China. A laboratory colony was maintained at 26 °C with 70 % humidity under a 16:8 h light/dark photoperiod for two years, spanning at least 20 generations. An inbred line was obtained by single-pair sib-mating for 18 generations. For genome sequencing, S. furcifera specimens from the F6 generation were used. Seventeen DNA libraries with insert sizes ranging between 180 bp and 40 kb were constructed for whole genome shotgun (WGS) sequencing following the standard protocol (Illumina, San Diego, CA, USA). This generated 241.3 Gb of raw sequence reads, corresponding to approximately 330× coverage (Table 1).
Figure 1.

Photograph of Sogatella furcifera (white-backed planthopper) on a rice plant leaf. The scale bar of 1 mm is shown in the photograph.

Table 1.

Whole genome shotgun (WGS) reads used in sequencing of the Sogatella furcifera genome

Library      Read length (bp)  Insert size (bp)  Total (Gb)  Coverage (×)
MiSeq        2 × 300           470               10.5        14.38
             2 × 300
Paired-end   2 × 125           180               15.0        20.54
             2 × 125           320               13.0        17.80
             2 × 125           420               11.0        15.06
             2 × 100           350               2.4         3.28
             2 × 90            500               11.6        15.89
             2 × 125           600               8.9         12.19
             2 × 125           680               12.8        17.53
Mate-pair    2 × 125           2000              12.6        17.26
             2 × 90            3200              3.4         4.65
             2 × 125           5000              10.0        13.69
             2 × 125           8000              7.8         10.68
             2 × 125           10 000            15.9        21.78
             2 × 125           15 000            48.8        66.84
             2 × 125           20 000            24.2        33.15
             2 × 125           40 000            18.6        25.47
Total                                            241.3       330.54

*The estimated genome size was 0.73 Gb
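
The coverage column in Table 1 appears to follow from dividing the amount of sequence data by the estimated genome size. A minimal sketch of that arithmetic, assuming the 0.73 Gb estimate from the footnote (the function name `coverage` is ours, not from the paper):

```python
# Fold coverage = sequenced bases / estimated genome size (both in Gb).
GENOME_SIZE_GB = 0.73  # estimated genome size from the Table 1 footnote


def coverage(total_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Return the fold coverage for a given amount of sequence data."""
    return total_gb / genome_gb


# A few rows from Table 1: (insert size in bp, data in Gb, reported coverage)
rows = [(470, 10.5, 14.38), (180, 15.0, 20.54), (15000, 48.8, 66.84)]
for insert_bp, total_gb, reported in rows:
    print(insert_bp, round(coverage(total_gb), 2), reported)

# Whole data set: 241.3 Gb over a 0.73 Gb genome
print(round(coverage(241.3), 2))
```

The computed values agree with the table to within about 0.01×, which would be consistent with the reported figures having been truncated rather than rounded.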

For whole transcriptome sequencing, total RNA was prepared from S. furcifera specimens at different developmental stages. Briefly, specimens from eight different developmental stages of S. furcifera were washed three times with 95 % ethanol to reduce microbial contamination from the body surface. After the ethanol had evaporated, each sample was quickly ground into a fine powder in liquid nitrogen. Total RNA was prepared using TRIzol reagent (Invitrogen). RNA was quantified by UV absorbance and its quality was further confirmed by gel electrophoresis. RNA sequencing libraries were generated using the Illumina mRNA-Seq Prep Kit. In total, 75.49 Gb of data comprising 603.9 million reads were generated (Table S1). For RNA-seq libraries with an insert size of 300 bp, low-quality bases (Phred score < 30) were trimmed and duplicated reads were removed.

Evaluation of genome size

Two different methods were used to estimate the genome size of S. furcifera. First, flow cytometry was performed according to a previously described method [5], giving an estimated genome size of about 730 Mb. Briefly, the heads of five male or female S. furcifera and five male Drosophila melanogaster (W118) specimens were ground in a tube with a pestle in 200 μL labeling solution; D. melanogaster served as the internal control because its genome size has been clearly determined [6]. The mixture was filtered through a 70 μm filter (BD Falcon), then treated with RNase A for 10 min at 37 °C and stained with 5 μg/mL propidium iodide (PI) for 2 h on ice. About 10 000 particles were analyzed on a flow cytometer (BD Falcon). PI fluorescence was analyzed using FlowJo [7]. The genome size of S. furcifera was calculated relative to the internal D. melanogaster control (mean C value 0.184 pg; genome size 180 Mb). As a result, the genome size of a male S. furcifera was estimated at about 733 Mb, or 0.75 pg (Fig. S1A). Experiments were conducted in triplicate. Second, genome size was estimated using a k-mer approach (k = 17) with Jellyfish [8]. Approximately 56 Gb of short-insert library sequencing data was used to generate a 17-mer depth-frequency curve (Fig. S1B). The S. furcifera genome size was estimated to be 735 Mb (Table S2), with 0.38 % heterozygosity as calculated by mlRho [3].
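
Both estimates reduce to simple ratios, sketched below under stated assumptions. The conversion factor of roughly 978 Mb per pg of DNA is a commonly used constant that the paper does not state explicitly. The k-mer formula (17-mer number / 17-mer depth) is the one given in the Figure S1 legend. All function names are illustrative, and the toy k-mer counter merely stands in for Jellyfish.

```python
from collections import Counter

MB_PER_PG = 978.0  # assumption: 1 pg of DNA corresponds to about 978 Mb


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a C value in picograms to a genome size in Mb."""
    return c_value_pg * MB_PER_PG


def size_from_reference(sample_peak: float, ref_peak: float, ref_size_mb: float) -> float:
    """Scale the reference genome size by the ratio of PI fluorescence peaks."""
    return ref_size_mb * sample_peak / ref_peak


def count_kmers(reads, k=17):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_genome_size(total_kmers: float, peak_depth: float) -> float:
    """Genome size = total number of k-mers / k-mer depth at the main peak."""
    return total_kmers / peak_depth


print(pg_to_mb(0.75))   # S. furcifera C value reported in the text
print(pg_to_mb(0.184))  # D. melanogaster control C value
```

With these assumptions, 0.75 pg converts to roughly 733 Mb and 0.184 pg to roughly 180 Mb, in line with the values quoted above.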

Genome assembly and evaluation

Adapter sequences and low-quality and duplicated reads were filtered out prior to assembly; error correction was also performed to eliminate sequencing errors. For whole-genome assembly, short reads were first assembled using SOAPdenovo2 [9], longer reads were then used for scaffolding with SSPACE [10], and finally GapCloser [11] was used to fill the gaps with short reads. Briefly, sequences derived from the short-insert libraries were decomposed into k-mers to construct the de Bruijn graph, which was simplified to allow the remaining k-mers to be joined into contigs. All short-insert and large-insert libraries were mapped onto the contigs, and the paired-end and mate-pair information was used to link contigs into scaffolds in a step-wise manner (from small-insert to large-insert libraries). Finally, intra-scaffold gaps were filled using short-insert read pairs in which one read mapped uniquely to a contig and the other was located in a gap region. The resulting genome assembly size was 720 Mb, representing 98.6 % of the estimated genome, with a final scaffold N50 length of 1 185 287 bp and a contig N50 length of 70 730 bp. The longest contig and scaffold were 799 kb and 12.7 Mb, respectively (Table 2). In addition, reads that mapped to the mitochondrial and Wolbachia symbiont genome sequences were extracted and assembled; the assembled genome sizes were 16 kb and 1.6 Mb, respectively.
Table 2.

Sogatella furcifera genome assembly statistical analysis

Statistic                    Contig size (bp)  Contig number  Scaffold size (bp)  Scaffold number
N90                          9232              12 253         85 450              890
N80                          21 047            7536           319 035             489
N70                          35 405            5076           529 262             317
N60                          51 883            3500           845 521             207
N50                          70 730            2390           1 185 287           133
Longest                      799 912                          12 788 806
Total size                   673 904 942                      720 705 630
Total number (>10 000 bp)    602 082 273       11 792         697 471 028         2567
Total number (>100 000 bp)   258 085 068       1448           649 732 076         840
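
The Nx statistics in Table 2 follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches x % of the assembly, together with the number of sequences needed to get there. A minimal sketch of that definition (the function name `nx` is ours, and the toy lengths are invented):

```python
def nx(lengths, x=50):
    """Return (Nx length, number of sequences) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least x % of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with nine contigs (hypothetical lengths, total 54)
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(toy, 50))  # (8, 3): 10 + 9 + 8 = 27 reaches half of 54
print(nx(toy, 90))  # (4, 7)
```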
Four independent measures were used to assess the accuracy of the assembly. First, reads from paired-end libraries were mapped onto the assembly, and a greater than 10-fold effective depth was obtained across 95.55 % of the draft genome (Table S3). Second, assembled transcripts derived from mRNA, expressed sequence tags (ESTs), and RNA-seq data from multiple developmental stages (Table S4) were mapped to the genome assembly; this revealed a transcript coverage rate of ∼98 %, suggesting that the genome was sufficiently complete for gene prediction and analysis (Tables S5–S7). Third, a core eukaryotic genes (CEG) mapping approach (CEGMA) dataset comprising 248 CEGs [12] was used to evaluate the completeness of the draft: 94.8 % (235/248) of genes were completely covered by the assembly, and 99.6 % (247/248) of genes were at least partially covered (Table S8). The 13 incompletely covered genes (5 % of 248), including three transcription factors, one ubiquitin-conjugating enzyme E2, and one mannose-6-phosphate isomerase, exhibited no obvious functional bias. Fourth, a benchmarking universal single-copy orthologs (BUSCO) dataset was used to evaluate the completeness of the draft: 91.7 % (2453/2675) of genes were covered by the genome assembly (Table S9). This is higher than the values for two other hemipterans used for comparison, Nilaparvata lugens (81.34 %) and Acyrthosiphon pisum (81.71 %).
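
The completeness percentages quoted above can be checked directly from the gene counts. A short sanity check (the dictionary labels are ours):

```python
def percent(found: int, total: int) -> float:
    """Return found/total as a percentage."""
    return 100.0 * found / total


# (genes recovered, genes in the benchmark set), as quoted in the text
checks = {
    "CEGMA, completely covered": (235, 248),
    "CEGMA, at least partially covered": (247, 248),
    "BUSCO": (2453, 2675),
}
for label, (found, total) in checks.items():
    print(f"{label}: {percent(found, total):.2f} %")
```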

Genome characterization and repeat annotation

The S. furcifera genome has a G+C content of 31.6 %; compared with other hemipteran species, this is slightly higher than that of the pea aphid A. pisum (29.6 %) [13] but lower than that of the brown planthopper N. lugens (34.6 %) [14]. Homology-based and de novo prediction analyses were both used to identify the transposable elements (TEs) and other repetitive content in the S. furcifera genome. For homology-based analysis, Repbase (version 20120418) [15] was used to perform a TE search with RepeatMasker (3.3.0) [16] and the WuBlast [17] search engine. For de novo prediction analysis, RepeatModeler [18] was used to construct a TE library. Elements within the library were then classified using a homology search against Repbase and a Support Vector Machine (SVM) method (TEClass) [19]. In total, 44.3 % of the S. furcifera genome consists of tandem repeats and TEs (Table 3), which is lower than in the brown planthopper (48.6 %) but much higher than in the pea aphid (33.3 %). Class I TEs (retroelements) represent 15.3 % of the total genome assembly (9.52 % long interspersed nuclear elements (LINEs), 4.30 % long terminal repeats (LTRs) and 1.48 % short interspersed nuclear elements (SINEs)), whereas class II TEs (DNA transposons) account for 17.33 %. In addition, the S. furcifera genome was annotated with seven centromeric sequences [20], eight telomeric sequences [20] (Table S10) and 169 932 microsatellite regions.
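
For readers who want to reproduce a G+C figure on their own sequences, the calculation is a simple base count. The sketch below assumes that ambiguous bases such as N are excluded from the denominator, a convention the paper does not specify:

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


print(gc_content("ATGCNN"))  # N bases are skipped
print(gc_content("aattg"))   # lower-case input is accepted
```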
Table 3.

Transposable element (TE) content of the Sogatella furcifera genome, derived from RepeatMasker analysis

Type      RepBase TEs              TE proteins              De novo                  Combined TEs
          Length (bp)  % of genome  Length (bp)  % of genome  Length (bp)  % of genome  Length (bp)  % of genome
DNA       3 946 730    0.54         4 606 659    0.63         120 249 936  16.54        126 002 323  17.33
LINE      5 042 806    0.69         28 043 919   3.85         44 814 393   6.16         69 257 982   9.52
SINE      810 448      0.11         0            0            10 821 265   1.48         10 730 722   1.48
LTR       3 346 275    0.46         7 721 608    1.06         28 298 660   3.89         31 286 552   4.30
Other     975 677      0.13         317 139      0.04         23 884 560   3.28         23 167 338   3.18
Unknown   0            0.00         0            0.00         28 395 639   3.90         28 395 639   3.90
Total     14 121 936   1.94         40 689 325   5.59         256 464 453  35.27        288 840 556  39.73

Note: LINE: long interspersed nuclear element; LTR: long terminal repeat; SINE: short interspersed nuclear element.


Annotation of coding and non-coding genes

The S. furcifera genome was annotated by combining homology-based methods using protein sequences from seven representative insects, ab initio gene prediction (GENSCAN [21] and AUGUSTUS [22]), and RNA-seq data from eight different developmental stages (embryos, 1st–5th instar nymphs, 5-day-old adults and 10-day-old adults) (Table S1). First, for homology-based gene prediction, we aligned Bombyx mori, D. melanogaster, Apis mellifera, A. pisum, Rhodnius prolixus, Tribolium castaneum and Pediculus humanus proteins (from the Ensembl database [23]) to the S. furcifera genome using TblastN [24] with an E-value ≤ 1E−5, and then used GeneWise2.2.0 [25] for spliced alignment and prediction of gene structures. Second, for ab initio prediction, GENSCAN [21] and AUGUSTUS [22] were used to predict genes based on repeat-masked genome sequences. Short genes with coding DNA sequences (CDS) of <150 bp were filtered from the resulting data sets. Third, gene structures were identified using a transcriptome-based approach by mapping all RNA reads from the eight developmental stages onto the S. furcifera genome using TopHat [26]. Mapping results were subsequently sorted and merged, and Cufflinks [27] was used to identify gene structures for gene annotation (Table 4). Finally, all predicted gene structures were integrated with EVidenceModeler (EVM) [28] to yield a consensus gene set containing 21 254 protein-encoding genes (Table 4) and an estimated 88 184 splice junctions [26]. Based on OrthoMCL analysis [29], the 21 254 protein-encoding genes comprise 9096 single-copy genes and 2963 multi-gene families containing 12 158 genes, indicating that about 57 % of S. furcifera genes are duplicated.
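
The 57 % figure follows from the family counts just given. As a hedged illustration, the sketch below computes the share of genes that sit in families with two or more members. The function name and the toy family sizes are hypothetical, and the interpretation of "duplicated" as "member of a multi-gene family" is our reading of the text.

```python
def duplicated_fraction(family_sizes):
    """Percentage of genes belonging to families with two or more members."""
    total = sum(family_sizes)
    in_multi = sum(size for size in family_sizes if size >= 2)
    return 100.0 * in_multi / total


# Toy example: two singletons plus families of 2 and 3 genes
print(round(duplicated_fraction([1, 1, 2, 3]), 2))

# Aggregate numbers from the text
single_copy_genes = 9096
genes_in_families = 12158
total_genes = single_copy_genes + genes_in_families
print(total_genes)                                     # consensus gene set size
print(round(100.0 * genes_in_families / total_genes))  # share of duplicated genes
```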
Table 4.

Characteristics of the predicted protein-coding genes in the Sogatella furcifera assembly

Method              Gene set              Number   Gene length (bp)  Coding DNA sequence length (bp)  Exon per gene  Exon length (bp)  Intron length (bp)
De novo             AUGUSTUS              44 600   10 406.51         1507                             6.04           249               1659.97
                    GENSCAN               44 160   9280.96           1106                             4.25           260               1505.23
GeneWise: Homolog   Bombyx mori           11 687   4890.78           771                              2.87           268               2197.30
                    D. melanogaster       19 671   6574.52           884                              3.44           257               2328.74
                    Apis mellifera        18 160   8437.51           1000                             4.08           244               2411.35
                    Acyrthosiphon pisum   55 250   1414.84           865                              1.27           681               2032.34
                    Rhodnius prolixus     29 842   4534.72           775                              2.82           274               2054.77
                    Tribolium castaneum   33 096   4704.18           925                              2.63           351               2315.42
                    Pediculus humanus     17 785   7868.22           986                              4.01           245               2282.51
RNA-Seq                                   28 183   4049.83           1800                             3.33           540               2504.86
EVidenceModeler                           21 254   12 584.24         1577                             6.47           243               2011.27
To assign gene names to each predicted protein-encoding locus, gene function information, protein motifs and domains were assigned through comparison with public databases, including Swiss-Prot [30], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [31], Gene Ontology (GO) [32], TrEMBL [33] and InterPro [34]. A BLASTP [35] search of the predicted proteins was performed against the Swiss-Prot [30] and TrEMBL databases, with an E-value ≤ 1E−5. KEGG annotation was based on the KEGG Automatic Annotation Server (KAAS) [36] and GO annotation was based on InterPro [34]. The 21 254 protein-encoding genes had 12 699 hits in InterPro, of which 8633 also had GO associations. In total, 59.74 %, 40.61 %, 31.26 %, 52.23 % and 68.47 % of the protein-coding genes could be assigned to known homologs in the InterPro [34], GO [32], KEGG [31], Swiss-Prot [30] and TrEMBL [33] databases, respectively (Table 5). In combination, 14 990 genes (70.52 %) were similar to proteins in at least one of these databases. OrthoMCL analysis [29] revealed that 59.55 % and 63 % of S. furcifera proteins have homologs in the brown planthopper and pea aphid, respectively.
Table 5.

Summary of functional annotation

Sogatella furcifera
Category                 Gene number   Percent of total genes (%)
Total                    21 254
Annotated: InterPro      12 699        59.74
Annotated: GO*           8633          40.61
Annotated: KEGG          6646          31.26
Annotated: Swiss-Prot    11 102        52.23
Annotated: TrEMBL        14 553        68.47
Annotated (combined)     14 990        70.52
Unannotated              6264          29.47

Note: Five protein databases were chosen to assist the function prediction of genes: InterPro, GO, KEGG, Swiss-Prot, and TrEMBL. The table shows the number of genes matched in each database. *GO assignments were based on InterPro. KEGG: Kyoto Encyclopedia of Genes and Genomes, GO: Gene Ontology.
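
The combined "annotated" count in Table 5 is smaller than the sum of the per-database counts because a gene counts once if it has a hit in any database. A minimal sketch of that set-union logic, using invented gene identifiers (the IDs and the function name are hypothetical, not the authors' pipeline):

```python
def annotation_summary(hits_by_db, all_genes):
    """Return (annotated, unannotated) gene sets given per-database hit sets."""
    genes = set(all_genes)
    annotated = set().union(*hits_by_db.values()) & genes
    return annotated, genes - annotated


# Hypothetical toy gene set with overlapping database hits
all_genes = ["g1", "g2", "g3", "g4", "g5"]
hits_by_db = {
    "InterPro": {"g1", "g2"},
    "KEGG": {"g2"},
    "TrEMBL": {"g1", "g2", "g4"},
}
annotated, unannotated = annotation_summary(hits_by_db, all_genes)
print(sorted(annotated), sorted(unannotated))

# The Table 5 totals are consistent with the same bookkeeping
print(21254 - 14990)  # number of unannotated genes
```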

By performing a homology search across the whole genome sequence, four types of non-coding RNAs (ncRNAs) were annotated: miRNA, tRNA, rRNA, and snRNA. tRNAscan-SE [37] was used to predict tRNAs. snRNAs were predicted by BlastN alignment and by searching the Rfam database [39] with INFERNAL (v.0.81) [38]. rRNAs were found by BlastN alignment using rRNA sequences from other insects as references. In total, we identified 172 rRNAs assigned to four rRNA families (Table S11); 2256 tRNAs, including tRNAs decoding all 20 standard amino acids (Tables S11 and S12); 176 snRNAs (Table S11), comprising 37 H/ACA box RNAs and 139 snRNAs involved in splicing; and 382 miRNAs assigned to 110 families [40].

Transcriptome analyses of ontogenetic development of S. furcifera

Low-quality bases were trimmed from RNA-seq data using a Perl script and adapters were removed using Cutadapt (version 1.3) [41]. The remaining reads were mapped against the reference sequences of rRNA and Southern rice black-streaked dwarf virus (SRBSDV) with Bowtie2 [42] using default parameters, and reads matching rRNA or SRBSDV were discarded. Filtered reads were mapped to the assembled S. furcifera genome with STAR [43] using the parameters --runThreadN 8, --outFilterMultimapNmax 20, --outFilterMismatchNmax 4 and --outFilterIntronMotifs RemoveNoncanonical. Cuffdiff2 [44] was subsequently employed to calculate normalized fragments per kilobase of exon per million fragments mapped (FPKMs) from the aligned BAM files of all RNA-seq libraries; the original FPKMs of each library were scaled by a library size factor computed as the median of the ratios to the geometric means of the original FPKMs across all libraries. Differentially expressed genes (DEGs; P-value < 0.05 and |log2(fold change)| > 1) in at least one comparison between any two conditions were identified with Cuffdiff2 [44] using the blind mode, which is suited to a single replicate per condition. k-means clustering was then performed on the gene expression profiles of DEGs with k = 8 using Gene Cluster 3.0 [45], and differential expression patterns were visualized using Java TreeView [46].

To obtain a global overview of S. furcifera development, the transcriptomes of eight different developmental stages, including embryo (emb), 1st–5th instar nymphs (1in, 2in, 3in, 4in, and 5in), 5-day-old adult (5d) and 10-day-old adult (10d), were investigated using RNA-seq, with pairwise comparisons between each pair of developmental stages. A total of 4166 genes were identified as significant DEGs. Analysis of the number of genes reaching their maximum expression levels at each developmental stage showed that gene expression levels in the embryo and 10-day-old adult differed significantly from those in the other stages.
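
The median-of-ratios scaling and the DEG thresholds described in this section can be sketched in a few lines. This is an illustration of the stated formulas only, not Cuffdiff2's implementation; the function names are ours, and restricting the size-factor calculation to genes with positive values in every library is an assumption made so that the geometric mean is defined.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factor per library.

    matrix is a list of gene rows; each row holds one value per library.
    """
    n_libs = len(matrix[0])
    # Assumption: only genes with positive values in all libraries are used.
    usable = [row for row in matrix if all(v > 0 for v in row)]
    geo_means = [math.exp(sum(math.log(v) for v in row) / n_libs) for row in usable]
    return [
        median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_libs)
    ]


def normalize(matrix):
    """Divide each library (column) by its size factor."""
    factors = size_factors(matrix)
    return [[v / f for v, f in zip(row, factors)] for row in matrix]


def is_deg(p_value, expr_a, expr_b):
    """Apply the thresholds from the text: P < 0.05 and |log2 fold change| > 1."""
    return p_value < 0.05 and abs(math.log2(expr_b / expr_a)) > 1


# Toy matrix: the second library is uniformly twice as deep as the first
toy = [[10.0, 20.0], [100.0, 200.0], [5.0, 10.0]]
print(size_factors(toy))
print(normalize(toy))
```

In the toy matrix the two libraries differ only by depth, so after scaling each gene has the same value in both columns.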
GO enrichment analysis using the R function phyper [47] for all DEGs across all developmental stages demonstrated that these DEGs are implicated in diverse biological processes, including chitin metabolic processes, proteolysis, oxidation–reduction processes, homophilic cell adhesion and microtubule-based processes (Fig. 2).
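
GO enrichment with phyper amounts to a hypergeometric upper-tail test: given N annotated genes of which K carry a GO term, how likely is it to see at least k genes with that term among n DEGs? A standard-library sketch of that probability follows (parameter names are ours; the exact background set and any multiple-testing correction used by the authors are not stated in the text). In R, the same tail is typically obtained as phyper(k - 1, K, N - K, n, lower.tail = FALSE).

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, N: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K term genes, n draws)."""
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return hits / total


# Toy example: 10 genes, 5 carry the term, 5 DEGs, all 5 DEGs carry the term
print(hypergeom_upper_tail(5, 5, 10, 5))
```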
Figure 2.

Gene ontology (GO) enrichment analysis for differentially expressed genes in eight different developmental stages of Sogatella furcifera. All differentially expressed genes were subjected to GO analysis – the top 20 enriched terms are shown here.


Expression profile clustering and expression pattern identification

k-means clustering of the 4166 DEGs revealed eight different expression patterns (Fig. 3, Figs. S2–S9 and Table S13).
Figure 3.

k-means clustering for differentially expressed genes and expression patterns. (A) Eight expression patterns are shown on the left. Heat map shows the relative expression levels of each transcript (rows) in each sample (column). Normalized fragments per kilobase of exon per million fragments (FPKMs) calculated by Cuffdiff2 were log2-transformed and then median-centered by transcript. Heatmap was drawn based on clustering results. Red color represents higher expression; green represents lower expression. Note: red asterisks (*) on the left side of the figure indicate that expression at the corresponding stage is higher than the average expression level. Abbreviations: emb: embryo; 1in: 1st instar nymph; 2in: 2nd instar nymph; 3in: 3rd instar nymph; 4in: 4th instar nymph; 5in: 5th instar nymph; 5d: 5-day-old adult; 10d: 10-day-old adult. (B) The average of log2-transformed FPKM corresponding genes in each pattern.
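
The transformation described in the Figure 3 legend (log2, then median-centering by transcript) followed by k-means can be sketched as below. This is a toy illustration rather than Gene Cluster 3.0: the pseudocount of 1 used to avoid log2(0), the deterministic initialisation from the first k profiles, and the fixed iteration count are all assumptions of this sketch.

```python
import math
from statistics import median


def log2_median_center(fpkm_row, pseudocount=1.0):
    """log2-transform one transcript's FPKMs and subtract the row median."""
    logged = [math.log2(v + pseudocount) for v in fpkm_row]
    mid = median(logged)
    return [v - mid for v in logged]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns a cluster label for each point."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical FPKM profiles over three stages: two rising, two falling
profiles = [[0, 1, 3], [0, 3, 15], [3, 1, 0], [15, 3, 0]]
centered = [log2_median_center(row) for row in profiles]
labels = kmeans(centered, k=2)
print(centered[0], labels)
```

The two rising profiles end up in one cluster and the two falling profiles in the other, which is the kind of grouping the eight patterns in Figure 3 represent at larger scale.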

A total of 864 genes (pattern 1) were highly expressed in the embryo and at the 1st instar nymph stage, with slightly higher expression levels in the embryo than in the 1st instar nymph and down-regulation in later stages. GO enrichment analysis [47] associated genes in expression pattern 1 with the G protein-coupled receptor signaling pathway, ion transport, multicellular organismal development, and the Wnt receptor signaling pathway, among others. Wnt genes are important in embryogenesis and cell differentiation in both vertebrates and insects [48]. Wnt-4 (Sfur-175.24), Wnt-10a (Sfur-90.42) and Wnt-16 (Sfur-884.2) clustered in pattern 1 and showed their highest expression during the embryo stage, suggesting that these three Wnt genes are involved in embryonic development of the pest.

A total of 134 genes were classified into expression pattern 2. These genes were specifically expressed in nymph instar stages 1–5. GO enrichment analysis associated these genes with catecholamine biosynthetic processes and aromatic amino acid family metabolic processes. Catecholamine is required for insect cuticle sclerotization; in cockroaches, catecholamine conjugates are sequestered in the hemolymph during nymphal feeding periods for later use as tanning-agent precursors [49]. Tyrosine 3-monooxygenase catalyzes the initial and rate-limiting step in catecholamine biosynthesis [49]. Two tyrosine 3-monooxygenase genes (Sfur-187.36 and Sfur-7.15), belonging to expression pattern 2, were specifically expressed in the nymph stages, suggesting that they might be responsible for catecholamine biosynthesis and have roles in the later cuticle sclerotization process.

A total of 525 genes (pattern 3) exhibited specifically low expression in embryos but high expression in the post-embryonic stages. These genes have been implicated in processes such as metabolism, oxidation–reduction, and transmembrane transport. Several genes involved in fatty acid synthesis, including fatty acid synthase (Sfur-509.5, Sfur-698.4 and Sfur-72.518) and long-chain-fatty-acid-CoA ligase (Sfur-188.12 and Sfur-215.3), belonged to pattern 3 and were highly expressed in the nymph and adult stages.

A total of 591 genes (pattern 4) exhibited relatively higher expression levels from the 2nd to the 5th instar nymph, but were slightly down-regulated in the 4th instar nymph. These genes are enriched in nucleosome assembly, reciprocal meiotic recombination, DNA catabolism, and chitin metabolic processes.

A total of 523 genes (pattern 5) were specifically expressed in the 3rd and 4th instar nymphs, including genes participating in microtubule-based movement, glycolysis, glycerol metabolic processes, and positive regulation of apoptotic processes. Glycolysis can be suppressed during molting to direct a feeding, growing larva to convert to the immobile non-feeding stage [50]. Several genes participating in glycolysis, including fructose-bisphosphate aldolase (Sfur-330.7) and pyruvate kinase (Sfur-409.10 and Sfur-78.49), belonged to pattern 5; these were highly expressed in the 3rd and 4th instar nymphs and down-regulated in the 5th instar nymph.

A total of 218 genes (pattern 6) were highly expressed from the 3rd instar nymph to the 5-day-old adult, but the expression levels were higher in the 3rd and 4th instar nymphs than in the 5th instar nymphs and 5-day-old adults. These genes are related to processes including protein metabolism, microtubule-based processes, and negative regulation of biosynthesis.

A total of 609 genes (pattern 7) were highly expressed from the embryo to the 5th instar nymph, then down-regulated in adults. These genes were enriched in biological processes including chitin metabolism and catabolism, alcohol metabolism, steroid hormone mediated signaling pathways, and ecdysis of the chitin-based cuticle. Many genes belonging to pattern 7 were involved in chitin metabolic processes and were highly expressed from the embryo to the 5th instar nymph; for example, endochitinase (Sfur-47.4), chitinase (Sfur-18.277, Sfur-203.28 and Sfur-453.6), chitotriosidase (Sfur-97.70), and peritrophin (Sfur-105.51, Sfur-84.51 and Sfur-203.26). Chitin is the main component of the insect exoskeleton and peritrophic matrix, and chitin metabolism is coupled with insect growth and development. Because the exoskeleton and peritrophic matrix are regularly replaced and renewed during insect growth, chitin metabolic processes should be active throughout the embryo and nymph stages until growth and molting are completed. The crucial role of chitin in insect development and survival has led to chitin-related genes being targeted for the development of pest control strategies [51].

A total of 702 genes (pattern 8) were specifically expressed in the 5-day-old and 10-day-old adults. These genes are involved in processes such as lipid transport, DNA repair, cell–cell signaling, and microtubule-based movement. Vitellogenin, a member of the lipid transport protein family, is the precursor of egg yolk; it is specifically expressed in females and provides the essential nutrients required for egg development [52]. Vitellogenin genes (Sfur-20.301, Sfur-20.304, Sfur-3160.1, Sfur-496.9, Sfur-15.299 and Sfur-20.302) belonged to pattern 8 and were highly expressed in 5-day-old and 10-day-old adults, suggesting that they are essential for S. furcifera reproduction.

Conclusions

The lack of a genome sequence for S. furcifera has hindered comprehensive studies of the life pattern and adaptive features that have made it a successful insect pest. We present the first annotated genome sequence of S. furcifera and data on the corresponding protein-coding genes that will aid future detailed studies into the insect's biology and virus–host interactions.

Availability of supporting data

Raw sequencing and transcriptomic data are available via NCBI BioProject PRJNA331022. Further data and scripts supporting the results of this article are available in the GigaScience GigaDB repository [53].

Additional file

Supplementary data are available online.

Additional file 1: Table S1. Transcriptome sequencing data statistics.
Additional file 2: Table S2. Estimation of Sogatella furcifera genome size using k-mer analysis.
Additional file 3: Table S3. Alignment information of short read mapping to the genome.
Additional file 4: Table S4. RNA-seq datasets used in this study.
Additional file 5: Table S5. Assessment of genome coverage by Sogatella furcifera transcripts assembled from reads of multiple developmental stages.
Additional file 6: Table S6. Assessment of genome coverage by Sogatella furcifera expressed sequence tags.
Additional file 7: Table S7. Assessment of genome coverage by Sogatella furcifera assembled transcripts.
Additional file 8: Table S8. Genome assembly completeness evaluated on 248 core eukaryotic genes.
Additional file 9: Table S9. Genome assembly completeness evaluated using benchmarking universal single-copy orthologs (BUSCO).
Additional file 10: Table S10. Centromeric and telomeric DNA in Sogatella furcifera.
Additional file 11: Table S11. Non-protein-coding genes in Sogatella furcifera.
Additional file 12: Table S12. Isotype and anticodon count distribution of tRNA.
Additional file 13: Table S13. Gene numbers in each expression pattern.
Additional file 14: Figure S1. Estimation of Sogatella furcifera genome size based on flow cytometry and 17-mer statistics. (A) Estimation of Sogatella furcifera genome size based on flow cytometry. By comparison with the Drosophila melanogaster (D. mel) genome, the C value of the S. furcifera (S. fur) genome was estimated to be 0.75 pg. The genome size of S. furcifera was estimated to be 733 Mb. (B) Estimation of S. furcifera genome size based on 17-mer statistics. In total, 56.76 Gb of short (<1 kb) paired-end genome sequencing reads were used to generate the 17-mer sequences. The genome size of S. furcifera was estimated to be 730 Mb based on the formula: 17-mer number/17-mer depth.
Additional file 15: Figure S2. Gene Ontology enrichment analysis for genes in expression pattern 1.
Additional file 16: Figure S3. Gene Ontology enrichment analysis for genes in expression pattern 2.
Additional file 17: Figure S4. Gene Ontology enrichment analysis for genes in expression pattern 3. The top 20 enriched terms are shown.
Additional file 18: Figure S5. Gene Ontology enrichment analysis for genes in expression pattern 4.
Additional file 19: Figure S6. Gene Ontology enrichment analysis for genes in expression pattern 5.
Additional file 20: Figure S7. Gene Ontology enrichment analysis for genes in expression pattern 6.
Additional file 21: Figure S8. Gene Ontology enrichment analysis for genes in expression pattern 7.
Additional file 22: Figure S9. Gene Ontology enrichment analysis for genes in expression pattern 8. The top 20 enriched terms are shown.
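The two genome size estimates described for Figure S1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The conversion factor of 978 Mb per pg of DNA is a widely used approximation that the text does not state, and the 17-mer count and depth in the usage lines are hypothetical placeholders chosen only to show the arithmetic (the actual values are reported in Table S2). Function and constant names are illustrative.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to about 978 Mb.
# This factor is a common approximation and is not given in the source text.
MB_PER_PG = 978.0


def genome_size_from_c_value(c_value_pg: float) -> float:
    """Convert a flow-cytometry C value (pg) to a genome size in Mb."""
    return c_value_pg * MB_PER_PG


def genome_size_from_kmers(kmer_number: float, kmer_depth: float) -> float:
    """Estimate genome size in Mb as k-mer number / k-mer depth."""
    if kmer_depth <= 0:
        raise ValueError("k-mer depth must be positive")
    return kmer_number / kmer_depth / 1e6


# Flow cytometry: the reported C value of 0.75 pg gives about 733 Mb.
print(genome_size_from_c_value(0.75))  # 733.5

# k-mer analysis: HYPOTHETICAL count and depth, chosen only for illustration.
print(genome_size_from_kmers(43.8e9, 60))  # 730.0
```

With the reported C value, the first conversion reproduces the roughly 733 Mb flow cytometry estimate, which is close to the 730 Mb obtained from the 17-mer formula.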

Abbreviations

BUSCO: benchmarking universal single-copy orthologs; CDS: coding DNA sequence; CEG: core eukaryotic gene; CEGMA: core eukaryotic gene mapping approach; DEG: differentially expressed gene; EST: expressed sequence tag; EVM: EVidenceModeler; FPKM: fragments per kilobase of exon per million fragments; KEGG: Kyoto Encyclopedia of Genes and Genomes; GO: Gene Ontology; LINE: long interspersed nuclear element; LTR: long terminal repeat; miRNA: microRNA; ncRNA: non-coding RNA; PI: propidium iodide; snRNA: small nuclear RNA; TE: transposable element; WGS: whole genome shotgun; SINE: short interspersed nuclear element; SRBSDV: Southern rice black-streaked dwarf virus; SVM: support vector machine.

Competing interests

The authors declare they have no competing interests.

Funding

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDB11040400), the Ministry of Science and Technology of China (Grant 2014CB138405), and the National Natural Science Foundation of China (Grants 31571305, 91231110, and 31272011). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Authors’ contributions

QW conceived the study and designed the experiments. NT, LZ, GZ, DG, LW and WL conducted the sample preparation, DNA/RNA isolation for sequencing and library construction. LW and QW performed the genome assembly, annotation, evaluation, comparative genomics analysis and evolutionary studies. LW, XG, ZC, ZZ and IA conducted the transcriptome assembly and differential gene expression analysis. LW, HY and QW drafted and revised the manuscript and supplementary information. All authors read and approved the final manuscript.