Chromosomal-Level Genome Assembly of a True Bug, Aspongopus chinensis Dallas, 1851 (Hemiptera: Dinidoridae).

Tao Jiang1,2, Zhiyong Yin1,2, Renlian Cai1, Hengmei Yu2, Qin Lu2, Shuai Zhao1, Ying Tian1, Yufang Yan2, Jianjun Guo1,2, Xiangsheng Chen1,2.   

Abstract

The true bug, Aspongopus chinensis Dallas, 1851 (Hemiptera: Dinidoridae), is a fascinating insect with prolonged diapause and medicinal properties but also a notorious pest. However, because of the lack of genomic resources, an in-depth understanding of its biological characteristics is lacking. Here, we report the first genome assembly of A. chinensis anchored to 10 pseudochromosomes, which was achieved by combining PacBio long reads and Hi-C sequencing data. This chromosome-level genome assembly was 1.55 Gb in size with a scaffold N50 of 156 Mb. The benchmarking universal single-copy ortholog (BUSCO) analysis of the assembly captured 96.6% of the BUSCO genes. A total of 686,888,052 bp of repeat sequences, 18,511 protein-coding genes, and 1,749 noncoding RNAs were annotated. By comparing the A. chinensis genome with that of 8 homologous insects and 2 model organisms, 213 rapidly evolving gene families were identified, including 83 expanded and 130 contracted gene families. The functional enrichment of Gene Ontology and KEGG pathways showed that the significantly expanded gene families were primarily involved in metabolism, immunity, detoxification, and DNA/RNA replication associated with stress responses. The data reported here shed light on the ecological adaptation of A. chinensis and further expanded our understanding of true bug evolution in general.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Hi-C; PacBio sequence; gene family evolution; genome annotation; whole-genome sequence

Year:  2021        PMID: 34623414      PMCID: PMC8557641          DOI: 10.1093/gbe/evab232

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

True bugs (Hemiptera: Heteroptera) are important insect pests that affect public health and agronomy, yet whole-genome assemblies are available for only five species. Here, we assembled a complete draft genome of the true bug Aspongopus chinensis using PacBio sequencing technology and used Hi-C sequences to assist chromosomal assembly. The whole-genome assembly produced in this study could be a useful genomic resource for understanding the evolutionary biology and biological characteristics of true bugs in general and of A. chinensis in particular. This assembly may also support the development of effective pest control strategies.

Introduction

Hemiptera is an insect order that is distributed across the world and contains >8,000 nonholometabolous species (Capinera 2008). Hemiptera is classified into five suborders: Sternorrhyncha (aphids, scale insects, psyllids, and whiteflies), Fulgoromorpha (planthoppers), Cicadomorpha (leafhoppers, cicadas, and spittlebugs), Coleorrhyncha (moss bugs), and Heteroptera (true bugs) (Wang et al. 2019). True bugs are sap-sucking insects that consume food from sources ranging from animals to plants (Sparks et al. 2020). Aspongopus chinensis Dallas, 1851 (supplementary fig. S1), a true bug species mainly distributed in Eastern South Asia, is a polyphagous insect that has a prolonged diapause (Luo et al. 2012; Li et al. 2020). These two characteristics contribute to its efficient adaptation to harsh environments. Although it is a severe threat to pumpkin and watermelon as a pest, A. chinensis has medicinal and economic value; it is used in treating nephritis as well as gastric and menstrual pain (Wu et al. 2021). In the present study, we report the first chromosome-scale genome of a true bug from the Dinidoridae family, A. chinensis, which was sequenced using the Pacific Biosciences (PacBio) SMRT sequencing technology. In the genome, we annotated the repetitive sequences, protein-coding genes, and noncoding RNAs. In addition, we analyzed the gene family evolution of hemipteran clades based on the genome and annotated the function of the rapidly evolving gene families of A. chinensis. The high-quality A. chinensis genome reported in this study represents an essential genetic resource for improving researchers’ understanding of the biological characteristics of the species and for conducting hemipteran evolutionary studies.

Results and Discussion

Genome Assembly and Completeness

In total, 140 Gb of filtered Illumina clean data were used to perform k-mer-based analysis. With k = 21, the estimated genome size was 1.36 Gb (heterozygous ratio = 1.71) (supplementary fig. S2). First, we obtained 134 Gb of PacBio clean data with 99.2-fold coverage and assembled these into 15,818 contigs (contig N50 = 602 kb). Then, using the Hi-C technology, the final assembly (size = 1.55 Gb; contig N50 = 1.498 Mb; scaffold N50 = 156 Mb) was produced, which was 13.97% larger than the estimated genome size (i.e., 1.36 Gb). A benchmarking universal single-copy ortholog (BUSCO) evaluation, which assesses genome completeness and was run in genome mode (“-m genome”), recovered 96.6% of the orthologs in insecta_odb10 (94.9% complete and single-copy, 1.7% duplicated, 0.6% fragmented, and 2.8% missing BUSCO genes) (supplementary tables S1–S4). Subsequently, we reconstructed Hi-C libraries to produce 107.2 Gb of high-quality clean data (Q30 ≥ 91%), 73% of which were unique paired-end reads (supplementary table S5). After de novo assembly, 15,818 contigs were anchored to 1,901 scaffolds. Finally, we obtained 10 pseudochromosomes sized 42–215 Mb (fig. 1 and supplementary fig. S3) that accounted for 91.48% of the genome assembly (1.55 Gb).
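The assembly statistics quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names and the toy contig lengths are hypothetical, and only the 1.55 Gb and 1.36 Gb values come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def percent_excess(assembled_gb, estimated_gb):
    """Percentage by which the assembly exceeds the k-mer-based size estimate."""
    return (assembled_gb / estimated_gb - 1) * 100


# Hypothetical toy contig lengths, used only to illustrate the N50 definition.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70

# Assembly size (1.55 Gb) versus estimated genome size (1.36 Gb) from the text.
print(round(percent_excess(1.55, 1.36), 2))  # 13.97
```

The same N50 definition applies to contigs and scaffolds; only the input lengths differ.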
Fig. 1.

Genomic and phylogenetic analysis of A. chinensis. (a) The genome landscape of A. chinensis. Tracks a, b, and c represent chromosome ideograms, gene density, and guanine–cytosine content (%; sliding window size: 500 kb), respectively. (b) Venn diagram of orthologous clusters showing the intersection between A. chinensis and three homologous insects. (c) Phylogeny of 10 selected insect species. The blue, yellow, and red node values represent expanded, contracted, and rapidly evolving families, respectively. Black numbers on the node show divergence time (Ma).

Gene Annotation

We annotated 44.3% of the whole genome as repeat sequences, including DNA elements (6.73%), long interspersed nuclear elements (22.15%), long terminal repeats (0.03%), short interspersed nuclear elements (SINEs) (27.82%), and unclassified repeat sequences (0.77%). With a total length of 431,415,958 bp, SINEs were the most abundant type of repeat sequence. SINEs play crucial roles in genome evolution and modulation of gene expression (Kanhayuwa and Coutts 2016). We predicted 1,749 noncoding RNAs, including 418 ribosomal RNAs (rRNAs), 656 transfer RNAs (tRNAs), 17 small nuclear RNAs (snRNAs), and 40 microRNAs (miRNAs) (supplementary table S6). After aligning the predicted genes with data obtained from various protein databases (i.e., the nonredundant database, SwissProt, KEGG, TrEMBL, InterPro, and Gene Ontology), we found 18,511 predicted genes. Each gene contained an average of 2.69 exons; a total of 50,654 exons and 31,863 introns were found. The mean lengths of mRNA, exons, and introns were 5,727.22, 413.24, and 2,585.13 bp, respectively.
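As a quick consistency check, the repeat fractions can be recomputed from the base-pair totals reported in the text. This is a sketch under one assumption: the assembly size is taken as the rounded 1.55 Gb figure, so results are compared at one decimal place only.

```python
# Rounded assembly size from the text; the exact base-pair count is not
# given here, so this value is an assumption.
ASSEMBLY_BP = 1.55e9
REPEAT_BP = 686_888_052  # total annotated repeat sequence (from the abstract)
SINE_BP = 431_415_958    # total SINE length (from the text)


def percent_of_assembly(span_bp, assembly_bp=ASSEMBLY_BP):
    """Share of the assembly covered by a feature class, in percent."""
    return 100 * span_bp / assembly_bp


print(round(percent_of_assembly(REPEAT_BP), 1))  # 44.3
print(round(percent_of_assembly(SINE_BP), 1))    # 27.8
```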

Gene Families and Phylogenetic Analyses

We identified 165,029 genes from 10 insect species in 18,640 orthogroups. Overall, 6,500 species-specific orthogroups (16.9% in total) were found. The gene set cluster analysis of four heteropteran species (Cimex lectularius, A. chinensis, Apolygus lucorum, and Halyomorpha halys) showed that they shared 4,122 clusters. Furthermore, A. chinensis contained 582 unique clusters (3,008 proteins), accounting for 4.94% of all clusters of the four species; these could be attributable to novel proteins and unique horizontal gene transfer (fig. 1). Based on 1,269 single-copy orthogroups, a phylogenetic tree was constructed that included divergence time and orthologous gene families (fig. 1). The phylogenetic tree and divergence time were consistent with hemipteran taxonomy and results from a previous study (Liu et al. 2021). As shown in the tree, two suborders of Hemiptera, namely Fulgoromorpha and Heteroptera, diverged from a recent common ancestor at 225–242 Ma and from Aphidomorpha at approximately 265–280 Ma. Moreover, the true bugs H. halys and A. chinensis appear to have diverged at <50 Ma, indicating that they are closely related. A total of 1,755 expanded and 1,113 contracted genes were identified in 83 significantly expanded and 130 contracted gene families, respectively. The functional annotation of the 83 significantly expanded gene families indicated their association with metabolism, environmental adaptation, digestion, transport, and immune and detoxification systems (fig. 2). Metabolic pathways were significantly enriched in A. chinensis; these are likely associated with overwintering behavior and metamorphosis (Klowden 2013). In addition, dynamic changes in nutrients are generally used by insects to regulate the diapause period (Hahn and Denlinger 2011).
Fig. 2.

Functional annotation of significantly expanded gene families. (a) Eighty-three significantly expanded gene families were annotated in the pathways with the number of genes and categorized into five different pathway classes. (b) Top 20 function enrichment gene ontology terms. (c) KEGG pathways for significantly expanded gene families.

Further analyses showed that the expanded gene families enriched in GO (fig. 2) and KEGG (fig. 2) primarily functioned in metamorphosis, immunity, DNA/RNA replication, detoxification, and metabolism, which may explain the mechanism of the broad phytophagy of A. chinensis as well as its adaptation to adverse environmental conditions. One specifically identified KEGG pathway, retinol metabolism, is associated with color vision and therefore essential for habitat exploration and foraging (Lebhardt and Desplan 2017). Furthermore, metabolic pathways (including carbohydrate, lipid, cofactor and vitamin, ascorbate and aldarate, and energy metabolisms) are significantly enriched in diapausic insects compared with non-diapausic A. chinensis groups (Qi et al. 2015; Chen et al. 2017; Wu et al. 2021). In the KEGG clustered pathways, genes involved in the HIF-1 signaling pathway are significantly expressed in diapause-destined pupal brains (Lin et al. 2016). Overall, these analyses provide further insights into the genomics of A. chinensis, particularly in terms of its feeding traits and environmental adaptation.

Materials and Methods

Sample Collection and Sequencing

Aspongopus chinensis samples were collected from Kaili City, Guizhou Province, China (26°39′16.65″N, 107°47′18.73″E) and subsequently reared for at least three generations under controlled conditions (25 °C, 80% relative humidity, and a 14:10-h light:dark photoperiod) at the Institute of Entomology, Guizhou University. To survey the genome size and heterozygous ratio, we constructed a 350-bp insert-size DNA library for Illumina sequencing on the HiSeq NovaSeq 6000 platform, a 20-kb insert-size library for PacBio sequencing using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, CA, USA), and a 150-bp paired-end RNA library using TRIzol Reagent. We filtered out raw reads with short lengths (<50 bp) and low-quality values (<0.8) and then removed adapter sequences.

Chromosomal Genome Assembly

The Illumina short-read sequencing data were used to count and analyze k-mers using Jellyfish v2.1.3 and GenomeScope v1.0 (Andrew 2010; Marçais and Kingsford 2011). We used CANU v1.8 (Koren et al. 2017) to assemble the genome with PacBio clean data and corrected errors with one round of Arrow v1.5.5 (Sparks et al. 2020). We polished the resultant assembly with Pilon v1.2 (Walker et al. 2014) and removed redundancy based on Hi-C sequences using trimDup (Rabbit Genome Assembler; https://github.com/gigascience/rabbit-genome-assembler, last accessed October 13, 2021). The genome assembly was aligned using BLAT v3.2.20 (Kent 2002) to remove bacterial contamination. After filtering low-quality sequences (Q ≤ 15), adapter sequences, and short sequences (length ≤30 bp) using HiC-Pro v2.8.1 (Servant et al. 2015), the Hi-C data were mapped to the draft genome assembly using Bowtie2 v2.2.5 (Langmead and Salzberg 2012) to obtain uniquely mapped paired-end reads. We then generated chromosomal scaffolds using Juicer v1.5 and 3D de novo assembly pipelines to assemble Hi-C clean data (Durand et al. 2016). The completeness of the genome assembly was evaluated using BUSCO v5.1.2 (Simão et al. 2015).
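The idea behind the k-mer-based genome size survey can be sketched in plain Python. This is a deliberately simplified illustration and not what Jellyfish or GenomeScope actually do: it ignores reverse complements, sequencing errors beyond a crude depth cutoff, and heterozygosity modelling, and all names (count_kmers, estimate_genome_size, min_depth) are hypothetical. It uses the common approximation that genome size is roughly the total number of k-mers divided by the depth at the main peak of the k-mer histogram.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as total k-mers / peak depth.

    K-mers seen fewer than min_depth times are treated as likely errors
    and ignored (an assumption of this sketch).
    """
    histogram = Counter(kmer_counts.values())  # depth -> distinct k-mers
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Peak = depth shared by the most distinct k-mers (ties: higher depth).
    peak_depth = max(usable, key=lambda d: (usable[d], d))
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Hypothetical toy data: one 14-bp sequence "sequenced" four times.
toy_reads = ["ACGTTGCATGACCA"] * 4
counts = count_kmers(toy_reads, k=3)
print(estimate_genome_size(counts))  # 12.0, i.e. 14 - 3 + 1 k-mer positions
```

In a real survey the histogram would come from a dedicated counter run on the full read set with k = 21, as in the text.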

Genome Annotation

RepeatMasker v4.0.7 (Tarailo-Graovac and Chen 2009) was used to detect transposable elements against a de novo repeat library built using RepeatModeler v2.0.1. Tandem Repeats Finder (Benson 1999) was used to mark tandem repeat sequences in the A. chinensis genome. rRNAs were predicted using BLASTN (Chen et al. 2015), whereas tRNAs were annotated using tRNAscan-SE v1.3.1 (Lowe and Eddy 1997). Furthermore, we used INFERNAL v1.1 (Griffiths-Jones et al. 2005) based on Rfam v14.0 to annotate miRNAs and snRNAs. The annotation of A. chinensis RNA-seq data and the protein sequences of the homologous species Oncopeltus fasciatus, Rhodnius prolixus, H. halys, Acyrthosiphon pisum, C. lectularius, and Nilaparvata lugens (supplementary table S7) were used to train a novel gene model via AUGUSTUS v3.2.3 (Stanke et al. 2006) and SNAP v2013-02-16 (Johnson et al. 2008). A nonredundant annotated gene set was obtained by integrating and filtering the results using Maker v2.31.10 (Holt and Yandell 2011). A circos plot of the 10 chromosomes was drawn using CIRCOS v0.69-9 (Krzywinski et al. 2009). Gene functional annotation was performed by mapping the predicted gene set to specific protein databases (i.e., SwissProt, TrEMBL, KEGG, and InterPro) using Blast v2.2.31 (Kent 2002).

Gene Family and Phylogenetic Analyses

One dipteran species, Drosophila melanogaster, one lepidopteran species, Bombyx mori, and seven hemipteran species (Fulgoromorpha: N. lugens, A. pisum, and Bemisia tabaci; Heteroptera: R. prolixus, C. lectularius, A. lucorum, and H. halys) were selected for gene family analyses. We clustered the orthologous gene families using OrthoFinder v2.3.8 with the parameters “-S diamond -M msa -T fasttree” (Emms and Kelly 2019). We then aligned the protein sequences of the 10 species with MUSCLE v3.8.1551 using its default options (Katoh and Standley 2013). Next, we concatenated the orthologous genes into a single alignment using RAxML v1.5.5 (Jarvis et al. 2014) (-m PROTGAMMAJTT -# 1000). MCMCTree v4.9 of PAML (Yang 2007) was used to perform the divergence time analysis, and calibration time was based on three nodes: D. melanogaster with A. lucorum (320–390 Ma) (Misof et al. 2014), A. pisum with A. lucorum (260–280 Ma) (International Aphid Genomics Consortium 2010; Liu et al. 2021), and H. halys with R. prolixus (144–338 Ma) (www.timetree.org/). The expansion/contraction of gene families was estimated using CAFE v4.2.1 (Han et al. 2013), with the single birth–death parameter lambda and a significance level of 0.01. In addition, we used OrthoVenn2 (Xu et al. 2019) to cluster the gene sets of A. chinensis, C. lectularius (Rosenfeld et al. 2016), A. lucorum, and H. halys with E-values of 1e−5 and an inflation value of 1.5. The functions of rapidly expanded gene families were annotated using eggNOG-mapper v2 (Cantalapiedra et al. 2021). We performed GO and KEGG enrichment analyses using the R packages clusterProfiler v3.18.1 (Yu et al. 2012) and enrichplot v1.10.2 (P-value = 0.05; q-value = 0.05).
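GO and KEGG over-representation analyses of this kind are typically based on a hypergeometric test. The sketch below illustrates that test in plain Python rather than R; it is not the authors' pipeline, the function name is hypothetical, and the gene counts in the example are invented for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment P-value, P(X >= k).

    N: genes in the background set
    K: background genes annotated to the term or pathway
    n: genes in the list of interest (e.g., expanded families)
    k: genes in the list of interest annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 10 background genes, 5 annotated to a pathway,
# 5 genes of interest, all 5 of them annotated.
p = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
print(p < 0.05)  # True, because p = 1/252
```

With many terms tested at once, the raw P-values would additionally need multiple-testing correction, which is what the q-value cutoff in the text refers to.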

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.
  39 in total

1.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04       Impact factor: 16.240

2.  Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

Authors:  Mira V Han; Gregg W C Thomas; Jose Lugo-Martinez; Matthew W Hahn
Journal:  Mol Biol Evol       Date:  2013-05-24       Impact factor: 16.240

3.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

4.  HIF-1 regulates insect lifespan extension by inhibiting c-Myc-TFAM signaling and mitochondrial biogenesis.

Authors:  Xian-Wu Lin; Lin Tang; JinHua Yang; Wei-Hua Xu
Journal:  Biochim Biophys Acta       Date:  2016-07-26

5.  The biosynthetic products of Chinese insect medicine, Aspongopus chinensis.

Authors:  Xiao-Hong Luo; Xiao-Zheng Wang; Hai-Long Jiang; Jun-Li Yang; Phillip Crews; Frederick A Valeriote; Quan-Xiang Wu
Journal:  Fitoterapia       Date:  2012-03-11       Impact factor: 2.882

6.  Transcriptome sequencing reveals potential mechanisms of diapause preparation in bivoltine silkworm Bombyx mori (Lepidoptera: Bombycidae).

Authors:  Yan-Rong Chen; Tao Jiang; Juan Zhu; Yu-Chen Xie; Zhi-Cheng Tan; Yan-Hua Chen; Shun-Ming Tang; Bi-Fang Hao; Sheng-Peng Wang; Jin-Shan Huang; Xing-Jia Shen
Journal:  Comp Biochem Physiol Part D Genomics Proteomics       Date:  2017-08-02       Impact factor: 2.674

7.  Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.

Authors:  Bruce J Walker; Thomas Abeel; Terrance Shea; Margaret Priest; Amr Abouelliel; Sharadha Sakthikumar; Christina A Cuomo; Qiandong Zeng; Jennifer Wortman; Sarah K Young; Ashlee M Earl
Journal:  PLoS One       Date:  2014-11-19       Impact factor: 3.240

8.  Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293.

Authors:  Lakkhana Kanhayuwa; Robert H A Coutts
Journal:  PLoS One       Date:  2016-10-13       Impact factor: 3.240

9.  OrthoFinder: phylogenetic orthology inference for comparative genomics.

Authors:  David M Emms; Steven Kelly
Journal:  Genome Biol       Date:  2019-11-14       Impact factor: 13.583
