Many lacewing species (Insecta: Neuroptera) are important predators of pests with great potential in biological control. So far, there is no chromosome-level published genome available for Neuroptera. Here we report a high-quality chromosome-level reference genome for a green lacewing species Chrysopa pallens (Neuroptera: Chrysopidae), which is one of the most important insect natural enemies used in pest biocontrol. The genome was sequenced using a combination of PacBio and Hi-C technologies and assembled into seven chromosomes with a total size of 517.21 Mb, occupying 96.07% of the genome sequence. A total of 12,840 protein-coding genes were identified and approximately 206.21 Mb of repeated sequences were annotated. Phylogenetic analyses indicated that C. pallens diverged from its common ancestor with Tribolium castaneum (Coleoptera) approximately 300 million years ago. The gene families involved in digestion, detoxification, chemoreception, carbohydrate metabolism, immunity, nerves and development were significantly expanded, revealing the potential genomic basis for the polyphagia of C. pallens and its role as an excellent biocontrol agent. This high-quality genome of C. pallens will provide an important genomic resource for future population genetics, evolutionary and phylogenetic investigations of Chrysopidae as well as comparative genomic studies of Neuropterida and other insects.
Many lacewing species (Insecta: Neuroptera) are important predators of pests with great potential in biological control. So far, there is no chromosome-level published genome available for Neuroptera. Here we report a high-quality chromosome-level reference genome for a green lacewing species Chrysopa pallens (Neuroptera: Chrysopidae), which is one of the most important insect natural enemies used in pest biocontrol. The genome was sequenced using a combination of PacBio and Hi-C technologies and assembled into seven chromosomes with a total size of 517.21 Mb, occupying 96.07% of the genome sequence. A total of 12,840 protein-coding genes were identified and approximately 206.21 Mb of repeated sequences were annotated. Phylogenetic analyses indicated that C. pallens diverged from its common ancestor with Tribolium castaneum (Coleoptera) approximately 300 million years ago. The gene families involved in digestion, detoxification, chemoreception, carbohydrate metabolism, immunity, nerves and development were significantly expanded, revealing the potential genomic basis for the polyphagia of C. pallens and its role as an excellent biocontrol agent. This high-quality genome of C. pallens will provide an important genomic resource for future population genetics, evolutionary and phylogenetic investigations of Chrysopidae as well as comparative genomic studies of Neuropterida and other insects.
With the development of new sequencing technologies, the genomic information of more and more species has been obtained and analysed. Deep sequencing data can provide extensive information about genomes and gene expression profiles. Since the completion of the Human Genome Project, obtaining high‐quality reference genome sequence maps has become the basis for functional gene studies of various species (Venter et al., 2001). Especially, high‐quality genomes of many insects have helped us to understand the molecular mechanisms of their ecological and evolutionary characteristics. For example, the analyses of the digestion and detoxification genes based on the genome of Hypothenemus hampei revealed the mechanism of food seeking and pathogenic bacteria resistance, which can help develop insecticides, biological control agents and traps (Vega et al., 2015). The relationships between the expansion of the gene families of Spodoptera litura (gustatory receptor, cytochrome P450, carboxy‐lesterase as well as glutathione‐S‐transferase) and the resistance to various insecticides were analysed based on the high‐quality genome data, providing new insights into the evolutionary mechanism, host plant selection and ecological adaptability of Lepidoptera (Cheng et al., 2017). The high‐quality draft genome of Propylea japonica provides invaluable resource for understanding the molecular mechanisms of stress resistance in Coleoptera (Zhang et al., 2020). Two gene families associated with environmental adaptation (odorant receptor and cytochrome P450) were analysed, and a putative biosynthesis pathway of the defence alkaloid harmonine was successfully constructed based on the chromosome‐level genome assembly of Harmonia axyridis (Chen et al., 2021).The fauna of extant Neuropterida comprises nearly 6600 species of 20 families from three orders (Neuroptera, Megaloptera and Raphidioptera) (Yang et al., 2018), which is the key group connecting Holometabola and Hemimetabola. Many species of this group are important predators of pests with great potential in biological pest management such as green lacewings, brown lacewings, dusty lacewings and so on. Chrysopa pallens (Rambur, 1838) (Neuropterida: Neuroptera: Chrysopidae) is one of the most important predatory species widely used in agricultural and forestry ecosystems (Figure 1). It is the dominant species widely distributed in China except for the Tibet Plateau (Yang et al., 2018) and also distributed in Japan, Korea and Europe (Yang et al., 2005). Both larvae and adults of this species are predacious, feeding on aphids, coccids, thrips, planthoppers, whiteflies and some lepidopteran insects, as well as mites in agroforestry ecosystems (Bai et al., 2005; Winterton & de Freitas, 2006). In addition, laboratory experiments showed that C. pallens demonstrated an effective control capacity on the eggs and larvae of S. frugiperda (Xu et al., 2019). The high pest controlling potentials of C. pallens were underlain by its wide diet, high predation, strong adaptability and high egg production (about 2000 eggs per female) (Miller et al., 2004; Tauber et al., 2000). However, the molecular mechanisms of why C. pallens serve as such an excellent biocontrol agent remain unclear.
Ecological photos of Chrysopa pallens. (a) Egg, (b) larva, (c) pupa, (d) adultScholars from all over the world had carried out researches on the olfactory tropism, artificial feeding (Ali et al., 2018), mechanisms of resistance to chemical pesticides (Liu & Zeng, 2014) and pest‐control abilities of C. pallens (Xu et al., 2019), laying foundations for further exploring their potential application in biological control. C. pallens had evolved a sophisticated olfactory system to accurately sense a large amount of chemical information in order to seek prey, locate host plants, find mates and spot dangers (Li et al., 2019). Previous studies have shown that using chemicals to guide C. pallens predation on pests could effectively improve their abilities to search for prey (Cui et al., 2015; Koczor et al., 2010). It is necessary to study the potential mechanism of their olfactory system for effective biological control (Hesler, 2016). A normalized transcriptome of C. pallens was sequenced by Li et al. (2013), and a large number of candidate chemosensory genes were identified. A full‐length cDNA library from the antenna of C. pallens was constructed by Wang et al. (2014), and the functions of OBPs (CpalOBPs), the clone of olfaction‐related genes of antennae as well as the mechanism of information recognition of C. pallens were studied. All the results mentioned above provided a basis for exploring the mechanisms of chemoreception in C. pallens. However, the olfactory mechanism of C. pallens at the genetic level is still not clarified. The absence of genomic information about C. pallens has limited further study into the molecular mechanisms of its ecological and evolutionary characteristics.To date, there are more than 600 insect genomes sequenced and available at GenBank (accession date, 15 April 2021). However, no chromosome‐level genome has been published for any species of Neuroptera so far except one of Chrysoperla carnea (GCA_905475395.1) uploaded to NCBI during the preparation of this manuscript, which remain unpublished. Therefore, we generated a high‐quality genome assembly of a green lacewing species C. pallens, which would provide invaluable information for further study of C. pallens and promote large‐scale breeding and commercial utilization of C. pallens. This study will also facilitate the development of modern pest control strategies. Besides, it will also serve as a reference for further comparative genomic studies of Neuropterida and even Insecta.
MATERIALS AND METHODS
Sampling and genome sequencing
The specimens used in the present study were reared at Langfang station of the Chinese Academy of Agricultural Sciences and inbred under laboratory conditions. All samples were selected from newly‐emerged unfed adults to avoid contamination. There were 50 female and 10 male adults used for genome sequencing. The whole bodies of samples were used for the construction of each library. High‐quality genomic DNA was extracted with the Qiagen DNeasy Blood & Tissue kit. DNA was quantified by 0.75% agarose gel electrophoresis, Nanodrop spectrophotometry (Thermo Fisher) and Qubit 3.0 fluorometry (Invitrogen). RNA was extracted with the TRIzol Reagent kit.A Single Molecule Real‐Time DNA library was prepared for sequencing using SMRTbell Template Prep Kit 2.0 with an insert size of 40 kb. The genome of C. pallens was sequenced with PacBio Sequel II System using 25 male adults. Illumina paired‐end library (with 350 bp insert size) was prepared with Truseq DNA PCR‐free kit using five male adults. Illumina paired‐end reads were used for error checking and as references to Hi‐C. The RNA library was constructed with Illumina TruSeq RNA v2 Kit according to the manufacturer's instructions using 10 female and 10 male adults. Hi‐C library was constructed using 10 male adults. The construction of a Hi‐C library included cross‐linking of formaldehyde, restriction enzyme digestion, ends repair of fragments, DNA cyclization, DNA purification and other steps with MboI as the restriction enzyme.
Genome size estimation and assembly
Quality control of the second‐generation Illumina data was carried out by bbtools v38.67 (qtrim=rl trimq =20 minlen =15 ecco =t maxns =5 trimpolya =10 trimpolyg =10 trimpolyc =10) (Khan et al., 2012). Genome survey was conducted based on K‐mer distribution analyses using genomescope v2.0 (Ranallo‐Benavidez et al., 2020) to predict the genome size, heterozygosity and proportion of repeat content to select appropriate assembly tools and adjust corresponding parameter (‐k 21 ‐p 2 ‐m 10,000) (Kirkness et al., 2003). The k‐mer frequency was evaluated using bbtools v38.67 (Khan et al., 2012) and the length was set to 21 k‐mer.For genomic contig assembly, the long PacBio reads were corrected by nextdenovo v2.3.0 (seed_cutoff =20,000) (https://github.com/Nextomics/NextDenovo). The corrected PacBio sequences were assembled using raven v1.2.2 (‐‐weaken ‐p 0) (Vaser & Šikić, 2021). After the preliminary assembly of the genome, nextpolish v1.3.0 (Hu et al., 2019) was used to perform one round of long‐read polishing and two rounds of short‐read polishing to get the corrected genome sequence, which further improves the assembly accuracy. In this process, minimap2 v2.17 (Li, 2018) with default parameters were used to align the second‐generation/third‐generation data to the third‐generation assembled genome. samtools v1.9 (samtools view) (Li et al., 2009) was used for format conversion (SAM to BAM).3D‐DNA v180922 (Dudchenko et al., 2017; Nene et al., 2007) process was used to conduct chromosome anchoring based on the Hi‐C sequences. The adapter sequences of raw reads were trimmed and low‐quality PE reads were removed using juicer v1.6.2 (Durand et al., 2016) to obtain clean data. Then, the clean reads were mapped to the draft genome using 3D‐DNA v180922 (Dudchenko et al., 2017; Nene et al., 2007). Contigs in the same chromosome were separated by 100 Ns. A heat map was constructed based on the interaction signals revealed by the effective mapping read pairs. The sequencing depth of each locus and the average depth of chromosomes were calculated using samtools (Li et al., 2009). Illumina paired‐end reads of male adults were sequenced separately and then were aligned to the C. pallens assembled genome to identify the sex chromosomes (X and Y) in C. pallens. The average sequencing depth and length were based to identify chromosome X and Y.Potential contamination sequences were searched and removed from the nucleotide sequence database (NT) (Stoesser et al., 2001) and UniVec (https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/) using blast+ (i.e., Blastn) v2.9.1 (Camacho et al., 2009) and uploaded to GenBank for reconfirmation. The completeness of the final draft genome was assessed using busco v3.0.1 (Simão et al., 2015) with insecta_odb10 reference, predicting the integrity of the genome with the percentage of single‐copy orthologues founded in the insect orthologues library.
Genome annotation
The repeat sequences were identified by both homology‐based and de novo prediction methods. Firstly, the LTR search (‐LTRStruct) was processed under repeatmodeler v2.0.1 (https://github.com/Dfam‐consortium/RepeatModeler) and the de novo repeat library was built based on the specific structure of repeat sequence and the principle of de novo repeat library. A custom library was then formed by combining Dfam_3.3 (http://www.dfam.org/) (Storer et al., 2021) and RepBase‐20181026 databases (https://www.girinst.org/repbase/) under the default parameters (Jurka et al., 2005). Finally, the repeat sequences were searched by repeatmasker v4.0.9 with the default commands (http://repeatmasker.org/cgi‐bin/WEBRepeatMasker) according to the custom library.The protein coding genes were annotated by integrating the evidences of ab initio, transcriptome‐based prediction and homology‐based annotations. Firstly, the protein coding gene structures were predicted using maker v3.01.03 with the default commands (Cantarel et al., 2008). Different pieces of evidences were weighted using EVidenceModeler (EVM) (Haas et al., 2008). augustus v3.3.4 (http://augustus.gobics.de/) (Stanke et al., 2006) and GeneMark‐ES/ET/EP 4.59_lic (Lomsadze et al., 2014) were automatically trained by BRAKER (Tomas et al., 2021) to improve the prediction accuracy combined with transcriptome and protein homology. Then, the transcriptome information was used to align to the genome by hisat2 v2.2.0 (Kim et al., 2015) to generate BAM files. The arthropod protein sequences were extracted from the orthodb10 v1 data (Kriventseva et al., 2018). Transcriptome assembly with reference genome was performed using stringtie v2.1.4 (Kovaka et al., 2019). For the homology‐based approach, proteins of related species were downloaded from GenBank, such as Drosophila melanogaster (Diptera), Tribolium castaneum (Coleoptera), Apis mellifera (Hymenoptera), Bombyx mori (Lepidoptera) and Rhopalosiphum maidis (Hemiptera) (Table 1). Finally, the preparation files obtained from the above three approaches were imported into maker v3.01.03 (Cantarel et al., 2008) for integrated annotations. EVM weights for the three types of evidences were set as: transcriptome 8, protein homologue 2, and ab initio 1.
TABLE 1
Information of species used in this study
Order
Species
NCBI accession
References
Ephemeropterp
Cloeon dipterum
GCA_902829235.1
Almudi et al. (2020)
Ispptera
Coptotermes formosanus
GCA_013340265.1
Itakura et al. (2020)
Hemiptera
Rhopalosiphum maidis
GCA_003676215.3
Chen et al. (2019)
Thysanoptera
Thrips palmi
GCA_012932325.1
Guo et al. (2020)
Coleptera
Tribolium castaneum
GCA_000002335.3
Richards et al. (2008)
Diptera
Drosophila melanogaster
GCA_000001215.4
Adams et al. (2000)
Hymenoptera
Apis mellifera
GCA_003254395.2
Solignac et al. (2007)
Trichoptera
Stenopsyche tienmushanensis
GCA_008973525.1
Luo et al. (2018)
Lepidoptera
Bombyx mori
GCA_014905235.2
International Silkworm Genome Consortium (2008)
Siphonaptera
Ctenocephalides felis
GCA_003426905.1
Driscoll et al. (2020)
Neuroptera
Chrysopa pallens
this study
this study
Information of species used in this studyTwo strategies were used for the annotation of gene functions. In the first strategy, the gene functions were predicted by aligning with existing databases, that is, uniprotkb (Swissprot +Trembl) (https://www.uniprot.org/) (Morgat et al., 2019) and the nonredundant protein sequence database (NR) using the sensitive mode of diamond v0.9.24 (‐‐more ‐sensitive ‐e 1e‐5) (Buchfink et al., 2015). In the second strategy, the gene functions were predicted by aligning with a comprehensive database, that is, gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome. Five databases including protein families (Pfam) (El‐Gebali et al., 2019), Smart (http://smart.embl‐heidelberg.de/) (Letunic et al., 2021), gene3d v21.0 (http://gene3d.biochem.ucl.ac.uk/) (Lewis et al., 2018), Superfamily (Wilson et al., 2009) and CDD (Marchler‐Bauer et al., 2017) were searched by interproscan 5.41–78.0 (Quevillon et al., 2005). At the same time, eggnog v5.0 (http://eggnog.embl.de) database was searched with eggnog‐mapper v2.0.1 (Huerta‐Cepas et al., 2017). The results of these two strategies were combined to get the final prediction of gene functions.Non‐coding RNAs including transfer RNAs (tRNAs), microRNAs (miRNAs), ribosome RNAs (rRNAs) and small nuclear RNAs (snRNAs) were also identified. The rRNAs, snRNAs and miRNAs were detected from the rfam database (release 13.0) (Kalvari et al., 2018) using infernal v1.1.3 (Nawrocki & Eddy, 2013). The tRNAs were predicted using trnascan‐se v2.0 with “EukHighConfidenceFilter” (Lowe & Eddy, 1997). The rRNAs and subunits were predicted using rnammer v1.2 (Lagesen et al., 2007).
Orthology, synteny and phylogenetic reconstruction
Orthologues from 11 insect species were clustered using the orthofinder v2.5.2 (Emms & Kelly, 2015) pipelines with default parameters (Table 1). According to the results of gene family clustering obtained from orthofinder v2.5.2 (Emms & Kelly, 2015), single‐copy genes were used for phylogenetic reconstruction. Firstly, mafft v7.394 (Katoh & Standley, 2013) was used to align the homologous region of the sequences with the L‐INS‐I strategy. The unreliable homologous alignments were filtered using bmge v1.12 (‐m BLOSUM90 ‐h 0.4) (Criscuolo & Gribaldo, 2010). Alignment sequences were then combined into a supermatrix using fasconcat‐g v1.04 (Kück & Longo, 2014). Phylogenetic reconstruction was performed with IQ‐TREE v2.0‐rc1 (‐m MFP ‐‐mset LG ‐‐msub nuclear ‐‐rclusterf 10) (Nguyen et al., 2015) based on the supermatrix and support values for nodes were assessed by Ultrafast Boostrap (Minh et al., 2013) and sh‐alrt algorithm (‐B 1000 ‐‐alrt 1000) (Guindon et al., 2010). Calibration points applied to estimate the divergence time of C. pallens were set based on the fossil records in the pbdb database (https://paleobiodb.org/navigator/) as well as from Misof et al. (2014) and Nel et al. (2013) (Table 2).
TABLE 2
Information of calibration times used in this study
Placement
Minimum age (Ma)
Maximum age (Ma)
Diptera
242
252
Pterygota
–
443
Paraneoptera
315
323
Holometabola
315
383
Coleoptera +Neuroptera
307
323
Trichoptera +Lepidoptera+ Diptera +Siphonaptera
311
323
Information of calibration times used in this studyTribolium castaneum was used as the reference to obtain relevant information to study the variation of interspecific chromosomes due to the lack of published high quality genomes of Neuropterida. There are only one chromosome‐level genome of Chrysoperla carnea (GCA_905475395.1) uploaded to ncbi during the preparation of this manuscript, which remain unpublished and another scaffold‐level genome of Neoneuromus ignobilis (GCA_014529405.1). The protein sequences of C. pallens and T. castaneum were compared with “blastp‐like” using mmseq2 v11‐E1A1C with default parameters (‐s 7.5 ‐‐alignment‐mode 3 ‐‐num‐iterations 4 ‐e 1e‐5 ‐‐max‐accept 5) (Steinegger & Söding, 2017). In addition, the all.blast result file and the integrated annotation file (all.gff file) were used as input files to obtain the results of collinearity analyses by MCScanX (Wang et al., 2012). Finally, the diagram was drawn by tbtools v1.0692 (Chen et al., 2020).
Gene family expansion and contraction analyses
Expansions and contractions of orthologous gene families were determined using computational analysis of family evolution (cafÉ) v4.2.1 (Han et al., 2013) with default parameters (p = .01) according to the gene families and phylogenetic tree. The random genetic birth and death rate was simulated to predict the evolution of gene families in different species at various branches of evolution to assess the expansion and contraction of gene families at each node on the phylogenetic tree. The significantly expanded gene families were classified through r package clusterprofiler v3.14.3 (Yu, Wang, et al., 2012) according to the gene annotation database. Then, the redundancy was removed to get final results of the gene enrichment.The selection of rapidly expanded families were analysed using paml 4 (Yang, 2007) based on the CDS with site models to further understand the evolution of the expansion of gene families. The phylogenetic tree used was reconstructed with iq‐tree v2.0‐rc1 (Nguyen et al., 2015) with ambiguity data filtered before the formal calculation. The specific model used were M0 (one rate), M1a (neutral) – M2a (selection), M7 (beta) – M8 (beta & w) (i.e., “NSsites=0 1 2 7 8”). The results of the M1a–M2a and M7–M8 pairs of models were compared using the likelihood ratio test (LRT) (p = .05). The gene family was identified as being positively selected only if M2a and M8 confirmed that the family had positive selection sites at the same time.
RESULTS AND DISCUSSION
Genome sequencing and assembly
About 78.59 Gb clean reads were generated by Illumina paired‐ended short‐read sequencing and 113.72 Gb filtered subreads were obtained by PacBio long‐read sequencing (Table 3). The average length and N50 of PacBio long reads were 32.86 kb and 18.31 kb, respectively. Hi‐C fragment libraries sequencing resulted in about 75.48 Gb (Table 3).
TABLE 3
Sequencing data used for the Chrysopa pallens genome assembly
The type of libraries
Number of reads
Raw data (bp)
Read length (bp)
N50 contig length (bp)
Survey
523,921,116
78,588,167,400
150
–
RNA
112,484,672
16,872,700,800
150
–
PacBio
6,210,590
113,715,446,179
18,309.93
32,855
Hi‐C
251,600,926
75,480,277,800
150
–
Sequencing data used for the Chrysopa pallens genome assemblyThe genome of C. pallens was predicted to be about 526.01–529.11 Mb indicated by the distribution frequency of 21‐mers (Table 4). The kmer analysis shows that the genome of C. pallens has high heterozygosity ranged from (1.48%–1.54%) (Table 4). The genome size was reduced from 555.40 to 539.70 Mb after the removal of heterozygous sequences (Table S1). These characteristics implied that the C. pallens genome harboured a high degree of complexity. The mitochondrial genome of C. pallens was assembled and removed by aligning with the genome. An assembled genome of 538.35 Mb slightly larger than predicted was finally obtained including 1639 contigs with contig N50 of 1.32 Mb long and the longest contig of 12.21 Mb. The average GC content was 27.08% (Table 5). The assembly was further improved with Hi‐C data, resulting in 552 scaffolds with a scaffold N50 of 89.78 Mb long and the longest scaffold of 141.00 Mb (Table 5). The results of each assembly step are shown in Table S1. Our genome assembly is highly complete with 98.10% BUSCO completeness, whereas only 0.30% and 1.60% BUSCO genes are fragmented and missing, respectively. This assembly includes seven pseudochromosomes which is consistent with the chromosome number identified by karyotype analyses (Figure 2a). In total, the length of pseudochromosomes was 517.21 Mb and contained 96.07% of the genome sequence. The pseudochromosomes ranged from 10.14 Mb to 141.00 Mb in length. The average sequencing depth of X chromosome (73.16x) was significantly lower than other chromosomes (deeper than 100x) and the length of Y chromosome was extremely short (10.14 Mb) (Table 6). The GC content of Y chromosome was the lowest (Figure 2b). The final genome sequences were submitted to GenBank under accession number JAEMTY000000000.
TABLE 4
Statistical results from the genome survey
Property
Minimum
Maximum
Homozygous (aa)
98.46%
98.52%
Heterozygous (ab)
1.48%
1.54%
Genome haploid length (bp)
526,010,433
529,108,099
Genome repeat length (bp)
87,197,893
87,711,400
Genome unique length (bp)
438,812,540
441,396,699
Model fit
79.74%
92.05%
Read error rate
0.59%
0.59%
TABLE 5
Genome assembly and annotation statistics of Chrysopa pallens
Statistic
Value
Genome assembly
Assembly size (Mb)
538.35
Number of scaffolds/contigs
552/1639
Longest scaffold/contig (Mb)
141.00/12.21
N50 scaffold/contig length (Mb)
89.78/1.32
GC (%)
27.08
Gaps (%)
0.02
BUSCO completeness (%)
98.10
Gene annotation
Protein‐coding genes
12,840
Mean protein length (aa)
540.79
Mean gene length (bp)
7555.93
Exons/introns per gene
6.03/4.75
Exon (%)
5.89
Mean exon length
409.18
Intron (%)
12.13
Mean intron length
1070.23
BUSCO completeness (%)
97.50
FIGURE 2
Heatmap of genome‐wide Hi‐C data and overview of the genomic landscape of Chrysopa pallens. (a) The heatmap shows all‐by‐all interactions among seven chromosomes of C. pallens. Resolution: 500 kb. There were strong intrachromosomal interactions (blocks on the diagonal line), while interchromosomal interactions were weaker. The frequencies of Hi‐C interaction links are represented by the colour, which ranges from white (low) to red (high). (b) Blocks on the outmost circle represent all seven chromosomes of C. pallens. Peak plots from outer to inner circles represent the length of each chromosome, the GC content of each chromosome, protein coding genes, the density of repeat sequences (SINE, short interspersed elements; LINE, long interspersed elements; LTR, long terminal repeat elements; simple repeats), respectively
TABLE 6
Chromosome length and average sequencing depth
Chromosome
Length (bp)
Average sequencing depth (x)
Chr1
140,998,244
105.33
Chr2
115,703,577
106.76
Chr3
89,781,665
107.80
Chr4
73,683,371
108.56
Chr5
61,463,935
107.43
ChrX
25,444,600
73.16
ChrY
10,136,392
113.63
Statistical results from the genome surveyGenome assembly and annotation statistics of Chrysopa pallensHeatmap of genome‐wide Hi‐C data and overview of the genomic landscape of Chrysopa pallens. (a) The heatmap shows all‐by‐all interactions among seven chromosomes of C. pallens. Resolution: 500 kb. There were strong intrachromosomal interactions (blocks on the diagonal line), while interchromosomal interactions were weaker. The frequencies of Hi‐C interaction links are represented by the colour, which ranges from white (low) to red (high). (b) Blocks on the outmost circle represent all seven chromosomes of C. pallens. Peak plots from outer to inner circles represent the length of each chromosome, the GC content of each chromosome, protein coding genes, the density of repeat sequences (SINE, short interspersed elements; LINE, long interspersed elements; LTR, long terminal repeat elements; simple repeats), respectivelyChromosome length and average sequencing depth
Gene annotation
A total of 206.21 Mb of repeat sequences were identified constituting 38.31% of the genome of C. pallens. The three most abundant classes of repeat sequences included unclassified (24.65%), DNA elements (3.82%) and simple repeats (2.13%), respectively (Table S2). The high proportion of unclassified elements may be due to the lack of studies on the repeat sequences of Neuropterida.In total, 12,840 protein‐coding genes were predicted in the genome of C. pallens (Table 5). The average lengths of gene, CDS, exon, and intron regions were 7555.93 bp, 1625.29 bp, 409.18 bp, and 1070.23 bp, respectively. The average exon number per gene was 6.03. The completeness of protein‐coding gene predictions was further evaluated by BUSCO (Simão et al., 2015) (insecta_odb10) to be 97.50%, including 93.90% single‐copy and 3.60% duplicated BUSCO genes (Table 5). 11,583 annotated genes (90.21%) matched one or more records in the UniProtKB database, while 10,879 genes (84.73%) encode proteins with at least one known domain in the InterPro database (Quevillon et al., 2005). Additional functional annotations identified 9503 GO terms, 7569 KO terms, 2662 enzyme codes, 8503 KEGG and 9580 reactome pathways, and 10,759 COG categories.There were 2361 noncoding RNA sequences annotated totally, including 1271 tRNAs, 607 ribosomal RNAs (rRNAs), 182 small nucleolar RNAs (snRNAs), 52 microRNAs (miRNAs), 15 small RNA (sRNAs) and others (Table S3). SnRNAs included 142 spliceosomal RNAs, four minor spliceosomal RNAs, one (small Cajal body‐specific RNA) scaRNA (SCARNA8) and others (Table S3). The numbers of rRNAs and tRNAs are significantly higher than that of other insects, which might indicate vigorous protein synthesis of C. pallens.
Orthology, synteny and phylogenetic relationships
Comparative genomic analyses were performed between C. pallens and 10 other insects representing major insect orders (i.e., Ephemeroptera, Isoptera, Hemiptera, Thysanoptera, Hymenoptera, Siphonaptera, Diptera, Trichoptera, Lepidoptera, Coleoptera and Neuroptera). In total, 138,491 genes were clustered into 13,857 gene families using OrthoFinder (Table S4). Among them, 967 were single‐copy and 3053 were multicopy in all 11 insect genomes (Figure 3a, Table S4). In the genome of C. pallens, 11,444 of the 12,840 genes were assigned to 8357 families that are also present in other insects, whereas the remaining 936 genes (223 gene families) were specific for this species (Table S5).
FIGURE 3
Phylogenetic tree, gene orthology, and synteny blocks. The phylogenetic tree was constructed based on 879 single‐copy gene families with 11 insects using maximum likelihood methods by IQ‐TREE (shown in Table 1). The length of branch indicated the divergence time. Bootstrap values (UFB/SH‐aLRT) were 100 at all nodes. Bars are subdivided to represent different types of orthologues with different colours. 1:1:1 (single copy orthologous genes in common gene families); N:N:N (mutiple copy orthologous genes in common gene families); Specific (genes from unique gene family from each species); Other (genes that do not belong to any above‐mentioned orthologue categories); Unassigned (orthologues which cannot be assigned into any orthogroups) (b) Synteny blocks between Chrysopa pallens and Tribolium cataneum
Phylogenetic tree, gene orthology, and synteny blocks. The phylogenetic tree was constructed based on 879 single‐copy gene families with 11 insects using maximum likelihood methods by IQ‐TREE (shown in Table 1). The length of branch indicated the divergence time. Bootstrap values (UFB/SH‐aLRT) were 100 at all nodes. Bars are subdivided to represent different types of orthologues with different colours. 1:1:1 (single copy orthologous genes in common gene families); N:N:N (mutiple copy orthologous genes in common gene families); Specific (genes from unique gene family from each species); Other (genes that do not belong to any above‐mentioned orthologue categories); Unassigned (orthologues which cannot be assigned into any orthogroups) (b) Synteny blocks between Chrysopa pallens and Tribolium cataneumAmong the 967 single‐copy genes, 879 genes (321,918 amino acid sites) passed the symmetry test in IQ‐Tree (Nguyen et al., 2015) and were used for phylogenetic reconstruction. All nodes in the estimated ML tree were fully supported (100/100 for UFB/SH‐aLRT). The phylogenetic tree supported Neuropterida being the sister group to Coleoptera which was consistent with former studies based on different molecular markers, such as 18S rRNA (Kjer, 2004), Vitellogenin sequences (Liu et al., 2015), transcriptome (Misof et al., 2014; Wang et al., 2019) and mitochondrial genomes (Song et al., 2016) (Figure 3a). Neuropterida and Coleoptera separated from each other about 300 million years ago (Figure 3a).Comparison of the chromosome‐level genome assemblies of C. pallens and T. cataneum revealed poor collinearity in general, which may be due to the relatively long evolutionary distance between the two species. Only X chromosome had an obvious large syntenic block (Figure 3b). There is one chromosome‐level genome of Ch. carnea (GCA_905475395.1) uploaded to NCBI during the preparation of this manuscript, which remain unpublished. Further analyses will be performed in the future.
Gene family expansion and contraction
The analysis of gene family evolution showed that 912 gene families were expanded and 3704 gene families were contracted in the genome of C. pallens compared to that of T. cataneum (Coleoptera) (Figure 4). Among them, 49 orthologous groups were expanded significantly (p < .05), including genes encoding C2H2‐type zinc finger proteins, trypsin, zinc finger Y‐chromosomal protein, acyltransferase, facilitated trehalose transporter, histone‐lysine N‐methyltransferase PRDM9‐like protein, longitudinals lacking protein, ionotropic receptor, cytochrome P450, leucine rich repeat and so on (Figure S1). On the other hand, only one orthologous group was contracted significantly (p < .05).
FIGURE 4
Gene family evolutions among genomes of Chrysopa pallens and other species. Red indicates gene family expansions, green indicates gene family contractions, and blue indicates gene family rapid evolutions. The length of branch indicated the divergence time
Gene family evolutions among genomes of Chrysopa pallens and other species. Red indicates gene family expansions, green indicates gene family contractions, and blue indicates gene family rapid evolutions. The length of branch indicated the divergence timeGO analyses showed that the expanded gene families were enriched for functions related to digestion, immunity and nerves (Figure 5a), which might explain why C. pallens could serve as an excellent biocontrol agent. For instance, the successful capture of prey is attributed to the ability of C. pallens to effectively perceive information in the environment, in which the olfactory system on antennae plays a key role (Li et al., 2019). Li et al. (2015) cloned five OBP genes and analysed the tissue expression profiles of CpalOBPs in five strains of C. pallens using quantitative RT‐PCR and demonstrated that CpalOBP10 and the OBP7 of aphids have close evolutionary relationships and can both mediate the perception of green leaf volatiles and (E)‐β‐farnesene later (Li et al., 2017). However, the underlying molecular mechanisms were not clarified. Further researches on olfactory related genes in C. pallens based on the high‐quality genome will help to clarify the olfactory recognition mechanism of insects at the genetic level and to control the behavior of natural enemies.
FIGURE 5
Enrichment analyses of expanded gene families of Chrysopa pallens. (a) GO enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value ranges. (b) KEGG enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value ranges
Enrichment analyses of expanded gene families of Chrysopa pallens. (a) GO enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value ranges. (b) KEGG enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value rangesThe reproduction of C. pallens was proven to be closely related to the weight of females and the activities of trypsin‐like as well as chymotrypsine‐like enzymes (Liu, Mao, & Zeng, et al., 2015). Han et al. (2017) analysed the transcriptome of female adults of C. pallens under starvation as well as feeding conditions and found that different expressed genes (DEGs) were mainly involved in ribosome biogenesis, protein processing in endoplasmic reticulum, biosynthesis of amino acids, and carbon metabolism. They also annotated four vitellogenins, three insulin‐like peptides and two insulin receptors genes. However, these earlier studies of C. pallens genes mainly relied on the transcriptome data which carry incomplete information. The chromosome‐level genome of C. pallens will provide more information, which will contribute to the further study of the molecular mechanisms of its vitellogenin protein, nutrient metabolism and reproduction.Interestingly, the gene families involved in cold acclimation and wax synthesis were also expanded in the genome of C. pallens, which may be related to its overwintering, spinning and cocooning as well as the formation of egg stalks. KEGG pathways enriched in expanded gene families mainly included insect pheromone synthesis, detoxification metabolism, digestion and other related pathways as well as wax synthesis (cutin, suberine and wax biosynthesis) (Figure 5b). In the long evolutionary process of adaptation, insects have gradually developed adaptations to low temperatures. Many insects are able to build up their tolerance before cold winter temperatures set in to improve the survival of their populations (Bale, 1987; Danks, 2006; Lee, 1989). The cold hardiness of the prepupae of C. pallens show obvious seasonal characteristics. The prewinter and midwinter prepupae had significantly higher cold tolerance than those in summer and post‐wintering (Yu, Shi, et al., 2012). C. pallens overwintered by adopting a freeze avoidance strategy, and the diapausing prepupae increased their cold tolerance by reducing the supercooling point. As the short‐day diapause type, the prepupae of C. pallens, such as Ostrinia furnacalis (Gong et al., 1984) and Chrysoperla sinica (Guo et al., 2006), can improve the cold resistance through their own part‐time diapause. The gene families responding to cold acclimation were found to be expanded in C. pallens in this study (Figure 5a). However, the mechanism of how C. pallens goes through the winter still needs more specialized researches in the future.Chrysopa pallens is a complete metamorphosis insect as its developmental stages consist of egg, larva, prepupa, pupa and adult (Yang et al., 2005). Interestingly, each egg of Chrysopidae is attached to the top of a filamentous petiole while the eggs of other neuropteridan insects are not. The mature larvae often cocoon on the wall of the back of the leaf. There are filaments attached to the attachment outside the cocoon. The expansion of gene families involved in wax synthesis in the genome of C. pallens may be related to these biological habits mentioned above.To further understand the evolution of expanded gene families, the 49 rapidly expanded gene families were analysed for their selective pressures. Three families related to digestion were found to be positively selected, namely trypsin (OG0001146), chymotrypsin (OG0001336) and zinc finger MYM‐type protein 1 (OG0000501), which may be one of the important driving forces for the wide feeding behaviours of C. pallens.
CONCLUSIONS
In this study, a chromosome‐level genome assembly was reported for C. pallens, an important predatory biological control agent. The key functional genes expanded in C. pallens were identified, including those involved in digestion, detoxification and chemoreception, implying why green lacewings can serve as an excellent biocontrol agent. The chromosome‐level genome of C. pallens also provides insights into the genomics of Neuropterida and would serve as an invaluable reference for further comparative genomic studies of Neuropterida and even Insecta. In addition, it will support more accurate and reliable studies about the habits of C. pallens at the genomic level which may help to improve its control efficiency. The high quality genome will also provide a solid foundation to promote the development and utilization of natural enemies in biological pest control in the future.
CONFLICT OF INTEREST
The authors have declared that no competing interest exits.
AUTHOR CONTRIBUTIONS
Yuyu Wang, Xingyue Liu and Ding Yang designed the research. Ruyue Zhang, Mengqing Wang, Jing Li and Lisheng Zhang performed research. Yuyu Wang, Cheng‐Min Shi, Mengqing Wang and Ruyue Zhang analysed data., Ruyue Zhang, Fan Fan, Jing Li, Shuo Geng, Xingyue Liu and Ding Yang wrote the manuscript. All authors reviewed the manuscript.Supplementary MaterialClick here for additional data file.
Authors: Aron Marchler-Bauer; Yu Bo; Lianyi Han; Jane He; Christopher J Lanczycki; Shennan Lu; Farideh Chitsaz; Myra K Derbyshire; Renata C Geer; Noreen R Gonzales; Marc Gwadz; David I Hurwitz; Fu Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Zhouxi Wang; Roxanne A Yamashita; Dachuan Zhang; Chanjuan Zheng; Lewis Y Geer; Stephen H Bryant Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971
Authors: Evgenia V Kriventseva; Dmitry Kuznetsov; Fredrik Tegenfeldt; Mosè Manni; Renata Dias; Felipe A Simão; Evgeny M Zdobnov Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
Authors: Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971