Literature DB >> 34549894

The first chromosome-level genome assembly of a green lacewing Chrysopa pallens and its implication for biological control.

Yuyu Wang¹, Ruyue Zhang¹, Mengqing Wang², Lisheng Zhang², Cheng-Min Shi¹, Jing Li¹, Fan Fan¹, Shuo Geng¹, Xingyue Liu³, Ding Yang³.

Abstract

Many lacewing species (Insecta: Neuroptera) are important predators of pests with great potential in biological control. So far, there is no chromosome-level published genome available for Neuroptera. Here we report a high-quality chromosome-level reference genome for a green lacewing species Chrysopa pallens (Neuroptera: Chrysopidae), which is one of the most important insect natural enemies used in pest biocontrol. The genome was sequenced using a combination of PacBio and Hi-C technologies and assembled into seven chromosomes with a total size of 517.21 Mb, occupying 96.07% of the genome sequence. A total of 12,840 protein-coding genes were identified and approximately 206.21 Mb of repeated sequences were annotated. Phylogenetic analyses indicated that C. pallens diverged from its common ancestor with Tribolium castaneum (Coleoptera) approximately 300 million years ago. The gene families involved in digestion, detoxification, chemoreception, carbohydrate metabolism, immunity, nerves and development were significantly expanded, revealing the potential genomic basis for the polyphagia of C. pallens and its role as an excellent biocontrol agent. This high-quality genome of C. pallens will provide an important genomic resource for future population genetics, evolutionary and phylogenetic investigations of Chrysopidae as well as comparative genomic studies of Neuropterida and other insects.

Entities: Chemical

Keywords: zzm321990Chrysopa pallenszzm321990; Hi-C; PacBio sequencing; chromosome-level genome

Mesh：

Year: 2021 PMID： 34549894 PMCID： PMC9292380 DOI： 10.1111/1755-0998.13503

Source DB: PubMed Journal: Mol Ecol Resour ISSN： 1755-098X Impact factor: 8.678

INTRODUCTION

With the development of new sequencing technologies, the genomic information of more and more species has been obtained and analysed. Deep sequencing data can provide extensive information about genomes and gene expression profiles. Since the completion of the Human Genome Project, obtaining high‐quality reference genome sequence maps has become the basis for functional gene studies of various species (Venter et al., 2001). Especially, high‐quality genomes of many insects have helped us to understand the molecular mechanisms of their ecological and evolutionary characteristics. For example, the analyses of the digestion and detoxification genes based on the genome of Hypothenemus hampei revealed the mechanism of food seeking and pathogenic bacteria resistance, which can help develop insecticides, biological control agents and traps (Vega et al., 2015). The relationships between the expansion of the gene families of Spodoptera litura (gustatory receptor, cytochrome P450, carboxy‐lesterase as well as glutathione‐S‐transferase) and the resistance to various insecticides were analysed based on the high‐quality genome data, providing new insights into the evolutionary mechanism, host plant selection and ecological adaptability of Lepidoptera (Cheng et al., 2017). The high‐quality draft genome of Propylea japonica provides invaluable resource for understanding the molecular mechanisms of stress resistance in Coleoptera (Zhang et al., 2020). Two gene families associated with environmental adaptation (odorant receptor and cytochrome P450) were analysed, and a putative biosynthesis pathway of the defence alkaloid harmonine was successfully constructed based on the chromosome‐level genome assembly of Harmonia axyridis (Chen et al., 2021). The fauna of extant Neuropterida comprises nearly 6600 species of 20 families from three orders (Neuroptera, Megaloptera and Raphidioptera) (Yang et al., 2018), which is the key group connecting Holometabola and Hemimetabola. Many species of this group are important predators of pests with great potential in biological pest management such as green lacewings, brown lacewings, dusty lacewings and so on. Chrysopa pallens (Rambur, 1838) (Neuropterida: Neuroptera: Chrysopidae) is one of the most important predatory species widely used in agricultural and forestry ecosystems (Figure 1). It is the dominant species widely distributed in China except for the Tibet Plateau (Yang et al., 2018) and also distributed in Japan, Korea and Europe (Yang et al., 2005). Both larvae and adults of this species are predacious, feeding on aphids, coccids, thrips, planthoppers, whiteflies and some lepidopteran insects, as well as mites in agroforestry ecosystems (Bai et al., 2005; Winterton & de Freitas, 2006). In addition, laboratory experiments showed that C. pallens demonstrated an effective control capacity on the eggs and larvae of S. frugiperda (Xu et al., 2019). The high pest controlling potentials of C. pallens were underlain by its wide diet, high predation, strong adaptability and high egg production (about 2000 eggs per female) (Miller et al., 2004; Tauber et al., 2000). However, the molecular mechanisms of why C. pallens serve as such an excellent biocontrol agent remain unclear.

FIGURE 1

Ecological photos of Chrysopa pallens. (a) Egg, (b) larva, (c) pupa, (d) adult

Ecological photos of Chrysopa pallens. (a) Egg, (b) larva, (c) pupa, (d) adult Scholars from all over the world had carried out researches on the olfactory tropism, artificial feeding (Ali et al., 2018), mechanisms of resistance to chemical pesticides (Liu & Zeng, 2014) and pest‐control abilities of C. pallens (Xu et al., 2019), laying foundations for further exploring their potential application in biological control. C. pallens had evolved a sophisticated olfactory system to accurately sense a large amount of chemical information in order to seek prey, locate host plants, find mates and spot dangers (Li et al., 2019). Previous studies have shown that using chemicals to guide C. pallens predation on pests could effectively improve their abilities to search for prey (Cui et al., 2015; Koczor et al., 2010). It is necessary to study the potential mechanism of their olfactory system for effective biological control (Hesler, 2016). A normalized transcriptome of C. pallens was sequenced by Li et al. (2013), and a large number of candidate chemosensory genes were identified. A full‐length cDNA library from the antenna of C. pallens was constructed by Wang et al. (2014), and the functions of OBPs (CpalOBPs), the clone of olfaction‐related genes of antennae as well as the mechanism of information recognition of C. pallens were studied. All the results mentioned above provided a basis for exploring the mechanisms of chemoreception in C. pallens. However, the olfactory mechanism of C. pallens at the genetic level is still not clarified. The absence of genomic information about C. pallens has limited further study into the molecular mechanisms of its ecological and evolutionary characteristics. To date, there are more than 600 insect genomes sequenced and available at GenBank (accession date, 15 April 2021). However, no chromosome‐level genome has been published for any species of Neuroptera so far except one of Chrysoperla carnea (GCA_905475395.1) uploaded to NCBI during the preparation of this manuscript, which remain unpublished. Therefore, we generated a high‐quality genome assembly of a green lacewing species C. pallens, which would provide invaluable information for further study of C. pallens and promote large‐scale breeding and commercial utilization of C. pallens. This study will also facilitate the development of modern pest control strategies. Besides, it will also serve as a reference for further comparative genomic studies of Neuropterida and even Insecta.

MATERIALS AND METHODS

Sampling and genome sequencing

The specimens used in the present study were reared at Langfang station of the Chinese Academy of Agricultural Sciences and inbred under laboratory conditions. All samples were selected from newly‐emerged unfed adults to avoid contamination. There were 50 female and 10 male adults used for genome sequencing. The whole bodies of samples were used for the construction of each library. High‐quality genomic DNA was extracted with the Qiagen DNeasy Blood & Tissue kit. DNA was quantified by 0.75% agarose gel electrophoresis, Nanodrop spectrophotometry (Thermo Fisher) and Qubit 3.0 fluorometry (Invitrogen). RNA was extracted with the TRIzol Reagent kit. A Single Molecule Real‐Time DNA library was prepared for sequencing using SMRTbell Template Prep Kit 2.0 with an insert size of 40 kb. The genome of C. pallens was sequenced with PacBio Sequel II System using 25 male adults. Illumina paired‐end library (with 350 bp insert size) was prepared with Truseq DNA PCR‐free kit using five male adults. Illumina paired‐end reads were used for error checking and as references to Hi‐C. The RNA library was constructed with Illumina TruSeq RNA v2 Kit according to the manufacturer's instructions using 10 female and 10 male adults. Hi‐C library was constructed using 10 male adults. The construction of a Hi‐C library included cross‐linking of formaldehyde, restriction enzyme digestion, ends repair of fragments, DNA cyclization, DNA purification and other steps with MboI as the restriction enzyme.

Genome size estimation and assembly

Quality control of the second‐generation Illumina data was carried out by bbtools v38.67 (qtrim=rl trimq =20 minlen =15 ecco =t maxns =5 trimpolya =10 trimpolyg =10 trimpolyc =10) (Khan et al., 2012). Genome survey was conducted based on K‐mer distribution analyses using genomescope v2.0 (Ranallo‐Benavidez et al., 2020) to predict the genome size, heterozygosity and proportion of repeat content to select appropriate assembly tools and adjust corresponding parameter (‐k 21 ‐p 2 ‐m 10,000) (Kirkness et al., 2003). The k‐mer frequency was evaluated using bbtools v38.67 (Khan et al., 2012) and the length was set to 21 k‐mer. For genomic contig assembly, the long PacBio reads were corrected by nextdenovo v2.3.0 (seed_cutoff =20,000) (https://github.com/Nextomics/NextDenovo). The corrected PacBio sequences were assembled using raven v1.2.2 (‐‐weaken ‐p 0) (Vaser & Šikić, 2021). After the preliminary assembly of the genome, nextpolish v1.3.0 (Hu et al., 2019) was used to perform one round of long‐read polishing and two rounds of short‐read polishing to get the corrected genome sequence, which further improves the assembly accuracy. In this process, minimap2 v2.17 (Li, 2018) with default parameters were used to align the second‐generation/third‐generation data to the third‐generation assembled genome. samtools v1.9 (samtools view) (Li et al., 2009) was used for format conversion (SAM to BAM). 3D‐DNA v180922 (Dudchenko et al., 2017; Nene et al., 2007) process was used to conduct chromosome anchoring based on the Hi‐C sequences. The adapter sequences of raw reads were trimmed and low‐quality PE reads were removed using juicer v1.6.2 (Durand et al., 2016) to obtain clean data. Then, the clean reads were mapped to the draft genome using 3D‐DNA v180922 (Dudchenko et al., 2017; Nene et al., 2007). Contigs in the same chromosome were separated by 100 Ns. A heat map was constructed based on the interaction signals revealed by the effective mapping read pairs. The sequencing depth of each locus and the average depth of chromosomes were calculated using samtools (Li et al., 2009). Illumina paired‐end reads of male adults were sequenced separately and then were aligned to the C. pallens assembled genome to identify the sex chromosomes (X and Y) in C. pallens. The average sequencing depth and length were based to identify chromosome X and Y. Potential contamination sequences were searched and removed from the nucleotide sequence database (NT) (Stoesser et al., 2001) and UniVec (https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/) using blast+ (i.e., Blastn) v2.9.1 (Camacho et al., 2009) and uploaded to GenBank for reconfirmation. The completeness of the final draft genome was assessed using busco v3.0.1 (Simão et al., 2015) with insecta_odb10 reference, predicting the integrity of the genome with the percentage of single‐copy orthologues founded in the insect orthologues library.

Genome annotation

The repeat sequences were identified by both homology‐based and de novo prediction methods. Firstly, the LTR search (‐LTRStruct) was processed under repeatmodeler v2.0.1 (https://github.com/Dfam‐consortium/RepeatModeler) and the de novo repeat library was built based on the specific structure of repeat sequence and the principle of de novo repeat library. A custom library was then formed by combining Dfam_3.3 (http://www.dfam.org/) (Storer et al., 2021) and RepBase‐20181026 databases (https://www.girinst.org/repbase/) under the default parameters (Jurka et al., 2005). Finally, the repeat sequences were searched by repeatmasker v4.0.9 with the default commands (http://repeatmasker.org/cgi‐bin/WEBRepeatMasker) according to the custom library. The protein coding genes were annotated by integrating the evidences of ab initio, transcriptome‐based prediction and homology‐based annotations. Firstly, the protein coding gene structures were predicted using maker v3.01.03 with the default commands (Cantarel et al., 2008). Different pieces of evidences were weighted using EVidenceModeler (EVM) (Haas et al., 2008). augustus v3.3.4 (http://augustus.gobics.de/) (Stanke et al., 2006) and GeneMark‐ES/ET/EP 4.59_lic (Lomsadze et al., 2014) were automatically trained by BRAKER (Tomas et al., 2021) to improve the prediction accuracy combined with transcriptome and protein homology. Then, the transcriptome information was used to align to the genome by hisat2 v2.2.0 (Kim et al., 2015) to generate BAM files. The arthropod protein sequences were extracted from the orthodb10 v1 data (Kriventseva et al., 2018). Transcriptome assembly with reference genome was performed using stringtie v2.1.4 (Kovaka et al., 2019). For the homology‐based approach, proteins of related species were downloaded from GenBank, such as Drosophila melanogaster (Diptera), Tribolium castaneum (Coleoptera), Apis mellifera (Hymenoptera), Bombyx mori (Lepidoptera) and Rhopalosiphum maidis (Hemiptera) (Table 1). Finally, the preparation files obtained from the above three approaches were imported into maker v3.01.03 (Cantarel et al., 2008) for integrated annotations. EVM weights for the three types of evidences were set as: transcriptome 8, protein homologue 2, and ab initio 1.

TABLE 1

Information of species used in this study

Order	Species	NCBI accession	References
Ephemeropterp	Cloeon dipterum	GCA_902829235.1	Almudi et al. (2020)
Ispptera	Coptotermes formosanus	GCA_013340265.1	Itakura et al. (2020)
Hemiptera	Rhopalosiphum maidis	GCA_003676215.3	Chen et al. (2019)
Thysanoptera	Thrips palmi	GCA_012932325.1	Guo et al. (2020)
Coleptera	Tribolium castaneum	GCA_000002335.3	Richards et al. (2008)
Diptera	Drosophila melanogaster	GCA_000001215.4	Adams et al. (2000)
Hymenoptera	Apis mellifera	GCA_003254395.2	Solignac et al. (2007)
Trichoptera	Stenopsyche tienmushanensis	GCA_008973525.1	Luo et al. (2018)
Lepidoptera	Bombyx mori	GCA_014905235.2	International Silkworm Genome Consortium (2008)
Siphonaptera	Ctenocephalides felis	GCA_003426905.1	Driscoll et al. (2020)
Neuroptera	Chrysopa pallens	this study	this study

Information of species used in this study Two strategies were used for the annotation of gene functions. In the first strategy, the gene functions were predicted by aligning with existing databases, that is, uniprotkb (Swissprot +Trembl) (https://www.uniprot.org/) (Morgat et al., 2019) and the nonredundant protein sequence database (NR) using the sensitive mode of diamond v0.9.24 (‐‐more ‐sensitive ‐e 1e‐5) (Buchfink et al., 2015). In the second strategy, the gene functions were predicted by aligning with a comprehensive database, that is, gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome. Five databases including protein families (Pfam) (El‐Gebali et al., 2019), Smart (http://smart.embl‐heidelberg.de/) (Letunic et al., 2021), gene3d v21.0 (http://gene3d.biochem.ucl.ac.uk/) (Lewis et al., 2018), Superfamily (Wilson et al., 2009) and CDD (Marchler‐Bauer et al., 2017) were searched by interproscan 5.41–78.0 (Quevillon et al., 2005). At the same time, eggnog v5.0 (http://eggnog.embl.de) database was searched with eggnog‐mapper v2.0.1 (Huerta‐Cepas et al., 2017). The results of these two strategies were combined to get the final prediction of gene functions. Non‐coding RNAs including transfer RNAs (tRNAs), microRNAs (miRNAs), ribosome RNAs (rRNAs) and small nuclear RNAs (snRNAs) were also identified. The rRNAs, snRNAs and miRNAs were detected from the rfam database (release 13.0) (Kalvari et al., 2018) using infernal v1.1.3 (Nawrocki & Eddy, 2013). The tRNAs were predicted using trnascan‐se v2.0 with “EukHighConfidenceFilter” (Lowe & Eddy, 1997). The rRNAs and subunits were predicted using rnammer v1.2 (Lagesen et al., 2007).

Orthology, synteny and phylogenetic reconstruction

Orthologues from 11 insect species were clustered using the orthofinder v2.5.2 (Emms & Kelly, 2015) pipelines with default parameters (Table 1). According to the results of gene family clustering obtained from orthofinder v2.5.2 (Emms & Kelly, 2015), single‐copy genes were used for phylogenetic reconstruction. Firstly, mafft v7.394 (Katoh & Standley, 2013) was used to align the homologous region of the sequences with the L‐INS‐I strategy. The unreliable homologous alignments were filtered using bmge v1.12 (‐m BLOSUM90 ‐h 0.4) (Criscuolo & Gribaldo, 2010). Alignment sequences were then combined into a supermatrix using fasconcat‐g v1.04 (Kück & Longo, 2014). Phylogenetic reconstruction was performed with IQ‐TREE v2.0‐rc1 (‐m MFP ‐‐mset LG ‐‐msub nuclear ‐‐rclusterf 10) (Nguyen et al., 2015) based on the supermatrix and support values for nodes were assessed by Ultrafast Boostrap (Minh et al., 2013) and sh‐alrt algorithm (‐B 1000 ‐‐alrt 1000) (Guindon et al., 2010). Calibration points applied to estimate the divergence time of C. pallens were set based on the fossil records in the pbdb database (https://paleobiodb.org/navigator/) as well as from Misof et al. (2014) and Nel et al. (2013) (Table 2).

TABLE 2

Information of calibration times used in this study

Placement	Minimum age (Ma)	Maximum age (Ma)
Diptera	242	252
Pterygota	–	443
Paraneoptera	315	323
Holometabola	315	383
Coleoptera +Neuroptera	307	323
Trichoptera +Lepidoptera+ Diptera +Siphonaptera	311	323

Information of calibration times used in this study Tribolium castaneum was used as the reference to obtain relevant information to study the variation of interspecific chromosomes due to the lack of published high quality genomes of Neuropterida. There are only one chromosome‐level genome of Chrysoperla carnea (GCA_905475395.1) uploaded to ncbi during the preparation of this manuscript, which remain unpublished and another scaffold‐level genome of Neoneuromus ignobilis (GCA_014529405.1). The protein sequences of C. pallens and T. castaneum were compared with “blastp‐like” using mmseq2 v11‐E1A1C with default parameters (‐s 7.5 ‐‐alignment‐mode 3 ‐‐num‐iterations 4 ‐e 1e‐5 ‐‐max‐accept 5) (Steinegger & Söding, 2017). In addition, the all.blast result file and the integrated annotation file (all.gff file) were used as input files to obtain the results of collinearity analyses by MCScanX (Wang et al., 2012). Finally, the diagram was drawn by tbtools v1.0692 (Chen et al., 2020).

Gene family expansion and contraction analyses

Expansions and contractions of orthologous gene families were determined using computational analysis of family evolution (cafÉ) v4.2.1 (Han et al., 2013) with default parameters (p = .01) according to the gene families and phylogenetic tree. The random genetic birth and death rate was simulated to predict the evolution of gene families in different species at various branches of evolution to assess the expansion and contraction of gene families at each node on the phylogenetic tree. The significantly expanded gene families were classified through r package clusterprofiler v3.14.3 (Yu, Wang, et al., 2012) according to the gene annotation database. Then, the redundancy was removed to get final results of the gene enrichment. The selection of rapidly expanded families were analysed using paml 4 (Yang, 2007) based on the CDS with site models to further understand the evolution of the expansion of gene families. The phylogenetic tree used was reconstructed with iq‐tree v2.0‐rc1 (Nguyen et al., 2015) with ambiguity data filtered before the formal calculation. The specific model used were M0 (one rate), M1a (neutral) – M2a (selection), M7 (beta) – M8 (beta & w) (i.e., “NSsites=0 1 2 7 8”). The results of the M1a–M2a and M7–M8 pairs of models were compared using the likelihood ratio test (LRT) (p = .05). The gene family was identified as being positively selected only if M2a and M8 confirmed that the family had positive selection sites at the same time.

RESULTS AND DISCUSSION

Genome sequencing and assembly

About 78.59 Gb clean reads were generated by Illumina paired‐ended short‐read sequencing and 113.72 Gb filtered subreads were obtained by PacBio long‐read sequencing (Table 3). The average length and N50 of PacBio long reads were 32.86 kb and 18.31 kb, respectively. Hi‐C fragment libraries sequencing resulted in about 75.48 Gb (Table 3).

TABLE 3

Sequencing data used for the Chrysopa pallens genome assembly

The type of libraries	Number of reads	Raw data (bp)	Read length (bp)	N50 contig length (bp)
Survey	523,921,116	78,588,167,400	150	–
RNA	112,484,672	16,872,700,800	150	–
PacBio	6,210,590	113,715,446,179	18,309.93	32,855
Hi‐C	251,600,926	75,480,277,800	150	–

Sequencing data used for the Chrysopa pallens genome assembly The genome of C. pallens was predicted to be about 526.01–529.11 Mb indicated by the distribution frequency of 21‐mers (Table 4). The kmer analysis shows that the genome of C. pallens has high heterozygosity ranged from (1.48%–1.54%) (Table 4). The genome size was reduced from 555.40 to 539.70 Mb after the removal of heterozygous sequences (Table S1). These characteristics implied that the C. pallens genome harboured a high degree of complexity. The mitochondrial genome of C. pallens was assembled and removed by aligning with the genome. An assembled genome of 538.35 Mb slightly larger than predicted was finally obtained including 1639 contigs with contig N50 of 1.32 Mb long and the longest contig of 12.21 Mb. The average GC content was 27.08% (Table 5). The assembly was further improved with Hi‐C data, resulting in 552 scaffolds with a scaffold N50 of 89.78 Mb long and the longest scaffold of 141.00 Mb (Table 5). The results of each assembly step are shown in Table S1. Our genome assembly is highly complete with 98.10% BUSCO completeness, whereas only 0.30% and 1.60% BUSCO genes are fragmented and missing, respectively. This assembly includes seven pseudochromosomes which is consistent with the chromosome number identified by karyotype analyses (Figure 2a). In total, the length of pseudochromosomes was 517.21 Mb and contained 96.07% of the genome sequence. The pseudochromosomes ranged from 10.14 Mb to 141.00 Mb in length. The average sequencing depth of X chromosome (73.16x) was significantly lower than other chromosomes (deeper than 100x) and the length of Y chromosome was extremely short (10.14 Mb) (Table 6). The GC content of Y chromosome was the lowest (Figure 2b). The final genome sequences were submitted to GenBank under accession number JAEMTY000000000.

TABLE 4

Statistical results from the genome survey

Property	Minimum	Maximum
Homozygous (aa)	98.46%	98.52%
Heterozygous (ab)	1.48%	1.54%
Genome haploid length (bp)	526,010,433	529,108,099
Genome repeat length (bp)	87,197,893	87,711,400
Genome unique length (bp)	438,812,540	441,396,699
Model fit	79.74%	92.05%
Read error rate	0.59%	0.59%

TABLE 5

Genome assembly and annotation statistics of Chrysopa pallens

Statistic	Value
Genome assembly
Assembly size (Mb)	538.35
Number of scaffolds/contigs	552/1639
Longest scaffold/contig (Mb)	141.00/12.21
N50 scaffold/contig length (Mb)	89.78/1.32
GC (%)	27.08
Gaps (%)	0.02
BUSCO completeness (%)	98.10
Gene annotation
Protein‐coding genes	12,840
Mean protein length (aa)	540.79
Mean gene length (bp)	7555.93
Exons/introns per gene	6.03/4.75
Exon (%)	5.89
Mean exon length	409.18
Intron (%)	12.13
Mean intron length	1070.23
BUSCO completeness (%)	97.50

FIGURE 2

Heatmap of genome‐wide Hi‐C data and overview of the genomic landscape of Chrysopa pallens. (a) The heatmap shows all‐by‐all interactions among seven chromosomes of C. pallens. Resolution: 500 kb. There were strong intrachromosomal interactions (blocks on the diagonal line), while interchromosomal interactions were weaker. The frequencies of Hi‐C interaction links are represented by the colour, which ranges from white (low) to red (high). (b) Blocks on the outmost circle represent all seven chromosomes of C. pallens. Peak plots from outer to inner circles represent the length of each chromosome, the GC content of each chromosome, protein coding genes, the density of repeat sequences (SINE, short interspersed elements; LINE, long interspersed elements; LTR, long terminal repeat elements; simple repeats), respectively

TABLE 6

Chromosome length and average sequencing depth

Chromosome	Length (bp)	Average sequencing depth (x)
Chr1	140,998,244	105.33
Chr2	115,703,577	106.76
Chr3	89,781,665	107.80
Chr4	73,683,371	108.56
Chr5	61,463,935	107.43
ChrX	25,444,600	73.16
ChrY	10,136,392	113.63

Statistical results from the genome survey Genome assembly and annotation statistics of Chrysopa pallens Heatmap of genome‐wide Hi‐C data and overview of the genomic landscape of Chrysopa pallens. (a) The heatmap shows all‐by‐all interactions among seven chromosomes of C. pallens. Resolution: 500 kb. There were strong intrachromosomal interactions (blocks on the diagonal line), while interchromosomal interactions were weaker. The frequencies of Hi‐C interaction links are represented by the colour, which ranges from white (low) to red (high). (b) Blocks on the outmost circle represent all seven chromosomes of C. pallens. Peak plots from outer to inner circles represent the length of each chromosome, the GC content of each chromosome, protein coding genes, the density of repeat sequences (SINE, short interspersed elements; LINE, long interspersed elements; LTR, long terminal repeat elements; simple repeats), respectively Chromosome length and average sequencing depth

Gene annotation

A total of 206.21 Mb of repeat sequences were identified constituting 38.31% of the genome of C. pallens. The three most abundant classes of repeat sequences included unclassified (24.65%), DNA elements (3.82%) and simple repeats (2.13%), respectively (Table S2). The high proportion of unclassified elements may be due to the lack of studies on the repeat sequences of Neuropterida. In total, 12,840 protein‐coding genes were predicted in the genome of C. pallens (Table 5). The average lengths of gene, CDS, exon, and intron regions were 7555.93 bp, 1625.29 bp, 409.18 bp, and 1070.23 bp, respectively. The average exon number per gene was 6.03. The completeness of protein‐coding gene predictions was further evaluated by BUSCO (Simão et al., 2015) (insecta_odb10) to be 97.50%, including 93.90% single‐copy and 3.60% duplicated BUSCO genes (Table 5). 11,583 annotated genes (90.21%) matched one or more records in the UniProtKB database, while 10,879 genes (84.73%) encode proteins with at least one known domain in the InterPro database (Quevillon et al., 2005). Additional functional annotations identified 9503 GO terms, 7569 KO terms, 2662 enzyme codes, 8503 KEGG and 9580 reactome pathways, and 10,759 COG categories. There were 2361 noncoding RNA sequences annotated totally, including 1271 tRNAs, 607 ribosomal RNAs (rRNAs), 182 small nucleolar RNAs (snRNAs), 52 microRNAs (miRNAs), 15 small RNA (sRNAs) and others (Table S3). SnRNAs included 142 spliceosomal RNAs, four minor spliceosomal RNAs, one (small Cajal body‐specific RNA) scaRNA (SCARNA8) and others (Table S3). The numbers of rRNAs and tRNAs are significantly higher than that of other insects, which might indicate vigorous protein synthesis of C. pallens.

Orthology, synteny and phylogenetic relationships

Comparative genomic analyses were performed between C. pallens and 10 other insects representing major insect orders (i.e., Ephemeroptera, Isoptera, Hemiptera, Thysanoptera, Hymenoptera, Siphonaptera, Diptera, Trichoptera, Lepidoptera, Coleoptera and Neuroptera). In total, 138,491 genes were clustered into 13,857 gene families using OrthoFinder (Table S4). Among them, 967 were single‐copy and 3053 were multicopy in all 11 insect genomes (Figure 3a, Table S4). In the genome of C. pallens, 11,444 of the 12,840 genes were assigned to 8357 families that are also present in other insects, whereas the remaining 936 genes (223 gene families) were specific for this species (Table S5).

FIGURE 3

Phylogenetic tree, gene orthology, and synteny blocks. The phylogenetic tree was constructed based on 879 single‐copy gene families with 11 insects using maximum likelihood methods by IQ‐TREE (shown in Table 1). The length of branch indicated the divergence time. Bootstrap values (UFB/SH‐aLRT) were 100 at all nodes. Bars are subdivided to represent different types of orthologues with different colours. 1:1:1 (single copy orthologous genes in common gene families); N:N:N (mutiple copy orthologous genes in common gene families); Specific (genes from unique gene family from each species); Other (genes that do not belong to any above‐mentioned orthologue categories); Unassigned (orthologues which cannot be assigned into any orthogroups) (b) Synteny blocks between Chrysopa pallens and Tribolium cataneum Among the 967 single‐copy genes, 879 genes (321,918 amino acid sites) passed the symmetry test in IQ‐Tree (Nguyen et al., 2015) and were used for phylogenetic reconstruction. All nodes in the estimated ML tree were fully supported (100/100 for UFB/SH‐aLRT). The phylogenetic tree supported Neuropterida being the sister group to Coleoptera which was consistent with former studies based on different molecular markers, such as 18S rRNA (Kjer, 2004), Vitellogenin sequences (Liu et al., 2015), transcriptome (Misof et al., 2014; Wang et al., 2019) and mitochondrial genomes (Song et al., 2016) (Figure 3a). Neuropterida and Coleoptera separated from each other about 300 million years ago (Figure 3a). Comparison of the chromosome‐level genome assemblies of C. pallens and T. cataneum revealed poor collinearity in general, which may be due to the relatively long evolutionary distance between the two species. Only X chromosome had an obvious large syntenic block (Figure 3b). There is one chromosome‐level genome of Ch. carnea (GCA_905475395.1) uploaded to NCBI during the preparation of this manuscript, which remain unpublished. Further analyses will be performed in the future.

Gene family expansion and contraction

The analysis of gene family evolution showed that 912 gene families were expanded and 3704 gene families were contracted in the genome of C. pallens compared to that of T. cataneum (Coleoptera) (Figure 4). Among them, 49 orthologous groups were expanded significantly (p < .05), including genes encoding C2H2‐type zinc finger proteins, trypsin, zinc finger Y‐chromosomal protein, acyltransferase, facilitated trehalose transporter, histone‐lysine N‐methyltransferase PRDM9‐like protein, longitudinals lacking protein, ionotropic receptor, cytochrome P450, leucine rich repeat and so on (Figure S1). On the other hand, only one orthologous group was contracted significantly (p < .05).

FIGURE 4

Gene family evolutions among genomes of Chrysopa pallens and other species. Red indicates gene family expansions, green indicates gene family contractions, and blue indicates gene family rapid evolutions. The length of branch indicated the divergence time GO analyses showed that the expanded gene families were enriched for functions related to digestion, immunity and nerves (Figure 5a), which might explain why C. pallens could serve as an excellent biocontrol agent. For instance, the successful capture of prey is attributed to the ability of C. pallens to effectively perceive information in the environment, in which the olfactory system on antennae plays a key role (Li et al., 2019). Li et al. (2015) cloned five OBP genes and analysed the tissue expression profiles of CpalOBPs in five strains of C. pallens using quantitative RT‐PCR and demonstrated that CpalOBP10 and the OBP7 of aphids have close evolutionary relationships and can both mediate the perception of green leaf volatiles and (E)‐β‐farnesene later (Li et al., 2017). However, the underlying molecular mechanisms were not clarified. Further researches on olfactory related genes in C. pallens based on the high‐quality genome will help to clarify the olfactory recognition mechanism of insects at the genetic level and to control the behavior of natural enemies.

FIGURE 5

Enrichment analyses of expanded gene families of Chrysopa pallens. (a) GO enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value ranges. (b) KEGG enrichment map of gene families expanded rapidly. The vertical axis represents the path name and the horizontal axis represents gene ratio. The size of the dot indicates the number of differentially expressed genes in the pathway and the colour of the point corresponds to different q‐value ranges The reproduction of C. pallens was proven to be closely related to the weight of females and the activities of trypsin‐like as well as chymotrypsine‐like enzymes (Liu, Mao, & Zeng, et al., 2015). Han et al. (2017) analysed the transcriptome of female adults of C. pallens under starvation as well as feeding conditions and found that different expressed genes (DEGs) were mainly involved in ribosome biogenesis, protein processing in endoplasmic reticulum, biosynthesis of amino acids, and carbon metabolism. They also annotated four vitellogenins, three insulin‐like peptides and two insulin receptors genes. However, these earlier studies of C. pallens genes mainly relied on the transcriptome data which carry incomplete information. The chromosome‐level genome of C. pallens will provide more information, which will contribute to the further study of the molecular mechanisms of its vitellogenin protein, nutrient metabolism and reproduction. Interestingly, the gene families involved in cold acclimation and wax synthesis were also expanded in the genome of C. pallens, which may be related to its overwintering, spinning and cocooning as well as the formation of egg stalks. KEGG pathways enriched in expanded gene families mainly included insect pheromone synthesis, detoxification metabolism, digestion and other related pathways as well as wax synthesis (cutin, suberine and wax biosynthesis) (Figure 5b). In the long evolutionary process of adaptation, insects have gradually developed adaptations to low temperatures. Many insects are able to build up their tolerance before cold winter temperatures set in to improve the survival of their populations (Bale, 1987; Danks, 2006; Lee, 1989). The cold hardiness of the prepupae of C. pallens show obvious seasonal characteristics. The prewinter and midwinter prepupae had significantly higher cold tolerance than those in summer and post‐wintering (Yu, Shi, et al., 2012). C. pallens overwintered by adopting a freeze avoidance strategy, and the diapausing prepupae increased their cold tolerance by reducing the supercooling point. As the short‐day diapause type, the prepupae of C. pallens, such as Ostrinia furnacalis (Gong et al., 1984) and Chrysoperla sinica (Guo et al., 2006), can improve the cold resistance through their own part‐time diapause. The gene families responding to cold acclimation were found to be expanded in C. pallens in this study (Figure 5a). However, the mechanism of how C. pallens goes through the winter still needs more specialized researches in the future. Chrysopa pallens is a complete metamorphosis insect as its developmental stages consist of egg, larva, prepupa, pupa and adult (Yang et al., 2005). Interestingly, each egg of Chrysopidae is attached to the top of a filamentous petiole while the eggs of other neuropteridan insects are not. The mature larvae often cocoon on the wall of the back of the leaf. There are filaments attached to the attachment outside the cocoon. The expansion of gene families involved in wax synthesis in the genome of C. pallens may be related to these biological habits mentioned above. To further understand the evolution of expanded gene families, the 49 rapidly expanded gene families were analysed for their selective pressures. Three families related to digestion were found to be positively selected, namely trypsin (OG0001146), chymotrypsin (OG0001336) and zinc finger MYM‐type protein 1 (OG0000501), which may be one of the important driving forces for the wide feeding behaviours of C. pallens.

CONCLUSIONS

In this study, a chromosome‐level genome assembly was reported for C. pallens, an important predatory biological control agent. The key functional genes expanded in C. pallens were identified, including those involved in digestion, detoxification and chemoreception, implying why green lacewings can serve as an excellent biocontrol agent. The chromosome‐level genome of C. pallens also provides insights into the genomics of Neuropterida and would serve as an invaluable reference for further comparative genomic studies of Neuropterida and even Insecta. In addition, it will support more accurate and reliable studies about the habits of C. pallens at the genomic level which may help to improve its control efficiency. The high quality genome will also provide a solid foundation to promote the development and utilization of natural enemies in biological pest control in the future.

CONFLICT OF INTEREST

The authors have declared that no competing interest exits.

AUTHOR CONTRIBUTIONS

Yuyu Wang, Xingyue Liu and Ding Yang designed the research. Ruyue Zhang, Mengqing Wang, Jing Li and Lisheng Zhang performed research. Yuyu Wang, Cheng‐Min Shi, Mengqing Wang and Ruyue Zhang analysed data., Ruyue Zhang, Fan Fan, Jing Li, Shuo Geng, Xingyue Liu and Ding Yang wrote the manuscript. All authors reviewed the manuscript. Supplementary Material Click here for additional data file.

75 in total

1. Minimap2: pairwise alignment for nucleotide sequences.

Authors: Heng Li
Journal: Bioinformatics Date: 2018-09-15 Impact factor: 6.937

2. NextPolish: a fast and efficient genome polishing tool for long-read assembly.

Authors: Jiang Hu; Junpeng Fan; Zongyi Sun; Shanlin Liu
Journal: Bioinformatics Date: 2020-04-01 Impact factor: 6.937

3. AUGUSTUS: ab initio prediction of alternative transcripts.

Authors: Mario Stanke; Oliver Keller; Irfan Gunduz; Alec Hayes; Stephan Waack; Burkhard Morgenstern
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

4. FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies.

Authors: Patrick Kück; Gary C Longo
Journal: Front Zool Date: 2014-11-18 Impact factor: 3.172

5. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.

Authors: Aron Marchler-Bauer; Yu Bo; Lianyi Han; Jane He; Christopher J Lanczycki; Shennan Lu; Farideh Chitsaz; Myra K Derbyshire; Renata C Geer; Noreen R Gonzales; Marc Gwadz; David I Hurwitz; Fu Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Zhouxi Wang; Roxanne A Yamashita; Dachuan Zhang; Chanjuan Zheng; Lewis Y Geer; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

6. Enzyme annotation in UniProtKB using Rhea.

Authors: Anne Morgat; Thierry Lombardot; Elisabeth Coudert; Kristian Axelsen; Teresa Batista Neto; Sebastien Gehant; Parit Bansal; Jerven Bolleman; Elisabeth Gasteiger; Edouard de Castro; Delphine Baratin; Monica Pozzato; Ioannis Xenarios; Sylvain Poux; Nicole Redaschi; Alan Bridge
Journal: Bioinformatics Date: 2020-03-01 Impact factor: 6.937

7. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs.

Authors: Evgenia V Kriventseva; Dmitry Kuznetsov; Fredrik Tegenfeldt; Mosè Manni; Renata Dias; Felipe A Simão; Evgeny M Zdobnov
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

8. Gene Selection and Evolutionary Modeling Affect Phylogenomic Inference of Neuropterida Based on Transcriptome Data.

Authors: Yuyu Wang; Xiaofan Zhou; Liming Wang; Xingyue Liu; Ding Yang; Antonis Rokas
Journal: Int J Mol Sci Date: 2019-03-01 Impact factor: 5.923

9. The Pfam protein families database in 2019.

Authors: Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

10. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.

Authors: T Rhyker Ranallo-Benavidez; Kamil S Jaron; Michael C Schatz
Journal: Nat Commun Date: 2020-03-18 Impact factor: 17.694

3 in total

1. The first chromosome-level genome assembly of a green lacewing Chrysopa pallens and its implication for biological control.

Authors: Yuyu Wang; Ruyue Zhang; Mengqing Wang; Lisheng Zhang; Cheng-Min Shi; Jing Li; Fan Fan; Shuo Geng; Xingyue Liu; Ding Yang
Journal: Mol Ecol Resour Date: 2021-09-30 Impact factor: 8.678

2. Bibliometric Analyses of Web of Science Illuminate Research Advances of Neuropterida.

Authors: Shuo Tian; Yuxin An; Ruyue Zhang; Liming Wang; Yuyu Wang
Journal: Insects Date: 2022-05-16 Impact factor: 3.139

3. The First Chromosome-level Genome Assembly of Cheumatopsyche charites Malicky and Chantaramongkol, 1997 (Trichoptera: Hydropsychidae) Reveals How It Responds to Pollution.

Authors: Xinyu Ge; Jianfeng Jin; Lang Peng; Haoming Zang; Beixin Wang; Changhai Sun
Journal: Genome Biol Evol Date: 2022-10-07 Impact factor: 4.065

3 in total