Literature DB >> 35775933

A chromosome-level genome assembly of Neotoxoptera formosana (Takahashi, 1921) (Hemiptera: Aphididae).

Shuai Ye1, Chen Zeng2, Jian-Feng Liu1, Chen Wu3, Yan-Fei Song1, Yao-Guo Qin4, Mao-Fa Yang1,5.   

Abstract

Neotoxoptera formosana (Takahashi), the onion aphid, is an oligophagous pest that mainly feeds on plants from the Allium genus. It sucks nutrients from the plants and indirectly acts as a vector for plant viruses. This aphid causes severe economic losses to Allium tuberosum agriculture in China. To better understand the host plant specificity of N. formosana on Allium plants and provide essential information for the control of this pest, we generated the entire genome using Pacific Biosciences long-read sequencing and Hi-C data. Six chromosomes were assembled to give a final size of 372.470 Mb, with an N50 scaffold of 66.911 Mb. The final draft genome assembly, from 192 Gb of raw data, was approximately 371.791 Mb in size, with an N50 contig of 24.99 Kb and an N50 scaffold of 2.637 Mb. The average GC content was 30.96%. We identified 73 Mb (31.22%) of repetitive sequences, 14,175 protein-coding genes, and 719 noncoding RNAs. The phylogenetic analysis showed that N. formosana and Pentalonia nigronervosa are sister groups. We found significantly expanded gene families that were involved in the THAP domain, the DDE superfamily endonuclease, zinc finger, immunity (ankyrin repeats), digestive enzyme (serine carboxypeptidase) and chemosensory receptor. This genome assembly could provide a solid foundation for future studies on the host specificity of N. formosana and pesticide-resistant aphid management.
© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.

Entities:  

Keywords:  zzm321990 Neotoxoptera formosanazzm321990 ; Hi-C; chromosome-level genome; gene family evolution; genome assembly

Mesh:

Year:  2022        PMID: 35775933      PMCID: PMC9339310          DOI: 10.1093/g3journal/jkac164

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.542


Significance

Onion aphids cause significant economic losses to Allium plant agriculture, particularly A. tuberosum. However, there is very little knowledge of this aphid, in terms of genetics. To expand the genetic resource of this pest, we assembled the whole genome and found the expanding gene families of N. formosana. This will provide opportunities for future studies on N. formosana genetics and will eventually contribute to aphid control.

Introduction

Allium crop species are used worldwide as vegetables and play an important role in daily diets in Asia (Shahrajabian ). These Allium vegetables contain various sulfur and organic compounds that exhibit anticancer activities and may be useful for the treatment and prevention of cancers (Asemani ). During our previous investigation, we found that Neotoxoptera formosana (Takahashi) (Hemiptera: Aphididae) causes significant economic losses to Allium plant agriculture, particularly A. tuberosum. N. formosana, also known as the onion aphid, damages Allium plants by sucking the cell sap from the plants, spreading many plant viruses and defecating sticky honeydew (Wang ; Lin ). Onion aphids can search for Allium plants but show a significant response to the repellent effects of volatile sulfides, which are released by the plants (Hori 2007), and substances such as rosemary (Hori and Komatsu 1997). This aphid survives all year round but there are 2 main hazard peaks in the year: March–May and July–September, in Guizhou province, China. Previous studies have found that the predatory gall midge, Aphidoletes aphidimyza, has a good control efficiency against N. formosana under laboratory conditions (Wang ). The complete mitochondrial genome of N. formosana was sequenced and annotated by Song . Despite its highly specialized host range and significant economic losses to Allium plant agriculture, no genome information for N. formosana has been published. In this study, we provided a chromosome-level genome assembly of N. formosana. This is the first genome assembly of this genus and will provide important basic information for the study of aphid taxonomy, host plant specificity, and pesticide-resistant aphid management.

Materials and methods

Sample collection and sequencing

The onion aphids used for sequencing were obtained from Puding County, Anshun City, Guizhou Province, China (105˚ 27ʹ 49″ E, 26˚ 26ʹ 36″ N), in December 2020, and were reared in the laboratory at the Institute of Entomology, Guizhou University (Fig. 1). There were 90, 30, and 60 adult females used for PacBio sequencing, RNA-Seq analysis, and Hi-C sequencing, respectively. High-quality DNA was extracted using the QIAGEN DNeasy Blood and Tissue kit. For PacBio sequencing, a 20-kb insert-size library was constructed using the SMRTbell Template Prep Kit 2.0 and sequencing was performed on the PacBio Sequencer Sequel II. For the Illumina sequencing, the Truseq DNA PCR-free kit was used to construct PCR-free libraries with an insert size of 350 bp. For the RNA-Seq analysis, the TRIzol Reagent kit was used to extract RNA and libraries were constructed using the TruSeq RNA v2 kit. The Hi-C library construction was performed by Berry Genomics and included cross-linking, restriction enzyme (MboI) digestion, fragment end repair, DNA cyclization, and DNA purification. All the Illumina libraries were sequenced on a NovaSeq 6000, to achieve reads of 150 bp in length.
Fig. 1.

Neotoxoptera formosana. a) N. formosana damaging Allium tuberosum; b) N. formosana damaging yellow A. tuberosum; c) N. formosana female and nymph.

Neotoxoptera formosana. a) N. formosana damaging Allium tuberosum; b) N. formosana damaging yellow A. tuberosum; c) N. formosana female and nymph.

Genome assembly

The Illumina genomic datasets were assessed for quality and trimmed using BBTools v38.82 (Bushnell 2014), using the following steps: (1) the duplicated sequences were removed using clumpify.sh; (2) bbduk.sh was used to remove low quality bases (

Genome annotation

We annotated repeats, protein-coding genes, and noncoding RNAs from the assembly. To identify repeats, RepeatModeler v2.0.2a (Flynn ) was first used to generate a de novo repeat library with the parameter of “-LTRStruct.” This repeat library was then merged with the sequences from the RepBase-20181026 (Bao ) database to form a more extensive repeat library, which was used as the input into RepeatMasker v4.1.0 (Smit ). The protein-coding gene models were predicted using MAKER v3.01.03 (Holt and Yandell 2011), by integrating the predictions from 3 strategies. EVidenceModeler (EVM) was used for evidence weighting. The 3 strategies were: (1) ab initio prediction: BRAKER v2.1.6 (Hoff ) was used to train Augustus v3.4.0 (Stanke ) and GeneMark-ES/ET/EP 4.68_lic (Brůna ) and then predict genes with the evidence from RNA-Seq data, to improve accuracy. Alignments from the RNA-Seq analysis were generated using HISAT2 v2.2.1 (Kim ). (2) Transcript-based gene structure prediction: a reference-guided transcriptome assembly was generated using StringTie v2.1.6 (Kovaka ), by assembling RNA-Seq reads. This assembly was then aligned to the genome assembly using HISAT2. (3) Protein-homology-based prediction: the characterized protein sequences from the phylogenetically close species, Acyrthosiphon pisum, Drosophila melanogaster, Nilaparvata lugens, Thrips palmi, Rhopalosiphum maidis, and Pediculus humanus, were downloaded from NCBI for the model. Gene functional annotation was conducted in 3 steps: (1) gene models were searched against the UniProtKB (SwissProt+TrEMBL) and nr databases. To search against UniProtKB, the sensitive mode (–very-sensitive -e 1e-5) was used for Diamond v2.0.11.149 (Buchfink ) to obtain functional description; (2) gene models were searched against the Pfam, Smart, Superfamily, and CDD databases using InterProScan 5.48-83.0 (Quevillon ) and the eggNOG v5.0 (Huerta-Cepas ) database, with eggNOG-mapper v2.1.5 (Huerta-Cepas ). These data were used to predict protein domains, gene ontology (GO) terms, KEGG, and Reactome pathways. (3) The results generated from the above were integrated to produce the final functional annotation. To annotate noncoding RNAs, infernal v1.1.4 was used to annotate rRNA, snRNA, and miRNA by searching against the Rfam database. The tRNAs were annotated using tRNAscan-SE v2.0.9 (Chan and Lowe 2019) and the predicted tRNAs with low fidelity were removed with the “EukHighConfidenceFilter” script.

Species phylogeny and gene family evolution

We downloaded protein sequences from 15 species from NCBI to infer gene family homology. These species covered orders, tribes, and families and included Thrips palmi, Acyrthosiphon pisum, Sitobion miscanthi, Diuraphis noxia, Myzus persicae, Aphis gossypii, Melanaphis sacchari, Pentalonia nigronervosa, Rhopalosiphum maidis, Eriosoma lanigerum, Sipha flava, Nilaparvata lugens, Phenacoccus solenopsis, Riptortus pedestris, and Pachypsylla venusta. The sequences were initially clustered using OrthoFinder v2.3.8 (Emms and Kelly 2019) and then aligned using Diamond. To generate alignments for species phylogenetic tree construction, the 1,273 single-copy gene clusters were aligned individually to generate homologous regions using MAFFT v7.453 (Katoh and Standley 2013), with “L-INS-I.” The regions that were inappropriately aligned were trimmed using trimAl v1.4.1. To construct the phylogeny, FASconCAT-G v1.04 (Kück and Longo 2014) was used to generate a supermatrix as the input for IQ-TREE v2.1.3 (Minh ), with settings of “–symtest-remove-bad –symtest-pval 0.10 –m MFP –mset LG –msub nuclear –rclusterf 10 –B 1000 –alrt 1000.” The estimation of evolutionary time because species divergence was performed using MCMCTREE, from the PAML v4.9j package, with parameters of “clock = 2, RootAge = < 3.827, model = 0, BDparas = 1 1 0.1, kappa_gamma = 6 2, alpha_gamma = 1 1, rgene_gamma = 2 20 1, sigma2_gamma = 1 10 1.” There were 6 sets of fossil evidence downloaded from the PBDB database (https://www.paleobiodb.org/navigator/) and used as calibrations for this estimation: Hemiptera and Thysanoptera (<3.827 MYA) as the root, Sternorrhyncha (3.146–3.589 MYA), Aphalaridae and Pseudococcidae (2.793–3.232 MYA), Aphididae (0.996–1.4 MYA), Macrosiphini (>0.339 MYA), and Nilaparvata lugens and Riptortus pedestris (2.989–3.232 MYA). The prediction of gene family expansion and contraction within N. formosana, when compared with the other 15 species, was conducted using CAFÉ v4.2.1, with the model of single birth-death parameter lambda and a significance level of 0.01 (P = 0.01). The significantly expanded/contracted gene families were then assigned to GO and KEGG categories, using R package clusterProfiler v3.10.1 (Yu ), with the default parameters (P = 0.01 and q = 0.05).

Results and discussion

Genome sequencing and assembly

We obtained 104.3 Gb of PacBio long reads, with a mean read length of 15.42 kb and an N50 length of 24.99 kb. The genome was predicted to be between 395.5 and 397.2 Mb, with extremely low heterozygosity (Fig. 2). We estimated that 31.22% of the assembly contained regions of repetitive sequences (Supplementary Table 1). Our initial genome assembly, which was assembled solely from PacBio reads, was 371.791 Mb and contained 357 scaffolds and 1259 contigs (Table 1). We found that 93.9% of this assembly contained complete BUSCO genes (1,367), with a duplicate gene rate of 2.3% (Table 2). The mapping-back rates for Illumina short and PacBio long reads were 96.79% and 92.95%, respectively, which indicated that our assembly had high coverage of the raw data. There were approximately 800 Mb of Hi-C data used to assign scaffolds and contigs onto the 6 chromosomes (Figs. 3 and 4).
Fig. 2.

The K-mer frequency distribution analysis of Neotoxoptera formosana.

Table 1.

Genome assembly statistics of Neotoxoptera formosana.

AssemblyTotal length (Mb)Number of scaffoldsN50 length (Mb)Longest scaffold (Mb)GC (%)BUSCO (n = 1,367) (%)
CDFM
wtdbg393.6782,8062.63724.99931.5592.12.62.05.9
NextPolish392.0172,8062.63224.93331.5994.02.60.95.1
Purge_dups376.8401,9112.80024.93331.1594.02.60.95.1
3D-DNA372.47046166.91197.25630.9893.92.30.95.2
Final371.79135766.90897.22330.9693.92.30.95.2
Table 2.

Genome assembly and annotation statistics for Neotoxoptera formosana.

Neotoxoptera formosana
Genome assembly
 Assembly size (Mb)371.791
 Number of scaffolds/contigs357/1,259
 Longest scaffold/contig (Mb)97.223/24.933
 N50 scaffold/contig length (Mb)66.908/2.772
 GC (%)30.96
 Gaps (%)0.024
 BUSCO completeness (%)93.9%
Gene annotation
 Protein-coding genes14,175
 Mean protein length (aa)504.5
 Mean gene length (bp)6,552.6
 Exons/introns per gene9.3/8.0
 Exon (%)9.64
 Mean exon length271.4
 Intron (%)15.34
 Mean intron length501.4
 BUSCO completeness (%)93.9
Fig. 3.

Hi-C contact map of the Neotoxoptera formosana genome.

Fig. 4.

Circos plot that indicates chromosome length, GC content, and protein-coding gene/repeat sequence density.

The K-mer frequency distribution analysis of Neotoxoptera formosana. Hi-C contact map of the Neotoxoptera formosana genome. Circos plot that indicates chromosome length, GC content, and protein-coding gene/repeat sequence density. Genome assembly statistics of Neotoxoptera formosana. Genome assembly and annotation statistics for Neotoxoptera formosana. We annotated the repetitive sequences, protein-coding genes, and noncoding RNAs from the genome assembly. There were 754,839 predicted repetitive sequences, which made-up approximately 116 Mb (31.22%) of the assembly. The 5 most abundant repeat types were DNA elements (11.64%), unclassified (10.72%), simple repeats (4.09%), LINEs (1.92%), and LTR elements (1.21%) (Supplementary Table 1). There were 14,175 predicted protein-coding genes, supported by approximately 8 Gb of RNA-Seq data. The predicted genes had a mean length of 6,552.6 bp, a mean CDS length of 211.1 bp and a mean number of exons of 9.3 (Table 1). Of the genes that were completely recovered from this gene set, 93.9% were BUSCO genes. InterProScan identified protein domains for 11,227 predicted protein-coding genes and, together with eggNOG results, 9,607 and 8,287 genes were annotated with gene ontology (GO) terms and KEGG pathways, respectively. We annotated 719 noncoding RNAs that contained 128 miRNAs, 89 rRNAs, 97 snRNAs, 225 tRNAs, 27 ribozymes, 4 lncRNAs, and 149 other RNAs. The 97 snRNAs included 47 G4-forming RNAs (U1, U2, U4, U5, U6, and U11), 3 minor G4-forming RNAs (U4atac, U6atac, and U12) and 47 C/D box snoRNAs (Supplementary Table 2).

Genome collinearity analysis

We found a relatively high level of conserved linkage between N. formosana (Nf) and Acyrthosiphon pisum (Ap) genomes, when compared with Nf and Rhopalosiphum maidis (Rm) genomes (Fig. 5). N. formosana chromosome 1 (NfChr1) was mostly collinear with ApChrX, with some small regions aligned to ApChrA2. We found that NfChr2 and NfChr6 had homologous regions on ApChrA1. The syntenic regions of NfChr3 were located on ApChrA1 and the entire of ApChrA3. Conservation was observed between NfChr4 and 5 and ApChrA2. When compared with the R. maidis genome, only NfChr1 showed a high level of conservation with RmChr3, whilst other chromosomes had syntenic regions scattered across the R. maidis genome.
Fig. 5.

Chromosome collinearity analysis graph. Ap: Acyrthosiphon pisum; Nf: Neotoxoptera formosana; Rm: Rhopalosiphum maidis.

Chromosome collinearity analysis graph. Ap: Acyrthosiphon pisum; Nf: Neotoxoptera formosana; Rm: Rhopalosiphum maidis.

Phylogeny

We used protein sequences from 15 species, together with the annotated N. formosana protein models, to construct a phylogenetic tree (Fig. 6). There were 254,609 (91.3%) gene models assigned to 19,010 gene families. Among 4,169 gene families that were present in all species, 1,273 and 2,896 were single- and multi-copy families, respectively. In N. formosana, 14,037 genes were clustered into 9,838 families and 77 genes from 30 families were found to be specific to this species (Fig. 6). A total of 555,217 amino acid residues, obtained from 1,113 single-copy genes, were used for phylogenetic construction. Most lineages had UFB/SH-aLRT supports of 100/100, apart from Macrosiphini sacchari, which had supports of 99.9/94, and Aphis gossypii and Melanaphis sacchar, which had supports of 99.3/98 (Fig. 6). This phylogeny suggested that N. formosana and P. nigronervosa were sister groups.
Fig. 6.

Phylogenetic tree of Neotoxoptera formosana: The branch length represents evolution time, numbers represent the number of expanded, contracted and rapidly evolving (statistically significant, labeled as red) gene families in that branch.

Phylogenetic tree of Neotoxoptera formosana: The branch length represents evolution time, numbers represent the number of expanded, contracted and rapidly evolving (statistically significant, labeled as red) gene families in that branch.

Gene family evolution

When compared with the other species, we found that the number of expanded and contracted gene families in N. formosana was 629 and 3,067, respectively. There were 330 gene families that showed significant expansion or contraction. The gene families with significant expansion were THAP domain, DDE superfamily endonuclease, zinc finger, immunity (ankyrin repeats), digestive enzyme (serine carboxypeptidase), and chemosensory receptor families (Fig. 7a). The Odorant receptors (ORs) gene family exhibited rapid expansion, in accordance with the GO enrichment analysis. The results of the KEGG pathway enrichment analysis showed that pathways involved in detoxification, immunity, and secondary metabolite synthesis were significantly enriched (Fig. 7b).
Fig. 7.

Gene family evolution of Neotoxoptera formosana. a) Bubble plot of GO enrichment analysis of rapidly expanding gene families; b) bubble plot of KEGG enrichment analysis of rapidly expanding gene families.

Gene family evolution of Neotoxoptera formosana. a) Bubble plot of GO enrichment analysis of rapidly expanding gene families; b) bubble plot of KEGG enrichment analysis of rapidly expanding gene families. Aphid ORs play an essential role in the perception of different host odors or pheromones (Liu ). In this study, 16 ORs candidate genes were rapidly exhibited expansion in N. formosana. This rapid expansion might be associated with the feeding behaviour of N. formosana, which is an oligophagic aphid pest only feeding on different Allium species. Hori (2007) finds that N. formosana might use dipropyl trisulphide (extracted from Allium fistulosu) and diallyl disulphide (extracted from Allium tuberosum) as olfactory cues to search for the host plants based on Y-tube olfactometer. In order to understand the odor perception of N. formosana, future work should analyze the expression patterns of ORs genes in different issues and identify the functional analysis of ORs genes to different plant volatiles (Zhang ).

Data availability

Genome assembly and raw sequencing data have been deposited at the NCBI under the accessions JAIWJD000000000 and SRR18085628, SRR18079676, SRR18079766 and SRR13334673, respectively. Genome annotations are available at the Figshare under the link: https://figshare.com/articles/online_resource/A_Chromosome-level_genome_assembly_of_Neotoxoptera_formosana_Takahashi_Takahashi_1921_Hemiptera_Aphididae_/19165817. Supplemental material is available at G3 online. Click here for additional data file.
  33 in total

1.  Repbase Update, a database of repetitive elements in eukaryotic genomes.

Authors:  Weidong Bao; Kenji K Kojima; Oleksiy Kohany
Journal:  Mob DNA       Date:  2015-06-02

2.  tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences.

Authors:  Patricia P Chan; Todd M Lowe
Journal:  Methods Mol Biol       Date:  2019

3.  NextPolish: a fast and efficient genome polishing tool for long-read assembly.

Authors:  Jiang Hu; Junpeng Fan; Zongyi Sun; Shanlin Liu
Journal:  Bioinformatics       Date:  2020-04-01       Impact factor: 6.937

4.  RepeatModeler2 for automated genomic discovery of transposable element families.

Authors:  Jullien M Flynn; Robert Hubley; Clément Goubert; Jeb Rosen; Andrew G Clark; Cédric Feschotte; Arian F Smit
Journal:  Proc Natl Acad Sci U S A       Date:  2020-04-16       Impact factor: 11.205

5.  Identification and Tissue Expression Profiles of Odorant Receptor Genes in the Green Peach Aphid Myzus persicae.

Authors:  Jingtao Liu; Jiaoxin Xie; Adel Khashaveh; Jingjiang Zhou; Yongjun Zhang; Hui Dong; Bin Cong; Shaohua Gu
Journal:  Insects       Date:  2022-04-20       Impact factor: 3.139

6.  Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.

Authors:  Daehwan Kim; Joseph M Paggi; Chanhee Park; Christopher Bennett; Steven L Salzberg
Journal:  Nat Biotechnol       Date:  2019-08-02       Impact factor: 54.908

7.  GenomeScope: fast reference-free genome profiling from short reads.

Authors:  Gregory W Vurture; Fritz J Sedlazeck; Maria Nattestad; Charles J Underwood; Han Fang; James Gurtowski; Michael C Schatz
Journal:  Bioinformatics       Date:  2017-07-15       Impact factor: 6.937

8.  Complete mitochondrial genome of Neotoxoptera formosana (Takahashi, 1921) (Hemiptera: Aphididae), with the phylogenetic analysis.

Authors:  Yan-Fei Song; Hui Zhang; Chen Zeng; Shuai Ye; Mao-Fa Yang; Jian-Feng Liu
Journal:  Mitochondrial DNA B Resour       Date:  2021-05-21       Impact factor: 0.658

9.  InterProScan: protein domains identifier.

Authors:  E Quevillon; V Silventoinen; S Pillai; N Harte; N Mulder; R Apweiler; R Lopez
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

10.  FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies.

Authors:  Patrick Kück; Gary C Longo
Journal:  Front Zool       Date:  2014-11-18       Impact factor: 3.172

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.