Literature DB >> 29785409

De novo genome and transcriptome resources of the Adzuki bean borer Ostrinia scapulalis (Lepidoptera: Crambidae).

B Gschloessl1, F Dorkeld1, P Audiot1, A Bretaudeau2,3, C Kerdelhué1, R Streiff1.   

Abstract

We present a draft genome assembly with a de novo prediction and automated functional annotation of coding genes, and a reference transcriptome of the Adzuki bean borer, Ostrinia scapulalis, based on RNA sequencing of various tissues and developmental stages. The genome assembly spans 419 Mb, has a GC content of 37.4% and includes 26,120 predicted coding genes. The reference transcriptome holds 33,080 unigenes and contains a high proportion of a set of genes conserved in eukaryotes and arthropods, used as quality assessment of the reconstructed transcripts. The new genomic and transcriptomic data presented here significantly enrich the public sequence databases for the Crambidae and Lepidoptera, and represent useful resources for future researches related to the evolution and the adaptation of phytophagous moths. The genome and transcriptome assemblies have been deposited and made accessible via a NCBI BioProject (id PRJNA390510) and the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/).

Entities:  

Keywords:  Crambidae; De novo assembly; Gene prediction; Genome; Lepidoptera; Transcriptome

Year:  2018        PMID: 29785409      PMCID: PMC5958680          DOI: 10.1016/j.dib.2018.01.073

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table Genome and Transcriptome Value of the data The draft genome represents the first available genome assembly for O. scapulalis. The reference transcriptome of O. scapulalis will allow comparative expression studies. The new genomic and transcriptomic data enrich the public sequence databases for the Crambidae and Lepidoptera. The data represent pangenomic resources for future researches related to the evolution and the adaptation of phytophagous moths.

Data

The Adzuki bean borer, Ostrinia scapulalis (hereafter OSCA), is a palaearctic phytophagous moth feeding on various dicotyledons, including hop (Humulus lupulus), mugwort (Artemisia vulgaris) and hemp (Cannabis sativa) [1]. In Europe, it partly co-occurs with its sibling species, the European corn borer, Ostrinia nubilalis, which is a major pest of maize (Zea mays). Previous studies demonstrated that O. scapulalis and O. nubilalis are specialized to their respective host plants [2], [3], [4], [5], [6], [7] and that their genetic divergence is rather low so that they can be considered as sibling species [8]. Yet, a few genomic sequences and rearrangements are much more divergent than the rest of the genomic background [1], [9], [10], [11]. These genomic regions are of particular interest to understand the divergence process between O. scapulalis and O. nubilalis. To further investigate the host adaptation and divergence between these two sibling species at a pangenomic scale, we have elaborated new genomic and transcriptomic resources consisting of an OSCA draft genome and a related reference transcriptome. The latter extends a published transcriptomic set generated with Roche 454 sequencing technology [12].

Experimental design, materials and methods

De novo draft genome

Diapausing larvae were collected in mugwort stems in 2008 near Amiens (Picardie, France) and stored in 95% ethanol at −20 °C. Whole genomic DNA extracts were obtained from a CTAB-based method [13]. DNA quality and integrity was evaluated through migration on an agarose gel and nanodrop technology. The sex of each sampled larvae was determined with a molecular coamplification of markers specific to each heterochromosome (Z and W in Lepidoptera) as described in Orsucci et al. [6]. Only samples of the ZZ homogametic sex (males in Lepidoptera), were retained for the libraries construction. A 2 × 100 bp shot-gun paired-end library and a 3 (2 × 50 bp) and an 8 kb (2 × 100 bp) mate-pair library were generated using the DNA extract of one larva for each library and the Illumina TruSeq TM and Nextera Mate Pair Library Preparation kits, respectively. All libraries were sequenced by LGC Genomics GmbH (Berlin, Germany) on an Illumina HiSeq 2000 platform using the paired-end protocol. Between 234 and 351 million DNA raw reads were generated per library (Table 1). Assembly and scaffolding of the cleaned reads were done with the software Allpaths-LG [14], followed by GapCloser for closing gaps in the assembled scaffolds. The resulting scaffolding was then further improved by integrating independent reconstructed transcript data after RNA sequencing (see Supplementary material for details). The OSCA v1.2 draft genome assembly consisted of 50,738 scaffolds, representing 419 Mb with a mean read coverage of 50 reads per base (Table 2). N50 and N90 were 29,308 and 3051 bp, respectively. Scaffold lengths ranged between 883 bp and 619.8 kb. The genome assembly had a GC content of 37.4% and a proportion of repeated elements of 16.6% (Table 3). A total of 8372 short duplicated regions (average length: 1813 bp, min.: 808 bp, max.: 11,745 bp) were identified on 7009 scaffolds. The quality of the OSCA v1.2 genome assembly was evaluated by the recovery rate of three sets of genes highly conserved in eukaryotes and arthropods [15], [16]. Among both eukaryotic gene sets, 61% of the CEGMA genes and 70% of the BUSCO genes were recovered in the OSCA v1.2 genome, while 51% of the BUSCO arthropod conserved genes could be identified. Using the MAKER pipeline [see Supplementary material, [17]] on the OSCA v1.2 nuclear genome 26,120 coding genes were predicted (Table 4). Of these coding genes, 80.3% could be functionally annotated. Furthermore, 19,023 OSCA genes were assigned to 9785 ortholog groups (Fig. 1) of which 93% were shared with at least one of the three Lepidoptera species Bombyx mori, Danaus plexippus or Spodoptera frugiperda.
Table 1

Statistic features of sequence reads issued from different genomic libraries and used for the OSCA v1.2 genome assembly.

LibraryPE300Mate3kbMate8kb
Raw read count350,636,628234,843,432295,988,124
Read count after clippinga349,359,410233,657,586287,067,720
Total Size [Gb]34.511.923.9
Minimum Length [bp]202020
Maximum Length [bp]10051100
Mean Length [bp]995183
Expected insert size [bp]b30025007500
Mapped on OSCA v1.2 genome207,000,258101,565,815146,139,301
Mapped as paired-end on OSCA v1.2 genome170,567,84816,877,19431,116,832

Only paired-end reads taken into account

Based on in silico measurements

Table 2

Genome features of the OSCA v1.2 assembly. The coverage is defined as the average read count per assembled bp.

OSCA v1.2 assembly
Contig count163,703
Scaffold (scf) count50,738
N50 scf [bp]29,308
N50 scf sequence count3395
N90 scf [bp]3051
N90 scf sequence count21,872
Minimum scf length [bp]883
Maximum scf length [kb]619.8
Illumina PE300 coverage [reads/bp]49.7
GC content [%]37.4
N base content in assembly [%]30.9
Total Length [Mb]419.2
Total Length without Ns [Mb]289.8
GenomeScope PE genome size estimation [Mb]302.9
CEGMA identified [%] (count of 248)152 (61.3)
CEGMA full-length [%] (count of 248)83 (33.5)
BUSCO2 euk identified [%] (count of 303)211 (69.6)
BUSCO2 euk full-length [%] (count of 303)156 (51.5)
BUSCO2 arthropod identified [%] (count of 2675)1363 (50.9)
BUSCO2 arthropod full-length [%] (count of 2675)842 (31.5)
Table 3

Number of repeated elements found in the OSCA v1.2 draft genome assembly and corresponding genome ratio.

FamilyFragmentsTotal length [Mb]% of genome
LTR413,39345.810.9
LINE95,50611.82.8
SINE34,6733.80.9
DNA64,64681.9
Total608,21869.416.6
Table 4

Characteristics of the different transcriptome assemblies and the genes predicted from the genome.

HiSeq transcriptome454 transcriptomeMAKER genes
Raw read count325,008,948322,504N/A
Cleaned read count267,359,188287,429N/A
Mapped Reads198,962,467145,588N/A
Transcriptome size [Mb]49.210.421.6
Coverage (mean read count per bp)339.29.8N/A
Transcript count44,56411,23126,120
Unigene count33,080889226,120
Mean CDS/transcript length [bp]1103922829
Median transcript length [bp]591693498
N50 transcript length [bp]200610361296
N50 sequence count706128584475
N90 transcript length [bp]429485353
N90 sequence count26,722873617,539
Minimum length20111166
Maximum length27,55910,99153,685
CEGMA identified [%] (count of 248)209 (84.3)56 (22.6)N/A
CEGMA full-length [%] (count of 248)189 (76.2)22 (8.9)N/A
BUSCO2 euk identified [%] (count of 303)268 (88.4)70 (23.1)226 (74.6)
BUSCO2 euk full-length [%] (count of 303)256 (84.5)17 (5.6)162 (53.5)
BUSCO2 arthropod identified [%] (count of 2675)1892 (70.7)265 (9.9)1453 (54.3)
BUSCO2 arthropod full-length [%] (count of 2675)2109 (78.8)119 (4.4)923 (34.5)
Transcripts with predicted CDS (%)18,494 (41.5)4016 (35.8)26,120 (100)
Transcripts with full-length CDS (%)12,515 (28.1)1826 (16.3)26,120 (100)
Located on OSCA v1.2 genome [count] (%)11,010 (24.7)3331 (29.7)26,120 (100)
Split on two OSCA v1.2 scaffolds [count] (%)11,721 (26.3)2350 (20.9)0 (0)
Fig. 1

Venn diagram showing all OrthoMCL ortholog groups among the MAKER-predicted O. scapulalis proteins (OSCA) and the proteomes of Spodoptera frugiperda (SFRU), Bombyx mori (BMOR), Danaus plexippus (DPLE) and Drosophila melanogaster (DMEL).

Venn diagram showing all OrthoMCL ortholog groups among the MAKER-predicted O. scapulalis proteins (OSCA) and the proteomes of Spodoptera frugiperda (SFRU), Bombyx mori (BMOR), Danaus plexippus (DPLE) and Drosophila melanogaster (DMEL). Statistic features of sequence reads issued from different genomic libraries and used for the OSCA v1.2 genome assembly. Only paired-end reads taken into account Based on in silico measurements Genome features of the OSCA v1.2 assembly. The coverage is defined as the average read count per assembled bp. Number of repeated elements found in the OSCA v1.2 draft genome assembly and corresponding genome ratio. Characteristics of the different transcriptome assemblies and the genes predicted from the genome.

De novo transcriptome

In March 2011, diapausing larvae were collected in mugwort stems from Nadarzin (Poland) and then reared in the laboratory to obtain fresh tissues from the following developmental stages: eggs and larval whole body and hemolymph from the fifth instar. At the adult stage we sampled and separated heads/thorax from abdomens and males from females. In total, we prepared 7 RNA extracts corresponding to these various stages and tissues (Table 5). RNA quality and concentration were evaluated using the RNA 6000 Nano kit with an Agilent 2100 Bioanalyser (Agilent Technologies, Palo Alto, CA, USA). Indexed cDNA libraries with an insert size of 150–200 bp were constructed for each developmental stage and tissue extract using the Illumina TruSeq RNA sample preparation kit. Subsequently, the libraries were sequenced by an Illumina HiSeq 2000 System at GATC Biotech (Konstanz, Germany) using the paired-end protocol. Between 32 and 59 million 100 bp raw reads were generated per library (Table 5). After a series of read cleaning and normalization steps, transcripts were reconstructed with Trinity, CD-HIT-EST and CAP3 [see Supplementary material, [[18], [19], [20]]]. The de novo transcriptome assembly had an overall size of 49.2 Mb and resulted in 44,564 transcripts, grouped into 33,080 unigenes (Table 4). Transcript lengths ranged from 201 to 27,559 bp. The N50 and N90 lengths were 2006 and 429 bp, respectively. Regarding the conserved eukaryotic gene sets, 84% of the CEGMA and 88% of the BUSCO genes were identified. Furthermore, 71% of the conserved arthropod BUSCO genes were present within the reference transcriptome. Coding sequences (CDS) were predicted with FrameDP [21] for 18,494 transcripts of which 12,515 were complete. An additional analysis of the 26,070 transcripts without predicted CDS with FEELnc [22] identified 10,883 potential long non-coding RNAs and sequences for which the CDS was either too fragmented (n = 15,132) or not present at all (n = 55). Further comparative analysis of the reference transcripts with the Lepbase [23] reference protein set detected probable homologs for 20,835 transcripts. OHR analyzes [12] on the best matches indicated that 61% of the CDS were reconstructed at least at 60% of the corresponding reference lepidopteran protein homolog, whereas 43% of the transcript CDS were assembled at full length.
Table 5

Developmental stages and tissues of 7 RNA extracts issued from F1 individuals obtained in the laboratory after rearing diapausing larvae collected in the field.

Extract/library IDDevelopmental stageTissuenRaw read count
Lib1eggwhole egg3 egg masses (ca. 60 eggs)58,941,438
Lib2L5whole body954,020,124
Lib3L5hemolymph3143,406,048
Lib4Female adultHead/thorax443,465,884
Lib5Female adultAbdomen454,635,026
Lib6Male adultHead/thorax438,226,332
Lib7Male adultAbdomen432,314,096
Total325,008,948
Developmental stages and tissues of 7 RNA extracts issued from F1 individuals obtained in the laboratory after rearing diapausing larvae collected in the field.
Subject areaBiology
More specific subject areaLepidoptera, Genomics
Type of dataDNA and cDNA sequence reads, genome assembly and transcript assembly
How data was acquiredShotgun whole genome and cDNA sequencing using Illumina HiSeq 2000
Data formatAnalyzed: i.e. raw data and assembled sequences
Experimental factorsGenome: total DNA extraction from male larvae of wild samples
Transcriptome: total RNA extraction from various tissues, developmental stages and of males and females
Experimental featuresGenome: DNA sequencing
Transcriptome: RNA sequencing of various tissues from eggs to adults, controlled conditions
Data source locationGenome: Amiens, Picardie/France (49°54′0.01′″N, 2°18′0″E)
Transcriptome: Nadarzin, Poland (52°4′2.05″N, 20°47′33.00″E)
Data accessibilityAll raw sequence reads are accessible as NCBI BioProject (id PRJNA390510). The OSCA v1.2 draft genome assembly, the reference transcriptome assembly and automatic functional annotations can be found in the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/).
  20 in total

1.  Genetic structure and gene flow in French populations of two Ostrinia taxa: host races or sibling species?

Authors:  T Malausa; A Dalecky; S Ponsard; P Audiot; R Streiff; Y Chaval; D Bourguet
Journal:  Mol Ecol       Date:  2007-09-06       Impact factor: 6.185

2.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

3.  A recombination suppressor contributes to ecological speciation in OSTRINIA moths.

Authors:  C B Wadsworth; X Li; E B Dopman
Journal:  Heredity (Edinb)       Date:  2015-01-28       Impact factor: 3.821

4.  Scanning the European corn borer (Ostrinia spp.) genome for adaptive divergence between host-affiliated sibling species.

Authors:  Afiwa Midamegbe; Renaud Vitalis; Thibaut Malausa; Emilie Delava; Sandrine Cros-Arteil; Réjane Streiff
Journal:  Mol Ecol       Date:  2011-03-07       Impact factor: 6.185

5.  Genetic isolation between two sympatric host plant races of the European corn borer, Ostrinia nubilalis Hubner. II: assortative mating and host-plant preferences for oviposition.

Authors:  M-T Bethenod; Y Thomas; F Rousset; B Frérot; L Pélozuelo; G Genestier; D Bourguet
Journal:  Heredity (Edinb)       Date:  2005-02       Impact factor: 3.821

6.  FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.

Authors:  Valentin Wucher; Fabrice Legeai; Benoît Hédan; Guillaume Rizk; Lætitia Lagoutte; Tosso Leeb; Vidhya Jagannathan; Edouard Cadieu; Audrey David; Hannes Lohi; Susanna Cirera; Merete Fredholm; Nadine Botherel; Peter A J Leegwater; Céline Le Béguec; Hille Fieten; Jeremy Johnson; Jessica Alföldi; Catherine André; Kerstin Lindblad-Toh; Christophe Hitte; Thomas Derrien
Journal:  Nucleic Acids Res       Date:  2017-05-05       Impact factor: 16.971

7.  Differences in oviposition behaviour of two sympatric sibling species of the genus Ostrinia.

Authors:  T Malausa; B Pélissié; V Piveteau; C Pélissier; D Bourguet; S Ponsard
Journal:  Bull Entomol Res       Date:  2008-02-07       Impact factor: 1.750

8.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

9.  De novo transcriptomic resources for two sibling species of moths: Ostrinia nubilalis and O. scapulalis.

Authors:  Bernhard Gschloessl; Emmanuelle Beyne; Philippe Audiot; Denis Bourguet; Réjane Streiff
Journal:  BMC Res Notes       Date:  2013-02-28

10.  When history repeats itself: exploring the genetic architecture of host-plant adaptation in two closely related lepidopteran species.

Authors:  Hermine Alexandre; Sergine Ponsard; Denis Bourguet; Renaud Vitalis; Philippe Audiot; Sandrine Cros-Arteil; Réjane Streiff
Journal:  PLoS One       Date:  2013-07-12       Impact factor: 3.240

View more
  2 in total

1.  New WGS data and annotation of the heterosomal vs. autosomal localization of Ostrinia scapulalis (Lepidoptera, Crambidae) nuclear genomic scaffolds.

Authors:  Louise Brousseau; Sabine Nidelet; Réjane Streiff
Journal:  Data Brief       Date:  2018-08-09

2.  Integrated miRNA and transcriptome profiling to explore the molecular determinism of convergent adaptation to corn in two lepidopteran pests of agriculture.

Authors:  Sylvie Gimenez; Imène Seninet; Marion Orsucci; Philippe Audiot; Nicolas Nègre; Kiwoong Nam; Réjane Streiff; Emmanuelle d'Alençon
Journal:  BMC Genomics       Date:  2021-08-09       Impact factor: 3.969

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.