B Gschloessl¹, F Dorkeld¹, P Audiot¹, A Bretaudeau²,³, C Kerdelhué¹, R Streiff¹.
Abstract
We present a draft genome assembly, with de novo prediction and automated functional annotation of coding genes, and a reference transcriptome of the Adzuki bean borer, Ostrinia scapulalis, based on RNA sequencing of various tissues and developmental stages. The genome assembly spans 419 Mb, has a GC content of 37.4% and includes 26,120 predicted coding genes. The reference transcriptome holds 33,080 unigenes and contains a high proportion of the genes from sets conserved in eukaryotes and arthropods, which were used to assess the quality of the reconstructed transcripts. The new genomic and transcriptomic data presented here significantly enrich the public sequence databases for the Crambidae and Lepidoptera, and represent useful resources for future research on the evolution and adaptation of phytophagous moths. The genome and transcriptome assemblies have been deposited and made accessible via an NCBI BioProject (id PRJNA390510) and the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/).
Keywords: Crambidae; De novo assembly; Gene prediction; Genome; Lepidoptera; Transcriptome
Year: 2018 PMID: 29785409 PMCID: PMC5958680 DOI: 10.1016/j.dib.2018.01.073
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Statistical features of the sequence reads from the different genomic libraries used for the OSCA v1.2 genome assembly.
| Read statistics | 300 bp insert library | 2500 bp insert library | 7500 bp insert library |
|---|---|---|---|
| Raw read count | 350,636,628 | 234,843,432 | 295,988,124 |
| Read count after clipping | 349,359,410 | 233,657,586 | 287,067,720 |
| Total Size [Gb] | 34.5 | 11.9 | 23.9 |
| Minimum Length [bp] | 20 | 20 | 20 |
| Maximum Length [bp] | 100 | 51 | 100 |
| Mean Length [bp] | 99 | 51 | 83 |
| Expected insert size [bp] | 300 | 2500 | 7500 |
| Mapped on OSCA v1.2 genome | 207,000,258 | 101,565,815 | 146,139,301 |
| Mapped as paired-end on OSCA v1.2 genome | 170,567,848 | 16,877,194 | 31,116,832 |
Only paired-end reads were taken into account.
Based on in silico measurements
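The read table can be summarised as retention and mapping rates. The sketch below is a minimal illustration that recomputes them from the counts above; it assumes (our choice, not stated by the authors) that the clipped read count is the denominator for both mapping rates. The `libraries` dictionary and `rates` helper are hypothetical names.

```python
# Read counts copied from the table above, keyed by expected insert size (bp).
libraries = {
    300: {"raw": 350_636_628, "clipped": 349_359_410, "mapped": 207_000_258, "mapped_pe": 170_567_848},
    2500: {"raw": 234_843_432, "clipped": 233_657_586, "mapped": 101_565_815, "mapped_pe": 16_877_194},
    7500: {"raw": 295_988_124, "clipped": 287_067_720, "mapped": 146_139_301, "mapped_pe": 31_116_832},
}


def rates(lib):
    """Return (% reads kept after clipping, % mapped, % mapped as pairs).

    Assumption: both mapping rates are expressed relative to the clipped reads.
    """
    retained = 100 * lib["clipped"] / lib["raw"]
    mapped = 100 * lib["mapped"] / lib["clipped"]
    mapped_pe = 100 * lib["mapped_pe"] / lib["clipped"]
    return retained, mapped, mapped_pe


for insert_size, lib in libraries.items():
    kept, mapped, paired = rates(lib)
    print(f"{insert_size} bp: kept {kept:.1f}%, mapped {mapped:.1f}%, mapped as pairs {paired:.1f}%")
```

Under this assumption, clipping removes under 4% of reads in every library, while the share of reads mapped as proper pairs is much lower for the two long-insert libraries than for the 300 bp library.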
Genome features of the OSCA v1.2 assembly. The coverage is defined as the average read count per assembled bp.
| Feature | OSCA v1.2 assembly |
|---|---|
| Contig count | 163,703 |
| Scaffold (scf) count | 50,738 |
| N50 scf [bp] | 29,308 |
| N50 scf sequence count | 3395 |
| N90 scf [bp] | 3051 |
| N90 scf sequence count | 21,872 |
| Minimum scf length [bp] | 883 |
| Maximum scf length [kb] | 619.8 |
| Illumina PE300 coverage [reads/bp] | 49.7 |
| GC content [%] | 37.4 |
| N base content in assembly [%] | 30.9 |
| Total Length [Mb] | 419.2 |
| Total Length without Ns [Mb] | 289.8 |
| GenomeScope PE genome size estimation [Mb] | 302.9 |
| CEGMA identified, count (% of 248) | 152 (61.3) |
| CEGMA full-length, count (% of 248) | 83 (33.5) |
| BUSCO2 euk identified, count (% of 303) | 211 (69.6) |
| BUSCO2 euk full-length, count (% of 303) | 156 (51.5) |
| BUSCO2 arthropod identified, count (% of 2675) | 1363 (50.9) |
| BUSCO2 arthropod full-length, count (% of 2675) | 842 (31.5) |
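The N50/N90 rows and the N base content summarise the scaffold length distribution and the gap fraction. Below is a minimal sketch of the usual definitions, not the authors' pipeline. It assumes that Nx is computed on scaffold lengths including gaps, and that the N content is (total length minus length without Ns) divided by the total length. The helper names `nx` and `n_content_percent` and the toy scaffold lengths are hypothetical.

```python
from typing import Sequence, Tuple


def nx(lengths: Sequence[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of the total length).

    Sequences are sorted from longest to shortest and their lengths accumulated;
    Nx is the length of the sequence at which the running total first reaches
    x% of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    # Unreachable: with x <= 100 the running total always reaches the threshold.
    raise RuntimeError("threshold not reached")


def n_content_percent(total_length: float, length_without_n: float) -> float:
    """Percentage of the assembly made of N (gap) bases."""
    return 100.0 * (total_length - length_without_n) / total_length


toy_scaffolds = [100, 80, 60, 40, 20]  # hypothetical lengths, not OSCA data
print(nx(toy_scaffolds, 50))  # -> (80, 2)
print(nx(toy_scaffolds, 90))  # -> (40, 4)
print(f"{n_content_percent(419.2, 289.8):.1f}")  # -> 30.9
```

Applied to the table values (419.2 Mb total, 289.8 Mb without Ns), the gap formula reproduces the reported 30.9% N content. The second element returned by `nx` corresponds to the "N50/N90 scf sequence count" rows.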
Number of repetitive elements found in the OSCA v1.2 draft genome assembly and the corresponding fraction of the genome.
| Repeat class | Element count | Total length [Mb] | Genome fraction [%] |
|---|---|---|---|
| LTR | 413,393 | 45.8 | 10.9 |
| LINE | 95,506 | 11.8 | 2.8 |
| SINE | 34,673 | 3.8 | 0.9 |
| DNA | 64,646 | 8 | 1.9 |
| Total | 608,218 | 69.4 | 16.6 |
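The last column of the repeat table appears to be each class's length divided by the 419.2 Mb assembly length. The sketch below shows this check. It treats the middle numeric column as total length in Mb; that reading is our inference, supported only because dividing it by 419.2 Mb reproduces the final column. `ASSEMBLY_MB`, `repeats` and `genome_percent` are hypothetical names.

```python
ASSEMBLY_MB = 419.2  # total OSCA v1.2 assembly length, Ns included (genome features table)

# Repeat class -> (element count, total length in Mb); copied from the table above.
# Assumption: the middle numeric column is a length in Mb.
repeats = {
    "LTR": (413_393, 45.8),
    "LINE": (95_506, 11.8),
    "SINE": (34_673, 3.8),
    "DNA": (64_646, 8.0),
}


def genome_percent(length_mb: float, assembly_mb: float = ASSEMBLY_MB) -> float:
    """Share of the assembly (in %) covered by a given length."""
    return 100.0 * length_mb / assembly_mb


total_count = sum(count for count, _ in repeats.values())
total_mb = sum(mb for _, mb in repeats.values())
for name, (count, mb) in repeats.items():
    print(f"{name}: {count:,} elements, {mb} Mb, {genome_percent(mb):.1f}% of assembly")
print(f"Total: {total_count:,} elements, {total_mb:.1f} Mb, {genome_percent(total_mb):.1f}%")
```

The rounded per-class percentages sum to 16.5, whereas the total computed from the summed length rounds to 16.6. The one-decimal difference in the Total row is therefore a rounding effect.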
Characteristics of the different transcriptome assemblies and the genes predicted from the genome.
| | HiSeq transcriptome | 454 transcriptome | MAKER genes |
|---|---|---|---|
| Raw read count | 325,008,948 | 322,504 | |
| Cleaned read count | 267,359,188 | 287,429 | |
| Mapped Reads | 198,962,467 | 145,588 | |
| Transcriptome size [Mb] | 49.2 | 10.4 | 21.6 |
| Coverage (mean read count per bp) | 339.2 | 9.8 | |
| Transcript count | 44,564 | 11,231 | 26,120 |
| Unigene count | 33,080 | 8892 | 26,120 |
| Mean CDS/transcript length [bp] | 1103 | 922 | 829 |
| Median transcript length [bp] | 591 | 693 | 498 |
| N50 transcript length [bp] | 2006 | 1036 | 1296 |
| N50 sequence count | 7061 | 2858 | 4475 |
| N90 transcript length [bp] | 429 | 485 | 353 |
| N90 sequence count | 26,722 | 8736 | 17,539 |
| Minimum length [bp] | 201 | 111 | 66 |
| Maximum length [bp] | 27,559 | 10,991 | 53,685 |
| CEGMA identified, count (% of 248) | 209 (84.3) | 56 (22.6) | |
| CEGMA full-length, count (% of 248) | 189 (76.2) | 22 (8.9) | |
| BUSCO2 euk identified, count (% of 303) | 268 (88.4) | 70 (23.1) | 226 (74.6) |
| BUSCO2 euk full-length, count (% of 303) | 256 (84.5) | 17 (5.6) | 162 (53.5) |
| BUSCO2 arthropod identified, count (% of 2675) | 1892 (70.7) | 265 (9.9) | 1453 (54.3) |
| BUSCO2 arthropod full-length, count (% of 2675) | 2109 (78.8) | 119 (4.4) | 923 (34.5) |
| Transcripts with predicted CDS, count (%) | 18,494 (41.5) | 4016 (35.8) | 26,120 (100) |
| Transcripts with full-length CDS, count (%) | 12,515 (28.1) | 1826 (16.3) | 26,120 (100) |
| Located on the OSCA v1.2 genome, count (%) | 11,010 (24.7) | 3331 (29.7) | 26,120 (100) |
| Split across two OSCA v1.2 scaffolds, count (%) | 11,721 (26.3) | 2350 (20.9) | 0 (0) |
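Each CEGMA and BUSCO2 cell pairs a gene count with its percentage of the reference set (248 CEGMA genes, 303 BUSCO2 eukaryote genes, 2675 BUSCO2 arthropod genes). The sketch below shows how such a cell can be recomputed. The `SET_SIZES` mapping and `completeness` helper are hypothetical names, and the one-decimal formatting simply mirrors the table.

```python
# Sizes of the reference gene sets, as stated in the table's row labels.
SET_SIZES = {"CEGMA": 248, "BUSCO2 euk": 303, "BUSCO2 arthropod": 2675}


def completeness(count: int, set_size: int) -> str:
    """Format a completeness cell as 'count (percent of set)', one decimal place."""
    if not 0 <= count <= set_size:
        raise ValueError("count must lie between 0 and the gene-set size")
    return f"{count} ({100 * count / set_size:.1f})"


# "Identified" counts for the HiSeq transcriptome, copied from the table above.
hiseq_identified = {"CEGMA": 209, "BUSCO2 euk": 268, "BUSCO2 arthropod": 1892}
for gene_set, count in hiseq_identified.items():
    print(gene_set, completeness(count, SET_SIZES[gene_set]))
```

The same helper recomputes the genome-assembly and MAKER columns, for example `completeness(162, 303)` for the MAKER full-length BUSCO2 eukaryote row.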
Fig. 1. Venn diagram showing all OrthoMCL ortholog groups among the MAKER-predicted O. scapulalis proteins (OSCA) and the proteomes of Spodoptera frugiperda (SFRU), Bombyx mori (BMOR), Danaus plexippus (DPLE) and Drosophila melanogaster (DMEL).
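A Venn diagram of ortholog groups assigns each group to exactly one region: the combination of species contributing at least one protein to it. The sketch below shows that counting step. It assumes a groups file with lines of the form `group_id: TAXON|protein TAXON|protein ...`, which is our understanding of the OrthoMCL groups output and should be checked against the actual files. `parse_groups`, `venn_regions` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def parse_groups(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each group id to the set of taxon codes among its members.

    Assumed line layout: 'group_id: TAXON|protein TAXON|protein ...'.
    """
    groups: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id] = {member.split("|", 1)[0] for member in members.split()}
    return groups


def venn_regions(groups: Dict[str, Set[str]]) -> Counter:
    """Count groups per exact species combination (one Venn region per combination)."""
    return Counter(frozenset(taxa) for taxa in groups.values())


sample = [  # hypothetical groups, not the published results
    "g1: OSCA|a SFRU|b BMOR|c DPLE|d DMEL|e",
    "g2: OSCA|f OSCA|g SFRU|h",
    "g3: OSCA|i",
]
for region, n in venn_regions(parse_groups(sample)).items():
    print(sorted(region), n)
```

Counting distinct taxa, not proteins, keeps a group containing several OSCA paralogs (such as `g2`) in a single region.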
Developmental stages and tissues of the 7 RNA extracts from F1 individuals obtained in the laboratory by rearing diapausing larvae collected in the field.
| Library | Stage | Tissue | Sample size | Raw read count |
|---|---|---|---|---|
| Lib1 | egg | whole egg | 3 egg masses ( | 58,941,438 |
| Lib2 | L5 | whole body | 9 | 54,020,124 |
| Lib3 | L5 | hemolymph | 31 | 43,406,048 |
| Lib4 | Female adult | Head/thorax | 4 | 43,465,884 |
| Lib5 | Female adult | Abdomen | 4 | 54,635,026 |
| Lib6 | Male adult | Head/thorax | 4 | 38,226,332 |
| Lib7 | Male adult | Abdomen | 4 | 32,314,096 |
| Total | | | | 325,008,948 |
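As a consistency check, the seven per-library read counts add up to the Total row, which also equals the raw read count of the HiSeq transcriptome. The short sketch below verifies this and prints each library's share of the sequencing effort. The dictionary name `library_reads` is hypothetical.

```python
# Raw read counts per RNA-seq library, copied from the table above.
library_reads = {
    "Lib1": 58_941_438,  # egg, whole egg
    "Lib2": 54_020_124,  # L5, whole body
    "Lib3": 43_406_048,  # L5, hemolymph
    "Lib4": 43_465_884,  # female adult, head/thorax
    "Lib5": 54_635_026,  # female adult, abdomen
    "Lib6": 38_226_332,  # male adult, head/thorax
    "Lib7": 32_314_096,  # male adult, abdomen
}

total_reads = sum(library_reads.values())
# The per-library counts add up to the Total row (and to the HiSeq raw read count).
assert total_reads == 325_008_948

for lib, reads in library_reads.items():
    print(f"{lib}: {reads:,} reads ({100 * reads / total_reads:.1f}% of total)")
```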
| Specification | Details |
|---|---|
| Subject area | Biology |
| More specific subject area | Lepidoptera, Genomics |
| Type of data | DNA and cDNA sequence reads, genome assembly and transcript assembly |
| How data was acquired | Shotgun whole genome and cDNA sequencing using Illumina HiSeq 2000 |
| Data format | Analyzed: |
| Experimental factors | Genome: total DNA extraction from wild-caught male larvae |
| | Transcriptome: total RNA extraction from various tissues and developmental stages of males and females |
| Experimental features | Genome: DNA sequencing |
| | Transcriptome: RNA sequencing of various tissues from eggs to adults, under controlled conditions |
| Data source location | Genome: Amiens, Picardie, France (49°54′0.01″N, 2°18′0″E) |
| | Transcriptome: Nadarzin, Poland (52°4′2.05″N, 20°47′33.00″E) |
| Data accessibility | All raw sequence reads are accessible via NCBI BioProject (id PRJNA390510). The OSCA v1.2 draft genome assembly, the reference transcriptome assembly and the automatic functional annotations can be found in the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/). |