Maria Sorokina, Emanuel Barth, Mahnoor Zulfiqar, Michiel Kwantes, Georg Pohnert, Christoph Steinbeck.
Abstract
Diatoms (Bacillariophyceae) are a major constituent of the phytoplankton and have a universally recognized ecological importance. Between 1,000 and 1,300 diatom genera have been described in the literature, but only 10 nuclear genomes have been published and made publicly available to date. Skeletonema costatum is a cosmopolitan marine diatom, principally occurring in coastal regions, and is one of the most abundant members of the Skeletonema genus. Here we present a draft assembly of the Skeletonema cf. costatum RCC75 genome, obtained from PacBio and Illumina NovaSeq data. This dataset will expand knowledge of Bacillariophyceae genetics and contribute to the global understanding of the physiological, ecological, and environmental functioning of phytoplankton.
Keywords: Algal genome; Bacillariophyceae; Diatoms; Genome sequencing; Illumina sequencing; PacBio sequencing; Skeletonema costatum
Year: 2022 PMID: 35242913 PMCID: PMC8866145 DOI: 10.1016/j.dib.2022.107931
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Bright-field light microscopy image of an S. costatum RCC75 filament consisting of five cells. For the upper pair of cells, the connecting processes are indicated by triangles. Scale bar, 20 µm.
Genome assembly statistics from QUAST analysis.
| # contigs | 1,282 |
| # contigs (>= 1,000 bp) | 1,242 |
| # contigs (>= 50,000 bp) | 304 |
| Total length (bp) | 51,134,913 |
| Total length (>= 1,000 bp) | 51,104,503 |
| Total length (>= 5,000 bp) | 50,448,718 |
| Total length (>= 25,000 bp) | 43,834,615 |
| Total length (>= 50,000 bp) | 36,634,768 |
| Largest contig (bp) | 756,974 |
| N50 (bp) | 97,960 |
| N75 (bp) | 42,259 |
| L50 | 147 |
| L75 | 342 |
| GC (%) | 45.13 |
| # N's | 2,800 |
| # N's per 100 kbp | 5.48 |
| Predicted genes | |
| # predicted genes (unique) | 27,770 |
| # predicted genes (>= 0 bp) | 28,308 + 79 part |
| # predicted genes (>= 300 bp) | 24,999 + 75 part |
| # predicted genes (>= 1,500 bp) | 7,002 + 18 part |
| # predicted genes (>= 3,000 bp) | 1,487 + 6 part |
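
For readers less familiar with the contiguity metrics in the table above, the following is a minimal sketch of how N50/L50 and N75/L75 are conventionally defined: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly, while Lx is the number of contigs needed to get there. The function names and the toy contig lengths are illustrative assumptions; this is not the QUAST implementation and does not use the RCC75 contigs. The final helper only re-derives the "# N's per 100 kbp" value from the two totals reported in the table.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx: length of the contig at which the cumulative length of the
        longest contigs first reaches `fraction` of the total length.
    Lx: number of contigs needed to reach that point.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


def ns_per_100kbp(n_count, total_length):
    """Number of ambiguous bases (N) per 100,000 bp of assembly."""
    return n_count / total_length * 100_000


# Hypothetical toy assembly (total length 300), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_lengths, 0.5)    # 80 + 70 = 150 reaches 50% -> (70, 2)
n75, l75 = nx_lx(toy_lengths, 0.75)   # 80 + 70 + 50 + 40 = 240 reaches 75% -> (40, 4)

# Values taken from the table: 2,800 N's in 51,134,913 bp.
reported = round(ns_per_100kbp(2_800, 51_134_913), 2)  # 5.48
```

Read this way, the reported N50 of 97,960 bp with an L50 of 147 means that the 147 longest contigs, each at least 97,960 bp long, together cover at least half of the 51.1 Mbp assembly.
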
| Subject | Omics |
| Specific Subject Area | Genomics |
| Type of Data | Table, Raw data, genome sequences in Fasta format |
| How the data was acquired | The genome sequence was acquired using PacBio Sequel I and Illumina NovaSeq PE150 sequencing. |
| Data Format | Raw, analysed and filtered data |
| Description of Data Collection | The strain RCC75 was grown in a seawater medium for 10 days. The culture was then split into four samples, which were used for DNA extraction and sequencing. |
| Data Source Location | Institute: Roscoff Culture Collection |
| Data Accessibility | This Whole Genome Sequencing project has been deposited at DDBJ/ENA/GenBank under the accession number |