Grace Joy Wei Lie Chin, Salley Venda Law, Kenneth Francis Rodrigues, Jaeyres Jani, Ann Anton.
Abstract
The dataset comprises the whole-genome sequence of Ruegeria sp. PBVC088, a symbiotic Gram-negative bacterium associated with Pyrodinium bahamense var. compressum, a dinoflagellate linked to harmful algal blooms in the coastal waters of west Sabah, Malaysia. Harmful algal blooms cause economic losses for the aquaculture industry, as well as human illnesses and fatalities due to paralytic shellfish poisoning. Studies of bacteria-algae dynamics have posited that the interaction is potentially responsible for toxin production during a toxic harmful algal bloom event. Despite the expanding body of literature on the capabilities of these bacteria to metabolize, produce, and modify toxins, it has yet to be confirmed that they are capable of autonomous toxin synthesis. Saxitoxin, a paralytic shellfish poisoning toxin, is produced by a unique biosynthetic pathway whose genetic basis was first reported in the saxitoxin-producing cyanobacterial strain Cylindrospermopsis raciborskii T3 (NCBI accession no. DQ787200). The genes responsible for saxitoxin biosynthesis in dinoflagellates have yet to be fully elucidated. The identification of cyanobacterial saxitoxin biosynthesis genes (sxt) may eventually lead to the identification of homologous genes in dinoflagellates. Previous studies on the diversity of the bacterial communities associated with the same toxic P. bahamense alga have been carried out using both culture-dependent 16S ribosomal RNA gene sequence analysis and culture-independent 16S metagenomic sequence analysis. This study extends that knowledge to the genomic level by applying whole-genome sequencing to a bacterium isolated from P. bahamense. Here, we report the genome sequencing, de novo assembly, and annotation data of Ruegeria sp. PBVC088, a bacterium associated with the harmful alga P. bahamense, which researchers can reference to identify genes and pathways related to toxin biosynthesis from a much larger data set. The genome of Ruegeria sp. PBVC088 was sequenced on the Illumina MiSeq platform with 250 bp paired-end reads. The number of reads generated from the MiSeq sequencer was 1,135,484, with an estimated coverage of 100X. The estimated genome size for the marine bacterium was computed to be 5.78 Mb. Annotation of the genome predicted 5,689 gene sequences, which were assigned putative functions based on homology to existing protein sequences in public databases. In addition, genes related to the saxitoxin biosynthesis pathway were annotated. Raw fastq reads and the final version of the genome assembly have been deposited in the National Center for Biotechnology Information (NCBI) (BioProject: PRJNA324753, WGS: LZNT00000000, SRA: SRR3646181). The genome data provided here are expected to support a better understanding of the genetic processes involved in saxitoxin biosynthesis in marine bacteria associated with dinoflagellates.
Keywords: Bacteria association; Harmful algal bloom; Illumina MiSeq; Marine bacteria; Saxitoxin
Year: 2022 PMID: 35198665 PMCID: PMC8844759 DOI: 10.1016/j.dib.2022.107881
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Vital statistics of the draft genome for Ruegeria sp. bacterium PBVC088.
| Statistic | Value |
|---|---|
| Raw data | 1,135,484 (542.5 M bases) |
| Number of contigs | 143 |
| Average length of contigs | 40,437 bp |
| Smallest contig | 507 bp |
| Largest contig | 403,173 bp |
| Contig N50 | 149,793 bp |
| GC content | 64.96% |
| Total genes | 5,689 |
| Total CDS | 5,632 |
| Number of coding genes | 5,435 |
| Number of coding CDS | 5,435 |
| Number of tRNA | 49 |
| Number of rRNA | 5 |
| Number of ncRNA | 3 |
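The figures in this table can be cross-checked against each other with simple arithmetic. The sketch below is only an illustration: the variable names are made up for this example, the mean contig length is a rounded value, and the interpretation of the raw-data count as read pairs is an assumption rather than something the source states.

```python
# Values transcribed from the table above.
raw_entries = 1_135_484      # "Raw data" count
raw_bases = 542.5e6          # 542.5 M bases
n_contigs = 143
mean_contig_bp = 40_437      # rounded in the source, so products are approximate
total_genes = 5_689
total_cds = 5_632
n_trna, n_rrna, n_ncrna = 49, 5, 3

# Assembly size: number of contigs times the mean contig length.
assembly_bp = n_contigs * mean_contig_bp

# Sequencing depth: sequenced bases divided by assembly size.
depth = raw_bases / assembly_bp

# Mean bases per counted entry. A value near 2 x 250 would suggest the count
# refers to read pairs; that reading is an assumption, not stated in the source.
bases_per_entry = raw_bases / raw_entries

# The gene total should equal CDS plus RNA genes.
rna_genes = n_trna + n_rrna + n_ncrna

print(f"assembly size : {assembly_bp / 1e6:.2f} Mb")
print(f"depth         : {depth:.0f}X")
print(f"bases/entry   : {bases_per_entry:.0f}")
print(f"CDS + RNA     : {total_cds + rna_genes} (table: {total_genes})")
```

With these inputs the assembly size comes to about 5.78 Mb, in line with the reported genome size, the CDS and RNA counts sum to the reported 5,689 genes, and the depth is roughly 94X, somewhat below the rounded 100X estimate.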
Fig. 1. Output summary of the functional distribution of protein-coding genes in PBVC088 from the RAST server.
BLASTP sequence similarity search of the genome against the 26 putative sxt genes of the STX-producing cyanobacterium C. raciborskii T3, with their accession numbers.
| Genes | Candidate genes | Annotated genes | E-value | Similarity percentage (%) | Alignment length (base) | Putative sxt genes in C. raciborskii T3 | Accession number |
|---|---|---|---|---|---|---|---|
| | DTG_03375 | 5-aminolevulinate synthase | 7.00 E−32 | 29.33 | 358 | Polyketide synthase | ABI75094 |
| | DTG_00776 | Cytidine deaminase | 0.002 | 37.36 | 91 | Cytidine deaminase | ABI175093 |
| | Unidentified | | | | | | |
| | DTG_04183 | Multidrug-efflux transporter | 1.00 E−37 | 28.71 | 404 | Sodium-driven multidrug and toxic compound extrusion protein | ABI75096 |
| | | | 6.00 E−40 | 31.31 | 444 | | ABI75103 |
| | Unidentified | | | | | | |
| | DTG_05121 | 3-ketosteroid-9-alpha-hydroxylase oxygenase subunit | 1.00 E−33 | 27.35 | 340 | Phenylpropionate dioxygenase | ABI75098 |
| | | | 7.00 E−31 | 26.84 | 339 | | ABI75109 |
| | Unidentified | | | | | | |
| | DTG_00920 | Ectoine hydroxylase | 3.00 E−06 | 22.10 | 181 | Phytanoyl-CoA dioxygenase | ABI75110 |
| | DTG_01373 | Sorbitol dehydrogenase | 6.00 E−42 | 41.54 | 195 | Short-chain alcohol dehydrogenase | ABI75108 |
| | DTG_02991 | L-aspartate oxidase | 3.00 E−17 | 23.81 | 546 | FAD-dependent succinate dehydrogenase/fumarate reductase | ABI75107 |
| | DTG_02930 | Formate hydrogenlyase complex | 5.00 E−06 | 31.03 | 58 | Ferredoxin | ABI75106 |
| | Unidentified | | | | | | |
| | DTG_02593 | Phosphate regulon sensor protein phoR | 7.00 E−33 | 27.32 | 355 | Histidine kinase | ABI75118 |
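Hits like those in the table above are commonly screened by E-value before closer inspection. The following is a minimal sketch of such a screen, not the authors' procedure: the `Hit` type, the `significant_hits` helper, and the 1e-5 cutoff are assumptions introduced for illustration, and the two rows that list only an E-value and accession are assumed to belong to the candidate gene in the row above them.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    candidate: str     # locus tag of the PBVC088 candidate gene
    evalue: float
    similarity: float  # percent
    length: int        # alignment length
    accession: str     # C. raciborskii T3 sxt protein accession, as printed


# Rows transcribed from the table above.
HITS = [
    Hit("DTG_03375", 7e-32, 29.33, 358, "ABI75094"),
    Hit("DTG_00776", 2e-3, 37.36, 91, "ABI175093"),
    Hit("DTG_04183", 1e-37, 28.71, 404, "ABI75096"),
    Hit("DTG_04183", 6e-40, 31.31, 444, "ABI75103"),  # assumed same candidate
    Hit("DTG_05121", 1e-33, 27.35, 340, "ABI75098"),
    Hit("DTG_05121", 7e-31, 26.84, 339, "ABI75109"),  # assumed same candidate
    Hit("DTG_00920", 3e-6, 22.10, 181, "ABI75110"),
    Hit("DTG_01373", 6e-42, 41.54, 195, "ABI75108"),
    Hit("DTG_02991", 3e-17, 23.81, 546, "ABI75107"),
    Hit("DTG_02930", 5e-6, 31.03, 58, "ABI75106"),
    Hit("DTG_02593", 7e-33, 27.32, 355, "ABI75118"),
]

E_VALUE_CUTOFF = 1e-5  # assumed threshold, not stated in the source


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return hits at or below the cutoff, smallest E-value first."""
    return sorted((h for h in hits if h.evalue <= cutoff), key=lambda h: h.evalue)


strong = significant_hits(HITS)
for h in strong:
    print(f"{h.candidate}\t{h.accession}\t{h.evalue:.0e}\t{h.similarity:.2f}%")
```

With the assumed cutoff, only the cytidine deaminase hit (E-value 0.002) is dropped. A stricter cutoff would also remove the short ferredoxin and ectoine hydroxylase alignments.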
Conserved domains identified in putative candidate genes by BLAST search and CD-search web service.
| Genes | Candidate genes | Conserved domain |
|---|---|---|
| | DTG_03375 | cd06454, KBL_like: pyridoxal phosphate (PLP)-dependent aspartate aminotransferase superfamily (fold I); pfam00155, Aminotransferase class I and II |
| | DTG_00776 | cd01283, Cytidine deaminase |
| | DTG_04183 | cd13131, multidrug and toxic compound extrusion (MatE)-like protein domain |
| | DTG_05121 | COG4638, Phenylpropionate dioxygenase |
| | DTG_00920 | COG5285, Phytanoyl-CoA dioxygenase (PhyH) |
| | DTG_01373 | Rossmann-fold NAD(P)(+)-binding proteins |
| | DTG_02991 | pfam02910, Succinate dehydrogenase/fumarate reductase flavoprotein C-terminal domain |
| | DTG_02930 | pfam13534, 4Fe-4S ferredoxins-type, iron sulphur binding domain |
| | DTG_02593 | cd00075, Histidine kinase-like ATPases |
Fig. 2. SNP phylogenetic tree constructed by kSNP analysis, using a parsimony algorithm on the core SNP matrix inferred from the closed genomes of Roseobacter clade members in the NCBI database and the draft genome of Ruegeria sp. PBVC088 (highlighted in the red box). Bootstrap values (100 replicates) are reported above each node.
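The notion of a core SNP matrix can be illustrated with a toy sketch. This is not kSNP itself, only a simplified version of the k-mer idea it builds on: a locus is identified by the bases flanking the centre of a k-mer, and it counts as a core SNP when it occurs in every genome but the central base differs between genomes. The function names and the short sequences are invented for illustration, and reverse complements are ignored.

```python
from collections import defaultdict


def kmer_loci(seq, k):
    """Map each k-mer's flanks (central base masked) to the central bases seen."""
    mid = k // 2
    loci = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        loci[kmer[:mid] + "." + kmer[mid + 1:]].add(kmer[mid])
    return loci


def core_snps(genomes, k=5):
    """Return {locus: {genome: allele}} for loci shared by all genomes that vary."""
    per_genome = {name: kmer_loci(seq, k) for name, seq in genomes.items()}
    # Core loci: flanks that occur in every genome.
    shared = set.intersection(*(set(loci) for loci in per_genome.values()))
    matrix = {}
    for locus in sorted(shared):
        # Skip loci that are ambiguous within a single genome.
        if any(len(per_genome[name][locus]) != 1 for name in genomes):
            continue
        alleles = {name: next(iter(per_genome[name][locus])) for name in genomes}
        if len(set(alleles.values())) > 1:  # keep variable positions only
            matrix[locus] = alleles
    return matrix


# Invented toy sequences: B differs from A and C at a single position.
toy = {
    "A": "TTACGGATCCA",
    "B": "TTACGTATCCA",
    "C": "TTACGGATCCA",
}
snp_matrix = core_snps(toy, k=5)
print(snp_matrix)
```

Each row of such a matrix gives one character per genome, which is the kind of input a parsimony method scores when comparing candidate tree topologies.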
| Subject | Biological sciences |
| Specific subject area | Biotechnology, Marine Biology and Molecular Biology |
| Type of data | Tables and figures |
| How data were acquired | Whole-genome sequencing was conducted on the Illumina MiSeq platform in paired-end mode. |
| Data format | Raw sequencing data and analyzed data |
| Description of data collection | Total genomic DNA was extracted using the DNeasy Blood and Tissue DNA Isolation Kit following the manufacturer's instructions. The gDNA library was subsequently prepared with the Illumina Nextera XT Library Preparation Kit following the manufacturer's instructions. Paired-end sequencing of the constructed library was performed on an Illumina MiSeq (2 × 250 bp run configuration) at the Biotechnology Research Institute of Universiti Malaysia Sabah. |
| Data source location | The harmful algal bloom seawater samples were collected at Sepanggar Bay, Sabah, Malaysia (6.08° N, 116.12° E). The isolation of the bacterium was performed at the Biotechnology Research Institute of Universiti Malaysia Sabah. |
| Data accessibility | The raw sequencing data is available at BioProject, BioSample and SRA, NCBI at |