Xin Jie Ching, Nazalan Najimudin, Yoke Kqueen Cheah, Clemente Michael Vui Ling Wong.
Abstract
Parageobacillus caldoxylosilyticus, previously classified as Geobacillus caldoxylosilyticus, is a thermophilic Gram-positive bacterium that tolerates growth temperatures ranging from 40 °C to 70 °C. Here, we present the first complete genome sequence of Parageobacillus caldoxylosilyticus ER4B, which was isolated from empty oil palm fruit bunch compost in Malaysia. Whole genome sequencing was performed using the PacBio RSII platform. The genome of strain ER4B is approximately 3.96 Mbp in size, with a GC content of 44.31%. The genome consists of two contigs: the larger contig (3,909,276 bp) represents the chromosome, while the smaller one (54,250 bp) represents the plasmid. A total of 4,164 genes were predicted, including 3,972 protein coding sequences, 26 rRNAs, 91 tRNAs, 74 miscRNAs, and 1 tmRNA. The genome sequence data of strain ER4B reported here add to the molecular information currently available for the species. They may also facilitate the discovery of molecular traits related to thermal stress, thus expanding our understanding of acclimation or adaptation to extreme temperatures in bacteria.
Keywords: Complete whole genome sequence; Parageobacillus caldoxylosilyticus; Thermal stress; Thermophile
Year: 2021 PMID: 35024395 PMCID: PMC8724967 DOI: 10.1016/j.dib.2021.107764
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Genomic features of P. caldoxylosilyticus ER4B.
| Features | Value |
|---|---|
| Contigs no. | 2 |
| Genome size (bp) | 3,963,526 |
| Chromosome size (bp) | 3,909,276 |
| Plasmid size (bp) | 54,250 |
| GC content (%) | 44.32 |
| Total number of genes | 4,164 |
| Protein coding sequences (CDS) | 3,972 |
| Genes with predicted function | 2,933 |
| Hypothetical genes | 1,039 |
| rRNA | 26 |
| tRNA | 91 |
| miscRNA | 74 |
| tmRNA | 1 |
C represents chromosome; P represents plasmid.
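The figures in the genomic features table can be cross-checked against each other with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the dictionary keys and the `check_features` helper are hypothetical names introduced for illustration, and the values are copied from the table.

```python
# Values copied from the genomic features table; key names are illustrative only.
features = {
    "chromosome_bp": 3_909_276,
    "plasmid_bp": 54_250,
    "genome_bp": 3_963_526,
    "total_genes": 4_164,
    "cds": 3_972,
    "cds_with_function": 2_933,
    "cds_hypothetical": 1_039,
    "rrna": 26,
    "trna": 91,
    "miscrna": 74,
    "tmrna": 1,
}


def check_features(f):
    """Return named consistency checks; True means the reported numbers agree."""
    rna_total = f["rrna"] + f["trna"] + f["miscrna"] + f["tmrna"]
    return {
        # Chromosome plus plasmid should equal the reported genome size.
        "contigs_sum_to_genome": f["chromosome_bp"] + f["plasmid_bp"] == f["genome_bp"],
        # CDS plus all RNA classes should equal the total gene count.
        "gene_classes_sum_to_total": f["cds"] + rna_total == f["total_genes"],
        # Annotated plus hypothetical genes should equal the CDS count.
        "cds_split_sums_to_cds": f["cds_with_function"] + f["cds_hypothetical"] == f["cds"],
    }


checks = check_features(features)
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All three checks hold for the values as reported in the table.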
Fig. 1. Genome map of ER4B constructed using DNAPlotter. From the outer track: the 1st track represents total annotated genes, the 2nd track forward CDS, the 3rd track reverse CDS, the 4th track rRNA, the 5th track tRNA, the 6th track miscellaneous RNA (miscRNA), the 7th track tmRNA, the 8th track the GC plot, and the last track the GC skew. The major tick mark interval was set at 1/10th of the overall genome size (396,352 bp), so 0 represents both the beginning and the end of the sequence.
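The GC plot, GC skew, and tick interval described in the Fig. 1 caption can be illustrated with a short sketch. This is an assumption-laden example rather than DNAPlotter's implementation: it uses the commonly cited skew definition (G − C)/(G + C), and the window and step sizes, the function names, and the toy sequence are all hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among the A/C/G/T bases in seq."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / total if total else 0.0


def gc_skew(seq):
    """GC skew, assumed here to be (G - C) / (G + C); 0.0 when no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding(seq, window, step, fn):
    """Apply fn to each full-length window of seq, moving by step bases."""
    return [fn(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def tick_interval(genome_bp, n_ticks=10):
    """Major tick spacing when the genome is split into n_ticks intervals."""
    return genome_bp // n_ticks


# Toy sequence for illustration only; a real run would read the ER4B contigs.
toy = "GGCCATGGGC"
print(sliding(toy, window=4, step=2, fn=gc_content))
print(sliding(toy, window=4, step=2, fn=gc_skew))
print(tick_interval(3_963_526))
```

With the reported genome size of 3,963,526 bp, integer division by 10 gives the 396,352 bp tick interval stated in the caption.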
Fig. 2. Whole genome phylogenetic tree constructed by PhyloSift, using the Maximum Likelihood method based on the Generalised Time-Reversible (GTR) model. The tree shows the relationship between P. caldoxylosilyticus ER4B and closely related species, with E. coli K-12 substr. MG1655 included as an outgroup.
Fig. 3. Functional distribution of genes within the P. caldoxylosilyticus ER4B genome, classified by clusters of orthologous groups (COG). COGs in the red box belong to the major category “information storage and processing”; the green box refers to “cellular processes and signaling”; the blue box refers to “metabolism”; and the yellow box refers to “poorly characterized”.
Number of gene copies for thermal stress related proteins in P. caldoxylosilyticus ER4B.
| Thermal stress related proteins | Number of copies |
|---|---|
| Cold shock protein CspB | 3 |
| Chaperone protein DnaJ | 1 |
| Chaperone protein DnaK | 1 |
| Chaperone protein ClpB | 1 |
| Heat shock protein 60 co-chaperone GroES | 1 |
| Heat shock protein 60 family chaperone GroEL | 1 |
| Heat shock protein GrpE | 1 |
| Heat-inducible transcription repressor HrcA | 1 |
| Small heat shock protein | 6 |
| General stress protein | 9 |
| Universal stress protein | 1 |
| Putative SOS response-associated peptidase YedK | 1 |
| Recombinase A (RecA) | 1 |
| LexA repressor | 1 |
| Catalases | 2 |
| Peroxiredoxin and Peroxidase | 7 |
| Superoxide dismutase | 3 |
| Thioredoxins | 9 |
Small heat shock proteins include HSP15, HSP18, HSP20, HSP31, and HSP33.
| Item | Description |
|---|---|
| Subject | Biology |
| Specific subject area | Microbiology and Genomics |
| Type of data | Table |
| How data were acquired | Whole genome sequence of |
| Data format | Raw and Analyzed |
| Parameters for data collection | A pure culture of strain ER4B was grown in Lennox Broth (LB) at its optimal growth temperature of 64 °C, and the genomic DNA was extracted when the culture reached mid-log phase. |
| Description of data collection | The genomic DNA was sequenced using PacBio RSII, while subsequent genome assembly and annotation were performed using Canu (v1.6) and Prokka (v1.12), respectively. |
| Data source location | |
| Data accessibility | The complete genome sequence of |