
Draft genome sequence data of Gordonia hongkongensis strain EUFUS-Z928 isolated from the octocoral Eunicea fusca.

Jeysson Sánchez-Suárez1,2, Luis Díaz1,2, Javier Melo-Bolivar1, Luisa Villamil1.   

Abstract

Octocorals are among the most prolific sources of biologically active compounds. A significant part of their richness in specialized metabolites is linked to the abundance of their associated microbiota. Consequently, research on the bioprospecting potential of microorganisms associated with these marine invertebrates has gained much interest. Here, we describe the draft genome of Gordonia hongkongensis strain EUFUS-Z928 isolated from the octocoral Eunicea fusca. The genome was assembled de novo from short-read whole-genome sequencing data. Additionally, functional annotation of predicted genes was performed using the RAST tool kit, and genome mining for specialized metabolite biosynthetic gene clusters was performed using the antiSMASH v6.0 tool. The genome sequence data of G. hongkongensis EUFUS-Z928 can provide information for further analysis of the potential biotechnological use of this microorganism and guide the characterization of other related actinobacterial isolates. Likewise, this information increases the analytical capacity for studying the genus Gordonia.
© 2022 The Author(s).

Keywords:  Actinobacteria; Corynebacteriales; Marine actinomycete; Rare actinobacteria

Year:  2022        PMID: 35372653      PMCID: PMC8971566          DOI: 10.1016/j.dib.2022.108076

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The draft genome data of Gordonia hongkongensis strain EUFUS-Z928 provide valuable information for studying the evolution of the genus Gordonia and its biotechnological potential. These data are valuable for researchers in environmental and clinical microbiology, bioprospecting, and biotechnology. These data can be used for genome mining to discover novel metabolite biosynthesis pathways. Given the potential shown by Gordonia species in bioremediation, these data can support further comparative genomics work and a better understanding of the mechanisms involved in bioremediation processes.

Data Description

The strain EUFUS-Z928 was isolated from the octocoral Eunicea fusca collected in Santa Marta Bay, Colombia. Table 1 shows the results of the de novo and scaffolded genome assembly of the strain EUFUS-Z928. Scaffolding substantially improved the assembly by reducing the number of contigs by 76.23% and leaving an L50 and L75 of 1 (N50=5,295,384). The scaffolding was performed using as reference the genomes of the closest relatives according to the overall genome relatedness indices (OGRI) results obtained on the de novo assembly (Table S1: https://osf.io/q8xus/).
Table 1

Characteristics of the de novo assembly and scaffolded genome of strain EUFUS-Z928.

Features                   de novo Assembled Genome   Scaffolded Genome
Genome size (bp)           5,329,221                  5,333,421
Total number of contigs    122                        29
Largest contig (bp)        599,980                    5,295,384
N50 (bp)                   252,700                    5,295,384
N75 (bp)                   105,922                    5,295,384
L50                        7                          1
L75                        15                         1
GC (%)                     67.97                      67.96
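
The N50/L50 and N75/L75 statistics in Table 1 follow the usual definition: sort contigs from longest to shortest and find the contig at which the running total first reaches 50% (or 75%) of the assembly size. The sketch below illustrates that definition and the contig-reduction percentage quoted in the text. The function names and the toy contig lengths are hypothetical; the article obtained its values from the QUAST web server, not from this code.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for contig lengths, e.g. fraction=0.5 gives (N50, L50)."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # 'length' is Nx; 'count' is Lx (number of contigs needed)
            return length, count
    raise ValueError("no contigs given")


def percent_reduction(before, after):
    """Percentage decrease from 'before' to 'after'."""
    return 100.0 * (before - after) / before


# Hypothetical toy assembly (not the EUFUS-Z928 contigs)
toy_contigs = [50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_contigs, 0.5)
n75, l75 = nx_lx(toy_contigs, 0.75)

# Contig counts taken from Table 1: 122 (de novo) versus 29 (scaffolded)
contig_reduction = round(percent_reduction(122, 29), 2)
```

Because the largest scaffold (5,295,384 bp) alone exceeds 75% of the scaffolded genome size (5,333,421 bp), both L50 and L75 equal 1 and N50 and N75 equal the largest scaffold length, as reported in Table 1.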
The genome-based classification and identification found the strain EUFUS-Z928 to be closely related to Gordonia terrae and Gordonia lacunae type strains (Table 2, Fig. 1A). Phylogenetic analysis of the 16S rRNA gene also found a close relationship with Gordonia hongkongensis (Fig. 1B). Finally, phylogenetic analysis with the sequences of the genes coding for DNA gyrase subunit B (gyrB) and protein translocase subunit SecA1 (secA1) allowed classification of strain EUFUS-Z928 as G. hongkongensis (Fig. 1C and D, respectively). It is important to clarify that, at the time of the analysis, G. hongkongensis genomes were not available in the Type Strain Genome Server (TYGS); therefore, it was impossible to include them in the whole genome-based phylogram.
Table 2

Overall genome relatedness indices (OGRI) between EUFUS-Z928 and the closely related type strain genomes.

Strain                    dDDH(a) d0 (%)   dDDH(a) d4 (%)   dDDH(a) d6 (%)   G+C Δ(b) (%)   ANIb(c) (%)   ANIm(c) (%)
G. terrae NRRL B-16283    70.80            34.50            61.50            0.15           87.68         88.83
G. terrae NCTC 10669      70.80            34.50            61.40            0.15           87.70         88.81
G. terrae NBRC 100016     70.40            34.40            61.10            0.12           87.69         88.83
G. lacunae BS2            69.40            35.50            61.00            0.12           88.09         89.19

(a) Digital DNA–DNA hybridization (dDDH): formula d0 (length of all high-scoring segment pairs (HSPs) divided by total genome length), formula d4 (sum of all identities found in HSPs divided by overall HSP length), formula d6 (sum of all identities found in HSPs divided by total genome length).

(b) G+C content difference.

(c) Average nucleotide identity based on BLAST (ANIb) and MUMmer (ANIm).
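
To read Table 2 against species-level boundaries, the values can be compared with commonly cited cutoffs. The sketch below assumes a dDDH cutoff of 70% applied to formula d4 and an ANI cutoff of 95%. Both cutoffs, and the choice of d4 and ANIb as the deciding columns, are assumptions made for illustration and are not stated in the article. All names in the code are hypothetical.

```python
# Assumed species-level cutoffs (not stated in the source article)
DDDH_SPECIES_CUTOFF = 70.0  # percent, applied to formula d4
ANI_SPECIES_CUTOFF = 95.0   # percent, applied to ANIb

# Values copied from Table 2
table2 = {
    "G. terrae NRRL B-16283": {"d0": 70.80, "d4": 34.50, "d6": 61.50, "ANIb": 87.68, "ANIm": 88.83},
    "G. terrae NCTC 10669":   {"d0": 70.80, "d4": 34.50, "d6": 61.40, "ANIb": 87.70, "ANIm": 88.81},
    "G. terrae NBRC 100016":  {"d0": 70.40, "d4": 34.40, "d6": 61.10, "ANIb": 87.69, "ANIm": 88.83},
    "G. lacunae BS2":         {"d0": 69.40, "d4": 35.50, "d6": 61.00, "ANIb": 88.09, "ANIm": 89.19},
}


def meets_species_cutoffs(row):
    """True if both dDDH (d4) and ANIb reach the assumed cutoffs."""
    return row["d4"] >= DDDH_SPECIES_CUTOFF and row["ANIb"] >= ANI_SPECIES_CUTOFF


calls = {strain: meets_species_cutoffs(row) for strain, row in table2.items()}
closest_by_d4 = max(table2, key=lambda strain: table2[strain]["d4"])
```

Under these assumed cutoffs, none of the four type strains in Table 2 would be called conspecific with EUFUS-Z928, which is consistent with the single-gene phylogenies assigning the strain to G. hongkongensis rather than to G. terrae or G. lacunae.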

Fig. 1

Phylograms of strain EUFUS-Z928 based on (A) whole-genome sequences, (B) 16S rRNA gene sequences, (C) gyrB gene sequences and (D) secA1 gene sequences. The phylogenetic trees were drawn to scale, with branch lengths measured in the number of substitutions per site. The percentage of bootstrap replicates >50% (out of 100 for whole-genome and out of 1000 for single-gene trees) that supported each node are shown. Genome BLAST Distance Phylogeny approach was used for the whole-genome cladogram using the TYGS server. Single-gene phylogenetic trees were inferred by maximum likelihood with the IQ-TREE algorithm.

A total of 5042 genes were annotated in the genome of G. hongkongensis EUFUS-Z928 (the complete annotation data can be found in Table S2: https://osf.io/2ra3k/). Of these, 4987 corresponded to coding sequences (CDS), most of them (62.44%) with a functional assignment (Table 3). Additionally, analysis with the antiSMASH v6.0 tool identified 14 biosynthetic gene clusters (BGCs) (Table 3 and Fig. 2), among which the NRPS and Terpene types had more than one cluster (4 and 2, respectively).
Table 3

Annotation results of the G. hongkongensis EUFUS-Z928 genome.

Feature                                    Values
tRNA(a)                                    47
rRNA(a)                                    8
CDS(a)                                     4987
  Hypothetical proteins                    1873
  Proteins with functional assignments     3114
Proteins with GO assignments(a)            991
Proteins with Subsystem assignments(a)     1640
BGC(b)                                     14
  Arylpolyene                              1
  Ectoine                                  1
  NAPAA                                    1
  NRPS                                     4
  NRPS, Betalactone                        1
  NRPS, Siderophore                        1
  Redox-cofactor                           1
  RiPP-like                                1
  T1PKS, NRPS-like                         1
  Terpene                                  2

(a) According to the RAST tool kit using the PATRIC service center.

(b) According to the antiSMASH v6.0 tool.
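
The totals quoted in the text can be cross-checked against the rows of Table 3. The short sketch below does this arithmetic. It assumes that the reported gene total is the sum of tRNA, rRNA, and CDS counts, which matches the published numbers but is not stated explicitly in the article. Variable names are hypothetical.

```python
# Counts copied from Table 3
annotation = {"tRNA": 47, "rRNA": 8, "CDS": 4987,
              "hypothetical": 1873, "functional": 3114}

bgcs = {
    "Arylpolyene": 1, "Ectoine": 1, "NAPAA": 1, "NRPS": 4,
    "NRPS, Betalactone": 1, "NRPS, Siderophore": 1, "Redox-cofactor": 1,
    "RiPP-like": 1, "T1PKS, NRPS-like": 1, "Terpene": 2,
}

# Assumption: total genes = tRNA + rRNA + CDS
total_genes = annotation["tRNA"] + annotation["rRNA"] + annotation["CDS"]

# Share of CDS with a functional assignment, in percent
functional_pct = round(100 * annotation["functional"] / annotation["CDS"], 2)

# Total BGC count and the cluster types that occur more than once
bgc_total = sum(bgcs.values())
multi_cluster_types = {name: n for name, n in bgcs.items() if n > 1}
```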

Fig. 2

Circular genome view of G. hongkongensis EUFUS-Z928. The inner ring shows the length of the genome. The following two rings show the GC content and GC skew, respectively. The gray rings correspond to the CDSs annotated by the RAST tool kit in each DNA direction. The outer ring indicates the BGCs annotated by antiSMASH v6.0.

Regarding proteins assigned to subsystems (Fig. 3), Metabolism (45.61%), Protein Processing (14.76%), Energy (13.23%), and Stress Response, Defense, Virulence (8.72%) were the subsystems with the most assignments. In the latter, genes related to antibiotic resistance (n = 43) and arsenic resistance (n = 5) stand out, as do genes related to protection against oxidative stress, such as mycothiol (n = 10) and protection from reactive oxygen species (n = 3). Complete information on the 1640 genes assigned to subsystems is shown in Table S3 (https://osf.io/6j5zs/).
Fig. 3

Overview of the assignments to functional subsystems of the G. hongkongensis EUFUS-Z928 genome according to the PATRIC annotation service.

According to the List of Prokaryotic names with Standing in Nomenclature, 47 species of the genus Gordonia have been reported so far (https://lpsn.dsmz.de/genus/gordonia; consulted on 04/02/2022). Although several strains of Gordonia are opportunistic pathogens, their potential for bioremediation of polluted environments [1] makes them a valuable biological resource in several research areas. The whole-genome sequence and functional annotation data of G. hongkongensis EUFUS-Z928 provide valuable information to facilitate the design and execution of more in-depth studies such as comparative genomics and genome mining.

Experimental Design, Materials and Methods

Strain isolation and DNA extraction

Strain EUFUS-Z928 was isolated from a sample of the octocoral Eunicea fusca (collected by diving at Punta de Betín, 11°15′02.1″N 74°13′16.0″W, Santa Marta, Magdalena, Colombia). The isolation was carried out using a modified Zobell medium (1.25 g of yeast extract, 3.75 g of peptone, 18 g of NaCl, 2 g of MgCl2, 0.525 g of KCl, 0.075 g of CaCl2 and 15 g of agar dissolved in enough distilled water to make 1 L of solution) supplemented with nalidixic acid (50 μg/mL). Genomic DNA extraction was performed using the Quick-DNA Fungal/Bacterial Microprep kit (Zymo Research Corporation, Irvine, CA, USA) following the manufacturer's instructions. The quality of the extracted DNA was verified by agarose gel electrophoresis, and the DNA was quantified using the Qubit 1X dsDNA High Sensitivity kit (Invitrogen, Life Technologies, CA, USA).

Whole genome sequencing, assembly and annotation

Whole-genome sequencing was performed by Macrogen Inc. (Korea) using Illumina paired-end sequencing technology. Short-read (151 bp) libraries were prepared using the TruSeq Nano DNA Library Prep kit (Part # 15041110, Rev. D, Illumina, Inc., San Diego, CA, USA) and sequenced on the Illumina NovaSeq 6000 platform. The raw sequence reads were quality-filtered, trimmed, and assembled de novo with the Shovill pipeline v1.1.0 (default parameters) [2], which employed SPAdes as the assembler [3]. Contigs shorter than 200 bp were removed. To check the quality of the de novo assembly, genome completeness was analyzed with the BUSCO tool [4] (it reached 99.8%), and the ContEst16S algorithm [5] did not identify contamination in the assembled genome. Genome sequencing and assembly data are available from NCBI BioProject under accession PRJNA798903. Genome scaffolding was performed using the MeDuSa web server [6] with the reference genomes Gordonia sp. SGD-V-85 (RefSeq assembly accession: GCF_001456905) and Gordonia terrae (RefSeq assembly accession: GCF_901542405). These genomes were selected based on the OGRI results obtained with the de novo assembled genome. The de novo assembled and scaffolded genomes were compared using the QUAST web server [7]. Genome annotation was done with the RAST tool kit using the PATRIC service center [8]. To detect and characterize the content of specialized metabolite BGCs, we annotated the genome using the antiSMASH v6.0 tool [9]. A graphical circle map was generated on the CGView server to visualize the annotation results [10].
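
The article does not say how contigs shorter than 200 bp were removed. As an illustration only, the step can be sketched as a length filter over FASTA records. The function names and the toy sequences below are hypothetical and are not part of the authors' pipeline.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records, min_len=200):
    """Keep records whose sequence is at least min_len bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


# Hypothetical toy input: contig1 is 250 bp (split over two lines),
# contig2 is 199 bp, contig3 is exactly 200 bp
toy_fasta = [
    ">contig1", "A" * 150, "C" * 100,
    ">contig2", "G" * 199,
    ">contig3", "T" * 200,
]
kept = filter_min_length(parse_fasta(toy_fasta))
kept_names = [name for name, _ in kept]
```

With a real assembly, the same functions could be applied to an open file handle in place of the toy list.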

Phylogeny analysis

The analysis was conducted on both a genome-wide and a single-gene basis. The single-gene analyses included the 16S ribosomal RNA gene (well established for phylogenetic analysis of bacteria) and the secA1 and gyrB genes, which have also been used for Gordonia phylogenetic analysis and offer greater discriminatory power for identification at the species level [11]. The OGRI were calculated using the TYGS [12] and JSpeciesWS [13] web servers. The sequences of the 16S rRNA, secA1, and gyrB genes were retrieved from our annotated genome of G. hongkongensis EUFUS-Z928. Phylogenetic trees for single-gene analysis were estimated by the maximum likelihood method using the IQ-TREE tool [14] (bootstrap values were calculated from 1000 replicates). Phylograms were generated using MEGA v11.0.10 [15]. Whole-genome phylogeny was inferred using the Genome BLAST Distance Phylogeny approach in the TYGS server.

Ethics Statements

The samples used in this research were of Colombian origin, and they were obtained according to Amendment No. 5 to ARG Master Agreement No. 117 of 26 May 2015, granted by the Ministry of Environment and Sustainable Development, Colombia.

CRediT authorship contribution statement

Jeysson Sánchez-Suárez: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization. Luis Díaz: Project administration, Funding acquisition, Supervision, Writing – review & editing. Javier Melo-Bolivar: Software, Validation, Formal analysis, Data curation. Luisa Villamil: Project administration, Funding acquisition, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Specifications Table

Subject: Biological sciences
Specific subject area: Biotechnology, Microbiology: Bacteriology, Omics: Genomics
Type of data: Table; Figure; Draft genome sequence data
How the data were acquired: Whole-genome sequencing using Illumina NovaSeq 6000 platform for short reads
Data format: Raw; Analyzed
Description of data collection: Strain EUFUS-Z928 was isolated from the octocoral Eunicea fusca. High-quality DNA was extracted and sequenced using Illumina NovaSeq 6000 (short reads). Raw paired-end reads were de novo assembled following the Shovill pipeline. The assembly was scaffolded with the MEDUSA algorithm, and annotation was performed using PATRIC web resources. Detection of specialized metabolite biosynthesis gene clusters was conducted with the antiSMASH tool.
Data source location: Institution: Universidad de La Sabana; City/Town/Region: Chía, Cundinamarca; Country: Colombia; GPS coordinates for collected samples: 11°15′02.1″N 74°13′16.0″W
Data accessibility: Repository name: OSF; Data identification number: R4UZ8; Direct URL to data: https://osf.io/r4uz8/

  14 in total

1.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

2.  Phylogenetic studies of Gordonia species based on gyrB and secA1 gene analyses.

Authors:  Yingqian Kang; Kenjiro Takeda; Katsukiyo Yazawa; Yuzuru Mikami
Journal:  Mycopathologia       Date:  2008-09-10       Impact factor: 2.574

3.  MeDuSa: a multi-draft based scaffolder.

Authors:  Emanuele Bosi; Beatrice Donati; Marco Galardini; Sara Brunetti; Marie-France Sagot; Pietro Lió; Pierluigi Crescenzi; Renato Fani; Marco Fondi
Journal:  Bioinformatics       Date:  2015-03-25       Impact factor: 6.937

4.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

5.  ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences.

Authors:  Imchang Lee; Mauricio Chalita; Sung-Min Ha; Seong-In Na; Seok-Hwan Yoon; Jongsik Chun
Journal:  Int J Syst Evol Microbiol       Date:  2017-06-22       Impact factor: 2.747

6.  QUAST: quality assessment tool for genome assemblies.

Authors:  Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal:  Bioinformatics       Date:  2013-02-19       Impact factor: 6.937

7.  W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis.

Authors:  Jana Trifinopoulos; Lam-Tung Nguyen; Arndt von Haeseler; Bui Quang Minh
Journal:  Nucleic Acids Res       Date:  2016-04-15       Impact factor: 16.971

8.  The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.

Authors:  James J Davis; Alice R Wattam; Ramy K Aziz; Thomas Brettin; Ralph Butler; Rory M Butler; Philippe Chlenski; Neal Conrad; Allan Dickerman; Emily M Dietrich; Joseph L Gabbard; Svetlana Gerdes; Andrew Guard; Ronald W Kenyon; Dustin Machi; Chunhong Mao; Dan Murphy-Olson; Marcus Nguyen; Eric K Nordberg; Gary J Olsen; Robert D Olson; Jamie C Overbeek; Ross Overbeek; Bruce Parrello; Gordon D Pusch; Maulik Shukla; Chris Thomas; Margo VanOeffelen; Veronika Vonstein; Andrew S Warren; Fangfang Xia; Dawen Xie; Hyunseung Yoo; Rick Stevens
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

Review 9.  Harnessing the catabolic versatility of Gordonia species for detoxifying pollutants.

Authors:  Harshada Sowani; Mohan Kulkarni; Smita Zinjarde
Journal:  Biotechnol Adv       Date:  2019-02-13       Impact factor: 14.227

10.  TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy.

Authors:  Jan P Meier-Kolthoff; Markus Göker
Journal:  Nat Commun       Date:  2019-05-16       Impact factor: 14.919
