High quality genome sequence and description of Enterobacter mori strain 5-4, isolated from a mixture of formation water and crude-oil.

Fan Zhang1, Sanbao Su2, Gaoming Yu3, Beiwen Zheng4, Fuchang Shu2, Zhengliang Wang2, Tingsheng Xiang2, Hao Dong5, Zhongzhi Zhang5, DuJie Hou1, Yuehui She2.   

Abstract

Enterobacter mori strain 5-4 is a Gram-negative, motile, rod-shaped, facultatively anaerobic bacterium that was isolated from a mixture of formation water (also known as oil-reservoir water) and crude oil in the Karamay oilfield, China. To date, only one E. mori genome has been sequenced, and very little is known about how E. mori adapts to the petroleum reservoir. Here, we report the second E. mori genome sequence and its annotation, together with a description of the features of this organism. The 4,621,281 bp assembled genome has a G + C content of 56.24% and contains 4,317 protein-coding and 65 RNA genes, including 5 rRNA genes.

Keywords:  Enterobacter mori strain 5–4; Formation water; Genome; Hydrocarbon degradation

Year:  2015        PMID: 27408680      PMCID: PMC4940761          DOI: 10.1186/1944-3277-10-9

Source DB:  PubMed          Journal:  Stand Genomic Sci        ISSN: 1944-3277


Introduction

The genus Enterobacter was created by Hormaeche and Edwards in 1960 [1]. Members of the genus have been isolated mostly from the environment, in particular from plants, where they are recognized as notorious plant pathogens. They are also frequently isolated in hospitals, notably from healthcare-associated infections, and are recognized as opportunistic pathogens [2, 3]. Twenty-nine validly published species and two subspecies have previously been recorded in the genus Enterobacter. However, 17 of the validly named species have subsequently been reclassified as members of 11 other genera. As of October 2014, the genus contained only 10 species and two subspecies [4]. At that time, a total of 116 Enterobacter strains had been sequenced and 29 genome sequences had been published [5-12]; however, only one genome of E. mori, isolated from diseased mulberry roots, had been sequenced [13]. E. mori strain 5–4 is a Gram-negative, motile, rod-shaped, facultatively anaerobic bacterium isolated from a crude-oil well. It is worth noting that E. mori strain 5–4 is capable of degrading petroleum (Additional file 1). To elucidate the alkane degradation pathways and adaptation mechanisms of E. mori strain 5–4, whole-genome sequence analysis was conducted. Here, we present a summary classification and a set of features for E. mori strain 5–4, together with a description of the genomic sequencing and annotation.

Classification and features

A formation water sample was collected from Karamay Oilfield, Xinjiang, China, in 2012. The water sample was preserved at -80°C immediately after collection and sent to the lab. E. mori strain 5–4 was isolated after cultivation on LB agar medium at 37°C. The optimum temperature for growth is 35°C, with a temperature range of 4-45°C (Table 1). Growth occurs under aerobic conditions at pH 5.5-10.0, with an optimum at pH 7.0. Cell morphology was examined by scanning electron microscopy (Quanta 200, FEI Co., USA). Colonies are light yellow, smooth and circular with entire margins; cells are 0.3-0.8 μm in diameter and 0.6-1.8 μm long (Figure 1). The methyl red test is negative. H2S and indole are not produced. Casein and starch are not hydrolysed; gelatin is hydrolysed. Sorbitol, glycerol, tetradecane and hexadecane are utilized as carbon sources, while lactose, rhamnose, glucose, maltose, cellobiose, galactose, raffinose and sucrose are not utilized. Sodium nitrite and ammonium chloride are utilized, while sodium nitrate is not reduced. Antimicrobial susceptibility testing showed that the strain is susceptible to ampicillin, tetracycline, erythromycin and gentamicin, and resistant to kanamycin.

Table 1

Classification and general features of strain 5–4 according to the MIGS recommendations [14]

| MIGS ID | Property | Term | Evidence code a |
|---|---|---|---|
| | Classification | Domain Bacteria | TAS [15] |
| | | Phylum Proteobacteria | TAS [16] |
| | | Class Gammaproteobacteria | TAS [17, 18] |
| | | Order Enterobacteriales | TAS [19] |
| | | Family Enterobacteriaceae | TAS [20-22] |
| | | Genus Enterobacter | TAS [20, 23, 24] |
| | | Species Enterobacter mori | |
| | | Strain: Strain 5-4 | IDA |
| | Gram stain | Negative | IDA |
| | Cell shape | Rod | IDA |
| | Motility | Motile | IDA |
| | Sporulation | Non-sporulating | IDA |
| | Temperature range | 4-45°C | IDA |
| | Optimum temperature | 35°C | IDA |
| | pH range; Optimum | Unknown | IDA |
| | Carbon source | Sorbitol, glycerol, tetradecane and hexadecane | IDA |
| MIGS-6 | Habitat | Environment | IDA |
| MIGS-6.3 | Salinity | Growth in 0% ~ 7% NaCl | IDA |
| MIGS-22 | Oxygen requirement | Aerobic | IDA |
| MIGS-15 | Biotic relationship | Free living | IDA |
| MIGS-14 | Pathogenicity | Unknown | IDA |
| MIGS-4 | Geographic location | Karamay, China | IDA |
| MIGS-5 | Sample collection | 2012 | IDA |
| MIGS-4.1 | Latitude | 45°62’N | IDA |
| MIGS-4.2 | Longitude | 85°02’E | |
| MIGS-4.4 | Altitude | 460 m | IDA |

a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [25].

Figure 1

Scanning electron micrograph of cells of Enterobacter mori strain 5–4. Scale bar: 2.0 μm.

A comparative taxonomic analysis was conducted based on the 16S rRNA nucleotide sequence. The representative 16S rRNA nucleotide sequence of Enterobacter mori strain 5–4 was compared against the most recent release of the EzTaxon-e database [26]. CLUSTAL W was used to generate alignments with comparative sequences collected from the EzTaxon-e database [27]. The alignments were trimmed and converted to the MEGA 6.06 format before phylogenetic analysis. Phylogenetic inferences were made using the neighbor-joining method based on the Tamura-Nei model within MEGA 6.06 [28]. The phylogenetic tree indicated the taxonomic status of strain 5–4, which was clearly placed on the same branch as the E. mori type strain LMG 25706T (Figure 2).

Figure 2

Phylogenetic tree highlighting the position of E. mori 5–4 relative to other type strains within the genus Enterobacter. The strains and their corresponding GenBank accession numbers for 16S rRNA genes are shown following the organism names. Bootstrap consensus trees were inferred from 100 replicates; only bootstrap values > 50% are indicated. Xenorhabdus poinarii DSM 4768T was used as an outgroup. Scale bar: 0.0005 substitutions per nucleotide position.

Genome sequencing information

Genome project history

E. mori strain 5–4 was selected for whole-genome sequencing because of its potential relevance to microbial enhanced oil recovery (MEOR). The genome project is deposited in the Genomes OnLine Database, and the draft genome sequence, which consists of 36 contigs, is deposited in GenBank under accession JFHW00000000. A summary of the project information and its association with MIGS version 2.0 compliance is shown in Table 2 [14].

Table 2

Project information

| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | High-quality draft |
| MIGS-28 | Libraries used | One pair-end 450 bp library |
| MIGS-29 | Sequencing platforms | Illumina HiSeq 2000 |
| MIGS-31.2 | Fold coverage | 358.0 × (based on 450 bp library) |
| MIGS-30 | Assemblers | Velvet 1.2.07 |
| MIGS-32 | Gene calling method | Glimmer 3.0 |
| | Locus Tag | AA74 |
| | Genbank ID | JFHW00000000 |
| | Genbank Date of Release | April 2, 2014 |
| | GOLD ID | Gi0064796 |
| | BIOPROJECT | PRJNA224116 |
| | Project relevance | Industrial |
| MIGS-13 | Source Material Identifier | CGMCC9982 |

Growth conditions and DNA isolation

E. mori strain 5–4 was grown in Luria-Bertani broth. Cells in late-log-phase growth were harvested and lysed by EDTA, lysozyme, and detergent treatment, followed by proteinase K and RNase digestion. Genomic DNA was extracted using the DNeasy blood and tissue kit (Qiagen, Germany), according to the manufacturer’s recommended protocol. The quantity of DNA was measured with a NanoDrop spectrophotometer and a Qubit fluorometer. Then 10 μg of DNA was sent to BGI (Shenzhen, China) for sequencing on a HiSeq 2000 (Illumina, CA) sequencer.

Genome sequencing and assembly

Genomic DNA of E. mori strain 5–4 was sequenced using Solexa paired-end sequencing technology (HiSeq 2000 system, Illumina). One DNA library was generated (450 bp insert size, with Illumina adapters at both ends, checked with an Agilent DNA analyzer 2100), and sequencing was performed with a 2 × 100 bp paired-end strategy. A total of 6,652.30 Mbp of data was produced, and quality control was performed with the following criteria: 1) reads linked to adapters at both ends were considered sequencing artifacts and removed; 2) bases with a quality index lower than Q20 at both ends were trimmed; 3) reads with ambiguous bases (N) were removed; 4) single qualified reads were discarded (that is, reads that passed the filters but whose mates did not). The 687.39 M filtered clean reads were assembled into scaffolds using Velvet version 1.2.07 with the parameters “-scaffolds no” [29]. A PAGIT flow [30] was then used to extend the initial contigs and correct sequencing errors, yielding a set of improved scaffolds.
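
The four quality-control criteria can be illustrated with a minimal Python sketch. It is not the pipeline used for this genome: the adapter sequence, the Phred+33 quality encoding, the reading of criterion 1 as "adapter present at both ends of a read", and the rejection of reads trimmed to zero length are all assumptions made for illustration.

```python
from typing import Optional, Tuple

Read = Tuple[str, str]  # (bases, Phred+33 quality string); encoding is assumed

# Hypothetical adapter sequence, used only for illustration.
ADAPTER = "AGATCGGAAGAGC"


def trim_low_quality_ends(read: Read, min_q: int = 20) -> Read:
    """Criterion 2: trim bases with quality below min_q from both ends."""
    seq, qual = read
    scores = [ord(ch) - 33 for ch in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def clean_read(read: Read) -> Optional[Read]:
    """Apply criteria 1-3 to one read; return None if the read is rejected."""
    seq, _ = read
    # Criterion 1 (one possible reading): adapter found at both ends of the read.
    if seq.startswith(ADAPTER) and seq.endswith(ADAPTER):
        return None
    seq, qual = trim_low_quality_ends(read)
    # Criterion 3: reject reads that still contain ambiguous bases.
    # Rejecting reads trimmed to zero length is an added assumption.
    if not seq or "N" in seq.upper():
        return None
    return seq, qual


def filter_pair(read1: Read, read2: Read) -> Optional[Tuple[Read, Read]]:
    """Criterion 4: keep a pair only if both mates survive criteria 1-3."""
    cleaned1, cleaned2 = clean_read(read1), clean_read(read2)
    if cleaned1 is None or cleaned2 is None:
        return None
    return cleaned1, cleaned2
```

Applying `filter_pair` to each read pair keeps the two output files synchronized, which is why a qualified read is dropped when its mate fails.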

Genome annotation

Protein-coding genes were predicted using Glimmer version 3.0 [31]; tRNAscan-SE version 1.21 [32] was used to find tRNA genes, and ribosomal RNAs were found using RNAmmer version 1.2 [33]. To annotate the predicted genes, we used HMMER version 3.0 [34] to align them against Pfam version 27.0 [35] (Pfam-A only) and identify genes with conserved domains. The KAAS server [36] was used to assign translated amino acid sequences to KEGG Orthology [37] with the SBH (single-directional best hit) method. Translated genes were aligned against the COG database [38, 39] using NCBI blastp (hits were required to have a score of no less than 60 and an e-value of no more than 1e-6). To find genes with hypothetical or putative function, we aligned genes against the NCBI nucleotide sequence database (nt, downloaded on Sep 20, 2013) using NCBI blastn, accepting a hit only if its identity was no less than 0.95, its coverage was no less than 0.9, and the reference gene was annotated as putative or hypothetical. Genes with signal peptides were identified using SignalP version 4.1 [40] with default parameters. TMHMM 2.0 [41] was used to identify genes with transmembrane helices.
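
The hit-acceptance thresholds described above can be written down as a short Python sketch. The record types and field names are hypothetical, and treating the blastp "score" as a bit score is an assumption; only the numeric cutoffs come from the text.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record for one blastp hit against the COG database."""
    query: str
    subject: str
    score: float   # assumed to be the bit score
    evalue: float


@dataclass
class NucleotideHit:
    """Hypothetical record for one blastn hit against the nt database."""
    query: str
    description: str  # annotation of the reference gene
    identity: float   # fraction, 0-1
    coverage: float   # fraction, 0-1


def passes_cog_filter(hit: ProteinHit) -> bool:
    """Score no less than 60 and e-value no more than 1e-6."""
    return hit.score >= 60 and hit.evalue <= 1e-6


def is_putative_or_hypothetical(hit: NucleotideHit) -> bool:
    """Identity >= 0.95, coverage >= 0.9, reference annotated putative/hypothetical."""
    description = hit.description.lower()
    annotated = "putative" in description or "hypothetical" in description
    return hit.identity >= 0.95 and hit.coverage >= 0.9 and annotated
```

Keeping each cutoff in a small predicate makes it easy to see which of the conditions rejected a given hit.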

Genome properties

The draft genome sequence of E. mori strain 5–4 was assembled into 36 scaffolds, with an assembled genome size of 4,621,281 bp and a G + C content of 56.2% (N is 358,174 bp). These scaffolds contain 4,317 coding sequences (CDSs), 60 tRNAs (excluding 0 pseudo tRNAs) and incomplete rRNA operons (3 small subunit rRNAs and 2 large subunit rRNAs). A total of 980 protein-coding genes were annotated as having putative function or as hypothetical proteins, and 3,625 genes were categorized into COG functional groups (including putative or hypothetical genes). The properties and statistics of the genome are summarized in Table 3 and Table 4.

Table 3

Genome statistics

| Attribute | Value | % of total a |
|---|---|---|
| Genome size (bp) | 4,621,281 | 100.00 |
| DNA Coding region (bp) | 4,117,467 | 89.10 |
| DNA G + C content (bp) | 2,599,117 | 56.24 |
| DNA scaffolds | 36 | |
| Total genes | 4,322 | 100.00 |
| Protein-coding genes | 4,317 | 99.88 |
| RNA genes | 65 | 1.51 |
| Pseudo genes | 17 | 0.39 |
| Genes with function prediction | 980 | 22.67 |
| Genes assigned to COGs | 3,625 | 83.87 |
| Genes assigned to Pfam domains | 3,995 | 92.43 |
| Genes with signal peptides | 420 | 9.72 |
| Genes with transmembrane helices | 1,085 | 25.10 |
| CRISPR repeats | 1 | 0.023 |

a The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
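
As a quick arithmetic check, the base-pair percentages in Table 3 can be recomputed from the reported counts. This is only a sketch; the variable names are illustrative and the values are copied from the table.

```python
# Values copied from Table 3.
genome_size_bp = 4_621_281
coding_region_bp = 4_117_467
gc_bp = 2_599_117
total_genes = 4_322
protein_coding_genes = 4_317


def percent(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to two decimals."""
    return round(100.0 * part / total, 2)


gc_percent = percent(gc_bp, genome_size_bp)                  # 56.24
coding_percent = percent(coding_region_bp, genome_size_bp)   # 89.1
protein_coding_percent = percent(protein_coding_genes, total_genes)  # 99.88
```
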

Table 4

Number of genes associated with the general COG functional categories

| Code | Value | % age | Description |
|---|---|---|---|
| J | 202 | 4.68 | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.02 | RNA processing and modification |
| K | 400 | 9.27 | Transcription |
| L | 149 | 3.45 | Replication, recombination and repair |
| B | 1 | 0.02 | Chromatin structure and dynamics |
| D | 59 | 1.37 | Cell cycle control, mitosis and meiosis |
| V | 146 | 3.38 | Defense mechanisms |
| T | 228 | 5.28 | Signal transduction mechanisms |
| M | 266 | 6.16 | Cell wall/membrane biogenesis |
| N | 136 | 3.15 | Cell motility |
| U | 130 | 3.01 | Intracellular trafficking and secretion |
| O | 176 | 4.08 | Posttranslational modification, protein turnover, chaperones |
| C | 295 | 6.83 | Energy production and conversion |
| G | 499 | 11.56 | Carbohydrate transport and metabolism |
| E | 604 | 13.99 | Amino acid transport and metabolism |
| F | 94 | 2.18 | Nucleotide transport and metabolism |
| H | 230 | 5.33 | Coenzyme transport and metabolism |
| I | 120 | 2.78 | Lipid transport and metabolism |
| P | 421 | 9.75 | Inorganic ion transport and metabolism |
| Q | 134 | 3.10 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 720 | 16.68 | General function prediction only |
| S | 361 | 8.36 | Function unknown |
| - | 333 | 7.71 | Not in COGs |

The total is based on the total number of protein coding genes in the annotated genome.

Genome comparison

Genome alignment between E. mori 5–4 (JFHW00000000) and the E. mori type strain LMG 25706T (AEXB00000000) was performed using Mauve [42]. Orthologs were identified by a modified version of the method introduced by Lerat [43]. The alignment showed that some functional regions are highly homologous between the two assemblies. It also revealed some discrepancies: some short stretches of the LMG 25706T genome are absent from the contigs of 5–4 (Figure 3A). Nevertheless, two alkane 1-monooxygenases, one alkanesulfonate monooxygenase, one putative alkanesulfonate transporter, one putative sulfate permease and one alkanesulfonate transporter permease subunit were identified in the 5–4 genome. Alkane 1-monooxygenase is one of the key enzymes responsible for the aerobic transformation of n-alkanes [44]. Moreover, alkanesulfonate monooxygenase and alkanesulfonate transporter may be responsible for organosulfur compound degradation [45]. Comparison of the two strains revealed a large core genome (Figure 3B): they share 3,555 CDSs. In addition, 759 CDSs from the 5–4 genome and 1,097 CDSs from the LMG 25706T genome were classified as unique. Our genomic data will provide an excellent platform for further improvement of this organism for potential application in bioremediation.

Figure 3

Genome comparison between 5–4 and LMG 25706T. (A) Alignment is represented as local colinear blocks (colored) filled with a similarity plot. The height of the similarity plot indicates the nucleotide identity of both assemblies. (B) Numbers inside the Venn diagrams indicate the number of genes found to be shared among the indicated genomes.
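
The shared and unique CDS counts reported for the two genomes imply the following simple totals. This sketch assumes that every CDS in the comparison is counted exactly once, as either shared or unique; under that assumption the 5–4 total (4,314) is close to, but not identical to, the 4,317 CDSs listed in Table 3.

```python
# Counts taken from the genome comparison text.
shared_cds = 3555
unique_to_5_4 = 759
unique_to_lmg_25706 = 1097

# Per-genome totals implied by the two-set Venn diagram.
cds_in_5_4 = shared_cds + unique_to_5_4                # 4314
cds_in_lmg_25706 = shared_cds + unique_to_lmg_25706    # 4652

# Size of the union of both gene sets (a two-genome "pan-genome").
pan_genome_cds = shared_cds + unique_to_5_4 + unique_to_lmg_25706  # 5411

# Share of each genome's CDSs that belongs to the core genome, in percent.
core_share_5_4 = round(100.0 * shared_cds / cds_in_5_4, 1)
core_share_lmg_25706 = round(100.0 * shared_cds / cds_in_lmg_25706, 1)
```
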

Conclusions

Here, we report the second draft genome sequence and description of E. mori, from a strain isolated from a mixture of formation water and crude oil. The genome revealed two alkane 1-monooxygenases, one alkanesulfonate monooxygenase, one putative alkanesulfonate transporter, one putative sulfate permease and one alkanesulfonate transporter permease subunit. Our genomic data for strain 5-4 provide a pool of genes involved in hydrocarbon degradation and an excellent platform for further improvement of this organism for potential application in the bioremediation of oil-contaminated environments. Further comparative genomic study of strain 5-4 and other Enterobacter strains will give a better understanding of the evolution of environmental bacteria towards industrial application.

Additional file 1: Figure S1: Crude-oil and liquid paraffin degradation by E. mori 5–4. (A) Biodegradation of crude oil by E. mori 5–4 after 4 days of incubation; (B) Negative control of crude-oil degradation; (C) Biodegradation of liquid paraffin by E. mori 5–4 after 4 days of incubation; (D) Negative control of liquid paraffin degradation. (DOCX 2 MB)