
Genome sequence of the soil bacterium Corynebacterium callunae type strain DSM 20147(T).

Marcus Persicke1, Andreas Albersmeier1, Hanna Bednarz2, Karsten Niehaus2, Jörn Kalinowski1, Christian Rückert1.   

Abstract

Corynebacterium callunae DSM 20147(T) is a member of the genus Corynebacterium, which contains Gram-positive, non-spore-forming bacteria with a high G + C content. C. callunae was isolated during a screening for l-glutamic acid producing bacteria and belongs to the aerobic and non-haemolytic corynebacteria. As this is a type strain in a subgroup of industrially relevant bacteria, for many of which complete genome sequences are also available, knowledge of its complete genome sequence might enable genome comparisons to identify production-relevant genetic loci. This project, describing the 2.84 Mbp long chromosome and the two plasmids, pCC1 (4.11 kbp) and pCC2 (85.02 kbp), with their 2,647 protein-coding and 82 RNA genes, will aid the Genomic Encyclopedia of Bacteria and Archaea project.

Keywords:  Aerobic; Glutamic acid producing; Gram-positive; Non-motile; Non-spore forming

Year:  2015        PMID: 26203323      PMCID: PMC4510995          DOI: 10.1186/1944-3277-10-5

Source DB:  PubMed          Journal:  Stand Genomic Sci        ISSN: 1944-3277


Introduction

Strain DSM 20147T is the type strain in a subgroup of industrially relevant bacteria. It was originally isolated during a screening for l-glutamic acid producing microorganisms and was classified as a member of the genus Corynebacterium [1]. This genus comprises Gram-positive bacteria with a high G + C content. It currently contains 126 validly published members (species and subspecies), 4 of which are synonyms of other species within the genus, 27 that were later reclassified as members of 7 other genera, and 1 member abolished in erratum [2-11]. The remaining 93 were isolated from diverse backgrounds such as soil, sea, or ripening cheese, but also from human clinical samples and animals. Within this diverse genus, C. callunae has been found to be a producer of l-glutamic acid, like one of the most prominent representatives of the corynebacteria, C. glutamicum [1]. The biological context of this species is, unfortunately, basically unknown, as it was first described in a patent application [1] that mentions neither the geographic location nor the exact habitat of the strain. Based on the name and the habitats of its close relatives C. glutamicum, C. deserti, and C. efficiens, the most likely habitat of C. callunae is soil around heather plants. But while the biotechnological uses and capabilities of this subgroup within the genus Corynebacterium have been studied in detail, especially for C. glutamicum, the ability of all these strains to secrete considerable amounts of l-glutamic acid is still not well understood in the context of the environment. DSM 20147T harbors two cryptic plasmids: pCC1 (4,109 bp), which encodes a Rep protein that shows similarity to those of the corynebacterial plasmids pAG3 and pBL1, and pCC2 (85,023 bp), the Rep protein of which has possible orthologs in many other corynebacteria. Aside from this, DSM 20147T is an alkaline-tolerant bacterium, which grows well at pH 5.0-9.0 (optimum pH 6-8) [1]. Here we present a summary classification and a set of features for DSM 20147T, together with the description of the genomic sequencing and annotation.

Organism information

Classification and features

A representative genomic 16S rRNA sequence of DSM 20147T was compared to the Ribosomal Database Project database [12], confirming the initial taxonomic classification. C. callunae shows highest similarity to two other species (97% each). Figure 1 shows the phylogenetic neighborhood of C. callunae in a 16S rRNA based tree. C. callunae forms a subgroup that also contains the species C. glutamicum ATCC 13032T, C. deserti GIMN1.010T, and C. efficiens YS-314T.
Figure 1

Phylogenetic tree highlighting the position of C. callunae relative to type strains of other species within the genus Corynebacterium. Species with at least one publicly available genome sequence (not necessarily the type strain) are highlighted in bold face. The tree is based on sequences aligned by the RDP aligner and uses the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions without alignment inserts, using a minimum comparable position of 200. The tree is built with RDP Tree Builder, which uses Weighbor [13] with an alphabet size of 4 and length size of 1000. The building of the tree also involves a bootstrapping process repeated 100 times to generate a majority consensus tree [14]. Rhodococcus equi (X80614) was used as an outgroup.
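The Jukes-Cantor correction named in the caption converts the observed fraction of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -3/4 ln(1 - 4p/3). The following is a minimal sketch of that formula only, not the RDP implementation; the function name, the handling of gaps, and the toy sequences are assumptions made for illustration, with the threshold of 200 comparable positions taken from the caption.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str, min_positions: int = 200) -> float:
    """Jukes-Cantor corrected distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns in which both sequences carry an unambiguous base
    # (gaps and ambiguity codes are skipped; this handling is an assumption).
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if len(pairs) < min_positions:
        raise ValueError("too few comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy data: 400 aligned positions with 12 substitutions (p = 0.03).
reference = "ACGT" * 100
variant = list(reference)
for i in range(0, 48, 4):
    variant[i] = "C"
variant = "".join(variant)

print(f"corrected distance: {jukes_cantor_distance(reference, variant):.4f}")
```

For small p the corrected distance stays close to p itself; the correction only becomes substantial for strongly diverged sequences.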

DSM 20147T is a Gram-positive, rod-shaped bacterium, which is 1–2 μm long and 0.4-0.6 μm wide (Figure 2). It is described as non-motile [1], which coincides with a complete lack of genes associated with ‘cell motility’ (functional category N in COGs table). Growth of DSM 20147T was shown at temperatures between 25–37°C, with optimal l-glutamic acid production between 25–35°C [1]. Carbon sources utilized by strain DSM 20147T include dextrose, fructose, galactose, inulin, inositol, maltose, mannitol, mannose, raffinose, salicin, sucrose and trehalose [1]. DSM 20147T tested positive for citrate, catalase and urease, but shows no nitrate reduction activity [1].

Details on the chemotaxonomy are largely missing, but can be inferred from the close relatives C. glutamicum, C. deserti, and C. efficiens [3]. Based on these relatives, meso-diaminopimelic acid is expected to be the major diamino acid of the cell wall, with arabinose and galactose as the main sugars (chemotype IV). Short-chain mycolic acids (32-36 carbon atoms) are almost certainly present as well, since all necessary genes were found. The major cellular fatty acids are expected to be hexadecanoic acid (C16:0, 40-50%) and octadecenoic acid (C18:1ω9c, 40-50%), with small amounts of octadecanoic acid (C18:0, ~1%) and possibly others. MK-9(H2) is thought to be the major menaquinone, although MK-8(H2) might also be present in significant amounts. Phosphatidylinositol, diphosphatidylglycerol, and phosphatidylglycerol as well as their glycosides are expected to be the main components of the polar lipids (Table 1).
Figure 2

Scanning electron micrograph of C. callunae DSM 20147T.

Table 1

Classification and general features of C. callunae DSM 20147T according to the MIGS recommendations [15]

MIGS ID | Property | Term | Evidence code a)
 | Current classification | Domain Bacteria | TAS [16]
 | | Phylum ‘Actinobacteria’ | TAS [17]
 | | Class Actinobacteria | TAS [18,19]
 | | Order Actinomycetales | TAS [18,20-22]
 | | Family Corynebacteriaceae | TAS [18,20,22,23]
 | | Genus Corynebacterium | TAS [24,25]
 | | Species Corynebacterium callunae | TAS [1,22,26]
 | | Type-strain DSM 20147 | TAS [1,22,26]
 | Gram stain | Positive | TAS [1]
 | Cell shape | Rod-shaped | TAS [1]
 | Motility | Non-motile | TAS [1]
 | Sporulation | Non-sporulating | TAS [1]
 | Temperature range | Mesophile | TAS [1]
 | pH range | 5 - 9; optimum 6 - 8 | TAS [1]
 | Salinity | Not reported | TAS [1]
MIGS-22 | Oxygen requirement | Aerobe | TAS [1]
 | Carbon source | Dextrose, fructose, galactose, inulin, inositol, maltose, mannitol, mannose, raffinose, salicin, sucrose and trehalose | TAS [1]
 | Energy metabolism | Chemoorganoheterotrophic | NAS
 | Terminal electron acceptor | Oxygen | NAS
MIGS-6 | Habitat | Not reported | TAS [1]
MIGS-15 | Biotic relationship | Free living | NAS
MIGS-14 | Pathogenicity | Non-pathogenic | NAS
 | Biosafety level | 1 | NAS
MIGS-23.1 | Isolation | Not reported | TAS [1]
MIGS-4 | Geographic location | Not reported | TAS [1]
MIGS-5 | Sample collection time | Not reported | TAS [1]

a)Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [27].


Genome sequencing and annotation

Genome project history

Due to its phylogenetic position in the near neighborhood of industrially relevant species of the genus Corynebacterium, C. callunae was selected for sequencing as part of a project to define production-relevant loci in corynebacteria. While not part of the GEBA project, sequencing of the type strain will nonetheless aid the GEBA effort. The genome project is deposited in the Genomes OnLine Database [28] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed at the CeBiTec. A summary of the project information is shown in Table 2.
Table 2

Genome sequencing project information

MIGS ID | Property | Term
MIGS-31 | Finishing quality | Finished
MIGS-28 | Libraries used | Nextera DNA Sample Prep Kit, Nextera Mate Pair Sample Prep Kit
MIGS-29 | Sequencing platforms | Illumina MiSeq
MIGS-31.2 | Sequencing coverage | 99.51×
MIGS-30 | Assemblers | Newbler version 2.8
MIGS-32 | Gene calling method | GeneMark, Glimmer
 | Locus Tag | H924
 | GenBank ID | CP004354, CP004355, CP004356
 | GenBank Date of Release | March 6, 2013
 | GOLD ID | Gc0042965
 | BIOPROJECT ID | 190670
 | Project relevance | Industrial, GEBA
MIGS-13 | Source material identifier | DSM 20147

Growth conditions and DNA isolation

DSM 20147T was grown aerobically in CASO bouillon (Carl Roth GmbH, Karlsruhe, Germany) at 30°C. DNA was isolated from ~10^8 cells using the protocol described by Tauch et al. [29].

Genome sequencing and assembly

Two libraries were prepared: a WGS library using the Illumina-Compatible Nextera DNA Sample Prep Kit (Epicentre, WI, U.S.A.) and a 6 k MatePair library using the Nextera Mate Pair Sample Preparation Kit, both according to the manufacturer's protocol. Both libraries were sequenced in a 2 × 250 bp paired read run on the MiSeq platform, yielding 1,747,266 total reads and providing 99.51× coverage of the genome. Reads were assembled using the Newbler assembler v2.8 (Roche). The initial Newbler assembly consisted of 29 contigs in four scaffolds. Analysis of the four scaffolds revealed two to be extrachromosomal elements (plasmids pCC1 and pCC2), one to make up the chromosome, and the remaining one to contain the seven copies of the RRN operon. The Phred/Phrap/Consed software package [30-33] was used for sequence assembly and quality assessment in the subsequent finishing process. Gaps between contigs were closed by manual editing in Consed (for repetitive elements).
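The relation between read count, read length and coverage can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text (total reads, nominal read length, genome size, reported coverage); the function names are illustrative. It shows that full-length 250 bp reads would give a higher depth than the reported 99.51×, so the reported value corresponds to a shorter mean usable read length. The reason for that difference (for example trimming or filtering of reads) is not stated in the text and is left open here.

```python
GENOME_SIZE_BP = 2_928_683   # chromosome + pCC1 + pCC2, from the text
TOTAL_READS = 1_747_266      # from the text
NOMINAL_READ_LENGTH = 250    # 2 x 250 bp run, from the text
REPORTED_COVERAGE = 99.51    # from the text


def mean_coverage(total_bases: float, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def implied_read_length(coverage: float, genome_size: int, reads: int) -> float:
    """Mean usable read length that would produce the given coverage."""
    return coverage * genome_size / reads


upper_bound = mean_coverage(TOTAL_READS * NOMINAL_READ_LENGTH, GENOME_SIZE_BP)
implied_length = implied_read_length(REPORTED_COVERAGE, GENOME_SIZE_BP, TOTAL_READS)

print(f"coverage if every read were 250 bp: {upper_bound:.2f}x")
print(f"mean read length implied by 99.51x: {implied_length:.1f} bp")
```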

Genome annotation

Gene prediction and annotation were done using the PGAP pipeline [34]. Genes were identified using GeneMark [35], GLIMMER [36], and Prodigal [37]. For annotation, BLAST searches against the NCBI Protein Clusters Database [38] were performed, and the annotation was enriched by searches against the Conserved Domain Database [39] and subsequent assignment of coding sequences to COGs. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [40], Infernal [41], RNAmmer [42], Rfam [43], TMHMM [44], and SignalP [45].

Genome properties

The genome (2,928,683 bp in total) includes one circular chromosome of 2,839,551 bp (52.39% G + C content) and two plasmids of 4,109 bp (54.42% G + C content) and 85,023 bp (54.38% G + C content; Figure 3). For chromosome and plasmids, a total of 2,729 genes were predicted, 2,647 of which are protein-coding genes. Of the protein-coding genes, 2,085 (76.40% of all predicted genes) were assigned a putative function; the remaining ones were annotated as hypothetical proteins. 1,937 protein-coding genes belong to 314 paralogous families in this genome, corresponding to a gene content redundancy of 41.52%. The properties and the statistics of the genome are summarized in Tables 3, 4 and 5.
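The replicon sizes can be cross-checked against the stated genome total, and the per-replicon G + C values can be combined into a length-weighted genome-wide value. The sketch below is such a consistency check, not part of the original analysis. The chromosome length of 2,839,551 bp is the value obtained by subtracting both plasmids from the 2,928,683 bp total; variable names are illustrative.

```python
# (size in bp, G + C content in percent) per replicon, from the text.
REPLICONS = {
    "chromosome": (2_839_551, 52.39),
    "pCC1": (4_109, 54.42),
    "pCC2": (85_023, 54.38),
}
STATED_TOTAL_BP = 2_928_683

# The three replicons should add up to the stated genome size.
total_bp = sum(size for size, _ in REPLICONS.values())

# Length-weighted mean of the per-replicon G + C percentages.
weighted_gc = sum(size * gc for size, gc in REPLICONS.values()) / total_bp

print(f"sum of replicons: {total_bp:,} bp (stated: {STATED_TOTAL_BP:,} bp)")
print(f"length-weighted G + C content: {weighted_gc:.2f}%")
```

The weighted value is dominated by the chromosome, because the two plasmids together account for only about 3% of the genome.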
Figure 3

Graphical map of the chromosome and the two plasmids pCC1 and pCC2 (not drawn to scale). From the outside in: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), GC content, GC skew.

Table 3

Summary of genome: one chromosome and two plasmids

Label | Size (Mb) | Topology | INSDC identifier
Chromosome | 2.840 | circular | CP004354
Plasmid pCC1 | 0.004 | circular | CP004355
Plasmid pCC2 | 0.085 | circular | CP004356
Table 4

Genome statistics

Attribute | Value | % of total a)
Genome size (bp) | 2,928,683 | 100.00
DNA coding (bp) | 2,678,511 | 91.46
DNA G + C (bp) | 1,536,292 | 52.46
DNA scaffolds | 3 |
Total genes | 2,729 | 100.00
Protein coding genes | 2,647 | 97.00
RNA genes | 82 | 3.00
Pseudo genes | 61 | 2.24
Genes in internal clusters | 1,937 | 64.05
Genes with function prediction | 2,085 | 76.40
Genes assigned to COGs | 1,748 | 41.52
Genes with Pfam domains | 2,125 | 5.06
Genes with signal peptides | 158 | 5.79
Genes with transmembrane helices | 673 | 24.66
CRISPR repeats | 0 |

a)The total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome.
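The footnote defines each percentage as the value divided by either the genome size or the total gene count, which makes the column easy to recompute. The sketch below does so for the rows of Table 4 and lists those where the printed percentage differs from the recomputed one by more than rounding. It is a reader-side check with illustrative names; where a row disagrees, the table alone does not show whether the count or the percentage is the misprinted figure.

```python
GENOME_BP = 2_928_683
TOTAL_GENES = 2_729

# (attribute, value, printed percentage, denominator), copied from Table 4.
ROWS = [
    ("DNA coding (bp)", 2_678_511, 91.46, GENOME_BP),
    ("DNA G + C (bp)", 1_536_292, 52.46, GENOME_BP),
    ("Protein coding genes", 2_647, 97.00, TOTAL_GENES),
    ("RNA genes", 82, 3.00, TOTAL_GENES),
    ("Pseudo genes", 61, 2.24, TOTAL_GENES),
    ("Genes in internal clusters", 1_937, 64.05, TOTAL_GENES),
    ("Genes with function prediction", 2_085, 76.40, TOTAL_GENES),
    ("Genes assigned to COGs", 1_748, 41.52, TOTAL_GENES),
    ("Genes with Pfam domains", 2_125, 5.06, TOTAL_GENES),
    ("Genes with signal peptides", 158, 5.79, TOTAL_GENES),
    ("Genes with transmembrane helices", 673, 24.66, TOTAL_GENES),
]


def find_mismatches(rows, tolerance=0.006):
    """Return rows whose printed percentage is not value / denominator.

    The tolerance sits just above the 0.005 error that rounding to two
    decimals can introduce.
    """
    mismatches = []
    for name, value, printed, denominator in rows:
        recomputed = 100.0 * value / denominator
        if abs(recomputed - printed) > tolerance:
            mismatches.append((name, printed, round(recomputed, 2)))
    return mismatches


for name, printed, recomputed in find_mismatches(ROWS):
    print(f"{name}: printed {printed:.2f}, recomputed {recomputed:.2f}")
```

Three rows are flagged: genes in internal clusters, genes assigned to COGs, and genes with Pfam domains. The printed 64.05 equals 1,748 / 2,729, which suggests, without proving it, that some percentages shifted between rows.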

Table 5

Number of genes associated with the general COG functional categories

Code | Value | Percentage | Description
J | 148 | 5.59 | Translation, ribosomal structure and biogenesis
A | 1 | 0.04 | RNA processing and modification
K | 174 | 6.57 | Transcription
L | 192 | 7.25 | Replication, recombination and repair
B | 0 | 0.00 | Chromatin structure and dynamics
D | 20 | 0.76 | Cell cycle control, cell division, chromosome partitioning
Y | 0 | 0.00 | Nuclear structure
V | 41 | 1.55 | Defense mechanisms
T | 66 | 2.49 | Signal transduction mechanisms
M | 116 | 4.38 | Cell wall/membrane biogenesis
N | 1 | 0.04 | Cell motility
Z | 0 | 0.00 | Cytoskeleton
W | 1 | 0.04 | Extracellular structures
U | 28 | 1.06 | Intracellular trafficking and secretion, and vesicular transport
O | 76 | 2.87 | Posttranslational modification, protein turnover, chaperones
C | 115 | 4.34 | Energy production and conversion
G | 173 | 6.54 | Carbohydrate transport and metabolism
E | 244 | 9.22 | Amino acid transport and metabolism
F | 74 | 2.80 | Nucleotide transport and metabolism
H | 107 | 4.04 | Coenzyme transport and metabolism
I | 57 | 2.23 | Lipid transport and metabolism
P | 182 | 6.88 | Inorganic ion transport and metabolism
Q | 53 | 2.00 | Secondary metabolites biosynthesis, transport and catabolism
R | 315 | 11.90 | General function prediction only
S | 170 | 6.42 | Function unknown
- | 629 | 23.76 | Not in COGs

Insights from the genome sequence

The complete genome sequence of C. callunae was already mined for biotechnological purposes, to define the core genome of the C. glutamicum - C. efficiens - C. callunae subgroup and, from it, the chassis genome for C. glutamicum [46]. Comparison of the three genomes using EDGAR [47] reveals that the core genome of this group comprises just 1,873 genes and that the number of genes found only in C. callunae is also relatively small (366), especially when compared to the number of singletons found in the other two species (926 and 773; Figure 4). As C. callunae was shown to produce l-glutamate in an amount comparable to C. glutamicum, it might be considered a potential candidate for future genome reduction efforts, since its chromosome is already considerably smaller than those of the other two species (2.84 Mbp versus 3.21 Mbp and 3.15 Mbp). This future approach is aided by the observation that many of the singletons are clustered in just three regions (I: H924_2045-H924_02230, 37 genes, 25.2 kbp; II: H924_03630-H924_03880, 50 genes, 52.5 kbp; III: H924_07070-H924_07380, 61 genes, 48.2 kbp), which together constitute ~4.4% of the genome size. As at least region II and region III are likely prophages, loss of these regions should be neutral or even beneficial, as demonstrated for C. glutamicum [48].
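The ~4.4% figure follows directly from the three region lengths. The sketch below adds them up and expresses the sum relative to both the chromosome and the whole genome, since the text does not say which denominator was used. The chromosome length of 2,839,551 bp is taken as the genome total minus the two plasmids; names are illustrative.

```python
# Region label -> (number of genes, length in kbp), from the text.
REGIONS = {
    "I": (37, 25.2),
    "II": (50, 52.5),
    "III": (61, 48.2),
}
CHROMOSOME_KBP = 2_839.551   # 2,928,683 bp total minus pCC1 and pCC2
GENOME_KBP = 2_928.683       # chromosome plus both plasmids

region_genes = sum(genes for genes, _ in REGIONS.values())
region_kbp = sum(kbp for _, kbp in REGIONS.values())

share_of_chromosome = 100.0 * region_kbp / CHROMOSOME_KBP
share_of_genome = 100.0 * region_kbp / GENOME_KBP

print(f"{region_genes} genes in {region_kbp:.1f} kbp")
print(f"share of chromosome: {share_of_chromosome:.1f}%")
print(f"share of whole genome: {share_of_genome:.1f}%")
```

The quoted ~4.4% matches the share of the chromosome; relative to the whole genome including plasmids the value is slightly lower.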
Figure 4

Venn diagram depicting the number of genes shared between C. callunae, C. glutamicum, and C. efficiens. EDGAR [47] was used to determine the core genome shared between the three species and the singletons unique to each of them.

One central prerequisite for future rational strain development is the genetic accessibility of the prospective strain. Knowledge of the complete genome sequence of C. callunae helps to overcome at least two of the main obstacles: the construction of plasmids usable as vectors and the removal of elements that hinder DNA transfer. For the former, knowledge of the sequences of the two plasmids pCC1 and pCC2 allows the use of plasmid-tagging approaches via a counter-selectable marker [49] to cure them, should conventional approaches like heat-shock curing fail. Once the strain is cured, the plasmid sequences help to identify the minimal gene set necessary for replication in order to build shuttle vectors, as demonstrated for pCC1 [50]. For the latter, the genome sequence helps to identify restriction-modification systems. A preliminary analysis revealed the presence of at least 4 such systems, one of which is located in the potential prophage region II. Removal of such systems has been shown to significantly enhance the stability of introduced foreign DNA, thus facilitating genetic engineering approaches [48].

Conclusion

The complete genome sequence of C. callunae is the third genome sequence of the C. glutamicum - C. deserti - C. efficiens - C. callunae subgroup of l-glutamic acid producing corynebacteria within the genus Corynebacterium. Knowledge of the complete genome sequence has already contributed to identifying the core genome of this group. With a size of 2.84 Mbp and a total of 2,647 protein-coding genes, the genome of C. callunae is by far the smallest within this group. Therefore, this bacterium might be an ideal choice for the future development of a platform strain, as the otherwise high degree of similarity of its genome content to that of the well-studied C. glutamicum would allow an easy transfer of knowledge to the new host. Furthermore, knowledge of the complete genome sequence also facilitates the identification of possible targets to improve the accessibility to genetic engineering (like restriction-modification systems) and to enhance genome stability (like phages and transposases).

Abbreviations

CeBiTec: Center for Biotechnology; GEBA: Genomic Encyclopedia of Bacteria and Archaea.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MP prepared and wrote the manuscript, AA performed library preparation and sequencing, HB and KN performed electron microscopy, JK coordinated the study, and CR assembled and analyzed the genome sequence.

References (10 of 40 listed)

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction.

Authors:  W J Bruno; N D Socci; A L Halpern
Journal:  Mol Biol Evol       Date:  2000-01       Impact factor: 16.240

3.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

4.  Consed: a graphical tool for sequence finishing.

Authors:  D Gordon; C Abajian; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

5.  The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

Authors:  Konstantinos Liolios; I-Min A Chen; Konstantinos Mavromatis; Nektarios Tavernarakis; Philip Hugenholtz; Victor M Markowitz; Nikos C Kyrpides
Journal:  Nucleic Acids Res       Date:  2009-11-13       Impact factor: 16.971

6.  The erythromycin resistance gene of the Corynebacterium xerosis R-plasmid pTP10 also carrying chloramphenicol, kanamycin, and tetracycline resistances is capable of transposition in Corynebacterium glutamicum.

Authors:  A Tauch; F Kassing; J Kalinowski; A Pühler
Journal:  Plasmid       Date:  1995-05       Impact factor: 3.466

7.  Rfam: annotating non-coding RNAs in complete genomes.

Authors:  Sam Griffiths-Jones; Simon Moxon; Mhairi Marshall; Ajay Khanna; Sean R Eddy; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

8.  Chassis organism from Corynebacterium glutamicum--a top-down approach to identify and delete irrelevant gene clusters.

Authors:  Simon Unthan; Meike Baumgart; Andreas Radek; Marius Herbst; Daniel Siebert; Natalie Brühl; Anna Bartsch; Michael Bott; Wolfgang Wiechert; Kay Marin; Stephan Hans; Reinhard Krämer; Gerd Seibold; Julia Frunzke; Jörn Kalinowski; Christian Rückert; Volker F Wendisch; Stephan Noack
Journal:  Biotechnol J       Date:  2014-10-08       Impact factor: 4.677

9.  CDD: specific functional annotation with the Conserved Domain Database.

Authors:  Aron Marchler-Bauer; John B Anderson; Farideh Chitsaz; Myra K Derbyshire; Carol DeWeese-Scott; Jessica H Fong; Lewis Y Geer; Renata C Geer; Noreen R Gonzales; Marc Gwadz; Siqian He; David I Hurwitz; John D Jackson; Zhaoxi Ke; Christopher J Lanczycki; Cynthia A Liebert; Chunlei Liu; Fu Lu; Shennan Lu; Gabriele H Marchler; Mikhail Mullokandov; James S Song; Asba Tasneem; Narmada Thanki; Roxanne A Yamashita; Dachuan Zhang; Naigong Zhang; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2008-11-04       Impact factor: 16.971

10.  A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure.

Authors:  Sean R Eddy
Journal:  BMC Bioinformatics       Date:  2002-07-02       Impact factor: 3.169
