Yun-Juan Chang, Miriam Land, Loren Hauser, Olga Chertkov, Tijana Glavina Del Rio, Matt Nolan, Alex Copeland, Hope Tice, Jan-Fang Cheng, Susan Lucas, Cliff Han, Lynne Goodwin, Sam Pitluck, Natalia Ivanova, Galina Ovchinikova, Amrita Pati, Amy Chen, Krishna Palaniappan, Konstantinos Mavromatis, Konstantinos Liolios, Thomas Brettin, Anne Fiebig, Manfred Rohde, Birte Abt, Markus Göker, John C Detter, Tanja Woyke, James Bristow, Jonathan A Eisen, Victor Markowitz, Philip Hugenholtz, Nikos C Kyrpides, Hans-Peter Klenk, Alla Lapidus.
Abstract
Ktedonobacter racemifer corrig. Cavaletti et al. 2007 is the type species of the genus Ktedonobacter, which in turn is the type genus of the family Ktedonobacteraceae, the type family of the order Ktedonobacterales within the class Ktedonobacteria in the phylum 'Chloroflexi'. Although K. racemifer shares some morphological features with the actinobacteria, it is of special interest because it was the first cultivated representative of a deep-branching, unclassified lineage of otherwise uncultivated environmental phylotypes tentatively located within the phylum 'Chloroflexi'. This aerobic, filamentous, non-motile, spore-forming, Gram-positive heterotroph was isolated from soil in Italy. The 13,661,586 bp non-contiguous finished genome consists of ten contigs and is the first reported genome sequence from a member of the class Ktedonobacteria. With 11,453 protein-coding and 87 RNA genes, it is the largest prokaryotic genome reported so far. It contains a large number of over-represented COGs, particularly genes associated with transposons, which make the genetic redundancy within the genome considerably larger than expected by chance. This work is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Keywords: Chloroflexi; GEBA; Gram-positive; Ktedonobacteraceae; aerobic; broken-stick distribution; entropy; filamentous; heterotrophic; moderately acidophilic; non-motile; sporulating; transposon
Year: 2011 PMID: 22180814 PMCID: PMC3236041 DOI: 10.4056/sigs.2114901
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Figure 1. Phylogenetic tree highlighting the position of K. racemifer relative to the other type strains within the phylum 'Chloroflexi'. The tree was inferred from 1,359 aligned characters [7,8] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [9]. Rooting was first done using the midpoint method [10] and then checked for agreement with the current classification (Table 1). Branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 750 ML bootstrap replicates [11] (left) and from 1,000 maximum parsimony bootstrap replicates [12] (right), shown if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are labeled with one asterisk. Lineages also listed as 'Complete and Published' are labeled with two asterisks (see [14-17] and INSDC accessions CP001337, CP000804, CP000909, CP002084 and AP012029).
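Midpoint rooting places the root halfway along the longest leaf-to-leaf path of an unrooted tree. The sketch below illustrates the idea; it is not the software used for Figure 1 [10]. It assumes a tree stored as a plain adjacency map with positive branch lengths. The function names and the toy tree are hypothetical. The longest path is found with the standard two-sweep search: first find the farthest node from any start, then find the farthest node from that one.

```python
from collections import defaultdict

# Hypothetical toy tree: undirected edges (node, node, branch length).
TOY_EDGES = [("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
             ("Y", "C", 2.0), ("Y", "D", 5.0)]


def build_tree(edges):
    """Adjacency map: node -> {neighbour: branch length}."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj


def farthest(adj, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in adj[node].items():
            if nbr != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best


def midpoint_root(adj, start):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (u, v, offset): the root sits on branch u-v, `offset` units from u.
    """
    a, _, _ = farthest(adj, start)        # one end of the longest path
    _, diameter, path = farthest(adj, a)  # the other end, plus the path between them
    half = diameter / 2.0
    travelled = 0.0
    for u, v in zip(path, path[1:]):
        length = adj[u][v]
        if travelled + length >= half:
            return u, v, half - travelled
        travelled += length
    raise ValueError("tree has no branches")


print(midpoint_root(build_tree(TOY_EDGES), "A"))  # ('D', 'Y', 4.0)
```

In the toy tree the longest path runs from B to D with a length of 8, so the root falls 4 units from D on the D-Y branch.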
Classification and general features of K. racemifer SOSP1-21T according to the MIGS recommendations [18] and the NamesforLife database [19]
| MIGS ID | Property | Term | Evidence code |
|---|---|---|---|
| | Current classification | Domain Bacteria | TAS [ |
| | | Phylum 'Chloroflexi' | TAS [ |
| | | Class Ktedonobacteria | TAS [ |
| | | Order Ktedonobacterales | TAS [ |
| | | Family Ktedonobacteraceae | TAS [ |
| | | Genus Ktedonobacter | TAS [ |
| | | Species Ktedonobacter racemifer | TAS [ |
| | | Type strain SOSP1-21 | TAS [ |
| | Gram stain | positive | TAS [ |
| | Cell shape | filamentous | TAS [ |
| | Motility | non-motile | TAS [ |
| | Sporulation | spherical spore-forming | TAS [ |
| | Temperature range | mesophile | TAS [ |
| | Optimum temperature | 28-33°C | TAS [ |
| | Salinity | growth without problems at up to 10 g/l NaCl; inhibited at 30 g/l | TAS [ |
| MIGS-22 | Oxygen requirement | aerobic and microaerophilic | TAS [ |
| | Carbon source | sugars and peptides | TAS [ |
| | Energy metabolism | heterotrophic | TAS [ |
| MIGS-6 | Habitat | soil | TAS [ |
| MIGS-15 | Biotic relationship | free-living | NAS |
| MIGS-14 | Pathogenicity | none | NAS |
| | Biosafety level | 1 | TAS [ |
| | Isolation | soil from a black locust wood | TAS [ |
| MIGS-4 | Geographic location | Gerenzano, Northern Italy | TAS [ |
| MIGS-5 | Sample collection time | November 2001 | NAS |
| MIGS-4.1 | Latitude | 45.64 | NAS |
| MIGS-4.2 | Longitude | 9.00 | NAS |
| MIGS-4.3 | Depth | not reported | |
| MIGS-4.4 | Altitude | about 210 m | NAS |
Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [24].
Figure 2a. Scanning electron micrographs of K. racemifer SOSP1-21T mycelium.
Figure 2b. Scanning electron micrographs of K. racemifer SOSP1-21T spores.
Genome sequencing project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Non-contiguous finished |
| MIGS-28 | Libraries used | Two Sanger libraries (8 kb pMCL200 and fosmid); one 454 pyrosequencing standard library |
| MIGS-29 | Sequencing platforms | ABI3730, 454 GS FLX |
| MIGS-31.2 | Sequencing coverage | 10.1× Sanger; 24.6× pyrosequencing |
| MIGS-30 | Assemblers | Newbler version 1.1.02.15, phrap |
| MIGS-32 | Gene calling method | Prodigal 1.4, GeneMark 4.6b, tRNAscan-SE 1.23, Infernal 0.81 |
| | INSDC ID | ADVG00000000 |
| | GenBank date of release | June 14, 2010 |
| | GOLD ID | Gi02261 |
| | NCBI project ID | 27943 |
| | Database: IMG-GEBA | 648276680 |
| MIGS-13 | Source material identifier | DSM 44963 |
| | Project relevance | Tree of Life, GEBA |
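Fold coverage is conventionally the total number of sequenced bases divided by the genome size. The table does not say which genome size the depths above were computed against. Assuming it was the finished 13,661,586 bp assembly, the depths correspond to roughly 138 Mb of Sanger and 336 Mb of 454 sequence. The sketch below only does that arithmetic; the function name is hypothetical.

```python
GENOME_SIZE_BP = 13_661_586  # finished genome size from the genome statistics table


def implied_bases(coverage: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Sequenced bases implied by a fold coverage, assuming coverage = bases / genome size."""
    return coverage * genome_size


for platform, depth in [("Sanger", 10.1), ("454", 24.6)]:
    print(f"{platform}: ~{implied_bases(depth) / 1e6:.1f} Mb")
# Sanger: ~138.0 Mb
# 454: ~336.1 Mb
```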
Genome Statistics
| Attribute | Value | % of total |
|---|---|---|
| Genome size (bp) | 13,661,586 | 100.00% |
| DNA coding region (bp) | 10,422,932 | 76.29% |
| DNA G+C content (bp) | 7,348,426 | 53.79% |
| Number of contigs | 10 | |
| Extrachromosomal elements | unknown | |
| Total genes | 11,540 | 100.00% |
| RNA genes | 87 | 0.75% |
| rRNA operons | 8 | |
| Protein-coding genes | 11,453 | 99.25% |
| Pseudo genes | 0 | |
| Genes with function prediction | 7,065 | 61.22% |
| Genes in paralog clusters | 4,919 | 42.63% |
| Genes assigned to COGs | 6,654 | 57.66% |
| Genes assigned Pfam domains | 7,250 | 62.82% |
| Genes with signal peptides | 2,660 | 23.05% |
| Genes with transmembrane helices | 2,581 | 22.37% |
| CRISPR repeats | 7 | |
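The percentage column uses two different denominators. Base-pair rows are relative to the genome size (13,661,586 bp), and gene rows are relative to the total gene count (11,540). A small sketch that reproduces a few of the listed values follows; the dictionary layout is only for illustration.

```python
GENOME_BP = 13_661_586  # genome size from the table
TOTAL_GENES = 11_540    # total genes from the table

bp_rows = {"DNA coding region": 10_422_932, "DNA G+C content": 7_348_426}
gene_rows = {"RNA genes": 87, "Protein-coding genes": 11_453, "Genes assigned to COGs": 6_654}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in the table."""
    return round(100.0 * part / whole, 2)


for name, bp in bp_rows.items():
    print(f"{name}: {pct(bp, GENOME_BP)}% of genome")    # 76.29, 53.79
for name, n in gene_rows.items():
    print(f"{name}: {pct(n, TOTAL_GENES)}% of genes")    # 0.75, 99.25, 57.66
```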
Figure 3. Graphical linear map of the largest contig (3,837,106 bp). From bottom to top: genes on the forward strand (colored by COG category), genes on the reverse strand (colored by COG category), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, and GC skew.
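The GC content and GC skew tracks in such maps are conventionally computed over consecutive windows along the sequence, with GC skew defined as (G − C)/(G + C). The window size and windowing scheme used for Figure 3 are not stated. The sketch below assumes non-overlapping windows with an arbitrary default size, and the function name is hypothetical.

```python
def gc_content_and_skew(seq: str, window: int = 10_000):
    """Yield (window start, G+C fraction, GC skew) for consecutive non-overlapping windows.

    GC skew is (G - C) / (G + C); windows without any G or C get a skew of 0.
    The last window may be shorter than `window`.
    """
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_fraction = (g + c) / len(chunk)
        skew = (g - c) / (g + c) if g + c else 0.0
        yield start, gc_fraction, skew


print(list(gc_content_and_skew("GGGCAAAT", window=4)))
# [(0, 1.0, 0.5), (4, 0.0, 0.0)]
```

Plots often also show the cumulative sum of the skew, whose extremes tend to mark the replication origin and terminus. That is a common convention, not something the caption states.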
Number of genes associated with the general COG functional categories
| Code | Value | % | Description |
|---|---|---|---|
| J | 224 | 2.9 | Translation, ribosomal structure and biogenesis |
| A | 0 | 0.0 | RNA processing and modification |
| K | 893 | 11.6 | Transcription |
| L | 975 | 12.6 | Replication, recombination and repair |
| B | 3 | 0.0 | Chromatin structure and dynamics |
| D | 34 | 0.4 | Cell cycle control, cell division, chromosome partitioning |
| Y | 0 | 0.0 | Nuclear structure |
| V | 215 | 2.8 | Defense mechanisms |
| T | 617 | 8.0 | Signal transduction mechanisms |
| M | 257 | 3.3 | Cell wall/membrane/envelope biogenesis |
| N | 20 | 0.3 | Cell motility |
| Z | 0 | 0.0 | Cytoskeleton |
| W | 0 | 0.0 | Extracellular structures |
| U | 54 | 0.7 | Intracellular trafficking, secretion, and vesicular transport |
| O | 195 | 2.5 | Posttranslational modification, protein turnover, chaperones |
| C | 416 | 5.4 | Energy production and conversion |
| G | 612 | 7.9 | Carbohydrate transport and metabolism |
| E | 474 | 6.2 | Amino acid transport and metabolism |
| F | 135 | 1.8 | Nucleotide transport and metabolism |
| H | 264 | 3.4 | Coenzyme transport and metabolism |
| I | 236 | 3.1 | Lipid transport and metabolism |
| P | 255 | 3.3 | Inorganic ion transport and metabolism |
| Q | 217 | 2.8 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 1,098 | 14.4 | General function prediction only |
| S | 519 | 6.7 | Function unknown |
| - | 4,886 | 42.3 | Not in COGs |
Figure 4. Screenshot from CROSSMATCH [32] showing matches between sequences within and across the contigs. CROSSMATCH options were -minmatch 30 -minscore 60.
Figure 5. Venn diagram depicting the intersections of the protein sets of K. racemifer, S. thermophilus and T. roseum (total number of derived protein sequences in parentheses).
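The caption does not state how proteins were matched across the three genomes. The sketch below assumes each genome's proteins have already been mapped to shared cluster IDs; both the IDs and the toy sets are hypothetical. Under that assumption, the seven exclusive regions of a three-set Venn diagram follow from plain set algebra.

```python
from itertools import product

# Hypothetical toy input: genome label -> set of shared-protein cluster IDs.
TOY_SETS = {
    "K. racemifer": {1, 2, 3, 4},
    "S. thermophilus": {3, 4, 5},
    "T. roseum": {4, 5, 6},
}


def venn3_regions(sets):
    """Count the elements in each exclusive region of a three-set Venn diagram.

    Keys of the result are tuples of the labels an element belongs to (and no others).
    """
    labels = list(sets)
    if len(labels) != 3:
        raise ValueError("expected exactly three sets")
    regions = {}
    for membership in product([True, False], repeat=3):
        if not any(membership):
            continue  # elements outside all three sets are not drawn
        inside = [sets[lab] for lab, m in zip(labels, membership) if m]
        outside = [sets[lab] for lab, m in zip(labels, membership) if not m]
        region = set.intersection(*inside)  # returns a new set; inputs are untouched
        for other in outside:
            region -= other
        key = tuple(lab for lab, m in zip(labels, membership) if m)
        regions[key] = len(region)
    return regions


for key, n in venn3_regions(TOY_SETS).items():
    print(" & ".join(key), n)
```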
Pairwise comparison of K. racemifer, S. thermophilus and T. roseum using the GGDC-Calculator.
| Genome pair | HSP length / total length [%] | identities / HSP length [%] | identities / total length [%] |
|---|---|---|---|
| | 0.57 | 86.4 | 0.50 |
| | 0.48 | 87.2 | 0.42 |
| | 9.41 | 83.1 | 7.82 |
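The three columns are the GGDC ratios expressed as similarities. The sketch below uses one common formulation, which should be treated as an assumption about the calculator version used here. HSP lengths and identical base pairs are summed over both search directions (x against y and y against x), and "total length" is the combined length of the two genomes. In any case, identities/total length equals the product of the other two ratios. The table values satisfy this within rounding, e.g. 9.41% × 83.1% ≈ 7.82%. The `HSP` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    length: int      # aligned length in bp
    identities: int  # identical base pairs within the alignment


def ggdc_ratios(hsps_xy, hsps_yx, len_x, len_y):
    """Return (HSP length/total length, identities/HSP length, identities/total length) in %.

    Assumed formulation: HSPs from both search directions are pooled,
    and 'total length' is the sum of the two genome lengths.
    """
    hsp_len = sum(h.length for h in hsps_xy) + sum(h.length for h in hsps_yx)
    idents = sum(h.identities for h in hsps_xy) + sum(h.identities for h in hsps_yx)
    total = len_x + len_y
    coverage = 100.0 * hsp_len / total
    identity_in_hsps = 100.0 * idents / hsp_len if hsp_len else 0.0
    identity_overall = 100.0 * idents / total
    return coverage, identity_in_hsps, identity_overall


# Hypothetical toy input: two 1 kb "genomes" sharing one 100 bp HSP per direction.
print(ggdc_ratios([HSP(100, 90)], [HSP(100, 88)], 1_000, 1_000))  # (10.0, 89.0, 8.9)
```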
Figure 6. Relative frequencies of the 100 most frequent COGs in the genome of K. racemifer (blue line) compared to their expected frequency as estimated using the broken-stick distribution (red line). Over-represented COGs are labeled.
Figure 7. Relative frequencies of the 100 most frequent COGs in the genome of S. thermophilus (blue line) compared to their expected frequency as estimated using the broken-stick distribution (red line). Over-represented COGs are labeled.
Figure 8. Relative frequencies of the 100 most frequent COGs in the genome of T. roseum (blue line) compared to their expected frequency as estimated using the broken-stick distribution (red line). Over-represented COGs are labeled.
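Under the broken-stick model, the expected relative frequency of the k-th most frequent of n categories is E_k = (1/n) Σ_{i=k}^{n} 1/i. The sketch below computes this expectation and flags categories whose observed relative frequency exceeds it. Two choices here are assumptions, since the captions do not give the exact criterion: n is taken as the number of distinct COGs passed in, and "over-represented" simply means observed > expected. The COG labels are hypothetical.

```python
def broken_stick(n: int) -> list[float]:
    """Expected relative frequencies (largest first) under the broken-stick model."""
    expected = [0.0] * n
    tail = 0.0
    for i in range(n, 0, -1):  # accumulate sum_{j=i}^{n} 1/j from the tail
        tail += 1.0 / i
        expected[i - 1] = tail / n
    return expected


def over_represented(counts: dict[str, int]) -> list[str]:
    """Categories whose observed relative frequency exceeds the broken-stick expectation."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    expected = broken_stick(len(ranked))
    return [name for (name, c), e in zip(ranked, expected) if c / total > e]


print(broken_stick(2))  # [0.75, 0.25]
print(over_represented({"COG_A": 60, "COG_B": 25, "COG_C": 10, "COG_D": 5}))  # ['COG_A']
```

The expected values always sum to 1. A genome dominated by a few large gene families, such as transposase COGs, therefore shows its top-ranked COGs above the red curve.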