Javier A Izquierdo, Lynne Goodwin, Karen W Davenport, Hazuki Teshima, David Bruce, Chris Detter, Roxanne Tapia, Shunsheng Han, Miriam Land, Loren Hauser, Cynthia D Jeffries, James Han, Sam Pitluck, Matt Nolan, Amy Chen, Marcel Huntemann, Konstantinos Mavromatis, Natalia Mikhailova, Konstantinos Liolios, Tanja Woyke, Lee R Lynd.
Abstract
Clostridium clariflavum is a Cluster III Clostridium within the family Clostridiaceae isolated from thermophilic anaerobic sludge (Shiratori et al., 2009). This species is of interest because of its similarity to the model cellulolytic organism Clostridium thermocellum and the ability of environmental isolates to break down cellulose and hemicellulose. Here we describe features of the 4,897,678 bp long genome and its annotation, consisting of 4,131 protein-coding and 98 RNA genes, for the type strain DSM 19732.
Keywords: Anaerobic; bioenergy; biotechnology; cellulolytic; cellulosome; lignocellulose utilization; thermophilic
Year: 2012 PMID: 22675603 PMCID: PMC3368404 DOI: 10.4056/sigs.2535732
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Figure 1. Phylogenetic tree of 16S rRNA gene highlighting the position of Clostridium clariflavum DSM 19732 relative to other clostridia in Cluster III. This tree was inferred from 1,401 aligned characters using the Minimum Evolution criterion [5] and rooted using C. cellulosi (from adjacent clostridial Cluster IV). Numbers above branches are support values from 1,000 bootstrap replicates [6] if larger than 60%.
Figure 2. Scanning electron micrograph of C. clariflavum DSM 19732.
Classification and general features of Clostridium clariflavum DSM 19732 according to the MIGS recommendations [8]
| MIGS ID | Property | Term | Evidence code |
|---|---|---|---|
| | Current classification | Domain | TAS [ |
| | | Phylum | TAS [ |
| | | Class | TAS [ |
| | | Order | TAS [ |
| | | Family | TAS [ |
| | | Genus | TAS [ |
| | | Species | TAS [ |
| | | Type strain EBR45T | TAS [ |
| | Gram stain | positive | TAS [ |
| | Cell shape | straight or slightly curved rods | TAS [ |
| | Motility | non-motile | TAS [ |
| | Sporulation | sporulating | TAS [ |
| | Temperature range | thermophile | TAS [ |
| | Optimum temperature | 55–60 °C | TAS [ |
| | Carbon source | Cellulose and cellobiose | TAS [ |
| | Energy source | chemoorganotrophic | TAS [ |
| | Terminal electron receptor | | |
| MIGS-6 | Habitat | Municipal waste | TAS [ |
| MIGS-6.3 | Salinity | 0–0.7% (w/v) | TAS [ |
| MIGS-22 | Oxygen | Moderately anaerobic (O2 < 0.4%) | TAS [ |
| MIGS-15 | Biotic relationship | free living | NAS |
| MIGS-14 | Pathogenicity | non pathogenic | NAS |
| MIGS-4 | Geographic location | not reported | |
| MIGS-5 | Sample collection time | 2006 | TAS [ |
| MIGS-4.1 | Latitude | not reported | |
| MIGS-4.2 | Longitude | not reported | |
| MIGS-4.3 | Depth | not reported | |
| MIGS-4.4 | Altitude | not reported | |
Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [20].
Project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Complete, Level 6: Finished |
| MIGS-28 | Libraries used | 454 Standard and 9 kb libraries, Illumina Standard library |
| MIGS-29 | Sequencing platforms | Illumina GAii, 454-GS-FLX-Titanium |
| MIGS-31.2 | Fold coverage | 38.7× pyrosequence, 695.4× Illumina sequence |
| MIGS-30 | Assemblers | Newbler, Velvet, Phrap |
| MIGS-32 | Gene calling method | Prodigal 1.4, GenePRIMP |
| | GenBank ID | CP003065 |
| | GenBank date of release | December 12, 2011 |
| | GOLD ID | Gi10738 |
| | Project relevance | Bioenergy, lignocellulose utilization |
Nucleotide content and gene count levels of the genome
| Attribute | Value | % of total (a) |
|---|---|---|
| Genome size (bp) | 4,897,678 | 100.00% |
| DNA coding region (bp) | 3,915,750 | 79.95% |
| DNA G+C content (bp) | 1,749,312 | 35.72% |
| Total genes (b) | 4,229 | 100.00% |
| RNA genes | 98 | 2.32% |
| rRNA genes | 6 | |
| Protein-coding genes | 4,131 | 97.68% |
| Pseudo genes | 239 | 5.65% |
| Genes with function prediction | 3,014 | 71.27% |
| Genes in paralog clusters | 585 | 13.83% |
| Genes assigned to COGs | 2,850 | 67.39% |
| Genes assigned Pfam domains | 3,029 | 71.62% |
| Genes with signal peptides | 1,003 | 23.72% |
| Genes with transmembrane helices | 1,047 | 24.76% |
a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
b) Also includes 239 pseudogenes.
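The percentages in this table can be rechecked from the raw counts. The short Python sketch below is illustrative only: the helper name `percent` and the grouping of rows are assumptions made here, and the values are copied from the table. It recomputes each percentage using the genome size for base-pair rows and the total gene count (4,229) for gene rows, which is the denominator that reproduces the reported figures.

```python
# Values copied from the genome statistics table; names are illustrative.
GENOME_SIZE_BP = 4_897_678
TOTAL_GENES = 4_229


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Rows measured in base pairs use the genome size as denominator.
bp_rows = {
    "DNA coding region": 3_915_750,
    "DNA G+C content": 1_749_312,
}

# Rows measured in gene counts use the total gene count as denominator.
gene_rows = {
    "RNA genes": 98,
    "Protein-coding genes": 4_131,
    "Pseudo genes": 239,
    "Genes with function prediction": 3_014,
    "Genes in paralog clusters": 585,
    "Genes assigned to COGs": 2_850,
    "Genes assigned Pfam domains": 3_029,
    "Genes with signal peptides": 1_003,
    "Genes with transmembrane helices": 1_047,
}

for name, value in bp_rows.items():
    print(f"{name}: {percent(value, GENOME_SIZE_BP):.2f}%")
for name, value in gene_rows.items():
    print(f"{name}: {percent(value, TOTAL_GENES):.2f}%")
```

The same check confirms that the 98 RNA genes and 4,131 protein-coding genes add up to the 4,229 total genes.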
Figure 3. Graphical circular map of the genome. From outside to the center: genes on forward strand (color by COG categories), genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.
Number of genes associated with the 25 general COG functional categories
| Code | Value | % (a) | Description |
|---|---|---|---|
| J | 169 | 5.44 | Translation |
| A | 1 | 0.03 | RNA processing and modification |
| K | 224 | 7.21 | Transcription |
| L | 372 | 11.98 | Replication, recombination and repair |
| B | 1 | 0.03 | Chromatin structure and dynamics |
| D | 53 | 1.71 | Cell cycle control, mitosis and meiosis |
| Y | 0 | 0.00 | Nuclear structure |
| V | 81 | 2.61 | Defense mechanisms |
| T | 191 | 6.15 | Signal transduction mechanisms |
| M | 207 | 6.67 | Cell wall/membrane biogenesis |
| N | 89 | 2.87 | Cell motility |
| Z | 4 | 0.13 | Cytoskeleton |
| W | 0 | 0.00 | Extracellular structures |
| U | 70 | 2.25 | Intracellular trafficking and secretion |
| O | 115 | 3.70 | Posttranslational modification, protein turnover, chaperones |
| C | 150 | 4.83 | Energy production and conversion |
| G | 166 | 5.35 | Carbohydrate transport and metabolism |
| E | 193 | 6.22 | Amino acid transport and metabolism |
| F | 70 | 2.25 | Nucleotide transport and metabolism |
| H | 136 | 4.38 | Coenzyme transport and metabolism |
| I | 54 | 1.74 | Lipid transport and metabolism |
| P | 132 | 4.25 | Inorganic ion transport and metabolism |
| Q | 31 | 1.00 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 332 | 10.69 | General function prediction only |
| S | 264 | 8.50 | Function unknown |
| - | 1,379 | 32.61 | Not in COGs |
a) The total is based on the total number of protein coding genes in the annotated genome.
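A quick consistency check helps when reading the percentage column of this table. The Python sketch below is a hedged illustration (the variable and function names are assumptions, and the counts are copied from the table). It tests which denominator reproduces the reported values: the per-category percentages match the sum of all category assignments (3,105), whereas the "Not in COGs" row matches the total gene count (4,229).

```python
# (count, reported %) per COG category, copied from the table.
COG_ROWS = {
    "J": (169, 5.44), "A": (1, 0.03), "K": (224, 7.21), "L": (372, 11.98),
    "B": (1, 0.03), "D": (53, 1.71), "Y": (0, 0.00), "V": (81, 2.61),
    "T": (191, 6.15), "M": (207, 6.67), "N": (89, 2.87), "Z": (4, 0.13),
    "W": (0, 0.00), "U": (70, 2.25), "O": (115, 3.70), "C": (150, 4.83),
    "G": (166, 5.35), "E": (193, 6.22), "F": (70, 2.25), "H": (136, 4.38),
    "I": (54, 1.74), "P": (132, 4.25), "Q": (31, 1.00), "R": (332, 10.69),
    "S": (264, 8.50),
}
TOTAL_GENES = 4_229          # from the genome statistics table
GENES_IN_COGS = 2_850        # from the genome statistics table
NOT_IN_COGS = (1_379, 32.61)


def matches(count: int, reported: float, denominator: int) -> bool:
    """True if count/denominator agrees with a value rounded to 2 decimals."""
    return abs(100.0 * count / denominator - reported) < 0.0051


# Sum of per-category counts (a gene may be counted in several categories).
total_assignments = sum(count for count, _ in COG_ROWS.values())

# Categories whose reported percentage is NOT reproduced by that denominator.
mismatches = [code for code, (count, reported) in COG_ROWS.items()
              if not matches(count, reported, total_assignments)]

print("total category assignments:", total_assignments)
print("categories not matching:", mismatches)
print("'Not in COGs' uses total genes:",
      matches(NOT_IN_COGS[0], NOT_IN_COGS[1], TOTAL_GENES))
```

The assignment total exceeds the 2,850 genes assigned to COGs, presumably because some genes carry more than one category; the 1,379 genes not in COGs equal 4,229 minus 2,850.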
Figure 4. Comparison of glycosyl hydrolase inventory between C. clariflavum DSM 19732 and C. thermocellum ATCC 27405. The numbers of genes per GH family are shown, with GH families organized along the y-axis based on putative substrate specificity (cellulose, xylan, mannan, xyloglucan and other).
Bifunctional glycosyl hydrolases in the C. clariflavum genome
| Locus | Domain architecture (a) | Predicted activity | Localization |
|---|---|---|---|
| Clocl_1418 | GH11-CBM6-Doc-UK | Endo-1,4-beta-xylanase, unknown | Cellulosome |
| Clocl_2083 | GH11-GH10-Doc | Endo-1,4-beta-xylanases | Cellulosome |
| Clocl_2441 | GH11-CBM6-Doc-GH10 | Endo-1,4-beta-xylanases | Cellulosome |
| Clocl_3038 | GH48-GH9-CBM3-CBM3 | Cellulose 1,4-beta-cellobiosidase (GH48), Endo-1,4-β-D-glucanase (GH9) | Untethered |
a) Domain architecture denotes presence of glycosyl hydrolases (GH) from families 9, 10, 11 and 48, Type I dockerin domains (Doc), carbohydrate binding modules (CBM) from families 3 and 6, and domains of unknown function (UK).
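The domain architecture strings in this table follow a simple hyphen-separated notation, so they are easy to process programmatically. The Python sketch below is a minimal illustration with hypothetical helper names (`split_domains`, `gh_families`, `has_dockerin`). Treating the presence of a Doc domain as a marker of cellulosome association agrees with the last column of the table but is an assumption of this sketch, not a rule stated in the source.

```python
import re

# Architecture strings copied from the table.
ARCHITECTURES = [
    "GH11-CBM6-Doc-UK",
    "GH11-GH10-Doc",
    "GH11-CBM6-Doc-GH10",
    "GH48-GH9-CBM3-CBM3",
]


def split_domains(architecture: str) -> list[str]:
    """Split a hyphen-separated architecture into domain labels."""
    return [part.strip() for part in architecture.split("-") if part.strip()]


def gh_families(architecture: str) -> list[int]:
    """Return the glycosyl hydrolase family numbers, in order of appearance."""
    families = []
    for domain in split_domains(architecture):
        match = re.fullmatch(r"GH(\d+)", domain)
        if match:
            families.append(int(match.group(1)))
    return families


def has_dockerin(architecture: str) -> bool:
    """True if the architecture contains a dockerin (Doc) domain."""
    return "Doc" in split_domains(architecture)


for arch in ARCHITECTURES:
    # A protein is bifunctional here if it carries two or more GH domains.
    print(arch, gh_families(arch), has_dockerin(arch))
```

Note that the first entry carries only one GH domain alongside a domain of unknown function, which is why its second activity is listed as unknown.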
Genes with putative carbohydrate sensing function
| C. clariflavum loci | Domain structure (1) | Putative substrate | C. thermocellum match (2) | |
|---|---|---|---|---|
| Clocl_1053/ Clocl_1054 | SigI / RsgI-UK-CBM3 | Cellulose | Cthe_0268/ Cthe_0267 | 61/41 |
| Clocl_2843/ Clocl_2844 | SigI / RsgI-UK-CBM3 | Cellulose | Cthe_0058/ Cthe_0059 | 58/31 |
| Clocl_4008/ Clocl_4009 | SigI / RsgI-UK-CBM3 | Cellulose | Cthe_0058/ Cthe_0059 | 59/36 |
| Clocl_2098/ Clocl_2099 | SigI / RsgI-UK-CBM42 | Xylan | Cthe_1272/ Cthe_1273 | 72/44 |
| Clocl_2747/ Clocl_2748 | SigI / RsgI-UK-PA14-PA14 | Pectin | Cthe_0315/ Cthe_0316 | 43/32 |
| Clocl_2044/ Clocl_2045 | SigI / RsgI-UK | Unknown | Cthe_2975/ | 44/32 |
| Clocl_4136/ Clocl_4137 | SigI / RsgI-UK | Unknown | Cthe_2522/ Cthe_2521 | 63/42 |
| Clocl_2797/ Clocl_2798 | SigI / RsgI-UK | Unknown | Cthe_2975/ Cthe_2974 | 65/59 |
1) Domain structure denotes pairs of sigma I-like protein (SigI) and its associated trans-membrane protein (RsgI), containing domains of unknown function (UK), carbohydrate binding domains from families 3 (CBM3) and 42 (CBM42), and a conserved domain proposed to have pectin-binding function (PA14).
2) Indicates matches in pairs of loci in the C. thermocellum ATCC 27405 genome.