Nupoor Chowdhary, Ashok Selvaraj, Lakshmi KrishnaKumaar, Gopal Ramesh Kumar.
Abstract
Caldicellulosiruptor saccharolyticus has proven to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which must be addressed before it can be used industrially. We carried out a whole-genome re-annotation to update the incomplete genome information, which leaves gaps in knowledge, especially in metabolic engineering aimed at improving the H2-producing capabilities of C. saccharolyticus. The re-annotation was performed manually for 2,682 coding sequences (CDSs), using bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition. Our methodology added functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assigned more accurate functions to known protein sequences. Homology-based gene annotation is the standard method for assigning function to novel proteins, but over the past few years many non-homology-based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology-based functional prediction methods, we were able to assign cellular processes or physical complexes to 249 hypothetical sequences. Our re-annotation pipeline also added 231 new CDSs, generated from the MicroScope platform, to the original genome, with functional predictions for 49 of them. The re-annotation of HPs and new CDSs is stored in a relational database available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among members of the genus Caldicellulosiruptor to understand their function and evolutionary processes.
Further, based on results from the integrated re-annotation studies (homology and genomic context approaches), we strongly suggest that Csac_0437 and Csac_0424 encode glycoside hydrolases (GH) and propose that they are involved in the decomposition of recalcitrant plant polysaccharides. Similarly, the HPs Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism.
Year: 2015 PMID: 26196387 PMCID: PMC4510573 DOI: 10.1371/journal.pone.0133183
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Visualization of the alignment of the 8 Caldicellulosiruptor genomes, generated using BLAST Ring Image Generator (BRIG), showing (from inner to outer) % G+C, GC skew and homology based on BLASTn.
The deep purple circle represents the reference sequence, C. saccharolyticus. Outer rings show shared identity (according to BLASTn) with various other Caldicellulosiruptor genomes. BLASTn matches between 50% and 100% nucleotide identity are colored from lightest to darkest shade respectively, according to the graduated scale on the right of the circular BLAST image. Matches with less than 50% identity appear as blank spaces in each ring.
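The two innermost rings of Fig 1 plot % G+C and GC skew. As a minimal sketch of how such tracks are commonly computed, the Python below uses the standard definitions, G+C fraction and (G - C)/(G + C), over fixed windows. The function names, the window size and the toy sequence are illustrative assumptions; this is not BRIG's own code or its default windowing.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, size: int, step: int):
    """Yield (start, window) pairs; a trailing partial window is skipped."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


# Toy sequence for illustration only (not genome data).
toy = "GGGCATATCCCG"
profile = [(start, gc_content(w), gc_skew(w))
           for start, w in sliding_windows(toy, size=4, step=4)]
for start, gc, skew in profile:
    print(f"window at {start}: GC={gc:.2f}, skew={skew:+.2f}")
```

On a real genome the window would typically span thousands of bases, and the sign change in GC skew is often used to locate the replication origin and terminus.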
Fig 2. Statistics of known, hypothetical and putative genes before and after re-annotation.
Before re-annotation, the categories were: Known (1854), Hypothetical (781) and Putative (47). After re-annotation, they were: Known (2285), Hypothetical: old CDSs (372) + new CDSs (182), and Putative: old CDSs (25) + new CDSs (49).
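The Fig 2 counts can be cross-checked against the totals quoted in the abstract. The short Python sketch below only re-adds the numbers given in the caption; the dictionary and variable names are illustrative.

```python
# Counts copied from the Fig 2 caption.
before = {"known": 1854, "hypothetical": 781, "putative": 47}
after_old = {"known": 2285, "hypothetical": 372, "putative": 25}
after_new = {"hypothetical": 182, "putative": 49}

total_before = sum(before.values())                       # original CDSs
total_new = sum(after_new.values())                       # newly added CDSs
total_after = sum(after_old.values()) + total_new         # all CDSs after re-annotation
hps_reassigned = before["hypothetical"] - after_old["hypothetical"]

assert total_before == 2682     # the 2,682 CDSs re-annotated manually
assert total_new == 231         # the 231 new CDSs from MicroScope
assert total_after == total_before + total_new
assert hps_reassigned == 409    # the 409 HPs that gained a function
```

The old-CDS categories after re-annotation still sum to 2,682, so the caption describes a reclassification of existing CDSs plus the 231 additions.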
Functional categories of Coding Sequences in the genome of Caldicellulosiruptor saccharolyticus before and after re-annotation.
| COG category | COG functional category | Number of CDSs before re-annotation | Number of CDSs after re-annotation |
|---|---|---|---|
| INFORMATION STORAGE AND PROCESSING | | | |
| | Translation, ribosomal structure and biogenesis | 147 | 157 |
| | Transcription | 134 | 158/2 |
| | Replication, recombination and repair | 222 | 273/6 |
| | Chromatin structure and dynamics | 2 | 2 |
| METABOLISM | | | |
| | Amino acid transport and metabolism | 166 | 200/1 |
| | Nucleotide transport and metabolism | 56 | 57 |
| | Carbohydrate transport and metabolism | 213 | 236/1 |
| | Coenzyme transport and metabolism | 101 | 95 |
| | Energy production and conversion | 111 | 115 |
| | Inorganic ion transport and metabolism | 73 | 100 |
| | Secondary metabolites biosynthesis, transport and catabolism | 14 | 24 |
| | Lipid transport and metabolism | 34 | 39 |
| CELLULAR PROCESSES AND SIGNALING | | | |
| | Cell wall/membrane/envelope biogenesis | 107 | 122/2 |
| | Cell cycle control, cell division, chromosome partitioning | 35 | 57 |
| | Signal transduction mechanisms | 125 | 126/3 |
| | Intracellular trafficking, secretion, and vesicular transport | 42 | 55 |
| | Defense mechanisms | 48 | 56/2 |
| | Cytoskeleton | 3 | 3 |
| | Cell motility | 71 | 81 |
| | Posttranslational modification, protein turnover, chaperones | 59 | 67/1 |
| POORLY CHARACTERIZED | | | |
| | General function prediction only | 228 | 315/1 |
| | Function unknown | 177 | 199 |
*Number of protein-encoding genes in each category, excluding pseudogenes.
(X/Y): X is the value for the old CDSs; Y is the value for the new CDSs.
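The X/Y notation in the table can be read mechanically. As a small sketch, the hypothetical helper `parse_count` below splits a cell into its old-CDS and new-CDS parts and treats a cell without a slash as having no new CDSs; the cell values are copied from the after-re-annotation column of the table above.

```python
def parse_count(cell: str) -> tuple[int, int]:
    """Split a cell such as "158/2" into (old_cds, new_cds).

    A cell without a slash, such as "157", is read as having no new CDSs.
    """
    old, _, new = cell.strip().partition("/")
    return (int(old), int(new) if new else 0)


# After-re-annotation column, in table order.
after_cells = [
    "157", "158/2", "273/6", "2",
    "200/1", "57", "236/1", "95", "115", "100", "24", "39",
    "122/2", "57", "126/3", "55", "56/2", "3", "81", "67/1",
    "315/1", "199",
]

parsed = [parse_count(cell) for cell in after_cells]
new_cds_with_cog = sum(new for _, new in parsed)
print(new_cds_with_cog)  # number of new CDSs that carry a COG assignment
```

Summing the second component gives the number of new CDSs that were placed in a COG category, which is only a fraction of the 231 new CDSs reported.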
List of Predicted Glycoside hydrolase, S-layer domain containing proteins, hydrogenase, and iron-sulfur clusters (with confidence level) using sequence similarity based approach.
| Label | RefSeqID | CG Content | Annotated Function | Re-annotated Function | Confidence level |
|---|---|---|---|---|---|
Fig 3. Network of predicted associations for a group of proteins related to COG2006 (containing the hypothetical protein Csac_1294, shown in red), which is predicted to be involved in the valine, leucine and isoleucine biosynthesis pathway and in the oxidation-reduction process.
The network edges represent the predicted functional associations. An edge may be drawn with differently coloured lines: a red line indicates fusion evidence; a green line, neighborhood evidence; a blue line, co-occurrence evidence; a black line, co-expression evidence; a yellow line, text mining evidence; and a light blue line, database evidence.
Functional categories of newly predicted Coding Sequences in the genome of Caldicellulosiruptor saccharolyticus.
| Labels | COG class ID | COG functional category | COG process |
|---|---|---|---|
| CALS8_0014,CALS8_0429,CALS8_0430, | T | Signal transduction mechanisms | CELLULAR PROCESSES AND SIGNALING |
| CALS8_1121,CALS8_2818 | V | Defense mechanisms | CELLULAR PROCESSES AND SIGNALING |
| CALS8_1664,CALS8_2385 | M | Cell wall/membrane/envelope biogenesis | CELLULAR PROCESSES AND SIGNALING |
| CALS8_2734 | O | Posttranslational modification, protein turnover, chaperones | CELLULAR PROCESSES AND SIGNALING |
| CALS8_0355,CALS8_0358,CALS8_0367,CALS8_0544,CALS8_0639,CALS8_2622 | L | Replication, recombination and repair | INFORMATION STORAGE AND PROCESSING |
| CALS8_0578,CALS8_1233 | K | Transcription | INFORMATION STORAGE AND PROCESSING |
| CALS8_1211 | G | Carbohydrate transport and metabolism | METABOLISM |
| CALS8_2733 | E | Amino acid transport and metabolism | METABOLISM |
| CALS8_2384 | R | General function prediction only | POORLY CHARACTERIZED |
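The label lists in this table can be tallied per COG class. The Python below is a minimal sketch with the rows copied from the table; the helper `split_labels` is a hypothetical name, and it simply ignores the empty entry left by the trailing comma in the first row, so any labels missing from that row would not be counted.

```python
from collections import Counter

# (labels cell, COG class ID) pairs copied from the table above.
rows = [
    ("CALS8_0014,CALS8_0429,CALS8_0430,", "T"),
    ("CALS8_1121,CALS8_2818", "V"),
    ("CALS8_1664,CALS8_2385", "M"),
    ("CALS8_2734", "O"),
    ("CALS8_0355,CALS8_0358,CALS8_0367,CALS8_0544,CALS8_0639,CALS8_2622", "L"),
    ("CALS8_0578,CALS8_1233", "K"),
    ("CALS8_1211", "G"),
    ("CALS8_2733", "E"),
    ("CALS8_2384", "R"),
]


def split_labels(cell: str) -> list[str]:
    """Split a comma-separated labels cell, dropping empty entries."""
    return [label.strip() for label in cell.split(",") if label.strip()]


per_class = Counter()
for cell, cog_class in rows:
    per_class[cog_class] += len(split_labels(cell))

for cog_class, n in per_class.items():
    print(cog_class, n)
print("total:", sum(per_class.values()))
```

The per-class counts can be compared with the Y values of the X/Y entries in the before/after COG table, for example 126/3 for signal transduction mechanisms and 273/6 for replication, recombination and repair.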