Abstract
Cyanobacteria are important participants in global biogeochemical processes, but their metabolic processes and genomic functions are incompletely understood. In particular, operon structure, which can provide valuable metabolic and genomic insight, is difficult to determine experimentally, and algorithmic operon predictions probably underestimate actual operon extent. A software method is presented for enhancing current operon predictions by incorporating information from whole-genome time-series expression studies, using a machine learning classifier. Results are presented for the marine cyanobacterium Crocosphaera watsonii. A total of 15 operon enhancements are proposed. The source code is publicly available.
Keywords: Crocosphaera; Cyanobacteria; Diel; Operon
Year: 2022 PMID: 35433132 PMCID: PMC9009326 DOI: 10.7717/peerj.13259
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Pairs of predicted operons recommended for merging.
| Score | Prior 1 | Prior 1 gene functions | Prior 2 | Prior 2 gene functions |
|---|---|---|---|---|
| 0.868 | 2207 | HAD-superfamily hydrolase, subfamily IA, var | | |
| 0.834 | | | 3080 | Glucose-6-phosphate dehydrogenase |
| 0.673 | 4526 | Hydrogenase expression/synthesis, HypA; Hydrogenase accessory protein HypB | | |
| 0.672 | 3989 | Extracellular solute-binding protein, family 3; Amino acid ABC transporter, permease protein | | |
| 0.631 | 5388 | Cytochrome-c oxidase | | |
| 0.618 | 1168 | 3-isopropylmalate dehydratase small subunit | | |
| 0.599 | 4216 | Pentapeptide repeat | | |
| 0.576 | 2640 | Phosphopantethiene-protein transferase | 2638 | K+ channel, pore region |
| 0.565 | | | 6742 | Competence-damaged protein:CinA, C-trmnl |
| 0.56 | 4082 | Carbamoyltransferase | | |
| 0.552 | 2855 | Glycosyl transferase, group 1 | 2853 | Phycobilisome linker polypeptide |
| 0.537 | 5116 | GTP cyclohydrolase I | 5114 | Cobalamin synthesis protein/P47K:Cobalami |
| 0.537 | 2158 | Ribosomal protein L33 | 2160 | Exoribonuclease II |
| 0.518 | 3464 | Porphobilinogen synthase | | |
| 0.514 | 3440 | Hemolysin-type calcium-binding region | | |
Note:
Each row presents a consecutive pair of previously predicted operons. The score is generated by a Logistic Model Tree classifier. Each pair presented here has classifier score >0.5, and therefore should likely be merged into a single longer predicted operon. The numbers in the “Prior 1” and “Prior 2” columns are gene identifiers, truncated for formatting; prepend “CwatDRAFT_” to the numbers to generate the full identifier. In the “Gene Functions” columns, genes with unknown function are noted in bold; known function of other genes in a predicted operon can provide clues to the unknown function.
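The identifier convention and the score threshold described in this note can be illustrated with a minimal Python sketch. The function names `full_gene_id` and `recommend_merges` are hypothetical and are not taken from the published source code; the example rows reuse only those table entries in which both gene numbers are present.

```python
# Minimal sketch (assumed helper names, not the authors' code).
ID_PREFIX = "CwatDRAFT_"   # prefix stated in the table note
MERGE_THRESHOLD = 0.5      # pairs scoring above this are recommended for merging


def full_gene_id(truncated):
    """Expand a truncated gene number from the table into a full identifier."""
    return f"{ID_PREFIX}{truncated}"


def recommend_merges(rows, threshold=MERGE_THRESHOLD):
    """Keep (score, prior1, prior2) rows whose score is strictly above the threshold."""
    return [
        (score, full_gene_id(prior1), full_gene_id(prior2))
        for score, prior1, prior2 in rows
        if score > threshold
    ]


# Rows copied from the table above (only rows with both gene numbers present).
table_rows = [
    (0.576, 2640, 2638),
    (0.552, 2855, 2853),
    (0.537, 2158, 2160),
]

for score, first, second in recommend_merges(table_rows):
    print(f"{score:.3f}  {first}  {second}")
```

The comparison is strict because the note specifies a classifier score >0.5; a pair scoring exactly 0.5 would not be recommended under this reading.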
Figure 1: Classification algorithm.
Prior predicted operons (red, blue) are candidates for merging if their genes are consecutive, if all genes are on the same DNA strand, and if at least one gene in each prior exhibits diel expression.
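The three candidate criteria in this caption can be sketched as a single predicate. This is an illustrative Python sketch, not the published implementation: the `Gene` record, its `position` field (an assumed ordinal index of the gene along the genome), and the reading of "consecutive" as "no other gene lies between or within the two operons" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    position: int  # hypothetical ordinal index of the gene along the genome
    strand: str    # "+" or "-"
    diel: bool     # True if the gene shows diel (day/night) expression


def is_merge_candidate(first: List[Gene], second: List[Gene]) -> bool:
    """Return True if two prior predicted operons satisfy all three criteria."""
    genes = sorted(first + second, key=lambda g: g.position)
    # Criterion 1: the genes are consecutive (no gaps in genome order).
    consecutive = all(b.position - a.position == 1 for a, b in zip(genes, genes[1:]))
    # Criterion 2: all genes lie on the same DNA strand.
    same_strand = len({g.strand for g in genes}) == 1
    # Criterion 3: at least one gene in each prior operon exhibits diel expression.
    diel_in_each = any(g.diel for g in first) and any(g.diel for g in second)
    return consecutive and same_strand and diel_in_each


# Hypothetical operons, invented purely for illustration.
operon_a = [Gene(10, "+", True), Gene(11, "+", False)]
operon_b = [Gene(12, "+", True)]
operon_c = [Gene(12, "-", True)]   # opposite strand
operon_d = [Gene(12, "+", False)]  # no diel gene

print(is_merge_candidate(operon_a, operon_b))
print(is_merge_candidate(operon_a, operon_c))
print(is_merge_candidate(operon_a, operon_d))
```

Pairs that pass this filter are only candidates; per the table note, the merge recommendation itself comes from the Logistic Model Tree classifier score.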