Sarath Chandra Janga, Julio Collado-Vides, Gabriel Moreno-Hagelsieb.
Abstract
Since operons are unstable across prokaryotes, it has been suggested that they might recombine in a conservative manner. Thus, genes belonging to a given operon in one genome might re-associate in other genomes, revealing functional relationships among gene products. We developed a system, Nebulon, that builds networks of functional relationships among gene products based on their organization into operons in any available genome. The operon predictions are based on intergenic distances. The system can apply different kinds of thresholds to accept a functional relationship, related either to the operon predictions or to the number of non-redundant genomes that support the associations. We also work by shells, meaning that we set the number of linking iterations allowed in order to complete sets of related genes. The method shows high reliability when benchmarked against knowledge bases of functional interactions. We also illustrate the use of Nebulon in finding new members of regulons and of other functional groups of genes. Operon rearrangements produce thousands of high-quality new interactions per prokaryotic genome, and thousands of confirmations per genome of other predictions, making them another important tool for inferring functional interactions from genomic context.
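The network construction and shell expansion described above can be sketched as follows. This is a minimal illustration, not Nebulon's code: it assumes each genome of evidence has already been reduced to a list of predicted same-operon gene pairs mapped onto orthologs in the query genome, each carrying the LLH of its operon prediction, and that redundant genomes were removed upstream. The functions `build_network` and `shells` and their parameters are hypothetical names. A pair is linked when enough genomes support it at or above the LLH threshold, and shells are gathered by breadth-first expansion from a seed gene.

```python
from collections import defaultdict, deque


def build_network(evidence, min_llh=0.0, min_genomes=1):
    """Link two query genes when enough genomes place their orthologs in one operon.

    evidence: {genome_id: [(gene_a, gene_b, llh), ...]} (hypothetical format),
    with gene names already mapped to the query genome's orthologs.
    """
    support = defaultdict(set)  # unordered gene pair -> genomes supporting it
    for genome, pairs in evidence.items():
        for a, b, llh in pairs:
            if a != b and llh >= min_llh:  # operon-prediction threshold
                support[frozenset((a, b))].add(genome)
    graph = defaultdict(set)
    for pair, genomes in support.items():
        if len(genomes) >= min_genomes:  # genome-support threshold
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shells(graph, seed, n_shells=1):
    """Breadth-first expansion: map each reached gene to its shell (link distance)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if dist[gene] == n_shells:
            continue  # do not expand beyond the last shell
        for neighbour in graph.get(gene, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[gene] + 1
                queue.append(neighbour)
    return dist
```

For example, `shells(build_network(evidence, min_llh=0.4), "argR", n_shells=2)` would return every gene within two links of argR together with its shell number, given suitably prepared (hypothetical) `evidence`.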
Year: 2005 PMID: 15867197 PMCID: PMC1088069 DOI: 10.1093/nar/gki545
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Finding links by operon rearrangement. Operon predictions are based on a well-established method that relies solely on intergenic distances (14,15), not on conservation of gene order. This is the main difference from other available tools (11–13). Although we also incorporate gene fusions, >99% of our links come from operon predictions alone.
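The distance-based scoring of refs (14,15) can be approximated as a log-likelihood ratio per intergenic-distance bin: how much more often a given distance occurs between adjacent same-strand genes within operons than at transcription-unit boundaries. The sketch below assumes training distances (in bp) for both classes are available; the 10 bp bins, the add-one pseudocount and the natural logarithm are our choices, not necessarily those of the original method, and the function names are hypothetical.

```python
import math
from collections import Counter


def llh_table(operon_dists, boundary_dists, bin_width=10, pseudo=1.0):
    """Log-likelihood per distance bin: log[P(d | same operon) / P(d | boundary)].

    Negative distances (overlapping genes) are binned correctly by floor division.
    """
    op = Counter(d // bin_width for d in operon_dists)
    bd = Counter(d // bin_width for d in boundary_dists)
    bins = set(op) | set(bd)
    # Add a pseudocount to every observed bin so neither probability is zero
    n_op = len(operon_dists) + pseudo * len(bins)
    n_bd = len(boundary_dists) + pseudo * len(bins)
    return {
        b: math.log(((op[b] + pseudo) / n_op) / ((bd[b] + pseudo) / n_bd))
        for b in bins
    }


def predict_same_operon(dist, table, threshold=0.0, bin_width=10):
    """Return (prediction, llh); bins never seen in training give (False, None)."""
    llh = table.get(dist // bin_width)
    return (llh is not None and llh >= threshold), llh
```

Raising `threshold` above 0.0 trades recall for precision, which is the effect examined for the LLH threshold in Figure 3(a).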
Figure 2. (a) Distribution of KEGG links recovered in 1000 randomly shuffled networks with the connectivity kept fixed, in E. coli K12. (b) Distribution of DIP links obtained in the same set of 1000 random networks.
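Figure 2 compares the real network with 1000 randomized networks of fixed connectivity. One common way to randomize a network while keeping every node's degree fixed is repeated double-edge swapping; the sketch below uses it to build a null distribution of recovered reference links (for instance, gene pairs sharing a KEGG pathway, or DIP interactions). Whether the authors used this exact procedure is an assumption, as are the number of swaps (10 per edge), the (k + 1)/(n + 1) empirical p-value and all function names.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize an undirected simple graph by double-edge swaps; degrees stay fixed."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # endpoints not distinct: could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate link
        # Rewire a-b, c-d into a-d, c-b: every node keeps its degree
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def count_reference_links(edges, reference):
    """Number of links that also appear in a reference set of frozenset pairs."""
    return sum(frozenset(e) in reference for e in edges)


def empirical_p(edges, reference, n_random=1000, seed=0):
    """Fraction of randomized networks recovering at least as many reference links."""
    rng = random.Random(seed)
    observed = count_reference_links(edges, reference)
    hits = sum(
        count_reference_links(
            degree_preserving_shuffle(edges, 10 * len(edges), rng), reference
        ) >= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)  # add-one correction avoids p = 0
```

Each randomization starts again from the original edge list, so the 1000 null networks are independent draws rather than a single chain.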
Figure 3. Effect of increasing thresholds on the quality of predictions. As a measure of quality, we used the fraction of predicted links whose products work within the same KEGG metabolic pathway. (a) Effect of increasing the log-likelihood (LLH) threshold for accepting an operon prediction. (b) Effect of increasing the number of associations (the number of times the genes are found in the same operon). The measure is far from perfect, but it does show what happens as thresholds increase. The apparently slow growth in quality with increasing LLH is due to the 0.0 threshold already being high. In E. coli K12, operon predictions have a positive predictive value (true positives divided by the sum of true positives and false positives) of 0.86 at an LLH of 0.0 and of 0.93 at an LLH of 1.0.
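The quality measure of Figure 3 and the positive predictive value can be written directly. In the sketch below, `pathways_of` (a hypothetical mapping from each gene to its set of KEGG pathway identifiers) and the rule of scoring only links whose two genes both carry a pathway annotation are assumptions; the caption does not say how unannotated genes were handled. The score attached to each link may be its LLH, as in panel (a), or its number of associations, as in panel (b).

```python
def pathway_consistency(links, pathways_of):
    """Fraction of links whose two genes share at least one KEGG pathway.

    Links with an unannotated gene are skipped (an assumption).
    """
    scored = [(a, b) for a, b in links if pathways_of.get(a) and pathways_of.get(b)]
    if not scored:
        return float("nan")
    same = sum(bool(pathways_of[a] & pathways_of[b]) for a, b in scored)
    return same / len(scored)


def quality_by_threshold(scored_links, pathways_of, thresholds):
    """Recompute the quality measure keeping only links with score >= threshold."""
    return {
        t: pathway_consistency([(a, b) for a, b, s in scored_links if s >= t], pathways_of)
        for t in thresholds
    }


def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP), as defined in the caption."""
    return tp / (tp + fp)
```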
Figure 4. Fraction of internal versus external links found in the E. coli K12 network. (a) KEGG and DIP datasets. (b) Fraction of internal versus external links in the E. coli K12 network for each KEGG pathway. Pathway identifiers: MAP00193: ATP synthesis; MAP00632: Benzoate degradation via CoA ligation; MAP00650: Butanoate metabolism; MAP00020: Citrate cycle (TCA cycle); MAP00061: Fatty acid biosynthesis (path 1); MAP00071: Fatty acid metabolism; MAP02040: Flagellar assembly; MAP00790: Folate biosynthesis; MAP00260: Glycine, serine and threonine metabolism; MAP00010: Glycolysis/Gluconeogenesis; MAP00630: Glyoxylate and dicarboxylate metabolism; MAP00340: Histidine metabolism; MAP00300: Lysine biosynthesis; MAP00910: Nitrogen metabolism; MAP00520: Nucleotide sugars metabolism; MAP00190: Oxidative phosphorylation; MAP00770: Pantothenate and CoA biosynthesis; MAP00040: Pentose and glucuronate interconversions; MAP00030: Pentose phosphate pathway; MAP00550: Peptidoglycan biosynthesis; MAP00400: Phenylalanine, tyrosine and tryptophan biosynthesis; MAP00195: Photosynthesis; MAP00860: Porphyrin and chlorophyll metabolism; MAP00640: Propanoate metabolism; MAP00230: Purine metabolism; MAP00240: Pyrimidine metabolism; MAP00720: Reductive carboxylate cycle (CO2 fixation); MAP00500: Starch and sucrose metabolism; MAP03070: Type III secretion system; MAP00130: Ubiquinone biosynthesis; MAP00220: Urea cycle and metabolism of amino groups; and MAP00290: Valine, leucine and isoleucine biosynthesis.
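The caption does not spell out how links are split, so the sketch below assumes that, for a given pathway, a link is internal when both genes belong to the pathway and external when only one does; links touching no gene of the pathway are ignored. The function names and the `{pathway_id: set_of_genes}` input format are hypothetical.

```python
def internal_external(links, pathway_genes):
    """Fractions (internal, external) of links touching a pathway's gene set."""
    internal = external = 0
    for a, b in links:
        inside = (a in pathway_genes) + (b in pathway_genes)
        if inside == 2:
            internal += 1  # both genes in the pathway
        elif inside == 1:
            external += 1  # link leaves the pathway
    total = internal + external
    if total == 0:
        return 0.0, 0.0  # no link touches the pathway
    return internal / total, external / total


def per_pathway(links, pathways):
    """Apply internal_external to every pathway in {pathway_id: set_of_genes}."""
    return {pid: internal_external(links, genes) for pid, genes in pathways.items()}
```

A high external fraction need not indicate errors: it can reflect genuine cross-talk between pathways or pathway boundaries that the reference database draws differently.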
Figure 5. Links to the argR gene, coding for the ArgR transcription factor in E. coli K12, using an LLH threshold of 0.4 and associations found in at least one genome.
Details of the newly found links in the recovery of the ArgR regulon
| Gene | No. of associations and genomes in which the evidence is found | No. of intervening genes and LLHs | Function of protein |
|---|---|---|---|
|  | 5— | 0 (0.4291), 0 (0.4291), 0 (0.5067), 0 (0.8840), 0 (0.8840) | Protein used in recombination and DNA repair |
|  | 3— | 1 (0.8840), 1 (1.1343), 0 (0.7944) | Amino acid biosynthesis, arginine acetylornithine delta-aminotransferase |
|  | 2— | 0 (0.1721), 0 (0.8840) | DNA replication, repair. Methyl-directed mismatch repair |
|  | 2— | 0 (1.1343), 0 (0.5067) | Putative enzyme |
|  | 1— | 0 (0.7944) | DNA replication, repair. Flavoprotein affecting synthesis of DNA and pantothenate metabolism |
| gmk | 1— | 2 (0.7944) | Purine ribonucleotide biosynthesis, guanylate kinase |
|  | 1— | 0 (0.7944) | Putative transport |
| dxs | 1— | 2 (0.4291) | Central intermediary metabolism, 1-deoxyxylulose-5-phosphate synthase |
|  | 1— | 0 (0.8840) | Hypothetical protein |
|  | 1— | 4 (0.5067) | Biosynthesis of cofactors, folic acid, 5,10-methylene-tetrahydrofolate dehydrogenase |
| ispA | 1— | 1 (0.5067) | Biosynthesis of cofactors, geranyltransferase |
|  | 1— | 5 (0.5067) | RNA synthesis, transcription termination, L factor |
|  | 1— | 3 (0.5067) | Degradation of DNA |
ᵃ Cases where we expect the genes to be functionally linked because the LLH scores are high and the orthologs are conserved with no intervening genes in the genome of evidence. The genes gmk, ychE and yfjB have been predicted to be regulated by ArgR (28). Note also that in all these cases the genes are putative, hypothetical or poorly annotated, which leaves open the possibility that these associations are real.
ᵇ Of all 13 of these links, we expect only the marked ones (3 in number) to be false positives, because of the high number of intervening genes. Such links could serve as a guide for future refinements of Nebulon. Complete genome names can be found on the website.
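The two footnotes suggest a simple reading rule for the third column of the table: evidence with no intervening genes and a high LLH supports a link, while many intervening genes make it suspect. The sketch below parses entries such as `1 (0.8840)` and applies that rule. The cutoffs (more than 2 intervening genes even in the best evidence is suspect; an LLH of at least 0.7 counts as high) and the function names are illustrative assumptions, not values or code given by the authors; with these cutoffs, the three rows with 3, 4 and 5 intervening genes are the ones this rule flags.

```python
import re

# One piece of evidence: "<intervening genes> (<LLH>)", e.g. "1 (0.8840)"
ENTRY = re.compile(r"(\d+)\s*\((-?\d+(?:\.\d+)?)\)")


def parse_evidence(cell):
    """'1 (0.8840), 0 (0.7944)' -> [(1, 0.884), (0, 0.7944)]"""
    return [(int(n), float(llh)) for n, llh in ENTRY.findall(cell)]


def classify_link(cell, max_intervening=2, min_llh=0.7):
    """Label a link from its evidence; both cutoffs are illustrative assumptions."""
    evidence = parse_evidence(cell)
    if not evidence:
        return "undecided"
    if min(n for n, _ in evidence) > max_intervening:
        return "suspect"  # even the best evidence has many intervening genes
    if any(n == 0 and llh >= min_llh for n, llh in evidence):
        return "likely"  # adjacent orthologs with a high LLH in some genome
    return "undecided"
```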
Figure 6. Uber-operon recovery. (a) Links to tufA in Nebulon with an LLH threshold of 0.4, shown with the minimum number of supporting associations set to 1 and, separately, set to 2. (b) Two shells of links to flgA in Nebulon showing known and predicted associations.
Figure 7. Links among genes involved in nitrogen fixation and nodulation in S. meliloti. Core genes are those annotated as involved in these activities in S. meliloti (31); non-core genes are other linked genes found by Nebulon.
Genes having at least two links with genes related to nitrogen fixation in S. meliloti
| Gene | No. of links to core | Function of protein |
|---|---|---|
| cysH | 5 | Probable thioredoxin-dependent PAPS reductase, 3′-phosphoadenylylsulfate sulfotransferase, cysteine biosynthesis protein |
| cysG | 4 | Probable siroheme synthase protein |
| cysQ | 4 | Putative transmembrane protein |
| SMc02124 | 4 | Putative nitrite reductase protein |
| cobA | 3 | Probable uroporphyrin-III C-methyltransferase protein |
| fixG | 3 | Iron sulfur membrane protein |
| fixI1 | 3 | Copper transport ATPase |
| cysD | 2 | Putative sulfate adenylate transferase subunit 2 cysteine biosynthesis protein |
| dcp | 2 | Probable peptidyl-dipeptidase A protein |
| etf | 2 | Probable electron transfer flavoprotein-ubiquinone oxidoreductase |
| fixI2 | 2 | E1–E2 type cation ATPase |
| fixN1 | 2 | Heme b/copper cytochrome |
| fixO2 | 2 | Cytochrome |
| fixP1 | 2 | Di-heme cytochrome |
| glcF | 2 | Probable glycolate oxidase iron-sulfur subunit protein |
| ispB | 2 | Putative octaprenyl-diphosphate synthase protein |
| ivdH | 2 | Putative isovaleryl-CoA dehydrogenase protein |
| pfs | 2 | Putative MTA/SAH nucleosidase P46 includes: 5′-methylthioadenosine nucleosidase and |
| rpsJ | 2 | Probable 30S ribosomal protein S10 |
| SMa1207 | 2 | FixK-like regulatory protein |
| SMa2359 | 2 | Conserved hypothetical protein |
| SMb20753 | 2 | Putative acyl-CoA dehydrogenase protein |
| SMb21225 | 2 | Putative inositol monophosphatase, possibly involved in PAPS metabolism protein |
| SMb21232 | 2 | Putative nucleotide sugar epimerase dehydratase protein |
| SMc00977 | 2 | Putative acyl-CoA dehydrogenase protein |
| SMc01153 | 2 | Probable enoyl-CoA hydratase protein |
| SMc02123 | 2 | Conserved hypothetical protein |
| thiF | 2 | Putative thiamine biosynthesis transmembrane protein |
| typA | 2 | Probable GTP-binding protein |
| ubiE | 2 | Probable ubiquinone/menaquinone biosynthesis methyltransferase protein |
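A table like the one above can be derived from a link list with a simple count. The sketch below assumes the links are available as gene pairs and the core set as gene identifiers (both hypothetical inputs, as is the function name); it counts, for each non-core gene, the distinct core genes it is linked to and keeps those with at least two, sorted by decreasing count.

```python
from collections import Counter


def links_to_core(links, core, min_links=2):
    """Rank non-core genes by the number of distinct core genes they link to."""
    counts = Counter()
    for pair in {frozenset(link) for link in links}:  # drop duplicate links
        if len(pair) != 2:
            continue  # ignore self-links
        a, b = tuple(pair)
        if a in core and b not in core:
            counts[b] += 1
        elif b in core and a not in core:
            counts[a] += 1
    hits = [(gene, n) for gene, n in counts.items() if n >= min_links]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Links between two core genes are not counted, since the table only ranks genes outside the core set.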