Jingjun Sun, Kagan Tuncay, Alaa Abi Haidar, Lisa Ensman, Frank Stanley, Michael Trelinski, Peter Ortoleva.
Abstract
Transcriptional regulatory network (TRN) discovery based on a single method (e.g. microarray analysis, gene ontology, or phylogenic similarity) does not seem feasible because no single source provides sufficient information, and the resulting TRNs are spurious or incomplete. We develop a methodology, TRND, that integrates a preliminary TRN, microarray data, gene ontology and phylogenic similarity to accurately discover TRNs, and we apply it to E. coli K12. The approach can easily be extended to include other methodologies. Although gene ontology and phylogenic similarity have been used in the context of gene-gene networks, we show that more information can be extracted when gene-gene scores are transformed into gene-transcription factor (TF) scores using a preliminary TRN. This appears preferable to constructing gene-gene interaction networks, given the observation that the expression of a gene and the activity of a TF containing a component encoded by that gene are often out of phase. Multi-method integration in TRND is facilitated by a Bayesian framework for each method, derived from that method's individual scoring measure and a training set of known gene/TF regulatory interactions. The TRNs we construct are in better agreement with microarray data, and the number of gene/TF interactions we discover is double that of existing networks.
Year: 2007 PMID: 17397539 PMCID: PMC1852316 DOI: 10.1186/1748-7188-2-2
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
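The abstract's central idea, turning gene-gene scores into gene/TF scores through a preliminary TRN, can be sketched as follows. This is a minimal illustration under stated assumptions: the aggregation rule (averaging a gene's similarity to the TF's known targets) and the names `gene_tf_scores`, `gene_gene`, and `trn` are hypothetical, not the authors' formula.

```python
from statistics import mean


def gene_tf_scores(gene_gene, trn, genes):
    """Map gene-gene similarity scores to gene/TF scores via a preliminary TRN.

    Hypothetical aggregation: the score of (gene, TF) is the mean similarity
    between the gene and the TF's known targets in the preliminary TRN.

    gene_gene: dict mapping frozenset({g1, g2}) -> similarity score
    trn: dict mapping TF name -> set of genes it regulates
    genes: iterable of candidate genes to score
    """
    scores = {}
    for tf, targets in trn.items():
        for g in genes:
            # Exclude the gene itself so a known target is not compared with itself.
            others = [t for t in targets if t != g]
            if not others:
                continue
            # Missing gene-gene pairs are treated as similarity 0 (an assumption).
            sims = [gene_gene.get(frozenset((g, t)), 0.0) for t in others]
            scores[(g, tf)] = mean(sims)
    return scores


# Toy example: TF1 is known to regulate b and c; a is a candidate target.
gene_gene = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.4,
    frozenset(("b", "c")): 0.1,
}
scores = gene_tf_scores(gene_gene, {"TF1": {"b", "c"}}, ["a", "b", "c"])
```

In this sketch the score compares a gene with the TF's known targets rather than with the gene encoding the TF, so it does not rely on that gene's expression being in phase with the TF's activity.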
Figure 1. Probability distributions of the Pearson correlation between expression profiles for random gene pairs and for known gene/TF regulatory interactions in E. coli. Square markers refer to the dataset obtained from the University of Oklahoma E. coli database; diamond markers refer to the datasets obtained from the NCBI Gene Expression Omnibus (GSE7, GSE8, GSE9; 65 datasets). Solid markers show the distribution for random gene pairs and hollow markers the distribution for known gene/TF regulatory interactions. Because these distributions are indistinguishable, it does not seem feasible to construct the TRN from expression data alone. Probability distributions computed for mutual information yielded similar findings.
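The comparison in Figure 1 can be outlined as follows: compute the Pearson correlation for each pair of expression profiles and histogram the values separately for random gene pairs and for known gene/TF pairs. A minimal sketch, assuming an `expr` dict mapping gene names to equal-length expression vectors and that a TF is represented by the expression of the gene encoding it; the function names and binning are illustrative, not the authors' code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_histogram(expr, pairs, bins=10):
    """Normalized histogram of Pearson r over gene pairs, with equal-width bins on [-1, 1]."""
    counts = [0] * bins
    for g1, g2 in pairs:
        r = pearson(expr[g1], expr[g2])
        # Clamp so that r == 1 (or rounding just outside [-1, 1]) stays in range.
        k = min(max(int((r + 1) / 2 * bins), 0), bins - 1)
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]


def random_pairs(genes, n, seed=0):
    """Draw n random pairs of distinct genes from a list (the 'random set')."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(n)]
```

Calling `correlation_histogram` once with `random_pairs(...)` and once with the known (target gene, TF gene) pairs gives the two curves to overlay; the caption reports that for E. coli they come out indistinguishable.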
List of organisms (bacteria and archaea) used in the phylogenic similarity analysis, grouped by phylum.
| Phylum | Organisms |
|---|---|
| Actinobacteria | Bifidobacterium longum NCC2705, Corynebacterium diphtheriae NCTC 13129, Corynebacterium efficiens YS-314, Corynebacterium glutamicum ATCC13032, Corynebacterium glutamicum ATCC 13032, Leifsonia xyli subsp. xyli str. CTCB07, Mycobacterium avium subsp. paratuberculosis str. k10, Mycobacterium bovis AF2122/97, Mycobacterium leprae TN, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CDC1551, Nocardia farcinica IFM 10152, Propionibacterium acnes KPA171202, Streptomyces avermitilis MA-4680, Streptomyces coelicolor A3(2), Symbiobacterium thermophilum IAM 14863, Tropheryma whipplei TW08/27, Tropheryma whipplei str. Twist |
| Aquificae | Aquifex aeolicus VF5 |
| Bacteroidetes | Bacteroides fragilis YCH46, Bacteroides fragilis NCTC 9343, Bacteroides thetaiotaomicron VPI-5482, Porphyromonas gingivalis W83 |
| Cyanobacteria | Prochlorococcus marinus subsp. marinus str. CCMP1375, Prochlorococcus marinus str. MIT 9313 |
| Chlamydiae | Chlamydophila abortus S26/3, Chlamydia muridarum Nigg, Chlamydia trachomatis D/UW-3/CX, Chlamydophila caviae GPIC, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Chlamydophila pneumoniae TW-183, Parachlamydia sp. UWE25 |
| Chlorobi | Chlorobium tepidum TLS |
| Chloroflexi | Dehalococcoides ethenogenes 195 |
| Crenarchaeota | Aeropyrum pernix K1, Pyrobaculum aerophilum str. IM2, Sulfolobus solfataricus P2, Sulfolobus tokodaii str. 7 |
| Cyanobacteria | Gloeobacter violaceus PCC 7421, Nostoc sp. PCC 7120, Prochlorococcus marinus subsp. pastoris str. CCMP1986, Synechococcus elongatus PCC 6301, Synechococcus sp. WH 8102, Synechocystis sp. PCC 6803, Thermosynechococcus elongatus BP-1 |
| Deinococcus-Thermus | Deinococcus radiodurans R1, Thermus thermophilus HB27, Thermus thermophilus HB8 |
| Euryarchaeota | Archaeoglobus fulgidus DSM 4304, Haloarcula marismortui ATCC 43049, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus str.Delta H, Methanocaldococcus jannaschii DSM 2661, Methanococcus maripaludis S2, Methanopyrus kandleri AV19, Methanosarcina acetivorans C2A, Methanosarcina mazei Go1, Picrophilus torridus DSM 9790, Pyrococcus abyssi GE5, Pyrococcus furiosus DSM 3638, Pyrococcus horikoshii OT3, Thermococcus kodakaraensis KOD1, Thermoplasma acidophilum DSM 1728, Thermoplasma volcanium GSS1 |
| Firmicutes | Bacillus anthracis str. Ames, Bacillus anthracis str. 'Ames Ancestor', Bacillus anthracis str. Sterne, Bacillus cereus ATCC 14579, Bacillus cereus ATCC 10987, Bacillus cereus ZK, Bacillus clausii KSM-K16, Bacillus halodurans C-125, Bacillus licheniformis ATCC 14580, Bacillus subtilis subsp. subtilis str. 168, Bacillus thuringiensis serovar konkukian str. 97-27, Clostridium acetobutylicum ATCC 824, Clostridium perfringens str. 13, Clostridium tetani E88, Enterococcus faecalis V583, Geobacillus kaustophilus HTA426, Lactobacillus acidophilus NCFM, Lactobacillus johnsonii NCC 533, Lactobacillus plantarum WCFS1, Lactococcus lactis subsp. lactis Il1403, Listeria innocua Clip11262, Listeria monocytogenes EGD-e, Listeria monocytogenes str. 4b F2365, Mesoplasma florum L1, Mycoplasma gallisepticum R, Mycoplasma genitalium G-37, Mycoplasma hyopneumoniae 232, Mycoplasma mobile 163K, Mycoplasma mycoides subsp. mycoides SC str. PG1, Mycoplasma penetrans HF-2, Mycoplasma pneumoniae M129, Mycoplasma pulmonis UAB CTIP, Oceanobacillus iheyensis HTE831, Onion yellows phytoplasma OY-M, Staphylococcus aureus subsp. aureus COL, Staphylococcus aureus subsp. aureus MW2, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N315, Staphylococcus aureus subsp. aureus MRSA252, Staphylococcus aureus subsp. aureus MSSA476, Staphylococcus epidermidis ATCC 12228, Staphylococcus epidermidis RP62A, Streptococcus agalactiae 2603V/R, Streptococcus agalactiae NEM316, Streptococcus mutans UA159, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Streptococcus pyogenes MGAS10394, Streptococcus pyogenes MGAS315, Streptococcus pyogenes MGAS8232, Streptococcus pyogenes SSI-1, Streptococcus thermophilus CNRZ1066, Streptococcus thermophilus LMG 18311, Thermoanaerobacter tengcongensis MB4, Ureaplasma parvum serovar 3 str. ATCC 700970 |
| Fusobacteria | Fusobacterium nucleatum subsp. nucleatum ATCC 25586 |
| Nanoarchaeota | Nanoarchaeum equitans Kin4-M |
| Planctomycetes | Rhodopirellula baltica SH 1 |
| Proteobacteria | Acinetobacter sp. ADP1, Agrobacterium tumefaciens str. C58, Agrobacterium tumefaciens str. C58, Anaplasma marginale str. St. Maries, Azoarcus sp. EbN1, Bartonella henselae str. Houston-1, Bartonella quintana str. Toulouse, Bdellovibrio bacteriovorus HD100, Candidatus Blochmannia floridanus, Bordetella bronchiseptica RB50, Bordetella parapertussis 12822, Bordetella pertussis Tohama I, Bradyrhizobium japonicum USDA 110, Brucella abortus biovar 1 str. 9–941, Brucella melitensis 16M, Brucella suis 1330, Buchnera aphidicola str. Bp (Baizongia pistaciae), Buchnera aphidicola str. Sg (Schizaphis graminum), Buchnera aphidicola str. APS (Acyrthosiphon pisum), Burkholderia mallei ATCC 23344, Burkholderia pseudomallei K96243, Campylobacter jejuni subsp. jejuni NCTC 11168, Campylobacter jejuni RM1221, Caulobacter crescentus CB15, Chromobacterium violaceum ATCC 12472, Coxiella burnetii RSA 493, Desulfotalea psychrophila LSv54, Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough, Ehrlichia ruminantium str. Gardel, Ehrlichia ruminantium str. Welgevonden, Ehrlichia ruminantium str. Welgevonden, Erwinia carotovora subsp. atroseptica SCRI1043, Escherichia coli CFT073, Escherichia coli K12, Escherichia coli O157:H7 EDL933, Escherichia coli O157:H7, Francisella tularensis subsp. tularensis Schu 4, Gluconobacter oxydans 621H, Geobacter sulfurreducens PCA, Haemophilus ducreyi 35000HP, Haemophilus influenzae Rd KW20, Helicobacter hepaticus ATCC 51449, Helicobacter pylori 26695, Helicobacter pylori J99, Idiomarina loihiensis L2TR, Legionella pneumophila str. Lens, Legionella pneumophila str. Paris, Legionella pneumophila subsp. pneumophila str. Philadelphia 1, Mannheimia succiniciproducens MBEL55E, Mesorhizobium loti MAFF303099, Methylococcus capsulatus str. Bath, Neisseria gonorrhoeae FA 1090, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Nitrosomonas europaea ATCC 19718, Pasteurella multocida subsp. multocida str. Pm70, Photobacterium profundum SS9, Photorhabdus luminescens subsp. laumondii TTO1, Pseudomonas aeruginosa PAO1, Pseudomonas putida KT2440, Pseudomonas syringae pv. syringae B728a, Pseudomonas syringae pv. tomato str. DC3000, Ralstonia solanacearum GMI1000, Rhodopseudomonas palustris CGA009, Rickettsia conorii str. Malish 7, Rickettsia prowazekii str. Madrid E, Rickettsia typhi str. Wilmington, Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67, Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150, Salmonella enterica subsp. enterica serovar Typhi str. CT18, Salmonella enterica subsp. enterica serovar Typhi Ty2, Salmonella typhimurium LT2, Shewanella oneidensis MR-1, Shigella flexneri 2a str. 301, Silicibacter pomeroyi DSS-3, Sinorhizobium meliloti 1021, Shigella flexneri 2a str. 2457T, Vibrio cholerae O1 biovar eltor str. N16961, Vibrio fischeri ES114, Vibrio parahaemolyticus RIMD 2210633, Vibrio vulnificus CMCP6, Vibrio vulnificus YJ016, Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis, Wolbachia endosymbiont strain TRS of Brugia malayi, Wolbachia endosymbiont of Drosophila melanogaster, Wolinella succinogenes DSM 1740, Xanthomonas campestris pv. campestris str. ATCC 33913, Xylella fastidiosa 9a5c, Xanthomonas axonopodis pv. citri str. 306, Xanthomonas campestris pv. campestris str. 8004, Xanthomonas oryzae pv. oryzae KACC10331, Xylella fastidiosa Temecula1, Yersinia pestis biovar Medievalis str. 91001, Yersinia pestis CO92, Yersinia pestis KIM, Yersinia pseudotuberculosis IP 32953, Zymomonas mobilis subsp. mobilis ZM4 |
| Spirochaetes | Borrelia burgdorferi B31, Borrelia garinii PBi chromosome linear, Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130, Leptospira interrogans serovar Lai str. 56601, Treponema denticola ATCC 35405, Treponema pallidum subsp. pallidum str. Nichols |
| Thermotogae | Thermotoga maritima MSB8 |
Figure 2. Properties of the TRNs used in the synthetic examples. Networks of 1000 genes and 100 TFs are generated using the probability distribution for the number of genes regulated by a given TF shown in (a). The corresponding probability distribution for the number of regulators per gene is shown in (b). The average number of regulators per gene is 3.62, 5.22, and 7.02 for Networks 1, 2 and 3, respectively. Up- and down-regulation are assigned with equal probability.
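Networks like those in Figure 2 can be generated along these lines. A minimal sketch, assuming the per-TF target count is drawn from a user-supplied sampler (a hypothetical stand-in for the distribution in panel (a), which is not given numerically here) and that each interaction's sign is +1 (up) or -1 (down) with equal probability, as the caption states.

```python
import random


def synthetic_trn(n_genes, n_tfs, targets_per_tf, seed=0):
    """Random TRN as a dict mapping (tf, gene) -> +1 (up) or -1 (down).

    targets_per_tf: callable taking a random.Random and returning how many
    distinct genes one TF regulates (hypothetical stand-in for Fig. 2(a)).
    """
    rng = random.Random(seed)
    edges = {}
    for tf in range(n_tfs):
        k = min(targets_per_tf(rng), n_genes)
        for gene in rng.sample(range(n_genes), k):
            # Up- and down-regulation are equally likely.
            edges[(tf, gene)] = rng.choice((1, -1))
    return edges


def mean_regulators_per_gene(edges, n_genes):
    """Average number of regulating TFs per gene (genes with none count as zero)."""
    return len(edges) / n_genes


# Illustrative sampler with a long right tail (an assumption, not the paper's distribution).
net = synthetic_trn(1000, 100, lambda rng: 1 + int(rng.expovariate(1 / 35)), seed=1)
```

Since the average number of regulators per gene is simply the number of interactions divided by the number of genes, the mean of the sampler (roughly 35 targets per TF in this illustration) is what moves a network toward the 3.62, 5.22, or 7.02 regulators per gene quoted for Networks 1 to 3.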
Figure 3. Reconstruction of TRNs. We used Network 1 of Fig. 2 to generate synthetic expression data, randomly deleted 50% of the network, and used FTF to reconstruct the deleted part. (a) Percentage of the deleted network recovered as a function of success rate, a measure of the likelihood that an interaction is correct, as estimated from the training set (known interactions). As the number of microarray experiments increases, a higher percentage of the network can be reconstructed; however, full reconstruction would require an impractically large number of experiments. (b) Success rate as a function of the absolute value of the linear correlation between the constructed TF activity profiles and gene expression data.
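The two quantities plotted in panel (a) can be computed by sweeping a score threshold over candidate interactions. A minimal sketch under an assumption: in the synthetic setting the deleted interactions are known exactly, so success rate is taken here as the fraction of predicted interactions that are truly deleted ones (precision), whereas the caption says the paper estimates it from the training set. The names `scores`, `deleted`, and `recovery_curve` are hypothetical.

```python
def recovery_curve(scores, deleted, thresholds):
    """For each threshold t, return (t, success_rate, fraction_recovered).

    scores: dict mapping candidate (tf, gene) -> score
    deleted: set of (tf, gene) interactions removed from the network
    Success rate here is precision against the known deleted set (an assumption;
    the paper estimates it from the training set instead).
    """
    curve = []
    for t in thresholds:
        predicted = {e for e, s in scores.items() if s >= t}
        hits = len(predicted & deleted)
        success = hits / len(predicted) if predicted else float("nan")
        curve.append((t, success, hits / len(deleted)))
    return curve
```

Lowering the threshold recovers more of the deleted network at the cost of a lower success rate, which is the trade-off each curve in panel (a) traces out.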
Figure 4. Effect of TRN properties. (a) We used Networks 1, 2 and 3 of Fig. 2 to generate 100 synthetic expression data sets and eliminated 50% of the gene/TF interactions in each TRN. Shown is the percentage of the deleted network recovered as a function of success rate. As the number of interactions increases, the percentage of the network that can be recovered decreases. (b) Same as (a), except that we used Network 1 and eliminated 25%, 50%, and 75% of the network. As expected, a higher percentage of the deleted network is recoverable when more of the network is known.
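The deletion step used in Figures 3 and 4, removing 25%, 50%, or 75% of the interactions at random and holding them out as the set to recover, might look like this; the function name and seeding are illustrative assumptions.

```python
import random


def delete_fraction(edges, fraction, seed=0):
    """Randomly split interactions into (kept, deleted), deleting round(fraction * n) of them."""
    rng = random.Random(seed)
    items = sorted(edges)  # sort first so the split is reproducible for a given seed
    n_deleted = round(fraction * len(items))
    deleted = set(rng.sample(items, n_deleted))
    kept = set(items) - deleted
    return kept, deleted
```

The `kept` set plays the role of the known (preliminary) network and `deleted` the ground truth against which recovery is measured.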
Figure 5. (a) Probability distribution for the number of genes regulated by a given TF; (b) probability distribution for the number of gene/TF interactions per gene. These graphs are based on the preliminary TRN taken from [14].
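Both panels are degree distributions of the TRN viewed as a bipartite TF-gene graph. A minimal sketch, assuming the network is available as a list of unique (TF, gene) pairs; it counts only TFs and genes that appear in at least one interaction, so zero-degree entries are omitted.

```python
from collections import Counter


def degree_distributions(edges):
    """Return (P(k genes regulated by a TF), P(k interactions per gene)) from (tf, gene) pairs.

    Only TFs and genes that occur in at least one interaction are counted.
    """
    out_degree = Counter(tf for tf, _ in edges)
    in_degree = Counter(gene for _, gene in edges)

    def normalize(degrees):
        hist = Counter(degrees.values())
        total = sum(hist.values())
        return {k: v / total for k, v in sorted(hist.items())}

    return normalize(out_degree), normalize(in_degree)
```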
Figure 6. Probability distributions of FTF similarity scores for the training set (dashed) and the random set (solid). The x-axis is the FTF similarity score and the y-axis its probability distribution. Comparison with Fig. 1 (diamond markers) shows that our approach separates known interactions from random pairs better than the gene-gene correlation approach does.
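The captions judge separation by eye. One standard way to put a number on how distinguishable two score distributions are is the two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical CDFs. It is a swapped-in illustration, not a measure the captions say was used; applied to the Figure 1 and Figure 6 score sets, a larger value would indicate better separation of known interactions from random pairs.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the pooled sample points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap
```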
Figure 7. Comparison of the probability distributions of GO similarity scores for the training set (square markers) and the random set (diamond markers). The training set consists of all known E. coli gene/TF interactions for genes with assigned GO terms; the random set consists of all possible gene/TF interactions for those genes. A higher GO similarity score implies a higher likelihood of a gene/TF interaction, particularly when the score exceeds 8.
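Figures 7 to 9 compare score distributions for the training set and the random set, which is the ingredient a per-method Bayesian framework like the one described in the abstract needs. A minimal sketch, assuming binned scores, additive (pseudocount) smoothing, and that the random set approximates the score distribution of non-interacting pairs (most random pairs do not interact); the bin edges, pseudocount, and prior are hypothetical choices, not the authors' settings.

```python
from bisect import bisect_right


def binned_likelihood_ratio(train, background, edges, pseudo=1.0):
    """Per-bin ratio P(bin | known interaction) / P(bin | random pair).

    edges: increasing bin edges; bin i covers [edges[i], edges[i + 1]).
    Scores outside [edges[0], edges[-1]) are ignored. `pseudo` is added to
    every bin count so empty bins do not give zero or infinite ratios.
    """
    nb = len(edges) - 1

    def density(values):
        counts = [pseudo] * nb
        for v in values:
            i = bisect_right(edges, v) - 1
            if 0 <= i < nb:
                counts[i] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p1, p0 = density(train), density(background)
    return [a / b for a, b in zip(p1, p0)]


def posterior(lr, prior):
    """P(interaction | score bin) from a likelihood ratio and a prior P(interaction)."""
    odds = lr * prior / (1 - prior)
    return odds / (1 + odds)
```

Bins where the training curve lies above the random curve (for example, GO scores above 8 in Figure 7) get ratios above 1, and `posterior` converts such a ratio into a probability once a prior fraction of true interactions is chosen.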
Figure 8. Comparison of the probability distributions of phylogenic similarity scores for the training set (dashed) and the random set (solid). The x-axis is the phylogenic similarity score and the y-axis its probability distribution. The training set is based on all known gene/TF interactions from [14]; the random set consists of all possible gene/TF interactions. A higher score implies a higher likelihood of a gene/TF interaction, particularly when the similarity score exceeds 500.
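A phylogenic similarity score compares the presence or absence of homologs of two genes across the genomes listed in the organism table. This excerpt does not define the paper's score, and the range in Figure 8 (values above 500) suggests a different scale, so the following is only a minimal sketch of the general profile idea: a 0/1 profile per gene and an agreement count. The function names and genome labels are hypothetical.

```python
def profile(genomes_with_homolog, genomes):
    """0/1 phylogenetic profile: 1 where the gene has a detected homolog in that genome."""
    return [1 if g in genomes_with_homolog else 0 for g in genomes]


def profile_agreement(p, q):
    """Number of genomes in which two genes are both present or both absent."""
    return sum(1 for a, b in zip(p, q) if a == b)


genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]  # hypothetical labels
p = profile({"genome_A", "genome_B"}, genomes)
q = profile({"genome_A", "genome_C"}, genomes)
```

A gene-gene score of this kind would then be converted into a gene/TF score through the preliminary TRN, as the abstract describes.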
Figure 9. Probability distributions of combined scores for the training set (dashed) and the random set (solid). The training set is based on all known gene/TF interactions from [14]; the random set consists of all possible gene/TF interactions. A higher combined score implies a higher likelihood of a gene/TF interaction.
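One common way to combine per-method evidence into a single score is to add the log likelihood ratios of the individual methods, which treats the methods as conditionally independent (a naive Bayes assumption). This excerpt does not state how TRND forms its combined score, so this is a sketch of that swapped-in rule, not the paper's formula; the likelihood ratios would come from per-method training/random comparisons like those in Figures 6 to 8.

```python
import math


def combined_log_score(lrs):
    """Sum of per-method log likelihood ratios (naive Bayes: methods assumed independent)."""
    return sum(math.log(lr) for lr in lrs)


def combined_posterior(lrs, prior):
    """Posterior P(interaction) from per-method likelihood ratios and a prior."""
    log_odds = math.log(prior / (1 - prior)) + combined_log_score(lrs)
    return 1 / (1 + math.exp(-log_odds))
```

Working in log space keeps the sum stable when many methods contribute and makes each method's contribution additive.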
Figure 10. Probability distributions for the number of gene/TF interactions per gene. Although the suggested TRN is denser, the overall shape of the probability distribution remains the same.
Of the 206 gene/TF interactions found in RegulonDB (Salgado et al. 2004) and EcoCyc, 44 scored above the imposed threshold.
| # | TF | Gene | Final score | Predicted sign | Actual sign |
|---|---|---|---|---|---|
| 1 | ArcA-Phosphorylated | yfiD | 1.670768591 | up | up |
| 2 | ArgR-L-arginine | argG | 2.624262246 | down | down |
| 3 | CRP-cAMP | ugpQ | 2.085693237 | up | up |
| 4 | CRP-cAMP | ugpC | 1.438960585 | up | up |
| 5 | CRP-cAMP | ugpB | 1.527292985 | up | up |
| 6 | CRP-cAMP | rhaB | 2.909109432 | up | up |
| 7 | CRP-cAMP | cytR | 2.119101207 | up | up |
| 8 | CRP-cAMP | fis | 1.887330412 | up/down | up/down |
| 9 | FhlA-Formate | hyfA | 2.509458668 | up | up |
| 10 | Fis | relA | 1.595459732 | up | up |
| 11 | GntR | idnK | 1.591861262 | down | down |
| 12 | Lrp | livJ | 1.380451883 | down | down |
| 13 | MalT-Maltotriose-ATP | malZ | 1.44543234 | up | up |
| 14 | NarL-Phosphorylated | fdhF | 2.059069186 | up | up |
| 15 | NtrC-Phosphorylated | astC | 2.256990592 | up | up |
| 16 | ArgR-L-arginine | astC | 1.505034 | down | up |
| 17 | CRP-cAMP | nagE | 1.501987378 | up/down | up |
| 18 | CRP-cAMP | rpoS | 2.319114034 | up/down | down |
| 19 | OmpR-Phosphorylated | nmpC | 1.572242248 | up | down |
| 20 | ArcA-Phosphorylated | aceE | 1.998375233 | down | down |
| 21 | ArcA-Phosphorylated | appC | 2.066734924 | up | up |
| 22 | CRP-cAMP | entC | 2.93996178 | up | up |
| 23 | CRP-cAMP | fepA | 3.821927836 | up | up |
| 24 | CRP-cAMP | fumA | 3.496049117 | up | up |
| 25 | CRP-cAMP | gadB | 2.197333426 | down | down |
| 26 | CRP-cAMP | galP | 2.37334941 | up | up |
| 27 | CRP-cAMP | gapA | 1.811986697 | up | up |
| 28 | CRP-cAMP | ompF | 1.315706561 | up | up |
| 29 | FruR | acnA | 1.633652931 | up | up |
| 30 | FruR | glk | 2.112996214 | down | down |
| 31 | GadE | gadB | 1.973757857 | up | up |
| 32 | Hns | gadB | 1.78680429 | down | down |
| 33 | LexA | uvrC | 1.590391147 | down | down |
| 34 | MetJ-S-adenosylmethionine | metE | 3.264214502 | down | down |
| 35 | MetJ-S-adenosylmethionine | metR | 1.611151228 | down | down |
| 36 | MetR-Homocysteine | metE | 3.45265202 | up | up |
| 37 | PhoP-Phosphorylated | rstA | 1.6475787 | up | up |
| 38 | CRP-cAMP | prpR | 3.147648347 | up/down | up |
| 39 | Fnr | fdhF | 1.985483486 | up/down | up |
| 40 | CRP-cAMP | nagE | 1.501987378 | up/down | up |
| 41 | NarP-Phosphorylated | fdhFp | 1.645007039 | up/down | up |
| 42 | NarP-Phosphorylated | fdnG | 2.778075473 | up/down | down |
| 43 | Fnr | dcuS | 1.359810967 | down | up |
| 44 | Fnr | purM | 1.427076396 | up | down |
The p-value (computed from the binomial distribution) is less than 1.0e-50.
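An upper-tail binomial p-value of the kind quoted above can be computed directly by summing binomial terms. The excerpt does not spell out the null hypothesis; a natural reading, assumed here, is that each of the n = 206 interactions exceeds the threshold independently with some background probability p0 (for example, the fraction of random gene/TF pairs scoring above it), and the p-value is the chance of 44 or more successes. The value of p0 is not given in the text, so none is filled in.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With a chosen background rate `p0`, `binomial_sf(44, 206, p0)` gives the corresponding p-value.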