Jiajian Liu, Xing Xu, Gary D Stormo.
Abstract
Although hundreds of microbial genomes have been sequenced, defining their cis-regulatory maps remains a challenge. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary ability to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach to five Shewanella genomes, which allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data.
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.
Year: 2008 PMID: 18701645 PMCID: PMC2532739 DOI: 10.1093/nar/gkn515
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flow chart of the procedure for identifying conserved cis-regulatory motifs in S. oneidensis by comparative analysis.
Figure 2. Conservation of the known E. coli transcriptional regulatory interactions in S. oneidensis. The logos of the E. coli TFBS were drawn based on the motif weight matrix models obtained from RegulonDB (Version 5.0). (a) The number of binding sites used to build the motif profiles. Multiple binding sites may be from the same gene. (b) Only the first genes of the target operons of a TF were considered when we counted the number of conserved target genes. A few target genes in RegulonDB were not included due to discrepancies in gene names.
Figure 3. Estimation of neutral distances between five Shewanella species. The numbers shown are neutral substitution rates (Ks) measured by synonymous substitutions in coding sequences.
Table 1. Numbers of motifs identified at each step of the comparative analysis for Datasets I and II
| | Dataset I | Dataset II |
|---|---|---|
| The anchor genome(s) | | |
| Other genomes | At least one other | At least two other |
| Number of sets of orthologous promoters | 862 | 1961 |
| Step I: Phylogenetic footprinting | 11247 | 22525 |
| Step II: PhyloNet searching | 203 | 1665 |
| Step III: Motif hierarchical clustering | 38 | 183 (824) |
(a) Number of nonredundant motifs ultimately identified (the value outside parentheses).
(b) Number of TUs covered by the predicted motifs (the value in parentheses).
Figure 4. Conservation of the 183 motifs identified from Dataset II in the distantly related Shewanella species. The black bars represent the percentage of motifs identified from the five Shewanella genomes that are also conserved in the seven newly sequenced Shewanella species. The white bars represent the results from the control sequence sets.
Figure 5. Venn diagrams showing the numbers of predicted motifs supported by different types of evidence, including matching to known TF-binding motifs, significant functional enrichment of the target genes in GO or KEGG pathway terms (GO/KG) and EC of the target genes in microarray experiments (EC). (A) Motifs identified from Dataset I (see Table 2 for detailed information). (B) Motifs identified from Dataset II (see Table 3 for detailed information).
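The region counts of a three-set Venn diagram of this kind follow from simple set operations. The sketch below is an illustration only, not the authors' procedure. The motif numbers are transcribed by hand from the Dataset I motif list (Table 2), reading a named regulator as a known-TF match, "Yes" as expression coherence, and a listed GO or KEGG term as functional enrichment. That reading, the function name `venn_regions`, and the variable names are assumptions, and the resulting counts have not been checked against the published figure.

```python
# Evidence sets for the 17 Dataset I motifs, keyed by motif number.
# Hand-transcribed from the motif list; treat as an assumed reading.
known_tf = {1, 2, 3, 5, 8, 9, 10, 12, 15, 17}
expr_coherence = {1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16}
go_kegg = {1, 2, 4, 5, 6, 7}


def venn_regions(sets):
    """Map each exclusive Venn region (a tuple of set names) to its members."""
    names = list(sets)
    universe = set().union(*sets.values())
    regions = {}
    for item in universe:
        # The region of an item is the exact combination of sets containing it.
        key = tuple(name for name in names if item in sets[name])
        regions.setdefault(key, set()).add(item)
    return regions


regions = venn_regions({"TF": known_tf, "EC": expr_coherence, "GO/KG": go_kegg})

# Print regions from the most-shared to the least-shared.
for key, members in sorted(regions.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" & ".join(key), len(members), sorted(members))
```

Because every motif lands in exactly one region, the region sizes sum to the number of motifs in the list, which gives a quick consistency check on the transcription.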
Table 2. The list of predicted motifs from Dataset I that have at least one type of supporting evidence
| Motif Number | Motif consensus sequence | Known TF | Expression coherence of target genes | Biological functions in GO (G) or KEGG pathway (K) terms for which the target genes are enriched |
|---|---|---|---|---|
| 1 | aCTGTwtaTAtawACAGt | LexA | Yes | DNA repair (G) |
| 2 | ACgTcTAgAcGTCtA | MetJ | Yes | Methionine biosynthesis (G) |
| 3 | tTGATctagATCAa | FNR | Yes | |
| 4 | yAaarNGCGCGCNyttTr | | Yes | Structural constituent of ribosome (G), ribosome (G, K), protein biosynthesis (G) |
| 5 | ktAaAATkNcGCgNmATTtTam | CadC | Yes | Protein biosynthesis (G), structural constituent of ribosome (G), ribosome (G, K) |
| 6 | AAtTtAAACgNNcGTTTaAaTT | | Yes | Lipid catabolism (G, K) |
| 7 | TGTTGTaATATtACAACA | | Yes | Pentose phosphate pathway (K) |
| 8 | aNTGaATtWWaATtCANt | ArgR | | |
| 9 | tGGTcWgACCa | FadR | | |
| 10 | tGCACcatwatgGTGCa | NtrC | | |
| 11 | WAaaAWycgCGcgrWTttTW | | Yes | |
| 12 | CACmAkATmTkGTG | YfeT | Yes | |
| 13 | aatgCgScGcatt | | Yes | |
| 14 | AtWTTgyatrcAAWaT | | Yes | |
| 15 | GCGtAtWaTaCGC | Nlp | Yes | |
| 16 | AAARggcgccwWwggcgccYTTT | | Yes | |
| 17 | aTtGGTaWtACCaAt | PdhR | | |
(a) Motifs that have been computationally associated with TFs by Tan et al. (41).
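The consensus sequences in these lists use IUPAC nucleotide codes (for example, W for A or T and N for any base). As a minimal sketch, and not the authors' method, the Python below expands a consensus into a regular expression, scans a sequence for matches, and tests whether a consensus equals its own reverse complement, which is one strict way to read "palindromic". The function names and the toy sequence are hypothetical, and letter case in the consensus is ignored here.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (for example, R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus, returned in upper case."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(consensus):
    """True if the consensus reads the same on both strands (strict test)."""
    return consensus.upper() == reverse_complement(consensus)


def find_sites(consensus, sequence):
    """Start positions (0-based) of all matches, including overlapping ones."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets the scan report overlapping matches.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


if __name__ == "__main__":
    fadr = "tGGTcWgACCa"           # motif 9 in the Dataset I list
    toy = "AAATGGTCAGACCATTT"      # hypothetical sequence, not genomic data
    print(is_palindromic(fadr))
    print(find_sites(fadr, toy))
```

Exact regular-expression matching is a deliberate simplification. A weight-matrix score with a threshold would tolerate mismatches at weakly conserved positions, which a consensus string cannot express.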
Table 3. The list of predicted motifs from Dataset II that have at least one type of supporting evidence
| Motif Number | Motif consensus sequence | Known TF | Expression coherence of target genes | Biological functions in GO (G) or KEGG pathway (K) terms for which the target genes are enriched |
|---|---|---|---|---|
| 1 | ACTGTaTatawatACAGT | LexA | Yes | DNA repair (G) |
| 2 | TaGACGTCTAgA | MetJ | Yes | Methionine biosynthesis (G, K) |
| 3 | tTGATctagATCAa | FNR | Yes | Oxidoreductase activity, acting on NADH or NADPH, quinone or similar compound as acceptor (G), thiamine metabolism (K), other energy metabolism (K) |
| 4 | CCGtWaCGG | CueR | | Acetolactate synthase activity (G), butanoate metabolism (K) |
| 5 | tGCACcawwwtgGTGCa | NtrC | | Starch and sucrose metabolism (K) |
| 6 | aATGatAAtNaTTatCATt | Fur | Yes | |
| 7 | tGGTCWGACCa | FadR | Yes | |
| 8 | aGCYWRGCt | | Yes | Structural constituent of ribosome (G), ribosome (G, K), rRNA binding (G), protein biosynthesis (G) |
| 9 | RGCANNwWwNNTGCY | | Yes | Motor activity (G), ciliary or flagellar motility (G, K), flagellum (K, G) |
| 10 | tAaAATKNcGCgNMATTtTa | | Yes | Protein biosynthesis (G), structural constituent of ribosome (G), ribosome (K), rRNA binding (G) |
| 11 | tCGCGa | | Yes | Oxidoreductase activity, acting on NADH or NADPH, quinone or similar compound as acceptor (G), other energy metabolism (K) |
| 12 | GCGSCGC | | Yes | Structural constituent of ribosome (G), ribosome (G, K), protein biosynthesis (G) |
| 13 | AgcGAcKRYMgTCgcT | | Yes | Cytochrome complex assembly (G), heme transporter activity (G), nitrogen metabolism (K) |
| 14 | GTAATWWWATTAC | | Yes | Glucose metabolism (G), main pathways of carbohydrate metabolism (G), pentose phosphate pathway (K) |
| 15 | tTtAAACaNNtGTTT | | Yes | Lipid catabolism (G, K) |
| 16 | RGACAaWtTGTCY | | Yes | Unlocalized protein complex (G), ferredoxin hydrogenase activity (G) |
| 17 | GTWATATWAC | | Yes | Pentose phosphate pathway (K) |
| 18 | TGTAaaNNWWNNttTACA | TyrR | | |
| 19 | TGaCANNatNNTGtCA | PhoB | | |
| 20 | CGTrATyACG | | | Oxidoreductase activity, acting on NADH or NADPH, quinone or similar compound as acceptor (G), other energy metabolism (K) |
| 21 | CTSSAG | | | Hydrogen-transporting ATPase, ATP synthase activity, rotational mechanism (G, K), proton-transporting two-sector ATPase complex (G) |
| 22 | GCGyATWATrCGC | | | Oxidoreductase activity, acting on NADH or NADPH, quinone or similar compound as acceptor (G), other energy metabolism (K) |
| 23 | KaATATATtM | | | Cytochrome complex assembly (G), heme transporter activity (G) |
| 24 | GATcagGTTaA | | | Thiamin biosynthesis (G, K) |
| 25 | ccTGATCAgg | | | Thiamin biosynthesis (G, K) |
| 26 | CWCcSSgGWG | | | Bacterial chemotaxis (K) |
| 27 | GNGcMCAYKWTAWMRTGKgCNC | | | Cell division (K) |
| 28 | TGGCgaAtaTtcGCCA | | | Type II secretion system (K) |
| 29 | GACATAWTATGTC | | Yes | |
| 30 | TATGSCATA | | Yes | |
| 31 | AcTTTACGTtaACGTAAAgT | | Yes | |
| 32 | aAAAagSssscNwNgssscctTTTt | | Yes | |
| 33 | gGCsATsGCc | | Yes | |
| 34 | aACNCSGNGTt | | Yes | |
| 35 | CNatRMTSAKYatNG | | Yes | |
| 36 | CCASTGG | | Yes | |
| 37 | AAAGTSACTTT | | Yes | |
| 38 | CMCCTWAGGKG | | Yes | |
| 39 | tAccYgAgTaaWttAcTcRggTa | | Yes | |
| 40 | ARAGYTARCTYT | | Yes | |
| 41 | CAaTWAtTG | | Yes | |
| 42 | RcATSATgY | | Yes | |
| 43 | aAaAttkGcgCmaaTtTt | | Yes | |
| 44 | CaAGGCCTtG | | Yes | |
| 45 | AAATctGCwGCagATTT | | Yes | |
| 46 | aacAtAAAGYNNRCTTTaTgtt | | Yes | |
| 47 | TAAYRTTA | | Yes | |
| 48 | TTtaTAcCTAGgTAtaAA | | Yes | |
| 49 | gCCaAcAmaGSCtkTgTtGGc | | Yes | |
| 50 | cgtWTGtTATAaCAWacg | | Yes | |
| 51 | cTTCGAAg | | Yes | |
| 52 | TTTTRYAAAA | | Yes | |
| 53 | WTGTSACAW | | Yes | |
| 54 | GcMcTATATAgKgC | | Yes | |
| 55 | gNcaaTGcTAgCAttgNc | | Yes | |
| 56 | AaaAARCGYTTttT | | Yes | |
| 57 | GGNAaAWTtTNCC | | Yes | |
| 58 | AtgGCSGCcaT | | Yes | |
| 59 | SCaTCawtGAtGS | | Yes | |
Table 4. The list of predicted motifs potentially involved in metal reduction that have at least one type of supporting evidence
| Motif Number | Motif consensus sequence | Known TF | Expression coherence of target genes | Biological functions in GO (G) or KEGG pathway (K) terms for which the target genes are enriched |
|---|---|---|---|---|
| 1 | TaGACGTCTAgA | MetJ | Yes | |
| 2 | tGGTcWgACCa | FadR | Yes | |
| 3 | AGCcTAgGCT | | Yes | Structural constituent of ribosome (G), ribosome (G, K), protein biosynthesis (G), rRNA binding (G) |
| 4 | RGCANNwWwNNTGCY | | Yes | Flagellum, flagellar assembly (G, K), bacterial motility proteins (K) |
| 5 | cGTCaAaWWtTtGACg | | Yes | Ciliary or flagellar motility (G), flagellum, flagellar assembly (G, K), bacterial motility proteins (K) |
| 6 | AgcGActRYagTCgcT | | Yes | Heme-transporting ATPase activity (G), cytochrome complex assembly (G), nitrogen metabolism (K) |
| 7 | YAAATgaNAacsgtTNtcATTTR | Fur | | |
| 8 | TGATctagATCA | FNR | | |
| 9 | gggRggCTWAGccYccc | | | Structural constituent of ribosome (G), ribosome (G, K), protein biosynthesis (G), rRNA binding (G) |
| 10 | CcatTGSCAatgG | | Yes | Copper ion binding (G), heme binding (G) |
| 11 | WAaNcGCGCgNtTW | | Yes | |
| 12 | AaAGtTAaCTtT | | Yes | |
| 13 | TGtAAcANyrNTgTTaCA | | Yes | |
| 14 | aacAtAAAGYNNRCTTTaTgtt | | Yes | |
| 15 | TTtAAACaNNtGTTTaAA | | Yes | |
| 16 | aaAAarggNgcctNNgssscctTTTt | | | Signal transduction mechanisms (K) |