Yann S Dufour, Patricia J Kiley, Timothy J Donohue.
Abstract
The processes underlying the evolution of regulatory networks are unclear. To address this question, we used a comparative genomics approach that takes advantage of the large number of sequenced bacterial genomes to predict conserved and variable members of transcriptional regulatory networks across phylogenetically related organisms. Specifically, we developed a computational method to predict the conserved regulons of transcription factors across alpha-proteobacteria. We focused on the CRP/FNR super-family of transcription factors because it contains several well-characterized members, such as FNR, FixK, and DNR. While FNR, FixK, and DNR are each proposed to regulate different aspects of anaerobic metabolism, they are predicted to recognize very similar DNA target sequences, and they occur in various combinations among individual alpha-proteobacterial species. In this study, the composition of the respective FNR, FixK, or DNR conserved regulons across 87 alpha-proteobacterial species was predicted by comparing the phylogenetic profiles of the regulators with the profiles of putative target genes. The utility of our predictions was evaluated by experimentally characterizing the FnrL regulon (an FNR-type regulator) in the alpha-proteobacterium Rhodobacter sphaeroides. Our results show that this approach correctly predicted many regulon members, provided new insights into the biological functions of the respective regulons for these regulators, and suggested models for the evolution of the corresponding transcriptional networks. Our findings also predict that, at least for the FNR-type regulators, there is a core set of target genes conserved across many species. In addition, the members of the so-called extended regulons for the FNR-type regulators vary even among closely related species, possibly reflecting species-specific adaptation to environmental and other factors.
The comparative genomics approach we developed is readily applicable to other regulatory networks.
Year: 2010 PMID: 20661434 PMCID: PMC2908626 DOI: 10.1371/journal.pgen.1001027
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Major sub-families of the CRP/FNR-type transcription factors in 87 representative α-proteobacteria.
The hierarchical tree representation of the amino acid sequence similarities was constructed by partitioning protein groups using increasing clustering stringency (inflation value, see Materials and Methods). The bold numbers within each box represent the number of individual proteins within each cluster, and the number below represents the number of species possessing at least one of these proteins. The bottom of the tree shows the names of the 8 major sub-families using nomenclature described previously [3]. Minor sub-families could not be classified definitively, so these sub-families are designated by Roman numerals.
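The caption does not name the clustering algorithm, but "inflation value" is the main parameter of Markov clustering (MCL). Assuming MCL was used, the sketch below shows how expansion and inflation partition a protein-similarity graph; higher inflation values generally split groups into smaller clusters. The function names and the toy similarity matrix are hypothetical, and this pure-Python version is for illustration only, not the authors' pipeline.

```python
# Minimal Markov Cluster (MCL) sketch, assuming the "inflation value" in the
# caption refers to MCL's inflation parameter. Pure Python, illustration only.

def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric similarity matrix; higher inflation gives finer clusters."""
    n = len(similarity)
    # Add self-loops, a standard MCL preprocessing step.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                                          # expansion
        inflated = [[v ** inflation for v in row] for row in expanded]   # inflation
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each attractor row collects the columns (proteins) whose flow ends in it.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Hypothetical similarity matrix: proteins 0-2 are similar, as are 3-4.
sim = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(sim, inflation=2.0))  # [[0, 1, 2], [3, 4]]
```

In a real analysis the matrix would presumably hold pairwise sequence-similarity scores for the CRP/FNR-type proteins, and rerunning at increasing inflation values would give the nested partitions that a tree like Figure 1 summarizes.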
Figure 2. Protein binding domains and predicted DNA binding motifs of the CRP/FNR-type transcription regulators.
(A) Logos representing the protein sequence alignments of the predicted helix-turn-helix domains of each of the 8 major sub-families of the CRP/FNR-type transcription regulators. The amino acid residues predicted from the E. coli CRP-DNA crystal structure to make base-specific contacts [23] were mapped onto the sub-families and are indicated with grey boxes. (B) Logos representing the predicted DNA binding motifs associated with each of the 8 major sub-families of the CRP/FNR-type transcription regulators (N.D. indicates that no logo was defined using the criteria described in the text). For both (A) and (B), the heights of the letters represent their conservation (in bits) at a particular position in the multiple sequence alignments, and the numbers on the x-axis represent the relative position in the multiple sequence alignments.
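The letter heights in both panels follow the usual sequence-logo convention: the information content of a column is log2 of the alphabet size minus the Shannon entropy of the observed frequencies, and each letter's height is its frequency times that value. The sketch below assumes this standard calculation without the small-sample correction that logo tools typically add; the toy sites are hypothetical, not data from the study.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column and per-letter heights.

    Uses R = log2(|alphabet|) - H, without a small-sample correction.
    """
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {a: counts[a] / total for a in alphabet if counts[a]}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(len(alphabet)) - entropy
    # Letter height = frequency x total information at that position.
    heights = {a: p * info for a, p in freqs.items()}
    return info, heights


def logo_heights(aligned, alphabet="ACGT"):
    """Return (information, heights) for every position of equal-length aligned sequences."""
    return [column_information([s[i] for s in aligned], alphabet)
            for i in range(len(aligned[0]))]


# Toy DNA alignment (hypothetical sites, not data from the study).
for info, heights in logo_heights(["TTGAT", "TTGAC", "TTGCT", "TAGAT"]):
    print(round(info, 3), {a: round(h, 3) for a, h in heights.items()})
```

For the protein logos in panel (A), passing the 20-letter amino acid alphabet raises the maximum per-position value to log2(20), about 4.32 bits.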
Figure 3. Schematic description of the stepwise prediction of the conserved regulons of FNR, FixK, and DNR.
(A) The first step was to identify orthologous genes across species (G1, G2,…). Second, orthologous genes that also contain the target DNA-binding sequence in their promoter regions were identified (indicated in bold). (B) Third, phylogenetic profiles were constructed for the target genes found in each set of orthologous genes, as well as for the genes encoding the transcription regulators. (C) Fourth, the similarity J(A,B) between each target gene profile and each regulator profile was calculated. (D) Finally, each target gene was assigned to the regulator with which it shared the most similar phylogenetic profile.
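Steps (B) to (D) can be sketched with phylogenetic profiles represented as sets of species. The caption does not define J(A,B); a common choice for binary profiles is the Jaccard index |A ∩ B| / |A ∪ B|, which is assumed here. The species names, profiles, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two phylogenetic profiles (sets of species)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_targets(target_profiles, regulator_profiles):
    """Assign each target ortholog set to the regulator with the most similar profile.

    target_profiles: ortholog-set ID -> species whose ortholog has a promoter match.
    regulator_profiles: regulator name -> species that encode the regulator.
    Ties go to the regulator listed first.
    """
    assignments = {}
    for gene, profile in target_profiles.items():
        scores = {reg: jaccard(profile, species)
                  for reg, species in regulator_profiles.items()}
        best = max(scores, key=scores.get)
        assignments[gene] = (best, scores[best])
    return assignments


# Hypothetical species and profiles, for illustration only.
regulators = {
    "FNR": {"sp1", "sp2", "sp3"},
    "FixK": {"sp3", "sp4", "sp5"},
    "DNR": {"sp5", "sp6"},
}
targets = {"G1": {"sp1", "sp2"}, "G2": {"sp4", "sp5"}, "G3": {"sp6"}}
for gene, (reg, score) in assign_targets(targets, regulators).items():
    print(gene, reg, round(score, 2))
# G1 FNR 0.67
# G2 FixK 0.67
# G3 DNR 0.5
```

Assigning each target to its single best-matching regulator mirrors step (D); a real analysis would also need a minimum similarity below which a target is left unassigned.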
Figure 4. Predicted members of the DNR, FixK, and FNR regulons.
The heatmap indicates whether the promoter region of the corresponding gene contains a significant match to the DNA target sequence shared by FNR, FixK, and DNR for each species (row) and each set of orthologous genes (column). Orange and yellow indicate moderate and strong matches, respectively, to the position-weighted matrix of the DNA target sequence. Black indicates that the species possesses a gene belonging to the corresponding set of orthologs, while grey indicates that it does not. The sets of orthologous genes are identified by arbitrary numbers (Table 2). The presence of DNR (blue), FixK (green), or FNR (red) in each genome is indicated by a bar on the right side of the heatmap. Genes that were experimentally determined to be R. sphaeroides FnrL target genes are indicated by a black box below their labels. Species are organized according to the phylogenetic tree presented in Figure S1.
Gene product annotations of the predicted members of the DNR, FixK, and FNR regulons determined from the comparative genomics analysis of α-proteobacteria.
| ID | Gene product annotation | R. sphaeroides locus | Bound by FnrL |
| DNR regulon | | | |
| 616 | 50S ribosomal protein L4 | RSP1717 | |
| 689 | 50S ribosomal protein L23 | RSP1718 | |
| 1300 | putative universal stress family protein, UspA | RSP3802 | |
| 1696 | heme- and copper-containing membrane protein, NnrS | RSP0328 | |
| 1847 | transcription factor, DNR | | |
| 2800 | peptidase U32 family | RSP0465 | * |
| 2903 | respiratory nitrate reductase alpha-subunit | | |
| 2966 | putative lipid carrier protein | RSP0466 | * |
| 3120 | respiratory nitrate reductase beta-subunit | | |
| 4023 | respiratory nitrate reductase delta-subunit | | |
| 4488 | putative nitrite transporter | | |
| FixK regulon | | | |
| 114 | response regulator receiver protein, FixJ | RSP0907 | |
| 321 | putative ABC transporter ATP binding protein | RSP1628 | |
| 329 | putative ABC transporter permease protein | RSP2459 | |
| 332 | PAS/PAC sensor signal transduction histidine kinase, FixL | RSP0909 | |
| 1387 | putative ABC transporter periplasmic substrate binding protein | RSP2811 | |
| 1551 | heat shock protein Hsp20 | | |
| 1862 | putative HlyD family secretion protein | RSP3160 | |
| 2072 | putative ABC transporter permease protein | RSP3157 | |
| 2238 | putative phosphoketolase | | |
| 2256 | putative signal transduction protein with CBS domains | | |
| 2518 | putative ABC transporter subunit | RSP3159 | |
| 2551 | cytochrome c class I | | |
| 2589 | putative kinase | RSP0470 | |
| 2810 | hypothetical protein | | |
| 2855 | putative xanthine and cobalt dehydrogenase maturation factor | RSP1934 | |
| 3100 | putative zinc-binding alcohol dehydrogenase | | |
| 3532 | hypothetical protein | | |
| 5478 | hypothetical protein | | |
| FNR regulon | | | |
| 28 | putative heavy metal translocating P-type ATPase | RSP0690 | * |
| 74 | glutamyl-tRNA reductase | RSP2984 | * |
| 125 | cytochrome c oxidase subunit I | RSP1877 | * |
| 129 | putative universal stress protein, UspA | RSP0697 | * |
| 219 | cytochrome c oxidase subunit II | RSP1826 | * |
| 301 | transcriptional regulator, FNR/FixK | RSP0698 | * |
| 555 | putative dimethyladenosine transferase | RSP2905 | |
| 1083 | putative peptidase U62 modulator of DNA gyrase | RSP1825 | |
| 1230 | oxygen-independent coproporphyrinogen III oxidase | RSP0699 | * |
| 1264 | iron-sulfur binding protein RdxA/RdxB/FixG family | RSP0692 | * |
| 1289 | cytochrome c oxidase cbb3-type subunit III | RSP0693 | * |
| 1331 | cytochrome c oxidase cbb3-type subunit I | RSP0696 | * |
| 1348 | cytochrome c oxidase cbb3-type subunit III | RSP0695 | * |
| 1758 | trans-membrane cation transporter, FixH family | RSP0691 | * |
| 1774 | putative outer membrane protein, OmpW | RSP2507 | * |
| 1915 | cytochrome oxidase maturation protein cbb3-type | RSP0689 | * |
| 1987 | putative protoporphyrin monomethyl-ester oxidative cyclase | RSP0281 | * |
| 2282 | cytochrome c oxidase cbb3 type subunit IV | RSP0694 | * |
| 2905 | hypothetical protein | | |
| 3768 | putative DnaK suppressor protein | RSP0166 | * |
*Genes for which promoter regions have been shown to be bound by FnrL in R. sphaeroides in this study.
Arbitrary ID numbers given to the sets of orthologous genes determined across the 87 α-proteobacteria.
Functional annotation resulting from the consensus of all the annotations of the genes constituting each set of orthologs.
Locus ID of R. sphaeroides genes if one is present in the sets of orthologs.
Putative FnrL binding sites detected by ChIP–chip analysis or by bioinformatic analysis of the R. sphaeroides genome sequence.
| Chr | FnrL ChIP-chip peak begin | FnrL ChIP-chip peak end | σ70 peak | FnrL site begin | Score | FnrL target loci | Regulation |
| Chr 1 | 408824 | 409553 | + | 409223 | 2156.75 | RSP1819-7 | + |
| Chr 1 | 417320 | 418709 | + | 417975 | 2501.25 | RSP1826-9 | − |
| Chr 1 | 476545 | 477386 | − | 477119 | 2554.25 | RSP1877-6 | − |
| Chr 1 | 792149 | 793028 | + | 792528 | 2249.50 | | |
| Chr 1 | 862277 | 863397 | + | 862812 | 2276.00 | RSP2247 | + |
| Chr 1 | 963978 | 964891 | + | 964492 | 1693.00 | RSP2337 | + |
| Chr 1 | 1022044 | 1022949 | + | 1022541 | 2368.75 | RSP2395 | |
| Chr 1 | 1152077 | 1152912 | + | 1152640 | 2249.50 | RSP2507 | + |
| Chr 1 | 1217112 | 1218401 | + | 1217769 | 2024.25 | RSP2573 | + |
| Chr 1 | 1675600 | 1676545 | + | 1676046 | 2143.50 | RSP2984 | + |
| Chr 1 | 1679670 | 1680230 | − | 1680004 | −996.75 | | |
| Chr 1 | 1811885 | 1812675 | + | 1812207 | 1971.25 | RSP0100-12 | + |
| Chr 1 | 1881897 | 1882994 | + | 1882413 | 2249.50 | RSP0166 | + |
| Chr 1 | 2007383 | 2008346 | + | 2007816 | 2117.00 | RSP0281-76 | + |
| Chr 1 | 2046834 | 2047877 | + | 2047244 | 262.00 | RSP0317 | + |
| Chr 1 | 2193245 | 2193765 | − | 2193494 | 2382.00 | | |
| Chr 1 | 2201048 | 2202201 | + | 2201632 | 2342.25 | RSP0466-4 | + |
| | | | | | | RSP0467-8 | + |
| Chr 1 | 2206264 | 2207340 | + | 2206759 | 1613.50 | RSP6116 | ? |
| Chr 1 | 2439761 | 2441022 | + | 2440385 | 2196.50 | RSP0696-3 | + |
| | | | | 2440417 | 2196.50 | RSP0697 | + |
| Chr 1 | 2441409 | 2442838 | + | 2442182 | 2501.25 | RSP0698 | − |
| | | | | | | RSP0699 | + |
| Chr 1 | 2518226 | 2519045 | + | 2518696 | 1746.00 | RSP0775 | + |
| Chr 1 | 2565774 | 2566660 | + | 2566094 | 2382.00 | RSP0820 | + |
| Chr 1 | 3026608 | 3027564 | + | 3027028 | 1679.75 | RSP1257-4 | + |
| Chr 2 | 77569 | 78562 | − | 78069 | 1812.25 | RSP3044 | + |
| P002 | 22017 | 26368 | + | 24255 | 977.50 | | |
| P004 | 1074 | 2014 | − | 1088 | −877.50 | | |
| P004 | 51099 | 52291 | + | 51739 | 2050.75 | RSP4201-4 | + |
| Putative sites identified by bioinformatic analysis only (no ChIP–chip peak) | | | | | | | |
| Chr 1 | | | | 403983 | 1891.75 | | |
| Chr 1 | | | | 635560 | 1679.75 | | |
| Chr 1 | | | | 659055 | 1653.25 | | |
| Chr 1 | | | | 1185086 | 16930 | | |
| Chr 1 | | | | 1842368 | 1732.75 | | |
| Chr 1 | | | | 2104074 | 1640.00 | | |
| Chr 1 | | | | 2436687 | 1666.50 | RSP0692-89 | + |
| Chr 1 | | | | 2760402 | 1772.50 | | |
| Chr 2 | | | | 403914 | 1640.00 | RSP3341 | + |
| Chr 2 | | | | 748494 | 1640.00 | RSP3640-3 | + |
Chr: chromosome or plasmid.
FnrL ChIP-chip peak coordinates: genomic coordinates of regions of the genome that were significantly enriched by chromatin immuno-precipitation using antibodies against FnrL.
σ70 peak: indicates whether the genomic region bound by FnrL overlaps a region bound by σ70, as determined by chromatin immuno-precipitation using antibodies against σ70.
Putative FnrL binding sequence: genomic coordinates, scores (log-likelihood ratio), and sequences of putative FnrL binding sites identified using the position-weighted matrix constructed from the conserved DNA target sequence of the FNR-type proteins across α-proteobacteria.
FnrL target genes: locus numbers and annotations of the FnrL target genes. The signs indicate whether transcription of the target operons is increased (+) or decreased (−) by FnrL binding.
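Scoring a candidate site against a position-weighted matrix can be sketched as a sum of per-position log-odds, log2(p_site / p_background). The uniform background, the pseudocount of 0.5, the toy training sites, and all function names below are assumptions for illustration; the sketch does not reproduce the scale of the scores in the table above.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_log_odds(sites, background=None, pseudocount=0.5):
    """Position-weight matrix of log2(p_site / p_background) from aligned sites."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # assumed uniform background
    matrix = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background[b]) for b in ALPHABET})
    return matrix


def score(word, matrix):
    """Log-likelihood ratio score of one word (uppercase ACGT only)."""
    return sum(column[base] for column, base in zip(matrix, word))


def scan(sequence, matrix, threshold):
    """Return (start, strand, score) for every window scoring >= threshold on either strand."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        # Score the window and its reverse complement.
        for strand, candidate in (("+", word), ("-", word.translate(COMPLEMENT)[::-1])):
            s = score(candidate, matrix)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


# Toy training sites and promoter (hypothetical, not the study's matrix).
pwm = build_log_odds(["TTGA", "TTGA", "TTGC"])
print(scan("CCTTGACC", pwm, threshold=3.0))
```

Here only the forward-strand window starting at position 2 passes the cutoff. Because FNR-type sites are roughly palindromic, scanning both strands as shown mainly guards against asymmetric matrices.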
Figure 5. Identification of FnrL binding sites in the R. sphaeroides genome by ChIP–chip assays.
Profiles of DNA fragments enriched by immuno-precipitation of the β′ (blue) or σ70 (red) subunit of RNA polymerase or of FnrL (green) are plotted along the genomic coordinates of a representative region of the R. sphaeroides genome. The data plot the log2 of the ratio of the immunoprecipitated sample to the control sample as a function of probe location along the genome (coordinates in base pairs). Also shown are DNA regions significantly enriched (p-value ≤ 0.01) by FnrL immuno-precipitation (green boxes), positions of sequences matching the FnrL consensus binding site (green tick marks), and the coordinates of annotated genes (black boxes). The data were plotted using SignalMap 1.9 (NimbleGen Systems).
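The plotted quantity is a per-probe log2(IP / control) ratio. The sketch below computes that ratio and merges consecutive high-ratio probes into candidate bound regions. The fixed cutoff, the minimum run length, the pseudocount, and the probe values are all assumptions; they stand in for, and are not, the statistical test (p ≤ 0.01) used to call the green regions.

```python
import math


def log2_ratios(ip, control, pseudocount=1.0):
    """Per-probe log2(IP / control); the pseudocount avoids division by zero."""
    return [math.log2((i + pseudocount) / (c + pseudocount)) for i, c in zip(ip, control)]


def enriched_regions(positions, ratios, threshold=1.0, min_probes=2):
    """Merge runs of consecutive probes with log2 ratio >= threshold into (begin, end) regions.

    positions must be sorted probe coordinates (bp). A fixed cutoff is a
    simplification of the significance test described in the caption.
    """
    regions, run = [], []
    for pos, r in zip(positions, ratios):
        if r >= threshold:
            run.append(pos)
            continue
        if len(run) >= min_probes:
            regions.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical probe signals for illustration.
positions = [100, 200, 300, 400, 500, 600]
ip = [10, 80, 90, 85, 12, 11]
control = [10, 10, 10, 10, 10, 10]
print(enriched_regions(positions, log2_ratios(ip, control)))  # [(200, 400)]
```

Requiring at least two adjacent probes is a simple way to suppress isolated noisy probes; in practice the cutoff would be derived from the distribution of ratios rather than fixed.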
Figure 6. Transcription profile heatmap of members of the FnrL regulon across conditions with varying oxygen tension.
The colors represent the mRNA abundance of each locus relative to its mean expression level (yellow = high expression, red = low expression). Genes are identified by their locus IDs and gene names. Vertical lines next to the locus IDs denote predicted transcription units. Asterisks denote transcription units that had no FnrL ChIP–chip peak detected within their promoter regions but had a sequence matching the FnrL binding site consensus. The amounts of oxygen or light in the experimental conditions are indicated below the plot (Photo10 and Photo100 represent illumination of the cultures at 10 W/m² and 100 W/m², respectively). Genes were grouped according to their expression profiles. Group A contains genes whose expression levels negatively correlate with oxygen tension. Group B contains genes whose expression levels also negatively correlate with oxygen tension, except that these genes have relatively low expression under low-light conditions (Photo10). Group C contains genes whose expression levels positively correlate with oxygen tension.
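The grouping by correlation with oxygen tension can be sketched as follows: center each locus's log2 expression on its own mean (the heatmap scale) and compute a Pearson correlation with oxygen tension. The oxygen values, loci, the 0.5 cutoff, and the labels are hypothetical; the sketch covers only the A-like and C-like trends and does not capture the light dependence that defines group B.

```python
import math


def center_log2(values):
    """log2 expression of one locus relative to its mean log2 level (heatmap scale)."""
    logs = [math.log2(v) for v in values]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def classify(expression, oxygen, cutoff=0.5):
    """Label loci whose expression falls (A-like) or rises (C-like) with oxygen tension."""
    labels = {}
    for locus, values in expression.items():
        # Centering does not change the correlation; it mirrors the heatmap colors.
        r = pearson(center_log2(values), oxygen)
        if r <= -cutoff:
            labels[locus] = "negative (A-like)"
        elif r >= cutoff:
            labels[locus] = "positive (C-like)"
        else:
            labels[locus] = "unclassified"
    return labels


# Hypothetical oxygen tensions (%) and expression values for illustration.
oxygen = [30, 8, 1, 0]
expression = {"geneX": [1, 4, 16, 32], "geneY": [32, 16, 4, 1], "geneZ": [4, 4, 4, 4]}
print(classify(expression, oxygen))
```

A locus with flat expression has zero variance, so the sketch returns a correlation of 0.0 and leaves it unclassified rather than dividing by zero.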
Figure 7. The predicted conservation of the FnrL regulon determined in R. sphaeroides across α-proteobacteria.
Orange and yellow indicate moderate and strong matches, respectively, to the position-weighted matrix of the DNA target sequence. Black indicates that the species possesses a gene belonging to the corresponding set of orthologs, while grey indicates that it does not. Sets of orthologous genes are labeled with arbitrary numbers. The core FNR regulon, as determined in Figure 4, and the extended FnrL regulon, determined in R. sphaeroides, are indicated by arrows below the ortholog-set labels. Species are organized according to the phylogenetic tree presented in Figure S1.