Kevin S Myers, Daniel R Noguera, Timothy J Donohue.
Abstract
Much of our knowledge of bacterial transcription initiation has been derived from studying the promoters of Escherichia coli and Bacillus subtilis. Given the expansive diversity across the bacterial phylogeny, it is unclear how much of this knowledge can be applied to other organisms. Here, we report on bioinformatic analyses of promoter sequences of the primary σ factor (σ70) by leveraging publicly available transcription start site (TSS) sequencing data sets for nine bacterial species spanning five phyla. This analysis identifies previously unreported differences in the -35 and -10 elements of σ70-dependent promoters in several groups of bacteria. We found that Actinobacteria and Betaproteobacteria σ70-dependent promoters lack the TTG triad in their -35 element, which is predicted to be conserved across the bacterial phyla. In addition, the majority of the Alphaproteobacteria σ70-dependent promoters analyzed lacked the thymine at position -7 that is highly conserved in other phyla. Bioinformatic examination of the Alphaproteobacteria σ70-dependent promoters identifies a significant overrepresentation of essential genes and ones encoding proteins with common cellular functions downstream of promoters containing an A, C, or G at position -7. We propose that transcription of many σ70-dependent promoters in Alphaproteobacteria depends on the transcription factor CarD, which is an essential protein in several members of this phylum. Our analysis expands the knowledge of promoter architecture across the bacterial phylogeny and provides new information that can be used to engineer bacteria for use in medical, environmental, agricultural, and biotechnological processes.
IMPORTANCE Transcription of DNA to RNA by RNA polymerase is essential for cells to grow, develop, and respond to stress. Understanding the process and control of transcription is important for health, disease, the environment, and biotechnology. Decades of research on a few bacteria have identified promoter DNA sequences that are recognized by the σ subunit of RNA polymerase. We used bioinformatic analyses to reveal previously unreported differences in promoter DNA sequences across the bacterial phylogeny. We found that many Actinobacteria and Betaproteobacteria promoters lack a sequence in their -35 DNA recognition element that was previously assumed to be conserved and that Alphaproteobacteria lack a thymine residue at position -7, also previously assumed to be conserved. Our work reports important new information about bacterial transcription, illustrates the benefits of studying bacteria across the phylogenetic tree, and proposes new lines of future investigation.
Keywords: bioinformatics; motif prediction; promoters; transcription
Year: 2021 PMID: 34254822 PMCID: PMC8407463 DOI: 10.1128/mSystems.00526-21
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
Summary of TSS data analyzed and annotated σ factors for organisms studied
| Species | Phylum or class | No. of TSS analyzed | No. of TSS conditions | No. of annotated σ factors | Genome accession no. |
|---|---|---|---|---|---|
| | | 2,139 | 3 | 26 | |
| | | 3,570 | 44 | 63 | |
| | | 2,726 | 8 | 16 | |
| | | 2,301 | 2 | 12 | |
| | | 3,015 | 2 | 17 | |
| | | 3,940 | 3 | 5 | |
| | | 6,598 | 1 | 20 | |
| | | 5,601 | 1 | 18 | |
| | | 2,702 | 3 | 7 | |
FIG 1. Sequences of MEME-predicted σ70-dependent −35 and −10 promoter elements for individual bacterial species. Indicated are the organism’s name, the taxonomic group it belongs to, and the most likely sequences of −35 and −10 elements predicted by the MEME motif finder upstream of the published TSS-seq data for this organism. The last column on the right indicates the percentage of the predicted σ70-dependent −10 promoter elements that contain a thymine at position −7 (−7T) relative to the published TSS.
FIG 2. Sequences of Delila-PY-predicted σ70-dependent −35 and −10 promoter elements for individual bacterial species. Indicated are the organism’s name, the taxonomic group it belongs to, and the most likely sequences of −35 and −10 elements predicted by Delila-PY upstream of the published TSS-seq data for this organism. The last column on the right indicates the percentage of the predicted σ70-dependent −10 promoter elements that contain a thymine at position −7 (−7T) relative to the published TSS.
FIG 3. Distribution of −7A (green), −7C (blue), −7G (yellow), and −7T (red) bases within −10 elements upstream of all TSSs as identified by Delila-PY (A), upstream of genes with at least one homolog in the Database of Essential Genes (DEG) (B), and upstream of genes identified as essential using transposon insertion data sets (C) for the bacterial species indicated. The average distribution across all Alphaproteobacteria is indicated as “All Alphaproteobacteria.” The number of TSSs identified for each data set is listed below each group of bars.
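The −7 base percentages summarized in FIG 1 to FIG 3 can be illustrated with a minimal Python sketch. It assumes the predicted −10 elements are available as 6-nucleotide strings whose last base corresponds to position −7 (as in the canonical TATAAT hexamer spanning −12 to −7); the function name `minus7_distribution` and the toy input are hypothetical and are not taken from the study.

```python
from collections import Counter


def minus7_distribution(minus10_elements):
    """Return the percentage of A, C, G, and T at the last base of each hexamer.

    Assumption: each element is a 6-nt predicted -10 hexamer whose final base
    is the one conventionally numbered -7.
    """
    counts = Counter()
    for element in minus10_elements:
        element = element.upper()
        # Skip anything that is not a clean 6-nt DNA string.
        if len(element) != 6 or set(element) - set("ACGT"):
            continue
        counts[element[-1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy hexamers, made up for illustration only (not data from the paper).
toy_elements = ["TATAAT", "TATAAT", "TAGAAC", "CATAAG"]
toy_distribution = minus7_distribution(toy_elements)
```

Running the same function separately on the hexamers upstream of all TSSs and on those upstream of essential genes would give the kind of per-group comparison that the figure panels describe.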
FIG 4. Functional enrichment of genes downstream of predicted −7T σ70-dependent promoters (left) or −7A/C/G σ70-dependent promoters (right). Colors indicate percentage of all enriched genes within each species present in each cluster (rows) in each organism (columns) using the code shown at the lower left. Darker purple indicates that more enriched genes were present within that individual category. Gray boxes indicate gene sets which show no functional enrichment (NF) for a specific group in the indicated bacterial species.
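The text above does not state which statistical test underlies the reported enrichment. As an illustration only, one common way to ask whether a functional category is overrepresented among genes downstream of a promoter class is a one-sided hypergeometric test. The sketch below assumes that approach; the function name `hypergeom_enrichment_p` and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric random variable X.

    population: total number of genes considered
    successes:  genes in the population annotated with the category
    draws:      genes downstream of the promoter class of interest
    observed:   genes in that set annotated with the category
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the upper tail of the distribution; comb() returns 0 when k > n.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Made-up numbers for illustration: 10 genes, 5 in the category,
# 5 genes in the promoter class, all 5 of them in the category.
p_all_five = hypergeom_enrichment_p(10, 5, 5, 5)
p_at_least_zero = hypergeom_enrichment_p(10, 5, 5, 0)
```

In practice, p-values computed this way across many categories would also need a multiple-testing correction, which this sketch leaves out.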
Functional enrichment of genes downstream of predicted −7T σ70-dependent promoters
| Functional group | Species | Subgroup | No. of genes |
|---|---|---|---|
| Cell Cycle |
| Cell Cycle (KEGG Brite ccs04112) | 12 |
| Cell Wall / Cell Membrane |
| Peptidoglycan Metabolic Process (GO:0000270) | 4 |
|
| Lipid Biosynthesis Proteins (KEGG Brite nar01004) | 2 | |
|
| Peptidoglycan Biosynthetic Process (GO:0009252) | 10 | |
| DNA Organization |
| Chromosome and Associated Proteins (KEGG Brite nar03036) | 4 |
| DNA Repair |
| DNA Repair and Recombination Proteins (KEGG Brite nar03400) | 5 |
| DNA Replication |
| DNA Replication Proteins (KEGG Brite nar03032) | 3 |
| Phospho-Group Transferase |
| Transferase Activity Transferring Phosphorus-Containing Groups (GO:0016772) | 10 |
| Protein Degradation |
| Peptidases and Inhibitors (KEGG Brite nar01002) | 4 |
| Protein Catabolic Process (GO:0030163) | 3 | ||
|
| Peptidases and Inhibitors (KEGG Brite rsp01002) | 14 | |
| Protein Catabolic Process (GO:0030163) | 4 | ||
| Protein Folding |
| Chaperones and Folding Catalysts (KEGG Brite nar03110) | 5 |
| Transcription |
| Transcription Factors (KEGG Brite nar03000) | 5 |
| Two-Component System (KEGG Brite nar02022) | 2 | ||
| Transcription Machinery (KEGG Brite nar03021) | 2 | ||
| Translation |
| Mitochondrial Biogenesis (KEGG Brite nar03029) | 4 |
| Exosome (KEGG Brite nar04147) | 3 | ||
| Ribosome Biogenesis (KEGG Brite nar03009) | 3 | ||
| Transport |
| Transporters (KEGG Brite nar02000) | 8 |
FIG 5. Average transcript abundance values for selected functional groups in C. crescentus (A) and R. sphaeroides (B). (A) Average transcript abundance values for genes downstream of predicted −7T σ70-dependent promoters involved in the cell cycle (blue) and the same number of randomly selected genes downstream of predicted −7T σ70-dependent promoters (orange) in C. crescentus during the cell cycle. (B) Average transcript abundance values for genes downstream of predicted −7A/C/G σ70-dependent promoters involved in photosynthesis (blue) and translation (orange) in R. sphaeroides as a function of time after exposing an anaerobic culture to oxygen.
Functional enrichment of genes downstream of predicted −7A/C/G σ70-dependent promoters
| Functional group | Species | Subgroup | No. of genes |
|---|---|---|---|
| 4Fe-4S Cluster Binding |
| 4Fe-4S Cluster Binding (GO:0051539) | 20 |
| Biosynthesis of Amino Acids |
| Biosynthesis of Amino Acids (KEGG Pathways nar01230) | 64 |
| Cellular Amino Acid Biosynthetic Process (GO:0008652) | 38 | ||
| Valine, Leucine, and Isoleucine Biosynthesis (KEGG Pathways nar00290) | 10 | ||
| Branched-Chain Amino Acid Biosynthetic Process (GO:0009082) | 9 | ||
| C5-Branched Dibasic Acid Metabolism (KEGG Pathways nar00660) | 9 | ||
| Cysteine and Methionine Metabolism (KEGG Pathways nar00270) | 17 | ||
| Lysine Biosynthetic Process via Diaminopimelate (GO:0009089) | 7 | ||
| Lysine Biosynthetic Process (GO:0009085) | 7 | ||
| Leucine Biosynthetic Process (GO:0009098) | 5 | ||
| Isoleucine Biosynthetic Process (GO:0009097) | 6 | ||
| Lysine Biosynthesis (KEGG Pathways nar00300) | 9 | ||
| Isoprenoid Biosynthetic Process (GO:0008299) | 6 | ||
|
| Biosynthesis of Amino Acids (KEGG Pathways rsp01230) | 50 | |
| Cellular Amino Acid Biosynthetic Process (GO:0008652) | 29 | ||
| Histidine Biosynthetic Process (GO:0000105) | 7 | ||
| Threonine Biosynthetic Process (GO:0009088) | 4 | ||
| Lysine Biosynthetic Process (GO:0009085) | 6 | ||
| Lysine Biosynthetic Process via Diaminopimelate (GO:0009089) | 6 | ||
| Lysine Biosynthesis (KEGG Pathways rsp00300) | 9 | ||
| Cysteine and Methionine Metabolism (KEGG Pathways rsp00270) | 15 | ||
| Methionine Biosynthetic Process (GO:0009086) | 7 | ||
|
| Biosynthesis of Amino Acids (KEGG Pathways zmo01230) | 29 | |
| Biosynthesis of Lipids |
| DNA Replication (KEGG Pathways rsp03030) | 8 |
| Biosynthesis of Purines |
| Purine Nucleotide Biosynthetic Process (GO:0006164) | 11 |
| Purine Metabolism (KEGG Pathways nar00230) | 18 | ||
| ‘De Novo’ IMP Biosynthetic Process (GO:0006189) | 8 | ||
|
| Purine Nucleotide Biosynthetic Process (GO:0006164) | 10 | |
|
| Exosome (KEGG Brite zmo04147) | 20 | |
| Biosynthesis of Secondary Metabolites |
| Biosynthesis of Secondary Metabolites (KEGG Pathways ccs01110) | 26 |
| Biosynthesis of Cofactors (KEGG Pathways ccs01240) | 7 | ||
|
| Biosynthesis of Secondary Metabolites (KEGG Pathways nar01110) | 141 | |
| Heme Biosynthetic Process (GO:0006783) | 5 | ||
|
| Biosynthesis of Secondary Metabolites (KEGG Pathways rsp01110) | 119 | |
| Biosynthesis of Cofactors (KEGG Pathways rsp01240) | 61 | ||
|
| Biosynthesis of Secondary Metabolites (KEGG Pathways zmo01110) | 58 | |
| Carbon Metabolism |
| One-Carbon Metabolic Process (GO:0006730) | 7 |
|
| 2-Oxocarboxylic Acid Metabolism (KEGG Pathways nar01210) | 16 | |
| Carbon Metabolism (KEGG Pathways nar01200) | 47 | ||
| Glycolytic Process (GO:0006096) | 7 | ||
| Citrate Cycle (TCA Cycle) (KEGG Pathways nar00020) | 14 | ||
| Gluconeogenesis (GO:0006094) | 6 | ||
| TCA Cycle (GO:0006099) | 9 | ||
| Pentose Phosphate Pathway (KEGG Pathways nar00030) | 10 | ||
|
| Carbon Metabolism (KEGG Pathways rsp01200) | 43 | |
| 2-Oxocarboxylic Acid Metabolism (KEGG Pathways rsp01210) | 12 | ||
| TCA Cycle (GO:0006099) | 8 | ||
|
| Carbon Metabolism (KEGG Pathways zmo01200) | 18 | |
| Glycolysis/Gluconeogenesis (KEGG Pathways zmo00010) | 11 | ||
| Pentose Phosphate Pathway (KEGG Pathways zmo00030) | 5 | ||
| Carbohydrate Metabolic Process (GO:0005975) | 10 | ||
| Cell Wall / Membrane |
| Polysaccharide Biosynthetic Process (GO:0000271) | 7 |
| Phospholipid Biosynthetic Process (GO:0008654) | 7 | ||
| DNA Repair |
| DNA Repair (GO:0006281) | 19 |
| Cellular Response to DNA Damage Stimulus (GO:0006974) | 15 | ||
| DNA Topoisomerase Type II (Double Strand Cut) (GO:0003918) | 4 | ||
| Base Excision Repair (KEGG Pathways nar03410) | 8 | ||
| DNA Replication |
| DNA Replication (KEGG Pathways rsp03030) | 8 |
| Photosynthesis |
| Porphyrin-Containing Compound Biosynthetic Process (GO:0006779) | 8 |
| Porphyrin and Chlorophyll Metabolism (KEGG Pathways rsp00860) | 22 | ||
| Protoporphyrinogen IX Biosynthetic Process (GO:0006782) | 7 | ||
| Chlorophyll Biosynthetic Process (GO:0015995) | 12 | ||
| Coproporphyrinogen Oxidase Activity (GO:0004109) | 4 | ||
| Carbon Fixation in Photosynthetic Organisms (KEGG Pathways rsp00710) | 11 | ||
| Protein Degradation |
| Proteolysis (GO:0006508) | 28 |
| Serine-Type Endopeptidase Activity (GO:0004252) | 10 | ||
| Protein Folding |
| Peptidyl-Prolyl Cis-Trans Isomerase Activity (GO:0003755) | 7 |
| Protein Peptidyl-Prolyl Isomerization (GO:0000413) | 7 | ||
| Protein Folding (GO:0006457) | 8 | ||
| RNA Processing |
| RNA Phosphodiester Bond Hydrolysis Exonucleolytic (GO:0090503) | 4 |
| Transcription |
| Transcription Machinery (KEGG Brite rsp03021) | 12 |
| Translation |
| Translation (GO:0006412) | 3 |
|
| Translation (GO:0006412) | 50 | |
| Aminoacyl-tRNA Ligase Activity (GO:0004812) | 19 | ||
| tRNA Aminoacylation for Protein Translation (GO:0006418) | 14 | ||
| Aminoacyl-tRNA Biosynthesis (KEGG Pathways nar00970) | 19 | ||
| tRNA Processing (GO:0008033) | 14 | ||
| tRNA Binding (GO:0000049) | 12 | ||
| tRNA Aminoacylation (GO:0043039) | 5 | ||
|
| Translation (GO:0006412) | 54 | |
| Aminoacyl-tRNA Biosynthesis (KEGG Pathways rsp00970) | 44 | ||
| Aminoacyl-tRNA Ligase Activity (GO:0004812) | 18 | ||
| Transfer RNA Biogenesis (KEGG Brite rsp03016) | 33 | ||
| tRNA Aminoacylation for Protein Translation (GO:0006418) | 14 | ||
| Translation Factors (KEGG Brite rsp03012) | 13 | ||
| Mitochondrial Biogenesis (KEGG Brite rsp03029) | 20 | ||
| tRNA Binding (GO:0000049) | 15 | ||
| RNA Binding (GO:0003723) | 31 | ||
| Non-Coding RNAs (KEGG Brite rsp03100) | 25 | ||
| Translational Termination (GO:0006415) | 5 | ||
| Ribosome (GO:0005840) | 21 | ||
| tRNA Processing (GO:0008033) | 12 | ||
| Structural Constituent of Ribosome (GO:0003735) | 20 | ||
| Ribosome Biogenesis (KEGG Brite rsp03009) | 17 | ||
| Translation Elongation Factor Activity (GO:0003746) | 6 | ||
| Translation Elongation (GO:0006414) | 6 | ||
|
| Translation (GO:0006412) | 29 | |
| Ribosome (KEGG Brite zmo03011) | 38 | ||
| Ribosome (KEGG Brite zmo03010) | 19 | ||
| Structural Constituent of Ribosome (GO:0003735) | 19 | ||
| Ribosome (GO:0005840) | 20 | ||
| Transport |
| Protein Export (KEGG Pathways nar03060) | 10 |
|
| Protein Export (KEGG Pathways rsp03060) | 11 | |
| Bacterial Secretion System (KEGG Pathways rsp03070) | 10 |
FIG 6. Amino acid alignment of region 2 (A) and region 4 (B) of the housekeeping σ factor and portion of CarD homologs (C) from the indicated bacterial species. Asterisks and highlighting indicate fully conserved residues, colons indicate conservation between residues with strongly similar properties, and periods indicate conservation between residues with weakly similar properties (94, 95). Conserved residues involved in predicted σ70-dependent −10 promoter binding (A), −35 promoter binding (B), or key functional residue in CarD (C) are indicated by red arrows.
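The asterisk convention in this caption can be illustrated with a short Python sketch that marks fully conserved columns of an alignment. It assumes the aligned sequences are equal-length strings with `-` for gaps; the function name `conservation_line` and the toy sequences are hypothetical. The colon and period marks depend on residue similarity groups that are not reproduced here, so the sketch emits asterisks only.

```python
def conservation_line(aligned_sequences):
    """Return a string with '*' under fully conserved columns, ' ' elsewhere.

    Assumption: all sequences come from one alignment, so they share a length
    and use '-' for gaps. Columns containing a gap are never marked.
    """
    if not aligned_sequences:
        return ""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*aligned_sequences):
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


# Toy alignment, made up for illustration (not sequences from the paper).
toy_alignment = ["MKT-LL", "MRT-LL", "MKTALL"]
toy_marks = conservation_line(toy_alignment)
```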