Beate Krueger, Torben Friedrich, Frank Förster, Jörg Bernhardt, Roy Gross, Thomas Dandekar.
Abstract
Two-component systems (TCS) are short signalling pathways generally occurring in prokaryotes. They frequently regulate prokaryotic stimulus responses and thus are also of interest for engineering in biotechnology and synthetic biology. The aim of this study is to better understand and describe rewiring of TCS while investigating different evolutionary scenarios. Based on large-scale screens of TCS in different organisms, this study gives detailed data, concrete alignments, and structure analysis on three general modification scenarios, where TCS were rewired for new responses and functions: (i) exchanges in the sequence within single TCS domains; (ii) exchange of whole TCS domains; (iii) addition of new components modulating TCS function. As a result, the replacement of stimulus and promoter cassettes to rewire TCS is well defined when exploiting the alignments given here. The diverged TCS examples are non-trivial and the design is challenging. Designed connector proteins may also be useful to modify TCS in selected cases.
Keywords: Mycoplasma; connector; engineering; histidine kinase; promoter; response regulator; sensor; sequence alignment; synthetic biology
Year: 2012 PMID: 22586357 PMCID: PMC3348925 DOI: 10.4137/BBI.S9356
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Stimulus recognition consensus sequences for various TCS stimuli.
| Stimulus | No. of sequences | Position | Recognition sequence |
|---|---|---|---|
| Phosphor | 1 | 29–32 | GYLP |
| Osmotic | 4 | 36–158 | NFAILPSLQQFNKVLAYEVRMLMTDKLQLEDGTQLVVPPAFRREIyrelgISLYTNEA |
| Stress | 6 | 25–135 | LVYKFTAERAGRQSLDDLMNSSLYLMRSELREIPPHDWGKTLKEmdlnlsfdlrvepls |
| Iron | 6 | 35–64 | HESTEQIQLFEQALRDNRNNDRHIMREIRE |
| Copper | 3 | 37–86 | HSVKVHFAEQDINDLKEISATLERVLNHPDETQARRLMTLEDIVSGYSNVLISLADSH |
| Citrate | 4 | 43–182 | asfedyltlhvrdmamnqakiiasndsvisavktrdykrlatianklQRDTDFDYVVIG |
| Fumarate | 4 | 42–181 | SQISDMTRDGLANKALAVARTLADSPEIRQGLQKKPQESGIQAIAEAVRKRNDLLFIVV |
| Nitrate/Nitrite | 8 | 38–151 | sslrDAHAINKAGSLRMQSYRLGYDLPSGEPDKNAHRQMFQQAlhspvltnlnvwyv |
Notes:
Only the consensus recognition sequences are listed, according to UniProt. Well-annotated sensors and organisms were compared as listed in Supplementary material. The sensor protein recognition site composition depends on the signal and is independent of the organism. Exact sequences and positions are aligned in Supplementary material. Accurate numbering according to E. coli proteins can be transferred to other organisms. Conserved amino acids are labeled in bold print. Less conserved amino acids are labeled in lowercase.
Alignment of the Nitrate/Nitrite recognition site comparing NarX and NarQ.
Specific target gene DNA sequences in E. coli.
| Regulated gene | Sequence |
|---|---|
| OmpC | TTTACATTTTGAAACATCT |
| OmpF | T[GT][GT][TG]TA[CG][AC][TA][AC]TTT[TC] |
| OmpF/OmpC | TTT[TA]C-TTTT[TG] |
| NarG1 | 1 TACCCATTAA 10 |
| NarG2 | 1 TAACCAT--- 7 |
| NarG3 | 1 TAATTAT--- 7 |
| NarG4 | 1 TACTTTA--- 7 |
| NarG5 | 1 -AGGGGTA-- 7 |
| NarG6 | 1 TAGGAAT--- 7 |
| NarG7 | TTTAACCCGAtcggggtatg |
| NarK | TAC[TC][CG][CA]T |
| CitB | agtAATTTAATTaatt |
| LytT | [TA][AC][CA]GTTN[AG][TG] |
| LytT | taaggAAATAAAACTGATTTTcacgtca |
| AlgR | aaatGAATATTTATTCAAat |
| GlnG/GlnK | tgcaCCACCATGGTGCA |
| Spo1 | 1 ------------TTTGTCGAATGTAA----------- 14 |
| Spo2 | 1 --AATTTCATTTTTAGTCGAAAAACAGAGAAAAACAT 35 |
| Spo3 | 1 AAAAGAAGATTTTTCGACAAATTCA------------ 25 |
Notes:
Profiles of target gene binding sites bound by regulators in E. coli are given. Consensus sequences were derived from detailed multiple alignments (see Supplementary material) by mining several databases (PRODORIC, TractorDB, PDB and PDBSum, PubMed). Sequences and positions were aligned (Supplementary material). The given binding sequences were first found in E. coli K-12 strains and were verified for the other E. coli strains (see Supplementary material) using motif-specific scripts (Materials and methods). Less conserved parts are labeled in lowercase letters, variable positions are given as motifs with brackets, and strongly conserved parts are highlighted by black boxes.
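The bracket notation used in the table maps directly onto regular-expression character classes, so candidate sites can be located with a few lines of code. The following Python sketch is illustrative only and is not the motif-specific script used in the study. It assumes that `-` and `N` in a motif stand for any single base, and the function names (`motif_to_regex`, `find_sites`) are hypothetical.

```python
import re

# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Turn a bracket motif such as 'TAC[TC][CG][CA]T' into a regex.

    Assumption: '-' and 'N' mean "any single base".
    The lookahead wrapper lets overlapping sites be reported.
    """
    pattern = motif.upper().replace("-", ".").replace("N", ".")
    return re.compile("(?=(" + pattern + "))")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(motif, seq):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    Start positions are 0-based and refer to the forward strand.
    """
    regex = motif_to_regex(motif)
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(m.group(1))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# The OmpF/OmpC motif from the table scanned against the listed OmpC site.
ompc_hits = find_sites("TTT[TA]C-TTTT[TG]", "TTTACATTTTGAAACATCT")
# The NarK motif scanned against the NarG1 sequence from the table.
narg_hits = find_sites("TAC[TC][CG][CA]T", "TACCCATTAA")
print(ompc_hits, narg_hits)
```

Both strands are scanned because a regulator binding site may lie on either strand; for a genome-scale search the same function could be applied to each promoter region in turn.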
Specific target gene DNA sequences in further Gram-negative bacteria.
| Family | Regulated gene | Function | Example organism | Sequence |
|---|---|---|---|---|
| NtrC | GlnH | Transcription factor | | GacatTTGCACTTAAATAGTGCACaaccc |
| NtrC | GlnA | Transcription factor | | ttctaTTGCACCAATGTGGTGCTTaatgt |
| NtrC | GlnK | Transcription factor | | CcattATGCACCGTCGTGGTGCGTttttc |
| NtrC | GlnA | Transcription factor | | CtataATGCACTAAAATGGTGCAAccttt |
| NarL | NarK | Transcription factor | | AatagCCTACTCATTAAGGGTAATaacta |
| NtrC | GlnG | Transcription factor | | CtataATGCACTAAAATGGTGCAAcctgt |
| ArgR | ArgA | Transcription factor | | actaaTTTCGAATAATAATTCACTAgtggg |
| ArgR | ArgC | Transcription factor | | cgttaATGAATAAAAATACATaatta |
Notes:
The table shows TCS target gene promoter sites in Salmonella (two strains) and Shigella. Capital letters indicate similarities within the binding site between the three compared organisms.
Promoter binding sites.
| Response regulator protein | Regulated gene | Repetition | Distance [NS] |
|---|---|---|---|
| Citrate utilization protein B (CitB) | Citrate lyase (CitC) | 6 | 40 |
| Nitrogen regulation protein (NtrC) | Sequences glutamine synthetase (GlnA) | 2 | 63 |
| Nitrogen regulation protein (NtrC) | Nitrogen regulator protein (GlnK) | 7–12 | Variable |
| Nitrate/Nitrite response regulator protein (NarL) | Respiratory nitrate reductase (NarG) | Variable | Ca. 6 |
| Nitrate/Nitrite response regulator protein (NarL) | Nitrite extrusion protein (NarK) | Variable | Variable |
| Osmolarity response regulator (OmpR) | Outer membrane protein C and F (OmpC/OmpF) | 3 | 7 |
Recognition of divergent TCS and missing TCS partners.
| Family | Identification | Stimulus | Sensor | Regulator | Strain | Function |
|---|---|---|---|---|---|---|
| OmpR | Iterative sequence searches with cut-off e-30 using OmpR sequences from | Mg starvation | QseC | GI:52841523, which is potentially similar to QseB | Philadelphia 1 | Regulated protein FliC; GI:52841570; flagella regulation |
| NarL | Iterative sequence searches with cut-off e-30 using NP_288375 | Carbon | BarA | GI:52842852, which is potentially similar to UvrY | Philadelphia 1 | Regulated protein CsrA; GI:52841018; carbon storage regulator |
| NarL | Iterative sequence searches with cut-off e-30 in | Pheromone | | GI:52840952, which is potentially similar to EvgA | Philadelphia 1 | Regulated protein EmrY; GI:52841684; antibiotic resistance |
| NarL | Iterative sequence searches with cut-off e-30 in | | Q4EKW8_LISMO, which is potentially similar to EvgS | GI:16804553, which is potentially similar to EvgA | EGD-e | Antibiotic resistance |
| OmpR | Iterative sequence searches with cut-off e-30 in | Stress | GI:16804620 | GI:16804621, which is potentially similar to CSSR_BACSU | EGD-e | Regulated protein HtrA; serine protease |
| OmpR | PSI-BLAST search in | Mg starvation | GI:16803061, which is potentially similar to ZP_03239257 | PhoP | EGD-e | Virulence, antimicrobial peptide resistance |
Notes:
Newly annotated features (interactions or parts of TCS) apparent from sequence searches with various available TCS sequences and domains in the genome sequence (GenBank acc. no.: AE017354; Chien M, et al, 2004). Regulated proteins are given as well as homologous standard TCS. Predicted changes (mainly by their operon context) in their function for L. pneumophila are indicated on the right. The right-most column summarizes which aspect of the TCS is newly reported here.
Listed are well-characterized homologs from other organisms which have the same function within the same family.
The table contains additional features (interactions or parts of TCS) extending what is already known in KEGG or annotated in GenBank (acc. no.: AE017262) or ListiList (http://genolist.pasteur.fr/ListiList/). On the left, the TCS family is given. Starting from B. subtilis TCS sequences, we searched for missing sensor and regulator proteins. The right-most column summarizes which aspect of the TCS is newly reported here.
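The identification column above relies on sequence searches with a strict E-value cut-off. As a minimal sketch of that filtering step, the Python code below keeps only hits at or below the threshold. It assumes the hits are available in the standard 12-column BLAST tabular format (`-outfmt 6`, E-value in the eleventh column) and reads the table's "e-30" as 1e-30; the source states neither the output format nor the exact tooling, and the identifiers in `SAMPLE` are invented placeholders.

```python
import csv
import io

# Assumption: the table's "cut off e-30" is read as an E-value of 1e-30.
EVALUE_CUTOFF = 1e-30


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, evalue) for hits with evalue <= cutoff.

    Expects BLAST tabular output (outfmt 6): qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as written by outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example rows; the identifiers are placeholders, not real hits.
SAMPLE = (
    "queryA\tsubject1\t45.2\t220\t110\t4\t1\t218\t3\t221\t3e-42\t160\n"
    "queryA\tsubject2\t31.0\t180\t115\t6\t5\t180\t10\t188\t2e-12\t65.1\n"
)

print(filter_hits(SAMPLE))
```

In an iterative search, the subjects that pass the filter would serve as queries for the next round until no new sequences are found.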
Natural examples for domain shuffling in divergent TCS.
| Domain | Protein | Context | Function |
|---|---|---|---|
| HisKin | Pyruvate dehydrogenase kinase | Glucose metabolism In | Inhibits the mitochondrial pyruvate dehydrogenase complex by phosphorylation of the E1 alpha subunit, thus contributing to the regulation of glucose metabolism |
| HisKin | Adenylate cyclase | Sporulation in some organisms | Stringent response, protein kinases are activated (PKAs) |
| HisKin | BCKD-kinase | Valine, leucine and isoleucine catabolic pathways in | Catalyzes the phosphorylation and inactivation of the branched-chain alpha-ketoacid dehydrogenase complex, the key regulatory enzyme of the valine, leucine and isoleucine catabolic pathways. Key enzyme that regulates the activity state of the BCKD complex |
| HisKin | Phytochrome A | Regulatory photoreceptor In | Regulatory photoreceptor which exists in two forms that are reversibly interconvertible by light: the Pr form that absorbs maximally in the red region of the spectrum and the Pfr form that absorbs maximally in the far-red region. Photoconversion of Pr to Pfr induces an array of morphogenic responses, whereas reconversion of Pfr to Pr cancels the induction of those responses. Pfr controls the expression of a number of nuclear genes including those encoding the small subunit of ribulose-bisphosphate carboxylase, chlorophyll A/B binding protein, protochlorophyllide reductase, rRNA, etc. It also controls the expression of its own gene(s) in a negative feedback fashion |
| Response Reg | Adventurous-gliding motility protein Z | Chemosensory system in | Required for adventurous-gliding motility, in response to environmental signals sensed by the frz chemosensory system. Forms ordered clusters that span the cell length and that remain stationary relative to the surface across which the cells move, serving as anchor points that allow the bacterium to move forward. Clusters disassemble at the lagging cell pole |
| Response Reg | Adenylate cyclase | Sporulation in some organisms | Stringent response, response regulators are activated |
| Response Reg | Serine/threonine-protein kinase ppk18 | | Serine/threonine-protein kinase ppk18 plays pivotal roles in cell proliferation and cell growth in response to nutrient status |
Notes:
The table shows natural domain shuffling events where sensor domains and response regulator domains appear in different new contexts. In the three prokaryotic as well as in the eukaryotic examples only domains can be recognized but new functions are adopted.
Figure 1. Divergent TCS sensor in M. pneumoniae.
Notes: Compared are the structure template (T. maritima), structure of NarX from E. coli, P. arcticus, and MPN013 (M. pneumoniae). Aligned are the secondary structure from PDB template 2c2a_A (top, magenta; HK853 from T. maritima) and its sequence (blue), valid (sequences aligned) for NarX from P. arcticus and the sequence of MPN013. Conserved residues are highlighted by yellow boxes. Below the secondary structure, triangles indicate binding sites annotated in PDBSum (green: ADP binding site; blue: SO4 binding site; red dots: ligand binding site). Conserved residues for TCS (see above) are highlighted in yellow boxes. Structure: Calculated secondary structure (green) according to the SWISS-MODEL template for MPN013 (PDB entry 2ba2_A for MPN010).
Figure 2. Diverged TCS regulator in M. pneumoniae.
Notes: Compared are the structure template (T. maritima), structure of PhoP, OmpR and NarL from E. coli, NarL in P. arcticus and MPN014 (M. pneumoniae). Aligned are the secondary structure from PDB template 1rnl (top, magenta; NarL from T. maritima; red letters: phosphor binding three-layer alpha/beta sandwich, blue: DNA-binding alpha orthogonal bundle) and its sequence (red), valid (sequences aligned) for PhoP, OmpR and NarL from E. coli, NarL in P. arcticus and MPN014 (M. pneumoniae). Conserved residues are highlighted in colored boxes. The first green highlighted part corresponds to the first part of the regulator overview. The conserved area starts with an aliphatic residue, followed by a charged residue. The second conserved part (yellow background) starts with an aliphatic residue and a Leu, followed by a charged residue and some Gly. The third part (dark red background) contains a strongly conserved lysine, followed by hydrophobic residues. N-terminal of the conserved lysine, two positively charged residues are found. Secondary structure predictions (Predator, PredictProtein) predict a mixed structure of helices, sheets and many loops over the whole protein. Consequently, the phosphor binding part could be an alpha/beta sandwich as in other regulators. The second part of MPN014 contains no helix-turn-helix motif, but is predicted to be involved in DNA binding due to high sequence similarity to DNA primase/topoisomerase.
SafA containing proteins (potential connector proteins).
| Protein | Description | Organism | STRING score |
|---|---|---|---|
| NP_310132 | Hypothetical protein ECs2105 | | 0.9 to EvgS |
| ZP_02799272 | Conserved hypothetical protein | | 0.9 to EvgS |
| YP_540723 | Hypothetical protein C1714 | | 0.9 to EvgS |
| NP_837211 | Hypothetical protein S1655 | | 0.76 to EvgS |
| NP_458304 | Putative phosphodiesterase | | 0.65 to ygiM (put. signal transduction protein) |
| NP_462516 | Putative phosphodiesterase | | 0.61 to lon |
Notes:
SafA similar proteins can be found in several organisms. This table lists the proteins of the family, a short description and the detected organism as well as the predicted probability to interact with TCS as a connector according to the protein interaction database STRING.
Putative connector proteins containing an EAL-domain and their interaction partners.
| Protein with EAL-Domain | Interaction partner |
|---|---|
| Q21G90_SACD2 | Sde_3649 GGDEF family protein |
| A6Q1G4_NITSB | dgkA Diacylglycerol kinase |
| A1AD34_ECOK1 | yedQ hypothetical protein |
Note:
Interaction predictions included sequence- and structure analysis and data from public interaction databases such as STRING database.
Domain combinations occurring most often in PFAM regarding sensor and response regulator proteins.
| Combination of sensor domains | Response regulator domains |
|---|---|
| HisKA + HATPase_c + (n * HAMP + m * PAS + p * Hpt) | |
| HATPase_c | Response_reg * s |
| HAMP | Response_reg + GerE |
| His_kinase + HATPase_c | Response_reg + HTH |
| HisKA + HATPase_c | Response_reg + LytTR |
| HWE_HK | Response_reg + HisKA domain |
| HisKA_2 + HATPase_c | Response_reg + CheB or CheW |
| HisKA_2 | Response_reg + Sigma |
| HisKA_3 | Response_reg + Spo |
| HisKA | Response_reg + GGDEF |
| | Response_reg + EAL |
| | Response_reg + HDOD |
Notes: Pfam family combinations in sensor and response regulator proteins are listed in order of frequency of occurrence (top-ranked combinations are shown at the top; however, each sensor domain combination can combine with any of the response domain combinations). Lower-case letters symbolize domain replicates within a specific combination.
m: 0–6, n: 0–10, p: 1–9;
s: 1–2;
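The replicate ranges can be read as simple count constraints on a protein's domain list. The Python sketch below checks whether a list of Pfam domain names fits the first sensor combination, read here as HisKA + HATPase_c + (n * HAMP + m * PAS + p * Hpt) with n: 0–10, m: 0–6 and p: 1–9. It rests on assumptions that go beyond the table: HisKA and HATPase_c are each required exactly once, domain order is ignored, and any other domain disqualifies the protein. The function name is hypothetical.

```python
from collections import Counter

# (minimum, maximum) copies allowed per domain.
# Assumption: the two core domains occur exactly once each.
ALLOWED_COUNTS = {
    "HisKA": (1, 1),
    "HATPase_c": (1, 1),
    "HAMP": (0, 10),  # n
    "PAS": (0, 6),    # m
    "Hpt": (1, 9),    # p
}


def fits_sensor_combination(domains):
    """Return True if the domain list satisfies every count constraint."""
    counts = Counter(domains)
    # Any domain outside the combination rules the protein out.
    if any(name not in ALLOWED_COUNTS for name in counts):
        return False
    # Counter returns 0 for domains that are absent from the list.
    return all(
        low <= counts[name] <= high
        for name, (low, high) in ALLOWED_COUNTS.items()
    )


example = ["HAMP", "PAS", "PAS", "HisKA", "HATPase_c", "Hpt"]
print(fits_sensor_combination(example))
```

Ignoring domain order keeps the check short; a stricter version could compare the ordered architecture string instead.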
Lists promoter sites for TCS-involved proteins.
Pfam search for BCKD_MOUSE.
| Pfam-A | Description | Entry type | Seq start | Seq end | HMM From | To | Bits score | E-value |
|---|---|---|---|---|---|---|---|---|
| HATPase_c | Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase | Domain | 7 | 135 | 12 | 126 | 68.3 | 5.8e-19 |
Pfam search.
| Pfam-A | Description | Entry type | Seq start | Seq end | HMM from | To | Bits score | E-value |
|---|---|---|---|---|---|---|---|---|
| Response_reg dicdi | Response regulator receiver domain | Domain | 2 | 86 | 1 | 80 | 24.6 | 2.6e-06 |
| Response_reg AGLZ | Response regulator receiver domain | Domain | 2 | 83 | 1 | | | |
SafA similar proteins.
| Organism | Protein Id | Protein name | Score | E-value |
|---|---|---|---|---|
| | NP_310132.1 | Hypothetical protein ECs2105 | 100 | 5e-23 |
| | ZP_02799272.2 | Conserved hypothetical protein | 88.2 | 2e-19 |
| | YP_540723.1 | Hypothetical protein UTI89_C1714 | 97.4 | 2e-22 |
| | NP_837211.1 | Hypothetical protein S1655 | 91.5 | 2e-17 |
TCS domains in several organisms.
| Organism | HisKA (MiST annotation / ScanProsite or SMART count) | Response reg (MiST annotation / ScanProsite or SMART count) |
|---|---|---|
| | 29/77 | 31/39 |
| | 18/30 | 17/285 |
| | 16/56 | 16/54 |
| | 16/61 | 22/285 |
| | 20/25 | 22/44 |
Notes:
The table compares the number of TCS domains annotated in the MiST database as belonging to TCS with the number of TCS domains found by motif similarity using ScanProsite or by domain similarity using SMART. The two plant examples are not yet annotated in MiST; however, Arabidopsis has 16 His protein kinases (Hwang et al, Plant Physiology 2002, 129:500–515) and 22 response regulators (ARRs), 12 of which contain a Myb-like DNA binding domain called ARRM (type B). The remainder (type A) possess no apparent functional unit other than a signal receiver domain containing two aspartate residues and one lysine residue (DDK) at invariant positions, and their genes are transcriptionally induced by cytokinins without de novo protein synthesis. The type B members, ARR1 and ARR2, bind DNA in a sequence-specific manner and work as transcriptional activators (Database of Arabidopsis transcription factors, http://datf.cbi.pku.edu.cn/browsefamily.php?familyname=GARP-ARR-B). In maize there are 11 cytokinin receptors, 9 phosphotransfer proteins and 22 response regulators (Chu et al, Genet Mol Res. 2011;10(4):3316–3330).