Sarah Meinhardt1, Michael W Manley1, Daniel J Parente1, Liskin Swint-Kruse1.
Abstract
The millions of protein sequences generated by genomics are expected to transform protein engineering and personalized medicine. To achieve these goals, tools for predicting outcomes of amino acid changes must be improved. Currently, advances are hampered by insufficient experimental data about nonconserved amino acid positions. Since the property "nonconserved" is identified using a sequence alignment, we designed experiments to recapitulate that context: Mutagenesis and functional characterization were carried out in 15 LacI/GalR homologs (rows) at 12 nonconserved positions (columns). Multiple substitutions were made at each position, to reveal how various amino acids of a nonconserved column were tolerated in each protein row. Results showed that amino acid preferences of nonconserved positions were highly context-dependent, had few correlations with physico-chemical similarities, and were not predictable from their occurrence in natural LacI/GalR sequences. Further, unlike the "toggle switch" behaviors of conserved positions, substitutions at nonconserved positions could be rank-ordered to show a "rheostatic", progressive effect on function that spanned several orders of magnitude. Comparisons to various sequence analyses suggested that conserved and strongly co-evolving positions act as functional toggles, whereas other important, nonconserved positions serve as rheostats for modifying protein function. Both the presence of rheostat positions and the sequence analysis strategy appear to be generalizable to other protein families and should be considered when engineering protein modifications or predicting the impact of protein polymorphisms.
Year: 2013 PMID: 24386217 PMCID: PMC3875437 DOI: 10.1371/journal.pone.0083502
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Structure and function of the LacI/GalR proteins.
(A) Structure of LacI/GalR homodimer (pdb 1efa; [39]). One monomer is shown in white and the other in gray. DNA is shown with blue wires. The linker region is magenta (positions 45–49), yellow (50–58), and green (59–62). The YPAL motif is in space-filling; positions mutated in this study are in ball-and-sticks. Black space-filling shows an allosteric effector bound to the regulatory domain. On the right, the linker is enlarged and shown in two different views. (B–D) Representative functional data for LacI/GalR synthetic homologs. Repression of the lac operon was determined in the absence (front series) and presence (back series) of allosteric effector. Lower values correspond to tighter repression. “DEL” (black bar) shows β-galactosidase activity in the absence of repressor protein. Below 13 Miller units (solid black line), any change in repression impacted bacterial growth [17]. The red dashed lines indicate the activities of the starting proteins that are listed in Table 1. Error bars are the standard deviation of 2–4 independent bacterial colonies, each in quadruplicate or duplicate. Color coding of the front series represents amino acid hydrophobicity (green to magenta represents highest to lowest hydrophobicity); note the poor correlation with repression. Other physico-chemical scales are listed in Table S20 in Data S2 and mapped to repression data in Figures S25–S87 in Data S5, Data S6, and Data S7. (B) Rheostat example. (C) Toggle-like example (note that the red line overlaps the black line in this example). (D) Neutral example.
Wild-type LacI/GalR proteins used to create the LXhX^a chimeras.
| Natural Proteins^b | “X” abbreviation | Mutated Proteins^c | Repression (Miller units)^d |
| LacI | L | LacI-11^e | 0.12±0.06 |
| RbsR | R | LLhR | 0.06±0.06 |
| FruR | F | LLhF | 1.9±0.3 |
| GalR | G | LLhG | 15±4 |
| | | LLhG/E62K | 0.7±0.2 |
| | | LGhG | 13±11 |
| GalS | S | LLhS | 6±4 |
| | | LLhS/R51S | 58±20 |
| | | LLhS/D62N | 3±1 |
| | | LLhS/R51S/D62N | 0.06±0.03 |
| PurR | P | LLhP | 3.9±2.2 |
| | | LPhP57cs | 37±5 |
| | | LGhP | 320±130 |
| TreR | T | LLhT | 120±16 |
| | | LLhT/V52A | 0.5±0.1 |
| AscG | A | LLhA | 78±10 |
| | | LLhA/Q55A | 0.2±0.04 |
a: Nomenclature: “L” indicates the LacI DNA binding domain (positions 1–44), “Xh” represents the protein source of the linker (positions 45–61), and the final “X” indicates the source of the regulatory domain. LPhP57cs has a linker sequence comprising PurR 45–56 and LacI 57–61 [21]. LGhP comprises the LacI DNA binding domain, the GalR linker, and the PurR regulatory domain.
b: All proteins are from E. coli. LacI: Lactose repressor protein. RbsR: Ribose repressor. FruR: Fructose repressor. GalR: Galactose repressor. GalS: Galactose isorepressor. PurR: Purine repressor. TreR: Trehalose repressor. AscG: Cryptic asc operon repressor.
c: Point mutations listed in this table were generated in earlier studies [17], [18]. For this study, LLhT/V52A and LLhA/Q55A were used to generate mutations at most other positions because, if mutations were carried out in the weak repressors LLhT and LLhA, subsequent functional changes might be undetectable (as occurred for many variants of LGhP). A second rationale for using chimeras with point mutations was to compare outcomes between polymorphic variants (for example LLhG and LLhG/E62K). In either case, the noted position was fixed while other linker positions were mutated. (For example, position 62 was not further mutated in LLhG/E62K.)
d: These values were determined in the absence of allosteric effector for all inducible repressors and LLhA, which has no known inducer. For the co-repressible chimeras based on PurR, values are shown for the presence of effector.
e: Lacks the eleven C-terminal amino acids of the tetramerization domain [34].
Sequence entropies^a of LacI/GalR linker positions, calculated from various MSAs.
| Linker position | All Seqs | YPAL Seqs | LacI subfamily |
| L45^b,c | 1.20 | 1.01 | 0.00 |
| 46 | 1.56 | 1.62 | 0.86 |
| Y47 | 0.24 | 0.07 | 0.00 |
| 48 | 2.25 | 1.99 | 0.93 |
| P49 | 0.70 | 0.00 | 0.00 |
| 50 | 1.20 | 0.60 | 0.00 |
| 51 | 2.15 | 1.92 | 0.07 |
| 52 | 2.24 | 1.80 | 0.93 |
| A53 | 0.91 | 0.00 | 0.00 |
| 54 | 1.37 | 0.82 | 0.00 |
| 55 | 2.21 | 1.67 | 0.37 |
| L56 | 0.96 | 0.00 | 0.00 |
| A57 | 1.98 | 1.70 | 0.36 |
| 58 | 2.37 | 2.26 | 0.20 |
| 59 | 2.14 | 1.66 | 0.20 |
| 60 | 2.28 | 2.16 | 1.28 |
| 61 | 1.68 | 1.03 | 1.07 |
| 62 | 2.30 | 2.18 | 1.47 |
a: Sequence entropy $= -\sum_{i=1}^{21} f_i \ln(f_i)$, where $f_i$ is the frequency of occurrence for each amino acid or gap at the given linker position. A sequence entropy value of zero (0) corresponds to perfect conservation; equal frequency of all 21 possibilities corresponds to 3.04.
b: Positions 47, 49, 53, 56, and 57 were not mutated in the current study.
c: The LacI, PurR, GalR, and all chimeras of this study have leucine at position 45.
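The entropy definition in footnote a can be illustrated with a minimal Python sketch. The function name `sequence_entropy` and the two toy columns are hypothetical and are not data from the study; the sketch only assumes that a column is given as one-letter residue codes with "-" for a gap. It uses the identity $-f_i \ln(f_i) = f_i \ln(1/f_i)$, which gives the same sum.

```python
import math
from collections import Counter


def sequence_entropy(column):
    """Entropy (natural log) of one alignment column.

    `column` is an iterable of one-letter residue codes, with '-' for a gap,
    so up to 21 distinct symbols can occur (assumed input format).
    """
    counts = Counter(column)
    total = sum(counts.values())
    # -f*ln(f) is written as f*ln(1/f); symbols that never occur contribute zero,
    # so only the observed symbols are summed.
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Hypothetical toy columns for illustration only.
conserved = "Y" * 10                  # a perfectly conserved column
uniform = "ACDEFGHIKLMNPQRSTVWY-"     # all 21 symbols at equal frequency

print(round(sequence_entropy(conserved), 2))  # perfect conservation gives zero
print(round(sequence_entropy(uniform), 2))    # equals ln(21), the maximum
```

The two toy columns reproduce the limits stated in the footnote: zero for perfect conservation and ln(21), about 3.04, for equal frequency of all 21 possibilities.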
Frequency of substitutions that enhance repression.
| | 46 | 48 | 50 | 51 | 52 | 54 | 55 | 58 | 59 | 60 | 61 | 62 |
| Total variants^a | 114 | 102 | 96 | 100 | 113 | 117 | 113 | 95 | 101 | 107 | 92 | 126 |
| Enhance >10-fold | 4 | 1 | 0 | 10 | 7 | 1 | 9 | 5 | 1 | 3 | 2 | 21 |
| % Subst'ns enhanced | 4^b | 1 | 0 | 11 | 7 | 1 | 9 | 6 | 1 | 3 | 3 | 19 |
| Parent proteins mutated | 13 | 14 | 14 | 13 | 14 | 13 | 14 | 13 | 14 | 14 | 14 | 13 |
| Parent proteins enhanced | 2^c | 1 | 0 | 5 | 7 | 1 | 5 | 3 | 1 | 3 | 1 | 6 |
a: Each parent protein was counted as one of the amino acids in all 12 positions.
b: % substitutions enhancing = (enhanced >10-fold)/(Total variants – Parent proteins mutated).
c: All enhancing substitutions at positions 46 and 48 occurred in “LXhX” chimeras, which had domain fusion between the DNA binding domain and linker.
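The percentage defined in footnote b can be recomputed from the other rows of this table. The following Python sketch does so; the variable and function names are hypothetical, and rounding to the nearest whole percent is an assumption about how the tabulated values were obtained.

```python
# Rows copied from the table above, in column order.
positions = [46, 48, 50, 51, 52, 54, 55, 58, 59, 60, 61, 62]
total_variants = [114, 102, 96, 100, 113, 117, 113, 95, 101, 107, 92, 126]
enhanced_10fold = [4, 1, 0, 10, 7, 1, 9, 5, 1, 3, 2, 21]
parents_mutated = [13, 14, 14, 13, 14, 13, 14, 13, 14, 14, 14, 13]


def percent_enhancing(enhanced, total, parents):
    """Footnote b: enhanced / (total variants - parent proteins mutated), as a percent."""
    return 100.0 * enhanced / (total - parents)


# Rounding to whole percents is an assumption, not stated in the source.
percents = [
    round(percent_enhancing(e, t, p))
    for e, t, p in zip(enhanced_10fold, total_variants, parents_mutated)
]

for pos, pct in zip(positions, percents):
    print(pos, pct)
```

Under that rounding assumption, the computed values agree with the "% Subst'ns enhanced" row. The parent proteins are subtracted from the denominator because each is counted among the total variants (footnote a) but is not itself a substitution.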
Rheostat, toggle-like, and neutral behaviors of nonconserved linker positions^a.
| | 46 | 48 | 50 | 51 | 52 | 54 | 55 | 58 | 59 | 60 | 61 | 62 |
| LacI-11 | R | R | R | 3^b | R | R | r | R | R | n | R | N^c |
| LLhR | N+r^d | N+R | # | 3 | R | R | R | 3 | R | R | R | R+^e |
| LLhF | r | R | r | R | R | R | R | R | R | R | r+T | R |
| LLhG | N | # | T | R | n/r^f | # | r | R | # | r | n/r | R |
| LLhG/E62K | N+r | R | R | r | R | R | R | R | r | n/r | # | – |
| LGhG | R | r | # | R | r | # | R | R | r | n+T | R | R |
| LLhP | r | R | R+T | R | R | R | 3 | T | R | r | T | 3 |
| LPhP57cs | n/r | r | T | R | n/r | r/T | R | T | r | r | x | T+R^g |
| LGhP | r | R | # | x | # | – | x | r | R | r | # | x |
| LLhS | R | # | # | r | # | # | # | # | R | r | # | R |
| LLhS/R51S | – | – | – | – | – | – | – | – | R | r | r | R |
| LLhS/D62N | – | # | # | # | # | T | # | – | – | – | – | – |
| LLhS/R51S/D62N | r | r | R | – | R | R | r+T | R | R | # | # | – |
| LLhT | – | – | – | – | 3 | – | – | – | – | – | – | r |
| LLhT/V52A | r | R | R+T | T | – | R | R | R | R | r | R | R |
| LLhA/Q55A | r+T | r | # | R | R | R | R | R | R | N | R | N |
a: “R”, rheostatic (progressive) changes that span >2 orders of magnitude; “r”, rheostat character but <2 orders of magnitude. “T”, toggle-like. “N”, neutral (within 2-fold change); “n”, between 2 and ∼5-fold change. “#”, insufficient number of substitutions to determine behavior. “–”, not mutated. “x”, weak or no measurable repression for any substitution.
b: Substitutions generated 3 states instead of a continuum. Either 2- or 3-state toggles might reveal rheostat behavior if additional amino acid substitutions were characterized. However, in addition to reported variants, no intermediates were identified for the 2-state toggles during colony selection, which involved visual inspection of lac repression for several hundred bacterial colonies expressing randomly mutated chimeras.
c: In designating a neutral position, we invoked the caveat “most amino acids” because, for example, proline and glycine substitutions can have large backbone effects. Nevertheless, all reported variants bound DNA in the pull-down assay, which indicated that the protein structure was not grossly distorted.
d: Seven (7) of 11 amino acids are neutral; the remaining 4 have rheostat character.
e: A subset of positions had rheostat behavior, and another subset abolished detectable repression.
f: Substitution results were between neutral and rheostat behavior (∼5–9 fold change).
g: Four substitutions convey equally enhanced repression; another 6 have rheostat character.
Figure 2. Substitution outcomes do not correlate with amino acid frequency.
(A) Substitution outcomes for position 51 among the LacI/GalR chimeras; 5 amino acid substitutions are shown. Each starting protein had different repression activity, which was used to normalize its variants. No change corresponds to a value of 1 (dashed black line). The straight dotted lines indicate 2-fold change from the starting protein; this range is usually larger than the error bars of a repression measurement. Substitutions that enhance repression have increased fold-change (>2). Substitutions that diminish repression have decreased fold-change (<2). The jagged dotted line shows the no repressor “DEL” control relative to the starting protein and represents the lowest possible value. Colored connecting lines are to aid visual inspection of the data. (B) Amino acid frequency in the naturally occurring proteins at position 51, as calculated from the MSA of LacI/GalR proteins with a “YPAL” motif. Even though Ala occurs with high frequency, this substitution can be catastrophic. Further, even though Asp is absent from the natural sequences, this substitution can enhance repression in at least one chimera.
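The normalization described in panel (A) can be sketched in Python as follows. The caption does not give the formula, so two assumptions are made and labeled here: fold-change is taken as parent activity divided by variant activity (so that tighter repression, meaning fewer Miller units, gives a value above 1), and the 2-fold band is taken as 0.5 to 2. The function names are hypothetical. The example values are the mean repression activities of parent proteins and their point mutants reported in the table of wild-type and chimeric proteins, not the Figure 2 data.

```python
def fold_change(parent_miller, variant_miller):
    """Variant repression relative to its parent protein.

    Assumption: ratio is parent / variant, so enhanced repression (lower
    Miller units) gives a value above 1 and no change gives exactly 1.
    """
    return parent_miller / variant_miller


def classify(fc, band=2.0):
    """Label a fold-change relative to an assumed symmetric 2-fold band."""
    if fc > band:
        return "enhanced"
    if fc < 1.0 / band:
        return "diminished"
    return "within 2-fold"


# (parent, variant, parent Miller units, variant Miller units),
# mean values taken from the table of starting proteins.
examples = [
    ("LLhG", "LLhG/E62K", 15.0, 0.7),
    ("LLhS", "LLhS/R51S", 6.0, 58.0),
    ("LLhT", "LLhT/V52A", 120.0, 0.5),
]

for parent, variant, p, v in examples:
    fc = fold_change(p, v)
    print(f"{variant} vs {parent}: {fc:.2f}-fold, {classify(fc)}")
```

Dividing each variant by its own parent is what allows chimeras with very different starting activities to be compared on a single axis.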
MSA frequency versus substitution outcome and results from parallel mutagenesis^a.
| | 46 | 48 | 50 | 51 | 52 | 54 | 55 | 58 | 59 | 60 | 61 | 62^b |
| Ala | P1 | P2D | D | P1D | P1D | P1 | P1 | P1 | D | |||
| Cys | – | A | A | A | A | D | P1 | P1 | – | |||
| Asp | – | D | P2 | – | – | A | P1D | |||||
| Glu | P1 | – | P1 | AD | P2 | – | ||||||
| Phe | A | A | P1D | P1D | – | D | – | P2 | A | |||
| Gly | P2 | P1D | P2 | P2 | XD | P1D | P2D | A | ||||
| His | P1 | – | P1D | – | P1 | P1 | ||||||
| Ile | X | P2D | D | P2 | P2D | P2 | A | AD | ||||
| Lys | A | P2 | P1 | P1D | X | L | ||||||
| Leu | A | AD | P1D | P1D | P1D | P2 | P1 | A | AD | D | ||
| Met | – | – | – | AD | P1 | – | AD | – | – | – | A | |
| Asn | X | X | P1 | – | P1 | P1 | – | A | P1D | |||
| Pro | AD | A | AD | A | P2D | |||||||
| Gln | P1 | X | X | P1 | X | L | ||||||
| Arg | P1D | P1D | X | P2 | P2D | P1D | D | L | ||||
| Ser | P2 | P2 | P2D | P1 | P2 | P1 | P1D | P1 | X | A | ||
| Thr | A | P1 | P1 | P1 | P2 | AD | P1D | P1 | P1D | |||
| Val | A | X | AD | P2 | P2 | AD | ||||||
| Trp | D | A | AD | – | A | |||||||
| Tyr | A | P2D | D | – | AD | – | – | AD |
a: “X” = the starting amino acid for LXhX chimeras. “D” = substitution caused widely different outcomes among several chimeras. “A” = Amino acid absent from the YPAL-MSA but allowed repression near or better than parent protein in 2 or more chimeras. “P1” = Amino acid present in the YPAL-MSA but diminished repression below the biologically determined threshold of 13 Miller units for at least one chimera. “P2” = Amino acid present in MSA but mutation diminished repression to the “no repressor” limit (“MIN” in Figure 2) for at least one chimera. “L” = Amino acid was absent in MSA; allows strong repression in LacI though not other chimeras (LacI data are commonly used as a single representative of the family.) “–” = an insufficient number of substitutions were isolated to determine general outcome.
b: In the un-mutated chimeras, position 62 differs for each regulatory domain.