
Structural properties of the linkers connecting the N- and C- terminal domains in the MocR bacterial transcriptional regulators.

Teresa Milano1, Sebastiana Angelaccio1, Angela Tramonti2, Martino Luigi Di Salvo1, Roberto Contestabile1, Stefano Pascarella1.   

Abstract

Peptide inter-domain linkers are peptide segments covalently linking two adjacent domains within a protein. Linkers play a variety of structural and functional roles in naturally occurring proteins. In this work we analyze the sequence properties of the predicted linker regions of the bacterial transcriptional regulators belonging to the recently discovered MocR subfamily of the GntR regulators. Analyses were carried out on the MocR sequences taken from the phyla Actinobacteria, Firmicutes, Alpha-, Beta- and Gammaproteobacteria. The results suggest that MocR linkers display phylum-specific characteristics and unique features different from those already described for other classes of inter-domain linkers. Their average length is significantly greater: 31.8 ± 14.3 residues, reaching a maximum of about 150 residues. Compositional propensities displayed general and phylum-specific trends. Pro dominates in all linkers. Dyad propensity analysis indicates Pro-Pro as the most frequent amino acid pair in all linkers. Physicochemical properties of the linker regions were assessed using amino acid indices relative to different features: in general, MocR linkers are flexible, hydrophilic and display propensity for β-turn or coil conformations. Linker sequences are hypervariable: sequence searches found similarities only between MocR linkers from organisms related at the level of species or genus. The results shed light on the properties of the linker regions of the new MocR subfamily of bacterial regulators and may provide knowledge-based rules for designing artificial linkers with desired properties.


Keywords:  Flexibility; Hydrophobicity; Linker length; Linker peptide; MocR regulators; Pro–Pro dyad

Year:  2016        PMID: 29450126      PMCID: PMC5801912          DOI: 10.1016/j.biopen.2016.07.002

Source DB:  PubMed          Journal:  Biochim Open        ISSN: 2214-0085


Introduction

Peptide inter-domain linkers are peptide segments covalently linking two adjacent domains within a protein [1], [2]. Linkers play a variety of structural and functional roles in naturally occurring proteins. For example, they have a role in tuning the biological activities of the connected domains [3], [4], in allosteric coupling [5], and in viral replication [6]. They are also of the utmost interest and relevance for applications in protein engineering, for example the alteration of functionality of engineered antibodies [7], [8], [9]. Often, the design of efficient and stable linkers with desired properties is hampered by the lack of adequate knowledge of their structure–function relationship. For that reason, empirical analysis of the characteristics of naturally occurring linkers may provide useful knowledge. In this work we analyzed the sequence properties of the predicted linker regions of the bacterial transcriptional regulators belonging to the MocR subfamily of GntR regulators [10]. The members of the GntR family of bacterial transcriptional regulators are characterized by the presence of two domains, at the N-terminal and at the C-terminal part of the peptide chain [10]. The N-terminal domain, 60 residues long on average, displays the winged-helix-turn-helix architecture (wHTH) and is responsible for DNA recognition and binding [11]. The C-terminal domain belongs to one of at least four structural families and is essential for oligomerization and effector binding. The two domains are connected to each other by a peptide linker. The MocR subfamily [12], [13] of the GntR regulators is characterized by a large C-terminal domain (350 residues on average) that belongs to the fold type-I pyridoxal 5'-phosphate (PLP) dependent enzymes [14]. Aspartate aminotransferase (AAT) [15] is the archetypal enzyme for this fold. The wHTH and AAT domains are linked to each other by a peptide linker that can have different lengths in different MocRs.
The solution of the first three-dimensional structure of GabR from Bacillus subtilis [16], [17] confirmed the presence of a C-terminal fold type-I domain and provided fundamental insights for further investigations aimed at deciphering the mechanism of action of these regulators. Moreover, the same structure suggested that the GabR regulator exists as a domain-swapped dimer and provided an image of the linker segment connecting the wHTH and AAT domains. Besides GabR, only a few other MocR proteins have been experimentally characterized: for example, TauR, involved in the regulation of taurine utilization genes in Rhodobacter capsulatus [18]; PdxR, involved in the regulation of PLP synthesis in several bacteria such as Corynebacterium glutamicum [19], Streptococcus pneumoniae [20], Listeria monocytogenes [21], Streptococcus mutans [22], Bacillus clausii [23]; and DdlR from Brevibacillus brevis, shown to activate the expression of the gene coding for the enzyme D-alanyl-D-alanine ligase [24]. In this work, we analyzed several structural characteristics of the MocR linkers and suggest that they display a few unique features different from those already described for other classes of peptidic inter-domain linkers.

Materials and methods

Only for ease of data processing, analyses were carried out on the MocR sequences taken from the phyla Actinobacteria, Firmicutes, Alpha-, Beta- and Gammaproteobacteria. These phyla are the most populated in the databanks. The sequences of the MocR regulators of each phylum were extracted from the UniProt databank [25], accessed in October 2015. The regulators were identified using RPSBLAST of the BLAST suite [26] and the CDD databank [27]. The protein sequences containing both the wHTH and AAT domains identified by RPSBLAST were considered genuine MocR regulators. Multiple sequence alignments were calculated with the program ClustalO [28]. Sequence alignment manipulation and display utilized the software Jalview [29]. Data bank searches utilized BLAST [26] and Hmmer [30] software. Statistical analyses relied on the R statistical package [31]. Python or Perl scripts were written for specific tasks. Physicochemical properties were assigned to the amino acid residues according to the indices provided by the AAindex databank [32] incorporated in the Interpol package [33] of the R project library [31]. Secondary structure prediction utilized the web server Jpred [34] and the program PREDATOR [35]. Sequence redundancy was eliminated with the program CD-HIT [36]. Residue and dipeptide frequencies were calculated with the programs “pepstats” or “compseq” of the EMBOSS suite [37]. Residue propensities in the linker region were calculated according to the definition:

$P_i = f_i^{L} / f_i^{B}$    (1)

where $f_i^{L}$ and $f_i^{B}$ are the frequencies of the amino acid i in the linker region and in the reference data bank (background frequencies), respectively. In this case, the reference data bank was the bacterial subset of the UniRef50 archive, namely the data bank containing the UniProt bacterial proteins clustered at 50% sequence identity. Propensities were calculated using the background frequencies of the corresponding phylum.
For example, residue propensities for the Actinobacteria linkers were calculated using the background frequencies observed in the Actinobacteria subset of UniRef50. Propensity values greater than 1.0 indicate that the corresponding residue is more frequent in the linker region than expected, whereas values smaller than 1.0 indicate the opposite. Linker dipeptide (i.e. amino acid pairs or dyads) propensities are the propensities of each of the 400 possible residue pairs to occur preferentially in the linker region [38]. In this case, i in equation (1) refers to each one of these pairs instead of single residues. Phylum-specific dyad background frequencies were used for the propensity calculation as well. Amino acid dyads are not symmetrical: for example, AlaArg is not equivalent to the pair ArgAla because of the N- to C-terminal polarity of the peptide chain. Protein structure display and analysis utilized PyMOL [39] or Chimera [40] software.
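The propensity definition above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names and the toy sequences are invented for illustration, and the background set stands in for a phylum-specific UniRef50 subset.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequences):
    """Relative frequency of each of the 20 standard residues in a sequence set."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def propensities(linker_seqs, background_seqs):
    """Ratio of linker frequency to background frequency for each residue.

    Returns None for residues absent from the background, where the
    ratio is undefined.
    """
    f_link = residue_frequencies(linker_seqs)
    f_back = residue_frequencies(background_seqs)
    return {
        aa: (f_link[aa] / f_back[aa] if f_back[aa] > 0 else None)
        for aa in AMINO_ACIDS
    }


# Toy input, invented purely for illustration (not data from the paper).
toy_linkers = ["PPAP", "PAPA"]
toy_background = ["PAGL", "AGLL"]
P = propensities(toy_linkers, toy_background)
```

In this toy case Pro is enriched in the linker set (propensity above 1.0), while Gly and Leu, present only in the background, get a propensity of 0.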

Results

MocR sequences retrieved from the UniProt data bank were filtered at 75% sequence identity to remove potentially confounding redundancy using the program CD-HIT [36]. Multiple sequence alignments of the entire MocR sequences were calculated within each phylum. Linkers were predicted and extracted from the alignments according to the following criteria: the N-terminal of the linker was the residue immediately following the C-terminal residue of the wHTH domain while its C-terminal was the residue immediately preceding the N-terminal residue of the AAT domain. Domain boundaries were assigned according to the alignment between the MocRs and the two PSSMs (Position Specific Scoring Matrix) of the CDD databank each representing the wHTH or the AAT domains (CDD codes cd07377 and cd00609 or cd01494, respectively). Domain boundaries were “projected” onto all the sequences contained in the multiple sequence alignment and the linker sequences were manually extracted using the editing function of the Jalview software. In the multiple alignment of the Firmicutes bacteria, this procedure was able to correctly identify the boundaries of the linker from the GabR regulator from Bacillus subtilis, independently assigned by the authors of the crystallographic structure [16], [17].
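The boundary rule described above (the linker starts right after the last wHTH residue and ends right before the first AAT residue) can be sketched as follows. This is a simplified illustration on an ungapped sequence with 1-based coordinates; the function name, its parameters and the toy sequence are hypothetical, and the projection of boundaries through the multiple alignment is not reproduced here.

```python
def extract_linker(sequence, whth_end, aat_start):
    """Return the segment lying between two domains.

    whth_end  : 1-based position of the last residue of the N-terminal domain
    aat_start : 1-based position of the first residue of the C-terminal domain
    """
    if not (1 <= whth_end < aat_start <= len(sequence)):
        raise ValueError("domain boundaries are inconsistent with the sequence")
    # Residues whth_end+1 .. aat_start-1 (1-based) map to this 0-based slice.
    return sequence[whth_end:aat_start - 1]


# Invented sequence and boundaries, for illustration only.
toy_seq = "MKTAYIAKQRPPEELSGGHVLA"
linker = extract_linker(toy_seq, whth_end=10, aat_start=17)
```

With adjacent domains (`aat_start == whth_end + 1`) the function returns an empty string, i.e. a linker of length zero.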

Length distribution

Table 1 reports the number of MocR sequences found in each phylum after filtering at 75% sequence identity. Linker length distributions were calculated within each of the five phyla and are displayed in Fig. 1. The frequency distributions have been calculated for intervals of 10 residues, except for the first and the last intervals, which have been set to 0–20 and 61–200 residues, respectively. The observed distributions are rather dispersed around the mean value and show phylum-specific trends (Table 2). In particular, Actinobacteria linkers are on average shorter than those from other phyla while linkers from Betaproteobacteria are longer (Table 2). Moreover, medians of the Beta- and Gammaproteobacteria distributions indicate that their peak frequencies are at higher length intervals. It should be noted that a significant number of linkers are longer than 60 residues. In a few cases, linkers can reach about 150 residues in length (see Table 1 in Ref. [41]). At this length, the definition “linker” may no longer be appropriate. Nonetheless, it will be used throughout the paper to emphasize that the peptide segments connecting the two main domains of the MocR regulators, wHTH and AAT, are the object of this study.
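The binning scheme and the summary statistics described above can be sketched as follows. The interval edges follow the text; the lengths are invented, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Interval labels and inclusive upper bounds as described in the text;
# anything longer than 60 residues falls in the last class.
BIN_LABELS = ["0-20", "21-30", "31-40", "41-50", "51-60", ">60"]
BIN_UPPER = [20, 30, 40, 50, 60]


def length_bin(length):
    """Label of the interval a linker length belongs to."""
    for label, upper in zip(BIN_LABELS, BIN_UPPER):
        if length <= upper:
            return label
    return BIN_LABELS[-1]


def length_histogram(lengths):
    """Percentage of linkers falling in each length interval."""
    counts = {label: 0 for label in BIN_LABELS}
    for n in lengths:
        counts[length_bin(n)] += 1
    return {label: 100.0 * c / len(lengths) for label, c in counts.items()}


def summary(lengths):
    """Mean, sample standard deviation and median of the lengths."""
    return (
        statistics.mean(lengths),
        statistics.stdev(lengths),
        statistics.median(lengths),
    )


# Invented lengths, for illustration only.
toy_lengths = [12, 23, 29, 35, 44, 58, 61, 150]
hist = length_histogram(toy_lengths)
mean_len, sd_len, median_len = summary(toy_lengths)
```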
Table 1

Number of MocR sequences collected in the databanks for each phylum after redundancy filtering at 75% identity.

Phylum                 MocR numerosity   No. of species
Actinobacteria         765               129
Alphaproteobacteria    618               105
Betaproteobacteria     634               76
Firmicutes             1089              178
Gammaproteobacteria    1065              180
Fig. 1

Histogram of the linker length distribution in the five phyla considered. Horizontal axis labels indicate length intervals: 20 corresponds to 0–20, 30 (21–30), 40 (31–40), 50 (41–50), 60 (51–60) and >60 (longer than 60 residues). Gray scale corresponds to different phyla as indicated in the box inserted in the plot. Percentage (%) on the vertical axis indicates the fraction of linkers in the length interval.

Table 2

Linker length distribution parameters.

Phylum                 Mean    Standard deviation   Median
Actinobacteria         28.9    14.8                 23
Alphaproteobacteria    30.7    12.5                 30
Betaproteobacteria     37.0    13.4                 39
Firmicutes             31.5    13.9                 29
Gammaproteobacteria    31.8    15.1                 32

Residue propensity

Within each of the five phyla taken into consideration, calculations have been carried out for the entire set of linkers and for length subsets. In particular, linkers were grouped in intervals of 20 residues (1–20, 21–40, 41–60, 61–200 residues) to ensure a sufficient number of residue counts in each subset. Single residue propensities were calculated first. Results suggest that propensity distributions show phylum- and linker length-specific trends, as reported in Fig. 2 and Table 3. In all linkers (Table 3), Pro shows a dominating propensity. In Actinobacteria linkers, Pro, Ala and Arg are frequent (Fig. 2). Gly displays a neutral propensity (namely 1.0) in Actinobacteria, while it tends to be avoided in the linkers of the other phyla (Table 3); in Alphaproteobacteria, Arg and Pro have strong propensities, followed by Gln and Ser with a weaker trend; Betaproteobacteria propensities are similar to those observed in the Alphaproteobacteria except for Ala, which displays a stronger propensity, and for Gln, which has a weaker propensity. Firmicutes possess a relatively “aspecific” distribution: the most represented residues are Glu, His, Asn, Pro, Gln, Ser and Trp. Asp, Lys and Arg have a weaker propensity. Interestingly, Lys appears specific to this phylum since it displays propensities lower than 1.0 in all the others. Gammaproteobacteria linkers are similar to those from Betaproteobacteria except for the higher propensities of His and Gln.
Fig. 2

Histograms displaying the residue propensity in the linker regions of the MocR regulators from each of the five phyla. Letters indicate: Actinobacteria (A), Alphaproteobacteria (B), Betaproteobacteria (C), Firmicutes (D) and Gammaproteobacteria (E). Single residue propensities are reported for each length interval, with a bar colored according to the grey code shown in the inset of graph A. The X-axis reports the residue one-letter code while the Y-axis indicates residue propensity (P). The horizontal dashed line marks the value 1.0 corresponding to the “neutral” propensity.

Table 3

Residue propensities in the entire set of linkers.

a) Amino acid one-letter code.

b) Residue propensity; cells containing values ≥1.01 and ≤1.19 or values ≥1.20 are shaded with light or dark grey, respectively. In the latter case, numbers are in boldface.

c) Number of residues in the sample.

Analysis of length-specific propensities highlights several differences (Tables 2–5 in Ref. [41] and Fig. 2) among the phyla considered. Pro is frequent in all the phyla over all the length ranges. In the 0–20 residue interval (Table 2 in Ref. [41]), propensity distributions differ from those observed in the overall sample (Table 3). In particular, Asp appears more frequent in Firmicutes linkers while Glu has a high propensity in all phyla except in Actinobacteria, where it is neutral. Gly is now favored in all phyla except Firmicutes. Lys is frequent in Firmicutes linkers while Arg is underrepresented in Firmicutes and Gammaproteobacteria. The propensities in the 21–40 residue range (Table 3 in Ref. [41]) are very similar to those of the entire set (Table 3). Asn and Trp become less frequent than expected in the Firmicutes. The propensities in the linker range 41–60 are also similar to those reported in Table 3. It should be noted that here Trp displays positive propensity in Alphaproteobacteria, Firmicutes and Gammaproteobacteria.
However, Trp is a relatively rare residue and the corresponding counts may be affected by large statistical fluctuations. The last range considered, 61–200 residues, is the least populated and shows marked differences with respect to the propensities of Table 3. Ala is strongly represented in the Firmicutes linkers. Gly is frequent in Actinobacteria, Betaproteobacteria and Firmicutes. Lys and Asn are avoided in Firmicutes linkers. In Gammaproteobacteria, Leu and Thr become frequent while Gln and Arg become relatively rare. Arg and Ser are rare in Alphaproteobacteria as well. Linker dipeptide (i.e. amino acid pairs or dyads) propensities were also calculated separately for each phylum, for all the linkers considered or grouped by length intervals. However, to obtain reliable propensities, each pair should have a sufficient number of counts. This condition is satisfied by the linkers of the five phyla belonging to the subsets “all-linkers”, containing all the considered linkers, and to the 21–40 residue length interval (refer to Table 6 in Ref. [41]). For that reason, only results from these two sets will be reported here. For completeness, all data are reported in Ref. [41] (Figs. 2–5 therein). Once more, different trends are evident among different phyla and, within each phylum, among length subsets. Overall, there is a strong preference for pairs containing Pro at the N- or C-terminal sides of linker dyads in all MocRs (Fig. 3; see also Fig. 1 in Ref. [41]). In general, the most preferred Pro-containing pairs are: ProAla, ProAsp, ProGlu, ProPro, ProGln, ProArg, ProSer, AlaPro, GluPro, GlnPro, ArgPro, SerPro, LysPro. Among these, the ProPro dyad has the strongest propensity. Other dyads are also frequent although not in every phylum.
For example: ProGly (Actinobacteria and Betaproteobacteria), ProHis (Actinobacteria, Alphaproteobacteria and Gammaproteobacteria), ProIle and ProLeu (Alphaproteobacteria, Firmicutes and Gammaproteobacteria), AspPro (Alphaproteobacteria), GlyPro (Actinobacteria, Alphaproteobacteria, Betaproteobacteria), LysPro (Alphaproteobacteria, Firmicutes, Gammaproteobacteria). Interestingly, a strong preference for dyads containing Trp can be observed (for example, TrpGly, TrpGln, TrpAsn, AsnTrp in Firmicutes). Trp, however, is the rarest residue in proteins, and sampling fluctuations may significantly influence the counts and the reliability of the derived frequencies. Firmicutes linkers are distinguished from those of the other phyla because they display higher propensity values for dyads containing Glu, His, Lys, Asn and Gln at the N- and/or C-terminal side such as, for example, GluGlu, LysHis, HisGln, GlnHis, AsnHis, AsnAsn, and the like (Fig. 3 and Fig. 1 in Ref. [41]).
Fig. 3

Heatmaps of the propensity distribution of residue pairs in the “all-linker” sets of the Actinobacteria (A), Alphaproteobacteria (B), Betaproteobacteria (C), Firmicutes (D) and Gammaproteobacteria (E). Vertical and horizontal axes of each map indicate the N-terminal and C-terminal side residues of the dyad using the one-letter code, respectively. Side bar indicates the correspondence between color scale and numerical propensities.

The propensity patterns observed in the 21–40 residue length range reflect largely those seen in the pooled sample (Fig. 3 in Ref. [41]) although their absolute values may differ in the two cases. In particular, propensities of the Trp-containing dyads become lower.
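The dyad propensities discussed in this section apply the same frequency ratio to ordered residue pairs. The sketch below is a hedged illustration, not the authors' pipeline (which relied on EMBOSS “compseq”): it counts every overlapping adjacent pair, which is an assumption, and all names and sequences are invented.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 400 ordered pairs: "PA" and "AP" are distinct dyads.
ALL_DYADS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dyad_frequencies(sequences):
    """Relative frequency of each ordered residue pair (overlapping windows)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
                counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no dyads found")
    return {d: counts[d] / total for d in ALL_DYADS}


def dyad_propensities(linker_seqs, background_seqs):
    """Linker-to-background frequency ratio per dyad; None if undefined."""
    f_link = dyad_frequencies(linker_seqs)
    f_back = dyad_frequencies(background_seqs)
    return {
        d: (f_link[d] / f_back[d] if f_back[d] > 0 else None)
        for d in ALL_DYADS
    }


# Toy sequences, invented for illustration only.
D = dyad_propensities(["PPA"], ["PPAP", "APPA"])
```

Dyads with few background counts yield unstable ratios, which is the sampling concern raised in the text for Trp-containing pairs.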

Physicochemical characteristics

Physicochemical properties of the linker regions were assessed using the AAindex database [42]. This databank contains 544 indices describing many physicochemical characteristics of each of the 20 amino acid residues. Indices were selected so as to cover different properties of the peptide chain such as hydrophobicity, flexibility, linker and conformation propensity (Table 4). Moreover, the selected indices map to different groups defined by the cluster analysis based on the correlation coefficient of the AAindex pairs [32], [43]. This ensures that the indices in the AAindex set used here display low cross-correlations. Indices were assigned to each residue of a linker sequence and the average value over the linker length was calculated. The distributions of the index average values for each linker were compared in the form of box plots (Fig. 4). Within each phylum and for each index, the distributions of the average values were contrasted with those calculated in the same way for the wHTH and AAT domains from which the linkers were extracted. As for the propensity analyses, index average distributions were also calculated for the different linker length intervals.
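The per-linker averaging step can be sketched in a few lines. The scale below is a placeholder with invented values; in practice the values would be taken from the AAindex entries listed in Table 4, and the function name is hypothetical.

```python
def average_index(sequence, index):
    """Mean value of a per-residue index over a sequence.

    Residues missing from the index (e.g. 'X') are skipped.
    """
    values = [index[aa] for aa in sequence.upper() if aa in index]
    if not values:
        raise ValueError("no residue of the sequence is covered by the index")
    return sum(values) / len(values)


# Placeholder scale, invented for illustration (not real AAindex values).
toy_index = {"P": 1.0, "A": 0.5, "G": 0.75, "L": 0.25}
avg = average_index("PPAG", toy_index)
```

Applying the same function to each linker and to the flanking wHTH and AAT segments would give the per-sequence averages whose distributions are compared in the box plots.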
Table 4

AAindex properties utilized in the linker analysis.

Property                                              AAindex code   Interpol code   Reference
Normalized flexibility parameters (B values) average  VINM940101     425             [60]
Normalized average hydrophobicity scales              CIDH920105     58              [61]
Linker propensity from all dataset                    GEOR030104     91              [2]
Normalized frequency of β-turn                        CHOP780101     37              [62]
Normalized frequency of α-helix                       CHOP780102     38              [62]
Normalized frequency of β-sheet                       CHOP780103     39              [62]
The Chou-Fasman parameter of coil conformation        CHAM830101     24              [63]
Fig. 4

Box-plots of the distribution of average AAindices in the Actinobacteria (upper plot of each panel) and Firmicutes (lower plot) phyla. Horizontal axis indicates the average flexibility distribution (A), average hydrophobicity (B), average linker (C), average coil (D), and average β-turn (E) propensities in the wHTH, AAT domains, in all linkers, and in linkers belonging to different length intervals: 0–20, 21–40, 41–60 and > 60 residues. Y-axis reports the corresponding index scale (label AI stands for Average Index).

The linker backbone appears significantly more flexible than the AAT and wHTH domains in all phyla and length intervals considered (in the Actinobacteria, a t-test of the null hypothesis of no difference between the mean flexibility of the entire linker set and that of the wHTH and AAT domains gives a p-value < $10^{-16}$; Actinobacteria and Firmicutes box plots are shown in Fig. 4A while the complete set is reported in Fig. 6 in Ref. [41]). As a general trend, shorter linkers, less than 20 residues long, are more flexible, on average, than the longer ones (Fig. 4A). Likewise, MocR linkers are more hydrophilic (p-value for all-linkers in Actinobacteria < $10^{-16}$), on average, than the AAT or the wHTH domains (Fig. 4B and Fig. 7 in Ref. [41]). Also in this case, shorter linkers appear to possess a stronger hydrophilic character than the longer ones. The distributions of the linker residue propensity derived by George and Heringa [2] on their linker set (here referred to as GH-linkers) were also calculated. These distributions assess the similarity between the composition of the MocR linker sequences and that observed in the GH-linkers (Fig. 4C and Fig. 8 in Ref. [41]). In other words, MocR linker average propensities greater than 1.0 would suggest that they contain residues observed frequently in the GH-linkers, and the opposite for propensities lower than 1.0.
Results confirm that the MocR linkers shorter than 40 residues possess many of the compositional characteristics observed in the GH-linker set although, interestingly, Firmicutes display different trends (Fig. 4C). Shorter Firmicutes linkers indeed appear to possess residues that show only weak GH-linker-like propensities. Conformational propensity was also assessed using indices related to secondary structure frequency. In general, MocR linkers avoid α-helix and β-strand (Figs. 10 and 11 in Ref. [41]) while displaying a strong preference for β-turn or coil conformations (Fig. 4D, and Figs. 9 and 12 in Ref. [41]). To further characterize the linker conformational propensity, secondary structure predictions were carried out. The program PREDATOR was chosen for its ease of use and the possibility of local installation. Results (Table 7 in Ref. [41]) confirmed that about 80% of linker residues are predicted to be in coil conformation, irrespective of linker length.
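The comparison of mean index values between linkers and domains can be illustrated with a two-sample t statistic. The text does not state which variant of the t-test was used, so the Welch (unequal variance) form below is an assumption; the function name and the sample values are invented, and converting the statistic to a p-value would additionally require the Student t distribution (available, for instance, in R).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Both samples need at least two values and a non-zero pooled standard error.
    """
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # sample variance
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb             # squared standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Invented per-sequence average index values, for illustration only.
toy_domain_avgs = [1.0, 2.0, 3.0]
toy_linker_avgs = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(toy_domain_avgs, toy_linker_avgs)
```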

Sequence similarity

Linker sequences were used as queries in a BLAST search against the entire RefSeq protein databank to verify the presence of significant similarity to any other protein segments. Results suggest that linker sequences are hypervariable: indeed, we detected similarities only between MocR linkers from organisms related at the level of species or genus. Only in a few cases were significant similarities found between MocR linkers from different bacterial phyla (an example is reported in Fig. 5).
Fig. 5

Sequence alignment between the linkers found in the MocR regulators from Commensalibacter sp MX01 (Alphaproteobacteria, UniProt: W7E7N5) and Carnobacterium divergens DSM 20623 (Firmicutes, UniProt: A0A0R2I605). Color indicates amino acids that are identical or have similar physicochemical properties.


Structure–function relationship

To test whether linker length could be correlated to MocR function, GabR and PdxR regulator sequences were collected from the RegPrecise databank (Table 8 in Ref. [41]) irrespective of the phylum of the source organism. GabR and PdxR were chosen since they are so far the best characterized MocR regulators. Sequences of the two regulator subfamilies were aligned separately and a Hidden Markov Model profile [44] was calculated for each of them. Each profile was utilized to search for other putative GabR or PdxR sequences in the reference proteomes data bank available at the Hmmer web server [30]. Data bank sequences to which Hmmer assigned an E-value smaller than $10^{-120}$ were retrieved and multiply aligned. The threshold was chosen so as to increase the probability of collecting a sufficiently large number of true orthologs while minimizing paralogs. Linker sequences were extracted as described in the Materials and methods section. Linker length distributions in GabR and PdxR were compared (see Fig. 13 in Ref. [41]). Results suggest that the length distribution is rather dispersed around the mean, especially in the GabR sample (see Fig. 13 in Ref. [41]), so it is not easy to determine any simple correlation between linker length and regulator function.

Discussion

Inter-domain linkers are attracting much interest because of the functional roles, not yet fully understood, they play in multi-domain proteins and for their potential biotechnological and biomedical applications [45], [46]. For the same reasons, methods for automatic recognition of linker regions in protein structures have also been described in the literature [47], [48], [49]. Scrutiny of new linker systems may provide useful information for the understanding of their structural and functional properties and may help build a set of knowledge-based rules for de novo design of artificial linkers with novel characteristics. In this work we report on the analysis of the predicted linkers connecting the wHTH and the AAT domains that constitute the three-dimensional architecture of the recently discovered bacterial family of MocR regulators. The members of this family share the same two-domain architecture although they are markedly heterogeneous in terms of sequence similarity and linker structural characteristics [50]. Linkers connecting other multi-domain proteins have already been characterized, for example, the bacterial Q-linkers [51] or GH-linkers [2]. The MocR linkers share many features with those linkers but also appear to possess a few unique characteristics that, in some cases, are phylum specific. For example, they show an average length significantly greater than the other linkers: 31.8 ± 14.3 residues, whereas Q-linkers are between 15 and 25 residues in length and the average GH-linker length is 10.0 ± 5.8 residues. Several MocRs of our sample have linkers predicted with a length greater than 60 residues (Table 1 in Ref. [41]), reaching in a few cases a length of about 150 residues. In these cases linkers may represent entire domains rather than a simple peptide stretch connecting two functional domains, although they are still predicted to be mostly in extended conformation with only a fraction (about 25%) of putative α-helices (see Table 7 in Ref. [41]).
Linker length and composition are parameters influencing the functional properties of the linker itself. For example, in the case of the OmpR regulator (controlling the expression of the porin genes ompF and ompC in E. coli) it has been experimentally demonstrated that linker length and composition influence its function [52]. Other examples are reported in the literature (for a review see Ref. [53]). Therefore, the striking heterogeneity of MocR linker lengths may reflect their functional diversification, the variety of the controlled genes [54], and the different mechanisms of DNA recognition and/or ability to interact with other regulatory factors. Interestingly, comparative studies of the predicted MocR binding sites pointed out the lack of any conserved palindromic sequence shared by the whole MocR subfamily [54]. MocR linker residue composition displays both unique and common features compared to the Q-linkers and the GH-linkers. Q-linkers are characterized by the residues Gln, Arg, Glu, Ser and Pro [51] while GH-linkers are characterized by Pro, Arg, Phe, Thr, Glu and Gln [2]. A high frequency of Pro is a common mark of all the different linker types. In the MocR linkers, Gln has a strong propensity in the Firmicutes and Gammaproteobacteria, and a weaker one in Alphaproteobacteria linkers. Arg is instead preferred in all phyla except in Firmicutes, where it displays a lower propensity. Notably, Glu is frequent only in the Firmicutes MocR linkers. Interestingly, Lys is represented mainly in Firmicutes linkers, though with a marginal propensity, while it is avoided in all the other phyla. Phe and Thr are not significantly represented in the MocR linkers. Ser has a significant propensity everywhere, although weaker in Actinobacteria. Ala occurs mainly in Actinobacteria and Betaproteobacteria.
Within MocR linkers, those from Firmicutes are characterized by the occurrence of Glu, Lys, His and Asn, not observed in the remaining phyla considered except for His, which also occurs in Gammaproteobacteria. In general, the presence of Arg or Lys suggests that linkers connecting domains of transcriptional regulators may interact with the DNA and play an active role in the regulation and/or recognition mechanism. The conformational role of Pro in the linkers has been clearly discussed by George and Heringa [2]: Pro is an imino acid with no hydrogen-bonding donation potential. For that reason, it generally destabilizes the regular secondary structures α-helix or β-strand and prevents contacts with the neighboring domains. It is therefore well suited for a polypeptide stretch whose function is connecting two domains and permitting their relative movements upon effector binding. Residue dyad analysis shows that the ProPro dipeptide is highly represented in linkers from all phyla. The ProPro dyad is also the most represented in the non-helical GH-linkers. The helical GH-linkers, on the contrary, do not show any strong preference for dyads containing Pro either at the N- or C-terminal side. In our linkers, dyads containing Pro at the N- or C-terminal side are also frequent; often, charged or polar residues are associated with the Pro residue within the dyad. Differences can be seen in the different phyla; for example, Firmicutes linkers display a somewhat more dispersed distribution of dyad propensities (Fig. 3D). In the non-helical GH-linkers the dyad TrpTrp is also very frequent. Interestingly, this dyad is instead rare in the linkers from all the phyla although, in some cases, dyads containing Trp display high propensities; for example, AspTrp in Alphaproteobacteria and Firmicutes or AsnTrp in Firmicutes. The high frequency of ProPro sequences supports the notion that the MocR linkers possess an extended conformation [55].
This is corroborated by the analysis of the amino acid properties, which indicates a tendency toward a hydrophilic character and a flexible, extended conformation. These conclusions are also consistent with the properties observed in the linker of GabR from the Firmicutes species Bacillus subtilis, the only MocR regulator whose three-dimensional structure has been solved. This linker is 29 residues long and is in an extended, mostly solvent-exposed, conformation. It contains, among others, 4 Glu, 5 Asp, 3 Pro, one Lys and one Gln residue. No Arg residue is observed. There is also a single ProPro dyad. Two of the Asp residues and one Glu residue are involved in H-bonds to Arg residues from the other subunit (Table 5). The remaining Asp and Glu residues, along with the only Lys, are exposed, suggesting a possible interaction with other factors or even with DNA upon conformational rearrangement of the MocR quaternary structure. The linker residues display B-factors higher than those in the rest of the dimer (Fig. 6). B-factor magnitude indeed reflects the amount of displacement of an atom around its average position and may indicate the degree of local flexibility [56], [57], [58]. Flexibility and rigidity are critical parameters for linker mechanical properties and strongly affect, for example, the function of fusion proteins [59]. Current models describing the possible mechanism of action of the MocR regulators predict that linkers allow movement of the wHTH domains to recognize the transcription factor binding sites on the DNA molecule [16], [17], [23].
Table 5

Residues of GabR linker from Bacillus subtilis (chain D of the PDB structure 4N0B).

| Residue | Solvent exposure | Interactions |
|---|---|---|
| Glu81 | Exposed | |
| Leu82 | Buried | |
| Asp83 | Exposed | |
| Met84 | Exposed | |
| Phe85 | Partly exposed | |
| Ser86 | Exposed | |
| Ala87 | Exposed | |
| Glu88 | Exposed | |
| Glu89 | Exposed | |
| His90 | Exposed | |
| Pro91 | Partly exposed | |
| Pro92 | Exposed | |
| Phe93 | Exposed | |
| Ala94 | Partly exposed | |
| Leu95 | Exposed | |
| Pro96 | Exposed | |
| Asp97 | Exposed | Salt bridge to Arg331 of the other subunit |
| Asp98 | Buried | Salt bridge to Arg331 of the other subunit |
| Leu99 | Exposed | |
| Lys100 | Exposed | |
| Glu101 | Buried | Salt bridge to Arg155 of the other subunit |
| Ile102 | Exposed | |
| His103 | Exposed | |
| Ile104 | Exposed | |
| Asp105 | Exposed | |
| Gln106 | Partly exposed | H-bond to Arg140 of the other subunit; H-bond to Arg451 of the same subunit |
| Ser107 | Exposed | |
| Asp108 | Exposed | |
| Trp109 | Partly exposed | |
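
The composition quoted in the text for the GabR linker can be checked directly against the residues listed in Table 5. The short Python sketch below does this; the one-letter string is a transcription of the table (Glu81 to Trp109), and the variable names are illustrative only.

```python
from collections import Counter

# One-letter sequence transcribed from Table 5 (Glu81 to Trp109).
GABR_LINKER = "ELDMFSAEEHPPFALPDDLKEIHIDQSDW"

# Count each residue type in the linker.
composition = Counter(GABR_LINKER)

# Count overlapping ProPro dyads along the sequence.
pro_pro_dyads = sum(
    GABR_LINKER[i:i + 2] == "PP" for i in range(len(GABR_LINKER) - 1)
)

print("length:", len(GABR_LINKER))
for residue in "EDPKQR":
    print(residue, composition[residue])
print("ProPro dyads:", pro_pro_dyads)
```

The counts agree with the description in the text: 29 residues, 4 Glu, 5 Asp, 3 Pro, one Lys, one Gln, no Arg, and a single ProPro dyad (Pro91-Pro92).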
Fig. 6

Stereo picture of the trace of the dimer of GabR from Bacillus subtilis (chains C and D in the entry PDB: 4N0B). Side chains are displayed only for residues belonging to the linker region. Atoms of the structure have been colored according to the magnitude of the B-factor; scale ranges from blue (low) to red (high). White and red atoms have the highest B-factors and are the most mobile. Residues mentioned in Table 5 involved in interactions are outlined with green lines and labeled.

This work was meant to study the overall structural properties of the inter-domain peptide linkers in the MocR bacterial regulators. Ideally, a possible correlation between linker properties and parent MocR function should be explored to provide insights into the structure–function relationship in these bacterial regulators. The lack of a sufficiently large base of experimental functional characterizations hampers systematic and exhaustive analyses. An attempt to correlate one structural characteristic, namely linker length, with function suggests that a straightforward relation may not be obvious, at least in the GabR and PdxR subfamilies.

Conclusions

This work sheds light on the properties of the linker regions of the relatively new MocR family of bacterial regulators. The reported results suggest that the MocR linkers may be regarded as a novel linker group. Linkers were grouped according to the phylum of the source organisms for ease of analysis, without any a priori assumption. The results show that there are statistical trends characteristic of individual phyla, in particular for linkers extracted from the Firmicutes MocRs. Ideally, a possible correlation between linker properties and parent MocR function should be tested as more experimental and functional data accumulate. Even within these limits, the observations reported here are useful for designing experiments aimed at understanding the role of the linkers within the MocR regulators. The set of linker sequences extracted from the regulators is also useful as a reference library to support the knowledge-based design of novel linkers with desired properties. The entire set of linker sequences will be made available by the authors at the site https://sites.google.com/a/uniroma1.it/pascarella_lab/ and detailed data are reported in Ref. [41].

Authors' contribution

This work was carried out in collaboration among all authors. Authors TM and SP designed the study, carried out the computer programming work and wrote the draft of the manuscript. Authors SA and AT carried out the databank searches and data collection and contributed to writing the final version of the manuscript. Authors MLDS and RC read, revised and approved the manuscript and contributed to writing the discussion section.

Conflict of interest statement

The authors declare no conflicts of interest pertaining to this work.
  59 in total

Review 1.  The manifold of vitamin B6 dependent enzymes.

Authors:  G Schneider; H Käck; Y Lindqvist
Journal:  Structure       Date:  2000-01-15       Impact factor: 5.006

2.  A study on the effects of linker flexibility on acid phosphatase PhoC-GFP fusion protein using a novel linker library.

Authors:  Ziliang Huang; Gang Li; Chong Zhang; Xin-Hui Xing
Journal:  Enzyme Microb Technol       Date:  2015-11-14       Impact factor: 3.493

Review 3.  Chapter 1: Variation in form and function the helix-turn-helix regulators of the GntR superfamily.

Authors:  Paul A Hoskisson; Sébastien Rigali
Journal:  Adv Appl Microbiol       Date:  2009       Impact factor: 5.086

4.  Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors:  Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal:  Bioinformatics       Date:  2009-01-16       Impact factor: 6.937

5.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Authors:  Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal:  Mol Syst Biol       Date:  2011-10-11       Impact factor: 11.429

6.  HMMER web server: 2015 update.

Authors:  Robert D Finn; Jody Clements; William Arndt; Benjamin L Miller; Travis J Wheeler; Fabian Schreiber; Alex Bateman; Sean R Eddy
Journal:  Nucleic Acids Res       Date:  2015-05-05       Impact factor: 16.971

7.  RefSeq microbial genomes database: new representation and annotation strategy.

Authors:  Tatiana Tatusova; Stacy Ciufo; Boris Fedorov; Kathleen O'Neill; Igor Tolstoy
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

8.  The extracytoplasmic linker peptide of the sensor protein SaeS tunes the kinase activity required for staphylococcal virulence in response to host signals.

Authors:  Qian Liu; Hoonsik Cho; Won-Sik Yeo; Taeok Bae
Journal:  PLoS Pathog       Date:  2015-04-07       Impact factor: 6.823

9.  Design and characterization of structured protein linkers with differing flexibilities.

Authors:  Joshua S Klein; Siduo Jiang; Rachel P Galimidi; Jennifer R Keeffe; Pamela J Bjorkman
Journal:  Protein Eng Des Sel       Date:  2014-10       Impact factor: 1.650

10.  SH2-catalytic domain linker heterogeneity influences allosteric coupling across the SFK family.

Authors:  A C Register; Stephen E Leonard; Dustin J Maly
Journal:  Biochemistry       Date:  2014-10-29       Impact factor: 3.162
