
Data from computational analysis of the peptide linkers in the MocR bacterial transcriptional regulators.

Sebastiana Angelaccio, Teresa Milano, Angela Tramonti, Martino Luigi Di Salvo, Roberto Contestabile, Stefano Pascarella.

Abstract

Detailed data from statistical analyses of the structural properties of the inter-domain linker peptides of the bacterial regulators of the MocR family are reported here. MocR regulators are a recently discovered subfamily of bacterial regulators possessing an N-terminal domain, 60 residues long on average, folded in the winged-helix-turn-helix architecture responsible for DNA recognition and binding, and a large C-terminal domain (350 residues on average) that belongs to the fold type-I pyridoxal 5'-phosphate (PLP) dependent enzymes, such as aspartate aminotransferase. Data show the distribution of several structural characteristics of the linkers taken from bacterial species of five different phyla, namely Actinobacteria, Alpha-, Beta-, Gammaproteobacteria and Firmicutes. For interpretation and discussion of the reported data, refer to the article "Structural properties of the linkers connecting the N- and C- terminal domains in the MocR bacterial transcriptional regulators" (T. Milano, S. Angelaccio, A. Tramonti, M. L. Di Salvo, R. Contestabile, S. Pascarella, 2016) [1].


Keywords:  Dyad propensity; Flexibility; GabR; Hydrophobicity; Linker engineering; Linker length; Linker peptide; MocR regulators; PdxR; Residue propensity

Year:  2016        PMID: 27668276      PMCID: PMC5026710          DOI: 10.1016/j.dib.2016.08.064

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Data represent the description of the structural properties of the peptide linkers connecting the N- and C-terminal domains in the MocR bacterial regulators.

Data provide researchers with a framework to select specific MocR regulators for experimental characterization.

Data support the design of experiments for investigating the properties of specific MocR regulators: for example, site-directed mutagenesis, deletions or insertions in linker regions.

Data can help in the interpretation of experimental data obtained from MocR studies.

Data provide a framework to derive rules for the de novo design of peptide linkers with desired properties.

Data

Results derived from computational analysis of the inter-domain sequences of the peptide linker connecting the N-terminal and the C-terminal domain of the bacterial transcriptional regulators of the subfamily MocR are herein reported. Data are shown as tables describing linker statistics such as residue and dyad composition propensities, predicted secondary structure frequency, and box-plots showing the distribution of several structural properties. Moreover, plots of length distributions of linkers from two specific MocR subgroups, namely PdxR and GabR, are also reported.

Experimental design, materials and methods

Data were generated from the analysis of MocR sequences taken from the most populated phyla: Actinobacteria, Firmicutes, Alpha-, Beta- and Gammaproteobacteria. Sequences of the MocR regulators in each phylum were retrieved from the UniProt data bank [2], accessed in October 2015, using RPSBLAST of the BLAST suite [3] and the CDD data bank [4]. Protein sequences in which RPSBLAST identified both the wHTH and the AAT domain were considered genuine MocR regulators. Before further processing, retrieved sequences were filtered at 75% sequence identity with the program CD-HIT [5]. Multiple sequence alignments were calculated with the program ClustalO [6] and processed with the software Jalview [7]. Linker sequences were manually extracted from the multiple sequence alignments according to the wHTH and AAT domain boundaries assigned by RPSBLAST. The MocR regulators possessing linkers of 60 residues or longer are listed in Table 1. Residue frequencies and propensities were calculated as described in [1] and are displayed in Table 2, Table 3, Table 4 and Table 5, organized according to linker length and phylum. Propensities for the entire linker set are reported in [1]. Dipeptide frequency and propensity calculations relied on the software ‘compseq’ of the EMBOSS suite [8]. Table 6 reports the average number of residue dyads in each group. The higher the number, the higher the reliability of the dyad propensities reported in Fig. 1, Fig. 2, Fig. 3, Fig. 4 and Fig. 5. The average content of predicted secondary structures (obtained with the program PREDATOR [9]) is displayed in Table 7. Physicochemical properties were assigned to the amino acid residues according to the indices provided by the AAindex data bank [10], incorporated in the Interpol package [11] of the R-project library [12]. Distributions of the properties are reported as box-plots in Fig. 6, Fig. 7, Fig. 8, Fig. 9 and Fig. 10, limited to the phyla Alphaproteobacteria, Betaproteobacteria and Gammaproteobacteria, and in Fig. 11 and Fig. 12 for all the phyla considered. The box-plots for Actinobacteria and Firmicutes missing from Fig. 6, Fig. 7, Fig. 8, Fig. 9 and Fig. 10 can be found in [1].
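The propensity calculation itself is defined in [1] and is not reproduced in this article. Purely as an illustration, the sketch below assumes a commonly used definition: the relative frequency of a residue in the linker set divided by its relative frequency in a reference set. The function name `residue_propensity` and the toy sequences are hypothetical and are not taken from the data set.

```python
from collections import Counter

def residue_propensity(linkers, reference):
    """Return {residue: propensity}.

    Assumption (not confirmed by the source): propensity is the residue
    frequency in the linkers divided by its frequency in the reference set.
    """
    link_counts = Counter("".join(linkers))
    ref_counts = Counter("".join(reference))
    link_total = sum(link_counts.values())
    ref_total = sum(ref_counts.values())
    result = {}
    for aa, n_ref in ref_counts.items():
        f_ref = n_ref / ref_total
        f_link = link_counts.get(aa, 0) / link_total
        result[aa] = f_link / f_ref
    return result

# Toy sequences invented for illustration only.
toy_linkers = ["PPGA", "PAGA"]
toy_reference = ["PAGAPAGA", "AAGGLLPP"]
propensities = residue_propensity(toy_linkers, toy_reference)
```

Under this assumed definition, values above 1 indicate residues enriched in the linkers relative to the reference, which matches the way Tables 2 to 5 shade values of 1.01 and above.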
Table 1

List of MocR regulators predicted to have linkers of length equal to or greater than 60 residues.

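| UniProt code | Start (a) | End (b) | Length |
|---|---|---|---|
| A0A023C4T7_9PSED | 88 | 148 | 60 |
| A0A0B2AVS1_9ACTN | 85 | 145 | 60 |
| A0NP21_LABAI | 80 | 140 | 60 |
| I9W6R0_9RALS | 87 | 147 | 60 |
| W4CMK3_9BACL | 121 | 181 | 60 |
| A0A074LC92_PAEPO | 82 | 143 | 61 |
| I4N7I5_9PSED | 87 | 148 | 61 |
| A0A0D5NE20_9BACL | 85 | 147 | 62 |
| F8FPR4_PAEMK | 106 | 168 | 62 |
| G8QJ34_DECSP | 85 | 147 | 62 |
| V7DIJ8_9PSED | 88 | 150 | 62 |
| W4P2V0_9BURK | 87 | 149 | 62 |
| B9QZW6_LABAD | 80 | 143 | 63 |
| F3KUT6_9BURK | 118 | 181 | 63 |
| F7T5G0_9BURK | 85 | 148 | 63 |
| M2X958_9MICC | 80 | 143 | 63 |
| R9LS02_9BACL | 83 | 146 | 63 |
| S2WJB8_DELAC | 89 | 152 | 63 |
| A0A098SWK7_9PSED | 88 | 152 | 64 |
| A0A0J6J2M6_9PSED | 88 | 152 | 64 |
| A0A0F4KHT0_9ACTN | 101 | 166 | 65 |
| D5BN74_PUNMI | 82 | 147 | 65 |
| D7DQ74_METV0 | 90 | 156 | 66 |
| K0YXF4_9ACTN | 79 | 145 | 66 |
| A0A077LFC1_9PSED | 87 | 154 | 67 |
| A0A095YU49_9FIRM | 78 | 145 | 67 |
| H0BWG7_9BURK | 75 | 142 | 67 |
| A0A087DUC1_9BIFI | 78 | 146 | 68 |
| A0A090ZGE9_PAEMA | 83 | 152 | 69 |
| A0A0A6Q9N6_9BURK | 74 | 143 | 69 |
| F3JEN8_PSESX | 88 | 157 | 69 |
| W0HH53_PSECI | 88 | 157 | 69 |
| A0A0A6QBJ9_9BURK | 89 | 159 | 70 |
| A0A0B4DLS5_9MICC | 89 | 159 | 70 |
| A0A088Y9M0_BURPE | 88 | 159 | 71 |
| A0A0F4JB47_9ACTN | 62 | 135 | 73 |
| A0A069DE36_9BACL | 85 | 159 | 74 |
| A0A087EGV8_9BIFI | 105 | 181 | 76 |
| A0A089I7M0_9BACL | 82 | 158 | 76 |
| A8SVX0_9FIRM | 79 | 155 | 76 |
| A0A089N895_9BACL | 78 | 155 | 77 |
| A0A0F5JX35_9BURK | 84 | 161 | 77 |
| A0A0E4CZM5_9BACL | 90 | 168 | 78 |
| A0A061LXN0_9MICO | 84 | 163 | 79 |
| A0A0A8BLT7_9BURK | 89 | 168 | 79 |
| R6HHE8_9ACTN | 79 | 159 | 80 |
| X4ZGS7_9BACL | 84 | 164 | 80 |
| A0A089HPN9_PAEDU | 78 | 162 | 84 |
| D2PX75_KRIFD | 93 | 178 | 85 |
| D3F8U9_CONWI | 80 | 166 | 86 |
| A0A0A4HID4_9PSED | 88 | 179 | 91 |
| F2RK57_STRVP | 86 | 180 | 94 |
| C7MPD0_CRYCD | 79 | 174 | 95 |
| F4QXL0_BREDI | 83 | 179 | 96 |
| A0A087AB73_9BIFI | 78 | 175 | 97 |
| A0A087E7D4_9BIFI | 78 | 175 | 97 |
| A0A0B4DPH0_KOCRH | 85 | 183 | 98 |
| F2RA50_STRVP | 88 | 186 | 98 |
| V6KRX5_STRRC | 93 | 191 | 98 |
| M8D4I1_9BACL | 79 | 179 | 100 |
| A0A0A3JRX6_BURPE | 88 | 189 | 101 |
| M8DED6_9BACL | 80 | 183 | 103 |
| A0A087BLK1_BIFLN | 78 | 187 | 109 |
| A0A087CXD8_9BIFI | 78 | 187 | 109 |
| S6CDU1_9ACTN | 130 | 244 | 114 |
| A0A0A6SYE7_9BURK | 87 | 209 | 122 |
| F5LR05_9BACL | 82 | 209 | 127 |
| A0A089IZ38_PAEDU | 84 | 218 | 134 |
| A0A0B6S8F7_BURGL | 88 | 231 | 143 |
| A0A087A119_9BIFI | 78 | 222 | 144 |
| A0A089MC10_9BACL | 82 | 234 | 152 |
| A0A089KZI8_9BACL | 82 | 244 | 162 |

(a) Linker N-terminal sequence position.

(b) Linker C-terminal sequence position.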
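In every row of Table 1 the reported length equals the end position minus the start position. This relation is inferred from the tabulated values rather than stated by the authors. The short sketch below checks it on a handful of rows copied from the table; the helper name `inconsistent_rows` is hypothetical.

```python
# A few rows copied from Table 1: (UniProt code, start, end, length).
rows = [
    ("A0A023C4T7_9PSED", 88, 148, 60),
    ("W4CMK3_9BACL", 121, 181, 60),
    ("D7DQ74_METV0", 90, 156, 66),
    ("S6CDU1_9ACTN", 130, 244, 114),
    ("A0A089KZI8_9BACL", 82, 244, 162),
]

def inconsistent_rows(table):
    """Return the codes of rows where length differs from end - start."""
    return [code for code, start, end, length in table if end - start != length]

bad = inconsistent_rows(rows)

# Entry with the longest linker among the sampled rows.
longest = max(rows, key=lambda row: row[3])[0]
```

A check of this kind is also useful when re-splitting the flattened table text, because it resolves where one number ends and the next begins.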

Table 2

Residue propensities in the linkers of length range 0–20.

(a) Amino acid one-letter code.

(b) Residue propensity; cells containing values ≥1.01 and ≤1.19 and values ≥1.20 are shaded light and dark grey, respectively. In the latter case, numbers are in boldface.

(c) Number of residues in the sample.

Table 3

Residue propensities in the linkers of length range 21–40.

(a) Amino acid one-letter code.

(b) Residue propensity; cells containing values ≥1.01 and ≤1.19 and values ≥1.20 are shaded light and dark grey, respectively. In the latter case, numbers are in boldface.

(c) Number of residues in the sample.

Table 4

Residue propensities in the linkers of length range 41–60.

(a) Amino acid one-letter code.

(b) Residue propensity; cells containing values ≥1.01 and ≤1.19 and values ≥1.20 are shaded light and dark grey, respectively. In the latter case, numbers are in boldface.

(c) Number of residues in the sample.

Table 5

Residue propensities in the linkers of length range 61–200.

(a) Amino acid one-letter code.

(b) Residue propensity; cells containing values ≥1.01 and ≤1.19 and values ≥1.20 are shaded light and dark grey, respectively. In the latter case, numbers are in boldface.

(c) Number of residues in the sample.

Table 6

Average number of residue pairs in each data set.

Length intervals

| Phylum | All | 0–20 | 21–40 | 41–60 | 61–200 |
|---|---|---|---|---|---|
| Actinobacteria | 53.5±93.1 | 9.2±17.6 | 29.2±53.8 | 10.0±16.4 | 5.0±8.5 |
| Alphaproteobacteria | 45.7±56.7 | 6.0±9.0 | 20.3±28.5 | 18.9±22.4 | 0.5±0.8 |
| Betaproteobacteria | 57.1±78.2 | 3.2±5.1 | 25.5±35.1 | 25.1±34.8 | 3.0±5.8 |
| Firmicutes | 83.0±63.5 | 6.4±6.8 | 39.9±34.8 | 32.4±25.0 | 4.4±6.4 |
| Gammaproteobacteria | 82.0±81.9 | 8.7±9.4 | 50.8±54.1 | 20.9±20.6 | 1.5±3.5 |

Fig. 1

Dipeptide propensity for the entire set of linkers. Vertical and horizontal sides of each matrix indicate the N- and C-side residue of each dyad, respectively. Cells containing propensity values ≥1.1 and ≤1.99 or ≥2.0 and ≤3.99 or ≥4.0 are shaded with very light, light or dark grey respectively and numbers therein contained are boldfaced. A, B, C, D and E denote propensities for Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Firmicutes and Gammaproteobacteria, respectively.
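The dyad counts behind these matrices were obtained by the authors with ‘compseq’ from EMBOSS. The sketch below is not that program; it is a plain Python stand-in that counts overlapping residue pairs and maps a propensity value to the grey levels listed in the legend above. It assumes propensities are rounded to two decimals, as the legend thresholds suggest. The function names and toy sequences are hypothetical.

```python
from collections import Counter

def count_dyads(sequences):
    """Count overlapping residue pairs; the N-side residue comes first."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    return counts

def shade(propensity):
    """Map a dyad propensity to the grey level described in the Fig. 1 legend."""
    value = round(propensity, 2)  # assumption: two-decimal values
    if value >= 4.0:
        return "dark"
    if value >= 2.0:
        return "light"
    if value >= 1.1:
        return "very light"
    return "none"

# Toy sequences invented for illustration only.
toy = ["PPGP", "GPP"]
dyads = count_dyads(toy)
```

Because a sequence of n residues yields only n − 1 dyads spread over 400 possible pairs, small groups give sparse counts, which is why Table 6 is offered as a guide to the reliability of the propensities.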

Fig. 2

Dipeptide propensity for the 0–20 residue length linker set. Interpretation of figure refers to legend to Fig. 1.

Fig. 3

Dipeptide propensity for the 21–40 residue length linker set. Interpretation of figure refers to legend to Fig. 1.

Fig. 4

Dipeptide propensity for the 41–60 residue length linker set. Interpretation of figure refers to legend to Fig. 1.

Fig. 5

Dipeptide propensity for the 61–200 residue length linker set. Interpretation of figure refers to legend to Fig. 1.

Table 7

Fraction of predicted secondary structure in linker regions.

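Secondary structure

| Phylum | α-helix | β-strand | coil |
|---|---|---|---|
| Actinobacteria | 0.14 | 0.02 | 0.86 |
| Alphaproteobacteria | 0.19 | 0.03 | 0.78 |
| Betaproteobacteria | 0.30 | 0.01 | 0.69 |
| Firmicutes | 0.02 | 0.06 | 0.92 |
| Gammaproteobacteria | 0.26 | 0.02 | 0.72 |
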
Fig. 6

Box-plots of the distribution of the average linker flexibility (index #425 of Table 2 in [1] and code VINM940101 in AAindex [10]). The horizontal axis indicates the average flexibility distribution in the wHTH and AAT domains, in all linkers, and in linkers belonging to different length intervals: 0–20, 21–40, 41–60 and >60 residues. The Y-axis reports the flexibility scale (label AI stands for Average Index). A, B and C denote Alphaproteobacteria, Betaproteobacteria and Gammaproteobacteria, respectively.
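The authors computed these averages with the Interpol package in R. As a rough Python stand-in, the sketch below shows what an Average Index (AI) amounts to, namely the mean of a per-residue scale over a sequence, and how a linker can be assigned to the length intervals used on the horizontal axis. The scale values are made-up placeholders, NOT the VINM940101 values, and both function names are hypothetical.

```python
def average_index(sequence, scale):
    """Mean of the per-residue scale values over the residues covered by the scale."""
    values = [scale[aa] for aa in sequence if aa in scale]
    if not values:
        raise ValueError("no residue of the sequence is covered by the scale")
    return sum(values) / len(values)

def length_interval(length):
    """Assign a linker length to the intervals used in the box-plots."""
    if length <= 20:
        return "0-20"
    if length <= 40:
        return "21-40"
    if length <= 60:
        return "41-60"
    return ">60"

# Placeholder scale with invented numbers; NOT taken from AAindex.
toy_scale = {"G": 1.0, "P": 0.8, "A": 0.6, "L": 0.2}
ai = average_index("GPAL", toy_scale)
```

One value of this kind per sequence, grouped by domain or by length interval, is the input a box-plot needs.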

Fig. 7

Box plots of the distribution of average linker hydrophobicity (index #58 of Table 2 in [1] and code CIDH920105 in AAindex [10]). For interpretation of plots, refer to Fig. 6 caption.

Fig. 8

Box plots of the distribution of the average linker propensity index (#491 of Table 2 in [1] and code GEOR03010 in AAindex [10]). For interpretation of plots, refer to Fig. 6 caption.

Fig. 9

Box plots of the distribution of the average normalized β-turn propensity (index #37 of Table 2 in [1] and code CHOP780101 in AAindex [10]). For interpretation of plots, refer to Fig. 6 caption.

Fig. 10

Box plots of the distribution of the average Chou–Fasman coil propensity (#24 of Table 2 in [1] and code CHAM830101 in AAindex [10]). For interpretation of plots, refer to Fig. 6 caption.

Fig. 11

Box plots of the distribution of average normalized α-helix propensity (index #38 of Table 2 in [1] and code CHOP780102 in AAindex [10]). A, B, C, D and E denote Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Firmicutes and Gammaproteobacteria, respectively.

Fig. 12

Box plots of the distribution of average normalized β-sheet propensity (index #39 of Table 2 in [1] and code CHOP780103 in AAindex [10]). Letter interpretation is as in Fig. 11 caption.

The linker length distributions were analyzed within two specific MocR subfamilies: GabR [13] and PdxR [14], involved in the regulation of the synthesis of γ-aminobutyric acid and pyridoxal 5′-phosphate, respectively. Sequences assigned to each of the two subgroups were retrieved from the RegPrecise data bank [15] and aligned separately (Table 8); an HMM profile [16] was calculated for each of the two multiple alignments. Each profile was used to search for other putative GabR or PdxR sequences in the reference proteomes data bank available at the Hmmer web server [17]. Sequences showing an E-value smaller than 10^−120 were retrieved and multiply aligned. Linker sequences were extracted as described above. Length distributions were plotted and compared for the GabR and PdxR sets (Fig. 13).
Table 8

GabR and PdxR sequences retrieved from RegPrecise data bank.

GabR

| UniProt code | Species | Phylum |
|---|---|---|
| A0A098SFD5 | Acinetobacter baumannii AB0057 | Gammaproteobacteria |
| Q6F766 | Acinetobacter sp. AD | Gammaproteobacteria |
| A7Z1D7 | Bacillus amyloliquefaciens FZB42 | Firmicutes |
| A8F9Y9 | Bacillus pumilus SAFR 032 | Firmicutes |
| P94426 | Bacillus subtilis subsp. subtilis str. 168 | Firmicutes |
| Q2KX56 | Bordetella avium 197N | Betaproteobacteria |
| A0A0H3LKN1 | Bordetella bronchiseptica RB50 | Betaproteobacteria |
| Q0B6G3 | Burkholderia cepacia AMMD | Betaproteobacteria |
| C5ALU9 | Burkholderia glumae BGR1 | Betaproteobacteria |
| A0A0H2XDM4 | Burkholderia mallei ATCC 23344 | Betaproteobacteria |
| B2JSD8 | Burkholderia phymatum STM815 | Betaproteobacteria |
| B2JR38 | Burkholderia phymatum STM815 | Betaproteobacteria |
| Q63NL7 | Burkholderia pseudomallei K96243 | Betaproteobacteria |
| A4JJX2 | Burkholderia vietnamiensis G4 | Betaproteobacteria |
| Q13LC0 | Burkholderia xenovorans LB400 | Betaproteobacteria |
| A9BMY2 | Delftia acidovorans SPH-1 | Betaproteobacteria |
| D4HXE9 | Erwinia amylovora ATCC 49946 | Gammaproteobacteria |
| Q6D5I8 | Erwinia carotovora subsp. atroseptica SCRI1043 | Gammaproteobacteria |
| A6TF79 | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | Gammaproteobacteria |
| B2U7Y5 | Ralstonia pickettii 12J | Betaproteobacteria |
| A8GJW1 | Serratia proteamaculans 568 | Gammaproteobacteria |
| Q4A0R1 | Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 | Firmicutes |
| C4ZIR5 | Thauera sp. MZ1T | Betaproteobacteria |
| Q7CJK7 | Yersinia pestis KIM | Gammaproteobacteria |
| A1VQK3 | Polaromonas naphthalenivorans CJ2 | Betaproteobacteria |
| Q129G7 | Polaromonas sp. JS666 | Betaproteobacteria |
| Q221G1 | Rhodoferax ferrireducens DSM 15236 | Betaproteobacteria |
| C5CM40 | Variovorax paradoxus S110 | Betaproteobacteria |

PdxR

| UniProt code | Species | Phylum |
|---|---|---|
| B9MKZ0 | Anaerocellum thermophilum DSM6725 | Firmicutes |
| A4XIB4 | Caldicellulosiruptor saccharolyticus DSM 8903 | Firmicutes |
| Q929S0 | Listeria innocua Clip11262 | Firmicutes |
| Q8Y5G3 | Listeria monocytogenes EGD e | Firmicutes |
| A0AKK7 | Listeria welshimeri serovar 6b str. SLCC5334 | Firmicutes |
| C7MF20 | Brachybacterium faecium DSM 4810 | Actinobacteria |
| Q6AFC0 | Leifsonia xyli subsp. xyli str. CTCB07 | Actinobacteria |
| B3GXB5 | Actinobacillus pleuropneumoniae serovar 7 str. AP76 | Gammaproteobacteria |
| Q5WKW3 | Bacillus clausii KSM K16 | Firmicutes |
| C3PLB2 | Corynebacterium aurimucosum ATCC 700975 | Actinobacteria |
| Q6NK11 | Corynebacterium diphtheriae NCTC 13129 | Actinobacteria |
| Q8NS92 | Corynebacterium glutamicum ATCC 13032 | Actinobacteria |
| B2GK63 | Kocuria rhizophila DC2201 | Actinobacteria |
| B9E8T3 | Macrococcus caseolyticus JCSC5402 | Firmicutes |
| W8TRW2 | Staphylococcus aureus subsp. aureus N325 | Firmicutes |
| B9DKX6 | Staphylococcus aureus subsp. carnosus TM300 | Firmicutes |
| A0A0H2VKR4 | Staphylococcus epidermidis ATCC 12228 | Firmicutes |
| A0A0Q1AKJ7 | Staphylococcus haemolyticus JCSC1435 | Firmicutes |
| Q49V27 | Staphylococcus saprophyticus subsp. saprophyticus ATCC15035 | Firmicutes |

Fig. 13

Histogram of the linker length distribution in the MocR subgroups GabR and PdxR. Horizontal axis labels indicate length intervals: 20 corresponds to 0–20, 30 to 21–30, 40 to 31–40, 50 to 41–50, 60 to 51–60 and >60 to linkers longer than 60 residues. The percentage (%) on the vertical axis indicates the fraction of linkers in each length interval. Sequences were retrieved from the reference proteomes data bank available at the Hmmer web server [17] using an E-value significance threshold of 10^−120. With this threshold, 885 and 334 sequences were retrieved for GabR and PdxR, respectively.
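The binning described in this caption can be sketched as follows. This is only an illustration of the interval labels; the authors' own Perl and R scripts are not shown in the article, and the names `bin_label` and `percentages` as well as the toy lengths are hypothetical.

```python
from collections import Counter

LABELS = ["20", "30", "40", "50", "60", ">60"]

def bin_label(length):
    """Return the Fig. 13 axis label for a linker length."""
    if length <= 20:
        return "20"
    if length > 60:
        return ">60"
    # Round up to the next multiple of ten: 21-30 -> "30", 31-40 -> "40", ...
    return str(-(-length // 10) * 10)

def percentages(lengths):
    """Percentage of linkers per interval; expects a non-empty list of lengths."""
    counts = Counter(bin_label(n) for n in lengths)
    total = len(lengths)
    return {label: 100.0 * counts[label] / total for label in LABELS}

# Toy lengths invented for illustration only.
toy_lengths = [12, 25, 30, 31, 58, 60, 61, 140]
hist = percentages(toy_lengths)
```

Expressing each interval as a percentage rather than a raw count makes the two subgroups comparable despite their different sizes (885 GabR versus 334 PdxR sequences).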

Perl and R-scripts were written for data analysis, processing and display.

Specifications Table

Subject area: Biology
More specific subject area: Structural properties of linkers in the bacterial transcriptional regulators
Type of data: Table, graph, figure
How data was acquired: Databank searches. Computational analysis
Data format: Raw, filtered, analyzed
Experimental factors: Analyses were mostly carried out with Perl, Python and R scripts and software for structural bioinformatics
Experimental features: Linker sequences were extracted from multiple sequence alignments of MocR regulators. Computational analysis defined the residue and residue dyad propensities and the distribution of physicochemical properties in the linker sequences.
Data source location: UniProt, RefSeq
Data accessibility: Data is within this article. Linker sequence sets are available at https://sites.google.com/a/uniroma1.it/pascarellalab/home/resources

1.  EMBOSS: the European Molecular Biology Open Software Suite.

Authors:  P Rice; I Longden; A Bleasby
Journal:  Trends Genet       Date:  2000-06

2.  Bacillus subtilis GabR, a protein with DNA-binding and aminotransferase domains, is a PLP-dependent transcriptional regulator.

Authors:  Boris R Belitsky
Journal:  J Mol Biol       Date:  2004-07-16

3.  Profile hidden Markov models.

Authors:  S R Eddy
Journal:  Bioinformatics       Date:  1998

4.  CDD: NCBI's conserved domain database.

Authors:  Aron Marchler-Bauer; Myra K Derbyshire; Noreen R Gonzales; Shennan Lu; Farideh Chitsaz; Lewis Y Geer; Renata C Geer; Jane He; Marc Gwadz; David I Hurwitz; Christopher J Lanczycki; Fu Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Zhouxi Wang; Roxanne A Yamashita; Dachuan Zhang; Chanjuan Zheng; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2014-11-20

5.  Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors:  Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal:  Bioinformatics       Date:  2009-01-16

6.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Authors:  Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal:  Mol Syst Biol       Date:  2011-10-11

7.  AAindex: amino acid index database, progress report 2008.

Authors:  Shuichi Kawashima; Piotr Pokarowski; Maria Pokarowska; Andrzej Kolinski; Toshiaki Katayama; Minoru Kanehisa
Journal:  Nucleic Acids Res       Date:  2007-11-12

8.  HMMER web server: 2015 update.

Authors:  Robert D Finn; Jody Clements; William Arndt; Benjamin L Miller; Travis J Wheeler; Fabian Schreiber; Alex Bateman; Sean R Eddy
Journal:  Nucleic Acids Res       Date:  2015-05-05

9.  RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

Authors:  Pavel S Novichkov; Alexey E Kazakov; Dmitry A Ravcheev; Semen A Leyn; Galina Y Kovaleva; Roman A Sutormin; Marat D Kazanov; William Riehl; Adam P Arkin; Inna Dubchak; Dmitry A Rodionov
Journal:  BMC Genomics       Date:  2013-11-01

10.  RefSeq microbial genomes database: new representation and annotation strategy.

Authors:  Tatiana Tatusova; Stacy Ciufo; Boris Fedorov; Kathleen O'Neill; Igor Tolstoy
Journal:  Nucleic Acids Res       Date:  2013-12-06

11.  Structural properties of the linkers connecting the N- and C- terminal domains in the MocR bacterial transcriptional regulators.

Authors:  Teresa Milano; Sebastiana Angelaccio; Angela Tramonti; Martino Luigi Di Salvo; Roberto Contestabile; Stefano Pascarella
Journal:  Biochim Open       Date:  2016-07-20