Kavitha Kurup¹, A Keith Dunker², Sankaran Krishnaswamy¹. ¹Centre of Excellence in Bioinformatics; School of Biotechnology; Madurai Kamaraj University; Madurai, Tamil Nadu, India. ²Centre for Computational Biology and Bioinformatics; Indiana University School of Medicine; Indianapolis, IN USA.
Abstract
The traditional "sequence-structure-function" paradigm has been amended by the discovery of intrinsically disordered proteins. Almost 50% of PDB structures are now known to contain one or more disordered regions, which are involved in diverse functions. These regions typically have low aromatic content and low sequence complexity, together with high net charge and high flexibility. In this study, we examined the composition and functional contribution of intrinsic disorder in outer membrane β barrel proteins. A systematic search for dual personality (DP) fragments, which often function through disorder-order transitions, identified 61 DP fragments comprising 234 residues in β barrel transmembrane protein structures. Although disorder is more prevalent in the periplasmic regions, most of the residues that undergo disorder-order transitions are found in the extracellular regions. For example, the calcium binding sites of the BtuB protein undergo a disorder-to-order transition upon binding calcium. Likewise, the conformational change in the cell receptor binding site of the OpcA protein, which is important for the host cell interactions of N. meningitidis, was found to result from disorder-order transitions occurring in the presence of the ligand. Because DP fragments are natively disordered, it is more appropriate to call them "functional fragments of disorder." The present study provides insight into the roles played by intrinsically disordered regions in outer membrane protein functions.
Keywords:
BtuB protein; OpcA protein; bacterial outer membrane; dual personality fragments; functional fragments of disorder; intrinsic disorder; transmembrane protein
Many regions of proteins, or even whole proteins, are intrinsically disordered (ID) under native conditions. These regions may adopt a non-globular structure or even remain unfolded in solution. While ID regions typically exhibit high sequence variability, some of them show sequence conservation, and disorder itself is often conserved. Furthermore, ID regions carry out a variety of biological functions, such as providing binding sites for other proteins or sites for post-translational modifications such as phosphorylation. The absence of a well-defined structure allows disordered binding sites to interact with several different targets. Studies of intrinsic disorder in organisms of different complexity, from viruses to eukaryotes, have revealed that eukaryotes possess a larger fraction of ID than bacteria and archaea, while viral proteomes show the widest range of predicted disorder. Despite the smaller fraction of ID in prokaryotic proteins, intrinsic disorder has been shown to play significant roles in these organisms. Because intrinsically disordered regions have mostly been studied in regulatory and signaling proteins, there is great interest in studying their roles in prokaryotic and highly structured proteins.
The prokaryotic transmembrane β barrel proteins (TMBETA) are found in the outer membranes of Gram-negative bacteria, where they import or export molecules across the membrane. The transmembrane (TM) regions of β barrel membrane proteins are highly structured within the membrane bilayer.
These surface-exposed proteins play critical roles in pathogenic processes such as adherence, motility and colonization of host cells, formation of channels for the removal of antibiotics, and injection of toxins and cellular proteases. Because of these functional properties, outer membrane proteins are also attractive targets for the development of antimicrobial drugs and vaccines. The existence of intrinsic disorder in these highly structured TMBETA proteins has recently been reported through computational approaches.
Many ID fragments undergo disorder-to-order transitions upon interaction with proteins. Fragments that become structured upon binding to a partner are generally called “molecular recognition features” (MoRFs). They have also been termed preformed structural elements (PSEs) and pre-structured motifs (PreSMos) in the literature. A second way to identify such fragments was suggested by Zhang et al., who showed that different crystal structures of the same protein often contain regions that are disordered in one crystal but structured in another. These fragments were termed dual personality (DP) fragments because they exist in both disordered and ordered states, and they tend to be enriched next to functional regions (active sites, post-translational modification sites). MoRFs and DP fragments appear to have similar sequence features; a DP region involved in partner recognition may also be termed a MoRF. To understand the functionality of disordered regions in TMBETA proteins, the present study screened for fragments that undergo disorder–order transitions. The distinct disorder compositions in different regions of TMBETA may also provide a better training set for developing disorder predictors.
Results
Analysis of disordered regions
The compositions of TMBETA disorder data sets have previously been compared with partially disordered, fully disordered and fully ordered data sets of water-soluble globular proteins, and with disorder predictions from different algorithms, as a preliminary step toward assessing the need for a new disorder predictor for membrane proteins. The disordered regions of TMBETA were found to have amino acid compositions distinct from those of globular proteins. In our study, the ordered and the various disordered data sets from TMBETA proteins were analyzed extensively to understand the compositions and functions of intrinsic disorder in these proteins.
The loop regions of TMBETA lie mainly on the periplasmic and extracellular sides. These segments “exterior to the bilayer” are assumed to resemble disordered regions more closely than the TM segments do. To test this, the amino acid compositions of the loop regions were compared with those of the reference TM segment data set (Fig. 1). The disorder-promoting residues R, Q, S, P, E and K and the flexible residues N and D were highly enriched in the loop data set, whereas the order-promoting residues W, F, I, Y, V, L and H were depleted.
Figure 1. Amino acid profiles are calculated as (Loop − TM)/TM, where Loop is the fraction of a given amino acid type in the loop data set and TM is the fraction of the same amino acid in the reference TM data set, giving the fractional difference for each amino acid between the two data sets. The X-axis orders the amino acids from the most structure-promoting (left) to the most disorder-promoting (right).
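The profile in Fig. 1 is a per-residue fractional difference between two composition vectors. A minimal Python sketch follows, assuming each data set is given as a list of one-letter sequences (non-standard characters are ignored); the function names and the toy fragments are illustrative, not taken from the authors' pipeline. Swapping the query and reference data sets gives the other comparisons of this kind (Figs. 2 and 6).

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Fraction of each standard amino acid over all residues in `sequences`."""
    counts = Counter(res for seq in sequences for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("data set contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(query, reference):
    """(query - reference) / reference per amino acid; None if absent from reference."""
    q, r = composition(query), composition(reference)
    return {aa: (q[aa] - r[aa]) / r[aa] if r[aa] > 0 else None for aa in AMINO_ACIDS}


# Hypothetical toy fragments, not data from this study.
loop_segments = ["GDNSKPE", "RQSDT"]
tm_segments = ["LVAFIWY", "GLVIAFL"]
profile = fractional_difference(loop_segments, tm_segments)
print({aa: round(v, 2) for aa, v in profile.items() if v is not None})
```

Positive values mean the residue is enriched in the query set relative to the reference; residues missing from the reference are reported as None rather than dividing by zero.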
The amino acid compositions of the periplasmic loop regions were compared with those of the reference extracellular loop regions (Fig. 2A) to identify composition biases between these data sets. The periplasmic loop regions were enriched in I, V, L, A, Q and P, whereas the extracellular loop data set was enriched in W, Y, H, T, R, S, N, D, E and K, showing that the two have distinct amino acid compositions.
Figure 2. (A) The fractional difference between the amino acid frequencies in the periplasmic loop regions (IC) and extracellular loop regions (EC) of TMBETA proteins. The X-axis orders the amino acids from the most structure-promoting (left) to the most disorder-promoting (right). (B) The fractional difference between the amino acid frequencies of disordered residues in the periplasmic loop regions (disin) and the extracellular loop regions (disout).
Next, the compositions of intrinsic disorder in the periplasmic and extracellular loop regions were compared (Fig. 2B). For this, the disordered fragments of these proteins were mapped to the inner (periplasmic) and outer (extracellular) regions. In total, 699 disordered residues mapped entirely to the periplasmic region, in agreement with an earlier observation that disorder is more frequent in the periplasmic regions of integral membrane proteins, whereas only 470 disordered residues were positioned in the extracellular regions. Because the extracellular loops are longer than the periplasmic loops and turns, the percentage of disordered residues was calculated for each side; the ratio of the extracellular to the periplasmic percentage was about 1:2. The residues most represented in the disordered data sets of the periplasmic and extracellular regions were L, A, Q and Y, T, R, G, D, respectively.
To examine which part of TMBETA the disordered data set most resembles, it was clustered together with the three structural parts of TMBETA (TM segments, periplasmic loops and extracellular loops) (Fig. 3). The TM region formed a cluster separate from the disordered regions. Consistent with the composition analysis, Figure 3 shows the disordered data set clustering with the loop regions, especially the periplasmic loops.
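Because the two sides differ in length, the raw counts (470 extracellular vs 699 periplasmic disordered residues) must be normalized by the number of residues on each side before they are compared. The section does not state which residue totals were used as denominators; the sketch below assumes, purely for illustration, the EC and IC "All residues" totals of Table 1 (8043 and 6224). With those assumed denominators the ratio comes out close to 1:2, which checks the arithmetic but is not a reproduction of the authors' exact calculation.

```python
def disorder_percentage(disordered, total):
    """Percentage of residues in a region that are disordered."""
    return 100.0 * disordered / total


def ec_to_ic_ratio(dis_ec, total_ec, dis_ic, total_ic):
    """Return (EC %, IC %, x) such that the EC:IC ratio of percentages reads 1:x."""
    ec = disorder_percentage(dis_ec, total_ec)
    ic = disorder_percentage(dis_ic, total_ic)
    return ec, ic, ic / ec


# Disordered counts from the text; denominators ASSUMED from Table 1 (all residues).
ec, ic, x = ec_to_ic_ratio(470, 8043, 699, 6224)
print(f"EC {ec:.2f}%  IC {ic:.2f}%  ratio 1:{x:.2f}")  # EC 5.84%  IC 11.23%  ratio 1:1.92
```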
Figure 3. Clustering of amino acid frequencies in the TM (transmembrane), Icloop (periplasmic loop), Ecloop (extracellular loop) and disorder data sets of TMBETA, showing that the disorder data set groups with the Icloop data set. The top matrix is the column proximity matrix, the matrix below it is the raw data matrix, and the matrix on the right is the row proximity matrix. The raw data matrix and the two proximity matrices are projected through color spectra to create matrix maps. The raw data matrix uses a green-black-red color scheme for low to high amino acid frequency in the 4 data sets. The distances between amino acids across the 4 data sets are shown in the row proximity matrix (color spectrum: Rainbow 130).
Analysis of DP fragments in TMBETA
The initial DP study by Zhang et al. analyzed 19,858 PDB entries and found that 45% of clusters contain DP fragments. They reported only 2 clusters of outer membrane proteins in their redundant PDB set, and neither contained DP fragments. It was therefore of interest to identify DP fragments in TMBETA and to characterize their composition and functional roles.
Twenty-four CD-HIT clusters of identical protein chains were found to contain DP fragments in TMBETA. There were 147 DP residues (62.82%) in fragments of length <10 AA and 87 DP residues (37.18%) in fragments of length >10 AA (Fig. 4). Most DP fragments were 1–3 residues long, and the longest DP fragment was 18 residues. An attempt was made to categorize DP fragments by length and to identify length-specific roles, but the small size of the data set and the lack of literature make a statistically significant analysis difficult.
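The length split reported above is a simple tally over fragment lengths; for example, 147 of 234 DP residues is 62.82% and 87 of 234 is 37.18%. The text does not say where a fragment of exactly 10 residues falls, so the sketch below assumes it counts as long and exposes the threshold as a parameter. The fragment lengths are hypothetical, and the function names are illustrative.

```python
from collections import Counter


def length_histogram(fragment_lengths):
    """Number of DP fragments at each length (the quantity plotted in Fig. 4)."""
    return dict(sorted(Counter(fragment_lengths).items()))


def residues_by_length_class(fragment_lengths, threshold=10):
    """Residue totals and percentages in short (< threshold) vs long fragments.

    ASSUMPTION: fragments of exactly `threshold` residues are counted as long.
    """
    short = sum(n for n in fragment_lengths if n < threshold)
    long_ = sum(n for n in fragment_lengths if n >= threshold)
    total = short + long_
    return short, long_, 100.0 * short / total, 100.0 * long_ / total


# Hypothetical fragment lengths for illustration only.
lengths = [1, 1, 2, 3, 3, 5, 12, 18]
print(length_histogram(lengths))  # {1: 2, 2: 1, 3: 2, 5: 1, 12: 1, 18: 1}
print(residues_by_length_class(lengths))
```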
Figure 4. Distribution of DP fragment lengths. The x-axis denotes the length of DP fragments and the y-axis the number of DP fragments of that length. All DP fragments are shorter than 20 amino acids.
The secondary structure of DP fragments in outer membrane proteins differed from that reported by Zhang et al. for non-membrane proteins, in which the secondary structures of DP fragments follow the order loop > turn > helix > β. In TMBETA, turns were completely absent, giving the order loop > helix > β. This also differs from the distribution of secondary structures in the PDB as a whole (helix > sheet > turn > loop) (Fig. 5). The numbers of DP residues in C (loop/irregular), H (helix) and E (extended β strand) structures were 206 (88.03%), 18 (7.69%) and 10 (4.27%), respectively.
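The percentages above follow directly from the 3-state counts. The sketch below tallies them and, for completeness, shows one way to obtain 3-state labels from DSSP codes. The reduction used here (H, G, I to H; E, B to E; everything else to C) is a common convention and an assumption, since the section does not state which reduction was applied. Under it, turns (T) fall into C, so separating loops from turns, as in the comparison with Zhang et al., would require the unreduced codes.

```python
from collections import Counter

# ASSUMED DSSP-to-3-state reduction (a common convention, not stated in the text).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(codes):
    """Map DSSP codes to H/E/C; every other code (T, S, blank, '-') becomes C."""
    return "".join(DSSP_TO_3STATE.get(c, "C") for c in codes)


def ss_distribution(labels):
    """Count and percentage (2 decimals) of each 3-state label among DP residues."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {s: (counts[s], round(100.0 * counts[s] / total, 2)) for s in "CHE"}


# Labels rebuilt from the counts reported above (206 C, 18 H, 10 E).
labels = "C" * 206 + "H" * 18 + "E" * 10
print(ss_distribution(labels))  # {'C': (206, 88.03), 'H': (18, 7.69), 'E': (10, 4.27)}
```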
Figure 5. Secondary structure distribution of DP fragments in (A) outer membrane proteins, (B) non-membrane proteins and (C) in PDB.
The amino acid frequencies of the disordered and DP fragment data sets were then compared with those of the background ordered data set (Fig. 6). The disordered data set of TMBETA was enriched in H, Q, S, P and E, whereas the ordered data set was enriched in W, F, I, Y and V. The enrichment of H, which is not seen in the disordered data sets of globular proteins, is a remarkable feature of intrinsic disorder in TMBETA proteins. The DP fragment data set was also enriched in H and P, a subset of the residues enriched in the disordered data set.
Figure 6. The fractional difference between the amino acid frequencies of (A) disordered data set and (B) DP data set to that of the reference ordered data set.
Amino acids were clustered with the GAP program based on their relative abundance in the ordered, disordered and DP data sets. In TMBETA, the DP fragments tend to cluster with the ordered fragments, although their properties clearly lie between those of the ordered and disordered regions (Fig. 7). Similarly, when DP residues and disordered residues were clustered with the periplasmic and extracellular loops, the disordered residues clustered with the periplasmic regions while the DP fragments clustered with the extracellular regions.
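The column proximity matrices in Figs. 3 and 7 compare data sets through their amino acid frequency profiles. GAP (generalized association plots) supports several proximity measures and seriation methods, and which ones were used here is not stated. The sketch below assumes Pearson correlation between profiles and, instead of reproducing GAP's seriation and color mapping, simply reports each data set's most correlated partner. The short profiles are hypothetical; real input would have 20 values per data set.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length frequency profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def proximity_matrix(profiles):
    """Column proximity matrix: correlation between every pair of data sets."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


def nearest_partner(profiles):
    """For each data set, the other data set with the highest correlation."""
    prox = proximity_matrix(profiles)
    return {a: max((b for b in prox if b != a), key=lambda b: prox[a][b]) for a in prox}


# Hypothetical 4-residue frequency profiles, not values from this study.
profiles = {
    "order": [0.10, 0.20, 0.30, 0.40],
    "dp": [0.12, 0.19, 0.33, 0.36],
    "disorder": [0.40, 0.30, 0.20, 0.10],
}
print(nearest_partner(profiles))  # {'order': 'dp', 'dp': 'order', 'disorder': 'dp'}
```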
Figure 7. Clustering of DP fragments with the ordered and disordered data sets, computed with the GAP program. The top matrix is the column proximity matrix, the matrix below it is the raw data matrix, and the matrix on the right is the row proximity matrix. The raw data matrix and the two proximity matrices are projected through color spectra to create matrix maps. The raw data matrix uses a green-black-red color scheme for low to high frequency of the 20 amino acids in the 3 data sets. The distances between amino acids across the 3 data sets are shown in the row proximity matrix (color spectrum: Rainbow 130).
The distributions of DP residues and of all residues over the periplasmic (IC), extracellular (EC) and TM regions are shown in Table 1. The EC region contains the largest number of DP residues (138), and the log-odds ratio is also highest for DP residues in the EC region (0.644 bits). This shows that most of the residues that undergo disorder–order transitions lie in the extracellular regions.
Table 1. Distribution of residues in extracellular (EC), periplasmic (IC), transmembrane (TM) regions in the transmembrane β data set
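| Quantity | EC | IC | TM | Total |
|---|---|---|---|---|
| DP residues | 138 | 90 | 6 | 234 |
| All residues | 8043 | 6224 | 7046 | 21313 |
| Fraction of DP residues (fDP) | 0.59 | 0.38 | 0.03 | 1.0 |
| Fraction of all residues (fall) | 0.38 | 0.29 | 0.33 | 1.0 |
| Odds ratio (fDP/fall) | 1.56 | 1.32 | 0.08 | - |
| Log-odds (bits) | 0.644 | 0.397 | −3.689 | - |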
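The derived rows of Table 1 follow from its two count rows: each fraction is a region's count divided by the row total, the odds ratio is fDP/fall, and the log-odds is its base-2 logarithm. The short sketch below recomputes them from the reported counts and reproduces the published values, which confirms that the "bits" row uses log2. A region with zero DP residues would need a pseudocount, which this sketch does not handle.

```python
import math


def log_odds_table(dp_counts, all_counts):
    """Per-region (fDP, fall, fDP/fall, log2 of the ratio in bits), as in Table 1."""
    dp_total = sum(dp_counts.values())
    all_total = sum(all_counts.values())
    rows = {}
    for region in dp_counts:
        f_dp = dp_counts[region] / dp_total
        f_all = all_counts[region] / all_total
        odds = f_dp / f_all
        rows[region] = (f_dp, f_all, odds, math.log2(odds))
    return rows


table = log_odds_table({"EC": 138, "IC": 90, "TM": 6},
                       {"EC": 8043, "IC": 6224, "TM": 7046})
for region, (f_dp, f_all, odds, bits) in table.items():
    print(f"{region}: fDP={f_dp:.2f} fall={f_all:.2f} odds={odds:.2f} bits={bits:.3f}")
# EC: fDP=0.59 fall=0.38 odds=1.56 bits=0.644
# IC: fDP=0.38 fall=0.29 odds=1.32 bits=0.397
# TM: fDP=0.03 fall=0.33 odds=0.08 bits=-3.689
```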
The observation that most residues undergoing disorder–order transitions lie in the extracellular regions motivated a closer investigation of the functions of disordered regions in TMBETA proteins. A Scansite search was performed to find known functional motifs onto which the disordered regions could be mapped. None of the regions mapped to the motifs currently in Scansite, possibly indicating functions unique to these regions in membrane proteins. Two examples of intrinsic disorder function observed in TMBETA are described in detail below.
DP fragments in BtuB protein
Calcium-binding sites play major roles in stabilizing protein structures. It is known that, in the absence of ligands, binding sites in the apo state are often disordered. Here, we used 2 structures of the same spin-labeled BtuB V10R1 construct, one with bound calcium and the other in the apo state, to show the disorder-to-order transitions of this protein upon calcium binding.
The crystal structure of spin-labeled BtuB V10R1 with bound calcium and cyanocobalamin (vitamin B12) (3M8D) has fragments 178–195, 229–240 and 278–287 in the ordered state (Fig. 8A). These fragments are disordered in the crystal structure of spin-labeled BtuB V10R1 in the apo state (3M8B). The 2 extracellular loops 178–195 and 229–240 play important roles in signal transduction after substrate binding. The calcium binding site in these loops is formed by four negatively charged residues: D178, D193, D195 and D230. In this structure, the B12 binding site includes the top of the hatch domain and parts of these loops. With calcium ions bound near the loops 178–195 and 229–240, the motions of these loops are more correlated with the motion of the hatch domain (Fig. 8B). Calcium binding to these loops therefore stabilizes the substrate binding site, which increases the binding affinity for B12. The enhanced correlation between these loops and the hatch domain also indicates stiffening of the protein structure, which may promote signal transduction through the hatch domain.
Figure 8. (A) Superimposed backbone structures of 3M8D (blue, ligand-bound state) and 3M8B (red, apo state). In 3M8D, fragments 178–195 (green), 229–240 (yellow) and 278–287 (magenta) are ordered; these fragments are disordered and therefore not visible in the crystal structure of spin-labeled BtuB V10R1 in the apo state (3M8B). (B) Close-up of the ligand-bound state: B12 (cyan) and Ca2+ ions (spheres) bound to the 4 aspartate residues (orange) in 3M8D.
Because DP fragments that are well conserved across species are likely to be biologically meaningful, the conservation of DP fragments in orthologs of Escherichia coli BtuB was investigated. Aspartate, the key functional residue in the BtuB DP fragments, is highly conserved in the orthologous proteins (Fig. 9).
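The section does not say how conservation was scored. One simple per-column measure, sketched below as an assumption, is the fraction of aligned orthologs that carry aspartate at each alignment column. It expects an existing multiple sequence alignment (the alignment tool is not specified here); the function name and the toy alignment are hypothetical.

```python
def column_conservation(alignment, residue="D"):
    """Fraction of aligned sequences carrying `residue` at each alignment column."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [sum(seq[col] == residue for seq in alignment) / len(alignment)
            for col in range(width)]


# Hypothetical aligned fragments from three orthologs; gaps are '-'.
alignment = ["GDAD", "NDAD", "GD-E"]
print(column_conservation(alignment))
```

A column scoring 1.0 carries aspartate in every ortholog; gaps simply count as non-matching residues in this sketch.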
Figure 9. Alignment of BtuB orthologs from different organisms. Conserved DP fragments are shown on a blue background, and the highly conserved aspartate residues are boxed in red.
DP fragments in OpcA protein
OpcA is an outer membrane adhesin of Neisseria meningitidis, which causes meningitis and septicemia in humans. Interactions of OpcA with proteoglycan surface receptors of mammalian epithelial cells promote the binding of N. meningitidis. The crystal structure of OpcA (PDB ID: 1K24) made it possible to understand, at the structural level, how N. meningitidis adheres to host cells. This structure was solved from OpcA crystals grown by surfactant-based vapor diffusion with the inclusion of zinc ions, which were required for crystal formation. The crystal structure revealed 2 zinc ions bound to the extracellular loop regions. A second OpcA structure (PDB ID: 2VDF) was solved by the in meso method, which differs from the in surfo method used for the original structure. The 2 crystal structures clearly demonstrate high mobility of the extracellular loops, which may contribute to ligand binding by induced fit. In the original in surfo crystal structure of OpcA, no disordered residues were identified apart from residues 1–4 at the N terminus. In the in meso structure, residues A1–T6, K124–T129, G174–K180, E203–S204 and V226–S232 were disordered. Comparing these two structures, which CD-HIT placed in the same cluster, yielded the DP fragments of the 1K24–2VDF cluster. The topology of these DP fragments is given in Table 2.
Table 2. Topology of DP fragments in OpcA
| DP fragment | Topology |
|---|---|
| Q5–T6 | N-term |
| K124–T129 | Loop-3 |
| G174–K180 | Loop-4 |
| E203–S204 | Turn-4 |
| V226–S232 | Loop-5 |
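Table 2 can be derived from the OpcA disorder annotations with simple set arithmetic. Assuming that both chains cover the same residue range, so that any residue not listed as disordered is ordered, a DP residue is one that is disordered in at least one chain of a sequence-identical cluster but not in all of them. The minimal sketch below applies that reading (which may omit further criteria used by Zhang et al.) to the 1K24 and 2VDF annotations given above; the helper names are illustrative.

```python
def span(start, end):
    """Inclusive range of residue numbers as a set."""
    return set(range(start, end + 1))


def dp_residues(disordered_by_chain):
    """Residues disordered in at least one chain but not in every chain."""
    sets = [set(s) for s in disordered_by_chain]
    if len(sets) < 2:
        return set()  # DP residues need at least two structures to compare
    return set().union(*sets) - set.intersection(*sets)


def to_fragments(residues):
    """Group residue numbers into contiguous (start, end) fragments."""
    fragments = []
    for r in sorted(residues):
        if fragments and r == fragments[-1][1] + 1:
            fragments[-1] = (fragments[-1][0], r)
        else:
            fragments.append((r, r))
    return fragments


# Disordered residues as listed in the text for the two OpcA structures.
in_surfo_1k24 = span(1, 4)
in_meso_2vdf = span(1, 6) | span(124, 129) | span(174, 180) | span(203, 204) | span(226, 232)
print(to_fragments(dp_residues([in_surfo_1k24, in_meso_2vdf])))
# [(5, 6), (124, 129), (174, 180), (203, 204), (226, 232)]
```

The five fragments match Table 2 (24 DP residues in total); residues 1–4, disordered in both structures, are correctly excluded.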
Loops L3, L4 and L5, which undergo disorder (in 2VDF) to order (in 1K24) transitions, play significant roles in forming the crevice that is the proposed heparin/proteoglycan binding site (Fig. 10A). The zinc ions, acting as ligands, caused the loops that are disordered in the in meso structure to become ordered in the in surfo crystal structure of OpcA (Fig. 10B). The same disorder–order transition of the loops may occur in the presence of proteoglycan ligands. Cherezov et al. reported that the differences between the 2 structures at loops L3, L4 and L5 could not be characterized because the densities for those regions in the in meso maps were weak and untraceable. The present study suggests that the ligand is responsible for forming the binding pocket, mainly through the disorder–order transitions of L3, L4 and L5 and the closing of the pocket by L2. The conservation analysis performed for BtuB was not possible for OpcA because of the lack of OpcA orthologs.
Figure 10. (A) Superimposed backbone structures of 1K24 (blue, ligand-bound state) and 2VDF (red, apo state). The DP fragments 124–129 (green), 174–180 (yellow) and 226–232 (magenta) correspond to L3, L4 and L5, and the DP fragment 203–204 (black) corresponds to T4 in 1K24. (B) Close-up of the ligand-bound state: ordered loops L3, L4 and L5 in the presence of the zinc ion (sphere).
Disordered residue classification
In this study, we noticed that although DP fragments have properties intermediate between those of ordered and disordered residues, they are natively found in the disordered state. This natively disordered nature makes it more appropriate to call DP fragments “functional fragments of disorder.” Not only DP fragments, but also MoRFs, short linear motifs (SLiMs) and even linker regions that do not undergo induced folding may be called functional fragments of disorder. Treating DP residues as natively disordered, disordered residues can be classified into three categories: (1) disordered residues found in only 1 structure, (2) disordered residues conserved in 2 or more structures and (3) DP residues. In TMBETA, most always-disordered residues were found in single structures, and the number of disordered residues conserved among structures decreases as the number of structures increases (Table 3).
Table 3. Disordered residue classification
| Always disordered in: 1 structure | 2 structures | 3 structures | >3 structures | DP residues |
|---|---|---|---|---|
| 1,003 residues | 283 residues | 139 residues | 34 residues | 234 residues |
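Table 3 can be tallied per cluster with the same set operations used for DP fragments: residues disordered in every chain of a cluster count toward the column for that cluster's number of structures, and residues disordered in only some chains are DP residues. This reading of the classification is an assumption, as are the helper names. As a consistency check on the published numbers, the four always-disordered columns sum to 1,003 + 283 + 139 + 34 = 1,459 residues, matching the size of the final disordered data set in Materials and Methods.

```python
from collections import Counter


def span(start, end):
    """Inclusive range of residue numbers as a set."""
    return set(range(start, end + 1))


def classify_cluster(disordered_by_chain):
    """Return (always-disordered residues, DP residues) for one cluster.

    `disordered_by_chain` holds one set of disordered residue numbers per chain
    in a cluster of sequence-identical chains.
    """
    sets = [set(s) for s in disordered_by_chain]
    always = set.intersection(*sets)
    dp = set().union(*sets) - always
    return always, dp


def column_name(n_structures):
    """Table 3 column for a cluster with `n_structures` chains."""
    if n_structures == 1:
        return "1 structure"
    if n_structures <= 3:
        return f"{n_structures} structures"
    return ">3 structures"


def classify_dataset(clusters):
    """Tally residues into the Table 3 categories over all clusters."""
    tally = Counter()
    for chains in clusters:
        always, dp = classify_cluster(chains)
        tally[column_name(len(chains))] += len(always)
        tally["DP residues"] += len(dp)
    return tally


# A hypothetical single-structure cluster plus the OpcA pair (1K24, 2VDF) from the text.
opca = [span(1, 4),
        span(1, 6) | span(124, 129) | span(174, 180) | span(203, 204) | span(226, 232)]
clusters = [[span(10, 12)], opca]
print(classify_dataset(clusters))
```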
Discussion
Although several computational studies have shown the significance of disorder in transmembrane α-helical proteins, the occurrence of disorder in TMBETA proteins, and a comparison of its characteristics with the structured and disordered regions of globular proteins, has been reported only recently. The present study therefore aims to give a more detailed picture of intrinsic disorder in TMBETA proteins.
During preparation of the disordered data set, we observed that the number of TMBETA structures in the PDB is small, and the fraction of disorder in these structures is smaller still. The first step was therefore to include as many PDB chains as possible. Clustering PDB chains at 100% sequence identity, and additionally including sequence-identical but structurally dissimilar chains, helped enlarge the disordered data set.
The amino acid composition analysis showed that disordered residues are more prevalent in the loop regions of TMBETA. The question was then which loops, extracellular or periplasmic, contain more disordered regions. Although the extracellular loops of TMBETA are longer, the disorder content of the periplasmic loops was roughly twice that of the extracellular loops after normalizing for loop length. This was also supported by the clustering analysis, in which disorder clustered with the periplasmic data set. Note that although disorder is more prevalent in the periplasmic region, the proposed “functional fragments of disorder” occur more in the extracellular region. The disordered state of the longer extracellular loops, which are continuously exposed to the environment, may give them the flexibility to capture specific ligands, becoming ordered upon ligand binding. Conformational changes in the extracellular loop regions may also allow the pore to open for the entry of molecules. A recent study has shown that the periplasmic loops are largely disordered in X-ray structures of the closed CorA channel.
These structural data suggest that the periplasmic loops have a prominent role in stabilizing the open conformation of CorA channels. Likewise, the larger number of disordered residues observed in periplasmic regions in our study may reflect such functions. The absence of the right binding partners may also explain why not all disordered regions exhibit disorder-to-order transitions. The small size of the currently available data set made a more detailed investigation of this aspect difficult.
The current study demonstrates the importance of intrinsically disordered regions of TMBETA proteins in interactions with the external environment, facilitating the binding of molecules from the environment for their smooth transport across the membrane and into the cell. The conformational dynamics of the OpcA crystal structure when transplanted to the bilayer environment were found to be due to the disordered residues in the external loops. These residues may help OpcA interact with different molecules such as heparin, vitronectin, fibronectin and integrins. Binding sites for heparin and integrins have already been shown to lie in disordered sequences of extracellular proteins. Native disorder in these loop regions may also provide room to accommodate ligands of different sizes while the binding pocket forms. The Opc protein, a surface-exposed immunogenic adhesin of N. meningitidis, is already a component of an experimental outer membrane vesicle vaccine used against serogroup B disease. The transient, weak and specific interactions involved in disorder-to-order transitions open a new avenue for drug discovery: drugs can either mimic the critical regions of the disordered partner and compete for the binding site on the structured partner, or target the disordered regions directly.
As OpcA is an important adhesin in human meningitis caused by N. meningitidis, drugs that target OpcA by exploiting its disordered interface merit further investigation.
Disorder in globular proteins has previously been shown to be strongly associated with signaling, by providing binding sites for other molecules. In the present study, by examining the disorder in highly structured TMBETA proteins that undergo disorder-order transitions, we identified properties and functions of disorder that are retained in this class of proteins. The distinct compositions of disordered residues in different regions of TMBETA point toward the need for dedicated predictors of disorder in these proteins. The proposed “functional fragments of disorder” were enriched in residues that form a subset of those enriched in the disordered data set. The clustering of functional fragments of disorder with the ordered data set shows their tendency to become ordered rather than remain in their native disordered state. The BtuB example examined here illustrates the importance of intrinsic disorder in cobalamin transport across the outer membrane. The importance of intrinsic disorder in the host-pathogen interactions of the OpcA protein may open a new way to exploit intrinsic disorder in membrane proteins for targeting candidate drugs.
Materials and Methods
Disordered data set
The 209 TMBETA structures used in this study were retrieved from the Protein Data Bank of Transmembrane Proteins (PDBTM) as of November 2011. This data set was filtered to obtain the primary data set as follows. After excluding structures not determined by X-ray crystallography, 194 PDB entries remained, and structures with resolution worse than 3 Å were discarded. CD-HIT was then used to cluster the remaining 158 entries at a 100% identity cut-off, yielding 116 clusters, each with one representative chain. Of these, 2 chains shorter than 50 residues were removed. Chains repeated across different clusters were also removed, reducing the data set to 109 chains.

XML2PDB was used to identify the disordered residues in each PDB entry by searching for residues with missing coordinates. Of the 109 chains, only 70 possessed at least one disordered residue. Because a 100% identity cut-off was used, some chains in different clusters had similar disordered regions, as identified by sequence similarity; such chains were removed, keeping only one representative chain. Further, 4 chains with fewer than 3 disordered residues were removed from the set. N-terminal His tags and initiator methionines (N-Met), whose treatment is not consistent across protein models, were also removed. The primary data set thus contained 48 chains.
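The disorder-assignment step can be illustrated with a minimal sketch. It assumes that each chain is given as its full deposited sequence (`seqres`) plus the set of 0-based positions that have coordinates; it does not reproduce XML2PDB. The function names, the His-tag rule (an N-terminal Met followed by a run of at least six His) and the `min_his_run` parameter are hypothetical choices for illustration only.

```python
from __future__ import annotations


def disordered_positions(seqres: str, observed: set[int], min_his_run: int = 6) -> list[int]:
    """Return 0-based positions lacking coordinates, ignoring N-Met and an N-terminal His tag.

    Hypothetical rule: a leading Met is always skipped; a following run of His is
    treated as a tag (and skipped) only if it is at least `min_his_run` long.
    """
    start = 1 if seqres.startswith("M") else 0
    end_of_his = start
    while end_of_his < len(seqres) and seqres[end_of_his] == "H":
        end_of_his += 1
    if end_of_his - start >= min_his_run:
        start = end_of_his  # skip the His tag
    # A residue is called disordered when it is in SEQRES but has no coordinates.
    return [i for i in range(start, len(seqres)) if i not in observed]


def keep_chain(disordered: list[int], min_disordered: int = 3) -> bool:
    """Apply the filter that drops chains with fewer than 3 disordered residues."""
    return len(disordered) >= min_disordered
```

For example, a toy chain `"MHHHHHHAKLVG"` with coordinates only at positions 7, 8 and 10 yields disordered positions `[9, 11]`, and the chain would be dropped by the three-residue filter.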
Adding more chains to the primary data set
Although chains in a cluster are 100% identical in sequence, they may still differ structurally. Therefore, as a further step to increase the number of chains, structural comparisons were performed. Of the 115 clusters obtained using CD-HIT, 68 contain more than one identical chain. Disorder was identified in each chain of these 68 clusters. Eighteen clusters that did not show any disorder were excluded, leaving 50 clusters. Of these 50 clusters, 31 were removed because all of their chains came from the same structure, so 19 clusters remained for structural comparison.

To assess the structural identity of these sequence-identical chains, pairwise structural comparisons were performed for the chains in each cluster using the SuperPose server. Of these 19 clusters, 5 were found to be structurally dissimilar, with an RMSD > 3 Å. For these 5 clusters, the mean agreement of disordered residues between pairs of chains in each cluster was calculated using the following equation:

$$O_d(A,B) = \frac{1}{2}\left(\frac{|A_d \cap B_d|}{A_D} + \frac{|A_d \cap B_d|}{B_D}\right)$$

where $O_d(A,B)$ is the overlap between the disorder in chain A and chain B, $A_D$ and $B_D$ are the total numbers of disordered residues in chains A and B, respectively, and $|A_d \cap B_d|$ is the number of disordered residues shared by chains A and B. Chain A must have at least one disordered region, while chain B can be fully ordered.

Applying this criterion, 2 of the 5 clusters had a mean disorder agreement below 70%: the mean fraction was 0.62 for 2HDFA:2HDIA and 0.595 for 1NQEA:2GUFA. Chains 2HDIA and 2GUFA were therefore added, increasing the data set to 50 chains after structural comparison. The final disordered TMBETA data set, with 1459 residues, was obtained by filtering out the dual personality fragments and ordered fragments from these 50 chains.
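The disorder-overlap measure $O_d(A,B)$ and the cluster mean can be written out as a short sketch. Disordered residues are represented as sets of positions in the shared SEQRES numbering, which is possible because the chains in a cluster are sequence-identical. One assumption is needed: when chain B is fully ordered ($B_D = 0$) the second term of the equation is undefined, and this sketch takes it as 0. The function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def disorder_overlap(a: set[int], b: set[int]) -> float:
    """O_d(A, B) = 0.5 * (|A_d & B_d| / A_D + |A_d & B_d| / B_D)."""
    if not a:
        raise ValueError("chain A must have at least one disordered residue")
    common = len(a & b)
    term_a = common / len(a)
    # Assumption: if B is fully ordered (B_D = 0), the undefined term is taken as 0.
    term_b = common / len(b) if b else 0.0
    return 0.5 * (term_a + term_b)


def mean_cluster_overlap(chains: list[set[int]]) -> float:
    """Mean O_d over all chain pairs in a cluster; the chain with disorder is used as A."""
    scores = []
    for a, b in combinations(chains, 2):
        if not a and not b:
            continue  # neither chain has disorder, so there is nothing to compare
        if not a:
            a, b = b, a  # make sure A carries at least one disordered residue
        scores.append(disorder_overlap(a, b))
    if not scores:
        raise ValueError("no chain pair in the cluster contains disorder")
    return sum(scores) / len(scores)
```

With the 70% criterion described above, a structurally dissimilar chain would be added when `mean_cluster_overlap(cluster) < 0.70`.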
Ordered data set
There were 39 fully ordered TMBETA chains in the initial data set of 109 chains. After removal of His tags, these chains were clustered at 30% identity using CD-HIT to obtain the final fully ordered data set of 16 chains with 4768 residues.
Data set of dual personality fragments
Of the 115 clusters in our primary TMBETA data set, the 68 clusters containing two or more identical sequences were used to find DP fragments. Residues present (that is, with coordinates) in at least one chain but missing in other chains of the same cluster were identified as DP fragments. Of the 68 clusters, 24 contained DP fragments. In total, the data set includes 61 DP fragments with 234 residues.
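The DP-fragment definition can be expressed as a small sketch. It assumes that each chain in a cluster is described by the set of SEQRES positions (0-based, shared across the cluster's identical sequences) that have coordinates. A position is a DP residue if it is observed in at least one chain and missing in at least one other, and contiguous runs of such positions are reported as fragments. Grouping contiguous positions into fragments this way is an assumption of the sketch, as is the function name.

```python
from __future__ import annotations

from itertools import groupby


def dp_fragments(observed_by_chain: dict[str, set[int]], length: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) runs of positions observed in some chains but not all."""
    chains = list(observed_by_chain.values())
    dp_positions = [
        i
        for i in range(length)
        if any(i in s for s in chains) and not all(i in s for s in chains)
    ]
    fragments = []
    # Consecutive positions share the same (position - index) value, so groupby splits runs.
    for _, run in groupby(enumerate(dp_positions), key=lambda t: t[1] - t[0]):
        positions = [p for _, p in run]
        fragments.append((positions[0], positions[-1]))
    return fragments
```

Residues missing in every chain of the cluster are not DP residues under this definition, because they are never observed in an ordered state.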
Composition analysis
The number of each amino acid and its frequency in each data set were calculated using the PIR composition calculation tool. Composition Profiler was used to find statistically significant patterns of amino acid enrichment or depletion between the different sets of protein sequences. In this analysis, the fractional difference between the composition of the given protein set and that of the reference protein set was calculated for each amino acid, and the amino acids were then arranged in order of their flexibility. Ten thousand bootstrap iterations were performed at a significance level of 0.05, and the color scheme used to distinguish amino acids reflects their disorder propensity: disorder-promoting residues are colored red, order-promoting residues blue, and disorder-order neutral residues gray. Positive and negative values indicate amino acids that are enriched and depleted, respectively, in the given data set compared with the reference data set. The most enriched and depleted residues in each data set were determined from the P value table produced by Composition Profiler. A Scansite search was performed to determine whether any DP fragment acts as a functional signaling motif; Scansite identifies short protein sequence motifs that are recognized by different signaling proteins.
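The composition comparison can be sketched as follows. The fractional difference is computed here as $(C_{query} - C_{ref})/C_{ref}$ for each amino acid, so positive values mean enrichment and negative values depletion. The bootstrap p-value uses a pooled-residue null model chosen for illustration; it is an assumption and is not claimed to match Composition Profiler's own resampling procedure. All function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs: list[str]) -> dict[str, float]:
    """Frequency of each standard amino acid over all residues in the set."""
    counts = Counter(aa for s in seqs for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(query: list[str], reference: list[str]) -> dict[str, float]:
    """(C_query - C_ref) / C_ref for amino acids present in the reference set."""
    cq, cr = composition(query), composition(reference)
    return {aa: (cq[aa] - cr[aa]) / cr[aa] for aa in AMINO_ACIDS if cr[aa] > 0}


def bootstrap_p_value(query: list[str], reference: list[str], aa: str,
                      n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided bootstrap p-value for the frequency difference of `aa`.

    Assumed null model: both sets are drawn (with replacement) from the pooled residues.
    """
    rng = random.Random(seed)
    q = [a for s in query for a in s if a in AMINO_ACIDS]
    r = [a for s in reference for a in s if a in AMINO_ACIDS]
    pooled = q + r
    observed = abs(q.count(aa) / len(q) - r.count(aa) / len(r))
    extreme = 0
    for _ in range(n_iter):
        qs = rng.choices(pooled, k=len(q))
        rs = rng.choices(pooled, k=len(r))
        if abs(qs.count(aa) / len(qs) - rs.count(aa) / len(rs)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_iter + 1)
```

For toy sets `["AAKK"]` and `["AKKK"]`, Ala gives a fractional difference of 1.0 (enriched) and Lys about -0.33 (depleted); an amino acid would be reported as significant when its p-value is below 0.05.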
Clustering of amino acids
The preference of each amino acid for the structured, periplasmic and extracellular regions was determined from its distribution across these regions by clustering with the Generalized Association Plots (GAP) program. The same clustering was also performed using the disordered, ordered and DP fragment data sets. The input data are the distributions of each single amino acid across the different categories of fragments (for example, AA-order, AA-disorder and AA-DP), and the pairwise score between two amino acids is the Euclidean distance between their 3-dimensional vectors defined by these distributions. Any multivariate data set with n subjects and p variables contains 3 major pieces of information: (1) the linkage among the n subject points in the p-dimensional space, (2) the linkage among the p variable vectors in the n-dimensional space, and (3) the interaction linkage between the sets of subjects and variables.
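The pairwise scores that feed the clustering can be sketched as a Euclidean distance matrix. Each amino acid is assumed to be represented by a 3-dimensional vector of its frequencies in the ordered, disordered and DP data sets; the values in the usage note are hypothetical. This only builds the proximity matrix and does not reproduce GAP's own seriation or clustering.

```python
from __future__ import annotations

from math import dist


def distance_matrix(profiles: dict[str, tuple[float, float, float]]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of Euclidean distances between amino acid profile vectors.

    Each profile is assumed to be (frequency in ordered, in disordered, in DP) set.
    """
    order = sorted(profiles)
    matrix = [[dist(profiles[a], profiles[b]) for b in order] for a in order]
    return order, matrix
```

For example, `distance_matrix({"A": (0.0, 0.0, 0.0), "K": (3.0, 4.0, 0.0)})` returns the order `["A", "K"]` and a matrix with 5.0 off the diagonal; amino acids with small mutual distances would be expected to fall into the same cluster.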
Sequence analysis
The NCBI microbial BLAST program was used to find orthologous proteins from different species. Sequences from selected genomes were aligned using ClustalW, and the Jalview sequence editor was used to identify conserved residues in the alignment.
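A simple way to picture "conserved residues" is a sketch that reports alignment columns in which every sequence carries the same residue and no gap. This strict identity rule is an assumption made for illustration; Jalview's conservation annotation uses a more graded, property-based score, and the function name here is hypothetical.

```python
from __future__ import annotations


def conserved_columns(alignment: list[str]) -> list[int]:
    """0-based columns where all aligned sequences share one residue and none has a gap."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i, column in enumerate(zip(*alignment)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns
```

For the toy alignment `["ACD-E", "ACDKE", "AGD-E"]`, columns 0, 2 and 4 are fully conserved.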
Structural analysis
The SuperPose server was used to perform pairwise superposition of structures, and the PyMOL molecular viewer was used to view the superimposed structures and distinguish disordered and DP fragments.
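Once two chains have been superposed, the structural-dissimilarity test (RMSD > 3 Å) reduces to a root-mean-square deviation over matched atoms. The sketch below assumes the coordinates are already in a common frame (for example, after superposition) and are paired residue for residue, such as Cα atoms of observed residues; it does not perform the superposition itself, and the function names are hypothetical.

```python
from __future__ import annotations

from math import sqrt

Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD between paired, already superposed coordinates (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally long, non-empty coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


def structurally_dissimilar(coords_a: list[Coord], coords_b: list[Coord], cutoff: float = 3.0) -> bool:
    """Apply the RMSD > 3 Å criterion used to select structurally distinct chains."""
    return rmsd(coords_a, coords_b) > cutoff
```

Two toy chains offset by 2 Å along one axis give an RMSD of 2.0 Å and would therefore not count as structurally dissimilar.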