Insights on protein thermal stability: a graph representation of molecular interactions.

Mattia Miotto, Pier Paolo Olimpieri, Lorenzo Di Rienzo, Francesco Ambrosetti, Pietro Corsi, Rosalba Lepore, Gian Gaetano Tartaglia, Edoardo Milanetti.

Abstract

MOTIVATION: Understanding the molecular mechanisms of thermal stability is a challenge in protein biology. Indeed, knowing the temperature at which proteins are stable has important theoretical implications, which are intimately linked with properties of the native fold, and a wide range of potential applications from drug design to the optimization of enzyme activity.
RESULTS: Here, we present a novel graph-theoretical framework to assess thermal stability based on the structure without any a priori information. In this approach we describe proteins as energy-weighted graphs and compare them using ensembles of interaction networks. By investigating the position of specific interactions within the 3D native structure, we developed a parameter-free network descriptor that makes it possible to distinguish thermostable from mesostable proteins with an accuracy of 76% and an area under the receiver operating characteristic curve of 78%.
AVAILABILITY AND IMPLEMENTATION: Code is available upon request to edoardo.milanetti@uniroma1.it.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30535291      PMCID: PMC6662296          DOI: 10.1093/bioinformatics/bty1011

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Temperature is one of the most crucial factors organisms have to deal with in adapting to extreme environments (Rothschild and Mancinelli, 2001) and plays a key role in many complex physiological mechanisms (Chen and Shakhnovich, 2010). Indeed, a fundamental requirement for life at high temperatures is that organisms maintain functional and correctly folded proteins (Chen and Shakhnovich, 2010; Mozhaev et al.; Talley and Alexov, 2010). Accordingly, evolution shapes the energetic and structural placement of each residue–residue interaction so that the whole protein withstands thermal stress. Studying thermostability is fundamental for reasons ranging from the theoretical to the applied (Huang et al.), such as gaining insight into the physical and chemical principles governing protein folding (Amadei et al.; Brinda and Vishveshwara, 2005; Robinson-Rechavi and Godzik, 2005), and improving the thermal stability of enzymes to speed up chemical reactions in biopharmaceutical and biotechnological processes (Chen et al.; Daniel, 1996). Despite the strong interest in thermostability (Argos et al.; Bischof and He, 2005; Razvi and Scholtz, 2006), its prediction remains an open problem. As pointed out by Pucci and Alfano, a complete characterization of the thermal properties of a protein requires two quantities: (i) the thermodynamic stability, defined as the difference in free energy between the folded and unfolded states (ΔG) and (ii) the thermal resistance, described by the melting temperature (Tm). Here, we focus on thermal resistance, distinguishing proteins of high and low thermal stability on the basis of their Tm, experimentally defined as the temperature at which the concentration of the protein in its folded state equals the concentration in the unfolded state.
To date, computational approaches, both sequence- and structure-based, have exploited statistical analysis (Amadei et al.; Pucci et al., 2017), molecular dynamics (Manjunath and Sekar, 2013; Tavernelli et al.) and machine learning (Ku et al.; Wu et al.) to predict the melting temperature. Most studies are based on comparative analyses between pairs of homologs belonging to organisms of different thermophilicity (Mozo-Villarías et al.; Vogt et al.). Predicting the stability of a protein ab initio using a structure-based approach has not been achieved so far. The lack of success in this area is mostly due to our limited knowledge of the relationship between thermal resistance and the role of the interactions that stabilize a protein structure (Folch et al.). Some differences in amino acid composition or spatial arrangement of residues have been reported (Amadei et al.; Vijayabaskar and Vishveshwara, 2010; Vishveshwara et al.). One of the most notable differences involves salt bridges: hyperthermostable proteins have stronger electrostatic interactions than their mesostable counterparts (Lee et al.). Recently, Folch et al. (2008) reported that distinct salt bridges may be affected differently by temperature and that this might influence the geometry of these interactions as well as the compactness of the protein. Core packing seems related to thermal resistance, at least to some extent (Vogt and Argos, 1997). In addition, a lower number of cavities and a higher average relative contact order (i.e. a measure of non-adjacent amino acid proximity within a folded protein) have been observed when comparing thermostable proteins with their mesostable paralogs and orthologs (Robinson-Rechavi and Godzik, 2005). Notably, the hydrophobic effect and residue hydrophobicity seem to play a rather marginal role in protein stabilization (Priyakumar, 2012; Van den Burg et al.), although they are considered the main forces driving protein folding.
Here, we present a new analysis based on graph theory that reveals important characteristics of the energetic reorganization of intramolecular contacts between mesostable and thermostable proteins. In light of our results and to promote their application, we have designed a new computational method able to classify each protein as thermostable or mesostable using no information other than the 3D structure.

2 Materials and methods

2.1 Datasets

The Tm dataset, composed of proteins with known melting temperature (Tm), was obtained from the ProTherm database (Kumar et al.). Each protein of the dataset was manually checked to guarantee both the completeness of the structure and the reliability of the associated experimental melting temperature. The second dataset, consisting of manually collected proteins from hyperthermophilic organisms, is referred to as the Thyper dataset. The union of the two datasets, referred to as the Twhole dataset, comprises 84 proteins (see Supplementary Material for details) and constitutes, to the best of our knowledge, the largest structural dataset of proteins with well-defined thermal resistance, experimentally measured at physiological conditions.

2.2 Structural analysis

Proteins from both the Tm and Thyper datasets were analyzed for their secondary structure content and architecture according to the CATH Protein Structure Classification database (Sillitoe et al.). Per-residue secondary structure assignment was performed using the DSSP software (Kabsch and Sander, 1983). See Supplementary Material for details.

2.3 Network representation and analysis

In this work, protein structures are represented as Residue Interaction Networks (RINs), where each node represents a single amino acid aa. The nearest atomic distance between a given pair of residues aai and aaj is defined as dij. Two RIN nodes are linked together if dij does not exceed a cutoff distance (Phillips et al.; Vanommeslaeghe and MacKerell, 2012). Furthermore, links are weighted by the sum of two energetic terms: Coulomb (C) and Lennard-Jones (LJ) potentials (see Supplementary Material for more details). Network analysis was performed using the igraph package (Csardi and Nepusz, 2006) implemented in R (Ihaka and Gentleman, 1996). For each RIN, the Strength local parameter (Barrat et al.) is defined as:

$$ s_i = \sum_{j} E_{ij} \qquad (1) $$

where the Strength $s_i$ of the i-th residue is calculated as the sum of the energetic interactions ($E_{ij}$) between residue i and all the other residues j in contact with it.
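As an illustration only, the Strength defined above can be sketched in a few lines of Python (the analysis in this work relied on igraph in R). The helper name `residue_strength`, the residue labels and the energies of the toy network are hypothetical values chosen for the example; each link is assumed to already carry the summed C and LJ energy.

```python
from collections import defaultdict


def residue_strength(weighted_links):
    """Return {residue: Strength}, the sum of the energies of its links.

    weighted_links maps an unordered residue pair (i, j) to the link energy
    in kcal/mol; each link contributes to the Strength of both endpoints.
    """
    strength = defaultdict(float)
    for (i, j), energy in weighted_links.items():
        strength[i] += energy
        strength[j] += energy
    return dict(strength)


# Hypothetical toy network with made-up energies (kcal/mol).
toy_links = {
    ("A1", "A2"): -3.0,
    ("A1", "A3"): 1.5,
    ("A2", "A3"): -0.5,
}
toy_strength = residue_strength(toy_links)
```

In this toy case residue A1 has one favorable and one unfavorable link, so its Strength is the net of the two energies.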

2.4 Network randomization

In order to distinguish mesostable from thermostable proteins, we compare the Strength calculated in the real RIN against the same parameter obtained from an ensemble of random RINs (rRINs). We defined a Ts score as:

$$ T_s = \frac{\bar{s} - \langle \bar{s}_r \rangle}{\sigma} \qquad (2) $$

to estimate how much the mean Strength value of the original RIN deviates from the expected mean value of the rRIN distribution. Here $\bar{s}$ is the average of the Strength parameter for the RIN; $\langle \bar{s}_r \rangle$ and $\sigma$ are the mean and standard deviation of the average Strength values of the rRIN distribution. See Supplementary Material for details.
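A minimal sketch of such a score is given below, assuming it takes the form of a standard z-score: the real mean Strength minus the mean of the rRIN averages, divided by their standard deviation. The function name `ts_score`, the use of the sample standard deviation and the numerical values are assumptions made for illustration; the sign convention follows from the assumed ordering of the two terms.

```python
from statistics import mean, stdev


def ts_score(real_mean_strength, random_mean_strengths):
    """z-score of the real mean Strength against the rRIN ensemble.

    random_mean_strengths holds one average Strength per random network.
    The sample standard deviation is used here (an assumption).
    """
    mu = mean(random_mean_strengths)
    sigma = stdev(random_mean_strengths)
    if sigma == 0:
        raise ValueError("random ensemble has zero spread")
    return (real_mean_strength - mu) / sigma


# Made-up numbers (kcal/mol): a real network whose mean Strength is more
# negative than every member of its randomized ensemble.
example_ts = ts_score(-30.0, [-26.0, -24.0, -25.0, -27.0, -23.0])
```

With these numbers the ensemble mean is −25 kcal/mol, so the score is negative under the assumed ordering.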

2.5 Performance evaluation

We evaluated the performance of the Ts score in discriminating between thermostable and mesostable proteins by seven-fold cross-validation. The mesostable proteins of the Tm dataset were divided into seven groups, ensuring that the number of residues and the Tm values were as broadly distributed as possible. Details are reported in Supplementary Material.
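One simple way to obtain groups that each span a broad range of values is to sort the proteins and deal them out in turn, as sketched below. This is only an assumed stand-in: the actual grouping procedure is described in the Supplementary Material, the sketch balances a single key (Tm) rather than both Tm and number of residues, and the function name and toy records are hypothetical.

```python
def round_robin_folds(proteins, n_folds=7, key=lambda p: p["tm"]):
    """Split proteins into n_folds groups that each cover the key range.

    Proteins are sorted by the key and assigned to folds cyclically, so
    consecutive values never end up in the same fold.
    """
    ordered = sorted(proteins, key=key)
    folds = [[] for _ in range(n_folds)]
    for rank, protein in enumerate(ordered):
        folds[rank % n_folds].append(protein)
    return folds


# Hypothetical records: 14 proteins with made-up melting temperatures.
toy_proteins = [{"name": f"p{k}", "tm": 40.0 + k} for k in range(14)]
toy_folds = round_robin_folds(toy_proteins)
```

Each fold then contains one protein from the lower half and one from the upper half of the toy Tm range.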

2.6 Clustering and principal component analysis

We clustered the Ts descriptors using the Euclidean distance and the Ward method as linkage function (Ward, 1963) via the ‘hclust’ function of the ‘stats’ package of R (Ihaka and Gentleman, 1996). Principal component analysis (PCA) was performed over eight graph-based descriptors using the ‘princomp’ function of R, with the correlation matrix used for the analysis (Venables and Ripley, 1997). Each descriptor was computed using the corresponding function of the R igraph package. We refer to Supplementary Material for details.

3 Results

3.1 Uncovering the differences in energetic organization

Aiming to understand the basic mechanisms that allow proteins to remain functional at high temperature, we focused on the non-bonded interactions that play a stabilizing role in structural organization (Chakrabarty and Parekh, 2016). In particular, we considered only residue–residue interactions, neglecting protein–solvent ones, since a quantitative appraisal of their role would require a dynamical approach (Chong et al.). To investigate how different thermal properties are influenced by the energy distribution at different layers of structural organization, we analyzed the interactions occurring in proteins of the Twhole dataset (see Section 2). To describe the role of single residues in the complex connectivity of the whole protein, we adopted a graph-theory approach describing each protein by its RIN: each residue is represented as a node and links between residues are weighted with non-bonded energies (as described in Section 2). At first, we investigated the relationship between thermostability and the energy distribution of intramolecular interactions. To this end, the Tm dataset was divided into eight groups according to protein Tm and for each group the energy distribution was evaluated, as shown in Figure 1a. The general shape of the density functions is almost identical across the eight cases, independently of the thermal properties of the macromolecules, and this is due to the general energetic requirements of folding.
Fig. 1.

(a) Probability density distributions of total interaction energies for the eight subsets defined in the Tm dataset, from lower (dark blue) to higher (dark red) Tm. Each distribution is built using a group of proteins whose melting temperatures lie in the same range. The density functions depend on the melting temperature range, and peak heights increase with temperature. (b) Correlation between the area of each density peak and the average Tm for the eight groups. (c) Probability density distributions in log-scale of total interaction energies for mesostable (blue) and thermostable (red) proteins belonging to the Twhole dataset. (d) Probability density distributions in log-scale of the Strength network parameter for mesostable (blue) and thermostable (red) proteins belonging to the Twhole dataset. Insets show the distributions in log-scale obtained using all proteins. (e) Schematic representation of the strong favorable and unfavorable interactions for both a mesostable (left) and a thermostable (right) network (Color version of this figure is available at Bioinformatics online.)

A strong dependence between thermal stability and the percentage of strong interactions is evident from the arrangement of the density curves (Fig. 1a): the higher the thermal stability, the higher the probability of finding strong interactions. Conversely, less thermostable proteins possess a larger number of weak interactions. In particular, as shown in Figure 1a–c, it is possible to identify three ranges of energies that correspond to three peaks of probability density, i.e. a very strong favorable energy region (E < −70 kcal/mol), a strong favorable energy region between −70 and −13 kcal/mol, and a strong unfavorable interaction region (E > 11 kcal/mol).
More formally, for a protein the probability of having an interaction with energy E, P(E), in the three ranges depends linearly on the protein melting temperature, with correlation coefficients of 0.90, 0.85 and 0.87, respectively (Fig. 1b). In order to have strong-signal sets, we reduced the division to just two groups, classifying proteins as mesostable or thermostable if their melting temperatures are, respectively, lower or higher than a threshold set at the optimal reaction temperature of thermophilic enzymes (Brock, 1985; Serre and Duguet, 2003). In this way the eight energy distributions of Figure 1a reduce to the mesostable and thermostable distributions of Figure 1c (Twhole dataset). The two-group division allows us to include the hyperthermophilic proteins in our analysis, since their Tm is higher than the threshold. The two resulting distributions, found to be significantly different by the non-parametric Kolmogorov–Smirnov test (Marsaglia et al.), have an expected value at −0.5 kcal/mol, and negative interactions have a probability of more than 60% of being found. Regions below −13 kcal/mol and above 11 kcal/mol represent 6.6% and 5.2% of the total energy for thermostable and mesostable proteins, respectively. Typically, such energies require the presence of at least one polar or charged amino acid; in particular, Arg, Asp, Glu and Lys are involved in more than 90% of the interactions. Notably, the small fraction of energies centered near −120 kcal/mol (see Fig. 1a) is due to polar or charged amino acid interactions taking place at short distance. Next, we investigated the residue Strength, defined for each node as the sum of the weights of its links (see Section 2). The two Strength distributions for mesostable and thermostable proteins are shown in Figure 1d. In this case too, they are different according to the Kolmogorov–Smirnov test.
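For readers unfamiliar with the two-sample Kolmogorov–Smirnov comparison used here, the test statistic is the largest gap between the two empirical cumulative distributions. The sketch below computes only that statistic, not the P-value, on made-up samples; the function name `ks_statistic` is hypothetical and the code is not the implementation used in this work.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every data point of either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up energy samples (kcal/mol) standing in for two protein groups.
identical = ks_statistic([-3.0, -1.0, 0.5], [-3.0, -1.0, 0.5])
disjoint = ks_statistic([-9.0, -8.0], [1.0, 2.0])
interleaved = ks_statistic([1.0, 3.0], [2.0, 4.0])
```

The statistic ranges from 0 for identical samples to 1 for samples that do not overlap at all.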
For the first time, our analysis provides both a general intuition about protein folding and a specific insight into thermal stability. Even if strong positive and strong negative peaks have comparable heights (Fig. 1c), the rearrangement of protein side chains masks the positive interactions, substantially preventing the concentration of unfavorable interactions in a single residue, as shown by the small probability of finding a residue with positive Strength. Indeed, for the whole dataset the probability of finding a residue with negative Strength is more than 97%. The most frequent value is found at −27 kcal/mol, with a change in the slope of the density functions around −70 and 5 kcal/mol, corresponding to the regions with negative and positive Strengths. At the Strength level of organization, a difference between thermostable and mesostable proteins is found. Indeed, residues belonging to thermostable proteins show a higher probability of having highly negative Strength values than those of mesostable proteins, indicating an overall higher compactness of the thermostable protein fold. Figure 1e shows a schematic representation of the organization of strong energies for both mesostable and thermostable proteins. The most important finding is that thermostable proteins have more favorable energies concentrated in a few specific residues. In contrast, mesostable proteins tend to have a less organized network of negative residue–residue interactions. Given this difference in how proteins with different thermal properties arrange their amino acid side chains, we mapped the energetic interactions between the protein secondary structures (helix–helix, helix–strand, helix–loop, strand–strand, strand–loop and loop–loop) in order to study how energetic allocation is reflected at a higher level of organization.
Looking at the difference in energy of a specific class of interaction with respect to the average, we found that thermostable proteins preferentially gather their energy through helix–loop interactions. These results suggest a stabilizing role for this class (see Supplementary Material for details).

3.2 Assessing protein thermal stability

In the light of our findings on the energetic difference between mesostable and thermostable proteins, we looked for a way to assess the thermal resistance of a protein given its structure. The simplest way to quantify the impact of energy distribution on thermal resistance is the comparison with a protein of the same structure but different energy organization, i.e. a homolog (Yang et al.). Ideally, differences between two homologous proteins with different thermal stability are attributable only to their different thermal resistance. The pronounced reorganization of the interactions in thermostable proteins confirms that they undergo an evolutionary optimization process which introduces fold-independent correlations in the spatial distribution of the interactions. By contrast, mesostable proteins do not have these correlations; thus, with respect to thermal stability, their energy organization can be considered more random. We designed a procedure that compares a given protein with modified versions of itself in which the protein structure is preserved, while chemical interactions have energies typical of mesostable proteins, randomly assigned in a physical way, i.e. maintaining residue–residue distance information (see Section 2). This randomization strategy provides a way to compare each real protein network with an ensemble of re-weighted cases, having the same number of nodes and links but new weights (i.e. energies). These energies are extracted from the mesostable energy distribution using the interaction distance as a constraint for the sampling. This procedure has the purpose of disrupting the effects of evolutionary optimization and is expected to have a larger effect on the highly organized network of thermostable proteins. By virtue of the different energy distributions of mesostable and thermostable proteins, sampling mesostable energies allows us to properly assess the difference between the real thermostable protein network and its randomized counterpart.
All steps of our method are schematically illustrated in Figure 2. In particular, given a link characterized by an energy weight E and by an interaction distance d, we replaced the energy with a new one extracted from an energy distribution defined for the specific distance interval to which d belongs. For each distance interval k, we generated a probability density function using only the energy values observed in that interval in the mesostable proteins. At the end of the process, for each real RIN, we generated an ensemble of random networks (rRINs). The randomization allows us to develop a classifier based on the distance between the real network Strength and the random Strength distribution. The Ts score, defined in Equation (2) (see Section 2), is a measure of how much the average Strength value of the original RIN deviates from the expected average value of the rRIN distribution. Note that our descriptor is general and parameter-free and can be computed for every kind of weighted graph. The Ts score can be used as a thermal stability classifier by setting the threshold value at 0, i.e. considering as true all predictions for which the Ts score is higher (resp. lower) than 0 and the protein Tm is higher (resp. lower) than the threshold temperature. The method so defined is completely parameter-free: it only requires a probability density of mesostable protein interactions. In order to evaluate a possible dependence of the method on the chosen dataset, we performed a seven-fold cross-validation (see Section 2) using the Ts score computed with the total energy Strength. The method achieves an average accuracy of 72 ± 3%, with a mean receiver operating characteristic (ROC) curve characterized by an area under the curve (AUC) value of 80 ± 2%. The small error on both performance measures (due to the dimensions of the dataset) indicates the independence of the method from the input information.
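The re-weighting step described above can be sketched as follows. This is an illustrative approximation rather than the implementation used in this work: energies are drawn directly from the pool of observed mesostable energies in each distance bin instead of from a fitted density function, the 0.5 Å bin width is taken from the example interval shown in Figure 2, links whose bin has no mesostable energy keep their original value, and all function names and numerical values are hypothetical.

```python
import random
from collections import defaultdict


def bin_index(distance, bin_width=0.5):
    """Index of the distance interval (in Angstrom) a contact falls in."""
    return int(distance // bin_width)


def build_energy_pools(mesostable_contacts, bin_width=0.5):
    """Group observed mesostable energies by distance interval."""
    pools = defaultdict(list)
    for distance, energy in mesostable_contacts:
        pools[bin_index(distance, bin_width)].append(energy)
    return pools


def randomize_network(links, pools, rng, bin_width=0.5):
    """Keep nodes, links and distances; resample each link energy."""
    randomized = {}
    for pair, (distance, energy) in links.items():
        pool = pools.get(bin_index(distance, bin_width))
        # Assumption: an empty bin leaves the original energy untouched.
        new_energy = rng.choice(pool) if pool else energy
        randomized[pair] = (distance, new_energy)
    return randomized


def mean_strength(links):
    """Average Strength: each link energy counts for both endpoints."""
    residues = {residue for pair in links for residue in pair}
    total = sum(energy for _, energy in links.values())
    return 2.0 * total / len(residues)


# Made-up mesostable contacts (distance in Angstrom, energy in kcal/mol).
mesostable_contacts = [(3.1, -4.0), (3.4, -2.5), (8.2, -0.1), (8.4, 0.2)]
pools = build_energy_pools(mesostable_contacts)

# Made-up real network: residue pair -> (distance, energy).
links = {("R10", "E45"): (3.2, -60.0), ("L5", "F80"): (8.4, -0.3)}

rng = random.Random(0)
ensemble = [randomize_network(links, pools, rng) for _ in range(100)]
random_means = [mean_strength(network) for network in ensemble]
```

The list `random_means` plays the role of the random Strength distribution against which the mean Strength of the real network, `mean_strength(links)`, is compared.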
Fig. 2.

Given a protein structure, our method represents it as a RIN (a). (b) The minimal atom–atom distance (8.4 Å in the example), for each residue pair, is calculated. The energy value (green line on the sketch) related to each contact is replaced with another one (yellow), randomly extracted from the energy distribution of mesostable protein contacts lying in the same distance interval (8–8.5 Å in the example). Performing this procedure for each pair, a new network of intramolecular interactions is established characterized by a new energy organization. Reiterating the process, we obtain an ensemble of random networks (c). (d) Finally, for each random network the average Strength parameter is calculated, obtaining a Strength distribution. Green line represents the mean Strength value of the real network, while red and blue region in the random Strength distribution show the classification criterion: if real Strength lies in red (resp. blue) region the protein is classified as thermostable (resp. mesostable) (Color version of this figure is available at Bioinformatics online.)

Classifying on the threshold of the Ts score, i.e. treating Ts as a binary variable, does not fully exploit the information contained in the descriptor. In order to obtain a more sensible classification, we evaluated three different scores, using the total energy and specific interaction terms, i.e. the C and LJ interactions (see Supplementary Material), and performed a clustering analysis. Figure 3a shows the hierarchical clustering of all the proteins of our Twhole dataset, obtained using the Ward method as linkage function and the Manhattan distance among the three descriptors as distance metric. We also tested different metrics and clustering methods, obtaining very similar results (data not shown). The optimal clustering cut was estimated using the Connectivity, Dunn and Silhouette parameters, which indicate the two-group division as the optimal one.
We called these groups ‘Mesostable’ (right group in Fig. 3a) and ‘Thermostable’ (left group). Indeed, the right cluster, containing 47 proteins, includes predominantly mesostable proteins (38), while the left cluster contains 26 thermostable proteins out of a total of 37. The overall accuracy of the method is 76%: we assign the correct thermal stability to 64 out of 84 proteins. The AUCs of the ROC curves for the three Ts descriptors are 78, 79 and 68% (see Fig. 3b).
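The accuracy figure follows directly from the cluster counts quoted above, as the short check below shows. Only the counts come from the text; the variable names are illustrative.

```python
# Counts quoted in the text for the two clusters of the Twhole dataset.
mesostable_cluster_size = 47
mesostable_in_mesostable_cluster = 38
thermostable_cluster_size = 37
thermostable_in_thermostable_cluster = 26

# A protein is correctly assigned when its cluster matches its class.
correct = mesostable_in_mesostable_cluster + thermostable_in_thermostable_cluster
total = mesostable_cluster_size + thermostable_cluster_size
accuracy = correct / total

print(f"{correct}/{total} = {accuracy:.1%}")  # prints "64/84 = 76.2%"
```

The remaining 20 proteins are the 9 thermostable proteins in the ‘Mesostable’ cluster and the 11 mesostable proteins in the ‘Thermostable’ cluster.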
Fig. 3.

(a) Clustering of the Twhole dataset proteins with three Strength-based descriptors, i.e. C, LJ and total energy. Stars indicate proteins of the Thyper dataset. The discrimination between the two groups was assessed with Fisher’s exact test. (b) ROC curves of the three descriptors with the whole network Ts scores

3.3 Key residues identification

Here, we investigated the thermal resistance properties of proteins at the residue level. As protein stability is the result of the cooperative effects and the synergic actions of several residues, assessing the specific contribution of each amino acid is difficult (Sadeghi et al.). We define a per-residue score (see Supplementary Material), creating two groups of residues for each protein: those with a score lower than zero and those with a score higher than zero. We consider residues belonging to the first group to have a more stabilizing role than those in the second group. Consequently, along the lines of the global-protein classification procedure, we defined as ‘thermostable’ (respectively ‘mesostable’) the residues belonging to the first (second) group. Using a total energy-based score, thermostable residues account for a fraction of the total residues. In the C network (see Fig. 4a), the most frequent thermostable amino acids are the four charged amino acids Arg, Asp, Glu and Lys, which cover 96.6% and 96.1% of thermostable residues in thermostable and mesostable proteins, respectively. Apolar and aromatic residues (Leu, Met, Phe and Tyr) are typically thermostable residues of the van der Waals (vdW) network, including 53% and 54% of the total residues in mesostable and thermostable proteins, respectively (see Fig. 4b).
Fig. 4.

(a, b) Frequencies of thermostable amino acids for thermostable (red) and mesostable (blue) proteins. Frequencies of all the amino acids are shown in gray. (c, d) Projection along the first two principal components of all residues. Thermostable residues for mesostable (resp. thermostable) proteins are indicated in green (orange) dots. All residues are mapped in LC space. In red, the Arg, Asp, Glu and Lys amino acids are shown as the most frequent thermostable residues of the C network. In yellow dots, Tyr, Phe, Leu and Met are shown as the most frequent thermostable amino acids of the vdW one. In the middle, cartoon representation of Yfh1 and multiple alignment with thermostable and mesostable residues colored in shades of red and blue (Color version of this figure is available at Bioinformatics online.)

In order to investigate the role of each residue in the complexity of the whole system, we analyzed the properties of all residues using a graph-theory approach, calculating eight network parameters (see Section 2). A PCA was performed on both kinds of network. In Figure 4c and d, all residues are projected along the first two principal components. Thermostable residues are neatly separated from the others if we consider the largest eigenvalue of the PCA in the C network, and more weakly if we take into account the second and third ones (see Supplementary Material for details). Generally, charged residues form highly energetic electrostatic cages which prevent water inclusion (Levy and Onuchic, 2004; Sabarinathan et al.), while apolar and aromatic amino acids form short-ranged vdW interactions that confer stability to the overall structure (Lanzarotti et al.; Paiardini et al.). Here we identify key residues whose peculiar spatial disposition confers on them a particular role in the stabilization of the protein.
Notably, our approach, based on a heterogeneous dataset, permits us to confirm and generalize the stabilizing role of both the charged and apolar/aromatic residues formerly suggested by homolog-based studies. The mean shortest path (L) and the clustering coefficient (C) are able to capture the effect of the thermostable residues on maintaining these important structural motifs. The former provides information about the position of the residue in the network, with the most central residues having higher shortest path values. The latter quantifies the packing of the residue’s surroundings, being the ratio between the actual links and the maximal number of possible links (Atilgan et al.; Vendruscolo et al.). In Figure 4c (left panel), we projected all residues in the LC plane, coloring the charged thermostable residues in dark red and the charged non-thermostable residues in cyan. Charged residues are concentrated in the region characterized by both small L and C values, with their thermostable subset tending to possess the smallest possible value of C. This means that thermostable residues have to be both exposed and surrounded by residues that make low-energy interactions with each other. In analogy with the Coulombic networks, we projected in the LC plane the four kinds of key residues identified in the vdW networks. Even if the signal is weaker, key residues in the thermostable vdW network (Leu, Met, Phe, Tyr) tend to possess a higher clustering coefficient, reflecting the packing-stabilizing effect of vdW interactions. The densities of the C parameter are found to be different according to the non-parametric Kolmogorov–Smirnov test. These findings allow us to divide residues into eight groups: four groups are identified by the C interaction, i.e. thermostable charged/uncharged residues and non-thermostable charged/uncharged residues; the vdW interaction networks likewise give four groups, dividing residues according to whether they are thermostable or non-thermostable and whether or not they belong to the Leu-Met-Phe-Tyr group.
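To make the two node-level parameters concrete, the sketch below computes them on a small unweighted toy graph. It is a simplification under stated assumptions: the networks in this work are energy-weighted and were analyzed with igraph in R, whereas here links are unweighted, path lengths count hops, and the function names and the toy adjacency are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Actual links among the neighbours over the maximum possible."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    actual = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2 * actual / (k * (k - 1))


def mean_shortest_path(adjacency, source):
    """Average hop distance from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    others = [d for node, d in distance.items() if node != source]
    return sum(others) / len(others) if others else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on a.
toy_adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_of_a = clustering_coefficient(toy_adjacency, "a")
l_of_d = mean_shortest_path(toy_adjacency, "d")
```

In the toy graph only one of the three possible links among the neighbours of node a is present, which gives that node a low clustering coefficient.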
For each protein of the Twhole dataset it is possible to compute the sum of the scores in each of the eight possible groups, obtaining a vector of eight descriptors for each protein. Performing a linear regression with the four C-based vector components, the four vdW-based ones and the whole eight-component vector, we obtain preliminary AUCs of the ROC curves of 81, 77 and 83%, respectively (see Supplementary Material), and we are currently developing a residue-specific approach for Tm prediction.
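The construction of the eight-component vector can be sketched as below. The sketch assumes that each residue carries one C-based and one vdW-based score, that a score below zero marks a thermostable residue (a score of exactly zero is treated as non-thermostable), and that the components are ordered as listed in the comments; the field names, the function name and the example values are hypothetical.

```python
CHARGED = {"ARG", "ASP", "GLU", "LYS"}
VDW_KEY = {"LEU", "MET", "PHE", "TYR"}


def eight_descriptors(residues):
    """Sum per-residue scores into eight groups.

    Components 0-3 use the C-based score:
      thermostable charged, thermostable uncharged,
      non-thermostable charged, non-thermostable uncharged.
    Components 4-7 use the vdW-based score:
      thermostable key, thermostable non-key,
      non-thermostable key, non-thermostable non-key,
    where 'key' means Leu, Met, Phe or Tyr.
    """
    vector = [0.0] * 8
    for residue in residues:
        aa = residue["aa"]
        c_index = (0 if residue["ts_c"] < 0 else 2) + (0 if aa in CHARGED else 1)
        v_index = 4 + (0 if residue["ts_vdw"] < 0 else 2) + (0 if aa in VDW_KEY else 1)
        vector[c_index] += residue["ts_c"]
        vector[v_index] += residue["ts_vdw"]
    return vector


# Hypothetical three-residue protein with made-up scores.
toy_residues = [
    {"aa": "ARG", "ts_c": -2.0, "ts_vdw": 0.5},
    {"aa": "LEU", "ts_c": 1.0, "ts_vdw": -1.5},
    {"aa": "GLU", "ts_c": -1.0, "ts_vdw": 0.25},
]
toy_vector = eight_descriptors(toy_residues)
```

Vectors of this kind, one per protein, would then serve as the inputs of the regression.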

3.4 Frataxin: a particular case of study

As a further application of our method, we investigated the stability of Yfh1, the yeast ortholog of frataxin. This highly conserved protein family has been studied in depth because in humans it is responsible for the neurodegenerative disease Friedreich’s ataxia. Furthermore, Yfh1 displays a very peculiar thermal stability behavior (Adrover et al., 2012; Pastore et al.). In fact, several experimental studies show that both cold and heat denaturation of Yfh1 occur at experimentally accessible temperatures under physiological conditions, at 5 and 35 °C, respectively. This is very rare, since cold denaturation usually occurs at very low temperatures, below the freezing point of water, which makes the phenomenon very rarely observed in wild-type proteins. To investigate the origins of the marginal stability exhibited by Yfh1, we compared the global and local thermal resistance analysis of Yfh1 with those of its bacterial (CyaY) and human (hFrata) orthologs, which are thermally stable up to 54 and 58 °C, respectively, under physiological conditions. Our global descriptor correctly classifies all three proteins as mesostable with a positive global Ts score. We then proceeded to assess the local stability by computing the score for each residue (see Supplementary Table S5). In particular, we focused on one cluster of charged amino acids experimentally identified by Sanfelice et al. This cluster is composed of residues D101, E103 and E112, which interact across two different strands of the beta sheet (β1 and β2) and are therefore regarded as responsible for the structural stability. Interestingly, according to our scores (see Fig. 4), all these residues are less stabilizing in Yfh1 than in CyaY and hFrata, despite their evolutionary conservation.
Another feature unveiled by the local analysis is the presence of neighboring highly ‘mesostable’ and ‘thermostable’ residues, such as those at the beginning of the destabilizing flexible region of the Yfh1 N-terminal loop (Adrover et al.), which are absent in CyaY and hFrata (see 3D structure in Fig. 4). The good agreement between our predictions and experiments shows the capability of the method to determine thermal stability properties from both a global and a local point of view.

4 Discussion

Our work aims to represent a step toward understanding the thermal properties of a protein given its 3D structure. While the axiom ‘thermophilic organisms have thermostable proteins’ is certainly correct, some mesophilic proteins may be thermostable as well (Pucci and Rooman, 2017). Knowledge of the organism’s optimal growth temperature, Tenv, used to classify mesophiles and thermophiles, may be misleading: its high correlation with Tm arises because Tenv is always a lower bound for Tm. The basic idea behind our method relies on the assumption that thermostable proteins undergo an optimization process during evolution that leads to a specific structural arrangement of their energy interactions. Our analysis is based on a RIN in which the 3D structure of a protein is schematized as a graph, with the residues acting as nodes and the molecular interactions as links. In our definition of the network, links are weighted according to the sum of two non-bonded energy terms: the electrostatic and the LJ potential. The analysis of the distribution of energies (links) highlighted the correlation between the thermal stability of protein sets (grouped according to their Tm) and the probability of finding strong intramolecular interactions, with a highest correlation of 0.90 when considering eight groups of proteins (Fig. 1). Unfortunately, the size of the dataset does not allow it to be divided into more groups, nor could we consider the energy distribution of a single protein, because the small number of links makes the statistics noisy, especially in the strong-energy regions. Moreover, moving to higher orders of organization, e.g. considering the individual residue energies (Strength parameter), further reduces the data. For this reason, the subsequent analysis was performed with a two-group division of the dataset.
Interestingly, we found that not only strong negative energies determine the thermal stability of a protein: strong positive interactions also play a role. This finding confirms the complex nature of the protein interaction network; indeed, the stabilizing role of repulsive energies can be explained in cases where repulsion between a pair of residues results in a better spatial arrangement of protein regions. To better grasp the role of the arrangement of favorable and unfavorable energies, we determined the stabilizing contribution of each amino acid by defining the residue Strength [Equation (1)]. Indeed, this parameter gives an estimate of the significance of a residue in the overall protein architecture and can be used both as a local property of each individual amino acid and as a global average network feature of the entire protein. Moving to a higher level of organization, we investigated the role of secondary structure interactions in thermal stability. The interactions between residues belonging to alpha helices and loops concentrate more energy in thermostable proteins than in mesostable ones. These results suggest that the thermal stability of a given protein is deeply linked both to the intensity of its interactions and to their spatial arrangement, and that both are fine-tuned during the evolutionary process. In order to assess the thermal stability, we investigated the energy organization of the network and compared it against an ensemble of randomized networks. The ensemble comparison has two main purposes. The first is to overcome the need for pairs of homologous proteins for direct comparison.
The second purpose arises from the observation that thermostable proteins are enriched in highly connected nodes (hubs) and have more organized interaction networks than mesostable proteins (Jonsdottir et al.; Kumar et al., 2000; Pucci and Rooman, 2016). It lies in the need to introduce a quantitative measure of the evolutionary optimization process that thermostable proteins underwent, i.e. the distance between the real protein interaction network and a randomized one, in which we disrupt the optimization of energy achieved by thermostable proteins during evolution. As described in the Methods section, the energies of a randomized network are always drawn from a distribution of mesostable protein interactions. In this way, the more the original network deviates from the ensemble, the higher the probability that the protein belongs to the thermostable class. Moreover, the comparison allows us to assess quantitatively the effect of the energetic topology of the protein. Using this protocol to build the parameter-free Ts descriptor and performing a cluster analysis, we are able to discriminate between mesostable and thermostable proteins, with a maximum accuracy of 76% and a maximum AUC of 78%. Finally, we investigated whether evolution acts on particular residues to optimize protein thermal stability or whether stability is given by a cooperative effect, with evolution acting on the whole protein. Our analysis identifies two sets of key (thermostable) residues according to the kind of energetic interaction the network is built with (C or vdW). Surprisingly, the frequency of thermostable residues in thermostable and mesostable proteins is comparable, and they represent only a small subset of all residues. This single-residue approach allows us to explore the local contributions to global stability and sheds light on peculiar cases of marginal thermal stability. In particular, we investigated the case of the Yfh1 protein, the yeast ortholog of frataxin.
Our global descriptor correctly classifies the protein as mesostable, while our residue-based descriptors allow us to identify stabilizing/destabilizing regions in agreement with previous works (Sanfelice et al.). In general, a complete description of the cold denaturation process needs to include the water–residue interactions explicitly, since it has been postulated (Privalov, 1990) and partially confirmed through molecular dynamics simulations at the specific unfolding temperature (Adrover et al.) that such interactions play a paramount role in driving denaturation. In order to better understand the theoretical aspects of thermostability and to improve the classification for use in more applied fields, we created a new parameter-dependent Ts score given by a linear combination of the Ts scores of the eight possible sets of residues (see Section 3). The improved performance, an AUC of the ROC curve of 83%, highlights the promise of the single-residue approach.
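The network definition described above, with links weighted by the sum of an electrostatic and an LJ term, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this work: each residue is reduced to a single charged site, the Coulomb constant corresponds to vacuum in kcal/mol units, and the cutoff, `eps` and `sigma` values are hypothetical. The `strength` function uses the common weighted-network definition (sum of the weights of the links incident on a node), which may differ from Equation (1).

```python
import math

# Assumed Coulomb constant in kcal*Angstrom/(mol*e^2), vacuum (no dielectric screening).
COULOMB_K = 332.0637


def coulomb(q1, q2, r):
    """Electrostatic energy between two point charges at distance r (Angstrom)."""
    return COULOMB_K * q1 * q2 / r


def lennard_jones(eps, sigma, r):
    """12-6 Lennard-Jones potential: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


def build_weighted_network(sites, cutoff=12.0, eps=0.1, sigma=3.5):
    """Build the links of an energy-weighted graph.
    `sites` maps a residue name to (x, y, z, charge); one site per residue
    is an assumption made only to keep the sketch short."""
    names = sorted(sites)
    links = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xa, ya, za, qa = sites[a]
            xb, yb, zb, qb = sites[b]
            r = math.dist((xa, ya, za), (xb, yb, zb))
            if r <= cutoff:
                links[(a, b)] = coulomb(qa, qb, r) + lennard_jones(eps, sigma, r)
    return links


def strength(links, node):
    """Sum of the energies of all links that involve `node`."""
    return sum(e for pair, e in links.items() if node in pair)


# Hypothetical coordinates and charges, for illustration only.
toy_sites = {
    "ASP": (0.0, 0.0, 0.0, -1.0),
    "LYS": (4.0, 0.0, 0.0, 1.0),
    "LEU": (0.0, 30.0, 0.0, 0.0),
}
toy_links = build_weighted_network(toy_sites)
```

In this toy example only the ASP-LYS pair falls within the cutoff, so the two charged sites share a single attractive link and the distant LEU site has zero strength.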
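The comparison between a real network and an ensemble of randomized networks, discussed above, can be illustrated with a simple z-score. This sketch is only one possible reading of the idea and is not the definition of the Ts descriptor: it keeps the topology fixed, redraws the energy of every link incident on a node from a reference pool of energies, and measures how far the observed node strength lies from the randomized ones. The pool `mesostable_pool`, the links `toy_links` and the function names are hypothetical.

```python
import random
import statistics


def randomized_strengths(links, node, energy_pool, n_random=500, seed=0):
    """Strength of `node` in `n_random` randomized networks, obtained by
    redrawing each incident link energy from `energy_pool` (with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    incident = [pair for pair in links if node in pair]
    return [
        sum(rng.choice(energy_pool) for _ in incident)
        for _ in range(n_random)
    ]


def strength_zscore(links, node, energy_pool, n_random=500, seed=0):
    """Distance of the observed strength from the randomized ensemble,
    in units of the ensemble standard deviation (0.0 if the ensemble is constant)."""
    observed = sum(e for pair, e in links.items() if node in pair)
    ensemble = randomized_strengths(links, node, energy_pool, n_random, seed)
    mu = statistics.fmean(ensemble)
    sd = statistics.pstdev(ensemble)
    return 0.0 if sd == 0 else (observed - mu) / sd


# Hypothetical link energies and reference pool, for illustration only.
toy_links = {("R1", "R2"): -5.0, ("R1", "R3"): -4.0, ("R2", "R3"): 0.3}
mesostable_pool = [-1.0, -0.5, 0.0, 0.5]

z_r1 = strength_zscore(toy_links, "R1", mesostable_pool)
```

A strongly negative z-score for a node, as for "R1" here, means that its interactions are more favorable than expected from the reference pool, which is the kind of deviation from the ensemble that the discussion associates with thermostability.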
References

1. S Kumar; C J Tsai; R Nussinov. Factors enhancing protein thermostability. Protein Eng, 2000.
2. L J Rothschild; R L Mancinelli. Life in extreme environments. Nature, 2001.
3. M Vendruscolo; E Paci; C M Dobson; M Karplus. Three key residues form a critical contact network in a protein folding transition state. Nature, 2001.
4. Ali Rana Atilgan; Pelin Akan; Canan Baysal. Small-world communication of residues and significance for protein dynamics. Biophys J, 2004.
5. Marie-Claude Serre; Michel Duguet. Enzymes that cleave and religate DNA at high temperature: the same story with different actors. Prog Nucleic Acid Res Mol Biol, 2003.
6. Angel Mozo-Villarías; Juan Cedano; Enrique Querol. A simple electrostatic criterion for predicting the thermal stability of proteins. Protein Eng, 2003.
7. Ivano Tavernelli; Simona Cotesta; Ernesto E Di Iorio. Protein dynamics, thermal stability, and free-energy landscapes: a molecular dynamics investigation. Biophys J, 2003.
8. A Barrat; M Barthélemy; R Pastor-Satorras; A Vespignani. The architecture of complex weighted networks. Proc Natl Acad Sci U S A, 2004.
9. Yaakov Levy; José N Onuchic. Water and proteins: a love-hate relationship. Proc Natl Acad Sci U S A, 2004.
10. Marc Robinson-Rechavi; Adam Godzik. Structural genomics of Thermotoga maritima proteins shows that contact order is a major determinant of protein thermostability. Structure, 2005.