"Hot cores" in proteins: comparative analysis of the apolar contact area in structures from hyper/thermophilic and mesophilic organisms.

Alessandro Paiardini, Riccardo Sali, Francesco Bossa, Stefano Pascarella.

Abstract

BACKGROUND: A wide variety of stabilizing factors have been invoked to explain the structural basis of protein thermostability. These include, among others, a higher number of ion-pair interactions and hydrogen bonds, together with better packing of hydrophobic residues. It has frequently been observed that the packing of hydrophobic side chains is improved in hyperthermophilic proteins compared with their mesophilic counterparts. In this work, protein crystal structures from hyper/thermophilic organisms and their mesophilic homologs were compared in order to quantify the difference in apolar contact area and to assess the role played by hydrophobic contacts in stabilizing the protein core at high temperatures.
RESULTS: Two datasets were constructed so as to satisfy several restrictive criteria, such as minimum redundancy, resolution and R-value thresholds, and the absence of any structural defect in the collected structures. This approach made it possible to quantify the apolar contact area between interacting residues with relatively high precision, reducing the uncertainty due to the position of atoms in the crystal structures, the redundancy of data and the size of the dataset. To identify the common core regions of these proteins, the study focused on segments that conserve a similar main-chain conformation in the structures analyzed, excluding the intervening regions whose structure differs markedly. The results indicated that hyperthermophilic proteins underwent a significant increase in the hydrophobic contact area contributed by the residues composing the alpha-helices of the structurally conserved regions.
CONCLUSION: This study indicates that the decreased flexibility of alpha-helices in the protein core is a major factor contributing to the enhanced thermostability of a number of hyperthermophilic proteins. This effect, in turn, may be due to an increased number of buried methyl groups in the protein core and/or better packing of alpha-helices with the rest of the structure, caused by the presence of hydrophobic beta-branched side chains.

Year:  2008        PMID: 18312638      PMCID: PMC2294123          DOI: 10.1186/1472-6807-8-14

Source DB:  PubMed          Journal:  BMC Struct Biol        ISSN: 1472-6807


Background

Earth's environments exhibit highly diverse physico-chemical conditions, including extremes of temperature, pressure, salinity and pH. Among these factors, temperature certainly exerts a strong selective pressure on cell biochemistry and physiology [1]. Indeed, temperatures approaching 100°C usually denature proteins and nucleic acids, and increase the fluidity of membranes to lethal levels [2]. It is therefore of great interest to study how organisms have achieved the molecular adaptations required to thrive in extreme environments, particularly at high temperatures. Such organisms, which are distributed among the three domains of life, are called "thermophiles" or "hyperthermophiles" if they exhibit optimal growth in a 45°C – 80°C or an 80°C – 110°C temperature range, respectively [3]. To date, a number of studies have been carried out to understand how proteins found in hyper/thermophilic organisms are stabilized [1-6]. Thanks to the wealth of sequence and structural information now available on hyper/thermophilic proteins, it is becoming clear that there is no general rule for the stabilization of proteins at high temperatures. Rather, increased thermal stability seems to be achieved through a combination of different small structural modifications involving, among others, ion-pair interactions, hydrogen bonds and packing of hydrophobic residues [6]. Regarding the latter, one frequently invoked theory is that the packing of hydrophobic side chains is improved in thermophilic and hyperthermophilic proteins compared with their mesophilic counterparts [7]. Many studies on protein adaptation to high temperatures have focused on the differences in compactness between hyper/thermophilic and mesophilic proteins, using accessible surface area [6] or cavity size [8] as criteria.
However, as discussed by Robinson-Rechavi and Godzik [9], and by Gromiha [10], these approaches have several drawbacks; for example, the individual contributions of different structural environments and inter-residue contacts to the enhanced thermostability cannot be assessed. Hence, alternative ways to quantify protein compactness were adopted. For example, Gromiha [10] analyzed the long-range and inter-residue contacts in mesophilic and thermophilic proteins of sixteen different protein families, and found that an increase in contacts between hydrogen-bond forming residues increases protein stability. Very recently, contact order [11] has received increasing attention, thanks to the work of Godzik and his research group [9,12], who found that hyperthermophilic proteins from T. maritima have a higher contact order than their mesophilic counterparts. Most importantly, contact order is correlated with the folding rate of proteins that fold with a two-state mechanism [11]. However, a severe limitation of this and other [10,13] studies is that two residues are considered to be in contact if the distance between their Cα atoms, or between any atom of one residue and any atom of the other, is below an arbitrary threshold. For example, Robinson-Rechavi et al. [12] considered two residues to be in contact if any of their atoms are closer than 4.5 Å, while Gromiha [10] used a sphere of 8.0 Å centered on Cα atoms to define long-range contacts. Furthermore, this approach has another important drawback: it does not allow the hydrophobic contact area between two interacting residues to be quantified. The hydrophobic contact area between buried residues in fact represents an indirect measure of both entropic (entropy change due to the rearrangement of the local water molecules as two hydrophobic residues interact [14]) and enthalpic (van der Waals forces in the protein core, due to tight packing of neighboring residues [4]) effects (Figure 1).
Figure 1

Computation of the apolar contact area. A-B) Initially, for each amino acid pair (in this case two sample residues, Phe and Lys, are considered), the van der Waals surface is generated. C) Then, the solvent accessible surface is computed. D) The latter is used to compute the hydrophobic contact surface between the two interacting residues.

Therefore, although a series of experimental and theoretical studies on the molecular mechanisms of protein folding [15,16] and stability [3,9,17] have argued that hydrophobic contacts play a role of paramount importance in such processes, the difference in apolar contact area between large datasets of proteins from hyper/thermophilic organisms and their mesophilic homologs has, to our knowledge, never been quantified. This consideration, along with the wealth of information provided very recently by structural genomics projects, prompted the comparison of a large number of protein crystal structures from hyper/thermophilic organisms and their mesophilic homologs, in order to assess the role played by hydrophobic contacts in the stabilization of the protein core at high temperatures.
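The two distance-threshold contact definitions discussed above (any atom pair closer than 4.5 Å [12], or Cα atoms within 8.0 Å [10]) can be illustrated with a minimal sketch. The function names, the residue representation (a dictionary of atom name to coordinates) and the coordinates are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
import math

def any_atom_contact(res_a, res_b, cutoff=4.5):
    """True if any atom of res_a is closer than `cutoff` (in Å) to any atom of res_b."""
    return any(
        math.dist(p, q) < cutoff
        for p in res_a.values()
        for q in res_b.values()
    )

def calpha_contact(res_a, res_b, cutoff=8.0):
    """True if the two C-alpha atoms lie within `cutoff` (in Å) of each other."""
    return math.dist(res_a["CA"], res_b["CA"]) <= cutoff

# Hypothetical coordinates (Å) for two residues, each reduced to two atoms.
res_a = {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0)}
res_b = {"CA": (7.5, 0.0, 0.0), "CB": (6.5, 0.0, 0.0)}

print(any_atom_contact(res_a, res_b))  # closest atom pair is 5.0 Å apart
print(calpha_contact(res_a, res_b))    # C-alpha atoms are 7.5 Å apart
```

For this toy pair the two criteria disagree, and neither returns an area: both yield only a yes/no answer, which is the limitation the contact-area measure is meant to address.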

Results

Analysis of the Apolar Contact Area

Two datasets were obtained from a collection of 1563 hyperthermophilic and thermophilic proteins, retrieved from structural databases using several keywords (see Methods section; Tables 1 and 2). In the first case, selection criteria favouring quality over quantity of data yielded a non-redundant dataset, referred to as "A", which includes 38 crystal structures lacking any structural defect and displaying a maximum resolution of 2.0 Å and a maximum R-value of 0.25. Dataset A is a subset of a second dataset, referred to as "B". Dataset B is composed of 59 crystal structures lacking any structural defect and displaying a maximum resolution of 3.0 Å and a maximum R-value of 0.30. For each structure in the two datasets, a mesophilic homologous counterpart was collected following the same selection criteria. The total apolar contact area (ACA) between the residues of each structure pair in datasets A and B was then computed. The statistical significance of the observed differences in ACA between hyper/thermophilic proteins and their mesophilic counterparts was assessed with a paired t-test. The results are reported in Table 3 (see also Additional file 1). T-test values are expressed as the associated probability P of acceptance of the null hypothesis, that is, that there are no significant differences in ACA between hyper/thermophilic and mesophilic pairs. T-values scoring > 2.0 (P(t) < 0.05) are considered statistically significant. Figure 2 shows the difference in apolar contact area computed over the whole structures of the protein pairs composing the two datasets. The values obtained were normalized by the sequence length of each protein.
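The selection thresholds stated above can be written as a simple filter. This is only a sketch: the `Structure` record, its field names and the example entries are hypothetical, and the redundancy filtering and defect detection described in the Methods are represented here by a single boolean flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Structure:
    pdb_id: str          # hypothetical identifier, not a real PDB entry
    resolution: float    # Å (lower is better)
    r_value: float
    has_defects: bool    # stands in for the structural-defect checks

def passes(s, max_resolution, max_r_value):
    """Apply the resolution, R-value and defect criteria to one structure."""
    return (not s.has_defects
            and s.resolution <= max_resolution
            and s.r_value <= max_r_value)

def in_dataset_a(s):
    return passes(s, max_resolution=2.0, max_r_value=0.25)

def in_dataset_b(s):
    return passes(s, max_resolution=3.0, max_r_value=0.30)

# Hypothetical candidates; the R-values are invented for illustration.
candidates = [
    Structure("hyp1", 1.80, 0.21, False),
    Structure("hyp2", 2.50, 0.28, False),
    Structure("hyp3", 1.90, 0.22, True),
]
dataset_a = [s.pdb_id for s in candidates if in_dataset_a(s)]
dataset_b = [s.pdb_id for s in candidates if in_dataset_b(s)]
print(dataset_a, dataset_b)
```

Because the thresholds for A are stricter than those for B, every structure accepted for A is also accepted for B, consistent with A being a subset of B.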
In dataset A, 22 (13 hyperthermophilic/mesophilic and 9 thermophilic/mesophilic protein pairs) of the 38 protein pairs considered showed an increase in ACA (Figure 2A); the corresponding P(t) was ~0.086 (0.079 for hyperthermophiles and 0.690 for thermophiles). In dataset B, 38 (24 hyperthermophilic/mesophilic and 14 thermophilic/mesophilic protein pairs) of the 59 protein pairs showed an increase in ACA (Figure 2B); the corresponding P(t) was ~0.012 (0.020 for hyperthermophiles and 0.474 for thermophiles). Although, according to the t-test, the differences obtained were not statistically significant for both datasets (Table 3), they nonetheless indicated a general increase in the apolar contact area of hyperthermophilic proteins compared with their mesophilic counterparts.
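The paired comparison described above can be sketched as follows. The helper names and the toy ΔACA values are hypothetical, and dividing each protein's ACA by its own sequence length is an assumption about how the normalization was applied. Only the t statistic is computed, using the Python standard library; the associated probability P(t) would additionally require the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def normalized_delta_aca(aca_thermo, len_thermo, aca_meso, len_meso):
    """ΔACA in Å² per residue: each ACA is divided by its own sequence length (assumed)."""
    return aca_thermo / len_thermo - aca_meso / len_meso

def paired_t_statistic(deltas):
    """t statistic of a paired t-test on the per-pair differences."""
    n = len(deltas)
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))

# Toy ΔACA values (Å²/residue) for eight hypothetical protein pairs.
deltas = [0.5, -0.1, 0.8, 0.3, 0.2, -0.2, 0.6, 0.4]
t = paired_t_statistic(deltas)
print(f"t = {t:.2f}, above the 2.0 cut-off: {t > 2.0}")
```

A positive t indicates that, on average, the hyper/thermophilic member of each pair has the larger apolar contact area per residue.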
Table 1

Hyperthermophilic/Mesophilic (1–24) and Thermophilic/Mesophilic (25–38) pairs in dataset A*

ID | PDB | Class | Organism | Res (Å) | PDB | Class | Mesophile | Res (Å) | ΔÅ | % identity | Functional Class | Description
1 | 1A2Z A | a/b | Thermococcus litoralis | 1.73 | 1AUG A | a/b | Bacillus amyloliquefaciens | 2.00 | 0.27 | 37 | Peptidase | Pyrrolidone Carboxyl Peptidase
2 | 1A53 0 | a/b | Sulfolobus solfataricus | 2.00 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.00 | 38 | Synthase | Indole-3-Glycerolphosphate Synthase
3 | 1DD3 A | a/b | Thermotoga maritima | 2.00 | 1CTF 0 | a/b | Escherichia coli | 1.70 | 0.3 | 69 | Ribosomal | Ribosomal Protein
4 | 1DQI A | mainly b | Pyrococcus furiosus | 1.70 | 1DFX 0 | mainly b | D. desulfuricans | 1.90 | 0.20 | 34 | Oxidoreductase | Superoxide Reductase
5 | 1FTR A | a+b | Methanopyrus kandleri | 1.70 | 1M5S A | a+b | Methanosarcina barkeri | 1.85 | 0.15 | 59 | Transferase | Formyltransferase
6 | 1G29 1 | a/b | Thermococcus litoralis | 1.90 | 1B0U A | a/b | Salmonella typhimurium | 1.50 | 0.40 | 31 | Sugar Binding | MalK Protein
7 | 1HQK A | a/b | Aquifex aeolicus | 1.60 | 1W19 A | a/b | M. tuberculosis | 2.00 | 0.40 | 50 | Transferase | Lumazine Synthase
8 | 1IU8 A | a/b | Pyrococcus horikoshii | 1.60 | 1AUG A | a/b | Bacillus amyloliquefaciens | 2.00 | 0.40 | 45 | Hydrolase | Pyrrolidone-Carboxylate Peptidase
9 | 1J31 A | a/b | Pyrococcus horikoshii | 1.60 | 1UF5 A | a/b | Agrobacterium sp. | 1.60 | 0.00 | 31 | Unknown | Hypothetical Protein Ph0642
10 | 1JI0 A | a/b | Thermotoga maritima | 2.00 | 1G6H A | a/b | Escherichia coli | 1.60 | 0.40 | 31 | Carrier | ABC Transporter
11 | 1JVB A | a/b | Sulfolobus solfataricus | 1.85 | 1M6H A | a/b | Homo sapiens | 2.00 | 0.15 | 31 | Oxidoreductase | Alcohol Dehydrogenase
12 | 1LK5 A | a/b | Pyrococcus horikoshii | 1.75 | 1M0S A | a/b | Haemophilus influenzae | 1.90 | 0.15 | 42 | Isomerase | D-Ribose-5-Phosphate Isomerase
13 | 1M2K A | a/b | Archaeoglobus fulgidus | 1.47 | 1S5P A | a/b | Escherichia coli | 1.96 | 0.49 | 41 | Transcriptional Regulator | Sir2 Homologue
14 | 1M5H A | a+b | Archaeoglobus fulgidus | 2.00 | 1M5S A | a+b | Methanosarcina barkeri | 1.85 | 0.15 | 68 | Transferase | Formyltransferase
15 | 1NSJ 0 | a/b | Thermotoga maritima | 2.00 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.00 | 33 | Isomerase | P-Ribosylanthranilate Isomerase
16 | 1P1L A | a/b | Archaeoglobus fulgidus | 2.00 | 1NAQ A | a/b | Escherichia coli | 1.70 | 0.3 | 33 | Unknown | Cation Resistant Protein Cut-A
17 | 1U1I A | a/b | Archaeoglobus fulgidus | 1.90 | 1P1J A | a/b | Saccharomyces cerevisiae | 1.70 | 0.20 | 31 | Isomerase | Myo-Inositol Phosphate Synthase
18 | 1UKU A | a/b | Pyrococcus horikoshii | 1.45 | 1NAQ A | a/b | Escherichia coli | 1.70 | 0.25 | 39 | Metal Binding Protein | Cation Resistant Protein Cut-A
19 | 1V3W A | mainly b | Pyrococcus horikoshii | 1.50 | 1XHD A | mainly b | Bacillus cereus | 1.90 | 0.40 | 40 | Lyase | Ferripyochelin Binding Protein
20 | 1V7R A | a/b | Pyrococcus horikoshii | 1.40 | 1K7K A | a/b | Escherichia coli | 1.50 | 0.10 | 34 | Hydrolase | Hypothetical Protein Ph1917
21 | 1VE0 A | a/b | Sulfolobus tokodaii | 2.00 | 1VMH A | a/b | C. acetobutylicum | 1.31 | 0.69 | 42 | Metal Binding Protein | Hypothetical Protein St2072
22 | 1VPE 0 | a/b | Thermotoga maritima | 2.00 | 1HDI A | a/b | Sus scrofa | 1.80 | 0.20 | 47 | Transferase | Phosphoglycerate Kinase
23 | 1XGS A | mainly a | Pyrococcus furiosus | 1.75 | 1B6A 0 | mainly a | Homo sapiens | 1.60 | 0.15 | 40 | Aminopeptidase | Methionine Aminopeptidase
24 | 1XTY A | a/b | Pyrococcus abyssi | 1.80 | 1Q7S A | a/b | Homo sapiens | 2.00 | 0.20 | 48 | Hydrolase | Peptidyl-tRNA Hydrolase
25 | 1EE8 A | mainly a | Thermus thermophilus | 1.90 | 1TDZ A | mainly a | Lactococcus lactis | 1.80 | 0.10 | 35 | DNA Binding Protein | Fpg Protein
26 | 1GD7 A | mainly b | Thermus thermophilus | 2.00 | 1PXF A | mainly b | Escherichia coli | 1.87 | 0.13 | 34 | RNA Binding Protein | CsaA Protein
27 | 1J09 A | a/b | Thermus thermophilus | 1.80 | 1NZJ A | a/b | Escherichia coli | 1.50 | 0.30 | 33 | Ligase | Glutamyl-tRNA Synthase
28 | 1J3N A | a/b | Thermus thermophilus | 2.00 | 1E5M A | a/b | Synechocystis sp. | 1.54 | 0.46 | 55 | Transferase | Acyl Carrier Protein
29 | 1JBO A | mainly a | T. elongatus | 1.45 | 1B8D A | mainly a | Griffithsia monilis | 1.90 | 0.45 | 38 | Photosynthesis | Phycocyanin
30 | 1MNG A | mainly a | Thermus thermophilus | 1.80 | 1GV3 A | mainly a | Anabaena sp. | 2.00 | 0.20 | 59 | Oxidoreductase | Superoxide Dismutase
31 | 1SRV A | a/b | Thermus thermophilus | 1.70 | 1KID 0 | a/b | Escherichia coli | 1.70 | 0.00 | 69 | Chaperone | GroEL
32 | 1UZB A | a/b | Thermus thermophilus | 1.40 | 1O0A A | a/b | Halobacterium salinarum | 1.42 | 0.02 | 34 | Oxidoreductase | 1-Pyrroline-5-Carboxylate Dehydrogenase
33 | 1V6S A | a/b | Thermus thermophilus | 1.50 | 16PK 0 | a/b | Trypanosoma brucei | 1.60 | 0.10 | 43 | Transferase | Phosphoglycerate Kinase
34 | 1V8F A | a/b | Thermus thermophilus | 1.90 | 1N2E A | a/b | M. tuberculosis | 1.60 | 0.30 | 55 | Ligase | Pantothenate Synthetase
35 | 1VC4 A | a/b | Thermus thermophilus | 1.80 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.20 | 37 | Lyase | Indole-3-Glycerolphosphate Synthase
36 | 1VCD A | a/b | Thermus thermophilus | 1.70 | 1SJY A | a/b | Deinococcus radiodurans | 1.39 | 0.31 | 34 | Hydrolase | Ap6a Hydroxylase Ndx1
37 | 1YYA A | a/b | Thermus thermophilus | 1.60 | 1MO0 A | a/b | Caenorhabditis elegans | 1.70 | 0.10 | 44 | Isomerase | Triosephosphate Isomerase
38 | 2PRD 0 | a/b | Thermus thermophilus | 2.00 | 1SXV A | a/b | M. tuberculosis | 1.30 | 0.70 | 51 | Hydrolase | Inorganic Pyrophosphatase

* Optimal growth temperatures are between 50°C and 80°C for thermophiles, and above 80°C for hyperthermophiles

Table 2

Hyperthermophilic/Mesophilic (1–38) and Thermophilic/Mesophilic (39–59) pairs in dataset B

ID | PDB | Class | Organism | Res (Å) | PDB | Class | Mesophile | Res (Å) | ΔÅ | % identity | Functional Class | Description
1 | 1A2Z A | a/b | Thermococcus litoralis | 1.73 | 1AUG A | a/b | Bacillus amyloliquefaciens | 2.00 | 0.27 | 37 | Peptidase | Pyrrolidone Carboxyl Peptidase
2 | 1A53 0 | a/b | Sulfolobus solfataricus | 2.00 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.00 | 38 | Synthase | Indole-3-Glycerolphosphate Synthase
3 | 1DQI A | mainly b | Pyrococcus furiosus | 1.70 | 1DFX 0 | mainly b | Desulfovibrio desulfuricans | 1.90 | 0.20 | 34 | Oxidoreductase | Superoxide Reductase
4 | 1FTR A | a+b | Methanopyrus kandleri | 1.70 | 1M5S A | a+b | Methanosarcina barkeri | 1.85 | 0.15 | 59 | Transferase | Formyltransferase
5 | 1DD3 A | a/b | Thermotoga maritima | 2.00 | 1CTF 0 | a/b | Escherichia coli | 1.70 | 0.3 | 69 | Ribosomal | Ribosomal Protein
6 | 1G29 1 | a/b | Thermococcus litoralis | 1.90 | 1B0U A | a/b | Salmonella typhimurium | 1.50 | 0.40 | 31 | Sugar Binding | MalK Protein
7 | 1HDG O | a/b | Thermotoga maritima | 2.50 | 1RM4 A | a/b | Spinacia oleracea | 2.00 | 0.50 | 56 | Oxidoreductase | Glyceraldehyde 3 Phosphate Dehydrogenase
8 | 1HQK A | a/b | Aquifex aeolicus | 1.60 | 1W19 A | a/b | Mycobacterium tuberculosis | 2.00 | 0.40 | 50 | Transferase | Lumazine Synthase
9 | 1I4N A | a/b | Thermotoga maritima | 2.50 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.50 | 34 | Lyase | Indole-3-Glycerolphosphate Synthase
10 | 1IOF A | a/b | Pyrococcus furiosus | 2.20 | 1AUG A | a/b | Bacillus amyloliquefaciens | 2.00 | 0.20 | 43 | Hydrolase | Pyrrolidone-Carboxylate Peptidase
11 | 1IU8 A | a/b | Pyrococcus horikoshii | 1.60 | 1AUG A | a/b | Bacillus amyloliquefaciens | 2.00 | 0.40 | 45 | Hydrolase | Pyrrolidone-Carboxylate Peptidase
12 | 1J0A A | a/b | Pyrococcus horikoshii | 2.50 | 1TZJ A | a/b | Pseudomonas sp. | 1.99 | 0.51 | 31 | Lyase | Aminocyclopropane Carboxylate Deaminase
13 | 1J31 A | a/b | Pyrococcus horikoshii | 1.60 | 1UF5 A | a/b | Agrobacterium sp. | 1.60 | 0.00 | 31 | Unknown | Hypothetical Protein Ph0642
14 | 1JI0 A | a/b | Thermotoga maritima | 2.00 | 1G6H A | a/b | Escherichia coli | 1.60 | 0.40 | 31 | Carrier | ABC Transporter
15 | 1JJI A | a/b | Archaeoglobus fulgidus | 2.20 | 1JKM B | a/b | Bacillus subtilis | 1.85 | 0.35 | 35 | Hydrolase | Carboxylesterase
16 | 1JVB A | a/b | Sulfolobus solfataricus | 1.85 | 1M6H A | a/b | Homo sapiens | 2.00 | 0.15 | 31 | Oxidoreductase | Alcohol Dehydrogenase
17 | 1LK5 A | a/b | Pyrococcus horikoshii | 1.75 | 1M0S A | a/b | Haemophilus influenzae | 1.90 | 0.15 | 42 | Isomerase | D-Ribose-5-Phosphate Isomerase
18 | 1M2K A | a/b | Archaeoglobus fulgidus | 1.47 | 1S5P A | a/b | Escherichia coli | 1.96 | 0.49 | 41 | Transcriptional Regulator | Sir2 Homologue
19 | 1M4Y A | a+b | Thermotoga maritima | 2.10 | 1G3K A | a+b | Haemophilus influenzae | 1.90 | 0.20 | 66 | Hydrolase | HslV
20 | 1M5H A | a+b | Archaeoglobus fulgidus | 2.00 | 1M5S A | a+b | Methanosarcina barkeri | 1.85 | 0.15 | 68 | Transferase | Formyltransferase
21 | 1MXG A | a/b | Pyrococcus woesei | 1.60 | 1VJS 0 | a/b | Bacillus licheniformis | 1.70 | 0.10 | 31 | Hydrolase | α-Amylase
22 | 1NSJ 0 | a/b | Thermotoga maritima | 2.00 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.00 | 33 | Isomerase | P-Ribosylanthranilate Isomerase
23 | 1P1L A | a/b | Archaeoglobus fulgidus | 2.00 | 1NAQ A | a/b | Escherichia coli | 1.70 | 0.3 | 33 | Unknown | Cation Resistant Protein Cut-A
24 | 1OJU A | a/b | Archaeoglobus fulgidus | 2.79 | 1GUZ A | a/b | Chlorobium vibrioforme | 2.00 | 0.79 | 34 | Oxidoreductase | Malate Dehydrogenase
25 | 1U1I A | a/b | Archaeoglobus fulgidus | 1.90 | 1P1J A | a/b | Saccharomyces cerevisiae | 1.70 | 0.20 | 31 | Isomerase | Myo-Inositol Phosphate Synthase
26 | 1UE8 A | mainly a | Sulfolobus tokodaii | 3.00 | 1ODO A | mainly a | Streptomyces coelicolor | 1.85 | 1.15 | 32 | Unknown | Cytochrome P450
27 | 1UKU A | a+b | Pyrococcus horikoshii | 1.45 | 1NAQ A | a+b | Escherichia coli | 1.70 | 0.25 | 39 | Metal Binding Protein | Cation Resistant Protein Cut-A
28 | 1ULZ A | a/b | Aquifex aeolicus | 2.20 | 1DV1 A | a/b | Escherichia coli | 1.90 | 0.30 | 53 | Ligase | Pyruvate Carboxylase
29 | 1UVV A | a/b | Thermotoga maritima | 2.75 | 1GS5 A | a/b | Escherichia coli | 1.50 | 1.25 | 35 | Transferase | Acetylglutamate Kinase
30 | 1V3W A | mainly b | Pyrococcus horikoshii | 1.50 | 1XHD A | mainly b | Bacillus cereus | 1.90 | 0.40 | 40 | Lyase | Ferripyochelin Binding Protein
31 | 1V7R A | a/b | Pyrococcus horikoshii | 1.40 | 1K7K A | a/b | Escherichia coli | 1.50 | 0.10 | 34 | Hydrolase | Hypothetical Protein Ph1917
32 | 1VE0 A | a/b | Sulfolobus tokodaii | 2.00 | 1VMH A | a/b | Clostridium acetobutylicum | 1.31 | 0.69 | 42 | Metal Binding Protein | Hypothetical Protein St2072
33 | 1VFF A | a/b | Pyrococcus horikoshii | 2.55 | 1E4I A | a/b | Bacillus polymyxa | 2.00 | 0.55 | 32 | Hydrolase | β-Glucosidase
34 | 1VPE 0 | a/b | Thermotoga maritima | 2.00 | 1HDI A | a/b | Sus scrofa | 1.80 | 0.20 | 48 | Transferase | Phosphoglycerate Kinase
35 | 1WPW A | a/b | Sulfolobus tokodaii | 2.80 | 1A05 A | a/b | Thiobacillus ferrooxidans | 2.00 | 0.80 | 40 | Oxidoreductase | IPM Dehydrogenase
36 | 1XGS A | mainly a | Pyrococcus furiosus | 1.75 | 1B6A 0 | mainly a | Homo sapiens | 1.60 | 0.15 | 39 | Aminopeptidase | Methionine Aminopeptidase
37 | 1XTY A | a/b | Pyrococcus abyssi | 1.80 | 1Q7S A | a/b | Homo sapiens | 2.00 | 0.20 | 48 | Hydrolase | Peptidyl-tRNA Hydrolase
38 | 1B33 A | mainly a | M. laminosus | 2.30 | 1XG0 C | mainly a | Rhodomonas | 0.97 | 1.33 | 32 | Photosynthesis | Allophycocyanin
39 | 1BXB A | a/b | Thermus aquaticus | 2.20 | 1MUW A | a/b | Streptomyces olivochromogenes | 0.86 | 1.34 | 58 | Isomerase | Xylose Isomerase
40 | 1EE8 A | mainly a | Thermus thermophilus | 1.90 | 1TDZ A | mainly a | Lactococcus lactis | 1.80 | 0.10 | 35 | DNA Binding Protein | Fpg Protein
41 | 1GD7 A | mainly b | Thermus thermophilus | 2.00 | 1PXF A | mainly b | Escherichia coli | 1.87 | 0.13 | 34 | RNA Binding Protein | CsaA Protein
42 | 1J09 A | a/b | Thermus thermophilus | 1.80 | 1NZJ A | a/b | Escherichia coli | 1.50 | 0.30 | 33 | Ligase | Glutamyl-tRNA Synthase
43 | 1J3N A | a/b | Thermus thermophilus | 2.00 | 1E5M A | a/b | Synechocystis sp. | 1.54 | 0.46 | 55 | Transferase | Acyl Carrier Protein
44 | 1JBO A | mainly a | T. elongatus | 1.45 | 1B8D A | mainly a | Griffithsia monilis | 1.90 | 0.45 | 38 | Photosynthesis | Phycocyanin
45 | 1MNG A | mainly a | Thermus thermophilus | 1.80 | 1GV3 A | mainly a | Anabaena sp. | 2.00 | 0.20 | 59 | Oxidoreductase | Superoxide Dismutase
46 | 1SRV A | a/b | Thermus thermophilus | 1.70 | 1KID 0 | a/b | Escherichia coli | 1.70 | 0.00 | 69 | Chaperone | GroEL
47 | 1UKW A | mainly a | Thermus thermophilus | 2.40 | 1RX0 A | mainly a | Homo sapiens | 1.77 | 0.63 | 39 | Oxidoreductase | Acyl-CoA Dehydrogenase
48 | 1UZB A | a/b | Thermus thermophilus | 1.40 | 1O0A A | a/b | Halobacterium salinarum | 1.42 | 0.02 | 34 | Oxidoreductase | 1-Pyrroline-5-Carboxylate Dehydrogenase
49 | 1V6S A | a/b | Thermus thermophilus | 1.50 | 16PK 0 | a/b | Trypanosoma brucei | 1.60 | 0.10 | 44 | Transferase | Phosphoglycerate Kinase
50 | 1V8F A | a/b | Thermus thermophilus | 1.90 | 1N2E A | a/b | Mycobacterium tuberculosis | 1.60 | 0.30 | 55 | Ligase | Pantothenate Synthetase
51 | 1V8G A | a/b | Thermus thermophilus | 2.10 | 1VQU A | a/b | Nostoc sp. | 1.85 | 0.25 | 42 | Transferase | Anthranilate Phosphoribosyltransferase
52 | 1VC2 A | a/b | Thermus thermophilus | 2.60 | 1GAD O | a/b | Escherichia coli | 1.80 | 0.80 | 51 | Oxidoreductase | Glyceraldehyde 3 Phosphate Dehydrogenase
53 | 1VC4 A | a/b | Thermus thermophilus | 1.80 | 1PII 0 | a/b | Escherichia coli | 2.00 | 0.20 | 37 | Lyase | Indole-3-Glycerolphosphate Synthase
54 | 1VCD A | a/b | Thermus thermophilus | 1.70 | 1SJY A | a/b | Deinococcus radiodurans | 1.39 | 0.31 | 34 | Hydrolase | Ap6a Hydroxylase Ndx1
55 | 1WXD A | a/b | Thermus thermophilus | 2.10 | 1NYT A | a/b | Escherichia coli | 1.50 | 0.60 | 36 | Oxidoreductase | Shikimate 5-Dehydrogenase
56 | 1XAA 0 | a/b | Thermus thermophilus | 2.10 | 1CNZ A | a/b | Salmonella typhimurium | 1.76 | 0.34 | 52 | Oxidoreductase | 3-Isopropylmalate Dehydrogenase
57 | 1YYA A | mainly b | Thermus thermophilus | 1.60 | 1MO0 A | mainly b | Caenorhabditis elegans | 1.70 | 0.10 | 44 | Isomerase | Triosephosphate Isomerase
58 | 1YKF A | a/b | T. brockii | 2.50 | 1JQB A | a/b | Clostridium beijerinckii | 0.53 | 1.97 | 77 | Oxidoreductase | NADP-Dependent Alcohol Dehydrogenase
59 | 2PRD 0 | a/b | Thermus thermophilus | 2.00 | 1SXV A | a/b | Mycobacterium tuberculosis | 1.30 | 0.70 | 52 | Hydrolase | Inorganic Pyrophosphatase

Table 3

T-tests results for the ACA distributions, measured in different structural environments*

ACA Distributions+ by structural environment (P ≤ 0.05**)
Group | Row | Total | SCRs | α-Helices in SCRs | β-strands in SCRs
All | Dataset A | 0.0864 | 0.0640 | 0.0859 | 0.9437
All | Dataset B | 0.0124 | 0.0069 | 0.0159 | 0.1745
All | Shapiro-Wilk Test° | 0.90/0.99 | 0.07/0.002°° | 0.96/0.59 |
Hyperthermophiles | Dataset A | 0.0790 | 0.0029 | 0.0524 | 0.8120
Hyperthermophiles | Shapiro-Wilk Test° |  | 0.26/0.90 | 0.97/0.16 |
Hyperthermophiles | Dataset B | 0.0205 | 0.0001 | 0.0113 | 0.061
Hyperthermophiles | Shapiro-Wilk Test° | 0.53/0.42 | 0.49/0.36 | 0.13/0.003°°° |
Thermophiles | Dataset A | 0.6901 | 0.5139 | 0.8387 | 0.7080
Thermophiles | Dataset B | 0.3357 | 0.7530 | 0.3123 | 0.6027

* Values are expressed as the associated probability P of acceptance of the null hypothesis

** P ≤ 0.05 are considered statistically significant, and are bolded

+ The statistical significance of the observed differences of ACA between hyper/thermophilic proteins and their mesophilic counterparts

°The obtained P(t) of the Shapiro-Wilk test for significant results. The distributions of ACA are presented in the form hyper/thermophilic-mesophilic distribution

°°The obtained P(t) of the Shapiro-Wilk test is 0.46 removing 2 outliers; P(t) of the associated t-test = 0.005 removing the outliers

°°°The obtained P(t) of the Shapiro-Wilk test is 0.62 removing 3 outliers; P(t) of the associated t-test = 0.001 removing the outliers

Figure 2

Differences in the apolar contact area (ΔACA) for each protein pair composing datasets A and B, computed over the whole protein structure. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å²/residue and represented as light grey and dark grey bars, respectively. Numbers on the X-axis refer to Table 1 (A) and Table 2 (B).

A more detailed analysis of the structurally conserved regions [18] (SCRs; see Methods section) of the structures composing datasets A and B indicated that, in both datasets, a number of hyperthermophilic proteins underwent a highly significant (P(t) < 0.001) increase in the hydrophobic contact area of the residues composing the SCRs (Figure 3; Table 3).
SCRs were defined as regions displaying a similar local conformation, lacking insertions and deletions, and composed of at least three consecutive residues. SCRs are therefore protein segments that conserve the same main-chain conformation in each pair of structures analysed, excluding the intervening regions whose structure differs markedly among different proteins [19]. Considering the great importance of hydrophobic contacts in stabilizing, and possibly driving, the protein folding mechanism, it seemed interesting to analyse how, during evolution, the SCRs coped with the modifications of the hydrophobic contacts necessary to achieve the correct fold at high temperatures. In dataset A (Figure 3A), 22 (17 hyperthermophilic/mesophilic and 5 thermophilic/mesophilic protein pairs, respectively) of the 38 protein pairs considered showed an increase in ACA (P(t) ~0.0029). The same trend was also observed for dataset B (Figure 3B), in which 37 of 59 protein pairs (27 hyperthermophilic/mesophilic and 10 thermophilic/mesophilic) displayed an increased ACA in the direction mesophile → hyper/thermophile (P(t) ~0.0001). The measured mean ΔACA was 0.39 Å²/residue and 0.37 Å²/residue for datasets A and B, respectively. However, if only the hyperthermophilic/mesophilic pairs were considered, the mean ΔACA was 0.74 Å²/residue and 0.63 Å²/residue for datasets A and B, respectively. The maximum measured difference was 2.92 Å²/residue for the pair 1V7R/1K7K (nucleotide triphosphate pyrophosphatase from P. horikoshii/E. coli). Since such large differences in ACA can be due to factors other than acquired thermostability (i.e., different overall conformations), the t-test validation analysis was repeated without these extreme pairs, obtaining again not significant results (see "Methods" section and supplementary material).
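The length criterion in the SCR definition above can be sketched as a scan for runs of at least three consecutive aligned positions. The function name and the boolean input are hypothetical: each flag is assumed to already encode that a position is aligned without gaps and has a similar local conformation in both structures, however that similarity was actually measured in the Methods.

```python
def find_scrs(conserved, min_length=3):
    """Return (start, end) index pairs, end exclusive, of runs of True flags
    that are at least `min_length` positions long."""
    segments = []
    start = None
    for i, ok in enumerate(conserved):
        if ok and start is None:
            start = i                      # a new candidate run begins
        elif not ok and start is not None:
            if i - start >= min_length:    # keep only sufficiently long runs
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(conserved) - start >= min_length:
        segments.append((start, len(conserved)))
    return segments

# Hypothetical per-position flags for a short pairwise structural alignment.
flags = [True, True, False, True, True, True, True, False, True, True, True]
scrs = find_scrs(flags)
print(scrs)
```

In this toy alignment the first run covers only two positions and is discarded, while the two longer runs are retained as SCRs.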
Figure 3

Differences in the apolar contact area (ΔACA) for each protein pair composing datasets A and B, computed over the SCRs. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å²/residue and represented as light grey and dark grey bars, respectively. Numbers on the X-axis refer to Table 1 (A) and Table 2 (B).

To gain deeper insight into the statistically significant increase in the hydrophobic contact area of protein cores from hyperthermophilic organisms, the possible occurrence of a larger hydrophobic contact area was examined in different secondary structure elements. In dataset A (Figure 4A), 16 of the 24 hyperthermophilic proteins considered showed an increase in ACA in the α-helices of the protein core compared with their mesophilic counterparts, while in dataset B (Figure 4B) the ratio was 25 of 37 proteins, with a measured significance of P(t) ~0.0524 and P(t) ~0.0113 for datasets A and B, respectively. Although in the latter case significant deviations from normality, as judged by the Shapiro-Wilk normality test, were observed for the distribution of mesophilic values, removing three outliers gave a Shapiro-Wilk P(t) ~0.62 and a t-test P(t) ~0.001. These results indicated that α-helices are mainly responsible for the increased hydrophobic contact area observed when comparing hyperthermophilic and mesophilic proteins. Conversely, no statistically significant trends were observed in the comparison of the ACA in the β-strands of the SCRs (Table 3). In dataset A, 21 (14 hyperthermophilic/mesophilic protein pairs) of the 38 protein pairs considered showed an increase in ACA, while in dataset B, 34 (24 hyperthermophilic/mesophilic protein pairs) of the 59 pairs exhibited an increase in ACA. The mean value of ΔACA is -0.02 Å²/residue and 0.34 Å²/residue for datasets A and B, respectively.
Therefore, at least for the hyperthermophilic/mesophilic protein pairs, it can be concluded that the statistically significant increase of the hydrophobic contact area of protein cores involves mainly the α-helices and not the β-strands.
Figure 4

Differences in the apolar contact area (ΔACA) for each protein pair composing datasets A and B, computed over the α-helices of the SCRs. Values for hyperthermophilic/mesophilic protein pairs and thermophilic/mesophilic pairs are expressed in Å²/residue and represented as light grey and dark grey bars, respectively. Numbers on the X-axis refer to Table 1 (A) and Table 2 (B).

Differences in the amino acid composition of the residues involved in conserved hydrophobic contacts

The differences in amino acid composition of the residues involved in conserved hydrophobic contacts (CHCs; Table 4) [19] between hyperthermophilic proteins and their mesophilic counterparts are expressed in units of standard deviation from the measured mean value, R. R values > 0 or < 0 indicate, respectively, a frequency of residue type aa higher or lower than the expected mean. R values ≥ 3.0 standard deviations (P ≤ 0.01) from the mean value (which approximates zero) were considered statistically significant. The compositional analysis shows no statistically significant differences between hyperthermophilic and mesophilic proteins in the identity of the residues involved in the formation of hydrophobic contacts, except for isoleucine, which scored ~3.6 standard deviations from the mean in both datasets A and B. It is important to emphasize that, in evaluating these differences, dataset B, which contains 13 more hyperthermophilic/mesophilic protein pairs than dataset A, is probably more reliable. In any case, since datasets A and B gave very similar results, the role played by isoleucine is probably independent of the number and type of structures analysed.
Table 4

Amino acid composition of CHCs*

Amino acid | Dataset A (Hyperthermophiles vs. Mesophiles) | Dataset B (Hyperthermophiles vs. Mesophiles)
A | -1.045 | -0.680
V | -0.107 | -0.115
F | 0.305 | 0.216
I | 3.661 | 3.635
L | -1.609 | -1.585
D | -0.451 | -0.365
E | 0.211 | 0.432
G | -0.058 | -0.136
K | 0.130 | 0.645
S | -0.245 | -0.355
T | -0.398 | -0.554
Y | 0.471 | 0.821
C | -0.850 | -0.683
N | 0.285 | 0.231
Q | -0.813 | -0.933
P | 0.334 | 0.207
M | -0.036 | -0.412
R | 0.500 | 0.114
H | -0.167 | -0.284
W | -0.407 | -0.398

* Values are expressed in units of standard deviation from the mean (Z-score). R values ≥ 3.0 are considered statistically significant and are bolded.

Preferred amino acid interactions in conserved hydrophobic contacts

In order to further investigate the statistically significant increase of isoleucine in the CHCs of hyperthermophilic proteins, compared to their mesophilic counterparts, an analysis was carried out to infer which amino acid pairs are preferred in the formation of hydrophobic contacts. Preferred amino acid pairs were identified by counting the number of times a particular pair of residues located in SCRs makes a hydrophobic contact, i.e. displays an apolar contact area > 0.0 Å². The results of this analysis are shown in Tables 5 and 6, where each element ij of the interaction matrix reports, in units of standard deviation from the mean value, the measured frequency of interaction between residue i and residue j. For dataset A, accounting for 17864 apolar contacts, five types of interactions (Ile/Ala, Ile/Val, Ile/Phe, Ile/Ile and Ile/Leu) showed a frequency ≥ 3.0 standard deviations from the mean value; in every case, isoleucine is involved. Similar results were obtained for dataset B, where 33546 interactions were counted: of the six types of interactions scoring at > 3.0 standard deviations, five (Ile/Ala, Ile/Val, Ile/Tyr, Ile/Ile and Ile/Leu) involved isoleucine. The other statistically significant interaction is between glutamate and lysine, scoring at 3.28 standard deviations from the mean. The closeness between the apolar atoms of Glu and Lys residues might be only a secondary effect of the formation of strong ion-pairs between these two residues.
Table 5

Preferred amino acid interactions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset A are compared*

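| | ALA | VAL | PHE | ILE | LEU | ASP | GLU | GLY | LYS | SER | THR | TYR | CYS | ASN | GLN | PRO | MET | ARG | HIS | TRP | XXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALA | -2.03 | | | | | | | | | | | | | | | | | | | | |
| VAL | -0.36 | -0.55 | | | | | | | | | | | | | | | | | | | |
| PHE | 0.85 | -0.46 | -0.26 | | | | | | | | | | | | | | | | | | |
| ILE | **3.07** | **6.09** | **3.17** | **4.33** | | | | | | | | | | | | | | | | | |
| LEU | -4.00 | -1.11 | 0.42 | **3.82** | -4.56 | | | | | | | | | | | | | | | | |
| ASP | -1.23 | 0.05 | -0.49 | 0.88 | -0.23 | -0.13 | | | | | | | | | | | | | | | |
| GLU | -0.83 | 0.46 | 0.49 | 0.04 | 2.71 | -0.13 | 0.95 | | | | | | | | | | | | | | |
| GLY | -0.60 | -0.52 | -0.35 | 0.81 | -0.45 | -0.51 | 0.92 | 0.01 | | | | | | | | | | | | | |
| LYS | -1.73 | -1.03 | 0.46 | 2.45 | 1.13 | -0.48 | 2.37 | 0.39 | 0.98 | | | | | | | | | | | | |
| SER | -0.87 | -0.35 | -0.08 | 1.13 | -0.60 | -0.16 | 0.07 | -0.81 | 0.48 | 0.11 | | | | | | | | | | | |
| THR | -1.48 | -0.43 | -0.40 | 0.02 | -1.03 | 0.04 | -1.01 | 0.52 | 0.29 | 0.13 | 0.16 | | | | | | | | | | |
| TYR | -0.23 | 1.54 | 0.17 | 1.44 | -1.51 | 0.37 | -0.19 | 0.22 | -0.49 | 0.38 | -0.08 | 0.53 | | | | | | | | | |
| CYS | -1.76 | -1.79 | -0.67 | -0.57 | -3.03 | -0.17 | -0.01 | -0.59 | -0.35 | -0.77 | -0.37 | -0.30 | -0.12 | | | | | | | | |
| ASN | 0.67 | 0.20 | 0.01 | 0.33 | -0.82 | -0.27 | 0.22 | 0.24 | 0.16 | -0.23 | -0.56 | -0.59 | 0.06 | 0.06 | | | | | | | |
| GLN | -1.19 | -0.88 | 0.12 | 0.23 | -2.61 | -0.34 | -1.13 | -0.55 | 0.31 | -0.56 | -1.32 | -0.56 | -0.16 | -0.04 | -0.23 | | | | | | |
| PRO | 0.03 | -0.73 | 0.36 | 0.27 | 0.10 | 0.15 | 0.71 | 0.75 | 0.49 | -0.16 | -0.32 | -0.21 | -0.48 | 0.44 | -0.49 | 0.21 | | | | | |
| MET | -0.85 | 0.44 | -0.08 | 1.22 | -0.23 | -0.29 | 0.79 | -0.61 | 0.31 | 0.23 | 0.29 | -0.25 | -0.57 | 0.15 | -0.23 | -0.12 | 0.04 | | | | |
| ARG | -0.31 | 1.08 | 0.15 | 1.65 | 1.01 | -0.12 | 1.51 | -0.04 | 0.10 | -0.07 | -0.05 | 0.49 | 0.05 | -0.18 | -0.60 | 0.34 | 0.43 | 0.49 | | | |
| HIS | -0.50 | 0.23 | -0.13 | 0.28 | -1.55 | -0.24 | -0.92 | 0.06 | -0.93 | -0.11 | -0.24 | 0.16 | -0.55 | -0.05 | -0.02 | -0.02 | -0.05 | -0.14 | 0.37 | | |
| TRP | 0.01 | -0.40 | -0.19 | 0.64 | 0.55 | 0.20 | 0.95 | -0.11 | -0.25 | -0.08 | -0.06 | -0.30 | -0.14 | -0.48 | -0.04 | 0.25 | -0.23 | 0.46 | -0.48 | 0.21 | |
| XXX | 0.35 | 0.09 | 0.39 | 0.61 | 0.74 | 0.22 | 0.35 | 0.04 | 0.09 | 0.23 | 0.26 | 0.23 | 0.00 | 0.05 | 0.09 | 0.18 | 0.00 | 0.06 | -0.08 | 0.09 | 0.13 |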

* Values are expressed in units of standard deviation from the mean (Z-score). Values ≥ 3.0 are considered statistically significant and are bolded. Mean = 0.00; standard deviation = 0.10.

Table 6

Preferred amino acid interactions in CHCs. Thermophilic versus mesophilic proteins of dataset B are compared*

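| | ALA | VAL | PHE | ILE | LEU | ASP | GLU | GLY | LYS | SER | THR | TYR | CYS | ASN | GLN | PRO | MET | ARG | HIS | TRP | XXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALA | -0.81 | | | | | | | | | | | | | | | | | | | | |
| VAL | -0.80 | -0.62 | | | | | | | | | | | | | | | | | | | |
| PHE | 0.29 | -0.96 | -0.09 | | | | | | | | | | | | | | | | | | |
| ILE | **3.27** | **6.36** | 2.80 | **4.21** | | | | | | | | | | | | | | | | | |
| LEU | -1.86 | -2.02 | 0.68 | **4.17** | -4.10 | | | | | | | | | | | | | | | | |
| ASP | -0.76 | -0.23 | -0.58 | 1.21 | -0.49 | -0.23 | | | | | | | | | | | | | | | |
| GLU | 0.13 | 0.58 | 0.51 | 0.79 | 1.37 | 0.11 | 0.89 | | | | | | | | | | | | | | |
| GLY | -0.44 | -0.77 | -0.50 | 1.11 | -0.26 | -0.53 | 0.57 | -0.34 | | | | | | | | | | | | | |
| LYS | -0.46 | 0.10 | 0.75 | 2.51 | 1.69 | 0.37 | **3.28** | 0.65 | 1.16 | | | | | | | | | | | | |
| SER | -1.04 | -1.38 | -0.36 | 1.47 | -0.05 | -0.22 | 0.05 | -0.63 | 0.78 | 0.06 | | | | | | | | | | | |
| THR | -2.05 | -1.15 | -0.80 | 0.17 | -0.90 | -0.12 | -0.89 | 0.00 | 0.49 | -0.15 | 0.42 | | | | | | | | | | |
| TYR | 0.60 | 1.74 | 0.90 | **3.06** | -0.54 | 0.67 | 0.64 | 0.49 | 0.84 | 0.53 | 0.48 | 0.86 | | | | | | | | | |
| CYS | -1.56 | -1.49 | -0.83 | -0.64 | -2.55 | -0.14 | -0.08 | -0.48 | -0.26 | -0.57 | -0.53 | -0.30 | -0.12 | | | | | | | | |
| ASN | 0.48 | 0.31 | 0.15 | -0.02 | -0.70 | -0.04 | 0.23 | 0.35 | 0.49 | -0.11 | -0.42 | -0.15 | 0.01 | 0.08 | | | | | | | |
| GLN | -1.58 | -1.09 | -0.33 | -0.38 | -2.48 | -0.88 | -1.02 | -0.79 | -0.27 | -0.73 | -1.02 | -0.65 | -0.24 | -0.19 | -0.42 | | | | | | |
| PRO | 0.13 | -0.94 | 0.47 | 0.16 | -0.37 | 0.04 | 0.72 | 0.62 | 0.70 | -0.22 | -0.26 | -0.01 | -0.47 | 0.21 | -0.75 | 0.09 | | | | | |
| MET | -0.86 | -0.50 | 0.09 | 1.19 | -0.84 | -0.21 | 0.72 | -0.53 | 0.35 | 0.05 | 0.09 | -0.52 | -0.60 | 0.01 | -0.36 | 0.11 | -0.05 | | | | |
| ARG | -1.22 | 0.26 | -0.11 | 1.52 | -0.12 | -0.44 | 0.75 | 0.02 | -0.20 | -0.61 | -0.34 | 0.68 | -0.12 | -0.11 | -0.71 | 0.10 | 0.03 | 0.24 | | | |
| HIS | -0.51 | 0.01 | -0.34 | 0.12 | -1.45 | -0.25 | -0.85 | 0.15 | -0.95 | -0.28 | -0.31 | 0.20 | -0.47 | -0.03 | -0.26 | 0.11 | -0.05 | -0.43 | 0.14 | | |
| TRP | -0.01 | -0.23 | -0.42 | 0.58 | 0.30 | 0.04 | 0.77 | -0.29 | 0.03 | -0.05 | -0.15 | -0.09 | -0.32 | -0.28 | -0.09 | 0.19 | -0.17 | 0.33 | -0.55 | 0.32 | |
| XXX | 0.27 | 0.01 | 0.28 | 0.37 | 0.51 | 0.17 | 0.24 | 0.00 | 0.08 | 0.15 | 0.25 | 0.17 | 0.00 | 0.03 | 0.06 | 0.15 | 0.00 | 0.03 | -0.07 | 0.09 | 0.10 |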

* Values are expressed in units of standard deviation from the mean (Z-score). Values ≥ 3.0 are considered statistically significant and are bolded. Mean = 0.00; standard deviation = 0.12.

Preferred amino acid substitutions in conserved hydrophobic contacts

Favoured amino acid substitutions between the hyperthermophilic and mesophilic proteins were calculated from the results obtained with the CHC_FIND tool [19]. The residue exchange analysis was limited to the identified conserved hydrophobic contacts. The resulting substitution matrices are shown in Tables 7 and 8. Values are expressed in units of standard deviation from the mean. Only values scoring at 3.0 standard deviations or more from the mean were considered statistically significant. Again, almost all of the most significant exchanges involve isoleucine in both datasets (dataset A: Val→Ile 6.31, Leu→Ile 6.36; dataset B: Val→Ile 6.39, Leu→Ile 6.84 and Phe→Ile 3.12). These exchanges are reflected in the variation of the average amino acid composition of hyperthermophiles (Table 4), where a marked increase of isoleucine content can be detected. The only significant exchange not involving isoleucine is Ala→Val in dataset A, scoring at 3.20 standard deviations from the mean.
Table 7

Preferred amino acid substitutions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset A are compared*

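| From mesophile \ To hyperthermophile | ALA | VAL | PHE | ILE | LEU | ASP | GLU | GLY | LYS | SER | THR | TYR | CYS | ASN | GLN | PRO | MET | ARG | HIS | TRP | XXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALA | 0.00 | **3.20** | 1.28 | 1.67 | -0.85 | 0.26 | 1.79 | 0.47 | 1.92 | -0.13 | -2.22 | 0.04 | -2.48 | -0.38 | 0.43 | -0.30 | -1.02 | -0.13 | -0.64 | 0.09 | 0.00 |
| VAL | -3.20 | 0.00 | 1.07 | **6.31** | -1.58 | -0.30 | 0.21 | -0.60 | 0.13 | 0.21 | -1.79 | 0.38 | -2.09 | -0.73 | -0.13 | 0.34 | 0.34 | 0.90 | 0.04 | -0.26 | 0.00 |
| PHE | -1.28 | -1.07 | 0.00 | 2.31 | -0.73 | -0.26 | 0.04 | -0.09 | 0.38 | -0.21 | 0.60 | 0.51 | -0.21 | -0.38 | -0.30 | 0.13 | 1.54 | -0.47 | -0.21 | -0.64 | 0.00 |
| ILE | -1.67 | -6.31 | -2.31 | 0.00 | -6.36 | 0.47 | 0.51 | -0.60 | 0.60 | -0.21 | 0.30 | -0.09 | -0.34 | -0.77 | -0.90 | 0.00 | 0.90 | -1.11 | -1.02 | -0.09 | -0.17 |
| LEU | 0.85 | 1.58 | 0.73 | **6.36** | 0.00 | 0.13 | -0.90 | 0.30 | -1.02 | -0.30 | 0.21 | -0.13 | -1.71 | 0.26 | -0.90 | 0.13 | -1.62 | 1.37 | -0.68 | 0.38 | 0.00 |
| ASP | -0.26 | 0.30 | 0.26 | -0.47 | -0.13 | 0.00 | 1.32 | 0.09 | 0.73 | -0.13 | -0.21 | 0.09 | 0.00 | -0.77 | 0.09 | -0.13 | 0.00 | 0.13 | 0.51 | 0.04 | 0.00 |
| GLU | -1.79 | -0.21 | -0.04 | -0.51 | 0.90 | -1.32 | 0.00 | -0.30 | -0.94 | -0.73 | -0.47 | 0.04 | -0.47 | -0.43 | -1.07 | -0.47 | -0.09 | 0.17 | -0.17 | -0.17 | 0.00 |
| GLY | -0.47 | 0.60 | 0.09 | 0.60 | -0.30 | -0.09 | 0.30 | 0.00 | -0.30 | -1.28 | 0.38 | 0.38 | 0.00 | 0.51 | -0.51 | -0.09 | 0.00 | 0.77 | 0.17 | 0.00 | 0.00 |
| LYS | -1.92 | -0.13 | -0.38 | -0.60 | 1.02 | -0.73 | 0.94 | 0.30 | 0.00 | 0.00 | -1.02 | -0.38 | -0.30 | -0.04 | -1.11 | -0.26 | 0.04 | 0.77 | -0.60 | 0.13 | 0.00 |
| SER | 0.13 | -0.21 | 0.21 | 0.21 | 0.30 | 0.13 | 0.73 | 1.28 | 0.00 | 0.00 | 1.58 | 0.00 | 0.09 | -0.56 | -0.43 | 0.30 | 0.13 | 0.43 | -0.09 | -0.13 | 0.00 |
| THR | 2.22 | 1.79 | -0.60 | -0.30 | -0.21 | 0.21 | 0.47 | -0.38 | 1.02 | -1.58 | 0.00 | -0.21 | -0.51 | 0.04 | 0.26 | 0.34 | -0.30 | 0.34 | 0.56 | -0.47 | 0.00 |
| TYR | -0.04 | -0.38 | -0.51 | 0.09 | 0.13 | -0.09 | -0.04 | -0.38 | 0.38 | 0.00 | 0.21 | 0.00 | -0.81 | -0.26 | -0.13 | 0.17 | 0.43 | -0.85 | 0.34 | 0.43 | 0.00 |
| CYS | 2.48 | 2.09 | 0.21 | 0.34 | 1.71 | 0.00 | 0.47 | 0.00 | 0.30 | -0.09 | 0.51 | 0.81 | 0.00 | 0.13 | 0.04 | 0.17 | 0.90 | 0.13 | 0.26 | 0.00 | 0.00 |
| ASN | 0.38 | 0.73 | 0.38 | 0.77 | -0.26 | 0.77 | 0.43 | -0.51 | 0.04 | 0.56 | -0.04 | 0.26 | -0.13 | 0.00 | -0.85 | -0.04 | -0.26 | -0.13 | -0.13 | 0.13 | 0.00 |
| GLN | -0.43 | 0.13 | 0.30 | 0.90 | 0.90 | -0.09 | 1.07 | 0.51 | 1.11 | 0.43 | -0.26 | 0.13 | -0.04 | 0.85 | 0.00 | 0.56 | -0.38 | 0.38 | 0.13 | 0.43 | 0.00 |
| PRO | 0.30 | -0.34 | -0.13 | 0.00 | -0.13 | 0.13 | 0.47 | 0.09 | 0.26 | -0.30 | -0.34 | -0.17 | -0.17 | 0.04 | -0.56 | 0.00 | -0.17 | 0.17 | -0.21 | 0.09 | 0.00 |
| MET | 1.02 | -0.34 | -1.54 | -0.90 | 1.62 | 0.00 | 0.09 | 0.00 | -0.04 | -0.13 | 0.30 | -0.43 | -0.90 | 0.26 | 0.38 | 0.17 | 0.00 | -0.64 | -0.47 | 0.38 | -0.30 |
| ARG | 0.13 | -0.90 | 0.47 | 1.11 | -1.37 | -0.13 | -0.17 | -0.77 | -0.77 | -0.43 | -0.34 | 0.85 | -0.13 | 0.13 | -0.38 | -0.17 | 0.64 | 0.00 | -0.98 | 0.43 | 0.00 |
| HIS | 0.64 | -0.04 | 0.21 | 1.02 | 0.68 | -0.51 | 0.17 | -0.17 | 0.60 | 0.09 | -0.56 | -0.34 | -0.26 | 0.13 | -0.13 | 0.21 | 0.47 | 0.98 | 0.00 | 0.00 | 0.00 |
| TRP | -0.09 | 0.26 | 0.64 | 0.09 | -0.38 | -0.04 | 0.17 | 0.00 | -0.13 | 0.13 | 0.47 | -0.43 | 0.00 | -0.13 | -0.43 | -0.09 | -0.38 | -0.43 | 0.00 | 0.00 | 0.00 |
| XXX | 0.00 | 0.00 | 0.00 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.30 | 0.00 | 0.00 | 0.00 | 0.00 |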

* Values are expressed in units of standard deviation from the mean (Z-score). Values ≥ 3.0 are considered statistically significant and are bolded. Mean = 0.00; standard deviation = 23.41.

Table 8

Preferred amino acid substitutions in CHCs. Hyperthermophilic versus mesophilic proteins of dataset B are compared*

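| From mesophile \ To hyperthermophile | ALA | VAL | PHE | ILE | LEU | ASP | GLU | GLY | LYS | SER | THR | TYR | CYS | ASN | GLN | PRO | MET | ARG | HIS | TRP | XXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALA | 0.00 | 1.73 | 0.66 | 2.91 | -0.76 | 0.07 | 1.54 | 0.14 | 1.82 | -0.26 | -2.44 | 0.59 | -2.34 | -0.50 | 0.43 | 0.69 | -0.47 | -0.24 | -0.80 | -0.31 | 0.00 |
| VAL | -1.73 | 0.00 | 1.47 | **6.39** | -0.92 | 0.17 | 0.76 | -0.59 | 0.69 | 0.62 | -1.47 | 0.43 | -1.23 | -0.52 | -0.28 | 0.12 | 0.21 | 0.31 | -0.21 | -0.90 | 0.00 |
| PHE | -0.66 | -1.47 | 0.00 | **3.12** | -1.56 | -0.26 | -0.05 | -0.24 | 0.05 | -0.33 | -0.14 | 1.80 | -0.14 | -0.31 | -0.40 | 0.14 | 0.59 | -0.31 | -0.35 | -0.83 | 0.00 |
| ILE | -2.91 | -6.39 | -3.12 | 0.00 | -6.84 | 0.31 | 0.38 | -0.64 | 0.85 | -0.50 | -0.38 | 0.07 | -0.38 | -0.54 | -0.78 | -0.35 | 0.31 | -0.88 | -0.57 | -0.09 | -0.09 |
| LEU | 0.76 | 0.92 | 1.56 | **6.84** | 0.00 | 0.07 | -0.31 | 0.14 | 1.09 | 0.17 | 0.43 | 1.35 | -0.76 | -0.17 | -1.56 | 0.21 | -2.96 | 0.40 | -0.40 | 0.64 | 0.00 |
| ASP | -0.07 | -0.17 | 0.26 | -0.31 | -0.07 | 0.00 | 0.80 | 0.17 | 0.92 | -0.24 | -0.28 | 0.21 | 0.07 | 0.07 | -0.17 | -0.05 | -0.14 | 0.02 | 0.09 | 0.28 | 0.00 |
| GLU | -1.54 | -0.76 | 0.05 | -0.38 | 0.31 | -0.80 | 0.00 | -0.50 | -0.43 | -0.78 | -0.33 | 0.33 | -0.14 | -0.35 | -0.90 | -0.59 | -0.05 | -0.66 | -0.52 | 0.05 | 0.00 |
| GLY | -0.14 | 0.59 | 0.24 | 0.64 | -0.14 | -0.17 | 0.50 | 0.00 | 0.21 | -1.02 | 0.28 | 0.28 | -0.17 | 0.35 | -0.57 | -0.21 | 0.05 | 0.45 | 0.26 | -0.07 | 0.00 |
| LYS | -1.82 | -0.69 | -0.05 | -0.85 | -1.09 | -0.92 | 0.43 | -0.21 | 0.00 | 0.02 | -0.73 | -0.40 | -0.17 | -0.14 | -1.99 | -0.52 | 0.00 | -0.64 | -0.66 | 0.31 | 0.00 |
| SER | 0.26 | -0.62 | 0.33 | 0.50 | -0.17 | 0.24 | 0.78 | 1.02 | -0.02 | 0.00 | 0.50 | 0.14 | 0.17 | 0.43 | -0.35 | 0.21 | 0.00 | 0.19 | -0.07 | -0.12 | 0.00 |
| THR | 2.44 | 1.47 | 0.14 | 0.38 | -0.43 | 0.28 | 0.33 | -0.28 | 0.73 | -0.50 | 0.00 | 0.62 | -0.62 | -0.33 | -0.12 | -0.09 | 0.09 | 0.21 | 0.21 | -0.26 | 0.00 |
| TYR | -0.59 | -0.43 | -1.80 | -0.07 | -1.35 | -0.21 | -0.33 | -0.28 | 0.40 | -0.14 | -0.62 | 0.00 | -0.52 | -0.21 | -0.24 | -0.21 | 0.00 | -1.16 | -0.54 | 0.69 | 0.00 |
| CYS | 2.34 | 1.23 | 0.14 | 0.38 | 0.76 | -0.07 | 0.14 | 0.17 | 0.17 | -0.17 | 0.62 | 0.52 | 0.00 | 0.19 | 0.02 | 0.14 | 0.57 | 0.07 | 0.28 | 0.19 | 0.00 |
| ASN | 0.50 | 0.52 | 0.31 | 0.54 | 0.17 | -0.07 | 0.35 | -0.35 | 0.14 | -0.43 | 0.33 | 0.21 | -0.19 | 0.00 | -0.95 | -0.05 | -0.19 | -0.33 | -0.35 | -0.02 | 0.00 |
| GLN | -0.43 | 0.28 | 0.40 | 0.78 | 1.56 | 0.17 | 0.90 | 0.57 | 1.99 | 0.35 | 0.12 | 0.24 | -0.02 | 0.95 | 0.00 | 0.21 | -0.02 | 0.57 | 0.35 | 0.24 | 0.00 |
| PRO | -0.69 | -0.12 | -0.14 | 0.35 | -0.21 | 0.05 | 0.59 | 0.21 | 0.52 | -0.21 | 0.09 | 0.21 | -0.14 | 0.05 | -0.21 | 0.00 | -0.50 | 0.26 | -0.17 | 0.07 | 0.00 |
| MET | 0.47 | -0.21 | -0.59 | -0.31 | 2.96 | 0.14 | 0.05 | -0.05 | 0.00 | 0.00 | -0.09 | 0.00 | -0.57 | 0.19 | 0.02 | 0.50 | 0.00 | -0.62 | -0.21 | 0.14 | -0.17 |
| ARG | 0.24 | -0.31 | 0.31 | 0.88 | -0.40 | -0.02 | 0.66 | -0.45 | 0.64 | -0.19 | -0.21 | 1.16 | -0.07 | 0.33 | -0.57 | -0.26 | 0.62 | 0.00 | -0.92 | 0.50 | 0.00 |
| HIS | 0.80 | 0.21 | 0.35 | 0.57 | 0.40 | -0.09 | 0.52 | -0.26 | 0.66 | 0.07 | -0.21 | 0.54 | -0.28 | 0.35 | -0.35 | 0.17 | 0.21 | 0.92 | 0.00 | 0.38 | 0.00 |
| TRP | 0.31 | 0.90 | 0.83 | 0.09 | -0.64 | -0.28 | -0.05 | 0.07 | -0.31 | 0.12 | 0.26 | -0.69 | -0.19 | 0.02 | -0.24 | -0.07 | -0.14 | -0.50 | -0.38 | 0.00 | 0.00 |
| XXX | 0.00 | 0.00 | 0.00 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 |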

* Values are expressed in units of standard deviation from the mean (Z-score). Values ≥ 3.0 are considered statistically significant and are bolded. Mean = 0.00; standard deviation = 42.10

Discussion

The main goal of this study was to evaluate on a quantitative basis the relationship between hydrophobic contacts and protein adaptation to high temperatures. An essential prerequisite for such a study is the assembly of a large and minimally redundant set of very high resolution crystal structures. Indeed, despite the observation that each protein family seems to adopt different structural strategies to adapt to high temperatures [5], common trends may be outlined if a large amount of structural data is available [8]. At the same time, since computed values of apolar contact area are mostly influenced by the relative position of the interacting residues, their precision is affected by the resolution of the crystal structures analysed. Therefore, two datasets were culled from a set of 1563 crystal structures from thermophilic (optimal growth temperature between 50°C and 80°C) and hyperthermophilic (optimal growth temperature above 80°C) organisms, and their mesophilic counterparts. The rationale of this choice was to ensure that the obtained results were not biased either by the paucity of data or by the quality of the collected crystal structures. As already discussed by Chen et al. [7], the increase of the apolar contact area in hyperthermophilic and thermophilic proteins may be achieved by at least two different mechanisms: an evenly distributed increase over all residues, or a local increase over key residues. The latter mechanism, which has been shown to be a major contributor to the enhanced thermostability of proteins from T. maritima [9], seems to involve mainly residues already engaged in hydrophobic contacts. This suggests that a better compactness may originate from an even better connectivity in those protein regions that already have a tendency to compactness, and not simply from "tightening the loops" [9].
The results obtained in this work on the difference of apolar contact area (ΔACA) agree with this hypothesis: a significant increase of ACA was measured in both datasets only when the analysis was limited to the SCRs of the hyperthermophilic structures. The SCRs were presumably subject to similar constraints during the divergent evolution of a family of proteins from a common ancestor, and therefore they possibly contain most of the determinants necessary to maintain the fold. Considering the role played by hydrophobic contacts in this respect, it is not surprising that the residues composing the SCRs and engaged in hydrophobic contacts were the ones mostly involved in the structural modifications necessary to achieve and maintain a proper fold at high temperatures. Moreover, the finding that the difference of ACA was highly significant only when limited to the SCRs could explain some apparently non-significant results previously obtained by measuring accessible surface area [8] or cavity size [6]. The statistically significant increase of ~0.75 Å²/residue of apolar contact area was observed only in the SCRs of hyperthermophilic proteins. Therefore, it can be argued that proteins from thermophilic organisms usually adopt different strategies to enhance thermostability. Indeed, it has been demonstrated that moderately and extremely thermostable proteins rely on different mechanisms to achieve greater stability [8,20]. Ion-pair interactions presumably represent a predominant force in thermophilic proteins, as well as in many hyperthermophilic proteins [8,21]. On the other hand, comparisons of mesophilic and hyperthermophilic protein structures indicate that the hydrophobic effect contributes to stability only at high temperatures, while only moderately thermophilic proteins show an increase in the polarity of their exposed surface [20].
Two factors could be responsible for this difference: the temperature dependence of the thermodynamic forces involved in protein stabilization, and/or the phylogenetic origin of the extremely thermophilic organisms, which belong to the domain Archaea and are therefore distinct from moderately thermophilic organisms, which are mostly Bacteria. In any case, the obtained results strongly suggest that the packing of hyperthermophilic proteins, in comparison with their mesophilic homologs, has improved significantly, and it is reasonable to deduce that this increased amount of apolar contact area contributes to the stabilization of the native state of the protein. Our analysis revealed that α-helices were mainly responsible for the increased amount of ACA. Surprisingly, no statistically significant trends were observed in the comparison of the ACA in the β-strands of the SCRs. We cannot provide a clear explanation for this different behaviour between secondary structures. An intriguing possibility is that β-strands are, generally, already almost optimally packed, even in mesophilic proteins, leaving only a small margin for improvement. However, it is also possible that this observation is due to 'sample bias', i.e., the peculiarities of the available protein structures. Structural stabilization of α-helices in protein cores may therefore represent a component of great importance for the enhanced thermostability of hyperthermophilic proteins. A number of past studies have stressed the importance of the enhanced stability of α-helices as a general feature of many hyperthermophilic proteins. In order to investigate the role of α-helices in protein thermostability, Petukhov et al. [22] compared energy characteristics of α-helices from four families of hyperthermophilic and mesophilic proteins, using statistical mechanical theory for describing helix/coil transitions.
They found that the magnitude of the observed decrease in intrinsic free energy on α-helix formation of the thermostable proteins was sufficient to explain the experimentally determined increase of their thermostability. Furthermore, protein engineering studies showed that a well-packed α-helix structure is related to a large increase in thermostability [23,24]. It is well known that the flexibility of α-helices is often required to assure protein function, for instance in the conformational transitions that accompany substrate binding or protein-protein interactions [25]. However, an excessive flexibility of this secondary structure element at high temperatures could leave it insufficiently stable to maintain its native conformation, causing the entire protein to unfold. According to thermodynamic studies on model peptides in aqueous environments, two main factors appear to play a key role in the structural stability of α-helices: the presence of amino acids with intrinsic helical propensity, and side chain-side chain interactions [26,27]. Therefore, we further investigated the nature of the increased stabilization of the α-helices composing the SCRs of hyperthermostable proteins by determining the differences in amino acid composition of the residues involved in CHCs. The results of this analysis strongly suggest that isoleucine and, to a lesser extent, valine, mostly to the detriment of leucine, are involved in the formation of more hydrophobic contacts in hyperthermophilic proteins than in their mesophilic counterparts. Likewise, the importance of isoleucine in the formation of CHCs of hyperthermophilic proteins was confirmed by the analysis of the preferred amino acid interactions in CHCs, where almost all types of interactions scoring at > 3.0 standard deviations involved isoleucine, and by the favoured amino acid substitutions between the hyperthermophilic and mesophilic proteins in CHCs.
A large body of theoretical and experimental studies demonstrates the importance of isoleucine in the stabilization of protein structures from thermophilic organisms. Malakauskas and Mayo [24] reported the computer-aided engineering of a seven-fold mutant of the β1 domain of the Streptococcal protein G, exhibiting a melting temperature above 100°C and an enhancement in thermodynamic stability of 4.3 kcal mol⁻¹ at 50°C over the wild-type protein. Of the seven mutations, five were of type XXX→Ile, and they improved side-chain packing in the interior of the protein. An increased content of isoleucine in thermophilic and hyperthermophilic proteins, to the detriment of leucine, was also noted by Haney et al. [28] and Kumar et al. [6]. More recently, a structural genomics based study carried out by Chakravarty and Varadarajan [29] reported that leucine is preferentially substituted by the β-branched residues valine and isoleucine at buried sites. Several studies have demonstrated that leucine has a slightly higher α-helix propensity than isoleucine and, generally, than β-branched residues [27,30]. This finding, which is apparently in contrast with the results obtained in this work, derives from substitution experiments in short polyalanine α-helix-forming peptides in water [31]. The propensity is mainly associated with the loss of conformational entropy of residues during the folding of α-helices in an aqueous environment: freezing a side chain with fewer internal rotational degrees of freedom in the α-helix conformation would be entropically less expensive. However, it must be noted that these experiments, and many derived propensity scales, do not take into account solvent entropy effects. As discussed by Creamer and Rose [30], neglect of solvent entropy appears justified for a peptide side chain because no significant differences in solvation energy are expected in the side chain of a solitary polyalanyl helix during a helix-coil transition.
In either case, the side chain is highly solvent-exposed. The same assumption would not be appropriate for a protein helix that, upon association with the remainder of the molecule, engages a solvent-shielded interaction surface. In this study, only the α-helices composing the SCRs, and therefore mostly found in the protein core, were considered for further investigation. Therefore, the application of helix propensity scales might not be appropriate in this case. For example, Li and Deber [15] have shown that α-helix propensity scales are not appropriate for non-aqueous environments and that β-branched amino acids, such as valine and isoleucine, rank among the best helix promoters in an apolar environment, such as a lipid bilayer. On the other hand, hydrophobic contacts deriving from side chain interactions could play a role of great importance in the stabilization of the α-helices composing the SCRs of hyperthermostable proteins. At temperatures above 80°C, the hydrophobic effect, which is considered to be a dominant force in protein folding [32,33], is mainly enthalpy driven [34]. In fact, while at high temperatures the entropy contribution to protein stability tends to zero, the loss or gain of van der Waals interactions acquires increased importance. For example, by constructing 15 Barnase mutants in which hydrophobic interactions were deleted, Serrano et al. [35] found a strong correlation between the degree of Barnase destabilization and the number of methyl side chain groups that were lost (r = 0.91). These data agree with the preferred substitutions ($R_{Ala \to Val}$ = 3.20; $R_{Val \to Ile}$ = 6.31) observed in the CHCs of our datasets.

Conclusion

In conclusion, taken together, the obtained results indicate that the preference for isoleucine and valine residues in hydrophobic contacts is an important feature contributing to the enhanced thermostability of α-helices in hyperthermophilic proteins, possibly through a decreased flexibility of these elements of secondary structure. This effect, in turn, may be due to an increased number of buried methyl groups in the protein core and/or a better packing of α-helices with the rest of the structure, caused by the presence of hydrophobic β-branched side chains. Despite the advances in the design of hyperthermostable protein variants [17], a potential drawback of these approaches is still the time required by computer algorithms to explore the whole protein sequence space. Other things being equal, focussing on the apolar contact area of the α-helices of the protein core, through substitutions that increase the number of methyl side chain groups and/or result in a better packing of the secondary structure elements, will potentially give clues for the thermostabilization of the protein.

Methods

Data Collection

Hyperthermophilic and thermophilic protein structures were retrieved from the Protein Data Bank (PDB) [36], by initially searching for the words "thermo", "thermophile" and "hyperthermophile". This search yielded about 300 proteins and their corresponding source organisms. An additional search was then performed using the names of these organisms as queries, after having verified that their optimal growth temperatures were between 50°C and 80°C for thermophiles, and above 80°C for hyperthermophiles [3]. Optimal growth temperatures for each organism were obtained from Entrez [37] and the "Prokaryotic Growth Temperature Database" [38]. As a first refinement step, the entries in which protein structures were determined by nuclear magnetic resonance (NMR) were discarded, yielding 1563 crystal structures. As a second refinement step, all the entries were examined by means of the PISCES tool [39], and culled from the original dataset by maximum percentage of identity (90%), maximum resolution (2.0 Å), maximum R-value (0.25) and minimum chain length (50 residues) criteria. Furthermore, a second dataset was collected following less stringent criteria (maximum resolution of 3.0 Å and maximum R-value of 0.30), in order to retain a greater number of structures. This second step yielded 458 and 767 proteins for datasets A and B, respectively. Each dataset was then further reduced by eliminating proteins displaying any structural defect, such as missing side-chains or chain breaks due to missing residues, using the MAXIT tool, available at [46]. At the end of this refinement step, datasets A and B comprised 93 and 144 structures, respectively. Each structure of the two datasets was then used to check for the presence in the PDB of a mesophilic counterpart.
To this purpose, a search with the BLAST tool [40,41] was carried out, adopting the following criteria: 30% minimum sequence identity, which is usually accepted as a threshold value to assure a homology relationship between two proteins [42]; 90% maximum sequence identity, in order to avoid any redundancy of data; 40% maximum difference in length between the sequences, to avoid the presence of large indels between the two structures. Furthermore, the retrieved mesophilic proteins had to satisfy the same structural criteria described above to be accepted. Where several mesophilic homologous structures were available for a given hyperthermophilic or thermophilic protein, the one displaying the highest percentage of sequence identity was selected. At the end of this search, 38 protein pairs for dataset A (14 thermophilic/mesophilic pairs and 24 hyperthermophilic/mesophilic pairs) and 59 protein pairs for dataset B (22 thermophilic/mesophilic pairs and 37 hyperthermophilic/mesophilic pairs) were collected (Table 1 and Table 2).
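The selection of a mesophilic counterpart described above can be sketched as a simple filter. This is only an illustration, not the procedure actually scripted by the authors: the `Hit` record, its field names and the function name are hypothetical, and the 40% length criterion is assumed here to be measured relative to the longer of the two sequences, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one candidate mesophilic homolog."""
    pdb_id: str
    identity: float  # percent sequence identity to the query
    length: int      # number of residues


def pick_mesophilic_counterpart(
    query_length: int,
    hits: List[Hit],
    min_identity: float = 30.0,
    max_identity: float = 90.0,
    max_length_diff: float = 0.40,
) -> Optional[Hit]:
    """Return the accepted hit with the highest identity, or None."""
    accepted = []
    for hit in hits:
        # Assumption: length difference is taken relative to the longer chain.
        length_diff = abs(hit.length - query_length) / max(hit.length, query_length)
        if min_identity <= hit.identity <= max_identity and length_diff <= max_length_diff:
            accepted.append(hit)
    return max(accepted, key=lambda h: h.identity, default=None)


hits = [
    Hit("1abc", 95.0, 200),  # rejected: too similar (redundant)
    Hit("2def", 45.0, 210),
    Hit("3ghi", 60.0, 400),  # rejected: length differs by 50%
    Hit("4jkl", 52.0, 190),
    Hit("5mno", 25.0, 200),  # rejected: below the homology threshold
]
best = pick_mesophilic_counterpart(200, hits)
```

The resolution, R-value and structural-defect criteria would be applied to the candidates before this step.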

Computation of the Apolar Contact Area

Computation of the total apolar contact area between the residues of each structure composing dataset A and B was carried out by means of the pdb_np_cont tool [43], which computes pairwise atom contact areas between non-polar atoms from structural protein data in a standard PDB coordinate file. Briefly, this method is based on the classification of points located on a sphere of interaction radius, surrounding each non-polar atom. The interaction radius is the van der Waals radius of each atom type, plus the radius of a water molecule. The output of this program was utilized to calculate the pairwise residue contact areas for every possible pair of residues belonging to the structures analysed. Heteroatoms were ignored. The total apolar contact area was then normalized by sequence length of each protein structure. In order to assess the role played by the hydrophobic contacts in the stabilization of the protein core, at high temperatures, each pair of homologous hyperthermophilic/mesophilic and thermophilic/mesophilic structures was initially superposed by means of the CE-MC tool [44]. The resulting alignment was then utilized to derive manually refined structural alignments. Every pair of structures was visually inspected and, where necessary, modified to optimise the matching of several structural features, including observed secondary elements, functionally conserved residues and hydrophobic regions, in order to give the most accurate structural alignment. Each structural alignment obtained as described above was utilized to identify the common core and the structurally conserved regions between the pairs of proteins taken into consideration (SCRs). SCRs were defined as regions displaying a similar local conformation, with a mean positional RMSD of the equivalent α-carbon positions of the structures superposed ≤ 3.0 Å [18], lacking indels (insertions and deletions) and composed of at least three consecutive residues. 
For every structurally equivalent position of the pairwise structural alignment, the RMSD from the center of mass of the structurally equivalent Cα atoms was computed. To avoid the presence of SCRs with indels, positions with gaps were not considered. A window of size w = 3 positions was then scrolled through the alignment and used to define seed positions with a mean RMSD ≤ 3.0 Å. Each time a seed position was found, w was increased iteratively by one position for as long as the mean score remained below 3.0 Å, or until the window reached the end of the alignment. The obtained SCRs were then visually inspected to exclude regions with different conformations. Then, the hydrophobic contacts involving pairs of topologically equivalent residues in both of the structures analysed (Conserved Hydrophobic Contacts, CHCs) were extracted from the identified SCRs. The SCR_FIND and CHC_FIND tools [19] were used for this purpose. The differences observed in the amount of apolar contact area between the SCRs of the hyperthermophilic/mesophilic and thermophilic/mesophilic protein pairs were further investigated through the analysis of such differences in the regular secondary structure elements: α-helices and β-strands. Secondary structures were determined by using the program DSSP [45]. The amounts of apolar contact area measured in the SCRs and in the secondary structure elements of each structure were finally normalized by the number of residues belonging to SCRs, α-helices and β-strands, respectively.
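The seed-and-extend window search described above can be sketched as follows. This is a minimal illustration and not the SCR_FIND implementation: the function name is hypothetical, the input is assumed to be a list of per-position Cα deviations (in Å) with `None` marking gapped positions, and the search is assumed to resume immediately after each SCR found.

```python
from typing import List, Optional, Tuple


def find_scrs(
    deviations: List[Optional[float]],
    cutoff: float = 3.0,
    seed_size: int = 3,
) -> List[Tuple[int, int]]:
    """Return SCRs as half-open (start, end) index ranges over the alignment."""
    scrs = []
    n = len(deviations)
    i = 0
    while i + seed_size <= n:
        window = deviations[i:i + seed_size]
        # A seed must be gap-free and have a mean deviation within the cutoff.
        if any(d is None for d in window) or sum(window) / seed_size > cutoff:
            i += 1
            continue
        end = i + seed_size
        total = sum(window)
        # Extend one position at a time while the mean stays within the cutoff
        # and no gap or alignment end is reached.
        while (
            end < n
            and deviations[end] is not None
            and (total + deviations[end]) / (end + 1 - i) <= cutoff
        ):
            total += deviations[end]
            end += 1
        scrs.append((i, end))
        i = end
    return scrs


example = [0.5, 0.8, 1.0, 1.2, None, 5.0, 9.5, 0.4, 0.6, 0.9, 12.0]
regions = find_scrs(example)
```

Because the criterion is a mean over the window, a single poorly superposed position can still be absorbed into a long, otherwise well-conserved region, which is one reason a final visual inspection of the SCRs is useful.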

Amino acid Composition of the residues involved in CHCs

Differences in amino acid composition were measured by:

$$D_{aa} = \frac{n_{aa}^{T}}{n_{tot}^{T}} - \frac{n_{aa}^{M}}{n_{tot}^{M}}$$

where $D_{aa}$ is the difference in amino acid composition for residue aa, $n_{aa}^{T}$ and $n_{aa}^{M}$ are the number of residues of type aa in hyperthermophilic/thermophilic (T) and mesophilic (M) structures, and $n_{tot}^{T}$ and $n_{tot}^{M}$ are the total number of residues in the hyperthermophilic/thermophilic (T) and mesophilic (M) structures. The $D_{aa}$ values measured for each pair of the structures analysed were then used to calculate the difference in amino acid composition $C_{aa}$ over the k pairs composing dataset A and dataset B:

$$C_{aa} = \sum_{i=1}^{k} D_{aa,i}$$

The mean $\bar{C}$ and standard deviation σ of the $C_{aa}$ elements were determined; the significance $R_{aa}$ of the difference in amino acid composition for residue aa was then calculated by dividing the difference between $C_{aa}$ and the overall mean by the standard deviation σ:

$$R_{aa} = \frac{C_{aa} - \bar{C}}{\sigma}$$

$R_{aa}$ values ≥ 3.0 standard deviations (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) from the mean value were considered statistically significant.
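As an illustration of the procedure, the following sketch computes the per-pair composition difference, accumulates it over the pairs and converts the result to a Z-score. It makes several assumptions that the text does not fix: the per-pair differences are summed (dividing by k instead would leave the Z-scores unchanged), the population standard deviation is used, and the input is given as one-letter sequences of the CHC residues. Function names and the toy sequences are hypothetical.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "AVFILDEGKSTYCNQPMRHW"


def composition_difference(thermo_seq: str, meso_seq: str) -> dict:
    """Per-pair difference in residue frequency (thermophile minus mesophile)."""
    return {
        aa: thermo_seq.count(aa) / len(thermo_seq) - meso_seq.count(aa) / len(meso_seq)
        for aa in AMINO_ACIDS
    }


def composition_significance(pairs) -> dict:
    """Z-score of the accumulated composition difference for each residue type."""
    per_pair = [composition_difference(t, m) for t, m in pairs]
    # Assumption: differences are summed over the k pairs.
    c = {aa: sum(d[aa] for d in per_pair) for aa in AMINO_ACIDS}
    values = list(c.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: (c[aa] - mu) / sigma for aa in AMINO_ACIDS}


# Toy data: CHC residues of two thermophile/mesophile pairs.
pairs = [("IIIVVL", "LLLVVA"), ("IIAVL", "LLAVA")]
r = composition_significance(pairs)
```

In this toy example isoleucine is enriched and leucine depleted in the thermophilic sequences, so isoleucine receives the highest score and leucine a negative one.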

Preferred amino acid pairs in CHCs

Preferred amino acid pairs forming hydrophobic contacts were identified by counting the number of times a particular pair of residues located in SCRs makes a hydrophobic contact. The obtained counts were then normalized by the number of pairs of interacting residues present in the SCRs of the structure taken into consideration. An interaction matrix reporting the differences in the number of apolar contacts for each possible pair of residues, between hyperthermophilic/mesophilic and thermophilic/mesophilic structures, was derived:

$$C_{XY} = \sum_{i=1}^{k} \left( C_{XY,i}^{T} - C_{XY,i}^{M} \right)$$

where k represents the number of elements of dataset A or B, $C_{XY}$ is the element of the matrix reporting the differences in the number of apolar contacts for the pair XY of interacting residues, and $C_{XY}^{T}$ and $C_{XY}^{M}$ are the normalized counts for the hyperthermophilic/thermophilic and the mesophilic proteins, respectively. The mean $\bar{C}$ and standard deviation σ of the non-zero elements of the overall interaction matrix were determined; the significance $R_{XY}$ of the interaction XY was then calculated by dividing the difference between $C_{XY}$ and the overall matrix mean by the standard deviation σ:

$$R_{XY} = \frac{C_{XY} - \bar{C}}{\sigma}$$

$R_{XY}$ values ≥ 3.0 standard deviations (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) from the mean value were considered statistically significant.
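A possible way to build such an interaction matrix is sketched below. It is only an illustration under stated assumptions: each structure is represented by a list of residue-type pairs that make an apolar contact, pairs are treated as unordered, counts are normalized by the total number of contacting pairs in that structure, per-pair differences are summed over the dataset, and the population standard deviation of the non-zero elements is used. Function names and the toy contacts are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def normalized_pair_counts(contacts) -> dict:
    """Fraction of contacts contributed by each unordered residue-type pair."""
    counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def interaction_significance(dataset) -> dict:
    """dataset: list of (thermophile_contacts, mesophile_contacts) per protein pair."""
    diff = defaultdict(float)
    for thermo_contacts, meso_contacts in dataset:
        t = normalized_pair_counts(thermo_contacts)
        m = normalized_pair_counts(meso_contacts)
        for pair in set(t) | set(m):
            diff[pair] += t.get(pair, 0.0) - m.get(pair, 0.0)
    # Mean and standard deviation are taken over the non-zero elements only.
    nonzero = [v for v in diff.values() if v != 0.0]
    mu, sigma = mean(nonzero), pstdev(nonzero)
    return {pair: (v - mu) / sigma for pair, v in diff.items()}


dataset = [
    (
        [("ILE", "VAL"), ("VAL", "ILE"), ("ILE", "LEU"), ("ALA", "LEU")],
        [("LEU", "VAL"), ("LEU", "LEU"), ("ILE", "VAL"), ("ALA", "LEU")],
    ),
]
z = interaction_significance(dataset)
```

With real data the matrix would contain all 21 × 21 residue-type combinations (including the unknown type XXX) and many protein pairs; the toy input has too few non-zero elements for any score to reach the 3.0 threshold.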

Preferred amino acid substitutions in CHCs

Amino acid substitutions of residues involved in the formation of conserved hydrophobic contacts between hyperthermophilic and mesophilic proteins were determined by analysing the alignment of the SCRs of each pair. For each residue X belonging to a mesophilic protein and involved in making CHCs, $aa_{X \to Y}$ was defined as the number of times X is substituted by the residue Y of the hyperthermophilic sequence. $aa_{Y \to X}$ was defined likewise. A substitution matrix can therefore be obtained by computing the difference between $aa_{X \to Y}$ and $aa_{Y \to X}$ over the whole dataset of k protein pairs, according to:

$$CS_{X \to Y} = \sum_{i=1}^{k} \left( aa_{X \to Y,i} - aa_{Y \to X,i} \right)$$

where $CS_{X \to Y}$ is the element of the substitution matrix. The mean $\overline{CS}$ and standard deviation $\sigma$ of the non-zero elements of the overall exchange matrix were determined; the significance $R_{X \to Y}$ of the exchange X → Y was then calculated by dividing the difference between $CS_{X \to Y}$ and the overall matrix mean by the standard deviation:

$$R_{X \to Y} = \frac{CS_{X \to Y} - \overline{CS}}{\sigma}$$

$R_{X \to Y}$ values ≥ 3.0 standard deviations from the mean value (corresponding to a probability P ≤ 0.01 that the observed difference was obtained by chance) were considered statistically significant.
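The substitution count can be sketched in Python as follows. This is an illustrative sketch, not the authors' program: it assumes that each protein pair is supplied as a hypothetical list of aligned (mesophilic residue, hyperthermophilic residue) tuples taken only from positions involved in CHCs, and that identical residues are not counted as substitutions.

```python
from collections import Counter

def substitution_matrix(dataset):
    """Return CS[(X, Y)] = count(X -> Y) - count(Y -> X) over all protein pairs.

    dataset: list of alignments, one per protein pair; each alignment is a
    list of (mesophilic residue X, hyperthermophilic residue Y) tuples.
    """
    counts = Counter()
    for alignment in dataset:
        for x, y in alignment:
            if x != y:  # conserved positions are not substitutions (assumption)
                counts[(x, y)] += 1
    cs = {}
    for (x, y) in counts:
        # Counter returns 0 for an exchange that was never observed.
        cs[(x, y)] = counts[(x, y)] - counts[(y, x)]
    return cs

# Toy alignments (made up): two protein pairs.
toy = [[("A", "I"), ("A", "I"), ("I", "A"), ("L", "L")], [("G", "A")]]
cs = substitution_matrix(toy)
```

The resulting matrix is antisymmetric: a positive value for X → Y means that the exchange towards Y in the hyperthermophilic protein is observed more often than the reverse one. The significance of each non-zero element can then be obtained by standardizing against the mean and standard deviation of the non-zero elements, as described in the text.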

Statistical significance

The statistical significance of the observed differences in ACA between hyper/thermophilic proteins and their mesophilic counterparts was assessed with a paired t-test, applied to every pair of structures composing dataset A and dataset B, respectively. The null hypothesis of the paired t-test is that there is no significant difference between the measured values of ACA in the hyper/thermophilic and mesophilic proteins; it was rejected for t > 2.0 (P(t) < 5%). To ensure that the measured P(t) was not biased by the extreme values of the distributions, the t-test analyses were repeated after removing the highest and lowest values from the datasets. The Shapiro-Wilk normality test was applied to assess the distribution of the values obtained for the two datasets. The null hypothesis of this test is that the analysed samples are drawn from a Gaussian distribution; the P value returned by the test is therefore the criterion for accepting or rejecting that hypothesis. A P value < 0.05 was considered statistically significant and led to rejection of the assumption of normality.
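The paired comparison can be sketched with the Python standard library alone. The sketch computes only the paired t statistic, to be compared with the t > 2.0 threshold quoted above; obtaining P(t) or running the Shapiro-Wilk test would need a statistics package and is not shown. The ACA values are invented for illustration, and the trimming rule (dropping the pairs with the largest and smallest difference) is an assumption about how the extreme values were removed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(values_t, values_m):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [t - m for t, m in zip(values_t, values_m)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def trimmed_paired_t(values_t, values_m):
    """Repeat the test after dropping the pairs with the extreme differences."""
    diffs = sorted(t - m for t, m in zip(values_t, values_m))
    kept = diffs[1:-1]  # remove the lowest and the highest difference (assumption)
    return mean(kept) / (stdev(kept) / sqrt(len(kept)))

# Hypothetical ACA values for four T/M pairs (not data from the paper).
aca_t = [10.0, 12.0, 14.0, 16.0]
aca_m = [9.0, 10.0, 13.0, 13.0]
t_all = paired_t(aca_t, aca_m)
t_trimmed = trimmed_paired_t(aca_t, aca_m)
reject_null = t_all > 2.0 and t_trimmed > 2.0
```

Requiring the trimmed statistic to exceed the threshold as well mirrors the check that the result does not depend on the extreme values of the distribution.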

Authors' contributions

AP conceived the study, interpreted the data and wrote the final manuscript. AP and RS both contributed source code. RS collected the structures, the datasets and implemented most of the various computational analyses. SP supervised the study and helped draft the manuscript. FB coordinated the study and helped to draft the manuscript. All authors read and approved the final manuscript.

Additional file 1

Paired T-Test analysis, datasets and distribution of data. This file includes the datasets A and B, described in this paper, and the statistical analysis of the distribution of ACA.