Todd J Taylor, Iosif I Vaisman.
Abstract
BACKGROUND: There is a considerable literature on the source of the thermostability of proteins from thermophilic organisms. Understanding the mechanisms for this thermostability would provide insights into proteins generally and permit the design of synthetic hyperstable biocatalysts.
Year: 2010 PMID: 20487512 PMCID: PMC2873828 DOI: 10.1186/1472-6807-10-S1-S5
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. The CE structural alignment of 1aisA, a TATA-box-binding protein from the extreme thermophile Pyrococcus woesei, in red, and 1ytbA, a TATA-box-binding protein from the mesophile Saccharomyces cerevisiae, in blue. The rmsd is 2 Å and the sequence identity is 40%. Clearly these are very similar structures.
Figure 2. The all-atom van der Waals space-filling representation (left) of phosphoglycerate kinase (PDB code 16pk), the Delaunay tessellation of 16pk with no simplex edge length cutoff (middle), and a view of the tessellation with a 10 Å cutoff (right). Notice that the surface of the tessellation with a cutoff corresponds more closely to that of the real molecule.
t-tests of sequence percent compositions of mesophilic, thermophilic, and hyperthermophilic proteins.
| Statistic | Ala | Cys | Asp | Glu | Phe | Gly | His | Ile | Lys | Leu |
|---|---|---|---|---|---|---|---|---|---|---|
| mean_comp_meso | 8.26 | 1.92 | 5.80 | 6.53 | 3.85 | 7.29 | 2.29 | 5.48 | 6.22 | 8.86 |
| sd_comp_meso | 4.31 | 3.05 | 2.22 | 3.02 | 1.89 | 3.08 | 1.53 | 2.52 | 3.38 | 3.52 |
| mean_comp_therm | 10.05 | 0.78 | 5.13 | 8.32 | 3.62 | 8.30 | 2.15 | 4.80 | 4.77 | 10.26 |
| sd_comp_therm | 3.48 | 1.23 | 2.29 | 2.87 | 1.45 | 2.12 | 1.18 | 3.04 | 2.29 | 3.87 |
| t_therm_wrt_meso | 6.03 | -9.08 | -3.57 | 7.49 | -1.83 | 5.42 | -1.35 | -2.72 | -7.21 | 4.40 |
| mean_comp_hyper | 7.26 | 0.78 | 5.20 | 10.05 | 4.07 | 7.17 | 1.64 | 7.73 | 8.50 | 9.10 |
| sd_comp_hyper | 2.95 | 1.05 | 1.76 | 2.51 | 1.64 | 2.23 | 1.05 | 2.44 | 2.60 | 2.39 |
| t_hyper_wrt_meso | -4.09 | -10.19 | -4.17 | 17.47 | 1.72 | -0.65 | -7.37 | 11.72 | 10.74 | 1.21 |
| t_therm_wrt_hyp | 7.99 | 0.00 | -0.32 | -5.93 | -2.71 | 4.83 | 4.22 | -9.80 | -14.19 | 3.30 |

| Statistic | Met | Asn | Pro | Gln | Arg | Ser | Thr | Val | Trp | Tyr |
|---|---|---|---|---|---|---|---|---|---|---|
| mean_comp_meso | 2.17 | 4.51 | 4.52 | 4.04 | 4.79 | 6.07 | 5.62 | 6.90 | 1.43 | 3.44 |
| sd_comp_meso | 1.58 | 2.30 | 2.91 | 2.07 | 2.62 | 2.61 | 2.51 | 2.62 | 1.25 | 1.95 |
| mean_comp_therm | 1.87 | 3.19 | 5.43 | 2.69 | 6.82 | 4.07 | 4.74 | 8.42 | 1.31 | 3.30 |
| sd_comp_therm | 1.11 | 2.29 | 2.05 | 1.59 | 2.81 | 2.13 | 2.34 | 2.36 | 1.16 | 1.71 |
| t_therm_wrt_meso | -3.14 | -6.94 | 5.10 | -9.87 | 8.77 | -11.07 | -4.52 | 7.65 | -1.20 | -1.00 |
| mean_comp_hyper | 2.11 | 3.49 | 4.02 | 1.80 | 5.56 | 4.51 | 4.00 | 8.44 | 1.02 | 3.56 |
| sd_comp_hyper | 1.14 | 1.62 | 1.57 | 1.16 | 2.27 | 1.72 | 1.48 | 2.28 | 1.07 | 1.71 |
| t_hyper_wrt_meso | -0.59 | -7.65 | -3.60 | -22.12 | 4.23 | -10.82 | -12.71 | 8.45 | -4.78 | 0.87 |
| t_therm_wrt_hyp | -1.98 | -1.39 | 7.11 | 5.88 | 4.55 | -2.10 | 3.46 | -0.08 | 2.41 | -1.41 |
Amino acids significantly over-represented in hyperthermophiles with respect to mesophiles are Glu, Ile, Lys, Arg, and Val; those significantly under-represented are Ala, Cys, Asp, His, Asn, Pro, Gln, Ser, Thr, and Trp. Amino acids significantly over-represented in thermophiles with respect to mesophiles are Ala, Glu, Gly, Leu, Pro, Arg, and Val; those significantly under-represented are Cys, Asp, Lys, Met, Asn, Gln, Ser, and Thr. Amino acids significantly over-represented in hyperthermophiles with respect to thermophiles are Glu, Ile, and Lys; those significantly under-represented are Ala, Gly, His, Leu, Pro, Gln, Arg, and Thr. These statistics were tabulated from nonredundant sets of 184 hyperthermophilic structures (45419 residues), 162 thermophilic structures (41470 residues), and 1262 mesophilic structures (269799 residues).
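As a rough illustration of how a t value can be obtained from the tabulated means and standard deviations, the sketch below computes a two-sample t statistic from summary statistics. The source does not say which t-test variant was used; Welch's unequal-variance form is an assumption here, as is the use of the number of structures (162 thermophilic, 1262 mesophilic) as the sample sizes. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for (mean_a - mean_b) without assuming equal variances.

    Assumption: Welch's form; the source does not name the variant used.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# First-column summary values from the table (the Ala composition):
# thermophiles (mean 10.05, sd 3.48, 162 structures) versus
# mesophiles (mean 8.26, sd 4.31, 1262 structures).
t_ala = welch_t(10.05, 3.48, 162, 8.26, 4.31, 1262)
print(round(t_ala, 2))
```

Under these assumptions the statistic comes out at about 5.98, close to but not identical to the tabulated 6.03; the gap may reflect rounding of the published means and standard deviations or a different test variant.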
Statistical significance of fractions in Table 1.
| Fraction | P for thermophile pairs | P for hyperthermophile pairs |
|---|---|---|
| 0.50 | 1.000 | 1.000 |
| 0.55 | 0.297 | 0.319 |
| 0.60 | 0.0328 | 0.0369 |
| 0.65 | 6.85 × 10⁻⁴ | 0.00143 |
| 0.70 | 6.97 × 10⁻⁶ | 1.65 × 10⁻⁵ |
| 0.75 | 1.94 × 10⁻⁸ | 4.99 × 10⁻⁸ |
| 0.80 | 3.12 × 10⁻¹² | 8.34 × 10⁻¹² |
| 0.85 | 2.66 × 10⁻¹⁶ | 6.78 × 10⁻¹⁶ |
| 0.90 | <2.20 × 10⁻¹⁶ | <2.20 × 10⁻¹⁶ |
| 0.95 | <2.20 × 10⁻¹⁶ | <2.20 × 10⁻¹⁶ |
Statistical significance of the fractions in Table 1 given by a two-sided binomial test. Fractions below roughly 0.60 to 0.65 should not be considered significant.
Discriminatory power of structure and sequence derived quantities
| Numerical index | Thermophile (127 pairs) | Hyperthermophile (122 pairs) |
|---|---|---|
| coordination number (no cutoff) | 0.559 | 0.689 |
| clustering coefficient (no cutoff) | 0.551 | 0.672 |
| characteristic path (no cutoff) | 0.520 | 0.631 |
| total count 400 over-rep quads/residue | 0.850 | 0.943 |
| 4-body potential/residue (20Å cutoff) | 0.858 | 0.844 |
| 4-body potential/residue (no cutoff) | 0.843 | 0.852 |
| 4-body potential/residue (hyper only, no cutoff) | ----- | 0.820 |
| 4-body potential/residue (meso only, no cutoff) | 0.732 | 0.803 |
| 4-body potential/residue (thermo only, no cutoff) | 0.866 | ----- |
| ProsaII combined score | 0.554 | 0.693 |
| median circumsphere radius (no cutoff) | 0.701 | 0.639 |
| mean tetrahedrality (no cutoff) | 0.598 | 0.574 |
| number simplices/residue (10Å cutoff) | 0.528 | 0.557 |
| number simplices/residue (no cutoff) | 0.567 | 0.697 |
| Naccess solvent accessible area | 0.567 | 0.598 |
| Delaunay surface area (no cutoff) | 0.606 | 0.669 |
| van der Waals area | 0.559 | 0.549 |
| Delaunay volume (no cutoff) | 0.598 | 0.701 |
| van der Waals volume | 0.528 | 0.598 |
| Delaunay area/volume (10Å cutoff) | 0.583 | 0.549 |
| Delaunay area/volume (no cutoff) | 0.669 | 0.803 |
| van der Waals area/volume | 0.512 | 0.557 |
| packing density | 0.543 | 0.549 |
| van der Waals volume/Delaunay volume | 0.685 | 0.779 |
| mean B-factor | 0.661 | 0.533 |
| secondary structure content (H+E 3 state DSSP) | 0.614 | 0.689 |
| number of residues | 0.528 | 0.672 |
| total Kyte-Doolittle hydrophobicity | 0.575 | 0.549 |
| sd Kyte-Doolittle hydrophobicity | 0.677 | 0.836 |
| CvP bias | 0.803 | 0.918 |
| (E+K)/(Q+H) | 0.591 | 0.861 |
| IVYWREL | 0.827 | 0.926 |
Discriminatory power of sequence- and structure-based indices: the fraction of thermophile/mesophile pairs for which the quantity was systematically higher or lower by any amount. The contact network quantities are described in the introduction. The four-body threading contact potentials are described in [1]. The cutoff indicates that simplices with at least one edge longer than the cutoff were omitted when frequencies were tallied during the calculation of the potential. "Hyper only" indicates that the potential was trained only on chains from hyperthermophilic organisms. The Delaunay simplex geometry indices are discussed in the introduction. The volume and surface area criteria are fairly self-explanatory except, perhaps, for packing density, which is defined here as the ratio of the van der Waals volume of the protein to the all-atom Voronoi volume. The sequence composition based indices CvP, (E+K)/(Q+H), and IVYWREL are described in the introduction.
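The fraction described in this caption can be sketched as follows. The sketch assumes that the reported value is the larger of the "higher" and "lower" fractions (so it is never below 0.5 when there are no ties) and that ties count toward neither direction; the source states neither detail. The function name and the toy values are hypothetical.

```python
def discriminatory_fraction(pairs):
    """Fraction of (thermostable, mesophile) pairs that differ in the
    majority direction.

    Assumptions: ties count toward neither direction, and the larger of
    the two directional fractions is reported.
    """
    higher = sum(1 for thermo, meso in pairs if thermo > meso)
    lower = sum(1 for thermo, meso in pairs if thermo < meso)
    return max(higher, lower) / len(pairs)


# Hypothetical index values for four thermophile/mesophile pairs.
toy_pairs = [(2.0, 1.0), (3.0, 1.0), (1.0, 2.0), (4.0, 1.0)]
print(discriminatory_fraction(toy_pairs))  # 3 of 4 pairs are higher: 0.75
```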
Highly over-represented and highly under-represented residue quadruplets at the vertices of tessellated thermostable proteins and the factors by which they differ with respect to mesophiles.
| Hyperthermophiles: over-represented | Factor | Hyperthermophiles: under-represented | Factor | Thermophiles: over-represented | Factor | Thermophiles: under-represented | Factor |
|---|---|---|---|---|---|---|---|
| EEEE | 7.473 | PQRT | 0.303 | EEER | 6.008 | AIQS | 0.380 |
| EEEK | 7.332 | GGNQ | 0.302 | ELWW | 5.520 | DLQS | 0.378 |
| EEER | 7.048 | FLLQ | 0.301 | RRRV | 5.222 | KLST | 0.377 |
| MRRR | 6.490 | AQVY | 0.301 | AEER | 5.106 | ILNQ | 0.373 |
| EEKR | 5.654 | AAAN | 0.299 | EEPR | 5.087 | ENQS | 0.370 |
| IRRR | 5.605 | AAEQ | 0.298 | EERR | 4.801 | KKLS | 0.370 |
| EEEF | 5.597 | DSTT | 0.296 | AERR | 4.538 | KNQR | 0.368 |
| EEKK | 5.282 | ADDT | 0.296 | RRRY | 4.508 | FKQS | 0.368 |
| EEEV | 4.936 | AENQ | 0.296 | EEEE | 4.508 | LNQS | 0.367 |
| EEIK | 4.889 | NSTT | 0.295 | ERRR | 4.458 | EFNS | 0.366 |
| EIKK | 4.881 | ALPQ | 0.292 | IRRR | 4.186 | DILQ | 0.365 |
| EEIV | 4.346 | ADQR | 0.292 | EEGP | 4.177 | LMSY | 0.364 |
| EIKR | 4.332 | ANRT | 0.292 | EEGR | 4.113 | KLNS | 0.362 |
| EKRR | 4.256 | DNQV | 0.292 | ERRV | 4.104 | DDQS | 0.361 |
| IKRR | 4.228 | FGLQ | 0.291 | EGPR | 4.092 | LNSS | 0.361 |
| EEKV | 4.225 | AQST | 0.291 | EERW | 4.015 | EHIK | 0.360 |
| EEIR | 4.169 | ANTY | 0.290 | GPRR | 3.963 | KKST | 0.358 |
| EEEN | 4.107 | INSY | 0.290 | EEPP | 3.951 | GKSS | 0.356 |
| EEEY | 4.092 | DNQT | 0.289 | EERV | 3.904 | CGLL | 0.354 |
| KRRR | 4.085 | ANQV | 0.289 | AELR | 3.792 | LNQT | 0.353 |
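One plausible way to obtain such factors is sketched below: each quadruplet is reduced to an order-independent key (its residues sorted alphabetically, as in the table entries), and the factor is taken as the ratio of its relative frequency in the thermostable set to its relative frequency in the mesophile set. The source does not give the exact normalization, so this ratio is an assumption, and the function names and toy quadruplets are hypothetical.

```python
from collections import Counter


def quad_key(residues):
    """Order-independent key for a residue quadruplet, e.g. 'KEEE' -> 'EEEK'."""
    return "".join(sorted(residues))


def representation_factors(thermo_quads, meso_quads):
    """Ratio of relative quadruplet frequencies, thermostable over mesophile.

    Assumption: plain frequency ratio; only quadruplets seen in both sets
    are reported, to avoid division by zero.
    """
    thermo = Counter(quad_key(q) for q in thermo_quads)
    meso = Counter(quad_key(q) for q in meso_quads)
    thermo_total = sum(thermo.values())
    meso_total = sum(meso.values())
    return {
        key: (thermo[key] / thermo_total) / (meso[key] / meso_total)
        for key in thermo
        if key in meso
    }


# Hypothetical quadruplets, standing in for simplex vertices of tessellated chains.
toy_thermo = ["EKEE", "EEEK", "AAAN", "KEEE"]
toy_meso = ["EEEK", "AAAN", "AAAN", "NAAA"]
factors = representation_factors(toy_thermo, toy_meso)
print(factors)
```

A factor above 1 marks a quadruplet as over-represented in the thermostable set and a factor below 1 as under-represented.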