Yunqi Li, C Russell Middaugh, Jianwen Fang.
Abstract
BACKGROUND: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.
Year: 2010 PMID: 20109199 PMCID: PMC3098108 DOI: 10.1186/1471-2105-11-62
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The organisms whose proteins were used to generate the non-redundant orthologous pairs: hyperthermophilic (upper rows) and mesophilic (lower rows) (adapted from [48]).
| Organism | Number of proteins | OGT (°C) |
|---|---|---|
| Aquifex aeolicus VF5 | 1560 | 96 |
| Methanocaldococcus jannaschii DSM | 1786 | 85 |
| Thermotoga maritima MSB8 | 1858 | 80 |
| Pyrococcus abyssi GE5 | 1898 | 103 |
| Corynebacterium glutamicum ATCC | 2993 | 30 - 40 |
| Escherichia coli K12 | 4237 | 37 |
| Mycobacterium tuberculosis H37Rv | 3991 | 37 |
| Bacillus halodurans C-125 | 4066 | 25 - 35 |
| Streptococcus pneumoniae TIGR4 | 2094 | 30 - 35 |
Two wild-type adenylate kinases (ADKs) and a series of chimeric enzymes generated from them [38].
| Seq_ID | Comm_meso | Comm_hyp | Tm (°C) | Ranking |
|---|---|---|---|---|
| MJA | 0 | 62 | 103 | 8 |
| V36J | 9 | 53 | 98 | 7 |
| J160V | 9 | 53 | 96 | 6 |
| JVJ | 37 | 25 | 89 | 4 |
| VJV | 20 | 42 | 82.5 | 5 |
| V160J | 51 | 11 | 74 | 2 |
| J36V | 53 | 9 | 73 | 3 |
| MVO | 62 | 0 | 69 | 1 |
Comm_meso and Comm_hyp are the numbers of residues that each sequence shares with the MP sequence MVO and the HP sequence MJA, respectively. The last column gives the relative stability rank assigned by our scoring function (1 = least stable, 8 = most stable).
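One way to summarize how well the predicted ranking tracks the measured Tm values in this table is a Spearman rank correlation. The sketch below is illustrative only and is not taken from the paper: the Tm values and ranks are copied from the table, while the helper names `ranks` and `spearman` are hypothetical, and the simple formula used assumes there are no tied values (true for these eight enzymes).

```python
# (Tm in degrees C, rank from the scoring function), copied from the ADK table.
adk = {
    "MJA": (103.0, 8),
    "V36J": (98.0, 7),
    "J160V": (96.0, 6),
    "JVJ": (89.0, 4),
    "VJV": (82.5, 5),
    "V160J": (74.0, 2),
    "J36V": (73.0, 3),
    "MVO": (69.0, 1),
}


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


tm = [v[0] for v in adk.values()]
predicted = [v[1] for v in adk.values()]
rho = spearman(tm, predicted)
print(round(rho, 3))  # 0.952
```

Only two adjacent pairs (JVJ/VJV and V160J/J36V) are swapped relative to the Tm order, which is why the coefficient stays close to 1.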
The ranking of relative thermostability of wild type proteins and their mutated sequences using the scoring function.
| Protein name | Length | Tm (°C) | Ranking |
|---|---|---|---|
| Dmeh (GI: 640374) | 51 | 49 | 1 |
| Dmeh_UMC | 51 | 99 | 2 |
| Dmeh_UVF | 51 | 99 | 3 |
| BsCSP (GI: 16077975) | 67 | 53.8 | 1 |
| BsCSP_mt1 | 67 | 69.7 | 2 |
| BsCSP_mt2 | 67 | 83.7 | 3 |
| PhyA (GI: 464382) | 467 | 55 | 1 |
| PhyA_mt18 | 467 | 62 | 2 |
| PhyA_mt24 | 467 | 62+ | 3 |
| PTDH (GI: 194552172) | 336 | 39 | 1 |
| PTDH_12x | 336 | 59.7 | 2 |
| PTDH_opt14 | 336 | 64.4 | 3 |
| CbADH (GI: 187935035) | 351 | 64.5 | 1 |
| CbADH_Q100P | 351 | 76 | 2 |
| β-GUS (GI: 868020) | 602 | 45 | 1 |
| β-GUS_TR3337 | 602 | 65 | 2 |
| FAOX (GI: 20302586) | 372 | 37 | 1 |
| FAOX_TE | 372 | 45 | 2 |
| Shble (GI: 3891709) | 121 | 67.4 | 1 |
| Shble_HTS | 121 | 85.1 | 2 |
| EcHPH (GI: 12539) | 341 | 51 | 1 |
| EcHPH_hph5 | 341 | 67 | 2 |
| PDAO (GI: 129305) | 347 | 45 | 2 |
| PDAO_F42C | 347 | 55 | 1 |
The data were originally collected by Montanucci et al. [20]. The sequence of CbADH was retrieved from the original report by Goihberg et al. [49].
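The table can be condensed into a single count of how often the scoring function places a variant above its wild type. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the ranks are copied from the table, every listed variant has a higher reported Tm than its wild type (so a correct prediction ranks the variant higher), and the function name `count_correct` is hypothetical.

```python
# family -> (wild-type rank, [variant ranks]), copied from the table.
# Every variant in the table has a higher reported Tm than its wild type,
# so a prediction counts as correct when the variant outranks the wild type.
rankings = {
    "Dmeh": (1, [2, 3]),
    "BsCSP": (1, [2, 3]),
    "PhyA": (1, [2, 3]),
    "PTDH": (1, [2, 3]),
    "CbADH": (1, [2]),
    "beta-GUS": (1, [2]),
    "FAOX": (1, [2]),
    "Shble": (1, [2]),
    "EcHPH": (1, [2]),
    "PDAO": (2, [1]),
}


def count_correct(table):
    """Count variant-vs-wild-type pairs in which the variant is ranked higher."""
    correct = 0
    total = 0
    misses = []
    for family, (wt_rank, variant_ranks) in table.items():
        for rank in variant_ranks:
            total += 1
            if rank > wt_rank:
                correct += 1
            else:
                misses.append(family)
    return correct, total, misses


correct, total, misses = count_correct(rankings)
print(f"{correct}/{total} pairs ranked correctly; misses: {misses}")
```

With the ranks as listed, 13 of the 14 variant-versus-wild-type pairs are ordered correctly; the only miss is PDAO, where the F42C variant is ranked below the wild type.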
The list of the 83 features used in the study.
| Protein feature | Number of Features | Source |
|---|---|---|
| Sequence length (L) | 1 | In-house script |
| Count and composition of amino acids | 40 | In-house script |
| Number and percentage of positive, negative and all charged residues, as well as the net charges | 8 | In-house script |
| Number and percentage of small (T and D), tiny (G, A, S and P), aromatic (F, H, Y, W), aliphatic, hydrophobic and polar residues | 12 | In-house script |
| Number and percentage of residues whose side chains can form hydrogen bonds | 2 | In-house script |
| Number of sulfur atoms | 1 | In-house script |
| Average solubility of amino acids in aqueous solution at room temperature | 1 | ** |
| The average of the maximum solvent accessible surface area (ASA) of each amino acid | 1 | Eisenhaber |
| Predicted isoelectric point (pI) of the protein, the average pI on all residues (pIa) | 2 | ProtParam |
| Instability index and instability class | 2 | ProtParam |
| Aliphatic index | 1 | ProtParam |
| GRAVY (grand average of hydropathy) index | 1 | ProtParam |
| Composition of the predicted secondary structure residues | 3 | Psipred |
| Predicted percentages of buried/exposed residues | 2 | Accpro |
| The overall length and percentage of all coils, rem465, and hotloop | 6 | disEMBL |
**Obtained from The Merck Index, 12th ed., Merck & Co., Inc., Whitehouse Station, NJ (1996).
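The first two rows of the feature table (sequence length, plus the count and composition of each of the 20 amino acids) are simple to compute from sequence alone. The sketch below is a hypothetical stand-in for the in-house script, which is not shown in the source. The function name `composition_features` and the toy sequence are assumptions; the `c_` and `x_` prefixes follow the convention described in the Figure 3 caption (absolute count and normalized value, respectively).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_features(seq):
    """Return length plus per-residue counts (c_) and fractions (x_)."""
    seq = seq.upper()
    counts = Counter(seq)
    length = len(seq)
    feats = {"L": length}
    for aa in AMINO_ACIDS:
        feats[f"c_{aa}"] = counts.get(aa, 0)
        # Guard against an empty sequence to avoid division by zero.
        feats[f"x_{aa}"] = counts.get(aa, 0) / length if length else 0.0
    return feats


# Toy sequence for illustration only; it is not a protein from the study.
feats = composition_features("MKKLEELLKK")
print(feats["L"], feats["c_K"], feats["x_K"])
```

This yields 1 + 40 = 41 values per sequence, matching the feature counts given for those two rows; the remaining features in the table depend on external predictors and are not sketched here.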
Comparison of the amino acid composition of hyperthermophilic (HP) and mesophilic (MP) proteins, with p-values from the t-test and the paired t-test.
| Amino acid | Composition in HP | Composition in MP | p-value (t-test) | p-value (paired t-test) |
|---|---|---|---|---|
| | 0.044 ± 0.016 | 0.050 ± 0.015 | 9.60×10⁻⁹ | 5.61×10⁻¹² |
| | 0.019 ± 0.011 | 0.037 ± 0.015 | 6.81×10⁻⁸⁵ | 1.24×10⁻⁹⁴ |
| | 0.035 ± 0.014 | 0.035 ± 0.015 | 0.88 | 0.85 |
| | 0.042 ± 0.014 | 0.055 ± 0.016 | 1.02×10⁻⁴⁰ | 2.44×10⁻⁵⁶ |
| | 0.009 ± 0.011 | 0.010 ± 0.011 | 0.36 | 0.08 |
| | 0.075 ± 0.019 | 0.079 ± 0.020 | 9.14×10⁻⁴ | 9.68×10⁻¹⁰ |
| | 0.066 ± 0.023 | 0.080 ± 0.028 | 3.41×10⁻⁴⁸ | 1.08×10⁻⁸⁷ |
| | 0.017 ± 0.010 | 0.024 ± 0.013 | 3.00×10⁻²⁰ | 4.64×10⁻⁴⁰ |
| | 0.024 ± 0.011 | 0.026 ± 0.010 | 0.02 | 3.00×10⁻³ |
| | 0.033 ± 0.014 | 0.027 ± 0.013 | 6.10×10⁻¹⁵ | 4.47×10⁻³¹ |
| | 0.038 ± 0.015 | 0.033 ± 0.014 | 3.00×10⁻⁸ | 1.29×10⁻¹⁴ |
| | 0.086 ± 0.021 | 0.082 ± 0.020 | 2.32×10⁻⁴ | 5.36×10⁻⁷ |
| | 0.089 ± 0.021 | 0.089 ± 0.022 | 0.73 | 0.59 |
| | 0.041 ± 0.015 | 0.040 ± 0.014 | 0.39 | 0.16 |
| | 0.077 ± 0.020 | 0.066 ± 0.019 | 6.46×10⁻²⁰ | 3.15×10⁻²⁹ |
| | 0.008 ± 0.007 | 0.007 ± 0.007 | 0.05 | 3.00×10⁻³ |
| | 0.050 ± 0.015 | 0.057 ± 0.016 | 1.92×10⁻¹² | 5.64×10⁻²² |
| | 0.097 ± 0.023 | 0.079 ± 0.022 | 6.73×10⁻³⁸ | 5.63×10⁻⁷⁵ |
| | 0.091 ± 0.023 | 0.060 ± 0.023 | 1.21×10⁻⁸⁷ | 5.25×10⁻¹¹⁷ |
| | 0.056 ± 0.023 | 0.055 ± 0.023 | 0.57 | 0.36 |
Figure 1. Pairwise comparisons of amino acid compositions in the three different sets of proteins. The solid lines show the best-fit linear regression lines, with the regression coefficient and slope displayed, and the dashed lines show the orthogonal line. Bac_M, arc_H and bac_H are proteins from mesophilic bacteria, hyperthermophilic archaea and hyperthermophilic bacteria, respectively.
Figure 2. Amino acid substitutions between mesophilic and hyperthermophilic proteins. The top number in each cell is the number of observed substitutions, and the bottom number (in italics) is the ratio of that count to the count of the opposite substitution. Significantly biased substitutions (p-value < 10⁻¹⁰, two-sided Fisher's exact test) are highlighted in bold. Red cells mark significantly HP-favored substitutions; blue cells mark MP-favored ones.
Figure 3. The 25 most important features ranked by the Gini importance of the random forest algorithm. The prefixes c_ and x_ of each feature indicate that the feature is an absolute count or a normalized value, respectively.
Figure 4. The cumulative curves of the 10 most important features against the relative difference between hyperthermophilic and mesophilic sequences.
The final weights of the ten features used in the scoring function.
| Feature | ASA | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.75 | 0.20 | 0.80 | 0.20 | 0.90 | -0.20 | -0.20 | -0.30 | -0.10 | -0.20 |