Ludovica Montanucci, Piero Fariselli, Pier Luigi Martelli, Rita Casadio.
Abstract
MOTIVATION: A basic question in protein science is to what extent mutations affect protein thermostability. This knowledge would be particularly relevant for engineering thermostable enzymes. Several experimental approaches have addressed this issue, often serendipitously. It would therefore be convenient to have a computational method that predicts when a given protein mutant is more thermostable than its corresponding wild-type.
Year: 2008 PMID: 18586713 PMCID: PMC2718644 DOI: 10.1093/bioinformatics/btn166
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Table 1. Experimental dataset description
| Protein name | Length | Wild-type temp. (°C) | Mutant name | Mutant temp. (°C) | Mutated residues |
|---|---|---|---|---|---|
| Shble | 124 | | HTS | | G18E,D32V,L63Q,G98V |
| Dmeh | 54 | | UVF | | 39 mutations |
| | | | UMC | | 40 mutations |
| β-GUS | 603 | 45 | TR3337 | 65 | Q493R,T509A,M532T,N550S,G559S,N566S |
| BsCSP | 67 | | mt1 | | A46K,S48R |
| | | | mt2 | | M1R,E3K,K65I |
| EcHPH | 341 | 51 | hph5 | 67 | D20G,A118V,S225P,Q226L,T246A |
| PTDH | 355 | 39 | 12x | 59.7 | V71I,E130K,Q132R,Q137R,I150F,Q215L,R275Q,L276Q,I313L,V315A,A319E,A325V |
| | | | opt14 | 64.4 | V71I,E130K,Q132K,Q137H,I150F,Q215L,R275L,L276C,I313L,V315A,A319E,A325V,A146S,F198M |
| CbADH | 452 | | Q100P | Δ | Q100P |
| FAOX | 372 | 37 | FAOX_TE | 45 | T60A,A188G,M244L,N257S,L261M |
| pDAO | 347 | 45 | F42C | 55 | F42C |
| PhyA | 467 | 55 | mt18 | Δ | A58E,P65S,Q191R,T271R |
| | | | mt24 | Δ | A58E,P65S,Q191R,T271R,E228K,S149P,F131L |
The first two columns report the wild-type protein name and the protein length; the fourth column gives the name of the mutant. Columns 3 and 5 report the optimal functional temperatures of the wild-type and the mutated sequence, respectively; Tm, when present, refers to the melting temperature. Column 6 lists the mutated residues. In the case of Dmeh, the two mutants have 39 and 40 mutated residues, respectively (Shah et al., 2007). The proteins considered are: Shble: bleomycin-binding protein from the mesophilic bacterium Streptoalloteichus hindustanus (Brouns et al., 2005); Dmeh: Drosophila melanogaster engrailed homeodomain (Shah et al., 2007); β-GUS: β-glucuronidase (Xiong et al., 2007); BsCSP: cold shock protein from Bacillus subtilis (Max et al., 2007); EcHPH: Escherichia coli hygromycin B phosphotransferase (Nakamura et al., 2005); PTDH: phosphite dehydrogenase from Pseudomonas stutzeri (Johannes et al., 2005; McLachlan et al., 2007); CbADH: Clostridium beijerinckii alcohol dehydrogenase (Goihberg et al., 2007); FAOX: fructosyl-amino acid oxidase from Corynebacterium sp. (Sakaue and Kajiyama, 2003); pDAO: porcine kidney D-amino acid oxidase (Bakke et al., 2006); PhyA: 3-phytase A from Aspergillus niger (Zhang and Lei, 2007).
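Each entry in the "Mutated residues" column follows the usual notation of wild-type residue, 1-based sequence position and substituted residue (e.g. G18E). As an illustrative sketch, not code from the paper, such a list can be parsed and applied to a wild-type sequence to build the mutant; the function names and the toy sequence are hypothetical.

```python
import re

# One substitution such as "G18E": wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse a comma-separated list such as 'A46K,S48R' into (wt, pos, mut) tuples."""
    parsed = []
    for token in spec.split(","):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        wt, pos, mut = match.groups()
        parsed.append((wt, int(pos), mut))
    return parsed


def apply_mutations(wild_type: str, spec: str) -> str:
    """Return the mutant sequence, checking that each wild-type residue matches."""
    residues = list(wild_type)
    for wt, pos, mut in parse_mutations(spec):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)


# Toy example (not a real protein): A2G and S4T applied to "MAKSA".
print(apply_mutations("MAKSA", "A2G,S4T"))
```

Checking the wild-type residue before substituting catches numbering mismatches, for instance when a sequence from a database starts at a different residue than the one used in the original publication.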
Performance of the two SVMs and of the combined SVM method
| Method | Accuracy (%) | Correlation | Sensitivity + (%) | Sensitivity − (%) | Specificity + (%) | Specificity − (%) |
|---|---|---|---|---|---|---|
| L20 | 86 | 0.73 | 87 | 86 | 86 | 87 |
| L400 | 85 | 0.70 | 85 | 85 | 85 | 85 |
| Combined | 88 | 0.75 | 88 | 88 | 88 | 88 |
Performance is evaluated with the leave-one-cluster-out procedure on the training dataset. The symbols + and − denote the classes of increased and decreased thermostability, respectively.
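The scores in this table can be computed from a two-class confusion matrix. The sketch below assumes that "Correlation" is the Matthews correlation coefficient and that the per-class "specificity" is the fraction of predictions for that class that are correct (TP/(TP+FP)), a convention common in this literature; neither definition is stated in this section. The counts in the example are hypothetical, not taken from the paper.

```python
import math


def binary_scores(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Scores for '+' (increased) vs '-' (decreased) thermostability.

    tp/fn: '+' examples predicted as '+'/'-'; tn/fp: '-' examples predicted as '-'/'+'.
    Assumes every class and every prediction class is non-empty.
    """
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        # Matthews correlation coefficient (assumed meaning of "Correlation").
        "correlation": (tp * tn - fp * fn) / denom if denom else 0.0,
        # Sensitivity of a class: fraction of its examples that are recovered.
        "sensitivity+": tp / (tp + fn),
        "sensitivity-": tn / (tn + fp),
        # Specificity of a class taken as precision (assumed convention).
        "specificity+": tp / (tp + fp),
        "specificity-": tn / (tn + fn),
    }


# Hypothetical balanced set of 200 pairs.
for name, value in binary_scores(tp=87, fn=13, tn=86, fp=14).items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the output lies close to the L20 row (accuracy 0.865, correlation about 0.730); on a balanced set the two specificity conventions give nearly the same numbers, which is why the table alone cannot distinguish them.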
Fig. 1. ROC curves of the three predictors. Solid gray line: L20 SVM predictor; dotted black line: L400 SVM predictor; solid black line: combined predictor.
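The caption does not say how the curves were generated. A common approach, assumed here, is to rank the pairs by each predictor's real-valued output (for an SVM, its decision value) and sweep a threshold from high to low, recording the false- and true-positive rates; the area under the resulting curve follows from the trapezoidal rule. The function names are illustrative.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) for a threshold swept from high to low scores.

    labels: 1 for the '+' class (increased thermostability), 0 otherwise.
    Assumes both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(pts, auc(pts))
```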
Fig. 2. Accuracy of the combined SVM method as a function of the sequence identity between the two proteins in each pair, grouped into identity bins. Bars indicate the frequency of training-set pairs in each identity bin.
Fig. 3. Accuracy of the combined SVM method as a function of protein length. For each pair, the length of the longer protein is used. Bars indicate the frequency of training-set pairs with a given protein length.
Fig. 4. Accuracy of the combined SVM method as a function of the reliability index. Bars represent the fraction of the database with a given reliability index value.
SVM performance on the experimental dataset
| Protein | Mutant | ΔT (°C) | N.muts | L20 | L400 | Combined |
|---|---|---|---|---|---|---|
| Dmeh | UVF | 50 | 39 | Yes | Yes | Yes |
| Dmeh | UMC | 50 | 40 | Yes | Yes | Yes |
| BsCSP | mt2 | 29.9 | 3 | Yes | Yes | Yes |
| PTDH | opt14 | 25.4 | 14 | Yes | Yes | Yes |
| PTDH | 12x | 20.7 | 12 | Yes | Yes | Yes |
| β-GUS | TR3337 | 20 | 6 | Yes | Yes | |
| Shble | HTS | 17.7 | 4 | Yes | Yes | Yes |
| EcHPH | hph5 | 16 | 5 | Yes | Yes | Yes |
| BsCSP | mt1 | 15.9 | 2 | Yes | Yes | Yes |
| CbADH | Q100P | 11.5 | 1 | Yes | Yes | |
| pDAO | F42C | 10 | 1 | Yes | Yes | |
| FAOX | TE | 8 | 5 | Yes | No | No |
| PhyA | mt24 | >7 | 4 | Yes | Yes | Yes |
| PhyA | mt18 | 7 | 7 | Yes | No | No |
| Accuracy for all the mutations (%) | | | | 12/14 (86) | 11/14 (79) | 12/14 (86) |
| Accuracy for the subset with ΔT ≥ 10 °C (%) | | | | 9/11 (82) | 10/11 (91) | 11/11 (100) |
Protein is the short name of the wild-type protein (refer to Table 1 for details); Mutant is the name of the mutated sequence; ΔT is the experimentally measured increase in the optimal (or melting) temperature, in °C; N.muts is the number of mutations. The last three columns report whether the prediction of each method is correct (Yes) or incorrect (No).
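The two accuracy rows can be checked from per-mutant outcomes. The sketch below covers the Combined column only and rests on two assumptions: the subset consists of the 11 mutants with ΔT ≥ 10 °C, and the combined predictor is wrong only on FAOX TE and PhyA mt18, which is what the 12/14 and 11/11 figures imply given that mt24 is predicted correctly. The ">7" entry for mt24 is stored as its lower bound, 7.0, purely for illustration.

```python
# (mutant, ΔT in °C, Combined prediction correct?) for the 14 experimental mutants.
# mt24's ">7" is stored as its lower bound 7.0 (an assumption of this sketch).
COMBINED = [
    ("UVF", 50.0, True), ("UMC", 50.0, True), ("mt2", 29.9, True),
    ("opt14", 25.4, True), ("12x", 20.7, True), ("TR3337", 20.0, True),
    ("HTS", 17.7, True), ("hph5", 16.0, True), ("mt1", 15.9, True),
    ("Q100P", 11.5, True), ("F42C", 10.0, True), ("TE", 8.0, False),
    ("mt24", 7.0, True), ("mt18", 7.0, False),
]


def accuracy(rows, min_delta_t=None):
    """Return (correct, total) over rows with ΔT >= min_delta_t (all rows if None)."""
    selected = [ok for _, dt, ok in rows if min_delta_t is None or dt >= min_delta_t]
    return sum(selected), len(selected)


print(accuracy(COMBINED), accuracy(COMBINED, min_delta_t=10.0))
```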
Fig. 5. The components of the hyperplane vector of SVM L20 are plotted as bars. The average compositional differences, obtained by averaging over all the training examples, are plotted as dots connected by a line.
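The dots in Fig. 5 are averages of composition-difference vectors, and the L400 SVM uses the analogous 400-dimensional dipeptide composition. The sketch below computes both feature types for a pair of sequences and averages them over pairs. The order of the two sequences (and hence the sign of the difference) is an assumption, as is the linear score w·x + b used to read the bars as weights; all names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def composition(seq):
    """20 features: fraction of each standard residue in seq."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400 features: fraction of each ordered pair of adjacent residues in seq."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def difference(first, second, features=composition):
    """Features of first minus features of second (sign convention assumed)."""
    return [a - b for a, b in zip(features(first), features(second))]


def average_difference(pairs, features=composition):
    """Component-wise mean of the difference vectors over (first, second) pairs."""
    vectors = [difference(a, b, features) for a, b in pairs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def linear_decision(x, w, b):
    """Linear SVM score w·x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


d = difference("AAG", "AGG")
print(d[AMINO_ACIDS.index("A")], d[AMINO_ACIDS.index("G")])
```

Comparing the weight vector with the average difference vector, as Fig. 5 does, shows whether the classifier relies on the same residues whose composition differs most, on average, across the training pairs.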
Performance obtained with random splitting of the cross-validation sets
| Method | Accuracy (%) | Correlation | Sensitivity + (%) | Sensitivity − (%) | Specificity + (%) | Specificity − (%) |
|---|---|---|---|---|---|---|
| L20 | 93 | 0.86 | 93 | 93 | 93 | 93 |
| L400 | 97 | 0.94 | 97 | 97 | 97 | 97 |
L20 is the SVM trained with the difference in residue composition of the two sequences; L400 is trained with the difference in dipeptide composition. The symbols + and − denote the classes of increased and decreased thermostability, respectively.
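This table and the leave-one-cluster-out evaluation differ only in how the cross-validation folds are formed. The sketch below contrasts the two: leave-one-cluster-out holds out every example of one cluster at a time, whereas a random split ignores cluster membership, so related examples can land in both training and test folds. How the clusters were defined is not given in this section; `cluster_ids` is an assumed input, and the function names are hypothetical.

```python
import random


def leave_one_cluster_out(cluster_ids):
    """Yield (train, test) index lists, holding out one whole cluster per fold."""
    for held_out in sorted(set(cluster_ids)):
        test = [i for i, c in enumerate(cluster_ids) if c == held_out]
        train = [i for i, c in enumerate(cluster_ids) if c != held_out]
        yield train, test


def random_folds(n_examples, n_folds, seed=0):
    """Yield (train, test) index lists from a random partition that ignores clusters."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        test = sorted(indices[k::n_folds])
        test_set = set(test)
        train = [i for i in range(n_examples) if i not in test_set]
        yield train, test


for train, test in leave_one_cluster_out(["a", "b", "a", "c"]):
    print("train", train, "test", test)
```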