Abstract
Wet-laboratory mutagenesis to determine changes in enzyme activity is expensive and time-consuming. This paper extends standard one-shot learning by proposing an incremental transductive method (T2bRF) for predicting enzyme mutant activity during mutagenesis, using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for the cumulative effects of annotating enzyme mutant activity over time. The experimental results, obtained using cross-validation, show that the proposed incremental transductive method, with random forest as the base classifier, yields better overall results than one-shot learning methods. T2bRF is shown to yield 90% accuracy on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose accuracy is 80% or less on the same datasets.
Year: 2011 PMID: 22007208 PMCID: PMC3189455 DOI: 10.1155/2011/958129
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Delaunay tessellation (solid) and Voronoi diagram (dotted) in 2D [16].
Figure 2. (a) An example of a C-alpha trace of a protein with the mutated position indicated (left); the potential profile for the mutated protein (right, top) and the wild-type protein (right, middle), and the residual profile (right, bottom; figure adapted from Masso [18]). (b) Graphical representation of an input vector (residual profile) resulting from the mutation of the T4 lysozyme protein at position 107. Note the sparse nature of the vector: most positions have zero values and very few have nonzero values.
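The residual profile in Figure 2 can be illustrated with a minimal sketch. It assumes, since the caption does not spell it out, that the residual is the element-wise difference between the mutant and wild-type potential profiles. The function names `residual_profile` and `to_sparse` and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def residual_profile(mutant: List[float], wild_type: List[float]) -> List[float]:
    """Element-wise difference (mutant minus wild type) of two potential profiles.

    Assumption: both profiles hold one potential score per residue position
    and have the same length.
    """
    if len(mutant) != len(wild_type):
        raise ValueError("profiles must have the same number of positions")
    return [m - w for m, w in zip(mutant, wild_type)]


def to_sparse(profile: List[float], tol: float = 1e-9) -> Dict[int, float]:
    """Keep only the nonzero entries, keyed by 1-based residue position."""
    return {i + 1: v for i, v in enumerate(profile) if abs(v) > tol}


# Toy profiles for a hypothetical 6-residue protein (illustrative values only).
wild = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0]
mutant = [1.0, 2.0, 0.5, 3.0, 2.5, 2.0]

residual = residual_profile(mutant, wild)
print(to_sparse(residual))  # only positions 2 and 5 differ from the wild type
```

Storing only the nonzero entries reflects the sparsity noted in the caption: a single point mutation changes the score at only a few positions.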
One-shot learning.
| Dataset | Algorithm | Avg. Acc. (%) | St. Dev. | Sensitivity | Specificity |
|---|---|---|---|---|---|
| HIV-1 | AdaBoost | 68.84 | 4.77 | 0.87 | 0.63 |
| HIV-1 | LogitBoost | 75.93 | 3.90 | 0.72 | 0.86 |
| HIV-1 | SVM | 68.65 | 3.50 | 0.66 | 0.75 |
| HIV-1 | RF | 79.28 | 1.96 | 0.88 | 0.76 |
| HIV-1 | DT | 77.57 | 1.21 | 0.73 | 0.81 |
| HIV-1 | T1 NN | 73.13 | 2.57 | 0.69 | 0.76 |
| HIV-1 | T1 RF | 74.64 | 3.24 | 0.65 | 0.74 |
| T4 | AdaBoost | 85.10 | 0.14 | 0.98 | 0.11 |
| T4 | LogitBoost | 85.65 | 0.31 | 0.97 | 0.20 |
| T4 | SVM | 86.88 | 0.24 | 0.99 | 0.17 |
| T4 | RF | 87.12 | 0.44 | 0.97 | 0.30 |
| T4 | DT | 85.33 | 0.56 | 0.93 | 0.34 |
| T4 | T1 NN | 75.46 | 6.99 | 0.80 | 0.46 |
| T4 | T1 RF | 85.02 | 7.44 | 0.94 | 0.35 |
| LAC | AdaBoost | 60.53 | 0.31 | 0.99 | 0.12 |
| LAC | LogitBoost | 71.88 | 0.58 | 0.91 | 0.48 |
| LAC | SVM | 72.15 | 0.16 | 0.88 | 0.52 |
| LAC | RF | 80.80 | 0.37 | 0.86 | 0.75 |
| LAC | DT | 78.71 | 0.34 | 0.83 | 0.74 |
| LAC | T1 NN | 65.23 | 3.58 | 0.76 | 0.39 |
| LAC | T1 RF | 77.73 | 3.64 | 0.78 | 0.77 |
Results of the AdaBoost, LogitBoost, SVM, random forest (RF), decision tree (DT), and transduction T1 algorithms, using one-shot learning and no selectivity, with 4-fold cross-validation for the HIV-1 dataset and 10-fold cross-validation for the T4 and LAC datasets.
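The three metrics reported in the table can be computed from a binary confusion matrix, as in the minimal sketch below. The function name `binary_metrics` and the counts are hypothetical and are not the paper's data. The counts were chosen only to show how accuracy can stay high while specificity is low when one class dominates, a pattern similar to the AdaBoost row for T4 (85.10% accuracy, 0.98 sensitivity, 0.11 specificity).

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    sensitivity = TP / (TP + FN)   (fraction of positives recovered)
    specificity = TN / (TN + FP)   (fraction of negatives recovered)
    accuracy    = (TP + TN) / all predictions
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
    }


# Hypothetical, imbalanced test fold: 100 positive and 18 negative mutants.
metrics = binary_metrics(tp=98, fn=2, tn=2, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because accuracy pools both classes, a classifier that labels nearly every mutant as positive still scores well on an imbalanced fold, which is why sensitivity and specificity are worth reading alongside the average accuracy.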
Incremental Transductive Learning.
| Dataset | Strategy | Avg. Acc. (%) | St. Dev. | Sensitivity | Specificity |
|---|---|---|---|---|---|
| HIV-1 | T2aNN | 75.53 | 2.70 | 0.71 | 0.76 |
| HIV-1 | T2bNN | 78.05 | 2.50 | 0.75 | 0.81 |
| HIV-1 | T2aRF | 83.46 | 2.62 | 0.78 | 0.82 |
| HIV-1 | T2bRF | 86.88 | 2.55 | 0.76 | 0.83 |
| T4 | T2aNN | 82.69 | 4.11 | 0.89 | 0.50 |
| T4 | T2bNN | 82.64 | 5.14 | 0.90 | 0.56 |
| T4 | T2aRF | 89.71 | 3.54 | 0.93 | 0.63 |
| T4 | T2bRF | 90.97 | 3.48 | 0.94 | 0.67 |
| LAC | T2aNN | 76.17 | 2.88 | 0.78 | 0.75 |
| LAC | T2bNN | 82.51 | 2.93 | 0.80 | 0.75 |
| LAC | T2aRF | 86.54 | 2.71 | 0.86 | 0.80 |
| LAC | T2bRF | 90.84 | 2.87 | 0.86 | 0.80 |
Results of the transductive learning algorithms T2a and T2b on HIV-1, T4, and LAC, using incremental transductive learning and selectivity. The number of folds used for cross-validation is 4 for the HIV-1 dataset and 10 for T4 and LAC.