Ye Tian, Christopher Deutsch, Bala Krishnamoorthy.
Abstract
BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable for making these choices. While several such methods have been proposed to predict the effects of mutagenesis on stability and reactivity, solubility has not received much attention.
Year: 2010 PMID: 20929563 PMCID: PMC2958853 DOI: 10.1186/1748-7188-5-33
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Delaunay tessellation of a protein in 2D. The dots represent amino acids, and the thick solid line connecting the dots is the backbone. Dotted lines are Delaunay triangles and thin solid lines represent the Voronoi cells. The four shaded edges illustrate the four degrees of buriedness for two body contacts (see Section on Delaunay Buriedness of Contacts). These edges are named e_b, for b = 0, 1, 2, 3, as shown in Figure 3.
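The empty-circumcircle property that defines the tessellation in Figure 1 can be illustrated with a brute-force 2D sketch in Python. This is not the authors' implementation: the function names `circumcircle`, `delaunay_triangles`, and `delaunay_edges`, as well as the toy coordinates, are assumptions made for illustration, and the approach only suits small point sets in general position.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) of the circle through
    a, b, c, or None if the three points are (nearly) collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_triangles(points):
    """Keep every triangle whose circumcircle contains no other point
    strictly inside it (brute force, O(n^4))."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for idx, (px, py) in enumerate(points)
               if idx not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

def delaunay_edges(triangles):
    """Collect the distinct edges (two body contacts) of the triangles."""
    return {pair for tri in triangles for pair in combinations(tri, 2)}

# Toy "residues": four corners of a square plus its center (hypothetical data).
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
triangles = delaunay_triangles(points)
edges = delaunay_edges(triangles)
```

In this toy example every Delaunay triangle uses the center point, so the edges are the four sides of the square plus the four spokes to the center. For real structures a library routine for 3D Delaunay tessellation would be the practical choice.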
Figure 2. Backbone connectivity classes for three body contacts. i, j, k, etc., are residue numbers. The connectivity indices (0, 1, 2) are ordered from most non-bonded to most bonded, or connected.
Figure 3. Buriedness classes for two body contacts. White/dotted elements are buried and black/solid elements are on the surface. Note that in Figure 1, solid lines represent the backbone of the protein.
Figure 4. Three body buriedness classes. White/dotted elements are buried and black/solid elements are on the surface. Thus the solid triangle of type 0 is fully on the surface: its face, three edges, and three vertices are all on the surface.
Figure 5. Triplet buriedness classes 1 and 5. Instances of triplet buriedness class 1 (left) and 5 (right), shown in red. The tube represents the backbone, and Delaunay triangles are shown in blue. The class 1 triplet is formed by the residues 7LYS, 8PRO, and 10GLN in the protein 1VQB. The class 5 triplet is formed by residues 6LEU, 53GLY, and 86ILE in the protein 2ACY. Images generated using the package VMD [43]. These and other triplet types are best visualized in 3D. Scripts to draw all the triangles for the above two proteins in VMD are available on the web page for the paper [37]. The reader is encouraged to load the PDB file, run the script, and then rotate the molecule in 3D to view the triplets.
Dataset of mutations studied.
| # | Article | Study | Mutants | Pred | TOT |
|---|---|---|---|---|---|
| 1 | [ | Mutagenesis experiments for APOBEC3G | L260A, C261A, W168A, C281A, C288A, C308A, L234A, L235A, F241A, L253A, L371A | 9 | 11 |
| 2 | [ | AA replacement improving solubility | N159D | 0 | 1 |
| 3 | [ | AA contribution to solubility | Y76D, Y76R, Y76S, Y76E, Y76K, Y76G, Y76A, Y76H, Y76N, Y76P, Y76C, Y76M, Y76V, Y76L, Y76I, Y76F, Y76W | 12 | 17 |
| 4 | [ | Mutagenesis of Ab42 Alzheimer's peptide | F19D, F19E, F19N, F19R, F19Q, F19H, F19T, F19G, F19K, F19P, F19S, F19A, F19C, F19M, F19W, F19Y, F19L, F19V, F19I | 18 | 19 |
| 5 | [ | Polymerization and solubility of recombination | E6F, E6W, E6L | 2 | 3 |
| 6 | [ | Genetic selection for protein solubility | (H6Q/V12A/V24A/I32M/V36G), (V12A/I32T/L34P), (V12E/V18E/M35T/I41N), (F19S/L34P), (L34P), (F4I/S8P/V24A/L34P), I32S | 6 | 7 |
| 7 | [ | Isolation of viral coat protein mutants | (A26T/I118F), N27S, A107T, (N24S/C46R/A96V/N116S), Q109L, (V48A/Q109H), I104V, (N12D/S34G/S52P/I92M/C101R/Q109L/S120T), (A21S/N24D/Q40R/V79A), (Q6L/N12D/I33T/R56C/F95L), (T15N/N24S/V29A/W32C/T45S/I60T/N98Y/I104N/S126P), (V61E/L103F/K106R/Y129H), (F4S/W32R/Q50R) | 13 | 13 |
| 8 | [ | Improved solubility of TEV protease | (T17S/N68D/I77V), (T17S/R80S) | 2 | 2 |
| 9 | [ | Primary structure and solubility | W131A, V165K, A104T, Y203H, W140F, C19Y, P28T, V32M, G36R, T288M, A384P, C70S, C26S, C93S, W140K, W140L, W140C, (W86F/W140F), (W130F/W140F), P28K, H44Y, (W86F/W130F/W140F), R68C, G346S, G349S, A198V | 21 | 26 |
| 10 | [ | Substitutions affecting protein solubility | K97R, (K113F/W140K), (K113F/W140L), (K113F/W140C), K63M, L104M, T90A, L87M, (T90A/E97A), L127M, V74F, E97A, K69M, (T345L/M358R), M358L, K97G, K97V, W140C, L10N, L10D, L10T | 12 | 21 |
| 11 | [ | Dual selection for functionally active mutants | (Y35Q/F37R), (Y35L/F37T), (Y35G/F37L), (Y35L/F37R), K27E | 4 | 5 |
| 12 | [ | Assay for increased protein solubility | K185F, K185I, K185V, K185L, K185N, K185D | 6 | 6 |
| 13 | [ | Phage T4 vertex protein gp24 | (E89A/E90A) | 1 | 1 |
| 14 | [ | Human cell surface receptor CD58 | (Q21V/S85T/S1F/K9V/K58V/G93L) | 1 | 1 |
| 15 | [ | Solubility and folding of a genetic marker | W232E, Y242E, I317E, (G32D/I33P) | 4 | 4 |
Key: Multi-point mutants have their substitutions separated by "/", and the entire mutant is enclosed in parentheses. Pred gives the number of mutants correctly predicted by the LP-based method, out of the total number given under TOT.
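The mutant notation described in the key can be parsed mechanically. The following Python sketch is illustrative only: `Substitution` and `parse_mutants` are hypothetical names, not part of the paper's software, and the sketch assumes each substitution is written as one-letter wild type residue, position, one-letter replacement, with no internal spaces.

```python
import re
from typing import List, NamedTuple

class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number
    mutant: str      # one-letter code of the replacement residue

# A mutant is either a parenthesized multi-point group or a single substitution.
_MUTANT = re.compile(r"\(([^)]*)\)|([A-Z]\d+[A-Z])")
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

def parse_mutants(cell: str) -> List[List[Substitution]]:
    """Split one 'Mutants' table cell into mutants, each a list of substitutions."""
    mutants = []
    for group, single in _MUTANT.findall(cell):
        text = group if group else single
        subs = [Substitution(w, int(p), m)
                for w, p, m in _SUBSTITUTION.findall(text)]
        if subs:
            mutants.append(subs)
    return mutants

# Row 8 of the table: two multi-point mutants of TEV protease.
tev = parse_mutants("(T17S/N68D/I77V), (T17S/R80S)")
# Row 5 of the table: three single-point mutants.
singles = parse_mutants("E6F, E6W, E6L")
```

Counting the parsed mutants per row gives a quick consistency check against the TOT column.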
Statistics for LOOCV using LP, SVM, and Lasso models.
| Measure | LP | SVM | Lasso |
|---|---|---|---|
| Accuracy | 0.810 | 0.708 | 0.701 |
| MCC | 0.617 | 0.405 | 0.423 |
| Precision(class | 0.762 | 0.661 | 0.909 |
| Precision(class | 0.851 | 0.735 | 0.661 |
Statistics for 10-fold CV using LP, SVM, and Lasso models.
| Measure | LP | SVM | Lasso |
|---|---|---|---|
| Accuracy | 0.766 | 0.752 | 0.708 |
| MCC | 0.545 | 0.496 | 0.448 |
| Precision(class | 0.719 | 0.705 | 0.952 |
| Precision(class | 0.822 | 0.790 | 0.664 |
Statistics for 3-fold CV using LP, SVM, and Lasso models.
| Measure | LP | SVM | Lasso |
|---|---|---|---|
| Accuracy | 0.766 | 0.686 | 0.715 |
| MCC | 0.529 | 0.359 | 0.452 |
| Precision(class | 0.714 | 0.638 | 0.917 |
| Precision(class | 0.811 | 0.722 | 0.673 |
Accuracy and MCC values for k-fold CV using LP, SVM, and Lasso models, when the folds are created using sequence similarity scores.
| k | LP ACC | LP MCC | SVM ACC | SVM MCC | Lasso ACC | Lasso MCC |
|---|---|---|---|---|---|---|
| 19 | 0.504 | 0.289 | 0.569 | -0.056 | 0.569 | -* |
| 30 | 0.642 | 0.279 | 0.511 | -0.075 | 0.584 | 0.140 |
| 50 | 0.650 | 0.289 | 0.409 | -0.185 | 0.708 | 0.448 |
| 70 | 0.686 | 0.364 | 0.650 | 0.269 | 0.708 | 0.448 |
Key: k = 19 represents leave-one-protein-out CV. No MCC value could be computed (denoted by -*) for the Lasso model in this case, as all mutants were predicted to see a decrease in solubility.
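The measures reported in these tables, and the reason MCC is undefined when every mutant receives the same predicted class, can be seen from the standard definitions. The Python sketch below uses those textbook formulas; the label encoding (+1 for increased solubility, -1 for decreased) and the function names are assumptions for illustration, not taken from the paper.

```python
import math
from typing import Optional, Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true: Sequence[int], y_pred: Sequence[int],
              label: int) -> Optional[float]:
    """Correct predictions of `label` over all predictions of `label`;
    None if `label` was never predicted."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive=label)
    return tp / (tp + fp) if (tp + fp) else None

def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """Matthews correlation coefficient; None when the denominator is zero,
    e.g. when every sample is predicted to be in the same class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else None

# Hypothetical labels: two mutants increase solubility, two decrease it.
y_true = [1, 1, -1, -1]
mixed = mcc(y_true, [1, -1, -1, -1])         # one increase missed
all_decrease = mcc(y_true, [-1, -1, -1, -1])  # denominator is zero -> None
```

With all predictions in one class, the factor (tp + fp) is zero, so the denominator vanishes and no MCC value exists, which matches the -* entry in the table.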