Rajni Verma, Ulrich Schwaneberg, Danilo Roccatano.
Abstract
The combination of computational and directed evolution methods has proven a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this review summarizes recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this review is to provide concise information about currently available computational resources to assist the design of directed evolution based protein engineering experiments.
Keywords: directed evolution; diversity generation; focused library; mutational effect; rational design; semi-rational design
Year: 2012 PMID: 24688649 PMCID: PMC3962222 DOI: 10.5936/csbj.201209008
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Figure 1. Schematic representation of four CAPDE approaches (as the quarters of the circle): (1) generated diversity and library size (in red), (2) evolutionary conservation based focused library (in green), (3) structure-based focused library (in purple) and (4) mutational effects in protein (in cyan). The servers, tools and databases associated with the approaches are shown in boxes.
Summarizing computational tools to analyze amino acid diversity, size and completeness of the library generated by mutagenesis methods.
| Approach | Name | Input | Case study examples | URL |
|---|---|---|---|---|
| Statistics of generated diversity | | Nucleotide sequence or protein structure. | Cytochrome P450BM-3, [ | |
| | | Nucleotide sequence, mutation rate, library size, indel rate, nucleotide mutation matrix. | α-synuclein, phosphoribosylpyrophosphate amidotransferase [ | |
| Library size and completeness | | Library size and randomization techniques. | Randomization scheme: NNK, NDT, NNB, NAY [ | |
| | | Probability required by library size and randomization techniques. | Randomization scheme: NNN, NNB, NNK, MAX [ | |
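The randomization schemes named in the table (NNK, NDT, NNB, NAY, NNN) can be compared with a few lines of code. The sketch below is illustrative only and is not the method of any tool listed above: the function names `scheme_stats` and `clones_for_coverage` are hypothetical, the standard genetic code is assumed, and the oversampling estimate assumes equally probable variants and uses the Poisson approximation L = -V ln(1 - F), which the library statistics tools may refine or replace.

```python
import math
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes needed for the schemes discussed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "D": "AGT", "B": "CGT", "Y": "CT",
}


def scheme_stats(scheme):
    """Count codons, distinct amino acids and stop codons of a degenerate codon."""
    codons = ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]
    translated = [CODON_TABLE[c] for c in codons]
    return {
        "codons": len(codons),
        "amino_acids": len(set(translated) - {"*"}),
        "stop_codons": translated.count("*"),
    }


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that the expected covered fraction reaches `coverage`.

    Assumption: all variants are equally likely (Poisson approximation).
    """
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNB", "NDT", "NAY"):
        stats = scheme_stats(scheme)
        needed = clones_for_coverage(stats["codons"])
        print(scheme, stats, "clones for 95% coverage of one site:", needed)
```

Under these assumptions the trade-off is easy to see: a scheme with fewer codons needs less screening effort per randomized site but gives access to fewer amino acids.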
Figure 2. a) The MAP3D analysis of the amino acid diversity generated by the balanced epPCR (Taq (MnCl2, G=A=C=T)) method. The Y-axis shows the original amino acid species and the X-axis shows the amino acid substitution patterns. The MAP3D analysis is restricted to the active site residues (Ala11, Ser47, Thr48, Tyr137, Ile139, Lys165, Thr167, Gly189, Tyr190). For this analysis, the amino acids are grouped into four classes according to their chemical nature (charged, neutral, aromatic and aliphatic), with stop codon (structure disrupting) and glycine/proline (helix destabilizing) as separate classes. The probabilities of amino acid substitutions were mapped on the protein sequence and structure (PDB Id: 1NAL) of N-acetylneuraminic acid lyase and are represented in b and c, respectively. b) The Jmol [31] applet is used to visualize the amino acid substitution patterns using the RWB (red-white-blue) color gradient scheme, with active site residues shown as sticks. The Y-axis shows the sequence id, PDB id and amino acid name, together with c) secondary structure elements (T: hydrogen bonded turn and bend, *: loop or irregular structure), d) the normalized Cα B-factor to differentiate flexible (F) and rigid (R) residues, and e) the relative solvent accessibility to identify exposed (E) or buried (B) residues.
Summarizing computational tools for evolutionary conservation based focused library generation.
| Approach | Name | Description | Case study examples | URL |
|---|---|---|---|---|
| Hotspot identification | | The web server performs MSA and calculates the evolutionary conservation rate to identify conserved positions in a protein or nucleotide sequence/structure. | GAL4 transcription factor [ | |
| | | The database provides the predicted results of | Cytochrome c [ | |
| | | The evolutionary trace based method performs MSA on a set of homologous sequences (from PSI-BLAST) after Gibbs-like sampling. The aligned homologous sequences are used to construct a distance tree based on the Neighbor Joining algorithm. The clustering method is parameterized to identify protein interface or core residues by taking into account physicochemical properties and evolutionary conservation. | DNA polymerase I, DNA transferase, allophycocyanin, leucine dehydrogenase, β-trypsin proteinase, phosphotransferase, human CDC42 gene regulation protein, oncogene protein, signal transduction protein etc. [ | |
| | | The database provides information about hotspots in protein interfaces using the conservation rate and solvent accessibility of the residues. | Numb phosphotyrosine-binding domain [ | |
| | | The web server predicts the mutability of functionally important residues and visualizes it on the protein sequence and structure. | Haloalkane dehalogenase, phosphotriesterase, 1,3-1,4-β-D-glucan 4-glucanohydrolase, β-lactamase [ | |
| | | The web server detects selection forces acting on biologically significant sites in the target protein during the evolutionary process. | TRIM5α protein [ | |
| Protein superfamily based MSA | | The database performs structure based MSA for a protein superfamily with sequence, structural, molecular interaction and mutational information from the literature. | α/β hydrolase fold [ | |
| | | The database performs protein superfamily based MSA and annotates functionally relevant amino acid positions with structural and mutational information. | Lipases [ | |
| | | | Epoxide hydrolases and haloalkane dehalogenase [ | |
| | | | Laccases [ | |
| | | | Cytochrome P450s [ | |
| | | | Polyhydroxyalkanoates depolymerase [ | |
| | | | Lactamases [ | |
| | | | SHV lactamases [ | |
| Literature based protein mutant data | | The database provides literature based protein mutant information with structural and functional annotation. | | |
| | | The database provides literature based protein mutant information with thermodynamic parameters and experimental conditions, integrated with sequence, structure and function annotation. | | |
| | | The database provides literature based protein mutant information, kinetic parameters and experimental conditions, integrated with a user-friendly and flexible query system that fetches data by reaction name, substrate or inhibitor name, structure or mutation. | Cytochrome P450s [ | |
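Several of the hotspot identification tools above rank positions by evolutionary conservation computed from an MSA. As a minimal sketch of the idea, the code below scores each alignment column by one minus its normalized Shannon entropy. This is a deliberately simpler, swapped-in measure: the servers in the table may use phylogeny-aware rate estimates instead, and the function name `column_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one score per alignment column: 1.0 = fully conserved, lower = more variable.

    Score = 1 - H / log2(20), where H is the Shannon entropy of the residue
    frequencies in the column (gaps are ignored; an all-gap column scores 0.0).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = 0.0
        for count in Counter(residues).values():
            p = count / total
            entropy -= p * math.log2(p)
        scores.append(1.0 - entropy / math.log2(20))
    return scores


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy_msa = ["MKTA", "MKSA", "MRTA", "MKTG"]
    for position, score in enumerate(column_conservation(toy_msa), start=1):
        print(position, round(score, 3))
```

In a focused-library setting, fully conserved columns are usually left untouched, while variable columns are candidates for randomization.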
Summarizing the computational tools for structure-based focused library generation
| Approach | Name | Description | Case study examples | URL |
|---|---|---|---|---|
| Ligand binding site | | The web server identifies the ligand binding site via MSA and a clustering algorithm. | Target T0483 in CASP8 | |
| | | The web server detects the binding site using MSA and characterizes it using local structural pairwise alignment. | Biotin carboxylase, TATA binding protein [ | |
| | | The database provides structurally similar protein binding site using | Cytochrome c [ | |
| | | The web server characterizes the ligand binding site using molecular interaction descriptors. | Cyclooxygenase, adenylate kinase [ | |
| | | The method models mutants, docks ligands into the protein and calculates reaction pathways to characterize protein-ligand interactions using a semi-empirical quantum-mechanics approach. | PA-IIL lectin and its mutants [ | |
| Protein interaction | | The web server calculates the molecular interactions using published criteria. | - | |
| | | The web server analyzes and visualizes interfaces in biological complexes using intermolecular contact maps based on distance or physicochemical properties. | Hen egg lysozyme interaction with two antibodies [ | |
| Residue depth and stability | | The web server predicts the binding cavity and the mutational effect on protein stability using residue depth and solvent accessible surface area. | West Nile Virus NS2B/NS3 protease [ | |
| | | The web server predicts the contribution of residues to protein stability using their interactions with spatial neighbors and their evolutionary conservation. | TIM-barrel proteins [ | |
| Protein surface and interface | | The web server identifies large positively charged electrostatic patches on the protein surface using the Poisson-Boltzmann electrostatic potential. | DNA binding domain of TATA binding protein [ | |
| | | The web server performs evolutionary conservation analysis of the protein complex. | Rho–RhoGAP complex [ | |
| Protein flexibility | | The web server performs flexible backbone modeling using Backrub [ | hGH-hGHr interface [ | |
| | | The method generates conformational ensembles and transitions using geometrical constraint-based prediction of protein conformational flexibility. | Osmoprotection protein [ | |
| | | The web server predicts residue flexibility in the protein using an SVM approach. | Human PrP [ | |
| | | The web server predicts large amplitude motions in the protein using NMA. | HIV-1 protease, | |
| | | | Calcium ATPase [ | |
| | | The web server determines and analyzes protein flexibility using a coarse-grained modeling approach. | - | |
| | | The web server detects hinge regions in the protein using both GNM and ANM. | Calmodulin protein, hemoglobin [ | |
| | | The web server predicts domain motions using conformational changes in the protein. | Hemoglobin, 70S ribosome [ | |
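The distance-based intermolecular contact maps mentioned under "Protein interaction" can be illustrated with a short sketch. It is not the implementation of any server in the table: the function names, the one-point-per-residue representation (for example Cα coordinates) and the 8 Å cutoff are all assumptions made for illustration, and the coordinates in the usage example are invented.

```python
import math


def contact_map(coords_a, coords_b, cutoff=8.0):
    """Boolean matrix: entry [i][j] is True when residue i of chain A and
    residue j of chain B lie within `cutoff` (same units as the coordinates)."""
    return [
        [math.dist(a, b) <= cutoff for b in coords_b]
        for a in coords_a
    ]


def interface_residues(coords_a, coords_b, cutoff=8.0):
    """Indices of residues in each chain that make at least one contact."""
    cmap = contact_map(coords_a, coords_b, cutoff)
    in_a = [i for i, row in enumerate(cmap) if any(row)]
    in_b = [j for j in range(len(coords_b)) if any(row[j] for row in cmap)]
    return in_a, in_b


if __name__ == "__main__":
    # Invented coordinates (in Å), one point per residue.
    chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    chain_b = [(5.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
    print(contact_map(chain_a, chain_b))
    print(interface_residues(chain_a, chain_b))
```

Residues flagged this way are natural candidates for a structure-based focused library targeting a protein-protein interface. A physicochemical contact definition would need residue types in addition to coordinates.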
Summarizing the computational tools to analyze the mutational effect on protein stability and activity.
| Approach | Name | Description | URL |
|---|---|---|---|
| SVM | | The web server predicts the protein stability change upon point mutation. | |
| | | | |
| Decision tree (DT) | | The web server predicts the protein stability change with residue information. | |
| | | The web server predicts the protein stability change upon double mutation with residue information. | |
| Random forests (RF) | | The web server predicts the mutational effect on protein function. | |
| | | | |
| Statistical potential based method | | The web server predicts the mutational effect on protein stability. | |
| | | The web server predicts the thermodynamic stability change upon mutation. | |
| Empirical force field | | The plugin predicts mutational effect on protein and facilitates | |
| | | The program suite predicts the mutational effect on protein stability, ligand affinity and pKa values. | |
| | | The web server predicts the mutational effect on protein stability. | |
| RF, SVM, Tree and SVM regression | | The web server predicts the mutational effect on protein stability and activity (up to 19 mutations). | |
| Evolutionary conservation | | The web server predicts the mutational effect on protein function. | |