Jingtian Zhao, Yang Cao, Le Zhang.
Abstract
Proteins participate in various essential processes in vivo via interactions with other molecules. Identifying the residues participating in these interactions not only provides biological insights for protein function studies but also has great significance for drug discovery. Therefore, predicting protein-ligand binding sites has long been under intense research in the fields of bioinformatics and computer-aided drug discovery. In this review, we first introduce the research background of predicting protein-ligand binding sites and then classify the methods into four categories, namely, 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods. We describe representative algorithms in each category and elaborate on machine learning and deep learning-based prediction methods in more detail. Finally, we discuss the trends and challenges of the current research, such as molecular dynamics simulation-based prediction of cryptic binding sites, and highlight prospective directions for the near future.
Keywords: Deep learning; Ligand binding site; Machine learning; Protein; Protein–ligand binding
Year: 2020 PMID: 32140203 PMCID: PMC7049599 DOI: 10.1016/j.csbj.2020.02.008
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. 3D schematic of a protein structure and its binding ligands, generated from the PDB website. The protein shown is the crystal structure of human deoxyhaemoglobin at 1.74 Å resolution, published in the PDB (access code: 4HHB). The magnified ligand is [HEM (PROTOPORPHYRIN IX CONTAINING FE)] 142: C with its bonds (hydrogen, halogen, etc.).
Published 3D structure-based LBS prediction methods.
| Method | Type | Feature | Year |
|---|---|---|---|
| A computational procedure (with no specific name) | Probe Energy-based | Contour surfaces at appropriate energy levels are calculated for each probe and displayed with the protein structure | 1985 |
| POCKET | Spatial Geometry Measurement | Place spheres between atoms; pocket surfaces are modeled using the marching cubes algorithm | 1992 |
| SURFNET | Spatial Geometry Measurement | Place spheres at the gap between any two protein atoms | 1995 |
| LIGSITE | Spatial Geometry Measurement | Cover the target protein with a regular 3D grid | 1997 |
| CAST | Spatial Geometry Measurement | Compute pockets using alpha shapes and discrete flow theory | 1998 |
| CASTp | Spatial Geometry Measurement | Use alpha shape and the pocket algorithm | 2003 |
| Q-SiteFinder | Probe Energy-based | Use the interaction energy between the protein and a simple van der Waals probe | 2005 |
| LIGSITEcsc | Spatial Geometry Measurement | An extension and implementation of the LIGSITE algorithm using the Connolly surface | 2006 |
| VISCANA | Probe Energy-based | The total energy of the molecule is evaluated by summing fragment energies and interfragment interaction energies | 2006 |
| Fpocket | Spatial Geometry Measurement | Voronoi tessellation and alpha spheres are used to detect pockets | 2009 |
| SITEHOUND | Probe Energy-based | A carbon probe and a phosphate probe are used to detect the interaction force between the probe and the protein | 2009 |
| MSPocket | Spatial Geometry Measurement | Identify surface pocket regions according to the normal vector directions at the vertices on the surface | 2010 |
| FTSite | Probe Energy-based | Use 16 different probes on these grids to detect free energy | 2011 |
| SiteComp | Probe Energy-based | Discover subsites with different interaction properties and quickly calculate residue contributions to binding sites | 2012 |
| LISE | Spatial Geometry Measurement | Compute a score by counting geometric motifs extracted from substructures of interaction networks connecting protein and ligand atoms | 2013 |
| Patch-Surfer2.0 | Spatial Geometry Measurement | Represent and compare pockets at the level of small local surface patches that characterize physicochemical properties of the local regions | 2014 |
| CurPocket | Spatial Geometry Measurement | Compute the curvature distribution of protein surface and identify the clusters of concave regions | 2019 |
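The grid-based idea behind the spatial geometry methods in the table above (LIGSITE, for example, covers the protein with a regular grid) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the published algorithm: it scans only the three axis directions, treats atoms as spheres of one fixed radius, and the names `buriedness` and `cage` are hypothetical.

```python
from itertools import product

def buriedness(atoms, radius, size):
    """For each solvent grid point, count the axes (0 to 3) along which
    the point is enclosed by protein on both sides."""
    def is_protein(point):
        # A grid point belongs to the protein if it lies inside any atom sphere.
        return any(
            sum((a - b) ** 2 for a, b in zip(point, atom)) <= radius ** 2
            for atom in atoms
        )

    grid = {p: is_protein(p) for p in product(range(size), repeat=3)}
    scores = {}
    for p, occupied in grid.items():
        if occupied:
            continue  # only solvent points can be part of a pocket
        count = 0
        for axis in range(3):
            def hits_protein(step):
                # Walk along one axis until the grid edge or a protein point.
                q = list(p)
                q[axis] += step
                while 0 <= q[axis] < size:
                    if grid[tuple(q)]:
                        return True
                    q[axis] += step
                return False
            if hits_protein(+1) and hits_protein(-1):
                count += 1
        scores[p] = count
    return scores

# Toy "protein": six atoms forming a cage around the grid centre (assumed data).
cage = [(2, 2, 0), (2, 2, 4), (2, 0, 2), (2, 4, 2), (0, 2, 2), (4, 2, 2)]
scores = buriedness(cage, radius=0.5, size=5)
```

Grid points with a high score are enclosed in several directions and are therefore candidate pocket points; a real implementation would cluster them and rank the clusters.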
Published template similarity-based LBS prediction methods.
| Method | Type | Feature | Year |
|---|---|---|---|
| ConSurf | Sequence Template-based | Phylogenetic relationships among the sequences and the similarity between the amino acids are taken into account | 2003 |
| A Sequence template-based approach with no specific name | Sequence Template-based | An information-theoretic approach for estimating sequence conservation based on Jensen–Shannon divergence | 2007 |
| FINDSITE | Structure Template-based | The PROSPECTOR 3 threading algorithm and the TM-align tool are used | 2008 |
| A two‐stage template‐based LBS prediction method | Structure Template-based | Construct protein’s 3D model and use structural clustering of ligand‐containing templates on the predicted 3D model | 2009 |
| 3DLigandSite | Structure Template-based | MAMMOTH is used | 2010 |
| FunFOLD | Structure Template-based | Use an automatic approach for cluster identification and residue selection | 2011 |
| COFACTOR | Structure and Sequence Template-based | Use global-to-local sequence and structural comparison algorithm | 2012 |
| webPDBinder | Structure Template-based | Search a protein structure against a library of known binding sites and a collection of control nonbinding pockets | 2013 |
| S-SITE | Sequence Template-based | The Needleman–Wunsch algorithm is used | 2013 |
| TM-SITE | Structure and Sequence Template-based | Combine structure template-based and sequence template-based methods | 2013 |
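The Jensen–Shannon divergence mentioned in the table above for estimating sequence conservation can be sketched in a few lines of Python. This is an illustrative sketch only: the uniform background over the 20 amino acids and the function names `js_divergence` and `column_distribution` are assumptions made here, and the published approach may use a different background distribution and additional weighting.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: every amino acid equally likely.
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def column_distribution(column):
    """Relative amino-acid frequencies in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# A fully conserved column versus a variable one (toy data).
conserved = js_divergence(column_distribution("AAAA"), UNIFORM_BACKGROUND)
variable = js_divergence(column_distribution("ACDE"), UNIFORM_BACKGROUND)
```

Under these assumptions the conserved column diverges more from the background than the variable one, which is the signal a conservation-based predictor looks for.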
Traditional machine learning-based LBS prediction and binding affinity research methods.
| Method | Machine Learning Algorithm | Year |
|---|---|---|
| Knowledge-based QSAR approach | Kernel-Partial Least Squares (K-PLS) | 2004 |
| Multi-RELIEF | RELIEF algorithm | 2007 |
| SFCscore | Multiple linear regression and partial least squares analysis | 2008 |
| ATPint | Support Vector Machine | 2009 |
| ConCavity | K-Means algorithm | 2009 |
| MetaPocket | Hierarchical clustering algorithm | 2009 |
| RF-Score | The Random Forest algorithm | 2010 |
| MetaDBSite | Support Vector Machine | 2011 |
| NsitePred | Support Vector Machine | 2011 |
| NNSCORE | Artificial Neural Network (shallow neural network) | 2011 |
| L1pred | L1-regularized logistic regression (L1-Logreg) classifier | 2012 |
| TargetS | Support Vector Machine | 2013 |
| eFindSite | Support Vector Machine | 2013 |
| VitaPred | Support Vector Machine | 2013 |
| COACH | Support Vector Machine | 2013 |
| LigandRFs | The Random Forest algorithm | 2014 |
| OSML | Support Vector Machine | 2015 |
| LigandDSES | The Random Forest algorithm | 2015 |
| PRANK | The Random Forest algorithm | 2015 |
| A method for protein‐ligand binding affinity prediction | Gradient Boosting Regressor | 2018 |
| SAnDReS | Regression Analysis | 2016 |
| P2Rank | The Random Forest algorithm | 2018 |
| COACH-D | Support Vector Machine | 2018 |
| Taba | Regression Analysis | 2019 |
Fig. 2. A simple schematic of an SVM. A hyperplane divides the points into two categories.
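How a hyperplane divides points into two categories can be shown with a short Python sketch. The weight vector `w`, offset `b`, and sample points below are hand-picked assumptions for illustration; an actual SVM would learn `w` and `b` from training data by maximizing the margin, which is not done here.

```python
from math import sqrt

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / sqrt(sum(wi * wi for wi in w))

# Assumed hyperplane x1 + x2 - 3 = 0 and four toy points.
w, b = (1.0, 1.0), -3.0
points = [(0.0, 1.0), (1.0, 1.0), (3.0, 2.0), (2.0, 4.0)]
labels = [classify(w, b, p) for p in points]
margin = min(distance_to_hyperplane(w, b, p) for p in points)
```

The smallest distance from any point to the hyperplane is the margin that SVM training tries to make as large as possible.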
Fig. 3. A simple model of a convolutional neural network. Hidden layers are used to generate the classification result (multiple convolutional layers and pooling layers can be set in a CNN).
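The convolution and pooling steps named in the caption can be sketched in plain Python for a one-dimensional input. This is a minimal illustration with an assumed kernel and input; it omits learned weights, multiple channels, and the final classification layer of a real CNN, and the function names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid 1D convolution as used in CNNs (sliding dot product, no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool(values, width):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + width])
        for i in range(0, len(values) - width + 1, width)
    ]

# Assumed toy input and kernel.
signal = [0, 1, 0, 0, 1, 1, 0, 0]
kernel = [1, 1]
feature_map = relu(conv1d(signal, kernel))
pooled = max_pool(feature_map, width=2)
```

Stacking several such convolution and pooling stages, each with its own kernels, gives the hidden layers shown in the figure.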
Fig. 4. A simple demonstration of a deep belief network. DBNs are constructed by combining multiple restricted Boltzmann machines (RBMs). Training of DBNs is performed layer by layer. The hidden layer is first inferred from the data vector, and this hidden layer is used as the input data vector of the next layer.
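The layer-by-layer flow described in the caption, where each inferred hidden layer becomes the input of the next, can be sketched as follows. This is a hedged illustration only: the weights and biases are fixed, hand-picked assumptions, and the training of each RBM (for example by contrastive divergence) is deliberately omitted. The function names are hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def hidden_probabilities(visible, weights, hidden_bias):
    """Activation probability of each hidden unit of one RBM.
    weights[j][i] connects visible unit i to hidden unit j."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, visible)) + bias)
        for row, bias in zip(weights, hidden_bias)
    ]

def propagate(data, layers):
    """Pass a data vector upward through stacked RBMs.
    The hidden layer of one RBM is the input vector of the next."""
    activations = [data]
    for weights, hidden_bias in layers:
        activations.append(
            hidden_probabilities(activations[-1], weights, hidden_bias)
        )
    return activations

# Two stacked RBMs with assumed parameters: 3 visible -> 2 hidden -> 1 hidden.
layers = [
    ([[0.5, -0.5, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
activations = propagate([1.0, 1.0, 0.0], layers)
```

In greedy layer-wise training, the first RBM would be fitted to the data, its hidden activations computed as above, and the second RBM then fitted to those activations.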
Deep learning-based LBS prediction and binding affinity research methods.
| Method | Main Goal | Network Type | Year |
|---|---|---|---|
| A deep learning framework for modeling structural features of RNA-binding protein targets | Binding preferences modeling of RNA-binding proteins | DBN | 2015 |
| DeepBind | Sequence specificities prediction of DNA- and RNA-binding proteins | CNN | 2015 |
| DeepDTA | Drug-target interaction identification | CNN | 2018 |
| KDEEP | Protein-ligand binding affinity prediction | CNN | 2018 |
| DeepSite | LBS Prediction | CNN | 2017 |
| DeepCSeqSite | LBS Prediction | CNN | 2019 |
| DeepConv-DTI | Drug-target interaction identification | CNN | 2019 |
| DeepDrug3D | Binding pockets characterization and classification | CNN | 2019 |
| OnionNet | Protein-ligand binding affinity prediction | CNN | 2019 |