Radoslav Krivák, David Hoksza.
Abstract
BACKGROUND: Ligand binding site prediction from protein structure has many applications related to elucidation of protein function and structure-based drug discovery. It often represents only one step of many in complex computational drug design efforts. Although many methods have been published to date, only a few of them are suitable for use in automated pipelines or for processing large datasets. These use cases require stability and speed, which disqualifies many of the recently introduced tools that are either template-based or available only as web servers.
Keywords: Binding site prediction; Ligand binding sites; Machine learning; Protein pockets; Protein surface descriptors; Random forests
Year: 2018 PMID: 30109435 PMCID: PMC6091426 DOI: 10.1186/s13321-018-0285-8
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Availability of existing tools for ligand binding site prediction from protein structure introduced since 2009
| Name | Year | Type | Web server | Stand-alone | Fully automated† | Source Code |
|---|---|---|---|---|---|---|
| SiteMap | 2009 | Geometric | – | Yes | Yes | – |
| Fpocket | 2009 | Geometric | Yes | Yes | Yes | Yes |
| SiteHound | 2009 | Energetic | Yes | Yes | Yes | Yes |
| ConCavity | 2009 | Conservation | Yes | Yes | – | Yes |
| 3DLigandSite | 2010 | Template | Yes | – | – | – |
| POCASA | 2010 | Geometric | Yes | – | – | – |
| DoGSite | 2010 | Geometric | Yes | – | – | – |
| MetaPocket 2.0 | 2011 | Consensus | Yes | – | – | – |
| MSPocket | 2011 | Geometric | – | Yes | Yes | Yes |
| FTSite | 2012 | Energetic | Yes | – | – | – |
| LISE | 2012 | Knowledge/conservation | Yes | Yes | – | – |
| COFACTOR | 2012 | Template | Yes | Yes | Yes | – |
| COACH | 2013 | Template†† | Yes | Yes | Yes | – |
| G-LoSA | 2013 | Template | – | Yes | – | Yes |
| eFindSite | 2013 | Template | Yes | Yes | – | Yes |
| GalaxySite | 2014 | Template/docking | Yes | – | – | – |
| LIBRA | 2015 | Template | Yes | Yes | – | – |
| P2Rank (this work) | 2015* | Machine learning | –** | Yes | Yes | Yes |
| bSiteFinder | 2016 | Template | Yes | – | – | – |
| ISMBLab-LIG | 2016 | Machine learning | Yes | – | – | – |
| DeepSite | 2017 | Machine learning | Yes | – | – | – |
†Applies to stand-alone versions
††Consensus of template based methods: TM-SITE, S-SITE and COFACTOR (also FINDSITE and ConCavity in web version)
*Algorithm introduced in conference proceedings [49]
**In development
Prediction speed
| Method | Time† |
|---|---|
| COACH (web server) | 15 h (self-reported estimate) |
| eFindSite (web server) | |
| COACH (stand-alone) | |
| GalaxySite (web server) | 2 h (self-reported estimate) |
| 3DLigandSite (web server) | 1–3 h (self-reported estimate) |
| ISMBLab-LIG (web server) | |
| FTSite (web server) | |
| LISE (web server) | |
| MetaPocket 2.0 (web server) | |
| DeepSite (web server) | |
| SiteHound (stand-alone) | |
| P2Rank (stand-alone) | |
| P2Rank (stand-alone, in larger dataset*) | 0.9 s |
| Fpocket (stand-alone) | |
†Average time required for ligand binding site (LBS) prediction on a single protein. Displayed is a self-reported estimate or the result of our test on a small dataset of 5 proteins of 2500 atoms each. Stand-alone tools were tested on a single 3.7 GHz CPU core. For web servers, the wall time from submitting a job to receiving the result was measured.
*Difference is due to JVM initialization and model loading cost
Fig. 1 Visualization of ligand binding sites predicted by P2Rank for structure 1FBL. The protein is covered in a layer of points lying on its Solvent Accessible Surface. Each point represents its local chemical neighborhood and is colored according to its predicted ligandability score (from 0 = green to 1 = red). Points with a high ligandability score are clustered to form predicted binding sites (marked by coloring the adjacent protein surface). In this case, the largest predicted pocket (shown in the close-up) is a correctly predicted true binding site that binds a known ligand (magenta). The visualization is based on a PyMOL script produced by P2Rank.
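The clustering step mentioned in the caption can be illustrated with a minimal sketch. This is not P2Rank's implementation: the function name `cluster_ligandable_points`, the score threshold of 0.5, the 3 Å linking distance, the single-linkage grouping, and the ranking by summed score are all assumptions chosen for illustration, since the caption only states that high-scoring points are clustered into predicted sites.

```python
import math


def cluster_ligandable_points(points, scores, score_threshold=0.5, link_distance=3.0):
    """Group high-scoring surface points into candidate pockets.

    points: list of (x, y, z) coordinates of surface points.
    scores: predicted ligandability score in [0, 1] for each point.
    Threshold, linking distance, and ranking rule are illustrative
    assumptions, not values taken from the paper.
    """
    # Keep only the points whose score reaches the (assumed) threshold.
    keep = [i for i, s in enumerate(scores) if s >= score_threshold]
    parent = {i: i for i in keep}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage: join any two kept points closer than link_distance.
    for pos, a in enumerate(keep):
        for b in keep[pos + 1:]:
            if math.dist(points[a], points[b]) <= link_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for i in keep:
        clusters.setdefault(find(i), []).append(i)

    # Rank pockets by summed point score (assumed ranking rule).
    return sorted(clusters.values(),
                  key=lambda c: sum(scores[i] for i in c),
                  reverse=True)


points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (20, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.9, 0.1]
pockets = cluster_ligandable_points(points, scores)
```

Here `pockets` holds lists of point indices, best-ranked pocket first; the low-scoring point at index 5 is discarded before clustering.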
Comparison of predictive performance on COACH420 and HOLO4K datasets
| | COACH420 | | HOLO4K | |
|---|---|---|---|---|
| Method | Top-n | Top-(n+2) | Top-n | Top-(n+2) |
| Fpocket | 56.4 | 68.9 | 52.4 | 63.1 |
| Fpocket+PRANKᵃ | 63.6 | 76.5 | 62.0 | 71.0 |
| SiteHound† | 53.0 | 69.3 | 50.1 | 62.1 |
| MetaPocket 2.0† | 63.4 | 74.6 | 57.9 | 68.6 |
| DeepSite† | 56.4 | 63.4 | 45.6 | 48.2 |
| P2Rank[protrusion]ᵇ | 64.2 | 73.0 | 59.3 | 67.7 |
| P2Rank | | | | |
The numbers represent the identification success rate [%] measured by the DCC criterion (distance from pocket center to closest ligand atom) with a 4 Å threshold, considering only pockets ranked at the top of the list (n is the number of ligands in the considered structure).
†These methods failed to produce predictions for some portion of the input proteins. Here we display success rates calculated only on the subsets of proteins on which they finished successfully. A detailed pairwise comparison with P2Rank on the exact subsets can be found in Additional file 1.
ᵃ Predictions of Fpocket re-scored by the PRANK algorithm (which is included in the P2Rank software package)
ᵇ Reduced version of P2Rank that uses only a single geometric feature: protrusion
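The DCC criterion and the Top-n / Top-(n+2) cutoffs described in the table note can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `dcc` and `success_rate`, the input layout, the per-ligand counting, and the inclusive `<=` comparison against the 4 Å threshold are assumptions.

```python
import math


def dcc(pocket_center, ligand_atoms):
    """Distance from a pocket center to the closest ligand atom (in Å)."""
    return min(math.dist(pocket_center, atom) for atom in ligand_atoms)


def success_rate(structures, extra=0, threshold=4.0):
    """Percentage of ligands identified by the top-ranked pockets.

    structures: list of (ranked_pocket_centers, ligands) pairs, where
        ranked_pocket_centers is a list of (x, y, z) ordered best first and
        ligands is a list of ligands, each a list of (x, y, z) atom coords.
    extra: 0 for Top-n, 2 for Top-(n+2); n is the number of ligands
        in the structure.
    Counting per ligand and using '<=' are assumptions of this sketch.
    """
    found = 0
    total = 0
    for centers, ligands in structures:
        n = len(ligands)
        considered = centers[: n + extra]  # only the top-ranked pockets count
        for ligand_atoms in ligands:
            total += 1
            if any(dcc(c, ligand_atoms) <= threshold for c in considered):
                found += 1
    return 100.0 * found / total if total else 0.0


# One structure with a single ligand (n = 1) and three ranked pockets.
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
centers = [(10.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
data = [(centers, [ligand])]

top_n = success_rate(data, extra=0)         # only the first pocket is considered
top_n_plus_2 = success_rate(data, extra=2)  # the first three pockets are considered
```

In this toy case the best-ranked pocket lies 9 Å from the ligand, so Top-n misses it, while the second-ranked pocket lies within 4 Å and counts as a hit under Top-(n+2).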
Average number of predicted binding sites
| | COACH420 | HOLO4K |
|---|---|---|
| avg. protein atoms | 2179 | 3908 |
| avg. true sites | 1.2 | 2.4 |
| Fpocket | 14.6 | 27 |
| SiteHound | 66.2 | 99.5 |
| MetaPocket 2.0 | 6.3 | 6.4 |
| DeepSite | 3.2 | 2.8 |
| P2Rank | 6.3 | 12.6 |
Displayed is the average total number of binding sites predicted per protein by each method on a given dataset