Jake E McGreig, Hannah Uri, Magdalena Antczak, Michael J E Sternberg, Martin Michaelis, Mark N Wass.
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update to 3DLigandSite since its first release in 2010. The overall methodology remains the same: candidate binding sites in a protein are inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses structures from the newly available AlphaFold database, or alternatively Phyre2 models when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands, which are used to infer the ligand-binding residues in the query protein. Finally, we have introduced a machine learning element as the final prediction step, which improves the accuracy of predictions and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites obtained 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or a structure. Results are displayed in multiple formats, including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
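The validation figures above are reported as recall at a fixed precision of 75%. The snippet below is a minimal sketch of how such a number can be derived from per-residue confidence scores, assuming each residue carries a binary label (1 = part of a known binding site). The function name and the example data are hypothetical; this is not the authors' evaluation code.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.75) -> float:
    """Highest recall reachable by any score threshold whose precision is at
    least `min_precision` (hypothetical helper for illustration)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive residue")
    # Rank residues from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Evaluate a cutoff only once every residue tied at this score is included.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision:
            best = max(best, recall)
    return best


# Made-up scores for eight residues, five of which truly bind a ligand.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 0]
print(recall_at_precision(scores, labels))  # 0.8
```

In this toy example the best cutoff keeps the top five residues: four are true binding residues (precision 0.8, which passes the 0.75 threshold), giving recall 4/5 = 0.8.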
Year: 2022 PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. An overview of the 3DLigandSite method. Users submit either a protein sequence or a structure. Where a sequence is submitted, the PDBe and AlphaFold DB are searched for a matching structure; where none is available, Phyre2 is used to model the 3D structure. HHSearch is used to search a sequence library of protein structures with bound ligands. Hits from this search are aligned with the structure of the query protein, and the ligands from these structures are clustered. Each cluster of ligands represents a potential binding site in the query protein. A machine learning classifier is then used to predict which of the residues around each cluster are likely to form part of a binding site.
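The clustering step in Figure 1 can be pictured with a minimal sketch. It assumes that template ligands have already been superposed onto the query structure and reduced to one centroid each, that each residue is represented by a single coordinate, and that clustering is single-linkage with a fixed distance cutoff. These choices, the cutoff values and the function names are all hypothetical: the caption does not specify the actual clustering criteria or the features passed to the classifier. The sketch stops at candidate residues, which is where the classifier would take over.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def cluster_ligands(centroids: List[Point], cutoff: float = 3.0) -> List[List[int]]:
    """Single-linkage clustering (assumed, not the published criterion):
    ligands whose centroids lie within `cutoff` Å are joined, transitively."""
    parent = list(range(len(centroids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first: more templates agree on these sites.
    return sorted(groups.values(), key=len, reverse=True)


def residues_near_cluster(cluster: List[int], centroids: List[Point],
                          residues: Dict[str, Point],
                          contact: float = 4.5) -> List[str]:
    """Candidate binding-site residues: those within `contact` Å of any
    ligand centroid in the cluster (hypothetical contact cutoff)."""
    return sorted(
        name for name, xyz in residues.items()
        if any(math.dist(xyz, centroids[k]) <= contact for k in cluster)
    )


# Toy data: three overlapping ligands and one distant ligand.
centroids = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
residues = {"A:HIS57": (0.0, 3.0, 0.0), "A:ASP102": (2.5, 4.0, 0.0),
            "A:SER195": (20.0, 2.0, 0.0)}
clusters = cluster_ligands(centroids)
print(clusters)  # [[0, 1, 2], [3]]
print(residues_near_cluster(clusters[0], centroids, residues))  # ['A:ASP102', 'A:HIS57']
```

Single linkage is used here only because it is the simplest way to merge overlapping ligand positions; a real pipeline would work on full atom coordinates rather than centroids.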
Figure 2. Benchmarking the 3DLigandSite machine learning classifier. ROC curves and precision–recall curves are shown for the prediction of binding sites of non-metal (A and B) and metal (C and D) ligands.
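ROC curves such as those in Figure 2 are usually summarized by the area under the curve (AUC). As a minimal sketch, assuming per-residue scores and binary binding labels (hypothetical data, not the benchmark set), the AUC equals the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding one, with ties counted as half:

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (binding, non-binding) residue pairs ranked correctly, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both binding and non-binding residues")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 0]
print(roc_auc(scores, labels))  # 11 of 15 pairs ranked correctly, about 0.733
```

The pairwise loop is quadratic and is meant only to make the definition explicit. For benchmark-sized data, a rank-based or sorting-based computation gives the same value in O(n log n).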
Figure 3. Viewing results on the 3DLigandSite web server. Results are presented in three main sections. First, a sequence view maps sequence conservation and the different clusters identified onto the protein sequence. Second, details of the clusters, including the number and type of ligands, are displayed, together with a table listing the residues predicted to form the binding site for each cluster. Finally, the structural analysis section includes a Mol* molecular viewer to visualize the protein, the predicted binding site and the clusters used to make the predictions. A separate control panel (on the right) enables users to easily modify the display.