Sayoni Das¹, Harry M Scholes², Neeladri Sen², Christine Orengo²
¹ PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
² Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
Abstract
MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams).
RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites.
AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.