Norberto Sánchez-Cruz (1), José L Medina-Franco (1), Jordi Mestres (2,3), Xavier Barril (4,5)
1. Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
2. Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomedica (PRBB), 08003 Barcelona, Catalonia, Spain.
3. Chemotargets SL, Parc Cientific de Barcelona (PCB), 08028 Barcelona, Catalonia, Spain.
4. Institut de Biomedicina de la Universitat de Barcelona (IBUB) and Facultat de Farmacia, Universitat de Barcelona, 08028 Barcelona, Spain.
5. Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain.
Abstract
MOTIVATION: Machine-learning scoring functions (SFs) have been found to outperform standard SFs for binding affinity prediction of protein-ligand complexes. Many reports focus on implementing increasingly complex algorithms, while the chemical description of the system has not been fully exploited.
RESULTS: Herein, we introduce Extended Connectivity Interaction Features (ECIF) to describe protein-ligand complexes and build machine-learning SFs with improved predictions of binding affinity. ECIF are a set of protein-ligand atom-type pair counts in which each atom is described by its connectivity, and these atom descriptions define the pair types. ECIF were used to build different machine-learning models to predict protein-ligand affinities (pKd/pKi). The models were evaluated in terms of 'scoring power' on the Comparative Assessment of Scoring Functions 2016 (CASF-2016). The best ECIF-based models achieved Pearson correlation coefficients of 0.857 when ECIF were used alone and 0.866 when ECIF were combined with ligand descriptors, demonstrating the descriptive power of ECIF.
AVAILABILITY AND IMPLEMENTATION: Data and code to reproduce all the results are freely available at https://github.com/DIFACQUIM/ECIF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
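To make the idea of protein-ligand atom-type pair counts concrete, the following is a minimal sketch, not the authors' implementation (which is available at the repository above). It assumes atoms arrive already labeled with a connectivity-aware type string, and it counts protein-ligand pairs that fall within a distance cutoff. The `Atom` class, the `pair_counts` function, the type labels, and the 6.0 Å default cutoff are all illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist  # Python 3.8+


@dataclass(frozen=True)
class Atom:
    atom_type: str   # hypothetical connectivity-aware label, e.g. "C.3heavy"
    xyz: tuple       # Cartesian coordinates in angstroms


def pair_counts(protein, ligand, cutoff=6.0):
    """Count (protein type, ligand type) pairs closer than `cutoff`.

    The cutoff value is an assumption made for this sketch.
    """
    counts = Counter()
    for p in protein:
        for l in ligand:
            if dist(p.xyz, l.xyz) <= cutoff:
                counts[(p.atom_type, l.atom_type)] += 1
    return counts


# Toy complex: two protein atoms near the ligand, one far away.
protein = [
    Atom("C.3heavy", (0.0, 0.0, 0.0)),
    Atom("O.1heavy", (3.0, 0.0, 0.0)),
    Atom("N.1heavy", (20.0, 0.0, 0.0)),  # beyond the cutoff, contributes nothing
]
ligand = [
    Atom("C.2heavy", (0.0, 4.0, 0.0)),
    Atom("N.2heavy", (3.0, 4.0, 0.0)),
]

features = pair_counts(protein, ligand)
for pair, n in sorted(features.items()):
    print(pair, n)
```

Each distinct pair type becomes one feature, so a complex is described by a fixed-length count vector once the set of possible pair types is fixed. Such vectors could then be fed to any regression model to predict pKd/pKi.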