Debby D Wang1, Haoran Xie2, Hong Yan3. 1. Institute of Medical Information Engineering, School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, 516 Jungong Rd, Shanghai 200093, China. 2. Department of Computing and Decision Sciences, Lingnan University, 8 Castle Peak Rd, Tuen Mun, Hong Kong. 3. Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.
Abstract
MOTIVATION: Reliable predictive models of protein-ligand binding affinity are required in many areas of biomedical research, yet accurate prediction based on current descriptors or molecular fingerprints remains a challenge. We develop novel interaction fingerprints (IFPs) to encode protein-ligand interactions and use them to improve prediction.
RESULTS: We developed proteo-chemometrics IFPs (PrtCmm IFPs) by combining extended connectivity fingerprints (ECFPs) with the proteo-chemometrics concept. Combining PrtCmm IFPs with machine-learning models yielded efficient scoring models, which were validated on the PDBbind v2019 core set and the CSAR-HiQ sets. The PrtCmm IFP Score outperformed several other models in predicting protein-ligand binding affinities. In addition, conventional ECFPs were simplified to generate new IFPs, which provided consistent predictions in less time. The relationship between the base atom properties of ECFPs and prediction accuracy was also investigated.
AVAILABILITY: PrtCmm IFP has been implemented in the IFP Score Toolkit, available on GitHub at https://github.com/debbydanwang/IFPscore.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
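For readers unfamiliar with ECFPs, the following is a minimal sketch of the general circular-fingerprint idea on which they rest: each atom starts from an identifier derived from its base properties, the identifiers are repeatedly re-hashed together with those of bonded neighbours, and the collected identifiers are folded into a fixed-length bit vector. It is not the PrtCmm IFP implementation from the IFP Score Toolkit, and it does not attempt to reproduce how protein and ligand environments are combined, which this abstract does not describe. The function names, the choice of base properties (atomic number and heavy-atom degree), the SHA-256 hashing and the 1024-bit length are all illustrative assumptions; duplicate-substructure removal, which full ECFP implementations perform, is omitted.

```python
import hashlib


def _hash(values):
    """Deterministically hash a sequence of ints to a 32-bit integer."""
    data = ",".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def circular_fingerprint(atom_invariants, bonds, radius=2, n_bits=1024):
    """ECFP-style bit vector for a toy molecular graph (illustrative only).

    atom_invariants: one tuple of ints per atom (assumed base atom properties)
    bonds: (i, j, order) tuples over atom indices
    """
    n_atoms = len(atom_invariants)
    neighbours = [[] for _ in range(n_atoms)]
    for i, j, order in bonds:
        neighbours[i].append((order, j))
        neighbours[j].append((order, i))

    # Layer 0: identifiers come from the base atom properties alone.
    ids = [_hash(inv) for inv in atom_invariants]
    features = set(ids)

    # Each further layer absorbs the identifiers of directly bonded atoms.
    for layer in range(1, radius + 1):
        new_ids = []
        for atom in range(n_atoms):
            # Sorting makes the result independent of atom numbering.
            env = sorted((order, ids[nb]) for order, nb in neighbours[atom])
            flat = [layer, ids[atom]]
            for order, nb_id in env:
                flat.extend((order, nb_id))
            new_ids.append(_hash(flat))
        ids = new_ids
        features.update(ids)

    # Fold the identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1
    return bits


# Toy example: heavy atoms of ethanol (C-C-O), invariants = (atomic number, degree).
ethanol_atoms = [(6, 1), (6, 2), (8, 1)]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]
ethanol_bits = circular_fingerprint(ethanol_atoms, ethanol_bonds)
print(len(ethanol_bits), sum(ethanol_bits))
```

A bit vector of this kind can be passed as a feature row to a regression model. Changing the tuples in `atom_invariants` is the toy analogue of varying the base atom properties, whose effect on prediction accuracy the abstract reports investigating.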