
Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning.

Angela Lopez-Del Rio, Alfons Nonell-Canals, David Vidal, Alexandre Perera-Lluna.

Abstract

Binding prediction between targets and drug-like compounds through deep neural networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies, applied to data from different molecular databases, affect the performance of proteochemometrics binding prediction models. These strategies are (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on source database, and (4) splitting based on both the clustering and the source database. These schemas are applied to a deep learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our deep learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when representing molecules by their fingerprints.
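The four splitting strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`random_folds`, `grouped_folds`, `groups_split_across_folds`), the toy cluster labels, and the database names `db_a`, `db_b`, `db_c` are all hypothetical. The cluster labels stand in for the output of a K-means run on compound descriptors, and treating strategy (4) as grouping by the (cluster, source) pair is only one possible reading of the abstract.

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """Strategy 1: shuffle sample indices and deal them out to folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for position, sample in enumerate(order):
        folds[sample] = position % n_folds
    return folds


def grouped_folds(groups, n_folds):
    """Strategies 2-4: every sample of a group lands in the same fold.

    Groups are placed greedily, largest first, into the currently
    smallest fold, so fold sizes stay roughly balanced.
    """
    members = defaultdict(list)
    for sample, group in enumerate(groups):
        members[group].append(sample)
    sizes = [0] * n_folds
    folds = [0] * len(groups)
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        target = sizes.index(min(sizes))
        for sample in members[group]:
            folds[sample] = target
        sizes[target] += len(members[group])
    return folds


def groups_split_across_folds(groups, folds):
    """Count groups whose samples appear in more than one fold."""
    seen = defaultdict(set)
    for group, fold in zip(groups, folds):
        seen[group].add(fold)
    return sum(1 for fold_set in seen.values() if len(fold_set) > 1)


# Hypothetical toy data: one entry per compound-target pair.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]  # assumed K-means labels
sources = ["db_a", "db_a", "db_a", "db_b", "db_b", "db_b", "db_c", "db_c"]

by_random = random_folds(len(clusters), n_folds=2)            # strategy 1
by_cluster = grouped_folds(clusters, n_folds=2)               # strategy 2
by_source = grouped_folds(sources, n_folds=2)                 # strategy 3
by_both = grouped_folds(list(zip(clusters, sources)), 2)      # strategy 4 (assumed)

print("clusters split by random folds:",
      groups_split_across_folds(clusters, by_random))
print("clusters split by cluster folds:",
      groups_split_across_folds(clusters, by_cluster))
```

Under the grouped schemes, no cluster (or source database) contributes samples to both the training and the held-out fold, which is what makes them more restrictive than random splitting.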

Year: 2019; PMID: 30730731; DOI: 10.1021/acs.jcim.8b00663

Source DB: PubMed; Journal: J Chem Inf Model; ISSN: 1549-9596; Impact factor: 4.956


  5 in total

1.  The effect of statistical normalization on network propagation scores.

Authors:  Sergio Picart-Armada; Wesley K Thompson; Alfonso Buil; Alexandre Perera-Lluna
Journal:  Bioinformatics       Date:  2021-05-05       Impact factor: 6.937

2.  The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms.

Authors:  Jie Song; Yuan Gao; Pengbin Yin; Yi Li; Yang Li; Jie Zhang; Qingqing Su; Xiaojie Fu; Hongying Pi
Journal:  Risk Manag Healthc Policy       Date:  2021-03-18

3.  STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.

Authors:  Nicholas T Cockroft; Xiaolin Cheng; James R Fuchs
Journal:  J Chem Inf Model       Date:  2019-10-24       Impact factor: 4.956

4.  Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.

Authors:  Paul G Francoeur; Tomohide Masuda; Jocelyn Sunseri; Andrew Jia; Richard B Iovanisci; Ian Snyder; David R Koes
Journal:  J Chem Inf Model       Date:  2020-09-10       Impact factor: 4.956

5.  Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction.

Authors:  Angela Lopez-Del Rio; Maria Martin; Alexandre Perera-Lluna; Rabie Saidi
Journal:  Sci Rep       Date:  2020-09-03       Impact factor: 4.379

