
Machine Learning for Molecular Modelling in Drug Design.

Pedro J Ballester

Abstract

Machine learning (ML) has become a crucial component of early drug discovery. This research area has been fueled by two main factors [...].

Year:  2019        PMID: 31167503      PMCID: PMC6627644          DOI: 10.3390/biom9060216

Source DB:  PubMed          Journal:  Biomolecules        ISSN: 2218-273X


Machine learning (ML) has become a crucial component of early drug discovery. This research area has been fueled by two main factors. The first is the fast-growing availability of relevant experimental data. Examples of such datasets are those containing the bioactivities of molecules of known chemical structure against a non-molecular target (e.g., a cancer cell line), the binding affinities of such molecules against a molecular target (e.g., a particular kinase validated for a specific cancer type), or the X-ray crystal structures of a molecular target. This factor has been boosted by the development of community resources, such as ChEMBL [1], PubChem [2], NCI-60 [3], or PDBbind [4], that curate these datasets and facilitate their re-use for predictive modelling. The second factor is the easy access to high-quality and well-documented implementations of a range of ML algorithms, including recent advances such as XGBoost [5], deep learning [6], or conformal prediction [7]. As a result, an increasing number of data-driven ML models have been proposed and found advantageous in some way in identifying new starting points for the drug discovery process.

This Special Issue showcases five studies investigating the application of ML for molecular modelling in drug design. These studies have been carried out by 21 academic and industry researchers from around the world. The ML techniques include support vector machines (SVM), random forest (RF), k-nearest neighbors (k-NN), convolutional neural networks (CNN), and recurrent neural networks (RNN), either alone or integrated with dimensionality reduction techniques such as genetic algorithm (GA)-based feature selection (FS) and principal component analysis (PCA).

The first of these papers, by Cruz et al. [8], investigated quantitative structure–activity relationship (QSAR) models to predict which molecules are able to inhibit the growth of HCT116, a human colon carcinoma cell line. Regression models were developed for this purpose, using a total of 7339 molecules with chemical structure and half-maximal inhibitory concentration (IC50) data. QSAR classification models were also built, this time using nuclear magnetic resonance (NMR) data as features. Models were built with the k-NN, RF, and SVM algorithms. The authors concluded that the developed models were sufficiently predictive to permit the identification of new inhibitors of this non-molecular target.

Chen et al. [9] aimed at identifying new inhibitors of the C1 target that could be used to advance towards new treatments for hereditary angioedema. The QSAR models were built by integrating SVM with PCA and GA-based FS. Once these models were retrospectively validated, they were used to screen 72 million PubChem compounds against C1. High hit rates were obtained in the subsequent in vitro tests. Some of these new inhibitors have previously unknown active scaffolds for this target and single-digit μM potency.

Detection of mutagenicity during the early stages of drug discovery is important to reduce the likelihood of developing drugs with harmful side effects. Norinder et al. [10] applied the conformal prediction method to the prediction of mutagenicity of primary aromatic amines (PAAs) using Leadscope features in conjunction with RF. Conformal prediction is attractive in that it estimates how reliable each model prediction is. Such RF-based QSAR models were built and validated. The authors concluded that, using their methodology, it was possible to predict this type of mutagenicity in an independent set of compounds while estimating the error of each individual prediction.

Bjerrum and Sattarov [11] demonstrated that QSAR model accuracy can be improved by using heteroencoders of the molecules as features. The common approach of using autoencoders on canonical simplified molecular-input line-entry system (SMILES) strings is hampered by their poor neighborhood behavior (i.e., similar chemical structures mapping onto dissimilar canonical SMILES). A heteroencoder is introduced as an autoencoder that takes several non-canonical SMILES as input for each molecule, instead of a single canonical SMILES, to factor in the impact of different chemical representations on modelling. These heteroencoders were trained using CNNs and RNNs with long short-term memory cells. In comparison to using autoencoders, the use of heteroencoders resulted in better predictive performance of the resulting QSAR models. Furthermore, the spanned latent space led to a better agreement between SMILES similarity and circular fingerprint similarity of the considered molecules.

In the four contributions discussed so far [8,9,10,11], machine learning has been used to generate diverse ligand-based predictive models by exploiting chemical structure and bioactivity data. However, by also exploiting X-ray crystal structure data, ML can be used to build protein-ligand predictive models. These models are known as ML scoring functions (SFs) and have been found to be an important complement to classical SFs in docking [12]. The last paper in this issue [13] investigated whether the well-known superiority of ML SFs over classical SFs on average across targets is exclusively due to the presence of training complexes with proteins highly similar to those in the test set. We addressed this question by using 24 similarity-based training sets, a widely used test set, and four SFs. We found that an RF-based SF outperforms the best classical SF even when 68% of the most similar proteins are removed from the training set. In addition, unlike the classical SF, the RF-based SF is able to keep learning as the training set size grows, becoming substantially more predictive when the full 1105 data instances are used for training. These results show that ML SFs owe a substantial part of their performance to training on complexes with proteins dissimilar to those in the test set.
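
The idea behind conformal prediction, mentioned above in connection with [10], can be illustrated with a minimal sketch. This is a generic, simplified inductive conformal classifier and not the method of Norinder et al.: the function names, the class labels, the calibration probabilities, and the choice of a single pooled calibration set are all assumptions made for illustration, and the probabilities stand in for the output of any underlying model (e.g., an RF). Each candidate label receives a p-value from a held-out calibration set, and every label whose p-value exceeds the chosen significance level is kept in the prediction set.

```python
from typing import Dict, List, Sequence


def nonconformity(prob_of_label: float) -> float:
    """Nonconformity score: higher means the label fits the example worse."""
    return 1.0 - prob_of_label


def p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """Share of calibration scores at least as nonconforming as the test score,
    counting the test example itself (hence the +1 terms)."""
    n_at_least = sum(1 for s in cal_scores if s >= test_score)
    return (n_at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores: Sequence[float],
                   class_probs: Dict[str, float],
                   significance: float) -> List[str]:
    """Return every label whose p-value exceeds the significance level."""
    return sorted(
        label
        for label, prob in class_probs.items()
        if p_value(cal_scores, nonconformity(prob)) > significance
    )


# Hypothetical calibration data: the probability an underlying model
# assigned to the TRUE label of each held-out calibration compound.
cal_true_label_probs = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3]
cal_scores = [nonconformity(p) for p in cal_true_label_probs]

# A compound the model is confident about, and one it is unsure about.
confident = {"mutagenic": 0.9, "non-mutagenic": 0.1}
uncertain = {"mutagenic": 0.55, "non-mutagenic": 0.45}

print(prediction_set(cal_scores, confident, significance=0.2))
print(prediction_set(cal_scores, uncertain, significance=0.2))
```

In this toy setting the confident compound receives a single-label prediction set, whereas the uncertain one receives both labels, which is how the method signals that an individual prediction is unreliable. Practical applications often calibrate separately per class (Mondrian conformal prediction), which this sketch omits for brevity.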
  13 in total

1.  Deep neural nets as a method for quantitative structure-activity relationships.

Authors:  Junshui Ma; Robert P Sheridan; Andy Liaw; George E Dahl; Vladimir Svetnik
Journal:  J Chem Inf Model       Date:  2015-02-17       Impact factor: 4.956

2.  Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results.

Authors:  Yan Li; Li Han; Zhihai Liu; Renxiao Wang
Journal:  J Chem Inf Model       Date:  2014-06-02       Impact factor: 4.956

3.  Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships.

Authors:  Robert P Sheridan; Wei Min Wang; Andy Liaw; Junshui Ma; Eric M Gifford
Journal:  J Chem Inf Model       Date:  2016-12-13       Impact factor: 4.956

Review 4.  The NCI60 human tumour cell line anticancer drug screen.

Authors:  Robert H Shoemaker
Journal:  Nat Rev Cancer       Date:  2006-10       Impact factor: 60.716

5.  The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

Authors:  Hongjian Li; Jiangjun Peng; Yee Leung; Kwong-Sak Leung; Man-Hon Wong; Gang Lu; Pedro J Ballester
Journal:  Biomolecules       Date:  2018-03-14

6.  Predicting Aromatic Amine Mutagenicity with Confidence: A Case Study Using Conformal Prediction.

Authors:  Ulf Norinder; Glenn Myatt; Ernst Ahlberg
Journal:  Biomolecules       Date:  2018-08-29

7.  PubChem BioAssay: 2014 update.

Authors:  Yanli Wang; Tugba Suzek; Jian Zhang; Jiyao Wang; Siqian He; Tiejun Cheng; Benjamin A Shoemaker; Asta Gindulyte; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2013-11-05       Impact factor: 16.971

8.  The ChEMBL bioactivity database: an update.

Authors:  A Patrícia Bento; Anna Gaulton; Anne Hersey; Louisa J Bellis; Jon Chambers; Mark Davies; Felix A Krüger; Yvonne Light; Lora Mak; Shaun McGlinchey; Michal Nowotka; George Papadatos; Rita Santos; John P Overington
Journal:  Nucleic Acids Res       Date:  2013-11-07       Impact factor: 16.971

9.  In Silico HCT116 Human Colon Cancer Cell-Based Models En Route to the Discovery of Lead-Like Anticancer Drugs.

Authors:  Sara Cruz; Sofia E Gomes; Pedro M Borralho; Cecília M P Rodrigues; Susana P Gaudêncio; Florbela Pereira
Journal:  Biomolecules       Date:  2018-07-17

10.  Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders.

Authors:  Esben Jannik Bjerrum; Boris Sattarov
Journal:  Biomolecules       Date:  2018-10-30
