Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets.

Zhenxing Wu, Minfeng Zhu, Yu Kang, Elaine Lai-Han Leung, Tailong Lei, Chao Shen, Dejun Jiang, Zhe Wang, Dongsheng Cao, Tingjun Hou.

Abstract

Although a wide variety of machine learning (ML) algorithms have been used to learn quantitative structure-activity relationships (QSARs), there is no agreed-upon single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multivariate adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to build regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally perform better than the other algorithms. The overall performance of the algorithms can be ranked from best to worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended for regression learning on small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performance of ensemble models that integrate the predictions of multiple ML algorithms. The results show that ensembles of two or three algorithms from different categories can indeed improve on the predictions of the best individual ML algorithms.
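
Illustrative note: the abstract does not say how the predictions were integrated. The sketch below assumes the simplest option, an unweighted mean of each model's predicted value per compound, and is not taken from the paper. The function names, the two "models" and all numbers are hypothetical and chosen only to show how averaging can lower the error when the individual models err in different directions.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(*predictions):
    """Unweighted mean of several models' predictions, compound by compound."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Toy values invented for illustration only; NOT data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 1.5, 3.5, 3.5]  # hypothetical model A (e.g. a kernel method)
pred_b = [0.5, 2.5, 3.5, 4.5]  # hypothetical model B (e.g. a tree ensemble)

pred_ens = average_ensemble(pred_a, pred_b)

print("RMSE model A: ", rmse(y_true, pred_a))
print("RMSE model B: ", rmse(y_true, pred_b))
print("RMSE ensemble:", rmse(y_true, pred_ens))
```

In this toy case the errors of the two models cancel on three of the four compounds, so the averaged prediction has a lower RMSE than either model alone. When the models make the same errors, averaging gives no such gain, which is consistent with the abstract's emphasis on combining algorithms from different categories.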

Keywords: QSAR; XGBoost; ensemble learning; machine learning; support vector machine

Year: 2021 PMID: 33313673 DOI: 10.1093/bib/bbaa321

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Cited

7 in total

1. A Discovery Strategy for Active Compounds of Chinese Medicine Based on the Prediction Model of Compound-Disease Relationship.

Authors: Mengqi Huo; Sha Peng; Jing Li; Yanling Zhang; Yanjiang Qiao
Journal: J Oncol Date: 2022-07-08 Impact factor: 4.501

2. An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning.

Authors: Bowei Yan; Xiaona Ye; Jing Wang; Junshan Han; Lianlian Wu; Song He; Kunhong Liu; Xiaochen Bo
Journal: Molecules Date: 2022-05-12 Impact factor: 4.927

3. Predicting acupuncture efficacy for functional dyspepsia based on routine clinical features: a machine learning study in the framework of predictive, preventive, and personalized medicine.

Authors: Tao Yin; Hui Zheng; Tingting Ma; Xiaoping Tian; Jing Xu; Ying Li; Lei Lan; Mailan Liu; Ruirui Sun; Yong Tang; Fanrong Liang; Fang Zeng
Journal: EPMA J Date: 2022-02-02 Impact factor: 6.543

4. A comparative mapping of plant species diversity using ensemble learning algorithms combined with high accuracy surface modeling.

Authors: Yapeng Zhao; Xiaozhe Yin; Yan Fu; Tianxiang Yue
Journal: Environ Sci Pollut Res Int Date: 2021-10-21 Impact factor: 4.223

5. Complex metabolic interactions between ovary, plasma, urine, and hair in ovarian cancer.

6. Deqi Sensation to Predict Acupuncture Effect on Functional Dyspepsia: A Machine Learning Study.

7. A Methylation Diagnostic Model Based on Random Forests and Neural Networks for Asthma Identification.