Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

Richard L Marchese Robinson, Anna Palczewska, Jan Palczewski, Nathan Kidley.

Abstract

The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper (https://doi.org/10.5281/zenodo.495163) for heat map generation.
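To make the idea of interpreting a Random Forest prediction concrete, the following minimal Python sketch shows one commonly used style of interpretation: decomposing a forest's prediction for a single compound into a bias term plus per-descriptor contributions accumulated along each tree's decision path. It is an illustration under stated assumptions, not the article's implementation and not the rfFC API. The names `Node`, `tree_contributions`, `forest_contributions`, and `forest_predict` are hypothetical, and the two toy trees and the two-bit descriptor vector are hand-built for the example rather than taken from any benchmark data set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a toy regression tree (hypothetical structure)."""
    value: float                    # mean training response of samples at this node
    feature: Optional[int] = None   # descriptor index used to split; None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken otherwise


def tree_contributions(root: Node, x: List[float]) -> Tuple[float, Dict[int, float]]:
    """Walk the decision path for x and credit each split feature with the
    change in node mean between parent and chosen child."""
    contribs: Dict[int, float] = {}
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        contribs[node.feature] = contribs.get(node.feature, 0.0) + (child.value - node.value)
        node = child
    return root.value, contribs


def forest_contributions(trees: List[Node], x: List[float]) -> Tuple[float, Dict[int, float]]:
    """Average the per-tree bias and contributions over the forest."""
    bias = 0.0
    total: Dict[int, float] = {}
    for tree in trees:
        tree_bias, tree_contribs = tree_contributions(tree, x)
        bias += tree_bias / len(trees)
        for feature, value in tree_contribs.items():
            total[feature] = total.get(feature, 0.0) + value / len(trees)
    return bias, total


def forest_predict(trees: List[Node], x: List[float]) -> float:
    """Plain forest prediction: the mean of the leaf values reached by x."""
    leaf_values = []
    for tree in trees:
        node = tree
        while node.feature is not None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        leaf_values.append(node.value)
    return sum(leaf_values) / len(leaf_values)


# Two hand-built toy trees over two binary descriptors (assumed, illustrative only).
TREES = [
    Node(5.0, feature=0, threshold=0.5,
         left=Node(3.0),
         right=Node(7.0, feature=1, threshold=0.5, left=Node(6.0), right=Node(8.0))),
    Node(5.0, feature=1, threshold=0.5, left=Node(4.0), right=Node(6.0)),
]

x = [1.0, 1.0]  # both substructure bits present
bias, contribs = forest_contributions(TREES, x)
prediction = forest_predict(TREES, x)
print(bias, contribs, prediction)
```

By construction, the bias plus the summed contributions equals the forest prediction, so each descriptor's share of the prediction can be read off directly. In a workflow like the one the abstract describes, per-descriptor values of this kind could then be mapped back onto substructures to color a heat map, although the exact mapping used in the article is not specified here.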

Year:  2017        PMID: 28715209     DOI: 10.1021/acs.jcim.6b00753

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  15 in total

1.  Multiclass machine learning vs. conventional calculators for stroke/CVD risk assessment using carotid plaque predictors with coronary angiography scores as gold standard: a 500 participants study.

Authors:  Ankush D Jamthikar; Deep Gupta; Laura E Mantella; Luca Saba; John R Laird; Amer M Johri; Jasjit S Suri
Journal:  Int J Cardiovasc Imaging       Date:  2020-11-12       Impact factor: 2.357

2.  Exploiting machine learning for end-to-end drug discovery and development.

Authors:  Sean Ekins; Ana C Puhl; Kimberley M Zorn; Thomas R Lane; Daniel P Russo; Jennifer J Klein; Anthony J Hickey; Alex M Clark
Journal:  Nat Mater       Date:  2019-04-18       Impact factor: 43.841

3.  Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. (Review)

Authors:  Vertika Gautam; Anand Gaurav; Neeraj Masand; Vannajan Sanghiran Lee; Vaishali M Patil
Journal:  Mol Divers       Date:  2022-07-11       Impact factor: 3.364

4.  Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review.

Authors:  Rocco Meli; Garrett M Morris; Philip C Biggin
Journal:  Front Bioinform       Date:  2022-06-17

5.  Risk prediction for delayed clearance of high-dose methotrexate in pediatric hematological malignancies by machine learning.

Authors:  Min Zhan; Zebin Chen; Changcai Ding; Qiang Qu; Guoqiang Wang; Sixi Liu; Feiqiu Wen
Journal:  Int J Hematol       Date:  2021-06-25       Impact factor: 2.490

6.  A low-cost machine learning-based cardiovascular/stroke risk assessment system: integration of conventional factors with image phenotypes.

Authors:  Ankush Jamthikar; Deep Gupta; Narendra N Khanna; Luca Saba; Tadashi Araki; Klaudija Viskovic; Harman S Suri; Ajay Gupta; Sophie Mavrogeni; Monika Turk; John R Laird; Gyan Pareek; Martin Miner; Petros P Sfikakis; Athanasios Protogerou; George D Kitas; Vijay Viswanathan; Andrew Nicolaides; Deepak L Bhatt; Jasjit S Suri
Journal:  Cardiovasc Diagn Ther       Date:  2019-10

7.  Implicit-descriptor ligand-based virtual screening by means of collaborative filtering.

Authors:  Raghuram Srinivas; Pavel V Klimovich; Eric C Larson
Journal:  J Cheminform       Date:  2018-11-22       Impact factor: 5.514

8.  Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.

Authors:  Chia-Hsiu Chen; Kenichi Tanaka; Masaaki Kotera; Kimito Funatsu
Journal:  J Cheminform       Date:  2020-03-30       Impact factor: 5.514

9.  Machine learning for the prediction of severe pneumonia during posttransplant hospitalization in recipients of a deceased-donor kidney transplant.

Authors:  You Luo; Zuofu Tang; Xiao Hu; Shuo Lu; Bin Miao; Songlin Hong; Haiyun Bai; Chen Sun; Jiang Qiu; Huiying Liang; Ning Na
Journal:  Ann Transl Med       Date:  2020-02

10.  Decoys Selection in Benchmarking Datasets: Overview and Perspectives. (Review)

Authors:  Manon Réau; Florent Langenfeld; Jean-François Zagury; Nathalie Lagarde; Matthieu Montes
Journal:  Front Pharmacol       Date:  2018-01-24       Impact factor: 5.810
