Recursive Random Forests Enable Better Predictive Performance and Model Interpretation than Variable Selection by LASSO.

Xiang-Wei Zhu, Yan-Jun Xin, Hui-Lin Ge.

Abstract

Variable selection is of crucial significance in QSAR modeling since it increases the model's predictive ability and reduces noise. Selecting the right variables is far more complicated than developing predictive models. In this study, eight continuous and categorical data sets were employed to explore the applicability of two distinct variable selection methods: random forests (RF) and the least absolute shrinkage and selection operator (LASSO). Variable selection was performed (1) by using recursive random forests to rule out a quarter of the least important descriptors at each iteration and (2) by using LASSO modeling with 10-fold inner cross-validation to tune its penalty λ for each data set. Along with regular statistical parameters of model performance, we proposed the highest pairwise correlation rate, the average pairwise Pearson's correlation coefficient, and the Tanimoto coefficient to evaluate the optimal variables selected by RF and LASSO in an extensive way. Results showed that variable selection could allow a tremendous reduction of noisy descriptors (at most 96% with the RF method in this study) and apparently enhance the model's predictive performance as well. Furthermore, random forests showed the property of gathering important predictors without restricting their pairwise correlation, which is contrary to LASSO. The mutual exclusion of highly correlated variables in LASSO modeling tends to skip important variables that are highly related to response endpoints and thus undermines the model's predictive performance. The optimal variables selected by RF share low similarity with those selected by LASSO (e.g., the Tanimoto coefficients were smaller than 0.20 in seven out of eight data sets). We found that the differences between RF and LASSO predictive performances mainly resulted from the variables selected by the different strategies rather than from the learning algorithms. Our study showed that the right selection of variables is more important than the learning algorithm for modeling. We hope that a standard procedure could be developed based on these proposed statistical metrics to select the truly important variables for model interpretation, as well as for further use to facilitate drug discovery and environmental toxicity assessment.
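As a rough illustration of the quantities mentioned in the abstract, the following Python sketch shows one elimination step that drops a quarter of the least important descriptors, the Tanimoto coefficient between two selected-variable sets, and an average pairwise Pearson correlation. It is not the authors' code. The function names, the rounding rule for "a quarter", and the use of absolute correlation values are assumptions made for this example; the "highest pairwise correlation rate" is not defined in the abstract and is not attempted here.

```python
from itertools import combinations
from math import sqrt


def drop_least_important(importances, fraction=0.25):
    """One elimination step: keep all but the least important `fraction`.

    `importances` maps descriptor name -> importance score (for example,
    taken from a fitted random forest). Rounding down and always dropping
    at least one descriptor is an assumption of this sketch.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    n_drop = max(1, int(len(ranked) * fraction))
    return ranked[:len(ranked) - n_drop]


def tanimoto(selected_a, selected_b):
    """Tanimoto similarity of two variable sets: |A and B| / |A or B|."""
    a, b = set(selected_a), set(selected_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


def mean_abs_pairwise_correlation(columns):
    """Average |r| over all pairs of descriptor columns.

    `columns` maps descriptor name -> list of values. Taking the absolute
    value is an assumption; the abstract does not say how signs are handled.
    """
    pairs = list(combinations(columns, 2))
    total = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
    return total / len(pairs)


# Toy data, invented purely for illustration.
kept = drop_least_important({"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2})
similarity = tanimoto({"a", "b", "c"}, {"b", "c", "d"})
avg_r = mean_abs_pairwise_correlation(
    {"x": [1, 2, 3], "y": [2, 4, 6], "z": [3, 2, 1]}
)
```

In a recursive setting, `drop_least_important` would be called repeatedly, refitting the model on the surviving descriptors each time. A low `tanimoto` value between the RF and LASSO selections corresponds to the low overlap reported in the abstract.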

Year:  2015        PMID: 25746224     DOI: 10.1021/ci500715e

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  5 in total

1.  Establishment of the Radiologic Tumor Invasion Index Based on Radiomics Splenic Features and Clinical Factors to Predict Serous Invasion of Gastric Cancer.

Authors:  Bujian Pan; Weiteng Zhang; Wenjing Chen; Jingwei Zheng; Xinxin Yang; Jing Sun; Xiangwei Sun; Xiaodong Chen; Xian Shen
Journal:  Front Oncol       Date:  2021-08-09       Impact factor: 6.244

2.  Evaluation of Epidermal Growth Factor Receptor 2 Status in Gastric Cancer by CT-Based Deep Learning Radiomics Nomogram.

Authors:  Xiao Guan; Na Lu; Jianping Zhang
Journal:  Front Oncol       Date:  2022-07-11       Impact factor: 5.738

3.  Analysing the influencing factors on caregivers' burden among amyotrophic lateral sclerosis patients in China: a cross-sectional study based on data mining.

Authors:  Ling Lian; Minying Zheng; Ruojie He; Jianing Lin; Weineng Chen; Zhong Pei; Xiaoli Yao
Journal:  BMJ Open       Date:  2022-09-21       Impact factor: 3.006

4.  Reliability of the ASA Physical Status Classification System in Predicting Surgical Morbidity: a Retrospective Analysis.

Authors:  Gen Li; Jeremy P Walco; Dorothee A Mueller; Jonathan P Wanderer; Robert E Freundlich
Journal:  J Med Syst       Date:  2021-07-22       Impact factor: 4.920

5.  A New Application of Multimodality Radiomics Improves Diagnostic Accuracy of Nonpalpable Breast Lesions in Patients with Microcalcifications-Only in Mammography.

Authors:  Shujun Chen; Xiaojun Guan; Zhenyu Shu; Yongfeng Li; Wenming Cao; Fei Dong; Minming Zhang; Guoliang Shao; Feng Shao
Journal:  Med Sci Monit       Date:  2019-12-20