
An extensive experimental survey of regression methods.

M Fernández-Delgado, M S Sirsat, E Cernadas, S Alawadi, S Barro, M Febrero-Bande.

Abstract

Regression is a highly relevant problem in machine learning, with many different approaches available. The current work presents a comparison of a large collection of 77 popular regression models belonging to 19 families: linear and generalized linear models, generalized additive models, least squares, projection methods, LASSO and ridge regression, Bayesian models, Gaussian processes, quantile regression, nearest neighbors, regression trees and rules, random forests, bagging and boosting, neural networks, deep learning and support vector regression. These methods are evaluated using all the regression datasets of the UCI machine learning repository (83 datasets), with some exceptions due to technical reasons. The experimental work identifies several outstanding regression models: the M5 rule-based model with corrections based on nearest neighbors (cubist), the gradient boosted machine (gbm), the boosting ensemble of regression trees (bstTree) and the M5 regression tree. Cubist achieves the best squared correlation (R²) in 15.7% of the datasets and is very near to the best in the remainder: its difference to the best R² is below 0.2 for 89.1% of the datasets, and the median of these differences over the dataset collection is very low (0.0192), compared, e.g., to 0.150 for classical linear regression. However, cubist is slow and fails in several large datasets, while similar models such as M5 never fail, and M5's difference to the best R² is below 0.2 for 92.8% of the datasets. Other well-performing regression models are the committee of neural networks (avNNet), extremely randomized regression trees (extraTrees, which achieves the best R² in 33.7% of the datasets), random forest (rf) and ε-support vector regression (svr), but they are slower and fail in several datasets. The fastest regression model is least angle regression (lars), which is 70 and 2,115 times faster than M5 and cubist, respectively.
The model requiring the least memory is non-negative least squares (nnls), about 2 GB, similar to cubist, while M5 requires about 8 GB. For 97.6% of the datasets there is a regression model among the 10 best that is very near (difference below 0.1) to the best R², a figure which rises to 100% when differences up to 0.2 are allowed. Therefore, provided that our dataset and model collections are representative enough, the main conclusion of this study is that, for a new regression problem, some model in our top 10 should achieve an R² near the best attainable for that problem.
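The abstract's headline statistics all derive from the same per-dataset quantity: each model's gap between its own R² and the best R² achieved by any model on that dataset. A minimal sketch of that bookkeeping, using invented model names and R² values purely for illustration (not the paper's actual results):

```python
from statistics import median

# Illustrative R^2 scores: rows are datasets, columns are models.
# Values are made up; the paper evaluates 77 models on 83 UCI datasets.
r2 = {
    "dataset1": {"cubist": 0.91, "lm": 0.75, "M5": 0.89},
    "dataset2": {"cubist": 0.83, "lm": 0.80, "M5": 0.85},
    "dataset3": {"cubist": 0.95, "lm": 0.60, "M5": 0.93},
}

models = ["cubist", "lm", "M5"]
gaps = {m: [] for m in models}        # per-dataset gap to the best R^2
best_count = {m: 0 for m in models}   # datasets where the model is best

for scores in r2.values():
    best = max(scores.values())
    for m in models:
        gaps[m].append(best - scores[m])
        if scores[m] == best:
            best_count[m] += 1

for m in models:
    n = len(r2)
    pct_best = 100 * best_count[m] / n            # % of datasets where best
    pct_near = 100 * sum(g < 0.2 for g in gaps[m]) / n  # % with gap < 0.2
    print(f"{m}: best in {pct_best:.1f}%, gap<0.2 in {pct_near:.1f}%, "
          f"median gap {median(gaps[m]):.4f}")
```

Summarizing by the median gap rather than the mean, as the paper does, keeps a few large failures on hard datasets from dominating a model's overall ranking.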
Copyright © 2018 Elsevier Ltd. All rights reserved.

Keywords:  Cubist; Extremely randomized regression tree; Gradient boosted machine; M5; Regression; UCI machine learning repository

Year:  2018        PMID: 30654138     DOI: 10.1016/j.neunet.2018.12.010

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


Related articles: 11 in total

1.  Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults.

Authors:  Jaime Lynn Speiser; Kathryn E Callahan; Denise K Houston; Jason Fanning; Thomas M Gill; Jack M Guralnik; Anne B Newman; Marco Pahor; W Jack Rejeski; Michael E Miller
Journal:  J Gerontol A Biol Sci Med Sci       Date:  2021-03-31       Impact factor: 6.053

2.  Comparison of logP and logD correction models trained with public and proprietary data sets.

Authors:  Ignacio Aliagas; Alberto Gobbi; Man-Ling Lee; Benjamin D Sellers
Journal:  J Comput Aided Mol Des       Date:  2022-04-01       Impact factor: 3.686

3.  A Machine Learning Approach in Autism Spectrum Disorders: From Sensory Processing to Behavior Problems.

Authors:  Heba Alateyat; Sara Cruz; Eva Cernadas; María Tubío-Fungueiriño; Adriana Sampaio; Alberto González-Villar; Angel Carracedo; Manuel Fernández-Delgado; Montse Fernández-Prieto
Journal:  Front Mol Neurosci       Date:  2022-05-09       Impact factor: 6.261

4.  An Ensemble Learning Based Classification Approach for the Prediction of Household Solid Waste Generation.

Authors:  Abdallah Namoun; Burhan Rashid Hussein; Ali Tufail; Ahmed Alrehaili; Toqeer Ali Syed; Oussama BenRhouma
Journal:  Sensors (Basel)       Date:  2022-05-05       Impact factor: 3.847

5.  Developing and Validating Methods to Assemble Systemic Lupus Erythematosus Births in the Electronic Health Record.

Authors:  April Barnado; Amanda M Eudy; Ashley Blaske; Lee Wheless; Katie Kirchoff; Jim C Oates; Megan E B Clowse
Journal:  Arthritis Care Res (Hoboken)       Date:  2022-03-16       Impact factor: 5.178

6.  Potential of spectroscopic analyses for non-destructive estimation of tea quality-related metabolites in fresh new leaves.

Authors:  Hiroto Yamashita; Rei Sonobe; Yuhei Hirono; Akio Morita; Takashi Ikka
Journal:  Sci Rep       Date:  2021-02-18       Impact factor: 4.379

7.  Intra-abdominal infection in acute pancreatitis in eastern China: microbiological features and a prediction model.

Authors:  Cheng Zhu; Sheng Zhang; Han Zhong; Zhichun Gu; Yuening Kang; Chun Pan; Zhijun Xu; Erzhen Chen; Yuetian Yu; Qian Wang; Enqiang Mao
Journal:  Ann Transl Med       Date:  2021-03

8.  Viability Study of Machine Learning-Based Prediction of COVID-19 Pandemic Impact in Obsessive-Compulsive Disorder Patients.

Authors:  María Tubío-Fungueiriño; Eva Cernadas; Óscar F Gonçalves; Cinto Segalas; Sara Bertolín; Lorea Mar-Barrutia; Eva Real; Manuel Fernández-Delgado; Jose M Menchón; Sandra Carvalho; Pino Alonso; Angel Carracedo; Montse Fernández-Prieto
Journal:  Front Neuroinform       Date:  2022-02-10       Impact factor: 4.081

9.  Development and validation of a model to predict rebleeding within three days after endoscopic hemostasis for high-risk peptic ulcer bleeding.

Authors:  Yongkang Lai; Yuling Xu; Zhenhua Zhu; Xiaolin Pan; Shunhua Long; Wangdi Liao; Bimin Li; Yin Zhu; Youxiang Chen; Xu Shu
Journal:  BMC Gastroenterol       Date:  2022-02-14       Impact factor: 3.067

10.  Identification of miRNA-Based Signature as a Novel Potential Prognostic Biomarker in Patients with Breast Cancer.

Authors:  Jia Tang; Wei Ma; Qinlong Zeng; Jieliang Tan; Keshen Cao; Liangping Luo
Journal:  Dis Markers       Date:  2019-12-30       Impact factor: 3.434

