Evangelia Christodoulou (1), Jie Ma (2), Gary S Collins (3), Ewout W Steyerberg (4), Jan Y Verbakel (5), Ben Van Calster (6)
1. Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium.
2. Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK.
3. Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK; Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
4. Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands.
5. Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Public Health & Primary Care, KU Leuven, Kapucijnenvoer 33J box 7001, Leuven, 3000 Belgium; Nuffield Department of Primary Care Health Sciences, University of Oxford, Woodstock Road, Oxford, OX2 6GG UK.
6. Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands. Electronic address: ben.vancalster@kuleuven.be.
Abstract
OBJECTIVES: The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature.
STUDY DESIGN AND SETTING: We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes.
RESULTS: We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML.
CONCLUSION: We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
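The abstract reports differences on the logit(AUC) scale, which can be hard to read at a glance. The sketch below shows one way to translate such a difference back to the AUC scale. It is an illustration only, not the authors' analysis code: the helper names (`logit`, `expit`, `shift_auc`) and the baseline LR AUC of 0.75 are assumptions chosen for the example, and the resulting AUC depends on whichever baseline is picked.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform; p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse of logit: maps a log-odds value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_auc(baseline_auc: float, logit_diff: float) -> float:
    """AUC obtained by adding logit_diff to logit(baseline_auc).

    Hypothetical helper for illustration; the baseline is an assumption,
    not a value reported in the abstract.
    """
    return expit(logit(baseline_auc) + logit_diff)


# Assumed baseline AUC for LR (hypothetical, for illustration only).
baseline_lr_auc = 0.75

# Reported logit(AUC) differences: 0.00 (low risk of bias), 0.34 (high risk).
for label, diff in [("low risk of bias", 0.00), ("high risk of bias", 0.34)]:
    ml_auc = shift_auc(baseline_lr_auc, diff)
    print(f"{label}: LR AUC {baseline_lr_auc:.2f} -> ML AUC {ml_auc:.3f}")
```

Under this assumed baseline, a logit(AUC) difference of 0.34 corresponds to an AUC of roughly 0.81 rather than 0.75, while a difference of 0.00 leaves the AUC unchanged. The same logit difference maps to a smaller absolute AUC gain when the baseline is closer to 1.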
Authors: Matthew Moll; Dandi Qiao; Elizabeth A Regan; Gary M Hunninghake; Barry J Make; Ruth Tal-Singer; Michael J McGeachie; Peter J Castaldi; Raul San Jose Estepar; George R Washko; James M Wells; David LaFon; Matthew Strand; Russell P Bowler; MeiLan K Han; Jorgen Vestbo; Bartolome Celli; Peter Calverley; James Crapo; Edwin K Silverman; Brian D Hobbs; Michael H Cho Journal: Chest Date: 2020-04-27 Impact factor: 9.410
Authors: Z Shi; B Hu; U J Schoepf; R H Savage; D M Dargis; C W Pan; X L Li; Q Q Ni; G M Lu; L J Zhang Journal: AJNR Am J Neuroradiol Date: 2020-03-12 Impact factor: 3.825
Authors: Thomas G Myers; Prem N Ramkumar; Benjamin F Ricciardi; Kenneth L Urish; Jens Kipper; Constantinos Ketonis Journal: J Bone Joint Surg Am Date: 2020-05-06 Impact factor: 5.284