Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review.

S W J Nijman, A M Leeuwenberg, I Beekers, I Verkouter, J J L Jacobs, M L Bots, F W Asselbergs, K G M Moons, T P A Debray.

Abstract

OBJECTIVES: Missing data is a common problem during the development, evaluation, and implementation of prediction models. Although machine learning (ML) methods are often said to be capable of circumventing missing data, it is unclear how these methods are used in medical research. We aim to find out whether and how well prediction model studies using machine learning report on their handling of missing data.
STUDY DESIGN AND SETTING: We systematically searched the literature for papers published between 2018 and 2019 that reported primary studies developing and/or validating clinical prediction models using any supervised ML methodology across medical fields. From the retrieved studies, information about the amount and nature of missing data (e.g., missing completely at random, potential reasons for missingness) and the way missing data were handled was extracted.
RESULTS: We identified 152 machine learning-based clinical prediction model studies. A substantial number of these 152 papers did not report anything on missing data (n = 56/152). A majority (n = 96/152) reported details on the handling of missing data (e.g., methods used), though many of these (n = 46/96) did not report the amount of missing data. In these 96 papers, the authors only sometimes reported possible reasons for missingness (n = 7/96) and information about missing data mechanisms (n = 8/96). The most common approach for handling missing data was deletion (n = 65/96), mostly via complete-case analysis (CCA) (n = 43/96). Very few studies used multiple imputation (n = 8/96) or built-in mechanisms such as surrogate splits (n = 7/96) that directly address missing data during the development, validation, or implementation of the prediction model.
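As a reading aid, the counts reported in RESULTS can be converted to percentages. The short Python sketch below does only that arithmetic; the dictionary labels and the helper name `as_percent` are illustrative choices, not terminology from the paper.

```python
# Counts as reported in the RESULTS paragraph: label -> (numerator, denominator).
# Labels are paraphrases chosen for this sketch.
reported_counts = {
    "no reporting on missing data": (56, 152),
    "reported handling of missing data": (96, 152),
    "handling reported, amount not reported": (46, 96),
    "reported possible reasons for missingness": (7, 96),
    "reported missing data mechanisms": (8, 96),
    "used deletion": (65, 96),
    "used complete-case analysis": (43, 96),
    "used multiple imputation": (8, 96),
    "used built-in mechanisms (e.g., surrogate splits)": (7, 96),
}


def as_percent(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


percentages = {
    label: as_percent(n, d) for label, (n, d) in reported_counts.items()
}

for label, pct in percentages.items():
    n, d = reported_counts[label]
    print(f"{label}: {n}/{d} = {pct}%")
```

On these figures, roughly 37% of the 152 studies reported nothing on missing data, and roughly 68% of the 96 studies that described their handling relied on deletion.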
CONCLUSION: Although missing values are highly common in any type of medical research, and certainly in research based on routine healthcare data, a majority of the prediction model studies using machine learning do not report sufficient information on the presence and handling of missing data. Strategies in which patient data are simply omitted are unfortunately the most often used, even though they are generally advised against and are well known to likely cause bias and loss of analytical power, both in prediction model development and in the predictive accuracy estimates. Prediction model researchers should be much more aware of alternative methodologies to address missing data.
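The loss of analytical power under deletion can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes, purely for illustration, that every predictor is missing independently and at the same rate, in which case the expected share of fully observed records is (1 - rate) raised to the number of predictors. The function names and the example numbers are hypothetical.

```python
def complete_case_fraction(missing_rate, n_predictors):
    """Expected share of records with no missing predictor.

    Assumption (illustrative only): each predictor is missing independently
    of the others and with the same probability `missing_rate`.
    """
    return (1.0 - missing_rate) ** n_predictors


def expected_complete_cases(n_records, missing_rate, n_predictors):
    """Expected number of records retained by complete-case analysis."""
    return n_records * complete_case_fraction(missing_rate, n_predictors)


# Hypothetical example: 10 predictors, each missing in 5% of records.
fraction = complete_case_fraction(0.05, 10)
print(f"Share of records kept by complete-case analysis: {fraction:.3f}")
print(f"Expected records kept out of 1000: "
      f"{expected_complete_cases(1000, 0.05, 10):.0f}")
```

Under these assumptions, a modest 5% missingness per predictor already discards about 40% of the records, before any bias from non-random missingness is considered.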
Copyright © 2021 The Authors. Published by Elsevier Inc. All rights reserved.

Keywords:  Machine learning; Missing data; literature review; prediction; reporting

Year:  2021        PMID: 34798287     DOI: 10.1016/j.jclinepi.2021.11.023

Source DB:  PubMed          Journal:  J Clin Epidemiol        ISSN: 0895-4356            Impact factor:   6.437


  6 in total

1.  Smart Grid Stability Prediction Model Using Neural Networks to Handle Missing Inputs.

Authors:  Madiah Binti Omar; Rosdiazli Ibrahim; Rhea Mantri; Jhanavi Chaudhary; Kaushik Ram Selvaraj; Kishore Bingi
Journal:  Sensors (Basel)       Date:  2022-06-08       Impact factor: 3.847

2.  Reproducibility of prediction models in health services research.

Authors:  Lazaros Belbasis; Orestis A Panagiotou
Journal:  BMC Res Notes       Date:  2022-06-11

3.  Modern Learning from Big Data in Critical Care: Primum Non Nocere.

Authors:  Benjamin Y Gravesteijn; Ewout W Steyerberg; Hester F Lingsma
Journal:  Neurocrit Care       Date:  2022-05-05       Impact factor: 3.532

4.  Discrete Missing Data Imputation Using Multilayer Perceptron and Momentum Gradient Descent.

Authors:  Hu Pan; Zhiwei Ye; Qiyi He; Chunyan Yan; Jianyu Yuan; Xudong Lai; Jun Su; Ruihan Li
Journal:  Sensors (Basel)       Date:  2022-07-28       Impact factor: 3.847

5.  Data Incompleteness May form a Hard-to-Overcome Barrier to Decoding Life's Mechanism.

Authors:  Liya Kondratyeva; Irina Alekseenko; Igor Chernov; Eugene Sverdlov
Journal:  Biology (Basel)       Date:  2022-08-12

6.  Stacking Ensemble Method for Gestational Diabetes Mellitus Prediction in Chinese Pregnant Women: A Prospective Cohort Study.

Authors:  Ruiyi Liu; Yongle Zhan; Xuan Liu; Yifang Zhang; Luting Gui; Yimin Qu; Hairong Nan; Yu Jiang
Journal:  J Healthc Eng       Date:  2022-09-13       Impact factor: 3.822
