
Evaluating the state of the art in missing data imputation for clinical data.

Yuan Luo

Abstract

Clinical data are increasingly being mined to derive new medical knowledge, with the goal of enabling greater diagnostic precision, better personalized therapeutic regimens, improved clinical outcomes and more efficient utilization of health-care resources. However, clinical data are often available only at irregular intervals that vary between patients and between types of data, and many entries are unmeasured or unknown. As a result, missing data are one of the major impediments to deriving knowledge from clinical data. The Data Analytics Challenge on Missing data Imputation (DACMI) presented a shared clinical dataset with ground truth for evaluating and advancing the state of the art in imputing missing data for clinical time series. We extracted 13 commonly measured blood laboratory tests. To evaluate imputation performance, we randomly removed one recorded result per laboratory test per patient admission and used the removed results as the ground truth. To the best of our knowledge, DACMI is the first shared-task challenge on clinical time series imputation. The challenge attracted 12 international teams from three continents, spanning industry and academia. The evaluation outcome suggests that competitive machine learning and statistical models (e.g. LightGBM, MICE and XGBoost), coupled with carefully engineered temporal and cross-sectional features, can achieve strong imputation performance. However, care needs to be taken to avoid excessive model complexity. The participating systems collectively experimented with a wide range of machine learning and probabilistic algorithms that combine temporal imputation with cross-sectional imputation, and their design principles will inform future efforts to better model clinical missing data.
© The Author(s) 2021. Published by Oxford University Press.
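
The evaluation setup described in the abstract (hiding one recorded result per laboratory test per patient admission and keeping it as ground truth) can be illustrated with a minimal sketch. This is not the challenge's actual code: the function name `mask_one_per_test`, the dictionary layout and the example values are assumptions made purely for illustration.

```python
import random


def mask_one_per_test(admission, rng):
    """Hide one recorded value per lab test within a single admission.

    `admission` is a hypothetical layout: test name -> time-ordered list of
    values, where None means the test was not measured at that time point.
    Returns (masked_copy, ground_truth); ground_truth maps each test name
    to the (index, value) pair that was hidden.
    """
    # Copy every series so the caller's data is left untouched.
    masked = {test: list(values) for test, values in admission.items()}
    ground_truth = {}
    for test, values in admission.items():
        # Only values that were actually recorded can be held out.
        recorded = [i for i, v in enumerate(values) if v is not None]
        if not recorded:
            continue  # nothing to hold out for this test
        i = rng.choice(recorded)
        ground_truth[test] = (i, values[i])
        masked[test][i] = None
    return masked, ground_truth


# Illustrative values only, not taken from the challenge dataset.
rng = random.Random(0)
admission = {
    "sodium": [140.0, None, 138.0, 139.0],
    "potassium": [4.1, 4.3, None, None],
    "glucose": [None, None, None, None],
}
masked, ground_truth = mask_one_per_test(admission, rng)
```

An imputation method would then be run on `masked`, and its predictions at the held-out positions compared with the values stored in `ground_truth`.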

Keywords:  clinical laboratory test; machine learning; missing data imputation; time series

Year:  2022        PMID: 34882223      PMCID: PMC8769894          DOI: 10.1093/bib/bbab489

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622
