
Predicting Missing Values in Medical Data via XGBoost Regression.

Xinmeng Zhang, Chao Yan, Cheng Gao, Bradley A Malin, You Chen.

Abstract

PURPOSE: A patient's laboratory test results are a valuable resource for clinical investigation and medical research. However, for a variety of reasons, this type of data often contains a non-trivial number of missing values. For example, physicians may neglect to order tests or to document the results. Missing values reduce the degree to which the data can be used to learn efficient and effective predictive models. Various approaches have been developed to impute missing laboratory values, but their performance has been limited. This is due, in part, to the fact that no existing approach effectively leverages the contextual information 1) within an individual laboratory test variable or 2) between laboratory test variables.
METHOD: We introduce an approach that combines an unsupervised prefilling strategy with a supervised machine learning model, extreme gradient boosting (XGBoost), to leverage both types of context for imputation. We evaluated the methodology through a series of experiments on approximately 8,200 patients' records in the MIMIC-III dataset.
RESULT: The new model outperforms baseline and state-of-the-art models on 13 commonly collected laboratory test variables. In terms of the normalized root mean square deviation (nRMSD), our model improves imputation by over 20% on average.
CONCLUSION: Imputation of missing values in temporal variables can be substantially improved by combining a prefilling strategy with supervised training, which together leverage the longitudinal and cross-sectional context simultaneously.
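
The abstract does not spell out the prefilling rule or the exact nRMSD formula, so the following is only a minimal sketch under stated assumptions: prefilling is done with the mean of the observed values of one variable, and nRMSD is taken as the root mean square deviation divided by the range of the true values. The function names, the masking protocol, and the sample values are hypothetical, and the supervised XGBoost step is deliberately left out.

```python
import math


def prefill_with_mean(series):
    """Replace None entries with the mean of the observed entries.

    Assumption: mean prefilling stands in for the unsupervised
    prefilling strategy, which the abstract does not specify.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("cannot prefill a series with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def nrmsd(true_values, imputed_values):
    """Root mean square deviation divided by the range of the true values.

    Assumption: range normalization is one common convention; the
    abstract does not state which normalization was used.
    """
    if not true_values or len(true_values) != len(imputed_values):
        raise ValueError("inputs must be non-empty and of equal length")
    value_range = max(true_values) - min(true_values)
    if value_range == 0:
        raise ValueError("true values have zero range; nRMSD is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values))
    mse /= len(true_values)
    return math.sqrt(mse) / value_range


# Hypothetical readings of one laboratory variable at six time points.
readings = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

# Hide two known readings so the imputation can be scored against them.
masked_idx = [1, 4]
series = [None if i in masked_idx else v for i, v in enumerate(readings)]

filled = prefill_with_mean(series)
score = nrmsd(
    [readings[i] for i in masked_idx],
    [filled[i] for i in masked_idx],
)
print(f"nRMSD of mean prefilling on the masked readings: {score:.3f}")
```

In a pipeline like the one described, the prefilled matrix would then serve as the feature input to a supervised regressor trained per variable, and a lower nRMSD on the masked entries would indicate better imputation.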


Keywords:  XGBoost; imputation; laboratory tests; missing values

Year:  2020        PMID: 33283143      PMCID: PMC7709926          DOI: 10.1007/s41666-020-00077-1

Source DB:  PubMed          Journal:  J Healthc Inform Res        ISSN: 2509-498X


