Janani Venugopalan, Nikhil Chanani, Kevin Maher, May D Wang.
Abstract
The diversity and number of parameters monitored in an intensive care unit (ICU) make the resulting databases highly susceptible to quality issues, such as missing information and erroneous data entry, which adversely affect downstream processing and predictive modeling. Missing data interpolation and imputation techniques, such as multiple imputation, expectation maximization, and hot-deck imputation, do not account for the type of missing data, which can lead to bias. In our study, we first model the missing data as three types: "neglectable," a.k.a. "missing completely at random"; "recoverable," a.k.a. "missing at random"; and "not easily recoverable," a.k.a. "missing not at random." We then design imputation techniques for each type of missing data. We use a publicly available database (MIMIC II) to demonstrate how these imputations perform with random forests for prediction. Our results indicate that these novel imputation techniques outperformed standard mean filling techniques and expectation maximization, with statistical significance (p ≤ 0.01), in predicting ICU mortality.
Year: 2019 PMID: 30998482 DOI: 10.1109/JBHI.2018.2883606
Source DB: PubMed Journal: IEEE J Biomed Health Inform ISSN: 2168-2194 Impact factor: 5.772
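
The three missingness types named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: the abstract does not describe their type-specific imputation techniques, so only the mean-filling baseline is shown. The function names, the drop probabilities, and the use of a median threshold are all assumptions made for illustration.

```python
import random
import statistics


def simulate_missingness(values, covariate, mechanism, rate=0.3, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    Hypothetical helper (not from the paper). `mechanism` selects:
      "MCAR": every entry is dropped with the same probability `rate`
      "MAR":  drop probability depends only on the observed `covariate`
      "MNAR": drop probability depends on the value being dropped
    """
    rng = random.Random(seed)
    cov_median = statistics.median(covariate)
    val_median = statistics.median(values)
    out = []
    for v, c in zip(values, covariate):
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            # Assumed rule: only rows with a high covariate can go missing.
            p = 2 * rate if c > cov_median else 0.0
        elif mechanism == "MNAR":
            # Assumed rule: only high values of the variable itself go missing.
            p = 2 * rate if v > val_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")
        out.append(None if rng.random() < p else v)
    return out


def mean_fill(values):
    """Baseline imputation: replace each None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot mean-fill a column with no observed values")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic stand-ins for a measured variable and a fully observed covariate.
    values = [data_rng.gauss(80.0, 10.0) for _ in range(1000)]
    covariate = [data_rng.gauss(60.0, 15.0) for _ in range(1000)]
    print("true mean:", round(statistics.fmean(values), 2))
    for mechanism in ("MCAR", "MAR", "MNAR"):
        masked = simulate_missingness(values, covariate, mechanism)
        filled = mean_fill(masked)
        print(mechanism, "mean after mean filling:",
              round(statistics.fmean(filled), 2))
```

Mean filling leaves the mean of the observed entries unchanged, so when the chance of being missing depends on the value itself (the "not easily recoverable" case), the filled column inherits whatever shift the missingness introduced. This is the kind of bias the abstract attributes to techniques that ignore the type of missing data.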