Anis Sharafoddini, Joel A Dubin, David M Maslove, Joon Lee.
Abstract
BACKGROUND: Missing data in the profiles of patients in intensive care units (ICUs) are substantial and unavoidable. However, this incompleteness is not always random or the result of imperfections in the data collection process.
Keywords: clinical laboratory tests; electronic health records; hospital mortality; machine learning
Year: 2019; PMID: 30622091; PMCID: PMC6329436; DOI: 10.2196/11605
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. An example of the augmented data matrix, the imputed data matrix (imputed values are underlined and italicized), and the auxiliary matrix containing the missingness indicators (0 = present, 1 = absent).
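The caption describes three related matrices. Below is a minimal Python sketch of how they fit together. It assumes the augmented matrix is the imputed matrix with the indicator columns appended, which the caption does not spell out. Column-mean imputation stands in for the imputer only to keep the example short; the study itself compared hot deck and predictive mean matching. All function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def missingness_indicators(raw: Matrix) -> List[List[int]]:
    """Auxiliary matrix: 1 where a value is absent (None), 0 where present."""
    return [[1 if value is None else 0 for value in row] for row in raw]


def mean_impute(raw: Matrix) -> List[List[float]]:
    """Placeholder imputer (assumption): fill each gap with its column's observed mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(raw[0])
    col_means = [mean([row[j] for row in raw if row[j] is not None]) for j in range(n_cols)]
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in raw
    ]


def augment(imputed: List[List[float]], indicators: List[List[int]]) -> List[List[float]]:
    """Assumed augmented matrix: each imputed row followed by its indicator columns."""
    return [imp + [float(flag) for flag in ind] for imp, ind in zip(imputed, indicators)]


# Toy example: columns are BUN and HCO3; None marks a missing measurement.
raw = [[14.0, None], [None, 22.0], [20.0, 26.0]]
indicators = missingness_indicators(raw)  # [[0, 1], [1, 0], [0, 0]]
augmented = augment(mean_impute(raw), indicators)
print(augmented)  # [[14.0, 24.0, 0.0, 1.0], [17.0, 22.0, 1.0, 0.0], [20.0, 26.0, 0.0, 0.0]]
```

Keeping the indicators as separate columns lets a model use the fact that a test was not ordered, independently of the value that was filled in.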
Figure 2. The retrospective cohort study design. LOS: length of stay.
Figure 3. The average missingness rate among patients for each laboratory test in the first 72 hours of admission.
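Here is one way to compute a per-test missingness rate like the one plotted. The sketch assumes that each patient's first 72 hours are split into three daily windows, matching the Day 1 to Day 3 layout of the tables. It also assumes that a test counts as missing in any window with no result. The data layout and function name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical layout: patient id -> test name -> admission days (1, 2, 3) with at least one result.
MeasuredDays = Dict[str, Dict[str, Set[int]]]


def average_missingness(measured: MeasuredDays, tests: List[str], n_days: int = 3) -> Dict[str, float]:
    """For each test, average over patients the fraction of daily windows with no result."""
    windows = set(range(1, n_days + 1))
    rates = {}
    for test in tests:
        per_patient = [
            1.0 - len(days_by_test.get(test, set()) & windows) / n_days
            for days_by_test in measured.values()
        ]
        rates[test] = sum(per_patient) / len(per_patient)
    return rates


measured = {
    "p1": {"BUN": {1, 2, 3}, "Lac": {1}},
    "p2": {"BUN": {1, 2}},
}
rates = average_missingness(measured, ["BUN", "Lac"])
print({test: round(rate, 3) for test, rate in rates.items()})  # {'BUN': 0.167, 'Lac': 0.833}
```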
Figure 4. Visualization of the correlation matrix for the missingness indicator variables in the first 72 hours.
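One plausible basis for such a matrix is the Pearson correlation between two 0/1 indicator columns, which equals the phi coefficient. The caption does not name the coefficient, so this is an assumption. A minimal pure-Python sketch with hypothetical names:

```python
from math import sqrt
from typing import List


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; for two binary columns this equals the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # an indicator that never varies has undefined correlation
    return cov / sqrt(ss_x * ss_y)


def indicator_correlations(indicators: List[List[int]]) -> List[List[float]]:
    """Correlation matrix over the columns (tests) of a patients-by-tests indicator matrix."""
    columns = [list(col) for col in zip(*indicators)]
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Toy rows are patients; columns are I-PO2, I-PCO2, I-Lac (1 = absent).
indicators = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
corr = indicator_correlations(indicators)
print(corr[0][1], corr[0][2])  # 1.0 0.0
```

In this toy data, I-PO2 and I-PCO2 are always missing together, so their correlation is 1. I-Lac is unrelated to them, so its correlation with I-PO2 is 0.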
The top 18 variables selected on each day after predictive mean matching imputation, with respect to 30-day mortality. The prefix "I-" denotes a missingness indicator. Scores represent the aggregated ranking from the 3 different feature selection methods.

| Day 1 variable | Day 1 score | Day 2 variable | Day 2 score | Day 3 variable | Day 3 score |
|---|---|---|---|---|---|
| BUN | .762397 | AG | .795419 | RDW | .748997 |
| RDW | .680087 | HCO3 | .783337 | BUN | .666667 |
| MCHC | .668965 | BUN | .77677 | HCO3 | .544964 |
| AG | .540484 | BE | .609532 | BE | .540542 |
| I-Ca | .436429 | RDW | .608711 | pH | .488433 |
| Cr | .436071 | I-PO2 | .587151 | AG | .450426 |
| HCO3 | .416741 | I-PCO2 | .585947 | I-Lac | .418716 |
| PO2 | .404289 | I-BE | .585592 | I-pH | .40463 |
| MCV | .386964 | Cl | .53158 | Cr | .400008 |
| I-Phos | .374431 | PT | .462085 | Phos | .387661 |
| PTT | .353913 | Lac | .461869 | I-PCO2 | .387019 |
| HGB | .342786 | Cr | .451999 | I-PO2 | .386739 |
| pH | .32767 | PTT | .424956 | I-BE | .385935 |
| Lac | .320339 | Na | .422474 | PCO2 | .367257 |
| BE | .320299 | Phos | .419171 | NE | .360791 |
| I-Lac | .318216 | I-Lac | .415475 | MCV | .351266 |
| PCO2 | .316668 | MCV | .368343 | I-PTT | .338352 |
| I-TBil | .31277 | MCHC | .363146 | Lac | .331205 |

BUN: blood urea nitrogen.
AG: anion gap.
RDW: red cell distribution width.
HCO3: bicarbonate.
MCHC: mean corpuscular hemoglobin concentration.
BE: base excess.
Ca: calcium.
Cr: creatinine.
PO2: partial pressure of oxygen.
Lac: lactate.
PCO2: partial pressure of carbon dioxide.
MCV: mean corpuscular volume.
Cl: chloride.
Phos: phosphate.
PT: prothrombin time.
PTT: partial thromboplastin time.
HGB: hemoglobin.
Na: sodium.
NE: absolute neutrophils.
TBil: total bilirubin.
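Predictive mean matching (PMM), the imputation method behind these rankings, fills each gap with a real observed value. That value is borrowed from a donor whose predicted mean is close to the incomplete case's predicted mean. The sketch below is deliberately simplified: one predictor, an ordinary least squares fit, and a random draw among the k nearest donors. Full implementations, such as the `pmm` method of the R package mice, also draw the regression coefficients from their posterior between imputations; this sketch omits that step. The function names and toy data are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x: List[float], y: List[Optional[float]], k: int = 3, seed: int = 0) -> List[float]:
    """Replace each None in y with an observed y from one of the k closest donors."""
    rng = random.Random(seed)
    observed = [i for i, value in enumerate(y) if value is not None]
    intercept, slope = fit_line([x[i] for i in observed], [y[i] for i in observed])
    predicted = [intercept + slope * xi for xi in x]
    completed = []
    for i, value in enumerate(y):
        if value is not None:
            completed.append(value)
            continue
        # Donors are observed cases ranked by distance between predicted means.
        donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
        completed.append(y[rng.choice(donors)])  # borrow a real observed value
    return completed


# Toy data: x is a fully observed predictor, y has one missing entry.
x = [10.0, 20.0, 30.0, 40.0, 24.0]
y = [1.0, 2.0, 3.0, 4.0, None]
print(pmm_impute(x, y, k=1))  # [1.0, 2.0, 3.0, 4.0, 2.0]
```

Because the filled value always comes from an observed case, PMM cannot produce implausible values, such as a negative lab result, that a purely regression-based fill might.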
The top 18 variables selected on each day after predictive mean matching imputation, with respect to in-hospital mortality. The prefix "I-" denotes a missingness indicator. Scores represent the aggregated ranking from the 3 different feature selection methods.

| Day 1 variable | Day 1 score | Day 2 variable | Day 2 score | Day 3 variable | Day 3 score |
|---|---|---|---|---|---|
| BUN | .825715 | BUN | 1 | RDW | .75246 |
| AG | .668918 | RDW | .711852 | BUN | .635729 |
| RDW | .573188 | HCO3 | .684191 | BE | .633926 |
| HCO3 | .531746 | AG | .664339 | HCO3 | .62367 |
| MCHC | .507343 | BE | .528778 | I-BE | .595553 |
| PCO2 | .489483 | MCHC | .503805 | I-PCO2 | .595238 |
| Cr | .480181 | PT | .453111 | I-PO2 | .594924 |
| BE | .452599 | Cl | .429405 | pH | .556242 |
| I-Lac | .436382 | I-Lac | .425279 | Phos | .494694 |
| Lac | .415773 | Cr | .395266 | AG | .492864 |
| HGB | .414263 | I-PO2 | .382404 | I-pH | .470007 |
| pH | .402466 | I-PCO2 | .381737 | I-Lac | .469215 |
| I-TBil | .399363 | I-BE | .381448 | Cr | .415249 |
| I-Ca | .395278 | PTT | .357339 | Lac | .396136 |
| I-ALT | .376004 | Phos | .352738 | NE | .338372 |
| I-AST | .375944 | Na | .345109 | PT | .326491 |
| LY | .375163 | I-PT | .333936 | LY | .319146 |
| I-ALK | .366346 | BG | .320947 | MCV | .314868 |

BUN: blood urea nitrogen.
RDW: red cell distribution width.
AG: anion gap.
HCO3: bicarbonate.
BE: base excess.
MCHC: mean corpuscular hemoglobin concentration.
PCO2: partial pressure of carbon dioxide.
Cr: creatinine.
PT: prothrombin time.
PO2: partial pressure of oxygen.
Cl: chloride.
Lac: lactate.
Phos: phosphate.
HGB: hemoglobin.
TBil: total bilirubin.
Ca: calcium.
PTT: partial thromboplastin time.
ALT: alanine transaminase.
NE: absolute neutrophils.
AST: aspartate transaminase.
Na: sodium.
LY: absolute lymphocytes.
ALK: alkaline phosphatase.
BG: blood glucose.
MCV: mean corpuscular volume.
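The captions say the scores come from aggregating the rankings of 3 feature selection methods, but they do not give the formula. The sketch below is a hypothetical scheme, not the authors' procedure. It maps each method's rank to a linear score in [0, 1] and averages across methods. Under this scheme, a variable ranked first by every method scores exactly 1, which is consistent with BUN's day 2 score above. Names and rankings are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate_rankings(rankings: List[List[str]]) -> List[Tuple[str, float]]:
    """Average a linear rank score over methods and sort variables best first.

    Within one method's ranking of n variables, position p (0 = best) scores
    1 - p / (n - 1); a variable that a method did not rank scores 0 for that method.
    """
    totals: Dict[str, float] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, variable in enumerate(ranking):
            score = 1.0 if n == 1 else 1.0 - position / (n - 1)
            totals[variable] = totals.get(variable, 0.0) + score
    averaged = [(variable, total / len(rankings)) for variable, total in totals.items()]
    return sorted(averaged, key=lambda item: item[1], reverse=True)


# Three hypothetical method rankings of the same three variables.
rankings = [["BUN", "AG", "RDW"], ["BUN", "RDW", "AG"], ["BUN", "AG", "RDW"]]
for variable, score in aggregate_rankings(rankings):
    print(variable, round(score, 3))
# BUN 1.0
# AG 0.333
# RDW 0.167
```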
Results from feature selection by least absolute shrinkage and selection operator (LASSO) for 3 days. Areas under the curve of the receiver operating characteristic are reported with their SEs in parentheses. The best performing model is the model whose lambda value minimizes the cross-validation error. The adjusted model is the LASSO model with the largest lambda value whose error remains within 1 SE of the minimum.

| Criteria, outcome, and imputation method | Day 1 | Day 2 | Day 3 |
|---|---|---|---|
| HD | 0.7858 (0.0033) | 0.7685 (0.0041) | 0.7302 (0.0043) |
| PMM | 0.7876 (0.0039) | 0.7708 (0.0046) | 0.7391 (0.0053) |
| HD | 0.7983 (0.0040) | 0.7804 (0.0046) | 0.7476 (0.0042) |
| PMM | 0.8007 (0.0047) | 0.7838 (0.0049) | 0.7582 (0.0054) |
| HD | 23 (43) | 24 (48) | 19 (707) |
| PMM | 26 (45) | 26 (47) | 17 (68) |
| HD | 28 (46) | 29 (48) | 21 (60) |
| PMM | 29 (47) | 27 (49) | 24 (62) |
| HD | 0.7826 (0.0034) | 0.7646 (0.0043) | 0.7262 (0.0041) |
| PMM | 0.7840 (0.0038) | 0.7667 (0.0045) | 0.7339 (0.0044) |
| HD | 0.7944 (0.0043) | 0.7762 (0.0047) | 0.7439 (0.0041) |
| PMM | 0.7961 (0.0049) | 0.7793 (0.0050) | 0.7536 (0.0045) |
| HD | 20 (45) | 16 (48) | 22 (67) |
| PMM | 19 (45) | 16 (52) | 31 (62) |
| HD | 20 (47) | 13 (42) | 16 (64) |
| PMM | 18 (50) | 11 (41) | 16 (62) |

AUROC: area under the curve of the receiver operating characteristic.
HD: hot deck.
PMM: predictive mean matching.
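The two lambda choices in the caption correspond to the values that R's `cv.glmnet` reports as `lambda.min` and `lambda.1se`. Given a cross-validation error curve and its SEs, both can be picked as shown below. This is a minimal Python sketch; the function name and error values are hypothetical.

```python
from typing import List, Tuple


def select_lambdas(lambdas: List[float], cv_error: List[float], cv_se: List[float]) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambda_min minimizes the CV error; lambda_1se is the largest lambda (the
    strongest penalty, hence the sparsest model) whose error is within 1 SE of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_error[i])
    threshold = cv_error[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_error) if err <= threshold)
    return lambdas[i_min], lambda_1se


lambdas = [0.1, 0.05, 0.02, 0.01, 0.005]
cv_error = [0.60, 0.52, 0.48, 0.47, 0.475]
cv_se = [0.02] * 5
print(select_lambdas(lambdas, cv_error, cv_se))  # (0.01, 0.02)
```

The 1-SE rule trades a small, statistically indistinguishable rise in CV error for a sparser model.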
Figure 5. The 95% CIs of the area under the curve of the receiver operating characteristic for logistic regression, decision tree, and random forest models trained on missingness indicators, the Simplified Acute Physiology Score II, and actual variables with and without the missingness indicators.
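The AUROC equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived, with ties counting half. Below is a minimal Python sketch of that pairwise definition. It also includes a normal-approximation 95% CI (estimate ± 1.96 SE), applied to one AUROC and SE pair from the LASSO table. The caption does not say how Figure 5's intervals were built, so the normal approximation is an assumption. The function names and toy data are hypothetical.

```python
from typing import List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def normal_ci95(estimate: float, se: float) -> Tuple[float, float]:
    """Symmetric 95% interval under a normal approximation."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width


# Toy outcomes (1 = died) and predicted risks.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(auroc(labels, scores), 4))  # 0.7778
low, high = normal_ci95(0.8007, 0.0047)
print(round(low, 4), round(high, 4))  # 0.7915 0.8099
```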
Figure 6. Receiver operating characteristic curves for 30-day mortality prediction by logistic regression on day 1.
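An ROC curve like those in Figure 6 plots the true positive rate against the false positive rate as the decision threshold on predicted risk is lowered. Below is a minimal Python sketch with toy data and hypothetical names. For this data, the trapezoidal area under the resulting points equals the pairwise AUROC of 7/9.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct threshold, from strictest to most lenient."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for label, s in zip(labels, scores) if s >= threshold and label == 1)
        fp = sum(1 for label, s in zip(labels, scores) if s >= threshold and label == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
curve = roc_points(labels, scores)
print(round(trapezoid_area(curve), 4))  # 0.7778
```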