Adrienne Kline, Theresa Kline, Zahra Shakeri Hossein Abad, Joon Lee.
Abstract
BACKGROUND: Supervised machine learning (ML) is increasingly featured in the health care literature, with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on performance, all of them remain aggregate measures over the whole sample and discount the uniqueness of each case or patient. Intuitively, we know that not all cases are equal, but present evaluative approaches do not take case difficulty into account.
Keywords: item response theory; machine learning; mortality; statistical model
Year: 2020 PMID: 32975523 PMCID: PMC7547395 DOI: 10.2196/20268
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Medical Information Mart for Intensive Care III variables based on Simplified Acute Physiology Score II.
| Feature name | Description | Normal values, units |
| AIDS | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Heme malignancy | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Metastatic cancer | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Minimum GCSa | Glasgow Coma Scale | 15b, 1-15 |
| WBCc minimum | Lowest white blood cell count | 4-10, ×10^9/L |
| WBC maximum | Highest white blood cell count | 4-10, ×10^9/L |
| Na minimum | Sodium minimum | 135-145, mmol/L |
| Na maximum | Sodium maximum | 135-145, mmol/L |
| K minimum | Potassium minimum | 3.5-5, mmol/L |
| K maximum | Potassium maximum | 3.5-5, mmol/L |
| Bilirubin maximum | Bilirubin maximum | ≤1.52, mg/dL |
| HCO3 minimum | Bicarbonate minimum | 24-30, mmol/L |
| HCO3 maximum | Bicarbonate maximum | 24-30, mmol/L |
| BUNd minimum | Blood urea nitrogen minimum | 7-22, mg/dL |
| BUN maximum | Blood urea nitrogen maximum | 7-22, mg/dL |
| PO2 | Partial pressure of oxygen | 85-105, mm Hg |
| FiO2 | Fraction of inspired oxygen | 21, % |
| Heart rate mean | Mean heart rate | 60-100, bpm |
| BP mean | Mean systolic blood pressure | 95-145, mm Hg |
| Max temp | Maximum temperature | 36.5-37.5, ℃ |
| Urine output | Urine output | 800-2000e, mL/24h |
| Sex | Male or female | Male: 1, Female: 0, Male or female |
| Age | Age in years | ≤65: 0, years |
| Admission type | Emergency or elective | Emergency: 1; else: 0, N/Af |
aGCS: Glasgow Coma Scale.
bTeasdale and Jennett, 1974 [41]; Teasdale and Jennett, 1976 [42].
cWBC: white blood cell.
dBUN: blood urea nitrogen.
eMedical CMP, 2011 [43].
fN/A: not applicable.
Electronic intensive care unit data set variables based on Acute Physiology and Chronic Health Evaluation IV.
| Feature name | Description | Normal values, Units |
| GCSa | Glasgow Coma Scale | 15b, 1-15 |
| Urine output | Urine output in 24 hours | 800-2000c, mL/24 hour |
| WBCd | White blood cell count | 4-10, ×10^9/L |
| Na | Serum sodium | 135-145, mmol/L |
| Temperature | Temperature in Celsius | 36.5-37.5e, ℃ |
| Respiration rate | Respiration rate/min | 12-20f, breaths/min |
| Heart rate | Heart rate/min | 60-100f, bpm |
| Mean blood pressure | Mean arterial pressure | 70-100g, mm Hg |
| Creatinine | Serum creatinine | 0.57-1.02 (Fh); 0.79-1.36 (Mi), mg/dL |
| pH | Arterial pH | 7.35-7.45, N/Aj |
| Hematocrit | Red blood cell volume | 37-46 (F); 38-50 (M), % |
| Albumin | Serum albumin | 3.5-5.0, g/dL |
| PO2 | Partial pressure of oxygen | 85-105, mm Hg |
| PCO2 | Partial pressure carbon dioxide | 35-45, mm Hg |
| BUNk | Blood urea nitrogen maximum | 7-22, mg/dL |
| Glucose | Blood sugar level | 68-200, mg/dL |
| Bili | Serum bilirubin | ≤1.52, mg/dL |
| FiO2 | Fraction of inspired oxygen | 21l, % |
| Sex | Male or female | Male: 1; female: 0, M or F |
| Age | Age in years | ≤65: 0, years |
| Leukemia | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Lymphoma | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Cirrhosis | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Hepatic failure | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Metastatic cancer | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| AIDS | Pre-existing diagnosis | Absent: 0, 0 or 1 |
| Thrombolytics | Medical intervention | Absent: 0, 0 or 1 |
| Ventilator | Medical intervention | Absent: 0, 0 or 1 |
| Dialysis | Medical intervention | Absent: 0, 0 or 1 |
| Immunosuppressed | Medical intervention | Absent: 0, 0 or 1 |
| Elective surgery | Medical intervention | Absent: 0, 0 or 1 |
aGCS: Glasgow Coma Scale.
bTeasdale and Jennett, 1974 [41]; Teasdale and Jennett, 1976 [42].
cMedical CMP, 2011 [43].
dWBC: white blood cell.
eLapum et al. 2018 [44].
fMDCalc [45].
gHealthline [46].
hF: female.
iM: male.
jN/A: not applicable.
kBUN: blood urea nitrogen.
leICU Collaborative Research Database [47].
Figure 1. Characteristic curve using a 2-parameter logistic model.
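The curve named in this caption can be sketched numerically. The snippet below is a minimal illustration of the standard 2-parameter logistic (2PL) form, not the authors' code. The function name `icc_2pl` and the parameter values in the usage lines are assumptions chosen only for illustration.

```python
import math


def icc_2pl(theta: float, slope: float, location: float) -> float:
    """Standard 2PL characteristic curve: P = 1 / (1 + exp(-a * (theta - b))).

    theta    -- latent trait value (here, a position on the difficulty scale)
    slope    -- discrimination parameter a (steepness of the curve)
    location -- location parameter b (theta at which P equals 0.5)
    """
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))


# Hypothetical parameter values, for illustration only.
p_at_location = icc_2pl(theta=0.0, slope=1.5, location=0.0)
p_steep = icc_2pl(theta=0.5, slope=5.0, location=0.0)
p_flat = icc_2pl(theta=0.5, slope=0.1, location=0.0)
```

A larger slope makes the probability change more sharply around the location parameter, so `p_steep` is much closer to 1 than `p_flat` at the same distance from the location.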
Item response theory case classification difficulty index results.
| Data set | CDIa range | Overall, mean (SD) | Point-biserial correlationb, r | P value | No death, mean (SD) | Death, mean (SD) | t testc, t (df) | P value |
| MIMIC-IIId balanced | −1.81 to +2.16 | 0.00 (0.85) | 0.37 | <.001 | −0.32 (0.79) | 0.32 (0.80) | 35.76 (8077) | <.001 |
| MIMIC-III imbalanced | −1.70 to +2.27 | 0.00 (0.85) | 0.35 | <.001 | −0.21 (0.80) | 0.42 (0.80) | 40.88 (12116) | <.001 |
| eICUe balanced | −2.63 to +2.83 | 0.00 (0.80) | 0.50 | <.001 | −0.40 (0.73) | 0.40 (0.64) | 86.18 (21939) | <.001 |
| eICU imbalanced | −2.55 to +2.93 | 0.00 (0.81) | 0.51 | <.001 | −0.29 (0.73) | 0.59 (0.61) | 109.09 (32909) | <.001 |
aCDI: classification difficulty index.
bBetween CDI and outcome (no death or death).
cDifference between no death and death means.
dMIMIC-III: Medical Information Mart for Intensive Care III.
eeICU: electronic intensive care unit.
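The point-biserial correlation reported in this table relates a continuous score (the CDI) to a binary outcome (no death or death). The following is a minimal sketch of the textbook formula, assuming the population standard deviation is used. The function name and the toy data are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, outcomes):
    """Point-biserial correlation between a continuous score and a 0/1 outcome.

    Uses r = (M1 - M0) / s * sqrt(p * (1 - p)), where s is the population SD
    of all scores and p is the proportion of cases with outcome 1. This is
    equal to the Pearson correlation between the two variables.
    """
    group1 = [s for s, y in zip(scores, outcomes) if y == 1]
    group0 = [s for s, y in zip(scores, outcomes) if y == 0]
    p = len(group1) / len(scores)
    return (mean(group1) - mean(group0)) / pstdev(scores) * math.sqrt(p * (1 - p))


# Toy data, for illustration only.
r = point_biserial([-1.0, 0.0, 1.0, 2.0], [0, 0, 1, 1])
```

As a rough consistency check on the rounded MIMIC-III balanced values in the table (group means −0.32 and 0.32, overall SD 0.85, equal group sizes), the same formula gives approximately 0.38, close to the reported 0.37.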
Figure 2. Classification difficulty indexes in MIMIC-III (A) balanced and (B) imbalanced data. CDI: classification difficulty index; MIMIC: Medical Information Mart for Intensive Care.
Figure 3. Classification difficulty indexes in eICU (A) balanced and (B) imbalanced data. eICU: electronic intensive care unit; DT: decision tree; KNN: K-nearest neighbors; LDA: linear discriminant analysis; LR: logistic regression; NB: naive Bayes; NN: neural network.
Figure 4. Medical Information Mart for Intensive Care (MIMIC) III generalized linear mixed model (GLMM) accuracy results; machine learning classifier against CDI for (A) balanced and (B) imbalanced data. DT: decision tree; KNN: K-nearest neighbors; LDA: linear discriminant analysis; LR: logistic regression; NB: naive Bayes; NN: neural network.
Figure 5. Electronic intensive care unit (eICU) generalized linear mixed model (GLMM) accuracy results; machine learning classifier against CDI for (A) balanced and (B) imbalanced data. DT: decision tree; KNN: K-nearest neighbors; LDA: linear discriminant analysis; LR: logistic regression; NB: naive Bayes; NN: neural network.
Medical Information Mart for Intensive Care III feature parameters.
| Feature parameters | Slope | Location |
| Balanced data | | |
| Blood urea nitrogen (minimum) | 5.64 | 0.09 |
| Urine output | 0.15 | −2.23 |
| Imbalanced data | | |
| Blood urea nitrogen (minimum) | 5.22 | 0.02 |
| Urine output | 0.09 | −3.59 |
Electronic intensive care unit feature parameters.
| Feature parameter | Slope | Location |
| Balanced data | | |
| Blood urea nitrogen (minimum) | 1.55 | −0.33 |
| Urine output | 0.04 | −1.19 |
| Imbalanced data | | |
| Blood urea nitrogen (minimum) | 1.49 | −0.1 |
| Urine output | 0.03 | −1.39 |
Medical Information Mart for Intensive Care III classification performance in traditional metrics.
| Metric | LRa (%) | LDAb (%) | KNNc (%) | DTd (%) | NBe (%) | NNf (%) |
| Balanced data | | | | | | |
| Accuracy | 75.3 | 75.0 | 67.2 | 70.9 | 70.4 | 76.1 |
| Precision | 75.8 | 75.6 | 69.3 | 71.1 | 79.5 | 75.6 |
| Recall | 74.3 | 73.8 | 61.8 | 70.6 | 54.9 | 77.2 |
| F1 | 75.0 | 74.7 | 65.3 | 70.8 | 64.9 | 76.4 |
| AUCg | 75.3 | 75.0 | 67.2 | 70.9 | 70.4 | 76.5 |
| Imbalanced data | | | | | | |
| Accuracy | 78.3 | 77.9 | 72.8 | 73.7 | 75.3 | 80.5 |
| Precision | 73.3 | 73.8 | 63.1 | 60.6 | 67.7 | 72.7 |
| Recall | 54.8 | 52.1 | 44.4 | 60.6 | 49.6 | 66.6 |
| F1 | 62.7 | 61.1 | 52.2 | 60.6 | 57.3 | 69.5 |
| AUC | 72.4 | 71.4 | 65.7 | 70.9 | 68.9 | 76.9 |
aLR: logistic regression.
bLDA: linear discriminant analysis.
cKNN: K-nearest neighbor.
dDT: decision tree.
eNB: naive Bayes.
fNN: neural network.
gAUC: area under the curve.
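For readers less familiar with the traditional metrics in this table, the sketch below shows how accuracy, precision, recall, and F1 follow from the four cells of a binary confusion matrix. The function name and the counts are hypothetical; the study's confusion matrices are not given here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn -- true positives, false positives, false negatives, true negatives
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

Because F1 is the harmonic mean of precision and recall, the tabulated values can be cross-checked: for LR in the first group, precision 75.8 and recall 74.3 give an F1 of about 75.0, matching the table.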
Item response theory–based Medical Information Mart for Intensive Care III mortality prediction accuracy stratified by classification difficulty index.
| Number of cases | CDIa | LRb (%) | LDAc (%) | KNNd (%) | DTe (%) | NBf (%) | NNg (%) |
| Balanced data | | | | | | | |
| 1 | 2.5 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 13 | 2.0 | 92.3 | 92.3 | 84.6 | 92.3 | 92.3 | 92.3 |
| 316 | 1.5 | 90.2 | 88.2 | 80.4 | 80.4 | 89.2 | 88.3 |
| 1884 | 1.0 | 75.6 | 74.9 | 68.2 | 68.8 | 68.4 | 77.0 |
| 1321 | 0.5 | 70.5 | 70.6 | 63.5 | 65.9 | 65.4 | 71.1 |
| 952 | 0.0 | 72.0 | 72.4 | 62.8 | 68.8 | 66.2 | 73.9 |
| 1346 | −0.5 | 70.9 | 70.6 | 60.4 | 67.1 | 63.7 | 72.1 |
| 1955 | −1.0 | 77.0 | 77.1 | 70.9 | 75.4 | 75.2 | 78.3 |
| 288 | −1.5 | 94.8 | 94.8 | 83.3 | 91.0 | 95.5 | 94.5 |
| 3 | −2.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Imbalanced data | | | | | | | |
| 1 | 2.5 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 30 | 2.0 | 93.3 | 93.3 | 76.7 | 73.3 | 93.3 | 93.3 |
| 571 | 1.5 | 77.4 | 75.7 | 64.1 | 71.1 | 77.4 | 78.3 |
| 1886 | 1.0 | 70.6 | 70.3 | 63.9 | 65.0 | 64.6 | 73.3 |
| 1537 | 0.5 | 76.3 | 75.5 | 67.3 | 71.2 | 72.7 | 79.7 |
| 1251 | 0.0 | 78.7 | 78.0 | 75.6 | 74.5 | 76.8 | 80.3 |
| 2794 | −0.5 | 75.0 | 74.5 | 71.0 | 72.1 | 72.3 | 78.4 |
| 2722 | −1.0 | 88.3 | 88.3 | 85.0 | 83.3 | 87.1 | 89.1 |
| 325 | −1.5 | 99.1 | 99.1 | 96.6 | 98.2 | 99.1 | 98.8 |
aCDI: classification difficulty index.
bLR: logistic regression.
cLDA: linear discriminant analysis.
dKNN: K-nearest neighbor.
eDT: decision tree.
fNB: naive Bayes.
gNN: neural network.
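A stratified table of this kind can be produced by grouping cases into CDI bins and computing accuracy within each bin. The sketch below is one possible way to do this, assuming each CDI value is assigned to the nearest multiple of 0.5. The binning rule, the function name, and the toy data are assumptions and are not stated in the source.

```python
from collections import defaultdict


def accuracy_by_cdi_bin(cdi_values, correct_flags, width=0.5):
    """Percentage of correctly classified cases within each CDI bin.

    cdi_values    -- one CDI value per case
    correct_flags -- 1 if the classifier was correct for that case, else 0
    width         -- bin spacing; each CDI is assigned to the nearest multiple
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cdi, ok in zip(cdi_values, correct_flags):
        bin_center = round(cdi / width) * width  # nearest multiple of width
        totals[bin_center] += 1
        hits[bin_center] += ok
    return {b: 100.0 * hits[b] / totals[b] for b in sorted(totals)}


# Toy data, for illustration only.
by_bin = accuracy_by_cdi_bin([0.1, -0.1, 0.4, 0.6, 1.1], [1, 0, 1, 1, 0])
```

Reporting the number of cases alongside each bin, as the table does, matters because bins at the extremes can contain very few cases, which makes their percentages unstable.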
Electronic intensive care unit classification performance in traditional metrics.
| Metric | LRa (%) | LDAb (%) | KNNc (%) | DTd (%) | NBe (%) | NNf (%) |
| Balanced data | | | | | | |
| Accuracy | 77.9 | 77.4 | 67.2 | 76.7 | 66.6 | 84.7 |
| Precision | 77.9 | 78.1 | 67.9 | 76.7 | 73.7 | 84.5 |
| Recall | 77.9 | 76.3 | 65.3 | 76.8 | 51.6 | 84.9 |
| F1 | 77.8 | 77.2 | 66.6 | 76.7 | 60.7 | 84.7 |
| AUCg | 77.9 | 77.4 | 67.2 | 77.1 | 66.6 | 85.9 |
| Imbalanced data | | | | | | |
| Accuracy | 78.0 | 80.1 | 73.6 | 81.6 | 73.3 | 89.5 |
| Precision | 73.6 | 75.1 | 64.1 | 72.1 | 62.0 | 84.7 |
| Recall | 62.1 | 60.2 | 47.2 | 72.9 | 51.5 | 83.5 |
| F1 | 67.4 | 66.8 | 54.4 | 72.5 | 56.3 | 84.1 |
| AUC | 75.5 | 75.1 | 67.0 | 79.3 | 67.9 | 87.8 |
aLR: logistic regression.
bLDA: linear discriminant analysis.
cKNN: K-nearest neighbor.
dDT: decision tree.
eNB: naive Bayes.
fNN: neural network.
gAUC: area under the curve.
Item response theory–based electronic intensive care unit mortality prediction accuracy stratified by classification difficulty index.
| Number of cases | CDIa | LRb (%) | LDAc (%) | KNNd (%) | DTe (%) | NBf (%) | NNg (%) |
| Balanced data | | | | | | | |
| 2 | 3.0 | 100.0 | 100.0 | 100.0 | 50.0 | 100.0 | 100.0 |
| 61 | 2.5 | 82.0 | 82.0 | 75.4 | 78.7 | 86.9 | 85.2 |
| 160 | 2.0 | 81.3 | 82.5 | 75.0 | 76.3 | 81.9 | 83.4 |
| 621 | 1.5 | 86.2 | 86.8 | 74.5 | 79.2 | 83.7 | 87.9 |
| 3167 | 1.0 | 83.7 | 82.9 | 72.1 | 78.3 | 66.3 | 85.4 |
| 4998 | 0.5 | 74.0 | 72.7 | 64.7 | 73.1 | 55.2 | 80.9 |
| 4776 | 0.0 | 70.9 | 70.1 | 58.5 | 71.5 | 57.3 | 80.0 |
| 3864 | −0.5 | 73.8 | 74.5 | 63.3 | 74.4 | 67.4 | 84.3 |
| 2858 | −1.0 | 85.4 | 85.5 | 74.4 | 84.8 | 83.1 | 91.8 |
| 1183 | −1.5 | 92.5 | 92.6 | 84.3 | 91.7 | 91.9 | 96.4 |
| 240 | −2.0 | 97.1 | 97.1 | 91.7 | 95.8 | 96.3 | 97.9 |
| 10 | −2.5 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Imbalanced data | | | | | | | |
| 6 | 3.0 | 66.7 | 83.3 | 83.3 | 66.6 | 66.6 | 83.3 |
| 58 | 2.5 | 82.8 | 81.0 | 69.0 | 75.9 | 87.9 | 84.5 |
| 215 | 2.0 | 79.1 | 78.6 | 67.0 | 72.6 | 76.3 | 82.3 |
| 1369 | 1.5 | 79.8 | 79.0 | 65.4 | 75.2 | 72.8 | 85.7 |
| 4776 | 1.0 | 72.2 | 72.4 | 61.6 | 74.8 | 58.4 | 83.9 |
| 6657 | 0.5 | 67.3 | 67.0 | 72.1 | 57.3 | 57.3 | 83.1 |
| 7068 | 0.0 | 76.4 | 76.9 | 70.0 | 78.8 | 70.3 | 88.5 |
| 6396 | −0.5 | 87.1 | 87.3 | 83.2 | 87.3 | 83.4 | 93.7 |
| 4265 | −1.0 | 94.8 | 95.0 | 92.0 | 94.3 | 92.7 | 97.7 |
| 1763 | −1.5 | 98.0 | 98.0 | 97.1 | 97.9 | 97.3 | 99.4 |
| 317 | −2.0 | 99.1 | 99.1 | 98.4 | 98.4 | 98.4 | 99.1 |
| 20 | −2.5 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
aCDI: classification difficulty index.
bLR: logistic regression.
cLDA: linear discriminant analysis.
dKNN: K-nearest neighbor.
eDT: decision tree.
fNB: naive Bayes.
gNN: neural network.
Tests of the effects of classification difficulty index, classifier, and their interaction for the Medical Information Mart for Intensive Care III data set.
| Effect | F test (df) | P value | Significant paired comparisons |
| Balanced data | | | |
| CDIa | 123 (6,48456) | <.001 | −1.5 vs −1.0, −0.5, 0.0; −1.0 vs −0.5, 0.0; +1.0 vs +0.5, 0.0; +1.5 vs +1.0, +0.5, 0.0 |
| MLb classifier | 52 (5,48456) | <.001 | LRc, LDAd, NBe, NNf vs KNNg, DTh; DT vs KNN |
| CDI×ML classifier | 2 (30,48456) | <.001 | −1.5: LR, LDA, NB, NN, DT vs KNN; −1.0: LR, LDA, NB, NN, DT vs KNN; −0.5: LR, LDA, DT, NN vs NB, KNN; 0.0: LR, LDA, DT, NN vs NB, KNN; +0.5: LR, LDA, NN vs NB, KNN, DT; +1.0: LR, LDA, NN vs NB, KNN, DT; +1.5: LR, LDA, NB, NN vs KNN, DT |
| Imbalanced data | | | |
| CDI | 314 (6,72660) | <.001 | −1.5 vs −1.0, −0.5, 0.0; −1.0 vs −0.5, 0.0; 0.0 vs −0.5, +0.5, +1.0; +0.5 vs +1.0; +1.5 vs +1.0 |
| ML classifier | 12 (5,72660) | <.001 | LR, LDA, NB, NN vs KNN, DT |
| CDI×ML classifier | 2 (30,72660) | .004 | −1.5: no differences; −1.0: LR, LDA, NB, NN vs KNN, DT; −0.5: LR, LDA, NN vs NB, KNN, DT; 0.0: NN vs DT |
aCDI: classification difficulty index.
bML: machine learning.
cLR: logistic regression.
dLDA: linear discriminant analysis.
eNB: naive Bayes.
fNN: neural network.
gKNN: K-nearest neighbor.
hDT: decision tree.
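The first number in each df pair in this table is consistent with the usual numerator degrees of freedom for a two-factor design: levels minus one for each main effect, and the product of those for the interaction. The sketch below illustrates only that arithmetic and is not the authors' model. The function name is hypothetical, and the level counts (7 CDI levels from −1.5 to +1.5 in steps of 0.5, and 6 classifiers) are read off the table rather than stated by the source.

```python
def factorial_numerator_df(levels_a: int, levels_b: int) -> dict:
    """Numerator degrees of freedom for a two-factor design with interaction.

    levels_a -- number of levels of the first factor (e.g., CDI bins)
    levels_b -- number of levels of the second factor (e.g., classifiers)
    """
    df_a = levels_a - 1
    df_b = levels_b - 1
    return {"a": df_a, "b": df_b, "a_x_b": df_a * df_b}


# Assumed level counts: 7 CDI levels and 6 classifiers.
dfs = factorial_numerator_df(levels_a=7, levels_b=6)
```

With these assumed counts the function returns 6, 5, and 30, which matches the numerator df shown for the CDI, ML classifier, and CDI×ML classifier effects.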
Tests of the effects of classification difficulty index, classifier, and their interaction for the electronic intensive care unit data set.
| Effect | F test (df) | P value | Significant paired comparisons |
| Balanced data | | | |
| CDIa | 382 (8,131586) | <.001 | −2.0 vs −1.5, −1.0, −0.5, 0.0; −1.5 vs −1.0, −0.5, 0.0; −1.0 vs −0.5, 0.0; +1.0 vs +0.5, 0.0; +1.5 vs +1.0, +0.5, 0.0; +2.0 vs +0.5, 0.0 |
| MLb classifier | 58 (5,131586) | <.001 | NNc vs LRd, LDAe, DTf vs NBg vs KNNh |
| CDI×ML classifier | 9 (40,131586) | <.001 | −2.0: NN vs KNN; −1.5: NN vs LR, LDA, NB, DT vs KNN; −1.0: NN vs LR, LDA, NB, DT vs KNN; −0.5: NN vs LR, LDA, DT vs NB vs KNN; 0.0: NN vs LR, LDA, DT vs NB vs KNN; +0.5: NN vs LR, LDA, DT vs KNN vs NB; +1.0: NN vs LR, LDA vs DT vs KNN vs NB; +1.5: NN, LR, LDA vs NB vs DT vs KNN; +2.0: NN vs KNN |
| Imbalanced data | | | |
| CDI | 1138 (8,197406) | <.001 | −2.0 vs −1.0, −0.5, 0.0; −1.5 vs −1.0, −0.5, 0.0; −1.0 vs −0.5, 0.0; −0.5 vs 0.0; 0.0 vs +0.5, +1.0; +1.0 vs +0.5; +1.5 vs +0.5, +1.0; +2.0 vs +1.0, +0.5 |
| ML classifier | 28 (5,197406) | <.001 | NN vs LR, LDA vs DT vs NB, KNN |
| CDI×ML classifier | 4 (40,197406) | <.001 | −2.0: no differences; −1.5: NN vs LR, LDA, NB, KNN, DT; −1.0: NN vs LR, LDA, DT vs KNN, NB; −0.5: NN vs LR, LDA, DT vs KNN, NB; 0.0: NN vs LR, LDA vs DT vs KNN, NB; +0.5: NN vs LR, LDA vs DT vs KNN, NB; +1.0: NN vs LR, LDA, DT vs KNN, NB; +1.5: NN, LR vs LDA vs DT, NB vs KNN; +2.0: NN, LR vs KNN |
aCDI: classification difficulty index.
bML: machine learning.
cNN: neural network.
dLR: logistic regression.
eLDA: linear discriminant analysis.
fDT: decision tree.
gNB: naive Bayes.
hKNN: K-nearest neighbor.