Kitty Yu-Yeung Lau, Kei-Shing Ng, Ka-Wai Kwok, Kevin Kin-Man Tsia, Chun-Fung Sin, Ching-Wan Lam, Varut Vardhanabhuti.
Abstract
Background: To better understand the different clinical phenotypes across the disease spectrum in patients with COVID-19 using an unsupervised machine learning clustering approach.
Keywords: COVID-19; clinical phenotypes; clustering; laboratory test; machine learning
Year: 2022 PMID: 35284429 PMCID: PMC8907521 DOI: 10.3389/fmed.2021.764934
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Demographics and clinical characteristics of 7,606 COVID-19 positive patients.
| Characteristic | Median (IQR) or n (%) | Missing, n (%) |
|---|---|---|
| Demographics | | |
| Age (years) | 47 (32–61) | 0 (0%) |
| Sex (males) | 3,697 (48.6%) | 0 (0%) |
| Complete blood count | | |
| White blood cell count | 5.3 (4.3–6.6) | 0 (0%) |
| Neutrophil count | 3.2 (2.4–4.3) | 110 (1.4%) |
| Lymphocyte count | 1.3 (1.0–1.8) | 110 (1.4%) |
| Monocyte count | 0.5 (0.4–0.7) | 110 (1.4%) |
| Hemoglobin (g/dL) | 13.7 (12.6–14.7) | 0 (0%) |
| Hematocrit (L/L) | 0.40 (0.37–0.43) | 1 (<0.1%) |
| Platelet | 214 (173–264) | 5 (<0.1%) |
| Liver function | | |
| Albumin (g/L) | 40.0 (37.2–43.5) | 3 (<0.1%) |
| Total bilirubin (μmol/L) | 7.9 (5.8–10.9) | 8 (<0.1%) |
| Alanine aminotransferase (U/L) | 23.4 (16.0–36.0) | 8 (<0.1%) |
| Alkaline phosphatase (U/L) | 66 (54–81) | 8 (<0.1%) |
| Kidney function | | |
| Urea (mmol/L) | 3.9 (3.1–4.8) | 5 (<0.1%) |
| Creatinine (μmol/L) | 69.1 (58.0–83.0) | 5 (<0.1%) |
| Inflammatory marker | | |
| C-reactive protein | 0.4 (0.1–1.5) | 872 (11.5%) |
| Electrolytes | | |
| Sodium (mmol/L) | 138 (137–140) | 5 (<0.1%) |
| Potassium (mmol/L) | 3.8 (3.5–4.1) | 31 (0.4%) |
| Phosphate (mmol/L) | 1.04 (0.90–1.18) | 2,778 (36.5%) |
| Calcium (mmol/L) | 2.26 (2.18–2.34) | 2,720 (35.8%) |
| Others | | |
| Lactate dehydrogenase (U/L) | 193.0 (165.0–235.0) | 238 (3.1%) |
| Creatine kinase (U/L) | 91 (63–143) | 818 (10.8%) |
Decimal places were kept according to normal reference range.
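The summary columns in the table above (a median with interquartile range, plus a count and percentage of missing values) can be sketched as follows. This is only an illustration: the function name `summarize`, the use of `None` for a missing measurement, the inclusive quartile method, and the toy values are all assumptions and are not taken from the study.

```python
from statistics import quantiles


def summarize(values):
    """Return (median, q1, q3, n_missing, pct_missing) for one lab column.

    `values` is a list in which None marks a missing measurement
    (an assumed encoding, chosen for this sketch).
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, median, q3 = quantiles(observed, n=4, method="inclusive")
    pct_missing = 100.0 * n_missing / len(values)
    return median, q1, q3, n_missing, pct_missing


# Hypothetical toy values, not data from the study.
crp = [0.1, 0.4, None, 1.5, 0.2, 3.6, None, 0.3]
median, q1, q3, n_missing, pct_missing = summarize(crp)
print(f"{median:.2f} ({q1:.2f}–{q3:.2f}) | {n_missing} ({pct_missing:.1f}%)")
```

Percentages here are computed against all rows, including those with missing values, which matches how the missing column is expressed relative to the 7,606 patients.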
Figure 1. Flow chart showing the overall study design with data collection, preparation, model building, and prediction steps.
Demographics and clinical characteristics of four clusters and deceased cohort for comparison.
| Characteristic | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | p-value | Deceased |
|---|---|---|---|---|---|---|
| Demographics | | | | | | |
| Age | 36a | 38b | 51c | 65d | <0.001 | 81 |
| Sex (males) | 285a | 1,947b | 442c | 1,023d | | 87 |
| Complete blood count | | | | | | |
| White blood cell count | 6.2a | 5.6b | 3.8c | 5.8d | <0.001 | 6.9 |
| Neutrophil count | 3.7a | 3.3b | 2.2c | 4.0d | <0.001 | 4.9 |
| Lymphocyte count | 1.7a | 1.5b | 1.1c | 1.0d | <0.001 | 1.0 |
| Monocyte count | 0.5a | 0.6b | 0.4c | 0.5a | <0.001 | 0.6 |
| Hemoglobin | 13.0a | 15.1b | 13.2c | 13.3c | <0.001 | 12.2 |
| Hematocrit | 0.38a | 0.44b | 0.39c | 0.39c | <0.001 | 0.36 |
| Liver function | | | | | | |
| Albumin | 41.2a | 43.0b | 40.0c | 36.0d | <0.001 | 34.0 |
| Total bilirubin | 7.0a | 9.5b | 6.6c | 8.2d | <0.001 | 8.2 |
| Alanine aminotransferase | 18.0a | 29.0b | 20.0a | 29.0b | <0.001 | 20.8 |
| Alkaline phosphatase | 67a | 69b | 60c | 67b | <0.001 | 72 |
| Kidney function | | | | | | |
| Urea | 3.4a | 4.2b | 3.6c | 4.9d | <0.001 | 7.0 |
| Creatinine | 57.0a | 80.0b | 72.0c | 82.0d | <0.001 | 101.1 |
| Inflammatory marker | | | | | | |
| C-reactive protein | 0.2a | 0.3b | 0.5c | 3.6d | <0.001 | 3.6 |
| Electrolytes | | | | | | |
| Sodium | 139a | 139a | 139b | 136c | <0.001 | 136 |
| Potassium | 3.8a | 3.9b | 3.7c | 3.8a | <0.001 | 4.0 |
| Phosphate | 1.11a | 1.07b | 1.02c | 0.95d | <0.001 | 1.08 |
| Calcium | 2.31a | 2.33a | 2.22b | 2.17c | <0.001 | 2.14 |
| Others | | | | | | |
| Lactate dehydrogenase | 176.5a | 182.2b | 191.0c | 263.0d | <0.001 | 249.0 |
| Creatine kinase | 67a | 104b | 82c | 141d | <0.001 | 131 |
Most used normal reference range.
Each subscript letter denotes a subset of clusters whose column proportions do not differ significantly from each other at the 0.05 significance level. Clusters with different letters differ significantly from each other at the 0.05 significance level.
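One way to picture the pairwise comparison behind the subscript letters is a pooled two-proportion z-test on every pair of clusters. The sketch below is an assumption for illustration only: the source does not name the test or any multiplicity correction, so the Bonferroni adjustment, the function names, and the counts are all hypothetical.

```python
from itertools import combinations
from math import erf, sqrt


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


def pairwise_differences(counts, alpha=0.05):
    """Map each group pair to True if its proportions differ.

    Uses a Bonferroni-adjusted threshold (an assumed correction).
    """
    pairs = list(combinations(sorted(counts), 2))
    threshold = alpha / len(pairs)
    return {
        (a, b): two_proportion_p(*counts[a], *counts[b]) < threshold
        for a, b in pairs
    }


# Hypothetical (events, group size) pairs; cluster sizes are not given in the table.
counts = {"A": (30, 100), "B": (32, 100), "C": (70, 100)}
differs = pairwise_differences(counts)
print(differs)
```

In this toy example, groups A and B would share a letter because their proportions are not distinguishable, while C would receive a different one.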
Figure 2. Distinguishable clinical characteristics of four clusters with deceased cases for reference. (A) Demographics: Cluster 4 has the highest mean age, and Cluster 2 has the highest proportion of males. Cluster 1 is the youngest group with the lowest proportion of males. (B) Complete blood count: Cluster 3 has the smallest range of white blood cell count; Cluster 4 has a larger range of neutrophil count and the smallest range of lymphocyte count. Cluster 2 has the largest median for hemoglobin and hematocrit. (C) Liver function: Cluster 4 has the smallest mean value of albumin. (D) Kidney function: Cluster 4 has a higher range of urea and creatinine. (E) Inflammatory marker: Cluster 4 has a wider range of C-reactive protein. (F) Others: Cluster 4 has elevated lactate dehydrogenase. (A value lying more than 1.5 times the IQR below the bottom or above the top of the box is considered an outlier.) (G) Any comorbidity: Cluster 4 has the largest proportion with at least one comorbidity. (H) Clinical outcome: Highest fatality observed in Cluster 4.
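The box-plot outlier rule stated in the caption (more than 1.5 times the IQR beyond the box) can be sketched as below. The function name `iqr_outliers`, the inclusive quartile method, and the sample values are assumptions for illustration; plotting libraries compute quartiles in slightly different ways, so the exact fences used in the figure may differ.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartile box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical lactate dehydrogenase readings, not data from the study.
ldh = [165, 176, 182, 191, 193, 235, 263, 900]
outliers = iqr_outliers(ldh)
print(outliers)
```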
Figure 3. SHAP plots demonstrating differential importance of different features and clusters. (A) Mean SHAP value of the prediction, (B) SHAP value for cluster 1, (C) SHAP value for cluster 2, (D) SHAP value for cluster 3, (E) SHAP value for cluster 4. ALT, alanine aminotransferase; Alb, albumin; ALP, alkaline phosphatase; Ca, calcium; CRP, C-reactive protein; CK, creatine kinase; Cr, creatinine; HCT, hematocrit; HGB, hemoglobin; LDH, lactate dehydrogenase; LYM, lymphocyte count; MON, monocyte count; NEUT, neutrophil count; P, phosphate; PLT, platelet; K, potassium; Na, sodium; TBIL, total bilirubin; WBC, white blood cell.
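A global importance ranking like the one in panel (A) is commonly obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes that convention; the caption does not state how the mean was computed, and the function name and SHAP values are hypothetical rather than output from the study's model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    `shap_rows` holds one row per sample and one SHAP value per feature.
    """
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three samples and three features.
features = ["CRP", "LDH", "Alb"]
shap_rows = [
    [0.5, -0.25, 0.0],
    [-1.0, 0.25, 0.25],
    [1.5, -0.25, 0.125],
]
ranking = mean_abs_shap(shap_rows, features)
print(ranking)
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.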
Figure 4. Population pyramid by age and sex, with the mortality subgroup.