Eleni Karlafti, Athanasios Anagnostis, Evangelia Kotzakioulafi, Michaela Chrysanthi Vittoraki, Ariadni Eufraimidou, Kristine Kasarjyan, Katerina Eufraimidou, Georgia Dimitriadou, Chrisovalantis Kakanis, Michail Anthopoulos, Georgia Kaiafa, Christos Savopoulos, Triantafyllos Didangelos.
Abstract
Since the beginning of the COVID-19 pandemic, 195 million people have been infected and 4.2 million have died from the disease or its complications. Physicians, healthcare scientists and medical staff continuously struggle with overloaded hospital admissions while, in parallel, trying to identify meaningful correlations between the severity of the disease in infected patients and their symptoms, comorbidities and biomarkers. Artificial intelligence (AI) and machine learning (ML) have recently been applied in many areas of COVID-19 healthcare, with the main goal of effectively managing the wide variety of issues related to COVID-19 and its consequences. Existing applications of ML to COVID-19 healthcare are based on supervised classification, which requires predefined classes and a labeled training dataset that serves as a reference point for learning. However, knowledge about COVID-19 and its consequences is still not solid, and the points of agreement among different scientific communities remain unclear. Therefore, this study followed an unsupervised clustering approach, in which no prior knowledge is required (tabula rasa). More specifically, 268 patients hospitalized at the First Propaedeutic Department of Internal Medicine of AHEPA University Hospital of Thessaloniki were assessed on 40 clinical variables (numerical and categorical), yielding a high-dimensional dataset. Dimensionality was reduced by applying principal component analysis (PCA) to the numerical part of the dataset and multiple correspondence analysis (MCA) to the categorical part. The Bayesian information criterion (BIC) was then applied to Gaussian mixture models (GMM) to identify the number of clusters that gives the best grouping of patients. The proposed methodology identified four clusters of patients with similar clinical characteristics.
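The dimensionality-reduction step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: PCA is computed from an SVD of the z-scored numerical columns, and MCA is implemented as a correspondence analysis of the one-hot indicator matrix, returning row principal coordinates. The function names, the numbers of retained components and the random stand-in data are all hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA of z-scored numerical data via SVD.

    Returns the first `n_components` component scores and their explained-variance
    ratios. Assumes no missing values and no constant columns.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

def indicator_matrix(C):
    """One-hot (complete disjunctive) coding of an n x Q array of category labels."""
    C = np.asarray(C)
    blocks = []
    for q in range(C.shape[1]):
        levels = np.unique(C[:, q])  # observed levels only, so no empty columns
        blocks.append((C[:, q][:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)

def mca_scores(C, n_components):
    """MCA as correspondence analysis of the indicator matrix (row principal coordinates)."""
    P = indicator_matrix(C)
    P /= P.sum()                                         # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :n_components] * s[:n_components] / np.sqrt(r)[:, None]

# Hypothetical stand-in data: 268 patients, 5 numerical and 8 binary (yes/no) variables.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(268, 5))
C_cat = rng.integers(0, 2, size=(268, 8))

num_scores, num_explained = pca_scores(X_num, n_components=3)
cat_scores = mca_scores(C_cat, n_components=2)
reduced = np.hstack([num_scores, cat_scores])  # input for the clustering step
```

In practice the number of components kept from each analysis would be chosen from the explained variance (PCA) or inertia (MCA); the values 3 and 2 here are arbitrary.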
The analysis revealed a cluster of asymptomatic patients with a mortality rate of 23.8%. This striking result forces us to reconsider the relationship between the severity of COVID-19 clinical symptoms and patient mortality.
Keywords: COVID-19; Gaussian mixture models; clinical severity; clustering; unsupervised machine learning
Year: 2021 PMID: 34945852 PMCID: PMC8705973 DOI: 10.3390/jpm11121380
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Flowchart of the proposed methodology.
The variables contained in the dataset.
| Type | Variable |
|---|---|
| General (mixed) | sex, age |
| Comorbidities (categorical) | Cardiovascular_disease, chronic_kidney_disease, chronic_obstructive_pulmonary_disease, asthma, … |
| Symptoms (categorical) | cough, fever, weakness, headache, dizziness, abdominal_pain, nausea, … |
| Measurable (numerical) | oxygen, temperature, d-dimers, WBC, Ht, eosinophils, basophils, … |
| Reference (mixed) | Hospitalization_ICU, death |
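One way the mixed variable groups in this table might be routed to the two reduction methods is sketched below. Two assumptions are made that the table does not state: `age` is treated as numerical and `sex` as categorical, and the reference outcomes (`Hospitalization_ICU`, `death`) are held out of the clustering inputs. The column lists repeat only the names shown in the table (the truncated lists are not completed), and the dictionary layout and helper name are hypothetical.

```python
# Hypothetical column groups following the variable table; truncated lists are left incomplete.
VARIABLE_GROUPS = {
    "general": ["sex", "age"],
    "comorbidities": ["Cardiovascular_disease", "chronic_kidney_disease",
                      "chronic_obstructive_pulmonary_disease", "asthma"],
    "symptoms": ["cough", "fever", "weakness", "headache", "dizziness",
                 "abdominal_pain", "nausea"],
    "measurable": ["oxygen", "temperature", "d-dimers", "WBC", "Ht",
                   "eosinophils", "basophils"],
    "reference": ["Hospitalization_ICU", "death"],
}
NUMERICAL_GENERAL = {"age"}  # assumption: age is numerical, sex is categorical

def split_for_reduction(groups):
    """Return (numerical, categorical) column lists for PCA and MCA respectively."""
    numerical, categorical = [], []
    for group, columns in groups.items():
        if group == "reference":
            continue  # assumption: outcomes are kept out of the clustering inputs
        for col in columns:
            if group == "measurable" or col in NUMERICAL_GENERAL:
                numerical.append(col)
            else:
                categorical.append(col)
    return numerical, categorical

numerical, categorical = split_for_reduction(VARIABLE_GROUPS)
```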
Counts and percentages of patients with each comorbidity (N = 268).
| Comorbidity | Count (n) | Percentage (%) |
|---|---|---|
| Cardiovascular Disease | 68 | 25.37 |
| Chronic Kidney Disease | 8 | 2.99 |
| Chronic obstructive pulmonary disease | 10 | 3.73 |
| Asthma | 5 | 1.87 |
| Diabetes mellitus | 48 | 17.91 |
| Arterial Hypertension | 98 | 36.57 |
| Immunosuppression | 9 | 3.36 |
| Cancer | 15 | 5.60 |
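Each percentage in this table is the comorbidity count divided by the full cohort of 268 patients. A short check of that arithmetic, rounding to two decimals as the table does, is sketched below; the dictionary is the table transcribed and the helper name is hypothetical. Because a patient can have several comorbidities, each row is computed independently against the whole cohort.

```python
N_PATIENTS = 268  # hospitalized patients in the study

# Comorbidity counts transcribed from the table.
counts = {
    "Cardiovascular Disease": 68,
    "Chronic Kidney Disease": 8,
    "Chronic obstructive pulmonary disease": 10,
    "Asthma": 5,
    "Diabetes mellitus": 48,
    "Arterial Hypertension": 98,
    "Immunosuppression": 9,
    "Cancer": 15,
}

def prevalence(count, total=N_PATIENTS):
    """Percentage of the whole cohort, rounded to two decimals."""
    return round(100 * count / total, 2)

table = {name: prevalence(n) for name, n in counts.items()}
```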
Figure 2. Distribution plot of the patients’ age.
Figure 3. Distribution plot of the recorded temperatures.
Figure 4. Distribution plot of the recorded oxygen saturation.
Figure 5. Cluster selection using Bayesian information criterion (BIC).
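The cluster selection shown in Figure 5 can be illustrated with a loop of the following kind. The sketch assumes scikit-learn's `GaussianMixture` and a toy two-dimensional dataset in place of the reduced patient matrix; it fits one GMM per candidate number of clusters k and keeps the k with the lowest BIC, where BIC = −2 ln L̂ + p ln n (L̂ is the maximized likelihood, p the number of free parameters, n the number of samples). The search range, covariance type and `n_init` are assumptions, not settings reported in the study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_bic(X, k_range=range(1, 11), covariance_type="full", seed=0):
    """Fit one GMM per candidate k and keep the model with the lowest BIC."""
    bics, models = {}, {}
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type=covariance_type,
                              n_init=5, random_state=seed)
        gmm.fit(X)
        bics[k] = gmm.bic(X)  # lower BIC = better fit/complexity trade-off
        models[k] = gmm
    best_k = min(bics, key=bics.get)
    return best_k, models[best_k], bics

# Synthetic stand-in for the reduced (PCA + MCA) patient matrix: four separated blobs.
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], dtype=float)
sizes = [110, 21, 58, 79]  # cluster sizes borrowed from the tables only to shape the toy data
X = np.vstack([c + rng.normal(scale=0.7, size=(n, 2)) for c, n in zip(centers, sizes)])

best_k, best_model, bics = select_gmm_by_bic(X)
labels = best_model.predict(X)  # hard cluster assignment per sample
```

Plotting `bics` against k gives a curve of the kind shown in Figure 5; on well-separated toy data such as this, the minimum is expected at or near the number of generating blobs.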
Figure 6. Clustering of the dataset on a 2D plane.
Figure 7. Data distribution with respect to patients’ ICU admission (a) and death (b).
Mean values and standard deviations of each cluster’s numerical variables.
| Cluster # | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| total patients | 110 | 21 | 58 | 79 |
| sex (male/female) | 29/81 | 13/8 | 41/17 | 53/26 |
| age (years) | 61.3 ± 15.0 | 65.4 ± 13.9 | 64.5 ± 13.9 | 65.2 ± 13.9 |
| oxygen saturation (%) | 94.2 ± 3.3 | 87.5 ± 11.4 | 90.6 ± 6.1 | 92.9 ± 3.1 |
| temperature (°C) | 37.1 ± 0.7 | 37.5 ± 0.8 | 37.3 ± 0.8 | 37.5 ± 0.7 |
| d-dimers | 451.2 ± 697.2 | 1833.7 ± 3845.4 | 1020.7 ± 1233.4 | 483.4 ± 539.9 |
| WBC | 6.2 ± 2.8 | 11.3 ± 5.8 | 10.4 ± 8.3 | 6.5 ± 2.8 |
| Ht | 36.3 ± 10.9 | 29.2 ± 16.9 | 32.9 ± 15.9 | 31.1 ± 17.0 |
| eosinophils | 0.08 ± 0.4 | 0.008 ± 0.01 | 0.04 ± 0.1 | 0.1 ± 0.7 |
| basophils | 0.04 ± 0.1 | 0.02 ± 0.02 | 0.02 ± 0.04 | 0.03 ± 0.07 |
| PLT | 217.2 ± 92.2 | 291.3 ± 136.3 | 240.9 ± 134.5 | 194.4 ± 80.7 |
| ferritin | 291.2 ± 217.7 | 2084.4 ± 2353.3 | 811.7 ± 720.8 | 421.4 ± 280.4 |
| AST | 29.7 ± 14.3 | 146.3 ± 103.8 | 58.5 ± 33.2 | 30.6 ± 13.6 |
| ALT | 25.4 ± 15.5 | 112.3 ± 91.2 | 47.1 ± 38.7 | 23.1 ± 12.9 |
| LDH | 266.8 ± 83.0 | 646.9 ± 250.8 | 473.6 ± 38.7 | 306.7 ± 89.6 |
| albumin | 3.6 ± 0.4 | 3.3 ± 0.4 | 9.7 ± 48.4 | 8.3 ± 41.9 |
| CRP | 4.0 ± 4.0 | 179.6 ± 747.2 | 9.4 ± 9.2 | 11.9 ± 47.3 |
| IL6 | 28.2 ± 32.6 | 160.9 ± 246.6 | 94.9 ± 144.7 | 31.4 ± 23.6 |
| lymphocytes | 2.39 ± 8.6 | 1.3 ± 0.8 | 18.1 ± 129.3 | 1.2 ± 0.6 |
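Each cell above summarizes one variable within one cluster as mean ± standard deviation. A standard-library sketch of that aggregation follows; it assumes the sample standard deviation (n − 1 denominator) and simply skips missing values, neither of which the table specifies. The helper name and the six toy values are hypothetical, not patient data.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_cluster(labels, values, decimals=1):
    """Format 'mean ± SD' per cluster label.

    Uses the sample SD (assumption) and needs at least two non-missing values per cluster.
    """
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        if value is not None:  # skip missing measurements
            groups[label].append(value)
    return {
        label: f"{mean(vals):.{decimals}f} ± {stdev(vals):.{decimals}f}"
        for label, vals in sorted(groups.items())
    }

# Toy example with hypothetical temperatures; None marks a missing measurement.
labels = [1, 1, 1, 2, 2, 2]
temps = [36.8, 37.0, 37.5, 37.4, 37.6, None]
summary = summarize_by_cluster(labels, temps)
```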
Number and percentage of patients in each cluster for whom each categorical variable is present.
| Cluster # | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| total patients | 110 | 21 | 58 | 79 |
| sex (male/female) | 29/81 | 13/8 | 41/17 | 53/26 |
| cardiovascular disease (%) | 21 (19.1%) | 7 (33.3%) | 12 (20.7%) | 28 (35.4%) |
| chronic kidney disease (%) | 1 (0.9%) | 1 (4.8%) | 3 (5.2%) | 3 (3.8%) |
| chronic obstructive pulmonary disease (%) | 3 (2.7%) | 1 (4.8%) | 3 (5.2%) | 3 (3.8%) |
| asthma (%) | 3 (2.7%) | 1 (4.8%) | 1 (1.7%) | 0 (0%) |
| diabetes (%) | 13 (11.8%) | 3 (14.3%) | 8 (13.8%) | 24 (30.4%) |
| arterial hypertension (%) | 28 (25.5%) | 4 (19.0%) | 25 (43.1%) | 41 (51.9%) |
| immunosuppression (%) | 3 (2.7%) | 0 (0%) | 1 (1.7%) | 5 (6.3%) |
| cancer (%) | 6 (5.5%) | 2 (9.5%) | 2 (3.4%) | 5 (6.3%) |
| cough (%) | 19 (17.3%) | 7 (33.3%) | 19 (32.8%) | 28 (35.4%) |
| fever (%) | 63 (57.3%) | 4 (19.0%) | 49 (84.5%) | 77 (97.5%) |
| weakness (%) | 23 (20.9%) | 2 (9.5%) | 20 (34.5%) | 38 (48.1%) |
| headache (%) | 1 (0.9%) | 0 (0%) | 1 (1.7%) | 3 (3.8%) |
| dizziness (%) | 6 (5.5%) | 0 (0%) | 2 (3.4%) | 3 (3.8%) |
| abdominal pain (%) | 2 (1.8%) | 0 (0%) | 1 (1.7%) | 1 (1.3%) |
| nausea (%) | 2 (1.8%) | 0 (0%) | 0 (0%) | 4 (5.1%) |
| diarrhea (%) | 7 (6.4%) | 0 (0%) | 4 (6.9%) | 6 (7.6%) |
| vomiting (%) | 4 (3.6%) | 0 (0%) | 3 (5.2%) | 3 (3.8%) |
| anosmia (%) | 2 (1.8%) | 0 (0%) | 0 (0%) | 1 (1.3%) |
| loss of taste (%) | 1 (0.9%) | 0 (0%) | 0 (0%) | 1 (1.3%) |
| sore throat (%) | 4 (3.6%) | 0 (0%) | 0 (0%) | 1 (1.3%) |
| ICU admission (%) | 2 (1.8%) | 5 (23.8%) | 8 (13.8%) | 6 (7.6%) |
| death (%) | 1 (0.9%) | 5 (23.8%) | 5 (8.6%) | 6 (7.6%) |
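The cells in this table are counts with within-cluster percentages, so the 23.8% mortality highlighted in the abstract corresponds to 5 deaths among the 21 patients of cluster 2. The sketch below reproduces that formatting from the transcribed death counts and also computes the overall mortality across all four clusters; the helper name is hypothetical.

```python
# Cluster sizes and deaths transcribed from the table.
cluster_sizes = {1: 110, 2: 21, 3: 58, 4: 79}
deaths = {1: 1, 2: 5, 3: 5, 4: 6}

def count_with_pct(k, n):
    """Format a count as 'k (p%)' with one decimal, writing zero counts as '0 (0%)'."""
    if k == 0:
        return "0 (0%)"
    return f"{k} ({100 * k / n:.1f}%)"

rows = {c: count_with_pct(deaths[c], cluster_sizes[c]) for c in cluster_sizes}
total_deaths = sum(deaths.values())
total_patients = sum(cluster_sizes.values())
overall = count_with_pct(total_deaths, total_patients)
```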