Yichao Zheng, Yinheng Zhu, Mengqi Ji, Rongpin Wang, Xinfeng Liu, Mudan Zhang, Jun Liu, Xiaochun Zhang, Choo Hui Qin, Lu Fang, Shaohua Ma.
Abstract
The emergence of the novel coronavirus disease 2019 (COVID-19) is placing an increasing burden on healthcare systems. Although the majority of infected patients experience non-severe symptoms and can be managed at home, some individuals develop severe symptoms and require hospital admission. It is therefore critical to assess the severity of COVID-19 efficiently and to identify hospitalization priority with precision. To this end, a four-variable assessment model based on lymphocyte, lactate dehydrogenase, C-reactive protein, and neutrophil is established and validated using the XGBoost algorithm. The model identifies severe COVID-19 cases on admission with a sensitivity of 84.6% and a specificity of 84.6%, and it predicts disease progression toward rapid deterioration with an accuracy of 100%. The results also suggest that a computation-derived formula of clinical measures is practically applicable for healthcare administrators who must allocate hospitalization resources to the patients most in need during epidemics and pandemics.
Keywords: COVID-19; hospitalization priority; learning-based model
Year: 2020; PMID: 32838344; PMCID: PMC7396968; DOI: 10.1016/j.patter.2020.100092
Source DB: PubMed; Journal: Patterns (N Y); ISSN: 2666-3899
Baseline Characteristics of Patients Enrolled to the Study
| Clinical Features | Overall |
|---|---|
| Age, years | 47.63 ± 15.6 |
| Male | 319 (53) |
| Female | 280 (47) |
| Any comorbidity | 149 (24.8) |
| Hypertension | 67 (11) |
| Endocrine disease | 40 (6.7) |
| Cardiovascular disease | 18 (3) |
| Chronic lung disease | 9 (1.5) |
| Digestive disease | 20 (3.3) |
| Renal disease | 7 (1.2) |
| Tumor | 3 (0.5) |
| Cerebrovascular/nervous disease | 7 (1.2) |
| Immune disorder | 8 (1.3) |
| Others | 35 (5.8) |
| Fever | 379 (63.1) |
| Cough | 301 (50.1) |
| Expectoration | 6 (1.0) |
| Hemoptysis | 2 (0.3) |
| Dyspnea | 56 (9.3) |
| Catarrh | 9 (1.5) |
| Fatigue | 128 (21.3) |
| Anorexia | 3 (0.5) |
| Nausea/emesis | 14 (2.3) |
| Myalgia | 47 (7.8) |
| Dizziness/headache | 37 (6.2) |
| Pharyngalgia | 9 (1.5) |
| Abdominal pain/diarrhea | 7 (0.2) |
| White blood cell count, ×10⁹/L | 6.02 ± 15.98 |
| Lymphocyte count, ×10⁹/L | 1.25 ± 1.22 |
| Neutrophil count, ×10⁹/L | 3.64 ± 2.73 |
| Erythrocyte sedimentation rate, mm/h | 43.75 ± 28.86 |
| C-reactive protein, mg/L | 25.24 ± 28.92 |
| Procalcitonin, ng/mL | 0.91 ± 7.58 |
| D-dimer, μg/mL | 68.26 ± 515.42 |
| Alanine aminotransferase, U/L | 25.54 ± 15.47 |
| Aspartate aminotransferase, U/L | 29.5 ± 16.15 |
| Total bilirubin, μmol/L | 13.56 ± 8.09 |
| Albumin, g/L | 39.75 ± 9.28 |
| Lactate dehydrogenase, U/L | 229.5 ± 118.41 |
| Blood urea nitrogen, mmol/L | 5.29 ± 3.34 |
| Serum creatinine, μmol/L | 62.52 ± 34.39 |
| Prothrombin time, s | 13.27 ± 3.34 |
| Lactic acid, mmol/L | 271.37 ± 359.03 |
| Creatine kinase, U/L | 110.23 ± 118.38 |
| SpO2, % | 93.93 ± 14.69 |
Classification of the COVID-19 Severity
| Classifications | Definitions |
|---|---|
| Non-severe COVID-19 | Patients have non-specific symptoms, such as fever, cough, fatigue, myalgia, and pharyngalgia, but show no signs of dehydration, sepsis, or shortness of breath. Radiological examination shows no signs of severe pneumonia. |
| Severe COVID-19 | Adult cases meeting any of the following criteria in the quiescent state: |
The Performance of 12-Variable Models for Identification of Severe COVID-19 on Admission
| Metric | LDA | Logistic Regression | Random Forest | Decision Tree | SVM | XGBoost |
|---|---|---|---|---|---|---|
| AUC macro | 0.929 | 0.917 | 0.903 | 0.676 | – | 0.953 |
| F1 weighted | 0.891 | 0.854 | 0.848 | 0.769 | 0.848 | 0.896 |
| Accuracy | 0.892 | 0.862 | 0.862 | 0.800 | 0.862 | 0.892 |
| Sensitivity | 0.692 | 0.538 | 0.462 | 0.231 | 0.462 | 0.846 |
| Specificity | 0.942 | 0.942 | 0.962 | 0.942 | 0.962 | 0.904 |
Figure 1. Receiver Operating Characteristic Curve for the Performance of 12-Variable Models in Discriminating the Severe COVID-19 Cases
The 12 variables included age, fever, dyspnea, lymphocyte, neutrophil, C-reactive protein, lactic dehydrogenase, creatine kinase, D-dimer, alanine aminotransferase, aspartate aminotransferase, and albumin.
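To make the AUC rows in the tables easier to read, the following minimal sketch computes a binary ROC AUC from its rank interpretation: the probability that a randomly chosen severe case receives a higher model score than a randomly chosen non-severe case, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions and are not taken from the study. The tables report "AUC macro"; how the averaging was applied is not stated here, so this sketch covers only the plain binary case.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary ROC AUC via pairwise comparison of positives and negatives.

    labels: 1 = severe (positive class), 0 = non-severe.
    scores: model output where a higher value means "more likely severe".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for illustration only (not study data).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

In the toy data, 11 of the 12 positive/negative pairs are ordered correctly, so `toy_auc` equals 11/12. The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of this size; a sort-based rank sum would be the usual choice for larger data.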
Figure 2. Top Key Clinical Variables That Are Ranked According to Their Importance in the Multi-Tree XGBoost Algorithm
Abbreviations: L, lymphocyte; LDH, lactic dehydrogenase; CRP, C-reactive protein; N, neutrophil; ALT, alanine aminotransferase; ALB, albumin; CK, creatine kinase; AST, aspartate aminotransferase.
Figure 3. Important Variables Ranked by the XGBoost Algorithm Were Sequentially Assembled to Investigate Their Incremental Effects on the Model Performance
Abbreviation: AUC, area under the receiver operating characteristic curve.
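The sequential assembly described in the Figure 3 caption can be sketched as a simple loop: take the variables in ranked order, grow the subset one variable at a time, and record a performance score for each subset size. The helper `incremental_scores` and the `evaluate` callback are hypothetical names introduced for illustration. The placeholder scoring function below is not a real model and its numbers are not study results; in practice `evaluate` would train and validate a classifier on the given subset and return a metric such as AUC.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_scores(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[Tuple[int, float]]:
    """Score the top-1, top-2, ..., top-k subsets of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    evaluate: caller-supplied function returning a score for a feature subset.
    """
    results = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])  # first k ranked features
        results.append((k, evaluate(subset)))
    return results


def placeholder_evaluate(subset: List[str]) -> float:
    # Stand-in with diminishing returns; NOT a trained model or study data.
    return 1.0 - 0.5 ** len(subset)


# The four variables of the final model; the order here is illustrative.
ranked = [
    "lymphocyte",
    "lactate dehydrogenase",
    "C-reactive protein",
    "neutrophil",
]
curve = incremental_scores(ranked, placeholder_evaluate)
```

Plotting the second element of each pair in `curve` against the first gives an incremental-effect curve of the kind the caption describes.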
The Performance of 4-Variable Models for Identification of Severe COVID-19 on Admission
| Metric | LDA | Logistic Regression | Random Forest | Decision Tree | SVM | XGBoost |
|---|---|---|---|---|---|---|
| AUC macro | 0.876 | 0.879 | 0.864 | 0.680 | – | 0.859 |
| F1 weighted | 0.815 | 0.802 | 0.815 | 0.769 | 0.815 | 0.856 |
| Accuracy | 0.831 | 0.815 | 0.831 | 0.800 | 0.831 | 0.846 |
| Sensitivity | 0.385 | 0.385 | 0.385 | 0.231 | 0.385 | 0.846 |
| Specificity | 0.942 | 0.923 | 0.942 | 0.942 | 0.942 | 0.846 |
Figure 4. Receiver Operating Characteristic Curve for the Performance of 4-Variable Models in Discriminating the Severe COVID-19 Cases
The four variables were lymphocyte, lactic dehydrogenase, C-reactive protein, and neutrophil.
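As a worked example of how the sensitivity, specificity, accuracy, and weighted F1 values in the performance tables relate to one another, the sketch below derives all four from a single confusion matrix. The counts are an assumption: the test-set composition is not reported in this section, and the values used (13 severe and 52 non-severe cases) were chosen only because they reproduce the rounded figures for the 4-variable XGBoost model. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with 'severe' as the positive class."""
    tp: int  # severe cases flagged as severe
    fn: int  # severe cases missed
    fp: int  # non-severe cases flagged as severe
    tn: int  # non-severe cases correctly identified


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def weighted_f1(c: ConfusionCounts) -> float:
    """Per-class F1 averaged with weights equal to each class's true count."""
    f1_severe = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    f1_non_severe = 2 * c.tn / (2 * c.tn + c.fp + c.fn)
    n_severe = c.tp + c.fn
    n_non_severe = c.tn + c.fp
    total = n_severe + n_non_severe
    return (n_severe * f1_severe + n_non_severe * f1_non_severe) / total


# ASSUMED counts (not reported in the source), consistent with the
# rounded 4-variable XGBoost column: 0.846 / 0.846 / 0.846 / 0.856.
xgb_4var = ConfusionCounts(tp=11, fn=2, fp=8, tn=44)

# ASSUMED counts consistent with the 12-variable XGBoost column:
# 0.846 / 0.904 / 0.892 / 0.896.
xgb_12var = ConfusionCounts(tp=11, fn=2, fp=5, tn=47)
```

Under these assumed counts, the two XGBoost models detect the same number of severe cases, and the drop in accuracy from 0.892 to 0.846 comes entirely from three additional false positives. The other classifiers in the 4-variable table keep a higher specificity but reach a sensitivity of at most 0.385.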
The Performance of the 4-Variable XGBoost Model for Prediction of COVID-19 Deterioration
| Metric | LDA | Logistic Regression | Random Forest | Decision Tree | SVM | XGBoost |
|---|---|---|---|---|---|---|
| Accuracy | 1.000 | 1.000 | 0.974 | 0.974 | 1.000 | 1.000 |