Min Sue Park, Hyeontae Jo, Haeun Lee, Se Young Jung, Hyung Ju Hwang.
Abstract
INTRODUCTION: A prompt severity assessment model for patients with confirmed infectious diseases could enable efficient diagnosis while alleviating the burden on the medical system. This study aims to develop a SARS-CoV-2 severity assessment model and to establish a medical system that lets patients check the severity of their cases and directs them to the appropriate clinical center on the basis of past treatment data from other patients with similar severity levels.
Keywords: COVID-19; Deep learning; Machine learning; Mortality; SARS-CoV-2; Triage protocol
Year: 2022 PMID: 35174469 PMCID: PMC8853007 DOI: 10.1007/s40121-022-00600-4
Source DB: PubMed Journal: Infect Dis Ther ISSN: 2193-6382
Baseline characteristics of input features
| Type | Variable | Category | n (or summary statistic) | % |
|---|---|---|---|---|
| Basic information | Sex | Male | 75,073 | 50.23 |
| | | Female | 74,398 | 49.77 |
| | Age | | Mean = 44.36 (std = 20.27) | |
| | Area of residence | Latitude | Mean = 36.93 (std = 0.93) | |
| | | Longitude | Mean = 127.39 (std = 0.76) | |
| | Body temperature ( | | 121,557 | 81.32 |
| | | 36.5 < | 6310 | 4.22 |
| | | 37.5 ≤ | 17,227 | 11.53 |
| | | | 4377 | 2.93 |
| Respiratory symptom | Cough | True | 34,201 | 22.88 |
| | | False | 99,997 | 66.90 |
| | Sputum | True | 17,108 | 11.45 |
| | | False | 117,090 | 78.34 |
| | Sore throat | True | 25,078 | 16.78 |
| | | False | 109,120 | 73.00 |
| | Dyspnea | True | 1962 | 1.31 |
| | | False | 132,236 | 88.47 |
| Non-respiratory symptom | Musculoskeletal pain | True | 24,017 | 16.07 |
| | | False | 110,181 | 73.71 |
| | Headache | True | 16,337 | 10.93 |
| | | False | 117,861 | 78.85 |
| | Chill | True | 17,227 | 11.53 |
| | | False | 116,971 | 78.26 |
| | Ageusia | True | 4846 | 3.24 |
| | | False | 129,352 | 86.54 |
| | Anosmia | True | 5498 | 3.68 |
| | | False | 128,700 | 86.10 |
Underlying diseases of study participants
| Disease | Count | Total ( | % |
|---|---|---|---|
| Liver diseaseᵃ | 0 | 148,632 | 99.44 |
| | 1 | 354 | 0.24 |
| | 2 | 475 | 0.32 |
| | 3 | 10 | 0.01 |
| Cancerᵇ | 0 | 147,260 | 98.52 |
| | 1 | 594 | 0.40 |
| | 2 | 1423 | 0.95 |
| | 3 | 187 | 0.13 |
| | 4 | 5 | 0.00 |
| | 5 | 2 | 0.00 |
| Diabetes mellitus | 0 | 139,063 | 93.04 |
| | 1 | 10,408 | 6.96 |
| Cardio-cerebrovascular diseaseᶜ | 0 | 127,608 | 85.37 |
| | 1 | 2165 | 1.45 |
| | 2 | 18,719 | 12.52 |
| | 3 | 825 | 0.55 |
| | 4 | 139 | 0.09 |
| | 5 | 15 | 0.01 |
| Renal diseaseᵈ | 0 | 148,698 | 99.48 |
| | 1 | 758 | 0.51 |
| | 2 | 15 | 0.01 |
| Degenerative diseaseᵉ | 0 | 146,945 | 98.31 |
| | 1 | 2331 | 1.56 |
| | 2 | 193 | 0.13 |
| | 3 | 2 | 0.00 |
| Lung diseaseᶠ | 0 | 147,253 | 98.52 |
| | 1 | 2086 | 1.40 |
| | 2 | 122 | 0.08 |
| | 3 | 10 | 0.01 |
ᵃ Liver disease includes hepatitis B, cirrhosis, and any other hepatitis
ᵇ Cancer includes liver cancer, thyroid cancer, oral cancer, acute myelogenous white blood, ovarian cancer, brain cancer, colon cancer, lymphoma, chronic myelogenous white blood, bladder cancer, esophageal cancer, cancer, stomach cancer, cervical cancer, uterine cancer, prostate cancer, rectal cancer, skin cancer, hematoma, laryngeal cancer, hematologic cancer, and blood cancer
ᶜ Cardio-cerebrovascular disease includes hypertension, stroke, cerebral infarction, myocardial infarction, myocardial hemorrhage, arteriosclerosis, and angina
ᵈ Renal disease includes renal failure and glomerular disease
ᵉ Degenerative disease includes Alzheimer disease, other dementia, and Parkinson disease
ᶠ Lung disease includes emphysema and any other lung disease
Fig. 1 Management strategy of COVID-19 confirmed cases in South Korea
Fig. 2 Classification of the previous prediction models according to the type of learning data and type of prediction models
Previous research regarding COVID-19 prediction models
| Class | Studies | Prediction type | Outcome variable | Data type | Sample size | Easy-to-measure input features |
|---|---|---|---|---|---|---|
| | Our model | Prognosis | Mortality | Nationwide | 149,471 | Yes |
| 1 | Zoabi et al. [ | Diagnosis | RT-PCR | Nationwide | 99,232 | Yes |
| | Yanamala et al. [ | Diagnosis | RT-PCR | Local | 3883 | No |
| | Gozes et al. [ | Diagnosis | RT-PCR | Local | 157 | No |
| | Song et al. [ | Diagnosis | RT-PCR | Local | 275 | No |
| | Feng et al. [ | Diagnosis | RT-PCR | Local | 164 | No |
| | Jin et al. [ | Diagnosis | RT-PCR | Local | 11,356 | No |
| | Punn et al. [ | Diagnosis | RT-PCR | Local | 1214 | No |
| | Menni et al. [ | Diagnosis | RT-PCR | Nationwide | 2,618,862 | Yes |
| 2 | Cifuentes et al. [ | Prognosis | Mortality | Nationwide | 1,033,218 | Yes |
| | Cho et al. [ | Prognosis | Mortality | Nationwide | 7590 | No |
| | Ikemura et al. [ | Prognosis | Mortality | Local | 4313 | No |
| | Her et al. [ | Prognosis | Mortality | Nationwide | 5628 | No |
| 3 | Subudhi et al. [ | Prognosis | Complication or mortality | Local | 10,826 | No |
| | Shamout et al. [ | Prognosis | Complication or mortality | Local | 3661 | No |
| | Marcos et al. [ | Prognosis | Complication or mortality | Local | 1270 | No |
| | Kim et al. [ | Prognosis | Complication or mortality | Nationwide | 4787 | Yes |
| | Su et al. [ | Prognosis | Complication or mortality | Local | 14,418 | No |
| 4 | Rinderknecht et al. [ | Prognosis | Complication | Nationwide | 15,753 | Yes |
| | Wang et al. [ | Prognosis | Complication | Local | 3008 | No |
RT-PCR: reverse transcription polymerase chain reaction
Fig. 3 Histogram of patients' distribution by latitude (top) and longitude (bottom)
Fig. 4 Cumulative number of confirmed cases per month
Fig. 5 (a) ROC curve and (b) precision–recall curve. The gray bands around the curves are pointwise 95% TI and 95% CI, which are derived by bootstrapping with 1000 repetitions
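The Fig. 5 caption mentions intervals derived by bootstrapping with 1000 repetitions. As a rough illustration of that idea, the sketch below computes a percentile bootstrap interval for the scalar AUROC. This is a simpler stand-in for the pointwise curve bands in the figure, not the authors' procedure. The function names, the synthetic data, and the choice of a percentile interval are all assumptions made for illustration.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    # Average 1-based rank of each distinct score value.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        yb = y_true[idx]
        if yb.all() or not yb.any():
            continue  # AUROC is undefined when a resample has only one class
        stats.append(auroc(yb, y_score[idx]))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)


# Synthetic data only; these are NOT the study's patients or predictions.
rng = np.random.default_rng(42)
y = rng.random(500) < 0.1            # roughly 10% positives
s = rng.normal(size=500) + 1.5 * y   # positives tend to score higher
point = auroc(y, s)
lo, hi = bootstrap_auroc_ci(y, s, n_boot=1000)
print(f"AUROC = {point:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

A pointwise band for the ROC curve itself would repeat the same resampling but record the true positive rate on a fixed grid of false positive rates in each repetition, then take percentiles at each grid point.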
Performance of four different models
| Metric | XGBoost | LightGBM | Random forest | CatBoost |
|---|---|---|---|---|
| AUPRC | 0.268 | 0.260 | 0.240 | 0.261 |
| AUROC | 0.950 | 0.943 | 0.944 | 0.947 |
| Precision | 0.923 | 0.925 | 0.978 | 0.881 |
| Recall | 0.807 | 0.769 | 0.025 | 0.897 |
| F1 | 0.861 | 0.840 | 0.049 | 0.889 |
| Youden’s index | 0.739 | 0.707 | 0.025 | 0.776 |
| Specificity | 0.933 | 0.938 | 0.999 | 0.879 |
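Two rows of this table follow from the others by standard definitions: F1 is the harmonic mean of precision and recall, and Youden's index is sensitivity (recall) plus specificity minus 1. The sketch below recomputes both from the reported precision, recall, and specificity. The function and variable names are illustrative and do not come from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# (precision, recall, specificity) as reported in the table above.
reported = {
    "XGBoost": (0.923, 0.807, 0.933),
    "LightGBM": (0.925, 0.769, 0.938),
    "Random forest": (0.978, 0.025, 0.999),
    "CatBoost": (0.881, 0.897, 0.879),
}

derived = {
    name: (f1_score(p, r), youden_index(r, s))
    for name, (p, r, s) in reported.items()
}

for name, (f1, j) in derived.items():
    print(f"{name}: F1 = {f1:.3f}, Youden's index = {j:.3f}")
```

Recomputed this way, the values agree with the reported F1 and Youden's index rows to within 0.001. The small differences are consistent with the table's inputs having been rounded to three decimals.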
Fig. 6 Feature importance plot
Fig. 7 Decision curve analysis and the histogram of predicted probabilities of the XGBoost model
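Decision curve analysis, as named in Fig. 7, is conventionally based on net benefit: at a threshold probability p_t, net benefit = TP/N − (FP/N) × p_t / (1 − p_t). The sketch below shows that standard formula on a toy example. It assumes the usual definition rather than anything specific to the authors' implementation, and the data are made up for illustration.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    treated = y_prob >= threshold
    n = y_true.size
    true_pos = np.sum(treated & y_true)
    false_pos = np.sum(treated & ~y_true)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy example (hypothetical outcomes and predicted risks).
y_true = [1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.2)
# "Treat all" reference strategy: every patient is above the threshold.
nb_treat_all = net_benefit(y_true, np.ones(len(y_true)), threshold=0.2)
print(f"model: {nb_model:.3f}, treat all: {nb_treat_all:.3f}")
```

A decision curve plots this quantity over a range of thresholds for the model, for "treat all", and for "treat none" (net benefit 0). The model is useful at thresholds where its curve lies above both reference strategies.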
Fig. 8 First-visit facility of patients with COVID-19 according to the patients’ mortality probabilities
| Traditional risk prediction models are limited in their ability to identify, at triage, asymptomatic patients whose condition deteriorates from mild to moderate or extremely severe COVID-19 |
| Existing disease risk assessment models were developed with small data sets, limited input variables, and unstandardized independent features, and without specific machine learning algorithms |
| This prediction model, trained with patient-generated health data (PGHD) from nationwide COVID-19 screening centers, can be used globally for daily monitoring of hospitalized or quarantined patients with confirmed SARS-CoV-2 infection |
| This risk assessment model, developed with multiple factors such as demographic, geographic, and clinical characteristics, showed superior performance and can be deployed to triage patients with COVID-19 |