Doo Hwan Kim (1), Min Gul Kim (2), Seong J Yang (3), Eun Jung Lee (4), Sang Woo Yeom (5), Yeon Seok You (6), Jong Seung Kim (7)
1. Director of Big-Data Center, National Health Insurance Service (NHIS), Wonju, Republic of Korea.
2. Department of Pharmacology, Jeonbuk National University, Jeonju, Republic of Korea.
3. Department of Statistics (Institute of Applied Statistics), Jeonbuk National University, Jeonju, Republic of Korea.
4. Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea; Research Institute of Clinical Medicine of Jeonbuk National University - Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju-si 54907, Republic of Korea.
5. Department of Medical Informatics, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea.
6. Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea; Department of Medical Informatics, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea; Research Institute of Clinical Medicine of Jeonbuk National University - Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju-si 54907, Republic of Korea.
7. Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea; Department of Medical Informatics, College of Medicine, Jeonbuk National University, Jeonju-si 54907, Republic of Korea; Research Institute of Clinical Medicine of Jeonbuk National University - Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju-si 54907, Republic of Korea. Electronic address: kjsjdk@gmail.com.
Dear Editor,
As of October 19, 2020, 40 million patients had been infected with COVID-19 worldwide, and about 1.1 million had died from the disease. The virus belongs to the same coronavirus family as the MERS virus that circulated in 2015, but it is much more infectious, and the world is currently experiencing a pandemic. However, the factors affecting disease severity and mortality have not yet been clearly identified. Machine learning (ML) algorithms are well suited to the medical field because they can predict fairly accurately on large-scale, previously unseen inputs such as those arising from the COVID-19 pandemic. In this paper, we used ML algorithms to analyze the factors affecting severity and mortality in 8070 COVID-19 patients registered in the National Health Insurance Service (NHIS) of South Korea (NHIS-2020-1-479).
Severe COVID-19 was defined as a final outcome involving any one of the following conditions: (1) intensive care unit (ICU) care; (2) extracorporeal membrane oxygenation (ECMO) treatment; (3) mechanical ventilator care; (4) oxygen supply. Mortality was also assessed, because the NHIS data were linked to the Korea Disease Control and Prevention Agency and Statistics Korea, which hold the mortality data.
A total of 21 conditions were chosen as the underlying diseases in the 8070 COVID-19 patients: hypertension (HTN), diabetes mellitus (DM), influenza, cancer, pulmonary disease, angiotensin-converting enzyme inhibitor or angiotensin receptor blocker use among hypertensive patients (ARB), gastroesophageal reflux disease (GERD), acute sinusitis (A_sinusitis), chronic sinusitis (C_sinusitis), osteoporosis, cardiovascular disease (CVD), angina, peripheral vascular disease (PVD), congestive heart failure (CHF), depression, rheumatologic disease (RA), hepatitis, myocardial infarction (MI), inflammatory bowel disease (IBD), non-tuberculosis mycobacterium (NTM), and olfactory loss (anosmia).
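As an illustration, the composite severity outcome described above (any one of ICU care, ECMO, mechanical ventilation, or oxygen supply) can be written as a simple any-of rule. The following is a minimal sketch only; the `Treatment` record and its field names are hypothetical and are not taken from the NHIS data dictionary.

```python
from dataclasses import dataclass


@dataclass
class Treatment:
    """Hypothetical per-patient treatment flags (field names are illustrative)."""
    icu: bool = False         # intensive care unit care
    ecmo: bool = False        # extracorporeal membrane oxygenation
    ventilator: bool = False  # mechanical ventilator care
    oxygen: bool = False      # oxygen supply


def is_severe(t: Treatment) -> bool:
    """A patient counts as severe if any one of the four conditions holds."""
    return t.icu or t.ecmo or t.ventilator or t.oxygen


# Three made-up patients: no treatment, oxygen only, ICU plus ventilator.
patients = [Treatment(), Treatment(oxygen=True), Treatment(icu=True, ventilator=True)]
labels = [is_severe(p) for p in patients]
```

Under this rule a patient who received only supplemental oxygen is labeled severe in the same way as a patient who needed ECMO, which is worth keeping in mind when reading the severity results.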
Customized NHIS data covering the past 5 years were extracted for patients with confirmed COVID-19, and their hospital use records over that period were used to apply the inclusion criteria. A total of 8070 patients with confirmed COVID-19 were included in this study (Fig. 1A). Their mean age was 39.9 years (SD: 19.7 years); 3236 (40.1%) were male and 4834 (59.9%) were female. Of the 785 patients classified as severe, 374 were men and 411 were women (p < 0.001). The mean age of severely ill patients was 61.6 years (SD: 16.0 years). A total of 248 patients died, of whom 136 were male and 112 were female (p = 0.0008). The mean age of the patients who died was 72.1 years (SD: 10.2 years) (Fig. 1B).
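The sex comparison among severe patients can be checked from the reported counts alone. The letter does not name the test behind its p-values, so the sketch below assumes a Pearson chi-square test without continuity correction on the 2x2 table of sex by severity; the function name `chi_square_2x2` is introduced here for illustration.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the text: 374 of 3236 men and 411 of 4834 women were severe.
male_severe, male_total = 374, 3236
female_severe, female_total = 411, 4834

stat, p_value = chi_square_2x2(
    male_severe, male_total - male_severe,
    female_severe, female_total - female_severe,
)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

Under this assumed test the severity comparison is consistent with the reported p < 0.001. Other choices, such as a continuity correction or an exact test, would give somewhat different values.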
Fig. 1. A. Flowchart of the entire study design. B. Age distribution of the 8070 confirmed COVID-19 patients in Korea in this study. C. Comorbidities of the 8070 confirmed COVID-19 patients in Korea in this study. D. ROC curves and AUC values in the prediction of severity of COVID-19. E. BD in the prediction of severity of COVID-19. F. Variable importance of the neural network model in the prediction of severity of COVID-19. G. ROC curves and AUC values in the prediction of mortality of COVID-19. H. BD in the prediction of mortality of COVID-19. I. Coefficient heatmap of the three logistic models in the prediction of mortality of COVID-19.
Regarding the underlying diseases, 4572 patients had a history of pulmonary disease, 674 of influenza, 231 of ARB use, and 77 of anosmia (Fig. 1C).
Models were selected by comparing their area under the ROC curve (AUC) values. The model that best predicted severity was the neural network, with an AUC of 85.06%, followed by elastic net (EN) logistic regression (84.74%) (Fig. 1D). The most important variable for predicting severity in the neural network model was a history of influenza (relative importance: 0.083) (Fig. 1F, Table 1).
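The AUC-based model selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the labels, the predicted risks, and the names `model_a` and `model_b` are made up, and the AUC is computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out outcomes and predicted risks; all values are hypothetical.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
predictions: Dict[str, List[float]] = {
    "model_a": [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9],
    "model_b": [0.2, 0.6, 0.30, 0.1, 0.7, 0.5, 0.4, 0.8],
}

# Score every candidate on the same data and keep the one with the highest AUC.
auc_by_model = {name: auc(y_true, s) for name, s in predictions.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
```

The pairwise form is quadratic in the sample size and is used here only for clarity; a rank-based implementation would be preferable for a cohort of several thousand patients.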
Table 1

| Model | Measure | Severity variable | Severity value | Mortality variable | Mortality value |
|---|---|---|---|---|---|
| Lasso | Estimated coefficient (mortality: nonzero coefficients) | Age | 1.276 | Age | 2.203 |
| | | DM | 0.431 | Metropolitan | −0.783 |
| | | Male | 0.415 | Influenza | 0.763 |
| | | Anosmia | −0.379 | Anosmia | −0.684 |
| | | HTN | 0.266 | Male | 0.682 |
| | | ARB | 0.222 | NTM | 0.598 |
| | | Influenza | 0.211 | DM | 0.393 |
| | | CVD | 0.209 | HTN | 0.322 |
| | | Pulmonary | 0.135 | Pulmonary | 0.257 |
| | | A_Sinusitis | 0.092 | PVD | 0.243 |
| Elastic net | Estimated coefficient (mortality: nonzero coefficients) | Age | 1.203 | Age | 2.136 |
| | | DM | 0.442 | Anosmia | −1.438 |
| | | Anosmia | −0.413 | Metropolitan | −0.985 |
| | | Male | 0.397 | NTM | 0.980 |
| | | HTN | 0.309 | Influenza | 0.830 |
| | | CVD | 0.235 | Male | 0.710 |
| | | ARB | 0.234 | DM | 0.405 |
| | | Influenza | 0.222 | HTN | 0.365 |
| | | Pulmonary | 0.147 | Pulmonary | 0.295 |
| | | A_Sinusitis | 0.091 | Depression | 0.280 |
| Ridge | Estimated coefficient (mortality: nonzero coefficients) | Age | 1.006 | Age | 1.389 |
| | | Anosmia | −0.838 | Anosmia | −1.388 |
| | | DM | 0.480 | NTM | 1.002 |
| | | HTN | 0.419 | Metropolitan | −0.761 |
| | | Male | 0.400 | Influenza | 0.722 |
| | | Influenza | 0.397 | HTN | 0.642 |
| | | CVD | 0.326 | Male | 0.582 |
| | | NTM | 0.310 | DM | 0.442 |
| | | ARB | 0.301 | Pulmonary | 0.352 |
| | | MI | −0.214 | Depression | 0.346 |
| Random Forest | Mean decrease in Gini impurity | Age | 174.074 | Age | 38.970 |
| | | HTN | 51.519 | HTN | 8.036 |
| | | DM | 36.373 | Male | 6.669 |
| | | CVD | 20.110 | DM | 6.427 |
| | | Osteoporosis | 17.828 | CVD | 5.842 |
| | | Male | 16.432 | PVD | 5.094 |
| | | Pulmonary | 14.944 | RA | 4.963 |
| | | Cancer | 14.928 | Osteoporosis | 4.883 |
| | | ARB | 14.676 | Cancer | 4.588 |
| | | A_Sinusitis | 14.159 | Pulmonary | 4.581 |
| Bagging | Mean decrease in Gini impurity | Age | 193.724 | Age | 40.825 |
| | | HTN | 60.297 | Male | 9.054 |
| | | DM | 35.551 | HTN | 8.948 |
| | | Male | 23.011 | DM | 7.458 |
| | | Pulmonary | 22.394 | CVD | 7.336 |
| | | Cancer | 21.379 | Pulmonary | 7.193 |
| | | Osteoporosis | 21.278 | Cancer | 7.128 |
| | | CVD | 20.893 | RA | 7.006 |
| | | A_Sinusitis | 20.869 | Osteoporosis | 6.700 |
| | | RA | 19.921 | PVD | 6.366 |
| Neural Network | Relative importance | Influenza | 0.083 | CVD | 0.076 |
| | | ARB | 0.075 | Age | 0.074 |
| | | Age | 0.062 | Male | 0.0659 |
| | | Anosmia | 0.060 | RA | 0.062 |
| | | C_Sinusitis | 0.059 | C_Sinusitis | 0.053 |
| | | A_Sinusitis | 0.055 | Influenza | 0.051 |
| | | Osteoporosis | 0.054 | IBD | 0.048 |
| | | MI | 0.051 | PVD | 0.045 |
| | | RA | 0.047 | HTN | 0.045 |
| | | NTM | 0.047 | Pulmonary | 0.044 |
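The rows for the penalized logistic regression models in Table 1 appear to be ordered by the absolute size of the estimated coefficient. Assuming that this ordering is what defines a variable's rank (the letter does not say so explicitly, nor whether covariates were standardized), the rank of any variable can be recovered as in the sketch below. The dictionary holds the first five mortality rows per model copied from Table 1; the helper `rank_of` is introduced here for illustration.

```python
from typing import Dict

# Mortality coefficients from Table 1 (first five rows listed for each model).
mortality_coefs: Dict[str, Dict[str, float]] = {
    "lasso": {
        "Age": 2.203, "Metropolitan": -0.783, "Influenza": 0.763,
        "Anosmia": -0.684, "Male": 0.682,
    },
    "elastic_net": {
        "Age": 2.136, "Anosmia": -1.438, "Metropolitan": -0.985,
        "NTM": 0.980, "Influenza": 0.830,
    },
    "ridge": {
        "Age": 1.389, "Anosmia": -1.388, "NTM": 1.002,
        "Metropolitan": -0.761, "Influenza": 0.722,
    },
}


def rank_of(variable: str, coefs: Dict[str, float]) -> int:
    """1-based rank of `variable` when coefficients are sorted by absolute value."""
    ordered = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
    return ordered.index(variable) + 1


influenza_rank = {m: rank_of("Influenza", c) for m, c in mortality_coefs.items()}
anosmia_rank = {m: rank_of("Anosmia", c) for m, c in mortality_coefs.items()}
```

Sorting by absolute value treats a strongly protective variable (negative coefficient) as just as important as a strongly adverse one, which matches how anosmia is discussed alongside age in the text.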
The model that best predicted death was the EN logistic regression model, with an AUC of 93.89%, followed by the lasso logistic regression model (93.84%) and the neural network model (93.73%) (Fig. 1G). The most important variables for mortality in the EN model were age (coefficient: 2.136) and anosmia (coefficient: −1.438) (Fig. 1I, Table 1).
We analyzed 24 factors affecting severity and mortality in 8070 patients using recently developed ML algorithms. Foremost, a history of influenza was a very important variable for both COVID-19 severity (neural network 1st, ridge 6th) and mortality (EN 5th, lasso 3rd, ridge 5th) (Fig. 1I, Table 1). It has been reported, on the basis of in vitro findings of different molecular docking sites and of retrospective studies in COVID-19, that oseltamivir cannot prevent worsening of symptoms and disease in patients with COVID-19. Recent papers have also reported that influenza vaccination can reduce the risk of death during the COVID-19 pandemic. Because the symptoms of influenza and COVID-19 are similar, it can be difficult to tell which disease is present, so vaccination can be important in preventing a "twindemic" of COVID-19 and influenza co-infection. In this paper, we studied the relationship between a history of influenza and the severity of COVID-19. Influenza can sometimes cause pulmonary fibrosis, a common sequela of virus-induced pneumonia, and this complication is presumed to increase the severity and mortality of COVID-19 infection. These results are in line with the current policy of recommending influenza vaccination, given the ongoing COVID-19 epidemic and the prevalence of influenza from autumn to spring.
Anosmia was also identified as an important variable in predicting the severity of COVID-19. The best predictive models for mortality were the EN and lasso models, and anosmia ranked among the most important variables in both (2nd in the EN model and 4th in the lasso model by absolute coefficient; Table 1).
This means that the mortality rate was low in patients who had olfactory loss after the COVID-19 diagnosis. Some papers indicate that recent olfactory loss in mild to moderate COVID-19 is an important factor differentiating COVID-19 from other infectious diseases and that, in most cases, the sense of smell recovers well. Another paper reports that anosmia is associated with lower in-hospital mortality in COVID-19, which is in line with our results. The novel finding of our study is that anosmia will continue to be an indicator that should be carefully examined in COVID-19 infection.
Influenza was found to be a major adverse factor in COVID-19, in addition to old age and male sex, which are already known to be related to disease severity and mortality. In addition, anosmia was found to be a major factor associated with lower severity and mortality. Therefore, in the current situation where there is no adequate COVID-19 treatment, the history of influenza vaccination and the presence of anosmia, in addition to age and sex, will be important indicators for predicting severity and mortality in COVID-19 patients.
Abbreviations: Receiver Operating Characteristic (ROC), Area Under the Curve (AUC), Binomial Deviance (BD), Hypertension (HTN), Diabetes mellitus (DM), Influenza, Cancer, Pulmonary disease, Angiotensin Converting Enzyme or Angiotensin Receptor Blocker (ARB) among hypertensive patients, Gastroesophageal reflux disease (GERD), Acute sinusitis (A_sinusitis), Chronic sinusitis (C_sinusitis), Osteoporosis, Cardiovascular disease (CVD), Angina, Peripheral vascular disease (PVD), Congestive heart failure (CHF), Depression, Rheumatologic disease (RA), Hepatitis, Myocardial infarction (MI), Inflammatory bowel disease (IBD), Non-tuberculosis mycobacterium (NTM), Olfactory loss (Anosmia).
Author Contributions
Doo Hwan Kim: Contributed to the study design, protocol and study materials, collected study data, provided data access, and helped write the first draft of the manuscript (Methods and Results sections).
Min Gul Kim: Contributed to the study design, protocol, study materials and data analysis, and helped write the first draft of the manuscript (Methods and Results sections).
Seong J. Yang: Designed the statistical plan, assisted with data analysis and interpretation of the data, and helped write the first draft of the manuscript (Methods section).
Eun Jung Lee: Contributed to the study design, protocol, and study materials, and helped write the first draft of the manuscript (Results section).
Sang Woo Yeom: Collected the study data, performed the statistical analysis, and helped write the first draft of the manuscript (Methods section).
Yeon Seok You: Contributed to the study design, protocol and study materials, and collected study data.
Jong Seung Kim: Contributed to the study design, protocol and study materials, designed the statistical plan and data analysis, performed the statistical analysis, and wrote the first draft of the manuscript.