Two novel nomograms for predicting the risk of hospitalization or mortality due to COVID-19 by the naïve Bayesian classifier method.

Eda Karaismailoglu¹, Serkan Karaismailoglu².

Abstract

Coronavirus disease 2019 (COVID-19) has become a global pandemic that has affected millions of people worldwide. The presence of multiple risk factors for COVID-19 makes it difficult to plan treatment and optimize the use of medical resources. The aim of this study is to determine potential risk factors for hospitalization or mortality in patients with COVID-19 via two novel naïve Bayesian nomograms. The publicly available national COVID-19 data published by the Mexican Ministry of Health through the "Dirección General de Epidemiología" website were analyzed. Univariable logistic regression was used to identify potential risk factors that may affect hospitalization or mortality in patients with COVID-19. The naïve Bayesian classifier method was used to construct the nomograms. The nomograms were verified by the area under the receiver operating characteristic curve (AUC), classification accuracy (CA), F1 score, precision, recall, and calibration plot. A total of 979,430 patients (45.3 ± 15.9 years old, and 51.1% male) tested positive for COVID-19 from January 1 to November 22, 2020. Among them, 22.3% of the patients required hospitalization and 99,964 patients (9.8%) died. The most important risk factors to predict the probability of hospitalization and mortality were pneumonia, age, chronic kidney failure, chronic obstructive respiratory disease, and diabetes. The performance measures demonstrated good discrimination and calibration (hospitalization: AUC = 0.896, CA = 0.880; mortality: AUC = 0.903, CA = 0.899). Two novel nomograms to estimate the risk of hospitalization and mortality were proposed, which could be used to facilitate individualized decision-making for patients newly diagnosed with COVID-19.

Keywords: COVID-19; hospitalization; mortality; nomogram; prediction; risk factor

Year: 2021 PMID: 33599308 PMCID: PMC8013381 DOI: 10.1002/jmv.26890

Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693

INTRODUCTION

Coronavirus disease 2019 (COVID‐19) was first observed in Wuhan, China, in December 2019 and was declared a global pandemic on March 11, 2020. By the end of November 2020, more than 64 million confirmed cases, including more than 1.4 million deaths, had been reported. The presence of comorbidities and older age are risk factors for hospitalization and mortality in COVID‐19 patients. Some nomograms have been developed to determine risk factors for COVID‐19, but their sample sizes were relatively small. Therefore, more comprehensive studies need to be developed and validated.

A nomogram is a graphical representation method used in medicine to predict the probability of a disease or the risk of death from a set of risk factors. It identifies the most important risk factors and gives an individualized prediction according to the point values of each factor. Statistical models, especially logistic regression or Cox proportional hazards regression, are generally used to build a nomogram. The naïve Bayesian classifier, on the other hand, is a powerful classification technique that builds predictive models from simple calculations. It is based on Bayes' theorem with the assumption of conditional independence of features given the class. Although this assumption seems to be a disadvantage, many studies have reported high classification accuracy (CA). Park and Lee compared Bayesian and logistic regression nomograms and showed the superiority of the naïve Bayesian approach.

This study aimed to construct two novel nomograms using the naïve Bayesian classifier technique, including demographics and comorbidities, for the prediction of hospitalization or mortality in COVID‐19 patients. The performance of the naïve Bayesian nomograms was evaluated by calibration and discrimination capability. To assess the calibration of the nomograms, a calibration plot was constructed, which shows the agreement between predicted and actual probabilities. The area under the receiver operating characteristic (ROC) curve (AUC), CA, precision, recall, and F1 score were used to evaluate the accuracy of the classification.

MATERIALS AND METHODS

Dataset

This study used the publicly available COVID‐19 dataset released by the Mexican Ministry of Health via the “Dirección General de Epidemiología” website (https://www.gob.mx/salud/documentos/datos-abiertos-152127). A total of 2,559,993 patients (over 18 years) were admitted to hospitals in Mexico on suspicion of COVID‐19 between January 1 and November 22, 2020. Twelve risk factors (features) were included in the analyses: age, gender, smoking status, and the presence of pneumonia, diabetes, obesity, hypertension, cardiovascular disease, chronic obstructive respiratory disease (CORD), chronic kidney failure, immunosuppression, and asthma.

Naïve Bayesian classifier

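The naïve Bayesian classifier determines the probability of a particular class by using Bayes' theorem and assumes that the attribute values are independent of each other given the class. Let C be a binary dependent variable where 0 indicates the other class and 1 the target class, and let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of attributes. Based on Bayes' theorem,

$$P(C=1 \mid X) = \frac{P(C=1)\,\prod_{i=1}^{n} P(x_i \mid C=1)}{P(X)} \tag{1}$$

is the posterior probability of the target class. Therefore, the odds of C = 1 can be obtained as follows:

$$\frac{P(C=1 \mid X)}{P(C=0 \mid X)} = \frac{P(C=1)}{P(C=0)} \prod_{i=1}^{n} \frac{P(x_i \mid C=1)}{P(x_i \mid C=0)} \tag{2}$$

After taking the log of the odds, Equation (2) transforms to the logit,

$$\operatorname{logit} P(C=1 \mid X) = \operatorname{logit} P(C=1) + \sum_{i=1}^{n} \log LR(x_i) \tag{3}$$

where LR is the likelihood ratio. The summation term on the right side of Equation (3) is used for the construction of a nomogram that relates the feature values to the point score. Each $LR(x_i)$ estimates the ratio of posterior to prior odds given the feature value $x_i$. Finally, to get $P(C=1 \mid X)$:

$$P(C=1 \mid X) = \left[1 + \exp\left(-\operatorname{logit} P(C=1) - \sum_{i=1}^{n} \log LR(x_i)\right)\right]^{-1} \tag{4}$$

Equation (4) is the final probability of the C = 1 class when the attribute value is X.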
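The calculation described above can be sketched in a few lines of Python: the prior is moved to the logit scale, the log-likelihood ratios of the observed feature values are added, and the result is mapped back to a probability. The function names and the example numbers below are illustrative assumptions, not values from the study.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior P(C=1 | X) under the naive Bayes independence assumption.

    prior: P(C=1)
    likelihood_ratios: one LR(x_i) = P(x_i | C=1) / P(x_i | C=0)
        for each observed feature value
    """
    log_odds = logit(prior) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical example: prior of 0.2 and two features with LRs of 4.0 and 0.5.
posterior = naive_bayes_posterior(0.2, [4.0, 0.5])
print(round(posterior, 4))  # prior odds 0.25 * 4.0 * 0.5 = 0.5, so 0.5 / 1.5
```

With no observed features the function returns the prior unchanged, which is a quick sanity check on the logit arithmetic.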

Nomogram construction via naïve Bayesian classifier

Let $x_{ij}$ denote the jth category of attribute i, where i = 1, 2, …, n indexes the attributes and j = 1, 2, …, m indexes the categories of each attribute. The $LR(x_{ij})$ values are obtained as follows:

$$LR(x_{ij}) = \frac{P(x_{ij} \mid C=1)}{P(x_{ij} \mid C=0)} \tag{5}$$

By using Equation (5), the points of each attribute value ($a_{ij}$) are calculated as follows:

$$a_{ij} = \frac{\log LR(x_{ij})}{\max_{i,j} \left|\log LR(x_{ij})\right|} \times 100 \tag{6}$$

In Equation (6), the denominator is the largest of the absolute values of the log‐likelihood ratios over all attribute values, which corresponds to the most important attribute value for target class C. The numerator is the log‐likelihood ratio of the jth category in attribute i. The points line is constructed from Equation (6), which gives $a_{ij}$ for each category of each attribute. The total points value is obtained by summing the points of the observed categories,

$$\text{Total points} = \sum_{i=1}^{n} a_{ij} \tag{7}$$

and so the total points line can be drawn. The probabilities corresponding to the total points values are obtained by substituting Equation (7), rescaled by the denominator of Equation (6), into Equation (4).
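The point construction can be illustrated with a short Python sketch. It assumes that points are the log-likelihood ratios scaled so that the largest absolute value maps to 100, matching the −100 to 100 point line described in this paper; the function names and the likelihood ratios in the example are hypothetical.

```python
import math

def nomogram_points(likelihood_ratios):
    """Scale log-likelihood ratios to points in [-100, 100].

    likelihood_ratios: dict of attribute -> dict of category -> LR(x_ij)
    Returns the points and the scaling denominator max |log LR|.
    """
    log_lrs = {attr: {cat: math.log(lr) for cat, lr in cats.items()}
               for attr, cats in likelihood_ratios.items()}
    max_abs = max(abs(v) for cats in log_lrs.values() for v in cats.values())
    points = {attr: {cat: 100.0 * v / max_abs for cat, v in cats.items()}
              for attr, cats in log_lrs.items()}
    return points, max_abs

def probability_from_total(total_points, max_abs, prior):
    """Map a total points value back to P(C=1 | X)."""
    # Undo the scaling to recover the sum of log-likelihood ratios.
    log_odds = math.log(prior / (1.0 - prior)) + total_points * max_abs / 100.0
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical likelihood ratios, for illustration only.
lrs = {
    "pneumonia": {"yes": 8.0, "no": 0.5},
    "diabetes": {"yes": 2.0, "no": 0.8},
}
points, max_abs = nomogram_points(lrs)
total = points["pneumonia"]["yes"] + points["diabetes"]["no"]
print(round(total, 2), round(probability_from_total(total, max_abs, prior=0.2), 4))
```

Categories with a likelihood ratio below 1 receive negative points, which is how a naïve Bayesian nomogram shows risk-reducing attribute values.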

Evaluation of the performance of the nomogram

A stratified 10‐fold cross‐validation method was implemented during model training. For this purpose, the data were divided into 10 folds; in each round, one fold was used as the test subset and the remaining nine folds as the training subset.

Five different discrimination metrics were used: the ROC curve, CA, F1 score, precision, and recall. The ROC curve plots sensitivity against 1‐specificity for various cut‐off points. The closer the curve is to the upper left corner of the graph, the higher the true‐positive rate relative to the false‐positive rate. The AUC, defined as the area between the ROC curve and the (1‐specificity) x‐axis, is used to assess accuracy. The AUC ranges between 0.5 and 1.0, where 0.5 denotes poor discrimination and 1.0 denotes perfect discrimination. CA is the ratio of correctly predicted observations to the total number of predictions. Precision is calculated as the correctly predicted positives divided by the total predicted positives. Recall is the number of true positives divided by the sum of true positives and false negatives. Finally, the F1 score is the harmonic mean of precision and recall.

Calibration of the nomograms was assessed by calibration plots, which show observed probabilities against the predicted probabilities calculated with the nomogram. The closer the observed outcomes are to the predicted outcomes, the better the concordance between the curves.

Finally, decision curve analysis (DCA) was performed to assess the clinical usefulness of the nomograms. The graph shows the clinical net benefit according to various threshold probabilities. The net benefit is calculated as follows:

$$\text{Net benefit} = \frac{TP}{N} - \frac{FP}{N} \times \frac{p_t}{1 - p_t}$$

where TP and FP are the numbers of true‐positive and false‐positive predictions at threshold probability $p_t$, and N is the total number of patients. There are two reference lines in the graph: “intervention‐for‐all patients” (light gray line) and “intervention‐for‐none” (thin black line). A nomogram should be superior to both references to justify its use in clinical practice.
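As a minimal sketch of the metrics described above, the following Python functions compute CA, precision, recall, and F1 from the four cells of a confusion matrix, together with the net benefit at one threshold probability. F1 is computed as the harmonic mean of precision and recall, and the net benefit uses the standard decision-curve definition. The function names and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return CA, precision, recall, and F1 for a binary classifier."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total          # CA: correct predictions / all predictions
    precision = tp / (tp + fp)            # correct positives / predicted positives
    recall = tp / (tp + fn)               # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"CA": accuracy, "precision": precision, "recall": recall, "F1": f1}

def net_benefit(tp, fp, n, threshold):
    """Net benefit at a threshold probability in (0, 1).

    False positives are weighted by the odds of the threshold, so a higher
    threshold penalizes unnecessary interventions more heavily.
    """
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

# Hypothetical confusion matrix for 1,000 patients.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
print(metrics)
print(net_benefit(tp=80, fp=20, n=1000, threshold=0.2))
```

In a decision curve, `net_benefit` would be evaluated over a grid of thresholds, recomputing `tp` and `fp` at each one, and compared with the two reference strategies.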

Statistical analysis

Logistic regression analysis was performed with IBM SPSS (Version 23). Nomograms, ROC curves, and discrimination measures were obtained with Orange software version 3.27.1. Calibration curves and DCA were plotted using R software version 4.0.0 (“rms” and “rmda” packages, respectively). Continuous variables were expressed as mean ± standard deviation, and categorical variables were summarized as frequency and percentage. A two‐sided p value of less than .05 was considered statistically significant.

RESULTS

A total of 2,559,993 suspected cases of COVID‐19 (over 18 years) were admitted to hospitals in Mexico between January 1 and November 22, 2020. Of these, 979,430 patients (38.3%) were diagnosed with COVID‐19. Among them, 22.3% were hospitalized and 9.8% died. The mean age of the patients was 45.3 ± 15.9 years, and 51.1% of them were male. Descriptive statistics of the hospitalization and mortality status of COVID‐19 patients are summarized in Table 1.

Table 1

Descriptive statistics of hospitalization and mortality status of COVID‐19 patients

Variables	Hospitalization: No, n (%)	Hospitalization: Yes, n (%)	Mortality: No, n (%)	Mortality: Yes, n (%)
Ageᵃ	41.77 ± 14.15	57.68 ± 15.29	43.37 ± 14.85	62.94 ± 13.79
Sex (male)	368,766 (48.5)	131,889 (60.4)	76,716 (58.3)	55,173 (63.5)
Smoking (yes)	57,288 (7.5)	17,039 (7.8)	9959 (7.6)	7080 (8.2)
Pneumonia (yes)	24,236 (3.2)	142,483 (65.2)	77,069 (58.6)	65,414 (75.3)
Chronic kidney failure (yes)	6206 (0.8)	11,847 (5.4)	5340 (4.1)	6507 (7.5)
CORD (yes)	5425 (0.7)	8125 (3.7)	3883 (3.0)	4242 (4.9)
Diabetes (yes)	75,874 (10.0)	72,841 (33.5)	38,755 (29.6)	34,086 (39.4)
Hypertension (yes)	105,901 (14.0)	83,613 (38.4)	43,383 (33.1)	40,230 (46.5)
Cardiovascular (yes)	9034 (1.2)	9391 (4.3)	4685 (3.6)	4706 (5.4)
Immunosuppressed (yes)	5083 (0.7)	4728 (2.2)	2560 (2.0)	2168 (2.5)
Obesity (yes)	120,705 (15.9)	50,471 (23.2)	29,561 (22.5)	20,910 (24.2)
Asthma (yes)	19,568 (2.6)	4772 (2.2)	3090 (2.4)	1682 (1.9)

Abbreviations: CORD: Chronic obstructive pulmonary disease; COVID‐19, coronavirus disease 2019.

ᵃ Mean ± standard deviation

Univariable logistic regression showed that gender, age, smoking, and the presence of pneumonia, diabetes, CORD, asthma, immunosuppression, hypertension, cardiovascular disease, obesity, and chronic kidney failure were associated with the risk of hospitalization (Model 1) or mortality (Model 2) due to COVID‐19 (all p < .001). The most important risk factor in both models was pneumonia, which was associated with a 57.06‐fold and a 22.20‐fold increase in the odds of hospitalization and mortality, respectively (Table 2).

Table 2

Univariable logistic regression analysis predicting hospitalization (Model 1) and mortality (Model 2)

	Model 1 (n = 979,430)				Model 2 (n = 218,399)			
Variables	Odds ratio	95% CI (lower)	95% CI (upper)	p Value	Odds ratio	95% CI (lower)	95% CI (upper)	p Value
Pneumonia (Ref: no)	57.06	56.18	57.95	<.001	22.20	21.85	22.55	<.001
Chronic kidney failure (Ref: no)	6.98	6.77	7.20	<.001	6.42	6.23	6.62	<.001
CORD (Ref: no)	5.39	5.20	5.58	<.001	5.05	4.87	5.24	<.001
Diabetes (Ref: no)	4.53	4.48	4.58	<.001	4.44	4.38	4.51	<.001
Hypertension (Ref: no)	3.85	3.81	3.89	<.001	4.35	4.29	4.41	<.001
Cardiovascular (Ref: no)	3.74	3.64	3.86	<.001	3.78	3.66	3.91	<.001
Immunosuppressed (Ref: no)	3.29	3.17	3.43	<.001	2.97	2.84	3.11	<.001
Gender (Ref: female)	1.62	1.61	1.64	<.001	1.77	1.75	1.80	<.001
Obesity (Ref: no)	1.60	1.58	1.62	<.001	1.60	1.57	1.62	<.001
Age	1.07	1.07	1.08	<.001	1.08	1.08	1.09	<.001
Smoking (Ref: no)	1.04	1.02	1.06	<.001	1.08	1.05	1.10	<.001
Asthma (Ref: no)	0.85	0.82	0.87	<.001	0.77	0.73	0.81	<.001

Abbreviations: CORD: chronic obstructive pulmonary disease; Ref: reference.
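For a single binary risk factor, a univariable logistic regression odds ratio equals the cross-product ratio of the corresponding 2 × 2 table, so the pneumonia row of Model 1 can be checked by hand from the counts in Table 1. The sketch below does this with a Wald 95% confidence interval. The group totals are an assumption: the number of hospitalized patients is taken to be the Model 2 sample size (218,399), and the number of non-hospitalized patients is inferred as 979,430 − 218,399. The function name is hypothetical.

```python
import math

def odds_ratio_with_ci(exposed_cases, total_cases,
                       exposed_controls, total_controls, z=1.959964):
    """Odds ratio and Wald confidence interval from a 2 x 2 table."""
    a = exposed_cases                      # outcome yes, risk factor yes
    b = total_cases - exposed_cases        # outcome yes, risk factor no
    c = exposed_controls                   # outcome no, risk factor yes
    d = total_controls - exposed_controls  # outcome no, risk factor no
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Pneumonia counts from Table 1; group totals are inferred (assumption).
hospitalized = 218_399
not_hospitalized = 979_430 - hospitalized
or_, lo, hi = odds_ratio_with_ci(142_483, hospitalized, 24_236, not_hospitalized)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Under these assumptions the result agrees closely with the pneumonia entry for Model 1 in Table 2.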

Two nomograms for predicting the probability of hospitalization or mortality were constructed via the naïve Bayes classifier method. A typical nomogram comprises a point line (topmost line, ranging from −100 to 100) used to read off each risk factor's points, a straight line for each risk factor, a total point line, and finally a probability line. The more important an attribute (risk factor), the longer its line in the nomogram. Attribute values for each risk factor are assigned points via the point line ($a_{ij}$). The probability of the outcome is estimated by summing the points of the attribute values (total points) and matching this sum to a probability through the total point line and the probability line.

In the current study, the risk factors were sorted in order of absolute importance (the length of the line) in the nomograms. Accordingly, the most important risk factors to predict the probability of hospitalization were pneumonia (yes: 100 points), age (>55: 40 points), chronic kidney failure (yes: 62.75 points), CORD (yes: 54.71 points), and diabetes (yes: 40 points) (Figure 1A). Similarly, the most important risk factors to predict the probability of mortality were age (>55: 47.86 points), pneumonia (yes: 74.12 points), chronic kidney failure (yes: 69.69 points), CORD (yes: 61.35 points), and diabetes (yes: 43.87 points) (Figure 2A).

Figure 1

(A) Nomogram to estimate the risk of hospitalization in patients with COVID‐19. (B) ROC curve for discrimination of the nomogram. (C) Calibration plot for calibration of the nomogram. (D) Decision curve analysis for hospitalization. CORD, chronic obstructive pulmonary disease; COVID‐19, coronavirus disease 2019; ROC, receiver operating characteristic

Figure 2

(A) Nomogram to estimate the risk of mortality in patients with COVID‐19. (B) ROC curve for discrimination of the nomogram. (C) Calibration plot for calibration of the nomogram. (D) Decision curve analysis for mortality. CORD, chronic obstructive pulmonary disease; COVID‐19, coronavirus disease 2019; ROC, receiver operating characteristic

The calibration curves showed good consistency for predicting mortality and perfect consistency for predicting hospitalization (Figures 1C and 2C). The discrimination of the constructed nomograms, evaluated with the AUC, CA, F1, precision, and recall metrics, showed almost excellent performance (Table 3). The clinical usefulness of the nomograms was evaluated by DCA. The analysis revealed that both nomograms had a higher net benefit than “intervention‐for‐all‐patients” or “intervention‐for‐none,” and that the nomogram for predicting hospitalization had more benefit than the nomogram for predicting mortality (Figures 1D and 2D).

Table 3

Evaluation of discrimination performances for predicting hospitalization and mortality

	AUC	CA	F1	Precision	Recall
Hospitalization	0.896	0.880	0.879	0.878	0.880
Mortality	0.903	0.899	0.903	0.907	0.899

Abbreviations: AUC, area under ROC curve; CA, classification accuracy.

DISCUSSION

In this study, two novel nomograms for predicting the risk of hospitalization and mortality in COVID‐19 patients were developed and validated. Although various nomograms have been developed in studies on COVID‐19, we analyzed a larger sample (979,430 patients) via the naïve Bayesian classifier technique. The predictive validity of the nomograms was verified with calibration and discrimination.

Although many studies have used logistic regression analysis for building nomograms, the naïve Bayesian nomogram has several advantages. First, the naïve Bayesian nomogram considers both negative and positive influences of the point values of risk factors (range between −100 and 100). Conversely, the logistic regression nomogram depicts only negative influence over risk factors. Second, the naïve Bayes nomogram may be used in the presence of missing data, in contrast to the logistic regression method. Finally, the Bayesian nomogram is based on simpler calculations and principles than the logistic regression nomogram. Therefore, the naïve Bayes nomogram has been preferred in recent years.

The results demonstrated that pneumonia and age had the greatest influence on hospitalization (range, −35 to 100) and mortality (range, −100 to 50), respectively. The smallest influencing factor was smoking status for both nomograms. The most significant risk factor was pneumonia for hospitalization (100 points) and mortality (74.12 points). Coronavirus can affect multiple organs, but the most common indication for hospitalization is viral pneumonia. In addition, some studies have reported that COVID‐19 causes a higher mortality rate in individuals aged 50 years or older. In the United States, mortality ranged from 3% to 11% among persons aged 65–84 years and from 1% to 3% among persons aged 55–64 years. Another study examining COVID‐19 reports from 16 countries showed that individuals aged 65 years or older had about 62 times, and those aged 55–64 years about 8.1 times, the mortality rate of individuals aged 54 years or younger.

This study indicated that common comorbidities were associated with COVID‐19‐related hospitalization and mortality. Although the presence of chronic kidney failure, CORD, cardiovascular disease, and immunosuppression increased the risk of hospitalization or mortality due to COVID‐19, the absence of these risk factors had little or no effect. Conversely, the absence of diabetes, hypertension, and obesity decreased the risk for both outcomes. These comorbidities have been associated with COVID‐19 in previous studies. In accordance with the literature, our results showed that being male increased the risk of mortality. Interestingly, according to our analysis, the presence of asthma appears to be a risk‐reducing factor, but its influence was very small. Indeed, the relationship between asthma and COVID‐19 is still uncertain. Smoking does not seem to have a notable increasing or decreasing effect on risk in this study. Surprisingly, Farsalinos et al. stated that nicotine may be considered as a potential treatment option.

This study has some limitations. The dataset lacks some information, such as laboratory results and treatment of comorbidities, which could have been useful to better understand the clinical pattern of COVID‐19. In conclusion, the proposed Bayesian nomograms can be used to assess patients with COVID‐19 symptoms and to facilitate medical decision‐making.

CONFLICT OF INTERESTS

The authors declare that there is no conflict of interest.

26 in total

Review 1. How to build and interpret a nomogram for cancer prognosis.

Authors: Alexia Iasonos; Deborah Schrag; Ganesh V Raj; Katherine S Panageas
Journal: J Clin Oncol Date: 2008-03-10 Impact factor: 44.544

2. Development and validation of a simplified nomogram predicting individual critical illness of risk in COVID-19: A retrospective study.

Authors: Ranran Xu; Junwei Cui; Liu Hu; Yiru Wang; Tao Wang; Dawei Ye; Yongman Lv; Qingquan Liu
Journal: J Med Virol Date: 2020-10-14 Impact factor: 2.327

3. Discrimination and Calibration of Clinical Prediction Models: Users' Guides to the Medical Literature.

Authors: Ana Carolina Alba; Thomas Agoritsas; Michael Walsh; Steven Hanna; Alfonso Iorio; P J Devereaux; Thomas McGinn; Gordon Guyatt
Journal: JAMA Date: 2017-10-10 Impact factor: 56.272

4. Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China.

Authors: Shaobo Shi; Mu Qin; Bo Shen; Yuli Cai; Tao Liu; Fan Yang; Wei Gong; Xu Liu; Jinjun Liang; Qinyan Zhao; He Huang; Bo Yang; Congxin Huang
Journal: JAMA Cardiol Date: 2020-07-01 Impact factor: 14.676

5. Systematic review of the prevalence of current smoking among hospitalized COVID-19 patients in China: could nicotine be a therapeutic option?

Authors: Konstantinos Farsalinos; Anastasia Barbouni; Raymond Niaura
Journal: Intern Emerg Med Date: 2020-05-09 Impact factor: 3.397

6. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis.

Authors: Jing Yang; Ya Zheng; Xi Gou; Ke Pu; Zhaofeng Chen; Qinghong Guo; Rui Ji; Haojia Wang; Yuping Wang; Yongning Zhou
Journal: Int J Infect Dis Date: 2020-03-12 Impact factor: 3.623

7. A Predicting Nomogram for Mortality in Patients With COVID-19.

Authors: Deng Pan; Dandan Cheng; Yiwei Cao; Chuan Hu; Fenglin Zou; Wencheng Yu; Tao Xu
Journal: Front Public Health Date: 2020-08-11

8. Early prediction of disease progression in COVID-19 pneumonia patients with chest CT and clinical characteristics.

Authors: Zhichao Feng; Qizhi Yu; Shanhu Yao; Lei Luo; Wenming Zhou; Xiaowen Mao; Jennifer Li; Junhong Duan; Zhimin Yan; Min Yang; Hongpei Tan; Mengtian Ma; Ting Li; Dali Yi; Ze Mi; Huafei Zhao; Yi Jiang; Zhenhu He; Huiling Li; Wei Nie; Yin Liu; Jing Zhao; Muqing Luo; Xuanhui Liu; Pengfei Rong; Wei Wang
Journal: Nat Commun Date: 2020-10-02 Impact factor: 14.919

9. The Clinical and Chest CT Features Associated With Severe and Critical COVID-19 Pneumonia.

Authors: Kunhua Li; Jiong Wu; Faqi Wu; Dajing Guo; Linli Chen; Zheng Fang; Chuanming Li
Journal: Invest Radiol Date: 2020-06 Impact factor: 10.065
