Using logistic regression to develop a diagnostic model for COVID-19: A single-center study.

Raoof Nopour, Mostafa Shanbehzadeh, Hadi Kazemi-Arpanahi.

Abstract

BACKGROUND: The main manifestations of coronavirus disease-2019 (COVID-19) are similar to those of many other respiratory diseases. In addition, the numerous uncertainties in the prognosis of this condition have increased the need for a valid and accurate prediction model. This study aimed to develop a diagnostic model based on logistic regression to enhance the diagnostic accuracy of COVID-19.
MATERIALS AND METHODS: A standardized diagnostic model was developed on data from 400 patients who were referred to Ayatollah Talleghani Hospital, Abadan, Iran, for COVID-19 diagnosis. We used the Chi-square correlation coefficient for feature selection, and logistic regression in SPSS V25 software to model the relationship between each of the clinical features. Potentially diagnostic determinants extracted from the patient's history, physical examination, and laboratory and imaging testing were entered into a logistic regression analysis. The discriminative ability of the model was expressed as sensitivity, specificity, accuracy, and area under the curve.
RESULTS: After the correlation of each diagnostic regressor with COVID-19 was determined using the Chi-square method, 15 important regressors were obtained at the level of P < 0.05. The experimental results demonstrated that the binary logistic regression model yielded specificity, sensitivity, and accuracy of 97.3%, 98.8%, and 98.2%, respectively.
CONCLUSION: The destructive effects of the COVID-19 outbreak and the shortage of healthcare resources for fighting this pandemic call for greater attention to Clinical Decision Support Systems equipped with supervised learning classification algorithms such as logistic regression.
Copyright: © 2022 Journal of Education and Health Promotion.

Keywords:  Coronavirus; coronavirus disease-2019; logistic regression; prognostic modeling

Year:  2022        PMID: 35847143      PMCID: PMC9277749          DOI: 10.4103/jehp.jehp_1017_21

Source DB:  PubMed          Journal:  J Educ Health Promot        ISSN: 2277-9531


Introduction

In December 2019, a new strain of coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) or 2019-nCoV, emerged in Wuhan, Hubei Province, China. It is thought that SARS-CoV-2 has animal origins and spilled over from animal species into the human population.[1,2] On February 11, 2020, the World Health Organization (WHO) announced coronavirus disease 2019 (COVID-19) as the name of this new disease.[3,4] COVID-19 is a highly contagious viral infectious disease that continues to spread aggressively around the world. The virus affects the lungs, causes severe pneumonia, and is more dangerous in people with underlying conditions such as impaired immune function, cardiovascular and respiratory diseases, malignancies, co-infections, and diabetes.[5,6,7,8,9,10] The complex and highly contagious nature of COVID-19 led the WHO to declare this outbreak a public health emergency, which brought significant health, economic, and social challenges.[11,12] It is essential to be able to differentiate COVID-19 from other pneumonia-like diseases early after symptom onset.[4] Given the rapid spread and rising incidence of COVID-19, early diagnosis and rapid isolation of infected people play a key role in containing the virus and thereby reducing the outbreak and the mortality rate.[9] Deadly complications, a long latency period, difficulties of detection and testing, many unknown characteristics, ambiguous transmission modes, and the differential diagnosis from other respiratory morbidities such as pneumonia have increased the uncertainty and the challenges in understanding the risk factors leading to the disease, its natural history, and the control of the outbreak.[13] In addition, positive cases must be detected and isolated as rapidly and accurately as possible to contain transmission of the virus, especially asymptomatic cases at an early stage.[4]

Currently, accurate nucleic acid detection plays a key role in COVID-19 detection and containment, and reverse transcription-polymerase chain reaction (RT-PCR) is the core technique for nucleic acid detection. However, this time-consuming and labor-intensive method cannot be used to screen a large and growing number of suspected individuals and asymptomatic infected cases.[14] Thus, an accurate and reliable diagnostic method that provides patient risk classification in support of clinical decision-making can help reduce misdiagnosis and poor prognosis of COVID-19, with the hope of improving patient outcomes and quality of care.[13] Risk assessment has therefore become increasingly imperative during this pandemic. Several statistical methods can be used to develop a risk prediction model, including but not restricted to logistic regression, linear regression, Cox regression, and machine learning (ML).[15,16,17] To address the uncertainties in COVID-19 diagnosis, logistic regression, a supervised learning classification algorithm, can be used to provide patient risk stratification in support of tailored clinical decision-making, including measuring the probability of a disease, assessing the disease likelihood, forecasting the spread, and predicting fatality.[13,18,19,20,21] In studies of the risk factors associated with COVID-19, logistic regression has become a fundamental part of any data analysis concerned with explaining the association between an outcome variable and one or more predictor variables. Therefore, this study was undertaken to develop a diagnostic model to predict the risk of COVID-19. The model can be used as a quick screening tool to improve the diagnostic efficiency of COVID-19 through statistical analysis of the significant factors.

Materials and Methods

Study design and setting

A hospital-based, retrospective, applied study was designed at Ayatollah Talleghani Hospital (a COVID-19 referral center), Abadan, Iran, to develop a diagnostic model for COVID-19 using logistic regression analysis.

Study participants and sampling

The study participants comprised the case records of 435 patients who were referred to the center for COVID-19 diagnosis. Of the 435 case records, 35 with missing data were excluded from the analysis. Among the 400 eligible records, 250 belonged to people with laboratory-confirmed infection, and 150 belonged to non-COVID-19 patients suffering from pneumonia of other origins.

Model development and evaluation

For model development, multiple analyses were conducted to extract the most important regressors from the patients' medical records, where COVID-19-positive cases were coded as 1 and healthy individuals (the non-COVID-19 group) as 0. For preliminary analysis of the data, the Chi-square correlation coefficient method was used. Then, the diagnostic regressors that were correlated with the output class variable (COVID-19-positive versus non-COVID-19) at the level of P < 0.05 were selected. Furthermore, because the output class is binary, binary logistic regression (BLR) with the backward conditional method was used to create a suitable model for COVID-19 diagnosis. In the current logistic regression model, success was defined as the dependent variable taking the value positive COVID-19. Finally, the efficiency of the developed model was evaluated using the log-likelihood (Equation 1) and the confusion matrix, from which its accuracy, precision, and sensitivity were determined. The log-likelihood is a function of the observed and predicted values according to Equation (1), and a reduction in it when variables are added indicates better model performance. In the confusion matrix, the true positive (TP) and true negative (TN) counts represent the number of samples belonging to sick and healthy individuals, respectively, that are correctly classified by the model. The false positive (FP) count represents the number of healthy individuals falsely classified as patients by the model. The false negative (FN) count represents the patients whom the model incorrectly classified as healthy. The specificity, sensitivity, and accuracy of the model were obtained from these counts (Equations 1-3).
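As a minimal sketch of the quantities named in this paragraph, the standard textbook forms of the binary log-likelihood and of the three confusion-matrix measures are written out below. The notation is an assumption introduced here and may differ from the authors' own: $y_i$ is the observed class of case $i$ (1 = COVID-19 positive, 0 = otherwise), $\hat{p}_i$ is the probability predicted by the model, and $n$ is the number of cases.

```latex
\begin{align*}
\ln L &= \sum_{i=1}^{n} \left[\, y_i \ln \hat{p}_i + (1 - y_i) \ln\!\left(1 - \hat{p}_i\right) \right] \\[4pt]
\text{Sensitivity} &= \frac{TP}{TP + FN} \\[4pt]
\text{Specificity} &= \frac{TN}{TN + FP} \\[4pt]
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align*}
```

Statistical packages often report $-2 \ln L$ rather than $\ln L$ itself; a smaller $-2 \ln L$ corresponds to a closer fit between observed and predicted values.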
Table 1

If term removed model (step 5)

Variable name | Model log-likelihood | Change in −2 log-likelihood | Df | Significance of change
History of contact with suspected people | −21.441 | 5.527 | 1 | 0.19
History of alcohol consumption | −19.022 | 0.668 | 1 | 0.407
History of immunosuppressive drug intake | −18.849 | 0.342 | 1 | 0.552
History of ARDS | −19.280 | 1.203 | 1 | 0.273
Fever | −28.478 | 19.601 | 1 | 0
Cough | −20.915 | 4.473 | 1 | 0.034
Chest pain | −19.452 | 1.548 | 1 | 0.213
Disability | −19.649 | 1.941 | 1 | 0.164
Rhinorrhea | −22.991 | 8.626 | 1 | 0.003
Lung lesion | −51.455 | 65.554 | 1 | 0

ARDS=Acute respiratory distress syndrome, Df=Degree of freedom
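
The "Significance of change" column in Table 1 is the kind of value usually obtained from a likelihood-ratio test, in which the change in −2 log-likelihood is referred to a chi-square distribution with the stated degrees of freedom. The sketch below assumes that this is how the column was produced (the article does not say so explicitly) and handles only Df = 1, for which the chi-square tail probability has a closed form. The function names are hypothetical, and recomputed values need not match the printed column exactly.

```python
import math


def change_in_minus2ll(ll_full: float, ll_reduced: float) -> float:
    """Change in -2 log-likelihood when a term is dropped from the full model."""
    return 2.0 * (ll_full - ll_reduced)


def lr_test_p_value_df1(change: float) -> float:
    """Upper-tail chi-square probability for 1 degree of freedom.

    For df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    if change < 0:
        raise ValueError("change in -2 log-likelihood must be non-negative")
    return math.erfc(math.sqrt(change / 2.0))


# A few "Change in -2 log-likelihood" values copied from Table 1.
table1_changes = {
    "Fever": 19.601,
    "Cough": 4.473,
    "Rhinorrhea": 8.626,
    "Chest pain": 1.548,
}

for name, change in table1_changes.items():
    print(f"{name}: change = {change:.3f}, p = {lr_test_p_value_df1(change):.4f}")
```

A term whose removal produces a large change (a small p-value) is one the model depends on; terms with small changes are the candidates for removal at the next backward step.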

The area under the curve

The area under the curve (AUC) is equal to the probability that a classifier will score a randomly selected positive sample higher than a randomly selected negative one. The area under the curve is given by Equations 4 and 5 (the integral boundaries are reversed because a large threshold T corresponds to a lower value on the x-axis).[22]
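
A hedged sketch of the usual way this relationship is written is given below; it is the standard textbook form and not necessarily the authors' exact equations. The symbols are assumptions introduced here: $T$ is the classification threshold, $\mathrm{TPR}(T)$ and $\mathrm{FPR}(T)$ are the true and false positive rates at that threshold, and $X_1$ and $X_0$ are the scores of a randomly chosen positive and negative sample.

```latex
\begin{align*}
\mathrm{AUC} &= \int_{0}^{1} \mathrm{TPR}\!\left(\mathrm{FPR}^{-1}(x)\right) dx
             = \int_{\infty}^{-\infty} \mathrm{TPR}(T)\, \mathrm{FPR}'(T)\, dT \\[4pt]
\mathrm{AUC} &= P\!\left(X_1 > X_0\right)
\end{align*}
```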

Ethical consideration

The research deputy of Abadan University of Medical Sciences (ethical code: IR.ABADANUMS.REC.1400.064) approved the current study. To protect the privacy and confidentiality of patients, we concealed the unique identifying information of all patients in the process of data collection and presentation.

Results

After all the diagnostic regressors (predictors) of COVID-19 were analyzed using the Chi-square bivariate correlation method, 15 important variables were identified, including the following. Blood oxygen saturation (P < 0.01) was obtained as a useful laboratory finding in the diagnosis of COVID-19. History of alcohol consumption (P = 0.01), consumption of immunosuppressive drugs (P < 0.01), history of respiratory failure (acute respiratory distress syndrome) (P < 0.01), and history of respiratory infection (pneumonia) (P = 0.01) were considered important diagnostic factors. History of travel to high-risk areas (P < 0.01) and history of contact with people with suspected COVID-19 (P < 0.01) were considered effective epidemiological factors in diagnosing COVID-19. The COVID-19 diagnostic model based on binary logistic regression analysis with the conditional backward method was developed in five steps, as shown in Table 1. In this method, the less important diagnostic regressors were removed in five steps (each step corresponding to the removal of one variable) [Table 1], leaving the 10 diagnostic regressors that are the most important variables in creating a regression model for diagnosing COVID-19. As the less important variables were removed, the log-likelihood function was reduced for a certain degree of freedom (DF = 1); therefore, the efficiency of the logistic regression model in diagnosing COVID-19 improved over the five sequential steps. Figures 1 and 2 show the classification of the predicted probability groups by the regression model in terms of the frequencies observed at the 1st and 5th steps of this model, respectively.
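
The Chi-square screening step described here can be sketched for a single binary predictor as follows. This is an illustration under stated assumptions, not the SPSS procedure the authors ran: it uses the Pearson chi-square statistic for a 2×2 table without continuity correction, the function names are hypothetical, and the example counts are invented for demonstration and are not taken from the study data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Rows: predictor present / absent. Columns: COVID-19 positive / negative.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one case")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1 the chi-square upper tail is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def select_features(tables: dict[str, tuple[int, int, int, int]],
                    alpha: float = 0.05) -> list[str]:
    """Keep the predictors whose association with the outcome has p < alpha."""
    return [name for name, t in tables.items() if chi_square_2x2(*t)[1] < alpha]


# Hypothetical counts (present&positive, present&negative, absent&positive, absent&negative).
hypothetical_tables = {
    "fever": (200, 50, 60, 90),
    "headache": (125, 125, 75, 75),
}
print(select_features(hypothetical_tables))
```

Predictors that pass this screen would then be entered into the logistic regression, where backward elimination removes those that contribute little once the other predictors are taken into account.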
Figure 1

The predicted probability group according to the different iteration frequency (step 1)

Figure 2

The predicted probability group according to the different iteration frequency (step 5)

The criterion for classifying healthy and sick people according to the predicted value (cutoff value) in these figures is 0.5; in other words, the predicted value is higher than 0.5 in the group of sick people and lower than 0.5 in the group of healthy people. Furthermore, each number (case) in the figure represents five samples. As shown in Figure 1, a case representing COVID-19 at the iterated frequency of 10 was incorrectly classified in the predicted probability group 0 < X < 0.1 (X = predicted probability group). However, at the 5th step of the regression model, all cases were classified correctly. Furthermore, in the 5th step, all the cases predicted by the model were in accordance with the actual cases for the different iterated frequencies, indicating better performance than in the first step. The cases without disease are also better classified in the 5th step than in the 1st step, which contained the incorrectly classified case described above. The results of classifying the sick and healthy individuals at the 1st and 5th steps of the BLR function are shown in Table 2.
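
The cutoff rule described above can be sketched as follows. The logistic function is the standard one; the intercept, coefficients, and feature values are hypothetical placeholders, since the fitted coefficients are not reported in the article, and the choice to treat a probability of exactly 0.5 as negative is an assumption.

```python
import math


def predicted_probability(intercept: float,
                          coefficients: list[float],
                          features: list[float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-z)), z = intercept + sum(b_j * x_j)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability: float, cutoff: float = 0.5) -> int:
    """Return 1 (COVID-19 positive) above the cutoff, otherwise 0."""
    return 1 if probability > cutoff else 0


# Hypothetical coefficients for two binary predictors (not the study's values).
p = predicted_probability(intercept=-2.0, coefficients=[1.5, 3.0], features=[1, 1])
print(round(p, 3), classify(p))
```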
Table 2

The 1st and 5th steps of the binary logistic regression confusion matrix

BLR's steps | Cases classified
1st step | 197 | 35
 | 53 | 115
5th step | 247 | 4
 | 3 | 146
Based on the comparison of Figures 1 and 2, removing the less important variables from the regression model increased the number of TP and TN cases (correctly classified items) at the 5th step compared with the first step, as Figure 2 shows. In the 1st step, 78% of the cases were correctly classified, and in the 5th step, 98%. Therefore, omitting the less important diagnostic criteria from the model significantly improved its efficiency, and the percentage of correctly classified cases increased by 20 percentage points, as shown in Figure 3. The confusion matrix in Table 2 (step 5) yields a specificity, sensitivity, and accuracy of 97.3%, 98.8%, and 98.2%, respectively.
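
The reported percentages can be checked against the counts in Table 2 with a short calculation. The row and column labels of Table 2 are not preserved, so the assignment of cells to TP, FP, FN, and TN below is an assumption: it treats the two columns as the actual classes, which makes the column totals equal the 250 confirmed and 150 non-COVID-19 cases. The function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts read from Table 2 under the assumed cell layout.
step1 = diagnostic_metrics(tp=197, fp=35, fn=53, tn=115)
step5 = diagnostic_metrics(tp=247, fp=4, fn=3, tn=146)

for label, metrics in (("step 1", step1), ("step 5", step5)):
    summary = ", ".join(f"{k} = {v:.1%}" for k, v in metrics.items())
    print(f"{label}: {summary}")
```

Under this layout the step 5 counts give a sensitivity of 247/250, a specificity of 146/150, and an accuracy of 393/400, in line with the figures quoted in the text, and the step 1 accuracy is 312/400.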
Figure 3

Comparison of the BLR values

The results of the receiver operator characteristics (ROC) analysis at all five steps are shown in Figure 4. The vertical and horizontal axes show the TP rate (TPR) and FP rate, respectively.
Figure 4

The receiver operator characteristics graph of BLR in five steps

Based on Figure 4, in the 5th step the ROC curve lay closer to the TPR axis than in the initial steps; at this step the model therefore performed better in classifying the true cases and, in general, in classifying the study samples.
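
The area under an ROC curve such as the ones in Figure 4 can be estimated directly from predicted probabilities by using the ranking interpretation of the AUC: the fraction of positive-negative pairs in which the positive case receives the higher score. The sketch below is a minimal illustration of that idea, with ties counted as one half; the function name and the example scores are hypothetical and are not the study's predictions.

```python
def auc_by_pairwise_ranking(positive_scores: list[float],
                            negative_scores: list[float]) -> float:
    """Estimate AUC as P(score of a positive > score of a negative)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0      # positive ranked above negative
            elif pos == neg:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predicted probabilities for three positive and three negative cases.
print(auc_by_pairwise_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

A value of 1.0 means every positive case is ranked above every negative case, and 0.5 corresponds to ranking no better than chance. The double loop is quadratic in the number of cases, which is acceptable for a dataset of a few hundred records.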

Discussion

The exponential outbreak of COVID-19 and the continuously increasing number of infected cases without effective treatment force the medical community to investigate the many unfamiliar aspects of the disease.[23,24] In this situation, the health-care industry is under great pressure, with severe shortages of intensive care resources.[25,26] Therefore, developing an accurate diagnostic model that can effectively predict the presence of COVID-19 (diagnosis) from important prognostic determinants is vital.[12] This study proposed an understandable, intuitive, and yet accurate prediction model using logistic regression based on the most important predictors. The goal of a logistic regression analysis is the same as that of the traditional model-building techniques used in statistical theory: to find a suitable description of the relationship between the outcome variable and the predictor variables. Logistic regression was employed to assess hypotheses about the associations of the response variable with explanatory variables. Unlike discriminant analysis, it does not require normally distributed data.[4,19] Logistic regression helps to forecast a discrete outcome from a variety of variables. In the logistic regression model, the outcome variable is treated as a categorical random variable with only two possible outcomes, termed binary or dichotomous. Logistic regression allows physicians and researchers to recognize significant contributing factors related to COVID-19 and to evaluate the extent of the effect of those factors.[27,28]

Some efforts have focused on predicting COVID-19 using regression models and routinely collected clinical data. Almeshal et al. devised compartmental and logistic regression (LR) models to predict the spread of COVID-19 in Kuwait; their study revealed the high performance of the LR model in predicting the peak of daily cases, the total number of infected cases, and the expected start and end dates of the epidemic.[13] Zhou et al. used ML based on logistic regression to predict the fatality of COVID-19 patients and reported that it can effectively predict patient outcomes with fatality probabilities (accumulative F1-score = 93.76% and accuracy = 93.92%).[29] Xu et al. applied ordinal logistic regression analysis to identify the determinants of illness severity in COVID-19.[21] Bhandari et al. developed a model to predict mortality risk in COVID-19 patients from routine hematologic parameters. With 5-fold cross-validation, the model showed an area under the receiver operating characteristic curve, sensitivity, specificity, and validation accuracy of 0.95, 90%, 92%, and 70%, respectively.[19] Medina-Mendieta et al. used logistic regression and Gompertz curves to predict peaks and total numbers of infected cases and deaths due to COVID-19. Both models showed good fit and low mean square errors, and all parameters were highly significant.[20] Hu and Li showed that the AUC, sensitivity, and specificity of a logistic regression model for predicting the in-hospital mortality of COVID-19 cases were 0.804, 83.8%, and 82.3%, respectively.[30]

The novelty of this study lies in the use of logistic regression to develop a diagnostic model; it is vital to develop an accurate diagnostic model that can effectively diagnose COVID-19 patients using important prognostic criteria. A scientific and valid LR-based prediction model, while not a substitute for clinical experience, can assist in early detection and effective supportive intervention to improve patient outcomes and quality of care and ultimately decrease deaths among COVID-19 patients.[28,31] Such models reduce ambiguity by offering quantitative, objective, and evidence-based support for risk stratification, prediction, and eventually episode-of-care planning.[32]

Limitations and recommendations

This study had some limitations. First, we analyzed a retrospective dataset, which may suffer from a lack of control over data fields and from incomplete data. Second, the dataset was extracted from a single hospital in one city and had a small sample size (400 records); consequently, the results may not be generalizable. In the future, the inclusion of other clinical and radiological features could help increase the accuracy of the prediction model.

Conclusion

In this study, we demonstrated the use of a logistic regression model to diagnose COVID-19 at an early stage. This method has many potential applications in providing frontline clinicians with an objective instrument to manage COVID-19 patients more efficiently in time-sensitive, resource-demanding, and potentially resource-constrained situations. Finally, we believe further investigations are needed to validate our model in a larger, multicenter, and higher-quality dataset.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.
  25 in total

1.  COVID-19 Forecasts for Cuba Using Logistic Regression and Gompertz Curves.

Authors:  Juan Felipe Medina-Mendieta; Manuel Cortés-Cortés; Manuel Cortés-Iglesias
Journal:  MEDICC Rev       Date:  2020-07       Impact factor: 0.583

2.  Performance of Diagnostic Model for Differentiating Between COVID-19 and Influenza: A 2-Center Retrospective Study.

Authors:  Jingwen Li; Simin Li; Xiaoming Qiu; Wenyan Zhu; Linfeng Li; Bo Qin
Journal:  Med Sci Monit       Date:  2021-05-12

3.  Uncertainty and COVID-19: how are we to respond?

Authors:  Jonathan Koffman; Jamie Gross; Simon Noah Etkind; Lucy Selman
Journal:  J R Soc Med       Date:  2020-06       Impact factor: 5.344

4.  Application of data mining techniques for predicting residents' performance on pre-board examinations: A case study.

Authors:  Leila Amirhajlou; Zohre Sohrabi; Mahmoud Reza Alebouyeh; Nader Tavakoli; Roghye Zare Haghighi; Akram Hashemi; Amir Asoodeh
Journal:  J Educ Health Promot       Date:  2019-06-27

5.  Laboratory Parameters in Detection of COVID-19 Patients with Positive RT-PCR; a Diagnostic Accuracy Study.

Authors:  Rajab Mardani; Abbas Ahmadi Vasmehjani; Fatemeh Zali; Alireza Gholami; Seyed Dawood Mousavi Nasab; Hooman Kaghazian; Mehdi Kaviani; Nayebali Ahmadi
Journal:  Arch Acad Emerg Med       Date:  2020-04-04

6.  Comparing Ventilation Parameters for COVID-19 Patients Using Both Long-Term ICU and Anesthetic Ventilators in Times of Shortage.

Authors:  Wouter M Dijkman; Niels M C van Acht; Jesse P van Akkeren; Rhasna C D Bhagwanbali; Carola van Pul
Journal:  J Intensive Care Med       Date:  2021-06-17       Impact factor: 3.510

Review 7.  Cardiovascular disease and COVID-19.

Authors:  Manish Bansal
Journal:  Diabetes Metab Syndr       Date:  2020-03-25

8.  Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges.

Authors:  Chih-Cheng Lai; Tzu-Ping Shih; Wen-Chien Ko; Hung-Jen Tang; Po-Ren Hsueh
Journal:  Int J Antimicrob Agents       Date:  2020-02-17       Impact factor: 5.283

Review 9.  World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19).

Authors:  Catrin Sohrabi; Zaid Alsafi; Niamh O'Neill; Mehdi Khan; Ahmed Kerwan; Ahmed Al-Jabir; Christos Iosifidis; Riaz Agha
Journal:  Int J Surg       Date:  2020-02-26       Impact factor: 6.071

10.  Machine learning based early warning system enables accurate mortality risk prediction for COVID-19.

Authors:  Yue Gao; Guang-Yao Cai; Wei Fang; Hua-Yi Li; Si-Yuan Wang; Lingxi Chen; Yang Yu; Dan Liu; Sen Xu; Peng-Fei Cui; Shao-Qing Zeng; Xin-Xia Feng; Rui-Di Yu; Ya Wang; Yuan Yuan; Xiao-Fei Jiao; Jian-Hua Chi; Jia-Hao Liu; Ru-Yuan Li; Xu Zheng; Chun-Yan Song; Ning Jin; Wen-Jian Gong; Xing-Yu Liu; Lei Huang; Xun Tian; Lin Li; Hui Xing; Ding Ma; Chun-Rui Li; Fei Ye; Qing-Lei Gao
Journal:  Nat Commun       Date:  2020-10-06       Impact factor: 14.919
