Mostafa Atlam, Hanaa Torkey, Nawal El-Fishawy, Hanaa Salem.
Abstract
Coronavirus (COVID-19) is one of the most serious problems facing the world and has brought normal life to a halt. It has spread so widely that hospital places are not available for all patients; as a result, most hospitals accept patients whose likelihood of recovery is high. Machine learning techniques and artificial intelligence have been deployed for computing infection risks, performing survival analysis and classification. Survival analysis (time-to-event analysis) is widely used in many areas such as engineering and medicine. This paper presents two systems, Cox_COVID_19 and Deep_Cox_COVID_19, that are based on Cox regression to perform survival analysis for COVID-19, help hospitals choose patients with better chances of survival, and predict the most important symptoms (features) affecting survival probability. Cox_COVID_19 is based on Cox regression, and Deep_Cox_COVID_19 combines an autoencoder deep neural network with Cox regression to enhance prediction accuracy. A clinical dataset of 1085 COVID-19 patients is used. The results show that applying an autoencoder to the data to reconstruct features before applying the Cox regression algorithm improves the results by increasing concordance, accuracy and precision. Deep_Cox_COVID_19 has a concordance of 0.983 for training and 0.999 for testing, whereas Cox_COVID_19 has a concordance of 0.923 for training and 0.896 for testing. The most important features affecting mortality are age, muscle pain, pneumonia and throat pain. Both Cox_COVID_19 and Deep_Cox_COVID_19 can predict the survival probability and present significant symptoms (features) that differentiate severe cases and death cases, but the accuracy of Deep_Cox_COVID_19 exceeds that of Cox_COVID_19. Both systems can provide doctors with definite information about detection and the intervention to be taken, which can reduce mortality.
Keywords: COVID-19; Coronavirus; Cox regression; Deep learning; Mortality and autoencoder; Survival analysis; Symptoms
Year: 2021 PMID: 33613099 PMCID: PMC7883884 DOI: 10.1007/s10044-021-00958-0
Source DB: PubMed Journal: Pattern Anal Appl ISSN: 1433-7541 Impact factor: 2.580
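The concordance values quoted in the abstract measure how well a model's predicted risk ranks patients by survival time. The following is a minimal sketch of Harrell's concordance index in plain Python, assuming right-censored data supplied as parallel lists. The function name `concordance_index` and its argument layout are illustrative choices, not taken from the paper, and pairs with tied times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  - observed follow-up time per patient
    events - 1 if the event (death) was observed, 0 if censored
    risks  - predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is comparable only when the earlier time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is ordered the wrong way round.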
Fig. 1 [18]: Taxonomy of survival analysis methods
Fig. 2: Autoencoder components
Fig. 3: Proposed survival analysis system architecture
Fig. 4: Characteristics of patients
COVID-19 clinical dataset description
| Dataset | Target variable: classes | Target variable: duration | Symptoms (total) | Demographics | Common symptoms | Other symptoms | Training samples | Testing samples |
|---|---|---|---|---|---|---|---|---|
| COVID-19 dataset [ | Two classes: death or alive | Time to the event of death or life | 29 | 5 | 7 | 17 | 369 | 140 |
Features’ percentage and P value
| Category | Symptom (feature) | Percentage in dataset | P value | Coefficient |
|---|---|---|---|---|
| Demographics | Age | – | 9.835E-15 | 0.1121 |
| | Gender: male | 61.69% | 0.2095 | −0.4998 |
| | Gender: female | 38.31% | | |
| Common symptoms | Cough | 29.86% | 0.937 | −0.0807 |
| | Fever | 46.56% | 0.8287 | −0.1853 |
| | High fever | 0.4% | 0.9712 | −8.5365 |
| | Joint pain | 1.8% | 0.9524 | −8.4264 |
| | Respiratory distress | 0.2% | 0.9864 | −6.4765 |
| | Dyspnoea | 1.4% | 0.9909 | −2.636 |
| | Difficulty in breathing | 3.14% | 0.1446 | 1.7397 |
| | Malaise | 5.5% | 0.9396 | −6.7776 |
| | Fatigue | 2.16% | 0.1021 | 2.3038 |
| Other symptoms | Running nose | 3.14% | 0.9355 | −7.9937 |
| | Flu | 0.59% | 0.9781 | −7.0568 |
| | Chest pain | 0.59% | 0.9656 | −8.5042 |
| | Sputum | 2.16% | 0.9559 | −7.3175 |
| | Dry mouth | 0.2% | 0.9798 | −8.5962 |
| | Thirst | 0.2% | 0.9866 | −6.4765 |
| | Abdominal pain | 0.2% | 0.9975 | −1.9189 |
| | Vomiting | 1.18% | 0.9715 | −6.2568 |
| | Diarrhea | 1.96% | 0.9702 | −5.7918 |
| | Loss of appetite | 0.39% | 0.9682 | −9.2821 |
| | Chills | 2.95% | 0.9386 | −8.4315 |
| | Sore body | 0.2% | 0.9933 | −3.9594 |
| | Reflux | 0.2% | 0.9945 | −3.4291 |
| | Nausea | 0.79% | 0.9774 | −6.3657 |
| | Headache | 3.73% | 0.9505 | −6.2049 |
Bold values indicated best results
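In a Cox regression model, a coefficient β corresponds to a hazard ratio of exp(β) per unit increase in the feature, with the other features held fixed. The sketch below applies that standard conversion to three rows of the table above. It assumes the two numeric columns are the P value and the coefficient, in that order; the 0.05 significance level and all variable names are illustrative assumptions, not taken from the source.

```python
import math

# (coefficient, P value) copied from the table above for three features.
features = {
    "Age": (0.1121, 9.835e-15),
    "Difficulty in breathing": (1.7397, 0.1446),
    "Fatigue": (2.3038, 0.1021),
}

ALPHA = 0.05  # conventional significance level; an assumption, not from the source


def hazard_ratio(coefficient):
    """Hazard ratio implied by a Cox coefficient: exp(beta)."""
    return math.exp(coefficient)


# Map each feature to (hazard ratio, whether P value is below ALPHA).
summary = {
    name: (hazard_ratio(coef), p < ALPHA)
    for name, (coef, p) in features.items()
}

for name, (hr, significant) in summary.items():
    print(f"{name}: HR = {hr:.3f}, significant at {ALPHA}: {significant}")
```

Under this reading, each additional year of age multiplies the hazard by roughly 1.12, and age is the only one of the three features whose P value falls below 0.05.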
Fig. 5a and b: Survival curves for 10 randomly selected patients using Cox_COVID_19
Fig. 6a and b: Survival curves for 10 randomly selected patients using Deep_Cox_COVID_19
Autoencoder construction
| | Input layer | Encoder | Bottleneck | Decoder | Reconstruction loss | Activation function |
|---|---|---|---|---|---|---|
| Number of layers | 1 | 1 | 1 | 1 | binary_crossentropy | ReLU |
| Number of nodes | 31 | 31 | 30 | 31 | – | – |
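The table lists binary cross-entropy as the autoencoder's reconstruction loss. The sketch below shows how that loss is computed for a single feature vector and its reconstruction, assuming every feature has been scaled to the range [0, 1]. The function name, the clipping constant `eps` and the plain-Python form are illustrative; the source does not show the authors' implementation.

```python
import math


def binary_crossentropy(original, reconstructed, eps=1e-7):
    """Mean binary cross-entropy between an input vector and its reconstruction.

    Both vectors are assumed to hold values in [0, 1]; eps clips the
    reconstruction away from 0 and 1 so the logarithms stay finite.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, x_hat in zip(original, reconstructed):
        x_hat = min(max(x_hat, eps), 1.0 - eps)
        total += -(x * math.log(x_hat) + (1.0 - x) * math.log(1.0 - x_hat))
    return total / len(original)
```

The loss approaches zero as the reconstruction matches the input and grows as the two diverge, which is what training the encoder and decoder minimises.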
Survival function accuracy for proposed system with different thresholds
| Threshold | Cox_COVID_19 accuracy, train (%) | Cox_COVID_19 accuracy, test (%) | Deep_Cox_COVID_19 accuracy, train (%) | Deep_Cox_COVID_19 accuracy, test (%) |
|---|---|---|---|---|
| 95.12 | ||||
| 0.15 | 93.5 | 95 | 95.39 | |
| 0.2 | 93.77 | 95 | 95.93 | |
| 93.77 | 92.9 | |||
| 0.45 | 93.31 | 91.4 | 96.21 | 95 |
Bold values indicated best results
Survival function precision for proposed system
| Threshold | Cox_COVID_19 precision, train (%) | Cox_COVID_19 precision, test (%) | Deep_Cox_COVID_19 precision, train (%) | Deep_Cox_COVID_19 precision, test (%) |
|---|---|---|---|---|
| 0.1 | 100 | 100 | 100 | 100 |
| 0.15 | 92.9 | 80 | 100 | 100 |
| 0.2 | 93.3 | 80 | 100 | 100 |
| 81 | 50 | |||
| 0.45 | 77.8 | 41.7 | 81.5 | 80 |
Bold values indicated best results
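The accuracy and precision tables above report scores at several thresholds applied to the predicted survival function. The sketch below shows one way such scores can be computed, assuming a patient is labelled as a death case when the predicted survival probability falls below the threshold. That decision rule, the function names and the toy data are assumptions for illustration; the source does not state the exact rule used.

```python
def classify(survival_probs, threshold):
    """Label a patient 1 (death) when predicted survival probability is below the threshold."""
    return [1 if p < threshold else 0 for p in survival_probs]


def accuracy(y_true, y_pred):
    """Share of patients whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Share of predicted death cases that are true death cases."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(y_pred)
    return true_pos / pred_pos if pred_pos else 0.0


# Toy data for illustration only; not taken from the paper.
survival_probs = [0.05, 0.12, 0.30, 0.60, 0.90, 0.95]
y_true = [1, 1, 0, 0, 0, 0]

for threshold in (0.1, 0.15, 0.2, 0.45):
    y_pred = classify(survival_probs, threshold)
    print(threshold, accuracy(y_true, y_pred), precision(y_true, y_pred))
```

In this toy example, raising the threshold labels more patients as death cases, so precision can fall as false positives appear, which mirrors the general trade-off a threshold sweep is meant to expose.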
Comparison of survival function accuracy for proposed systems and other algorithms
| Author(s) | Algorithm | Accuracy (%) | Dataset |
|---|---|---|---|
| Nemati et al. [ | IPCRidge | 49.05 | Open-access COVID-19 epidemiological data [ |
| | CoxPH | 70.63 | |
| | Coxnet | 70.72 | |
| | Stagewise GB | 71.47 | |
| | Componentwise GB | 70.60 | |
| | Fast SVM | 70.65 | |
| | Fast Kernel SVM | 61.05 | |
| | Cox_COVID_19 (Cox regression method) | | Novel Corona Virus 2019 Dataset-Kaggle [ |
| | Deep_Cox_COVID_19 | | |
Bold values indicated best results
Fig. 7: Survival function accuracy for proposed system and other algorithms
Comparison of proposed system and previous studies for features affecting the mortality
| Author(s) | Algorithm | Key feature | Percentage | Results |
|---|---|---|---|---|
| Yan, Zhang et al. [ | XGBoost machine learning algorithm | Male | 58.7% | Male, fever, cough, fatigue, dyspnoea, lactic dehydrogenase (LDH), lymphocyte and high-sensitivity C-reactive protein (hs-CRP) are the key features for differentiating critical patients between the two classes |
| | | Fever | 49.9% | |
| | | Cough | 13.9% | |
| | | Fatigue | 3.7% | |
| | | Dyspnoea | 2.1% | |
| Shuai Zhang et al. [ | Univariable Cox regression model | Age, years | – | Age, male, fever, cough, weakness, severely ill, any and hypertension are the most important factors affecting mortality |
| | | Male | 60% | |
| | | Fever | 66.67% | |
| | | Cough | 70% | |
| | | Weakness | 53.33% | |
| | | Severely ill | 96.67% | |
| | | Any | 70% | |
| | | Hypertension | 53.33% | |
| Cox_COVID_19 prediction system | Cox regression method | Age | – | Age, fever, cough, pneumonia, muscle pain and throat pain are the most important factors affecting mortality |
| | | Male | 61.69% | |
| | | Fever | 46.56% | |
| | | Pneumonia | 36.7% | |
| | | Cough | 29.86% | |
| | | Throat pain | 8.3% | |