Tsz-Kin Wan1, Rui-Xuan Huang1, Thomas Wetere Tulu2,3, Jun-Dong Liu2, Asmir Vodencarevic4, Chi-Wah Wong5, Kei-Hang Katie Chan1,2,6.
Abstract
(1) Background: Coronavirus disease 2019 (COVID-19) is a dominant, rapidly spreading respiratory disease. However, the factors influencing COVID-19 mortality have not yet been confirmed. The pathogenesis of COVID-19 is unknown, and relevant mortality predictors are lacking. This study aimed to investigate COVID-19 mortality in patients with pre-existing health conditions and to examine the association between COVID-19 mortality and other morbidities.
Keywords: COVID-19; COVID-19 mortality; machine learning model; mortality predictors; prediction model
Year: 2022 PMID: 35455038 PMCID: PMC9028639 DOI: 10.3390/life12040547
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Basic characteristics of the UK Biobank study participants, shown as the mean with its one-standard-deviation range, or as a percentage with the actual number of patients.
| Basic UK Biobank Data Characteristics | Statistics (All Data, | Statistic (Death Due to COVID-19, |
|---|---|---|
| Age | 66.5 (57.8, 75.1) | 75.8 (55.9, 90.0) |
| Death | 5.37% ( | N/A |
| Male gender | 52.8% | 34.2% |
| Height | 168.6 (159.4, 177.8) | 168.8 (159.6, 178.0) |
| Weight | 80.1 (64.4, 95.8) | 84.0 (67.3, 100.8) |
| Body mass index | 28.0 (23.5, 32.6) | 29.3 (24.3, 34.4) |
| Current tobacco smoking | 7.9% ( | 10.6% ( |
| Vascular/heart problems diagnosed by doctor | 23.2% ( | 37.7% ( |
| Blood clot, deep-vein thrombosis, bronchitis, emphysema, asthma, rhinitis, eczema, or allergy diagnosed by a doctor | 16.5% ( | 22.2% ( |
| Other serious medical condition/disability diagnosed by a doctor | 19.0% ( | 33.3% ( |
| Long-standing illness, disability or infirmity | 33.5% ( | 57.4% ( |
| Alcohol consumption | 95.9% ( | 93.3% ( |
Example of features representing the same meaning under the same UDI.
| UDI 1 | Data Size | Description |
|---|---|---|
| 21–0.0 | 500,790 | Weight method |
| 21–1.0 | 20,334 | Weight method |
| 21–2.0 | 46,439 | Weight method |
| 21–3.0 | 2,729 | Weight method |
1 UDI—the Unique Data Identifier for an item of data within the UK Biobank repository.
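The UDIs in the table follow the UK Biobank layout `field-instance.array`, so columns that share a field ID can be grouped programmatically. The sketch below only illustrates that idea and is not the authors' preprocessing code; the names `parse_udi` and `group_by_field` are hypothetical, and the pattern accepts either an ASCII hyphen or the en dash used in the table.

```python
import re
from collections import defaultdict

# UDI layout: <field>-<instance>.<array>; accept an ASCII hyphen or an en dash.
_UDI_RE = re.compile(r"^(\d+)[-\u2013](\d+)\.(\d+)$")

def parse_udi(udi):
    """Split a UDI string into (field, instance, array index)."""
    match = _UDI_RE.match(udi.strip())
    if match is None:
        raise ValueError(f"not a UDI: {udi!r}")
    field, instance, array = (int(g) for g in match.groups())
    return field, instance, array

def group_by_field(udis):
    """Map each field ID to the list of UDIs that share it."""
    groups = defaultdict(list)
    for udi in udis:
        groups[parse_udi(udi)[0]].append(udi)
    return dict(groups)

# The four columns from the table all belong to field 21.
columns = ["21-0.0", "21-1.0", "21-2.0", "21-3.0"]
grouped = group_by_field(columns)
```

Grouping on the field ID alone treats repeated assessments (different instances) as the same measurement, which matches the caption's point that these columns carry the same meaning.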
Examples of features representing the same meaning in different UDI groupings.
| UDI | Description |
|---|---|
| 94 | Diastolic blood pressure, manual reading |
| 4079 | Diastolic blood pressure, automated reading |
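One way to treat two such fields as a single feature is to coalesce them, taking the first available reading for each participant. The source does not state how the authors combined these fields, so the preference order (automated before manual), the function name `coalesce_readings`, and the sample values are all assumptions made for illustration.

```python
from typing import Optional

def coalesce_readings(*readings: Optional[float]) -> Optional[float]:
    """Return the first reading that is not missing, or None if all are missing."""
    for value in readings:
        if value is not None:
            return value
    return None

# Hypothetical participants: (automated reading, UDI 4079; manual reading, UDI 94).
participants = [(82.0, None), (None, 76.0), (None, None), (90.0, 88.0)]
diastolic = [coalesce_readings(auto, manual) for auto, manual in participants]
```

Participants with neither reading stay missing, so any imputation remains a separate, explicit step.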
Input features for models.
| Feature Name | | | |
|---|---|---|---|
| Able to confide | Age at recruitment | Age completed full time education | Age first had sexual intercourse |
| Age when attended assessment centre | Alanine aminotransferase | Albumin | Alkaline phosphatase |
| Arm fat-free mass (left) | Arm fat-free mass (right) | Arm predicted mass (left) | Arm predicted mass (right) |
| Aspartate aminotransferase | Average total household income before tax | Birth weight known | Body mass index (BMI) |
| Body mass index (BMI) | Bread intake | Breastfed as a baby | Carer support indicators |
| Chest pain or discomfort | Cholesterol | Cooked vegetable intake | C-reactive protein |
| Creatinine | Current employment status | Cystatin C | Daytime dozing/sleeping (narcolepsy) |
| Direct bilirubin | Dried fruit intake | Eosinophil count | Ever had bowel cancer screening |
| Falls in the last year | Father still alive | Forced expiratory volume in 1-s (FEV1) | Forced expiratory volume in 1-s (FEV1) Z-score |
| Forced vital capacity (FVC) | Forced vital capacity (FVC) Z-score | Gamma glutamyltransferase | Genetic sex |
| Glucose | Glycated haemoglobin (HbA1c) | Haematocrit percentage | Haemoglobin concentration |
| HDL cholesterol | Hearing difficulty/problems | Hearing difficulty/problems with background | High light scatter reticulocyte count |
| High light scatter reticulocyte percentage | Housing score (England) | IGF-1 | Illnesses of siblings |
| Immature reticulocyte fraction | Impedance of arm (left) | Impedance of arm (right) | Impedance of leg (left) |
| Impedance of leg (right) | Impedance of whole body | Intended management of patient (polymorphic) | Intended management of patient (recoded) |
| Interpolated Age of participant when non-cancer illness first diagnosed | Interpolated Age of participant when operation took place | Interpolated Year when operation took place | |
| IPAQ activity group | LDL direct | Leg fat-free mass (left) | Leg fat-free mass (right) |
| Leg predicted mass (left) | Leg predicted mass (right) | Length of mobile phone use | |
Data distribution.
| Training and Prediction Round | 50 Times |
|---|---|
| Prediction type | Regression |
| Total number of data points | In the DNN model: |
Regression results from different models obtained on the testing data.
| Model | Result (AUC) |
|---|---|
| DNN | 0.84 (95% CI: 0.81–0.85) |
| RF | 0.86 (95% CI: 0.84–0.88) |
| Linear SVM | 0.81 (95% CI: 0.79–0.83) |
| XGB | 0.83 (95% CI: 0.82–0.86) |
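For reference, an AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes that quantity directly and adds a percentile bootstrap interval. The source does not say how its confidence intervals were obtained, so the bootstrap, the function names, and the toy data are all assumptions.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, not taken from the study.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]
point = auc(y_true, y_score)
low, high = bootstrap_ci(y_true, y_score)
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch; a rank-based formulation would scale better on a cohort the size of the UK Biobank.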
Figure 1. Receiver operating characteristic curve of the RF model.
Predicted results and corresponding mortality rates.
| Predicted Probability % | Number of Predicted Patients | Number of Deaths | Mortality Rate % |
|---|---|---|---|
| [0,10) | 1335 | 3 | 0.225 |
| [10,20) | 664 | 17 | 2.56 |
| [20,30) | 440 | 39 | 8.86 |
| [30,40) | 303 | 46 | 15.18 |
| [40,50) | 190 | 41 | 21.58 |
| [50,60) | 39 | 9 | 23.08 |
| [60,70) | 4 | 2 | 50.00 |
| [70,80) | 0 | 0 | NaN |
| [80,90) | 0 | 0 | NaN |
| [90,100) | 0 | 0 | NaN |
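The mortality rate in each row is the number of deaths divided by the number of patients whose predicted probability fell in that bin, expressed as a percentage. A minimal sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative):

```python
import math

# (lower bound of the bin in %, number of predicted patients, number of deaths)
bins = [
    (0, 1335, 3), (10, 664, 17), (20, 440, 39), (30, 303, 46), (40, 190, 41),
    (50, 39, 9), (60, 4, 2), (70, 0, 0), (80, 0, 0), (90, 0, 0),
]

def mortality_rate(patients, deaths):
    """Observed mortality in percent; NaN when the bin is empty."""
    return 100.0 * deaths / patients if patients else math.nan

rates = {low: mortality_rate(n, d) for low, n, d in bins}
for low, rate in rates.items():
    print(f"[{low},{low + 10}): {rate:.2f}")
```

Empty bins yield NaN rather than zero, which keeps "no patients predicted in this range" distinct from "no deaths observed".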
Figure 2. Predicted results and corresponding mortality rates.
Figure 3. Top 20 important features for the RF model.
Figure 4. SHapley Additive exPlanations.
Figure 5. SHapley Additive exPlanations.
Comparison of the RF model's performance with previously developed scoring systems.
| Model | AUC |
|---|---|
| RF model (This study) | 0.863 (95% CI: 0.842–0.881) |
| NEWS2 [ | 0.790 (95% CI: 0.643–0.937) |
| CURB-65 [ | 0.81 (95% CI: 0.71–0.91) |
| ISARIC-4C [ | 0.79 (95% CI: 0.78–0.79) |
| qCSI [ | 0.81 (95% CI: 0.73–0.89) |