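Mohammad A Dabbah<sup>1</sup>, Angus B Reed<sup>1</sup>, Adam T C Booth<sup>1</sup>, Arrash Yassaee<sup>1,2</sup>, Aleksa Despotovic<sup>1,3</sup>, Benjamin Klasmer<sup>1</sup>, Emily Binning<sup>1</sup>, Mert Aral<sup>1</sup>, David Plans<sup>4,5</sup>, Davide Morelli<sup>1,6</sup>, Alain B Labrique<sup>7</sup>, Diwakar Mohan<sup>7</sup>.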
Abstract
The COVID-19 pandemic has created an urgent need for robust, scalable monitoring tools that support the stratification of high-risk patients. This research aims to develop and validate prediction models, using the UK Biobank, to estimate COVID-19 mortality risk in confirmed cases. From the 11,245 participants testing positive for COVID-19, we develop a data-driven random forest classification model with excellent performance (AUC: 0.91). The model uses baseline characteristics, pre-existing conditions, symptoms, and vital signs, so that the score can dynamically reassess mortality risk as the disease deteriorates. We also identify several significant novel predictors of COVID-19 mortality, such as detailed anthropometrics and prior acute kidney failure, urinary tract infection, and pneumonias, with equivalent or greater predictive value than established high-risk comorbidities. The model design and feature selection enable use in outpatient settings. Possible applications include individual-level risk profiling and monitoring of disease progression across patients with COVID-19 at scale, especially in hospital-at-home settings.
Year: 2021 PMID: 34413324 PMCID: PMC8376891 DOI: 10.1038/s41598-021-95136-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Descriptive characteristics of the UK Biobank cohort with positive COVID-19 test results. The "All participants" column gives counts (n). The survived and died columns give n with within-row percentages; for example, 4,850 of 5,274 male participants (92%) survived. For continuous variables, [count] is the number of participants with available data. Pre-existing medical conditions were included only when reported more than one week before the positive COVID-19 test result. Symptoms and vitals were included only from primary care (GP) records reported within ±2 weeks of the positive test result. MND = motor neurone disease; MS = multiple sclerosis; HD = Huntington’s disease. *Oxygen saturation, respiratory rate, and body temperature were included in the initial analysis but were removed from the model due to low data availability.
| Characteristic | All participants, n [count] | Survived, n (%) [count] | Died, n (%) [count] |
|---|---|---|---|
| Participants | 11,245 | 10,605 (94.3) | 640 (5.7) |
| Male sex | 5,274 | 4,850 (92) | 424 (8) |
| Age (yrs), mean (SD) | 66.9 (8.7) | 66.4 (8.6) | 76.0 (5.6) |
| Body mass index (kg/m²), mean (SD) | 28.4 (5.1) [11,153] | 28.3 (5.1) [10,528] | 30.0 (5.7) [625] |
| Waist circumference (cm), mean (SD) | 92.5 (14.0) [11,185] | 92.1 (13.9) [10,556] | 100.1 (14.7) [629] |
| Hip circumference (cm), mean (SD) | 104.8 (9.9) [11,181] | 104.6 (9.8) [10,552] | 106.7 (11.3) [629] |
| Body weight (kg), mean (SD) | 80.9 (16.9) [11,172] | 80.6 (16.7) [10,544] | 85.9 (18.6) [628] |
| Obesity (BMI > 30) | 1,307 | 1,167 (89.3) | 140 (10.7) |
| Standing height (cm), mean (SD) | 168.5 (9.2) [11,245] | 168.5 (9.2) [10,605] | 168.9 (9.3) [640] |
| Blood type | |||
| Unknown | 353 | 318 (90.1) | 35 (9.9) |
| AA | 892 | 834 (93.5) | 58 (6.5) |
| AB | 435 | 417 (95.9) | 18 (4.1) |
| AO | 4,074 | 3,858 (94.7) | 216 (5.3) |
| BB | 67 | 62 (92.5) | 5 (7.5) |
| BO | 1,051 | 999 (95.1) | 52 (4.9) |
| OO | 4,373 | 4,117 (94.1) | 256 (5.9) |
| Sleep duration (hrs), mean (SD) | 7.0 (1.4) [11,245] | 7.0 (1.4) [10,605] | 7.2 (1.7) [640] |
| Alcohol intake | |||
| Unknown | 33 | 30 (90.9) | 3 (9.1) |
| Daily or almost daily | 1,662 | 1,562 (94) | 100 (6) |
| Three or four times a week | 2,168 | 2,068 (95.4) | 100 (4.6) |
| Once or twice a week | 3,010 | 2,862 (95.1) | 148 (4.9) |
| One to three times a month | 1,284 | 1,228 (95.6) | 56 (4.4) |
| Special occasions only | 1,398 | 1,300 (93) | 98 (7) |
| Never | 1,690 | 1,555 (92) | 135 (8) |
| Smoking status | |||
| Unknown | 68 | 60 (88.2) | 8 (11.8) |
| Never | 6,195 | 5,915 (95.5) | 280 (4.5) |
| Previous | 3,933 | 3,642 (92.6) | 291 (7.4) |
| Current | 1,049 | 988 (94.2) | 61 (5.8) |
| Gait and mobility issues | 68 | 60 (88.2) | 8 (11.8) |
| Allergy to antibiotics | 1,143 | 1,044 (91.3) | 99 (8.7) |
| Long-term use of anticoagulants | 981 | 821 (83.7) | 160 (16.3) |
| Radiation therapy | 274 | 237 (86.5) | 37 (13.5) |
| Maintenance chemotherapy | 476 | 420 (88.2) | 56 (11.8) |
| Chemotherapy | 256 | 210 (82) | 46 (18) |
| General diseases of the circulatory system | 1,216 | 1,030 (84.7) | 186 (15.3) |
| Chronic ischemic heart disease | 1,388 | 1,200 (86.5) | 188 (13.5) |
| Atrial fibrillation | 1,007 | 834 (82.8) | 173 (17.2) |
| Hypertension | 4,074 | 3,624 (89) | 450 (11) |
| Stroke | 767 | 624 (81.4) | 143 (18.6) |
| General diseases of the respiratory system | 169 | 143 (84.6) | 26 (15.4) |
| Asthma | 1,497 | 1,391 (92.9) | 106 (7.1) |
| Chronic obstructive pulmonary disease | 670 | 537 (80.1) | 133 (19.9) |
| Interstitial lung disease | 107 | 71 (66.4) | 36 (33.6) |
| Respiratory failure | |||
| less than 1 month | 291 | 171 (58.8) | 120 (41.2) |
| between 1 and 12 months | 180 | 117 (65) | 63 (35) |
| more than 12 months | 154 | 109 (70.8) | 45 (29.2) |
| Non-bacterial pneumonia | |||
| less than 1 month | 812 | 542 (66.7) | 270 (33.3) |
| between 1 and 12 months | 512 | 368 (71.9) | 144 (28.1) |
| more than 12 months | 624 | 508 (81.4) | 116 (18.6) |
| Bacterial pneumonia | |||
| less than 1 month | 734 | 485 (66.1) | 249 (33.9) |
| between 1 and 12 months | 349 | 240 (68.8) | 109 (31.2) |
| more than 12 months | 45 | 38 (84.4) | 7 (15.6) |
| General diseases of the nervous system | 640 | 554 (86.6) | 86 (13.4) |
| Parkinson's disease | 164 | 124 (75.6) | 40 (24.4) |
| MND, MS, or HD | 21 | 18 (85.7) | 3 (14.3) |
| Dementia | 491 | 373 (76) | 118 (24) |
| Haematological cancer | |||
| less than 12 months | 85 | 52 (61.2) | 33 (38.8) |
| between 12 and 60 months | 95 | 71 (74.7) | 24 (25.3) |
| more than 60 months | 111 | 86 (77.5) | 25 (22.5) |
| Non-haematological cancer | |||
| less than 12 months | 208 | 180 (86.5) | 28 (13.5) |
| between 12 and 60 months | 590 | 545 (92.4) | 45 (7.6) |
| more than 60 months | 908 | 834 (91.9) | 74 (8.1) |
| Diabetes (Type 1) | 143 | 110 (76.9) | 33 (23.1) |
| Diabetes (Type 2) | 1,416 | 1,204 (85) | 212 (15) |
| Osteoarthritis | 2,625 | 2,394 (91.2) | 231 (8.8) |
| Depression and anxiety disorder | 1,404 | 1,271 (90.5) | 133 (9.5) |
| Rheumatoid arthritis | 317 | 268 (84.5) | 49 (15.5) |
| Anemia | 1,260 | 1,067 (84.7) | 193 (15.3) |
| Urinary tract infection | |||
| less than 1 month | 96 | 72 (75) | 24 (25) |
| between 1 and 12 months | 171 | 136 (79.5) | 35 (20.5) |
| more than 12 months | 875 | 730 (83.4) | 145 (16.6) |
| Acute kidney failure | |||
| less than 1 month | 262 | 164 (62.6) | 98 (37.4) |
| between 1 and 12 months | 288 | 199 (69.1) | 89 (30.9) |
| more than 12 months | 443 | 331 (74.7) | 112 (25.3) |
| Any bacterial infection | |||
| less than 1 month | 169 | 110 (65.1) | 59 (34.9) |
| between 1 and 12 months | 209 | 145 (69.4) | 64 (30.6) |
| more than 12 months | 484 | 395 (81.6) | 89 (18.4) |
| Diverticulum | 1,657 | 1,507 (90.9) | 150 (9.1) |
| Haemorrhoids | 1,120 | 1,065 (95.1) | 55 (4.9) |
| Irritable bowel syndrome | 399 | 368 (92.2) | 31 (7.8) |
| Gastroenteritis | |||
| less than 1 month | 161 | 135 (83.9) | 26 (16.1) |
| between 1 and 12 months | 157 | 133 (84.7) | 24 (15.3) |
| more than 12 months | 1,700 | 1,546 (90.9) | 154 (9.1) |
| Joint pain | 1,156 | 1,035 (89.5) | 121 (10.5) |
| Delirium | 250 | 175 (70) | 75 (30) |
| Hematemesis | 563 | 512 (90.9) | 51 (9.1) |
| Syncope and collapse | 19 | 17 (89.5) | 2 (10.5) |
| Dyspnea | 282 | 246 (87.2) | 36 (12.8) |
| Cough | 70 | 60 (85.7) | 10 (14.3) |
| Myalgia | 248 | 221 (89.1) | 27 (10.9) |
| Nausea and vomiting | 38 | 29 (76.3) | 9 (23.7) |
| Chest pain | 831 | 757 (91.1) | 74 (8.9) |
| Hematuria | 42 | 35 (83.3) | 7 (16.7) |
| Malaise and fatigue | 49 | 41 (83.7) | 8 (16.3) |
| Hypotension | 342 | 266 (77.8) | 76 (22.2) |
| Diastolic blood pressure (mmHg), mean (SD) | 77.9 (12.2) [123] | 77.2 (10.9) [104] | 81.9 (17.4) [19] |
| Systolic blood pressure (mmHg), mean (SD) | 129.3 (19.2) [124] | 128.2 (17.6) [104] | 135.1 (25.7) [20] |
| Heart rate (bpm), mean (SD) | 84.7 (17.5) [80] | 84.0 (16.9) [71] | 90.9 (22.0) [9] |
| Body temperature (°C), mean (SD) * | 37.5 (1.2) [41] | 37.7 (1.1) [37] | 36.1 (0.9) [4] |
| Oxygen saturation (%), mean (SD) * | 94.7 (3.3) [20] | 94.4 (3.6) [16] | 95.8 (1.5) [4] |
| Respiratory rate (breaths/min), mean (SD) * | 24.1 (7.4) [18] | 24.8 (8.5) [11] | 22.9 (5.8) [7] |
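
The inclusion windows in the table caption and the time-since-diagnosis categories in the table can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. The following are all hypothetical:

- the `(code, event_date)` record layout;
- the helper names;
- the 30.44-day average month;
- the exact boundary handling (for example, whether a diagnosis exactly 12 months earlier counts as "between 1 and 12 months").

The last helper reproduces the table's within-row percentages.

```python
from datetime import date, timedelta

# Inclusion windows stated in the table caption.
PRE_EXISTING_GAP = timedelta(weeks=1)  # conditions must precede the positive test by > 1 week
SYMPTOM_WINDOW = timedelta(weeks=2)    # GP symptoms/vitals within +/- 2 weeks of the test
DAYS_PER_MONTH = 30.44                 # assumption: average month length in days


def pre_existing(records, test_date):
    """Keep condition codes reported more than one week before the positive test."""
    return [code for code, when in records if when < test_date - PRE_EXISTING_GAP]


def symptoms_and_vitals(records, test_date):
    """Keep GP codes recorded within +/- two weeks of the test (inclusive; assumed)."""
    return [code for code, when in records if abs(when - test_date) <= SYMPTOM_WINDOW]


def months_before(event_date, test_date):
    """Approximate months between a diagnosis and the positive test."""
    return (test_date - event_date).days / DAYS_PER_MONTH


def bin_duration(months, lower, upper, labels):
    """Assumed bins: [0, lower) -> labels[0]; [lower, upper] -> labels[1]; > upper -> labels[2]."""
    if months < lower:
        return labels[0]
    if months <= upper:
        return labels[1]
    return labels[2]


# Cut-offs (in months) used by the table's time-binned rows.
SHORT_BINS = (1, 12, ("less than 1 month", "between 1 and 12 months", "more than 12 months"))
CANCER_BINS = (12, 60, ("less than 12 months", "between 12 and 60 months", "more than 60 months"))


def row_percentages(survived, died):
    """Row total and within-row percentages, rounded to one decimal as in the table."""
    total = survived + died
    return total, round(100 * survived / total, 1), round(100 * died / total, 1)


if __name__ == "__main__":
    test_day = date(2020, 4, 15)  # hypothetical test date
    conditions = [("hypertension", date(2015, 1, 1)), ("uti", date(2020, 4, 10))]
    print(pre_existing(conditions, test_day))  # the UTI recorded 5 days earlier is excluded
    print(bin_duration(months_before(date(2020, 3, 1), test_day), *SHORT_BINS))
    print(row_percentages(71, 36))  # counts from the interstitial lung disease row
```

As a check, `row_percentages(10605, 640)` recovers the cohort-level split of 94.3% and 5.7%.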
Figure 1. Workflow for model development and feature selection. (A) Conceptual diagram of the data ingestion pipeline and analysis methods. To combine databases, several data pre-processing steps were carried out:

- sanitisation (eliminating redacted records and nuanced entries);
- normalization (scaling values into a consistent range for further processing);
- time filtering;
- duration calculation (computing the time interval between testing positive and death);
- missing-value substitution (replacing missing values or records with the mean value from the UK Biobank database);
- augmentation (merging all data for each subject into a single unified record);
- one-hot encoding (coding the presence of each pre-existing condition or symptom as a binary indicator for each subject).

This ingestion process standardized the input features and attributes for all subjects in the study, regardless of their individual conditions, symptoms, vital signs, and records. (B) Illustration of the data-driven and clinically reviewed feature refinement process. (C) Schematic of the leave-one-out cross-validation method used for feature selection and model validation. In each fold, a single sample is left out (purple), and prediction error is estimated on the left-out samples. AUC = area under the curve; GP = general practice; LOO = leave-one-out; ROC = receiver operating characteristic.
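Three of the Figure 1 pre-processing steps and the leave-one-out (LOO) split can be sketched as follows. This is a minimal sketch, not the authors' code:

- The caption does not name the normalization method; min-max scaling to [0, 1] is an assumption.
- When no reference mean is supplied, the mean of the observed values stands in for the UK Biobank mean.
- All function names are hypothetical.

```python
from statistics import mean


def impute_mean(column, reference_mean=None):
    """Replace missing values (None) with a reference mean.

    The caption uses the UK Biobank mean; if none is supplied, the mean of the
    observed values in `column` is used as a stand-in (an assumption).
    """
    if reference_mean is None:
        reference_mean = mean([v for v in column if v is not None])
    return [reference_mean if v is None else v for v in column]


def min_max_scale(column):
    """Assumed normalization: rescale values linearly to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot(subject_codes, vocabulary):
    """Binary indicator for each condition/symptom in a fixed vocabulary."""
    present = set(subject_codes)
    return [1 if code in present else 0 for code in vocabulary]


def leave_one_out(n):
    """Yield (training indices, held-out index) for each of n samples."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


if __name__ == "__main__":
    bmi = impute_mean([28.0, None, 31.0])  # hypothetical values
    print(min_max_scale(bmi))
    print(one_hot(["asthma", "stroke"], ["asthma", "copd", "stroke"]))
    for train, held_out in leave_one_out(3):
        print(held_out, train)
```

With LOO, the number of folds equals the number of samples. A cohort of 11,245 participants therefore implies 11,245 model fits per evaluated feature set.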
Figure 2. Model performance evaluation. (A) Receiver operating characteristic (ROC) curves for our random forest (RF) and Cox models compared with QCOVID. (B) F-β scores at β = 1 (F1-score, bold line) and at β ∈ {0.5, 2, 3, 5} (dashed lines of decreasing dash size). AUC = area under the curve. Both the ROC and F-β curves show performance across thresholds (i.e., operating points). The appropriate threshold depends on the model's application. For example, in clinical settings where false negatives must be kept low, the threshold would be optimised for recall, at the cost of more false positives.
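The threshold trade-off in Figure 2 can be illustrated with the standard definitions below. The F-β score is (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall, so β > 1 weights recall more heavily. The AUC is computed as the probability that a randomly chosen death is scored above a randomly chosen survivor, with ties counting one half. This is a stdlib-only sketch on hypothetical toy scores; the helper names are not from the paper.

```python
def confusion_at(scores, labels, threshold):
    """Count true positives, false positives, and false negatives at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, fn


def f_beta(tp, fp, fn, beta):
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison; O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]  # hypothetical predicted mortality risks
    labels = [1, 0, 1, 0, 0]            # 1 = died, 0 = survived
    print(round(roc_auc(scores, labels), 3))
    for t in (0.85, 0.35):  # lowering the threshold raises recall
        tp, fp, fn = confusion_at(scores, labels, t)
        print(t, [round(f_beta(tp, fp, fn, b), 3) for b in (0.5, 1, 2, 3, 5)])
```

On these toy data, lowering the threshold from 0.85 to 0.35 captures both deaths (recall 1.0) at the cost of one false positive. This is the trade-off the caption describes.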
Figure 3. Cox model coefficients for COVID-19 mortality in the UK Biobank cohort. Values show HR with 95% CI. AKF = acute kidney failure, MND = motor neurone disease, MS = multiple sclerosis, HD = Huntington’s disease, HR = hazard ratio, CI = confidence interval.
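A Cox coefficient β is a log hazard ratio, so HR = exp(β). Figure 3 does not state how its intervals were computed. The sketch below assumes the usual Wald construction on the log scale, exp(β ± 1.96·SE). The coefficient and standard error in the usage lines are hypothetical, not values from the figure.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox log-hazard coefficient `beta` with standard error `se`.

    Assumes a Wald interval on the log scale; z = 1.96 gives approximately 95% coverage.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient: beta = ln(2), i.e. a doubling of the hazard.
    hr, lower, upper = hazard_ratio_ci(math.log(2), se=0.15)
    print(f"HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the HR itself. An interval that excludes 1 corresponds to a coefficient significantly different from zero at the 5% level.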