Kenneth Chi-Yin Wong, Yong Xiang, Liangying Yin, Hon-Cheong So.
Abstract
BACKGROUND: COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance.
Keywords: COVID-19; biobank; machine learning; medical informatics; pandemic; prediction; prediction models; public health; risk factors
Year: 2021 PMID: 34591027 PMCID: PMC8485986 DOI: 10.2196/29544
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
The four sets of analyses performed and their predictive performance (full model and lite model).
| Cohort | Group 1 | Group 2 | n (group 1) | n (group 2) | Area under the curve^a, full model (%) | Area under the curve^a, lite model (%) | 95% CI, full model (%) | 95% CI, lite model (%) |
| A | Hospitalized or fatal cases | Nonhospitalized cases | 2386 | 5460 | 72.3 | 71.6 | 71.1-73.6 | 70.3-72.9 |
| B | Fatal cases | All other COVID-19 cases | 477 | 7369 | 81.4 | 81.8 | 79.1-83.8 | 79.4-84.2 |
| C | Hospitalized or fatal cases | UK Biobank patients without a COVID-19 diagnosis or tested negative | 2386 | 465,728 | 69.6 | 69.6 | 68.4-70.8 | 68.4-70.7 |
| D | Fatal cases | UK Biobank patients without a COVID-19 diagnosis or tested negative | 477 | 465,728 | 82.5 | 83.0 | 80.2-84.8 | 80.8-85.3 |
^a The area under the curve (AUC) was taken as the average over 5 folds of cross-validation.
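The fold-averaged AUC described in the table footnote can be illustrated with a minimal sketch. The helper names (`auc`, `mean_cv_auc`) and the toy fold data are hypothetical and are not taken from the study; the AUC here is computed with the rank-based (Mann-Whitney) definition rather than any particular library routine.

```python
from statistics import mean

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random
    non-case; ties count as half a win (Mann-Whitney definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_cv_auc(fold_results):
    """Average the held-out AUC over folds.
    fold_results: list of (labels, predicted_scores), one pair per fold."""
    return mean(auc(y, s) for y, s in fold_results)

# Toy held-out predictions for 5 folds (illustrative values only).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 0, 0, 1], [0.3, 0.5, 0.1, 0.8]),
    ([0, 1, 0, 1], [0.4, 0.7, 0.6, 0.5]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.4, 0.1]),
    ([0, 0, 1, 1], [0.2, 0.7, 0.9, 0.3]),
]

print(f"Mean AUC over {len(folds)} folds: {mean_cv_auc(folds):.3f}")
```

In practice each `(labels, scores)` pair would come from a model trained on the other four folds and evaluated on the held-out fold.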
Relative risk (RR) comparing subjects in the top and bottom k% of predicted risk, and the proportion of cases explained by those in the top k% of predicted risk.
| Cohort | k (%) | Risk in top k%^a,b (full model) | Risk in bottom k% (full model) | RR (full model) | Proportion of cases explained by top k% (full model) | Risk in top k%^a,b (lite model) | Risk in bottom k% (lite model) | RR (lite model) | Proportion of cases explained by top k% (lite model) |
| A | 5 | 0.676 | 0.148 | 4.56 | 0.112 | 0.691 | 0.158 | 4.37 | 0.113 |
| A | 10 | 0.654 | 0.138 | 4.74 | 0.216 | 0.644 | 0.157 | 4.10 | 0.211 |
| A | 20 | 0.579 | 0.145 | 4.00 | 0.382 | 0.581 | 0.153 | 3.79 | 0.381 |
| A | 30 | 0.540 | 0.148 | 3.65 | 0.533 | 0.533 | 0.152 | 3.50 | 0.526 |
| A | 40 | 0.489 | 0.152 | 3.20 | 0.644 | 0.479 | 0.158 | 3.03 | 0.630 |
| A | 50 | 0.443 | 0.166 | 2.67 | 0.730 | 0.439 | 0.170 | 2.59 | 0.720 |
| B | 5 | 0.214 | 0.000 | Infinity | 0.174 | 0.212 | 0.003 | 84.27 | 0.174 |
| B | 10 | 0.200 | 0.001 | 158.20 | 0.327 | 0.216 | 0.008 | 28.38 | 0.352 |
| B | 20 | 0.171 | 0.008 | 22.42 | 0.562 | 0.188 | 0.008 | 24.59 | 0.618 |
| B | 30 | 0.148 | 0.009 | 16.57 | 0.727 | 0.155 | 0.008 | 19.21 | 0.763 |
| B | 40 | 0.127 | 0.009 | 14.21 | 0.830 | 0.131 | 0.009 | 14.23 | 0.866 |
| B | 50 | 0.111 | 0.010 | 10.94 | 0.916 | 0.111 | 0.011 | 10.37 | 0.912 |
| C | 5 | 0.0201 | 0.0017 | 11.76 | 0.197 | 0.0210 | 0.0013 | 15.88 | 0.207 |
| C | 10 | 0.0149 | 0.0021 | 6.98 | 0.293 | 0.0158 | 0.0012 | 12.95 | 0.310 |
| C | 20 | 0.0109 | 0.0023 | 4.67 | 0.427 | 0.0118 | 0.0021 | 5.71 | 0.462 |
| C | 30 | 0.0090 | 0.0030 | 2.99 | 0.528 | 0.0097 | 0.0027 | 3.57 | 0.573 |
| C | 40 | 0.0075 | 0.0033 | 2.27 | 0.590 | 0.0084 | 0.0026 | 3.20 | 0.656 |
| C | 50 | 0.0069 | 0.0033 | 2.09 | 0.678 | 0.0074 | 0.0028 | 2.63 | 0.725 |
| D | 5 | 0.0067 | 0.00000 | Infinity | 0.325 | 0.0068 | 0.00000 | Infinity | 0.333 |
| D | 10 | 0.0047 | 0.00002 | 218.02 | 0.457 | 0.0047 | 0.00006 | 73.67 | 0.463 |
| D | 20 | 0.0033 | 0.00011 | 30.30 | 0.635 | 0.0032 | 0.00009 | 36.75 | 0.616 |
| D | 30 | 0.0026 | 0.00014 | 18.74 | 0.746 | 0.0027 | 0.00011 | 23.38 | 0.784 |
| D | 40 | 0.0021 | 0.00016 | 13.17 | 0.828 | 0.0022 | 0.00013 | 16.68 | 0.874 |
| D | 50 | 0.0018 | 0.00022 | 8.35 | 0.893 | 0.0019 | 0.00015 | 13.03 | 0.929 |
^a ‘Top k%’ refers to the top k% of the predicted probability of the outcome by XGBoost.
^b ‘Risk in top k%’ refers to the actual probability of the outcome (severe disease or fatality) among patients in the highest k% of predicted risk.
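The quantities in this table follow directly from the footnote definitions, and a minimal sketch of the arithmetic may help. The function name `risk_stratification`, the rule used to size the top and bottom groups, and the toy data are assumptions made for illustration; they are not the authors' code.

```python
def risk_stratification(labels, predicted, k_percent):
    """Compare observed risk in the top and bottom k% of predicted risk.

    labels: 1 for a case (outcome occurred), 0 otherwise.
    predicted: model-predicted probability for each subject.
    """
    n = len(labels)
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("no cases in the sample")
    m = max(1, int(n * k_percent / 100))  # group size (assumed rounding rule)
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    top = [labels[i] for i in order[:m]]
    bottom = [labels[i] for i in order[-m:]]
    risk_top = sum(top) / m
    risk_bottom = sum(bottom) / m
    # With no cases in the bottom group the ratio is unbounded,
    # which is reported as Infinity in the table.
    rr = float("inf") if risk_bottom == 0 else risk_top / risk_bottom
    return {
        "risk_top": risk_top,
        "risk_bottom": risk_bottom,
        "rr": rr,
        "explained": sum(top) / total_cases,
    }

# Toy sample of 20 subjects with increasing predicted risk (illustrative only).
predicted = [i / 20 for i in range(20)]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]

for k in (20, 50):
    print(k, risk_stratification(labels, predicted, k))
```

The "proportion of cases explained" is the share of all cases that fall inside the top k% group, so it rises toward 1 as k grows, while RR typically falls because the two groups become less extreme.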
Figure 1. Results of sparse k-means clustering based on Shapley values (ShapVal) in cohorts A (hospitalized cases) and B (fatal cases). The y-axis indicates the ShapVal; only risk factors with significant differences across clusters (P<.05 in t-test or ANOVA) are shown on the x-axis. AF: atrial fibrillation; CAD: coronary artery disease; Hb: hemoglobin; HbA1c: hemoglobin A1c; HDL: high-density lipoprotein; LDL: low-density lipoprotein; RBC: red blood cell; RF: risk factor; SHBG: sex hormone binding globulin; T1DM: type 1 diabetes; T2DM: type 2 diabetes.
Figure 2. Shapley value dependence plots of the top 15 risk factors ranked by mean absolute Shapley value (full model) for cohorts A, B, C, and D, respectively. The Shapley value (y-axis) is computed on a log-odds scale. Every unit increase in ShapVal corresponds to an odds ratio (OR) of exp(1)=2.72 compared with the baseline. A positive ShapVal indicates an increase in the odds of the outcome, and a negative ShapVal a decrease. CAD: coronary artery disease; COPD: chronic obstructive pulmonary disease; HDL: high-density lipoprotein; RBC: red blood cell; T2DM: type 2 diabetes mellitus.
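The log-odds interpretation in the Figure 2 caption can be made concrete with a short sketch. The function names are hypothetical, and `predicted_probability` assumes the usual additive property of Shapley values on the log-odds scale (baseline plus the sum of per-feature contributions), which is an assumption here rather than a detail stated in the caption.

```python
import math

def shap_to_odds_ratio(shap_value):
    """Convert a Shapley value on the log-odds scale to an odds ratio
    relative to the baseline."""
    return math.exp(shap_value)

def predicted_probability(base_log_odds, shap_values):
    """Assumed additive model: log-odds = baseline + sum of Shapley values,
    mapped to a probability with the logistic function."""
    log_odds = base_log_odds + sum(shap_values)
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(shap_to_odds_ratio(1.0), 2))   # 2.72, as stated in the caption
print(round(shap_to_odds_ratio(-1.0), 2))  # 0.37, a decrease in the odds
print(shap_to_odds_ratio(0.0))             # 1.0, no change from baseline
```

A ShapVal of 0.5 for one subject would therefore multiply that subject's odds by about 1.65, regardless of the baseline risk.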
Figure 3. ShapVal dependence plots of the top 6 risk factors ranked by mean absolute Shapley value (lite model) for cohorts A, B, C, and D, respectively. T2DM: type 2 diabetes mellitus.
Top 10 risk factors ranked by mean absolute Shapley value for cohorts A, B, C, and D (full model).
| Cohort | Risk factor | ShapVal | P value |
| A | Age | 0.442 | .002 |
| A | Treatments taken count | 0.093 | .002 |
| A | Cystatin C | 0.088 | .002 |
| A | Waist-to-hip ratio | 0.085 | .002 |
| A | Townsend deprivation index | 0.059 | .004 |
| A | HbA1c^a | 0.056 | .002 |
| A | Pulse rate | 0.048 | .002 |
| A | Hypertension | 0.048 | .002 |
| A | Apolipoprotein A | 0.027 | .016 |
| A | HDL^b cholesterol | 0.026 | .016 |
| B | Age | 0.708 | .002 |
| B | Testosterone | 0.069 | .002 |
| B | Treatments taken count | 0.048 | .002 |
| B | Waist circumference | 0.035 | .002 |
| B | RBC^c distribution width | 0.027 | .002 |
| B | Cystatin C | 0.024 | .002 |
| B | Townsend deprivation index | 0.023 | .002 |
| B | Pulse rate | 0.019 | .004 |
| B | Systolic blood pressure | 0.016 | .002 |
| B | Lymphocyte percentage | 0.015 | .004 |
| C | Waist-to-hip ratio | 0.113 | .002 |
| C | Townsend deprivation index | 0.096 | .002 |
| C | Age | 0.088 | .002 |
| C | Treatments taken count | 0.063 | .002 |
| C | Waist circumference | 0.044 | .002 |
| C | Self-report: noncancer count | 0.043 | .002 |
| C | Hypertension | 0.036 | .002 |
| C | Cystatin C | 0.030 | .024 |
| C | T2DM^d | 0.030 | .002 |
| C | Apolipoprotein A | 0.024 | .052 |
| D | Age | 0.519 | .002 |
| D | Townsend deprivation index | 0.136 | .002 |
| D | Waist-to-hip ratio | 0.131 | .002 |
| D | Treatments taken count | 0.115 | .002 |
| D | Waist circumference | 0.110 | .002 |
| D | Cystatin C | 0.096 | .002 |
| D | Testosterone | 0.086 | .002 |
| D | Hypertension | 0.061 | .002 |
| D | RBC distribution width | 0.046 | .002 |
| D | Pulse rate | 0.036 | .006 |
^a HbA1c: hemoglobin A1c.
^b HDL: high-density lipoprotein.
^c RBC: red blood cell.
^d T2DM: type 2 diabetes mellitus.
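Ranking risk factors by mean absolute Shapley value, as in the table above, amounts to averaging the magnitude of each feature's per-subject contribution. The sketch below uses a hypothetical helper `rank_by_mean_abs_shap` and a toy matrix of Shapley values; the numbers are illustrative and are not drawn from the study data.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names, top_n=10):
    """Rank features by the mean absolute Shapley value across subjects.

    shap_rows: one row per subject, one Shapley value per feature.
    Returns a list of (feature, mean_abs_shap) pairs, largest first.
    """
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy Shapley values for 3 subjects and 3 features (illustrative only).
features = ["pulse_rate", "age", "cystatin_c"]
shap_rows = [
    [0.02, 0.5, -0.1],
    [-0.04, -0.4, 0.1],
    [0.0, 0.6, -0.1],
]

for name, score in rank_by_mean_abs_shap(shap_rows, features):
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature such as age can push risk up for some subjects and down for others, and signed contributions would partly cancel.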
Top 10 risk factors ranked by P value, listing only factors not already included in the preceding table of factors ranked by mean absolute Shapley value, for cohorts A, B, C, and D (full model).
| Cohort | Risk factor | P value | ShapVal |
| A | T2DM^a | .004 | 0.010 |
| A | Self-report: noncancer | .008 | 0.018 |
| A | Depression | .008 | 0.004 |
| A | CAD^b | .016 | 0.002 |
| A | Cancer diagnosed by doctor | .026 | 0.000 |
| A | Alcohol intake (occasions) | .028 | 0.002 |
| A | AF^c | .028 | 0.000 |
| A | Smoking (current) | .036 | 0.000 |
| A | γ-glutamyltransferase | .046 | 0.021 |
| A | WBC^d count | .046 | 0.014 |
| B | BMI | .002 | 0.015 |
| B | Glucose | .002 | 0.015 |
| B | HbA1c^e | .002 | 0.014 |
| B | Weight | .002 | 0.010 |
| B | Mean platelet volume | .002 | 0.009 |
| B | T2DM | .002 | 0.007 |
| B | Sleep duration | .002 | 0.006 |
| B | T1DM^f | .002 | 0.003 |
| B | Cognitive impairment | .002 | 0.003 |
| B | CAD | .002 | 0.003 |
| C | COPD^g | .002 | 0.015 |
| C | Depression | .002 | 0.009 |
| C | Cognitive impairment | .002 | 0.007 |
| C | CAD | .004 | 0.017 |
| C | Ethnic (Asian/Asian British) | .004 | 0.007 |
| C | Heart failure | .004 | 0.007 |
| C | AF | .004 | 0.006 |
| C | Smoking (previous) | .006 | 0.015 |
| C | Stroke | .012 | 0.001 |
| C | Ethnic (Black/Black British) | .020 | 0.001 |
| D | T2DM | .002 | 0.026 |
| D | Cognitive impairment | .002 | 0.024 |
| D | COPD | .002 | 0.021 |
| D | AF | .002 | 0.016 |
| D | Heart failure | .002 | 0.007 |
| D | CAD | .002 | 0.008 |
| D | Ethnic (Black/Black British) | .004 | 0.004 |
| D | Stroke | .004 | 0.002 |
| D | Alcohol drinker (current) | .004 | 0.001 |
| D | Smoking (previous) | .006 | 0.003 |
^a T2DM: type 2 diabetes mellitus.
^b CAD: coronary artery disease.
^c AF: atrial fibrillation.
^d WBC: white blood cell.
^e HbA1c: hemoglobin A1c.
^f T1DM: type 1 diabetes mellitus.
^g COPD: chronic obstructive pulmonary disease.
Top 5 risk factors ranked by mean absolute Shapley value for cohorts A, B, C, and D (lite model).
| Cohort | Risk factor | ShapVal | P value |
| A | Age | 0.496 | .002 |
| A | Treatments taken count | 0.121 | .002 |
| A | Waist circumference | 0.085 | .002 |
| A | Male | 0.058 | .002 |
| A | Self-report: noncancer count | 0.054 | .004 |
| B | Age | 0.721 | .002 |
| B | Treatments taken count | 0.079 | .014 |
| B | Waist circumference | 0.071 | .040 |
| B | Male | 0.048 | .010 |
| B | BMI | 0.034 | .242 |
| C | Waist circumference | 0.153 | .002 |
| C | Age | 0.120 | .002 |
| C | Treatments taken count | 0.102 | .002 |
| C | Self-report: noncancer count | 0.064 | .002 |
| C | T2DM^a | 0.050 | .002 |
| D | Age | 0.056 | .002 |
| D | Waist circumference | 0.248 | .002 |
| D | Treatments taken count | 0.154 | .002 |
| D | Male | 0.098 | .002 |
| D | BMI | 0.043 | .036 |
^a T2DM: type 2 diabetes mellitus.
Top 5 risk factors ranked by P value, listing only factors not already included in the preceding table of factors ranked by mean absolute Shapley value, for cohorts A, B, C, and D (lite model).
| Cohort | Risk factor | P value | ShapVal |
| A | T2DM^a | .002 | 0.047 |
| A | Smoking (current) | .004 | 0.026 |
| A | Depression | .016 | 0.015 |
| A | Alcohol drinker (current) | .020 | 0.013 |
| A | CAD^b | .022 | 0.010 |
| B | T2DM | .006 | 0.027 |
| B | Cognitive impairment | .006 | 0.015 |
| B | T1DM^c | .020 | 0.009 |
| B | Bipolar | .024 | 0.006 |
| B | AF^d | .036 | 0.011 |
| C | COPD^e | .002 | 0.024 |
| C | Ethnic (Asian/British Asian) | .002 | 0.016 |
| C | Cognitive impairment | .002 | 0.008 |
| C | Male | .004 | 0.049 |
| C | CAD | .004 | 0.023 |
| D | T2DM | .002 | 0.043 |
| D | COPD | .002 | 0.039 |
| D | Cognitive impairment | .002 | 0.029 |
| D | AF | .002 | 0.024 |
| D | Ethnic (Black/Black British) | .002 | 0.016 |
^a T2DM: type 2 diabetes mellitus.
^b CAD: coronary artery disease.
^c T1DM: type 1 diabetes mellitus.
^d AF: atrial fibrillation.
^e COPD: chronic obstructive pulmonary disease.
Top interacting pairs of variables ranked by ShapVal (full model).
| Cohort | Risk factor 1 | Risk factor 2 | Value |
| A | Waist-to-hip ratio | Age | 150 |
| A | Treatments taken count | Age | 149 |
| A | HDL^a cholesterol | Age | 86 |
| A | Age | Hypertension | 85 |
| A | Cystatin C | Age | 84 |
| B | Testosterone | Age | 195 |
| B | Waist circumference | Age | 95 |
| B | BMI | Age | 82 |
| B | Treatments taken count | Age | 63 |
| B | Pulse rate | Age | 57 |
| C | Waist-to-hip ratio | Age | 709 |
| C | Waist-to-hip ratio | Treatments taken count | 494 |
| C | Townsend deprivation index | Treatments taken count | 481 |
| C | Townsend deprivation index | Waist-to-hip ratio | 450 |
| C | Albumin | Waist-to-hip ratio | 407 |
| D | Waist circumference | Age | 859 |
| D | Testosterone | Age | 780 |
| D | Townsend deprivation index | Age | 725 |
| D | Waist-to-hip ratio | Age | 603 |
| D | Age | Hypertension | 585 |
^a HDL: high-density lipoprotein.
Figure 4. ShapVal interaction plots of the full model for the top 4 interacting pairs of cohorts A, B, C, and D, respectively.