Yijun Shao, Ali Ahmed, Angelike P Liappis, Charles Faselis, Stuart J Nelson, Qing Zeng-Treitler.
Abstract
This study aimed to understand the impact of three key demographic variables (age, gender, and race) on the adverse outcome of all-cause hospitalization or all-cause mortality in patients with COVID-19, using a deep neural network (DNN) analysis. We created a cohort of Veterans who tested positive for COVID-19, extracted data on age, gender, race, and clinical characteristics from their electronic health records, and trained a DNN model to predict the adverse outcome. We then analyzed the association of the demographic variables with the risk of the adverse outcome using impact scores and interaction scores designed to explain DNN models. The results showed that, on average, older age and African American race were associated with higher risk, while female gender was associated with lower risk. However, individual-level impact scores of age showed that age was a more impactful risk factor in younger patients and in older patients with fewer comorbidities. The individual-level impact scores of the gender and race variables spanned a wide range covering both positive and negative values. The interaction scores between the demographic variables showed that the interaction effects were minimal compared with the impact scores of the variables involved. In conclusion, the DNN model is able to capture the non-linear relationship between the risk factors and the adverse outcome, and the impact scores and interaction scores can help explain the complicated non-linear relationships between the demographic variables and the risk of the outcome. © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2021.
Keywords: Artificial intelligence; Coronavirus disease; Deep neural network; Explainable AI
Year: 2021 PMID: 33681695 PMCID: PMC7914049 DOI: 10.1007/s41666-021-00093-9
Source DB: PubMed Journal: J Healthc Inform Res ISSN: 2509-498X
Summary of the demographic characteristics
| Characteristics | Cases (n = 2355) | Controls (n = 3052) | Overall (n = 5407) |
|---|---|---|---|
| Age (years) | | | |
| Mean±SD | 68.8±13.6 | 56.6±15.8 | 61.9±16.1 |
| Median (Q1, Q3) | 70.5 (60.6, 76.3) | 57.5 (44.3, 67.7) | 63.0 (51.1, 72.8) |
| Gender | | | |
| Female | 122 (5.2%) | 382 (12.5%) | 504 (9.3%) |
| Male | 2233 (94.8%) | 2670 (87.5%) | 4903 (90.7%) |
| Race | | | |
| AA | 1236 (52.5%) | 1342 (44.0%) | 2578 (47.7%) |
| White | 930 (39.5%) | 1377 (45.1%) | 2307 (42.7%) |
| Other | 48 (2.0%) | 85 (2.8%) | 133 (2.4%) |
| Unknown | 141 (6.0%) | 248 (8.1%) | 389 (7.2%) |
Examples of some prevalent diagnoses and their prevalences in the cohort
| ICD code | Description | Cases (n = 2355) | Controls (n = 3052) | Overall (n = 5407) |
|---|---|---|---|---|
| I10 | Essential (primary) hypertension | 1624 (69.0%) | 1338 (43.8%) | 2962 (54.8%) |
| E78.5 | Hyperlipidemia, unspecified | 1078 (45.8%) | 966 (31.7%) | 2044 (37.8%) |
| E11.9 | Type 2 diabetes mellitus without complications | 892 (37.9%) | 670 (22.0%) | 1562 (28.9%) |
| M54.5 | Low back pain | 605 (25.7%) | 839 (27.5%) | 1444 (26.7%) |
| G47.33 | Obstructive sleep apnea | 568 (24.1%) | 572 (18.7%) | 1140 (21.1%) |
| K21.9 | Gastro-esophageal reflux disease without esophagitis | 543 (23.1%) | 500 (16.4%) | 1043 (19.3%) |
| E66.9 | Obesity, unspecified | 393 (16.7%) | 525 (17.2%) | 918 (17.0%) |
| F43.12 | Post-traumatic stress disorder, chronic | 328 (13.9%) | 522 (17.1%) | 850 (15.7%) |
Fig. 1 Proportion (7-year moving average) of cases by age
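The smoothing named in the Fig. 1 caption can be illustrated with a short sketch. The record does not say how the window is aligned or how it is handled at the edges, so this version assumes a centered window over whole years of age, truncated at the ends of the age range, with cases and patients pooled inside each window. The function name `moving_case_proportion` and the toy data are hypothetical.

```python
from collections import Counter

def moving_case_proportion(ages, is_case, window=7):
    """Pooled proportion of cases within a centered age window (whole years)."""
    half = window // 2
    total = Counter()   # patients per year of age
    cases = Counter()   # cases per year of age
    for age, case in zip(ages, is_case):
        year = int(age)  # assumption: ages are binned by completed years
        total[year] += 1
        cases[year] += int(case)
    result = {}
    for year in range(min(total), max(total) + 1):
        span = range(year - half, year + half + 1)
        n = sum(total[y] for y in span)
        if n:  # skip windows that contain no patients
            result[year] = sum(cases[y] for y in span) / n
    return result

# Toy data for illustration only, not taken from the study
ages = [50, 51, 52, 53, 54, 55, 56]
is_case = [0, 0, 1, 0, 1, 1, 1]
smoothed = moving_case_proportion(ages, is_case)
```

Averaging the per-year proportions instead of pooling counts would be an equally plausible reading of the caption; the two differ when the number of patients varies across ages.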
Fig. 2 The calibration curve of the predicted risk scores of the DNN model
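For readers unfamiliar with calibration curves, the following is a minimal sketch of how one is commonly built: predicted risks are grouped into bins, and the mean predicted risk in each bin is compared with the observed outcome rate. The equal-width binning, the number of bins, and the function name `calibration_points` are assumptions; the record does not describe how the curve in Fig. 2 was produced.

```python
def calibration_points(y_true, risks, n_bins=10):
    """Return (mean predicted risk, observed outcome rate) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, r in zip(y_true, risks):
        idx = min(int(r * n_bins), n_bins - 1)  # a risk of 1.0 falls in the last bin
        bins[idx].append((r, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(r for r, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed))
    return points

# Toy data for illustration only, not taken from the study
points = calibration_points([0, 0, 1, 1, 1, 0],
                            [0.1, 0.2, 0.8, 0.9, 0.7, 0.3],
                            n_bins=2)
```

A well-calibrated model yields points close to the diagonal, where the mean predicted risk equals the observed rate.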
Predictive performance of the DNN and LR models
| Model | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| DNN | 0.762 | 0.720 | 0.683 | 0.743 |
| LR | 0.732 | 0.669 | 0.686 | 0.659 |
Fig. 3 The ROC curve of the DNN model on the test set. The square dot corresponds to the threshold that maximizes the accuracy
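The accuracy, sensitivity, and specificity reported above all depend on the threshold applied to the predicted risk, and Fig. 3 marks the threshold that maximizes accuracy. A minimal sketch of that selection follows, assuming a patient is predicted to be a case when the score is at or above the threshold and that candidate thresholds are the observed scores. The function names and toy data are hypothetical.

```python
def confusion_metrics(y_true, scores, threshold):
    """Accuracy, sensitivity, and specificity when score >= threshold predicts a case."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_case = s >= threshold
        if y and predicted_case:
            tp += 1
        elif y:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    # Assumes at least one case and one control, otherwise a division by zero occurs
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def best_accuracy_threshold(y_true, scores):
    """Observed score that maximizes accuracy; ties resolve to the lowest score."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: confusion_metrics(y_true, scores, t)["accuracy"])

# Toy data for illustration only, not taken from the study
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
threshold = best_accuracy_threshold(y_true, scores)
metrics = confusion_metrics(y_true, scores, threshold)
```

Because the threshold trades sensitivity against specificity, two models can differ in these three metrics even at similar AUC, which summarizes performance over all thresholds.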
Impact scores and log odds ratios of the demographic variables
| Variable | Impact score | Log odds ratio |
|---|---|---|
| age | 0.045 | 0.046 |
| gender_female (vs. male) | −0.108 | −0.370 |
| race_aa (vs. white) | 0.178 | 0.497 |
| race_other (vs. white) | −0.037 | 0.535 |
| race_unknown (vs. white) | −0.002 | 0.469 |
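The table compares impact scores with log odds ratios, and the Fig. 5 caption notes that the two coincide for the LR model. One reading consistent with that is sketched below: the impact of a variable is the change in predicted log odds when the variable moves from a reference value to its observed value, and the impact score is that change per unit of the variable. This definition is an assumption, as are the function names, the intercept, and the reference age; only the age coefficient 0.046 comes from the table.

```python
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

def impact_score(predict, x, name, reference):
    """Change in log odds per unit change of x[name] away from a reference value.

    Assumed definition for illustration; x[name] must differ from reference.
    """
    baseline = dict(x, **{name: reference})
    impact = log_odds(predict(x)) - log_odds(predict(baseline))
    return impact / (x[name] - reference)

def lr_predict(x):
    # Toy logistic regression: age coefficient from the table, intercept hypothetical
    z = -3.0 + 0.046 * x["age"]
    return 1.0 / (1.0 + math.exp(-z))

score = impact_score(lr_predict, {"age": 70.0}, "age", reference=60.0)
```

Under this reading, a logistic regression returns the same score for every individual (its coefficient), whereas a non-linear model such as a DNN can return a different score for each individual, which is what individual-level impact scores describe.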
Fig. 4 Impacts and impact scores of age through the DNN model. The horizontal dashed line in (b) represents the population-level impact score of age (=0.045) as shown in Table 4
Fig. 5 Impacts and impact scores (=log odds ratios) of age through the LR model. Only the 300 randomly selected individuals are used for illustration
Fig. 6 Mean ICD count per patient by impact score of age among patients aged ≥60 years
Fig. 7 Mean ICD count per patient by age
Fig. 8 Prevalence of six common comorbidities by age
Fig. 9 Frequency distributions of the individual-level impact scores of the gender and race variables
Interaction scores between the demographic variables
| Variable #1 | Variable #2 | Interaction score |
|---|---|---|
| age | gender_female (vs. male) | −0.00075 |
| age | race_aa (vs. white) | −0.00063 |
| age | race_other (vs. white) | −0.00081 |
| age | race_unknown (vs. white) | 0.00094 |
| gender_female (vs. male) | race_aa (vs. white) | −0.00703 |
| gender_female (vs. male) | race_other (vs. white) | 0.01042 |
| gender_female (vs. male) | race_unknown (vs. white) | −0.00154 |
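The record does not define the interaction score. As a rough illustration of what an interaction between two variables measures, the sketch below uses a second-order difference on the log odds scale: the effect of the first variable when the second is at its observed value, minus the same effect when the second is at its reference value. This is a swapped-in, generic technique and may differ from the authors' definition; the function names and all coefficients are hypothetical.

```python
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

def interaction(predict, x, a, b, ref_a, ref_b):
    """Second-order difference in log odds for variables a and b (assumed definition)."""
    def at(value_a, value_b):
        return log_odds(predict(dict(x, **{a: value_a, b: value_b})))
    effect_a_given_b = at(x[a], x[b]) - at(ref_a, x[b])
    effect_a_given_ref_b = at(x[a], ref_b) - at(ref_a, ref_b)
    return effect_a_given_b - effect_a_given_ref_b

def toy_predict(x):
    # Hypothetical model with an explicit female-by-AA interaction term of 0.2
    z = -1.0 - 0.4 * x["female"] + 0.5 * x["aa"] + 0.2 * x["female"] * x["aa"]
    return 1.0 / (1.0 + math.exp(-z))

value = interaction(toy_predict, {"female": 1, "aa": 1}, "female", "aa", 0, 0)
```

For a model that is purely additive on the log odds scale this quantity is zero, so values near zero, like those in the table, indicate that the variables act almost independently.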