Elaine Hill, Hemal Mehta, Suchetha Sharma, Klint Mane, Catherine Xie, Emily Cathey, Johanna Loomba, Seth Russell, Heidi Spratt, Peter E DeWitt, Nariman Ammar, Charisse Madlock-Brown, Donald Brown, Julie A McMurry, Christopher G Chute, Melissa A Haendel, Richard Moffitt, Emily R Pfaff, Tellen D Bennett.
Abstract
Background: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). Objective: To identify risk factors associated with PASC/long-COVID. Design: Retrospective case-control study. Setting: 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). Patients: 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.9 or a long-COVID clinic visit) matched to 41,625 controls within the same health system. Measurements: Risk factors included demographics, comorbidities, and COVID-19 treatment and acute-illness characteristics. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC.
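The abstract states only that controls were matched within the same health system, and the counts work out to five controls per case. The sketch below is a hypothetical illustration of such 1:5 within-health-system sampling, not the N3C procedure. The function name `match_controls`, drawing without replacement, and skipping a case when fewer than `k` controls remain are all assumptions (the last mirrors the table note that the restricted samples lost cases with fewer than 5 available controls).

```python
import random
from collections import defaultdict


def match_controls(cases, controls, k=5, seed=0):
    """Hypothetical 1:k matching of controls to cases within a health system.

    cases and controls are lists of (person_id, health_system) tuples.
    Controls are drawn without replacement; a case whose health system has
    fewer than k unused controls left is skipped (assumed rule).
    Returns {case_id: [control_id, ...]}.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for person_id, site in controls:
        pools[site].append(person_id)
    for pool in pools.values():
        rng.shuffle(pool)  # shuffle once so popping from the end is a random draw

    matched = {}
    for case_id, site in cases:
        pool = pools[site]
        if len(pool) < k:
            continue  # insufficient controls in this health system
        matched[case_id] = [pool.pop() for _ in range(k)]
    return matched


cases = [("c1", "A"), ("c2", "A"), ("c3", "B")]
controls = [(f"a{i}", "A") for i in range(10)] + [(f"b{i}", "B") for i in range(3)]
print(match_controls(cases, controls))
```

With ten controls in health system A and three in B, cases c1 and c2 each receive five distinct controls from A, and c3 is skipped because B cannot supply five.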
Year: 2022 PMID: 36032983 PMCID: PMC9413724 DOI: 10.1101/2022.08.15.22278603
Source DB: PubMed Journal: medRxiv
Cohort Characteristics for PASC Cases defined by U09.9 or long-COVID clinic visit
| Characteristic | PASC cases (N=8325) | Method 1: unrestricted controls (N=41625) | Method 2: restricted controls (N=41610) | Method 3: most restricted controls (N=41610) |
|---|---|---|---|---|
| Age, years, mean (SD) | 52.3 (15.5) | 47.5 (18.4) | 46.8 (17.8) | 48.1 (18.2) |
| Sex | | | | |
| Female | 5225 (62.8%) | 23090 (55.5%) | 24112 (57.9%) | 24530 (59.0%) |
| Male | 3096 (37.2%) | 18481 (44.4%) | 17482 (42.0%) | 17051 (41.0%) |
| Race/ethnicity | | | | |
| White non-Hispanic (NH) | 5707 (68.6%) | 26490 (63.6%) | 27818 (66.9%) | 27654 (66.5%) |
| Hispanic | 835 (10.0%) | 4851 (11.7%) | 4430 (10.6%) | 4452 (10.7%) |
| Black NH | 1235 (14.8%) | 6244 (15.0%) | 6455 (15.5%) | 6538 (15.7%) |
| Asian NH | 136 (1.6%) | 883 (2.1%) | 921 (2.2%) | 953 (2.3%) |
| Other race NH | 54 (0.6%) | 314 (0.8%) | 267 (0.6%) | 292 (0.7%) |
| Chronic Lung Disease | 2404 (28.9%) | 5717 (13.7%) | 6956 (16.7%) | 6816 (16.4%) |
| Complicated Diabetes | 1210 (14.5%) | 3582 (8.6%) | 4377 (10.5%) | 4336 (10.4%) |
| Congestive Heart Failure | 573 (6.9%) | 1530 (3.7%) | 2007 (4.8%) | 1910 (4.6%) |
| Hypertension | 3365 (40.4%) | 10894 (26.2%) | 13528 (32.5%) | 13698 (32.9%) |
| Kidney Disease | 1262 (15.2%) | 3616 (8.7%) | 4503 (10.8%) | 4388 (10.5%) |
| Obesity | 4691 (56.4%) | 16575 (39.8%) | 19588 (47.1%) | 19430 (46.7%) |
| Uncomplicated Diabetes | 1708 (20.5%) | 5547 (13.3%) | 6642 (16.0%) | 6751 (16.2%) |
| COVID-associated Hospitalization | 3100 (37.3%) | 6165 (14.8%) | 6306 (15.2%) | 6162 (14.8%) |
| COVID-associated ED Visit | 1564 (18.8%) | 6468 (15.5%) | 6060 (14.6%) | 5865 (14.1%) |
| Hospitalization length of stay, days, mean (SD) | 5.6 (15.3) | 1.1 (6.1) | 1.1 (5.3) | 1.0 (4.8) |
| COVID treatment | | | | |
| Corticosteroids† | 2025 (24.3%) | 3054 (7.3%) | 2991 (7.2%) | 2807 (6.7%) |
| Remdesivir† | 1409 (16.9%) | 1913 (4.6%) | 1794 (4.3%) | 1631 (3.9%) |
| Vasopressors† | 601 (7.2%) | 682 (1.6%) | 703 (1.7%) | 720 (1.7%) |
| ECMO† | 66 (0.8%) | 35 (0.1%) | 24 (0.1%) | <20 |
| Mechanical Ventilation† | 615 (7.4%) | 450 (1.1%) | 398 (1.0%) | 404 (1.0%) |
| AKI during COVID-associated Hospitalization | 664 (8.0%) | 1016 (2.4%) | 1084 (2.6%) | 1026 (2.5%) |
| Sepsis during COVID-associated Hospitalization | 614 (7.4%) | 823 (2.0%) | 835 (2.0%) | 770 (1.9%) |

† Only captured for individuals hospitalized for COVID-19.

The restricted samples (Methods 2 and 3) lose 3 cases because fewer than 5 controls were available for them. The comorbidities shown in this table are a selection; a comprehensive stratification by comorbidity is given in the Supplement.
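The control counts follow from matching five controls to each case: 8325 × 5 = 41625 for Method 1, and (8325 − 3) × 5 = 41610 once the 3 cases without enough controls are dropped. The short check below reproduces those sizes and a few of the reported percentages. It assumes each percentage uses its own column's N as the denominator, which agrees with the rows checked here; the constant names are illustrative.

```python
# Sample sizes implied by 1:5 matching (counts taken from the table above).
CASES = 8325
CONTROLS_PER_CASE = 5
CASES_DROPPED_RESTRICTED = 3  # Methods 2 and 3: cases with <5 available controls

unrestricted_n = CASES * CONTROLS_PER_CASE
restricted_n = (CASES - CASES_DROPPED_RESTRICTED) * CONTROLS_PER_CASE


def pct(count, total):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * count / total, 1)


print(unrestricted_n, restricted_n)   # 41625 41610
print(pct(5225, CASES))               # Female, PASC cases: 62.8
print(pct(23090, unrestricted_n))     # Female, Method 1 controls: 55.5
print(pct(24112, restricted_n))       # Female, Method 2 controls: 57.9
```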
Comparison of Feature Importance for PASC Models defined by U09.9 or long-COVID clinic visit and unrestricted controls (Top 15 positive and negative features)
| Feature | Logistic regression | XGBoost | Random forest | Mean rank |
|---|---|---|---|---|
| Features associated with increased risk | | | | |
| Hospitalization Extended Stay (31+ days) | 2 | 1 | 4 | 2.33 |
| COVID-associated Hospitalization | 4 | 4 | 1 | 3.00 |
| Age 40–49 | 6 | 8 | 7 | 7.00 |
| Age 50–59 | 5 | 14 | 6 | 8.33 |
| Hospitalization Long Stay (8–30 days) | 8 | 2 | 18 | 9.33 |
| Female | 22 | 13 | 2 | 12.33 |
| Depression | 19 | 10 | 15 | 14.67 |
| Age 60–69 | 7 | 28 | 10 | 15.00 |
| COVID Treatment: Mechanical Ventilation | 21 | 5 | 24 | 16.67 |
| COVID Treatment: Remdesivir | 26 | 6 | 20 | 17.33 |
| Chronic Lung Disease | 16 | 38 | 9 | 21.00 |
| COVID-associated ED Visit | 36 | 7 | 25 | 22.67 |
| Obesity | 37 | 15 | 27 | 26.33 |
| Systemic Corticosteroids | 24 | 34 | 22 | 26.67 |
| Malignant Cancer | 54 | 21 | 12 | 29.00 |
| Features associated with decreased risk | | | | |
| COVID Diagnosis during COVID-associated Hospitalization | 11 | 3 | 5 | 6.33 |
| Age 18–29 | ref. | 9 | 11 | 10.00 |
| Male | ref. | 19 | 3 | 11.00 |
| Age 70–79 | 10 | 35 | 16 | 20.33 |
| Substance Abuse | 14 | 16 | 37 | 22.33 |
| Tobacco Smoker | 23 | 11 | 34 | 22.67 |
| Cardiomyopathies | 32 | 22 | 17 | 23.67 |
| Age 30–39 | 12 | 53 | 8 | 24.33 |
| Metastatic Solid Tumor Cancers | 13 | 18 | 45 | 25.33 |
| Psychosis | 9 | 23 | 44 | 25.33 |
| Uncomplicated Diabetes | 45 | 20 | 13 | 26.00 |
| Myocardial Infarction | 33 | 32 | 19 | 28.00 |
| Age 80–89 | 20 | 12 | 56 | 29.33 |
| Dementia | 38 | 29 | 26 | 31.00 |
| Race/Ethnicity: Black NH | 28 | 42 | 29 | 33.00 |

This table shows the top 15 features associated with increased risk and the top 15 features associated with decreased risk; complete models are shown in the Supplement. Sample: unrestricted controls; target: U09.9 or long-COVID clinic visit (see text). Features are grouped by median direction of association (increased/decreased) and ordered by mean rank. Model ranks were calculated with sklearn.inspection.permutation_importance() for XGBoost and random forest, and by the absolute size of the coefficient for logistic regression. The mean rank averages the ranks of only those models that included the variable; for a feature that served as the reference group in the logistic regression model (marked "ref."), it is therefore the mean of the XGBoost and random forest ranks.
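The rank aggregation described in the note can be sketched as follows. Encoding a reference group as `None`, the function names, and the example coefficients passed to `rank_by_abs_coefficient` are assumptions for illustration; the rank triples are copied from the table above, and the outputs match its Mean rank column.

```python
def mean_rank(ranks):
    """Average a feature's per-model ranks.

    ranks holds one entry per model (logistic regression, XGBoost, random
    forest); None marks a model in which the feature was the reference group
    and therefore has no rank. Rounded to two decimals as in the tables.
    """
    used = [r for r in ranks if r is not None]
    return round(sum(used) / len(used), 2)


def rank_by_abs_coefficient(coefs):
    """Rank logistic regression features 1..n by |coefficient|, largest first."""
    ordered = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


print(mean_rank([2, 1, 4]))       # Hospitalization Extended Stay (31+ days): 2.33
print(mean_rank([None, 9, 11]))   # Age 18-29, LR reference group: 10.0
# Hypothetical coefficients: a large negative coefficient still ranks first.
print(rank_by_abs_coefficient({"x1": 0.5, "x2": -1.2, "x3": 0.1}))
```

Ranking by absolute value means strongly protective and strongly harmful coefficients rank alike, which is why direction is reported separately by the grouping.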
Figure 2. Forest plots from logistic regression for unrestricted controls, with social determinants of health (SDoH) variables included; PASC defined as U09.9 or a long-COVID clinic visit.
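The forest plots themselves are not reproduced here. Logistic regression forest plots conventionally display each coefficient β as an odds ratio exp(β) with a confidence interval exp(β ± 1.96·SE). Assuming Figure 2 follows that convention (the record does not say which interval was used), the conversion looks like this; the coefficient and standard error below are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic regression
    coefficient (log-odds scale) and its standard error; z=1.96 gives ~95%."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, not values from Figure 2.
or_point, ci_low, ci_high = odds_ratio_ci(beta=math.log(2.0), se=0.1)
print(f"OR {or_point:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio, which is why forest plots of this kind are usually drawn on a log axis.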
Comparison of Feature Importance for PASC Models defined by U09.9 or long-COVID clinic visit and unrestricted controls (Top 15 positive and negative features) with SDoH variables included
| Feature | Logistic regression (SDoH) | XGBoost (SDoH) | Random forest (SDoH) | Mean rank |
|---|---|---|---|---|
| Features associated with increased risk | | | | |
| Hospitalization Extended Stay (31+ days) | 2 | 1 | 19 | 7.33 |
| COVID-associated Hospitalization | 4 | 2 | 22 | 9.33 |
| Age 40–49 | 6 | 16 | 8 | 10.00 |
| Age 50–59 | 5 | 11 | 15 | 10.33 |
| Hospitalization Long Stay (8–30 days) | 8 | 4 | 23 | 11.67 |
| Households with income below poverty: Low (<11%) | ref. | 19 | 5 | 12.00 |
| MDs per 1000 residents: High (>3.61) | 30 | 8 | 6 | 14.67 |
| COVID Treatment: Mechanical Ventilation | 22 | 5 | 27 | 18.00 |
| Depression | 19 | 17 | 31 | 22.33 |
| Female | 24 | 42 | 1 | 22.33 |
| Age 60–69 | 7 | 49 | 17 | 24.33 |
| MDs per 1000 residents: Medium (1.91–3.61) | 36 | 34 | 4 | 24.67 |
| Obesity | 39 | 13 | 30 | 27.33 |
| Chronic Lung Disease | 17 | 53 | 16 | 28.67 |
| COVID-associated ED Visit | 42 | 9 | 40 | 30.33 |
| Features associated with decreased risk | | | | |
| MDs per 1000 residents: Low (<1.91) | ref. | 7 | 10 | 8.50 |
| College degree: Low (<19%) | ref. | 18 | 7 | 12.50 |
| COVID Diagnosis during COVID-associated Hospitalization | 11 | 3 | 26 | 13.33 |
| Age 18–29 | ref. | 10 | 20 | 15.00 |
| Male | ref. | 33 | 2 | 17.50 |
| Age 30–39 | 12 | 38 | 11 | 20.33 |
| College degree: Medium (19–25%) | 50 | 12 | 3 | 21.67 |
| Public health insurance for ages 19–64: Low (<13%) | ref. | 31 | 13 | 22.00 |
| Substance Abuse | 15 | 15 | 38 | 22.67 |
| Psychosis | 9 | 26 | 45 | 26.67 |
| Tobacco Smoker | 26 | 14 | 40 | 26.67 |
| Age 80–89 | 20 | 25 | 43 | 29.33 |
| Households with income below poverty: High (>15%) | 57 | 21 | 12 | 30.00 |
| Metastatic Solid Tumor Cancers | 18 | 24 | 51 | 31.00 |
| Age 70–79 | 10 | 60 | 24 | 31.33 |

This table shows the top 15 features associated with increased risk and the top 15 features associated with decreased risk, with SDoH variables included; complete models are shown in the Supplement. Sample: unrestricted controls; target: U09.9 or long-COVID clinic visit (see text). Features are grouped by median direction of association (increased/decreased) and ordered by mean rank. Model ranks were calculated with sklearn.inspection.permutation_importance() for XGBoost and random forest, and by the absolute size of the coefficient for logistic regression. The mean rank averages the ranks of only those models that included the variable; for a feature that served as the reference group in the logistic regression model (marked "ref."), it is therefore the mean of the XGBoost and random forest ranks.
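The XGBoost and random forest ranks come from sklearn.inspection.permutation_importance(). The from-scratch sketch below shows the idea behind that function on a hypothetical toy model: shuffle one feature column, re-score the unchanged model, and record the average drop from the baseline score. The toy data, the accuracy metric, and the name `permutation_importance_sketch` are illustrative assumptions, not the study's pipeline; in practice one would call the scikit-learn function on a fitted estimator and read its `importances_mean`.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance_sketch(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    """Hypothetical toy model: predicts the label from feature 0 only."""
    return row[0]


X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 0, 0, 1, 0]
importances = permutation_importance_sketch(predict, X, y)
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(importances, ranking)
```

A feature the model ignores gets an importance of exactly zero, and the ranking (feature 0 first) is the kind of per-model rank that feeds the Mean rank column. Unlike a coefficient, this score carries no sign, so direction of association has to be established separately.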