Björn Ställberg, Karin Lisspers, Kjell Larsson, Christer Janson, Mario Müller, Mateusz Łuczko, Bine Kjøller Bjerregaard, Gerald Bacher, Björn Holzhauer, Pankaj Goyal, Gunnar Johansson.
Abstract
PURPOSE: Chronic obstructive pulmonary disease (COPD) exacerbations can worsen disease severity, accelerate progression, increase mortality and lead to hospitalizations. We aimed to develop a model that predicts a patient's risk of hospitalization due to severe exacerbations of COPD (defined as COPD-related hospitalizations), using Swedish patient-level data.

PATIENTS AND METHODS: Patient-level data for 7823 Swedish patients with COPD were collected from electronic medical records (EMRs) and national registries covering healthcare contacts, diagnoses, prescriptions, lab tests, hospitalizations and socioeconomic factors between 2000 and 2013. Models were created using machine-learning methods to predict the risk of an imminent exacerbation causing hospitalization due to COPD within the next 10 days. Exacerbations occurring within this period were considered as one event. Model performance was assessed using the area under the precision-recall curve (AUPRC). To allow comparison with previous similar studies, the area under the receiver operating curve (AUROC) was also reported. The model with the highest mean cross-validation AUPRC was selected as the final model and, in a final step, was trained on the entire training dataset.
Keywords: COPD; exacerbation; hospitalization; machine learning
Year: 2021 PMID: 33758504 PMCID: PMC7981164 DOI: 10.2147/COPD.S293099
Source DB: PubMed Journal: Int J Chron Obstruct Pulmon Dis ISSN: 1176-9106
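The two metrics named in the abstract can be illustrated on toy data. The sketch below is not the study's evaluation code: it is a minimal pure-Python illustration, assuming step-wise average precision as the AUPRC estimate and the pairwise-ranking form of AUROC. The function names and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    return total / hits if hits else 0.0


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 events among 10 prediction points.
toy_labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.1, 0.4, 0.2, 0.05, 0.3]

ap = average_precision(toy_labels, toy_scores)
roc = auroc(toy_labels, toy_scores)
print(f"AUPRC ~ {ap:.3f}, AUROC = {roc:.4f}")
```

When events are rare, AUROC can stay high while AUPRC remains low, because AUPRC penalizes false positives among the top-ranked cases. A classifier that scores at random has an expected AUPRC of roughly the event prevalence, which is a useful baseline when reading small AUPRC values.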
Figure 1. Overview of the included primary care centers (sites) from five regions across Sweden: Gävleborg (2 sites), Stockholm (3 sites), Uppsala (29 sites), Västmanland (2 sites) and Västra Götaland (16 sites). In total, 52 primary care centers were included.
Figure 2. Flowchart of the patient selection procedure. The ARCTIC study included a total of 18,132 patients with COPD. Of these, 7823 patients had their first COPD diagnosis between 2005 and 2013, were over 40 years old at index, were enrolled in the study for more than 365 days and had available socioeconomic information. These patients fulfilled all inclusion criteria and were eligible for the study.
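The four inclusion criteria in the flowchart amount to a conjunction of filters. A minimal sketch, assuming a hypothetical `Patient` record (the field names are illustrative, not the study's schema) and assuming that "between 2005 and 2013" includes both boundary years:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    # Hypothetical fields; the actual data model is not described in the source.
    first_copd_diagnosis: date
    age_at_index: int
    days_enrolled: int
    has_socioeconomic_info: bool


def is_eligible(p: Patient) -> bool:
    """Apply all four inclusion criteria; a patient must satisfy every one."""
    return (
        2005 <= p.first_copd_diagnosis.year <= 2013  # assumed inclusive bounds
        and p.age_at_index > 40
        and p.days_enrolled > 365
        and p.has_socioeconomic_info
    )


cohort = [
    Patient(date(2008, 5, 1), 67, 900, True),   # meets all criteria
    Patient(date(2003, 2, 1), 70, 2000, True),  # diagnosed too early
    Patient(date(2010, 9, 1), 40, 800, True),   # not over 40 at index
    Patient(date(2011, 1, 1), 55, 365, True),   # not enrolled for more than 365 days
    Patient(date(2012, 3, 1), 61, 500, False),  # no socioeconomic information
]
eligible = [p for p in cohort if is_eligible(p)]
print(len(eligible), "of", len(cohort), "toy patients are eligible")
```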
Figure 3. The overall prediction period was defined as June 2006 to October 2013, ensuring that patients could be observed across all data sources used for the study. Patient-specific prediction periods were defined based on events captured in the data sources.
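Within a patient's prediction period, the abstract describes a 10-day prediction horizon and treats exacerbations falling within that period as one event. The sketch below shows one way this could be encoded. The merging rule (an admission within 10 days of the current event's first admission is folded into that event) and both function names are assumptions; the source does not give the exact rule.

```python
from datetime import date, timedelta

HORIZON_DAYS = 10  # prediction horizon stated in the abstract


def merge_events(admissions, window_days=HORIZON_DAYS):
    """Collapse admissions within `window_days` of an event's first admission (assumed rule)."""
    merged = []
    for day in sorted(admissions):
        if merged and (day - merged[-1]).days <= window_days:
            continue  # counted as part of the current event
        merged.append(day)
    return merged


def label(prediction_date, events, horizon_days=HORIZON_DAYS):
    """Return 1 if an event starts within the horizon after the prediction date, else 0."""
    end = prediction_date + timedelta(days=horizon_days)
    return int(any(prediction_date < e <= end for e in events))


# Hypothetical admission dates for one patient.
admissions = [date(2010, 1, 1), date(2010, 1, 5), date(2010, 1, 20), date(2010, 1, 28)]
events = merge_events(admissions)
print(events)
print(label(date(2010, 1, 12), events), label(date(2010, 1, 2), events))
```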
Sociodemographic Characteristics
| Characteristic | No Exacerbationᵃ (N=5654, 72%) | Exacerbationᵃ (N=2169, 28%) |
|---|---|---|
| Age at Index (Mean years) | 66.5 | 66.9 |
| Female, n (%) | 3166 (56) | 1258 (58) |
| First COPD diagnosis: Inpatient, n (%) | 3223 (57) | 672 (31) |
| First COPD diagnosis: Outpatient, n (%) | 961 (17) | 759 (35) |
| First COPD diagnosis: Primary Care, n (%) | 1470 (26) | 737 (34) |
| Work and Finance | | |
| Number of days per year with sick leave benefits (mean) | 7.6 | 12.1 |
| Number of hours with sick leave benefits per year*100 (mean) | 0.08 | 0.10 |
| Income from social transferals, n (%) | 1300 (23) | 542 (25) |
| Income (mean) | 445.1 | 299.8 |
| Income from social security, n (%) | 283 (5) | 130 (6) |
| Employment – Working, n (%) | 961 (17) | 260 (12) |
| Employment – Not working, n (%) | 4354 (77) | 1670 (77) |
| Employment – No Information, n (%) | 339 (6) | 239 (11) |
| Health | | |
| CCI (mean) | 1.2 | 2.0 |
| Any sick leave, n (%) | 170 (3) | 108 (5) |
| Smoking – No Information, n (%) | 4071 (72) | 1366 (63) |
| Smoking – No smoker, n (%) | 396 (7) | 108 (5) |
| Smoking – Ex Smoker, n (%) | 622 (11) | 282 (13) |
| Smoking – Current smoker, n (%) | 565 (10) | 412 (19) |
| Education | | |
| Education – No Information | 339 (6) | 239 (11) |
| Education – Primary School < 9 years | 1583 (28) | 586 (27) |
| Education – Primary School 9 years | 622 (11) | 239 (11) |
| Education – High School 2 years | 1696 (30) | 651 (30) |
| Education – High School 12 years | 509 (9) | 174 (8) |
| Education – Post High School <3 years | 396 (7) | 108 (5) |
| Education – Post High School ≥3 years | 396 (7) | 108 (5) |
| Education – Research Education | 57 (1) | 22 (1) |
Notes: ᵃExacerbations occurring during the observation period (look-back period and prediction period). ᵇSocioeconomic information was retrieved every year during the prediction period.
Abbreviation: CCI, Charlson Comorbidity Index.
Model Performance and Best Model by Setting
| Model | Setting | AUPRC (95% CI) | AUROC (95% CI) | Recall (95% CI) | Precision (95% CI) |
|---|---|---|---|---|---|
| XGBoost, undersampling | Training set | 0.17 (0.001) | 0.88 (0.001) | – | – |
| | Test set | 0.08 (0.001) | 0.86 (0.001) | 0.16 (0.001) | 0.11 (0.001) |
| | CVS | 0.11 | – | – | – |
Abbreviations: AUPRC, area under the precision-recall curve; AUROC, area under receiver operating curve; CI, confidence interval; CVS, cross validation score; XGBoost, extreme gradient boosting.
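The "undersampling" in the model name refers to rebalancing the training data by discarding part of the majority class. A minimal sketch of random undersampling follows. The 1:1 ratio, the seed and the function name are assumptions, since the source here does not state the sampling ratio or the implementation used.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Keep all positives and a random subset of negatives (negatives ~ ratio * positives)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(len(pos) * ratio)))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical imbalanced training data: 5 events among 100 prediction points.
toy_rows = list(range(100))
toy_labels = [1] * 5 + [0] * 95

sampled_rows, sampled_labels = undersample(toy_rows, toy_labels)
print(len(sampled_rows), "rows kept,", sum(sampled_labels), "of them positive")
```

In a setup like this, only the training data would normally be undersampled. The test set keeps its natural event rate so that precision and AUPRC reflect the real prevalence of hospitalizations.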
Top 20 Most Important Features for Predicting Hospitalization Due to COPD Using Machine Learning Models
| Rank | Feature | Importanceᵃ |
|---|---|---|
| 1 | Number of severe exacerbations (last 180 days) | 0.33 |
| 2 | Number of severe exacerbations (whole history) – standardized by the number of days | 0.11 |
| 3 | Number of COPD-related contacts (whole history) – standardized by the number of days | 0.066 |
| 4 | Whether first COPD diagnosis was classified as “inpatient” | 0.054 |
| 5 | Charlson Comorbidity Index (CCI) from the year before the prediction | 0.047 |
| 6 | Number of medications from “other” groupᵇ (last 365 days) | 0.019 |
| 7 | Whether first COPD diagnosis was classified as “outpatient” | 0.016 |
| 8 | Number of moderate exacerbations (last 30 days) | 0.012 |
| 9 | Number of prescriptions for Antibiotics (whole history) – standardized by the number of days | 0.010 |
| 10 | Number of severe exacerbations (last 180 days, 1 year before prediction date) | 0.009 |
| 11 | Number of COPD-related contacts (last 180 days) | 0.008 |
| 12 | Number of visits (person not defined) in inpatient care (last 180 days) | 0.007 |
| 13 | Number of diagnoses of ischemic heart diseases in inpatient care (whole history) – standardized by the number of days | 0.007 |
| 14 | Number of severe exacerbations (last 60 days) | 0.006 |
| 15 | Number of prescriptions for COPD Medication (last 365 days) | 0.006 |
| 16 | Number of non-COPD-related contacts (whole history) – standardized by the number of days | 0.006 |
| 17 | Number of diagnoses of respiratory disease in inpatient care, all time (whole history) – standardized by the number of days | 0.005 |
| 18 | Number of diagnoses from “other” group in outpatient care (whole history) – standardized by the number of days | 0.005 |
| 19 | Number of diagnoses from “other” group in inpatient care (last 30 days, 1 year before prediction date) | 0.005 |
| 20 | Number of prescriptions for Oral steroids (last 30 days) | 0.004 |
Notes: ᵃThe value indicates the relative contribution of the corresponding feature to the model, calculated from each feature’s contribution in each tree of the model. ᵇMedications against comorbidities and respiratory medications were divided into main groups and sub-groups. Medications not in these groups are referred to as “other medications”.
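Most features in the table are event counts over a look-back window, optionally shifted back in time or divided by the length of the observed history. The sketch below illustrates how such features could be derived from a list of event dates. It assumes that "standardized by the number of days" means the count divided by the days of history before the prediction date; the function names and the window boundary conventions are hypothetical.

```python
from datetime import date, timedelta


def count_in_window(event_dates, prediction_date, window_days, offset_days=0):
    """Count events in the `window_days` ending `offset_days` before the prediction date."""
    end = prediction_date - timedelta(days=offset_days)
    start = end - timedelta(days=window_days)
    return sum(start <= d < end for d in event_dates)


def rate_whole_history(event_dates, history_start, prediction_date):
    """Events per day of observed history (assumed meaning of 'standardized')."""
    days = (prediction_date - history_start).days
    if days <= 0:
        return 0.0
    n = sum(history_start <= d < prediction_date for d in event_dates)
    return n / days


# Hypothetical severe exacerbation dates for one patient.
severe = [date(2010, 1, 1), date(2011, 3, 1), date(2011, 6, 1)]
pred = date(2011, 7, 1)

last_180 = count_in_window(severe, pred, 180)
last_60 = count_in_window(severe, pred, 60)
last_180_year_before = count_in_window(severe, pred, 180, offset_days=365)
history_rate = rate_whole_history(severe, date(2010, 1, 1), pred)
print(last_180, last_60, last_180_year_before, round(history_rate, 5))
```

Every window ends strictly before the prediction date, so that no information from the prediction horizon leaks into the features.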
Figure 4. Relationship between the history of severe exacerbations and the probability of hospitalization for a severe exacerbation within 1–10 days. The number of previous severe exacerbations, especially within 180 days before the prediction point, drastically increases the probability of having a severe exacerbation within the next 10 days. *Severe exacerbations were defined as exacerbations requiring a hospital stay.