
Predicting Hospitalization Due to COPD Exacerbations in Swedish Primary Care Patients Using Machine Learning - Based on the ARCTIC Study.

Björn Ställberg1, Karin Lisspers1, Kjell Larsson2, Christer Janson3, Mario Müller4, Mateusz Łuczko5, Bine Kjøller Bjerregaard6, Gerald Bacher7, Björn Holzhauer7, Pankaj Goyal7, Gunnar Johansson1.   

Abstract

PURPOSE: Chronic obstructive pulmonary disease (COPD) exacerbations can negatively impact disease severity, progression, mortality and lead to hospitalizations. We aimed to develop a model that predicts a patient's risk of hospitalization due to severe exacerbations (defined as COPD-related hospitalizations) of COPD, using Swedish patient level data. PATIENTS AND METHODS: Patient level data for 7823 Swedish patients with COPD was collected from electronic medical records (EMRs) and national registries covering healthcare contacts, diagnoses, prescriptions, lab tests, hospitalizations and socioeconomic factors between 2000 and 2013. Models were created using machine-learning methods to predict risk of imminent exacerbation causing patient hospitalization due to COPD within the next 10 days. Exacerbations occurring within this period were considered as one event. Model performance was assessed using the Area under the Precision-Recall Curve (AUPRC). To compare performance with previous similar studies, the Area Under Receiver Operating Curve (AUROC) was also reported. The model with the highest mean cross validation AUPRC was selected as the final model and was in a final step trained on the entire training dataset.
RESULTS: The most important factors for predicting severe exacerbations were exacerbations in the previous six months and in whole history, number of COPD-related healthcare contacts and comorbidity burden. Validation on test data yielded an AUROC of 0.86 and AUPRC of 0.08, which was high in comparison to previously published attempts to predict COPD exacerbation.
CONCLUSION: Our work suggests that clinically available information on patient history collected via automated retrieval from EMRs and national registries or directly during patient consultation can form the basis for future clinical tools to predict risk of severe COPD exacerbations.
© 2021 Ställberg et al.

Keywords:  COPD; exacerbation; hospitalization; machine learning

Year:  2021        PMID: 33758504      PMCID: PMC7981164          DOI: 10.2147/COPD.S293099

Source DB:  PubMed          Journal:  Int J Chron Obstruct Pulmon Dis        ISSN: 1176-9106


Introduction

Chronic obstructive pulmonary disease (COPD) affects more than 300 million people worldwide and leads to premature mortality.1 In Sweden, the prevalence of COPD is 8% in the population aged above 50 years and approximately 3000 people die annually due to COPD.2 Patients with severe COPD account for only 6% of the COPD population, yet account for 30% of the economic burden in Sweden.3,4 A large proportion of COPD patients develop exacerbations, periods of enhanced symptoms often caused by infections, which lead to increased hospitalization costs.5 In Sweden, patients with mild to moderate COPD are mostly diagnosed, treated and followed up in primary care, while those with severe disease and severe exacerbations are mainly managed in secondary care.6,7 Treatment aims to improve symptoms and quality of life, prevent exacerbations, improve physical condition and maintain lung capacity, and is tailored to disease and exacerbation severity.8 Risk of exacerbation has been associated with a history of previous exacerbations9–12 and with comorbidities, specifically ischemic heart disease, heart failure, other respiratory diseases, gastroesophageal reflux and depression/anxiety.11,13–17 To estimate future risk of exacerbations, prediction models that integrate multiple risk factors can be useful tools in clinical practice to support early intervention and healthcare resource planning, reduce the burden of inpatient care and improve quality of life for patients. To date, these prediction models have been limited in value and validity.
Previous models have been heterogeneous in terms of number and type of predictors, statistical methods and assessment of model performance9,18 and only a few studies conducted an external19,20 or internal21 validation.22,23 Another recent large Canadian study observed that severe COPD exacerbations can be predicted within two months, by using administrative health data.24 Notably, the majority of previous investigations had a relatively long follow-up (most commonly one year). To better understand imminent risk of exacerbations, the present study aimed to develop a prediction model to assess risk of hospitalization within ten days due to COPD exacerbations. Machine learning algorithms were trained and tested on the real-world data collected from Swedish primary and secondary healthcare settings.

Patients and Methods

Study Design

We performed a retrospective, observational cohort study including incident COPD patients (aged ≥40 years) in Sweden between 1st January 2000 and 31st December 2013, using data from the ARCTIC study.25 The study was designed to develop a prediction model for identification of factors that cause hospitalization due to COPD exacerbations using machine learning algorithms. Data from electronic medical records (EMRs) were collected for patients from 52 primary care centers across Sweden (Figure 1), using an established software system (Pygargus Customized eXtraction Program [CXP] 3.0).26 The study selection covered a representative sample of the COPD population and healthcare centers by a mix of rural and urban areas with both large and small cities. Patients with COPD are mainly treated and managed in primary care in Sweden.
Figure 1

Overview of the included primary care centers (sites) from five regions across Sweden: Gävleborg (2 sites), Stockholm (3 sites), Uppsala (29 sites), Västmanland (2 sites) and Västra Götaland (16 sites). In total, 52 primary care centers were included.

EMR data were linked by the Swedish National Board of Health and Welfare (SNBHW) to national registers with mandatory reporting. Linkage of individual-level information was enabled across the national registers through unique personal identification numbers (pseudonymized by SNBHW),26 available for each Swedish resident from birth or immigration. The national registers included:

Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA): socio-demographic data (educational level, marital status, occupational status, retirement, economic compensation, social benefits).27

National Patient Register (NPR): diagnoses according to International Classification of Diseases (ICD) codes from inpatient and outpatient specialist care. The NPR started in 1964 and reached complete coverage across all counties in Sweden in 1987, with comprehensive outpatient specialist care information available from 2001.28

National Prescribed Drug Register (NPDR): since 2005, tracks full details of all dispensed medications (ATC codes).28 In this study, respiratory medications and medications against comorbidities are grouped ( and ) and medications not in the groups are referred to as “other medications”.

National Cause of Death Register (NCDR): information on date and cause of death (primary and underlying).28

Study Cohort and Prediction Period

Patients aged ≥40 years who received a first COPD diagnosis (ICD-10: J44), regardless of an asthma diagnosis (ICD-10: J45/J46), during the study period (1st January 2000 to 16th December 2013) were included in the study (Figure 2). The index date was defined as the date of first COPD diagnosis during the study period. Patients with <365 days of information in the registers before the index date were excluded, as were those with incomplete socioeconomic information.
Figure 2

Flowchart of the patient selection procedure of the study. In the ARCTIC study, there was a total of 18,132 COPD patients. Of these, 7823 patients had their first COPD diagnosis between 2005 and 2013 AND were over 40 years old at index AND were enrolled in the study for more than 365 days AND had available socioeconomic information. These patients fulfilled all inclusion criteria and were eligible for the study.

The start of the prediction period was set to June 2006 or later to ensure a minimum of 365 days from the first available prescription data from the NPDR (Figure 3). The prediction period extended until the earliest of the following occurrences: (1) death of a patient; (2) last record in the EMR or NPR; (3) end of the study (31-Dec-2013).
Figure 3

Overall prediction period was defined from June 2006 to October 2013, assuring the patients can be observed across data sources used for the study. Patient specific prediction periods were defined based on events captured in the data sources.


Outcome: Exacerbations

The outcome of our study was severe COPD exacerbations that needed hospitalization within a prediction window. Severe exacerbations were defined as a record of a COPD-related hospitalization (ICD-10: J44 as a primary diagnosis or ICD-10: J44.0/J44.1 as a secondary diagnosis) in a secondary care setting. If another severe exacerbation occurred within 10 days of the last visit, it was handled as only one severe exacerbation.
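The merging rule above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used SAS and R). The function name `merge_admissions` is hypothetical, and the sketch assumes that "within 10 days of the last visit" means within 10 days of the previous admission in the same episode, so that closely spaced admissions chain together.

```python
from datetime import date, timedelta

def merge_admissions(admission_dates, gap_days=10):
    """Group admissions into episodes; an admission within `gap_days`
    of the previous admission is counted as the same severe exacerbation.
    (Assumption: the gap is measured from the previous admission date.)"""
    episodes = []
    for day in sorted(admission_dates):
        if episodes and day - episodes[-1][-1] <= timedelta(days=gap_days):
            episodes[-1].append(day)   # same severe exacerbation
        else:
            episodes.append([day])     # new severe exacerbation
    return episodes

# Hypothetical admission dates for one patient.
admissions = [date(2010, 1, 3), date(2010, 1, 9),
              date(2010, 1, 18), date(2010, 3, 1)]
episodes = merge_admissions(admissions)
print(len(episodes))  # number of distinct severe exacerbations
```

Under this reading, the three January admissions form one event and the March admission a second one. A different reading (for example, measuring the gap from discharge) would need discharge dates as well.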

Prediction Variables and Prediction Windows

All available variables (except those in free-text structure) from the ARCTIC dataset were tested as potential predictors in this study, to find out which variables could be used to predict an exacerbation. Information from free-text case notes was only used to create variables for BMI and smoking status. The Charlson Comorbidity Index (CCI) was used as an overall measure of comorbidities.29 A complete list of predictor variables (ie, number of contacts with primary and secondary care, number of exacerbations, comorbidities, and medication) is provided in . Previous exacerbations used as predictor variables included both severe exacerbations (defined above) and moderate exacerbations, the latter defined as a record of a dispensed prescription of systemic corticosteroids (ATC-code: H02AB) and/or respiratory antibiotics (ATC-code: J01AA, J01CA) without a record of a COPD-related hospitalization. The prediction period was divided into non-overlapping 10-day prediction windows for each patient (Figure 3). Before the start of each 10-day prediction window, lookback periods of 10, 30, 60, 90, 180 or 365 days, or the entire patient history, were set up depending on the variable. For details refer to .
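The window and lookback construction described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the helper names, the half-open window convention, the dropping of a trailing partial window and the two example features are all hypothetical choices.

```python
from datetime import date, timedelta

def prediction_windows(start, end, window_days=10):
    """Yield non-overlapping half-open windows [w_start, w_end).
    A trailing partial window is dropped (assumption)."""
    step = timedelta(days=window_days)
    current = start
    while current + step <= end:
        yield current, current + step
        current += step

def count_in_lookback(event_dates, window_start, lookback_days=None):
    """Count events before the window start, limited to `lookback_days`;
    None means the entire available history."""
    if lookback_days is None:
        return sum(1 for d in event_dates if d < window_start)
    earliest = window_start - timedelta(days=lookback_days)
    return sum(1 for d in event_dates if earliest <= d < window_start)

def build_rows(event_dates, start, end):
    """One row per window: lookback features plus the outcome label."""
    rows = []
    for w_start, w_end in prediction_windows(start, end):
        rows.append({
            "window_start": w_start,
            "severe_last_180d": count_in_lookback(event_dates, w_start, 180),
            "severe_whole_history": count_in_lookback(event_dates, w_start),
            "label": int(any(w_start <= d < w_end for d in event_dates)),
        })
    return rows

# Hypothetical severe exacerbation dates for one patient.
events = [date(2010, 1, 5), date(2010, 1, 25)]
rows = build_rows(events, date(2010, 1, 1), date(2010, 1, 31))
for row in rows:
    print(row)
```

Features only use events dated before the window start, so information from inside the window being predicted cannot leak into its own predictors.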

Statistical Analysis

Several machine-learning methods were applied and assessed to predict exacerbations in adult COPD patients. These methods included logistic regression with multiple regularization methods (lasso, ridge and elastic net), random forest and gradient boosted trees (XGBoost). The list covers both simple, classical approaches commonly found in the statistical literature (logistic regression) and state-of-the-art tree-based models that automatically detect non-linear interactions among patient characteristics and are therefore highly relevant for complex medical challenges such as predicting short-term exacerbations. This set of models (along with support vector machines and neural networks, which were not considered because they are more challenging to interpret) is regarded as the gold standard for machine-learning classification on tabular data. To reduce the problem of an imbalanced outcome distribution, different class imbalance corrections were tested, and random under-sampling showed the best performance in this study. Resampling was applied during cross-validation so that only the training folds of each cross-validation iteration were affected, and the effect of resampling was tested on the non-resampled test fold in each iteration. Different methods and steps for predictor selection were implemented to determine which factors to include, such as prefiltering zero-variance and highly correlated predictors (r > 0.8) and selection based on an intermediate, cross-validated random forest model of the full predictor set. Model performance was assessed using the Area under the Precision-Recall Curve (AUPRC), since this metric best describes model performance for the positive class, in this case prediction of exacerbations.
Other metrics like AUROC take into account the performance on the negative class, which can be misleading, especially when the negative class represents the majority of the cases (imbalance problem). To compare performance with previous similar studies, the Area Under Receiver Operating Curve (AUROC) was also reported. The model with the highest mean cross validation AUPRC was selected as the final model and was in a final step trained on the entire training dataset. Models were trained on 75% of the patients (the training set) and tested on the remaining 25% (the test set) as an internal validation of the prognostic model. To account for the problem of repeated measures with correlated patient observations, the data were split into training and test set using a group-based split, so all data for a single patient were either in the training or the test set. To improve robustness and avoid overfitting (lack of generalization) when making modelling choices for each algorithm, we used 4-fold cross-validation on the training data. Imputation of missing values for predictors was done using median for logistic regression models and Random Forest while XGBoost can handle missing values automatically without imputation. Analyses were performed using SAS version 9.4 (SAS Institute) and R version 3.4.4.
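The group-based split and the training-only under-sampling described above can be sketched in plain Python. This is a simplified illustration, not the authors' R code: the function names are hypothetical, the 1:1 class ratio after under-sampling is an assumption (the paper does not state the ratio), and the toy data are invented.

```python
import random

def group_split(patient_ids, test_fraction=0.25, seed=0):
    """Assign whole patients to either the training or the test set,
    so no patient contributes rows to both."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_fraction)
    test_ids = set(unique_ids[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_ids]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_ids]
    return train_idx, test_idx

def undersample(indices, labels, seed=0):
    """Randomly drop negative rows until classes are balanced.
    (Assumption: a 1:1 ratio; apply to training rows only.)"""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    kept_neg = rng.sample(neg, min(len(neg), len(pos)))
    return sorted(pos + kept_neg)

# Toy data: 8 patients with 5 prediction windows each,
# one positive window per patient.
patient_ids = [p for p in range(8) for _ in range(5)]
labels = [1 if i % 5 == 0 else 0 for i in range(40)]

train_idx, test_idx = group_split(patient_ids)
balanced_train_idx = undersample(train_idx, labels)
print(len(train_idx), len(balanced_train_idx), len(test_idx))
```

The same `undersample` step would be applied inside each cross-validation iteration to the training folds only, leaving the held-out fold and the final test set at their natural class distribution so that the reported metrics reflect realistic prevalence.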

Results

Patient Demographics and Exacerbations

In total, 18,132 patients had a reported COPD diagnosis in the EMRs and of these, 7823 patients were eligible for this study (Figure 2). In Table 1, patient demographics are subdivided according to exacerbation history which included the observation period (look-back and prediction period). The mean age at index date was similar for patients without (66.5 [SD 10.1] years) and with a history of exacerbations (66.9 [SD 10.5] years), as were the proportion of women (56% and 58%, respectively). A higher proportion of patients with a history of exacerbations than those without exacerbations had their first COPD diagnosis in secondary outpatient care (35% vs 17%) or primary care setting (34% vs 26%).
Table 1

Sociodemographic Characteristics

| Characteristic | No Exacerbation^a, N=5654 (72%) | Exacerbation^a, N=2169 (28%) |
|---|---|---|
| Baseline characteristics | | |
| Age at index, mean (years) | 66.5 | 66.9 |
| Female, n (%) | 3166 (56) | 1258 (58) |
| First COPD diagnosis: Inpatient, n (%) | 3223 (57) | 672 (31) |
| First COPD diagnosis: Outpatient, n (%) | 961 (17) | 759 (35) |
| First COPD diagnosis: Primary care, n (%) | 1470 (26) | 737 (34) |
| Socioeconomic characteristics^b | | |
| Work and finance | | |
| Number of days per year with sick leave benefits (mean) | 7.6 | 12.1 |
| Number of hours with sick leave benefits per year*100 (mean) | 0.08 | 0.10 |
| Income from social transferals, n (%) | 1300 (23) | 542 (25) |
| Income (mean) | 445.1 | 299.8 |
| Income from social security, n (%) | 283 (5) | 130 (6) |
| Employment – Working, n (%) | 961 (17) | 260 (12) |
| Employment – Not working, n (%) | 4354 (77) | 1670 (77) |
| Employment – No information, n (%) | 339 (6) | 239 (11) |
| Health | | |
| CCI (mean) | 1.2 | 2.0 |
| Any sick leave, n (%) | 170 (3) | 108 (5) |
| Smoking – No information, n (%) | 4071 (72) | 1366 (63) |
| Smoking – Non-smoker, n (%) | 396 (7) | 108 (5) |
| Smoking – Ex-smoker, n (%) | 622 (11) | 282 (13) |
| Smoking – Current smoker, n (%) | 565 (10) | 412 (19) |
| Education | | |
| Education – No information, n (%) | 339 (6) | 239 (11) |
| Education – Primary school <9 years, n (%) | 1583 (28) | 586 (27) |
| Education – Primary school 9 years, n (%) | 622 (11) | 239 (11) |
| Education – High school 2 years, n (%) | 1696 (30) | 651 (30) |
| Education – High school 12 years, n (%) | 509 (9) | 174 (8) |
| Education – Post high school <3 years, n (%) | 396 (7) | 108 (5) |
| Education – Post high school ≥3 years, n (%) | 396 (7) | 108 (5) |
| Education – Research education, n (%) | 57 (1) | 22 (1) |

Notes: ^a Exacerbations occurring during the observation period (look-back period and prediction period). ^b Socioeconomic information retrieved every year during the prediction period.

Abbreviation: CCI, Charlson Comorbidity Index.

Patients with exacerbations had more days per year receiving sick leave benefits, had a lower mean income, were less likely to be working, and had a higher CCI than patients without exacerbations. However, the educational level did not differ substantially between the patient groups (30% with high school 2 years in both groups). Patients with exacerbations seemed to have a higher prevalence of current smoking (19% vs 10%), although information on smoking status was missing for 72% of the patients with no exacerbations and for 63% of the patients with exacerbations.

Model Selection and Performance

Out of the tested models, gradient boosted trees (XGBoost,30 Apache License) with undersampling was selected as the final model because it had the highest mean cross-validation (CV) score for AUPRC, 0.11 (Table 2). AUPRC was 0.17 (95% CI: ± 0.001) on the whole training set and 0.08 (95% CI: ± 0.001) on the test set, and AUROC was 0.88 (95% CI: ± 0.001) on the whole training set and 0.86 (95% CI: ± 0.001) on the test set. On the test set, the recall was 0.16 (95% CI: ± 0.001) and the precision was 0.11 (95% CI: ± 0.001).
Table 2

Model Performance and Best Models by Different Setting

| Model | Setting | AUPRC (95% CI) | AUROC (95% CI) | Recall (95% CI) | Precision (95% CI) |
|---|---|---|---|---|---|
| XGBoost, undersampling | Training set | 0.17 (± 0.001) | 0.88 (± 0.001) | | |
| | Test set | 0.08 (± 0.001) | 0.86 (± 0.001) | 0.16 (± 0.001) | 0.11 (± 0.001) |
| | CVS | 0.11 | | | |

Abbreviations: AUPRC, area under the precision-recall curve; AUROC, area under receiver operating curve; CI, confidence interval; CVS, cross validation score; XGBoost, extreme gradient boosting.
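To make the two headline metrics in Table 2 concrete, the sketch below computes both on a toy imbalanced example in plain Python. It is an illustration only: AUROC is computed as the fraction of positive-negative pairs ranked correctly, and AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used in the study. The data are invented.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random
    negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at each positive, ranked by score.
    Assumes at least one positive label."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data: 20 windows, 2 positives (10% prevalence), ranked 2nd and 5th.
labels = [0, 1, 0, 0, 1] + [0] * 15
scores = [(20 - i) / 20 for i in range(20)]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(round(roc, 3), round(ap, 3), prevalence)
```

A classifier that ranks at random has an expected AUROC of 0.5 regardless of class balance, whereas its expected AUPRC is roughly the prevalence of the positive class. For rare outcomes such as hospitalization within a 10-day window, a numerically small AUPRC can therefore still sit well above the chance level.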

Prediction Factors

The 20 most important predictors are presented in Table 3. Several prediction factors were related to previous exacerbations, eg, the number of severe exacerbations over different lookback periods (1–180 days, whole history, 1–60 days, and 1–180 days in the year before the prediction date) and the number of moderate exacerbations in the last 1–30 days. The association between exacerbations and hospitalization is described in Figure 4. A first COPD diagnosis in inpatient care was a stronger predictor of hospitalization than a first COPD diagnosis in outpatient care. The number of COPD-related healthcare contacts over the entire follow-up time was more important than the number of non-COPD-related healthcare visits. Features related to COPD medications, the number of other prescriptions in the last 1–365 days and the number of prescriptions for antibiotics during the patient’s whole observation period (look-back and prediction period) were important predictive features. CCI from the year before the prediction was also important.
Table 3

Top 20 Most Important Features for Predicting Hospitalization Due to COPD Using Machine Learning Models

| Rank | Feature | Importance^a |
|---|---|---|
| 1 | Number of severe exacerbations (last 180 days) | 0.33 |
| 2 | Number of severe exacerbations (whole history), standardized by the number of days | 0.11 |
| 3 | Number of COPD-related contacts (whole history), standardized by the number of days | 0.066 |
| 4 | Whether first COPD diagnosis was classified as “inpatient” | 0.054 |
| 5 | Charlson Comorbidity Index (CCI) from the year before the prediction | 0.047 |
| 6 | Number of medications from “other” group^b (last 365 days) | 0.019 |
| 7 | Whether first COPD diagnosis was classified as “outpatient” | 0.016 |
| 8 | Number of moderate exacerbations (last 30 days) | 0.012 |
| 9 | Number of prescriptions for antibiotics (whole history), standardized by the number of days | 0.010 |
| 10 | Number of severe exacerbations (last 180 days, 1 year before prediction date) | 0.009 |
| 11 | Number of COPD-related contacts (last 180 days) | 0.008 |
| 12 | Number of visits (person not defined) in inpatient care (last 180 days) | 0.007 |
| 13 | Number of diagnoses of ischemic heart diseases in inpatient care (whole history), standardized by the number of days | 0.007 |
| 14 | Number of severe exacerbations (last 60 days) | 0.006 |
| 15 | Number of prescriptions for COPD medication (last 365 days) | 0.006 |
| 16 | Number of non-COPD-related contacts (whole history), standardized by the number of days | 0.006 |
| 17 | Number of diagnoses of respiratory disease in inpatient care, all time (whole history), standardized by the number of days | 0.005 |
| 18 | Number of diagnoses from “other” group in outpatient care (whole history), standardized by the number of days | 0.005 |
| 19 | Number of diagnoses from “other” group in inpatient care (last 30 days, 1 year before prediction date) | 0.005 |
| 20 | Number of prescriptions for oral steroids (last 30 days) | 0.004 |

Notes: ^a The value implies the relative contribution of the corresponding feature to the model, calculated by taking each feature’s contribution for each tree in the model. ^b Medications against comorbidities () and respiratory medications () were divided into main groups and sub-groups. Medications not in the groups are referred to as “other medications”.

Figure 4

Relationship between the history of severe exacerbations and probability of hospitalization for severe exacerbations, within 1–10 days. The number of previous severe exacerbations, especially within 180 days before prediction point, drastically increases the probability of having a severe exacerbation within the next 10 days. *Severe exacerbations were defined as exacerbation where a hospital stay was required.


Discussion

In this observational cohort study, we developed a machine-learning algorithm to understand predictors for severe COPD exacerbations. We discuss model performance and the top-20 factors used by the model, which were the most predictive for hospitalizations due to COPD exacerbation. These factors can be divided into five main groups: history of exacerbations, comorbidities, medications, setting of first COPD diagnosis and healthcare contacts.

Model Performance

The test set AUROC of 0.86 was relatively high,18–24 but our models may be of limited value for prediction of severe exacerbations because of the high false positive rate. Rather than focusing only on accuracy and AUROC, the sensitivity, positive predictive value and AUPRC should also be considered. For instance, if our predictive model were used to assess risk of hospitalization in the next 10 days, it would identify 16% of the patients who are hospitalized (sensitivity), while 89% of the patients to whom the model assigned a risk of hospitalization would have no hospitalization event in the next 10 days (positive predictive value of 11%). Furthermore, the heterogeneity of previously published models in terms of predictors, statistical methods and assessment of model performance hampers comparisons with our study.9,18–24 Regarding time horizons, we could identify only one small study in outpatient care31 with a follow-up as short as that used in our models. Finally, the large number of predictors might constitute a barrier to using the present models in clinical practice unless the predictors are recorded in EMRs or are substantially simplified.
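The trade-off between sensitivity and positive predictive value can be worked through with the reported test-set figures (recall 0.16, precision 0.11). The sketch below is simple arithmetic on those two numbers. The function name and the choice of 100 true hospitalization events as a reference are illustrative assumptions, and the counts refer to 10-day prediction windows, not to distinct patients.

```python
def flagged_breakdown(n_events, recall, precision):
    """Derive detected, missed and false-alarm counts from recall
    (sensitivity) and precision (positive predictive value)."""
    detected = n_events * recall      # true positives
    flagged = detected / precision    # all windows flagged as high risk
    return {
        "detected": detected,
        "missed": n_events - detected,
        "flagged": flagged,
        "false_alarms": flagged - detected,
    }

# Reported test-set values; 100 events is an arbitrary reference size.
result = flagged_breakdown(n_events=100, recall=0.16, precision=0.11)
for key, value in result.items():
    print(key, round(value, 1))
```

Per 100 true hospitalization events, about 16 would be flagged in advance and 84 missed, while roughly 145 alerts would be raised in total, of which about 129 would be false alarms.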

History of Exacerbations

A history of exacerbations was the most important predictor of future COPD exacerbations. The risk of exacerbation increased if severe exacerbations occurred within the last 180 days and increased with the number of severe exacerbations in the patient’s entire health history. Previous moderate exacerbations were also strong predictors of a future hospitalization if they occurred within 1–30 days before the prediction point. Previous studies have also identified prior exacerbations as highly associated with the risk of future exacerbations.9–12,24,32

Comorbidities

The CCI (a summary measure of the number and severity of comorbidities) was a good predictor of hospitalization due to COPD exacerbation. This is consistent with a previous study showing that a higher number of comorbidities is a stronger predictor of future exacerbations and possible hospitalizations.14 Another observational study of 213 COPD patients showed that 54% of the patients suffered from at least four comorbidities.13 The comorbidities most commonly associated with future exacerbations among COPD patients in previous research were ischemic heart disease, heart failure, other respiratory diseases, gastroesophageal reflux disease, CVD and depression/anxiety.11,14–17,24 This is in line with our findings, where ischemic heart disease and respiratory disease other than COPD were strong predictors of COPD hospitalization due to acute exacerbations.

Medications

The number of prescriptions for antibiotics (mostly used for respiratory infections) and the number of other prescriptions were among the most important predictors, which has also been observed in other studies.24 Medications (except antibiotics) might indirectly indicate the number of comorbidities.

Setting of First COPD Diagnosis

Having a first diagnosis of COPD within secondary care (inpatient or outpatient) was a strong predictor of hospitalization due to COPD exacerbation. This suggests that patients diagnosed with COPD in secondary care may have more severe or advanced disease at the time of diagnosis and therefore have more frequent and severe exacerbations. An early diagnosis of COPD could have been overlooked in primary care, or the patient may not have visited primary care at all.33 It is reasonable to believe that the patients in our study with a first diagnosis in secondary care represent two groups: (1) patients who visited the hospital with a severe exacerbation and were diagnosed with COPD during this hospitalization; (2) patients who visited the hospital with a referral from primary care and saw a pulmonary specialist who then diagnosed the COPD. It can be assumed that COPD first discovered during an inpatient hospitalization is more severe than COPD discovered during an outpatient visit, unless the outpatient visit is an emergency where no referral is needed.

Contacts to the Healthcare System

The third most important predictor of hospitalization due to COPD was the number of COPD-related contacts with the healthcare system over the patient’s entire medical history. It can be assumed that patients with more severe disease and frequent exacerbations would have more contact with the healthcare system. The contacts included severe exacerbation hospitalizations, but these were few compared to the total number of contacts. Studies have shown that medical comorbidities are common among patients with COPD7,11,13,17 and our results show that the number of non-COPD-related contacts was also an important predictor in the present model. Furthermore, when it becomes possible to remotely monitor high-risk ambulatory COPD patients, such input features may further improve the performance of the model and may also help prevent mild/moderate exacerbations from progressing to severe exacerbations requiring hospitalization.

Strengths

The strengths of our study include the large sample size and long longitudinal follow-up, which made it likely that key predictors of severe COPD exacerbations were identified.24 We had complete and comprehensive longitudinal data on patients, extracted from the EMRs of 52 primary care centers and linked with Swedish national health registers with mandatory reporting for all healthcare providers. The study used all COPD-related variables such as patient information, comorbidities, medication, laboratory tests and measurements, contacts with the healthcare system and seasonal variables. More than 4000 variables were created to identify which factors in a COPD patient’s journey through the healthcare system could be used to predict severe exacerbations requiring hospitalization. The registers used for identification of clinical/epidemiological data are national with a high coverage of most of the conditions included in our study.34 In addition, the completeness of the variables was high except for smoking and BMI. Furthermore, the extraction program used for retrieval of information from EMRs has been validated in a specific study, which concluded that it is highly reliable and that appropriate and accurate information is extracted from the EMRs.26 Finally, our dataset was large, which enabled us to perform an internal validation using a large test set not used for model training. Similarly to a Canadian study,24 we found gradient boosting to be the best performing prediction model. We picked this final model using a robust cross-validation approach, which allowed us to explore the additional value of more recently developed ML methods compared with more traditional logistic regression approaches. These ML models can capture complex, non-linear relationships and interactions between predictors.

Limitations

Limitations of the study include potential misreporting of information in the data sources used (EMRs and national health registries). However, for the variables that were collected for the present study, experience from previous research shows that compliance with reporting to EMRs in primary and specialist health care is good and that reporting compliance to the national registries used in this study is very high.35 In addition, most variables are coded according to international classification systems (ICD-10 and ATC codes), limiting the risk of bias due to reporting ambiguities. However, as this study is based on patients with physician-diagnosed COPD, lung function results are missing for a substantial number of patients. Moreover, for the model to be functional in a real-world setting, we included patients with asthma, since it is common for COPD patients to be diagnosed with asthma as well. Finally, it could be interesting for future studies to investigate a COPD population without concomitant asthma as a sensitivity analysis.

Conclusion

Our work suggests that clinically available information on patient history collected via automated retrieval from EMRs and national registries or directly during patient consultation can form the basis for future clinical tools to predict risk of severe COPD exacerbations.