Predicting frequent COPD exacerbations using primary care data.

Marjan Kerkhof¹, Daryl Freeman², Rupert Jones³, Alison Chisholm⁴, David B Price⁵.

Abstract

PURPOSE: Acute COPD exacerbations account for much of the rising disability and costs associated with COPD, but data on predictive risk factors are limited. The goal of the current study was to develop a robust, clinically based model to predict frequent exacerbation risk.
PATIENTS AND METHODS: Patients identified from the Optimum Patient Care Research Database (OPCRD) with a diagnostic code for COPD and a forced expiratory volume in 1 second/forced vital capacity ratio <0.7 were included in this historical follow-up study if they were ≥40 years old and had data encompassing the year before (predictor year) and year after (outcome year) study index date. The data set contained potential risk factors including demographic, clinical, and comorbid variables. Following univariable analysis, predictors of two or more exacerbations were fed into a stepwise multivariable logistic regression. Sensitivity analyses were conducted for subpopulations of patients without any asthma diagnosis ever and those with questionnaire data on symptoms and smoking pack-years. The full predictive model was validated against 1 year of prospective OPCRD data.
RESULTS: The full data set contained 16,565 patients (53% male, median age 70 years), including 9,393 patients without any recorded asthma and 3,713 patients with questionnaire data. The full model retained eleven variables that significantly predicted two or more exacerbations, of which the number of exacerbations in the preceding year had the strongest association; others included height, age, forced expiratory volume in 1 second, and several comorbid conditions. Significant predictors not previously identified included eosinophilia and COPD Assessment Test score. The predictive ability of the full model (C statistic 0.751) changed little when applied to the validation data set (n=2,713; C statistic 0.735). Results of the sensitivity analyses supported the main findings.
CONCLUSION: Patients at risk of exacerbation can be identified from routinely available, computerized primary care data. Further study is needed to validate the model in other patient populations.

Keywords: FEV1; model; prediction; risk factor; validation

Year: 2015; PMID: 26609229; PMCID: PMC4644169; DOI: 10.2147/COPD.S94259

Source DB: PubMed; Journal: Int J Chron Obstruct Pulmon Dis; ISSN: 1176-9106

Introduction

COPD is a serious, debilitating condition that has become a major public health concern and by 2020 is projected to rank fifth in global burden of disease and third in global mortality.1 For a proportion of patients, COPD is a progressive disease characterized by periodic acute exacerbations of symptoms. These exacerbations pose an immediate threat to patients and also hasten progression of the disease. Exacerbations accelerate decline of lung function, so that patients often fail to return to baseline levels. Although symptoms may last a few days, recovery of lung function can take weeks to months, resulting in prolonged periods of functional limitation and general worsening of quality of life, often with some degree of permanent functional decline. Exacerbations are also associated with substantial risk of hospitalization and death, as well as considerable economic costs that increase with exacerbation frequency.1,2 Preventing exacerbations is therefore a key goal of COPD management,1,3–5 which creates a need to predict which patients are likely to experience them.
The recent Global Initiative for Chronic Obstructive Lung Disease (GOLD) categories A–D were developed to aid in assessing future risk of exacerbations and performed well in one study;6,7 however, patient assignment to categories may vary depending on the choice of symptom measure, limiting their applicability.8 Indices of COPD severity such as the body mass index [BMI], obstruction, dyspnea, and exercise (BODE) index and the age, dyspnea, and obstruction (ADO) index have been used to try to predict future exacerbations among patients with COPD but with only moderate (60%–70%) prediction success; moreover, they require specific data (eg, 6-minute walk test) that may not be routinely available.9,10 The dyspnea, airflow obstruction, smoking status, and exacerbation frequency (DOSE) index has been shown to predict future exacerbations in a large primary care data set, and the index was stronger than previous exacerbation frequency or the ADO or BODE index.9,11,12 Other researchers have developed de novo statistical models to identify independent clinical predictors. However, many of these studies included relatively small sample sizes,10,13–16 patients with severe COPD,14,17,18 and/or severe outcomes such as hospitalization or death.13,15,17,19–22 The goal of the current study was to develop a robust, clinically based predictive model that would encompass all levels of COPD severity as well as moderate or severe exacerbation severity. Such a model could help in earlier targeting of patients for review to optimize drug therapy and other interventions, with the aim of reducing hospital admissions, decline in lung function, and the morbidity and mortality associated with COPD. A secondary objective was to compare the model’s predictive value in relation to existing predictive tools.

Patients and methods

This was a historical follow-up study of patients with COPD identified from the Optimum Patient Care Research Database (OPCRD).23 The OPCRD is a quality-controlled, longitudinal, respiratory-focused database containing anonymous data from general practices in the UK and has been approved by the Trent Multicentre Research Ethics Committee for clinical research use (approval reference 10/H0405/3), and this study was approved by the Anonymised Data Ethics Protocols and Transparency committee, the independent scientific advisory committee for the OPCRD. Informed consent was not required or possible as we worked with anonymous data, and this was not an interventional study. However, patients could opt out of having their data used in research. At the time of the study, the OPCRD contained records of >50,000 patients with COPD from >300 UK general practices. The database combines routine data from electronic patient records with linked patient-reported data collected using disease-specific questionnaires. Routine clinical data, including patient demographic characteristics, comorbidities, exacerbation history, modified Medical Research Council (mMRC) score,24 and current therapy, were extracted from primary care practice management systems. In addition, a proportion of patients with relevant disease codes were invited to complete validated disease assessment questionnaires, sent via a secure mailing house. The questionnaires enabled calculation of the mMRC scores and the COPD Assessment Test (CAT) scores.25 The current study was divided into model-building and model-validation components. 
Patients were eligible for inclusion in the model-building phase if, on or before March 12, 2013, they had at least one recorded eosinophil count, were at least 40 years of age, had been diagnostically coded for COPD, and had a forced expiratory volume in 1 second/forced vital capacity (FEV1/FVC) ratio <0.7 recorded within 5 years of the date of their last eosinophil count (the index date). All eligible patients needed to have at least 1 year of observation before (baseline year) and 1 year after (outcome year) the index date. Included patients also needed to have complete data on the candidate predictors analyzed. Those with chronic respiratory diseases other than asthma, such as bronchiectasis, were excluded. The validation cohort consisted of patients with similar eligibility criteria identified between March 2013 and February 2014.
Potentially important variables within the OPCRD were identified from a search of the literature and from expert opinion of the authors:
Sociodemographic factors: sex, age, height, weight, BMI, smoking status
Symptom severity: mMRC dyspnea score, CAT score, number of exacerbations in the previous year
Comorbidities: asthma, eczema, allergic or nonallergic rhinitis, nasal polyps, diabetes mellitus, gastroesophageal reflux disease (GERD), ischemic heart disease, heart failure, anxiety/depression, Charlson comorbidity index
Spirometry: FEV1 (% predicted), FEV1/FVC ratio
Peripheral blood eosinophilia (defined as ≥500 cells/μL)
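The eligibility rules above can be read as a simple record filter. The following Python sketch is illustrative only: the field names are hypothetical (they are not OPCRD column names), "within 5 years" is assumed to mean 5 years on either side of the index date, a year is approximated as 365 days, and the requirement for complete predictor data is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(2013, 3, 12)  # model-building cut-off date stated in the text


@dataclass
class PatientRecord:
    # Hypothetical fields chosen for illustration; not the OPCRD schema.
    age_at_index: int
    has_copd_code: bool
    index_date: Optional[date]        # date of last eosinophil count
    fev1_fvc_ratio: Optional[float]   # expressed as a fraction, e.g. 0.55
    spirometry_date: Optional[date]
    observation_start: date
    observation_end: date
    other_chronic_respiratory_disease: bool  # e.g. bronchiectasis (asthma excluded)


def is_eligible(p: PatientRecord) -> bool:
    """Return True if the record passes the model-building criteria (sketch)."""
    if p.index_date is None or p.index_date > CUTOFF:
        return False
    if not p.has_copd_code or p.age_at_index < 40:
        return False
    if p.other_chronic_respiratory_disease:
        return False
    if p.fev1_fvc_ratio is None or p.spirometry_date is None:
        return False
    if p.fev1_fvc_ratio >= 0.7:
        return False
    # Assumption: spirometry may precede or follow the index date by up to 5 years.
    if abs((p.spirometry_date - p.index_date).days) > 5 * 365:
        return False
    # At least 1 year of observation before and after the index date.
    baseline_ok = (p.index_date - p.observation_start).days >= 365
    outcome_ok = (p.observation_end - p.index_date).days >= 365
    return baseline_ok and outcome_ok
```

Applying such a filter record by record would yield the candidate cohort; the actual extraction logic used by the authors is not described in this text.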

Model building

To enhance diagnostic specificity and to be consistent with earlier research, as well as with the GOLD cut-point for risk of future events,7 we defined the outcome of interest as frequent (≥2) exacerbations. Exacerbations were defined as either 1) unscheduled hospital admission or accident/emergency attendance for COPD or lower respiratory events, 2) an acute course of oral corticosteroids prescribed with evidence of respiratory review, or 3) antibiotics prescribed with evidence of respiratory review. Where one or more oral corticosteroid course, hospitalization, or antibiotic prescription occurred within a 2-week window, these events were considered to be the result of the same exacerbation.
All analyses were performed using the R statistical package (version 3.0.2). Prior to analysis, continuous variables were evaluated via likelihood ratio test to see if quadratic or cubic transformation improved model fit. Akaike information criteria (AIC) were compared to test whether model fit was improved by categorical or continuous classification. Univariable logistic regressions were performed to gauge the importance of individual variables and to help define the best form (eg, continuous, categorical) of each variable. However, all potentially important variables were fed into a multiple logistic regression with backward selection of the model having the lowest AIC. The questionnaire variables dealing with symptoms and pack-years were excluded from this analysis owing to small sample size.
By way of sensitivity analysis, the model-building process was repeated for two different subpopulations. Subpopulation 1 consisted of all patients without an overlapping diagnosis of asthma, defining asthma using the sensitive definition of any asthma-related Read code at any time in the data set. The second, smaller subpopulation consisted of those with questionnaire information on symptoms (CAT and mMRC score) and pack-years of smoking.
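The rule that events falling within a 2-week window count as one exacerbation can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes the window is anchored at the first event of each episode and that an event fewer than 14 days after that anchor is merged into it; the text does not specify either detail.

```python
from datetime import date


def count_exacerbations(event_dates, window_days=14):
    """Count distinct exacerbations from dated qualifying events.

    Assumption: events fewer than `window_days` after the first event of
    the current episode belong to that same episode.
    """
    count = 0
    episode_start = None
    for d in sorted(event_dates):
        if episode_start is None or (d - episode_start).days >= window_days:
            count += 1
            episode_start = d
    return count


def is_frequent_exacerbator(event_dates):
    """Outcome definition used in the study: two or more exacerbations."""
    return count_exacerbations(event_dates) >= 2
```

For example, an oral corticosteroid course on January 1 and an antibiotic prescription on January 5 would be counted once, whereas a further prescription on January 20 would start a second exacerbation under these assumptions.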

Model validation

Calibration plots were performed by comparing observed with predicted risk among 150 groups of ~110 patients each. Goodness of fit was judged using both the C statistic (area under the curve) and the Hosmer–Lemeshow test.26 The C statistic confidence intervals (CIs) were generated by bootstrapping. External validity was judged by the C statistic when the full multivariable model was applied to the validation cohort. In addition, we assessed the predictive accuracy of the model to predict two or more exacerbations compared with the DOSE index and GOLD categories A–D using the mMRC, together with exacerbations and FEV1, to assign categories.7
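To make the C statistic and its bootstrapped interval concrete, the sketch below computes the statistic as the proportion of concordant case/non-case pairs and derives a percentile bootstrap interval. It is written in Python for illustration (the study used R), and the percentile method, the number of resamples, and the function names are assumptions; the text does not state which bootstrap variant was used.

```python
import random


def c_statistic(y_true, y_score):
    """Probability that a random case scores higher than a random non-case.

    Ties between a case and a non-case count as half-concordant.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C statistic (assumed method)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both outcome classes")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based formula would be preferable for a cohort of many thousands of patients.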

Results

Approximately 51,000 patients with COPD were identified from the OPCRD; 16,565 met all inclusion criteria. The main reasons for exclusion from the study are depicted in Figure 1. The study index dates (ie, dates of last eosinophil count) ranged from 1993 to 2012 (median year, 2009; interquartile range, 2007–2010).

Figure 1

Patient selection in the database.

Abbreviations: BMI, body mass index; CAT, COPD Assessment Test; OPCRD, Optimum Patient Care Research Database; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity.

Of the 16,565 patients included in the full population, 9,393 did not have any recorded asthma Read code at any time (subpopulation 1) and 3,713 had questionnaire data for determining CAT score (subpopulation 2). The characteristics of these three populations (Table 1) and the frequencies of COPD exacerbations (Table 2) were similar, with only minor differences. Most patients had moderate COPD, and 92% of lung function measurements were taken within 2 years of the index date (80% within 1 year).

Table 1

Baseline characteristics of all patients and of the two subpopulations

Characteristics	Total population (N=16,565)	Subpopulation 1 (no asthma ever) (N=9,393)	Subpopulation 2 (with symptom data) (N=3,713)
Female sex, n (%)	7,736 (46.7)	4,187 (44.6)	1,623 (43.7)
Age (years), median (IQR)	70 (63–78)	71 (63–78)	71 (58–78)
Weight (kg), mean (SD)	74.4 (18.3)	74.0 (18.2)	76.3 (18.2)
Height (m), mean (SD)	1.67 (0.10)	1.67 (0.10)	1.68 (0.10)
BMI (kg/m²), mean (SD)	26.7 (5.8)	26.4 (5.7)	27.1 (5.6)
BMI, n (%)
Underweight (<18.5 kg/m²)	859 (5.2)	550 (5.9)	151 (4.1)
Normal (≥18.5 and <25 kg/m²)	6,016 (36.3)	3,503 (37.3)	1,293 (34.8)
Overweight (≥25 and <30 kg/m²)	5,607 (33.8)	3,151 (33.5)	1,278 (34.4)
Obese (≥30 kg/m²)	4,083 (24.6)	2,189 (23.3)	991 (26.7)
Smoking status, n (%)
Never smoker	1,964 (11.9)	761 (8.1)	455 (12.3)
Ex-smoker	8,875 (53.6)	4,952 (52.7)	2,220 (59.8)
Current smoker	5,726 (34.6)	3,680 (39.2)	1,038 (28.0)
Pack-years of smoking, median (IQR)ᵃ	n/a	n/a	27.5 (12.5–40.5)
CAT score, median (IQR)	n/a	n/a	17 (11–23)
CAT score ≥10, n (%)	n/a	n/a	3,012 (81.1)
mMRC score available, n (%)			3,558 (95.8)
0–1			1,962 (55.1)
≥2			1,596 (44.9)
DOSE index score available, n (%)			3,558 (95.8)
≤4			3,384 (95.1)
>4			174 (4.9)
FEV₁/FVC ratio (%), mean (SD)	54.4 (10.7)	54.6 (10.6)	53.9 (10.6)
FEV₁ % predicted, mean (SD)	57.4 (18.8)	57.7 (18.7)	57.8 (18.1)
GOLD airflow limitation (FEV₁ % predicted), n (%)
1: mild (≥80%)	1,847 (11.2)	1,053 (11.2)	370 (10.0)
2: moderate (50%–79%)	8,742 (52.8)	5,002 (53.3)	2,069 (55.7)
3: severe (30%–49%)	4,911 (29.6)	2,755 (29.3)	1,071 (28.8)
4: very severe (<30%)	1,065 (6.4)	583 (6.2)	203 (5.5)
Eosinophilia (≥500 cells/μL), n (%)
Eosinophilia – all patients	1,639 (9.9)	831 (8.8)	336 (9.0)
Eosinophilia – noncurrent smokers	1,128 (6.8)	522 (5.6)	253 (6.8)
Asthma, n (%)	7,172 (43.3)	0	1,587 (42.7)
Eczema, n (%)	3,443 (20.8)	1,748 (18.6)	835 (22.5)
Rhinitis, n (%)
Allergic	1,245 (7.5)	500 (5.3)	307 (8.3)
Nonallergic	965 (5.8)	507 (5.4)	242 (6.5)
Nasal polyps, n (%)	421 (2.5)	152 (1.6)	94 (2.5)
Diabetes mellitus, n (%)	3,741 (22.6)	2,029 (21.6)	861 (23.2)
GERD, n (%)	1,940 (11.7)	998 (10.6)	446 (12.0)
Ischemic heart disease, n (%)	3,159 (19.1)	1,844 (19.6)	595 (16.0)
Heart failure, n (%)	1,340 (8.1)	806 (8.6)	266 (7.2)
Anxiety or depression, n (%)	5,151 (31.1)	2,822 (30.0)	1,078 (29.0)
Charlson comorbidity index, n (%)
0	12,802 (77.3)	7,903 (84.1)	2,927 (78.8)
1–4	2,095 (12.6)	577 (6.1)	474 (12.8)
≥5	1,668 (10.1)	913 (9.7)	312 (8.4)

Notes: Noncurrent smokers included ex-smokers and never smokers. ᵃPack-years data were available for 3,107 patients.

Abbreviations: BMI, body mass index; CAT, COPD Assessment Test; DOSE, dyspnea, airway obstruction, smoking status, exacerbations; FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity; GERD, gastroesophageal reflux disease; GOLD, Global Initiative for Chronic Obstructive Lung Disease; IQR, interquartile range; mMRC, modified Medical Research Council; n/a, not applicable; SD, standard deviation.

Table 2

Number of exacerbations in baseline and outcome years

Number of exacerbations	Total population (N=16,565)	Subpopulation 1 (no asthma ever) (N=9,393)	Subpopulation 2 (with symptom data) (N=3,713)
Baseline year, n (%)
0	8,783 (53.0)	5,406 (57.6)	2,003 (53.9)
1	4,101 (24.8)	2,277 (24.2)	912 (24.6)
2	1,940 (11.7)	950 (10.1)	420 (11.3)
3	900 (5.4)	422 (4.5)	206 (5.5)
≥4	841 (5.1)	338 (3.6)	172 (4.6)
Outcome year, n (%)
0	9,347 (56.4)	5,640 (60.0)	2,096 (56.5)
1	3,973 (24.0)	2,245 (23.9)	906 (24.4)
2	1,722 (10.4)	844 (9.0)	398 (10.7)
3	754 (4.6)	339 (3.6)	151 (4.1)
≥4	769 (4.6)	325 (3.5)	162 (4.4)

Patient numbers for the categorized mMRC and DOSE index scores and for GOLD category are in Table S1. Approximately 20% of the total population had two or more exacerbations in the outcome year (Table 2). Most variables were significant univariable predictors (Table 3).

Table 3

Univariable predictors of two or more COPD exacerbations in the outcome year in the total population data set (N=16,565)

Variable	Odds ratio (95% CI)	P-value
Female sex	1.25 (1.16–1.35)	9×10⁻⁹
Age (per 10 years)	1.71 (1.14–2.56)	0.009
Age² (per 10 years)	0.96 (0.93–0.98)	0.003
Height (per 10 cm)	0.88 (0.85–0.92)	6×10⁻¹⁰
Body mass index
Normal	1.00
Underweight	1.10 (0.92–1.31)	0.30
Overweight	1.02 (0.93–1.12)	0.60
Obese	1.04 (0.94–1.15)	0.45
Smoking status
Never	1.00
Former	1.08 (0.95–1.22)	0.23
Current	0.95 (0.84–1.09)	0.48
Exacerbations in the baseline year
0	1.00
1	2.55 (2.30–2.84)	3×10
2	4.86 (4.31–5.47)	5×10
3	8.34 (7.17–9.68)	7×10
≥4	21.05 (17.90–24.75)	8×10
Asthma	1.67 (1.55–1.81)	7×10⁻³⁹
Rhinitis
Allergic	1.12 (0.97–1.29)	0.13
Nonallergic	1.60 (1.38–1.85)	5×10⁻¹⁰
Nasal polyps	1.49 (1.19–1.85)	0.0004
GERD	1.37 (1.22–1.53)	3×10⁻⁸
Anxiety/depression	1.32 (1.22–1.44)	8×10⁻¹²
Eczema	1.17 (1.06–1.28)	0.001
Diabetes	1.12 (1.02–1.22)	0.01
Ischemic heart disease	1.12 (1.02–1.23)	0.02
Heart failure	1.12 (0.98–1.29)	0.09
Charlson comorbidity index
0	1.00
1	1.41 (1.27–1.58)	5×10⁻¹⁰
≥2	1.22 (1.08–1.38)	0.002
FEV₁% (per 10% decrease)	1.15 (1.12–1.17)	5×10⁻³⁶
FEV₁/FVC (per 10% decrease)	1.17 (1.13–1.22)	4×10⁻¹⁹
Blood eosinophilia
Noncurrent smokers	1.43 (1.25–1.64)	7×10⁻⁷
Current smokers	0.89 (0.70–1.12)	0.31

Note: Noncurrent smokers included ex-smokers and never smokers.

Abbreviations: CI, confidence interval; FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity; GERD, gastroesophageal reflux disease.

The final multivariable model contained eleven variables, of which the number of exacerbations in the preceding year had the strongest association. Most other variables were associated with relatively modest odds ratios (ORs) (Table 4). The overall C statistic for the model was 0.751 (95% CI 0.742–0.761) (Figure 2). The Hosmer–Lemeshow test had a P-value of 0.30, suggesting no significant departures from goodness of fit. The model and a patient example are included in the Supplementary material.

Table 4

Significant multivariable predictors of two or more COPD exacerbations in the outcome year in the total population data set (N=16,565)

Covariate	Odds ratio (95% CI)
Exacerbations in the baseline year
0	1.00
1	2.42 (2.18–2.69)
2	4.39 (3.89–4.95)
3	7.28 (6.25–8.48)
≥4	17.83 (15.12–21.03)
FEV₁% predicted (per 10% decrease)	1.10 (1.07–1.12)
Age (per 10 years)	1.43 (0.92–2.23)
Age² (per 10 years)	0.97 (0.93–1.00)
Height (per 10 cm)	0.89 (0.85–0.93)
Eosinophilia in noncurrent smokers	1.29 (1.10–1.51)
Asthma	1.34 (1.23–1.46)
Nonallergic rhinitis	1.35 (1.15–1.59)
Nasal polyps	1.39 (1.09–1.78)
Ischemic heart disease	1.12 (1.01–1.25)
Anxiety or depression	1.11 (1.02–1.22)
GERD	1.18 (1.05–1.34)
Model C statistic (95% CI)	0.751 (0.742–0.761)

Note: Noncurrent smokers included ex-smokers and never smokers.

Abbreviations: CI, confidence interval; FEV1, forced expiratory volume in 1 second; GERD, gastroesophageal reflux disease.
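Because the model is a logistic regression, the odds ratios in Table 4 combine multiplicatively on the odds scale (additively on the log-odds scale). The Python sketch below illustrates this using the point estimates from Table 4. It is a hedged illustration rather than the authors' published scoring formula: the intercept is given only in the Supplementary material, so the result is an odds ratio relative to a reference patient and not an absolute risk, and the age terms are omitted because the scaling of the squared term cannot be recovered from this text. All function and key names are hypothetical.

```python
import math

# Point estimates copied from Table 4; key 4 stands for ">=4" prior exacerbations.
OR_PRIOR_EXACERBATIONS = {0: 1.00, 1: 2.42, 2: 4.39, 3: 7.28, 4: 17.83}
OR_BINARY = {
    "eosinophilia_noncurrent_smoker": 1.29,
    "asthma": 1.34,
    "nonallergic_rhinitis": 1.35,
    "nasal_polyps": 1.39,
    "ischemic_heart_disease": 1.12,
    "anxiety_or_depression": 1.11,
    "gerd": 1.18,
}
OR_FEV1_PER_10PCT_DECREASE = 1.10
OR_HEIGHT_PER_10CM = 0.89


def relative_odds(prior_exacerbations, conditions=(),
                  fev1_decrease_pct=0.0, height_increase_cm=0.0):
    """Odds of >=2 exacerbations relative to a reference patient of the same age.

    The reference patient has no prior exacerbations, none of the listed
    conditions, and the same FEV1 % predicted and height (assumed baseline).
    """
    log_odds = math.log(OR_PRIOR_EXACERBATIONS[min(prior_exacerbations, 4)])
    for name in conditions:
        log_odds += math.log(OR_BINARY[name])
    log_odds += (fev1_decrease_pct / 10.0) * math.log(OR_FEV1_PER_10PCT_DECREASE)
    log_odds += (height_increase_cm / 10.0) * math.log(OR_HEIGHT_PER_10CM)
    return math.exp(log_odds)
```

For instance, `relative_odds(2, ["asthma", "gerd"])` multiplies 4.39, 1.34, and 1.18, giving odds roughly seven times those of the reference patient; the confidence intervals in Table 4 are ignored in this sketch.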

Figure 2

Calibration plot of observed versus predicted risk using the full developmental model (N=16,565).

The model developed using the asthma-free subset (subpopulation 1) was very similar to that for the full population (with the obvious exception of asthma) and had similar ORs (Table 5). The smaller subset of patients for whom CAT questionnaire data were available (subpopulation 2) also produced a similar model, although several variables important to the full data set were no longer retained in the subpopulation 2 model. This latter model included two additional variables (CAT score and female sex) not in the full model, but including these variables did not appreciably alter the C statistic for the full model. Age was a much stronger risk factor in subpopulation 2 than in the other models (Table 5).

Table 5

Significant multivariable predictors of two or more COPD exacerbations in the outcome year among subpopulations

Covariate	Subpopulation 1 (no asthma ever) (N=9,393), OR (95% CI)	Subpopulation 2 (with symptom data) (N=3,713), OR (95% CI)
Exacerbations in the baseline year
0	1.00	1.00
1	2.34 (2.02–2.71)	2.17 (1.73–2.71)
2	4.46 (3.75–5.30)	4.26 (3.30–5.51)
3	7.86 (6.31–9.79)	6.49 (4.71–8.93)
≥4	18.18 (14.21–23.26)	15.11 (10.57–21.59)
FEV₁% predicted (per 10% decrease)	1.09 (1.06–1.14)	1.05 (1.00–1.11)
Age (per 10 years)	1.59 (0.81–3.11)	5.45 (1.77–16.78)
Age² (per 10 years)	0.96 (0.91–1.01)	0.89 (0.82–0.96)
Height (per 10 cm)	0.86 (0.80–0.91)	NI
Eosinophilia in noncurrent smokers	1.41 (1.11–1.79)	1.43 (1.04–1.98)
Asthma	NI	1.19 (1.00–1.43)
Nonallergic rhinitis	1.45 (1.14–1.83)	NI
Nasal polyps	1.52 (1.00–2.32)	1.95 (1.18–3.20)
Heart failure	1.30 (1.06–1.60)	NI
Diabetes	1.11 (0.97–1.28)	NI
Anxiety or depression	1.17 (1.03–1.33)	NI
GERD	1.22 (1.02–1.46)	NI
Female sex	NI	1.31 (1.09–1.57)
CAT score (per 10 units)	NI	1.28 (1.15–1.42)
Model C statistic	0.742 (0.728–0.756)	0.745 (0.724–0.766)

Note: Noncurrent smokers included ex-smokers and never smokers.

Abbreviations: CAT, COPD Assessment Test; CI, confidence interval; FEV1, forced expiratory volume in 1 second; GERD, gastroesophageal reflux disease; NI, not included in the model; OR, odds ratio.

Summary measures of internal validity were unremarkable and suggested adequate fit and predictive ability for the full-population model. Applying the full-population model to the validation data set (N=2,713; baseline characteristics in Table S2) resulted in a C statistic of 0.735 (95% CI 0.713–0.757), suggesting good external validity within the validation cohort (Figure 3).

Figure 3

Calibration plot (25 groups of 108–109 observations) of the observed versus predicted risk after applying the model to the validation cohort (N=2,713).
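The grouping behind a calibration plot of this kind can be sketched in Python as below. The sketch assumes that patients are ranked by predicted risk and split into near-equal-sized groups, with the mean predicted risk compared against the observed event proportion in each group; the text does not state how the groups were formed, so the ordering step is an assumption.

```python
def calibration_groups(y_true, y_pred, n_groups):
    """Return (group size, mean predicted risk, observed proportion) per group.

    Patients are sorted by predicted risk (assumed) and split into
    n_groups contiguous groups whose sizes differ by at most one.
    """
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # more groups than patients
        mean_pred = sum(y_pred[i] for i in idx) / len(idx)
        observed = sum(y_true[i] for i in idx) / len(idx)
        rows.append((len(idx), mean_pred, observed))
    return rows
```

With 2,713 patients and 25 groups, this split produces groups of 108 or 109 patients, matching the group sizes quoted in the caption; plotting observed against mean predicted risk for each row gives the calibration curve.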

Comparison of the model with other indices

The DOSE index score (Table 1) and GOLD group categorization were determined using the mMRC score for 3,558 patients with available data (Table S1). The C statistic (95% CI) for a model using the DOSE index was 0.641 (0.617–0.664) and that using the GOLD groups was 0.644 (0.622–0.666) as compared with 0.751 (0.742–0.761) for our full-population model.

Discussion

Using a large database of routinely collected electronic health records from patients with COPD in the UK, we developed and validated a model incorporating eleven variables that performed well in predicting two or more COPD exacerbations in the subsequent year (C statistic of 0.751). Sensitivity analyses in the subpopulations with no asthma ever recorded (C statistic 0.742) and with patient-recorded questionnaire data (C statistic 0.745) supported the main results. The frequency of exacerbations in the previous year was the major predictor of future exacerbation risk. Our findings provide evidence that routinely collected health care data can be used to predict frequent COPD exacerbations. Moreover, our model performed better for predicting COPD exacerbations when applied to our heterogeneous study population than models using the DOSE index or GOLD groups calculated using the mMRC score. Many other predictive studies have focused on risk of death or hospitalization, often among patients with severe COPD.13,15,17,19–21 A strength of the current study is the inclusion of all individuals with COPD in a general population and all subsequent exacerbations, regardless of whether the exacerbation required hospitalization. 
Exacerbation rates were relatively low in the study, with >50% of patients having no acute exacerbation in either the baseline or the outcome years, possibly a result of the broadly inclusive eligibility criteria that produced a distribution of COPD severities, from mild to very severe, within the study population;27 primary care COPD populations are recognized as having lower rates of exacerbations than patients enrolled in clinical trials.28 Moreover, the relatively low rate of exacerbations seen in this study, as compared with past research suggesting mean annual rates of 0.8 in mild COPD and 1.2–2.0 in moderate to very severe COPD,27 may be a reflection of changes in recent years, including better identification of milder COPD, with spirometry being broadly undertaken, and more focused COPD management in UK primary care.29
A value close to 1 for the C statistic indicates that a model has excellent discriminatory power.30 While a C statistic of 0.75 for our model indicates modest predictive ability, the results of the current study compare favorably with those of the earlier studies focused on predicting exacerbations. Miravitlles et al31 performed a cross-sectional assessment of frequent (≥1 per year) exacerbation occurrence among 627 ambulatory patients with COPD. Significant covariates included age, FEV1, and chronic mucus hypersecretion, but none of these were particularly strong risk factors (OR for hypersecretion 1.54), and the predictive ability of the model was marginal (C statistic 0.6).
A substantial number of prospective studies have also attempted to predict COPD exacerbations. Niewoehner et al18 followed 1,829 veterans for 6 months to assess the risk of either COPD exacerbation or COPD hospitalization. Significant independent predictors for exacerbation included older age, FEV1, productive cough, previous hospitalization, and medications used previously. However, all patients had moderate to severe COPD, and the short follow-up may have limited the predictive ability of the model, which was itself inferior (C statistic 0.67). Hurst et al32 followed 2,138 patients with COPD for 3 years to assess risk of COPD exacerbations requiring antibiotics, corticosteroids, or hospitalization. Significant multivariable predictors of two or more exacerbations included previous exacerbation, FEV1, history of GERD, increased white cell count, and respiratory health status. Predictive ability of the full model was not reported. Bertens et al16 followed 243 patients with COPD for 24 months to assess the risk of exacerbation occurrence. Their multivariable model identified FEV1, smoking pack-years, history of vascular disease, and previous exacerbations (as a dichotomous variable, yes/no) as significant predictors. The model C statistic (0.75) suggested good predictive ability, but the small sample raises issues as to general applicability of these findings, which are mirrored by a marginal external validity (validation C statistic 0.66). Bowler et al33 followed 3,804 patients with COPD for an average of 3 years, identifying ten significant exacerbation predictors, especially FEV1, St George severity score, and exacerbation in the previous year. However, none were strong risk factors (OR 1.19 per exacerbation in the previous year), and information on overall predictive ability and external validation was not provided. The largest prospective study conducted to date included ~59,000 patients with COPD from a primary care database and followed them for 1 year. Multivariable logistic regression identified many significant predictors of two or more exacerbations, including previous exacerbation, airflow, level of dyspnea, female sex, and various comorbidities (eg, heart failure, renal disease, anxiety, and asthma). Unfortunately, predictive ability of the model was not reported.34 Most recently, Make et al35 followed 3,141 patients from several drug trials with a history of ≥1 COPD exacerbation in the previous year in order to predict the 6-month risk of an exacerbation requiring corticosteroids or emergency/hospital visit. Independent predictors from a multivariable model included number of maintenance medications, inhaler use, exacerbations in the previous year, FEV1/FVC ratio, female sex, and respiratory health status. The C statistic of 0.67 suggested only moderate predictive ability.
Our study identified the number of exacerbations in the previous year as a significant predictive factor, which was borne out by most of the studies cited earlier. Bowler et al33 and Make et al35 reported only moderately increased risk for previous exacerbation, although the latter study used one previous exacerbation as the reference value (rather than none) in patients with at least one previous exacerbation. However, Bertens et al16 reported an OR of 5.07 (95% CI 2.55–10.07) for at least one exacerbation in the previous year, and Hurst et al32 reported an OR of 5.72 (95% CI 4.47–7.31) for this same measure. Müllerová et al34 reported an exposure–response relationship, with one previous exacerbation associated with an OR of 3.31 (95% CI 3.12–3.51) and two or more associated with an OR of 13.64 (12.67–14.68). The findings from these latter three studies are consistent with those reported in the current study. Furthermore, our results and those reported by Müllerová et al34 suggest that risk increases with increasing number of previous exacerbations, highlighting the importance of obtaining detailed information on this variable.
Neither our multivariable model nor most other predictive models have identified an independent association between smoking and frequent exacerbations in a broad population of patients with COPD. This suggests that the impact of this variable is dependent on inclusion criteria frequently applied to select patients with COPD for controlled trials. Any association with smoking may also be ameliorated by a so-called “healthy smoker effect”, in which those with poorer lung function or frequent exacerbations tend to quit smoking, whereas less severely affected patients do not.18 This suggests that there is a phenotypical propensity to frequent exacerbation that is somewhat independent of other risk factors.32,34
A unique finding of the current study was the significance of blood eosinophilia as an independent predictor. This is consistent with research showing that eosinophils are present in 20%–40% of sputum samples from patients with stable COPD and that airway eosinophilia increases during exacerbation episodes.36 This variable had only a moderate OR of ~1.3, and only among patients not currently smoking, so there may be somewhat limited applicability for predicting exacerbations in general. However, the relatively weak association with eosinophilia may reflect more active COPD management among this population compared with others. We conducted a post hoc sensitivity analysis to evaluate whether the association between blood eosinophilia and the risk of two or more exacerbations would be relevantly different after excluding the 17% (n=2,785) of patients with blood eosinophil counts measured at an exacerbation. The results were not relevantly different after excluding these measurements from the analyses (OR 1.26 [95% CI 1.04–1.52] vs 1.29 [1.10–1.51] for the full population).
We also identified CAT score as a significant predictor for subpopulation 2, which to the best of our knowledge has not been reported elsewhere. Indices of COPD severity such as the BODE index have been significantly linked to future exacerbations,9,37 but these are more complex than the simple CAT survey.
A major strength of the current study is the sample size of >16,000 patients, which is much larger than most other predictive studies. In fact, the population in the current study was more than fivefold larger than all but that of Müllerová et al.34 However, it is likely that some of the variables retained in our final model reached significance primarily because of this large sample rather than because of strong biological importance. Müllerová et al34 similarly reported relatively weak OR for many of their predictors but without information on the overall model predictability or the predictive power of individual variables. We identified a number of comorbid predictors, including heart disease, GERD, and other respiratory conditions. Conditions such as GERD and heart disease have been identified in other follow-up studies,32–34 whereas risk factors such as nasal polyps and rhinitis appear unique to the current study. These findings highlight the potentially complex relationship that may exist between COPD and other conditions. Although these comorbidities were not strong risk factors (OR 1.1–1.4), their importance would likely improve in a model of newly diagnosed COPD patients without history of exacerbation. Of course, some of these significant predictors may have been driven by the large sample size. Asthma is recognized as an important comorbid condition that increases disability and risk of exacerbation among those with COPD.38,39 Asthma was also a significant independent predictor in the current study. However, the OR for asthma was of similar magnitude as that for several other variables, and excluding patients with any record of asthma did not appreciably change the model. Such findings may reflect an overuse of the asthma “label”, especially in the past. 
We applied a very sensitive definition to select the subpopulation of patients without overlapping asthma, namely, the recording of any Read code that could indicate the general practitioner (GP) was considering asthma, which explains the relatively high proportion of patients reported with COPD and concomitant asthma. We cannot exclude the possibility of some patients being wrongly diagnosed with asthma by the GP or, conversely, of some patients having undiagnosed asthma.
Differences between the current study and previous clinical research may partially reflect the populations studied. Clinical trials typically enroll restricted populations with more severe disease, often with a greater frequency of exacerbation at baseline.16,35 The current population better reflects the broader landscape of patients with COPD treated in routine primary care practice in the UK. Nonetheless, we cannot rule out the possibility of selection bias. For example, we required an FEV1/FVC ratio of <0.7 for study inclusion. Because spirometry is not universally available in primary care settings, 11,658 of 37,224 (31%) otherwise potentially eligible patients did not have spirometry results and hence were excluded. With regard to the requirement for blood eosinophil count, full blood count measurements are very common among patients with COPD and were available for 86% of patients evaluated (43,436 of 50,716 patients with COPD and no other chronic respiratory disease).
Our aim was to evaluate the predictive value of routinely collected data. Although other parameters of eosinophilic inflammation, such as sputum eosinophils or exhaled nitric oxide, may have improved the predictive performance of the model, these measurements are generally unavailable in general practice and hence were not included in our model.
The current study used electronic records from primary care providers, which are a readily available data source. However, it is possible that outcomes such as hospital and emergency admissions may be underrepresented in the data.
The current study was validated both internally and externally, with good concordance when the multivariable model was applied to an external sample. However, this does not guarantee universal generalizability given that our external sample arose from the same patient population as the sample used for model development. Further study is needed to validate the model in other patient populations as well.

Conclusion

Routine electronic medical record data available from most GP clinical systems can be used to identify patients with COPD at risk of two or more exacerbations in the subsequent year. Our model could be used to profile patients with COPD, or to underpin decision support tools, in general practice. The number of exacerbations in the preceding year showed a strong exposure–response relationship, highlighting the importance of detailed information on patients’ exacerbation history. The findings also suggest that CAT score and eosinophilia may be convenient markers of future exacerbation, at least in some populations.

Supplementary materials

The formula: Risk of ≥2 COPD exacerbations within the next 12 months = 1/(1 + exp(−(−0.7306 + 0.8840 × 1 previous exacerbation in last 12 months + 1.4786 × 2 previous exacerbations in last 12 months + 1.9857 × 3 previous exacerbations in last 12 months + 2.8811 × ≥4 previous exacerbations in last 12 months − 0.0093 × FEV1% predicted + 0.0360 × age − 0.0004 × age² − 1.2194 × height (in meters) + 0.2518 × (blood eosinophil count ≥500/μL in a patient who is not currently smoking) + 0.2953 × any evidence of asthma + 0.3018 × history of nonallergic rhinitis + 0.3298 × history of nasal polyps + 0.1164 × history of ischemic heart disease + 0.1071 × history of anxiety or depression + 0.1689 × history of GERD))), where each exacerbation-count, eosinophil, and history term takes the value 1 if present and 0 otherwise. Example: a person aged 70 years currently smoking with height of 1.80 m and FEV1 of 60% of predicted, without a history of previous exacerbations in the last 12 months and no history of comorbidities, has a calculated risk of 0.064 (6.4%) of two or more exacerbations in the next year.
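
The formula can be sketched in Python as follows. This is an illustrative reading, not the authors' code: it assumes the standard logistic form, risk = 1/(1 + exp(−LP)) with an intercept of −0.7306, and it uses the coefficients as printed (rounded to four decimals). The function name, argument names, and condition labels are hypothetical. With these rounded coefficients the sketch returns roughly 0.05 for the worked example rather than the reported 0.064; the gap may come from rounding of the published coefficients (the age² term is sensitive to this), but that is a guess and not something the source states.

```python
import math

# Coefficients copied from the printed formula (rounded to 4 decimals).
INTERCEPT = -0.7306

# Indicator terms for the number of exacerbations in the previous 12 months.
# Key 4 stands for ">=4"; zero previous exacerbations is the reference level.
PRIOR_EXACERBATION_COEF = {0: 0.0, 1: 0.8840, 2: 1.4786, 3: 1.9857, 4: 2.8811}

# Binary (present = 1, absent = 0) terms. The labels are hypothetical names.
CONDITION_COEF = {
    "eosinophilia_not_current_smoker": 0.2518,
    "asthma": 0.2953,
    "nonallergic_rhinitis": 0.3018,
    "nasal_polyps": 0.3298,
    "ischemic_heart_disease": 0.1164,
    "anxiety_or_depression": 0.1071,
    "gerd": 0.1689,
}


def exacerbation_risk(age, height_m, fev1_pct_predicted,
                      prior_exacerbations=0, conditions=()):
    """Estimated probability of >=2 exacerbations in the next 12 months."""
    if prior_exacerbations < 0:
        raise ValueError("prior_exacerbations must be >= 0")
    lp = INTERCEPT
    lp += PRIOR_EXACERBATION_COEF[min(int(prior_exacerbations), 4)]
    lp += -0.0093 * fev1_pct_predicted
    lp += 0.0360 * age - 0.0004 * age ** 2
    lp += -1.2194 * height_m
    for name in conditions:
        if name not in CONDITION_COEF:
            raise ValueError(f"unknown condition: {name}")
        lp += CONDITION_COEF[name]
    # Assumed logistic link: risk = 1 / (1 + exp(-LP)).
    return 1.0 / (1.0 + math.exp(-lp))


if __name__ == "__main__":
    # Worked example from the text: age 70, 1.80 m, FEV1 60% predicted,
    # no previous exacerbations, no comorbidities.
    print(round(exacerbation_risk(70, 1.80, 60), 3))
    # Odds ratios implied by the binary terms, OR = exp(coefficient).
    for name, beta in CONDITION_COEF.items():
        print(name, round(math.exp(beta), 2))
```

The odds ratios printed at the end follow directly from OR = exp(coefficient); for example, the eosinophilia term gives exp(0.2518) ≈ 1.29, which matches the OR quoted in the discussion.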

Table S1

Baseline mMRC scores, DOSE index scores, and GOLD groups based on mMRC score

	Subpopulation 2 (with symptom data) (N=3,558)
mMRC score
0	558 (15.7)
1	1,404 (39.5)
2	802 (22.5)
3	539 (15.1)
4	255 (7.2)
DOSE index score
0	846 (23.8)
1	1,070 (30.1)
2	711 (20.0)
3	469 (13.2)
4	288 (8.1)
>4	174 (4.9)
GOLD category
A	1,213 (34.1)
B	661 (18.6)
C	749 (21.1)
D	935 (26.3)

Notes: mMRC data (hence DOSE index scores and GOLD category based on mMRC)1–3 were available for 3,558 (95.8%) of 3,713 patients in subpopulation 2. Data are presented as n (% of 3,558). GOLD categories were calculated using mMRC, exacerbations, and FEV1.

Abbreviations: DOSE, dyspnea, airway obstruction, smoking status, exacerbations; GOLD, Global Initiative for Chronic Obstructive Lung Disease; mMRC, modified Medical Research Council; FEV1, forced expiratory volume in 1 second.
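
The notes state that GOLD categories were calculated from mMRC, exacerbations, and FEV1 but do not give the rules. The sketch below assumes the GOLD 2011 combined assessment: mMRC ≥2 counts as "more symptoms", and "high risk" means FEV1 <50% predicted or two or more exacerbations in the previous year. The authors' exact rules (for example, any handling of hospitalizations) are not stated, and the function names are hypothetical.

```python
def gold_airflow_grade(fev1_pct_predicted):
    """GOLD airflow limitation grade (1-4) from FEV1 % predicted."""
    if fev1_pct_predicted >= 80:
        return 1  # mild
    if fev1_pct_predicted >= 50:
        return 2  # moderate
    if fev1_pct_predicted >= 30:
        return 3  # severe
    return 4      # very severe


def gold_category(mmrc, exacerbations_last_year, fev1_pct_predicted):
    """GOLD A-D group under the assumed GOLD 2011 combined assessment."""
    more_symptoms = mmrc >= 2
    high_risk = (gold_airflow_grade(fev1_pct_predicted) >= 3
                 or exacerbations_last_year >= 2)
    if high_risk:
        return "D" if more_symptoms else "C"
    return "B" if more_symptoms else "A"


if __name__ == "__main__":
    print(gold_category(mmrc=1, exacerbations_last_year=0, fev1_pct_predicted=60))
    print(gold_category(mmrc=3, exacerbations_last_year=2, fev1_pct_predicted=45))
```

The mMRC ≥2 cut-off is at least consistent with the counts in Table S1: patients with mMRC 0 or 1 total 558 + 1,404 = 1,962, which equals groups A + C (1,213 + 749), and those with mMRC 2–4 total 1,596, which equals groups B + D (661 + 935).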

Table S2

Baseline characteristics of the validation cohort

Characteristics	Total population (N=2,713)
Female sex, n (%)	1,237 (45.6)
Age, median (IQR)	71 (64–79)
Weight (kg), mean (SD)	74.0 (18.2)
Height (m), mean (SD)	1.67 (0.10)
Body mass index, mean (SD)	26.6 (5.7)
Body mass index, n (%)
Underweight (<18.5 kg/m²)	157 (5.8)
Normal (≥18.5 and <25 kg/m²)	1,006 (37.1)
Overweight (≥25 and <30 kg/m²)	904 (33.3)
Obese (≥30 kg/m²)	646 (23.8)
Smoking status, n (%)
Never smoker	341 (12.6)
Ex-smoker	1,365 (50.3)
Current smoker	1,007 (37.1)
FEV₁/FVC ratio, mean (SD)	54.5 (11.0)
FEV₁% predicted, mean (SD)	57.4 (19.9)
GOLD airflow limitation (FEV₁% predicted), n (%)
1: mild (≥80%)	331 (12.2)
2: moderate (50%–79%)	1,345 (49.6)
3: severe (30%–49%)	830 (30.6)
4: very severe (<30%)	207 (7.6)
Eosinophilia (≥500 cells/μL), n (%)
Eosinophilia – all patients	252 (9.3)
Eosinophilia – noncurrent smokers	199 (7.3)
Asthma, n (%)	1,335 (49.2)
Eczema, n (%)	560 (20.6)
Rhinitis, n (%)
Allergic	225 (8.3)
Nonallergic	171 (6.3)
Nasal polyps, n (%)	59 (2.2)
Ischemic heart disease, n (%)	655 (24.1)
Anxiety or depression, n (%)	908 (33.5)
GERD, n (%)	358 (13.2)

Abbreviations: IQR, interquartile range; SD, standard deviation; FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity; GOLD, Global Initiative for Chronic Obstructive Lung Disease; GERD, gastroesophageal reflux disease.

35 in total

1. Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory COPD patients: a multiple logistic regression analysis. The EOLO Study Group.

Authors: M Miravitlles; T Guerrero; C Mayordomo; L Sánchez-Agudo; F Nicolau; J L Segú
Journal: Respiration Date: 2000 Impact factor: 3.580

2. Usefulness of the Medical Research Council (MRC) dyspnoea scale as a measure of disability in patients with chronic obstructive pulmonary disease.

Authors: J C Bestall; E A Paul; R Garrod; R Garnham; P W Jones; J A Wedzicha
Journal: Thorax Date: 1999-07 Impact factor: 9.139

3. BODE index and GOLD staging as predictors of 1-year exacerbation risk in chronic obstructive pulmonary disease.

Authors: Márcia Maria Faganello; Suzana Erico Tanni; Fernanda Figueirôa Sanchez; Nilva Regina Gelamo Pelegrino; Paulo Adolfo Lucheta; Irma Godoy
Journal: Am J Med Sci Date: 2010-01 Impact factor: 2.378

4. Development and first validation of the COPD Assessment Test.

Authors: P W Jones; G Harding; P Berry; I Wiklund; W-H Chen; N Kline Leidy
Journal: Eur Respir J Date: 2009-09 Impact factor: 16.671

Review 5. Association between lung function and exacerbation frequency in patients with COPD.

Authors: Martine Hoogendoorn; Talitha L Feenstra; Rudolf T Hoogenveen; Maiwenn Al; Maureen Rutten-van Mölken
Journal: Int J Chron Obstruct Pulmon Dis Date: 2010-12-09

6. The clinical features of the overlap between COPD and asthma.

Authors: Megan Hardin; Edwin K Silverman; R Graham Barr; Nadia N Hansel; Joyce D Schroeder; Barry J Make; James D Crapo; Craig P Hersh
Journal: Respir Res Date: 2011-09-27

7. A score to predict short-term risk of COPD exacerbations (SCOPEX).

Authors: Barry J Make; Göran Eriksson; Peter M Calverley; Christine R Jenkins; Dirkje S Postma; Stefan Peterson; Ollie Östlund; Antonio Anzueto
Journal: Int J Chron Obstruct Pulmon Dis Date: 2015-01-27

8. Risk factors for acute exacerbations of COPD in a primary care population: a retrospective observational cohort study.

Authors: Hana Müllerová; Amit Shukla; Adam Hawkins; Jennifer Quint
Journal: BMJ Open Date: 2014-12-18 Impact factor: 2.692

9. Development and validation of a model to predict the risk of exacerbations in chronic obstructive pulmonary disease.

Authors: Loes C M Bertens; Johannes B Reitsma; Karel G M Moons; Yvonne van Mourik; Jan Willem J Lammers; Berna D L Broekhuizen; Arno W Hoes; Frans H Rutten
Journal: Int J Chron Obstruct Pulmon Dis Date: 2013-10-10

10. Primary care COPD patients compared with large pharmaceutically-sponsored COPD studies: an UNLOCK validation study.

Authors: Annemarije L Kruis; Björn Ställberg; Rupert C M Jones; Ioanna G Tsiligianni; Karin Lisspers; Thys van der Molen; Jan Willem H Kocks; Niels H Chavannes
Journal: PLoS One Date: 2014-03-05 Impact factor: 3.240
