Ten-year prediction model for post-bronchodilator airflow obstruction and early detection of COPD: development and validation in two middle-aged population-based cohorts.

Jennifer L Perret1,2,3, Don Vicendese4,5, Koen Simons4, Debbie L Jarvis6, Adrian J Lowe4, Caroline J Lodge4, Dinh S Bui4, Daniel Tan4, John A Burgess4, Bircan Erbas7, Adrian Bickerstaffe4, Kerry Hancock8, Bruce R Thompson9, Garun S Hamilton10,11, Robert Adams12, Geza P Benke13, Paul S Thomas14, Peter Frith15, Christine F McDonald2,3, Tony Blakely4, Michael J Abramson13, E Haydn Walters4,16, Cosetta Minelli6, Shyamali C Dharmage4.   

Abstract

BACKGROUND: Classifying individuals at high chronic obstructive pulmonary disease (COPD)-risk creates opportunities for early COPD detection and active intervention.
OBJECTIVE: To develop and validate a statistical model to predict 10-year probabilities of COPD defined by post-bronchodilator airflow obstruction (post-BD-AO; forced expiratory volume in 1 s/forced vital capacity<5th percentile).
SETTING: General Caucasian populations from Australia and Europe, 10 and 27 centres, respectively.
PARTICIPANTS: For the development cohort, questionnaire data on respiratory symptoms, smoking, asthma, occupation and participant sex were from the Tasmanian Longitudinal Health Study (TAHS) participants at age 41-45 years (n=5729) who did not have self-reported COPD/emphysema at baseline but had post-BD spirometry and smoking status at age 51-55 years (n=2407). The validation cohort comprised participants from the European Community Respiratory Health Survey (ECRHS) II and III (n=5970), restricted to those of age 40-49 and 50-59 with complete questionnaire and spirometry/smoking data, respectively (n=1407).
STATISTICAL METHOD: Risk-prediction models were developed using randomForest then externally validated.
RESULTS: Area under the receiver operating characteristic curve (AUCROC) of the final model was 80.8% (95% CI 80.0% to 81.6%), sensitivity 80.3% (77.7% to 82.9%), specificity 69.1% (68.7% to 69.5%), positive predictive value (PPV) 11.1% (10.3% to 11.9%) and negative predictive value (NPV) 98.7% (98.5% to 98.9%). The external validation was fair (AUCROC 75.6%), with the PPV increasing to 17.9% and NPV still 97.5% for adults aged 40-49 years with ≥1 respiratory symptom. To illustrate the model output using hypothetical case scenarios, a 43-year-old female unskilled worker who smoked 20 cigarettes/day for 30 years had a 27% predicted probability for post-BD-AO at age 53 if she continued to smoke. The predicted risk was 42% if she had coexistent active asthma, but only 4.5% if she had quit after age 43.
CONCLUSION: This novel and validated risk-prediction model could identify adults aged in their 40s at high 10-year COPD-risk in the general population with potential to facilitate active monitoring/intervention in predicted 'COPD cases' at a much earlier age. © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Keywords:  COPD epidemiology; clinical epidemiology

Year:  2021        PMID: 34857526      PMCID: PMC8640628          DOI: 10.1136/bmjresp-2021-001138

Source DB:  PubMed          Journal:  BMJ Open Respir Res        ISSN: 2052-4439


How can we classify individuals at high risk of chronic obstructive pulmonary disease (COPD) to create opportunities for early detection before too much lung damage has occurred? Using information that is readily accessible from patients and a machine learning methodology, we developed and validated a COPD risk-prediction model with good discriminatory ability in Australian and European general populations aged in their 40s to predict post-bronchodilator airflow obstruction approximately 10 years later. This approach can identify individuals in their 40s at high or very high COPD-risk who could benefit from serial spirometry, strengthens the rationale for smoking cessation strategies in middle-age and advances available precision medicine.

Introduction

Chronic obstructive pulmonary disease (COPD) ranks among the highest causes of potentially preventable hospitalisations,1 2 yet there is a lack of action to generate high-quality evidence to support the pre-emptive identification and/or management of individuals most at-risk. A risk-prediction approach similar to that used to manage modifiable risk factors for cardiovascular disease and type II diabetes3 4 could also be useful for COPD, which is multifactorial and typically features a gradual progression of airflow obstruction that can be established by middle-age. The 40s represent an important time window for evaluating COPD-risk, as selected screening of high-risk individuals using spirometry could confirm disease well before they usually seek medical attention.5 Although only one study has examined the cost-effectiveness of actively finding COPD cases, concluding that systematic case-finding could be useful if targeting older smokers,6 appropriate and early individualised interventions theoretically have potential to favourably influence poorer lung function trajectories7 8 and thereby slow or even prevent COPD onset. In the usual clinical scenario where healthcare professionals see patients prior to testing,9 a risk-prediction model can have both diagnostic and ‘prognostic’ features as it would cover current and onward risks and assist in determining both the need for further tests and prognosis.
Previous attempts to develop COPD risk-prediction models have been limited and include: administrative databases, which had inaccurate smoking and COPD information; case–control designs, which are prone to selection bias; and/or stepwise regression statistical models, which are inclined to overfitting.10 11 To date there has been only one externally validated risk-prediction tool that used longitudinal data but this was based on several clinical test results that would generally be unavailable to treating clinicians and their patients at the time of initial assessment.12 Furthermore, no previous risk-prediction model has incorporated changes in smoking status prior to lung function measurement to contrast continuing smokers with quitters, which would indicate the potential prospective impact of subsequent smoking behaviour. Using data from two of the largest respiratory cohorts worldwide, the Tasmanian Longitudinal Health Study (TAHS) and European Community Respiratory Health Survey (ECRHS), we aimed to develop and validate such a COPD risk-prediction model for middle-aged adults using a ‘real world’ scenario in a general population setting.

Methods

The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) checklist for prediction model development and validation13 and the 2020 Editors’ framework on prediction modelling11 were followed.

Study design: development cohort

Our sample included participants from the whole-of-population TAHS cohort, born in 1961, first studied in 1968 (n=8583) and followed into middle-age (figure 1).14 15 At mean age 43 years, baseline questionnaire data from 5729 (67%) respondents were collected (online supplemental Methods E1). Approximately 10 years later, this original cohort was retraced and invited to participate in the 2012–2016 study (n=6128). Of 3609 respondents (58.9%), 2719 underwent pre-bronchodilator/post-bronchodilator (BD) spirometry (75.3%). Participants were those who had postal survey data plus 10-year smoking status/spirometry data (n=2407). Participants who reported doctor-diagnosed COPD and/or emphysema at baseline were excluded (n=15).
Figure 1

Study flow diagram of participation and non-participation in the development cohort, Tasmanian Longitudinal Health Study 1968–2016. Percentages for non-participation at subsequent follow-ups relate to the proportion from the original 1968 survey. *Numbers may overlap. BD, bronchodilator; COPD, chronic obstructive lung disease.


Study design: validation cohort

ECRHS, a collaborative study of 29 centres within 14 mostly European countries, first recruited 17 250 20–44-year-old adults in the general community between 1992 and 1994 (ECRHS I),16 details of which are available at https://www.ecrhs.org/. Participants of ECRHS II completed a detailed questionnaire, work history calendar (n=9645) and pre-BD spirometry (1998–2004, n=8033, age range 26–56). ECRHS III (2008–2012) was conducted in 27 centres in which participants underwent a detailed administered questionnaire and pre-BD/post-BD spirometry (n=5970, age range 38–67). The validation sample consisted of those persons aged in their 40s who participated in ECRHS II and subsequently underwent post-BD spirometry at ECRHS III in their 50s with complete predictor data (n=1407, online supplemental figure E1).

Outcome data collection and definition

Details on lung function data collection using international standards17 and reference values18 are outlined in online supplemental Methods E3. Post-bronchodilator airflow obstruction (post-BD-AO), referred to as spirometry consistent with COPD, was defined by forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC)<5th percentile of normal predicted values following inhaled BD administered via spacer (ie, z-score<–1.645 SD).18 Using this FEV1/FVC criterion, mild-to-moderately severe post-BD-AO was defined by post-BD FEV1 ≥50% predicted, and severe-to-very severe post-BD-AO by <50% predicted.19
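The outcome definition above can be sketched as a small decision rule. This is an illustrative Python sketch only, not the study's code: the function and argument names are hypothetical, and it assumes the FEV1/FVC z-score and the post-BD FEV1 % predicted have already been derived from the reference equations cited in the text.

```python
from typing import Optional

# 5th percentile of the reference distribution, as stated in the text.
LLN_Z_SCORE = -1.645


def classify_post_bd_ao(fev1_fvc_z: float, fev1_pct_predicted: float) -> Optional[str]:
    """Return a severity label for post-BD airflow obstruction, or None.

    Hypothetical helper: inputs are the post-BD FEV1/FVC z-score and the
    post-BD FEV1 expressed as % of the predicted value.
    """
    if fev1_fvc_z >= LLN_Z_SCORE:
        return None  # ratio not below the 5th percentile: no post-BD-AO
    if fev1_pct_predicted >= 50.0:
        return "mild-to-moderately severe"
    return "severe-to-very severe"


print(classify_post_bd_ao(-2.30, 80.0))
print(classify_post_bd_ao(0.14, 95.0))
```

The 50% predicted FEV1 boundary is applied only after the ratio criterion is met, mirroring the order of the definition in the paragraph above.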

Prediction model development and validation

Predictors

A pragmatic approach to selecting the predictor variables was adopted through using information which could be reasonably recalled in middle-age, practical to collect in primary care and feasibly harmonised with ECRHS data (online supplemental table E1, Method E2). The final input variables included: sex; current respiratory symptoms (wheezing, cough, sputum, breathlessness on exertion, chest tightness); smoking (current, duration, intensity, age-of-onset); asthma (asthma-ever, current adult asthma by age-of-onset) and socioeconomic status (occupational class, online supplemental Methods E1). Smoking at baseline and 10-year follow-up was expressed by a four-level variable: never-smoker; ex-smoker who quit before baseline; current smoker at baseline who quit before follow-up; or current smoker at follow-up. Baseline spirometry was not included as a predictor in the final model as post-BD spirometry was only collected for a subset of TAHS participants, enriched for asthma and symptoms (n=897).

Model development

Using R statistical software, we adopted randomForest,20 a flexible, non-parametric and semi-automated machine learning method that considered all possible predictors and their interactions (online supplemental Methods E4a, table E2). The model was built on four randomly selected subsets of the data (80% of 2407 observations) and tested on a distinct fifth subset (the remaining 20% of observations). It was optimally tuned and internally validated using a fivefold cross-validation scheme, and this process was replicated 25 times. The final model was chosen based on the maximum area under the receiver operator characteristic curve (AUCROC, that is, its ability to discriminate between participants with and without post-BD-AO), followed by maximal sensitivity. Two thresholds were used to define a positive outcome: >50% probability of being a ‘COPD case’; and the ‘optimal’ threshold as defined by the Youden index.21 Imputation of missing data was performed using a single imputation method integral to randomForest. More detailed statistical methods are reported in online supplemental Methods E4 (online supplemental sections E4a–g, Figures E2–4).
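For readers unfamiliar with the Youden index, the following minimal Python sketch shows how an ‘optimal’ cut-off can be chosen by maximising J = sensitivity + specificity − 1 over candidate thresholds. It is an illustration under stated assumptions, not the authors' R code: the function names and the toy data are hypothetical, and a score at or above the threshold is treated as positive (a convention chosen here for simplicity).

```python
from typing import List, Sequence, Tuple


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden index J for a single operating point."""
    return sensitivity + specificity - 1.0


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (best J, threshold) over all observed scores.

    Assumes y_true holds 0/1 labels with at least one of each class.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_j, best_t = -1.0, float("nan")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = youden_j(tp / n_pos, tn / n_neg)
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t


# Toy data (hypothetical), not study data.
labels: List[int] = [0, 0, 0, 1, 1]
probs: List[float] = [0.10, 0.20, 0.60, 0.55, 0.90]
print(youden_threshold(labels, probs))
```

Using the same formula, any reported pair of sensitivity and specificity can be compared across cut-offs by passing the values to `youden_j`.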

Hypothetical cases, individualised predictions and risk classification

Using the final model, personalised predictions were calculated from different case scenarios and recalibrated using the Platt scaling method.22 Model calibration, that is, the model’s ability to match the predictions to the observed (or actual) COPD outcomes, was assessed using the Hosmer-Lemeshow (HL) test. COPD-risk groups were defined based on the following approximations previously used in other clinical tools3 4: minimal risk if <1% predicted probability; low if 1%–5%; moderate if 5%–10%; high if 10%–20%; or very high if >20%.
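The risk grouping above can be expressed as a simple lookup. The Python sketch below is illustrative only; the function name is hypothetical, and because the text does not say which group a value exactly on a boundary belongs to, it assumes lower bounds are inclusive (5% is ‘moderate’, 10% is ‘high’) and that exactly 20% is ‘high’, since ‘very high’ is stated as >20%.

```python
def copd_risk_category(predicted_probability_pct: float) -> str:
    """Map a recalibrated 10-year predicted probability (in %) to a risk group.

    Boundary handling is an assumption of this sketch, not taken from the source.
    """
    p = predicted_probability_pct
    if not 0.0 <= p <= 100.0:
        raise ValueError("probability must be between 0 and 100%")
    if p < 1.0:
        return "minimal"
    if p < 5.0:
        return "low"
    if p < 10.0:
        return "moderate"
    if p <= 20.0:
        return "high"
    return "very high"


for pct in (0.6, 4.5, 6.4, 16.4, 42.0):
    print(pct, copd_risk_category(pct))
```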

External calibration and discrimination

After model development, ECRHS data were used for external validation as two participant subsets: the main validation was derived from ECRHS participants with an extended age range of 40–49 years at baseline and 50–59 years when undergoing spirometry (n=1407) to broaden the model’s transportability, and this was compared with ages similar to the development cohort, that is, 40–44 years and 50–54 years, respectively (n=548). The final mean (SD) of model performance metrics was extracted from bootstrapped replications (n=50) and repeated 50 times to summarise uncertainty (online supplemental Methods E4h, table E3).23

Patient and public involvement

Patients, TAHS participants and the public were not involved in the design, conduct, reporting or dissemination plans of our research.

Results

TAHS and ECRHS participants

Descriptive results

Of the 2407 TAHS participants, 4.5% (n=108) fulfilled the lung function criterion for COPD at mean age 53 (table 1). Of these 108 participants, mild, moderate and moderately severe airflow obstruction was present for 106 (98%, n=91, n=11 and n=4, respectively). Post-BD-AO of any severity was present for 11.8% (n=62) of current smokers and 12.9% (n=50) of those who reported wheezing at age 43. A total of 187 (0.52%) clinical datapoints were missing in 3.8% (n=87) participants which included two cases with post-BD-AO (online supplemental table E4).
Table 1

Characteristics of participants with and without post-BD airflow obstruction in the development and validation samples

Post-BD airflow obstruction aged 50s (n (%))†

| Characteristics in middle-age* | Development cohort (TAHS, N=2407)‡: No (n=2299) | Development cohort (TAHS): Yes (n=108) | Validation cohort (ECRHS, N=1407)§: No (n=1317) | Validation cohort (ECRHS): Yes (n=95) |
|---|---|---|---|---|
| Sex (% male) | 1086 (49) | 60 (55) | 641 (49) | 53 (56) |
| Age (mean years (SD))§ | | | | |
| Questionnaire | 42.6 (0.5) | 42.7 (0.6) | 43.8 (2.5) | 43.8 (2.6) |
| Post-BD spirometry | 52.7 (0.8) | 52.4 (0.7) | 55.2 (2.5) | 55.3 (2.5) |
| Post-BD spirometry at age 50s (mean (SD)) | | | | |
| Post-BD FEV1 (L) | 3.33 (0.7) | 2.67 (0.7) | 3.10 (0.7) | 2.47 (0.7) |
| Post-BD FVC (L) | 4.16 (0.9) | 4.29 (1.0) | 3.94 (0.9) | 3.99 (1.0) |
| Post-BD FEV1/FVC (ratio) | 0.80 (0.05) | 0.63 (0.07) | 0.79 (0.05) | 0.62 (0.05) |
| z-score (SD) | 0.14 (0.7) | –2.30 (0.7) | –0.03 (0.8) | –2.33 (0.6) |
| Symptoms at age 40s (n (%)) | | | | |
| Current wheezing | 327 (15) | 52 (47) | 203 (15) | 47 (49) |
| Chronic cough | 159 (7.1) | 24 (22) | 88 (7) | 21 (22) |
| Chronic sputum | 130 (5.8) | 16 (15) | 78 (6) | 15 (16) |
| Breathlessness | | | | |
| MRC-1 (none) | 2026 (91) | 78 (72) | 1087 (82) | 69 (73) |
| MRC-2 | 141 (6.3) | 20 (18) | 179 (14) | 19 (20) |
| MRC-3/4 | 66 (3.0) | 11 (10) | 51 (4) | 7 (7) |
| Chest tightness | 343 (15) | 37 (34) | 197 (15) | 32 (34) |
| Smoking (n (%); mean (SD); median (IQR); range†) | | | | |
| Never smoker | 1094 (49) | 22 (20) | 566 (43) | 25 (26) |
| Past smoker | 698 (31) | 24 (22) | 405 (31) | 31 (33) |
| Pack-years | 6.4 (1.7, 16) | 2.0 (0.3, 17) | 10.0 (4, 20) | 10.0 (5, 20) |
| Current smoker | 441 (20) | 63 (58) | 346 (26) | 39 (41) |
| Cigs per day | 14.0 (10) | 19.8 (10) | 14.1 (10) | 19.7 (11) |
| Duration | 26.0 (6) | 27.4 (3) | 26.1 (5) | 27.1 (4) |
| Age of onset | 16.3 (5) | 15.6 (3) | 17.5 (5) | 16.6 (3) |
| Pack-years | 17.4 (7, 28) | 27.0 (18, 38) | 21.9 (11, 30) | 31.0 (22, 39) |
| Current at age 50s | 290 (13) | 53 (49) | 234 (18) | 36 (38) |
| Quit by age 50s | 221 (10) | 16 (15) | 147 (11) | 7 (7) |
| Asthma at age 40s (n (%)) | | | | |
| No asthma or wheezy breathing | 1459 (65) | 34 (31) | 961 (73) | 29 (31) |
| Wheezy breathing only | 134 (6) | 12 (11) | 166 (13) | 22 (23) |
| Self-reported asthma | | | | |
| Remitted | 382 (17) | 22 (20) | 80 (6) | 12 (13) |
| Active, early onset | 88 (4) | 15 (14) | 33 (2.5) | 12 (13) |
| Active, late onset | 170 (8) | 26 (24) | 77 (6) | 20 (21) |
| Employment at age 40s (n (%)) | | | | |
| Legislators, managers | 257 (12) | 12 (11) | 121 (9) | 10 (11) |
| Professionals | 474 (21) | 11 (10) | 248 (19) | 18 (19) |
| Technicians, associates | 263 (12) | 13 (12) | 198 (15) | 14 (15) |
| Trade workers | 277 (12) | 15 (14) | 106 (8) | 8 (8) |
| Clerks, services | 534 (24) | 24 (28) | 240 (18) | 19 (20) |
| Machine operators | 130 (6) | 9 (8) | 46 (4) | 4 (4) |
| Labourers, cleaners, other | 147 (7) | 15 (14) | 62 (5) | 4 (4) |
| House persons, other | 139 (6) | 10 (9) | 74 (6) | 5 (5) |
| Employed (unspecified) | 6 (0.3) | 0 | 208 (16)§ | 12 (13) |
| Non-work other | 6 (0.3) | 0 | 1 (0.1) | 1 (1) |

*Post-BD airflow obstruction defined by post-BD FEV1/FVC<5th percentile (z-score<–1.645).

†Summary data expressed by n (%) unless by mean (SD), for example, smoking intensity/duration/start age or median (IQR), for example, pack-years. Ranges for continuous predictors: smoking intensity 0–60; duration 0–37; age-of-onset 6–41; pack-years 0–108 (ever-smokers).

‡TAHS participant numbers refer to those aged in their 50s who underwent post-BD spirometry.

§ECRHS validation of participants aged 50 to up to 60 years (validation numbers for ages 50 up to 55 years not shown). Self-reported but unspecified employment was higher in ECRHS, as current job in the work history calendar was used with some missing (online supplemental Methods E1, E2).

BD, bronchodilator; ECRHS, European Community Respiratory Health Survey; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; LLN, lower limit of normal; MRC, Medical Research Council breathlessness scale; TAHS, Tasmanian Longitudinal Health Study.

Among 1407 ECRHS participants who had complete data, post-BD-AO was present in 6.7% (n=95) and this included 10.1% (n=39) of all current smokers and 18.8% (n=47) of those who reported wheezing at baseline. Compared with TAHS, ECRHS participants were somewhat more likely to have exertional breathlessness, be current and heavier smokers, and not have current asthma (online supplemental table E4). TAHS and ECRHS participants who had post-BD-AO in their 50s reported more current wheeze, chronic cough, sputum and chest tightness at baseline, that is, they were substantially more symptomatic than those without post-BD-AO (table 1).
There were fewer current smokers in the group with complete compared with some missing data, but otherwise there were no appreciable differences in baseline characteristics (online supplemental table E5) or spirometry (online supplemental table E6).

Internal cross-validation of the final developed model

Discrimination between the risk-predictions and observed outcome was good, with an AUCROC of 80.8% (95% CI 80.0% to 81.5%) (table 2 and figure 2). Using the Youden index,21 sensitivity was 80.3% (77.7 to 82.9) and specificity 69.1% (68.7 to 69.5). The NPV was ≥98.5% compared with a low PPV (11.1%), but this was 2.5-fold higher than the baseline prevalence of post-BD-AO (4.5%). The HL test provided reasonable evidence of calibration (p>0.13, table 2 and online supplemental figure E4). Imputing missing data did not appreciably improve the predictive model performance (AUCROC 81.1%).
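The contrast between a high NPV and a low PPV follows directly from the low outcome prevalence. As an illustration, the Python sketch below applies the standard relationship between sensitivity, specificity, prevalence and predictive values to the complete-case figures reported here (sensitivity 0.803, specificity 0.691, 106 cases among 2320 participants). The function name is hypothetical, and the published values were obtained from replicated cross-validation rather than from this closed-form calculation, so only approximate agreement is expected.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Complete-case internal validation figures quoted in the text and table 2.
ppv, npv = predictive_values(0.803, 0.691, 106 / 2320)
print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")
```

With a prevalence under 5%, false positives outnumber true positives even at roughly 70% specificity, which is why the PPV stays near 11% while the NPV approaches 99%.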
Table 2

Performance metrics for the internal cross-validation and external validation of the COPD risk-prediction model, with and without imputation in the development TAHS dataset*

Diagnostic metrics (SE)†

| Model validation (n/N=COPD/total cases) | AUCROC | Cut-off‡ | Sens | Spec | NPV | PPV | HL χ2 p value |
|---|---|---|---|---|---|---|---|
| Internal validation (TAHS) | | | | | | | |
| Complete case model (n/N=106/2320) | 0.808 (0.004) | 0.480 | 0.803 (0.013) | 0.691 (0.002) | 0.987 (0.001) | 0.111 (0.004) | 0.13 |
| | | 0.50 | 0.779 (0.017) | 0.713 (0.004) | 0.985 (0.001) | 0.115 (0.002) | |
| Imputed data model (n/N=108/2407) | 0.811 (0.004) | 0.450 | 0.816 (0.013) | 0.671 (0.003) | 0.987 (0.001) | 0.105 (0.002) | 0.30 |
| | | 0.50 | 0.764 (0.012) | 0.724 (0.003) | 0.985 (0.001) | 0.115 (0.002) | |
| External validations (ECRHS) using complete case model§ | | | | | | | |
| Equivalent age group (n/N=39/548)§ | 0.746 (0.006) | 0.483 | 0.745 (0.010) | 0.668 (0.003) | 0.972 (0.001) | 0.148 (0.005) | 0.95 |
| | | 0.50 | 0.666 (0.011) | 0.686 (0.003) | 0.964 (0.001) | 0.141 (0.005) | |
| Extended age group (n/N=95/1407)¶ | 0.756 (0.001) | 0.483 | 0.769 (0.003) | 0.659 (0.001) | 0.975 (0.001) | 0.140 (0.001) | 0.69 |
| | | 0.50 | 0.737 (0.003) | 0.677 (0.001) | 0.975 (0.001) | 0.142 (0.001) | |
| Current smokers only (n/N=36/268) | 0.639** (0.010) | 0.50 | 0.835 (0.011) | 0.173 (0.003) | 0.870 (0.008) | 0.137 (0.003) | |
| Current asthma only (n/N=32/142) | 0.458** (0.006) | 0.50 | 0.969 (0.005) | 0.018 (0.002) | 0.662 (0.055) | 0.223 (0.005) | |
| Any current respiratory symptom (n/N=72/631) | 0.719** (0.004) | 0.50 | 0.905 (0.005) | 0.469 (0.003) | 0.975 (0.001) | 0.179 (0.004) | |

*In TAHS, complete case numbers (n/N=106/2320) and imputed data (n/N=108/2407 participants).

†SE=standard deviations from the mean (equivalent to SE).

‡Based on >50% predicted probability for a positive case or optimised cut-off as per the Youden index.

§Data from ECRHS II (age 40–44) and ECRHS III (age 50–55) (n/n=39/548 participants).

¶Data from ECRHS II (age 40–49) and ECRHS III (age 50–59) (n/n=95/1407 participants).

**AUCROC values based only on a subset of data are poor indicators of model performance (as not based on the entire dataset).

AUCROC, area under the receiver operator characteristic curve; COPD, chronic obstructive lung disease; ECRHS, European Community Respiratory Health Survey; HL, Hosmer-Lemeshow; n, number of COPD cases; N, total number; NPV, negative predictive values; PPV, positive predictive value; sens, sensitivity; spec, specificity; TAHS, Tasmanian Longitudinal Health Study.

Figure 2

(A–C) Area under the receiver operator characteristic curve (AUCROC). Internal validation of the main chronic obstructive lung disease risk-prediction model using complete cases in Tasmanian Longitudinal Health Study (A). External validation using the corresponding (40–44 and 50–54 years) and extended age groups (40–49 and 50–59 years) in European Community Respiratory Health Survey (B and C, respectively). The Youden index that defines the optimal cut-off as specified in table 2 is indicated by the small black dot on the corresponding curves.


External validation of the final developed model

Validation in the extended age group (ie, 1407 observations) performed similarly but with greater precision than that in the restricted age group (n=548 observations) and showed fairly good discriminatory ability, that is, AUCROC 75.6 and 74.6%, respectively (table 2 and figure 2). The PPV was not appreciably different when restricted to only current smokers aged 40–49 years but was slightly higher for adults with any current respiratory symptom/s (17.9% compared with 13.7%, table 2). This PPV was 2.7-fold higher than the baseline prevalence of post-BD airflow obstruction (6.7%).

Interactions between predictors

Of 210 potential interactions, the most frequent combination was between occupational class and smoking duration. For smoking beyond 25 years duration, the 10-year predicted probabilities for post-BD-AO were around 25% (figure 3, highlighted in yellow), which increased to around 40% for the occupational classes of labourers/cleaners, intermediate production/transport and house persons, but not trade workers (highlighted in orange). The example of the single classification tree in online supplemental Methods E4c, figure E2, shows that the 10-level occupational variable could be split multiple times within the same individual tree, and the averaging of predicted probabilities across thousands of classification trees plausibly explains the gradient (or blurring) of colours. The frequency of interactions is illustrated by online supplemental figure E5; 8 of the 10 most frequent interactions were between the smoking variables and occupation, 2 were between asthma and occupation, and none were between smoking and asthma. The ‘multi-way importance plots’ showed that occupational class, smoking duration and the ages of asthma and smoking onset were the more significant predictors in the TAHS dataset (online supplemental figures E6 and E7).
Figure 3

Interaction plot between the effects of increasing smoking duration (0–37 years) and occupation class on post-bronchodilator airflow obstruction at age 53 years. Recalibrated predicted probabilities range between <0.1 (blue) and 0.5 (red). Occupation class categories labelled from right to left: advanced clerical services (ACS), elementary clerical services (ECS), house persons (HP), intermediate production/transport (IPT), intermediate clerical services (ICS), labourer/cleaner/related workers (LC), legislator/manager (LM), professional (Pro), technicians/associate professional (Tech) and trade/related workers (TW).


Individualised predicted probabilities and predicted occurrence

Due to the large number of potential combinations of predictors, it was not possible to present the full prediction model and predictions for all hypothetical scenarios. Selected examples of 43-year-old adults have been entered into the primary model (ie, complete cases and threshold >0.50) to predict probabilities of having the COPD outcome in their 50s. These scenarios included: an asymptomatic current smoker with varying smoking intensities/pack-years, then a current smoker with symptoms (online supplemental table E7); an ex-smoker with varying quit dates and respiratory symptoms (online supplemental table E8); a non-smoker with asthma (table 3); and comparisons between groups of quitters and continued smokers with or without active asthma at baseline (table 3).
Table 3

Hypothetical examples of individualised predictions by baseline smoking and asthma status in a high-risk occupation: risk difference with and without quitting by age 50s

| Predictions by recalled asthma and smoking status for an at-risk occupational group*† | Predicted probability (%) | Predicted occurrence (1/n persons) | Risk category (age in 40s)‡ |
|---|---|---|---|
| Smoking status: no asthma or respiratory symptoms | | | |
| Non-smoker | 0.6 | 166 | Minimal |
| Past smoker | 2.5 | 40 | Low |
| Current smoker at mean age 43 | | | |
| Quit smoking by mean age 53 | 4.5 | 22 | Low |
| Continued smoking at 53§ | 27.0 | 3.7 | Very high |
| Smoking status: adult-onset asthma with wheeze in the last 12 months | | | |
| Non-smoker | 6.4 | 16 | Moderate |
| Past smoker | 10.8 | 9.3 | High |
| Current smoker at mean age 43 | | | |
| Quit smoking by mean age 53 | 16.4 | 6.1 | High |
| Continued smoking at 53¶ | 42.0 | 2.4 | Very high |

*Based on a 30 pack-year smoking history starting at age 13 and asthma onset at age 23 years.

†Based on a female worker from an at-risk occupation (eg, labourers and related workers such as cleaners, factory workers, farm and/or kitchen hands).

‡Risk categories: minimal risk if predicted occurrence of 1 in >100 similar persons; low risk if 1 in 20–100 persons; moderate risk if 1 in 10–20 persons; high risk if 1 in 5–10 persons and very high risk if 1 in 1.5–5 persons.

§Same clinical scenario has been presented in online supplemental table E7 (30 pack-years of smoking).

¶Same clinical scenario as in online supplemental table E9 except the predicted probability was for a male worker (42.5%, labelled with ‡).


Predictions for a current smoker

Predicted probabilities for post-BD-AO while aged 50s for a 43-year-old tradesman who currently smoked are presented in online supplemental table E7, while varying the daily cigarette intensity and age of smoking onset separately. Overall, the results suggest two smoking thresholds: (1) predicted risk-estimates that plateau beyond a smoking intensity of 20 cigarettes/day despite an increasing pack-year smoking history and (2) an acceleration of predicted risk-estimates beyond 20 years duration of smoking. Thus, the COPD-risk for a 43-year-old tradesman who smoked ≥20 cigarettes/day from age 18 was high (ie, predicted occurrence of one in every seven similar individuals), with and without respiratory symptoms typical of obstructive lung diseases. The predicted probability was very high if he started smoking from age 13 (1 in 3.7 persons). A twofold variation in the predicted probabilities for post-BD-AO when aged 50s was observed across the spectrum of occupations (online supplemental table E9).

Predictions for an ex-smoker

Predicted probabilities for post-BD-AO while aged 50s for a 43-year-old tradesman who had quit smoking are presented in online supplemental table E8, with varying years since quit dates (and therefore varied quit age and pack-years). These risk-estimates showed that the subgroup who quit even as recently as 12 months prior to baseline had substantially lower COPD-risk when compared with current smokers in table 3. Thus, the predicted COPD-risk for a 43-year-old ex-smoker of 25 pack-years who quit 5 years earlier was only low-to-moderate, even in the presence of isolated respiratory symptoms typical of obstructive lung diseases. A similar 2.2-fold variation across occupational classes was also seen; however, all risk-predictions were in the low range (1.12%–2.50%) (online supplemental table E10).

Predictions for a non-smoker who has active asthma

Predicted probabilities for a 43-year-old female unskilled worker (eg, cleaner) showed that having active (current) asthma in the absence of smoking conferred moderate COPD-risk when aged 50s, with little variation by age-of-asthma onset (predicted probability 6%–9%, table 3). The risk-estimate was low for remitted asthma, although the predicted occurrence was not negligible at around 1 in 38 similar persons.

Difference in COPD-risk between groups of quitters and continuing smokers

We considered four hypothetical examples of asymptomatic 43-year-old unskilled current heavy smokers, who were partitioned into subgroups of quitters and continuing smokers over the next 10 years, with or without concurrent asthma at baseline. For current smokers without active asthma, the risk-difference in predicted probabilities between those who quit and those who continued smoking over the next 10 years was 22.5% (4.51% compared with 27.0%, respectively, table 3), which is equivalent to a difference of one case of post-BD-AO in every 4.4 similar persons. For similar smokers with active asthma, the risk-difference was 25.6% (16.4% compared with 42.0%, respectively), or one case in every 3.9 similar persons.
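The arithmetic behind these risk-differences can be checked with a short sketch. The helper names below are hypothetical, and the "one in N" figure is simply the reciprocal of the risk-difference between two predicted subgroups; it should not be read as a causal effect of quitting.

```python
def risk_difference(p_continue: float, p_quit: float) -> float:
    """Absolute difference in predicted probability (continuing minus quitting)."""
    return p_continue - p_quit


def persons_per_case_difference(rd: float) -> float:
    """Reciprocal of the risk-difference: one case in every N similar persons."""
    if rd <= 0:
        raise ValueError("risk difference must be positive")
    return 1.0 / rd


if __name__ == "__main__":
    # Predicted probabilities quoted in the text (continuing smoker, quitter).
    scenarios = {
        "without active asthma": (0.270, 0.0451),
        "with active asthma": (0.420, 0.164),
    }
    for label, (p_continue, p_quit) in scenarios.items():
        rd = risk_difference(p_continue, p_quit)
        n = persons_per_case_difference(rd)
        print(f"{label}: risk-difference {rd:.1%}, one in {n:.1f} persons")
```

Run as written, the two scenarios reproduce the figures quoted in the text: 22.5% (one in 4.4) and 25.6% (one in 3.9).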

Discussion

Using information from questionnaires that is readily accessible from patients and clinicians in a typical clinical scenario, we developed and validated a COPD risk-prediction model from general Australian and European populations aged in their 40s, to calculate 10-year COPD-risks as determined by post-BD airflow obstruction in their 50s. The variables of the final model comprised nine stem questions on known risk factors and resembled those of a basic respiratory assessment that covered participant sex, respiratory symptoms, smoking, asthma and occupation. As indicated by figure 3 and online supplemental figures E5–7, our machine learning methods were able to account for the likely interactions between these predictors, especially with regard to smoking and at-risk occupations. Our risk-predictions could potentially inform further spirometry testing of high-risk adults aged in their 40s to uncover ‘COPD cases’, which could create opportunities for earlier detection and active intervention. However, the predictions do not relate to actual cases of clinical COPD but to spirometrically defined COPD, with and without symptoms or risk factors. The baseline prevalence of post-BD airflow obstruction for adults aged in their 40s is low, yet our model had good discriminatory ability. The PPV in the validation subset of symptomatic adults aged 40–49 indicated that the predictions were multiple times the baseline prevalence of post-BD airflow obstruction, and was higher than that reported in the recent Lancet article that presented machine learning-based predictions of non-fatal adverse events following an acute coronary syndrome (in the online supplemental file).24 However, it is acknowledged that for symptomatic adults who are identified by our risk-prediction model to be at high or very high COPD-risk, approximately 5.6 spirometry tests would be performed to uncover one case of spirometrically defined COPD, with the remainder being false positive results.
Using individual case scenarios as examples, our prediction model confirmed high COPD-risk smoking profiles, but also added to the knowledge base of causal inference by challenging the assumption of dose–response associations with smoking through illustrating two threshold effects: little further increase in predicted probability beyond a smoking intensity of 20 cigarettes/day and an escalating risk for smoking durations longer than 20 years. It also highlighted a moderate COPD-risk for active asthma in non-smokers and showed only modest predictive ability for respiratory symptoms, which in retrospect is not unexpected given that active asthma and chronic bronchitis can commonly occur in the absence of airflow obstruction. The modelling also found a 2.2-fold variation in risk across occupations. Surprisingly, these risk-predictions were highest for unskilled workers of lower socioeconomic status rather than for trade workers, which possibly relates to the healthy worker effect. The 10-year risk-predictions for current smokers in their 40s were substantially lower for subsequent quitters than for continuing smokers, thus supporting more intensive tobacco cessation counselling and support for this age group. Active case-finding to identify individuals with moderate-to-severe COPD is advocated by expert bodies25 26 to identify adults with early COPD and reduce morbidity, mortality and economic costs through early intervention, although conclusive evidence to support this initiative is lacking.
Spirometry testing has generally been underused as a diagnostic test for COPD,27 28 despite recommendations for testing to be considered in symptomatic adults with and without known risk factors.29 30 While active case-finding in smokers is feasible and likely to be cost-effective,6 31 there has been a lack of action among primary care physicians to pre-emptively manage individuals who have relatively few symptoms.32 This is typified by the inclusion of early spirometry within the section ‘screening tests of unproven benefit’ of the Australian primary care guidelines.29 This recommendation was largely informed by a lack of direct evidence to determine the benefits and harms of screening in asymptomatic adults even when at high risk,33 34 while comparable screening programmes for coronary heart disease and diabetes3 4 were based on only limited data before being recommended as part of routine practice.35 36 Historically, COPD may not have been given equal priority by primary care physicians, as the disease has traditionally been regarded as self-inflicted with stigmatisation of affected people,37 and its multifactorial nature beyond smoking has only recently been appreciated. The health system seems to place the responsibility for COPD prevention primarily with public health initiatives, and thus, establishing the cost-effectiveness of pre-emptive identification and providing integrated research support for practice change would be needed to improve the uptake of spirometry use in primary care. Our risk-prediction model includes known risk factors as predictors of lung function consistent with COPD, that is, smoking, active asthma38 and potential hazardous exposures from unskilled jobs, for which preventive management strategies are the cornerstone of best clinical practice. External validation in a similar population-based cohort suggests robustness in our predicted probabilities for individual case scenarios.
While we acknowledge that our model alone cannot identify all individuals on an accelerated course to severe airways obstruction, this approach could help identify adults aged in their 40s who are at higher risk and may benefit from serial spirometry to detect rapidly progressive COPD as a ‘targeted intervention’. Although the use of a supervised learning model requires careful interpretation of the findings, our individualised risk-predictions might be useful in refining guideline recommendations that consider spirometry testing in adults at least 40 years of age29 who are heavy smokers,29 39 symptomatic29 40 and/or have recurrent chest infections.39 Similarly, our novel approach to partition current smokers aged in their 40s into either quitters or persistent smokers over the next 10 years could motivate middle-aged smokers to change their behaviours.41 While we did not account for the reasons underlying quitting and acknowledge that these are distinct participant subgroups, a causal interpretation is biologically plausible given smoking cessation can improve lung function trajectories,7 and asthma control.8

Strengths and limitations

By design, our risk-predictions were based on information that was easily collected and relevant to an age when early COPD begins to manifest clinically and when there is some potential for reversal or at least stabilisation. Our use of randomForest methodology was advantageous over regression methods for prediction as it could inherently accommodate non-linearity, multi-collinearity and multiple interactions (figure 3, online supplemental figure E4). External validation using general population-based data from Europe extends the generalisability to different geographical regions and to a broader age group of 50–59 years old, although validation in non-Caucasian populations is still needed. Although much larger participant numbers such as those available in administrative health databases could have improved the predictive accuracy, our study design was superior because we used objective and individualised spirometry measurements (rather than ICD-9 codes) and a detailed smoking history. Post-BD spirometry is more relevant to clinically important COPD outcomes than pre-BD measurements,42 especially for countries with moderate-to-high asthma prevalence such as Australia. Although we did not have post-BD spirometry for the majority of participants in their 40s, we argue that this represents a usual clinical scenario when an individual is assessed for the first time. Our selection of predictor variables could have limited our model performance as we did not have reliable data on family history of COPD/emphysema, respiratory infections and other air pollutants. Finally, this study was not designed to address causal inference and rate of lung function decline, so caution is advised when interpreting the effect size of quitting smoking on change in COPD-risk and progression to clinical COPD, respectively.

Conclusion

This pragmatic and validated COPD risk-prediction model could predict high or very high risk of post-BD airflow obstruction in 10 years’ time in Caucasian adults aged 40–49 years. These risk-predictions are especially relevant to COPD in the presence of respiratory symptoms, and to the asthma-COPD overlap (in the presence of current asthma). We have quantified substantial differences in COPD-risk between middle-aged quitters and continuing smokers, which provide rationale to intensify tobacco cessation strategies for smokers less than 50 years of age, especially unskilled workers with a history of asthma. This work has potential to facilitate the pre-emptive detection of COPD at a much earlier age in primary care settings.
  29 in total

1.  Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets.

Authors:  Fabrizio D'Ascenzo; Ovidio De Filippo; Guglielmo Gallone; Gianluca Mittone; Marco Agostino Deriu; Mario Iannaccone; Albert Ariza-Solé; Christoph Liebetrau; Sergio Manzano-Fernández; Giorgio Quadri; Tim Kinnaird; Gianluca Campo; Jose Paulo Simao Henriques; James M Hughes; Alberto Dominguez-Rodriguez; Marco Aldinucci; Umberto Morbiducci; Giuseppe Patti; Sergio Raposeiras-Roubin; Emad Abu-Assi; Gaetano Maria De Ferrari
Journal:  Lancet       Date:  2021-01-16       Impact factor: 79.321

2.  Interpretative strategies for lung function tests.

Authors:  R Pellegrino; G Viegi; V Brusasco; R O Crapo; F Burgos; R Casaburi; A Coates; C P M van der Grinten; P Gustafsson; J Hankinson; R Jensen; D C Johnson; N MacIntyre; R McKay; M R Miller; D Navajas; O F Pedersen; J Wanger
Journal:  Eur Respir J       Date:  2005-11       Impact factor: 16.671

3.  Cohort Profile: The Tasmanian Longitudinal Health STUDY (TAHS).

Authors:  Melanie C Matheson; Michael J Abramson; Katrina Allen; Geza Benke; John A Burgess; James G Dowty; Bircan Erbas; Iain H Feather; Peter A Frith; Graham G Giles; Lyle C Gurrin; Garun S Hamilton; John L Hopper; Alan L James; Mark A Jenkins; David P Johns; Caroline J Lodge; Adrian J Lowe; James Markos; Stephen C Morrison; Jennifer L Perret; Melissa C Southey; Paul S Thomas; Bruce R Thompson; Richard Wood-Baker; Eugene Haydn Walters; Shyamali C Dharmage
Journal:  Int J Epidemiol       Date:  2017-04-01       Impact factor: 7.196

4.  Optimizing Prediction of the Lung Function Features of COPD.

Authors:  Jennifer L Perret; Koen Simons; Don Vicendese; Adrian Bickerstaffe; Tony Blakely; Shyamali C Dharmage
Journal:  Chest       Date:  2020-03       Impact factor: 9.410

5.  The European Community Respiratory Health Survey.

Authors:  P G Burney; C Luczynska; S Chinn; D Jarvis
Journal:  Eur Respir J       Date:  1994-05       Impact factor: 16.671

6.  Comparison of pre- and post-bronchodilator lung function as predictors of mortality: The HUNT Study.

Authors:  Laxmi Bhatta; Linda Leivseth; David Carslake; Arnulf Langhammer; Xiao-Mei Mai; Yue Chen; Anne H Henriksen; Ben M Brumpton
Journal:  Respirology       Date:  2019-07-24       Impact factor: 6.424

Review 7.  Screening for Chronic Obstructive Pulmonary Disease: Evidence Report and Systematic Review for the US Preventive Services Task Force.

Authors:  Janelle M Guirguis-Blake; Caitlyn A Senger; Elizabeth M Webber; Richard A Mularski; Evelyn P Whitlock
Journal:  JAMA       Date:  2016-04-05       Impact factor: 56.272

8.  Updated USPSTF Screening Recommendations for Diabetes: Identification of Abnormal Glucose Metabolism in Younger Adults.

Authors:  Richard W Grant; Anjali Gopalan; Marc G Jaffe
Journal:  JAMA Intern Med       Date:  2021-10-01       Impact factor: 21.873

9.  Screening for Chronic Obstructive Pulmonary Disease: US Preventive Services Task Force Recommendation Statement.

Authors:  Albert L Siu; Kirsten Bibbins-Domingo; David C Grossman; Karina W Davidson; John W Epling; Francisco A R García; Matthew Gillman; Alex R Kemper; Alex H Krist; Ann E Kurth; C Seth Landefeld; Carol M Mangione; Diane M Harper; William R Phillips; Maureen G Phipps; Michael P Pignone
Journal:  JAMA       Date:  2016-04-05       Impact factor: 56.272

10.  Impact of COPD case finding on clinical care: a prospective analysis of the TargetCOPD trial.

Authors:  Shamil Haroon; Peymane Adab; Andrew P Dickens; Alice J Sitch; Kiran Rai; Alexandra Enocson; David A Fitzmaurice; Rachel E Jordan
Journal:  BMJ Open       Date:  2020-10-05       Impact factor: 2.692
