A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data.

Bernard Aguilaniu¹, David Hess², Eric Kelkel³, Amandine Briault⁴, Marie Destors⁴, Jacques Boutros⁵, Pei Zhi Li⁶, Anestis Antoniadis⁷,⁸.

Abstract

Facilitating the identification of extreme inactivity (EI) has the potential to improve morbidity and mortality in COPD patients. Apart from patients with obvious EI, the identification of such behavior during a real-life consultation is unreliable. We therefore describe a machine learning algorithm to screen for EI, as actimetry measurements are difficult to implement. Complete datasets for 1409 COPD patients were obtained from COLIBRI-COPD, a database of clinicopathological data submitted by French pulmonologists. Patient- and pulmonologist-reported estimates of PA quantity (daily walking time) and intensity (domestic, recreational, or fitness-directed) were first used to assign patients to one of four PA groups (extremely inactive [EI], overtly active [OA], intermediate [INT], inconclusive [INC]). The algorithm was developed by (i) using data from 80% of patients in the EI and OA groups to identify 'phenotype signatures' of non-PA-related clinical variables most closely associated with EI or OA; (ii) testing its predictive validity using data from the remaining 20% of EI and OA patients; and (iii) applying the algorithm to identify EI patients in the INT and INC groups. The algorithm's overall error for predicting EI status among EI and OA patients was 13.7%, with an area under the receiver operating characteristic curve of 0.84 (95% confidence interval: 0.75–0.92). Of the 577 patients in the INT/INC groups, 306 (53%) were reclassified as EI by the algorithm. Patient- and physician-reported estimates may underestimate EI in a large proportion of COPD patients. This algorithm may assist physicians in identifying patients in urgent need of interventions to promote PA.

Year: 2021 PMID： 34411121 PMCID： PMC8376055 DOI： 10.1371/journal.pone.0255977

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Patients with chronic obstructive pulmonary disease (COPD) are known to be substantially less physically active than age- and sex-matched healthy subjects [1]. Several studies have shown that low physical activity (PA) levels are associated with poor prognosis in COPD patients [2, 3], yet pulmonary rehabilitation programs that incorporate endurance and strength training have shown significant benefit in this patient population [4]. Thus, accurate identification of the true PA status is a crucial factor in ensuring that the least active patients, who would be expected to derive the greatest benefit from PA, can be encouraged to become more active and/or referred to a rehabilitation program. Several methods have been devised to assess and quantify PA levels in patients with various respiratory diseases. In particular, accelerometers can be worn over several days to analyze the full range of different activities and their distribution over time. Data from such devices have generally correlated well with assessments of daily metabolic expenditure, as measured using the doubly labeled water method, and accelerometers are also sufficiently sensitive to detect low levels of PA in COPD patients [4]. These quantitative studies have estimated that approximately 26%–30% of COPD patients are physically inactive and exhibit sedentary behavior, both of which are independently associated with an increased risk of morbidity and mortality [3, 5, 6]. However, accelerometry requires considerable cost, time, and effort commitments on the part of the patient and physician, and it is generally considered impractical for routine clinical use. At the same time, clinical interviews and patient questionnaires alone cannot accurately determine the patient’s true PA level [7]. To improve this situation, the PROactive consortium proposed that a combination of questionnaires and accelerometric measurements be used to assess the behavior of COPD patients [8, 9]. 
Nevertheless, this approach does not eliminate the drawbacks of accelerometry, and therefore does not resolve the primary clinical concern, which is to accurately and objectively detect extreme inactivity (referred to hereafter as EI) in patients whose PA status initially presents as unclear or equivocal [10, 11]. Although such patients may be identified during consultation with experienced practitioners, it is likely that a significant percentage of EI patients fall under the radar of clinical vigilance, which most often focuses on respiratory function. Given the proven benefit of pulmonary exercise programs in COPD patients, we therefore sought to develop a predictive algorithm that can reliably detect EI patients, who might most benefit from interventions such as pulmonary rehabilitation programs. We hypothesized that certain physiological and clinical variables may be more frequently observed (through cause or effect) among patients at the extreme ends of the PA spectrum (i.e., EI and overtly active [OA] patients), and that such ‘phenotype signatures’ composed of non-PA-related variables could be used to develop the predictive algorithm.

Materials and methods

Patients and data collection

This was a retrospective analysis of data submitted to the COLIBRI-COPD database [12, 13], which has been authorized by the French national commission on personal data privacy (Commission Nationale de l’Informatique et des Libertés, CNIL, #2013–526). The requirement for written consent was waived in this observational study in accordance with French law. Patients provided oral informed consent to their physician. At the time of the analysis, data were available from 5035 initial consultations for COPD patients (Fig 1). We selected 1409 patients with comprehensive information on 22 specific variables (see Table 2) in the areas of anthropometry, smoking habits, resting pulmonary function, comorbidities, exacerbations during the preceding year, Global Initiative for COPD (GOLD) ABCD classification, and self-reported questionnaires: the modified Medical Research Council dyspnea scale (mMRC) [14] and Disability related to COPD Tool (DIRECT), both of which assess dyspnea [15, 16]; the COPD Assessment Test (CAT), which assesses quality of life [17]; and the Hospital Anxiety and Depression Scale, which separately assesses anxiety and depression [18, 19].

Fig 1

Study design.

See Table 1 for definitions of activity categories.

Table 2

Clinical and functional characteristics of the stratified COPD patients (n = 1409).

	EI (n = 172)	INT (n = 410)	INC (n = 167)	OA (n = 660)	p-value
Anthropometric and behavioral characteristics
Age (years)	67.5 ± 10.1	65.4 ± 9.5	65.9 ± 8.6	65.5 ± 8.3	0.063
Male gender	60.5%	63.9%	61.7%	73.5%	****
BMI (kg/m²)	26.5 ± 6.8	26.2 ± 5.9	25.0 ± 5.4	25.8 ± 5.1	0.058
Smokers (current or ex)	96.5%	96.3%	96.4%	96.4%	0.07
mMRC score	2.7 ± 1.1	1.9 ± 1.1	1.8 ± 1	1.2 ± 0.9	****
DIRECT score	17.4 ± 8.6	13.0 ± 8	12.0 ± 6.9	8.6 ± 6.4	****
CAT score	21.3 ± 8.1	18.0 ± 7.6	17.4 ± 7.5	14.1 ± 7.2	****
HADS Anxiety subscore	7.4 ± 4.6	6.3 ± 4.5	6.1 ± 4	5.4 ± 3.7	****
HADS Depression subscore	8.2 ± 4.8	6.2 ± 4.2	5.6 ± 3.7	4.7 ± 3.5	****
Functional respiratory parameters and GOLD 2011 classification
FEV₁ (L)	1.28 ± 0.6	1.57 ± 0.6	1.58 ± 0.7	1.82 ± 0.7	****
FEV₁ (% predicted)	50.9 ± 22.6	59.2 ± 22	59.2 ± 22.9	65.5 ± 20.5	****
FVC (L)	2.5 ± 0.9	2.86 ± 0.9	3.0 ± 1.1	3.24 ± 1	****
FVC (% predicted)	77.9 ± 24.3	85.4 ± 22.5	89.6 ± 25.2	92.8 ± 21.2	****
FEV₁/FVC (%)	50.6 ± 14.5	54.4 ± 13.3	52 ± 14	55.3 ± 11.8	****
GOLD 1	13.4%	18.3%	21.6%	24.7%	****
GOLD 2	32.0%	45.9%	36.5%	49.4%	****
GOLD 3	25.6%	24.4%	29.3%	21.7%	****
GOLD 4	29.1%	11.5%	12.6%	4.2%	****
Comorbidities and GOLD 2017 classification
Cardiovascular disease and/or diabetes	83.1%	71.0%	69.5%	63.9%	****
Treated for anxiety or depression	72.7%	59.8%	59.9%	51.1%	****
Exacerbation within the previous year (≥1 severe or ≥2 mild/moderate)	48.3%	32.7%	39.5%	25.0%	****
GOLD A	2.9%	6.8%	10.2%	22.0%	****
GOLD B	48.8%	60.5%	50.3%	53.0%	****
GOLD C	0.0%	2.4%	2.4%	3.9%	****
GOLD D	48.3%	30.2%	37.1%	21.1%	****

Data are presented as the percentage or mean ± standard deviation. Comparisons between PA categories were performed by Kruskal-Wallis tests and ANOVA with ordinal factors test (ordAOV). Significant differences are noted: p< 0,…****; p< 0.001 ***; p< 0.01 **; p< 0.05 *.

Abbreviations: BMI, body mass index; CAT, COPD Assessment Test; COPD, chronic obstructive pulmonary disease; DIRECT, Disability related to COPD Tool; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; GOLD, Global Initiative for Chronic Obstructive Lung Disease classification; HADS, Hospital Anxiety and Depression Scale; mMRC, modified Medical Research Council dyspnea scale. For EI, INT, INC, and OA definitions, see Table 1.

Table 1

Categorization of physical activity levels in COPD patients according to combined patient- and physician-derived estimates.

Patient’s estimate (daily walking time; n = 1409)	Physician’s estimate (activity intensity; n = 1409)
	(D)omestic (n = 504)	(R)ecreational (n = 530)	(A)ctive (n = 375)
(1) ≤ 10 min (n = 203)	EI (n = 172)	INT-b (n = 23)*; EI predicted = 9	INC-c (n = 8); EI predicted = 3
(2) 10–30 min (n = 440)	INT-a (n = 226); EI predicted = 140	INT-c (n = 161); EI predicted = 74	INC-d (n = 53); EI predicted = 22
(3) 30–60 min (n = 399)	INC-a (n = 69); EI predicted = 41	OA (n = 194)	OA (n = 136)
(4) >60 min (n = 367)	INC-b (n = 37); EI predicted = 17	OA (n = 152)	OA (n = 178)

The four OA cells together contain n = 660 patients.

Abbreviations: EI, extremely inactive category; OA, overtly active category; INT (a,b,c), physical activity levels intermediate between EI and OA; INC (a,b,c,d), incompatible physician and patient estimates of activity. (D)omestic, activities mainly confined to the home; (R)ecreational, predominantly outside the home; (A)ctive, predominantly devoted to maintaining fitness.

*EI predicted indicates the number of patients within each INT and INC subcategory reassigned to the EI category by the predictive algorithm.
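Read as a 4 × 3 grid, Table 1 amounts to a lookup from (walking-time band, activity intensity) to a PA category. The following minimal sketch encodes that reading and checks that the cell counts add up to the category totals given in the text. The names `PA_GRID`, `CELL_COUNTS`, `pa_category`, and `category_totals` are illustrative assumptions and are not taken from the study's code.

```python
# Hypothetical encoding of Table 1. Keys are (walking-time band, intensity):
# band 1 = <=10 min, 2 = 10-30 min, 3 = 30-60 min, 4 = >60 min per day;
# intensity D = domestic, R = recreational, A = active.
PA_GRID = {
    (1, "D"): "EI",    (1, "R"): "INT-b", (1, "A"): "INC-c",
    (2, "D"): "INT-a", (2, "R"): "INT-c", (2, "A"): "INC-d",
    (3, "D"): "INC-a", (3, "R"): "OA",    (3, "A"): "OA",
    (4, "D"): "INC-b", (4, "R"): "OA",    (4, "A"): "OA",
}

# Number of patients per cell, as listed in Table 1.
CELL_COUNTS = {
    (1, "D"): 172, (1, "R"): 23,  (1, "A"): 8,
    (2, "D"): 226, (2, "R"): 161, (2, "A"): 53,
    (3, "D"): 69,  (3, "R"): 194, (3, "A"): 136,
    (4, "D"): 37,  (4, "R"): 152, (4, "A"): 178,
}


def pa_category(walk_band: int, intensity: str) -> str:
    """Return the main PA category (EI, INT, INC, or OA) for one grid cell."""
    return PA_GRID[(walk_band, intensity)].split("-")[0]


def category_totals() -> dict:
    """Sum the cell counts within each main PA category."""
    totals = {}
    for cell, n in CELL_COUNTS.items():
        category = pa_category(*cell)
        totals[category] = totals.get(category, 0) + n
    return totals


print(category_totals())
# {'EI': 172, 'INT': 410, 'INC': 167, 'OA': 660}
```

The totals agree with the group sizes reported for EI, INT, INC, and OA, and the twelve cells sum to the 1409 patients in the cohort.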

Construct of the predictive machine learning

We first categorized a cohort of COPD patients into one of four activity levels based on the patient’s own estimates of their PA (daily walking time) and the physician’s estimates of the patient’s PA intensity level (domestic, recreational, and fitness-directed). We then tested existing machine learning processes already in use for predicting disease outcomes using routine clinical data [20, 21], and trained the model to identify an EI signature using clinicopathological data from a subset (80%) of patients in the EI and OA categories. After training, we tested the algorithm’s predictive validity on the remaining 20% of patients in the EI and OA categories, and then evaluated its ability to detect EI patients in the intermediate (INT) or inconclusively determined (INC) PA categories.

Definition of PA categories

Assignment of patients to PA categories was based on physician estimates of the predominant intensity level of the patient’s daily PA: domestic (D, in-home activities), recreational (R, mostly outside the home), or active (A, devoted to maintaining physical fitness) and patient estimates of the average daily walking time outside the home (including weekends): <10 min, 10–30 min, 30–60 min, and >60 min. Based on these criteria, we constructed a 3 × 4 table to identify four main PA categories: (i) least active (EI, n = 172); (ii) most active (OA, n = 660); (iii) intermediate activity level (INT, n = 410), which had three subcategories (a, b, and c); and (iv) incompatible (INC, n = 167), which had four subcategories (a, b, c, and d) and consisted of patients whose self-reported and physician-reported activities were considered conflicting (Table 1). Descriptive clinical and functional characteristics of COPD patients stratified by PA categories are presented as mean ± standard deviation. Comparisons between PA categories were performed by Kruskal-Wallis tests and ANOVA with ordinal factors test (ordAOV).

Predictive statistical methods

The predictive machine learning method was developed in five steps. (i) We first verified that the EI variable and its variability correlated well with a set of continuous and categorical variables. Then, we performed an explanatory canonical discriminant analysis of mixed data followed by a scree plot to select the statistically significant canonical variables to be used in more elaborate individual predictive models. After this step, a reduced rank display (S1 Fig) showed that two canonical discriminant projections accounted for 98.6% of the variation between categories, of which 95.8% concerned EI and OA, while the projection of INT and INC on the two canonical directions was very slight. (ii) Based on this, we opted to develop an algorithm focused on individual prediction of the two most extreme categories: EI (n = 172) versus OA (n = 660). The predictive model was developed using an ensemble regression and classification algorithm [22] with a version for balancing error in unbalanced data (weighted random forest, WRF). To account for random effects, such as the physician identity or study center, we also combined the random forest methodology with generalized linear mixed models using the binary mixed model (BiMM) forest algorithm [23]. (iii) The 832 patients in the EI and OA groups were randomly divided: data from 666 patients (80%) were used to develop the model and data from the remaining 166 patients (20%) to assess its accuracy (i.e., predictive error). (iv) In the next step, we addressed the imbalance in our final prediction using a recent hyper-ensemble of SMOTE under sampled random forests (HyperSMURF) method, which is based on resampling techniques and a hyper-ensemble approach (S2 Fig). (v) Finally, once validated, the algorithm was applied to patients in the combined INT and INC subcategories. Descriptive results are presented as mean ± standard deviation.
The performance of the algorithm for predicting EI and OA is expressed as overall error, weighted accuracy, negative predictive value, positive predictive value, and sensitivity. Additional performance measurements included area under the precision and recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC).
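The 80%/20% division in step (iii) can be illustrated with a short sketch. The text states only that patients were randomly selected; splitting within each class (a stratified split) is an assumption made here for illustration, as are the function name `stratified_split` and the seed.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices into train/test sets, preserving each class's share.

    Stratification is an assumption of this sketch; the source only
    describes a random 80%/20% division.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


# Class sizes from the text: 172 EI and 660 OA patients (832 in total).
labels = ["EI"] * 172 + ["OA"] * 660
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))  # 666 166
```

Under this assumption the test set would hold 34 EI and 132 OA patients, which matches the reported sizes of 666 and 166 for the development and test samples.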

Results

Descriptive results

Table 1 shows the distribution of the 1409 patients into four categories and 12 subcategories according to the combination of patient and physician estimates. The reference category EI (n = 172) was composed of patients with the lowest duration and intensity PA level (subcategory D and <10 min walking/day), whereas the OA category (n = 660) included the most active patients (subcategory R or A and >30 min walking/day). Patients who spent short times (≤30 min) in daily activities were referred to as the INT group (n = 410) and were subcategorized as a, b, or c, depending on the physicians’ estimate of the activity intensity (Table 1). Finally, patients whose self- and physician-reported subcategories were incompatible were referred to as the INC group (n = 167) and were further assigned to a, b, c, or d groups based on the time and intensity. The seven categories encompassed by INC a–d and INT a–c together account for about 40% of the total cohort, highlighting the need for a tool to more accurately assess daily PA. After validation and predictive validity testing (see next section), we applied the algorithm to patients in the full cohort as well as the INC and INT categories and determined the number of patients who were identified by the algorithm as having the EI phenotype (Table 1). A total of 21.7% of the full cohort (306/1409) were reassigned to EI. Of these, 15.8% (223/1409) were in the original INT a–c categories and 5.9% (83/1409) were in the original INC a–d categories. Thus, application of the algorithm increased the proportion of EI patients in the full cohort from 12.2% (172/1409) to 33.9% (478/1409). Not surprisingly, comparisons of clinicopathological characteristics showed a trend towards worsening health status of patients in the order EI > INT > INC > OA (Table 2). The differences were particularly stark when comparing patients in the EI versus OA categories, while the INT group had intermediate values between the EI and OA groups. 
Fig 2 shows a comparison of selected anthropometric and behavioral characteristics (continuous variables) stratified by our PA categories or the GOLD ABCD 2017 categories. Of note, the symptom-related variables (mMRC, DIRECT, and CAT scores) logically discriminate between patients according to the GOLD ABCD classification, but they overlap the PA categories, indicating that these questionnaires individually have a poor ability to predict PA level. As shown in Fig 3, this possibility was confirmed by the large overlap between not only continuous variables (DIRECT score, CAT score, age, body mass index) but also categorical variables (age, sex, exacerbation, and GOLD ABCD) for patients in the EI, INT, INC, and OA categories, consistent with their poor individual ability to predict EI status.
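The proportions quoted in this section follow directly from the counts in Table 1. The short check below recomputes them; the variable and function names are illustrative only.

```python
TOTAL = 1409                            # full cohort
EI_INITIAL = 172                        # EI by patient and physician estimates
REASSIGNED = {"INT": 223, "INC": 83}    # patients reassigned to EI by the algorithm
GROUP_SIZE = {"INT": 410, "INC": 167}   # original group sizes


def pct(numerator, denominator=TOTAL):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


n_reassigned = sum(REASSIGNED.values())   # 306
ei_final = EI_INITIAL + n_reassigned      # 478

print(pct(EI_INITIAL), pct(n_reassigned), pct(ei_final))
# Share of each original group that was reassigned to EI.
print({g: pct(REASSIGNED[g], GROUP_SIZE[g]) for g in REASSIGNED})
```

The first line reproduces the 12.2%, 21.7%, and 33.9% figures given above; the second gives the within-group reassignment rates for INT and INC.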

Fig 2

Univariate boxplots comparing the distribution of selected continuous variables according to the physical activity category described here (top row) and GOLD 2017 category (bottom row). Plots show the median, minimum, maximum, and interquartile values. See Table 1 for definitions of activity categories.

Fig 3

Box plots (categorical/ordinal variables) and line plots (continuous variables) of the marginal effect of a predictor (x-axis) on the probability of a patient being assigned to the EI category according to the weighted random forest method (y-axis). See also S3 Fig for the inverse analysis of probability of assignment to the OA category. Box plots show the median, minimum, maximum, and interquartile values. See Table 1 for definitions of activity categories.

Predictive results

Table 3 shows the analysis of the predictive algorithm performance using several classifier methods. The BiMM and WRF results did not differ significantly, suggesting that the prediction was independent of the physician who collected the data and the practice setting. This assertion was further checked by performing a panel data analysis on the clustered data and testing the hypothesis that random effects were present. This analysis yielded a p value of 0.0069, thus supporting a fixed effects model (i.e., a random forest prediction without random effects). Overall, the AUPRC indicates that the HyperSMURF algorithm achieved significantly better sensitivity than WRF or BiMM for predicting EI, with little deterioration in the sensitivity of the OA classification. As an example, Fig 3 shows the influence of some variable values on the prediction of EI status, and S3 Fig shows a comparable analysis for the prediction of OA. As can be seen, only the higher scores (mMRC ≥3, CAT >30, DIRECT >23) are associated with a probability of EI >0.5. The strength of our predictive model is also confirmed by the corresponding ROC curves (Fig 4). Although the differences between the WRF and HyperSMURF predictions, as measured by the AUROC, are not large, AUPRC is considered to be more informative than AUROC for imbalanced data [24]. Finally, we applied our predictive algorithm process to the INT and INC subcategories. Table 1 shows that about half of the patients were predicted to be EI; specifically, 54% (223/410) and 50% (83/167) in the INT and INC categories, respectively. S1 Table shows the distribution of the GOLD 2011 and ABCD classifications within the PA categories.
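To illustrate why AUPRC can be more informative than AUROC when the positive class is rare, the sketch below computes both on a small, entirely hypothetical set of scores. AUROC is computed as the probability that a positive case outranks a negative one, and AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used in the study.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive case,
    a step-wise estimate of the area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical example: 2 positives among 10 cases, one negative ranked high.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(auroc(scores, labels))              # 0.9375
print(average_precision(scores, labels))  # about 0.833
```

A single highly ranked negative barely moves the AUROC because there are many negatives, but it lowers precision noticeably, which is the behavior that makes AUPRC the stricter measure on imbalanced data.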

Table 3

Evaluation of the performance of the predictive algorithm.

	Overall error	Accuracy*	PPV	NPV	Sensitivity (EI)	Sensitivity (OA)	AUPRC	AUROC*
HyperSMURF	13.7%	0.76 (0.69–0.82)	0.45	0.93	79.4%	75.0%	0.64	0.84 (0.75–0.92)
Weighted Random Forest	14.1%	0.84 (0.87–0.90)	0.63	0.90	59.0%	90.9%	0.49	0.75 (0.66–0.84)
BiMM Random Forest	14.2%	0.84 (0.78–0.90)	0.87	0.67	47.0%	94.0%	0.47	0.70 (0.62–0.80)

*Accuracy and AUROC are presented with 95% confidence intervals.

Data are based on analysis of 20% (n = 166) patients in the OA and EI groups.

Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision and recall curve; BiMM, binary mixed model forest algorithm (23); HyperSMURF, hyper-SMOTE under sampled random forests (24); NPV, negative predictive value; PPV, positive predictive value. See Table 1 for definitions of EI and OA.
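The relationships among the measures in Table 3 can be made concrete with a confusion matrix. The counts below are not reported in the source: they are hypothetical values chosen so that a test set of 166 patients (assumed to contain 34 EI and 132 OA) reproduces the HyperSMURF row, with EI treated as the positive class. The table's overall error is not derived here because the source does not state how it was computed.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard measures derived from a 2 x 2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true EI detected
        "specificity": tn / (tn + fp),   # share of true OA detected
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (assumption, not data from the study):
# 27 of 34 EI and 99 of 132 OA patients classified correctly.
metrics = binary_metrics(tp=27, fn=7, tn=99, fp=33)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 0.794, specificity (the sensitivity for OA) is 0.750, PPV is 0.450, NPV is 0.934, and accuracy is 0.759, in line with the rounded HyperSMURF values in Table 3.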

Fig 4

Receiver operating characteristic curves for the prediction of EI using weighted random forest (WRF, left) and hyper-ensemble of SMOTE under sampled random forests (HyperSMURF, right) methods. Areas under the curves are shown as the median and 95% confidence intervals.

Discussion

The main contribution of this study is to demonstrate the predictive validity of an algorithm for predicting the least active COPD patients from information available in routine pulmonologist practice independently of PA-related measures. The originality and strength of our algorithm lies in its ability to predict EI in patients whose PA level is equivocal or unclear based on the patient’s and physician’s opinions, thus bringing to light the precise subgroup of COPD patients who are most in need of increased PA. Depending on the options available to the referring pulmonologist, this algorithm will help in deciding the optimal next step for each patient, whether that is accelerometry, as proposed by the PROactive consortium [8], referral to supervised rehabilitation [25], and/or simply encouraging the patient to participate in social activities that include PA [26].

Selection of machine learning methods

In the present study, we demonstrate that a specific random forest machine learning algorithm, which we refer to as the EI algorithm, is effective in predicting the EI or OA status of COPD patients. In addition, the algorithm has the potential to automatically detect the most informative predictors of EI by excluding many irrelevant confounding factors that influence both the dependent variable (EI or OA) and independent variables (explanatory variable), thus causing a spurious association. The EI algorithm outperforms traditional multiple linear/logistic regression models by unmasking predictive potential not apparent in a linear model. We could also have considered using a Bayesian machine learning framework to develop a prediction procedure and simultaneously identify promising subsets of relevant predictors. While the Bayesian framework may have achieved equivalent predictive performance, it would have required a large number of assumptions on independent variables and many successive statistical checks, making it much more difficult to interpret. Because of this complexity, we opted for a frequentist framework that markedly reduces the number of mathematical steps and their validation and obtains a level of predictive validity acceptable for its intended clinical use. Our results confirm that the EI algorithm possesses two critical features of a predictive model: the agreement between observed probabilities and predicted probabilities (i.e., calibration) and the ability to clearly distinguish between categories (i.e., discrimination). Thus, for the intended purpose of guidance in clinical decision-making, our EI algorithm provides an acceptable balance between a high rate of true positives (correctly identified patients) and a low rate of false positives (incorrectly identified patients). 
As with any predictive algorithm designed to assist in medical decisions, the EI algorithm should be considered a contributing tool that takes into account the potential impact on the patient’s health.

Decision-making process and machine learning

Matching these predicted probabilities with a 0–1 classification, by choosing a threshold above which a new observation is classified as 1 versus 0, is no longer part of the statistics. It is part of the decision-making process that integrates other contingencies or issues than the probabilistic results of the model. Practitioners may ask several pertinent questions that could influence this threshold. For example, will a binary categorization (EI and OA) negatively affect patient care compared with a more detailed determination of daily PA behavior? If so, in what way will it affect care, especially with respect to the design of individualized pulmonary rehabilitation programs or personalized recommendations? Like any diagnostic method implemented in the clinical decision-making process, the predictive validity of the information and the operational impact of the level of precision must be evaluated. This echoes some points raised by Faner and Agusti [27], who questioned the practical use of conclusions based on clustering studies for identifying a clinical phenotype predictive of mortality for a single patient. In that case, the issue was whether a complex analytical approach—as opposed to common sense—was really necessary to know that patients with severe airflow limitation and comorbidities would have a poor prognosis. In real-life practice, the purpose of the EI algorithm would be to alert the physician to the probability of a new patient having EI or OA status. This is particularly important because only a minority of patients who are eligible for pulmonary rehabilitation actually derive benefit [28], partly because the referring pulmonologist may be unaware of the patient’s true EI status, which may be sufficiently poor as to predispose them to failure. In support of this, our results suggest that the most extreme inactivity (i.e., EI) is largely underestimated in routine consultations. 
Indeed, application of the EI algorithm increased the proportion of the total population with EI status from 12.2%, detected by the patient and physician estimates, to 33.9%, with the reassigned patients accounting for 21.7% of the cohort. Our results compare favorably with those reported by Schneider et al. [5], who examined daily PA in COPD patients using accelerometry. The detailed analysis of the kinetics and intensity of PA by those authors found that 49% (n = 67) of patients could be defined as “active and non-sedentary” and 26% (n = 35) as “non-active and sedentary”, which compare with 46.8% OA (n = 660) and 34% EI (n = 478) in our study. Nevertheless, further comparisons between studies based on accelerometry measurements and machine learning using non-PA data are beyond the scope of this analysis.
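The point made in this section, that converting a predicted probability into an EI or OA label is a decision rather than a statistical step, can be illustrated with a small sketch. The probabilities and true labels below are hypothetical, and the function name `rates_at` is an assumption for illustration.

```python
def rates_at(threshold, probabilities, truth):
    """Return (sensitivity, false positive rate) when patients with a
    predicted EI probability >= threshold are flagged as EI."""
    flagged = [p >= threshold for p in probabilities]
    tp = sum(f and t == 1 for f, t in zip(flagged, truth))
    fp = sum(f and t == 0 for f, t in zip(flagged, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, fp / negatives


# Hypothetical predicted probabilities of EI and true status (1 = EI).
probabilities = [0.92, 0.81, 0.64, 0.55, 0.47, 0.38, 0.22, 0.10]
truth = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    sensitivity, fpr = rates_at(threshold, probabilities, truth)
    print(threshold, sensitivity, fpr)
```

In this toy example, lowering the threshold from 0.7 to 0.3 raises sensitivity from 0.5 to 1.0 while the false positive rate rises from 0 to 0.5. Which trade-off is acceptable depends on the clinical consequences of missing an EI patient versus referring an active one, which is the kind of contingency the decision-maker has to weigh.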

Limitations and strengths

The method we have proposed to define EI status may seem too simplistic compared with objective measurements from accelerometry. Our definition was based on two assumptions: that employing both patient- and physician-derived information would compensate for any imprecision resulting from subjectivity; and that EI status could be predicted from routine clinical data (e.g., behavioral, psychological, symptomatic) that are causes and/or consequences of extreme inactivity. It is important to note that whether the EI status used here would be exactly the same as one derived from accelerometry is ultimately not a crucial factor. The most important intended use of the algorithm is to enable patients with genuine EI status to be identified when the clinical data are equivocal. The best illustration that our assumptions were acceptable is the accuracy of prediction with the test sample of EI and OA patients (n = 166), which had a modest predictive error of 13.7%. Another limitation is that we did not perform accelerometry of the 306 patients with intermediate PA levels who were reclassified by our algorithm as EI. However, various studies have reported that between 10% and 20% of data are routinely missing from accelerometry studies (incomplete measurements or any other reasons) and the patient number included per study rarely exceeds about 100. In addition, considering that >200 pulmonologists from throughout France contributed data to the EI algorithm, any attempt to perform comparative accelerometry would undoubtedly have resulted in an even higher rate of lost or unusable data. We propose that the predictive validity of our predictive algorithm will increase as the size and diversity of the COLIBRI-COPD database increases. Moreover, the addition of new variables to the EI algorithm is technically possible, because the machine learning approach developed for the algorithm is an evolutionary and adaptable process. 
Examples of potentially influential variables for predicting EI status are psychological and social vulnerability, and regional climate and pollution data [9]. The addition of physiological data, such as functional exercise capacity (6-minute walk test, chair-rising test, grip strength, pedometer readings) could also be valuable, even though these parameters have been shown to be individually unreliable for identifying patients with extremely inactive lifestyles [11].
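
As an illustration of how test-sample figures of this kind (an overall prediction error and an area under the ROC curve with a 95% confidence interval) can be obtained from predicted probabilities, a minimal Python sketch is given here. It is not the authors' code: the data are synthetic, the function names are hypothetical, the 0.5 decision threshold is an assumption, and the percentile bootstrap is only one possible way of deriving the interval, since the method used in the study is not stated in this section.

```python
import random

def error_rate(y_true, y_prob, threshold=0.5):
    """Fraction of patients whose thresholded prediction disagrees with the label."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)

def auroc(y_true, y_prob):
    """AUROC as the probability that a random positive case is ranked above
    a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, y_prob, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUROC
            stats.append(auroc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic stand-in for a 166-patient test sample (1 = EI, 0 = OA).
rng = random.Random(42)
y_true = [1] * 40 + [0] * 126
y_prob = [min(1.0, max(0.0, rng.gauss(0.65 if y else 0.30, 0.18))) for y in y_true]

print("overall error:", round(error_rate(y_true, y_prob), 3))
print("AUROC:", round(auroc(y_true, y_prob), 3))
print("95% CI:", tuple(round(v, 3) for v in bootstrap_ci(y_true, y_prob)))
```

The pairwise definition of the AUROC is used because it needs no external library; for large samples a rank-based computation would be more efficient.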

Interpretation

In conclusion, we report that a predictive machine learning algorithm, developed from routine clinical data collected during online consultations, can identify EI status among patients with all stages of COPD severity. Integration of this algorithm within online consultations via an R-Shiny-python interface [29] could alert the clinician to the frequently overlooked patients who urgently require intervention to promote PA. Thus, it is our hope that the approach proposed here will advance the field of medical decision-making and move it further towards the holy grail of predictive and personalized medicine for COPD patients.

2D plot of the first two canonical discriminant variables accounting for the greatest variation between physical activity categories (red) relative to error.

The two dimensions account for 98.6% of the variance between categories, most (95.8%) of which is due to EI versus OA. The latter is mainly influenced by FEV1/FVC and the former by CAT, DIRECT, and mMRC scores. (TIFF)

Schematic representation of the HyperSMURF method.

HyperSMURF divides the majority class (OA) into n partitions. For each partition, oversampling techniques are used to generate additional patients from the minority class (EI) that closely resemble the distribution of the actual class to amplify the number of training patients from the minority class. At the same time, a comparable number of patients is subsampled from the majority class. HyperSMURF then trains in parallel n random forests on the resulting balanced data sets and finally combines the prediction of the n ensembles according to a hyper-ensemble (ensemble of ensembles) approach. (TIFF)
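
The partition, oversample, subsample, and combine steps described in this caption can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HyperSMURF implementation used in the study: all function names and parameter values are hypothetical, the oversampling is a SMOTE-like interpolation between neighbouring minority patients, and a nearest-centroid scorer stands in for the random forests so that the example needs only the standard library.

```python
import math
import random

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic

def fit_centroid_scorer(X, y):
    """Stand-in base learner (NOT a random forest): scores a point by its
    relative distance to the two class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c1 = centroid([x for x, lab in zip(X, y) if lab == 1])
    c0 = centroid([x for x, lab in zip(X, y) if lab == 0])
    def predict_proba(x):
        d1, d0 = math.dist(x, c1), math.dist(x, c0)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict_proba

def hyper_ensemble(majority, minority, n_parts=3, oversample_factor=2, k=3,
                   fit=fit_centroid_scorer, seed=0):
    """Train one learner per majority partition on a balanced data set and
    return a predictor that averages the learners' outputs."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    models = []
    for part in parts:
        # Oversample the minority class (EI) with synthetic patients.
        pos = minority + smote_like(minority, oversample_factor * len(minority), k, rng)
        # Subsample this majority partition (OA) to a comparable size.
        neg = rng.sample(part, min(len(part), len(pos)))
        models.append(fit(pos + neg, [1] * len(pos) + [0] * len(neg)))
    # Hyper-ensemble step: average the predictions of the n learners.
    return lambda x: sum(m(x) for m in models) / len(models)

# Synthetic two-feature data: 60 majority (OA) and 8 minority (EI) patients.
rng = random.Random(1)
oa = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
ei = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(8)]
predict = hyper_ensemble(oa, ei)
print(predict([2.0, 2.0]), predict([0.0, 0.0]))
```

Passing a different `fit` callable (for example, one that wraps a random forest from a machine learning library) would bring the sketch closer to the method described in the caption without changing the balancing logic.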

Box plots (categorical/ordinal values) and line plots (continuous variables) of the marginal effect of a predictor (x-axis) on predicted probability of a patient being assigned to the OA category according to the weighted random forest method (y-axis).

Box plots show the median, minimum, maximum, and interquartile values. See Table 1 for definitions of activity categories. (TIFF)
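
One common way to compute a marginal effect of this kind is a partial dependence curve: the predictor of interest is fixed at a given value for every patient, and the model's predicted probabilities are averaged. The Python sketch below illustrates that idea only. It is model-agnostic rather than specific to the weighted random forest, and the toy model, its coefficients, and the column layout (mMRC score, age) are hypothetical choices made for the example.

```python
import math

def partial_dependence(predict_proba, X, feature, grid):
    """For each grid value v, set column `feature` to v in every row and
    average the model's predicted probability over the data set."""
    curve = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)  # copy so the input data stay unchanged
            modified[feature] = v
            total += predict_proba(modified)
        curve.append(total / len(X))
    return curve

def toy_model(row):
    """Hypothetical model: the probability of OA falls as the first column
    (a dyspnea score) rises; coefficients are arbitrary."""
    score, age = row
    return 1.0 / (1.0 + math.exp(-(3.0 - 0.8 * score - 0.02 * age)))

# Hypothetical patients: [mMRC score, age in years].
patients = [[0, 55], [1, 62], [2, 70], [3, 66], [4, 74]]
for value, prob in zip(range(5), partial_dependence(toy_model, patients, 0, range(5))):
    print(value, round(prob, 3))
```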

Distribution of patients classified as GOLD ABCD within the INT and INC physical activity categories.

(PDF)

29 Apr 2021

PONE-D-21-07345
A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data
PLOS ONE

Dear Dr. Aguilaniu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jun 13 2021 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,
Prof. Jeremy Coquart, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

2. Please amend your Methods section to add the ethics information and approval numbers that you provided in your Ethics Statement.

3. Thank you for stating the following in the Financial Disclosure section: [The COLIBRI web consultation platform is supported by contractual partnerships with Agir à Dom, AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline and Novartis. BA, DH, and AA received grants from Agir à Dom, AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, and Novartis for the conduct of the study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.]. We note that you received funding from a commercial source: Agir à Dom, AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline and Novartis. Please provide an amended Competing Interests Statement that explicitly states this commercial funder, along with any other relevant declarations relating to employment, consultancy, patents, products in development, marketed products, etc. Within this Competing Interests Statement, please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials. If there are restrictions on sharing of data and/or materials, please state these.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers.

5. One of the noted authors is a group or consortium [COLIBRI-COPD program contributors]. In addition to naming the author group, please list the individual authors and affiliations within this group in the acknowledgments section of your manuscript. Please also indicate clearly a lead author for this group along with a contact email address.

6. Please amend the manuscript submission data (via Edit Submission) to include author 'COLIBRI-COPD program contributors'.

Reviewers' comments:

Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Partly

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes

5. Review Comments to the Author

Reviewer #1:

Abstract: The rationale between the first and second sentences is missing. In other words, why would one predict physical activity level from clinical data using artificial intelligence? Why the algorithm was proposed to identify EI patients in the INT and INC groups is not clear to me. I don't understand how the authors can be sure that the patients have been classified correctly.

Introduction: This part is pretty well written. It just lacks, in my opinion, a paragraph on artificial intelligence... because the authors could have proposed simple linear equations to classify their patients. However, they use machine learning. Then, in my opinion, the end of the introduction (from "Since 2013, pulmonologists...") should be in the "method" section instead. The authors indicate what they have done.

Method: For clarity, the authors could list each of the 22 specific variables in parentheses. For example, "...in the areas of anthropometry (age, body mass index...).

Results: The authors do not present significant differences within groups (p values). This could be added especially for Table 2. This is especially important because the authors state: “Not surprisingly, comparisons of clinicopathological characteristics showed a trend towards worsening health status of patients in the order EI > INT > INC > OA (Table 2).” The final equation (with the weight of each variable: age, BMI, FEV1…) must be presented. Without it, physicians cannot classify their patients, and therefore the study loses its interest. In other words, readers should be able to access the equation that will allow them to tell if their patients are extremely inactive or not...

Discussion: In the first sentence of the discussion, the authors say that they have shown the reliability of their method. To me, it does not show that, but rather the validity. As mentioned before, the authors state here that "Depending on the options available to the referring pulmonologist, this algorithm will help in deciding the optimal next step for each patient..." but the authors do not give their algorithm. This should be presented in the article. The authors are aware of the limitations of their study (mainly self-reported information, subjectivity for ranking, lack of GPS measurement...). However, there is another limitation. Indeed, 800 patients is a lot, but not much for machine learning... Often to gain accuracy (actually only 86%), several thousands of data are needed… Again, the authors talk about reliability, but for me it does not test that.

6. Do you want your identity to be public for this peer review?
Reviewer #1: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements.
5 Jul 2021

B. Abstract:

Reviewer: The rationale between the first and second sentences is missing. In other words, why would one predict physical activity level from clinical data using artificial intelligence?

Extreme inactivity contributes to mortality and morbidity in patients with chronic obstructive pulmonary disease (COPD), but patient-reported physical activity (PA) levels are often inaccurate. Here, we describe a machine learning algorithm that uses non-PA-related clinical data to identify extreme inactivity in patients with equivocal PA status.

Answer: Indeed, an intermediate sentence is missing. Thank you for this pertinent observation. We propose a new text.

Facilitating the identification of extreme inactivity (EI) has the potential to improve morbidity and mortality in COPD patients. Apart from patients with obvious EI, the identification of such behavior during a real-life consultation is unreliable. We therefore describe a machine learning algorithm to screen for EI, as actimetric measurements are difficult to implement.

Reviewer: Why the algorithm was proposed to identify EI patients in the INT and INC groups is not clear to me.

Answer: The previous sentence, now corrected, makes it more explicit why the ML algorithm seeks to identify EI patients among those whose EI behavior is not clinically evident (INT and INC). Thanks again to the reviewer for this remark.

We therefore describe a machine learning algorithm to screen for EI, as actimetric measurements are difficult to implement.

Reviewer: I don't understand how the authors can be sure that the patients have been classified correctly.

Answer: This question is addressed in more detail below, where we discuss the difference between reliability and predictive validity.

C. Introduction:

Reviewer: This part is pretty well written. It just lacks, in my opinion, a paragraph on artificial intelligence

Answer: Because of the limited number of words allowed, and because we have clearly detailed the steps in the development of the AI algorithm, we did not find it useful to explain the AI process further.

Reviewer: ... because the authors could have proposed simple linear equations to classify their patients. However, they use machine learning.

Answer: We did indeed first approach the problem more classically, using multiple logistic regression techniques for binary variables, but with less powerful results. Our choice was also motivated by the literature, from which we report below two examples that compare the two predictive approaches.

References:

1. Kaitlin Kirasich, Trace Smith and Bivin Sadler (2018). Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets. SMU Data Science Review, Volume 1, Number 3, Article 9.

2. Couronné, R., Probst, P. & Boulesteix, AL. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinformatics 19, 270 (2018). https://doi.org/10.1186/s12859-018-2264-5

Reviewer: Then, in my opinion, the end of the introduction (from "Since 2013, pulmonologists...") should be in the "method" section instead. The authors indicate what they have done.

Answer: Indeed, we moved these last sentences (from “Since 2013…”) to the method section, where they open a short paragraph entitled “Construct of the predictive machine learning”.

D. Methods:

Reviewer: For clarity, the authors could list each of the 22 specific variables in parentheses. For example, "...in the areas of anthropometry (age, body mass index...).

Answer: In accordance with the reviewer's recommendation, we specified that the 22 variables considered are presented in Table 2.

We selected 1409 patients with comprehensive information on 22 specific variables (see Table 2) in the areas of anthropometry, smoking habits, resting pulmonary function, comorbidities, exacerbations during the preceding year, Global Initiative for COPD (GOLD) ABCD classification, and self-reported questionnaires: the modified Medical Research Council dyspnea scale (mMRC) [16] and Disability related to COPD Tool (DIRECT), both of which assess dyspnea [17, 18]; the COPD Assessment Test (CAT), which assesses quality of life [19]; and the Hospital Anxiety and Depression Scale, which separately assesses anxiety and depression [20, 21].

E. Results:

Reviewer: The authors do not present significant differences within groups (p values). This could be added especially for Table 2. This is especially important because the authors state: “Not surprisingly, comparisons of clinicopathological characteristics showed a trend towards worsening health status of patients in the order EI > INT > INC > OA (Table 2).”

Answer: Classical one-way ANOVA can be seen as a generalization of the t-test for comparing the means of a continuous variable in more than two groups defined by the levels of a discrete covariate, a so-called nominal factor. Testing is then typically done using the standard F-test. However, when the factor is ordinal (the factor's levels are ordered), the choices for performing an appropriate ANOVA analysis are slim. An alternative to the classical F-test that takes the ordering of factor levels into account has been developed in [1] using a mixed-model methodology. Software implementing the ANOVA procedure for factors with ordered levels proposed in that paper is included in the R package ``ordPens'' (see [2]). This function performs analysis of variance when the factor(s) of interest has/have ordinal scale level. For testing, values from the null distribution are simulated. The method uses a mixed effects formulation of the usual one- or multi-factorial ANOVA model (with main effects only) while penalizing (squared) differences of adjacent means. The interested reviewer is referred to the above-mentioned paper for further details.

1. Gertheiss, J. (2014). ANOVA for Factors With Ordered Levels. Journal of Agricultural, Biological, and Environmental Statistics, Volume 19, Number 2, Pages 258–277. https://doi.org/10.1007/s13253-014-0170-5

2. Gertheiss J. (2015) ordPens: Selection and/or Smoothing of Ordinal Predictors. R package version 0.3-1 (2015). Available online at: https://CRAN.R-project.org/package=ordPens

Reviewer: The final equation (with the weight of each variable: age, BMI, FEV1…) must be presented. Without it, physicians cannot classify their patients, and therefore the study loses its interest. In other words, readers should be able to access the equation that will allow them to tell if their patients are extremely inactive or not...

Answer 1: We do not fit logistic regressions but Random Forests. Therefore, there is no equation or formula with which to make the prediction (see references above). The prediction is made from the fitted model, which also has the advantage that its predictive power can be increased by introducing new variables such as a functional capacity test (for example, sit-to-stand or 6 min walking tests). This point has been made in the discussion section:

Moreover, the addition of new variables to the EI algorithm is technically possible, because the machine learning approach developed for the algorithm is an evolutionary and adaptable process….

Answer 2: It is indeed essential that readers have access to the ML algorithm. As stated in the conclusion, an individual result is returned by a site to which the instantaneous calculation process is attached.

Integration of this algorithm within online consultations via an R-Shiny-python interface [29] could alert the clinician to the frequently overlooked patients who urgently require intervention to promote PA

F. Discussion:

Reviewer: In the first sentence of the discussion, the authors say that they have shown the reliability of their method. To me, it does not show that, but rather the validity.

Answer: We thank the reviewer for this pertinent comment. In the context of predictive statistics, it is indeed better to use the term predictive validity than reliability. Therefore, we have modified the text in this sense. In addition, we provide below additional elements (which we do not offer to readers, because they deal specifically with statistical considerations remote from medical interests) to convince the reviewer that the reliability of the statistical model is robust; in other words, that the predictive validity is very satisfactory.

The predictive models developed in our paper have been evaluated in terms of predictive reliability (discrimination), i.e. the ability to differentiate between high and low risk events, and calibration, or the accuracy of the risk estimates. An accurate probability estimate is crucial for clinical decision making, and well-calibrated predictive models are imperative in our case. Unfortunately, even a highly discriminative classifier (e.g., a classifier with a large area under the receiver operating characteristic (ROC) curve, or AUROC) may not be well calibrated. Various techniques have been proposed to calibrate existing predictive models. In our case, we have used the GiViTI calibration belt and associated test, which apply to models estimating the probability of binary responses, such as the random forest regression models studied in our paper (see references [1] and [2]). In particular, we have adopted the approach implemented in the R package GiViTI, which allows the models’ external calibration to be evaluated in independent (test) samples different from the training dataset used to fit the model.

Calibration belt and test: The GiViTI calibration belt was used to assess the calibration of the fitted Random Forest model. External validation was applied on the testing part of the data. The function givitiCalibrationBelt generates the plot displayed in the figure below. By default, the 80%- and 95%-confidence level calibration belts are plotted, in light and dark grey respectively. The table in the bottom-right side of the figure reports the ranges of the predicted probabilities where the belt significantly deviates from the bisector. Notably, the calibration belt contains the bisector (representing the identity between predicted probability and observed response rate) for all predictions in our RF fit. Hence, the RF predictions match the average observed rates over the whole range without any overestimation of the risk of EI or OA patients. The overall calibration of the model is synthesized into the test’s p-value, which is reported in the top-left corner of the figure. In addition, the sample size (n = 166) of the external (test) sample and the polynomial order of the calibration curve (for an explanation see reference [1]) are reported in the plot. The conclusion is that the RF model that was developed is well calibrated and does not overestimate the probability of the risk events.

[1]. Finazzi, S., D. Poole, D. Luciani, P. E. Cogo, and G. Bertolini. 2011. “Calibration Belt for Quality of Care Assessment Based on Dichotomous Outcome.” PLoS ONE 6 (2): e16110. doi:10.1371/journal.pone.0016110.

[2]. Nattino, Giovanni, Stefano Finazzi, and Guido Bertolini. 2014. “A New Calibration Test and a Reappraisal of the Calibration Belt for the Assessment of Prediction Models Based on Dichotomous Outcomes.” Statistics in Medicine. doi:10.1002/sim.6100.

Reviewer: As mentioned before, the authors state here that "Depending on the options available to the referring pulmonologist, this algorithm will help in deciding the optimal next step for each patient..." but the authors do not give their algorithm. This should be presented in the article.

Answer: As previously mentioned, the ML algorithm delivers the predictive result via a web-based application that instantly calculates the prediction of being or not being EI. This application will be available upon final acceptance of the article.

Reviewer: The authors are aware of the limitations of their study (mainly self-reported information, subjectivity for ranking, lack of GPS measurement...). However, there is another limitation. Indeed, 800 patients is a lot, but not much for machine learning... Often to gain accuracy (actually only 86%), several thousands of data are needed…

Answer: We are fully aware of the remarks raised by the reviewer. However, as underlined in the article and in the responses to the comments above, our ML algorithm is designed so that its predictive performance improves as the number of patients included in the COLIBRI-COPD cohort increases, and also through enrichment with new variables of interest.

Reviewer: Again, the authors talk about reliability, but for me it does not test that.

See answer above.

Submitted filename: PLOSONE - Response to Reviewers.pdf

28 Jul 2021

A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data
PONE-D-21-07345R1

Dear Dr. Aguilaniu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments.
When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

Kind regards,
Prof. Jeremy Coquart, Ph.D.
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here.
Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes

6. Review Comments to the Author
Reviewer #1: Thanks to the authors for answering all my questions. Even if all my remarks were not included, the paper could be published now.

7. Do you want your identity to be public for this peer review?
Reviewer #1: No

11 Aug 2021

PONE-D-21-07345R1
A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data

Dear Dr. Aguilaniu:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Jeremy Coquart
Academic Editor
PLOS ONE

