Mario Lovrić, Ivana Banić, Emanuel Lacić, Kristina Pavlović, Roman Kern, Mirjana Turkalj.
Abstract
Asthma in children is a heterogeneous disease manifested by various phenotypes and endotypes. The level of disease control, as well as the effectiveness of anti-inflammatory treatment, is variable and inadequate in a significant portion of patients. By applying machine learning algorithms, we aimed to predict the treatment success in a pediatric asthma cohort and to identify the key variables for understanding the underlying mechanisms. We predicted the treatment outcomes in children with mild to severe asthma (N = 365), according to changes in asthma control, lung function (FEV1 and MEF50) and FENO values after 6 months of controller medication use, using Random Forest and AdaBoost classifiers. The highest prediction power was achieved for control-related and, to a lesser extent, for FENO-related treatment outcomes, especially in younger children. The most predictive variables for asthma control were related to asthma severity and the total IgE, which were also predictive for FENO-based outcomes. MEF50-related treatment outcomes were better predicted than the FEV1-based response, and one of the best predictive variables for this response was hsCRP, emphasizing the involvement of the distal airways in childhood asthma. Our results suggest that asthma control- and FENO-based outcomes can be more accurately predicted using machine learning than the outcomes according to FEV1 and MEF50. This supports the symptom control-based asthma management approach and its complementary FENO-guided tool in children. T2-high asthma seemed to respond best to the anti-inflammatory treatment. The results of this study in predicting the treatment success will help to enable treatment optimization and to implement the concept of precision medicine in pediatric asthma treatment.
Keywords: asthma control; asthma controller medication; childhood asthma; machine learning; treatment outcome
Year: 2021 PMID: 34068718 PMCID: PMC8151683 DOI: 10.3390/children8050376
Source DB: PubMed Journal: Children (Basel) ISSN: 2227-9067
The variables used in this study, described in more detail in the supplementary file.
| Variable Group | Description |
|---|---|
| demographics | gender, age |
| subjective | at baseline (t0)-personal and family medical history-atopy status, allergic rhinitis (AR), atopic dermatitis (AD), food allergy and other comorbidities |
| objective | at baseline (t0) and after 6 months (t0 + 6)-symptom control, frequency and severity of exacerbations in the period since the last visit, lung function (FVC, FEV1, MEF50), airway inflammation (FENO) measurement and medication use; at baseline (t0)- skin prick (SPT) and total and specific IgE to inhaled allergens, blood eosinophils and neutrophils, anthropometric measures (height, weight, body mass index) and for certain patients with suggestive history for comorbidities -ENT examination, pH probing with impedance for diagnostics of laryngopharyngeal reflux and gastroesophageal reflux disease, polysomnography for diagnostics of obstructive sleep apnea syndrome, SPT and specific IgE to food and insect venom allergens for diagnostics of food/insect venom allergy |
| genetic data | genotypes for rs37973 (GLCCI1), rs9910408 (TBX21), rs242941 (CRHR1), rs1876828 (CRHR1), rs1042713 (ADRB2) and rs17576 (MMP9) (see |
AR: allergic rhinitis, AD: atopic dermatitis, FVC: forced vital capacity, FEV1: forced expiratory volume in one second, MEF50: maximal expiratory flow at 50% of the vital flow capacity, FENO: fractional exhaled nitric oxide, SPT: skin prick test, IgE: immunoglobulin E, ENT: ear/nose/throat, GLCCI1: glucocorticoid-induced 1, TBX21: T-box 21, CRHR1: corticotropin releasing hormone receptor 1, ADRB2: beta-2 adrenergic receptor and MMP9: matrix metalloproteinase-9. A full variable list is given in the supplementary file (see Table S3).
Patient stratification according to their response to treatment (target variables). Response to treatment is defined in more detail in the supplementary file. ppb: parts per billion.
| Class | FEV1 | MEF50 | FENO | Asthma Control |
|---|---|---|---|---|
| Responder | Increase ≥ 10% predicted | Increase ≥ 15% predicted | Decrease < 20% for values > 35 (50) ppb or < 10 ppb for values < 35 (50) ppb | Improvement in asthma control |
| Non-responder | Change < 10% predicted | Change < 15% predicted | Decrease ≤ 20% FENO ≤ 20% for values over 35 (50) ppb or ± 10 ppb for values < 35 (50) ppb or increase >20% for values > 35 (50) ppb or > 10 ppb for values < 35 (50) ppb | No changes in partial asthma control or deterioration in asthma control |
FEV1—forced expiratory volume in one second, MEF50—maximal expiratory flow at 50% of the vital flow capacity, FENO—Fractional Exhaled nitric oxide.
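The lung-function rows of the stratification table can be read as a simple labelling rule. The sketch below is illustrative only and is not the authors' code: the function name `is_responder` is hypothetical, and it assumes that "increase ≥ 10% predicted" means an absolute change in percentage points of the predicted value between baseline and the 6-month visit. The FENO and asthma-control rules are left out because they need additional clinical context from the supplementary file.

```python
# Thresholds from the stratification table, in percentage points of the
# predicted value (assumed interpretation, see the lead-in above).
THRESHOLDS = {"FEV1": 10.0, "MEF50": 15.0}


def is_responder(target: str, baseline_pct_pred: float, followup_pct_pred: float) -> int:
    """Return 1 (responder) if the increase meets the threshold, else 0."""
    change = followup_pct_pred - baseline_pct_pred
    return int(change >= THRESHOLDS[target])


# Hypothetical patients, not data from the study.
print(is_responder("FEV1", 80.0, 90.0))   # increase of 10 points
print(is_responder("MEF50", 60.0, 70.0))  # increase of 10 points, below 15
```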
Distribution of the responders and non-responders per measured outcome after 6 months of treatment. The 13 missing FENO response values were imputed. t0 + 6 is the timepoint for 6 months after the start of the treatment.
| Treatment Outcome (t0 + 6) | Responders (1) | Non-Responders (0) |
|---|---|---|
| LOAC (t0 + 6) | 230 | 135 |
| FENO (t0 + 6) | 248 | 104 |
| FEV1 (t0 + 6) | 129 | 236 |
| MEF50 (t0 + 6) | 126 | 239 |
FEV1—forced expiratory volume in one second, MEF50—maximal expiratory flow at 50% of the vital flow capacity, FENO—Fractional Exhaled nitric oxide, LOAC—level of asthma control.
Figure 1. (Left) A typical (weak) tree classifier. A group of patients containing both responders and non-responders is to be separated based on the given predictive variables. The first split uses the variable that best separates the two groups. The algorithm keeps splitting the groups until the leaf nodes (tree bottom) are as pure as possible with respect to the target classes. (Right) Simplified scheme of the random forest algorithm. The trees represent weak classifiers that are aggregated via voting to form a strong one. Every tree is trained on a random sample of the training data (bootstrapping). The AdaBoost classifier trains the trees sequentially instead of in parallel.
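The three ideas in this caption (choosing the purest split, bootstrapping and voting) can be sketched in a few lines of plain Python. This is a minimal illustration and not the study's implementation: it assumes Gini impurity as the purity measure, which the caption does not specify, and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(labels)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split anything
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement, as done for each tree in a forest."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Aggregate the class votes of several trees into one prediction."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical severity grades and responder labels, not study data.
severity = [1, 1, 2, 3, 3]
responder = [1, 1, 1, 0, 0]
print(best_split(severity, responder))
print(majority_vote([1, 0, 1]))
```

AdaBoost differs from this scheme in that each new tree is fitted with sample weights that emphasise the patients misclassified by the previous trees, so the trees cannot be trained independently.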
The experimental matrix consists of training two different classifiers × three sampling methods × four targets, giving 24 combinations (2 × 3 × 4). Each combination was resampled with 100 train–test splits, giving a total of 600 models trained per target (2 × 3 × 100).
| 1. Classification | 2. Sampling | 3. Targets After Six Months |
|---|---|---|
| (a) AdaBoost | (a) No sampling (base) | (a) MEF50 (t0 + 6) |
| (b) Random Forest | (b) Under sampling (cluster centroids) | (b) FEV1 (t0 + 6) |
| | (c) Oversampling | (c) FENO (t0 + 6) |
| | | (d) LOAC (t0 + 6) |
FEV1: forced expiratory volume in one second, MEF50: maximal expiratory flow at 50% of the vital flow capacity, FENO: fractional exhaled nitric oxide, LOAC: level of asthma control.
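The size of the experimental matrix can be checked by enumerating it. The following sketch only reproduces the counting; the variable names are illustrative, and using the split index as a random seed is an assumption rather than something stated in the text.

```python
from itertools import product

CLASSIFIERS = ["AdaBoost", "Random Forest"]
SAMPLING = ["base", "cluster centroids", "oversampling"]
TARGETS = ["MEF50", "FEV1", "FENO", "LOAC"]
N_SPLITS = 100  # repeated train-test splits per combination

# Every classifier x sampling x target combination.
configs = list(product(CLASSIFIERS, SAMPLING, TARGETS))

# One run per combination and split; the split index doubles as a seed here.
runs = [(clf, smp, tgt, seed)
        for clf, smp, tgt in configs
        for seed in range(N_SPLITS)]

models_per_target = sum(1 for run in runs if run[2] == "LOAC")

print(len(configs), models_per_target, len(runs))
```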
Average classification results for the treatment targets FEV1, FENO, MEF50 and LOAC. The results are reported for the best performing model (classifier and sampling method) and are calculated by the mean of the accuracy, specificity, sensitivity and the MCC.
| Metric | FEV1 | FENO | MEF50 | LOAC |
|---|---|---|---|---|
| Accuracy | 0.6503 | 0.7005 | 0.6753 | 0.9698 |
| Specificity | 0.8986 | 0.8531 | 0.8817 | 0.9661 |
| Sensitivity | 0.7854 | 0.9560 | 0.7855 | 0.9781 |
| MCC | 0.2190 | 0.2146 | 0.2608 | 0.9366 |
MCC is the Matthews correlation coefficient. FEV1—forced expiratory volume in one second, MEF50—maximal expiratory flow at 50% of the vital flow capacity, FENO—Fractional Exhaled nitric oxide, LOAC—level of asthma control.
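For reference, all four reported metrics can be derived from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# A classifier that separates the classes reasonably well.
print(binary_metrics(tp=45, fp=5, tn=25, fn=5))

# A classifier that labels every patient as a responder: the accuracy looks
# acceptable on imbalanced data, but the MCC is 0.
print(binary_metrics(tp=50, fp=30, tn=0, fn=0))
```

The second call shows why the MCC is a useful summary for imbalanced targets such as those in this study: it only rewards models that perform well on both classes.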
Figure 2. Boxplot of the classification results by means of the MCC (x-axis). The comparison includes the classification results for the four targets (FEV1, LOAC, FENO and MEF50), two classification algorithms (AB: AdaBoost, RF: Random Forest) and two sampling methods (OS: oversampling, CC: cluster centroids) compared to no sampling (base). The best model per target is marked by a red square surrounding the box. FEV1: forced expiratory volume in one second, MEF50: maximal expiratory flow at 50% of the vital flow capacity, FENO: fractional exhaled nitric oxide, LOAC: level of asthma control, MCC: Matthews correlation coefficient.
Top important variables for each of the targets. The variables were aggregated by the median value of the permutation importance per target (600 runs each). The permutation importance is divided by the respective Matthews correlation coefficient (MCC) value from Table 5 and calculated as the % of weight respective to the MCC, i.e., contribution to MCC. For each target, only several variables returned an aggregated median above 1%. hsCRP: high-sensitivity C-reactive protein and t0: baseline.
| Variable | LOAC | FENO | FEV1 | MEF50 |
|---|---|---|---|---|
| Seasonal allergens (SPT) | 1.1% | |||
| Asthma severity (t0) | 47.0% | |||
| hsCRP | 1.2% | |||
| IgE total | 1.5% | 3.2% | ||
| FENO (t0) | 12.8% | |||
| FEV1 (t0) | 14.8% | 1.8% | ||
| MEF50 (t0) | 8.2% | 30.3% |
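The aggregation described in the caption (median permutation importance per target, expressed as a percentage of the MCC) can be sketched as follows. This is an assumption-laden illustration and not the authors' pipeline: the function names are hypothetical, the model is any callable `predict(row)`, and the score is any callable `score(y_true, y_pred)` such as the MCC.

```python
import random
from statistics import median


def permutation_importance(predict, score, X, y, col, seed=0):
    """Drop in score after shuffling one column of X (rows are lists)."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    # Rebuild the rows with only the chosen column permuted.
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return baseline - score(y, [predict(row) for row in X_perm])


def contribution_to_mcc(importances, mcc):
    """Median importance over all runs as a percentage of the model's MCC."""
    if mcc == 0:
        raise ValueError("contribution is undefined for an MCC of 0")
    return 100.0 * median(importances) / mcc


# Hypothetical importances from three runs and a hypothetical MCC.
print(contribution_to_mcc([0.40, 0.44, 0.48], mcc=0.88))
```

A variable that the model never uses has an importance of 0 under this definition, because permuting it cannot change any prediction.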
Figure 3. An exemplary decision tree classifier in which the treatment outcome LOAC after six months was predicted by three predictive variables (LOAC baseline, asthma severity baseline and IGE_total). The responders are labeled R-LOAC (responders by level of asthma control), while the non-responders are labeled NR-LOAC (non-responders by level of asthma control). The asthma severity at baseline is the first split. Most patients whose asthma severity was estimated to have a value of 1 respond well to treatment. In an ensemble classifier, a few hundred such trees are trained on the bootstrapped samples and averaged for prediction, as explained in Figure 1. Ass_asthma_sev_basline: asthma severity (according to GINA) grade assessed at baseline, ass_asthma_ctrl_basline: asthma control assessed at baseline and biom_ige_total: total serum IgE.
Analysis of model results stratified by age. Each MCC value is the median of 200 models. The patients are grouped in age groups (2–5, 6–11, 12–17 and 18+). The number of patients per age group is given in the column “No. Patients”. Superscripts b and w are used to describe the best and worst results per response, respectively.
| Response | Age Group | MCC (All Models) | No. Patients |
|---|---|---|---|
| LOAC | 2–5 y/o | b 1 | 53 |
| | 6–11 y/o | 0.96 | 178 |
| | 12–17 y/o | 0.89 | 124 |
| | >18 y/o | w 0.61 | 10 |
| FENO | 2–5 y/o | w 0 | 53 |
| | 6–11 y/o | b 0.14 | 178 |
| | 12–17 y/o | 0.1 | 124 |
| | >18 y/o | w 0 | 10 |
| FEV1 | 2–5 y/o | 0.12 | 53 |
| | 6–11 y/o | b 0.27 | 178 |
| | 12–17 y/o | 0.08 | 124 |
| | >18 y/o | w 0 | 10 |
| MEF50 | 2–5 y/o | b 0.29 | 53 |
| | 6–11 y/o | 0.18 | 178 |
| | 12–17 y/o | 0.25 | 124 |
| | >18 y/o | w 0 | 10 |
No. Patients—number of patients, y/o—years old.
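A stratified summary like the one above can be produced by binning patients by age and taking the median MCC per bin. The sketch below is a guess at the mechanics, not the authors' procedure: it assumes ages in completed years, treats the oldest group as 18 and above (the caption says 18+, the table says >18), and uses hypothetical function names and values.

```python
from collections import defaultdict
from statistics import median


def age_group(age_years: int) -> str:
    """Map an age in completed years to the age groups used in the table."""
    if age_years < 2:
        raise ValueError("the age groups start at 2 years")
    if age_years <= 5:
        return "2-5"
    if age_years <= 11:
        return "6-11"
    if age_years <= 17:
        return "12-17"
    return "18+"


def median_mcc_by_age(records):
    """records: iterable of (age group label, MCC) pairs, one per trained model."""
    grouped = defaultdict(list)
    for label, mcc in records:
        grouped[label].append(mcc)
    return {label: median(values) for label, values in grouped.items()}


# Hypothetical per-model MCC values, not results from the study.
print(median_mcc_by_age([("2-5", 1.0), ("2-5", 0.9), ("2-5", 0.8), ("18+", 0.0)]))
```

With only 10 patients in the oldest group, the per-group test sets are very small, so the MCC values for that group should be read with caution.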
Comparison of the ensemble models to logistic regression. The median MCC values across all the models grouped by the target variables are compared.
| Response | Sampling | Logistic Regression | AdaBoost | Random Forest |
|---|---|---|---|---|
| FENO | Base | 0.07 | 0.21 | 0.07 |
| | CC | −0.02 | 0.07 | 0.10 |
| | OS | 0.05 | 0.17 | 0.13 |
| FEV1 | Base | 0.00 | 0.22 | 0.14 |
| | CC | 0.03 | 0.14 | 0.15 |
| | OS | 0.03 | 0.18 | 0.19 |
| LOAC | Base | 0.19 | 0.94 | 0.90 |
| | CC | 0.04 | 0.89 | 0.90 |
| | OS | 0.17 | 0.93 | 0.90 |
| MEF50 | Base | −0.01 | 0.23 | 0.19 |
| | CC | −0.02 | 0.14 | 0.12 |
| | OS | 0.03 | 0.24 | 0.26 |
CC—cluster centroids, OS—oversampling (both explained in Section 2.3).
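To make the OS rows concrete, the sketch below balances a training set by random oversampling with replacement. This is an assumption: the text here does not say which oversampling variant was used, and the function name is hypothetical. Cluster-centroid undersampling works in the opposite direction, replacing the majority class with cluster centres, and is not shown. The class counts in the example are the LOAC responder and non-responder counts reported in the distribution table.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# 230 responders and 135 non-responders, as reported for LOAC.
labels = [1] * 230 + [0] * 135
rows = list(range(len(labels)))  # stand-in for patient feature rows
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling of this kind should be applied to the training split only, so that duplicated patients never appear in the test data.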