Mattia C F Prosperi, Danielle Belgrave, Iain Buchan, Angela Simpson, Adnan Custovic.
Abstract
BACKGROUND: Identifying different patterns of allergens and understanding their predictive ability in relation to asthma and other allergic diseases is crucial for the design of personalized diagnostic tools.
Keywords: Bayesian networks; IgE; airway hyper-reactivity; asthma; children; component-resolved diagnostics; feature selection; logistic regression; machine learning; methacholine; random forests; rhinitis; wheeze
Year: 2013 PMID: 24131308 PMCID: PMC4282342 DOI: 10.1111/pai.12139
Source DB: PubMed Journal: Pediatr Allergy Immunol ISSN: 0905-6157 Impact factor: 6.377
Characteristics of the study population at age 11 (N = 426)
| Characteristic, median (IQR) or N | Male | Female | Total | Missing |
|---|---|---|---|---|
| At least one IgE > 0, N1 = 238 (51.6%) | 154 | 84 | 238 | N/A |
| Number of specific positive IgE (>0.3 ISU) | 7.5 (3.0–14.5) | 7.0 (2.0–12.0) | 7.0 (3.0–13.0) | N/A |
| Sum of all IgE | 34.0 (8.4–130.2) | 38.9 (1.9–132.8) | 36.4 (4.9–131.4) | N/A |
| Asthma | 33 | 15 | 48 | 4 |
| Eczema | 36 | 22 | 58 | 5 |
| Mean eNO | 14.6 (8.8–33.8) | 19.9 (9.8–41.7) | 17.6 (9.4–37.8) | 62 |
| FVC | 2.7 (2.4–3.0) | 2.5 (2.3–2.9) | 2.6 (2.4–3.0) | 3 |
| % predicted FEV1 | 99.2 (91.0–107.4) | 98.5 (92.2–103.7) | 99.0 (91.6–106.0) | 3 |
| Current wheeze | 48 | 25 | 73 | 3 |
| AHR | 62 | 31 | 93 | 59 |
| Rhino-conjunctivitis | 63 | 28 | 91 | 3 |
| Methacholine dose-response ratio | 3.20 (1.01–5.47) | 3.48 (0.95–6.11) | 3.28 (0.98–5.84) | 57 |
| FEV/FVC ratio | 0.86 (0.81–0.90) | 0.89 (0.84–0.92) | 0.86 (0.82–0.91) | 4 |
Performance of statistical learning models by means of 50 independent validation runs, stratified by different outcomes
| Outcome | Method | Feature set | Feature/topology selection | AUROC (s.d.) | Sensitivity (s.d.) | Specificity (s.d.) |
|---|---|---|---|---|---|---|
| Asthma | Majority class | N/A | N/A | 0.50 (0.00) | 0.00 (0.00) | |
| | LR | Number of positive IgE + sum of all IgE | N/A | 0.71 (0.10) | 0.96 (0.03) | 0.20 (0.12) |
| | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.79 (0.08) | 0.95 (0.03) | |
| | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.59 (0.10) | 0.96 (0.06) | 0.14 (0.16) |
| | RF | 112 IgE + gender | Embedded (Gini index, random subset) | 0.97 (0.04) | 0.34 (0.13) | |
| | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.76 (0.08) | 0.91 (0.05) | 0.38 (0.15) |
| | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.77 (0.07) | 0.96 (0.03) | 0.32 (0.15) |
| Wheeze | Majority class | N/A | N/A | 0.50 (0.00) | 0.00 (0.00) | |
| | LR | Number of positive IgE + sum of all IgE | N/A | 0.67 (0.06) | 0.93 (0.04) | 0.13 (0.08) |
| | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.72 (0.07) | 0.94 (0.05) | |
| | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.61 (0.08) | 0.90 (0.09) | 0.29 (0.20) |
| | RF | 112 IgE + gender | Embedded (Gini index, random subset) | 0.91 (0.05) | 0.45 (0.12) | |
| | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.69 (0.06) | 0.92 (0.06) | 0.30 (0.12) |
| | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.65 (0.07) | 0.93 (0.10) | 0.29 (0.12) |
| Rhino-conjunctivitis | Majority class | N/A | N/A | 0.50 (0.00) | 0.00 (0.00) | |
| | LR | Number of positive IgE + sum of all IgE | N/A | 0.84 (0.07) | 0.44 (0.11) | |
| | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.73 (0.07) | 0.79 (0.09) | |
| | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.66 (0.06) | 0.81 (0.11) | 0.47 (0.17) |
| | RF | 112 IgE + gender | Embedded (Gini index, random subset) | 0.78 (0.07) | 0.80 (0.08) | 0.57 (0.11) |
| | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.75 (0.07) | 0.88 (0.06) | 0.41 (0.12) |
| | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.73 (0.07) | 0.82 (0.08) | 0.50 (0.12) |
| AHR | Majority class | N/A | N/A | 0.50 (0.00) | 0.00 (0.00) | |
| | LR | Number of positive IgE + sum of all IgE | N/A | 0.57 (0.11) | 0.70 (0.19) | 0.39 (0.20) |
| | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.64 (0.05) | 0.55 (0.18) | 0.70 (0.23) |
| | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.64 (0.10) | 0.55 (0.18) | 0.70 (0.21) |
| | RF | 112 IgE + gender | Embedded (Gini index, random subset) | 0.69 (0.11) | | |
| | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.64 (0.10) | 0.62 (0.20) | 0.57 (0.25) |
| | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.56 (0.06) | 0.58 (0.21) | 0.54 (0.29) |
AHR, airway hyper-reactivity; AUROC, area under the receiver operating characteristic curve; BN, Bayesian networks; DT, decision tree; LR, logistic regression; NB, naïve Bayes; RF, random forests.
The hypothesis of difference in means comparing against the best model (in bold) could not be rejected at p = 0.05.
Figure 1. Performance of statistical learning models in classifying the asthma and rhino-conjunctivitis outcomes, using the full feature set (112 IgE + gender) by means of area under the receiver operating characteristic, across 50 independent validation (80%/20%) runs. Results are out-of-sample predictions (i.e., on unseen data). Bars represent standard errors.
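The validation protocol in this caption (repeated 80%/20% splits scored by out-of-sample AUROC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic, the splits are stratified by outcome (the caption does not say whether they were), and the scorer `fit_mean_difference` is a hypothetical toy stand-in for the models in the table.

```python
import random
import statistics


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Toy linear scorer (assumption): each weight is the class-mean difference."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return [
        statistics.fmean(r[j] for r in pos) - statistics.fmean(r[j] for r in neg)
        for j in range(len(X[0]))
    ]


def repeated_holdout(X, y, runs=50, test_frac=0.2, seed=0):
    """Mean and s.d. of test-set AUROC over `runs` random stratified splits."""
    rng = random.Random(seed)
    pos_idx = [i for i, t in enumerate(y) if t == 1]
    neg_idx = [i for i, t in enumerate(y) if t == 0]
    aucs = []
    for _ in range(runs):
        test = set()
        for group in (pos_idx, neg_idx):  # hold out 20% of each class
            test.update(rng.sample(group, max(1, round(test_frac * len(group)))))
        train = [i for i in range(len(y)) if i not in test]
        weights = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        held_out = sorted(test)
        scores = [sum(w * v for w, v in zip(weights, X[i])) for i in held_out]
        aucs.append(auroc(scores, [y[i] for i in held_out]))
    return statistics.fmean(aucs), statistics.stdev(aucs)


# Synthetic data: one informative feature, one pure-noise feature.
data_rng = random.Random(1)
y = [1] * 40 + [0] * 160
X = [[data_rng.gauss(1.0 if t else 0.0, 1.0), data_rng.gauss(0.0, 1.0)] for t in y]
mean_auc, sd_auc = repeated_holdout(X, y)
print(f"AUROC mean={mean_auc:.2f}, s.d.={sd_auc:.2f}")
```

The model is refitted on the training 80% in every run, so each AUROC value is computed on data the scorer has not seen. The spread across runs is what the table reports as s.d.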
Figure 2. Feature importance plots for the asthma outcome (upper panel) and for the rhino-conjunctivitis outcome (lower panel) measured as mean decrease in accuracy from fitting a random forest and performing an outcome permutation test (1000 runs). Green intervals represent rescaled average (±standard deviation) decrease in accuracy, whilst box plots represent the null distribution (randomized outcomes); p-values are highlighted in red. Only the first 10 variables are shown.
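The logic of an outcome permutation test can be sketched in a few lines. This is a hedged illustration only: the importance statistic here is a simple absolute difference in class means, a stand-in chosen to keep the example dependency-free, whereas the figure uses mean decrease in accuracy from a random forest. The data and the function names are hypothetical.

```python
import random
import statistics


def importance(feature, outcome):
    """Stand-in importance statistic (assumption): absolute difference in class means."""
    pos = [v for v, t in zip(feature, outcome) if t == 1]
    neg = [v for v, t in zip(feature, outcome) if t == 0]
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def outcome_permutation_test(feature, outcome, runs=1000, seed=0):
    """Compare the observed importance with its distribution under shuffled outcomes."""
    rng = random.Random(seed)
    observed = importance(feature, outcome)
    shuffled = list(outcome)
    null = []
    for _ in range(runs):
        rng.shuffle(shuffled)  # breaks any feature-outcome link
        null.append(importance(feature, shuffled))
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (sum(s >= observed for s in null) + 1) / (runs + 1)
    return observed, null, p_value


# Synthetic data: one feature shifted in the positive class, one pure noise.
data_rng = random.Random(42)
outcome = [1] * 30 + [0] * 90
informative = [data_rng.gauss(1.5 if t else 0.0, 1.0) for t in outcome]
noise = [data_rng.gauss(0.0, 1.0) for _ in outcome]

obs_inf, null_inf, p_informative = outcome_permutation_test(informative, outcome)
obs_noise, null_noise, p_noise = outcome_permutation_test(noise, outcome)
print(f"informative p={p_informative:.4f}, noise p={p_noise:.4f}")
```

The list `null` plays the role of the box plots in the figure: it shows how large the importance statistic gets when the outcome carries no information about the feature.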
Figure 3. Multivariable logistic regression for asthma and rhino-conjunctivitis outcomes (upper and lower panel, respectively), showing mutually adjusted odds ratios and associated p-values from the LogitBoost algorithm (run on the whole data set). Only variables significant in the univariate analysis were included (p < 0.05).
Figure 4. Bayesian networks (BN) for the classification of asthma and rhino-conjunctivitis. Upper panel shows the naïve Bayes (NB) models, which assume variable independence and can therefore be abstracted to a main-effect logistic model, that is, a linear score in which each variable has a weight. Lower panel shows the BN that allow for more complex (direct and indirect) conditional dependencies. Given the non-superiority of the more complex BN model as compared to the NB on the current data set, one could choose the NB hypothesis and further evaluate different variable sets, as in main-effects logistic regression.
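The remark that a naïve Bayes model reduces to a linear score can be checked numerically. The sketch below assumes binary (positive/negative) features and Laplace smoothing, neither of which is stated in the caption, and uses a small hypothetical data set. It fits a Bernoulli naïve Bayes model, rewrites it as an intercept plus one weight per variable, and confirms that both forms give the same log-odds.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Per-class prior and Laplace-smoothed P(x_j = 1 | class) for binary features."""
    params = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = len(rows) / len(X)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
        params[c] = (prior, probs)
    return params


def nb_log_odds(params, x):
    """Direct naive Bayes posterior log-odds of class 1 versus class 0."""
    def log_joint(c):
        prior, probs = params[c]
        return math.log(prior) + sum(
            math.log(p if v else 1 - p) for p, v in zip(probs, x)
        )
    return log_joint(1) - log_joint(0)


def as_linear_score(params):
    """Same model rewritten as intercept + sum_j weight_j * x_j."""
    (prior0, p0), (prior1, p1) = params[0], params[1]
    weights = [
        math.log(a / (1 - a)) - math.log(b / (1 - b)) for a, b in zip(p1, p0)
    ]
    intercept = math.log(prior1 / prior0) + sum(
        math.log((1 - a) / (1 - b)) for a, b in zip(p1, p0)
    )
    return intercept, weights


# Hypothetical toy data: three binary sensitization indicators, binary outcome.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0],
     [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

params = fit_bernoulli_nb(X, y)
intercept, weights = as_linear_score(params)
max_gap = max(
    abs(nb_log_odds(params, x) - (intercept + sum(w * v for w, v in zip(weights, x))))
    for x in X
)
print(f"largest difference between the two forms: {max_gap:.2e}")
```

Each weight is a log odds ratio for one variable, which is why the caption compares the NB model to main-effects logistic regression. The two differ in how the weights are estimated, not in the form of the score.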