Bo Lan, Perry Haaland, Ashok Krishnamurthy, David B Peden, Patrick L Schmitt, Priya Sharma, Meghamala Sinha, Hao Xu, Karamarie Fecho.
Abstract
ICEES (Integrated Clinical and Environmental Exposures Service) provides a disease-agnostic, regulatory-compliant approach for openly exposing and analyzing clinical data that have been integrated at the patient level with environmental exposures data. ICEES is equipped with basic features to support exploratory analysis using statistical approaches, such as bivariate chi-square tests. We recently developed a method for using ICEES to generate multivariate tables for subsequent application of machine learning and statistical models. The objective of the present study was to use this approach to identify predictors of asthma exacerbations through the application of three multivariate methods: conditional random forest, conditional tree, and generalized linear model. Among seven potential predictor variables, we found five to be of significant importance using both conditional random forest and conditional tree: prednisone, race, airborne particulate exposure, obesity, and sex. The conditional tree method additionally identified several significant two-way and three-way interactions among the same variables. When we applied a generalized linear model, we identified four significant predictor variables, namely prednisone, race, airborne particulate exposure, and obesity. When ranked in order by effect size, the results were in agreement with the results from the conditional random forest and conditional tree methods as well as the published literature. Our results suggest that the open multivariate analytic capabilities provided by ICEES are valid in the context of an asthma use case and likely will have broad value in advancing open research in environmental and public health.
Keywords: asthma; biostatistics; conditional random forest; conditional tree; epidemiology; generalized linear model; machine learning; open data; open science; public health
Year: 2021 PMID: 34769911 PMCID: PMC8582932 DOI: 10.3390/ijerph182111398
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Flow chart depicting the sequential data processing steps taken to generate a final ICEES multivariate dataset for model development.
Table 1. Summary statistics for final ICEES multivariate dataset used for model development.
| Feature Variable (ICEES Feature Variable Name) | Variable Enumeration | N (%) |
|---|---|---|
| Total Annual ED or Inpatient Visits for Respiratory Issues | 1 | 9633 (67.6%) |
| | 2 | 2677 (18.8%) |
| | 3 | 1067 (7.5%) |
| | 4 | 492 (3.5%) |
| | 5 | 210 (1.5%) |
| | 6 | 115 (0.8%) |
| | 7 | 43 (0.3%) |
| | 8 | 13 (0.1%) |
| Sex (Sex2) | 0 (Male) | 6231 (43.7%) |
| | 1 (Female) | 8019 (56.3%) |
| Race (Race) | Caucasian | 8457 (59.3%) |
| | African American | 4111 (28.9%) |
| | Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other | 1682 (11.8%) |
| Prescription or Administration of Prednisone (Prednisone) ¹ | 0 (No) | 12,978 (91.1%) |
| | 1 (Yes) | 1272 (8.9%) |
| Diagnosis of Obesity (ObesityDx) ² | 0 (No) | 12,859 (90.2%) |
| | 1 (Yes) | 1391 (9.8%) |
| Maximum Daily PM2.5 Exposure, µg/m³ (MaxDailyPM2.5Exposure_StudyMax) | 1 (6.58, 25.09) | 878 (6.2%) |
| | 2 (25.09, 98.76) | 13,372 (93.8%) |
| Residential Distance to a Major Roadway or Highway, Meters (RoadwayDistanceExposure2) | 1 (0–49) | 1629 (11.4%) |
| | 2 (50–99) | 1008 (7.1%) |
| | 3 (100–149) | 878 (6.2%) |
| | 4 (150–199) | 922 (6.5%) |
| | 5 (200–249) | 861 (6.0%) |
| | 6 (≥250) | 8952 (62.8%) |
| Estimated Residential Density, Persons (EstResidentialDensity) ³ | 1 (0, 2500) | 10,321 (72.4%) |
| | 2 (2500, 50,000) | 3929 (27.6%) |
| | 3 (50,000, infinity) | 0 (0.0%) |
| Total | | 14,250 |
Abbreviations: ED, emergency department; PM2.5, particulate matter ≤ 2.5 µm in diameter. ¹ One or more prescriptions for prednisone over the one-year study period. ² One or more diagnoses of obesity over the one-year study period. ³ US Census Bureau American Community Survey 2007–2011 estimated total population (block group), binned according to US Census Bureau definitions: 1, rural (0, 2500); 2, urban cluster (2500, 50,000); 3, urbanized area (50,000, infinity).
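The density binning described in the footnote can be sketched in a few lines. This is an illustration only, not the ICEES implementation: the function name `density_bin` is hypothetical, and treating each bin as closed on the left and open on the right (so that exactly 2500 falls in bin 2) is an assumption, because the footnote does not say which side of each boundary is inclusive.

```python
from bisect import bisect_right

# Bin edges (persons per block group) taken from the table footnote.
# Assumption: bins are [0, 2500), [2500, 50000), [50000, infinity).
DENSITY_EDGES = [2500, 50000]
DENSITY_LABELS = {1: "rural", 2: "urban cluster", 3: "urbanized area"}


def density_bin(population: float) -> int:
    """Return the EstResidentialDensity bin (1, 2, or 3) for a population estimate."""
    if population < 0:
        raise ValueError("population must be non-negative")
    # bisect_right counts how many edges are <= population, giving 0, 1, or 2.
    return bisect_right(DENSITY_EDGES, population) + 1


for estimate in (1000, 2500, 60000):
    bin_id = density_bin(estimate)
    print(estimate, bin_id, DENSITY_LABELS[bin_id])
```

Under these assumptions no record in the final dataset would map to bin 3, which matches the 0 (0.0%) count reported for that level.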
Figure 2. Predictor variable importance, as determined by conditional random forest (CRF) analysis. The cforest algorithm in the R package party was applied to the ICEES multivariate dataset (see Figure 1 and Table 1) to determine the most important predictor variables in relation to the dependent variable of total annual ED or inpatient visits for respiratory issues. The vertical dashed line represents the significance threshold; values to the right of the line are significant. Variables are defined in Table 1.
Figure 3. Predictor variable importance and interactions, as determined by conditional tree (CTree) analysis. The CTree algorithm in the R package party was applied to the ICEES multivariate dataset (see Figure 1 and Table 1) for two purposes: to confirm the most important predictor variables identified by the CRF analysis in relation to the dependent variable of total annual ED or inpatient visits for respiratory issues, and to identify significant interactions between predictor variables. The enumeration levels for each variable are indicated on each branch of the tree. Variables are defined in Table 1.
GLM scenario comparison.
| Scenario ¹ | Poisson Regression: AIC | Poisson Regression: BIC | Lasso Poisson Regression: AIC | Lasso Poisson Regression: BIC |
|---|---|---|---|---|
| Scenario 1: main effects, no interaction terms | 39,548 | 39,593 | 39,559 | 39,612 |
| Scenario 2: main effects, top significant two-way interaction identified by CTree analysis | | | 39,559 | 39,627 |
| Scenario 3: main effects, all significant two-way interactions identified by CTree analysis | 39,525 | 39,601 | 39,563 | 39,608 |
| Scenario 4: main effects, all significant two- and three-way interactions identified by CTree analysis | 39,520 | 39,618 | 39,544 | 39,604 |
Abbreviations: AIC, Akaike information criterion; BIC, Bayesian information criterion; CTree, conditional tree; GLM, generalized linear model. ¹ Refer to text for complete description of each scenario. Bold font was used to highlight the metrics that were used to select the final model.

The final GLM model that we applied is described in Equation (3):

$$Y = \exp\left(0.0916 + \beta_1\,\text{Race} + \beta_2\,\text{MaxDailyPM2.5Exposure\_StudyMax\_cat} + \beta_3\,(\text{Prednisone} \times \text{Race}) + \beta_4\,\text{ObesityDx}\right) \qquad (3)$$

where Y = predicted annual number of ED or inpatient visits for respiratory issues; β1 = estimated model parameters for Race; β2 = estimated model parameters for MaxDailyPM2.5Exposure_StudyMax_cat; β3 = estimated model parameters for the interaction between Prednisone and Race; and β4 = estimated model parameters for ObesityDx.
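The AIC and BIC values in the scenario comparison can be read together. Assuming the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L were used (the source does not state this), their difference is k (ln(n) − 2), so the number of estimated parameters k can be recovered from each row. The sketch below does this for the unpenalized Poisson fits whose values survive in the table; the helper name `implied_parameter_count` is hypothetical, and n = 14,250 is taken from the summary statistics.

```python
import math

N_OBS = 14250  # total observations reported in the summary statistics table

# (AIC, BIC) for the unpenalized Poisson fits, copied from the scenario table.
POISSON_FITS = {
    "Scenario 1": (39548, 39593),
    "Scenario 3": (39525, 39601),
    "Scenario 4": (39520, 39618),
}


def implied_parameter_count(aic: float, bic: float, n: int) -> float:
    """Solve BIC - AIC = k * (ln(n) - 2) for k (assumes the standard definitions)."""
    return (bic - aic) / (math.log(n) - 2.0)


implied_k = {
    name: implied_parameter_count(aic, bic, N_OBS)
    for name, (aic, bic) in POISSON_FITS.items()
}
for name, k in implied_k.items():
    print(f"{name}: implied k = {k:.2f}")
```

The implied values land close to whole numbers and grow as interaction terms are added, which is what one would expect if the standard definitions apply. The same arithmetic is less informative for the lasso fits, where the effective number of parameters depends on how the penalty is accounted for.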
Final GLM model output.
| Parameter | Estimate | Standard Error | p-Value |
|---|---|---|---|
| Intercept | 0.0916 | 0.0366 | 0.0122 |
| Race = African American | 0.1491 | 0.0254 | <0.0001 |
| Race = Caucasian | 0.1685 | 0.0235 | <0.0001 |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other | 0 | 0 | . |
| MaxDailyPM2.5Exposure_StudyMax = 2 | 0.1869 | 0.0304 | <0.0001 |
| MaxDailyPM2.5Exposure_StudyMax = 1 | 0 | 0 | . |
| ObesityDx = 1 | 0.0966 | 0.0217 | <0.0001 |
| ObesityDx = 0 | 0 | 0 | . |
| Race = African American × Prednisone = 1 | 0.2062 | 0.0385 | <0.0001 |
| Race = African American × Prednisone = 0 | 0 | 0 | . |
| Race = Caucasian × Prednisone = 1 | 0.2940 | 0.0258 | <0.0001 |
| Race = Caucasian × Prednisone = 0 | 0 | 0 | . |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other × Prednisone = 1 | −0.2785 | 0.1417 | 0.0493 |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other × Prednisone = 0 | 0 | 0 | . |
Abbreviation: GLM, generalized linear model.
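To make Equation (3) concrete, the estimates in the final GLM output can be plugged into the log-linear form to obtain a predicted annual visit count for a given patient profile. The following is a minimal sketch under stated assumptions: the function name `predicted_visits` and the short key `"Other"` (standing in for the combined Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other category) are hypothetical, and rows reported as 0 are treated as reference levels.

```python
import math

# Estimates copied from the final GLM output; reference levels are 0.
INTERCEPT = 0.0916
RACE = {"African American": 0.1491, "Caucasian": 0.1685, "Other": 0.0}
PM25_BIN = {1: 0.0, 2: 0.1869}
OBESITY = {0: 0.0, 1: 0.0966}
# Prednisone-within-race terms; they apply only when Prednisone = 1.
PREDNISONE_BY_RACE = {
    "African American": 0.2062,
    "Caucasian": 0.2940,
    "Other": -0.2785,
}


def predicted_visits(race: str, pm25_bin: int, obesity: int, prednisone: int) -> float:
    """Evaluate Equation (3): exponentiate the sum of the applicable estimates."""
    linear_predictor = INTERCEPT + RACE[race] + PM25_BIN[pm25_bin] + OBESITY[obesity]
    if prednisone == 1:
        linear_predictor += PREDNISONE_BY_RACE[race]
    return math.exp(linear_predictor)


baseline = predicted_visits("Caucasian", pm25_bin=2, obesity=0, prednisone=0)
with_prednisone = predicted_visits("Caucasian", pm25_bin=2, obesity=0, prednisone=1)
rate_ratio = with_prednisone / baseline  # equals exp(0.2940) by construction

print(f"baseline: {baseline:.2f}")
print(f"with prednisone: {with_prednisone:.2f}")
print(f"rate ratio: {rate_ratio:.2f}")
```

Because the model is multiplicative on the count scale, each estimate acts as a log rate ratio: for this profile, a prednisone prescription multiplies the predicted visit count by exp(0.2940), roughly 1.34, whatever the values of the other covariates.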
Observed versus predicted mean annual ED or inpatient visits for respiratory issues.
| Parameter | N | Mean ¹ | Std Dev (Observed) | Std Dev (Predicted) |
|---|---|---|---|---|
| Race = African American | 4111 | 1.58 | 1.00 | 0.14 |
| Race = Caucasian | 8457 | 1.61 | 1.11 | 0.18 |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other | 1682 | 1.31 | 0.64 | 0.07 |
| MaxDailyPM2.5Exposure_StudyMax = 2 | 13,372 | 1.58 | 1.06 | 0.18 |
| MaxDailyPM2.5Exposure_StudyMax = 1 | 878 | 1.31 | 0.58 | 0.11 |
| ObesityDx = 1 | 1391 | 1.73 | 1.05 | 0.17 |
| ObesityDx = 0 | 12,859 | 1.55 | 1.03 | 0.18 |
| Race = African American × Prednisone = 1 | 400 | 1.91 | 1.11 | 0.07 |
| Race = African American × Prednisone = 0 | 3711 | 1.54 | 0.98 | 0.08 |
| Race = Caucasian × Prednisone = 1 | 821 | 2.10 | 1.42 | 0.10 |
| Race = Caucasian × Prednisone = 0 | 7636 | 1.56 | 1.05 | 0.09 |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other × Prednisone = 1 | 51 | 1.00 | 0.00 | 0.00 |
| Race = Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other × Prednisone = 0 | 1631 | 1.32 | 0.64 | 0.05 |
Abbreviations: ED, emergency department; Std Dev, standard deviation. ¹ Observed and predicted mean annual ED or inpatient visits for respiratory issues are identical for the applied model, but the standard deviation estimates differ.
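The race-level rows and the race-by-prednisone rows of this table can be cross-checked against each other: pooling the two prednisone subgroups within a race, weighted by their counts, should reproduce that race's overall count and mean up to rounding. The sketch below performs that check; the helper name `pooled` and the short key `"Other"` for the combined race category are hypothetical.

```python
# (N, mean) pairs copied from the table: Prednisone = 1, then Prednisone = 0.
SUBGROUPS = {
    "African American": [(400, 1.91), (3711, 1.54)],
    "Caucasian": [(821, 2.10), (7636, 1.56)],
    "Other": [(51, 1.00), (1631, 1.32)],
}
# Race-level (N, mean) rows from the same table.
REPORTED = {
    "African American": (4111, 1.58),
    "Caucasian": (8457, 1.61),
    "Other": (1682, 1.31),
}


def pooled(groups):
    """Return the combined count and the count-weighted mean of (N, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    weighted_mean = sum(n * mean for n, mean in groups) / total_n
    return total_n, weighted_mean


results = {race: pooled(groups) for race, groups in SUBGROUPS.items()}
for race, (n, mean) in results.items():
    reported_n, reported_mean = REPORTED[race]
    print(f"{race}: pooled N={n}, pooled mean={mean:.3f}, reported mean={reported_mean}")
```

The pooled counts match the race-level rows exactly, and the pooled means agree with the reported means to within the rounding of the two-decimal table entries.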