Karamarie Fecho, Perry Haaland, Ashok Krishnamurthy, Bo Lan, Stephen A Ramsey, Patrick L Schmitt, Priya Sharma, Meghamala Sinha, Hao Xu.
Abstract
The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to sensitive patient data that have been integrated with public exposures data. ICEES was designed initially to support dynamic cohort creation and bivariate contingency tests. The objective of the present study was to develop an open approach to support multivariate analyses using existing ICEES functionalities while abiding by all regulatory constraints. We first developed an open approach for generating a multivariate table that maintains contingencies between clinical and environmental variables using programmatic calls to the open ICEES application programming interface. We then applied the approach to data on a large cohort (N = 22,365) of patients with asthma or related conditions and generated an eight-feature table. Due to regulatory constraints, data loss was incurred with the incorporation of each successive feature variable, from a starting sample size of N = 22,365 to a final sample size of N = 4,556 (20.4% of the starting sample), but data loss was < 10% until the addition of the final two feature variables. We then applied a generalized linear model to the resulting dataset and focused on the impact of seven select feature variables on asthma exacerbations, defined as annual emergency department or inpatient visits for respiratory issues. We identified five feature variables (sex, race, obesity, prednisone, and airborne particulate exposure) as significant predictors of asthma exacerbations. We discuss the advantages and disadvantages of ICEES open multivariate analysis and conclude that, despite limitations, ICEES can provide a valuable resource for open multivariate analysis and can serve as an exemplar for regulatory-compliant informatic solutions to open patient data, with capabilities to explore the impact of environmental exposures on health outcomes.
Keywords: Asthma; Environmental exposures; Environmental health; Generalized linear model; Open clinical data; Open science
Year: 2021 PMID: 35875189 PMCID: PMC9302917 DOI: 10.1016/j.imu.2021.100733
Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148
Feature variables used to generate multivariate table.
| Feature Variable | Variable Definition and Enumeration |
|---|---|
| TotalEDInpatientVisits | |
| Sex2 | |
| Race | |
| Prednisone | |
| ObesityDx | |
| MaxDailyPM2.5Exposure_StudyMax | |
| RoadwayDistanceExposure2 | |
| EstResidentialDensity | |
Abbreviations: PM2.5 = particulate matter ≤2.5-μm in diameter.
Fig. 1. High-level overview of the process for generating an ICEES multivariate table by application of dynamic cohort creation and nested bivariate contingencies. Levels or bins for each variable are defined in source documentation available from the OpenAPI and also accessible as an ICEES OpenAPI endpoint. See Table 1 for the feature variable definitions and enumeration used in this study.
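One way to read the nesting described in Fig. 1 is sketched below. This is a minimal local illustration, not the ICEES API: `create_cohort`, `bivariate_counts`, `trivariate_table`, the sample records, and the level names (`"Male"`, `"Female"`, `"A"`, `"B"`) are all hypothetical stand-ins, and regulatory constraints are not modeled. A cohort is created for each level of one variable, and a bivariate contingency table of two further variables is requested within that cohort.

```python
from collections import Counter

# Hypothetical patient-level records. In a real service these would stay
# behind the API, and only aggregate counts would be returned to the caller.
_PATIENTS = [
    {"Sex2": "Male", "Race": "A", "Prednisone": 0},
    {"Sex2": "Male", "Race": "A", "Prednisone": 0},
    {"Sex2": "Male", "Race": "A", "Prednisone": 1},
    {"Sex2": "Female", "Race": "A", "Prednisone": 1},
    {"Sex2": "Female", "Race": "B", "Prednisone": 0},
    {"Sex2": "Female", "Race": "B", "Prednisone": 0},
]


def create_cohort(**filters):
    """Stand-in for dynamic cohort creation: keep patients matching every filter."""
    return [p for p in _PATIENTS if all(p[k] == v for k, v in filters.items())]


def bivariate_counts(cohort, var_a, var_b):
    """Stand-in for a bivariate contingency request: counts per (level_a, level_b)."""
    return Counter((p[var_a], p[var_b]) for p in cohort)


def trivariate_table(var_1, levels_1, var_2, var_3):
    """Nest one bivariate table inside a cohort for each level of var_1."""
    rows = []
    for level in levels_1:
        cohort = create_cohort(**{var_1: level})
        counts = bivariate_counts(cohort, var_2, var_3)
        for (a, b), n in sorted(counts.items()):
            rows.append({var_1: level, var_2: a, var_3: b, "frequency": n})
    return rows


table = trivariate_table("Sex2", ["Female", "Male"], "Race", "Prednisone")
for row in table:
    print(row)
```

Adding a further feature would repeat the same step with cohorts defined on two variables instead of one, which is why the number of calls grows with each added feature.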
Fig. 2. Example ICEES tri-variate table, with rows in aggregate form representing the number of patients sharing the characteristics defined in each column. Each row can thus be duplicated to represent N = 1 patient.
Fig. 3. Excerpt from ICEES eight-feature multivariate table. The frequency column allows users to generate patient-level rows by, for instance, creating six separate rows for the features defined in row two and assigning a pseudo-identifier to each row.
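The expansion described in the Fig. 2 and Fig. 3 captions can be sketched as follows. This is an illustration under assumptions: the function name `expand_rows`, the key name `frequency`, the `pseudo_id` field, and the sample feature values are hypothetical and are not taken from the actual ICEES table.

```python
def expand_rows(aggregate_rows, count_key="frequency"):
    """Turn aggregate rows into one row per patient with a pseudo-identifier."""
    patients = []
    for row in aggregate_rows:
        # Everything except the count describes the shared characteristics.
        features = {k: v for k, v in row.items() if k != count_key}
        for _ in range(row[count_key]):
            patients.append({"pseudo_id": len(patients) + 1, **features})
    return patients


# Hypothetical aggregate rows; the second row has a frequency of six,
# echoing the example in the Fig. 3 caption.
aggregate = [
    {"Sex2": "Female", "Prednisone": 0, "frequency": 2},
    {"Sex2": "Male", "Prednisone": 1, "frequency": 6},
]

patient_rows = expand_rows(aggregate)
print(len(patient_rows), "patient-level rows")
```

The pseudo-identifiers carry no link to real patients; they only make each duplicated row distinct so that standard modeling tools can treat the table as patient-level data.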
Quantification of data loss with ICEES open multivariate approach.[a]
| Feature Variable Added[b] | Total ICEES Rows (N) | Maximum Possible Rows (N) | Missing Rows (N) | Missing Rows/Maximum Possible Rows (%) |
|---|---|---|---|---|
| Sex2 | 22365 | 22365 | 0 | 0 |
| Race | 22365 | 22365 | 0 | 0 |
| Prednisone | 22365 | 22365 | 0 | 0 |
| ObesityDx | 22208 | 22361 | 153 | 0.68 |
| MaxDailyPM2.5Exposure_StudyMax | 15861 | 17390 | 1529 | 8.79 |
| RoadwayDistanceExposure2 | 5022 | 8262 | 3240 | 39.2 |
| EstResidentialDensity | 4556 | 10615 | 6059 | 57.1 |
[a] Starting sample size before filtering for patients who were active in the ‘study’ period (calendar year 2010): N = 163302.
[b] Feature variables were added in the order listed, following the schema shown in Fig. 1.
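The arithmetic relating the columns of the data-loss table can be reproduced with a short sketch. The row counts are copied from the table; how "maximum possible rows" is derived is not stated in this excerpt, so the code only recomputes the missing-row columns and the overall retention figure quoted in the abstract. The names `DATA_LOSS` and `missing_rows` are illustrative.

```python
# (feature, total ICEES rows, maximum possible rows), copied from the table above.
DATA_LOSS = [
    ("Sex2", 22365, 22365),
    ("Race", 22365, 22365),
    ("Prednisone", 22365, 22365),
    ("ObesityDx", 22208, 22361),
    ("MaxDailyPM2.5Exposure_StudyMax", 15861, 17390),
    ("RoadwayDistanceExposure2", 5022, 8262),
    ("EstResidentialDensity", 4556, 10615),
]


def missing_rows(total, maximum):
    """Return (missing rows, missing rows as a percentage of the maximum)."""
    missing = maximum - total
    return missing, 100.0 * missing / maximum


for feature, total, maximum in DATA_LOSS:
    missing, pct = missing_rows(total, maximum)
    print(f"{feature:32s} missing={missing:5d} ({pct:.2f}%)")

# Final sample size as a share of the starting cohort of 22,365.
retained_pct = 100.0 * DATA_LOSS[-1][1] / DATA_LOSS[0][1]
print(f"retained: {retained_pct:.1f}%")
```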
ANOVA results for GLM model with main effects and 2-way interactions.[a]
| Main Effect or Interaction | df | Deviance | Residual df | Residual Deviance | Sig |
|---|---|---|---|---|---|
| NULL | | | 14936 | 8796.1 | |
| Sex2 | 1 | 16.701 | 14935 | 8779.4 | 4.376e-05 |
| Race | 5 | 141.052 | 14930 | 8638.3 | < 2.2e-16 |
| Prednisone | 1 | 153.832 | 14929 | 8484.5 | < 2.2e-16 |
| ObesityDx | 1 | 28.412 | 14928 | 8456.1 | 9.806e-08 |
| MaxDailyPM2_5Exposure_StudyMax | 2 | 36.204 | 14926 | 8419.9 | 1.375e-08 |
| RoadwayDistanceExposure2 | 5 | 9.601 | 14921 | 8410.3 | 0.087363 |
| EstResidentialDensity | 1 | 0.274 | 14920 | 8410 | 0.600541 |
| Sex2:Race | 5 | 9.395 | 14915 | 8400.6 | 0.094305 |
| Sex2:Prednisone | 1 | 0.871 | 14914 | 8399.7 | 0.35066 |
| Sex2:ObesityDx | 1 | 6.249 | 14919 | 8403.4 | 0.012426 |
| Sex2:MaxDailyPM2_5Exposure_StudyMax | 2 | 0.428 | 14906 | 8363 | 0.80749 |
| Sex2:RoadwayDistanceExposure2 | 5 | 2.997 | 14896 | 8347.6 | 0.700431 |
| Sex2:EstResidentialDensity | 1 | 2.814 | 14855 | 8314.8 | 0.093454 |
| Race:Prednisone | 2 | 18.555 | 14912 | 8381.2 | 9.351e-05 |
| Race:ObesityDx | 2 | 1.35 | 14909 | 8373.5 | 0.509129 |
| Race:MaxDailyPM2_5Exposure_StudyMax | 3 | 1.589 | 14903 | 8361.4 | 0.661954 |
| Race:RoadwayDistanceExposure2 | 25 | 9.65 | 14871 | 8338 | 0.997493 |
| Race:EstResidentialDensity | 5 | 1.129 | 14850 | 8313.7 | 0.951543 |
| Prednisone:ObesityDx | 1 | 10.07 | 14908 | 8363.5 | 0.001507 |
| Prednisone:MaxDailyPM2_5Exposure_StudyMax | 1 | 9.507 | 14902 | 8351.9 | 0.002047 |
| Prednisone:RoadwayDistanceExposure2 | 5 | 14.696 | 14866 | 8323.3 | 0.011744 |
| Prednisone:EstResidentialDensity | 1 | 0.175 | 14849 | 8313.5 | 0.676105 |
| ObesityDx:MaxDailyPM2_5Exposure_StudyMax | 1 | 1.313 | 14901 | 8350.6 | 0.251903 |
| ObesityDx:RoadwayDistanceExposure2 | 5 | 2 | 14861 | 8321.3 | 0.849104 |
| ObesityDx:EstResidentialDensity | 1 | 0.658 | 14848 | 8312.8 | 0.417263 |
| MaxDailyPM2_5Exposure_StudyMax:RoadwayDistanceExposure2 | 5 | 3.681 | 14856 | 8317.6 | 0.596173 |
| MaxDailyPM2_5Exposure_StudyMax:EstResidentialDensity | 1 | 0.115 | 14847 | 8312.7 | 0.734287 |
| RoadwayDistanceExposure2:EstResidentialDensity | 5 | 0.862 | 14842 | 8311.8 | 0.972891 |
Abbreviations: ANOVA = analysis of variance; df = degrees of freedom; GLM = generalized linear model; Sig = significance level (*: 0.05, **: 0.01, ***: 0.001).
[a] Negative binomial model, link: log, dependent variable: TotalEDInpatientVisits. Three-way and higher interactions are not included in the table for readability.
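The Sig column can be spot-checked from the df and Deviance columns alone. The sketch below assumes that each significance value is the upper-tail chi-square probability of the deviance reduction on the listed degrees of freedom; the excerpt does not state which test was used, so this is an assumption. Only rows with df = 1 or df = 2 are covered, because those cases have closed forms available in the standard library. The names `chi2_upper_tail` and `ROWS` are illustrative.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# (term, df, deviance, reported Sig), copied from the ANOVA table.
ROWS = [
    ("Sex2", 1, 16.701, 4.376e-05),
    ("ObesityDx", 1, 28.412, 9.806e-08),
    ("MaxDailyPM2_5Exposure_StudyMax", 2, 36.204, 1.375e-08),
    ("EstResidentialDensity", 1, 0.274, 0.600541),
    ("Sex2:Prednisone", 1, 0.871, 0.35066),
    ("Race:Prednisone", 2, 18.555, 9.351e-05),
]

for term, df, deviance, reported in ROWS:
    recomputed = chi2_upper_tail(deviance, df)
    print(f"{term:32s} reported={reported:.4g} recomputed={recomputed:.4g}")
```

For the rows listed, the recomputed values should agree with the reported ones to within the rounding of the deviance column. Terms with more than two degrees of freedom would need a general chi-square routine, for example from a statistics library.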