Bonnie R Joubert, Marianthi-Anna Kioumourtzoglou, Toccara Chamberlain, Hua Yun Chen, Chris Gennings, Mary E Turyk, Marie Lynn Miranda, Thomas F Webster, Katherine B Ensor, David B Dunson, Brent A Coull.
Abstract
Humans are exposed to a diverse mixture of chemical and non-chemical exposures across their lifetimes. Well-designed epidemiology studies as well as sophisticated exposure science and related technologies enable the investigation of the health impacts of mixtures. While existing statistical methods can address the most basic questions related to the association between environmental mixtures and health endpoints, there were gaps in our ability to learn from mixtures data in several common epidemiologic scenarios, including high correlation among health and exposure measures in space and/or time, the presence of missing observations, the violation of important modeling assumptions, and the presence of computational challenges incurred by current implementations. To address these and other challenges, NIEHS initiated the Powering Research through Innovative methods for Mixtures in Epidemiology (PRIME) program, to support work on the development and expansion of statistical methods for mixtures. Six independent projects supported by PRIME have been highly productive but their methods have not yet been described collectively in a way that would inform application. We review 37 new methods from PRIME projects and summarize the work across previously published research questions, to inform methods selection and increase awareness of these new methods. We highlight important statistical advancements considering data science strategies, exposure-response estimation, timing of exposures, epidemiological methods, the incorporation of toxicity/chemical information, spatiotemporal data, risk assessment, and model performance, efficiency, and interpretation. Importantly, we link to software to encourage application and testing on other datasets. This review can enable more informed analyses of environmental mixtures. We stress training for early career scientists as well as innovation in statistical methodology as an ongoing need. 
Ultimately, we direct efforts to the common goal of reducing harmful exposures to improve public health.
Keywords: chemical interactions; chemicals; combined exposures; environment; epidemiology; exposomics; health impact; methods; mixtures; non-chemical stressors; risk assessment; statistics
Year: 2022 PMID: 35162394 PMCID: PMC8835015 DOI: 10.3390/ijerph19031378
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Summary of PRIME Projects.
| Project (Institution(s)) 1 | Summary | Exposures 2 | Study Populations 3 |
|---|---|---|---|
| Development and testing of response surface methods for investigating the epidemiology of exposure to mixtures | Combines aspects of response surface modeling with index methods into the Bayesian Multiple Index Method (BMIM) and incorporates toxicological information. Special cases are a single index model and a full response surface of all exposures as in BKMR. | Dioxin-like compounds, PCBs, phthalates, parabens, bisphenols, triclosan, UV filters, BFRs, PBDEs | RCC, EARTH |
| Principal Component Pursuit to assess exposure to environmental mixtures in epidemiologic studies | Adapts the method Principal Component Pursuit (PCP), used in computer vision applications, to the epidemiologic setting of mixtures of environmental pollutants. | PCBs, metals, air pollution | CHDS, CCCEH, SHS, SPARCS |
| Structured nonparametric methods for mixtures of exposures | Incorporates chemical structure data and mechanistic constraints into nonparametric Bayesian regression methods to improve stability, performance, and interpretation in estimating dose response. Supplemental funding develops Bayesian modeling frameworks for including exposures in epidemiological models of infectious disease spread, as well as flexible spatiotemporal modeling with applications to study exposure effects on COVID-19 hospitalizations. | Phenols, OPs, perchlorate, PFCs, phthalates, BFRs, PAHs, pyrethroids, air pollutants | MSSM, NHANES, CHAMACOS, CLEAR, CDC COVID Data Tracker, NYTimes COVID Data, State Population by Characteristics |
| Methods for data integration and risk assessment for environmental mixtures | Integrates temporally resolved exposure into models, evaluates how early (“priming” or “protective”) exposures can impact susceptibility to later exposures, and estimates regulatory guideline values for mixtures. | Tooth metal biomarkers; EDCs, dietary data | Colorado birth data; SELMA |
| Bringing Modern Data Science Tools to Bear on Environmental Mixtures | Develops data architecture to capture complex spatial location data for families, environmental exposures, and social stressors that vary over time. Leverages modern data science by applying rapidly evolving techniques for architecting data combined with hierarchical Bayesian models with variable selection, spatial models, and machine learning algorithms to large-scale environmental mixture and social exposure datasets of direct importance to child outcomes. | Air pollution, lead, social stressors | Aggregate North Carolina birth records, blood lead surveillance data, and educational system data to social and environmental exposures |
| Innovative Methodologic Advances for Mixtures Research in Epidemiology | Adapts genomics approaches to evaluate the total main effects and interactions of chemical exposures. Applies novel multivariate models for analyzing the complex relationship between health outcomes, biological intermediates, and environmental pollutants. | POPs, PCBs, OCPs, BFRs, PFCs, dioxins, heavy metals | NHANES, GLFCS, HCHS/SOL |
1 Listed in alphabetical order, by institution. Project details available at NIH RePORTER: https://reporter.nih.gov/, accessed on 21 December 2021. Institutions: Columbia University Mailman School of Public Health, University of Illinois Chicago, Icahn School of Medicine at Mount Sinai, Harvard T.H. Chan School of Public Health, University of Notre Dame, Rice University, Boston University School of Public Health, Duke University.
2 BFRs: Brominated Flame Retardants; EDCs: Endocrine Disrupting Chemicals; OCPs: Organochlorine Pesticides; OPs: Organophosphorus Pesticides; PAHs: Polycyclic Aromatic Hydrocarbons; PBDEs: Polybrominated Diphenyl Ethers; PCBs: Polychlorinated Biphenyls; PFCs: Perfluorinated Chemicals; POPs: Persistent Organic Pollutants; UV: Ultraviolet.
3 CCCEH: Columbia Center for Children’s Environmental Health; CDC COVID Data Tracker: https://covid.cdc.gov/covid-data-tracker/#variant-proportions and https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc, accessed on 21 December 2021; CHAMACOS: Center for the Health Assessment of Mothers and Children of Salinas; CHDS: Child Health and Development Studies; CLEAR: Climate Change, Environmental Contaminants and Reproductive Health; EARTH: Environment And Reproductive Health cohort; GLFCS: Great Lakes Fish Consumption Study; HCHS/SOL: Hispanic Community Health Study/Study of Latinos; MSSM: Mount Sinai Children’s Environmental Health Study; NHANES: National Health and Nutrition Examination Survey; NYTimes COVID Data: https://github.com/nytimes/covid-19-data, accessed on 21 December 2021; RCC: Russian Children’s Cohort; SELMA: Swedish Environmental Longitudinal Mother and child, Asthma and allergy study; SHS: Strong Heart Study; SPARCS: NY Statewide Planning and Research Cooperative System; State Population by Characteristics: published by the U.S. Census Bureau, breaks down 2019 U.S. state populations by age, from Single Year of Age and Sex Population Estimates: 1 April 2010 to 1 July 2019 - CIVILIAN (SC-EST2019-AGESEX-CIV), https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-detail.html, accessed on 21 December 2021; WAS: Wisconsin Angler Study.
PRIME Methods and Software.
| Project 1 | Method Acronym | Method Name | Summary | Reference |
|---|---|---|---|---|
| BU/ | BKMR-CMA | Bayesian Kernel Machine Regression-Causal Mediation Analysis | Performs a causal mediation analysis when exposure within the mediation framework is a mixture. Estimates a multivariate exposure-response surface in a model for the mediator given exposure, and another for the outcome given the mediator and the exposure, both using BKMR. | [ |
| BU/ | BMIM | Bayesian Multiple Index Model | Unifies exposure index models with the response surface method BKMR, allowing a spectrum of intermediate models of multiple indices. Models non-linear, non-additive relationships between indices and an outcome. Special cases are a single exposure index and a response surface of all exposures. | [ |
| BU/ | DAG analysis | Use of causal methods for determining which exposures to include in a model | Applies directed acyclic graphs (DAGs) to determine inclusion of exposure variables. In some circumstances, including an exposure variable can increase bias. Determines causal relationships between exposures (or groups of exposures) and a health outcome. | [ |
| Columbia | BN2MF | Bayesian Non-parametric non-negative Matrix Factorization | Matrix factorization that provides non-negative (and more interpretable) solutions for factors and loadings and uncertainty estimates for the estimated parameters. Used for exposure pattern identification, similar to PCP. | [ |
| Columbia | PCP | Principal Component Pursuit | Unsupervised, robust exposure pattern identification. Decomposes the exposure matrix into a low-rank matrix (consistent patterns) and a sparse matrix (unique exposure events). | [ |
| Duke | BAG | Bag of DAGs | A computationally efficient method to construct a class of non-stationary spatiotemporal processes in point-referenced geostatistical models. Accounts for uncertainty in directions of association over space and time by considering a mixture of directed acyclic graphs (DAGs). | [ |
| Duke | BMC | Bayesian Matrix Completion for hypothesis testing | Bayesian inference about chemical activity on mean and variance of dose-response measurements accounting for sparsity of data. Used to characterize chemical activity and its uncertainty. | [ |
| Duke | BS3FA | Bayesian partially supervised sparse and smooth factor analysis | Bayesian inference on how chemical structure relates to variation in dose-response measurements. Addresses how to jointly model structural variability in molecular features of a chemical and its dose-response profile. | [ |
| Duke | FIN | Factor analysis for interactions | Bayesian factor analysis for inference on interactions. Estimates interactions between highly correlated chemical exposures and effect on health outcomes. | [ |
| Duke | GIF-SIS | Generalized infinite factor model | Shrinkage prior for the loadings matrix of infinite factor models that incorporates meta covariates to inform the sparsity structure and has desirable shrinkage properties. Addresses how to incorporate a priori known structure among variables when fitting a member of the broad class of factorization models. | [ |
| Duke | GL-GPs | Graph Laplacian based Gaussian Process | Gaussian process model with a covariance function that respects the geometry of highly restricted or nonlinear domains. Develops a covariance function for nonparametric regression that respects the intrinsic geometry of the domain without sacrificing computational tractability. | [ |
| Duke | GriPS | Computational improvements for Bayesian multivariate regression models based on latent meshed Gaussian processes | Computational improvements for Bayesian multivariate regression models based on latent meshed Gaussian processes. Addresses how to efficiently solve the big-n problem for GPs when the number of outcomes is large. | [ |
| Duke | MixSelect | Identifying main effects and interactions among exposures using Gaussian processes | Identifies main effects and interactions among exposures using Gaussian processes. Addresses how to model potentially non-linear effects and high-order interactions of chemical exposures on health outcomes. | [ |
| Duke | MrGap | Manifold Reconstruction via Gaussian Process | Local covariance Gaussian process model for estimating a manifold in high dimensional space from noisy data. Conducts inference on a low-dimensional, nonlinear manifold in high dimensional space when data are subject to measurement error. | [ |
| Duke | PFA | Perturbed factor analysis | Factor analysis that captures common structure among groups of related observations. Distinguishes shared and group-specific covariance structure and expresses shared structure via a set of shared factors. | [ |
| Duke | MatchAlign | Resolving rotational ambiguity in matrix sampling | Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching. Does inference on unidentifiable random matrices. | [ |
| Duke | SPAMTREE | Spatial Multivariate Trees | Bayesian multivariate regression methods for big data using sparse treed Gaussian processes. Jointly models several imbalanced variables flexibly and scalably via GPs. | [ |
| MSSM/ | ACR | Acceptable Concentration Range model | New class of nonlinear statistical models for human data that incorporates and evaluates regulatory guideline values into analyses of health effects of exposure to chemical mixtures. Allows for human data to suggest points of departure for comparison to in vivo estimates from single chemicals. | [ |
| MSSM/ | Mult DLAG | Multiple exposure distributed lag models with variable selection | A method to identify the presence of time-dependent interactions (interactions among chemical exposures experienced during different exposure windows) in a critical windows analysis. Identifies critical windows of exposure to multiple chemicals, and whether exposures experienced at different developmental windows interact with one another on a health outcome. | [ |
| MSSM/ | BKMR-DLM | Bayesian Kernel Machine Regression-Distributed Lag Model | Develops distributed lag models for assessing critical windows of exposure associated with a mixture. The model simultaneously estimates a time-weighted combination of each exposure and estimates a multivariate exposures-response surface of these time-weighted exposures using BKMR. | [ |
| MSSM/ | CVEK | Cross-validated kernel ensemble | Performs tests of interaction between two sets of exposures (i.e., two mixtures) while placing minimal assumptions on the main effects of each mixture. Asks whether one mixture (e.g., a collection of nutrients) modifies the effect of another (e.g., a metal mixture) as a whole. | [ |
| MSSM/ | Bayes Tree Pairs | Bayesian Regression Tree Pairs | Estimates critical windows of susceptibility to an environmental mixture. Uses an additive ensemble of tree pairs to estimate main effects and interactions between time-resolved predictors with variable selection. | [ |
| MSSM/ | DLMtree | Bayesian Treed Distributed Lag Models | Distributed lag linear and non-linear models. Method to improve the precision of critical window identification compared to methods that use spline or penalized spline basis functions. Interest focuses on identifying critical windows of exposure using data on a single exposure measured over time. | [ |
| MSSM/ | Het-DLM | Heterogeneous distributed lag models | Methods for precision children’s environmental health—that is, methods to identify subject characteristics (child sex, maternal age, etc.) that modify distributed lag effects of exposure. Addresses which subjects exhibit the strongest associations with an exposure measured over multiple developmental windows, and whether the critical windows of exposure vary among subgroups. | [ |
| MSSM/ | LWQS | Lagged Weighted Quantile Sum (WQS) regression | Uses a reverse distributed lag model for assessing critical windows of exposure associated with a mixture when the exposure temporal pattern differs across subjects. Can also incorporate strata-specific associations. Useful for identifying time-varying associations of a mixture effect and later life health/developmental outcomes. | [ |
| MSSM/ | NLinteraction | Bayesian semiparametric regression with sparsity inducing priors | Estimates effects of environmental mixtures to allow for interactions of any order. Provides variable importance measures for both main effects and interactions among exposures within a mixture, while making minimal assumptions on the forms of those effects. | [ |
| MSSM/ | RH-WQS | Repeated holdout Weighted Quantile Sum (WQS) regression | Generalizes WQS regression to include repeated holdout random data splits. Estimates a mixture effect using an empirically estimated weighted index. | [ |
| MSSM/ | SGP-MPI | Scalable Gaussian Process regression via Median Posterior Inference | Takes a split-and-conquer strategy to fitting BKMR to big data. Yields summaries of the multivariate exposure-response surface, as well as variable importance measures of each individual exposure. | [ |
| ND/Rice | BDS | Bayesian Data Synthesis | A Bayesian framework used to simulate fully synthetic datasets of mixed data types. The dataset may be comprised of mixed categorical, binary, count, and continuous datatypes. Can handle missing data and has customized metrics for attributing risk disclosure and other privacy concerns. | [ |
| ND/Rice | BSSVI | Bayesian subset selection and variable importance for interpretable prediction and classification | Used to collect and summarize all near-optimal subset models to provide a complete predictive picture. Useful in the presence of correlated covariates, weak signals, and/or small sample sizes, where different subsets may be indistinguishable in their predictive accuracy. | [ |
| ND/Rice | BVSM | Bayesian variable selection for understanding mixtures in environmental exposures | Variable selection via sparse summaries of a linear regression model. Given a Bayesian regression model with social and environmental covariates, addresses which variables matter most for predicting educational outcomes. | [ |
| ND/Rice | FOTP | Fast, optimal, and targeted predictions using parameterized decision analysis | Computes targeted summaries and prediction for specific decision tasks. Given a target (or functional) of interest and a Bayesian model, constructs accurate, simple, and efficient predictions of future values or functionals of future values. Model summaries can be customized for each functionality. | [ |
| ND/Rice | SCC | Spatiotemporal case-crossover | Presents a strategy for the case-crossover study design in a spatial-temporal setting. Incorporates a temporal case-crossover and a geometrically aware spatial random effect based on the Hausdorff distance. | [ |
| ND/Rice | SiBAR | State Informed Background Removal | Computational technique to quantify ‘background’ versus ‘source influenced’ contributions to air pollutant time series. Addresses whether a hidden Markov model can be used for this purpose and what the ‘background’ levels of pollutants measured across an urban area are. | [ |
| UI Chicago | MVNimpute | Imputation of multivariate data by normal model | Applies multiple imputation to data with missing and/or censored values. | [ |
| UI Chicago | SPORM | Semi-Parametric Odds Ratio Model | Flexible semiparametric model for estimating complex relationships among multiple variables. Associations are modeled by odds ratio functions. | [ |
| UI Chicago | TEV | Estimation and inference on the explained variation parameter | Estimates the explained variation of an outcome by a set of mixture pollutants. | [ |
1 Listed in alphabetical order, by institution. Project details available at NIH RePORTER: https://reporter.nih.gov/, accessed on 21 December 2021. Institutions: Columbia University Mailman School of Public Health, University of Illinois Chicago, Icahn School of Medicine at Mount Sinai, Harvard T.H. Chan School of Public Health, University of Notre Dame, Rice University, Boston University School of Public Health, Duke University.
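To make the PCP row of the table above concrete, here is a minimal sketch of the generic decomposition it describes: an exposure matrix split into a low-rank part (consistent patterns) plus a sparse part (unique exposure events). It is a textbook alternating-direction solver for the convex problem min ||L||_* + lam * ||S||_1 subject to L + S = M, not the PRIME project's implementation. The function names (`soft_threshold`, `svd_threshold`, `pcp`), the default `lam` and `mu` values, and the simulated data are all assumptions made for illustration; adaptations for epidemiologic data described by the project are not reproduced here.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal step for the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def pcp(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split m into low_rank + sparse with a basic augmented-Lagrangian loop.

    lam and mu default to commonly used heuristic values (an assumption,
    not a tuned choice); m must contain at least one non-zero entry.
    """
    m = np.asarray(m, dtype=float)
    n_rows, n_cols = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n_rows, n_cols))
    if mu is None:
        mu = n_rows * n_cols / (4.0 * np.abs(m).sum())

    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)

    for _ in range(max_iter):
        # Update each component in turn, holding the other fixed.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


if __name__ == "__main__":
    # Simulated data only: 80 subjects, 50 exposures, 2 underlying patterns,
    # plus roughly 5% of entries hit by large isolated "events".
    rng = np.random.default_rng(0)
    patterns = rng.normal(size=(80, 2)) @ rng.normal(size=(2, 50))
    events = np.zeros((80, 50))
    mask = rng.random((80, 50)) < 0.05
    events[mask] = rng.choice([-10.0, 10.0], size=int(mask.sum()))

    low_rank, sparse = pcp(patterns + events)
    rel_err = np.linalg.norm(low_rank - patterns) / np.linalg.norm(patterns)
    print("relative error of recovered low-rank part:", rel_err)
    print("non-zero entries in sparse part:", int((np.abs(sparse) > 1e-6).sum()))
```

In this toy setting the low-rank output plays the role of the shared exposure patterns and the sparse output flags isolated events. Real exposure data would typically need preprocessing (for example, scaling) that this sketch does not attempt.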
Figure 1. Mixtures Methods x Research Questions 1: Methods Preceding PRIME. 1 Research questions following Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? What congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? Is the exposure-response surface non-linear? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single bubble in this figure.
Figure 2. Mixtures Methods x Research Questions 1: Highlighted Methods from PRIME. 1 Research questions following Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? What congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? These can be managed with clustering and dimension reduction methods. (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? Is the exposure-response surface non-linear? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single bubble in this figure.
PRIME Methods by Research Question 1.
| Method 2 | Overall Effect Estimation | Toxic Agent Identification | Pattern Identification | A Priori Defined Groups | Interactions and Non-Linearities / Exposure-Response Relationship |
|---|---|---|---|---|---|
| FIN | X | X | X | X | |
| BSSVI | X | X | X | X | |
| SGP-MPI | X | X | X | ||
| RH-WQS | x | X | |||
| Mult DLAG | X | X | X | ||
| MatchAlign | X | X | X | ||
| LWQS | x | X | |||
| GriPS | X | X | X | ||
| DLMtree | X | X | X | ||
| DAG analysis | X | X | |||
| BVSM | X | X | X | ||
| BMIM | X | X | X | X | |
| BKMR-DLM | X | X | X | ||
| BKMR-CMA | X | X | X | X | |
| Bayes Tree Pairs | X | X | X | ||
| ACR | X | X | |||
| SPAMTREE | X | X | X | ||
| FOTP | X | X | X | ||
| BAG | X | X | X | ||
| TEV | X | X | |||
| SCC | X | ||||
| GL-GPs | X | X | |||
| BDS | X | X | |||
| SPORM | X | X | X | X | |
| SiBAR | X | X | |||
| BS3FA | X | X | |||
| NLinteraction | X | X | |||
| Het-DLM | X | ||||
| BMC | X | ||||
| PFA | X | ||||
| PCP | X | ||||
| MrGap | X | ||||
| MixSelect | X | ||||
| GIF-SIS | X | X | |||
| BN2MF | X | ||||
| CVEK | X | X |
1 Research questions following Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? What congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? These can be managed with clustering and dimension reduction methods. (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single column in this table.
2 Method acronyms: ACR: Acceptable Concentration Range model; Bayes Tree Pairs: Bayesian Regression Tree Pairs; BAG: Bag of DAGs; BDS: Bayesian Data Synthesis; BKMR-CMA: Bayesian Kernel Machine Regression-Causal Mediation Analysis; BKMR-DLM: Bayesian Kernel Machine Regression-Distributed Lag Model; BMC: Bayesian Matrix Completion for hypothesis testing; BMIM: Bayesian Multiple Index Model; BN2MF: Bayesian Non-parametric non-negative Matrix Factorization; BS3FA: Bayesian partially supervised sparse and smooth factor analysis; BSSVI: Bayesian subset selection and variable importance for interpretable prediction and classification; BVSM: Bayesian variable selection for understanding mixtures in environmental exposures; CVEK: Cross-validated kernel ensemble; DAG analysis: Directed Acyclic Graphs Analysis; DLMtree: Bayesian Treed Distributed Lag Models; FIN: Factor analysis for interactions; FOTP: Fast, optimal, and targeted predictions using parameterized decision analysis; GIF-SIS: Generalized infinite factor model; GL-GPs: Graph Laplacian based Gaussian Process; GriPS: Computational improvements for Bayesian multivariate regression models based on latent meshed Gaussian processes; Het-DLM: Heterogeneous distributed lag models; LWQS: Lagged Weighted Quantile Sum (WQS) regression; MatchAlign: Resolving rotational ambiguity in matrix sampling; MixSelect: Identifying main effects and interactions among exposures using Gaussian processes; MrGap: Manifold Reconstruction via Gaussian Process; Mult DLAG: Multiple exposure distributed lag models with variable selection; MVNimpute: Imputation of multivariate data by normal model; NLinteraction: Bayesian semiparametric regression with sparsity inducing priors; PCP: Principal Component Pursuit; PFA: Perturbed factor analysis; RH-WQS: Repeated holdout Weighted Quantile Sum (WQS) regression; SCC: Spatiotemporal case-crossover; SGP-MPI: Scalable Gaussian Process regression via Median Posterior Inference; SiBAR: State Informed Background Removal; SPAMTREE: Spatial Multivariate Trees; SPORM: Estimating complex relationship among outcome, biomarkers, and exposures; TEV: Estimation and inference on the explained variation parameter.