Nina Lazarevic1, Adrian G Barnett2, Peter D Sly3, Luke D Knibbs1,4.
Abstract
BACKGROUND: Prenatal exposures to endocrine-disrupting chemicals (EDCs) during critical developmental windows have been implicated in the etiologies of a wide array of adverse perinatal and pediatric outcomes. Epidemiological studies have concentrated on the health effects of individual chemicals, despite the understanding that EDCs act together via common mechanisms, that pregnant women are exposed to multiple EDCs simultaneously, and that substantial toxicological evidence of adverse developmental effects has been documented. There is a move toward multipollutant models in environmental epidemiology; however, there is no current consensus on appropriate statistical methods.
Year: 2019 PMID: 30720337 PMCID: PMC6752940 DOI: 10.1289/EHP2207
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Figure 1. PRISMA flow diagram of the study selection process.
Studies of prenatal coexposures to EDCs, categorized by dimensionality: one aggregate exposure variable, two exposure variables, or three or more exposure variables, presented in sections A, B, and C, respectively. A detailed summary of each study is available in Supplemental Material, Table S3. The number of studies is shown in parentheses.
| Studies | Study design | Goals in mixtures analyses | Statistical methods |
|---|---|---|---|
| A | Prospective birth cohort (20); Case–control (4) | Estimation of mixture health effects (24); Assessment of nonmonotonicity (6) | Generalized linear models and their extensions (23); Factor analysis (1) |
| B | Prospective birth cohort (14); Retrospective cohort (2); Case–control (2); Cross-sectional (1) | Identification of important mixture components (10) | Generalized linear models and their extensions (19) |
| C | Prospective birth cohort (24); Case–control (7) | Identification of important mixture components (26) | Generalized linear models and their extensions (22); Principal components analysis/regression (5); Bayesian hierarchical regression (5); Elastic net regression (2) |
The reported study design may not be specific to the mixture analyses conducted but refers to the overall study design.
The number of studies shown may exceed the totals in each section where studies had multiple goals or used multiple methods.
Although nonmonotonic exposure-response relationships are of interest in EDC studies (Vandenberg et al. 2012), studies using statistical methods for nonlinearity that are able to address both nonlinear monotonic and nonmonotonic relationships were included in the tally for this goal.
Factor analysis used by Ochiai et al. (2014); analysis of variance used by Tran et al. (2016).
Studies that focused on the health effects of only one exposure while controlling for copollutant confounding were included in the tally for this goal if results were reported for multiple exposures.
Gray et al. (2005); Roen et al. (2015); Vejrup et al. (2016).
Claus Henn et al. (2016); Erkin-Cakmak et al. (2015); Kobrosly et al. (2014).
Elastic net regression used by Forns et al. (2016) and Lenters et al. (2015a); Bayesian model averaging used by Forns et al. (2016).
Berg et al. (2016) used partial least squares regression and hierarchical clustering; Krysiak-Baltyn et al. (2012) used partial least squares, support vector machine, and neural network classifiers.
Semiparametric regression used by Claus Henn et al. (2016); structural equation model used by Heilmann et al. (2006).
Novel algorithm for identifying important mixture components, based on ranking p-values and averaging Z-scores, used by Govarts et al. (2016).
Summary of statistical methods.
| Method | Description | Purpose/properties | Goals | Outcome types | Strengths | Caveats or weaknesses | Studies included in review | Further information | Software (open source listed where possible) |
|---|---|---|---|---|---|---|---|---|---|
| Generalized linear models (GLMs) | Generalized linear models with one or a few single or aggregate exposures. | Regression | Identification of important mixture components; interactions; nonlinearity (e.g., using nonlinear additive terms such as polynomials or splines) | Any GLM link function and family; longitudinal/multilevel data | - Well known properties, accessible software. - Straightforward interpretation of coefficients. - Straightforward adjustment for confounding. - Recognized methods for adjusting for multiple testing. | - Multicollinearity issues if exposures are highly correlated. - High false positive rate for single exposure models with correlated exposures. - Not suitable for higher-dimensional data as does not perform variable selection. | See | Dobson and Barnett ( | |
| Ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net regression (ENR) | Shrinkage (penalized regression) methods. LASSO and ENR perform variable selection. ENR also performs grouped selection, where highly correlated exposures tend to be retained or dropped together, by being assigned coefficients of similar magnitude ( | Shrinkage; variable selection | Identification of important mixture components; pairwise interactions | Any | - Robust to multicollinearity. - Lower coefficient variance than ordinary least squares regression. - Greater prediction accuracy has been demonstrated in ENR than LASSO in simulation studies ( | - Biased toward zero; however, for LASSO and ENR, a subsequent unpenalized regression can be used to obtain unbiased effect estimates for the selected variables ( | Lenters et al. ( | Sun et al. ( | |
| Weighted quantile sum regression (WQSR), lagged WQSR | Empirically weighted sums of exposure quantiles are used in a regression model, with weights and parameters estimated by nonlinear programming and bootstrap ( | Variable selection | Identification of important mixture components; estimation of mixture health effects; identification of windows of exposure susceptibility (lagged WQSR). | Any GLM link function and survival data | - Robust to multicollinearity ( | - Loss of information through the use of quantiles ( | None | Carrico et al. ( | |
| Semi-Bayes hierarchical regression (semi-BHR) | Two-level semi-Bayes parametric model that can incorporate prior information on exposure categories and the variance of the effect estimates. | Shrinkage | Identification of important mixture components | Any | - Able to control for traditional and co-pollutant confounders ( | - Quality of prior knowledge determines the validity and efficiency of the analysis ( | Braun et al. ( | Greenland ( | See MacLehose et al. ( |
| Bayesian hierarchical regression (BHR), and BHR with stochastic search variable selection (BHR-SSVS) | A flexible class of models, which can be parametric (e.g., the Bayesian empirical-Bayes model, | Shrinkage; variable selection; clustering; smoothing | Identification of important mixture components; nonlinearity; interaction | Any | - Able to control for traditional and co-pollutant confounders ( | - Specification may require subject-matter knowledge that may not be available ( | Swartz et al. ( | Greenland ( | See MacLehose et al. ( |
| Bayesian kernel machine regression (BKMR) with component-wise or hierarchical variable selection (BKMR-HVS) | Nonparametric method which estimates joint effects of multiple exposures, allows for nonlinearity and interactions, and performs component-wise or hierarchical (grouped) variable selection ( | Smoothing; variable selection; shrinkage | Identification of important mixture components; estimation of mixture health effects; nonlinearity; pairwise and higher-order interactions | Continuous; extension to binary and count outcomes planned for development (Claus Henn in | - Simultaneously estimates mixture health effects while identifying drivers of the association ( | - Computation for very large datasets currently not feasible (see | None | Bobb et al. ( | |
| Structural equation modeling (SEM) | A path modeling approach where observed exposure variables are seen as manifestations of latent true exposure variables, which are then assumed to affect one or more outcome variables (which can themselves also be seen as manifestations of multiple observed outcome variables) ( | Dimension reduction | Identification of important mixture components; estimation of mixture health effects | Continuous | - Can model multiple related outcomes and exposures simultaneously, for example allowing prenatal and postnatal exposures to be analyzed in the same model ( | - Requires greater subject matter knowledge for correct specification than single outcome-exposure models ( | Heilmann et al. ( | Sánchez et al. ( | |
| Principal components analysis (PCA) and regression (PCR) | Unsupervised PCA reduces exposure data to several uncorrelated components (assuming orthogonal rotation), which are then regressed on the outcome. Supervised PCA uses both exposure and outcome data to obtain components, by first selecting variables that are most correlated with the outcome ( | Dimension reduction; variable selection | Estimation of mixture health effects; identification of important mixture components | Any | - Eliminates multicollinearity by reducing data to several uncorrelated components (assuming orthogonal rotation). - Supervised PCA also performs variable selection and provides variable importance scores with fixed ordering ( - Ability to control for confounding (in subsequent regression model using components from PCA). | - Components may not have biologically-relevant interpretations. - Can only be used with continuous exposures (correspondence analysis can be used for categorical exposures). | Forns et al. ( | Hastie et al. ( | |
| Partial least squares regression (PLSR) | A supervised dimension reduction method that constructs latent variables by forming linear combinations of exposures that maximize the covariance between the exposures and the outcome. Sparse PLSR additionally performs variable selection ( | Dimension reduction; variable selection | Estimation of mixture health effects; identification of important mixture components | Continuous, categorical (using PLSR-DA, i.e., Discriminant Analysis) | - Addresses multicollinearity by constructing a set of orthogonal latent variables. | - Adversely affected by the inclusion of irrelevant variables ( | Berg et al. ( | Lenters et al. ( | |
| Generalized additive models (GAMs) and semiparametric models | Commonly used data-driven method for modeling nonlinearity in which the outcome is regressed on a sum of nonparametric smoothing functions ( | Smoothing | Nonlinearity | Any GLM link function and family; survival data | - Flexibility in functional form, does not impose assumptions on the shape of the exposure-response curve ( | - May be affected by concurvity, the nonlinear analog to collinearity; however, this may be addressed by partial GAMs ( | 19 studies used splines and GAMs (e.g., | Eisen et al. ( | |
| Multivariate adaptive regression splines (MARS) | A nonparametric method suitable for higher-dimensional problems that automatically selects variables, models nonlinearity, and identifies interactions ( | Smoothing; variable selection | Interactions; nonlinearity; identification of important mixture components | Continuous; categorical | - Flexibility in functional form. | - May select an exposure somewhat arbitrarily in the presence of high correlation between exposures ( | None | Friedman and Roosen ( | |
| Classification and regression trees (CART) and ensemble methods: stochastic gradient boosting (SGB), random forests (RF), and Bayesian additive regression trees (BART); tree-based distributed lag models | Statistical learning methods that partition data into a tree. Numerous regression trees can be combined to improve predictive performance using ensemble methods. | Classification; variable selection | Identification of important mixture components; pairwise and higher-order interactions; nonlinearity; identification of windows of exposure susceptibility (tree-based distributed lag models) | Continuous; binary | - Does not assume additivity and linearity in exposure-response relationships ( | - CART requires many splits to create a linear or nonlinear association and hence is better suited to categorical exposures ( - Variable importance scores (RF and SGB) reflect predictive performance, not direction or magnitude of exposure-response associations ( | None | Lampa et al. ( | |
| Bayesian model averaging (BMA) | A Bayesian technique that provides weighted average estimates over a set of all possible models rather than requiring selection of a single best model ( | Variable selection; model averaging | Identification of important mixture components | Any GLM link function and family; survival data | - Accounts for uncertainty in model selection. - Supports the inclusion of interactions and nonlinearity. | - Not robust to multicollinearity ( | Forns et al. ( | Sun et al. ( | |
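
To make the shrinkage and grouped-selection behavior described for elastic net regression in the table above more concrete, the following is a minimal sketch in plain Python. It is not the implementation used by any of the reviewed studies. It assumes a continuous outcome, centered exposure columns, no intercept, and a standard coordinate-descent update. The function names, the penalty settings (`lam`, `alpha`), and the five-observation dataset are all hypothetical and chosen only for illustration: exposures 1 and 2 are highly correlated and both drive the outcome, and exposure 3 is unrelated to it.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator arising from the L1 (LASSO) part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=500):
    """Coordinate descent for the objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).

    X is a list of rows. Columns are assumed to be centered and non-constant,
    and no intercept is fitted (illustrative assumptions, not from the source).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of exposure j with the partial residual, i.e. the
            # outcome minus the fitted contribution of all other exposures.
            rho = 0.0
            for i in range(n):
                fit_others = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            rho /= n
            col_ss = sum(X[i][j] ** 2 for i in range(n)) / n
            # L1 part thresholds the coefficient; L2 part shrinks it further.
            b[j] = soft_threshold(rho, lam * alpha) / (col_ss + lam * (1 - alpha))
    return b


# Hypothetical, already-centered exposure data (rows are participants).
X = [
    [-2.0, -2.1, 1.0],
    [-1.0, -0.9, -2.0],
    [0.0, 0.0, 2.0],
    [1.0, 0.9, -2.0],
    [2.0, 2.1, 1.0],
]
# Outcome generated as exposure 1 + exposure 2; exposure 3 plays no role.
y = [row[0] + row[1] for row in X]

coef = elastic_net(X, y, lam=0.1, alpha=0.5)
print([round(c, 3) for c in coef])
```

In this toy setting the two correlated exposures are retained together with coefficients of similar magnitude, while the unrelated exposure is set exactly to zero, which is the grouped-selection and variable-selection behavior the table attributes to ENR. The retained coefficients are shrunk toward zero, so they should not be read as unbiased effect estimates.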