
Statistical Methodology in Studies of Prenatal Exposure to Mixtures of Endocrine-Disrupting Chemicals: A Review of Existing Approaches and New Alternatives.

Nina Lazarevic1, Adrian G Barnett2, Peter D Sly3, Luke D Knibbs1,4.   

Abstract

BACKGROUND: Prenatal exposures to endocrine-disrupting chemicals (EDCs) during critical developmental windows have been implicated in the etiologies of a wide array of adverse perinatal and pediatric outcomes. Epidemiological studies have concentrated on the health effects of individual chemicals, despite the understanding that EDCs act together via common mechanisms, that pregnant women are exposed to multiple EDCs simultaneously, and that substantial toxicological evidence of adverse developmental effects has been documented. There is a move toward multipollutant models in environmental epidemiology; however, there is no current consensus on appropriate statistical methods.
OBJECTIVES: We aimed to review the statistical methods used in these studies, to identify additional applicable methods, and to determine the strengths and weaknesses of each method for addressing the salient statistical and epidemiological challenges.
METHODS: We searched Embase, MEDLINE, and Web of Science for epidemiological studies of endocrine-sensitive outcomes in the children of mothers exposed to EDC mixtures during pregnancy and identified alternative statistical methods from the wider literature.
DISCUSSION: We identified 74 studies and analyzed the methods used to estimate mixture health effects, identify important mixture components, account for nonmonotonicity in exposure–response relationships, assess interactions, and identify windows of exposure susceptibility. We identified both frequentist and Bayesian methods that are robust to multicollinearity, performing shrinkage, variable selection, dimension reduction, statistical learning, or smoothing, including methods that were not used by the studies included in our review.
CONCLUSIONS: Compelling motivation exists for analyzing EDCs as mixtures, yet many studies make simplifying assumptions about EDC additivity, relative potency, and linearity, or overlook the potential for bias due to asymmetries in chemical persistence. We discuss the potential impacts of these choices and suggest alternative methods to improve analyses of prenatal exposure to EDC mixtures. https://doi.org/10.1289/EHP2207.


Year:  2019        PMID: 30720337      PMCID: PMC6752940          DOI: 10.1289/EHP2207

Source DB:  PubMed          Journal:  Environ Health Perspect        ISSN: 0091-6765            Impact factor:   9.031


Introduction

A growing body of evidence suggests that prenatal exposures to environmental chemicals during critical windows of development may have lasting effects throughout the life course (Gillman 2005; Soto et al. 2008; Stroustrup and Swan 2013; Woodruff et al. 2009). Prenatal exposures to endocrine-disrupting chemicals (EDCs) are of particular concern because they can interfere with the synthesis, secretion, binding, transport, and metabolism of endogenous hormones that are involved in regulating developmental processes (Diamanti-Kandarakis et al. 2009; Meeker 2012). Exposure to EDCs during gestation can also induce epigenetic changes that have transgenerational effects, which have been observed in the children of women prenatally exposed to the miscarriage-prevention drug diethylstilbestrol (Diamanti-Kandarakis et al. 2009; Grandjean et al. 2015; Meeker 2012). Of concern is that the exposure levels required for endocrine disruption during gestation can be infinitesimally small (as low as picomolar serum concentrations) (Vandenberg et al. 2012); and, unexpectedly, lower doses may have more potent effects than higher doses (Diamanti-Kandarakis et al. 2009). The developing fetus is uniquely sensitive to chemicals that mimic hormones because the protective mechanisms existing in adulthood are not completely functional in utero (Newbold et al. 2009; Romano et al. 2014). Hence, prenatal exposures to EDCs are thought to be implicated in the etiologies of several adverse perinatal and childhood outcomes, such as decreased fetal growth and length of gestation, reproductive tract defects, altered pubertal development, neurodevelopmental dysfunctions, obesity, and diabetes (Silver and Meeker 2015; Skakkebaek et al. 2011). This matter is of critical public health importance because the prevalence of many of these endocrine-related disorders and diseases has been increasing over time, a trend that cannot be explained solely by genetic factors (Darbre 2015; Skakkebaek et al. 2011). 
A broad range of environmental chemicals are likely to be EDCs; many are high-production-volume chemicals: organochlorine pesticides, industrial chemicals and their byproducts, flame retardants, surfactants, plastics and plasticizers, and some metals (Diamanti-Kandarakis et al. 2009; Meeker 2012). Consequently, exposure to EDCs is widespread, multisource, and multiroute (Meeker 2012), and biomonitoring studies indicate that the pattern of exposure in pregnant women and neonates is chronic, low-dose, and involves multiple chemicals simultaneously rather than individual agents (Mitro et al. 2015; Stroustrup and Swan 2013; Woodruff et al. 2011). However, epidemiological studies of EDCs have concentrated on the health effects of individual chemicals, an approach that may not be suited to the study of chemicals that behave like hormones (Gaudriault et al. 2017; Kortenkamp 2007). Multiple EDCs can act via a common mechanism to produce an outcome (e.g., binding to a particular type of hormone receptor), which suggests that chemicals acting together can produce the same outcome at lower concentrations than each chemical would require on its own (Darbre 2015). A corollary of this postulation is that different mixtures of EDCs are also able to produce a common outcome via a common mechanism (Darbre 2015). This has been confirmed in in vitro experimental studies, which have shown that combinations of xenoestrogens can produce significant effects even when each individual chemical is present at a dose below its no-observed-effects level (NOEL; Rajapakse et al. 2002; Silva et al. 2002). Additionally, there is substantial toxicological evidence of adverse effects in vivo on both prenatal and postnatal development due to mixtures of EDCs, even with individual chemicals at levels below concentrations that cause observable effects (e.g., below the NOEL; Al-Gubory 2014; Kortenkamp 2008). 
Such a mixture effect was also recently observed in an epidemiological study of breast cancer using a novel biomarker of combined xenoestrogen exposure; a strong positive relationship was observed for the mixture though individual xenoestrogens showed no associations (Pastor-Barriuso et al. 2016), casting doubt on the null findings of meta-analyses of breast cancer risk and individual xenoestrogens such as p,p’-DDE (Ingber et al. 2013; Kortenkamp 2006, 2008). These findings suggest that epidemiological studies of individual chemicals with null findings may have substantially underestimated the risks of exposure to EDCs (Kortenkamp 2007). Numerous authors have expressed a need for epidemiological studies to move beyond analyzing the health effects of individual chemicals toward the study of chemical mixtures (e.g., Braun et al. 2016; Carlin et al. 2013; Kortenkamp 2007). The study of chemical mixtures is one of the ongoing research priorities of the U.S. National Institute of Environmental Health Sciences (NIEHS) (Carlin et al. 2013) and is recognized by the U.S. National Academy of Sciences, which recommends that risk assessments consider the possibility of cumulative effects from multiple chemical exposures (Mitro et al. 2015; Woodruff et al. 2011). However, progress has been thwarted by the complexities of such studies: They place a high demand on the quantity and quality of data and require advanced statistical methods, the development of which is an active area of research (Billionnet et al. 2012; Braun et al. 2016; Sun et al. 2013; Taylor et al. 2016). 
The statistical challenges are considerable: High correlation between exposures can lead to inflated standard errors and instability in effect estimates; differing degrees of measurement error among exposures can bias effect estimates; and there may be insufficient power to estimate small effects in the presence of measurement error, multicollinearity, small sample size, interactions, and nonlinearity (Billionnet et al. 2012; Sun et al. 2013). The study of chemical mixtures also addresses some of the shortcomings of single-chemical epidemiology: effect estimates may be biased in the presence of copollutant confounding (Braun et al. 2016), and analyzing correlated exposures in separate models can inflate false-positive discoveries. One exposure can also modify the effects of another [e.g., bisphenol A (BPA) can increase the expression of estrogen receptors and make cells more vulnerable to other EDCs; Brieño-Enríquez et al. 2012; Hayes et al. 2016], making it difficult to infer combined effects from knowledge of individual effects (Braun et al. 2016; Kortenkamp 2007). Moreover, the choice of chemicals has typically been restricted to those of known concern or those measurable with existing methods, meaning that studies may suffer from the streetlight effect (Braun et al. 2016). These issues have important regulatory implications if health effects have been incorrectly attributed to one exposure rather than to a correlated harmful exposure (Braun et al. 2016; Lenters et al. 2015a). Studies of chemical mixtures provide effect estimates that more closely correspond to observed exposure patterns; they may enable better targeted interventions by identifying which exposures out of a set are the most detrimental to health; and they can reveal how exposures interact (i.e., whether their effects are additive, antagonistic, or synergistic) (Braun et al. 2016; Sun et al. 2013). 
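The first of these challenges can be made concrete with the variance inflation factor (VIF), which quantifies how much the sampling variance of a regression coefficient grows when exposures are correlated. The simulation below is an illustrative sketch on simulated data, not drawn from any reviewed study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two simulated exposures driven by a common source, hence highly correlated.
shared = rng.normal(size=n)
x1 = shared + 0.3 * rng.normal(size=n)
x2 = shared + 0.3 * rng.normal(size=n)

# Variance inflation factor for x1 given x2: 1 / (1 - r^2).
# The standard error of x1's regression coefficient grows by sqrt(VIF)
# relative to the uncorrelated case.
r = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r**2)
```

With the correlation of roughly 0.9 induced here, the VIF exceeds 5, i.e., the coefficient's standard error more than doubles.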
There is no current consensus on the appropriate statistical methodology for use in epidemiological studies of chemical mixtures (Braun et al. 2016). The existing methodological reviews have concentrated on multipollutant methods in air pollution epidemiology and the analysis of exposome-health associations (Agier et al. 2016; Billionnet et al. 2012; Davalos et al. 2016; Patel 2017; Stafoggia et al. 2017; Sun et al. 2013), with the latter focusing on an exploratory and hypothesis-generating search for important exposures in higher dimensional datasets rather than etiologic research questions. Moreover, the analysis of EDCs requires methods that can account for nonmonotonicity in exposure–response relationships (i.e., changes in the sign of the slope are possible; Vandenberg et al. 2012). Statistical methods for multiple correlated exposures are also required for omics data; however, the focus there is on methods suitable for ill-conditioned problems where the number of parameters may exceed the sample size (Chadeau-Hyam et al. 2013). A wide array of statistical methods were presented and compared in a 2015 workshop hosted by the NIEHS, where participants stressed the importance of choosing a method based on the specific research question and the need for improving existing methods with subject-specific knowledge (NIEHS 2015; Taylor et al. 2016). Subsequently, Braun et al. (2016) explicated three research questions that can be addressed by mixtures analyses, including the estimation of mixture health effects, and the identification of important mixture components and their interactions. Recent simulation studies have compared subsets of the available methods for identifying important exposures and interactions (Agier et al. 2016; Barrera-Gómez et al. 2017; Lenters et al. 2018; Sun et al. 2013). No single method has emerged as consistently superior, and choice of method may be context-dependent (NIEHS 2015; Sun et al. 2013). 
We conducted a review to identify the statistical methods used in studies of prenatal coexposures to EDCs and identified additional applicable methods. We compared methods within the three research questions explicated by Braun et al. (2016), as well as in their ability to account for nonmonotonicity in exposure–response relationships and to assess windows of exposure susceptibility in a multipollutant context. We aimed to arrive at an improved understanding of the methods used for studying EDC mixtures, along with an analysis of issues affecting method choice that may help to inform methodological decisions in future studies of prenatal exposure to EDC mixtures.

Methods

We searched Embase, MEDLINE, and Web of Science for observational studies of prenatal exposure to EDC mixtures and pregnancy or childhood outcomes published between 1997 and September/October 2016 (see Tables S1 and S2 in the Supplemental Material for Embase/MEDLINE and Web of Science search queries, respectively). Search terms specified both known and suspected EDCs and included generic terms such as “endocrine disrupt*,” “xenobiotic,” and “pesticide.” We also included a combination of general and specific search terms related to the prenatal and perinatal period and endocrine-sensitive outcomes. Both outcome and exposure search terms were chosen from reviews of EDCs, pregnancy outcomes, and child health (Meeker 2012; Silver and Meeker 2015; Slama and Cordier 2010). Search terms specifying the study of multiple chemicals included “multipollutant,” “co-pollutant,” “co-exposure,” and terms such as “combin*,” “mixture,” or “multip*,” in proximity to terms such as “exposure,” “chemical,” “pollutant,” “toxicant,” or “effects.” Only English-language primary studies of human health were eligible. Web of Science results were additionally limited to public, environmental, and occupational health. We screened abstracts for studies that measured prenatal (maternal) exposure to more than one individual EDC and assessed associations with outcomes in offspring. Studies that assessed early postnatal exposure through breast milk, as a proxy for prenatal exposure, were also examined. We then assessed full-text articles to identify studies in which exposures were analyzed in combination rather than separately. We excluded studies that measured only exposure to classes of chemicals (e.g., through job-exposure matrices, questionnaires, or biomarkers of cumulative exposure) or considered only maternal outcomes. We supplemented database searches with studies identified through manual searching. 
Our review was conducted using a systematic approach consistent with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009). For each eligible study, we recorded the statistical methods used and the goals of the analysis: only control for copollutant confounding (where one exposure was the focus), estimation of mixture health effects (i.e., joint or cumulative health effects from multiple chemical exposures), identification of important mixture components, assessment of nonmonotonicity in exposure–response relationships, assessment of statistical interactions (i.e., effect measure modification by coexposures), and identification of windows of exposure susceptibility. Although nonmonotonic exposure–response relationships are of interest in EDC studies (Vandenberg et al. 2012), we discuss statistical methods for nonlinearity that are able to address both nonlinear monotonic and nonmonotonic relationships. We recorded the overall study design reported by the authors of each study, which may not be specific to the mixture analyses conducted. We categorized eligible studies by dimensionality (one aggregate-, two-, or three or more exposure variables). In addition to the review of studies that met our eligibility requirements, we provide a narrative review of methods that are available for mixtures analyses, including methods that were not used in the studies we reviewed. For each method, we recorded the outcome variable types that can be analyzed (continuous, dichotomous, categorical, count, survival, repeated measures) and recorded whether the method can address the analytical goals and challenges of mixtures analyses, including multicollinearity.

Results

Study Selection Process

We identified 942 unique records through database and manual searches, of which 112 full-text studies were assessed for eligibility, and 38 studies were excluded for the reasons outlined in Figure 1. The 74 studies that met our eligibility criteria are listed in Table 1, and a detailed summary of each study is available in Supplemental Material, Table S3. These tables are divided into three sections according to dimensionality: studies of one aggregate-, two-, or three or more exposure variables are listed in sections A, B, and C, respectively.
Figure 1.

PRISMA flow diagram of the study selection process.

Table 1

Studies of prenatal coexposures to EDCs, categorized by dimensionality: one aggregate-, two-, three or more exposure variables presented in sections A, B, and C, respectively. A detailed summary of each study is available in Supplemental Material, Table S3. Number of studies is shown in parentheses.

SECTION A. Studies that analyzed only one aggregate exposure variable at a time
Studies: 24 studies (Barrett et al. 2016; Buscail et al. 2015; Chevrier et al. 2012; Damgaard et al. 2006; Gascon et al. 2015b; Harley et al. 2011; Jusko et al. 2010; Konishi et al. 2009; Koskenniemi et al. 2015; Miller et al. 2004; Naksen et al. 2015; Neugebauer et al. 2015; Ochiai et al. 2014; Park et al. 2009; Philippat et al. 2012; Polanska et al. 2014; Rosa et al. 2011; Sonneborn et al. 2008; Sturza et al. 2016; Tang-Péronard et al. 2015; Tran et al. 2016; Wickerham et al. 2012; Wolff et al. 2008; Yolton et al. 2013)
Study design^a: Prospective birth cohort (20); Case–control (4)
Goals in mixtures analyses^b: Estimation of mixture health effects (24); Assessment of nonmonotonicity (6)^c; Identification of windows of exposure susceptibility (6)
Statistical methods^b: Generalized linear models and their extensions (23); Factor analysis (1)^d; Analysis of variance (1)^d

SECTION B. Studies that analyzed two individual or aggregate exposure variables at a time
Studies: 19 studies (Casas et al. 2015; Gray et al. 2005; Huel et al. 2004; Huen et al. 2014; Kim et al. 2013; Kippler et al. 2012; LaRocca et al. 2016; Lignell et al. 2013; Limousi et al. 2014; Migeot et al. 2013; Roen et al. 2015; Stroustrup et al. 2016; Suzuki et al. 2010; Symanski et al. 2016; Vejrup et al. 2016; Whyatt et al. 2004; Windham et al. 2006; Wojtyniak et al. 2010; Yorifuji et al. 2011)
Study design^a: Prospective birth cohort (14); Retrospective cohort (2); Case–control (2); Cross-sectional (1)
Goals in mixtures analyses^b: Identification of important mixture components (10)^e; Assessment of interactions between exposures (9); Estimation of mixture health effects (7); Assessment of nonmonotonicity (6)^c; Identification of windows of exposure susceptibility (4); Only control for copollutant confounding, where one exposure is the focus (3)^f
Statistical methods^b: Generalized linear models and their extensions (19)

SECTION C. Studies that analyzed three or more individual or aggregate exposure variables at a time
Studies: 31 studies (Agay-Shay et al. 2015; Berg et al. 2016; Braun et al. 2014; Buckley et al. 2016a, 2016b; Casas et al. 2016; Chevrier et al. 2011; Claus Henn et al. 2016; Erkin-Cakmak et al. 2015; Fernández et al. 2007; Forns et al. 2016; Gascon et al. 2015a; Govarts et al. 2016; Heilmann et al. 2006; Jacobson et al. 2015; Kobrosly et al. 2014; Krysiak-Baltyn et al. 2012; Lenters et al. 2015a; Liew et al. 2015; Maresca et al. 2016; Marques et al. 2014; Rodrigues et al. 2016; Rull et al. 2006; Stewart et al. 2008; Swartz et al. 2015; Talbott et al. 2015; Tatsuta et al. 2014; Vafeiadi et al. 2014; Vafeiadi et al. 2015; Valvi et al. 2012; Xu et al. 2015)
Study design^a: Prospective birth cohort (24); Case–control (7)
Goals in mixtures analyses^b: Identification of important mixture components (26)^e; Estimation of mixture health effects (23); Assessment of nonmonotonicity (17)^c; Identification of windows of exposure susceptibility (8); Assessment of interactions between exposures (4); Only control for copollutant confounding, where one exposure is the focus (3)^g
Statistical methods^b: Generalized linear models and their extensions (22); Principal components analysis/regression (5); Bayesian hierarchical regression (5); Elastic net regression (2)^h; Partial least squares regression (2)^i; Bayesian model averaging (1)^h; Hierarchical clustering (1)^i; Machine learning classifiers (1)^i; Semiparametric regression (1)^j; Structural equation modeling (1)^j; Novel algorithm (1)^k

^a The reported study design may not be specific to the mixture analyses conducted but refers to the overall study design.
^b The number of studies shown may exceed the totals in each section where studies had multiple goals or used multiple methods.
^c Although nonmonotonic exposure–response relationships are of interest in EDC studies (Vandenberg et al. 2012), studies using statistical methods for nonlinearity that are able to address both nonlinear monotonic and nonmonotonic relationships were included in the tally for this goal.
^d Factor analysis used by Ochiai et al. (2014); analysis of variance used by Tran et al. (2016).
^e Studies that focused on the health effects of only one exposure while controlling for copollutant confounding were included in the tally for this goal if results were reported for multiple exposures.
^f Gray et al. (2005); Roen et al. (2015); Vejrup et al. (2016).
^g Claus Henn et al. (2016); Erkin-Cakmak et al. (2015); Kobrosly et al. (2014).
^h Elastic net regression used by Forns et al. (2016) and Lenters et al. (2015a); Bayesian model averaging used by Forns et al. (2016).
^i Berg et al. (2016) used partial least squares regression and hierarchical clustering; Krysiak-Baltyn et al. (2012) used partial least squares, support vector machine, and neural network classifiers.
^j Semiparametric regression used by Claus Henn et al. (2016); structural equation model used by Heilmann et al. (2006).
^k Novel algorithm for identifying important mixture components, based on ranking p-values and averaging z-scores, used by Govarts et al. (2016).


Study Designs

The majority of studies were prospective birth cohort studies (58 studies, 78%), including two studies of pooled cohorts (Buckley et al. 2016a; Casas et al. 2015), followed by case–control studies (13 studies, 18%), including one case–cohort study (Chevrier et al. 2011) and one analysis of pooled data from two case–control studies (Rull et al. 2006) (Tables 1 and S3). There were also two retrospective cohort studies (Limousi et al. 2014; Migeot et al. 2013) and one cross-sectional study (Huel et al. 2004).

Outcomes

Perinatal outcomes included fetal growth (e.g., birth weight, length, and head circumference) and gestational age (22 studies, 30%), and birth defects, such as neural tube defects, cryptorchidism, and hypospadias (7 studies, 9%) (Table S3). Other perinatal outcomes included anogenital distance (Barrett et al. 2016), neonatal neurobehavior (Suzuki et al. 2010), enzyme activity (Huel et al. 2004), neonatal thyroid stimulating hormone concentrations (Berg et al. 2016), and placental miRNA expression (LaRocca et al. 2016). Neurodevelopmental outcomes were the most common pediatric outcomes (22 studies, 30%), including measures of intelligence, cognitive and psychomotor development, behavior, autism, and attention deficit hyperactivity disorder (Table S3). Eight studies (11%) assessed pediatric measures of body mass (e.g., body mass index, waist circumference) and cardiometabolic markers (e.g., leptin; Tang-Péronard et al. 2015; Vafeiadi et al. 2015). Four studies (5%) examined respiratory and atopic outcomes in infancy and childhood, including asthma and eczema (Gascon et al. 2015a; Miller et al. 2004; Ochiai et al. 2014; Rosa et al. 2011). Other pediatric outcomes were postvaccination antibody response (Heilmann et al. 2006; Jusko et al. 2010), auditory outcomes (Buscail et al. 2015; Sturza et al. 2016), and leukemia (Symanski et al. 2016). DNA methylation at birth and in childhood was considered in one study (Huen et al. 2014). Most studies considered outcomes that were continuous (44 studies), dichotomous (19), or both (9) (Table S3). Few studies considered polytomous (Rull et al. 2006; Stroustrup et al. 2016) and count (Gascon et al. 2015b; Miller et al. 2004; Roen et al. 2015) outcomes. Nine studies analyzed an outcome measured at multiple time points in a longitudinal analysis (Braun et al. 2014; Buckley et al. 2016a, 2016b; Casas et al. 2016; Erkin-Cakmak et al. 2015; Gascon et al. 2015a; Huen et al. 2014; Kippler et al. 2012; Maresca et al. 2016) (Table S3).

Individual and Aggregate Exposures

Exposures were primarily measured using biomarkers (63 studies, 85%) and were analyzed individually or in aggregate to reduce dimensionality (Table S3). Exposures were typically aggregated by raw or weighted sums, such as the toxic equivalence quotient (TEQ) or molar sum. Exposures were summed only if they shared similar physicochemical properties. Common aggregate variables were sums of polychlorinated biphenyl (PCB) congeners (in total or separated into dioxin-like and nondioxin-like congeners) (e.g., Jusko et al. 2010; Stewart et al. 2008; Vafeiadi et al. 2014), sums of phthalate metabolites (grouped into di(2-ethylhexyl) phthalate (DEHP) metabolites, low- and high-molecular-weight phthalate metabolites) (e.g., Chevrier et al. 2012; LaRocca et al. 2016; Wolff et al. 2008), TEQ of polychlorinated dibenzodioxins and dibenzofurans (PCDD/Fs) (e.g., Konishi et al. 2009; Koskenniemi et al. 2015; Neugebauer et al. 2015), and sums of polybrominated diphenyl ether (PBDE) congeners (e.g., Harley et al. 2011; Huen et al. 2014; Lignell et al. 2013). The health effects of sums of parabens (Chevrier et al. 2012; LaRocca et al. 2016; Philippat et al. 2012), polycyclic aromatic hydrocarbons (PAHs) (e.g., Miller et al. 2004; Polanska et al. 2014; Rosa et al. 2011), and dialkylphosphate (DAP) metabolites (e.g., Naksen et al. 2015; Yolton et al. 2013) were also analyzed. Exposures were also aggregated as counts of individual chemicals (e.g., above detection limits), particularly in studies of pesticides (e.g., Sturza et al. 2016; Wickerham et al. 2012).
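The molar-sum aggregation described above divides each metabolite concentration by its molecular weight before summing, so that metabolites contribute on a common molar scale. A minimal sketch for DEHP metabolites follows; the concentrations are made-up examples and the molecular weights are approximate, not values taken from any reviewed study:

```python
# Molar-sum aggregation of DEHP metabolite concentrations.
# Concentrations (ng/mL) are made-up examples; molecular weights (g/mol,
# equivalently ng/nmol) are approximate.
concentrations = {"MEHP": 4.2, "MEHHP": 18.5, "MEOHP": 12.1}      # ng/mL
molecular_weight = {"MEHP": 278.3, "MEHHP": 294.3, "MEOHP": 292.3}  # g/mol

# Dividing ng/mL by ng/nmol yields nmol/mL; summing puts all metabolites
# on a common molar scale for use as a single aggregate exposure variable.
molar_sum = sum(c / molecular_weight[m] for m, c in concentrations.items())
```

A weighted sum such as the TEQ follows the same pattern, with toxic-equivalency factors replacing the reciprocal molecular weights.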

Estimation of EDC-Mixture Health Effects

Of the 54 studies (73%) estimating mixture health effects, 51 studies estimated the effects of one or multiple aggregate exposure variables concurrently (24 studies in Section A, 7 studies in Section B, and 20 studies in Section C, Table S3). For example, summed concentrations of PCBs and summed PBDEs in breast milk were analyzed in a model of birth weight (Lignell et al. 2013). The majority of these studies used generalized linear models (GLMs) and their extensions, including conditional logistic regression for a nested case–control study (Chevrier et al. 2012). Studies performing longitudinal analyses used population-averaged models estimated by generalized estimating equations (e.g., Gascon et al. 2015a) and linear or logistic mixed effects regression models (e.g., Buckley et al. 2016a; Huen et al. 2014). Studies that examined a large number of exposures used dimension reduction methods to assess mixture health effects (Section C, Tables 1 and S3). The most common were principal components analysis (PCA) (Agay-Shay et al. 2015; Forns et al. 2016; Govarts et al. 2016; Maresca et al. 2016; Talbott et al. 2015) and partial least squares regression (PLSR) (Berg et al. 2016; Krysiak-Baltyn et al. 2012). Methods used to partition exposures into groups included hierarchical clustering (Berg et al. 2016) and factor analysis (Ochiai et al. 2014). The health effects of predefined exposure groups were estimated using a structural equation model (SEM) (Heilmann et al. 2006).
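The PCA-based strategy used by several of these studies can be sketched as principal components regression: correlated exposures are replaced by a few component scores, which then enter an ordinary regression model. The simulated design below (six exposures sharing one latent source) is illustrative only, not a reconstruction of any reviewed analysis:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Six simulated exposures driven by one shared latent source, hence correlated.
latent = rng.normal(size=(n, 1))
X = latent + 0.5 * rng.normal(size=(n, 6))
y = 0.3 * latent[:, 0] + rng.normal(scale=0.5, size=n)

# Standardize, extract two principal-component scores, and regress the
# outcome on the scores instead of on the collinear raw exposures.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)
fit = LinearRegression().fit(scores, y)
```

Because the exposures share one source, the first component captures most of the shared variance, and its score acts as a summary mixture exposure.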

Identification of Important Mixture Components

Almost half of the studies (36 studies, 49%) aimed to identify important individual exposures or subsets of exposures within a mixture (Tables 1 and S3). Within this group, GLMs were used by all ten of the studies that considered only two exposure variables at a time (Section B), and by 16 studies that analyzed more than two exposure variables at a time (Section C). Studies analyzing problems of greater dimensionality used variable selection strategies (Section C, Tables 1 and S3). Elastic net regression (ENR), a frequentist shrinkage method, was used by Forns et al. (2016) and Lenters et al. (2015a). Both studies capitalized on the variable selection feature of ENR and then used the corresponding unpenalized regression methods to obtain unbiased effect estimates for the selected variables. Forns et al. (2016) compared their ENR estimates with those estimated by Bayesian model averaging (BMA), a method that accounts for uncertainty in model selection. In another approach, Govarts et al. (2016) presented a novel algorithm for identifying important subsets of exposures, based on ranking p-values and averaging z-scores from single pollutant models for all possible combinations of 16 exposures. Shrinkage estimators in a semi-Bayesian hierarchical regression (semi-BHR) setting were used by Braun et al. (2014) and Rull et al. (2006), in which coefficients were shrunk toward the mean of their exchangeability group, representing chemicals with similar physicochemical properties (e.g., phthalate metabolites grouped by parent compounds). Bayesian hierarchical regression (BHR) models were also estimated by Buckley et al. (2016a, 2016b), relating multimetabolite phthalate exposures to measures of body mass. BHR with stochastic search variable selection (BHR-SSVS) was used to identify important air toxicants associated with spina bifida (Swartz et al. 2015).
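The two-stage strategy used by Forns et al. (2016) and Lenters et al. (2015a), penalized selection followed by an unpenalized refit, can be sketched as follows. The simulated design (ten exposures, one true effect) and the use of scikit-learn's ElasticNetCV are illustrative assumptions, not a reconstruction of either analysis:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 300, 10
X = rng.normal(size=(n, p))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * X[:, 1]   # induce correlation between exposures
y = 1.0 * X[:, 0] + rng.normal(size=n)    # only exposure 0 truly affects y

# Stage 1: elastic net with cross-validated penalty performs variable
# selection; nonzero coefficients mark the selected exposures.
Z = StandardScaler().fit_transform(X)
enet = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(Z, y)
selected = np.flatnonzero(enet.coef_)

# Stage 2: refit an unpenalized model on the selected subset to obtain
# effect estimates free of shrinkage bias.
refit = LinearRegression().fit(Z[:, selected], y)
```

Note that the elastic net's grouping behavior may retain correlated copollutants alongside the truly causal one, which is one reason the reviewed studies report the refitted, unpenalized estimates.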

Nonmonotonicity in Exposure–Response Relationships

Twenty-nine studies reported assessing whether exposure–response relationships were nonlinear (39%; Tables 1 and S3). Of these, 19 studies used splines and generalized additive models (GAMs) to assess nonlinearity. Other studies analyzed exposure quantiles or categories instead of continuous variables (15 studies; e.g., Buckley et al. 2016b; Erkin-Cakmak et al. 2015; Wolff et al. 2008) or used locally weighted scatterplot smoothers (LOWESS) (Harley et al. 2011; Kippler et al. 2012). A further nine studies used categorical exposure variables in their analyses without reporting that they were assessing nonlinearity. Some studies additionally performed hypothesis tests of trends across exposure categories (seven studies; e.g., Casas et al. 2015; Tang-Péronard et al. 2015). Fourteen studies used a method that may address nonlinearity in a mixture (Table S3). Of these, 11 studies analyzed exposure quantiles or categories, rather than continuous variables, in a multipollutant model (e.g., Agay-Shay et al. 2015; Claus Henn et al. 2016; Valvi et al. 2012). Four studies modeled nonlinear additive relationships among exposures in a mixture using polynomials (Rodrigues et al. 2016) or splines (Buckley et al. 2016a; Claus Henn et al. 2016; Kippler et al. 2012). For example, Kippler et al. (2012) used a piecewise linear spline with one knot to model an inverse-V-shaped relationship between cadmium exposure and fetal size, controlling for arsenic exposure.
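The one-knot piecewise linear spline used by Kippler et al. (2012) can be sketched with the basis [1, x, (x − knot)+] fit by least squares; the data, knot location, and slopes below are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=400)
knot = 5.0

# Simulated inverse-V-shaped exposure-response: slope +1 before the knot,
# slope -0.8 after it, plus noise.
y = (np.minimum(x, knot)
     - 0.8 * np.maximum(x - knot, 0.0)
     + rng.normal(scale=0.5, size=400))

# One-knot linear spline basis: [1, x, (x - knot)_+]. The single extra
# column lets the slope change at the knot while keeping the fit continuous.
basis = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])
beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
slope_before = beta[1]
slope_after = beta[1] + beta[2]
```

The fitted slopes recover the sign change on either side of the knot, the defining feature of a nonmonotonic relationship that a single linear term would average away.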

Interactions between Chemical Exposures

Interactions between chemical exposures were assessed in 13 studies (18%; Tables 1 and S3). These were predominantly studies of two or three exposure variables only, using GLMs with product terms between pairs of exposures to assess two-way interactions. Three studies considered interactions between pairs of aggregate exposure variables or between an aggregate exposure and individual chemicals; e.g., Huen et al. (2014) examined interactions between summed PBDE congeners and dichlorodiphenyl-trichloroethylene/dichloroethane (DDT/E) compounds in relation to DNA methylation.
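The product-term approach used by most of these studies amounts to adding a multiplicative column to the design matrix. A minimal illustration on simulated data (ordinary least squares via numpy; all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)   # hypothetical exposure 1
x2 = rng.normal(size=n)   # hypothetical exposure 2
# Simulated truth includes a synergistic two-way interaction
y = 0.5 * x1 + 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix: intercept, main effects, and the product term
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
interaction_estimate = beta[3]   # coefficient on the product term
```

With correlated exposures, however, the product term itself becomes collinear with the main effects, which is one reason interactions are rarely assessed in larger mixtures (see the multicollinearity discussion below).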

Windows of Exposure Susceptibility

Eighteen studies (24%) analyzed exposure at multiple times to investigate etiological windows of exposure susceptibility (Tables 1 and S3). Most studies analyzed exposure windows in separate models (10 studies), predominantly using GLMs. For example, Jusko et al. (2010) estimated the health effects of PCBs measured in maternal serum at delivery, cord serum, and 6-month infant serum, on postvaccination antibody response, in separate linear regression models. Eight studies analyzed exposure measured at multiple prenatal/postnatal times in the same model. Six of these studies used GLMs. For example, Roen et al. (2015) estimated the effects of prenatal and postnatal BPA exposure on child behavior, controlling for prenatal mono-n-butyl phthalate metabolite (MnBP) exposure. To address the possibility of multicollinearity when analyzing multiple exposures at multiple time points, Maresca et al. (2016) used PCA to obtain prenatal and postnatal phthalate concentration component scores, on which the outcome was then regressed. In another approach, Heilmann et al. (2006) used a structural equation model (SEM) with latent variables for both prenatal and postnatal exposure to assess cumulative health effects of PCBs on postvaccination antibody response, allowing prenatal exposure to be a determinant of postnatal exposure.

Discussion

We reviewed studies of prenatal exposure to EDC mixtures to determine the analytical strategies and statistical methods that have been employed and concurrently identified several novel and applicable statistical methods. We now compare the ability of each method to address the salient epidemiological and statistical challenges. A detailed summary of each method is presented in Table 2, along with software suggestions.
Table 2

Summary of statistical methods.

Each entry lists: Method; Description; Purpose/properties; Goals; Outcome types; Strengths; Caveats or weaknesses; Studies included in review; Further information; Software (open source listed where possible).

Method: Generalized linear models (GLMs)
Description: Generalized linear models with one or a few single or aggregate exposures.
Purpose/properties: Regression
Goals: Identification of important mixture components; interactions; nonlinearity (e.g., using nonlinear additive terms such as polynomials or splines)
Outcome types: Any GLM link function and family; longitudinal/multilevel data
Strengths: - Well-known properties, accessible software. - Straightforward interpretation of coefficients. - Straightforward adjustment for confounding. - Recognized methods for adjusting for multiple testing.
Caveats or weaknesses: - Multicollinearity issues if exposures are highly correlated. - High false positive rate for single-exposure models with correlated exposures. - Not suitable for higher-dimensional data, as it does not perform variable selection.
Studies included in review: See Tables 1 and S3
Further information: Dobson and Barnett (2008)
Software: glm function in R (R Core Team 2017); lme4 in R (Bates et al. 2015)

Method: Ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net regression (ENR)
Description: Shrinkage (penalized regression) methods. LASSO and ENR perform variable selection. ENR also performs grouped selection, where highly correlated exposures tend to be retained or dropped together, by being assigned coefficients of similar magnitude (Zou and Hastie 2005). A grouped LASSO penalty is available for predefined groups of exposures (Hastie et al. 2009). Sparse hierarchical interaction models can be built using hierNet in R (Bien et al. 2013) and glinternet in R (Lim and Hastie 2015).
Purpose/properties: Shrinkage; variable selection
Goals: Identification of important mixture components; pairwise interactions
Outcome types: Any
Strengths: - Robust to multicollinearity. - Lower coefficient variance than ordinary least squares regression. - Greater prediction accuracy has been demonstrated for ENR than LASSO in simulation studies (Zou and Hastie 2005). - Ability to control for confounding.
Caveats or weaknesses: - Biased toward zero; however, for LASSO and ENR, a subsequent unpenalized regression can be used to obtain unbiased effect estimates for the selected variables (Hastie et al. 2009). - In LASSO, among a set of highly correlated exposures, only one may be chosen and the others dropped, giving the impression that they are not associated with the outcome (Hastie et al. 2009; Zou and Hastie 2005). - Tuning parameter selection via k-fold cross-validation may lead to highly variable model selection; however, methods exist that address this instability (e.g., Lim and Yu 2016; Roberts and Nowak 2014). - Post-selection statistical inference tools are required for inference, including generating confidence intervals (e.g., Berk et al. 2013; Lockhart et al. 2014; Meinshausen and Bühlmann 2010; Tibshirani et al. 2016).
Studies included in review: Lenters et al. (2015a); Forns et al. (2016)
Further information: Sun et al. (2013); Zou and Hastie (2005); Hastie et al. (2009); Bien et al. (2013); Lim and Hastie (2015)
Software: glmnet in R (Friedman et al. 2010); hierNet in R (Bien et al. 2013); glinternet in R (Lim and Hastie 2015); selectiveInference in R (Tibshirani et al. 2016)

Method: Weighted quantile sum regression (WQSR), lagged WQSR
Description: Empirically weighted sums of exposure quantiles are used in a regression model, with weights and parameters estimated by nonlinear programming and bootstrap (Carrico et al. 2015). Inference is focused in a specific direction, i.e., in the direction of increased or decreased risk (Carrico et al. 2015). Lagged WQSR can be used for mixtures analyses with numerous repeated exposure measures (Bello et al. 2017).
Purpose/properties: Variable selection
Goals: Identification of important mixture components; estimation of mixture health effects; identification of windows of exposure susceptibility (lagged WQSR)
Outcome types: Any GLM link function and survival data
Strengths: - Robust to multicollinearity (Gennings et al. 2013). - The use of quantiles reduces the impact of outlying observations (Gennings et al. 2013). - May have greater accuracy (lower mean-squared error), comparable sensitivity, and higher specificity than shrinkage methods such as LASSO and ENR (Carrico et al. 2015; Czarnota et al. 2015). - Ability to control for confounding.
Caveats or weaknesses: - Loss of information through the use of quantiles (Carrico et al. 2015). - Weights are nonzero, so a threshold choice is required for identifying important mixture components (Carrico et al. 2015). - Interactions are not supported (Carrico et al. 2015).
Studies included in review: None
Further information: Carrico et al. (2015); Czarnota et al. (2015); Gennings et al. (2013); Bello et al. (2017)
Software: wqs in R (Czarnota and Wheeler 2015); gWQS in R (Renzetti et al. 2018)

Method: Semi-Bayes hierarchical regression (semi-BHR)
Description: Two-level semi-Bayes parametric model that can incorporate prior information on exposure categories and the variance of the effect estimates.
Purpose/properties: Shrinkage
Goals: Identification of important mixture components
Outcome types: Any
Strengths: - Able to control for traditional and co-pollutant confounders (Braun et al. 2014). - Able to incorporate prior knowledge on exposure groupings, e.g., based on similarity of chemical properties or expected mechanisms; exchangeability groups do not need to be mutually exclusive and can be implied from a set of covariates rather than directly specified (MacLehose et al. 2007; Thomas et al. 2007). - Coefficients are shrunk toward the mean of their exchangeability group to counteract multicollinearity (Greenland 1993; MacLehose et al. 2007). - Mitigation of multiple testing issues (Momoli et al. 2010).
Caveats or weaknesses: - Quality of prior knowledge determines the validity and efficiency of the analysis (Momoli et al. 2010). - Coefficient variability needs to be set (fixed prior), so information about coefficient variability from the observed data is ignored (MacLehose et al. 2007).
Studies included in review: Braun et al. (2014); Rull et al. (2006); Buckley et al. (2016a, 2016b)
Further information: Greenland (1993, 1994); MacLehose et al. (2007); Momoli et al. (2010)
Software: See MacLehose et al. (2007) for WinBUGS code.

Method: Bayesian hierarchical regression (BHR), and BHR with stochastic search variable selection (BHR-SSVS)
Description: A flexible class of models, which can be parametric (e.g., the Bayesian empirical-Bayes model; Greenland 1994) or semiparametric (e.g., using a Dirichlet process prior for the exposure coefficients; MacLehose et al. 2007). Variable selection priors are also possible (e.g., a Bayesian hierarchical mixture model with stochastic search variable selection; Swartz et al. 2015).
Purpose/properties: Shrinkage; variable selection; clustering; smoothing
Goals: Identification of important mixture components; nonlinearity; interactions
Outcome types: Any
Strengths: - Able to control for traditional and co-pollutant confounders (Braun et al. 2014). - Able to incorporate prior information. - Mitigation of multiple testing issues (Momoli et al. 2010). - Shrinkage estimator, so robust to multicollinearity: effect estimates have greater accuracy than maximum likelihood estimates (MacLehose et al. 2007). - Adaptive rather than fixed shrinkage, allowing the data to inform the amount of shrinkage based on support for the prior distribution (MacLehose et al. 2007). - Inference less sensitive to specification of prior parameters than the semi-Bayes model (MacLehose et al. 2007).
Caveats or weaknesses: - Specification may require subject-matter knowledge that may not be available (Greenland 1994; Thomas et al. 2007). - May be computationally demanding.
Studies included in review: Swartz et al. (2015)
Further information: Greenland (1993, 1994); MacLehose et al. (2007); Herring (2010)
Software: See MacLehose et al. (2007) for WinBUGS code.

Method: Bayesian kernel machine regression (BKMR) with component-wise or hierarchical variable selection (BKMR-HVS)
Description: Nonparametric method that estimates joint effects of multiple exposures, allows for nonlinearity and interactions, and performs component-wise or hierarchical (grouped) variable selection (Bobb et al. 2015; Coull et al. 2015).
Purpose/properties: Smoothing; variable selection; shrinkage
Goals: Identification of important mixture components; estimation of mixture health effects; nonlinearity; pairwise and higher-order interactions
Outcome types: Continuous; extension to binary and count outcomes planned for development (Claus Henn in NIEHS 2015; Coull et al. 2015); longitudinal outcomes
Strengths: - Simultaneously estimates mixture health effects while identifying drivers of the association (Coull et al. 2015). - Reduces the dimensionality of the problem (Bobb et al. 2015). - Robust to multicollinearity (Bobb et al. 2015). - Increased power to detect associations due to grouping of highly correlated exposures, using BKMR-HVS (Bobb et al. 2015). - Estimation of posterior inclusion probabilities, i.e., a measure of the importance of each exposure and each group of exposures (Bobb et al. 2015). - Able to model nonlinearity in a high-dimensional exposure-response surface (Bobb et al. 2015).
Caveats or weaknesses: - Computation for very large datasets is currently not feasible (see Coull et al. 2015). - Assumes exposures within a group do not interact (Bobb et al. 2015). - Currently does not allow overlapping groups; however, this extension is planned for development (Bobb et al. 2015). - Only one exposure can be selected within a group in BKMR-HVS (Coull et al. 2015). - Magnitude, but not relative ranking, of the posterior inclusion probabilities is sensitive to the choice of tuning parameters (Coull et al. 2015).
Studies included in review: None
Further information: Bobb et al. (2015); Coull et al. (2015)
Software: bkmr in R (Bobb et al. 2015)

Method: Structural equation modeling (SEM)
Description: A path modeling approach where observed exposure variables are seen as manifestations of latent true exposure variables, which are then assumed to affect one or more outcome variables (which can themselves also be seen as manifestations of multiple observed outcome variables) (Sánchez et al. 2005). This is a frequentist method that can also be estimated in a Bayesian setting for more flexibility.
Purpose/properties: Dimension reduction
Goals: Identification of important mixture components; estimation of mixture health effects
Outcome types: Continuous
Strengths: - Can model multiple related outcomes and exposures simultaneously, for example allowing prenatal and postnatal exposures to be analyzed in the same model (Heilmann et al. 2006). - May correct for measurement error when multiple measurements for each exposure are available, under certain assumptions (Budtz-Jørgensen et al. 2002; Sánchez et al. 2005). - Greater model parsimony than modeling each outcome-exposure pair separately (Sánchez et al. 2005). - Multiple testing issues are reduced through the use of latent variables that represent multiple observed exposure variables (Sánchez et al. 2005). - Variables that are related to exposure but not outcome can be included to increase power (Sánchez et al. 2005). - Budtz-Jørgensen et al. (2002) describe an approach for handling missing data in SEMs with continuous outcome variables. - Ability to control for confounding.
Caveats or weaknesses: - Requires greater subject-matter knowledge for correct specification than single outcome-exposure models (Sánchez et al. 2005). - Imposes multiple strong assumptions on the analysis (regarding the existence and direction of causal relationships, and distributional and functional forms) that may be difficult to verify in practice and may lead to severe biases (VanderWeele 2012).
Studies included in review: Heilmann et al. (2006)
Further information: Sánchez et al. (2005); VanderWeele (2012)
Software: sem in R (Fox et al. 2017); blavaan in R (Merkle and Rosseel 2016)

Method: Principal components analysis (PCA) and regression (PCR)
Description: Unsupervised PCA reduces exposure data to several uncorrelated components (assuming orthogonal rotation), on which the outcome is then regressed. Supervised PCA uses both exposure and outcome data to obtain components, by first selecting variables that are most correlated with the outcome (Bair et al. 2006).
Purpose/properties: Dimension reduction; variable selection
Goals: Estimation of mixture health effects; identification of important mixture components
Outcome types: Any
Strengths: - Eliminates multicollinearity by reducing data to several uncorrelated components (assuming orthogonal rotation). - Supervised PCA also performs variable selection and provides variable importance scores with fixed ordering (Bair et al. 2006). - Ability to control for confounding (in subsequent regression model using components from PCA).
Caveats or weaknesses: - Components may not have biologically relevant interpretations. - Can only be used with continuous exposures (correspondence analysis can be used for categorical exposures).
Studies included in review: Forns et al. (2016); Agay-Shay et al. (2015); Govarts et al. (2016); Berg et al. (2016); Maresca et al. (2016); Talbott et al. (2015)
Further information: Hastie et al. (2009); Roberts and Martin (2006); Bair et al. (2006)
Software: prcomp function in R (R Core Team 2017); nsprcomp in R (Sigg and Buhmann 2008)

Method: Partial least squares regression (PLSR)
Description: A supervised dimension reduction method that constructs latent variables by forming linear combinations of exposures that maximize the covariance between the exposures and the outcome. Sparse PLSR additionally performs variable selection (Agier et al. 2016; Chun and Keleş 2010).
Purpose/properties: Dimension reduction; variable selection
Goals: Estimation of mixture health effects; identification of important mixture components
Outcome types: Continuous; categorical (using PLSR-DA, i.e., discriminant analysis)
Strengths: - Addresses multicollinearity by constructing a set of orthogonal latent variables.
Caveats or weaknesses: - Adversely affected by the inclusion of irrelevant variables (Bair et al. 2006). - In sparse PLSR, one exposure may be chosen somewhat arbitrarily among a set of highly correlated exposure variables (Hastie et al. 2009; Lenters et al. 2015b). - Adjusting for confounders is not straightforward in PLSR (Lenters et al. 2015b); however, both the exposures and outcome may be preadjusted for confounders and their residuals used in the analysis.
Studies included in review: Berg et al. (2016)
Further information: Lenters et al. (2015b); Agier et al. (2016); Chun and Keleş (2010); Hastie et al. (2009)
Software: pls in R (Mevik et al. 2016); spls in R (Chung et al. 2018)

Method: Generalized additive models (GAMs) and semiparametric models
Description: Commonly used data-driven method for modeling nonlinearity, in which the outcome is regressed on a sum of nonparametric smoothing functions (Billionnet et al. 2012).
Purpose/properties: Smoothing
Goals: Nonlinearity
Outcome types: Any GLM link function and family; survival data
Strengths: - Flexibility in functional form; does not impose assumptions on the shape of the exposure-response curve (Billionnet et al. 2012; Eisen et al. 2004). - Provides a visual representation of exposure-response relationships. - Pairwise and higher-order interactions can be modeled using multivariate smoothers (Ruppert et al. 2003).
Caveats or weaknesses: - May be affected by concurvity, the nonlinear analog of collinearity; however, this may be addressed by partial GAMs (Gu et al. 2010).
Studies included in review: 19 studies used splines and GAMs (e.g., Vejrup et al. 2016; Claus Henn et al. 2016) (Table S3).
Further information: Eisen et al. (2004); Ruppert et al. (2003, 2009); Gu et al. (2010)
Software: SemiPar in R (Ruppert et al. 2003, 2009); mgcv in R (Wood 2017)

Method: Multivariate adaptive regression splines (MARS)
Description: A nonparametric method suitable for higher-dimensional problems that automatically selects variables, models nonlinearity, and identifies interactions (Hastie et al. 2009; Morlini 2006).
Purpose/properties: Smoothing; variable selection
Goals: Interactions; nonlinearity; identification of important mixture components
Outcome types: Continuous; categorical
Strengths: - Flexibility in functional form.
Caveats or weaknesses: - May select an exposure somewhat arbitrarily in the presence of high correlation between exposures (Morlini 2006). - Meaningful standard error estimates are not available; bootstrapping techniques are required for statistical inference (Friedman and Roosen 1995). - Confounder terms are also subject to selection.
Studies included in review: None
Further information: Friedman and Roosen (1995); Hastie et al. (2009); Waters in NIEHS (2015)
Software: earth in R (Milborrow 2017)

Method: Classification and regression trees (CART) and ensemble methods: stochastic gradient boosting (SGB), random forests (RF), and Bayesian additive regression trees (BART); tree-based distributed lag models
Description: Statistical learning methods that partition data into a tree. Numerous regression trees can be combined to improve predictive performance using ensemble methods.
Purpose/properties: Classification; variable selection
Goals: Identification of important mixture components; pairwise and higher-order interactions; nonlinearity; identification of windows of exposure susceptibility (tree-based distributed lag models)
Outcome types: Continuous; binary
Strengths: - Does not assume additivity and linearity in exposure-response relationships (Coull et al. 2015). - Able to automatically deal with missing values through surrogate splits (Hastie et al. 2009). - Immune to outliers (Hastie et al. 2009). - BART provides exposure effect estimates with uncertainty intervals (Chipman et al. 2010; Hernández et al. 2015).
Caveats or weaknesses: - CART requires many splits to create a linear or nonlinear association and hence is better suited to categorical exposures (Hastie et al. 2009). - Variable importance scores (RF and SGB) reflect predictive performance, not direction or magnitude of exposure-response associations (Bobb et al. 2015; Coull et al. 2015); i.e., effect estimates are not available. - Search for interactions is automatic, so spurious interactions may be identified (Lampa et al. 2014). - Confidence intervals are not immediately provided for SGB and RF; however, they can be obtained by bootstrap (Lampa et al. 2014). - No straightforward way to control for confounding (Coull et al. 2015); however, see Gass et al. (2014) for a CART algorithm that controls for confounding. - Choice of threshold strategy for variable selection in BART requires cross-validation or prior knowledge of the sparsity of the solution (Bleich et al. 2014).
Studies included in review: None
Further information: Lampa et al. (2014); Hastie et al. (2009); Gass et al. (2014); BART: Dunson in NIEHS (2015); Chipman et al. (2010); Hernández et al. (2015); Bleich et al. (2014); tree-based distributed lag models: Bello et al. (2017)
Software: gbm in R (Ridgeway 2017); mboost in R (Bühlmann and Hothorn 2007); bartMachine in R (Kapelner and Bleich 2016); see Gass et al. (2014) for a CART algorithm that controls for confounding.

Method: Bayesian model averaging (BMA)
Description: A Bayesian technique that provides weighted average estimates over a set of all possible models rather than requiring selection of a single best model (Sun et al. 2013).
Purpose/properties: Variable selection; model averaging
Goals: Identification of important mixture components
Outcome types: Any GLM link function and family; survival data
Strengths: - Accounts for uncertainty in model selection. - Supports the inclusion of interactions and nonlinearity.
Caveats or weaknesses: - Not robust to multicollinearity (Sun et al. 2013). - Computational inefficiency involved in considering numerous models with large numbers of exposure variables (Sun et al. 2013).
Studies included in review: Forns et al. (2016)
Further information: Sun et al. (2013); Wang et al. (2004)
Software: BMA in R (Raftery et al. 2018)

Correlations between chemicals with common sources, exposure pathways, or metabolic processes can induce multicollinearity when those chemicals are analyzed simultaneously (Carrico et al. 2015; Vrijheid et al. 2016). This is arguably the paramount statistical challenge facing the study of chemical mixtures. In GLMs, multicollinearity can inflate standard errors, leading to unstable coefficient estimates that are sensitive to minor changes in model specification. Consequently, it may be difficult to assess the relative importance of individual exposures, and therefore to separate potential etiological agents from copollutant confounders and redundant variables (Woodruff et al. 2009).
The problem is potentiated with the addition of interaction and nonlinear terms in the model (Schisterman et al. 2017), and at an extreme, maximum likelihood techniques may fail to converge to a solution (MacLehose et al. 2007). Frequentist shrinkage methods, such as ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and ENR, present an appealing solution (Table 2) (only ENR was used in the reviewed studies, by Forns et al. 2016; Lenters et al. 2015a). By imposing a penalty on the size of the coefficients, they ensure convergence and confer a degree of immunity to multicollinearity (Hastie et al. 2009). These methods shrink the coefficients of explanatory variables toward zero, with the degree of shrinkage determined by the strength of the association. LASSO and ENR also perform variable selection, whereby the coefficients of the least predictive variables are shrunk to exactly zero (Hastie et al. 2009), yielding a sparse model with greater interpretability and more-precise coefficient estimates. However, these estimators are biased and pose challenges for statistical inference because valid standard error estimates (and therefore confidence intervals) may be difficult to obtain and may depend on the choice of tuning parameters (i.e., parameters that are adjusted to find an optimal model) (Kyung et al. 2010; Pfeiffer et al. 2017). Confidence intervals may also lack proper coverage if the uncertainty introduced by variable selection is ignored in post-model-selection inference (Pfeiffer et al. 2017). Moreover, data-driven tuning parameter selection via k-fold cross-validation may lead to highly variable model selection results (Roberts and Nowak 2014); this instability may be especially apparent in small sample-size studies with weak signals and correlated exposures (Kirk et al. 2013; Lim and Yu 2016). 
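The coefficient instability under multicollinearity, and the stabilizing effect of shrinkage, can be seen in a small simulation (Python/scikit-learn; the penalty value is arbitrary and chosen only for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
ols_betas, ridge_betas = [], []
for _ in range(200):
    n = 100
    z = rng.normal(size=n)                    # shared source
    x1 = z + rng.normal(scale=0.05, size=n)   # two exposures correlated ~0.99
    x2 = z + rng.normal(scale=0.05, size=n)
    y = x1 + rng.normal(scale=1.0, size=n)    # only x1 is causal in this simulation
    X = np.column_stack([x1, x2])
    ols_betas.append(LinearRegression().fit(X, y).coef_)
    ridge_betas.append(Ridge(alpha=10.0).fit(X, y).coef_)

ols_sd = np.std(ols_betas, axis=0)     # highly variable under multicollinearity
ridge_sd = np.std(ridge_betas, axis=0) # shrinkage stabilizes the estimates
```

The reduced variance comes at the price of bias toward zero, which is the trade-off the surrounding text describes.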
Methods such as the percentile-LASSO (Roberts and Nowak 2014), estimation stability cross-validation (Lim and Yu 2016), and others (e.g., Frommlet et al. 2012; Kirk et al. 2013; Waldron et al. 2011), may address model selection instability. ENR may be preferred to LASSO in mixtures analyses. Whereas LASSO may choose only one exposure among a set of highly correlated exposures and drop the others (Hastie et al. 2009), ENR possesses a grouping property, ensuring that highly correlated variables are retained or dropped together by being assigned coefficients of similar magnitude (Zou and Hastie 2005). However, not all correlated exposures selected by this property may be associated with the outcome, and those exposures dropped by either method may be causal agents, so it cannot be assumed that nonselected exposures are safe. In a recent simulation study, ENR was found to have slightly higher sensitivity but slightly higher false discovery proportion than LASSO, with comparable accuracy (mean-squared error) (Lenters et al. 2017). Weighted quantile sum regression (WQSR) (not used in the reviewed studies) was developed as an alternative to LASSO and ENR for cross-sectional analyses and was reported to have greater accuracy (lower mean-squared error), comparable sensitivity, and improved specificity for variable selection among correlated exposures in simulation studies (Carrico et al. 2015) (Table 2). In WQSR, exposures are combined into an empirically weighted index, which provides both estimates of mixture health effects and indicators of exposure importance (i.e., weights); however, because the weights are nonzero, a threshold choice is required for variable selection (Carrico et al. 2015). Performing Bayesian shrinkage may offer a number of advantages beyond robustness to multicollinearity. 
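The grouping property of ENR can be demonstrated with two perfectly correlated exposures: the L2 component of the penalty makes the solution unique and splits the coefficient between them, whereas the LASSO solution is non-unique and may load on either one. A sketch (Python/scikit-learn; penalty values arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x])                  # two perfectly correlated exposures
y = 1.0 * x + rng.normal(scale=0.2, size=n)

lasso = Lasso(alpha=0.05).fit(X, y)          # may assign the effect to one exposure only
enet = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
# ENR assigns (near-)equal, nonzero coefficients to the duplicated exposures
```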
Bayesian methods offer flexibility of specification, can incorporate external information, can handle missing data, are robust to overfitting in small sample sizes, and offer probability-based uncertainty intervals (Hernández et al. 2015). BHR methods treat exposure coefficients as random variables from a prior distribution (Table 2). They deal with multicollinearity by shrinking coefficients toward the prior mean, with the amount of shrinkage depending on the magnitude of the prior variance (MacLehose et al. 2007). The prior variance can be treated as known (semi-Bayes model; used by Braun et al. 2014; Buckley et al. 2016a, 2016b; Rull et al. 2006), requiring the researcher to set the parameter to a plausible value. However, unlike tuning parameters in frequentist shrinkage methods that are set by cross-validation, this tuning parameter has an intuitive interpretation as the variance of the exposure coefficients (Greenland 1994). If there is uncertainty about the magnitude of the parameter, it can be allowed to vary, with the advantage of making estimates more robust to misspecification of the prior (Greenland 1994; MacLehose et al. 2007). Coefficients can also be shrunk toward group-specific means using BHR; e.g., groups based on physicochemical similarity, as in the reviewed studies by Braun et al. (2014) and Rull et al. (2006). These groups may be overlapping and can be implied from a set of covariates rather than directly specified (MacLehose et al. 2007; Thomas et al. 2007). Alternatively, if groups cannot be chosen a priori, a Dirichlet process prior can be used to perform data-driven clustering (MacLehose et al. 2007). Methods that perform variable selection in addition to shrinkage can realize a further gain in precision by allowing exposures that do not affect the outcome to be excluded, a feature that is especially important in small samples (MacLehose et al. 2007). 
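For a normal prior with fixed variance, the semi-Bayes shrinkage step has a closed form: each coefficient is a precision-weighted average of its maximum likelihood estimate and the mean of its exchangeability group. A simplified numerical sketch (Python; the estimates, standard errors, and prior variance are illustrative values, not from any reviewed study, and in a full analysis the group mean would be estimated jointly rather than plugged in):

```python
import numpy as np

# Hypothetical ML estimates and standard errors for four metabolites
# assumed to form one exchangeability group
beta_hat = np.array([0.40, 0.10, 0.25, 0.05])
se = np.array([0.20, 0.15, 0.25, 0.10])

group_mean = beta_hat.mean()   # plug-in prior mean for the group
tau2 = 0.05                    # prior variance, fixed in the semi-Bayes approach

# Precision-weighted average: noisier estimates (larger se) are
# shrunk more strongly toward the group mean
w = (1.0 / se**2) / (1.0 / se**2 + 1.0 / tau2)
beta_shrunk = w * beta_hat + (1.0 - w) * group_mean
```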
Unlike frequentist shrinkage methods that perform variable selection (i.e., LASSO and ENR), mixture priors in the “spike and slab” form (such as BHR-SSVS, used by Swartz et al. 2015) can also provide a measure of uncertainty in variable selection (Herring 2010). These methods also have the ability to account for truncation of exposures due to detection limits (Herring 2010). Like frequentist shrinkage estimators, BHR estimates are biased toward their prior mean. However, this bias may be more than compensated for by the reduction in variance (Greenland 1994). Because maximum likelihood estimates are only asymptotically unbiased, hierarchical estimates are likely to have a small-sample advantage in terms of accuracy when modeling multiple correlated exposures (Greenland 1994). Moreover, simulation studies of chemical mixtures have observed accuracy and precision advantages of BHR methods over their frequentist method counterparts (e.g., Roli and Monari 2015). Despite the availability of methods that are robust to multicollinearity, no statistical method is able to differentiate between very highly correlated exposures in the absence of further information. Preprocessing steps may be required, such as choosing an individual chemical to represent a group of very highly correlated chemicals or dimension reduction using PCA (Coull et al. 2015). For example, Braun et al. (2014) analyzed only the chemical with the highest median concentration in pairs of chemicals with correlation coefficients greater than 0.95 in absolute value. However, estimated associations may lack specificity if the chosen chemical is not a suitable surrogate for the others (e.g., a common proxy choice for PCBs is nonplanar PCB-153, but it is not representative of coplanar PCBs) (Engel and Wolff 2013). 
Moreover, inclusion of exposures unrelated to the outcome but correlated with exposures that are associated with the outcome may enhance, diminish, or reverse associations between the outcome and the important exposures due to the reversal paradox (Tu et al. 2008). One method that allows prior information on the exposure correlation structure to be incorporated is Bayesian kernel machine regression (BKMR) (Table 2). BKMR is a novel approach for mixtures analyses (not used in the reviewed studies), offering both component-wise and hierarchical (grouped) variable selection (BKMR-HVS) (Bobb et al. 2015; Coull et al. 2015). BKMR-HVS concurrently estimates the importance (i.e., posterior inclusion probability) of groups of highly correlated exposures as well as individual chemicals within the group, enabling selection of an exposure in a group of highly correlated exposures (Bobb et al. 2015; Coull et al. 2015). Because BKMR estimates the joint exposure–response function, it can also provide an estimate of an overall mixture effect, representing the risk associated with all exposures at a particular quantile vs. the median (Bobb et al. 2015).
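BKMR itself is implemented in the bkmr package in R. Purely as a rough frequentist analog (our construction, not part of BKMR), kernel ridge regression can illustrate how a kernel method captures a nonlinear joint exposure-response surface and yields an overall mixture contrast of the kind BKMR reports, here all exposures at their 75th percentile vs. all at their median:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 3))   # three simulated exposures
# Nonlinear joint exposure-response surface with an interaction
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=n)

kr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.3).fit(X, y)

# Overall mixture contrast: predicted response with all exposures at
# their 75th percentile vs. all at their median
q75 = np.percentile(X, 75, axis=0)
q50 = np.percentile(X, 50, axis=0)
effect = kr.predict(q75.reshape(1, -1))[0] - kr.predict(q50.reshape(1, -1))[0]
```

Unlike BKMR, this sketch provides no posterior inclusion probabilities or uncertainty intervals; it only conveys the joint-surface idea.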

Estimation of EDC-Mixture Effects

A robust body of evidence demonstrates mixture effects due to low-dose EDCs acting together via common mechanisms (Kortenkamp et al. 2007; Ribeiro et al. 2017). EDCs that act in combination may have only weak signals when analyzed individually, and these signals may not be detected in studies lacking sufficient power. Here we examine methods that may be used to estimate the health effects of EDC mixtures directly when the estimation of individual effects is not possible. A common approach for estimating EDC-mixture effects involves the use of aggregating variables, such as raw or weighted sums, or counts of exposures above detection limits (e.g., Koskenniemi et al. 2015; LaRocca et al. 2016; Table S3). These approaches are appealing because of their ease of implementation; however, they discard information and make strong assumptions that effects are additive (implying independence and no interaction) and that exposure–response relationships are linear. Raw sums and counts assume component chemicals are equipotent; these aggregate variables will be driven by the highest concentration pollutants, which may not necessarily be the most detrimental to health (Axelrad et al. 2009; Braun et al. 2016). Even when component chemicals have been transformed to a common scale, heterogeneous concentration ranges and detection limits among mixture components imply that these chemicals are measured with differing precision, which may induce noise in the aggregate variable (Engel and Wolff 2013). Relative potencies of individual chemicals within a mixture may be addressed by a weighted sum; however, the quality of the resulting aggregate variable depends on the availability of accurate outcome-specific potency measures as those derived in different biological contexts may be inappropriate (Gennings et al. 2010; Zoeller et al. 2014). 
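The equipotency problem with raw sums can be seen directly: the aggregate tracks the highest-concentration component even when, by assumption, that component is the least potent. A toy numerical illustration (Python; the concentrations and relative potencies are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
# Concentrations of three hypothetical mixture components (arbitrary units);
# the third has the highest concentrations but, by assumption, the lowest potency
conc = rng.lognormal(mean=[0.0, 0.5, 2.0], sigma=0.5, size=(100, 3))
potency = np.array([1.0, 0.5, 0.01])   # illustrative relative potencies

raw_sum = conc.sum(axis=1)    # equipotency assumed: dominated by component 3
weighted_sum = conc @ potency # potency-weighted index downweights component 3
```

As the text notes, the weighted index is only as good as the potency measures used to construct it.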
Findings based on aggregate variables will be sensitive to these assumptions, as well as the particular choice of mixture components and approach chosen for aggregating exposures. The analysis of multiple aggregate exposure variables in one model makes further assumptions regarding independence and additivity of groups of exposures. Exposure groups based on physicochemical similarity, a common grouping method, may not share similar biological mechanisms. In an extreme case, Suzuki et al. (2010) and Tatsuta et al. (2014) summed 209 PCB congeners that were above detection limits. However, individual congeners may have varying effects and opposing actions (e.g., some are estrogenic, and others are antiestrogenic; Cooke et al. 2001), and these classifications may be specific to the tissues and diseases under investigation (Gennings et al. 2010). To separate EDCs into distinct groups based on modes of action may not be possible as the mode may be dose-dependent for individual EDCs; for example, low doses of some xenoestrogens may act by binding to estrogen receptors, whereas, at high doses, these same chemicals may bind to the receptors of other hormones (e.g., androgen or thyroid hormone receptors), through receptor cross-talk (Myers et al. 2009). Here, methods that allow estimation of health effects for overlapping groups of exposures, as well as careful consideration of group composition, may be more appropriate. For example, Varshavsky et al. (2016) recently presented a novel potency-weighted metric for cumulative exposure to antiandrogenic phthalates based on common adverse outcomes. For predefined groups of exposures, health effects may be estimated using WQSR and grouped LASSO (Hastie et al. 2009). 
WQSR addresses some of the limitations of aggregate exposure variables discussed above; WQSR uses empirically determined weights that take into account the exposure correlation structure and, by focusing inference in the direction of increased risk, produces an index that is readily interpretable (Carrico et al. 2015). SEM may also be used to assess health effects of predefined exposure groups (Table 2, used by Heilmann et al. 2006). In this approach, chemicals sharing similar characteristics are assumed to reflect a latent true exposure, which is then used to estimate effects for one or more outcomes (Heilmann et al. 2006). This pooling of information from several exposures results in a more powerful analysis than using linear regression methods (Budtz-Jørgensen et al. 2002; Heilmann et al. 2006); however, the gain in power may come at a considerable cost. SEMs are susceptible to misspecification because they impose multiple strong assumptions (regarding the existence and direction of causal relationships specified in a path diagram, as well as their distributional and functional forms) that may be difficult to verify in practice and may lead to severe biases if incorrect (VanderWeele 2012). SEMs are therefore considered confirmatory methods, suitable in settings where sufficient knowledge exists for the specification of a path diagram and sensible parametric assumptions (Sánchez et al. 2005). Dimension reduction methods present an alternative to the use of aggregating variables and are particularly useful when the number of exposures is large in comparison with the sample size. PCA reduces multidimensional exposure data to several orthogonal (uncorrelated) components that can be used in a regression model in place of the original exposure variables (Table 2) (e.g., Agay-Shay et al. 2015; Maresca et al. 2016), providing a measure of health effects for components that summarize the variability in the exposure data. 
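As an illustrative sketch of the PCA-regression approach, the following simulates ten collinear exposures driven by one common source, then regresses the outcome on a few orthogonal component scores (scikit-learn; simulated data, not from the reviewed studies):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 200, 10
# Highly correlated exposure matrix: one common source plus noise
source = rng.normal(size=(n, 1))
X = source + 0.3 * rng.normal(size=(n, p))
y = source[:, 0] + rng.normal(0, 0.5, n)

# Replace 10 collinear exposures with a few orthogonal component scores
pca = PCA(n_components=3)
scores = pca.fit_transform(X)              # uncorrelated by construction
model = LinearRegression().fit(scores, y)
r2 = model.score(scores, y)

# Off-diagonal correlations among scores are ~0: no multicollinearity
corr = np.corrcoef(scores, rowvar=False)
print(np.abs(corr - np.eye(3)).max() < 1e-8, r2 > 0.5)
```

The component scores sidestep multicollinearity, but, as noted below, their loadings may lack a biological interpretation.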
Although the component compositions may be statistically informative, they may not necessarily possess biologically relevant interpretations or provide a good explanation of the outcome (Bair et al. 2006). Better explanatory power (but not necessarily interpretability) may be obtained using supervised PCA, which captures information in both the exposures and outcome by preselecting exposures most correlated with the outcome prior to constructing components (Bair et al. 2006). However, because this preselection is based on univariate exposure–outcome relationships, there may be some redundancy in the set of selected exposures because highly correlated exposures are likely to be chosen together (Hastie et al. 2009). This redundancy may be ameliorated through the use of preconditioned LASSO (i.e., applying the LASSO as a postprocessor using a predicted outcome from supervised PCA), a two-stage procedure that produces a sparse model (Hastie et al. 2009; Paul et al. 2007). An alternative supervised dimension reduction method is partial least squares regression (PLSR; Table 2) (e.g., Berg et al. 2016), which constructs latent variables by forming linear combinations of exposures that maximize the covariance between the exposures and outcome (Hastie et al. 2009). However, although both unsupervised and supervised dimension reduction methods address the problem of multicollinearity, they do so at the cost of interpretability. The interpretability of the latent variables can be improved through the use of sparse PLSR, which performs simultaneous dimension reduction and variable selection (Chun and Keleş 2010). The use of biomarkers of cumulative EDC exposure, such as the total effective xenoestrogen burden (TEXB; Fernández et al. 2004) and total effective xenobiotic burden of antiandrogens (TEXB-AA; Arrebola et al. 2015), present a nonstatistical alternative for estimating mixture effects. For example, Fernández et al. 
(2007) analyzed both TEXB and 16 organochlorine pesticides in relation to the risk of cryptorchidism and hypospadias in male neonates. TEXB provides dual fractions composed primarily of organohalogenated lipophilic xenoestrogens (alpha fraction) and endogenous hormones with more polar xenoestrogens (beta fraction) (Fernández et al. 2004; Pastor-Barriuso et al. 2016). The beta fraction has been used to control for potential confounding by endogenous hormones (e.g., Vilahur et al. 2014); however, this adjustment is thought to induce a selection bias (Pastor-Barriuso et al. 2016).

Accounting for Nonmonotonicity in EDC Exposure–Response Relationships

Motivation for assessing nonmonotonicity.

The breadth of evidence for nonmonotonic EDC dose–response relationships observed across cell culture and animal experimental studies as well as the epidemiological literature has led some authors to recommend that nonmonotonicity should be assumed by default for chemicals that behave like hormones (Myers et al. 2009; Vandenberg et al. 2012). Despite this recommendation, fewer than half of the reviewed studies reported assessing nonlinearity (Tables 1 and S3). The assumption of linearity may be made to simplify statistical analyses, with the view that nonmonotonic relationships are unlikely to be observed over the comparatively narrow exposure ranges common in epidemiological studies (the principle of low-dose linearity; Thomas 2009), or the possibility that sampling biases in observational studies may create the impression of nonmonotonicity when dose–response is truly linear (Engel and Wolff 2013). Data sparsity in regions of nonlinearity may also make these relationships difficult to detect; a large overall sample size and sufficiently large sample size in regions of nonlinearity are required to detect nonlinearity (May and Bigelow 2006). Small sample sizes may yield wide confidence intervals across the exposure–response curve and uncertainty in functional form. Nevertheless, assuming linearity when exposure–response relationships are nonmonotonic can lead to misleading conclusions; for example, a horizontal fitted line when there is a U-shaped relationship (May and Bigelow 2006). Such model misspecification may be more severe in an analysis of multiple correlated exposures, where bias due to omitting important variables (e.g., quadratic terms) may be exacerbated by multicollinearity (Ganzach 1997; Schisterman et al. 2017). 
Moreover, although a narrow exposure range is common in observational studies of one chemical, the cumulative exposure range of a mixture of similarly acting chemicals is likely to be wider, which increases the chances of observing a nonmonotonic association.

Methods for addressing nonmonotonicity.

We identified several statistical approaches for both diagnosing and modeling nonmonotonicity. A common approach in the reviewed studies involved modeling categorical exposure variables in place of their continuous counterparts (e.g., using quantiles; Berg et al. 2016). Although the simplicity of this approach may be appealing, it has several limitations, including loss of information that may preclude the identification of more complex functional forms with few categories, sensitivity to the arbitrary choice of categorization (Greenland 1995), and bias away from the null if observations near category cutpoints are misclassified (Eisen et al. 2004). Moreover, the large number of additional parameters requiring estimation can lead to a loss of power, which may make this approach infeasible in mixtures studies with many exposures and comparatively small sample size. An alternative approach that avoids these pitfalls is the inclusion of polynomial exposure terms (e.g., quadratic or cubic) directly in a parametric model (e.g., Rodrigues et al. 2016). These terms enable assessment of U-shaped, inverted U-shaped, or more complex multiphasic exposure–response relationships. They are straightforward to implement and particularly useful when the functional form conforms to a known parametric type. An improvement in fit and more flexibility in functional form may be obtained by using fractional polynomials to automatically identify the optimal power transformations according to goodness-of-fit criteria (May and Bigelow 2006). However, including polynomial terms for multiple correlated exposures may exacerbate multicollinearity issues and reduce the interpretability of parameter estimates. Moreover, these approaches apply a global fit; i.e., they assume the same functional form at all levels of exposure, which may lead to poor fit in some exposure regions (Keele 2007). 
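A minimal numpy sketch of the polynomial approach on simulated data: a true U-shaped exposure–response yields a near-flat linear fit, whereas a quadratic term recovers the curvature (cf. May and Bigelow 2006):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 250)
# True U-shaped (nonmonotonic) exposure-response relationship
y = x ** 2 + rng.normal(0, 0.3, 250)

linear = np.polyfit(x, y, 1)   # misspecified linear fit: slope near zero
quad = np.polyfit(x, y, 2)     # quadratic term recovers the U shape

print(abs(linear[0]) < 0.1, abs(quad[0] - 1.0) < 0.1)
```

The misspecified linear model reports essentially no association despite a strong (nonmonotonic) exposure effect, illustrating the "horizontal fitted line" problem described above.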
In the absence of hypotheses regarding functional form, GAMs may be used to fit flexible nonparametric exposure–response functions based on splines (Table 2) (e.g., Heilmann et al. 2006). For diagnosing nonlinearity, GAMs are powerful tools that impose few assumptions on the data. One of the primary strengths of GAMs is that they are directly comparable with nested parametric GLM fits using likelihood-based statistics (Keele 2007). Semiparametric regression extends GAMs by allowing additional linear additive terms with the usual parametric specifications (e.g., linear confounder terms) (Keele 2007). This method was used by Claus Henn et al. (2016), who analyzed the effect of arsenic on fetal growth while controlling for smoothed lead and manganese terms and linear confounders. Bootstrapping may be used to compare the fits of parametric and semiparametric models (Keele 2007). One drawback of nonlinear approaches is the lack of a single reportable summary statistic. However, plots of the mean nonlinear association with confidence intervals can provide a more informative summary of the strength of a statistical relationship than single statistics (Keele 2007). Nonetheless, a number of authors used GAMs only as exploratory tools for the diagnosis of nonlinearity (e.g., Agay-Shay et al. 2015 analyzed exposure tertiles after detecting nonlinearity through GAMs) and to inform a parametric functional form (e.g., Rodrigues et al. 2016). GAMs may suffer from concurvity, the nonlinear analog to collinearity, yielding unstable estimates or misleading functional forms (Gu et al. 2010). Although shrinkage methods can improve numerical stability and predictive accuracy in the presence of collinearity, they do not address concurvity (Gu et al. 2010). Instead, the method of partial GAMs, which adjusts for concurvity through a variable selection procedure that sequentially eliminates functional dependencies between covariates, may be useful (Gu et al. 2010). 
This method yields a more parsimonious model that is easier to interpret and a graphical representation of concurvity between covariates (Gu et al. 2010). BKMR is an alternative nonparametric approach, not used in the reviewed studies, which addresses the multifaceted goals of mixtures analyses (Table 2). BKMR offers flexible estimation of a multivariate exposure–response surface, allowing for nonlinearity and interactions while simultaneously performing componentwise or hierarchical variable selection, with the latter providing selection among a group of highly correlated exposures (Bobb et al. 2015; Coull et al. 2015). Other methods that can simultaneously and automatically select variables and functional forms include shrinkage approaches in GAMs (Marra and Wood 2011) and Bayesian approaches based on spike and slab priors (Scheipl et al. 2013). For higher-dimensional problems, statistical learning methods may be more appropriate than GAMs and BKMR. MARS is a nonparametric method that proceeds in a manner similar to stepwise regression, building a model from piecewise linear basis functions and their products (Hastie et al. 2009; Morlini 2006) (Table 2). It is able to automatically select variables and identify interactions in large datasets, as well as to model nonlinearity. However, it is not able to control for confounding, as confounder terms are also subject to selection, and in the presence of high correlation between exposures, it may select exposures somewhat arbitrarily (Morlini 2006).
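The piecewise-linear basis underlying MARS can be illustrated with a hand-rolled hinge-function fit. This sketch fixes a single knot by hand and therefore omits the adaptive knot search, basis products, and pruning that MARS actually performs (simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 4, 300)
# Piecewise-linear truth: flat below the knot at 2, rising above it
y = 1.5 * np.maximum(x - 2.0, 0.0) + rng.normal(0, 0.2, 300)

# MARS-style reflected pair of hinge functions at a fixed knot t = 2
t = 2.0
B = np.column_stack([np.ones_like(x),
                     np.maximum(x - t, 0.0),    # right hinge
                     np.maximum(t - x, 0.0)])   # left hinge
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

# Slope above the knot ~1.5; slope below the knot ~0
print(abs(coef[1] - 1.5) < 0.1, abs(coef[2]) < 0.1)
```

MARS automates the choice of knots and variables; here the least-squares fit on the fixed hinge basis recovers the two regional slopes, something a global polynomial would blur.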

Interactions between EDCs

The assessment of interactions compounds the methodological complexity of mixtures analyses. Fitting a GLM with product terms between pairs of exposures quickly becomes intractable as the number of exposures increases; for k exposures, the assessment of all two-way interactions introduces k(k − 1)/2 interaction terms, which may overwhelm modest sample sizes and exacerbate multicollinearity. Even in cases where the parameter count-to-sample size ratio is small, correlated measurement error among biomarkers may diminish the power available to detect interactions (Pollack et al. 2013). The paucity of studies in this review examining interactions perhaps reflects these difficulties (Tables 1 and S3). Nevertheless, synergistic effects between EDCs are biologically plausible (Claus Henn et al. 2014; Diamanti-Kandarakis et al. 2009; Kortenkamp 2007; Rosa et al. 2011; Silins and Högberg 2011) and have been reported in studies of metal mixtures and child health (Claus Henn et al. 2014). Antagonistic and synergistic EDC-mixture effects have also been observed in human cell lines (Ribeiro et al. 2017) and toxicological studies (e.g., Christiansen et al. 2009); however, whereas interaction is defined as a departure from additivity, the definition of additivity can vary widely between (and among) toxicologists and epidemiologists (Howard and Webster 2013). Assessment of interactions among EDCs needs to take into account the possibility of nonmonotonicity in exposure–response relationships. When both nonmonotonicity and interaction exist, the consequences of misspecifying either of these characteristics may be severe. In a linear regression model, given two correlated exposures that interact and have U- or inverted U-shaped relationships with the outcome, assuming linearity may indicate that the relationship is synergistic when it is truly antagonistic, or vice versa (Ganzach 1997). 
Moreover, failure to assess interaction may result in exposure–response relationships that appear flat or convex when they are truly concave, or vice versa (Ganzach 1997). To our knowledge, these issues have been insufficiently explored in analyses of environmental chemical mixtures. We identified several methods for identifying and modeling interaction between chemical exposures (Table 2), none of which were used in the reviewed studies. To address the risk of multicollinearity when including main effects and pairwise product terms of correlated exposures, LASSO and grouped LASSO regression have been extended to fit sparse hierarchical interaction models, performing variable selection while honoring the hierarchy principle for interactions (that for nonzero interaction coefficients, one or both main effects should also be included in the model) (Bien et al. 2013; Lim and Hastie 2015). Sparse hierarchical interaction models can also be built using BHR-SSVS by introducing dependence into the usual independent Bernoulli prior for variable importance (Bien et al. 2013; Chipman 1996). Where nonlinearity is expected, pairwise and possibly higher-order interactions can be modeled using multivariate smoothers in GAMs, such as tensor products or thin plate splines (Ruppert et al. 2003). Nonlinear multivariate interactions can also be identified using BKMR, by plotting cross-sections of the estimated joint nonparametric exposure–response function (Bobb et al. 2015). Tree-based statistical learning methods present an alternative for datasets with a large number of exposures. These methods do not assume additivity or linearity in exposure–response relationships and are able to automatically identify and account for nonlinearity and pairwise or higher-order interaction effects (Lampa et al. 2014). 
Classification and regression trees (CART) is a nonparametric method that partitions data into a decision tree, and multiple trees can be combined to improve predictive performance using ensemble methods, such as stochastic gradient boosting (SGB) and random forests (RF) (Table 2). SGB combines trees by fitting a weighted additive model of trees based on random subsets of the data, whereas RF averages trees based on bootstrapped data and uses a random sample of exposures for determining splits in individual trees (Hastie et al. 2009). Although both methods provide a measure of variable importance, SGB may be preferred for identifying interactions in studies of chemical mixtures because it provides a measure of interaction strength (Lampa et al. 2014). Although the inability of these methods to impose a monotonicity constraint may be advantageous for modeling EDC exposure–response relationships, it is problematic when linearity needs to be enforced in the handling of confounders (Sun et al. 2013). This issue was addressed by Gass et al. (2014), who developed a CART algorithm that controls for confounding. A further drawback is that spurious interactions may be identified in the presence of nonlinearity and high correlation between exposures (Lampa et al. 2014). One solution may be to encourage repeated splits on the same variable in the CART algorithm (Friedman and Popescu 2008; Lampa et al. 2014); another option may be to use principal component scores in place of subsets of very highly correlated exposures (Lampa et al. 2014). Consequently, these methods need to be considered exploratory unless the results can be validated (e.g., using external data or split-sample validation; Lampa et al. 2014). Some of the drawbacks of tree-based statistical learning methods are addressed by Bayesian additive regression trees (BART), a Bayesian ensemble method that fits a sum of trees model (Chipman et al. 2010) (Table 2). 
Although SGB and RF provide point estimates of variable importance, these scores reflect predictive performance, not direction or magnitude of exposure–response associations (Bobb et al. 2015), and there is no measure of uncertainty. BART shares the benefits of ensemble methods along with the benefits of a fully specified Bayesian model; i.e., full posterior inference capabilities, providing exposure-effect estimates and probability-based uncertainty intervals, and the ability to incorporate prior knowledge (Chipman et al. 2010; Hernández et al. 2015). Moreover, BART can be extended to include linear confounder terms and random effects (Chipman et al. 2010). Simulation studies comparing the performance of methods for assessing interactions in analyses of environmental chemical mixtures are lacking. In a recent exposome simulation study, boosted regression trees had lower sensitivity but lower false discovery proportion than hierarchical grouped LASSO (Barrera-Gómez et al. 2017). In a simulation study of 4–20 exposures, LASSO, BMA, and supervised PCA all had high interaction detection rates, with BMA and LASSO possessing the lowest false positive rates (Sun et al. 2013). Using CART as a preprocessor to restrict the analysis to a subset of exposures improved interaction detection rates and false positive rates (Sun et al. 2013). The deletion/substitution/addition algorithm, not reviewed here, had poor performance in identifying interactions (Sun et al. 2013). Further simulation studies are warranted to characterize the performance of the available methods in identifying interactions between EDCs.
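A minimal illustration of tree-ensemble variable importance, using scikit-learn's gradient boosting on simulated data. As noted above, impurity-based importance scores rank exposures but convey no direction, magnitude, or uncertainty:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
n = 500
X = rng.normal(size=(n, 5))
# Nonlinear main effect plus an interaction; exposures 3 and 4 are pure noise
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.3, n)

sgb = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                random_state=0).fit(X, y)

# Importance concentrates on the three signal exposures, including the
# interacting pair, without any explicit interaction terms being specified
importance = sgb.feature_importances_
print(importance.round(2))
```

The depth-3 trees capture the X1-by-X2 interaction automatically; quantifying its strength would require something like Friedman's H-statistic, which scikit-learn does not provide directly.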

Exposure Measurement Error and Timing of Exposure Measurement

Careful timing of exposure measurement is crucial in studies of prenatal exposure to chemical mixtures, requiring consideration of both the etiologically relevant window of exposure and the persistence of chemical species within the mixture. EDCs exhibit a broad gamut of biological half-lives; at one extreme, the metabolism and excretion of some phenols and phthalates occur within hours, whereas the half-lives of lipophilic persistent organic pollutants are in the order of years. The levels of nonpersistent chemicals in pregnant women have been shown to exhibit high intraindividual variability (Braun et al. 2012), reflecting patterns in the use of personal care products (Braun et al. 2012), plastics (Braun et al. 2013), seasonality (e.g., in pesticide use; Whyatt et al. 2004), and changes in xenobiotic metabolism associated with pregnancy (Braun et al. 2012; Jusko et al. 2014). This variability suggests the need for repeated measurements, as spot measurements may not adequately capture long-term exposure levels or average exposure levels across a targeted window of susceptibility (Braun et al. 2012). For persistent chemicals, the intraindividual variability is likely to be overshadowed by interindividual variability (Gennings et al. 2013; Longnecker et al. 1999), suggesting that timing of exposure assessment is more flexible for these chemicals (Romano et al. 2014). However, even within classes of persistent chemicals, the magnitude of intraindividual variability is likely to be chemical-specific (e.g., the nondioxin-like di-ortho PCB congeners have substantially longer half-lives than the dioxin-like mono-ortho PCBs; Heilmann et al. 2006). Increasing the complexity, further sources of measurement error include differential placental transfer rates of chemicals (e.g., for PBDE congeners, there is a lower rate of placental transfer with increasing degree of bromination; Frederiksen et al. 
2010), differential rates of metabolism and bioaccumulation within the fetus across chemicals (e.g., fetal methylmercury levels are greater than maternal levels, whereas the converse is true for PBDEs; Frederiksen et al. 2010; Stern and Smith 2003), and the lower precision in measurement of exposures with concentration ranges near detection limits than higher-concentration-range compounds (Engel and Wolff 2013). These asymmetries between EDCs suggest that exposure may be misclassified to differing degrees for individual chemicals within a mixture, especially if exposure is measured at only a single point in pregnancy. Although acknowledging the degree of exposure misclassification is universally important in epidemiological studies, it is of particular importance in studies of chemical mixtures. Exposure measurement error in a single-pollutant model is expected to bias exposure–response associations toward the null, when the error is nondifferential with respect to the outcome (Zeka and Schwartz 2004). However, when exposures with differing degrees of (nondifferential) measurement error are analyzed in a multipollutant model, the measurement error in one variable is able to contribute bias to the coefficients of other variables (Zeka and Schwartz 2004). Importantly, both bias toward the null and away from the null are possible, with the overall direction and magnitude of the bias dependent on the extent of correlation between the exposures, and the variances and correlation of the measurement errors (Zeger et al. 2000). Regardless of the true strength of an association, an exposure with lower measurement error may appear more strongly associated with an outcome than an exposure with higher measurement error (Vedal and Kaufman 2011; Woodruff et al. 2009). 
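The bias-transfer phenomenon described by Zeka and Schwartz (2004) and Zeger et al. (2000) is easy to reproduce in a small numpy simulation: when one of two correlated exposures is measured with classical error, its coefficient attenuates toward the null and the error-free, truly null exposure picks up a spurious association (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
# Two correlated true exposures; only x1 affects the outcome
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(0, 0.7, n)
y = 1.0 * x1 + rng.normal(0, 0.5, n)

# x1 is observed with classical (nondifferential) measurement error
w1 = x1 + rng.normal(0, 1.0, n)

# Two-pollutant model with the error-prone w1 in place of x1
X = np.column_stack([np.ones(n), w1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# beta for w1 (true value 1.0) is attenuated toward 0; beta for x2
# (true value 0) is biased away from 0 (cf. Zeger et al. 2000)
print(b[1].round(2), b[2].round(2))
```

With these settings the expected coefficients are roughly 0.33 and 0.48 rather than 1.0 and 0, so the precisely measured null exposure appears to be a substantial contributor to risk.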
If identifying causal agents within a mixture is the goal of the analysis, this issue may lead to misleading conclusions, unless the precision of exposure measurement is considered or explicitly incorporated into statistical models (Woodruff et al. 2009). Not only is the use of single measurements anticipated to bias effect estimates, it also reduces the power to detect exposure–response associations (Jusko et al. 2012; Perrier et al. 2016). In the Generation R study, Jusko et al. (2012) demonstrated that the required sample size for detecting linear associations was almost halved when using the mean across three BPA measurements vs. one measurement. A more powerful analysis can be realized not only from increasing the number of individuals but also from increasing the number of measurements per individual of short half-life chemicals (Jusko et al. 2012; Perrier et al. 2016), though this gain in power is attenuated with increasing intraindividual correlation. To reduce costs of repeated exposure measurements, pooling multiple biospecimens per individual before assaying chemicals may decrease bias and increase power, in comparison with the use of a single spot measurement (Perrier et al. 2016). The majority of studies we reviewed analyzed nonpersistent chemicals on the basis of a single measurement (Table S3). The few studies that took repeated measurements to reduce measurement error used the average of two spot measurements across pregnancy (e.g., Casas et al. 2016; Gascon et al. 2015b). Although this approach is useful for summarizing longitudinal exposure data when measurements are unstable over time, a more powerful analysis may be obtained by taking into account the longitudinal nature of the exposure data, using methods such as a two-stage mixed effects model, generalized additive mixed model, and functional clustering model (Chen et al. 2015). 
These methods avoid loss of information about exposure temporal variation and may enable other time-varying covariates to be incorporated (Chen et al. 2015). Only two reviewed studies took exposure measurement error into account in their statistical methodology (Claus Henn et al. 2016; Heilmann et al. 2006) (Table S3). For incorporating measurement error into statistical models, numerous applicable methods exist that form part of an extensive statistical literature dealing with this issue (e.g., Carroll et al. 2006; Gustafson et al. 2003; Thomas 2009). An exhaustive review is outside the scope of this article, but we present SEM and BHR as examples. SEM (used by Heilmann et al. 2006) enables the simultaneous modeling of a system of multiple outcomes and exposures that are measured with error (Table 2) (Grandjean et al. 2004; Sánchez et al. 2005). SEM can correct for classical measurement error when multiple exposure measurements are available (e.g., exposure measured across multiple biological matrices or multiple metabolites), under certain distributional and independence assumptions (e.g., independent and normally distributed measurement errors). This method was demonstrated by Budtz-Jørgensen et al. (2002), in which a latent methylmercury exposure variable was assumed to be measured with error by two methylmercury biomarkers in umbilical cord blood and maternal hair. This approach is purported to correct for bias due to both laboratory imprecision and biological variation (Grandjean et al. 2004); however, as discussed above, SEMs have been criticized for their susceptibility to misspecification and reliance on strong causal and parametric assumptions that may lead to severe biases (VanderWeele 2012). BHR provides a flexible framework for correcting for bias due to measurement error, by treating exposures as latent variables and specifying additional submodels for the measurement error and prior (Gustafson et al. 2003; Muff et al. 2015; Richardson et al. 2002). 
Correcting measurement error bias requires prior knowledge of the measurement error mechanism (e.g., classical or Berkson) and the measurement error variance from repeated measures, external studies, or expert knowledge (Gustafson et al. 2003; Muff et al. 2015). Where prior information is lacking, sensitivity analysis may allow examination of the extent to which inferences are affected by measurement error correction.
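In the simplest setting (one exposure, classical error, known error variance), the frequentist analog of such a correction is regression calibration: divide the naive slope by the reliability ratio. A simulated numpy sketch follows; a full BHR analysis would instead embed the same information in a Bayesian measurement submodel and prior:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20000
x = rng.normal(size=n)                       # true exposure, variance 1
u_var = 0.5                                  # known measurement-error variance
w = x + rng.normal(0, np.sqrt(u_var), n)     # observed error-prone exposure
y = 0.8 * x + rng.normal(0, 0.5, n)

# Naive slope on w is attenuated by the reliability ratio lambda
naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
lam = 1.0 / (1.0 + u_var)                    # var(x) / (var(x) + var(u))
corrected = naive / lam                      # regression-calibration correction

print(naive.round(2), corrected.round(2))
```

Here the naive slope is attenuated to about 0.53 while the corrected estimate recovers the true value of 0.8; in practice, lambda must come from repeated measures, external validation data, or expert judgment, as noted above.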

Identifying Windows of Exposure Susceptibility in Mixtures Analyses

The challenges in identifying vulnerable gestational windows in mixtures analyses are considerable. Teasing out the independent effects of multiple correlated exposures at multiple times, which may have nonlinear relationships with the outcome, may be an intractable problem when the number of exposures is large relative to the sample size. Consequently, most of the reviewed studies that compared windows of exposure susceptibility analyzed time points separately in parallel cross-sectional models (e.g., Erkin-Cakmak et al. 2015; Gascon et al. 2015a) (Table S3). Although this approach is straightforward, it does not allow for formal tests of differences in effect estimates across windows and may result in bias if exposure is not uniform throughout the window (Sánchez et al. 2011). Missing data may further hamper comparisons, if effect estimates at separate time points are based on different subsets of the data (Chen et al. 2015; Sánchez et al. 2011). Several of the reviewed studies analyzed multiple exposures at multiple times in one regression model (e.g., Jacobson et al. 2015; Roen et al. 2015) (Table S3). Although this approach may enable the estimation of independent effects, estimates may be based on a reduced sample if a complete-case analysis is conducted due to missing exposure data at varying time points across individuals (Chen et al. 2015; Sánchez et al. 2011). The interpretation of some effect estimates may be temporally invalid (Chen et al. 2015); e.g., the effect of prenatal exposure controlling for postnatal exposure. Additionally, prenatal effect estimates may be subject to overadjustment bias if postnatal exposure is on the causal pathway between prenatal exposure and the outcome (Schisterman et al. 2009), which may be the case for persistent chemicals. Estimates may also be affected by multicollinearity, although dimension reduction methods may be used to address this (e.g., Maresca et al. 2016). 
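The formal cross-window comparison that parallel models cannot provide is straightforward in a joint model. A simulated single-chemical numpy sketch, with exposure measured at two hypothetical gestational windows and a Wald test of equal window coefficients:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400
# Hypothetical exposure measured in two correlated gestational windows
x_t1 = rng.normal(size=n)
x_t2 = 0.6 * x_t1 + 0.8 * rng.normal(size=n)
y = 0.6 * x_t1 + 0.1 * x_t2 + rng.normal(0, 0.5, n)  # window 1 matters more

# One joint model permits a formal test of equal window effects
X = np.column_stack([np.ones(n), x_t1, x_t2])
b, res, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res[0] / (n - 3)
cov_b = sigma2 * np.linalg.inv(X.T @ X)

# Wald test of H0: beta_t1 == beta_t2
diff = b[1] - b[2]
se = np.sqrt(cov_b[1, 1] + cov_b[2, 2] - 2 * cov_b[1, 2])
z = diff / se
print(abs(z) > 1.96)   # reject equal window effects at the 5% level
```

Multiple informant models generalize this idea, and the correlation between windows (0.6 here) is precisely what inflates the standard error of the difference and complicates the multipollutant extension.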
Allowing prenatal exposure to be a determinant of latent postnatal exposure in a SEM analysis, as performed by Heilmann et al. (2006), is appealing; however, marginal structural models (not reviewed here) may be better equipped to address time-varying exposures and confounders while making fewer assumptions than SEMs (VanderWeele 2012). Several additional methods have been suggested in the single-exposure setting (Sánchez et al. 2011). Multiple informant models (MIM) may be used to jointly estimate separate regression models for each window, allowing for formal testing of differences in effect estimates across windows (Sánchez et al. 2011). Baek et al. (2014) extended MIM for hierarchical data with multiple correlated predictors, examining the school food environment and childhood obesity, an approach that may be applicable to the study of several correlated chemical exposures over time. Alternatively, mixed models may be used with random intercepts and slopes if exposure is measured on a sufficient number of occasions to enable the estimation of exposure trajectories (Sánchez et al. 2011). However, we are not aware of any studies that have implemented these methods in the multipollutant setting, and they may be limited to models of only a few exposures. Two novel methods have recently been proposed for mixtures analyses with numerous repeated exposure measurements. Lagged WQSR can be used for modeling high-resolution exposure trajectories where there is heterogeneity in the timing of exposure measurement across individuals (e.g., from baby tooth biomarkers) (Bello et al. 2017). Based on a mixed modeling framework, lagged WQSR quantifies the magnitude and significance of the mixture effect over time, enabling the identification of critical windows of exposure susceptibility as well as the drivers of the association at each time point (Bello et al. 2017). 
The method is also attractive because of its robustness to missing values in exposure measures over time; however, one caveat is that the direction of the expected effects of individual exposures on the outcome is required to be homogeneous (Bello et al. 2017). Tree-based distributed lag models do not require homogeneity in the direction of expected effects but can only quantify the combined explanatory power of exposures within a time window, not the direction of the mixture effect, and are more suited to settings with uniformly timed exposure measurement across individuals and datasets with no missing exposure values (Bello et al. 2017). Neither method is able to quantify individual exposure-effect estimates or interactions between mixture components over time, which would require estimation of the complex multidimensional exposure–response surface (Bello et al. 2017).
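As a rough sketch of the core WQS idea underlying these methods (not the full lagged WQSR procedure, which adds a mixed-model framework, bootstrapping, and training/validation splits), the following estimates a single mixture effect via nonnegative weights, summing to one, applied to quartile-scored exposures; the data and weights are simulated assumptions.

```python
# Minimal WQS-style index on simulated data (illustrative only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 400, 4

X = rng.normal(size=(n, p))
# Quartile scores 0..3 for each exposure.
Q = np.column_stack([
    np.searchsorted(np.quantile(X[:, j], [0.25, 0.5, 0.75]), X[:, j])
    for j in range(p)
]).astype(float)

# Hypothetical truth: mixture effect driven mostly by components 1-2.
true_w = np.array([0.5, 0.4, 0.1, 0.0])
y = 0.6 * (Q @ true_w) + rng.normal(size=n)

# Parameters: intercept b0, mixture effect b1, weights w (>= 0, sum 1).
def sse(theta):
    b0, b1, w = theta[0], theta[1], theta[2:]
    return np.sum((y - b0 - b1 * (Q @ w)) ** 2)

fit = minimize(
    sse,
    x0=np.r_[0.0, 0.5, np.full(p, 1.0 / p)],
    method="SLSQP",
    bounds=[(None, None)] * 2 + [(0.0, None)] * p,
    constraints={"type": "eq", "fun": lambda t: t[2:].sum() - 1.0},
    options={"maxiter": 500},
)
b1_hat, w_hat = fit.x[1], fit.x[2:]
print("mixture effect:", b1_hat, "weights:", w_hat)
```

The estimated weights indicate which mixture components drive the association, while the single coefficient on the index summarizes the overall mixture effect.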

Synthesis and Recommendations

We have examined a variety of frequentist and Bayesian methods performing dimension reduction, shrinkage, variable selection, statistical learning, or smoothing, which may be preferred to the use of more established methods when modeling EDC mixtures. We concentrated on methods that may be able to achieve the manifold goals of mixtures analyses; however, few methods are able to achieve multiple goals simultaneously. Two methods developed specifically for mixtures analyses in environmental epidemiology deserve special mention: Both WQSR (Carrico et al. 2015) and BKMR (Bobb et al. 2015) provide a measure of mixture health effects while identifying drivers of the association, with the latter also accounting for nonlinearity and interaction in multivariate exposure–response relationships. For multifaceted research questions assessing the health effects of numerous correlated exposures, we suggest exploring staged statistical methods (e.g., preconditioned LASSO or using CART as a preprocessor; Sun et al. 2013), and leveraging the flexibility offered by Bayesian methods such as BHM, BKMR, or BART. However, method selection should be made with cognizance of the dimensionality of the problem, the sample size, the ability to adjust for confounding, and the relative importance of each goal to the research question, so prescriptive recommendations are not possible. Hypothesis tests should be conducted with awareness of multiple testing issues; methods for addressing multiple testing have been discussed in depth elsewhere (Braun et al. 2016; Chadeau-Hyam et al. 2013; Patel 2017). Multiple testing issues are inadequately addressed by many variable selection methods, so tools for valid selective inference may need to be explored (e.g., Berk et al. 2013; Lockhart et al. 2014; Meinshausen and Bühlmann 2010; Tibshirani et al. 2016). 
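A staged analysis of the kind suggested above might look like the sketch below, on simulated data: generic LASSO screening with a cross-validated penalty, followed by an unpenalized refit of the selected exposures. This illustrates the general two-stage idea only, not the specific preconditioned-LASSO or CART-based procedures cited, which are more involved.

```python
# Two-stage sketch: LASSO screening, then OLS refit (simulated data).
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(2)
n, p = 300, 10

# Correlated exposures sharing a common latent source.
latent = rng.normal(size=(n, 1))
X = 0.7 * latent + 0.7 * rng.normal(size=(n, p))

# Hypothetical truth: only the first three exposures affect the outcome.
beta = np.r_[0.5, 0.4, 0.3, np.zeros(p - 3)]
y = X @ beta + rng.normal(size=n)

# Stage 1: LASSO screening with 5-fold cross-validated penalty.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_ != 0)

# Stage 2: unpenalized refit on the selected exposures to reduce the
# shrinkage bias of the stage-1 coefficients.
refit = LinearRegression().fit(X[:, selected], y)
print("selected:", selected, "refit coefficients:", refit.coef_)
```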
Moreover, the use of data-driven model optimization approaches, such as k-fold cross-validation for the selection of tuning parameters in penalized regression, may lead to model selection instability and concerns regarding the reproducibility of results (Lim and Yu 2016; Roberts and Nowak 2014). Cross-validation selects a solution that optimizes predictive performance but not necessarily performance metrics for variable selection and parameter estimation (Lim and Yu 2016). We recommend considering methods that address estimation stability (e.g., Lim and Yu 2016; Roberts and Nowak 2014); focusing on both estimation stability and predictive performance may reduce false-positive rates (Kirk et al. 2013; Lim and Yu 2016). Bayesian variable selection may also be highly sensitive to prior specification (Coull et al. 2015) and involve uncertainty in the choice of variable selection thresholds (Bleich et al. 2014); therefore, careful specification of priors and hyperparameters is needed. One of our key conclusions is that the selection of chemicals to analyze as a mixture should be made with care. Too many chemicals in one model may increase the risk of multicollinearity, and no statistical method is able to separate the signals of very highly correlated exposures without further information. Conversely, including too few chemicals in one model risks bias from omitted copollutant confounders, with the possibility of detecting an association that is due to a correlated copollutant (Braun et al. 2016). Collinearity can exacerbate bias due to model misspecification, so careful elucidation of a causal framework prior to data analysis is essential (Schisterman et al. 2017). 
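One simple stability check, in the spirit of stability selection (Meinshausen and Bühlmann 2010), is to refit the selection procedure on bootstrap resamples and retain only exposures selected in a large fraction of them. The sketch below uses simulated data, and the penalty and frequency threshold are arbitrary illustrative choices.

```python
# Selection-frequency stability check over bootstrap resamples
# (illustrative simulation; penalty and threshold are assumptions).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 300, 8

X = rng.normal(size=(n, p))
# Hypothetical truth: only the first two exposures matter.
y = 0.6 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

B, alpha = 100, 0.1
freq = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, size=n)              # bootstrap resample
    coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
    freq += coef != 0                             # count selections
freq /= B

# Exposures selected in, say, >80% of resamples are deemed stable.
stable = np.flatnonzero(freq > 0.8)
print("selection frequencies:", freq, "stable set:", stable)
```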
These considerations are important in the study of EDCs, where data on important confounders, such as endogenous hormone levels (Kortenkamp 2007), dietary exposures [e.g., omega-3 fatty acids (Engel and Wolff 2013), phytoestrogens, mycoestrogens], and the microbiome [which is involved in xenobiotic metabolism (Dietert and Silbergeld 2015)], may be unavailable. Moreover, bias due to the joint effect of multiple unmeasured (even weak) confounders can be substantial, especially if they are mutually correlated but independent of the included confounders (Groenwold et al. 2016), which may be the case when one class of EDCs is the focus of an analysis and other important classes of EDCs are excluded. For the reasons outlined, opportunistic selection of variables with the assumption that small data sets will necessarily reveal true patterns in exposure–health associations should be discouraged, in favor of careful consideration of biological mechanisms, modes of action, phenomenological similarity, and the potential for interaction, when combining EDCs (Kortenkamp 2007). We stress the importance of considering exposure measurement error in studies of chemical mixtures. Failure to do so may bias effect estimates toward or away from the null, to an unpredictable extent (Zeka and Schwartz 2004). Exposure measurement error can be reduced for nonpersistent chemicals by taking multiple exposure measurements during a hypothesized window of susceptibility; these measurements can be pooled and assayed once per individual to reduce costs (Perrier et al. 2016). Nevertheless, mixture components are likely to be measured with differing degrees of measurement error, so we encourage selection of a statistical method that explicitly accounts for measurement error or exploring the impact of measurement error on effect estimates through sensitivity analysis, rather than rationalizing the sequelae of disregarding this important issue (Thomas et al. 2007; Woodruff et al. 2009).
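The attenuation caused by classical measurement error, and the benefit of averaging (or pooling) repeated measurements, can be illustrated with a simulation in which all variances and effect sizes are hypothetical: with equal true-exposure and error variances, a single noisy measurement halves the estimated slope, while averaging k measurements shrinks the error variance by a factor of k and recovers much of it.

```python
# Classical measurement error: attenuation and the benefit of pooling.
# All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n, k = 20000, 4
beta = 0.5

true_exposure = rng.normal(size=n)
y = beta * true_exposure + rng.normal(size=n)

# k noisy measurements per individual (measurement-error SD = 1).
meas = true_exposure[:, None] + rng.normal(size=(n, k))

def slope(x, y):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Attenuation factor is var(T) / (var(T) + var(E)):
b_single = slope(meas[:, 0], y)         # ~ 0.5 * 1/(1 + 1)   = 0.25
b_pooled = slope(meas.mean(axis=1), y)  # ~ 0.5 * 1/(1 + 1/k) = 0.40
print(b_single, b_pooled)
```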

Limitations

Our findings should be considered in the context of the following limitations. Our results are limited to studies that stated that they were analyzing multiple exposures in the abstract. Although this approach may not have identified all of the existing EDC-mixtures studies, we are likely to have captured the majority of studies that used nonconventional statistical methods, as these studies are more likely to have expressed their methodological novelty in the abstract. In addition, we performed a targeted review of the statistical literature that did not include all possible statistical methods for studying the health effects of prenatal EDC mixtures.

Conclusions

Biomonitoring studies suggest that pregnant women are exposed to multiple EDCs simultaneously. This finding, coupled with toxicological evidence of adverse developmental effects following prenatal exposure to mixtures of EDCs and the possibility of underestimating effects if exposures are studied one at a time, provides compelling motivation for the analysis of EDC exposures in combination rather than separately. Understanding the health risks of exposure to EDC mixtures requires taking into account the potential for copollutant confounding and interaction between exposures, selecting mixture components prudently, and acknowledging that analyses based on simplifying assumptions about the nature of EDC exposure–response relationships may have unintended consequences. To this end, we examined the impact of methodological choices and assumptions regarding additivity, linearity, and exposure measurement timing on the appropriateness of causal inference in the study of prenatal exposure to EDC mixtures. We concurrently identified multiple applicable statistical methods and compared their strengths and weaknesses for modeling these exposures. Although several reviews have discussed multipollutant statistical methods in air pollution epidemiology and for the analysis of exposome–health associations (Billionnet et al. 2012; Davalos et al. 2016; Patel 2017; Stafoggia et al. 2017), none to our knowledge have placed them in the context of the specific statistical and epidemiological challenges facing the study of prenatal exposure to EDCs. Nevertheless, our recommendations may also apply to studies of postnatal exposures or exposures to environmental chemicals not acting through the endocrine system. Epidemiological studies are uniquely placed to provide insight into the potential human health risks of complex EDC mixtures that are more representative of real-world exposure patterns than the analysis of single chemicals. 
Understanding the role of prenatal exposure to EDC mixtures on development may help to solidify evidence for the suspected developmental etiologies of many endocrine-sensitive outcomes in childhood and adulthood, the increasing incidence of which remains unexplained (Diamanti-Kandarakis et al. 2009). We hope that this resource will aid researchers in selecting the appropriate analytical strategies and statistical methodology in this vital research area.