
Regression calibration with instrumental variables for longitudinal models with interaction terms, and application to air pollution studies.

M Strand¹, S Sillau², G K Grunwald³, N Rabinovitch⁴.

Abstract

In this paper, we derive forms of estimators and associated variances for regression calibration with instrumental variables in longitudinal models that include interaction terms between two unobservable predictors and interactions between these predictors and covariates not measured with error; the inclusion of the latter interactions generalizes results we previously reported. The methods are applied to air pollution and health data collected on children with asthma. The new methods allow for the examination of how the relationship between the health outcome leukotriene E4 (LTE4, a biomarker of inflammation) and two unobservable pollutant exposures and their interaction is modified by the presence or absence of upper respiratory infections. The pollutant variables include secondhand smoke and ambient (outdoor) fine particulate matter. Simulations verify the accuracy of the proposed methods under various conditions.


Keywords: LTE4; ambient PM2.5; cigarette smoke; cotinine; errors in variables; measurement error

Year: 2015 PMID: 26640396 PMCID: PMC4662860 DOI: 10.1002/env.2354

Source DB: PubMed Journal: Environmetrics ISSN: 1099-095X Impact factor: 1.900

1 Introduction

Exposures to ambient air pollution and secondhand smoke (SHS) have been implicated as health risks, particularly for some conditions such as asthma (Chilmonczyk et al.; Roemer et al.; Peters et al., 1997; Vedal et al.; Ostro et al., 2001; Wallace et al., 2003; Rabinovitch et al.). In terms of subject exposure, measurement error is inherent to air pollution studies because most have used fixed (central) monitors at a distance from study subjects. Even for studies that have used personal monitors, determining fractions of pollution due to SHS and outdoor sources involves estimation. Regression calibration with instrumental variables (RCIV; Carroll et al., 2006, Chapter 6; Buonaccorsi, 2010, Chapter 5; Buzas and Stefanski, 1996) can be used to minimize bias in health-effect estimates due to measurement error by employing data from both fixed and personal monitors as well as other variables related to exposure. Regression calibration methods are well established and have been extended for longitudinal or clustered data within recent decades, most commonly by employing linear mixed models (Bartlett et al., 2009), nonlinear mixed models (Ko and Davidian, 2000), or generalized linear mixed models (GLMM; Carroll et al., 1997; Wang et al.). Specific extensions for RCIV using GLMMs have also been made (Li and Wang, 2012). These articles provide a framework for estimation but do not give explicit forms for specific models with interaction terms. Other recent measurement error methods articles involve interaction terms, but not longitudinal data or instrumental variables (e.g., Wong et al.; Huang et al., 2005). We previously derived explicit forms of RCIV estimators and variances for longitudinal models with interaction terms between the unobservable predictors, that is, the true exposure measurements of interest, while adjusting for covariates not involved in interactions (Strand et al., 2014).
Here, we extend these forms to allow interaction between the unobservable predictors and the covariates. This extension allows us to account for situations where the health–exposure relationship may be modified by subject characteristics or conditions. Regression calibration with instrumental variables algorithms employ unbiased but measured-with-error variables (unbiased surrogate variables) and variables that are linearly related to the true predictors of interest (instrumental variables) to obtain consistent and nearly unbiased estimators of the coefficients of the true predictors. Two particular algorithms are often used to carry out RCIV, one that obtains the calibrated estimates as functions of estimates from the two separate model fits (RCIV1) and another that obtains predicted values from one model fit, then plugs these predicted values into a second model in place of the unknown predictors (RCIV2). Carroll et al. (2006) and Hardin and Carroll (2003) describe methods to obtain asymptotic variances for estimated coefficients of interest in generalized linear models using “stacked estimating equations” for RCIV1 and/or RCIV2 estimators, but recommend the use of empirical (sandwich) variances due to model approximations with this approach. Thurston et al. (2003) proved that RCIV1 and RCIV2 estimators have the same asymptotic variances when the outcome follows a generalized linear model of exponential family form and when there may be multiple covariates measured with error. In this article, we determine both model-based and empirical asymptotic variances of RCIV1 estimators in linear mixed models with specific interaction terms by applying the delta method. This approach allows for true covariance structures to be retained (based on underlying models and assumptions), although in practice, we have found that approximations to covariance structures have worked adequately. 
Our motivating application involves a longitudinal air pollution and health study conducted on children with asthma at National Jewish Health in Denver and reported in Rabinovitch et al. (2011) and Strand et al. (2014). In the first article, we examined the relationship between a biomarker of inflammation, leukotriene E4 (LTE4), and two instrumental variables related to pollutant exposures: urinary cotinine (related to SHS exposure) and outdoor fine particulate matter (related to personal ambient fine particulate matter exposure), plus their interaction. Using these instrumental variables plus a small number of days of measured-with-error but unbiased exposure concentrations from personal monitors (the unbiased surrogate variables), we then used RCIV in the second article to estimate the slopes of the two unobserved pollutant exposure variables, plus their interaction. Both reports demonstrated that while increases in exposures to ambient fine particulate matter and SHS each had adverse effects on health, a greater dose–response relationship for one pollutant occurred when the other pollutant was at lower levels. Some preliminary analyses suggested that the health–pollutant relationships for subjects might depend on whether or not children had upper respiratory infections (URIs or “colds”). The extended models presented in this paper allowed us to examine this, and could be used to study other potential effect modifiers. In this paper, we first derive the form of regression calibration estimators for the extended RCIV1 model, which includes interactions between unobservable predictors and observable covariates, and derive asymptotic standard errors for all estimators in this model using the delta method. We then revisit the air pollution and health data to better understand the role of URIs in the health–exposure relationship. Finally, the precision and accuracy of the estimation method are assessed via simulation.

2 Methods

2.1 The models

We consider linear mixed models, including random intercepts to account for subject heterogeneity in the responses, and a serial correlation structure for within-subject repeated measures. Although fitting models does involve approximation to the covariance structures, as will be discussed later, including these two features in the approximated structures has allowed us to obtain reasonably accurate inferential results for parameters of interest, as verified through Monte Carlo simulation. The continuous outcome variable Y for subject i at time j, i = 1,…,n and j = 1,…,r, is expressed as a function of the unobservable predictors (X1, X2 and their interaction), covariates measured without error that are involved in interactions with at least one of the unobservable predictors (denoted as Z∗), covariates without error that are not involved in interactions with the unobservable predictors (denoted as Z), and random terms to incorporate the correlation due to longitudinal measurement of subjects. If each Z∗ term is allowed to interact with all terms in X=(1,X1,X2,X1·X2) and we let Z=(Z1,Z2,…,Zk), Z∗=(Z∗1,…,Z∗m), then the model of interest is

$$Y_{ij} = X_{ij}\beta^{X} + (Z^*_{ij}\otimes X_{ij})\beta^{Z^*X} + Z_{ij}\beta^{Z} + b_i + \varepsilon_{ij}, \tag{1}$$

where $b_i$ is a random intercept term for subjects and $\varepsilon_{ij}$ is a within-subject error term. We assume that the random intercept and error terms are independent of each other and that the subjects are independent. We use the spatial power structure for the covariance of the within-subject errors, which has the form $\mathrm{Cov}(\varepsilon_{ij},\varepsilon_{ij'}) = \sigma_\varepsilon^2\rho^{|t_{ij}-t_{ij'}|}$, for days $t_{ij}$ and $t_{ij'}$ in the study period associated with observations j and j′, respectively, for subject i. This is a generalization of the AR(1) structure and is useful when data are collected intermittently, such as in our data application. (In our application, we do not consider days without data to be missing values because data collection was designed to be intermittent.) The fixed-effect parameter vectors are $\beta^{X}$ (4 elements), $\beta^{Z^*X} = (\beta_1^{Z^*X\prime},\dots,\beta_m^{Z^*X\prime})'$, and $\beta^{Z}$ (k elements), where $\beta_p^{Z^*X}$ is the 4-element vector of coefficients for the interactions of Z∗p with the terms in X, for p = 1,...,m. The ⊗ symbol indicates the Kronecker product; because X and Z∗ are both row vectors, the resulting product will also be a row vector with 4m elements: (Z∗1·X, …, Z∗m·X).
Let $\beta = (\beta^{X\prime}, \beta^{Z^*X\prime}, \beta^{Z\prime})'$ denote the entire set of 4(m + 1) + k fixed-effect parameters in model (1). In our data application, Y is natural log leukotriene E4, or ln(LTE4), and X1 and X2 are SHS and fine ambient particulate matter exposure concentrations, respectively. Both of the pollutant variables are considered in terms of fine particulate matter (technically, particulate matter less than 2.5 microns in diameter, or PM2.5), and measured in μg per m³. Our primary models included an indicator for URI, or “cold” status (Z∗1 = 1 for presence, 0 for absence) as the covariate allowed to interact with X variables, and study year (Z1,…,Z4; the last year was the reference year) as covariates not involved in interactions. If not all interactions between Z∗ and X variables are required, then the model can be reformulated as

$$Y_{ij} = X_{ij}\beta^{X} + \sum_{p=1}^{m} Z^*_{pij}\,X_{(p)ij}\,\beta^{Z^*X}_{(p)} + Z_{ij}\beta^{Z} + b_i + \varepsilon_{ij}, \tag{2}$$

where $X_{(p)}$ is the (nontrivial) subset of X to be included in interactions with Z∗p, for p = 1,…,m, and $\beta^{Z^*X}_{(p)}$ is the corresponding subset of $\beta^{Z^*X}_{p}$. Instead of observing the predictors of interest, X1 and X2, we observe instrumental variables M1 and M2 that are linearly related to them:

$$X_{1ij} = \theta_0 + \theta_1 M_{1ij} + d_{1i} + \omega_{1ij}, \qquad X_{2ij} = \pi_0 + \pi_1 M_{2ij} + d_{2i} + \omega_{2ij}, \tag{3}$$

where $d_{pi}$ are random intercepts for subjects, for p = 1,2, and ω1 and ω2 are error terms with within-subject covariance matrices $\mathrm{Var}(\omega_{pi})$. As with (1) [or (2)], we assume that the random intercept and error terms are independent (for each equation), subjects are independent, and assume the spatial power structure for $\mathrm{Var}(\omega_{pi})$. Instrumental variables in our application are cotinine (M1, measured in ng per mg of creatinine), a metabolite of nicotine that can be measured in the urine and used as a biomarker of exposure to SHS, and ambient fine particulate matter concentrations measured by a fixed outdoor monitor (M2, measured in μg per m³). We can generalize (3) so that M1 and M2 are included in both equations. Such an expanded model was considered in Strand et al. (2014). One of the difficulties of this model is that matrices associated with estimators have linear dependencies that need to be dealt with. This does not cause problems for the estimators; it only complicates the derivations. In this article, we will only consider (3) and not the expanded version, primarily for the sake of simplifying notation, although results could be generalized to that case. In addition, the use of (3) seemed to be adequate for the data application, although the expanded version was used in the previous article to demonstrate the methods.
The primary difficulty in measuring X1 and X2 is that they are mixed together when measured, along with pollution from other sources. Although we do not observe X1 and X2, in our application, there is a small amount of data for “unbiased but measured-with-error” versions of these variables, denoted as W1=X1+U1 and W2=X2+U2, respectively, where Up has mean 0 and is assumed to be independent of Xp, p = 1,2. Incorporating W1 and W2 into (3) yields

$$W_{1ij} = \theta_0 + \theta_1 M_{1ij} + d_{1i} + e_{1ij}, \qquad W_{2ij} = \pi_0 + \pi_1 M_{2ij} + d_{2i} + e_{2ij}, \tag{4}$$

where $e_{pij} = \omega_{pij} + U_{pij}$, for p = 1,2. Portions of total measured fine particulate matter attributable to SHS (W1) and ambient sources (W2) were estimated using personal monitors, chemical signatures, and methods as described in Strand et al. (2014). This approach is expensive, cumbersome, and still results in measurements with error. But as long as we can assume that the W variables are unbiased for the X variables, we can apply RCIV. In (4), let $e_{pi}$ denote the error vector for subject i and p = 1,2. We use the spatial power structure for $\mathrm{Var}(e_{pi})$, although it is likely to just be an approximation. Letting M=(1,M1,M2,M1·M2), the health outcome model fit for RCIV1 is

$$Y_{ij} = M_{ij}\gamma^{M} + (Z^*_{ij}\otimes M_{ij})\gamma^{Z^*M} + Z_{ij}\gamma^{Z} + b^*_i + \varepsilon^*_{ij}. \tag{5}$$

This is essentially model (1) with M used in place of X, that is, unobservable predictors are replaced with their instrumental counterparts. Collectively, we define $\gamma = (\gamma^{M\prime}, \gamma^{Z^*M\prime}, \gamma^{Z\prime})'$ as the 4(m + 1) + k fixed-effect parameters in model (5) that correspond with the parameters in β for model (1). Asterisks are included on the random terms in (5) to distinguish them from those in (1). A version of (5) that involves partial interactions can also be written [analogous to how (2) relates to (1)].
Models (4) and (5) are fitted to obtain estimates that are then combined via RCIV1 to achieve the calibrated estimates, which are derived in the next section. The true covariance structure for response variables in models (4) and (5) as well as between responses in (4) and (5) based on underlying models (1) and (3) are displayed and/or discussed in Appendix A. These covariance structures are quite complicated, but our work has shown that using the “spatial power plus random intercept” structures as approximations in models (4) and (5) works reasonably well. This is true for models here as well as those studied in Strand et al. (2014). To recognize these forms as only approximations, asterisks are included on the covariance parameters to distinguish them from parameters in true underlying structures. While using these simpler covariance structures has appeared to be adequate, we also found that ignoring the repeated measures completely leads to very over-inflated variances.
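As a small illustration of the “spatial power plus random intercept” structure, the following Python sketch builds the implied covariance matrix for one subject observed on irregularly spaced study days. The function name and the parameter names `sigma2_b`, `sigma2_e`, and `rho` are assumptions made for this sketch, and the numerical values are arbitrary rather than estimates from the study.

```python
def spatial_power_cov(days, sigma2_b, sigma2_e, rho):
    """Covariance matrix of one subject's responses.

    Entry (j, j') is sigma2_b + sigma2_e * rho ** |t_j - t_j'|: a shared
    random-intercept variance plus a serial term that decays with the
    number of days separating the two observations.
    """
    return [
        [sigma2_b + sigma2_e * rho ** abs(t - s) for s in days]
        for t in days
    ]


# Intermittent sampling: observations on study days 1, 2 and 5.
V = spatial_power_cov([1, 2, 5], sigma2_b=0.10, sigma2_e=0.12, rho=0.4)
for row in V:
    print([round(v, 4) for v in row])
```

For observations on consecutive days the serial term reduces to the familiar AR(1) pattern, whereas a gap of several days (here, days 1 and 5) lowers the covariance toward the random-intercept variance alone.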

2.2 Point estimation

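Let W=(1,W1,W2,W1·W2). Based on models (1) and (4), and assuming the random terms of the two equations in (4) are uncorrelated, we can write E(W|M) = MC, where

$$C = \begin{pmatrix} 1 & \theta_0 & \pi_0 & \theta_0\pi_0 \\ 0 & \theta_1 & 0 & \theta_1\pi_0 \\ 0 & 0 & \pi_1 & \theta_0\pi_1 \\ 0 & 0 & 0 & \theta_1\pi_1 \end{pmatrix}.$$

For our models and assumptions,

$$E(Y\mid M, Z^*, Z) = E(X\mid M)\beta^{X} + \{Z^*\otimes E(X\mid M)\}\beta^{Z^*X} + Z\beta^{Z} = MC\beta^{X} + (Z^*\otimes MC)\beta^{Z^*X} + Z\beta^{Z}.$$

In the aforementioned equations, we employ the following assumptions that are standard in deriving regression calibration estimators: W and M are surrogates for X in predicting Y, such that $E(Y\mid X, W, M, Z^*, Z) = E(Y\mid X, Z^*, Z)$ (see Carroll et al., 2006 for a definition of surrogacy involving distributions); ; random terms between models in (4) are uncorrelated and M is uncorrelated with Up, p = 1,2, such that E(X|M) = E(W|M). For a given application, whether these assumptions are met should be considered carefully. Violations of these assumptions are a matter of degree; small violations may be inconsequential, but larger violations could be very problematic. We return to this again in the Discussion. In order to determine forms of estimators of $\beta^{X}$, $\beta^{Z^*X}$ and $\beta^{Z}$, we first equate terms in the preceding mean derivation with those in the mean of (5), given M and covariates:

$$M\gamma^{M} + (Z^*\otimes M)\gamma^{Z^*M} + Z\gamma^{Z} = MC\beta^{X} + (Z^*\otimes MC)\beta^{Z^*X} + Z\beta^{Z}.$$

By properties of the Kronecker product, we can re-express $(Z^*\otimes MC) = (Z^*\otimes M)(I\otimes C)$, and hence, the last aforementioned equation becomes

$$M\gamma^{M} + (Z^*\otimes M)\gamma^{Z^*M} + Z\gamma^{Z} = M(C\beta^{X}) + (Z^*\otimes M)(I\otimes C)\beta^{Z^*X} + Z\beta^{Z},$$

where I is an m × m identity matrix. We also note that $(I\otimes C)^{-1} = (I\otimes C^{-1})$. By utilizing all of these equations and replacing parameters with their maximum likelihood estimators in the mixed model fits, we obtain

$$\hat\beta^{X} = \hat C^{-1}\hat\gamma^{M}, \qquad \hat\beta^{Z^*X} = (I\otimes\hat C^{-1})\hat\gamma^{Z^*M}, \qquad \hat\beta^{Z} = \hat\gamma^{Z},$$

where $\hat C$ indicates that all parameters in C are replaced with their estimators; this matrix is invertible as long as estimates yield a nonsingular matrix, which is expected (e.g., $\hat\theta_1$ and $\hat\pi_1$ must be nonzero). The matrices $C^{-1}$ and $(I\otimes C^{-1})$ can also easily be written in algebraic form. Often, not all interactions may be of interest. A formulation of the model and estimators for partial interactions is given in Appendix B.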
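The calibration step can be sketched in Python under strong simplifying assumptions: observations are simulated as independent (no random intercepts or serial correlation), ordinary least squares stands in for the mixed-model fits, and every parameter value is hypothetical. The sketch fits the two surrogate-on-instrument regressions and the outcome-on-instrument regression with one binary Z∗, forms the upper-triangular matrix C implied by E(W|M) = MC when the random terms of the two surrogate models are uncorrelated, and recovers the coefficients of the unobserved predictors by inverting C. The variable names (`theta_hat`, `pi_hat`, `gamma_hat`, `z_star`) are chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # independent observations; the paper's data are longitudinal

# Hypothetical true values, chosen only for illustration.
theta0, theta1 = 0.5, 0.8   # X1 = theta0 + theta1 * M1 + error
pi0, pi1 = -0.3, 0.6        # X2 = pi0 + pi1 * M2 + error
beta_x = np.array([1.0, 0.5, 0.3, -0.2])    # effects of (1, X1, X2, X1*X2)
beta_zx = np.array([0.2, 0.1, -0.1, 0.15])  # extra effects when Z* = 1

# Instruments, true exposures, unbiased surrogates, and a binary covariate.
M1, M2 = rng.normal(size=n), rng.normal(size=n)
X1 = theta0 + theta1 * M1 + rng.normal(scale=0.3, size=n)
X2 = pi0 + pi1 * M2 + rng.normal(scale=0.3, size=n)
W1 = X1 + rng.normal(scale=0.3, size=n)
W2 = X2 + rng.normal(scale=0.3, size=n)
z_star = rng.binomial(1, 0.2, size=n).astype(float)

ones = np.ones(n)
X = np.column_stack([ones, X1, X2, X1 * X2])
Y = X @ beta_x + (z_star[:, None] * X) @ beta_zx + rng.normal(scale=0.3, size=n)


def ols(design, response):
    """Least-squares coefficients (a stand-in for a mixed-model fit)."""
    return np.linalg.lstsq(design, response, rcond=None)[0]


# Step 1: regress each surrogate on its instrument.
theta_hat = ols(np.column_stack([ones, M1]), W1)
pi_hat = ols(np.column_stack([ones, M2]), W2)

# Step 2: regress Y on the instruments and their products with Z*.
M = np.column_stack([ones, M1, M2, M1 * M2])
gamma_hat = ols(np.column_stack([M, z_star[:, None] * M]), Y)

# Step 3: build C so that E(W | M) = M C, then invert it to calibrate.
t0, t1 = theta_hat
p0, p1 = pi_hat
C_hat = np.array([
    [1.0, t0, p0, t0 * p0],
    [0.0, t1, 0.0, t1 * p0],
    [0.0, 0.0, p1, t0 * p1],
    [0.0, 0.0, 0.0, t1 * p1],
])
C_inv = np.linalg.inv(C_hat)
m = 1  # number of Z* covariates
beta_x_hat = C_inv @ gamma_hat[:4]
beta_zx_hat = np.kron(np.eye(m), C_inv) @ gamma_hat[4:]

print("beta_x  estimate:", np.round(beta_x_hat, 3))
print("beta_zx estimate:", np.round(beta_zx_hat, 3))
```

The uncalibrated coefficients in `gamma_hat` are attenuated relative to the true values (most visibly the interaction term, which is scaled by the product of the two instrument slopes), which is the bias the multiplication by the inverse of C removes.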

2.3 Variance of $\hat\beta$

As in Strand et al. (2014), we employed the delta method to obtain the approximate asymptotic variance of $\hat\beta$. We use the term “approximate” because of the approximations of covariance structures in the model; asymptotic methods are inherently approximate. Based on the delta method,

$$\mathrm{Var}(\hat\beta) \approx \Delta\Sigma\Delta', \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}, \tag{7}$$

where Σ is the covariance matrix for estimators of fixed-effect parameters in (4) and (5) and Δ is a matrix of partial derivatives with (i,j) element $\partial\beta_{[i]}/\partial\psi_{[j]}$, where $\beta_{[i]}$ denotes the ith element of β and $\psi_{[j]}$ denotes the jth element of ψ = (θ0,θ1,π0,π1,γ′)′. The diagonal elements in (7) are the covariance matrices for fixed-effect estimators in model (4) (first two diagonal matrices) and model (5) (third diagonal matrix), and the off-diagonal matrices contain covariances between fixed-effect parameter estimators between models. Specific forms of Σ and Δ based on models (1), (4), and (5) and the defined estimators are given in Appendix C. In (7), note that $\Sigma_{21}=\Sigma_{12}'$, $\Sigma_{31}=\Sigma_{13}'$, $\Sigma_{32}=\Sigma_{23}'$. Both model-based and empirical forms of $\mathrm{Var}(\hat\beta)$ were calculated (Appendix C). When random terms between models in (4) as well as between those in (4) and (5) are uncorrelated, $\Sigma_{12}=0$, although $\Sigma_{13}$ and $\Sigma_{23}$ are not necessarily 0. For model-based variances, we set all of these submatrices to 0, primarily to avoid non-convergence or non-positive definite matrix issues. For the empirical-based estimates of $\mathrm{Var}(\hat\beta)$, we did not set off-diagonal elements to 0. One disadvantage of doing this is that the same subject-day records must be used to fit the different models in (4) and (5) so that off-diagonal covariances can be estimated (with our given approach). The following are our steps to calculate the model-based and empirical forms of $\mathrm{Var}(\hat\beta)$ used in the data application and simulations, followed by some important notes.

Steps to calculate the model-based form of $\mathrm{Var}(\hat\beta)$:
1. Set off-diagonal submatrices in (7) to 0.
2. Obtain diagonal submatrices in (7) from mixed model fits. Numerical forms of the diagonal covariance matrices are obtained easily with common statistical packages (for example, using the COVB option in ODS OUTPUT in SAS PROC MIXED or in the y.fit$varFix object in R, when using the lme function from the nlme package).
3. Compute $\Delta\Sigma\Delta'$ using forms of Δ as shown in Appendix C, replacing parameters with their estimates.

Steps to calculate the empirical form of $\mathrm{Var}(\hat\beta)$:
1. Calculate empirical versions of submatrices in (7) using forms shown in Appendix C, where the middle “Var” and “Cov” quantities are calculated based on squared residuals or residual cross products.
2. Compute $\Delta\Sigma\Delta'$ using forms of Δ as shown in Appendix C, replacing parameters with their estimates.
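The model-based steps above can be sketched numerically in Python. This is only an illustration under stated assumptions: it covers the four coefficients of (1, X1, X2, X1·X2) without covariate interactions, it approximates the partial derivatives by central finite differences rather than using the algebraic forms of Appendix C, and the estimates and covariance blocks passed in are made-up numbers. The function names (`c_matrix`, `calibrate`, `numerical_jacobian`, `delta_method_cov`) are hypothetical.

```python
import numpy as np


def c_matrix(theta0, theta1, pi0, pi1):
    """Matrix C with E(W | M) = M C for W = (1, W1, W2, W1*W2)."""
    return np.array([
        [1.0, theta0, pi0, theta0 * pi0],
        [0.0, theta1, 0.0, theta1 * pi0],
        [0.0, 0.0, pi1, theta0 * pi1],
        [0.0, 0.0, 0.0, theta1 * pi1],
    ])


def calibrate(psi):
    """Map psi = (theta0, theta1, pi0, pi1, gamma[0..3]) to C^{-1} gamma."""
    return np.linalg.solve(c_matrix(*psi[:4]), psi[4:])


def numerical_jacobian(func, psi, h=1e-6):
    """Central-difference approximation to the matrix of partial derivatives."""
    psi = np.asarray(psi, dtype=float)
    columns = []
    for j in range(psi.size):
        step = np.zeros_like(psi)
        step[j] = h
        columns.append((func(psi + step) - func(psi - step)) / (2.0 * h))
    return np.column_stack(columns)


def delta_method_cov(psi_hat, cov_theta, cov_pi, cov_gamma):
    """Delta * Sigma * Delta' with the off-diagonal blocks of Sigma set to 0."""
    sigma = np.zeros((8, 8))
    sigma[0:2, 0:2] = cov_theta   # from the W1-on-M1 fit
    sigma[2:4, 2:4] = cov_pi      # from the W2-on-M2 fit
    sigma[4:8, 4:8] = cov_gamma   # from the Y-on-M fit
    delta = numerical_jacobian(calibrate, psi_hat)
    return delta @ sigma @ delta.T


# Made-up estimates and covariance blocks, for illustration only.
psi_hat = np.array([0.5, 0.8, -0.3, 0.6, 1.0, 0.4, 0.2, -0.1])
cov_theta = 0.01 * np.eye(2)
cov_pi = 0.01 * np.eye(2)
cov_gamma = 0.02 * np.eye(4)

cov_beta = delta_method_cov(psi_hat, cov_theta, cov_pi, cov_gamma)
print("calibrated estimates:", np.round(calibrate(psi_hat), 3))
print("delta-method SEs:    ", np.round(np.sqrt(np.diag(cov_beta)), 3))
```

In practice the three diagonal blocks would come from the fitted models (for example, the fixed-effect covariance matrices reported by the mixed-model software), and the finite-difference Jacobian could be replaced by the closed-form derivatives.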

3 Application

In our analyses, we consider the same data as described and analyzed in Strand et al. (2014), with the addition of the time-varying “cold” indicator variable, obtained through surveys with the children. The variables are defined in Section 2.1, with summary statistics presented in Table 1. Within a study year, data collection occurred approximately from October through April. Data from the first 2 study years (2002–2004) were available to fit (4), while data from all 4 years (2002–2006) were available to fit (5). Consequently, model-based variances but not empirical variances were calculated for $\hat\beta$. For ease of interpretation of parameter estimates, we mean-corrected the W variables in the fit of (4). Instrumental data were much more abundant and easier to collect than data from personal monitors, making RCIV a natural choice for analysis. Our assumed model for (1) was

$$Y_{ij} = X_{ij}\beta^{X} + Z^*_{1ij}X_{ij}\beta^{Z^*X}_{1} + Z_{ij}\beta^{Z} + b_i + \varepsilon_{ij}, \tag{8}$$

with X=(1,X1,X2,X1·X2), the cold indicator Z∗1, and the study-year indicators Z=(Z1,…,Z4).

Table 1

Descriptive statistics of variables involved in the analysis

Statistic	Y	W₁	W₂	M₁	M₂	Z∗₁
Number of subjects with data available	85	64	50	85	90	86
Average number of responses per subject	24.5	11.5	11.3	24.6	392.7	59.1
Minimum of subject means	3.85	0.80	1.41	−0.75^∗	2.48	0
Average of subject means	4.49	1.51	1.92	1.67	2.67	0.18
Maximum of subject means	5.93	3.22	2.34	5.32	2.80	0.73
Average of subject SD's	0.35	0.42	0.41	0.64	0.62	0.31
SD of subject means	0.36	0.49	0.19	1.35	0.07	0.17

Y=ln(LTE4); W1=ln(SHS exposure+1); W2=ln(ambient fine particulate matter exposure + 1); M1=ln(cotinine); M2=ln(ambient fine particulate matter from fixed monitor); Z∗1=upper respiratory infection (1=present, 0=absent), that is, ‘cold’. Units for pollutant exposure variables and ambient fine particulate matter measured at the fixed monitor were μg/m³; units for LTE4 were pg per mg of creatinine; units for cotinine were ng per mg of creatinine (all before transformation). Results are combined across study years, Z1,…,Z4.

∗Negative values occurred due to natural log transformation.

Table 2 shows results of model (4) and (5) fits, while Table 3 shows estimates of the key parameters in (8) via RCIV1. (Note that year 4 was set as the reference, so the coefficient of Z4 is 0.) Our data suggested that the instrumental (M) variables were only weakly to moderately associated with the personal exposure data. This was one of the reasons why more data were sought to fit (4), even though unbiased surrogate data did not exist in the last 2 years of the study. Figure 1 shows predicted values based on calibrated estimates; the figure demonstrates that while predicted values for LTE4 generally increase as pollutants increase, there was noticeable negative interaction between pollutants for those without colds (p = 0.07); when one pollutant was lower, the dose–response relationship between LTE4 and the other pollutant was stronger. Such a pattern was not observed for those with colds (p = 0.88 for interaction). The quartiles used in Figure 1 were determined based on distributions of predicted values of W1 and W2 from fits of (4). This differs from the approach taken for the figure in Strand et al. (2014), for which quartiles from distributions of observed values of W1 and W2 were used. The distributions of predicted values are less variable, and so using quartiles from these distributions leads to estimates more on the interior of the distributions of exposures rather than on the extremes.

Table 2

Estimates of parameters in (4) and (5) for the data application (SE in parentheses)

Predictor or covariance quantity	Regression of W₁^a (personal SHS exposure) on M₁	Regression of W₂^a (personal ambient fine particulate matter) on M₂	Regression of Y (LTE₄) on M₁,M₂, M₁·M₂
Intercept	(0.07)	(0.09)	(0.08)
Cotinine (M₁)	(0.03)		(0.03)
Ambient PM_2.5 (M₂)		(0.03)	(0.02)
M₁·M₂
Cold (Z∗₁)			(0.15)
Z∗₁·M₁			(0.08)
Z∗₁·M₂			(0.05)
Z∗₁·M₁·M₂			(0.03)
Year 1 (Z₁)^b			(0.04)
Year 2 (Z₂)^b			(0.04)
Year 3 (Z₃)^b			(0.04)
Random subject variance
Correlation
Residual variance

W variables were mean-corrected before the fit of (4). There were 458, 560 and 1438 records for analysis in the regression models for W1, W2 and Y, respectively; see the text for more detail about the variables and models. Correlations are meaningful for responses on consecutive days. Ambient fine particulate matter is denoted as ambient PM2.5.

^a Mean corrected before model fit.

^b Relative to Year 4.

Table 3

Estimates of parameters (or linear combinations of parameters) in model (8) for ln(LTE4) based on the RCIV1 estimation method

	No cold		Cold		Difference
Model term	Estimate (SE)	P-value	Estimate (SE)	P-value	P-value
Intercept	4.33 (0.049)	<0.001	4.41 (0.06)	<0.001	0.02
SHS slope (at mean ambient PM_2.5)	(0.11)	0.04	(0.20)	0.18	0.75
Ambient PM_2.5 slope (at mean SHS)	(0.11)	0.52	(0.30)	0.76	0.92
Interaction	(0.48)	0.07	(1.45)	0.88	0.36
Ambient PM_2.5 slope (in absence of SHS)	1.37 (0.73)	0.06	−0.25 (2.04)	0.90	0.34

Model-based asymptotic standard errors were derived using the delta method as described in Section 2.3, and p-values are based on Wald Z-tests assuming asymptotic normality. Because W variables were mean corrected before analysis, interpretations for SHS and ambient PM2.5 slopes (rows 2 and 3) are relevant at the mean of the other pollutant, and the intercept is the estimate of mean ln(LTE4) for the reference year (Year 4) at the mean of both pollutants. The larger SEs for “Cold” are due to fewer subject-days for these conditions, relative to “No cold”. Ambient fine particulate matter is denoted as ambient PM2.5.
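As a check on how the p-values in Table 3 follow from an estimate and its standard error, the Wald Z-test can be written in a few lines of Python. The helper name `wald_p_value` is an assumption for this sketch; the inputs are the two “in absence of SHS” entries from the last row of the table.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value for H0: parameter = 0, assuming asymptotic normality."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


p_no_cold = wald_p_value(1.37, 0.73)   # ambient PM2.5 slope, no cold
p_cold = wald_p_value(-0.25, 2.04)     # ambient PM2.5 slope, cold

print(round(p_no_cold, 2), round(p_cold, 2))
```

Rounded to two decimals these reproduce the tabulated values of 0.06 and 0.90.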

Figure 1

Relationships between LTE4 and exposure to one pollutant, modified by exposure to a second pollutant, for children on (A) days without colds and (B) days with colds. Solid lines show estimated mean LTE4 as a function of secondhand smoke (SHS), by quartiles of ambient fine particulate matter. Dashed lines show estimated mean LTE4 as a function of ambient fine particulate matter, by quartiles of SHS. Quartiles were determined from distributions of predicted values from model fits of (4). Quartile symbols are Q1=25th percentile, Q2=50th percentile, Q3=75th percentile. Predicted mean values were obtained using RCIV1 methods, using estimates, as shown in Table3, and scaled for the average across years. Because the response variable was analyzed on the natural log scale, the predicted values have the form on the original scale, where c = 1.18 for the average across years. The graph illustrates that for no-cold days, increases in LTE4 per unit increase in one pollutant are steepest when the second pollutant is lower. However, when children have colds, LTE4 is higher and there is less interaction between pollutants on health. Both outcome and predictors were analyzed on the natural log scale but inverted back for presentation, resulting in curves

The estimated slope of the SHS exposure variable in predicting log LTE4 (fixing ambient fine particulate matter exposure at its mean level) was approximately three times greater than for ambient fine particulate matter (fixing SHS exposure at its mean level), both for those without and with colds (Table 3). Although this difference was not significant for either those with or without colds (p > 0.2), the consistency of estimates between sick and well children is notable. For those without colds, the SHS slope at a fixed level c for ambient fine particulate matter was greater than the ambient fine particulate matter slope at level c of SHS with marginal significance (p = 0.047), while the difference was far from significant for those with colds. (This is a different test than when comparing at the mean of the “other” pollutant because the means of the two pollutants differed.) The weaker significance of tests involving those with colds is at least in part due to fewer records under those conditions (approximately 20%). In the absence of SHS, well children had a marginally significant relationship between ambient fine particulate matter and log(LTE4), while there was no apparent relationship when children were sick (last row, Table 3).
The comparison of SHS slopes in absence of ambient fine particulate matter is less relevant and not included because children are generally exposed to ambient fine particulate matter, but not necessarily cigarette smoke. The increase in LTE4 per IQR increase in one pollutant at the mean of the other pollutant ranged from approximately 3 to 15% both for those with and without colds. Fixing both SHS and ambient fine particulate matter at mean values, those with colds had LTE4 values that were approximately 8% higher than those without (p = 0.02; obtained as [exp(4.41) − exp(4.33)]/exp(4.33), based on intercepts shown in Table 3). This suggests that having a cold has an acute effect on LTE4 that is approximately the same as an IQR increase in a pollutant (fixing non-involved pollutants at mean values). The larger random subject variances for W1 and Y models are expected; the first occurring due to differences in smoking behaviors of those living with the study subjects, and the second due to natural subject heterogeneity in LTE4 levels. Both the random subject variance and within-subject correlation were relatively low for W2 models, but for consistency, we kept the terms in that model. Appendix D has a discussion of the predicted values for models without covariate interaction terms.
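The “approximately 8% higher” figure for children with colds can be reproduced directly from the two intercepts on the natural log scale. The short Python check below uses only the values quoted in the text; the variable names are chosen for this illustration.

```python
import math

intercept_no_cold = 4.33  # mean ln(LTE4) without a cold, pollutants at their means
intercept_cold = 4.41     # mean ln(LTE4) with a cold, pollutants at their means

# Relative difference on the original (pg per mg creatinine) scale.
relative_increase = (
    math.exp(intercept_cold) - math.exp(intercept_no_cold)
) / math.exp(intercept_no_cold)

# The same quantity written as exp(difference) - 1.
same_quantity = math.expm1(intercept_cold - intercept_no_cold)

print(f"{100 * relative_increase:.1f}%")
```

Because the outcome was modeled on the log scale, a difference of 0.08 in intercepts corresponds to a multiplicative change of exp(0.08), or roughly 8.3%, regardless of the baseline level.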

4 Simulation

Simulations were conducted in order to determine accuracy of the regression calibration estimators and to assess the coverage rates of confidence intervals. The models used were similar to those in the data application, without year indicators (i.e., no Z variables in the model). One indicator covariate (Z∗1) was included and was allowed to interact with both X variables as well as their interaction (i.e., the “full” interaction model (1) was considered). Z∗1 was included to mimic the “cold” variable, with the same approximate rate of event occurrence (Z∗1 = 1 for about 20% of the data). The simulations generally used parameter settings that lead to parameter estimates similar to those observed in the data application. The simulation conditions were as follows: sample sizes ranged from n = 50 to n = 500 subjects, with r = 10 to r = 20 repeated measures per subject on consecutive days. Complete records were simulated so that both model-based and empirical variance estimators could be determined. Errors and random subject intercepts were generated from normal distributions. Additional conditions (e.g., different correlation parameter values or error distribution types) were considered in Strand et al. (2014) for models without covariates. Overall, the results here are similar to those presented in Strand et al. (2014) for models without covariate interaction terms, but more complex pollutant models. The results of some of the simulations are presented in Tables 4 and 5. Generally, the results indicate that the methods described in Section 2 yield confidence intervals for parameters of interest that have appropriate (nominal) coverage rates, with some slight overcoverage when using model-based variances for smaller sample sizes.

Table 4

MSE and Bias simulation results for RCIV1

	MSE					Bias
Parameter	n = 50, r = 10	n = 100, r = 10	n = 200, r = 10	n = 100, r = 20	n = 500, r = 10	n = 50, r = 10	n = 100, r = 10	n = 200, r = 10	n = 100, r = 20	n = 500, r = 10
	3.12	1.83	1.20	1.04	0.87	0.04	0.07	−0.003	0.003	−0.005
	2.08	1.26	0.80	0.69	0.58	−0.04	−0.05	0.01	0.001	0.005
	1.64	0.97	0.64	0.57	0.46	−0.02	−0.04	0.0003	−0.003	0.003
	1.09	0.67	0.43	0.37	0.31	0.02	0.02	−0.005	0.000	−0.003
	6.22	3.70	2.79	2.66	2.23	−0.05	−0.17	0.06	0.01	−0.08
	5.65	4.48	4.25	4.18	4.03	0.02	0.11	−0.05	−0.005	0.05
	4.35	3.42	3.25	3.20	3.07	0.02	0.09	−0.03	−0.007	0.04
	2.93	2.30	2.17	2.13	2.05	−0.01	−0.06	0.02	0.004	−0.03

One thousand replicates per condition, using normal errors; r = number of consecutive-day repeated measures per subject; n = subject sample size. Covariance and fixed-effect parameter settings were chosen so that fitted models had parameter estimates similar to those observed for the data application (the fixed-effect settings relate to X variables that were not mean corrected). For more details, see Section 4.

Table 5

Simulation confidence interval coverage rates for RCIV1

		Model-based variance		Empirical variance
Conditions	Parameter set	Min	Max	Min	Max
n = 50, r = 10	β^X	96.6	96.9	95.7	96.2
		96.9	98.6	92.4	95.1
n = 100, r = 10	β^X	94.7	95.1	94.7	95.1
		95.2	96.9	94.5	95.3
n = 200, r = 10	β^X	95.5	95.9	95.0	95.7
		97.5	98.1	95.4	96.5
n = 100, r = 20	β^X	94.2	94.8	94.3	94.9
		96.1	96.7	94.0	95.3
n = 300, r = 10	β^X	94.5	95.0	94.6	95.0
		95.9	96.6	94.8	95.1
n = 500, r = 10	β^X	94.5	95.3	95.1	95.4
		95.4	95.9	94.2	95.5

One thousand replicates per condition; r = number of consecutive-day repeated measures per subject; n = subject sample size. Values are the minimum (or maximum) confidence interval (CI) coverage rate among the four parameters in each set. For 1000 replicates, a correct method will have coverage within 1.4% of 95%, with 95% probability; CI coverages outside of this range are in bold. Other conditions are the same as mentioned in Table 4.
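The 1.4% tolerance in the Table 5 footnote is consistent with the usual normal approximation to a binomial proportion. The sketch below shows that calculation; the function name and the choice of 1.96 as the critical value are our assumptions about how the bound was obtained.

```python
import math

def coverage_margin(nominal=0.95, replicates=1000, z=1.96):
    """Half-width of the range in which an observed coverage rate should
    fall (with ~95% probability) if the true coverage equals `nominal`."""
    se = math.sqrt(nominal * (1.0 - nominal) / replicates)
    return z * se

margin = coverage_margin()
lower, upper = 0.95 - margin, 0.95 + margin
print(f"margin = {100 * margin:.2f} points; range = {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Under these assumptions the margin is about 1.35 percentage points, so observed coverage rates between roughly 93.6% and 96.4% are compatible with a nominal 95% interval.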

The properties of estimators and confidence intervals seemed to differ slightly between coefficients related to “main effects” and those related to interactions with the indicator covariate. In particular, more precision was apparent for estimators of “main effect” coefficients, likely because more data were available to estimate them (subject-days with no colds); the interaction coefficients involve comparisons between “cold” and “no-cold” conditions. Mean bias was not significantly nonzero (for α = 0.05 with t-tests) across simulation sets. There were slight trends of bias in some simulations, although they did not appear to adversely affect confidence interval coverage rates, and the bias tended to average out between sets of parameter estimates. For example, when two of the “main effect” estimates had negative bias, the other two were positive, and vice versa. Similar results were observed for the interaction coefficients, although even more pronounced (likely because of greater standard errors for those estimates). 
These patterns were not consistent across simulations; together with the nonsignificant observed bias, this gives no indication of systematic problems with the estimation method. The results suggest that using empirical variances may be more reliable (when possible), although model-based variances did not perform poorly. Because the variances are asymptotic in nature, sample sizes much smaller than 100 subjects are not recommended. The simulations discussed earlier and presented in the tables generated data from random terms that were independent between as well as within models. A set of simulations was also conducted using correlated random terms between models in (4) in order to determine how violating the independence assumption affected results, as discussed in Section 5.

5 Discussion

In our analyses, we found that while pollutants appeared to have additive effects when children were sick, there was a marginally significant interaction between pollutants when they were not sick, such that the health–pollutant relationship for one pollutant was stronger when the other pollutant was lower. This difference in interaction terms (i.e., the three-way interaction) suggests a complex relationship between exposure to pollutants and cold status of children with asthma, with respect to LTE4 levels. One possible explanation for the observed results is that when they are not sick, children are more able to maintain homeostasis by limiting LTE4 responses during co-exposures to pollutants, relative to when they are sick. However, more research is certainly warranted, not only to test this hypothesis but also to verify the patterns observed here. The conditions and restrictions of the study should be kept in mind, which include but are not limited to the following: fine particulate matter is only one type of air pollution and does not include toxic gases such as carbon monoxide; the study group involves children with asthma and not the general population; relatively low levels of fine particulate matter and SHS were observed, and results cannot necessarily be extrapolated to higher, unobserved levels (ambient fine particulate matter concentrations in parts of China are often well over 10 times those observed in Denver; e.g., see Venners et al., 2003); and the toxicity of ambient fine particulate matter in Denver may differ from that in other cities because of different compositions. Specific forms of regression calibration estimators depend on exactly which terms are included in the models. Special cases for models with one predictor measured with error could be obtained easily from the results here. 
Methods could be extended to models with more than two predictors using the same approach given in this paper; however, notation would become burdensome. Results here and in Strand et al. (2014) suggest that regression calibration for more complicated longitudinal models with interaction terms can yield fairly accurate estimators despite approximations made to covariance structures. In both of these studies, some overcoverage was apparent for relatively small sample sizes (no larger than 100) for model-based variance estimators. Buonaccorsi et al. (2000) also found overestimated variances of regression calibration estimators associated with random effect models, for estimation without instrumental variable estimators; they also suggested modified approaches to reduce this overestimation. Whether or not these results are due to the same issue, the slight overestimation of variances here did not appear to be large enough to warrant adjustments, as long as adequate sample sizes are used. Certain assumptions were made that allowed us to derive the point estimators, and we discuss a few here with respect to our application. The non-differential error (or surrogacy) assumption essentially states that the unbiased surrogate and instrumental variables do not provide any more information in predicting the health outcome beyond what the actual exposure variables and covariates do. Because cotinine is a metabolite specific to nicotine, we believe it is reasonable to assume that it will not provide more information in predicting LTE4 than actual SHS exposure. An exception to this would be the fact that cotinine may be affected by the gas portion of secondhand smoke, which could potentially influence LTE4 and is not included in estimated or actual SHS particulate exposure. (In this article, “SHS” only includes particulate.) 
Similarly, we believe it is reasonable to assume that outdoor fine particulate matter measured at a fixed location will not provide more information in predicting LTE4 than the direct exposures to this pollutant. The assumption that W variables are surrogates for X variables seems even more logical, although the unbiasedness assumption is not easily verifiable. Further examination of our data indicated that the assumption that random terms between pollutant models in (4) are uncorrelated was violated to a small degree. However, additional simulations that included correlation of approximately the same magnitude as was observed showed a negligible effect on the point estimates. Another assumption in our models is that the instrumental variables are not measured with error. There is some potential for error in cotinine and creatinine measurements, although it is not expected to be greater than that of the W variables. One may wonder whether regressing the health outcome directly on unbiased surrogate variables is a reasonable alternative to using regression calibration as described in this article. While easier, this approach has two disadvantages: the “classical measurement error” in the predictors will attenuate the estimates, and much less data are available to fit this model. This approach yielded effects that were not close to significant (results not presented). In our application, we used a binary covariate that interacted with the pollutant variables to illustrate the methods. Because it is just a binary variable, one could also perform stratified analyses for those with colds and those without, eliminating Z∗ variables from the models. However, doing so would not allow for statistical comparisons to be made between those with and those without colds. Using our presented methods, we could also add continuous covariates into the model such as subject height. 
In some analyses with other data, we did find height to be an effect modifier in the pollution–health relationship. However, for these particular data, height was not statistically significant and did not contribute to the model, and thus it was not included.
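The attenuation caused by classical measurement error, mentioned above as a drawback of regressing the outcome directly on the unbiased surrogates, can be illustrated with a small simulation. This is a deliberately simplified sketch under our own assumptions (one predictor, a cross-sectional linear model, hypothetical variances and slope); it is not the longitudinal RCIV setting of this article.

```python
import random

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20000
true_slope = 1.0                    # hypothetical health effect
sd_x, sd_u, sd_e = 1.0, 1.0, 0.5    # hypothetical standard deviations

x = [rng.gauss(0.0, sd_x) for _ in range(n)]                # true exposure
w = [xi + rng.gauss(0.0, sd_u) for xi in x]                 # surrogate: W = X + U
y = [true_slope * xi + rng.gauss(0.0, sd_e) for xi in x]    # outcome depends on X only

slope_true = ols_slope(x, y)    # regression on the true exposure
slope_naive = ols_slope(w, y)   # regression on the error-prone surrogate

# Expected attenuation factor under classical error: var(X) / (var(X) + var(U)).
reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
print(round(slope_true, 2), round(slope_naive, 2), reliability)
```

With equal exposure and error variances, the naive slope is pulled toward roughly half the true slope, which is the kind of bias that regression calibration is intended to reduce.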

References

1. Correcting for measurement error in individual-level covariates in nonlinear mixed effects models.

Authors: H Ko; M Davidian
Journal: Biometrics Date: 2000-06 Impact factor: 2.571

2. Estimation of magnitude in gene-environment interactions in the presence of measurement error.

Authors: M Y Wong; N E Day; J A Luan; N J Wareham
Journal: Stat Med Date: 2004-03-30 Impact factor: 2.373

3. A study of health effect estimates using competing methods to model personal exposures to ambient PM2.5.

Authors: Matthew Strand; Philip K Hopke; Weixiang Zhao; Sverre Vedal; Erwin Gelfand; Nathan Rabinovitch
Journal: J Expo Sci Environ Epidemiol Date: 2007-05-16 Impact factor: 5.563

4. Acute effects of ambient inhalable particles in asthmatic and nonasthmatic children.

Authors: S Vedal; J Petkau; R White; J Blair
Journal: Am J Respir Crit Care Med Date: 1998-04 Impact factor: 21.405

5. Air pollution and exacerbation of asthma in African-American children in Los Angeles.

Authors: B Ostro; M Lipsett; J Mann; H Braxton-Owens; M White
Journal: Epidemiology Date: 2001-03 Impact factor: 4.822

6. The response of children with asthma to ambient particulate is modified by tobacco smoke exposure.

Authors: Nathan Rabinovitch; Lori Silveira; Erwin W Gelfand; Matthew Strand
Journal: Am J Respir Crit Care Med Date: 2011-08-25 Impact factor: 21.405

7. Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data.

Authors: Matthew Strand; Stefan Sillau; Gary K Grunwald; Nathan Rabinovitch
Journal: Stat Med Date: 2013-07-30 Impact factor: 2.373

8. Particulate levels are associated with early asthma worsening in children with persistent disease.

Authors: Nathan Rabinovitch; Matthew Strand; Erwin W Gelfand
Journal: Am J Respir Crit Care Med Date: 2006-02-16 Impact factor: 21.405

9. Particulate matter, sulfur dioxide, and daily mortality in Chongqing, China.

Authors: Scott A Venners; Binyan Wang; Zhonggui Xu; Yu Schlatter; Lihua Wang; Xiping Xu
Journal: Environ Health Perspect Date: 2003-04 Impact factor: 9.031

10. Particle concentrations in inner-city homes of children with asthma: the effect of smoking, cooking, and outdoor pollution.

Authors: Lance A Wallace; Herman Mitchell; George T O'Connor; Lucas Neas; Morton Lippmann; Meyer Kattan; Jane Koenig; James W Stout; Ben J Vaughn; Dennis Wallace; Michelle Walter; Ken Adams; Lee-Jane Sally Liu
Journal: Environ Health Perspect Date: 2003-07 Impact factor: 9.031
