Design-based single-mediator approach for complex survey data.

Thanh Pham, Trung Ha, Julia N Soulakova.

Abstract

We discuss a two-step approach to test for a mediated effect using data gathered via complex sampling. The approach incorporates design-based multiple linear regressions and a generalized Sobel's method to test for significance of a mediated effect. We illustrate the applications to a study of nicotine dependence, race/ethnicity and cigarette purchase price among daily smokers in the U.S. The study goal was to assess significance of cigarette purchase price as a mediator in the association between race/ethnicity (non-Hispanic Black/African American, non-Hispanic White) and nicotine dependence measured in terms of the average number of cigarettes smoked per day. The single-mediator model incorporated 18 covariates as control factors. The results indicated a significant mediated effect of cigarette purchase price on the association. However, the relative effect size of 5% indicated low practical significance of the cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence. The approach can be modified to studies where data are gathered via other types of complex sampling.

Keywords:  Balanced repeated replications; Multi-stage sampling; Variance estimation in complex surveys

Year:  2019        PMID: 33716387      PMCID: PMC7954225          DOI: 10.1080/03610918.2019.1568472

Source DB:  PubMed          Journal:  Commun Stat Simul Comput        ISSN: 0361-0918            Impact factor:   1.118


Introduction

Many national databases of health outcomes are publicly available for secondary data analysis. For example, data from the Current Population Survey (CPS) and CPS Supplements are commonly used to obtain information on labor force, income, and education in the US (Black, Sanders, and Taylor 2003; Burkhauser, Feng, Jenkins, and Larrimore 2011; U.S. Department of Commerce, Census Bureau 2016). The CPS Tobacco Use Supplement (TUS-CPS) is a survey of use of cigarettes and other tobacco in the U.S. and is administered approximately every 3–4 years. Because the TUS-CPS utilizes complex sampling, researchers should follow the methodological guidelines when analyzing the data (U.S. Department of Commerce, Census Bureau 2016). Specifically, all point estimates should be based on the main weight and variance of the estimates should be computed via the balanced repeated replications (BRR) using replicate weights. The weights are computed and posted with the corresponding data files online; these files are hosted by the U.S. Census Bureau (U.S. Department of Commerce, Census Bureau 2016). Mediation analysis, commonly utilized in social sciences, allows scientists to test if one variable has an effect on another variable through the third variable (Baron and Kenny 1986; MacKinnon 2008). The traditional mediation analysis was proposed for a simple random sample and is not appropriate for analysis of the TUS-CPS and other complex surveys. We propose a generalization of the mediation methodology that can be used for assessing significance of the mediated effect using the TUS-CPS measures. The remainder of the paper is outlined as follows. In Sec. 2, we review the single-mediator model for a simple random sample. In Sec. 3, we describe the procedure for complex sampling to test for the mediated effect; we use the TUS-CPS data as an example. In Sec. 4, we illustrate the application of the procedure to a nicotine dependence study. We conclude with a discussion presented in Sec. 5.

Single-mediator model

Consider a binary or continuous independent variable $X$, a continuous dependent variable $Y$, a continuous mediator $M$, and $I$ binary or continuous covariates $Z_i$, $i = 1, 2, \ldots, I$. Suppose we have a simple random sample of $K$ individuals. Then the single-mediator model (Baron and Kenny 1986; Sobel 1982) can be expressed as follows:

$$M_k = \alpha_1 + \beta_1 X_k + \sum_{i=1}^{I} \delta_{i1} Z_{ik} + \varepsilon_{1k},$$
$$Y_k = \alpha_2 + \beta_2 X_k + \gamma M_k + \sum_{i=1}^{I} \delta_{i2} Z_{ik} + \varepsilon_{2k}, \tag{1}$$

where $k = 1, 2, \ldots, K$; $\alpha_j$ ($j = 1, 2$) denotes the regression intercept; $\beta_j$, $\gamma$, and $\delta_{ij}$ ($i = 1, 2, \ldots, I$; $j = 1, 2$) represent the regression slopes; $\varepsilon_{1k}$ and $\varepsilon_{2k}$ ($k = 1, \ldots, K$) are the residuals that are independent, $\varepsilon_{jk} \sim N(0, \sigma_j^2)$, where $\sigma_j^2$ denotes unknown (constant) variance ($j = 1, 2$). In the single-mediator model (1), $\beta_1$ represents the effect of $X$ on $M$, $\beta_2$ represents the direct effect of $X$ on $Y$, $\gamma$ represents the effect of $M$ on $Y$, and $\beta_1 \cdot \gamma$ represents the mediated (indirect) effect of $X$ on $Y$. The total effect of $X$ on $Y$ is represented by the sum of the direct effect ($\beta_2$) and mediated effect ($\beta_1 \cdot \gamma$). Figure 1 illustrates the model (1).
Figure 1.

Single-mediator model with I covariates.
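
The decomposition of the total effect into a direct and a mediated part can be checked numerically. The sketch below is an illustration only: it simulates a simple random sample (all coefficient values, the sample size, and the use of two covariates are made-up assumptions, not values from the study), fits the mediator and outcome regressions by ordinary least squares, and compares the sum of the estimated direct and mediated effects with the coefficient of X from a regression of Y on X and the covariates alone.

```python
import numpy as np

def fit_ols(design, response):
    """Least squares coefficients for response ~ design (design holds the intercept column)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Simulated simple random sample; every number here is hypothetical.
rng = np.random.default_rng(0)
K = 500                                           # sample size
X = rng.integers(0, 2, size=K).astype(float)      # binary independent variable
Z = rng.normal(size=(K, 2))                       # two continuous covariates (I = 2)
M = 1.0 + 0.6 * X + Z @ np.array([0.3, -0.2]) + rng.normal(size=K)
Y = 2.0 - 1.5 * X - 0.4 * M + Z @ np.array([0.5, 0.1]) + rng.normal(size=K)

ones = np.ones(K)

# Mediator equation: M on intercept, X, covariates.
beta1_hat = fit_ols(np.column_stack([ones, X, Z]), M)[1]

# Outcome equation: Y on intercept, X, M, covariates.
coef_y = fit_ols(np.column_stack([ones, X, M, Z]), Y)
beta2_hat, gamma_hat = coef_y[1], coef_y[2]

# Regression without the mediator: the coefficient of X estimates the total effect.
total_hat = fit_ols(np.column_stack([ones, X, Z]), Y)[1]

indirect_hat = beta1_hat * gamma_hat
print("direct effect  :", beta2_hat)
print("mediated effect:", indirect_hat)
print("direct + mediated:", beta2_hat + indirect_hat)
print("total effect   :", total_hat)
```

With ordinary least squares and the same covariates in every regression, the last two printed values agree up to floating-point error, which is why the total effect can be written as the sum of the two components.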

To assess significance of the mediated effect we can use the “product of coefficients” approach (MacKinnon, Lockwood, and Williams 2004). Specifically, the null hypothesis $H_0: \beta_1 \cdot \gamma = 0$ is tested against the alternative hypothesis $H_a: \beta_1 \cdot \gamma \neq 0$ via Sobel’s test (Sobel 1982) based on the test statistic

$$Z = \frac{\hat\beta_1 \hat\gamma}{SE(\hat\beta_1 \hat\gamma)}, \tag{2}$$

where $\hat\beta_1$ and $\hat\gamma$ denote the least squares estimates for $\beta_1$ and $\gamma$, respectively, and the standard error (SE) is given via

$$SE(\hat\beta_1 \hat\gamma) = \sqrt{\hat\beta_1^2 \, SE(\hat\gamma)^2 + \hat\gamma^2 \, SE(\hat\beta_1)^2}. \tag{3}$$

The test rejects $H_0$ in favor of $H_a$ at significance level $\alpha$ if $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$, where $z_{\alpha/2}$ is such that $P(Z_0 > z_{\alpha/2}) = \alpha/2$ for $Z_0 \sim N(0, 1)$. If the estimated indirect effect and direct effect are both positive or both negative, one can assess the magnitude of the mediated effect using the relative effect size (MacKinnon 2008; Preacher and Kelley 2011):

$$\frac{\hat\beta_1 \hat\gamma}{\hat\beta_2 + \hat\beta_1 \hat\gamma}. \tag{4}$$

This descriptive measure represents the practical importance of the mediated effect. Because $\hat\beta_1 \hat\gamma$ estimates the indirect effect and $\hat\beta_2 + \hat\beta_1 \hat\gamma$ estimates the total effect, the relative effect size can be interpreted as the proportion (or percentage) of the effect of the independent variable on the dependent variable explained by the mediator (MacKinnon 2008). This is why the relative effect size is also termed the proportion mediated effect (MacKinnon 2008; Preacher and Kelley 2011). However, the relative effect size given in (4) should not be used if the estimated effects, $\hat\beta_1 \hat\gamma$ and $\hat\beta_2$, have opposite signs (MacKinnon 2008, 83). In the latter case, analogs of this measure should be used, e.g., the estimated coefficients in (4) are replaced by their absolute values (Alwin and Hauser 1975; MacKinnon 2008).
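
As a minimal sketch of the test just described, the functions below compute Sobel's statistic, a two-sided p-value, a normal-theory confidence interval, and the relative effect size. The function and argument names are hypothetical, and the standard error is assumed to take the usual first-order form, sqrt(b1² · SE(g)² + g² · SE(b1)²). The input numbers in the usage lines are invented for illustration.

```python
import math

def sobel_test(beta1_hat, se_beta1, gamma_hat, se_gamma, z_crit=1.959964):
    """Sobel's test for the mediated effect beta1 * gamma.

    z_crit is the standard normal critical value (default: two-sided 5% level).
    """
    indirect = beta1_hat * gamma_hat
    # First-order (delta method) standard error of the product of two estimates.
    se = math.sqrt(beta1_hat**2 * se_gamma**2 + gamma_hat**2 * se_beta1**2)
    z = indirect / se
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (indirect - z_crit * se, indirect + z_crit * se)
    return {"indirect": indirect, "se": se, "z": z, "p_value": p_value,
            "ci": ci, "reject": abs(z) > z_crit}

def relative_effect_size(beta1_hat, gamma_hat, beta2_hat):
    """Proportion mediated; defined only when indirect and direct effects share a sign."""
    indirect = beta1_hat * gamma_hat
    if indirect * beta2_hat <= 0:
        raise ValueError("indirect and direct effects must have the same sign")
    return indirect / (beta2_hat + indirect)

# Hypothetical estimates, for illustration only.
result = sobel_test(beta1_hat=0.5, se_beta1=0.1, gamma_hat=0.4, se_gamma=0.1)
share = relative_effect_size(beta1_hat=0.5, gamma_hat=0.4, beta2_hat=0.6)
print(result)
print("relative effect size:", share)
```

The sign check mirrors the caveat in the text: when the estimated indirect and direct effects have opposite signs the ratio is not a proportion, so the sketch raises an error instead of returning a misleading value.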

Single-mediator analysis of complex survey data

To incorporate correct adjustments for the survey design used to gather the TUS-CPS data, we propose the following two-step procedure. In the first step, we fit the design-based regression models given in (1) using the survey data. These design-based models should incorporate proper adjustments for the specific design characteristics. Specifically, when analyzing the 2010-11 and 2014-15 TUS-CPS data, we need to use the BRR method with 160 replicate weights to compute the standard errors of estimated model coefficients (U.S. Department of Commerce, Census Bureau 2016; Wolter 2007) as follows. Suppose $\theta$ denotes the parameter of interest, $\hat\theta$ is the estimator of $\theta$ based on the main weight, and $\hat\theta_r$ ($r = 1, \ldots, 160$) is the estimator of $\theta$ based on the $r$-th replicate weight. Then the BRR approach computes the standard error of $\hat\theta$ via:

$$SE(\hat\theta) = \sqrt{\frac{4}{160} \sum_{r=1}^{160} \left(\hat\theta_r - \hat\theta\right)^2}. \tag{5}$$

The main weight and replicate weights can be used directly in the SAS SURVEYREG procedure in the SAS® 9.4 Survey Package (SAS Institute Inc. 2013) when fitting the model. Upon completing this step, we have the estimated values of the design-based regression coefficients, $\hat\beta_1$, $\hat\gamma$, and $\hat\beta_2$, as well as the standard errors $SE(\hat\beta_1)$ and $SE(\hat\gamma)$. In the second step, we compute the generalized Sobel’s test statistic using the estimates derived in step 1 via

$$Z_G = \frac{\hat\beta_1 \hat\gamma}{\sqrt{\hat\beta_1^2 \, SE(\hat\gamma)^2 + \hat\gamma^2 \, SE(\hat\beta_1)^2}}. \tag{6}$$

Then we perform testing using a rejection region similar to the one specified in Sec. 2. In addition, (if appropriate) we can compute the relative effect size using the estimates obtained in step 1 and formula (4).
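
The two-step procedure can be sketched outside SAS as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the data and weights are synthetic, the replicate weights are generated by a crude random perturbation and not by a real balanced half-sample design, and the Fay coefficient is assumed to be 0.5 (so that with 160 replicates the variance multiplier is 4/160). Covariates are omitted to keep the example short.

```python
import numpy as np

def weighted_ls(design, response, weights):
    """Weighted least squares: solves (X'WX) b = X'Wy for b."""
    xtw = design.T * weights                 # X'W without building a diagonal matrix
    return np.linalg.solve(xtw @ design, xtw @ response)

def brr_se(theta_main, theta_reps, fay_k=0.5):
    """BRR standard error with Fay's adjustment (fay_k = 0.5 is an assumption)."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    n_rep = theta_reps.shape[0]
    sq_dev = (theta_reps - theta_main) ** 2
    return np.sqrt(sq_dev.sum(axis=0) / (n_rep * (1.0 - fay_k) ** 2))

def design_based_fit(design, response, main_w, rep_w, fay_k=0.5):
    """Step 1: point estimates from the main weight, SEs from the replicate weights."""
    theta = weighted_ls(design, response, main_w)
    reps = np.array([weighted_ls(design, response, w) for w in rep_w.T])
    return theta, brr_se(theta, reps, fay_k)

# Synthetic data and weights (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n, R = 400, 160
x = rng.integers(0, 2, size=n).astype(float)
m = 5.0 + 0.6 * x + rng.normal(size=n)
y = 16.0 - 4.0 * x - 0.4 * m + rng.normal(size=n)
main_w = rng.uniform(500.0, 1500.0, size=n)
# Stand-in replicate weights: each main weight is scaled by 0.5 or 1.5 per replicate.
rep_w = main_w[:, None] * rng.choice([0.5, 1.5], size=(n, R))

ones = np.ones(n)
coef_m, se_m = design_based_fit(np.column_stack([ones, x]), m, main_w, rep_w)
coef_y, se_y = design_based_fit(np.column_stack([ones, x, m]), y, main_w, rep_w)

# Step 2: generalized Sobel statistic from the design-based estimates.
beta1, se_beta1 = coef_m[1], se_m[1]
gamma, se_gamma = coef_y[2], se_y[2]
z_g = beta1 * gamma / np.sqrt(beta1**2 * se_gamma**2 + gamma**2 * se_beta1**2)
print("beta1:", beta1, "SE:", se_beta1)
print("gamma:", gamma, "SE:", se_gamma)
print("Z_G  :", z_g)
```

With real TUS-CPS files, `main_w` and the columns of `rep_w` would be replaced by the main weight and the 160 replicate weights posted with the data. The point estimates never use the replicate weights, which enter only through the standard errors.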

Applications to a study of smoking behavior

To illustrate the proposed procedure, we performed a study of nicotine dependence among daily smokers. The goal was to evaluate the significance of cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence among U.S. daily smokers (during the period from 2010 to 2015). The dependent variable was the nicotine dependence measured as the average number of cigarettes smoked per day. The cigarette purchase price (per pack) referred to the last self-purchase. We considered two non-Hispanic racial/ethnic groups of daily smokers: White and Black/African American. Thus, we considered the single-mediator model (1) with M = Cigarette Purchase Price per Pack, Y = Nicotine Dependence (Average Number of Cigarettes Smoked per Day), X = Race/Ethnicity (non-Hispanic White, non-Hispanic Black/African American).

Considering race/ethnicity as an independent variable in the model was motivated by the following research findings. First, there are racial/ethnic differences in cigarette purchasing prices. Specifically, among diverse racial/ethnic populations in the U.S., non-Hispanic (NH) American Indian/Alaska Native (AIAN) and NH White adult smokers purchase cigarettes, on average, at lower prices than the other adult smokers (Golden, Kong, and Ribisl 2016). Analyses controlling for additional factors related to consumer behaviors resulted in less pronounced differences in average prices but also indicated that NH AIAN adult smokers, on average, paid $0.38 more per pack than did NH White adult smokers (Golden et al. 2016). These discrepancies could be explained in part by different consumer behaviors. For example, purchasing cigarettes on Indian reservations is associated with lower purchase prices (DeCicca, Kenkel, and Liu 2015; National Research Council 2015; Wang et al. 2017), and the rate of purchasing cigarettes on Indian reservations is significantly higher for NH AIAN relative to NH White daily smokers, and NH White relative to NH Black/African American daily smokers (Soulakova, Pack, and Ha 2018).

Second, the levels of nicotine dependence differ across race/ethnicity among daily smokers (Soulakova and Danczak 2017). Specifically, heavy smoking (16+ cigarettes per day) was most prevalent in NH White, NH AIAN and NH Multiracial daily smokers. Smoking within 30 minutes from awakening was most prevalent in NH White, NH Black, NH AIAN and NH Multiracial daily smokers, and night-smoking was most prevalent in NH Black, NH AIAN and NH Multiracial daily smokers. NH Hawaiian/Pacific Islander and Hispanic daily smokers had consistently lower rates for all three nicotine dependence measures (Soulakova and Danczak 2017).

Table 1 presents the set of considered covariates. Because some of these covariates are categorical with more than two levels, we fitted the design-based regression models (1) with 18 binary covariates (I = 18) using the pooled 2010–2011 and 2014–2015 TUS-CPS data. The sample of daily smokers (n = 30,777) was representative of about 20,261,285 daily smokers in the population. The cohort was 89.4% (27,507) non-Hispanic White and 10.6% (3,270) non-Hispanic Black/African American. The daily smokers, on average, smoked 16 cigarettes per day (SE = 8.2) and paid $5.15 per pack of cigarettes during their last cigarette purchase (SE = $1.69). Table 1 presents the summary statistics for the factors included as covariates in the models.
Table 1. Sample summary statistics for factors considered as covariates; 2010–2011 and 2014–2015 Tobacco Use Supplement to the Current Population Survey.

Characteristic                                     Sample count   Percent (%)*
Age group
  18–24                                                   2,149           11.6
  25–44                                                  11,503           37.4
  45–64                                                  13,660           41.7
  65+                                                     3,465            9.4
Sex
  Male                                                   14,905           52.2
  Female                                                 15,872           48.8
Highest level of education
  Less than high school                                   4,756           16.0
  High school (or equivalent)                            13,070           42.3
  Some college or a bachelor’s degree                    12,203           39.4
  Graduate degree (or equivalent)                           748            2.3
Employment record
  Employed                                               17,844           58.6
  Unemployed                                              2,630            9.6
  Not in labor force                                     10,303           31.8
Marital status
  Married (spouse is present or absent)                  12,511           39.0
  Widowed, divorced, or separated                        10,174           30.6
  Never married                                           8,092           30.4
Region of residency
  Northeast                                               5,420           16.5
  Midwest                                                 8,641           27.9
  South                                                  11,688           42.2
  West                                                    5,028           13.4
Area of residency
  Metropolitan                                           21,768           77.7
  Non-metropolitan                                        9,009           22.3
Residing in a state with an Indian Reservation
  No                                                     10,433           32.2
  Yes                                                    20,344           67.8
Survey mode
  Phone                                                  17,395           56.5
  In-person                                              13,382           43.5
Survey year
  2010–2011                                              16,884           54.1
  2014–2015                                              13,893           45.9

* All percentages except for the survey mode are based on the 2010–2011 and 2014–2015 population counts.

The significance level was 5%. All computing was performed using SAS/STAT® 9.4 (SAS Institute Inc. 2017). Specifically, we used PROC SURVEYFREQ, PROC SURVEYMEANS, and PROC SURVEYREG with the BRR option (with Fay correction) and the main and replicate weights. In addition, we constructed the 95% confidence interval based on the standard normal distribution for the mediated effect $\beta_1 \cdot \gamma$. The model for the mean cigarette purchase price per pack (the mediator) was significant ($R^2 \approx 24\%$, F(19, 160) ≈ 191, p < 0.0001); the intercept and all covariates except for sex and survey mode were significant (p’s < 0.0001). The model for the nicotine dependence (the dependent variable) was also significant ($R^2 \approx 13\%$, F(20, 160) ≈ 164, p < 0.0001); the intercept and all covariates were significant (p’s < 0.0300). Table 2 presents the results for each step of the procedure. As is shown, the generalized Sobel’s test statistic had a value of −9.57 (which was in the rejection region), indicating a significant mediated effect of cigarette purchase price (p < 0.0001). The corresponding 95% confidence interval for the mediated effect was (−0.28, −0.19), which also illustrates that the effect is significantly different from zero. Therefore, the association between daily smoker’s race/ethnicity and nicotine dependence is mediated by the cigarette purchase price. Because $\hat\beta_1 \hat\gamma$ and $\hat\beta_2$ were both negative, we also computed the relative effect size. However, the relative effect size was only 0.05, indicating low practical importance of the cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence.
Table 2. Testing for the Mediated Effect: Results for Each Step of the Procedure.

Estimated quantity                                  Description                                                   Estimate   Standard error
Step 1: Estimating the design-based model coefficients
  $\hat\beta_1$                                     Estimated effect of race/ethnicity on cigarette                   0.58             0.04
                                                    purchase price
  $\hat\gamma$                                      Estimated effect of cigarette purchase price on the              −0.41             0.03
                                                    average number of cigarettes smoked per day
  $\hat\beta_2$                                     Estimated effect of race/ethnicity (non-Hispanic                 −4.73             0.15
                                                    Black/African American versus non-Hispanic White)
                                                    on the average number of cigarettes smoked per day
Step 2: Testing for the mediated effect
  $\hat\beta_1\hat\gamma$                           Mediated effect                                                  −0.24             0.02
  $Z_G$                                             Value of the generalized Sobel’s test statistic                  −9.57
Additional Step: Calculating the relative effect size
  $\hat\beta_1\hat\gamma / (\hat\beta_2 + \hat\beta_1\hat\gamma)$   Relative effect size                              0.05

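
As a rough arithmetic check of Table 2, the quantities in step 2 can be recomputed from the rounded step 1 entries. The calculation below assumes the usual first-order form of Sobel's standard error and uses only the two-decimal values printed in the table, so small differences from the reported figures are expected.

```latex
\begin{align*}
\hat\beta_1 \hat\gamma &= 0.58 \times (-0.41) = -0.2378 \approx -0.24, \\
SE(\hat\beta_1 \hat\gamma) &\approx \sqrt{0.58^2 \times 0.03^2 + (-0.41)^2 \times 0.04^2}
  = \sqrt{0.00030276 + 0.00026896} \approx 0.0239 \approx 0.02, \\
Z_G &\approx \frac{-0.2378}{0.0239} \approx -9.9, \\
\text{95\% CI} &\approx -0.2378 \pm 1.96 \times 0.0239 \approx (-0.285,\ -0.191), \\
\frac{\hat\beta_1 \hat\gamma}{\hat\beta_2 + \hat\beta_1 \hat\gamma}
  &= \frac{-0.2378}{-4.73 - 0.2378} \approx 0.048 \approx 0.05 .
\end{align*}
```

The mediated effect, its standard error, the confidence interval, and the relative effect size all match the reported values after rounding. The recomputed statistic of about −9.9 differs from the reported −9.57, which is consistent with the table entries being rounded to two decimals before this recalculation.
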
We note that it is important to correctly adjust for the TUS-CPS design specifics. Indeed, if one ignored all survey weights and incorrectly treated the sampling strategy as simple random sampling, then the confidence interval for the mediated effect would be (−0.35, −0.25). While both approaches result in a significant finding, the latter interval would (incorrectly) suggest that the mediated effect is larger (in absolute value). In addition, if one used the main survey weight only (ignoring the replicate weights) and estimated variance using Taylor’s linearization, then after rounding to hundredths, the resulting confidence interval would be the same as the one based on the BRR approach, i.e., (−0.28, −0.19). However, this method cannot be recommended in general, because in other cases this method and the correct one (based on the BRR) could result in discrepant findings (Ha and Soulakova 2018).

Conclusion

In this paper, we illustrated the applications of a single-mediator model for analysis of the TUS-CPS data. However, the approach can be easily modified to handle other types of designs; these adjustments should be incorporated when computing the design-based model coefficients in step 1 and the standard error in step 2.

The approach has several limitations. The main limitation is that although the analytical results can be used to inform scientists regarding population-wide characteristics and behaviors, and can be used in future research studies, the “observational nature” of the data prohibits making any definite claims. Therefore, no causal inferences can be made. In addition, while independent regressions described in this paper are commonly used to test for a mediated effect (Hayes 2017; Hill, Burdette, and Hale 2009; Parmelee, Harralson, Smith, and Schumacher 2007; Rutchick, Smyth, Lopoo, and Dusek 2009; Yang, Du, Qu, Gong, and Sun 2013), this approach ignores dependence between the mediator and the dependent variable. In addition, when testing for the mediated effect, we assumed that the generalized Sobel’s test statistic follows the standard normal distribution under the null hypothesis. However, this assumption might be violated in practice. This is a concern especially in studies with a small sample size. Moreover, the coverage probability of confidence intervals (based on Sobel’s standard error) could exceed the nominal confidence level even for large samples, leading to an over-conservative test (MacKinnon, Warsi, and Dwyer 1995). In these instances, alternative methods such as the confidence intervals based on the distribution of the product or resampling have been recommended (MacKinnon et al. 2004).

In the considered study of nicotine dependence among daily smokers, we detected a significant mediated effect of cigarette purchase price on the association between race/ethnicity and nicotine dependence. Specifically, non-Hispanic Black/African American daily smokers, on average, were less nicotine dependent than were non-Hispanic White daily smokers. This association was mediated by cigarette purchase price, but the magnitude of the effect was relatively low. Additional findings were (1) non-Hispanic Black/African American daily smokers pay more, on average, for a pack of cigarettes than do non-Hispanic White daily smokers, and (2) the higher cigarette purchase price is associated with lower nicotine dependence.

The study of nicotine dependence also has some limitations. First, we used TUS-CPS self-reports; thus, daily smokers were also identified using self-reported current smoking status. Therefore, there could be some misrepresentation of the target population (U.S. daily smokers) in the study. Nonetheless, given that TUS-CPS self-reported smoking information is generally accurate (Soulakova and Crockett 2014; Soulakova, Hartman, Liu, Willis, and Augustine 2012), we anticipate this discrepancy to be negligible. In addition, the surveyed average number of cigarettes smoked per day was truncated at 40 cigarettes for all smokers who indicated smoking more than 40 cigarettes per day. In our cohort, 40 cigarettes per day was observed for 1,076 (3.5%) daily smokers. Therefore, the average number of 16 cigarettes (per day) reported in the study may be a (slight) under-estimate of the true average number of cigarettes smoked per day among daily smokers. The cigarette purchase price used in the study was defined using reports of the price paid for the last purchased pack or carton of cigarettes. Thus, the measure refers to the average (or actual) price per pack if a carton (or a pack) was purchased. An additional study limitation is that we did not conduct a sensitivity analysis (Imai, Keele, and Yamamoto 2010; MacKinnon 2008, Section 15.7).

Future research can be targeted toward adapting mediation methodology to complex survey data. For example, applications of the causal steps approach (Baron and Kenny 1986), difference of coefficients approach (Freedman and Schatzkin 1992; MacKinnon et al. 2004), and resampling approach based on the empirical distribution (MacKinnon et al. 2004) have not yet been addressed for complex sampling. Moreover, methods for complex survey data with a categorical dependent variable and/or mediator, multi-mediator problems, and problems with a mediator-predictor interaction (VanderWeele 2016; Wang, Nelson, and Albert 2013) have yet to be developed.

References (10 of 16 shown)

1.  Sample size for studying intermediate endpoints within intervention trials or observational studies.

Authors:  L S Freedman; A Schatzkin
Journal:  Am J Epidemiol       Date:  1992-11-01       Impact factor: 4.897

2.  Mediation Analysis: A Practitioner's Guide.

Authors:  Tyler J VanderWeele
Journal:  Annu Rev Public Health       Date:  2015-11-30       Impact factor: 21.981

3.  Reliability of adult self-reported smoking history: data from the tobacco use supplement to the current population survey 2002-2003 cohort.

Authors:  Julia N Soulakova; Anne M Hartman; Benmei Liu; Gordon B Willis; Steve Augustine
Journal:  Nicotine Tob Res       Date:  2012-02-07       Impact factor: 4.244

4.  Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula.

Authors:  Wei Wang; Suchitra Nelson; Jeffrey M Albert
Journal:  Stat Med       Date:  2013-05-06       Impact factor: 2.373

5.  A Simulation Study of Mediated Effect Measures.

Authors:  David P Mackinnon; Ghulam Warsi; James H Dwyer
Journal:  Multivariate Behav Res       Date:  1995-01-01       Impact factor: 5.923

6.  The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations.

Authors:  R M Baron; D A Kenny
Journal:  J Pers Soc Psychol       Date:  1986-12

7.  Racial and Ethnic Differences in What Smokers Report Paying for Their Cigarettes.

Authors:  Shelley D Golden; Amanda Y Kong; Kurt M Ribisl
Journal:  Nicotine Tob Res       Date:  2016-02-13       Impact factor: 4.244

8.  Necessary and discretionary activities in knee osteoarthritis: do they mediate the pain-depression relationship?

Authors:  Patricia A Parmelee; Tina L Harralson; Lori A Smith; H Ralph Schumacher
Journal:  Pain Med       Date:  2007 Jul-Aug       Impact factor: 3.750

9.  Impact of Menthol Smoking on Nicotine Dependence for Diverse Racial/Ethnic Groups of Daily Smokers.

Authors:  Julia N Soulakova; Ryan R Danczak
Journal:  Healthcare (Basel)       Date:  2017-01-11

10.  Patterns and correlates of purchasing cigarettes on Indian reservations among daily smokers in the United States.

Authors:  Julia N Soulakova; Richard Pack; Trung Ha
Journal:  Drug Alcohol Depend       Date:  2018-09-13       Impact factor: 4.492
