Does weight mediate the effect of smoking on coronary heart disease? Parametric mediational g-formula analysis.

Yaser Mokhayeri¹, Maryam Nazemipour², Mohammad Ali Mansournia³, Ashley I Naimi⁴, Jay S Kaufman⁵.

Abstract

BACKGROUND: In settings with time-varying confounders affected by previous exposure and a time-varying mediator, natural direct and indirect effects cannot generally be estimated without bias. In the present study, we estimate the interventional direct effect and interventional indirect effect of cigarette smoking, as a time-varying exposure, on coronary heart disease while considering body weight as a time-varying mediator.
METHODS: The parametric mediational g-formula was proposed to address this problem by estimating the interventional direct effect and interventional indirect effect. We used data from the Multi-Ethnic Study of Atherosclerosis to estimate the effect of cigarette smoking on coronary heart disease, considering body weight as a time-varying mediator.
RESULTS: Over an 11-year period, smoking 20 cigarettes per day compared to no smoking directly (not through weight) increased the risk of coronary heart disease by an absolute difference of 1.91% (95% CI: 0.49%, 4.14%), and indirectly changed coronary heart disease risk by -0.02% (95% CI: -0.05%, 0.04%) via change in weight. The total effect was estimated as an absolute 1.89% increase (95% CI: 0.49%, 4.13%).
CONCLUSION: The overall absolute impact of smoking on incident coronary heart disease is modest, and we did not discern any important contribution to this effect relayed through changes in body weight. In fact, changes in weight because of smoking have no meaningful mediating effect on CHD risk.

Year: 2022 PMID: 35025942 PMCID: PMC8757910 DOI: 10.1371/journal.pone.0262403

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Coronary heart disease (CHD), also known as coronary artery disease and ischemic heart disease, is the leading cause of death in the world [1]. Although the number of Years of Life Lost (YLL) due to CHD increased by a mean of 17.3% from 2007 to 2017, the age-standardized YLL rate decreased by a mean of 9.8% over the same period [2]. Exposure to smoking declined worldwide by 25% between 1990 and 2015; nevertheless, it still ranked among the five leading risk factors for attributable disability-adjusted life years (DALYs). In high-income countries such as the US, Canada, and the UK, smoking is considered the most important risk factor for attributable DALYs for both sexes [3]. LDL cholesterol, HDL cholesterol, triglycerides, body mass index, glucose, and blood pressure may mediate the relationship between smoking and CHD [4]. Body mass index is lower in smokers than in nonsmokers [5-7]. Chen et al. (2019) reported that smoking cessation could reduce the 10-year risk of CHD; however, weight gain following cessation could conceal, to a small extent, the beneficial effect of quitting [8]. Tamura et al. (2010) concluded that, in spite of weight gain after smoking cessation, the total risk of CHD decreased in men [9]. Luo et al. (2013), in a study of women, indicated that the relationship between smoking cessation and incidence of CHD could be weakened by weight gain [10]. None of these studies appropriately adjusted for time-varying confounders using cohort data with repeated measurements of exposure and mediators [11-13]. G-methods (generalized methods) are causal methods that can appropriately adjust for time-varying confounders affected by prior exposure [14-18]. With longitudinal data, an intermediate confounder cannot be adjusted for using standard analytic approaches [19].

The intermediate confounder, which is likely not rare in mediation analysis, is both a mediator-outcome confounder and a variable on the direct causal path [19]. Moreover, identifying the natural direct effect (NDE) and natural indirect effect (NIE) requires four assumptions: (1) no unmeasured exposure-outcome confounder, (2) no unmeasured mediator-outcome confounder, (3) no unmeasured exposure-mediator confounder, and (4) no mediator-outcome confounder affected by prior exposure [20]. In the presence of an intermediate confounder, the fourth assumption is violated, and consequently the natural direct and indirect effects are not identified. To partially address this problem, VanderWeele and Tchetgen Tchetgen used the mediational g-formula and proposed the pure interventional direct effect (IDE), also known as a randomized interventional analogue of the natural direct effect, and the total interventional indirect effect (IIE) [21]. As VanderWeele and Tchetgen Tchetgen’s method was semiparametric, Lin et al. developed a fully parametric mediational g-formula [22]. In the present study, we estimate the IDE and IIE of cigarette smoking, as a time-varying exposure, on CHD while considering weight as both a time-varying mediator of past cigarette smoking and a time-varying confounder of future cigarette smoking, using the parametric mediational g-formula to control for time-varying confounding. We estimate the direct (not through weight) and indirect (through weight) effects in data obtained from the Multi-Ethnic Study of Atherosclerosis (MESA).

Methods

Study population

We used data from the Multi-Ethnic Study of Atherosclerosis (MESA), a community-based cohort of adults that started in 2000 with 6814 men and women aged 45 to 84 years. Additional details of the study have been presented elsewhere [23]. Five participants were excluded from our analysis because they had prevalent CVD diagnoses at baseline, leaving 6809 participants at baseline. Because the mGFORMULA SAS macro does not handle missing data, the final analysis was limited to participants who did not die and were not lost to follow-up during the study period (4433 participants). The participants were followed for more than eleven years across five visits. The first visit was scheduled for July 15, 2000 through July 14, 2002 (24 months); the second for July 15, 2002 through January 14, 2004 (18 months); the third for January 15, 2004 through July 14, 2005 (18 months); the fourth for July 15, 2005 through July 14, 2007 (24 months); and the fifth for April 1, 2010 through September 30, 2011 (18 months).

Exposure, mediator, outcome, and confounders

The average number of cigarettes smoked per day, the exposure, was self-reported and obtained via questionnaire. Participants were asked whether they had smoked during the last 30 days; those who replied affirmatively were asked to report the average number of cigarettes smoked per day. In this study, we aimed to estimate the causal effect of ‘had everyone been a smoker/smoked 20 cigarettes per day’ vs. ‘had everyone been a non-smoker/smoked 0 cigarettes per day’, over the course of follow-up, on incident CHD or death from CHD. We defined weight in kilograms (kg) as the mediator. Weight was measured on a standard weighing scale to the nearest 0.5 kg. CHD events, the outcome, included myocardial infarction, resuscitated cardiac arrest, definite angina, probable angina (if followed by revascularization), and CHD death. Questionnaires, interviews, and medical records were used to obtain the date of CHD to the nearest day. Time-varying covariates measured at all visits, including intentional physical activity, total cholesterol, hypertension, hypertension medication, and current aspirin use, entered the models as potential confounders. We also adjusted for baseline age, sex, race/ethnicity (White, Asian, Hispanic, and African-American), alcohol consumption (drinks per week), diet score [24], baseline smoking (never, former, and current smoker), annual family income, and education level. A standard, validated questionnaire was used to obtain data on recreational physical activity. Valid test kits were used to obtain data on total cholesterol. An appropriately sized cuff was used to measure resting blood pressure in the right arm after five minutes in the seated position. Three readings were taken; the average of the second and third readings was used as the blood pressure level in the study.

Statistical analysis

Mediation can be described as a two-stage process: (1) the M-stage, the process that causes the mediator, and (2) the Y-stage, the process that causes the outcome. This yields four potential outcomes. Y0M0 is the outcome that would be observed if an individual were unexposed and had the mediator value she/he would have had if unexposed (Y0M0 = Y0). Y1M1 is the outcome that would be observed if an individual were exposed and had the mediator value she/he would have had if exposed (Y1M1 = Y1). Y1M0 is the outcome that would be observed if an individual were exposed but had the mediator value she/he would have had if unexposed. Y0M1 is the outcome that would be observed if an individual were unexposed but had the mediator value she/he would have had if exposed. In the parametric mediational g-formula, the M-stage is a random draw from the distribution of the mediator among those with exposure status A = a. The mediator is therefore fixed to a level chosen at random from the distribution of the mediator among those with that exposure status. The mediational g-formula is related to both Robins’ regular g-formula [12] and Pearl’s mediation formula [25]: in the absence of mediation it reduces to the g-formula, and in the absence of time-varying confounders it reduces to the mediation formula. Defining outcomes this way allows us to quantify outcome risks under several exposure and mediator scenarios, so that the IDE and IIE can be estimated. We estimated the risk under each pair of exposure regimes (ā1, ā2) = (ā, ā), (ā, ā*), (ā*, ā), (ā*, ā*), where ā and ā* denote the two smoking regimes being compared. These measures indicate the simulated risk of CHD. From them, the pure IDE, total IIE, and interventional analogue of the total effect (ITE) can be defined as Eqs 1, 2, and 3, respectively.

Fig 1 is a causal directed acyclic graph [26-32] depicting the relationship between our time-varying exposure (cigarette smoking), time-varying mediator (weight), time-varying confounders, and CHD. V denotes baseline confounders (e.g. age); A represents the time-varying exposure (cigarette smoking); M denotes the time-varying mediator (weight); L represents the time-varying confounders (e.g. physical activity), which are affected by prior exposure; and Y corresponds to the binary outcome (CHD). Subscripts 0 and 1 correspond to visits 0 and 1 of the study, respectively (to simplify the graph, only 2 visits are depicted).
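Eqs 1 to 3 are not reproduced in this copy of the article. As a hedged reconstruction, assuming the standard definitions of the pure interventional direct effect and total interventional indirect effect, they would take the form below. The symbol Q and the ordering of its arguments are assumptions introduced here: Q(ā1, ā2) stands for the simulated CHD risk when exposure in the Y-stage follows ā1 and the mediator is drawn from its distribution under ā2, with ā the smoking regime and ā* the no-smoking regime. In terms of the four potential outcomes described above, Q(ā, ā*) plays the role of Y1M0.

```latex
\begin{align}
\mathrm{IDE} &= Q(\bar{a}, \bar{a}^{*}) - Q(\bar{a}^{*}, \bar{a}^{*}) \tag{1} \\
\mathrm{IIE} &= Q(\bar{a}, \bar{a}) - Q(\bar{a}, \bar{a}^{*}) \tag{2} \\
\mathrm{ITE} &= Q(\bar{a}, \bar{a}) - Q(\bar{a}^{*}, \bar{a}^{*}) = \mathrm{IDE} + \mathrm{IIE} \tag{3}
\end{align}
```

Under these definitions the direct effect holds the mediator distribution at its no-smoking level while changing the exposure, and the indirect effect holds the exposure at smoking while changing the mediator distribution, so the two sum to the total effect by construction.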

Fig 1

Causal diagram depicting the effect of time-varying exposure (smoking) on the outcome (CHD) in the presence of time-varying mediator (weight).

V, A, M, L, and Y stand for baseline confounders, time-varying exposure, time-varying mediator, time-varying confounders, and the outcome, respectively.

We used Monte-Carlo estimation [33] to calculate the point estimates, and the nonparametric bootstrap with 500 resamples to obtain 95% confidence intervals. We used the mGFORMULA SAS macro (S1 Appendix).
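To make the Monte-Carlo step concrete, here is a minimal Python sketch of the two-stage simulation for a single time point with no time-varying confounders, the setting in which the mediational g-formula reduces to the mediation formula. It is not the mGFORMULA macro and not the authors' fitted model: the function name `simulate_risk`, the normal model for weight, the logistic model for CHD, and every coefficient are hypothetical, chosen only to illustrate the mechanics. A fixed seed is used so that the run is replicable.

```python
import math
import random


def simulate_risk(a_outcome, a_mediator, n_draws=20000, seed=2022):
    """Monte Carlo estimate of the CHD risk when the outcome model sees
    exposure `a_outcome` and the mediator is drawn from its distribution
    under exposure `a_mediator`. All coefficients are hypothetical."""
    rng = random.Random(seed)  # fixed seed so the run is replicable
    total = 0.0
    for _ in range(n_draws):
        # M-stage: draw weight (kg) from an assumed normal model given exposure.
        weight = rng.gauss(80.0 - 2.0 * a_mediator, 15.0)
        # Y-stage: assumed logistic model for CHD given exposure and weight.
        logit = -3.5 + 0.35 * a_outcome + 0.01 * (weight - 80.0)
        total += 1.0 / (1.0 + math.exp(-logit))
    return total / n_draws


q00 = simulate_risk(0, 0)  # unexposed, mediator as if unexposed
q10 = simulate_risk(1, 0)  # exposed, mediator as if unexposed
q11 = simulate_risk(1, 1)  # exposed, mediator as if exposed

ide = q10 - q00  # direct effect (not through weight)
iie = q11 - q10  # indirect effect (through weight)
ite = q11 - q00  # total effect
print(f"IDE={ide:.4f} IIE={iie:.4f} ITE={ite:.4f}")
```

In the longitudinal analysis described above, the same idea is applied visit by visit, with the time-varying confounders simulated as well, and the whole procedure is repeated on bootstrap resamples to obtain confidence intervals.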

Results

After 51,487 person-years of follow-up (median duration of follow-up 8.45 years, IQR 1.01 years), 388 new CHD cases occurred. Of the 6809 eligible participants, 401 (5.9%) died from causes other than CHD, and 1987 (29%) were lost to follow-up during the study. The final sample included in the analysis comprised 4433 participants. Baseline characteristics of the 6809 eligible participants are shown in Table 1. Compared to quitters and nonsmokers, smokers were more likely to drink more alcohol, to have less intentional physical activity, lower education levels, and lower family income, and to have a lower prevalence of hypertension and hypertension medication use. Compared to smokers and nonsmokers, quitters were more likely to be men and to weigh more. A complete case analysis was performed, as the proportion of missing covariate data was low: hypertension 2.2%, physical activity 1.5%, total cholesterol 3.1%, hypertension medication 3.2%, and current aspirin use 1.6%.

Table 1

Baseline characteristics of eligible participants in the Multi-Ethnic Study of Atherosclerosis, United States, 2000–2011.

Characteristics	Smoker (n = 890)	Quitter (n = 2493)	Nonsmoker (n = 3426)
Age, y, mean (SD)	58.15 (9.15)	63.48 (9.83)	62.21 (10.52)
Race/ethnicity, n (%)
Caucasian	302 (33.93)	1158 (46.45)	1159 (33.83)
Asian	45 (5.06)	153 (6.14)	606 (17.69)
African-American	340 (38.20)	697 (27.96)	854 (24.93)
Hispanic	203 (22.81)	485 (19.45)	807 (23.56)
Male, n (%)	468 (52.58)	1440 (57.76)	1302 (38)
Diet score, mean (SD)	2.50 (5.10)	4.64 (4.77)	4.61 (4.55)
Alcohol consumption, drinks/week, mean (SD)	6.40 (11.32)	5.81 (9.82)	2.11 (5.81)
Education, n (%)
≤ high school	350 (39.46)	797 (32.09)	1314 (38.48)
college–associate degree	329 (37.09)	761 (30.64)	843 (24.69)
≥ bachelor degree	208 (23.45)	926 (37.28)	1258 (36.84)
Annual family income, n (%)
< $25000	275 (32.51)	666 (27.91)	1115 (33.75)
$25,000-$50,000	277 (32.74)	677 (28.37)	938 (28.39)
> $50,000	294 (34.75)	1043 (43.71)	1251 (37.86)
Hypertension, n (%)	333 (37.42)	1178 (47.25)	1545 (45.10)
Hypertension medication, n (%)	265 (29.78)	979 (39.27)	1291 (37.68)
Current aspirin use, n (%)	143 (16.07)	601 (24.11)	651 (19)
Total cholesterol, mmol/l, mean (SD)	4.98 (1.01)	4.98 (0.90)	5.06 (0.91)
Intentional physical activity, min/week, mean (SD)	351.93 (557.14)	385.23 (547.45)	354.64 (503.80)
Weight, kg, mean (SD)	79.58 (16.92)	81.81 (17.06)	75.99 (16.93)

SD: standard deviation.

The results of the standard parametric g-formula in Table 2 show that the estimated 11-year risks of CHD under ‘had everyone been a smoker/smoked 20 cigarettes per day’ and ‘had everyone been a non-smoker/smoked 0 cigarettes per day’ were 6.92% and 5.00%, respectively. The risk ratio was therefore estimated as 1.38 (95% CI: 1.04, 1.86). The estimated risk difference using the standard parametric g-formula was 1.92%, which is close to the estimated risk difference for CHD of 1.89% obtained with the parametric mediational g-formula (Table 3).

Table 2

Total effect of smoking 20 cigarettes per day compared to no smoking on CHD, using parametric g-formula in the Multi-Ethnic Study of Atherosclerosis, United States, 2000–2011.

Intervention	11-year Risk, %	95% CI	Risk Difference, %	95% CI
No intervention	5.94	3.74, 9.46	0.94	0.03, 2.01
No smoking	5.00	2.65, 8.60	Ref.
20 cigarettes per day	6.92	4.00, 11.14	1.92	0.19, 4.61

CHD: coronary heart disease, CI: confidence interval.
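As a quick arithmetic check, the risk differences and the risk ratio quoted in the text can be recomputed from the point estimates in Table 2. The short Python sketch below uses only the tabulated risks; the variable names are illustrative.

```python
# 11-year CHD risks (%) as reported in Table 2.
risk_no_intervention = 5.94
risk_no_smoking = 5.00  # reference
risk_20_per_day = 6.92

risk_difference = risk_20_per_day - risk_no_smoking
risk_ratio = risk_20_per_day / risk_no_smoking
natural_course_difference = risk_no_intervention - risk_no_smoking

print(round(risk_difference, 2))            # 1.92
print(round(risk_ratio, 2))                 # 1.38
print(round(natural_course_difference, 2))  # 0.94
```

The confidence intervals cannot be recovered this way; they come from the bootstrap resamples, which are not part of the table.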

Table 3

Mediation analysis for the effect of smoking 20 cigarettes per day compared to no smoking on CHD, mediated by weight in the Multi-Ethnic Study of Atherosclerosis, United States, 2000–2011.

	Risk Difference, %	95% CI
Interventional total effect	1.89	0.49, 4.13
Interventional direct effect	1.91	0.49, 4.14
Interventional indirect effect	-0.02	-0.05, 0.04
E (Y0M0)	5.03	2.90, 7.95
E (Y1M0)	6.94	4.01, 10.3
E (Y0M1)	4.89	2.90, 7.91
E (Y1M1)	6.98	4.01, 10.05

CHD: coronary heart disease, CI: confidence interval.

In Table 3, using the estimates from the joint exposure and mediator interventions (the simulated risks), we calculated the interventional total effect, direct effect, and indirect effect (through weight) of smoking on CHD. Using the parametric mediational g-formula, we simulated CHD risks under four scenarios: no smoking with weight distributed as under no smoking, P(Y0M0); smoking 20 cigarettes per day with weight distributed as under no smoking, P(Y1M0); no smoking with weight distributed as under smoking 20 cigarettes per day, P(Y0M1); and smoking 20 cigarettes per day with weight distributed as under smoking 20 cigarettes per day, P(Y1M1). The results are presented on the risk difference scale. We estimated that smoking directly (not through weight) increased CHD risk by 1.91% (95% CI: 0.49%, 4.14%) and changed CHD risk by -0.02% (95% CI: -0.05%, 0.04%) via change in weight.
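As a worked check on the risk difference scale, the reported direct effect can be related to the simulated risks in Table 3, and the reported effects to one another. The first line assumes the standard definition of the pure direct effect as the contrast between Y1M0 and Y0M0, which is an assumption here because the defining equations are not reproduced in this copy.

```latex
\begin{align*}
\mathrm{IDE} &= E(Y_{1}M_{0}) - E(Y_{0}M_{0}) = 6.94\% - 5.03\% = 1.91\% \\
\mathrm{ITE} &= \mathrm{IDE} + \mathrm{IIE} = 1.91\% + (-0.02\%) = 1.89\%
\end{align*}
```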

Discussion

We found that smoking might have either a very small or no indirect effect on CHD through weight. Clair et al. (2013), testing the hypothesis that weight gain following smoking cessation does not attenuate the benefits of smoking cessation among adults with and without diabetes, indicated that weight gain after smoking cessation does not modify the association between smoking and risk of CVD events [34]. Feodoroff et al., investigating the dose-dependent effect of smoking on CHD in type 1 diabetes, indicated that smoking one pack per day (20 cigarettes), compared with never smoking and adjusted for age, sex, BMI, hypertension, duration of diabetes, and HbA1c, could increase the risk of incident CHD (HR = 1.45, 95% CI: 1.15, 1.84) [35]. Doyle et al. concluded that the risk ratio for smoking 15–35 cigarettes per day compared to not smoking is 1.6 in men [36]. Woodward et al. reported that men who smoke ≥20 cigarettes per day have a hazard ratio for CHD of 1.93 (95% CI: 1.15, 3.24) compared to non-smokers; the corresponding value for women was estimated as 3.81 (95% CI: 2.00, 7.27) [37]. Mucha et al. conducted a meta-analysis and indicated that low-level smoking (≤20 cigarettes per day) results in a risk ratio of 1.70 (95% CI: 1.52, 1.90) and high-level smoking (>20 cigarettes per day) in a risk ratio of 2.09 (95% CI: 1.87, 2.34) [38]. Munafò et al. (2009) observed that, at the baseline of a longitudinal study of men, never smokers and ex-smokers had a higher BMI (on average 1.6 kg/m2) than current smokers. The results did not change after adjusting for age, socioeconomic position, alcohol, and calorie consumption. They also reported an increase of 1.56 kg/m2 (95% CI: 1.29, 1.82) after smoking cessation, adjusted for the same covariates [5]. Sneve et al., using data from the Tromsø study, found that current smokers of both genders have a lower BMI than never-smokers. They also reported that smoking cessation is related to an increase in weight [6].

The validity of the results depends on the assumptions of positivity and consistency. The positivity assumption implies that every stratum defined by the confounder and treatment levels (exposed and unexposed) must be observed. However, the parametric g-formula is less prone to bias induced by positivity violations. Consistency implies that the observed outcome of every exposed or unexposed individual equals the counterfactual outcome had they received exposure or remained unexposed, respectively. We note that similarity between observed and estimated risk is a necessary but insufficient condition for the assumption of no model misspecification. In our study, the simulated 11-year risk of CHD under no intervention was 5.94% and the observed risk was 6.91%.

As a strength of our analysis, the parametric mediational g-formula [22] is a valid statistical strategy to estimate the direct and indirect effects of a time-varying exposure on the incidence of CHD with a time-varying mediator (weight). Using this method, we could appropriately adjust for a variety of time-varying confounders that are affected by prior exposure. As VanderWeele and Tchetgen Tchetgen’s approach was semiparametric, we used the fully parametric mediational g-formula developed by Lin et al. (2017).

Our study is subject to some limitations. First, the validity of the estimated effects relies on the assumptions of no unmeasured confounding of the effects of exposure on outcome, exposure on mediator, and mediator on outcome; no measurement error; and correct specification of the parametric models used in the analysis. However, self-reported cigarette smoking is subject to measurement error due to recall and underreporting biases [39]. Physical activity was not assessed at visit 4, so we carried forward the values from visit 3. Moreover, we did not include occupational physical activity in the model. Measurement error in confounders such as physical activity and alcohol consumption leads to residual confounding [40]. Second, the parametric mediational g-formula cannot be used with multiple mediators; with this approach, we can estimate only a single direct and a single indirect effect. Third, the parametric g-formula is also subject to the g-null paradox: it could lead to rejecting the causal null even when it is true [41]. However, this is unlikely to be a concern in our setting, as the effects of cigarette smoking and weight on CHD have been shown previously. Fourth, missing data (censoring and competing risks) may result in selection bias, but the mediational g-formula macro does not yet have a function for handling missing data. For future studies, Schomaker and Heumann (2018) presented four methods for combining bootstrap estimation with multiple imputation for estimators that require bootstrapping to obtain confidence intervals. They indicated that the proportion of missingness and the number of imputed data sets affect the methods' performance [42].
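The carry-forward step used for physical activity at visit 4 can be illustrated with a small Python sketch. The helper name `carry_forward` and the activity values are hypothetical; only the rule itself (replace an unassessed visit with the most recent observed value) comes from the text.

```python
def carry_forward(values):
    """Replace each missing entry (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical minutes/week of intentional physical activity at visits 1-5;
# visit 4 was not assessed.
activity = [350, 300, 320, None, 280]
print(carry_forward(activity))  # [350, 300, 320, 320, 280]
```

Carrying values forward keeps every participant in the fixed visit grid, but it treats the visit 4 value as unchanged from visit 3, which is one source of the measurement error in confounders noted above.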

Conclusion

The overall absolute impact of smoking on incident coronary heart disease is modest, and we did not discern any important contribution to this effect relayed through changes in body weight. In fact, changes in weight because of smoking have no meaningful mediating effect on CHD risk.

7 Jul 2021

PONE-D-21-18594
Does Weight Mediate the Effect of Smoking on Coronary Heart Disease? Parametric Mediational G-Formula Analysis
PLOS ONE

Dear Dr. Mansournia,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

You are invited to further add the program codes in the supplementary files. Please also note that there might be attachments of the reviewers' comments.

Please submit your revised manuscript by Aug 21 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Y Zhan
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Partly

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: PONE-D-21-18594: statistical review

SUMMARY. This paper describes a mediation analysis that evaluates whether weight mediates the effect of smoking on coronary heart disease. The core statistical analysis is based on a parametric mediational g-formula that has been recently proposed in the epidemiological literature. Results are in line with other studies that show factors that mediate smoking effects. The main problem with this paper is the presentation of the material, which is a bit too synthetic. Essentially, the authors simply state that they used a SAS macro and show the final results. It is therefore impossible to check whether the statistical methods are correct. Further, the results are not replicable. Technical soundness and replicability are two important requirements for publication in PLOSONE. I'd welcome a revision that adds details on the statistical part of the paper. See the list below for specific issues that should be addressed.

MAJOR ISSUES

1. Please clarify what parametric models have been fitted. Which covariates have been included? What was the structure of the random effects used to capture the longitudinal correlation of the data? Could the authors provide an evaluation of the goodness of fit of the model? Could the authors provide a traditional table with estimates and standard errors of the fitted parametric model?

2. Point estimates are computed by "non parametric bootstrap". Shouldn't parametric mediation analysis rely on simulation from the fitted parametric model? Why do we need nonparametric bootstrap here? A nonparametric approach to bootstrap is not obvious under a longitudinal setting because we need to account for longitudinal correlation. This is why parametric approaches are usually preferred. Please clarify. Furthermore, Monte Carlo outputs are not replicable by definition. The authors should provide data and code (including the chosen random seed) to allow for replicability.

3. The authors say that "there is no function in the macro of mediational g-formula for handling missing data yet". Does it mean that only complete records have been used in the analysis? If it does, then this is a serious limitation that could invalidate the whole study. Please clarify.

Reviewer #2: Summary of the manuscript

Mokhayeri et al. aimed to explore what mediating effect changes in weight play in the effect of smoking on risk of coronary heart disease (CHD).
As smoking status and weight are dynamic factors which can vary over time, they utilise the parametric mediational g-formula to unbiasedly account for exposure-caused mediator-outcome confounding in their analysis. Consistent with prior work they find smoking 20 cigarettes per day compared to none significantly increases the risk of CHD over time, though do not find strong evidence that changes in weight mediate these effects. Overall, this manuscript was a pleasure to read and reflects a well carried-out project with clear public health implications. It does however have a number of minor areas which could be improved prior to publication which would strengthen the manuscript. These mainly concern clarifying causal estimands of interest, methodological decisions and their rationale, providing a clearer explanation of how missing data was accounted for, and providing additional details in the results section.

Areas of improvement

Title and abstract: The title and abstract are clear and accurately summarise the findings of the paper. As minor areas of improvement, however:

1. The final sentence of the abstract conclusion section may be better worded as ‘changes in weight’ as a result of smoking do not appear to have an effect on CHD risk, instead of ‘losing weight’.

2. Likewise, while the summary conclusions are sound, it may be more appropriate to simply say changes in weight have no meaningful mediating effect on CHD risk instead of indicating a slight beneficial effect may be possible, given the 95% confidence interval could equally reflect a slight detrimental effect of weight on CHD risk (95% CI: -0.05%, +0.04%).

Introduction: Overall, the introduction section is very well-done, providing a clear background and rationale to the project as well as justifying the use of the mediational g-formula in answering the specific research question of interest.

1. One recommended change however would be for authors to state which of the interventional direct and indirect effects estimated are pure and total (i.e. whether the additive interaction between exposure and mediator is incorporated into the ‘direct’ effect or the ‘indirect’ effect). While this can be worked out through the IDE and IIE formulas provided in pages 6 and 7, making each estimand clear could be helpful for readers given the slight difference in interpretation. For example, if the pure IIE was estimated (YA0M1 - YA0M0), this would reflect the effect of having weight similar to smokers (vs. non-smokers) on CHD risk in a population where everyone was a non-smoker. The total IIE on the other hand would reflect the effect of having weight similar to non-smokers (vs. smokers) in a population where everyone was a smoker, so they get towards different questions.

Methods: While the methods section is well-done overall and provides ample information on the study design and setting including the time-periods of each visit, there are several areas where improvements could be made that in my view would strengthen the resulting paper.

1. Authors need to provide a description of how missing data is addressed, given its particular importance to the mGFORMULA SAS macro. If understood correctly, a complete case analysis was performed using only those who did not experience a competing risk event (5.9% of the 6,809 eligible participants) or right-censoring due to loss to follow-up (29% of participants). Given the mGFORMULA SAS macro cannot accommodate competing risks or censoring and requires that all participants included have an equal number of follow-up visits/time-points, this should be stated explicitly as well as the size of the final sample actually included in the analysis.

2. In the ‘Exposure, mediator, outcome, and confounders’ section of the methods, authors describe their comparison as a hypothetical intervention forcing all individuals to smoke 20 cigarettes per day compared to none and interpret it as such. As is discussed elsewhere in the literature though, such as by Schwartz, Gatto, and Campbell in the Annals of Epidemiology in 2016, this may not be the most appropriate interpretation of their comparison given the number of different ways this intervention could be enforced, each having a different relationship with future health and possibly violating Rubin’s Stable Unit Treatment Value assumption. Instead authors may wish to describe their comparison in terms of identifying the realised causal effect of smoking 20 cigarettes per day compared to none among participants in the sample who were exposed.

3. In the ‘Exposure, mediator, outcome, and confounders’ section of the methods, authors indicate that alcohol consumption, diet, and family income are included as time-fixed covariates based on baseline values. Given each of these factors are likely dynamic in nature and treating them as fixed could result in misclassification bias or residual confounding, authors should justify treating these factors as time-fixed, such as due to only having information available at baseline. If authors have longitudinal data on these factors, they should explore treating them as time-varying in their analysis or, if not, consider reporting the extent to which these factors vary within individuals in their sample throughout follow-up.

4. In the same section as above, authors indicate baseline smoking status was included as a time-fixed covariate. It would be helpful if it was made more explicit how this variable was accounted for. For example, if baseline smoking status was included as the average number of cigarettes smoked, it may not account for individuals being former smokers and the continuing CHD risk this would present.
Likewise if it does not account for smoking history as of baseline, this would also fail to capture individuals who were former smokers still having a possibly greater risk of CHD than never-smokers. If possible, I would recommend authors include both current smoking status at baseline (as an average number of cigarettes smoked per day) as well as smoking history at baseline (as a categorical for current smoker, former smoker, never smoker) as time-fixed covariates. 5. In the ‘Statistical analysis’ section of the methods, I would recommend a different wording when discussing potential outcomes and exposure and mediator status’, making it explicit that each potential outcome refers to exposure and mediator histories over the course of follow-up instead of at a single time-point, with these histories being some exposure-mediator combination of {0,0,0,0,0} and/or {1,1,1,1,1}. 6. For the line ‘Mediational g-formula is both Robins’ regular g-formula and Pearl’s mediation formula’, I think it would be more appropriate to say the mediational g-formula is related to both Robin’s regular g-formula and Pearl’s mediation formula, only being equal under the conditions authors subsequently state. 7. Possibly in the ‘Statistical analysis’ section, authors should provide additional detail on whether any interaction terms are included between covariates such as between the exposure and mediator, and whether any non-linear terms are considered in the model specification for continuous factors, such as quadratic, cubic, or spline terms. This could be helpful for readers, in that these decisions could influence how accurately the parametric model captures the true causal model. 8. Authors should report any efforts made to address or explore possible biases. For example, authors could assess the validity of their parametric model specification through comparing the CHD risk simulated under the Natural Course (i.e. without intervention) to that actually observed in the sample. 
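The natural-course check suggested in this point comes down to comparing two risks. A minimal Python sketch of that comparison follows, using the figures the authors give in their response (5.94% simulated under no intervention, 6.91% observed). The function name and the choice to report both an absolute and a relative gap are assumptions made for illustration; they are not part of the mGFORMULA macro.

```python
def natural_course_gap(simulated_risk: float, observed_risk: float) -> tuple[float, float]:
    """Compare the risk simulated under no intervention with the observed risk.

    Returns (absolute gap, relative gap); both are negative when the
    simulation underestimates the observed risk.
    """
    if observed_risk <= 0:
        raise ValueError("observed_risk must be positive")
    abs_gap = simulated_risk - observed_risk
    rel_gap = abs_gap / observed_risk
    return abs_gap, rel_gap


# Risks reported in the authors' response: 5.94% simulated, 6.91% observed.
abs_gap, rel_gap = natural_course_gap(0.0594, 0.0691)
print(f"absolute gap: {abs_gap * 100:+.2f} percentage points")
print(f"relative gap: {rel_gap:+.1%}")
```

With these inputs the simulated natural course falls about 0.97 percentage points below the observed risk, roughly 14% in relative terms. How large a gap should prompt a revised model specification is a judgment call that this sketch does not settle.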
Likewise, authors could perform sensitivity analyses (though not required) excluding those who were former smokers so as to limit the likelihood of lagged effects of smoking before the intervention period. As a final suggestion, authors may wish to explore how their findings differ if analysed over a 9- or 13-year period to determine if their effect estimate is stable or affected by chance variation in year-to-year sample CHD incidence. 9. Though it could be a local issue, the Figure 1 causal DAG appears to have a low-resolution, making it difficult to read. If this is the case, authors may wish to provide a higher quality image for better legibility. Results: Authors present the findings of their analysis well, though there are minor areas where improvements could be made. 1. Related to point 1 in the ‘Methods’ improvements section, authors make clear that some individuals die from causes other than CHD or are lost to follow-up. Where possible, in the results section authors should report the final sample size included in the mediational g-formula analyses, as well as provide a brief summary of the characteristics of those not included in analyses to assess the likelihood of differential loss to follow-up by factors related to the outcome of interest. 2. Related to point 1 in the ‘Methods’ improvement section as well, authors do not make clear how missing data is accounted for. This is needed, given Table 1 race, education, and annual family income strata sizes do not sum to each smoking status group sample size (e.g. annual family income among smokers sum to 846, but there are 890 smokers at baseline), suggesting there is some missing data. Where possible, information on missing data should be provided for each variable separated by smoking status, as well as a complete case sample size carried forward for use with the mediational g-formula. 
This is relevant to the positivity assumption, where a smaller sample size is more likely to result in random non-positivity. 3. In summarising results of the standard parametric g-formula, authors describe the estimated 11-year risk of CHD in smokers and non-smokers. To reflect that the comparison is counterfactual, comparing outcomes had everyone been smokers or non-smokers, I would recommend they word this sentence differently as the estimated risk ‘had everyone been a smoker/smoked 20 cigarettes per day’ vs. ‘had everyone been a non-smoker/smoked 0 cigarettes per day’ for the 11-year follow-up. 4. In Tables 1 or 2, it would be helpful if authors presented the observed 11-year risk of CHD (ideally overall and by smoking status), as well as that simulated in the standard or mediational g-formula under no intervention. This would be a helpful assumption check, given the 'No intervention' output presented by the g-formula represents the simulated CHD risk under the Natural course (i.e. without intervention). If the parametric modelling process accounts for the causal model well, this 'no intervention' risk should be similar to that actually observed in the sample. If it isn't, it may suggest more work is needed in refining the model specification, such as considering different covariates, interaction terms, or non-linear terms for continuous covariates. 5. Either in Table 3 or separately, I would recommend authors present the simulated risk under each counterfactual exposure-mediator intervention. This would be helpful for readers, as it would allow multiplicative effects to be calculated such as E[YA1M1] / E[YA1M0] for the total IIE, as well as the interventional total direct effect and interventional pure indirect effect. Discussion: Overall, authors do an excellent job of summarising related research as well as noting strengths and limitations of their study. Some brief areas of possible improvements are listed below: 1. 
While authors summarise findings of a number of related articles, I feel the findings of these studies could be more explicitly compared against their own findings, such as through simply restating the summary total, direct, and indirect effects of their own study before discussing total, direct, and weight-mediated effect estimates found in other studies. 2. The discussion section could benefit from authors comparing effects of their analysis using the mediational g-formula to similar studies which did not utilise the mediational g-formula. For example, the 2013 JAMA study by Carole Clair et al. considers the mediating effect of weight in the relationship between smoking cessation and CVD among people with and without diabetes. This may be a more relevant comparison than the study by Nianogo and Arah where the outcome was type 2 diabetes mellitus and not directly related to CHD. 3. On page 11, authors suggest the parametric mediational g-formula is the only known method that is valid for investigating causal effects in the presence of time-varying mediation. This may not necessarily be correct, and could be adapted. As they note, VanderWeele and Tchetgen Tchetgen’s semiparametric approach is also valid. Further, the authors may find the 2017 paper by Zheng and van der Laan in the Journal of Causal Inference relevant, where they discuss the conditional mediation formula and multiple estimators for mediation effects in the presence of time-varying mediation through applying TMLE, IPW, or a non-targeted substitution estimator. 4. Authors currently split their discussion of positivity, consistency, and exchangeability/unmeasured confounding into two sections, separated by study strengths. The discussion may be better laid-out if kept together, relating each assumption to the evidence supporting it and detracting from it one at a time. 5. 
While authors appropriately note the g-null paradox as a limitation of the g-formula, if there is space in the manuscript it may benefit from a slightly longer discussion, especially given the authors note it is unlikely to be a concern in their study due to prior work suggesting the sharp null doesn’t hold in this context. 6. While authors do an excellent job reviewing study strengths and limitations, they may wish to review work published in Statistics in Medicine in 2018 by Schomaker and Heumann, where they discuss combining bootstrap inference with multiple imputation where the g-formula is used. While this would not account for competing risks and censoring, and is not being recommended as necessary in this review, missing data such as that mentioned in point 2 of the ‘results’ improvement section can be accounted for as a step prior to or alongside the mediational g-formula, either through storing each bootstrap effect estimate across multiply imputed datasets and using percentile-based confidence intervals, or through applying the mediational g-formula to multiply imputed datasets and pooling summary effect estimates through Rubin's rules. Each approach discussed is relatively straight-forward to implement in practice. ********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? Reviewer #1: No Reviewer #2: Yes: Kieran Blaikie
Submitted filename: Mokhayeri et al PLoS One Review.pdf 11 Nov 2021 Reviewer 1 SUMMARY. This paper describes a mediation analysis that evaluates whether weight mediates the effect of smoking on coronary heart disease. The core statistical analysis is based on a parametric mediational g-formula that has been recently proposed in the epidemiological literature. Results are in line with other studies that show factors that mediate smoking effects. The main problem with this paper is the presentation of the material, which is a bit too synthetic. Essentially, the authors simply state that they used a SAS macro and show the final results. It is therefore impossible to check whether the statistical methods are correct. Further, the results are not replicable. Technical soundness and replicability are two important requirements for publication in PLOSONE. I'd welcome a revision that adds details on the statistical part of the paper. See the list below for specific issues that should be addressed. MAJOR ISSUES 1. Please clarify what parametric models have been fitted. Which covariates have been included? What was the structure of the random effects used to capture the longitudinal correlation of the data? Could the authors provide an evaluation of the goodness of fit of the model?
Could the authors provide a traditional table with estimates and standard errors of the fitted parametric model? Our answer: Thank you. Because our outcome is dichotomous, the parametric model is based on logistic regression and estimates the standardized risk using the distribution of the confounders (both time-fixed and time-varying) and the conditional distribution of the outcome (CHD) given the exposure (smoking) and the set of confounders. Time-varying covariates measured at all visits, including intentional physical activity, total cholesterol, hypertension, hypertension medication, and current aspirin use, were entered in the models as potential confounders. Additionally, we adjusted for baseline age, sex, race/ethnicity (White, Asian, Hispanic, and African-American), alcohol consumption (drinks per week), and diet score. To assess the validity of the parametric model specification, we compared the CHD risk simulated under the Natural Course (without intervention) with the actually observed risk (added to the discussion). We reported CIs using bootstrapping with 500 iterations. Regarding the structure of the random effects, there is no random-effect term in our model. The within-subject correlation has been taken into account using bootstrapping. Regarding the traditional table, the beta estimates of the outcome and covariate models and their standard errors are not quantities of interest, so the software does not provide them. 2. Point estimates are computed by "non parametric bootstrap". Shouldn't parametric mediation analysis rely on simulation from the fitted parametric model? Why do we need nonparametric bootstrap here? A nonparametric approach to bootstrap is not obvious under a longitudinal setting because we need to account for longitudinal correlation. This is why parametric approaches are usually preferred. Please clarify. Furthermore, Monte Carlo outputs are not replicable by definition.
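The two concerns raised in this comment, longitudinal correlation and replicability, can be illustrated together with a subject-level (clustered) nonparametric bootstrap run under a fixed seed. The sketch below is a generic Python illustration and is not the authors' SAS macro. The long-format `(subject_id, record)` layout, the function name, the toy data, and the seed value 2021 are all hypothetical; the 500 replicates match the number of iterations the authors report. The toy estimator is a plain mean standing in for a full refit of the mediational g-formula on each resample.

```python
import random
from collections import defaultdict


def cluster_bootstrap_ci(rows, estimator, n_boot=500, seed=2021):
    """Approximate 95% percentile CI from a subject-level bootstrap.

    rows: list of (subject_id, record) pairs in long format (hypothetical layout).
    estimator: function mapping a list of records to a number.
    """
    rng = random.Random(seed)  # fixed seed so that reruns give identical output
    by_subject = defaultdict(list)
    for subject_id, record in rows:
        by_subject[subject_id].append(record)
    ids = sorted(by_subject)

    estimates = []
    for _ in range(n_boot):
        # Draw whole subjects with replacement, keeping each subject's visits together.
        drawn = rng.choices(ids, k=len(ids))
        resampled = [rec for sid in drawn for rec in by_subject[sid]]
        estimates.append(estimator(resampled))

    estimates.sort()
    k = int(0.025 * n_boot)  # number of replicates trimmed from each tail
    return estimates[k], estimates[n_boot - 1 - k]


# Toy data: 100 subjects, 5 visits each, outcome constant within subject.
toy_rows = [(i, {"chd": 1 if i % 10 == 0 else 0}) for i in range(100) for _ in range(5)]
mean_chd = lambda recs: sum(r["chd"] for r in recs) / len(recs)

ci_first = cluster_bootstrap_ci(toy_rows, mean_chd)
ci_second = cluster_bootstrap_ci(toy_rows, mean_chd)
print(ci_first, ci_first == ci_second)
```

Because whole subjects are resampled, each subject's visits stay together, so within-subject correlation carries over into every replicate. Reusing the seed makes the two runs return the same interval.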
The authors should provide data and code (including the chosen random seed) to allow for replicability. Our answer: Indeed, we used the nonparametric bootstrap to obtain point estimates and confidence intervals for the effects of interest. Use of the nonparametric bootstrap is standard with the parametric g-formula. In principle, the parametric bootstrap can also be used, but there is no requirement to do so (i.e., one need not use the parametric bootstrap with the parametric g-formula). The reason researchers employ the nonparametric bootstrap is that it is much simpler to implement, particularly in a setting of longitudinal data such as ours. Specifically, with a simple clustered resample, one can address the longitudinal correlations with the nonparametric bootstrap. Were we to rely on the parametric bootstrap, we would have to employ much more complex modeling strategies, which are not immediately compatible with the g-formula. The code (a SAS macro) was added as a supplementary file. 3. The authors say that "there is no function in the macro of mediational g-formula for handling missing data yet". Does it mean that only complete records have been used in the analysis? If it does, then this is a serious limitation that could invalidate the whole study. Please clarify. Our answer: While the parametric g-formula macro does consider censoring in the analysis, the mGFORMULA macro for the parametric mediational g-formula cannot handle censoring. Using mGFORMULA, we did a complete case analysis. We indicated this in the discussion. The final sample included in the analysis is 4433 (censoring proportion: 34.9%). Lin et al. (Epidemiology, 2016; and Statistics in Medicine, 2017) limited their analyses to participants without death or loss to follow-up during the period of study. Reviewer 2 Mokhayeri et al. aimed to explore what mediating effect changes in weight play in the effect of smoking on risk of coronary heart disease (CHD).
As smoking status and weight are dynamic factors which can vary over time, they utilise the parametric mediational g-formula to unbiasedly account for exposure-caused mediator-outcome confounding in their analysis. Consistent with prior work they find smoking 20 cigarettes per day compared to none significantly increases the risk of CHD over time, though do not find strong evidence that changes in weight mediate these effects. Overall, this manuscript was a pleasure to read and reflects a well carried-out project with clear public health implications. It does however have a number of minor areas which could be improved prior to publication which would strengthen the manuscript. These mainly concern clarifying causal estimands of interest, methodological decisions and their rationale, providing a clearer explanation of how missing data was accounted for, and providing additional details in the results section. Areas of improvement Title and abstract: The title and abstract are clear and accurately summarise the findings of the paper. As minor areas of improvement, however: 1. The final sentence of the abstract conclusion section may be better worded as 'changes in weight’ as a result of smoking do not appear to have an effect on CHD risk, instead of ‘losing weight’. Our answer: Thank you. It was revised. We made the following change: In fact, changes in weight because of smoking have no meaningful mediating effect on CHD risk. 2. Likewise, while the summary conclusions are sound, it may be more appropriate to simply say changes in weight have no meaningful mediating effect on CHD risk instead of indicating a slight beneficial effect may be possible, given the 95% confidence interval could equally reflect a slight detrimental effect of weight on CHD risk (95% CI: -0.05%, +0.04%). Our answer: Thank you. It was revised. 
We made the following change: In fact, changes in weight because of smoking have no meaningful mediating effect on CHD risk Introduction: Overall, the introduction section is very well-done, providing a clear background and rationale to the project as well as justifying the use of the mediational g-formula in answering the specific research question of interest. 1. One recommended change however would be for authors to state which of the interventional direct and indirect effects estimated are pure and total (i.e. whether the additive interaction between exposure and mediator is incorporated into the ‘direct’ effect or the ‘indirect’ effect). While this can be worked out through the IDE and IIE formulas provided in pages 6 and 7, making each estimand clear could be helpful for readers given the slight difference in interpretation. For example, if the pure IIE was estimated (YA0M1 - YA0M0), this would reflect the effect of having weight similar to smokers (vs. non-smokers) on CHD risk in a population where everyone was a non-smoker. The total IIE on the other hand would reflect the effect of having weight similar to non-smokers (vs. smokers) in a population where everyone was a smoker, so they get towards different questions. Our answer: Thank you. Indeed, it is a valuable recommendation. It was revised in both introduction and method sections. Interventional direct effect (IDE) is pure effect, and interventional indirect effect (IIE) is total effect. Methods: While the methods section is well-done overall and provides ample information on the study design and setting including the time-periods of each visit, there are several areas where improvements could be made that in my view would strengthen the resulting paper. 1. Authors need to provide a description of how missing data is addressed, given its particular importance to the mGFORMULA SAS macro. 
If understood correctly, a complete case analysis was performed using only those who did not experience a competing risk event (5.9% of the 6,809 eligible participants) or right-censoring due to loss to follow-up (29% of participants). Given the mGFORMULA SAS macro cannot accommodate competing risks or censoring and requires that all participants included have an equal number of follow-up visits/time-points, this should be stated explicitly as well as the size of the final sample actually included in the analysis. Our answer: Thank you. Yes, mGFORMULA cannot handle competing risk and censoring, and using mGFORMULA we did a complete case analysis. We indicated this in the discussion. Final sample included in the analysis is 4433 (Censoring proportion: 34.9%) and this was added to the results section. Lin et al. (Epidemiology, 2016; and Statistics in Medicine, 2017) limited their analyses on participants without death or loss to follow up during the period of study. 2. In the ‘Exposure, mediator, outcome, and confounders’ section of the methods, authors describe their comparison as a hypothetical intervention forcing all individuals to smoke 20 cigarettes per day compared to none and interpret it as such. As is discussed elsewhere in the literature though, such as by Schwartz, Gatto, and Campbell in the Annals of Epidemiology in 2016, this may not be the most appropriate interpretation of their comparison given the number of different ways this intervention could be enforced, each having a different relationship with future health and possibly violating Rubin’s Stable Unit Treatment Value assumption. Instead, authors may wish to describe their comparison in terms of identifying the realised causal effect of smoking 20 cigarettes per day compared to none among participants in the sample who were exposed. Our answer: Thank you. It was revised and your suggestion was replaced in this section. 3. 
In the ‘Exposure, mediator, outcome, and confounders’ section of the methods, authors indicate that alcohol consumption, diet, and family income are included as time-fixed covariates based on baseline values. Given each of these factors are likely dynamic in nature and treating them as fixed could result in misclassification bias or residual confounding, authors should justify treating these factors as time-fixed, such as due to only having information available at baseline. If authors have longitudinal data on these factors, they should explore treating them as time-varying in their analysis or, if not, consider reporting the extent to which these factors vary within individuals in their sample throughout follow-up. Our answer: Thank you. Information for diet and alcohol consumption (drinks per week) is available just at baseline. Regarding income, it was roughly stable over time so we include it as a time-fixed variable. 4. In the same section as above, authors indicate baseline smoking status was included as a time-fixed covariate. It would be helpful if it was made more explicit how this variable was accounted for. For example, if baseline smoking status was included as the average number of cigarettes smoked, it may not account for individuals being former smokers and the continuing CHD risk this would present. Likewise if it does not account for smoking history as of baseline, this would also fail to capture individuals who were former smokers still having a possibly greater risk of CHD than never-smokers. If possible, I would recommend authors include both current smoking status at baseline (as an average number of cigarettes smoked per day) as well as smoking history at baseline (as a categorical for current smoker, former smoker, never smoker) as time-fixed covariates. Our answer: Thank you. We adjusted for smoking at baseline as a categorical variable (never, former, and current smoker). 
We made the following change: baseline smoking (never, former, and current smoker). 5. In the ‘Statistical analysis’ section of the methods, I would recommend a different wording when discussing potential outcomes and exposure and mediator status’, making it explicit that each potential outcome refers to exposure and mediator histories over the course of follow-up instead of at a single time-point, with these histories being some exposure-mediator combination of {0,0,0,0,0} and/or {1,1,1,1,1}. Our answer: With all due respect to the reviewer, we believe this definition is already simple and easy to understand, and we have made no further changes. 6. For the line ‘Mediational g-formula is both Robins’ regular g-formula and Pearl’s mediation formula’, I think it would be more appropriate to say the mediational g-formula is related to both Robins’ regular g-formula and Pearl’s mediation formula, only being equal under the conditions authors subsequently state. Our answer: Thank you. It was revised as suggested by the reviewer. 7. Possibly in the ‘Statistical analysis’ section, authors should provide additional detail on whether any interaction terms are included between covariates such as between the exposure and mediator, and whether any non-linear terms are considered in the model specification for continuous factors, such as quadratic, cubic, or spline terms. This could be helpful for readers, in that these decisions could influence how accurately the parametric model captures the true causal model. Our answer: No interaction terms are included; however, some non-linear terms are included. For cholesterol and weight, a linear model was used when they were dependent variables, and a quadratic term was included when they were independent variables. For cigarette smoking and intentional physical activity, a logistic model followed by a log-linear model was used when they were dependent variables, and a quadratic term was included when they were independent variables. The code (a SAS macro) was added as a supplementary file. 8.
Authors should report any efforts made to address or explore possible biases. For example, authors could assess the validity of their parametric model specification through comparing the CHD risk simulated under the Natural Course (i.e. without intervention) to that actually observed in the sample. Likewise, authors could perform sensitivity analyses (though not required) excluding those who were former smokers so as to limit the likelihood of lagged effects of smoking before the intervention period. As a final suggestion, authors may wish to explore how their findings differ if analysed over a 9- or 13-year period to determine if their effect estimate is stable or affected by chance variation in year-to-year sample CHD incidence. Our answer: Thank you for your suggestion. The natural course compared to the observed risk was added to the discussion. In our study, the simulated 11-year risk of CHD under no intervention was 5.94% and the observed risk was 6.91%. Regarding the analysis over a 9- or 13-year period, it should be noted that we have no access to data for a 13-year period. Moreover, in MESA, there is a gap of three years between visits four and five. 9. Though it could be a local issue, the Figure 1 causal DAG appears to have a low-resolution, making it difficult to read. If this is the case, authors may wish to provide a higher quality image for better legibility. Our answer: The figure was replaced with a high-resolution one. Results: Authors present the findings of their analysis well, though there are minor areas where improvements could be made. 1. Related to point 1 in the ‘Methods’ improvements section, authors make clear that some individuals die from causes other than CHD or are lost to follow-up.
Where possible, in the results section authors should report the final sample size included in the mediational g-formula analyses, as well as provide a brief summary of the characteristics of those not included in analyses to assess the likelihood of differential loss to follow-up by factors related to the outcome of interest. Our answer: The final sample size was added. We assumed that the data are missing completely at random (MCAR). 2. Related to point 1 in the ‘Methods’ improvement section as well, authors do not make clear how missing data is accounted for. This is needed, given Table 1 race, education, and annual family income strata sizes do not sum to each smoking status group sample size (e.g. annual family income among smokers sum to 846, but there are 890 smokers at baseline), suggesting there is some missing data. Where possible, information on missing data should be provided for each variable separated by smoking status, as well as a complete case sample size carried forward for use with the mediational g-formula. This is relevant to the positivity assumption, where a smaller sample size is more likely to result in random non-positivity. Our answer: A complete case analysis was performed, as the proportion of missing covariate data was low: 2.2% for hypertension, 1.5% for physical activity, 3.1% for total cholesterol, 3.2% for hypertension medication, and 1.6% for current aspirin use. This was added to the results section. 3. In summarizing results of the standard parametric g-formula, authors describe the estimated 11-year risk of CHD in smokers and non-smokers. To reflect that the comparison is counterfactual, comparing outcomes had everyone been smokers or non-smokers, I would recommend they word this sentence differently as the estimated risk ‘had everyone been a smoker/smoked 20 cigarettes per day’ vs. ‘had everyone been a non-smoker/smoked 0 cigarettes per day’ for the 11-year follow-up. Our answer: Thank you.
We revised the sentence according to the reviewer’s suggestion.

4. In Tables 1 or 2, it would be helpful if authors presented the observed 11-year risk of CHD (ideally overall and by smoking status), as well as that simulated in the standard or mediational g-formula under no intervention. This would be a helpful assumption check, given the 'No intervention' output presented by the g-formula represents the simulated CHD risk under the Natural course (i.e. without intervention). If the parametric modelling process accounts for the causal model well, this 'no intervention' risk should be similar to that actually observed in the sample. If it is not, it may suggest more work is needed in refining the model specification, such as considering different covariates, interaction terms, or non-linear terms for continuous covariates.

Our answer: Thank you. The natural course compared to the observed risk was added to the discussion. In our study, the simulated 11-year risk of CHD under no intervention was 5.94% and the observed risk was 6.91%.

5. Either in Table 3 or separately, I would recommend authors present the simulated risk under each counterfactual exposure-mediator intervention. This would be helpful for readers, as it would allow multiplicative effects to be calculated such as E[Y_{A1M1}] / E[Y_{A1M0}] for the total IIE, as well as the interventional total direct effect and interventional pure indirect effect.

Our answer: Thank you for your suggestion. They were added to Table 3.

Discussion: Overall, authors do an excellent job of summarising related research as well as noting strengths and limitations of their study. Some brief areas of possible improvements are listed below:

1.
While authors summarise findings of a number of related articles, I feel the findings of these studies could be more explicitly compared against their own findings, such as through simply restating the summary total, direct, and indirect effects of their own study before discussing total, direct, and weight-mediated effect estimates found in other studies.

Our answer: Thank you. Studies used different summary measures and different statistical models (adjusted for various covariates). However, their results indicated weight gain after smoking cessation and, ultimately, a small effect or no effect on CHD risk.

2. The discussion section could benefit from authors comparing effects of their analysis using the mediational g-formula to similar studies which did not utilise the mediational g-formula. For example, the 2013 JAMA study by Carole Clair et al. considers the mediating effect of weight in the relationship between smoking cessation and CVD among people with and without diabetes. This may be a more relevant comparison than the study by Nianogo and Arah where the outcome was type 2 diabetes mellitus and not directly related to CHD.

Our answer: The study of Nianogo and Arah was replaced with the results of the 2013 JAMA study by Carole Clair et al. The list of references was revised as well.

3. On page 11, authors suggest the parametric mediational g-formula is the only known method that is valid for investigating causal effects in the presence of time-varying mediation. This may not necessarily be correct, and could be adapted. As they note, VanderWeele and Tchetgen Tchetgen’s semiparametric approach is also valid. Further, the authors may find the 2017 paper by Zheng and van der Laan in the Journal of Causal Inference relevant, where they discuss the conditional mediation formula and multiple estimators for mediation effects in the presence of time-varying mediation through applying TMLE, IPW, or a non-targeted substitution estimator.

Our answer: Thank you.
The comment about the parametric g-formula being the only known method was omitted.

4. Authors currently split their discussion of positivity, consistency, and exchangeability/unmeasured confounding into two sections, separated by study strengths. The discussion may be better laid-out if kept together, relating each assumption to the evidence supporting it and detracting from it one at a time.

Our answer: We brought these two parts together (one paragraph).

5. While authors appropriately note the g-null paradox as a limitation of the g-formula, if there is space in the manuscript it may benefit from a slightly longer discussion, especially given the authors note it is unlikely to be a concern in their study due to prior work suggesting the sharp null doesn’t hold in this context.

Our answer: Thank you. It was added. We made the following changes: Third, the parametric g-formula is also subject to the g-null paradox: it could lead to rejecting the causal null, even when it is true.

6. While authors do an excellent job reviewing study strengths and limitations, they may wish to review work published in Statistics in Medicine in 2018 by Schomaker and Heumann, where they discuss combining bootstrap inference with multiple imputation where the g-formula is used. While this would not account for competing risks and censoring, and is not being recommended as necessary in this review, missing data such as that mentioned in point 2 of the ‘results’ improvement section can be accounted for as a step prior to or alongside the mediational g-formula, either through storing each bootstrap effect estimate across multiply imputed datasets and using percentile-based confidence intervals, or through applying the mediational g-formula to multiply imputed datasets and pooling summary effect estimates through Rubin's rules. Each approach discussed is relatively straight-forward to implement in practice.

Our answer: Thank you. Your suggestion was added to the discussion.
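The pooling step that the reviewer describes in point 6 can be sketched in a few lines. The following is only an illustration of Rubin's rules for combining one effect estimate across multiply imputed datasets; it is not the authors' code, the function name `pool_rubin` is hypothetical, and the numbers in the usage lines are made-up placeholders rather than results from the manuscript.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset (e.g. a risk difference)
    variances: the squared standard error of each of those estimates
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total, math.sqrt(total)


# Hypothetical placeholder inputs: three imputed datasets.
pooled, total_var, se = pool_rubin(
    estimates=[0.018, 0.020, 0.022],
    variances=[0.0001, 0.0001, 0.0001],
)
print(f"pooled estimate = {pooled:.4f}, standard error = {se:.4f}")
```

The alternative mentioned in the same comment, percentile-based intervals from bootstrap estimates stored across imputed datasets, would replace the variance formula with empirical quantiles of the stored estimates and is not shown here.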
We made the following changes: For future studies, Schomaker and Heumann (2018) presented four methods to combine bootstrap estimation with multiple imputation for estimators that require bootstrapping to estimate confidence intervals. They indicated that the proportion of missingness and the number of multiply imputed data sets affect the methods' performance.

Submitted filename: Response to Reviewers.docx

23 Dec 2021

Does Weight Mediate the Effect of Smoking on Coronary Heart Disease? Parametric Mediational G-Formula Analysis

PONE-D-21-18594R1

Dear Dr. Mansournia,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,

Y Zhan
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)
Reviewer #2: Yes

5.
Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)
Reviewer #2: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)
Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Kieran Blaikie

5 Jan 2022

PONE-D-21-18594R1

Does Weight Mediate the Effect of Smoking on Coronary Heart Disease? Parametric Mediational G-Formula Analysis

Dear Dr. Mansournia:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.
If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Y Zhan
Academic Editor
PLOS ONE

37 in total

1. The causal mediation formula--a guide to the assessment of pathways and mechanisms.

Authors: Judea Pearl
Journal: Prev Sci Date: 2012-08

2. Smoking status and body mass index: a longitudinal study.

Authors: Marcus R Munafò; Kate Tilling; Yoav Ben-Shlomo
Journal: Nicotine Tob Res Date: 2009-05-14 Impact factor: 4.244

3. Cross-sectional study on the relationship between body mass index and smoking, and longitudinal changes in body mass index in relation to change in smoking status: the Tromso Study.

Authors: M Sneve; R Jorde
Journal: Scand J Public Health Date: 2008-06 Impact factor: 3.021

4. Dose-dependent effect of smoking on risk of coronary heart disease, heart failure and stroke in individuals with type 1 diabetes.

Authors: Maija Feodoroff; Valma Harjutsalo; Carol Forsblom; Per-Henrik Groop
Journal: Diabetologia Date: 2018-09-18 Impact factor: 10.122

5. Relationship between smoking and cardiovascular risk factors in the development of peripheral arterial disease and coronary artery disease: Edinburgh Artery Study.

Authors: J F Price; P I Mowbray; A J Lee; A Rumley; G D Lowe; F G Fowkes
Journal: Eur Heart J Date: 1999-03 Impact factor: 29.983

6. Parametric Mediational g-Formula Approach to Mediation Analysis with Time-varying Exposures, Mediators, and Confounders.

Authors: Sheng-Hsuan Lin; Jessica Young; Roger Logan; Eric J Tchetgen Tchetgen; Tyler J VanderWeele
Journal: Epidemiology Date: 2017-03 Impact factor: 4.822

7. Mediation analysis with time varying exposures and mediators.

Authors: Tyler J VanderWeele; Eric J Tchetgen Tchetgen
Journal: J R Stat Soc Series B Stat Methodol Date: 2016-06-27 Impact factor: 4.488

8. Effect of physical activity on functional performance and knee pain in patients with osteoarthritis: analysis with marginal structural models.

Authors: Mohammad Ali Mansournia; Goodarz Danaei; Mohammad Hossein Forouzanfar; Mahmood Mahmoodi; Mohsen Jamali; Nasrin Mansournia; Kazem Mohammad
Journal: Epidemiology Date: 2012-07 Impact factor: 4.822

9. A CHecklist for statistical Assessment of Medical Papers (the CHAMP statement): explanation and elaboration.

Authors: Mohammad Ali Mansournia; Gary S Collins; Rasmus Oestergaard Nielsen; Maryam Nazemipour; Nicholas P Jewell; Douglas G Altman; Michael J Campbell
Journal: Br J Sports Med Date: 2021-01-29 Impact factor: 18.473

10. Global, Regional, and National Burden of Cardiovascular Diseases for 10 Causes, 1990 to 2015.

Authors: Gregory A Roth; Catherine Johnson; Amanuel Abajobir; Foad Abd-Allah; Semaw Ferede Abera; Gebre Abyu; Muktar Ahmed; Baran Aksut; Tahiya Alam; Khurshid Alam; François Alla; Nelson Alvis-Guzman; Stephen Amrock; Hossein Ansari; Johan Ärnlöv; Hamid Asayesh; Tesfay Mehari Atey; Leticia Avila-Burgos; Ashish Awasthi; Amitava Banerjee; Aleksandra Barac; Till Bärnighausen; Lars Barregard; Neeraj Bedi; Ezra Belay Ketema; Derrick Bennett; Gebremedhin Berhe; Zulfiqar Bhutta; Shimelash Bitew; Jonathan Carapetis; Juan Jesus Carrero; Deborah Carvalho Malta; Carlos Andres Castañeda-Orjuela; Jacqueline Castillo-Rivas; Ferrán Catalá-López; Jee-Young Choi; Hanne Christensen; Massimo Cirillo; Leslie Cooper; Michael Criqui; David Cundiff; Albertino Damasceno; Lalit Dandona; Rakhi Dandona; Kairat Davletov; Samath Dharmaratne; Prabhakaran Dorairaj; Manisha Dubey; Rebecca Ehrenkranz; Maysaa El Sayed Zaki; Emerito Jose A Faraon; Alireza Esteghamati; Talha Farid; Maryam Farvid; Valery Feigin; Eric L Ding; Gerry Fowkes; Tsegaye Gebrehiwot; Richard Gillum; Audra Gold; Philimon Gona; Rajeev Gupta; Tesfa Dejenie Habtewold; Nima Hafezi-Nejad; Tesfaye Hailu; Gessessew Bugssa Hailu; Graeme Hankey; Hamid Yimam Hassen; Kalkidan Hassen Abate; Rasmus Havmoeller; Simon I Hay; Masako Horino; Peter J Hotez; Kathryn Jacobsen; Spencer James; Mehdi Javanbakht; Panniyammakal Jeemon; Denny John; Jost Jonas; Yogeshwar Kalkonde; Chante Karimkhani; Amir Kasaeian; Yousef Khader; Abdur Khan; Young-Ho Khang; Sahil Khera; Abdullah T Khoja; Jagdish Khubchandani; Daniel Kim; Dhaval Kolte; Soewarta Kosen; Kristopher J Krohn; G Anil Kumar; Gene F Kwan; Dharmesh Kumar Lal; Anders Larsson; Shai Linn; Alan Lopez; Paulo A Lotufo; Hassan Magdy Abd El Razek; Reza Malekzadeh; Mohsen Mazidi; Toni Meier; Kidanu Gebremariam Meles; George Mensah; Atte Meretoja; Haftay Mezgebe; Ted Miller; Erkin Mirrakhimov; Shafiu Mohammed; Andrew E Moran; Kamarul Imran Musa; Jagat Narula; Bruce Neal; Frida Ngalesoni; Grant Nguyen; Carla Makhlouf Obermeyer; Mayowa Owolabi; George Patton; João Pedro; Dima Qato; Mostafa Qorbani; Kazem Rahimi; Rajesh Kumar Rai; Salman Rawaf; Antônio Ribeiro; Saeid Safiri; Joshua A Salomon; Itamar Santos; Milena Santric Milicevic; Benn Sartorius; Aletta Schutte; Sadaf Sepanlou; Masood Ali Shaikh; Min-Jeong Shin; Mehdi Shishehbor; Hirbo Shore; Diego Augusto Santos Silva; Eugene Sobngwi; Saverio Stranges; Soumya Swaminathan; Rafael Tabarés-Seisdedos; Niguse Tadele Atnafu; Fisaha Tesfay; J S Thakur; Amanda Thrift; Roman Topor-Madry; Thomas Truelsen; Stefanos Tyrovolas; Kingsley Nnanna Ukwaja; Olalekan Uthman; Tommi Vasankari; Vasiliy Vlassov; Stein Emil Vollset; Tolassa Wakayo; David Watkins; Robert Weintraub; Andrea Werdecker; Ronny Westerman; Charles Shey Wiysonge; Charles Wolfe; Abdulhalik Workicho; Gelin Xu; Yuichiro Yano; Paul Yip; Naohiro Yonemoto; Mustafa Younis; Chuanhua Yu; Theo Vos; Mohsen Naghavi; Christopher Murray
Journal: J Am Coll Cardiol Date: 2017-05-17 Impact factor: 24.094

1 in total

1. Longitudinal causal effect of modified creatinine index on all-cause mortality in patients with end-stage renal disease: Accounting for time-varying confounders using G-estimation.

Authors: Mohammad Aryaie; Hamid Sharifi; Azadeh Saber; Farzaneh Salehi; Mahyar Etminan; Maryam Nazemipour; Mohammad Ali Mansournia
Journal: PLoS One Date: 2022-08-19 Impact factor: 3.752

1 in total